Spatio-Temporal VLAD Encoding for Human Action Recognition in Videos

Authors

  • Ionut C. Duta
  • Bogdan Ionescu
  • Kiyoharu Aizawa
  • Nicu Sebe
Abstract

Encoding is a key factor in building an effective video representation. In recent work, super vector-based encoding approaches have been highlighted as among the most powerful representation generators, and the Vector of Locally Aggregated Descriptors (VLAD) is one of the most widely used super vector methods. However, one limitation of VLAD encoding is that it lacks spatial information about the data, which is critical when dealing with video. In this work, we propose Spatio-temporal VLAD (ST-VLAD), an extended encoding method that incorporates spatio-temporal information within the encoding process. This is done by dividing the video and extracting specific information over the feature group of each video split. Experimental validation is performed using both hand-crafted and deep features. Our action recognition pipeline with the proposed encoding method obtains state-of-the-art performance on three challenging datasets: HMDB51 (67.6%), UCF50 (97.8%) and UCF101 (91.5%).
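For readers unfamiliar with the baseline the abstract builds on, the following is a minimal sketch of standard VLAD encoding in Python with NumPy. It is not the ST-VLAD method, whose video division step is not detailed in the abstract. The function name `vlad_encode`, the random data, and the choice of random descriptors as codewords (a stand-in for a codebook learned with k-means) are illustrative assumptions.

```python
import numpy as np


def vlad_encode(descriptors, codebook):
    """Standard VLAD: sum residuals to the nearest codeword, then L2-normalize.

    descriptors: array of shape (N, D), local features of one video.
    codebook:    array of shape (K, D), visual words (assumed learned offline).
    Returns a vector of length K * D.
    """
    # Squared Euclidean distance from every descriptor to every codeword: (N, K).
    diffs = descriptors[:, None, :] - codebook[None, :, :]
    nearest = (diffs ** 2).sum(axis=2).argmin(axis=1)

    num_words, dim = codebook.shape
    vlad = np.zeros((num_words, dim))
    for k in range(num_words):
        members = descriptors[nearest == k]
        if len(members) > 0:
            # Accumulate residuals of the descriptors assigned to codeword k.
            vlad[k] = (members - codebook[k]).sum(axis=0)

    vlad = vlad.ravel()
    norm = np.linalg.norm(vlad)
    return vlad / norm if norm > 0 else vlad


# Hypothetical data: 200 descriptors of dimension 8, 4 codewords.
rng = np.random.default_rng(0)
descriptors = rng.normal(size=(200, 8))
# Assumption: random descriptors used as codewords instead of k-means centers.
codebook = descriptors[rng.choice(200, size=4, replace=False)]
encoding = vlad_encode(descriptors, codebook)
```

Note that nothing in this function depends on where in the frame or when in the video a descriptor was extracted, which is the limitation the abstract points to.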

Similar resources

Human Action Recognition using Improved Vector of Locally Aggregated Descriptors

Two high-dimensional encoding techniques for human action recognition, namely Fisher vector (FV) and vector of locally aggregated descriptors (VLAD), have recently been widely employed. In this study, a new human action recognition approach using improved VLAD with localized soft assignment (LSA) and second-order statistics is proposed. When encoding videos into VLAD, instead of considering only th...

Extreme Learning Machine for Large-Scale Action Recognition

In this paper, we describe the method we applied for the action recognition task on the THUMOS 2014 challenge dataset. We study human action recognition in RGB videos through low-level features by focusing on improved trajectory features that are densely extracted from the spatio-temporal volume. We represent each video with Fisher vector encoding and additional mid-level features. Finally, we...

Skepxels: Spatio-temporal Image Representation of Human Skeleton Joints for Action Recognition

Human skeleton joints are popular for action analysis since they can be easily extracted from videos to discard background noises. However, current skeleton representations do not fully benefit from machine learning with CNNs. We propose “Skepxels” a spatio-temporal representation for skeleton sequences to fully exploit the “local” correlations between joints using the 2D convolution kernels of...

Human Action Recognition Based on 3D Edge Oriented Gradient Histogram of Slide Blocks

In this paper, a new feature called 3D edge oriented gradient histogram of slide blocks is proposed for human action recognition, based on the idea that the slide area of the human body edge can be seen as a spatio-temporal silhouette surface when a human performs a certain action in a video. This feature is processed by defining dense 3D spatio-temporal slide blocks on the spatio-temporal silhouette...

Beyond Spatial Pyramid Matching: Space-time Extended Descriptor for Action Recognition

We address the problem of generating video features for action recognition. The spatial pyramid and its variants have been very popular feature models due to their success in balancing spatial location encoding and spatial invariance. Although it seems straightforward to extend spatial pyramid to the temporal domain (spatio-temporal pyramid), the large spatio-temporal diversity of unconstrained...

Publication date: 2017