End-to-End Learning of Motion Representation for Video Understanding

Authors

Lijie Fan

Wenbing Huang

Chuang Gan

Stefano Ermon

Boqing Gong

Junzhou Huang

Abstract

Despite the recent success of end-to-end learned representations, hand-crafted optical flow features are still widely used in video analysis tasks. To fill this gap, we propose TVNet, a novel end-to-end trainable neural network, to learn optical-flow-like features from data. TVNet subsumes a specific optical flow solver, the TV-L1 method, and is initialized by unfolding its optimization iterations as neural layers. TVNet can therefore be used directly without any extra learning. Moreover, it can be naturally concatenated with other task-specific networks to form an end-to-end architecture, which makes our method more efficient than current multi-stage approaches by avoiding the need to pre-compute and store features on disk. Finally, the parameters of TVNet can be further fine-tuned by end-to-end training. This enables TVNet to learn richer, task-specific patterns beyond exact optical flow. Extensive experiments on two action recognition benchmarks verify the effectiveness of the proposed approach. Our TVNet achieves better accuracy than all compared methods, while being competitive with the fastest counterpart in terms of feature extraction time.
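The idea of unfolding TV-L1 iterations into a fixed stack of layers can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it is a single-scale version with no warping and no trainable parameters, linearized around zero flow, and the function names and the values of `lam`, `theta`, `tau`, and `n_layers` are assumptions chosen for illustration. Each pass through the loop plays the role of one "layer".

```python
import numpy as np

def forward_grad(a):
    """Forward differences with zero padding at the far border."""
    gx = np.zeros_like(a)
    gy = np.zeros_like(a)
    gx[:, :-1] = a[:, 1:] - a[:, :-1]
    gy[:-1, :] = a[1:, :] - a[:-1, :]
    return gx, gy

def divergence(px, py):
    """Backward-difference divergence (negative adjoint of forward_grad)."""
    d = np.zeros_like(px)
    d[:, 0] += px[:, 0]
    d[:, 1:-1] += px[:, 1:-1] - px[:, :-2]
    d[:, -1] += -px[:, -2]
    d[0, :] += py[0, :]
    d[1:-1, :] += py[1:-1, :] - py[:-2, :]
    d[-1, :] += -py[-2, :]
    return d

def tvl1_unrolled(i0, i1, n_layers=30, lam=0.15, theta=0.3, tau=0.25):
    """Run a fixed number of TV-L1 iterations (hypothetical hyperparameters)."""
    gy, gx = np.gradient(i1)              # image gradients of the second frame
    grad_sq = gx ** 2 + gy ** 2 + 1e-12   # small constant avoids division by zero
    rho_c = i1 - i0                       # constant part of the residual (zero initial flow)
    u1 = np.zeros_like(i1)
    u2 = np.zeros_like(i1)
    p11, p12, p21, p22 = (np.zeros_like(i1) for _ in range(4))
    for _ in range(n_layers):             # one loop pass corresponds to one unfolded layer
        rho = rho_c + gx * u1 + gy * u2
        thr = lam * theta * grad_sq
        # Thresholding step for the L1 data term.
        step = np.where(rho < -thr, lam * theta,
                        np.where(rho > thr, -lam * theta, -rho / grad_sq))
        v1 = u1 + step * gx
        v2 = u2 + step * gy
        # Total-variation step driven by the dual variables p.
        u1 = v1 + theta * divergence(p11, p12)
        u2 = v2 + theta * divergence(p21, p22)
        u1x, u1y = forward_grad(u1)
        u2x, u2y = forward_grad(u2)
        n1 = 1.0 + (tau / theta) * np.sqrt(u1x ** 2 + u1y ** 2)
        n2 = 1.0 + (tau / theta) * np.sqrt(u2x ** 2 + u2y ** 2)
        p11 = (p11 + (tau / theta) * u1x) / n1
        p12 = (p12 + (tau / theta) * u1y) / n1
        p21 = (p21 + (tau / theta) * u2x) / n2
        p22 = (p22 + (tau / theta) * u2y) / n2
    return u1, u2

if __name__ == "__main__":
    frame0 = np.tile(np.arange(16.0), (16, 1))  # horizontal intensity ramp
    frame1 = frame0 - 1.0                       # same ramp shifted one pixel to the right
    flow_x, flow_y = tvl1_unrolled(frame0, frame1)
    print(flow_x.mean(), flow_y.mean())
```

Because the number of iterations is fixed, the whole computation is a feed-forward chain of differentiable operations. Under that view, quantities such as the step sizes or the gradient kernels could be treated as learnable in a deep learning framework, which is the direction the abstract describes.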


Similar resources

Action Change Detection in Video Based on HOG

Background and Objectives: Action recognition, the process of labeling an unknown action in a query video, is a challenging problem due to event complexity, variations in imaging conditions, and intra- and inter-individual action variability. A number of solutions have been proposed to solve the action recognition problem. Many of these frameworks suppose that each video sequence includes only one ...


Comparative Textbook Evaluation: Representation of Learning Objectives in Locally and Internationally Published ELT Textbooks

The present study evaluated the learning objectives represented in the recent Iranian nation-wide ELT textbooks, i.e. Prospect and Vision series, and compared them to those in the internationally-published textbook of Four Corners. To this end, Bloom’s revised taxonomy of learning objectives was utilized as the analytical framework to scrutinize the tasks and exercises of the textbooks using a ...


Learning End-to-end Video Classification with Rank-Pooling

We introduce a new model for representation learning and classification of video sequences. Our model is based on a convolutional neural network coupled with a novel temporal pooling layer. The temporal pooling layer relies on an inner-optimization problem to efficiently encode temporal semantics over arbitrarily long video clips into a fixed-length vector representation. Importantly, the repre...


Video Subject Inpainting: A Posture-Based Method

Despite recent advances in video inpainting techniques, reconstructing large missing regions of a moving subject while its scale changes remains an elusive goal. In this paper, we have introduced a scale-change invariant method for large missing regions to tackle this problem. Using this framework, first the moving foreground is separated from the background and its scale is equalized. Then, a ...


End-to-end Video-level Representation Learning for Action Recognition

From frame/clip-level feature learning to video-level representation building, deep learning methods in action recognition have developed rapidly in recent years. However, current methods suffer from the confusion caused by partial-observation training, lack end-to-end learning, or are restricted to single-temporal-scale modeling, among other limitations. In this paper, we build upon two-stream ConvN...


Publication date: 2018
