A Multi-Modal Transformer network for action detection

Authors

Abstract

This paper proposes a novel multi-modal transformer network for detecting actions in untrimmed videos. To enrich the action features, our network uses a new attention mechanism that computes the correlations between different combinations of spatial and motion modalities. Exploring such correlations has not been attempted previously. To use the motion modality more effectively, we suggest an algorithm that corrects the distortion caused by camera movement. Such distortion, common in untrimmed videos, severely reduces the expressive power of motion features such as optical flow fields. Our proposed network outperforms state-of-the-art methods on two public benchmarks, THUMOS14 and ActivityNet. We also conducted comparative experiments on an instructional activity dataset, which includes a large set of challenging classroom videos captured from elementary schools.
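
The idea of attending across modalities can be illustrated with a minimal sketch. The abstract does not specify the architecture, so this is not the authors' method: it uses standard scaled dot-product attention as a stand-in, with per-frame spatial features as queries and motion features as keys and values. The function name, feature shapes, and random projection matrices are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(queries, context, w_q, w_k, w_v):
    """Scaled dot-product attention from one modality to another.

    queries: (T, d_in) features of the attending modality (e.g. spatial).
    context: (T, d_in) features of the attended modality (e.g. motion).
    w_q, w_k, w_v: (d_in, d) projection matrices (hypothetical, untrained).
    """
    q = queries @ w_q
    k = context @ w_k
    v = context @ w_v
    # Each row holds the correlation of one query frame with every context frame.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights

# Toy input: 8 frames, 16-dim features per modality (assumed sizes).
rng = np.random.default_rng(0)
T, d_in, d = 8, 16, 4
spatial = rng.standard_normal((T, d_in))
motion = rng.standard_normal((T, d_in))
w_q, w_k, w_v = (rng.standard_normal((d_in, d)) for _ in range(3))

attended, weights = cross_modal_attention(spatial, motion, w_q, w_k, w_v)
```

Swapping the `spatial` and `motion` arguments gives the reverse direction, which is one way to form the different modality combinations the abstract mentions.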

Similar resources

Supervised Transformer Network for Efficient Face Detection

Large pose variations remain a challenge for real-world face detection. We propose a new cascaded Convolutional Neural Network, dubbed the Supervised Transformer Network, to address this challenge. The first stage is a multi-task Region Proposal Network (RPN), which simultaneously predicts candidate face regions along with associated facial landmarks. The candidate regions ...

Driver Mirror-Checking Action Detection Using Multi-Modal Signals

Studies on driver distraction aim to identify features extracted from various sensory signals that can be used to distinguish between normal and distracted driving behaviors. A major challenge in these studies is to determine whether the observed behaviors are associated with the primary driving tasks (checking mirrors, monitoring speed, changing lanes) or secondary tasks that deviate the atten...

Complex Event Detection in Multi-Modal Sensor Network

The Global War on Terror (GWOT) presents unique challenges in its intelligence requirements and the need exists to monitor at-risk individuals, groups, and installations. To detect these threats and create actionable intelligence to support expeditionary war fighting, there is a requirement for a network of sensors which can provide the required situational awareness. To that end, the goal of t...

A Computational Framework for Multi-Modal Social Action Identification

We create a computational framework for understanding social action and demonstrate how this framework can be used to build an open-source event detection tool with scalable statistical machine learning algorithms and a subsampled database of over 600 million geo-tagged Tweets from around the world. These Tweets were collected between April 1st, 2014 and April 30th, 2015, most notably when the ...

Multi-modal Diagnostics for Vehicle Fault Detection

On-board vehicle diagnostic systems must have low development and hardware costs in order to be viable. Model-based methods have shown promise since they use analytical redundancy to reduce costly physical redundancy. However, these methods must also be computationally efficient and function accurately even with simple, low-cost models. The approach presented in this paper uses multiple simple m...


Journal

Journal title: Pattern Recognition

Year: 2023

ISSN: 1873-5142, 0031-3203

DOI: https://doi.org/10.1016/j.patcog.2023.109713