A Multi-Modal Transformer network for action detection
Authors
Abstract
This paper proposes a novel multi-modal transformer network for detecting actions in untrimmed videos. To enrich the action features, our network utilizes a new attention mechanism that computes the correlations between different combinations of spatial and motion modalities. Exploring such correlations has not been attempted previously. To use the motion modality more effectively, we suggest an algorithm that corrects the distortion caused by camera movement. Such distortion, common in untrimmed videos, severely reduces the expressive power of motion features such as optical flow fields. Our proposed method outperforms the state-of-the-art methods on two public benchmarks, THUMOS14 and ActivityNet. We also conducted comparative experiments on an instructional activity dataset, including a large set of challenging classroom videos captured from elementary schools.
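The two ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' network or their correction algorithm. The first function is generic scaled dot-product attention, with queries taken from one modality and keys and values from another. The second swaps in a deliberately crude stand-in for camera-motion correction: it assumes the camera motion is a pure translation and estimates it as the median optical-flow vector. All function names and the toy data are hypothetical.

```python
import math
from statistics import median


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(queries, keys, values):
    """Scaled dot-product attention across modalities.

    queries come from one modality (e.g. spatial features), while
    keys and values come from another (e.g. motion features).
    """
    dim = len(keys[0])
    attended = []
    for q in queries:
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys
        ]
        weights = softmax(scores)
        attended.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return attended


def compensate_camera_motion(flow):
    """Subtract the median flow vector from every (dx, dy) vector.

    Assumption: camera motion is a pure translation, so the median
    flow approximates it. Real footage usually needs a richer model.
    """
    cam_dx = median(dx for dx, _ in flow)
    cam_dy = median(dy for _, dy in flow)
    return [(dx - cam_dx, dy - cam_dy) for dx, dy in flow]


# Toy features: 2 spatial tokens attend over 3 motion tokens.
spatial = [[1.0, 0.0], [0.0, 1.0]]
motion = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused = cross_modal_attention(spatial, motion, motion)

# Toy flow field: most vectors share the camera shift (2, 1).
flow = [(2.0, 1.0), (2.0, 1.0), (2.0, 1.0), (5.0, 1.0), (2.0, 4.0)]
residual = compensate_camera_motion(flow)
```

In the toy flow field, the three vectors equal to the estimated camera shift become zero after compensation, and only the two vectors that differ from it keep a non-zero residual.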
Similar resources
Supervised Transformer Network for Efficient Face Detection
Large pose variations remain a challenge that confronts real-world face detection. We propose a new cascaded Convolutional Neural Network, dubbed the Supervised Transformer Network, to address this challenge. The first stage is a multi-task Region Proposal Network (RPN), which simultaneously predicts candidate face regions along with associated facial landmarks. The candidate regions ...
Driver Mirror-Checking Action Detection Using Multi-Modal Signals
Studies on driver distraction aim to identify features extracted from various sensory signals that can be used to distinguish between normal and distracted driving behaviors. A major challenge in these studies is to determine whether the observed behaviors are associated with the primary driving tasks (checking mirrors, monitoring speed, changing lanes) or secondary tasks that deviate the atten...
Complex Event Detection in Multi-Modal Sensor Network
The Global War on Terror (GWOT) presents unique challenges in its intelligence requirements and the need exists to monitor at-risk individuals, groups, and installations. To detect these threats and create actionable intelligence to support expeditionary war fighting, there is a requirement for a network of sensors which can provide the required situational awareness. To that end, the goal of t...
A Computational Framework for Multi-Modal Social Action Identification
We create a computational framework for understanding social action and demonstrate how this framework can be used to build an open-source event detection tool with scalable statistical machine learning algorithms and a subsampled database of over 600 million geo-tagged Tweets from around the world. These Tweets were collected between April 1st, 2014 and April 30th, 2015, most notably when the ...
Multi-modal Diagnostics for Vehicle Fault Detection
On-board vehicle diagnostic systems must have low development and hardware costs in order to be viable. Model-based methods have shown promise since they use analytical redundancy to reduce costly physical redundancy. However, these methods must also be computationally efficient and function accurately even with simple, low-cost models. The approach presented in this paper uses multiple simple m...
Journal
Journal title: Pattern Recognition
Year: 2023
ISSN: 1873-5142, 0031-3203
DOI: https://doi.org/10.1016/j.patcog.2023.109713