LPR: learning point-level temporal action localization through re-training

Abstract

Point-level temporal action localization (PTAL) aims to locate action instances in untrimmed videos with only one timestamp annotation for each instance. Existing methods adopt the localization-by-classification paradigm, locating action boundaries in the temporal class activation map (TCAM) by thresholding; these are also known as TCAM-based methods. However, TCAM-based methods are limited by the gap between the classification and localization tasks, since the TCAM is generated by a classification network. To address this issue, we propose a re-training framework for the PTAL task, called LPR. The framework consists of two stages: pseudo-label generation and re-training. In the pseudo-label generation stage, a feature embedding module based on a transformer encoder captures global context features and improves the quality of the pseudo-labels by leveraging the point-level annotations. In the re-training stage, LPR uses these pseudo-labels to supervise the localization network directly rather than generating TCAMs. Furthermore, to alleviate the effects of label noise in the pseudo-labels, a joint learning classification module (JLCM) is introduced in the re-training stage. The JLCM contains sub-modules that simultaneously predict action categories, and training is guided by a clean set that they determine jointly. The proposed framework achieves state-of-the-art performance on both the THUMOS’14 and BEOID datasets.
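The abstract describes two mechanisms only at a high level: TCAM-based localization, where action boundaries are read off a class activation sequence by thresholding, and training on a clean set that several classifiers determine jointly. The Python sketch below illustrates each under stated assumptions; it is not the LPR implementation. `tcam_to_segments` assumes one per-snippet activation list for a single class, a fixed threshold, and a hypothetical `fps` parameter that converts snippet indices to seconds. `joint_clean_set` is a hypothetical helper that swaps in a small-loss agreement rule (loosely inspired by small-loss selection methods such as Co-teaching): a sample counts as clean only if both classifiers rank its loss among the smallest `keep_ratio` fraction. The abstract does not say how the JLCM actually defines its clean set.

```python
from typing import List, Sequence, Tuple


def tcam_to_segments(
    scores: Sequence[float], threshold: float, fps: float = 1.0
) -> List[Tuple[float, float, float]]:
    """Threshold one class's activation sequence into (start, end, score) segments.

    Consecutive snippets with activation above `threshold` are merged into one
    segment; the segment score is the mean activation inside it. `fps` is an
    assumed snippets-per-second rate used to convert indices to seconds.
    """
    segments = []
    start = None
    for t, s in enumerate(scores):
        if s > threshold and start is None:
            start = t  # a segment opens at the first snippet above threshold
        elif s <= threshold and start is not None:
            seg = scores[start:t]
            segments.append((start / fps, t / fps, sum(seg) / len(seg)))
            start = None
    if start is not None:  # close a segment that runs to the end of the video
        seg = scores[start:]
        segments.append((start / fps, len(scores) / fps, sum(seg) / len(seg)))
    return segments


def joint_clean_set(
    loss_a: Sequence[float], loss_b: Sequence[float], keep_ratio: float
) -> List[int]:
    """Hypothetical clean-set rule: indices both classifiers rank as small-loss.

    This small-loss agreement heuristic is an assumption for illustration,
    not the JLCM's documented criterion.
    """
    if len(loss_a) != len(loss_b):
        raise ValueError("both classifiers must score the same samples")
    k = max(1, int(len(loss_a) * keep_ratio))  # number of samples each keeps
    small_a = set(sorted(range(len(loss_a)), key=lambda i: loss_a[i])[:k])
    small_b = set(sorted(range(len(loss_b)), key=lambda i: loss_b[i])[:k])
    return sorted(small_a & small_b)  # keep only samples both agree on
```

For example, with activations `[0.1, 0.7, 0.9, 0.2, 0.6, 0.8]`, a threshold of 0.5 and `fps=1.0`, `tcam_to_segments` returns two segments spanning seconds 1 to 3 and 4 to 6, with mean scores of about 0.8 and 0.7. The fixed threshold is exactly the step LPR's re-training stage is described as avoiding, since any segment quality here depends on how well a classification network's activations align with true boundaries.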

Journal

Journal title: Multimedia Systems

Year: 2023

ISSN: 1432-1882, 0942-4962

DOI: https://doi.org/10.1007/s00530-023-01128-4