Skating-Mixer: Long-Term Sport Audio-Visual Modeling with MLPs

Authors

Abstract

Figure skating scoring is challenging because it requires judging the players' technical moves as well as their coordination with the background music. Most learning-based methods struggle for two reasons: 1) each move in figure skating changes quickly, hence simply applying traditional frame sampling will lose a lot of valuable information, especially in videos lasting 3 to 5 minutes; 2) prior methods rarely considered the critical audio-visual relationship in their models. For these reasons, we introduce a novel architecture, named Skating-Mixer. It extends the MLP framework in a multimodal fashion and effectively learns long-term representations through our designed memory recurrent unit (MRU). Aside from the model, we collected a high-quality FS1000 dataset, which contains over 1000 videos on 8 types of programs with 7 different rating metrics, overtaking other datasets in both quantity and diversity. Experiments show that the proposed method achieves state-of-the-art results on all major metrics on the public Fis-V dataset. In addition, we include an analysis of recent competitions at the Beijing 2022 Winter Olympic Games, demonstrating that the method has strong applicability.


Similar resources

EEG responses to long-term audio-visual stimulation.

In this study, linear and nonlinear electroencephalogram (EEG) changes due to long-term audio-visual stimulation (AVS) were investigated. In the course of 2 months, 25 repetitions of a 20-min AVS program with stimulation frequencies in the range 2-18 Hz were applied to six healthy volunteers. EEG data were recorded from six head locations during relaxed wakefulness prior to AVS. Then linear spe...


Roller skating--is it a dangerous sport?

A prospective survey of 111 cases of roller skating injuries within one year is reported. Males were more commonly injured than females. There was a high incidence (86%) of serious injuries, 28% of which required surgical treatment. The wrist (23%) was the commonest region involved, followed by the shoulder (20%), the elbow (15%) and the ankle (12%). Collision with other skaters and loss of co...


Modeling Audio and Visual Cues

Audio-visual event detection aims to identify semantically defined events that reveal human activities. Most previous literature focused on restricted highlight events, and depended on highly ad-hoc detectors for these events. This research emphasizes generalizable robust modeling of single-microphone audio cues and/or single-camera visual cues for the detection of real-world events, requiring ...


Long-Term Reverberation Modeling for Under-Determined Audio Source Separation with Application to Vocal Melody Extraction

In this paper, we present a way to model long-term reverberation effects in under-determined source separation algorithms based on a non-negative decomposition framework. A general model for the sources affected by reverberation is introduced and update rules for the estimation of the parameters are presented. Combined with a well-known source-filter model for singing voice, an application to th...


Learning to score and summarize figure skating sport videos

This paper focuses on fully understanding figure skating sport videos. In particular, we present a large-scale figure skating sport video dataset, which includes 500 figure skating videos. On average, the length of each video is 2 minutes and 50 seconds. Each video is annotated by three scores from nine different referees, i.e., Total Element Score (TES), Total Program Component Score (PCS), a...



Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2023

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v37i3.25392