Audiovisual Fusion with Segment Models for Video Structure Analysis

Authors

  • M. Delakis
  • G. Gravier
  • P. Gros
Abstract

Hidden Markov Models (HMMs) provide a powerful framework for bridging the semantic gap between low-level video features and high-level user needs, because they take full advantage of prior knowledge about the video structure. A serious flaw of HMMs is that they require all the modalities of a video document to be strictly synchronous before they can be fused. Taking the analysis of tennis broadcasts as a case study, we introduce video indexing with Segment Models, a generalization of Hidden Markov Models in which the different modalities can be fused more flexibly. Operating essentially as a layered topology, Segment Models allow the fusion of asynchronous modalities without relying on synchronization points fixed a priori. They also make it easier to fuse audio models of high-level semantics, such as the content of a complete scene, on top of the raw low-level audio frames. Experimental results with Segment Models are encouraging.
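The segment-level fusion described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it is a generic segmental Viterbi decoder, and every name in it (`segment_viterbi`, `make_fused_scorer`, the `rally`/`break` labels, the toy likelihoods and timestamps) is a hypothetical choice made for the example. The assumption is that each hidden state emits a whole run of video shots, and that the audio frames falling inside that run's time span are scored together with it, so the two streams only have to agree at segment boundaries that the decoder itself chooses.

```python
import math

def segment_viterbi(n_obs, states, log_init, log_trans, seg_score, max_len):
    """Best labelled segmentation of observations 0..n_obs-1.

    seg_score(state, start, end) is the log-score of span [start, end).
    """
    # best[t][s]: best log-score of a segmentation of [0, t) ending in label s
    best = [{s: -math.inf for s in states} for _ in range(n_obs + 1)]
    back = [{s: None for s in states} for _ in range(n_obs + 1)]
    for t in range(1, n_obs + 1):
        for s in states:
            for d in range(1, min(max_len, t) + 1):  # candidate segment length
                start = t - d
                emit = seg_score(s, start, t)
                if start == 0:
                    prev, cand = None, log_init[s] + emit
                else:
                    prev = max(states,
                               key=lambda p: best[start][p] + log_trans[p][s])
                    cand = best[start][prev] + log_trans[prev][s] + emit
                if cand > best[t][s]:
                    best[t][s], back[t][s] = cand, (start, prev)
    # Trace back from the best final label.
    s = max(states, key=lambda p: best[n_obs][p])
    score, t, segments = best[n_obs][s], n_obs, []
    while t > 0:
        start, prev = back[t][s]
        segments.append((s, start, t))
        t, s = start, prev
    return score, segments[::-1]

def make_fused_scorer(shot_bounds, visual_ll, audio_frames, audio_ll):
    """Score a run of shots plus whatever audio frames fall inside its time span."""
    def seg_score(state, start, end):
        t0, t1 = shot_bounds[start], shot_bounds[end]
        visual = sum(visual_ll[i][state] for i in range(start, end))
        audio = sum(audio_ll[state][hit]
                    for (t, hit) in audio_frames if t0 <= t < t1)
        return visual + audio
    return seg_score

# Toy data (all values invented for illustration).
states = ["rally", "break"]
shot_bounds = [0.0, 2.0, 5.0, 6.0, 9.0]           # 4 shots of unequal length
visual_ll = [{"rally": math.log(0.9), "break": math.log(0.1)}] * 2 + \
            [{"rally": math.log(0.1), "break": math.log(0.9)}] * 2
# Audio frames (time in seconds, 1 = ball hit heard), not aligned with shots.
audio_frames = [(0.5, 1), (1.5, 1), (2.5, 1), (3.5, 1), (4.5, 1),
                (5.5, 0), (6.5, 0), (7.5, 0), (8.5, 0)]
audio_ll = {"rally": [math.log(0.2), math.log(0.8)],   # indexed by hit flag
            "break": [math.log(0.8), math.log(0.2)]}
log_init = {s: math.log(0.5) for s in states}
log_trans = {p: {s: math.log(0.5) for s in states} for p in states}

scorer = make_fused_scorer(shot_bounds, visual_ll, audio_frames, audio_ll)
score, segments = segment_viterbi(len(visual_ll), states, log_init,
                                  log_trans, scorer, max_len=4)
print(segments)  # -> [('rally', 0, 2), ('break', 2, 4)]
```

In this sketch the visual stream has one observation per shot while the audio stream has its own, finer time base. A frame-synchronous HMM would need both resampled to a common clock first, whereas here the fusion happens once per hypothesised segment.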

Similar resources

Audiovisual Attention Modeling and Salient Event Detection

Although human perception appears to be automatic and unconscious, complex sensory mechanisms exist that form the preattentive component of understanding and lead to awareness. Considerable research has been carried out into these preattentive mechanisms and computational models have been developed for similar problems in the fields of computer vision and speech analysis. The focus here is to e...

Audiovisual Information Fusion in Human-Computer Interfaces and Intelligent Environments: A Survey

Microphones and cameras have been extensively used to observe and detect human activity and to facilitate natural modes of interaction between humans and intelligent systems. The human brain processes the audio and video modalities, extracting complementary and robust information from them. Intelligent systems with audio-visual sensors should be capable of achieving similar goals. The audio-visual i...

Audio-visual mutual dependency models for biometric liveness checks

In this paper we propose a liveness checking technique for multimodal biometric authentication systems based on audiovisual mutual dependency models. Liveness checking ensures that biometric cues are acquired from a live person who is actually present at the time of capture for authenticating the identity. The liveness check based on mutual dependency models is performed by fusion of acoustic and...

Dynamique temporelle du liage dans la fusion de la parole audiovisuelle (Temporal dynamics of binding in audiovisual speech fusion) [in French]

The McGurk effect demonstrates the phenomenon of audiovisual fusion: a sound "ba" mounted on a video "ga" is often perceived as "da". In a previous work we showed that audiovisual fusion might be modulated by a precedent binding proce...

The AIT Multimodal Person Identification System for CLEAR 2007

This paper presents the person identification system developed at Athens Information Technology and its performance in the CLEAR 2007 evaluations. The system operates on the audiovisual information (speech and faces) collected over the duration of gallery and probe videos. It comprises an audio-only (speech), a video-only (face), and an audiovisual fusion subsystem. Audio recognition is based...

Publication date: 2005