Human audio-visual consonant recognition analyzed with three bimodal integration models
Authors
Abstract
Using audio-visual (AV) recordings, ten normal-hearing listeners took consonant recognition tests at a range of signal-to-noise ratios (SNRs). The AV recognition results were predicted with the fuzzy logical model of perception (FLMP) and the post-labelling integration model (POSTL). We also applied hidden Markov models (HMMs) and multi-stream HMMs (MSHMMs) to the recognition task. As expected, all models agree qualitatively with the finding that the benefit gained from the visual signal is larger at lower acoustic SNRs. However, the FLMP severely overestimates the AV integration result, while the POSTL model underestimates it. The visual automatic speech recognizer could be adjusted to match human visual performance, and the MSHMMs combine the audio and visual streams efficiently, but the audio automatic speech recognizer must be further improved before precise quantitative comparisons with human audio-visual performance are possible.
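The integration schemes named above can be illustrated in code. The sketch below shows the standard FLMP combination rule (multiply per-category supports and renormalize) and an MSHMM-style weighted log-likelihood fusion; the probabilities and the stream weight are made-up illustrative values, not the paper's data.

```python
import numpy as np

def flmp(audio, visual):
    """FLMP: multiply per-category supports from each modality,
    then renormalize (the relative goodness rule)."""
    support = np.asarray(audio) * np.asarray(visual)
    return support / support.sum()

def mshmm_fusion(log_lik_audio, log_lik_visual, stream_weight=0.7):
    """MSHMM-style score fusion: a weighted sum of per-stream
    log-likelihoods (the weight here is an arbitrary assumption)."""
    return (stream_weight * np.asarray(log_lik_audio)
            + (1.0 - stream_weight) * np.asarray(log_lik_visual))

# Illustrative unimodal probabilities for three consonant candidates.
p_audio = [0.6, 0.3, 0.1]   # noisy acoustic channel
p_visual = [0.5, 0.1, 0.4]  # lipreading evidence

p_av = flmp(p_audio, p_visual)          # FLMP posterior over candidates
best = mshmm_fusion(np.log(p_audio), np.log(p_visual)).argmax()
print(p_av, best)
```

Note how the FLMP's multiplicative rule sharpens any agreement between the streams, which is consistent with its tendency, reported above, to overestimate the integration benefit.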
Similar resources
Recent Advances in the Automatic Recognition of Audio-Visual Speech
Visual speech information from the speaker’s mouth region has been successfully shown to improve noise robustness of automatic speech recognizers, thus promising to extend their usability in the human computer interface. In this paper, we review the main components of audio-visual automatic speech recognition and present novel contributions in two main areas: First, the visual front end design,...
Perceiving asynchronous bimodal speech in consonant-vowel and vowel syllables
Subjects naturally integrate auditory and visual information in bimodal speech perception. To assess the robustness of the integration process, the relative onset time of the audible and visible sources was systematically varied. In the first experiment, bimodal syllables composed of the auditory and visible syllables /ba/ and /da/ were presented at five different onset asynchronies...
Joint audio-visual speech processing for recognition and enhancement
Visual speech information present in the speaker’s mouth region has long been viewed as a source for improving the robustness and naturalness of human-computer-interfaces (HCI). Such information can be particularly crucial in realistic HCI environments, where the acoustic channel is corrupted, and as a result, the performance of traditional automatic speech recognition (ASR) systems falls below...
The effect of visual information on word initial consonant perception of dysarthric speech
Disabled individuals will realize many benefits from automatic speech recognition. To date, most automatic speech recognition research has focused on normal speech. However, many individuals with physical disabilities also exhibit speech disorders. While limited research has been conducted focusing on dysarthric speech recognition, the preliminary results indicate that additional study is neces...
Bimodal Emotion Recognition
Recent technological advances have enabled human users to interact with computers in ways previously unimaginable. Beyond the confines of the keyboard and mouse, new modalities for human-computer interaction such as voice, gesture, and force-feedback are emerging. Despite important advances, one necessary ingredient for natural interaction is still missing: emotions. This paper describes the cha...
Journal title:
Volume, Issue:
Pages: -
Publication date: 2009