Improving Speech Related Facial Action Unit Recognition by Audiovisual Information Fusion

Authors

  • Zibo Meng
  • Shizhong Han
  • Ping Liu
  • Yan Tong
Abstract

It is challenging to recognize facial action units (AUs) from spontaneous facial displays, especially when they are accompanied by speech. The major reason is that, in current practice, information is extracted from a single source, i.e., the visual channel. However, facial activity is highly correlated with voice in natural human communication. Instead of solely improving visual observations, this paper presents a novel audiovisual fusion framework, which makes the best use of visual and acoustic cues in recognizing speech-related facial AUs. In particular, a dynamic Bayesian network (DBN) is employed to explicitly model the semantic and dynamic physiological relationships between AUs and phonemes, as well as measurement uncertainty. A pilot audiovisual AU-coded database has been collected to evaluate the proposed framework; it consists of a "clean" subset containing frontal faces under well-controlled circumstances and a challenging subset with large head movements and occlusions. Experiments on this database have demonstrated that the proposed framework yields significant improvement in recognizing speech-related AUs compared to state-of-the-art visual-based methods, especially for those AUs whose visual observations are impaired during speech. More importantly, it also outperforms feature-level fusion methods by explicitly modeling and exploiting physiological relationships between AUs and phonemes.
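The abstract contrasts the proposed DBN with feature-level fusion. As an illustrative sketch only (not the authors' model), the core idea of letting an audio-derived phoneme cue correct a weak visual AU score can be shown with a simple Bayesian update; the phoneme likelihood tables and all probability values below are hypothetical placeholders for the physiological AU-phoneme relations the DBN would learn.

```python
# Illustrative sketch (not the paper's DBN): fuse a visual AU score with
# an audio phoneme cue via Bayes' rule. All numbers are hypothetical.

# Hypothetical likelihoods linking observed phonemes to AU25 ("lips part").
P_PHONEME_GIVEN_AU = {      # P(phoneme | AU25 = 1)
    "aa": 0.4, "m": 0.05, "sil": 0.1,
}
P_PHONEME_GIVEN_NOT_AU = {  # P(phoneme | AU25 = 0)
    "aa": 0.05, "m": 0.5, "sil": 0.3,
}

def fuse(visual_score: float, phoneme: str) -> float:
    """Posterior P(AU25 = 1 | visual score, phoneme), treating the visual
    score as P(AU25 = 1 | video) and assuming the two cues are
    conditionally independent given the AU state (naive Bayes)."""
    num = visual_score * P_PHONEME_GIVEN_AU[phoneme]
    den = num + (1.0 - visual_score) * P_PHONEME_GIVEN_NOT_AU[phoneme]
    return num / den

# An occluded mouth gives an uninformative visual score of 0.5; hearing
# the open-mouth vowel "aa" raises the fused estimate, while the
# lips-closed bilabial "m" lowers it.
print(round(fuse(0.5, "aa"), 3))  # well above 0.5
print(round(fuse(0.5, "m"), 3))   # well below 0.5
```

The same mechanism explains the paper's motivation: when occlusion or head pose degrades the visual channel, the acoustic channel still carries physiologically grounded evidence about mouth-related AUs.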


Related articles

Audiovisual Facial Action Unit Recognition using Feature Level Fusion

Recognizing facial actions is challenging, especially when they are accompanied by speech. Instead of employing information solely from the visual channel, this work aims to exploit information from both visual and audio channels in recognizing speech-related facial action units (AUs). In this work, two feature-level fusion methods are proposed. The first method is based on a kind of human-cr...


A multilevel fusion approach for audiovisual emotion recognition

Human-computer interaction will be more natural if computers are able to perceive and respond to human nonverbal communication such as emotions. Although several approaches have been proposed to recognize human emotions based on facial expressions or speech, relatively limited work has been done to fuse these two in order to improve the accuracy and robustness of the emotion recognition system. This p...


A simplified audiovisual fusion model with application to large-vocabulary recognition of French Canadian speech Un modèle simplifié de fusion audio-visuelle avec application à la reconnaissance à grand vocabulaire du français canadien parlé

A new, simple and practical way of fusing audio and visual information to enhance audiovisual automatic speech recognition within the framework of an application of large-vocabulary speech recognition of French Canadian speech is presented, and the experimental methodology is described in detail. The visual information about mouth shape is extracted off-line using a cascade of weak classifiers ...


Improving Boundary Estimation in Audiovisual Speech Activity Detection Using Bayesian Information Criterion

A key preprocessing step in multimodal interfaces is to detect when a user is speaking to the system. While push-to-talk approaches are effective, their use limits the flexibility of the system. Solutions based on speech activity detection (SAD) offer more intuitive and user-friendly alternatives. A limitation in current SAD solutions is the drop in performance observed in noisy environments or w...


Listen to Your Face: Inferring Facial Action Units from Audio Channel

Extensive efforts have been devoted to recognizing facial action units (AUs). However, it is still challenging to recognize AUs from spontaneous facial displays, especially when they are accompanied by speech. Different from all prior work that utilized visual observations for facial AU recognition, this paper presents a novel approach that recognizes speech-related AUs exclusively from audio ...



Journal:
  • CoRR

Volume: abs/1706.10197

Publication date: 2017