Chapter 16 JOINT AUDIO-VIDEO PROCESSING FOR ROBUST BIOMETRIC SPEAKER IDENTIFICATION IN CAR

Authors

  • Engin Erzin
  • Yücel Yemez
  • A. Murat Tekalp
Abstract

In this chapter, we present our recent results on the multilevel Bayesian decision fusion scheme for the multimodal audio-visual speaker identification problem. The objective is to improve recognition performance over conventional decision fusion schemes. The proposed system decomposes the information in a video stream into three components: speech, lip trace, and face texture. Lip trace features are extracted using the 2D-DCT of successive active lip frames. The mel-frequency cepstral coefficients (MFCC) of the corresponding speech signal are extracted in parallel with the lip features. The resulting two parallel, synchronous feature vectors are used to train and test a two-stream Hidden Markov Model (HMM) based identification system. Face texture images are treated separately in the eigenface domain and integrated into the system through decision fusion. Reliability-based ordering in multilevel decision fusion is observed to be significantly robust at all SNR
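
As a rough illustration of the lip feature step mentioned in the abstract, the sketch below computes an orthonormal 2D DCT-II of a single grayscale lip frame and keeps the low-frequency coefficients as a feature vector. The frame values, the function names `dct_2d` and `lip_features`, and the rule of keeping the top-left `keep` x `keep` coefficients are assumptions made for illustration; the abstract does not state how the chapter selects coefficients or how many it retains.

```python
import math


def dct_2d(block):
    """Orthonormal 2D DCT-II of a rectangular block given as a list of rows."""
    rows, cols = len(block), len(block[0])

    def scale(k, n):
        # Normalisation factor that makes the transform orthonormal.
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            total = 0.0
            for x in range(rows):
                cx = math.cos(math.pi * (2 * x + 1) * u / (2 * rows))
                for y in range(cols):
                    cy = math.cos(math.pi * (2 * y + 1) * v / (2 * cols))
                    total += block[x][y] * cx * cy
            out[u][v] = scale(u, rows) * scale(v, cols) * total
    return out


def lip_features(frame, keep=3):
    """Flatten the top-left keep x keep DCT coefficients into a feature vector.

    Assumption: low-frequency block selection; the chapter may use another rule.
    """
    coeffs = dct_2d(frame)
    return [coeffs[u][v] for u in range(keep) for v in range(keep)]


# Hypothetical 4x4 grayscale lip region (made-up pixel values).
frame = [
    [10, 12, 12, 10],
    [20, 40, 40, 20],
    [20, 40, 40, 20],
    [10, 12, 12, 10],
]
features = lip_features(frame, keep=2)
```

Applying `lip_features` to each successive lip frame would give one vector per frame, which could then be paired with the MFCC vector of the matching audio segment.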

Similar resources

Joint audio-video processing for biometric speaker identification

In this paper we present a bimodal audio-visual speaker identification system. The objective is to improve the recognition performance over conventional unimodal schemes. The proposed system exploits not only the temporal and spatial correlations existing in speech and video signals of a speaker, but also the cross-correlation between these two modalities. Lip images extracted for each video fra...

Speaker and Speech recognition by Audio-Visual lip biometrics

This paper proposes a new robust bimodal audio-visual speech and speaker recognition system based on lip-motion and speech biometrics. To increase the robustness of speech and speaker recognition, we have proposed a method using speaker lip motion information extracted from video sequences with low resolution (128 × 128 pixels). In this paper we investigate a biometric system for speech recognition a...

Robust Iris Recognition in Unconstrained Environments

A biometric system provides automatic identification of an individual based on a unique feature or characteristic possessed by him/her. Iris recognition (IR) is known to be the most reliable and accurate biometric identification system. The iris recognition system (IRS) consists of an automatic segmentation mechanism which is based on the Hough transform (HT). This paper presents a robust IRS i...

Robust Speaker Recognition Biometric System a Detailed Review

This paper reviews biometric-based speaker recognition and briefly presents the algorithms and techniques used at the various stages of speaker recognition, along with the development of an attendance system as an application of speaker recognition. Research has been carried out in this area for many years. However, the accuracy of the system depends upon speaker variability and environmental conditions. Va...

Analysis of i-vector framework for speaker identification in TV-shows

Inspired by Joint Factor Analysis, i-vector-based analysis has become the most popular state-of-the-art framework for the speaker verification task. It has mainly been applied within the NIST/SRE evaluation campaigns, and many studies have been proposed to further improve the performance of speaker verification systems. Nevertheless, while the i-vector framework has been used in other speech pr...

Publication date: 2010