Investigating the COG ratio as feature for speaker verification on high-effort speech

Author

  • Corinna Harwardt
Abstract

Vocal effort mismatch between training and test data severely degrades the performance of speaker recognition systems. The acoustic changes that raised vocal effort induces in a speech signal are complex and, despite several studies by various authors, not yet completely understood. For automatic speaker recognition, it is essential not merely to learn about these differences but to discover features that remain relatively stable under changing vocal effort conditions and carry speaker-specific information. In this study we investigate the center of gravity (COG) ratio of high and mid frequency bands as a feature for speaker recognition. We find that vocal effort mismatch leads to an equal error rate (EER) more than six times higher for a standard MFCC-based GMM-UBM system. For the COG ratio we observe a much smaller degradation of around 25%. When the UBM is adapted with additional high-effort speech data, the EER of the COG ratio is even lower in the mismatch condition than in the matched condition. Combining MFCCs and the COG ratio gives the best results, with an overall improvement of 16% compared to the standard MFCC-based system.
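
The results above are reported as equal error rates and relative changes between conditions. As a rough illustration of how such figures are typically obtained, the following Python sketch estimates an EER from verification scores and computes a relative change. It is not the author's evaluation code: the function names `equal_error_rate` and `relative_change`, the accept rule (score at or above the threshold), and all scores and rates in the usage lines are assumptions made up for this example.

```python
import numpy as np


def equal_error_rate(target_scores, impostor_scores):
    """Estimate the EER from two sets of verification scores.

    Assumption: a trial is accepted when its score is >= the threshold.
    Returns (eer, threshold) at the candidate threshold where the false
    acceptance rate and the false rejection rate are closest.
    """
    target = np.asarray(target_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)

    # Every observed score is a candidate decision threshold.
    thresholds = np.unique(np.concatenate([target, impostor]))

    # False acceptance rate: impostor trials that would be accepted.
    far = np.array([(impostor >= t).mean() for t in thresholds])
    # False rejection rate: target trials that would be rejected.
    frr = np.array([(target < t).mean() for t in thresholds])

    # The EER lies where both error rates cross; average them there.
    i = int(np.argmin(np.abs(far - frr)))
    return (far[i] + frr[i]) / 2.0, thresholds[i]


def relative_change(baseline, value):
    """Relative change of `value` with respect to `baseline`."""
    return (value - baseline) / baseline


# Illustrative scores only; they are not taken from the study.
eer, threshold = equal_error_rate([2, 3, 4, 5], [0, 1, 2, 3])
print(f"EER = {eer:.2f} at threshold {threshold}")

# A hypothetical EER rising from 4% to 5% is a 25% relative degradation.
print(f"relative change = {relative_change(0.04, 0.05):.0%}")
```

With few trials the two error rates may not cross exactly at any observed score, which is why the sketch averages them at the closest point. Evaluations on larger score sets often interpolate along the error curve instead.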

Similar resources

Cascading appearance-based features for visual speaker verification

The cascading appearance-based (CAB) feature extraction technique has established itself as the state of the art in extracting dynamic visual speech features for speech recognition. In this paper, we will focus on investigating the effectiveness of this technique for the related speaker verification application. By investigating the speaker verification ability of each stage of the cascade we w...

Dynamic visual features for audio-visual speaker verification

The cascading appearance-based (CAB) feature extraction technique has established itself as the state of the art in extracting dynamic visual speech features for speech recognition. In this paper, we will focus on investigating the effectiveness of this technique for the related speaker verification application. By investigating the speaker verification ability of each stage of the cascade we w...

Exploring Features for Text-dependent Speaker Verification in Distant Speech Signals

Automatic speaker verification (ASV) is the task of verifying a person’s claimed identity from his/her voice using a digital computer. The existing ASV systems perform with high accuracy of verification when the speech signal is collected close to the mouth of the speaker (< 1 ft). However, the performance of the ASV systems reduces significantly for speech signals collected at a distance from ...

Text-Independent Speaker Verification for Real Fast-Varying Noisy Environments

Investigating Speaker Verification in real-world noisy environments, a novel feature extraction process suitable for suppression of time-varying noise is compared with a fine-tuned spectral subtraction method. The proposed feature extraction process is based on approximating the clean speech and the noise spectral magnitude with a mixture of Gaussian probability density functions (pdfs) by usin...

Voice Activated E-Learning System for the Visually Impaired

E-learning has become an important tool for learners to acquire information and knowledge. However visually impaired people have no or very little access to this tool, since interface suitable to them are unavailable. The Voice Activated E learning System can provide a solution to this problem. Developing this system is meant to assist visually impaired students in learning, a desired subject, ...

Publication date: 2010