Acoustic analysis and feature transformation from neutral to whisper for speaker identification within whispered speech audio streams
Abstract
Whispered speech is an alternative speech production mode to neutral speech, used intentionally by talkers in natural conversational scenarios to protect privacy and to prevent certain content from being overheard or made public. Due to the profound differences between whispered and neutral speech in vocal excitation and vocal tract function, the performance of automatic speaker identification (speaker ID) systems trained with neutral speech degrades significantly on whispered speech. In order to better understand these differences and to further develop efficient model adaptation and feature compensation methods, this study first analyzes the speaker and phoneme dependency of these differences by estimating a maximum likelihood transformation from neutral speech to whispered speech. Based on the analysis results, this study then considers a feature transformation method in the training phase that leads to a more robust speaker model for speaker ID on whispered speech without using whispered adaptation data from test speakers. Three estimation methods that model the transformation from neutral to whispered speech are applied: convolutional transformation (ConvTran), constrained maximum likelihood linear regression (CMLLR), and factor analysis (FA). A speech-mode-independent (SMI) universal background model (UBM) is trained using collected real neutral features and transformed pseudo-whisper features generated with the estimated transformation. Text-independent closed-set speaker ID results on the UT-VocalEffort II corpus show that the proposed training framework improves performance. The best accuracy of 88.87% is achieved with the ConvTran model, which represents a relative improvement of 46.26% (in terms of identification error reduction) over the 79.29% accuracy of the GMM-UBM baseline system. This result suggests that synthesizing pseudo-whispered speaker and background training data with the ConvTran model improves speaker ID robustness to whispered speech.
© 2012 Elsevier B.V. All rights reserved.
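The 46.26% figure in the abstract is consistent with a relative reduction in identification error rather than a relative gain in accuracy. The short sketch below checks that arithmetic from the two accuracies reported above; the function name `relative_error_reduction` is a hypothetical helper introduced here for illustration and does not come from the paper.

```python
def relative_error_reduction(baseline_acc: float, improved_acc: float) -> float:
    """Relative reduction in error rate (percent), given accuracies in percent."""
    baseline_err = 100.0 - baseline_acc   # error rate of the baseline system
    improved_err = 100.0 - improved_acc   # error rate of the improved system
    return 100.0 * (baseline_err - improved_err) / baseline_err


# Accuracies quoted in the abstract: GMM-UBM baseline and best ConvTran result.
baseline_accuracy = 79.29
convtran_accuracy = 88.87

rer = relative_error_reduction(baseline_accuracy, convtran_accuracy)
print(f"Relative error reduction: {rer:.2f}%")
```

Read this way, the error rate falls from 20.71% to 11.13%, which matches the relative improvement stated in the abstract.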
Similar resources
An Entropy based Feature for Whisper-Island Detection within Audio Streams
Non-neutral speech, especially whispered speech, has strong negative impact on speech system performance. It is therefore necessary to detect whisper-islands embedded within neutral speech prior to subsequent processing steps. Detecting whisper-islands in speech audio streams can contribute to improved modeling, speech analysis, and understanding. Speech technology can also benefit by allowing ...
Speaker Identification for Whispered Speech Using a Training Feature Transformation from Neutral to Whisper
A number of research studies in speaker recognition have recently focused on robustness to microphone and channel mismatch (e.g., NIST SRE). However, changes in vocal effort, especially whispered speech, present significant challenges in maintaining system performance. Due to the mismatched spectral structure resulting from the different production mechanisms, performance of speaker identifica...
Speaker identification for whispered speech based on frequency warping and score competition
In certain situations, talkers will intentionally use whisper instead of neutral speech for the sake of privacy or confidentiality, which severely degrades the performance of speaker identification systems trained with only neutral speech. There are considerable differences in the spectral structure between whisper and neutral speech due to an absence of voice harmonic excitation. This study in...
Acoustic Analysis of Whispered Speech for Phoneme and Speaker Dependency
Whisper is used by speakers in certain circumstances to protect personal information. Due to the differences in production mechanisms between neutral and whispered speech, there are considerable differences between the spectral structure of neutral and whispered speech, such as formant shifts and shifts in spectral slope. This study analyzes the dependency of these differences on speakers and p...
Journal: Speech Communication
Volume: 55
Publication date: 2013