MLLR-like speaker adaptation based on linearization of VTLN with MFCC features
نویسندگان
چکیده
In this paper, an MLLR-like adaptation approach is proposed whereby the transformation of the means is performed deterministically based on linearization of VTLN. Biases and adaptation of the variances are estimated statistically by the EM algorithm. In the discrete frequency domain, we show that under certain approximations, frequency warping with Mel-£lterbank-based MFCCs equals a linear transformation in the cepstral domain. Utilizing the deduced linear relationship, the transformation matrix is generated by formant-like peak alignment. Experimental results using children’s speech show improvements over traditional MLLR and VTLN. The improvements occur even with limited amounts of adaptation data.
منابع مشابه
Frequency warping for VTLN and speaker adaptation by linear transformation of standard MFCC
Vocal Tract Length Normalization (VTLN) for standard filterbank-based Mel Frequency Cepstral Coefficient (MFCC) features is usually implemented by warping the center frequencies of the Mel filterbank, and the warping factor is estimated using the maximum likelihood score (MLS) criterion (Lee and Rose, 1998). A linear transform (LT) equivalent for frequency warping (FW) would enable more efficie...
متن کاملUsing VTLN matrices for rapid and computationally-efficient speaker adaptation with robustness to first-pass transcription errors
In this paper, we propose to combine the rapid adaptation capability of conventional Vocal Tract Length Normalization (VTLN) with the computational efficiency of transform-based adaptation such as MLLR or CMLLR. VTLN requires the estimation of only one parameter and is, therefore, most suited for the cases where there is little adaptation data (i.e. rapid adaptation). In contrast, transform-bas...
متن کاملRapid Speaker Adaptation using Regression-
In this paper, regression-tree based spectral peak alignment is proposed for rapid speaker adaptation using the linearization of VTLN. Two different regression classes are investigated: phonetic classes (using combined knowledge and data-driven techniques) and mixture classes. Compared to MLLR and VTLN, improved performance can be obtained for both supervised and unsupervised adaptations on bot...
متن کاملRapid speaker adaptation using regression-tree based spectral peak alignment
In this paper, regression-tree based spectral peak alignment is proposed for rapid speaker adaptation using the linearization of VTLN. Two different regression classes are investigated: phonetic classes (using combined knowledge and data-driven techniques) and mixture classes. Compared to MLLR and VTLN, improved performance can be obtained for both supervised and unsupervised adaptations on bot...
متن کاملSpeaker normalization and speaker adaptation - a combination for conversational speech recognition
Speaker normalization and speaker adaptation are two strategies to tackle the variations from speaker, channel, and environment. The vocal tract length normalization (VTLN) is an e ective speaker normalization approach to compensate for the variations of vocal tract shapes. The Maximum Likelihood Linear Regression(MLLR) is a recent proposed method for speaker-adaptation. In this paper, we propo...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2005