Rapid vocal tract length normalization using maximum likelihood estimation

Authors

Tadashi Emori

Koichi Shinoda

Abstract

Recently, vocal tract length normalization (VTLN) techniques have been developed for speaker normalization in speech recognition. This paper proposes a new VTLN method in which the vocal tract length is normalized in the cepstrum space by means of a linear mapping whose parameter is derived using maximum-likelihood estimation. The computational cost of this method is much lower than that of conventional methods such as ML-VTLN, in which the mapping parameter is selected from among several candidate values. Further, the new method determines the parameter for each individual speaker more precisely. In experiments, the method achieved an error reduction rate of 7.1%. Combining the proposed method with cepstrum mean normalization (CMN) was also examined and was found to reduce the error rate further, by 14.6%.
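The abstract does not specify the form of the linear mapping, so the following is only a minimal sketch under loudly stated assumptions, meant to show why a maximum-likelihood parameter for a linear cepstral mapping can be computed directly instead of being picked from a list of candidates. It assumes (none of this is from the paper) that the warped cepstrum is `c + alpha * B c` for a fixed, hypothetical matrix `B`, that the speaker-independent model is a single diagonal Gaussian with mean `mu` and variance `var`, and that the Jacobian term of the transform is ignored. Under those assumptions the log-likelihood is quadratic in `alpha`, so its maximizer is a weighted least-squares ratio. A grid search over candidate values, in the spirit of the ML-VTLN approach the abstract contrasts with, is included for comparison. All function names and data are illustrative.

```python
# Minimal sketch under hypothetical assumptions (not the paper's exact method):
#   warped cepstrum = c + alpha * (B @ c), with B a fixed illustrative matrix;
#   model = one diagonal Gaussian (mu, var); the Jacobian term is ignored.
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(B: Matrix, c: Sequence[float]) -> Vector:
    """Return the matrix-vector product B @ c."""
    return [sum(b * x for b, x in zip(row, c)) for row in B]


def log_likelihood(alpha: float, frames: Sequence[Sequence[float]],
                   B: Matrix, mu: Sequence[float], var: Sequence[float]) -> float:
    """Diagonal-Gaussian log-likelihood of the warped frames, up to a constant."""
    total = 0.0
    for c in frames:
        for ci, bi, m, v in zip(c, matvec(B, c), mu, var):
            d = ci + alpha * bi - m
            total -= 0.5 * d * d / v
    return total


def ml_alpha_closed_form(frames: Sequence[Sequence[float]], B: Matrix,
                         mu: Sequence[float], var: Sequence[float]) -> float:
    """Closed-form maximizer of log_likelihood.

    The objective is quadratic in alpha; setting its derivative to zero gives
    alpha = -sum(b * (c - mu) / var) / sum(b * b / var), with b = B @ c.
    Assumes B @ c is not zero for every frame (otherwise den == 0).
    """
    num = den = 0.0
    for c in frames:
        for ci, bi, m, v in zip(c, matvec(B, c), mu, var):
            num += bi * (ci - m) / v
            den += bi * bi / v
    return -num / den


def ml_alpha_grid(frames: Sequence[Sequence[float]], B: Matrix,
                  mu: Sequence[float], var: Sequence[float],
                  candidates: Sequence[float]) -> float:
    """Grid search: score every candidate value and keep the best one."""
    return max(candidates, key=lambda a: log_likelihood(a, frames, B, mu, var))


if __name__ == "__main__":
    B = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]  # hypothetical B
    mu, var = [0.5, -0.2, 0.1], [1.0, 0.5, 0.25]
    frames = [[0.6, -0.1, 0.3], [0.4, -0.3, 0.0], [0.7, 0.0, 0.2]]
    grid = [round(-0.2 + 0.02 * k, 2) for k in range(21)]
    print(round(ml_alpha_closed_form(frames, B, mu, var), 3))  # -0.128
    print(ml_alpha_grid(frames, B, mu, var, grid))             # -0.12
```

In this toy setting the grid search evaluates the likelihood once per candidate (21 passes over the data here), whereas the closed form needs a single pass, and its estimate is not restricted to the grid spacing. This loosely mirrors the cost and precision arguments in the abstract, but the actual mapping and estimator in the paper may differ.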

Related articles

A Bayesian Approach to Estimation of Speaker Normalization Parameters

In this work, a Bayesian approach to speaker normalization is proposed to compensate for the degradation in performance of a speaker independent speech recognition system. The speaker normalization method proposed herein uses the technique of vocal tract length normalization (VTLN). The VTLN parameters are estimated using a novel Bayesian approach which utilizes the Gibbs sampler, a special typ...

Estimating VTLN warping factors by distribution matching

Several methods exist for estimating the warping factors for vocal tract length normalization (VTLN), most of which rely on an exhaustive search over the warping factors to maximize the likelihood of the adaptation data. This paper presents a method for warping factor estimation that is based on matching Gaussian distributions by Kullback-Leibler divergence. It is computationally more efficient...

Efficient pitch-based estimation of VTLN warp factors

To reduce inter-speaker variability, vocal tract length normalization (VTLN) is commonly used to transform acoustic features for automatic speech recognition (ASR). The warp factors used in this process are usually derived by maximum likelihood (ML) estimation, involving an exhaustive search over possible values. We describe an alternative approach: exploit the correlation between a speaker’s a...

Speaker normalization and pronunciation variant modeling: helpful methods for improving recognition of fast speech

This paper addresses the problem of creating hidden Markov models for fast speech. The major issues discussed are robust parameter estimation and the reduction of within-model variations. Regarding the first issue, the use of maximum a posteriori parameter estimation is discussed. To reduce within-model variations, a maximum likelihood based vocal tract length normalization procedure and a...