Speaking Rate Compensation Based in Acoustic Model Training

Authors

Kozo Okuda

Tatsuya Kawahara

Satoshi Nakamura

Abstract

In this paper, we propose a speaking rate compensation method based on frame period and frame length adaptation. The method decodes an input utterance several times, each time with a different set of frame period and frame length parameters for speech analysis. It then selects the set that yields the highest score. The score combines the acoustic likelihood normalized by the frame period, the language likelihood, and an insertion penalty. We also apply this approach to acoustic model training. For each training utterance, we compute the acoustic likelihood under each frame period and frame length using Viterbi alignment and select the best setting. Applying the proposed speaking rate compensation to both acoustic model training and decoding improved accuracy by 2.9% absolute on a spontaneous lecture speech recognition task.
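The selection rule described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The decoder and aligner outputs are supplied as plain numbers, and every name here (`AnalysisParams`, `decoding_score`, `REFERENCE_PERIOD_MS`, and so on) is hypothetical. The normalization is an assumption: it rescales the summed acoustic log-likelihood by the frame period relative to a 10 ms reference, because a longer period produces fewer frames. The LM weight and the per-word insertion penalty are also illustrative values, and the paper's exact formulation may differ. The training-side function applies the same assumed normalization to Viterbi alignment scores. With the reference transcript fixed, the language and insertion terms do not vary between parameter sets.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence

# Hypothetical baseline frame period used to rescale acoustic scores (assumption).
REFERENCE_PERIOD_MS = 10.0


@dataclass(frozen=True)
class AnalysisParams:
    """One candidate speech-analysis setting (frame period and frame length)."""
    frame_period_ms: float
    frame_length_ms: float


@dataclass
class Hypothesis:
    """Decoder output for one utterance analysed with one parameter set."""
    params: AnalysisParams
    words: List[str]
    acoustic_loglik: float  # total acoustic log-likelihood summed over all frames
    lm_logprob: float       # language-model log-probability of the word sequence


def normalize_acoustic(loglik: float, params: AnalysisParams) -> float:
    # Assumed normalization: a longer frame period yields fewer frames, so the
    # summed log-likelihood is scaled by period / reference to compare settings.
    return loglik * (params.frame_period_ms / REFERENCE_PERIOD_MS)


def decoding_score(hyp: Hypothesis, lm_weight: float = 10.0,
                   insertion_penalty: float = -5.0) -> float:
    # Normalized acoustic score + weighted LM score + a penalty added per word.
    # lm_weight and insertion_penalty are illustrative values, not from the paper.
    return (normalize_acoustic(hyp.acoustic_loglik, hyp.params)
            + lm_weight * hyp.lm_logprob
            + insertion_penalty * len(hyp.words))


def select_for_decoding(hyps: Sequence[Hypothesis]) -> Hypothesis:
    """Keep the hypothesis whose parameter set gives the highest combined score."""
    return max(hyps, key=decoding_score)


def select_for_training(alignment_logliks: Dict[AnalysisParams, float]) -> AnalysisParams:
    """Pick the parameter set whose Viterbi alignment to the known transcript
    scores best; with a fixed transcript only the acoustic term varies."""
    return max(alignment_logliks,
               key=lambda p: normalize_acoustic(alignment_logliks[p], p))


if __name__ == "__main__":
    fast = AnalysisParams(8.0, 20.0)
    base = AnalysisParams(10.0, 25.0)
    slow = AnalysisParams(12.0, 30.0)
    hyps = [
        Hypothesis(fast, ["a", "b", "c", "d", "e"], -9000.0, -20.0),
        Hypothesis(base, ["a", "b", "c", "d"], -7500.0, -18.0),
        Hypothesis(slow, ["a", "b", "c", "d"], -6500.0, -18.0),
    ]
    print(select_for_decoding(hyps).params)
    # AnalysisParams(frame_period_ms=8.0, frame_length_ms=20.0)
    print(select_for_training({fast: -9100.0, base: -7000.0, slow: -6000.0}))
    # AnalysisParams(frame_period_ms=10.0, frame_length_ms=25.0)
```

The toy numbers are made up, but they show why some normalization is needed. If the rescaling in `normalize_acoustic` is dropped, both choices flip to the 12 ms setting. This happens because fewer frames simply accumulate a less negative total log-likelihood, not because that analysis fits the speech better.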


Similar resources

Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods

Speech recognition is a subfield of artificial intelligence that develops technologies to convert speech utterances into transcriptions. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...


Speaking rate effects in a landmark-based phonetic exemplar model

In this study we describe a model of speech perception in which neither speaking rate nor lower level temporal cues are considered explicitly. Instead, newly encountered speech signals are encoded as sequences of detailed acoustic events specified in real time at salient landmarks and compared directly with previously heard patterns. When presented with obstruent-vowel sequences occurring in th...


A comparative analytic study on the Gaussian mixture and context dependent deep neural network hidden Markov models

We conducted a comparative analytic study on the context-dependent Gaussian mixture hidden Markov model (CD-GMM-HMM) and deep neural network hidden Markov model (CD-DNN-HMM) with respect to phone discrimination and robustness performance. We found that the DNN can significantly improve the phone recognition performance for every phoneme with 15.6% to 39.8% relative phone error rate reductio...


Modeling speaking rate for voice fonts

Voice fonts are created and stored for a speaker, to be used to synthesize speech in the speaker’s voice. The most important descriptors of voice fonts are spectral envelope for acoustic units and prosodic features such as fundamental frequency and average speaking rate. In this paper, we present a new approach to model the speaking rate so that it can be easily incorporated in voice fonts and ...


Duration modeling using cumulative duration probability and speaking rate compensation

A duration modeling scheme and a speaking rate compensation technique are presented for the HMM-based connected digit recognizer. The proposed duration modeling technique uses a cumulative duration probability. The cumulative duration probability can also be used to obtain the duration bounds for bounded duration modeling. One of the advantages of the proposed technique is that the cumulative d...


Publication date: 2002
