Speaking Rate Compensation Based in Acoustic Model Training
Abstract
In this paper, we propose a speaking rate compensation method based on frame period and frame length adaptation. The method decodes an input utterance using several sets of frame period and frame length parameters for speech analysis, then selects the set with the highest score, where the score combines the acoustic likelihood normalized by frame period, the language likelihood, and an insertion penalty. We also apply this approach to acoustic model training: we calculate the acoustic likelihood for each frame period and frame length using Viterbi alignment and select the best set for each training utterance. Applying the proposed speaking rate compensation to both acoustic model training and decoding improved accuracy by 2.9% (absolute) on a spontaneous lecture speech recognition task.
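The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not give the exact normalization, so the code assumes the acoustic log-likelihood is rescaled by the ratio of the frame period to a reference period, which keeps settings with more frames from being penalized merely for having more terms in the sum. The names `Hypothesis`, `combined_score`, `select_best`, the 10 ms reference period, the weights, and all numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """Decoding result for one (frame period, frame length) setting (hypothetical)."""
    frame_period_ms: float
    frame_length_ms: float
    acoustic_loglik: float   # log-likelihood summed over all frames
    language_loglik: float   # language model log-likelihood
    num_words: int           # words in the decoded hypothesis


def combined_score(h, reference_period_ms=10.0, lm_weight=1.0,
                   insertion_penalty=0.0):
    """Score = normalized acoustic + weighted language + insertion penalty.

    ASSUMPTION: normalization multiplies the acoustic log-likelihood by
    frame_period / reference_period; the source only says the likelihood
    is "normalized by frame period".
    """
    normalized_acoustic = h.acoustic_loglik * (h.frame_period_ms / reference_period_ms)
    return (normalized_acoustic
            + lm_weight * h.language_loglik
            + insertion_penalty * h.num_words)


def select_best(hypotheses, **score_kwargs):
    """Return the hypothesis with the highest combined score."""
    return max(hypotheses, key=lambda h: combined_score(h, **score_kwargs))


# Made-up decoding results for three analysis settings.
hyps = [
    Hypothesis(10.0, 25.0, -5000.0, -120.0, 20),
    Hypothesis(8.0, 20.0, -6100.0, -118.0, 21),
    Hypothesis(12.0, 30.0, -4300.0, -125.0, 19),
]
best = select_best(hyps)
print(best.frame_period_ms, best.frame_length_ms)
```

The same selection could plausibly be reused for training by replacing the decoding score with the Viterbi alignment likelihood of each training utterance under each setting, again normalized by frame period.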
Similar resources
Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods
Speech recognition is a subfield of artificial intelligence that develops technologies for converting speech utterances into transcriptions. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...
Speaking rate effects in a landmark-based phonetic exemplar model
In this study we describe a model of speech perception in which neither speaking rate nor lower level temporal cues are considered explicitly. Instead, newly encountered speech signals are encoded as sequences of detailed acoustic events specified in real time at salient landmarks and compared directly with previously heard patterns. When presented with obstruent-vowel sequences occurring in th...
A comparative analytic study on the Gaussian mixture and context dependent deep neural network hidden Markov models
We conducted a comparative analytic study on the context-dependent Gaussian mixture hidden Markov model (CD-GMM-HMM) and deep neural network hidden Markov model (CD-DNN-HMM) with respect to the phone discrimination and the robustness performance. We found that the DNN can significantly improve the phone recognition performance for every phoneme with 15.6% to 39.8% relative phone error rate reductio...
Modeling speaking rate for voice fonts
Voice fonts are created and stored for a speaker, to be used to synthesize speech in the speaker’s voice. The most important descriptors of voice fonts are spectral envelope for acoustic units and prosodic features such as fundamental frequency and average speaking rate. In this paper, we present a new approach to model the speaking rate so that it can be easily incorporated in voice fonts and ...
Duration modeling using cumulative duration probability and speaking rate compensation
A duration modeling scheme and a speaking rate compensation technique are presented for an HMM-based connected digit recognizer. The proposed duration modeling technique uses a cumulative duration probability. The cumulative duration probability can also be used to obtain the duration bounds for bounded duration modeling. One of the advantages of the proposed technique is that the cumulative d...
Publication date: 2002