Modeling and efficient decoding of large vocabulary conversational speech
نویسندگان
چکیده
Capturing the large variability of conversational speech in the framework of purely phone based speech recognizers is virtually impossible. It has been shown earlier that suprasegmental features such as speaking rate, duration and syllabic, syntactic and semantic structure are important predictors of pronunciation variation. In order to allow for a tighter coupling of these predictors of pronunciation, duration and acoustic modeling a new recognition toolkit has been developed. The phonetic transcription of speech has been generalized to an attribute based representation, thus enabling the integration of suprasegmental, non-phonetic features. A pronunciation model is trained to augment the attribute transcription to mark possible pronunciation e ects which are then taken into account by the acoustic model induction algorithm. A nite state machine single-pre x-tree, one-pass, time-synchronous decoder is presented that efciently decodes highly spontaneous speech within this new representational framework.
منابع مشابه
Rate-of-speech Modeling for Large Vocabulary Conversational Speech Recognition
Variations in rate of speech (ROS) produce changes in both spectral features and word pronunciations that affect automatic speech recognition (ASR) systems. To deal with these ROS effects, we propose to use parallel, rate-specific, acoustic models: one for fast speech, the other for slow speech. Rate switching is permitted at word boundaries, to allow modeling within-sentence speech rate variat...
متن کاملRate-dependent Acoustic Modeling for Large Vocabulary Conversational Speech Recognition
Variations in rate of speech (ROS) produce changes in both spectral features and word pronunciations that affect automatic speech recognition (ASR) systems. To deal with these ROS effects, we propose to use parallel, rate-specific, acoustic models: one for fast speech, the other for slow speech. Rate switching is permitted at word boundaries, to allow modeling within-sentence speech rate variat...
متن کاملModeling word-level rate-of-speech variation in large vocabulary conversational speech recognition
Variations in rate of speech (ROS) produce variations in both spectral features and word pronunciations that affect automatic speech recognition systems. To deal with these ROS effects, we propose to use a set of parallel rate-specific acoustic and pronunciation models. Rate switching is permitted at word boundaries, to allow within-sentence speech rate variation, which is common in conversatio...
متن کاملEnhanced tree clustering with single pronunciation dictionary for conversational speech recognition
Modeling pronunciation variation is key for recognizing conversational speech. Rather than being limited to dictionary modeling, we argue that triphone clustering is an integral part of pronunciation modeling. We propose a new approach called enhanced tree clustering. This approach, in contrast to traditional decision tree based state tying, allows parameter sharing across phonemes. We show tha...
متن کاملEffective acoustic modeling for rate-of-speech variation in large vocabulary conversational speech recognition
We investigate several variants of speech-rate-dependent acoustic models for large-vocabulary conversational speech recognition, in the framework of combining rate-specific models in decoding to compensate for speech rate variation. We study two basic approaches to combining rate-specific models: one combines models at the pronunciation level and the other at the HMM state level. Furthermore, w...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 1999