Prosody-enriched lattices for improved syllable recognition
نویسندگان
چکیده
Automatic recognition of syllables is useful for many spoken language applications such as speech recognition and spoken document retrieval. Short-term spectral properties (such as melfrequency cepstral coefficients, or MFCCs) are usually the features of choice for such systems, which typically ignore suprasegmental (prosodic) cues that manifest themselves at the syllable, word and utterance level. Previous work has shown that categorical representations of prosody correlate well with lexical entities. In this paper, we attempt to exploit this relationship by enriching syllable-level lattices, generated by a standard speech recognizer, with categorical prosodic events for improved syllable recognition performance. With the enriched lattices, we obtain a 2% relative improvement in syllable error rate over the baseline system on a read speech task (the Boston University Radio News Corpus).
منابع مشابه
Using prosody to improve Mandarin automatic speech recognition
In this paper, these problems of how to model and train Mandarin prosody dependent acoustic model and how to decode input speech based on prosody dependent speech recognition system will be discussed. We use automatic prosody labeling methods to annotate syllable prosodic break type and stress type on continuous speech corpus, and utilize our proposed methods to train prosody dependent tonal sy...
متن کاملA New Model-Based Mandarin-Speech Coding System
In this paper, a new model-based Mandarin-speech coding system is proposed. It employs a prosody-enriched ASR with a hierarchical prosodic model (HPM) to generate from the input speech enriched transcriptions, including linguistic features, prosodic tags and spectral parameters in the encoder. By sending these features to the decoder, we can first reconstruct the prosodic-acoustic features of s...
متن کاملProsody-dependent Acoustic Modeling for Mandarin Speech Recognition
A study on introducing prosodic information to acoustic modeling (AM) for speech recognition is reported in this paper. It extends the conventional context-dependent (CD) triphone HMM modeling approach to further consider the dependency of phone model on the break type of nearby inter-syllable boundary. Four break types are considered, including major break, minor break, normal non-break, and t...
متن کاملProsody Modeling of Spontaneous Mandarin Speech and Its Application to Automatic Speech Recognition
A prosody-assisted ASR approach for spontaneous Mandarin speech is proposed. It employs the joint prosody labeling and modeling algorithm proposed previously to construct a hierarchical prosodic model (HPM) and uses it in two-stage speech recognition. A word lattice is first generated by the HMM method using tri-phone AM and bigram LM. Then, the lattice is extended by replacing LM to a trigram ...
متن کاملModeling Prosody Pattern of Chinese Expressive Speech and Its Application in Personalized Speech Conversion
This paper proposes an approach for modeling prosody patterns of acoustic features of Chinese expressive speech. In a Chinese multi-syllabic prosodic word, a syllable is identified as the core syllable based on the observation that speaker usually puts more emphasis on such syllable. The variations of the acoustic features migrating from neutral to expressive speech are then analyzed for both t...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2007