A multi-pass error detection and correction framework for Mandarin LVCSR
نویسندگان
چکیده
We previously proposed a multi-pass framework for Large Vocabulary Continuous Speech Recognition (LVCSR). The objective of this framework is to apply sophisticated linguistic models for recognition, while maintaining a balance between complexity and efficiency. The framework is composed of three passes: initial recognition, error detection and error correction. This paper presents and evaluates a prototype of the multi-pass framework based on Mandarin dictation. In this prototype, the first pass recognizes speech with a well-trained state-of-the-art recognizer incorporating an efficient language model; the second pass detects recognition errors by a new three-step error detection procedure; and the third pass corrects errors detected in those lightly erroneous utterances by a novel error correction approach. The error correction algorithm corrects recognition errors by first creating candidate lists for errors, and then re-ranking the candidates with a combined model of mutual information and trigram. Mandarin dictation experiments show a relative reduction of 4% in character error rate (CER) over the initial recognition performance based on those light erroneous utterances detected.
منابع مشابه
A New Framework for Mandarin Lvcsr Based on One-pass Decoder
This paper describes a new framework based on one-pass and decision tree based class-triphone acoustic modeling for Mandarin LVCSR. Compared with the multi-pass decoder, it should be more knowledgeable and efficient as all sources are used at the same time when the decoder could be well organized and optimized. We give a detail about the organization of our one-pass decoder and how to handle th...
متن کاملUpdate progress of Sinohear: advanced Mandarin LVCSR system at NLPR
NLPR has been with long efforts on Mandarin speech recognition. This paper reports our recent process in this field with several significant novel characteristics: 1) Very large speech databases are used to learn more robust acoustic model; 2) Acoustic model has evolved from non-tonal class-triphone to tonal class-triphone based on tone-embedded decision tree, namely unified tone & triphone mod...
متن کاملIntegrating Multi-level Linguistic Knowledge with a Unified Framework for Mandarin Speech Recognition
To improve the Mandarin large vocabulary continuous speech recognition (LVCSR), a unified framework based approach is introduced to exploit multi-level linguistic knowledge. In this framework, each knowledge source is represented by a Weighted Finite State Transducer (WFST), and then they are combined to obtain a so-called analyzer for integrating multi-level knowledge sources. Due to the unifo...
متن کاملAn Empirical Study of Word Error Minimization Approaches for Mandarin Large Vocabulary Continuous Speech Recognition
This paper presents an empirical study of word error minimization approaches for Mandarin large vocabulary continuous speech recognition (LVCSR). First, the minimum phone error (MPE) criterion, which is one of the most popular discriminative training criteria, is extensively investigated for both acoustic model training and adaptation in a Mandarin LVCSR system. Second, the word error minimizat...
متن کاملOnline speaker adaptation and tracking for real-time speech recognition
This paper describes a low-latency online speaker adaptation framework. The main objective is to apply fast speaker adaptation to a real-time (RT) large vocabulary continuous speech recognition (LVCSR) engine. In this framework, speaker adaptation is performed on speaker turns generated by online speaker change detection and speaker clustering. To maximize long-term system performance, the adap...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2006