Noise Robust Speech Recognition Using Prosodic Information
Abstract
This paper proposes a noise-robust speech recognition method for Japanese utterances using prosodic information. In Japanese, the fundamental frequency (F0) contour conveys phrase intonation and word accent information. Consequently, it also conveys information about prosodic phrase and word boundaries. The paper first proposes a noise-robust F0 extraction method using the Hough transform, which achieves high extraction accuracy under various noise environments. It then proposes a robust speech recognition method using syllable HMMs that model both segmental spectral features and F0 contours. Two prosodic features are combined with ordinary cepstral parameters: the time derivative of log F0 (∆ log F0) and the maximum accumulated voting value of the Hough transform, which serves as a measure of F0 continuity. Speaker-independent experiments were conducted using connected digits uttered by 11 male speakers under various noise types and SNR conditions. Both prosodic features improved recognition accuracy in all noise conditions, and their effects were additive. With both prosodic features, the best absolute improvement in digit accuracy was about 4.5%. This gain is attributed to better digit boundary detection made possible by the robust prosodic information.
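The abstract does not say how the Hough transform is parametrized, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a short window of per-frame salience values over log-F0 bins (for example, cepstral peak strengths), votes for straight lines in slope-intercept form, and returns the best slope together with the maximum accumulated vote. If the bins are uniformly spaced in log F0, the slope plays the role of ∆ log F0 and the maximum vote plays the role of the F0 continuity measure. The function name `hough_line_vote`, the window shape, and the slope grid are all hypothetical.

```python
import numpy as np


def hough_line_vote(salience, slopes):
    """Vote for straight lines in a (frames x bins) salience window.

    Assumed setup (not taken from the paper): rows are time frames,
    columns are log-F0 bins, and each candidate line is described by a
    slope (bins per frame) and an intercept (bin at the window center).

    Returns (best_slope, best_intercept, max_vote).
    """
    n_frames, n_bins = salience.shape
    frames = np.arange(n_frames)
    offsets = frames - (n_frames - 1) / 2.0  # time relative to window center
    accumulator = np.zeros((len(slopes), n_bins))

    for i, slope in enumerate(slopes):
        for intercept in range(n_bins):
            # Bin index that the candidate line passes through in each frame.
            bins = np.rint(intercept + slope * offsets).astype(int)
            valid = (bins >= 0) & (bins < n_bins)
            # Accumulate the salience found along the line.
            accumulator[i, intercept] = salience[frames[valid], bins[valid]].sum()

    i_best, b_best = np.unravel_index(np.argmax(accumulator), accumulator.shape)
    return float(slopes[i_best]), int(b_best), float(accumulator[i_best, b_best])


# Toy window: 5 frames, 20 bins, with a rising contour of one bin per frame.
window = np.zeros((5, 20))
for t in range(5):
    window[t, 8 + t] = 1.0

slope, intercept, max_vote = hough_line_vote(window, np.array([-1.0, 0.0, 1.0]))
print(slope, intercept, max_vote)
```

Because every frame in the window votes for the same line, isolated spurious peaks in a few frames collect few votes, which is one plausible reason a voting scheme of this kind tolerates noise. A low maximum vote would correspondingly indicate a discontinuous or unvoiced region.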
Related resources
Noise Robust Speech Recognitio Extracted by Hough Tr
This paper proposes a noise robust speech recognition method using prosodic information. In Japanese, fundamental frequency (F0) contour represents phrase intonation and word accent information. Consequently, it conveys information about prosodic phrase and word boundaries. This paper first proposes a noise robust F0 extraction method using Hough transform, which achieves high extraction rates ...
Noise robust speech recognition using F0 contour extracted by Hough transform
This paper proposes a noise robust speech recognition method using prosodic information. In Japanese, fundamental frequency (F0) contour represents phrase intonation and word accent information. Consequently, it conveys information about prosodic phrase and word boundaries. This paper first proposes a noise robust F0 extraction method using Hough transform, which achieves high extraction rates ...
A new method for robust speech recognition based on missing data using a bidirectional neural network
The performance of speech recognition systems is greatly reduced when speech is corrupted by noise. One common approach to robust speech recognition is the missing-feature method. In this approach, the components of the time-frequency representation of the signal (spectrogram) that have a low signal-to-noise ratio (SNR) are tagged as missing and deleted, then replaced by the remaining components and statistical ...
Speaker Recognition System Using the Prosodic Information
In most speaker recognition systems, the speaker's characteristics are extracted from acoustic parameters by speech analysis and used to build the speaker's reference pattern. We obtain more exact performance by using both dynamic characteristics and constant characteristics that arise from speaking habit. Therefore we suggest the following to solve this problem. The first is using prosodic information by characteristic vecto...
Improving the performance of MFCC for Persian robust speech recognition
Mel frequency cepstral coefficients (MFCCs) are the most widely used features in speech recognition, but they are very sensitive to noise. In this paper, to achieve satisfactory performance in Automatic Speech Recognition (ASR) applications, we introduce a new noise-robust set of MFCC vectors estimated through the following steps. First, spectral mean normalization is a pre-processing step which applies to t...
Publication date: 2003