Noise Robust Speech Recognition Using Prosodic Information

Authors

Koji Iwano

Takahiro Seki

Sadaoki Furui

Abstract

This paper proposes a noise-robust speech recognition method for Japanese utterances using prosodic information. In Japanese, the fundamental frequency (F0) contour conveys phrase intonation and word accent information; consequently, it also conveys information about prosodic phrase and word boundaries. The paper first proposes a noise-robust F0 extraction method based on the Hough transform, which achieves high extraction accuracy under various noise environments. It then proposes a robust speech recognition method using syllable HMMs that model both segmental spectral features and F0 contours. Two prosodic features are combined with ordinary cepstral parameters: the time derivative of log F0 (∆ log F0) and the maximum accumulated voting value of the Hough transform, which serves as a measure of F0 continuity. Speaker-independent experiments were conducted using connected digits uttered by 11 male speakers under various noise types and SNR conditions. The results confirmed that both prosodic features improve recognition accuracy in all noise conditions and that their effects are additive. When both prosodic features are used, the best absolute improvement in digit accuracy is about 4.5%. This improvement comes from better digit boundary detection enabled by the robust prosodic information.
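
As a rough illustration of the two prosodic features named in the abstract, the sketch below fits a line to (time, log F0) points in a short window by Hough-style voting. The slope of the most-voted line stands in for ∆ log F0, and its vote count stands in for the F0 continuity measure. The slope-intercept parameterization, the candidate slope grid, the intercept bin width, and the function name `hough_line_features` are assumptions made for this example; the abstract does not give the paper's actual procedure.

```python
from collections import Counter


def hough_line_features(points, slopes, intercept_step=0.02):
    """Vote for lines y = a*t + b through (t, log_f0) points.

    Returns (best_slope, max_votes): the slope of the most-voted line
    (an illustrative stand-in for delta log F0) and its vote count
    (an illustrative stand-in for the F0 continuity measure).
    """
    votes = Counter()
    for t, y in points:
        for a in slopes:
            # Each point votes for every line that passes through it,
            # with the intercept quantized into bins of intercept_step.
            b_bin = round((y - a * t) / intercept_step)
            votes[(a, b_bin)] += 1
    (best_slope, _), max_votes = max(votes.items(), key=lambda kv: kv[1])
    return best_slope, max_votes


# Hypothetical window: nine frames on a rising log-F0 line plus two outliers.
window = [(t, 4.8 + 0.01 * t) for t in range(9)] + [(2, 5.5), (6, 4.1)]
candidate_slopes = [-0.02, -0.01, 0.0, 0.01, 0.02]
slope, continuity = hough_line_features(window, candidate_slopes, intercept_step=0.005)
print(slope, continuity)
```

Because outliers cast their votes in unrelated bins, they do not shift the winning line. This is the general reason a voting scheme tolerates spurious F0 candidates better than a least-squares fit would.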

Similar resources

Noise Robust Speech Recognitio Extracted by Hough Tr

This paper proposes a noise robust speech recognition method using prosodic information. In Japanese, fundamental frequency (F0) contour represents phrase intonation and word accent information. Consequently, it conveys information about prosodic phrase and word boundaries. This paper first proposes a noise robust F0 extraction method using Hough transform, which achieves high extraction rates ...

Noise robust speech recognition using F0 contour extracted by Hough transform

A New Method for Robust Speech Recognition Based on Missing Data Using a Bidirectional Neural Network

The performance of speech recognition systems is greatly reduced when speech is corrupted by noise. One common approach to robust speech recognition is the missing-feature method. In this approach, the components of the time-frequency representation of the signal (the spectrogram) that have a low signal-to-noise ratio (SNR) are tagged as missing and deleted, then replaced by remaining components and statistical ...

Speaker Recognition System Using the Prosodic Information

In most speaker recognition systems, the speaker's characteristics are extracted from acoustic parameters by speech analysis, and a reference pattern for the speaker is built from them. More accurate performance can be obtained by using both dynamic characteristics and the constant characteristics that arise from speaking habits. Therefore we suggest the following to solve this problem. The first is to use prosodic information by characteristic vecto...

Improving the performance of MFCC for Persian robust speech recognition

The Mel frequency cepstral coefficients (MFCCs) are the most widely used features in speech recognition, but they are very sensitive to noise. In this paper, to achieve satisfactory performance in Automatic Speech Recognition (ASR) applications, we introduce a new noise-robust set of MFCC vectors estimated through the following steps. First, spectral mean normalization is a pre-processing which applies to t...

Publication date: 2003
