Integration of acoustic and articulatory information with application to speech recognition

Authors

  • Ka-Yee Leung
  • Man-Hung Siu
Abstract

In speech recognition, fusion of multiple systems often results in improved recognition accuracy or robustness. Previously proposed fusion schemes have focused mainly on the recognition process, while training is performed independently for each system. In this paper, we investigate the combination of a system based on Mel frequency cepstral coefficient (MFCC) acoustic features (ACF) and a system based on articulatory features (AF). In addition to proposing an asynchronous combination during recognition, which makes state combination more flexible, we propose an efficient approach for combining the models during the training stage. We show that combining the models during training not only improves performance but also simplifies the fusion process during recognition. Because fusion during training removes inconsistencies between the individual models, such as in state or phoneme alignments, it is particularly useful for highly constrained recognition fusion such as synchronous model combination. Compared with fusion of separately trained AF and ACF systems, fusion of jointly trained AF and ACF models reduced the phoneme recognition error on the TIMIT corpus by more than 3% absolute for synchronous combination and by 1% absolute for asynchronous combination. © 2003 Elsevier B.V. All rights reserved.
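The abstract contrasts synchronous and asynchronous state-level combination of the two systems. The sketch below illustrates that distinction only under assumptions that the abstract does not state: each model is a left-to-right HMM with the same number of states in both streams, the stream scores are fused as a weighted sum of log-likelihoods with a hypothetical weight `w`, transition probabilities are ignored, and asynchrony is bounded by a hypothetical `max_lag`. The function names `combine_sync`, `viterbi_sync`, and `viterbi_async` are illustrative, not the authors' implementation, and the proposed joint training is not modelled here.

```python
from typing import Dict, List, Tuple

NEG_INF = float("-inf")
Matrix = List[List[float]]  # [frame][state] log-likelihoods (assumed layout)


def combine_sync(ll_acf: Matrix, ll_af: Matrix, w: float = 0.5) -> Matrix:
    """Synchronous fusion: both streams are in the same state at every frame,
    so each state's score is a weighted sum of the two stream log-likelihoods."""
    return [[w * a + (1.0 - w) * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(ll_acf, ll_af)]


def viterbi_sync(scores: Matrix) -> float:
    """Best path score through a left-to-right HMM (self-loop or advance by one),
    starting in the first state and ending in the last. Transition
    probabilities are ignored (treated as uniform) for simplicity."""
    n_states = len(scores[0])
    prev = [NEG_INF] * n_states
    prev[0] = scores[0][0]
    for t in range(1, len(scores)):
        cur = [NEG_INF] * n_states
        for s in range(n_states):
            best = max(prev[s], prev[s - 1]) if s > 0 else prev[s]
            if best > NEG_INF:
                cur[s] = best + scores[t][s]
        prev = cur
    return prev[-1]


def viterbi_async(ll_acf: Matrix, ll_af: Matrix, w: float = 0.5,
                  max_lag: int = 1) -> float:
    """Asynchronous fusion: search over composite states (i, j), where the ACF
    stream is in state i and the AF stream in state j. Each stream may stay or
    advance independently, but they may drift apart by at most max_lag states."""
    n_states = len(ll_acf[0])

    def emit(t: int, i: int, j: int) -> float:
        return w * ll_acf[t][i] + (1.0 - w) * ll_af[t][j]

    prev: Dict[Tuple[int, int], float] = {(0, 0): emit(0, 0, 0)}
    for t in range(1, len(ll_acf)):
        cur: Dict[Tuple[int, int], float] = {}
        for (i, j), score in prev.items():
            for di in (0, 1):
                for dj in (0, 1):
                    ni, nj = i + di, j + dj
                    if ni >= n_states or nj >= n_states or abs(ni - nj) > max_lag:
                        continue
                    cand = score + emit(t, ni, nj)
                    if cand > cur.get((ni, nj), NEG_INF):
                        cur[(ni, nj)] = cand
        prev = cur
    # Both streams must finish in their last state at the final frame.
    return prev.get((n_states - 1, n_states - 1), NEG_INF)


if __name__ == "__main__":
    # Toy 2-state model, 3 frames: ACF moves to state 1 one frame before AF does.
    ll_acf = [[0.0, -5.0], [-5.0, 0.0], [-5.0, 0.0]]
    ll_af = [[0.0, -5.0], [0.0, -5.0], [-5.0, 0.0]]
    print(viterbi_sync(combine_sync(ll_acf, ll_af)))  # -2.5
    print(viterbi_async(ll_acf, ll_af, max_lag=0))     # -2.5 (same as sync)
    print(viterbi_async(ll_acf, ll_af, max_lag=1))     # 0.0
```

Setting `max_lag=0` reduces the asynchronous search to the synchronous one. In the toy data, the looser constraint recovers a better path because the two feature streams change state at different frames, which is the kind of inconsistency between separately trained models that the abstract says makes synchronous combination difficult. How the paper actually weights the streams or bounds asynchrony is not given in the abstract.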


Similar resources

Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods

Speech recognition is a subfield of artificial intelligence that develops technologies to convert speech utterances into transcriptions. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...

Integration of articulatory dynamic parameters in HMM/BN based speech recognition system

In this paper, we describe several approaches to integration of the articulatory dynamic parameters along with articulatory position data into a HMM/BN model based automatic speech recognition system. This work is a continuation of our previous study, where we have successfully combined speech acoustic features in form of MFCC with articulatory position observations. Articulatory dynamic parame...

Allophone-based acoustic modeling for Persian phoneme recognition

Phoneme recognition is one of the fundamental phases of automatic speech recognition. Coarticulation, which refers to the integration of sounds, is one of the important obstacles in phoneme recognition. In other words, each phone is influenced and changed by the characteristics of its neighboring phones, and coarticulation is responsible for most of these changes. The idea of modeling the effects o...

Acoustic feature combination for speech recognition

In this thesis, the use of multiple acoustic features of the speech signal is considered for speech recognition. The goals of this thesis are twofold: on the one hand, new acoustic features are developed, on the other hand, feature combination methods are investigated in order to find an effective integration of the newly developed features into state-of-the-art speech recognition systems. The ...

Combining acoustic and articulatory feature information for robust speech recognition

The idea of using articulatory representations for automatic speech recognition (ASR) continues to attract much attention in the speech community. Representations which are grouped under the label “articulatory” include articulatory parameters derived by means of acoustic-articulatory transformations (inverse filtering), direct physical measurements or classification scores for pseudo-articul...

Integrating Articulatory Features into Acoustic Models for Speech Recognition

It is often assumed that acoustic-phonetic or articulatory features can be beneficial for automatic speech recognition (ASR), e.g. because of their supposedly greater noise robustness or because they provide a more convenient interface to higher-level components of ASR systems such as pronunciation modeling. However, the success of these features when used as an alternative to standard acoustic...

Journal title:
  • Information Fusion

Volume 5

Publication date: 2004