Discriminant Training of Front-End and Acoustic Modeling Stages to Heterogeneous Acoustic Environments for Multi-stream Automatic Speech Recognition

Authors

  • Michael Lee Shire
  • David Wessel
  • Jitendra Malik
Abstract

Automatic Speech Recognition (ASR) remains an open research problem. In particular, most ASR systems cannot fully handle adverse acoustic environments. Although a large number of modifications have increased robustness, ASR systems still fall short of human recognition ability in many environments. A possible shortcoming of the typical ASR system is its reliance on a single stream of front-end acoustic features and acoustic modeling probabilities. A single front-end feature extraction algorithm may not be capable of maintaining robustness to arbitrary acoustic environments. Acoustic modeling will also degrade because the acoustic environment changes the feature distributions. This thesis explores the parallel use of multiple front-end and acoustic modeling elements to address this shortcoming. Each ASR acoustic modeling component is trained to estimate class posterior probabilities in a particular acoustic environment. In addition to discriminative training of the probability estimator, existing feature extraction algorithms are modified so as to improve class discrimination in the training environment. More specifically, Linear Discriminant Analysis provides a mechanism for obtaining discriminant temporal basis functions that can replace components of the existing algorithms that were designed in either an empirical or intuitive manner. Probability streams are generated using multiple front-end and acoustic modeling stages trained on heterogeneous acoustic environments. In new sample acoustic environments, simple combinations of these probability streams yield word recognition rates that are superior to those of the individual streams.
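As a rough illustration of what a "simple combination" of probability streams could look like, the sketch below merges per-frame class posteriors from two streams by averaging their log probabilities and renormalizing. The abstract does not state which combination rule the thesis uses, so the rule itself, the function name `combine_posteriors`, the `floor` parameter, and the example numbers are all assumptions made for illustration only.

```python
import math


def combine_posteriors(streams, weights=None, floor=1e-10):
    """Combine class posteriors from several streams for a single frame.

    Assumed rule (not taken from the thesis): a weighted average of
    log posteriors, i.e. a weighted geometric mean, renormalized to sum to 1.
    """
    n_streams = len(streams)
    n_classes = len(streams[0])
    if any(len(s) != n_classes for s in streams):
        raise ValueError("all streams must cover the same classes")
    if weights is None:
        # Equal weight per stream when nothing better is known.
        weights = [1.0 / n_streams] * n_streams

    # Flooring avoids log(0) when a stream assigns zero probability.
    log_scores = [
        sum(w * math.log(max(s[k], floor)) for w, s in zip(weights, streams))
        for k in range(n_classes)
    ]

    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_scores)
    unnormalized = [math.exp(v - peak) for v in log_scores]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Hypothetical posteriors over three classes from two streams, each imagined
# as trained in a different acoustic environment.
clean_stream = [0.70, 0.20, 0.10]
reverb_stream = [0.40, 0.50, 0.10]
combined = combine_posteriors([clean_stream, reverb_stream])
```

Averaging in the log domain lets a stream that is confidently wrong about a class be tempered by the other stream; a plain arithmetic mean of the posteriors would be an equally simple alternative.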

Similar resources

Multi-stream ASR trained with heterogeneous reverberant environments

A common problem with current automatic speech recognition (ASR) systems is that the performance degrades when it is presented with speech from a different acoustic environment than the one used during training. An important cause is that the feature distribution to which the ASR system is trained no longer matches that of a new environment. Reverberant environments can be especially harmful. I...

Allophone-based acoustic modeling for Persian phoneme recognition

Phoneme recognition is one of the fundamental phases of automatic speech recognition. Coarticulation which refers to the integration of sounds, is one of the important obstacles in phoneme recognition. In other words, each phone is influenced and changed by the characteristics of its neighbor phones, and coarticulation is responsible for most of these changes. The idea of modeling the effects o...

Matching the Acoustic Model to Front-End Signal Processing for ASR in Noisy and Reverberant Environments

Distant-talking automatic speech recognition (ASR) represents an extremely challenging task. The major reason is that unwanted additive interference and reverberation are picked up by the microphones besides the desired signal. A hands-free human-machine interface should therefore comprise a powerful acoustic preprocessing unit in line with a robust ASR back-end. However, since perfect speech e...

Towards Natural Acoustic Interfaces for Automatic Speech Recognition

Aiming at ’natural’ hands-free acoustic human/machine interfaces, the need for according distant-talking automatic speech recognition (ASR) systems increases and presents us with major signal processing challenges at the acoustic front-end. Considering interactive TV as a challenging exemplary application scenario, we investigate the structural problems presented by noisy and reverberant multi-...

Towards better performance with heterogeneous training data in acoustic modeling using deep neural networks

Modeling heterogeneous data sources remains a fundamental challenge of acoustic modeling in speech recognition. We call this the multi-condition problem because the speech data come from many different conditions. In this paper, we introduce the fundamental confusability problem in multi-condition learning, then discuss the problem formalization, the taxonomy, and the architectures for multi-co...

Publication date: 2000