On Design of Robust Deep Models for CHiME-4 Multi-Channel Speech Recognition with Multiple Configurations of Array Microphones

Authors

  • Yanhui Tu
  • Jun Du
  • Lei Sun
  • Feng Ma
  • Chin-Hui Lee
Abstract

We design a novel deep learning framework for multi-channel speech recognition in two aspects. First, for the front-end, an iterative mask estimation (IME) approach based on deep learning is presented to improve beamforming based on the conventional complex Gaussian mixture model (CGMM). Second, for the back-end, deep convolutional neural networks (DCNNs), with augmentation of both noisy and beamformed training data, are adopted for acoustic modeling, while forward and backward long short-term memory recurrent neural networks (LSTM-RNNs) are used for language modeling. The proposed framework is effective for multi-channel speech recognition with random combinations of fixed microphones. Tested on the CHiME-4 Challenge speech recognition task with a single set of acoustic and language models, our approach achieves the best performance on all three tracks (1-channel, 2-channel, and 6-channel) among submitted systems.
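The front-end described above follows the mask-driven beamforming recipe, in which estimated time-frequency masks weight the spatial covariance statistics used to derive MVDR filter coefficients. The Python sketch below illustrates where such masks enter the pipeline; it is a minimal illustration under our own assumptions, not the authors' implementation, and the heuristic in estimate_masks merely stands in for the paper's IME/CGMM mask estimator (all function and variable names are ours).

import numpy as np

def mvdr_weights(phi_speech, phi_noise, ref_mic=0):
    # MVDR weights per frequency bin from speech/noise spatial covariances.
    # phi_speech, phi_noise: (F, M, M) complex; returns w: (F, M) complex.
    F, M, _ = phi_speech.shape
    w = np.zeros((F, M), dtype=complex)
    for f in range(F):
        # Diagonal loading keeps the noise covariance invertible.
        phi_n = phi_noise[f] + 1e-6 * (np.trace(phi_noise[f]).real / M) * np.eye(M)
        num = np.linalg.solve(phi_n, phi_speech[f])          # Phi_n^{-1} Phi_s
        w[f] = num[:, ref_mic] / (np.trace(num) + 1e-10)
    return w

def masked_covariance(stft, mask):
    # Mask-weighted spatial covariance. stft: (M, F, T), mask: (F, T) in [0, 1].
    phi = np.einsum('ft,mft,nft->fmn', mask, stft, stft.conj())
    return phi / (mask.sum(axis=-1)[:, None, None] + 1e-10)

def estimate_masks(stft):
    # Crude SNR-like heuristic standing in for the trained IME mask network.
    power = np.mean(np.abs(stft) ** 2, axis=0)               # (F, T)
    noise_floor = np.percentile(power, 20, axis=-1, keepdims=True)
    speech_mask = power / (power + noise_floor + 1e-10)
    return speech_mask, 1.0 - speech_mask

def beamform(stft, num_iters=2):
    # Iterative mask estimation followed by MVDR beamforming; stft: (M, F, T).
    speech_mask, noise_mask = estimate_masks(stft)
    for _ in range(num_iters):
        w = mvdr_weights(masked_covariance(stft, speech_mask),
                         masked_covariance(stft, noise_mask))
        enhanced = np.einsum('fm,mft->ft', w.conj(), stft)   # beamformed STFT
        # A real IME system would refine the masks with a DNN/CGMM pass here.
        speech_mask, noise_mask = estimate_masks(enhanced[None])
    return enhanced

In the actual IME scheme, the beamformed output drives a further round of DNN-based and CGMM-based mask refinement rather than the heuristic re-estimation shown here.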


Similar Articles

Deep Beamforming and Data Augmentation for Robust Speech Recognition: Results of the 4th CHiME Challenge

Robust automatic speech recognition in adverse environments is a challenging task. We address the 4th CHiME challenge [1] multi-channel tracks by proposing a deep eigenvector beamformer as the front-end. To train the acoustic models, we propose to supplement the beamformed data with the noisy audio streams of the individual microphones provided in the real set. Furthermore, we perform data augmentation...


An analysis of environment, microphone and data simulation mismatches in robust speech recognition

Speech enhancement and automatic speech recognition (ASR) are most often evaluated in matched (or multi-condition) settings where the acoustic conditions of the training data match (or cover) those of the test data. Few studies have systematically assessed the impact of acoustic mismatches between training and test data, especially concerning recent speech enhancement and state-of-the-art ASR t...


Multi-microphone speech recognition integrating beamforming, robust feature extraction, and advanced DNN/RNN backend

This paper gives an in-depth presentation of the multi-microphone speech recognition system we submitted to the 3rd CHiME speech separation and recognition challenge (CHiME-3) and its extension. The proposed system takes advantage of recurrent neural networks (RNNs) throughout the model from the front-end speech enhancement to the language modeling. Three different types of beamforming are used...


The I2R system for CHiME-4 challenge

Industrial applications of speech recognition have seen a shift from close-talk microphones to everyday real-life scenarios, thanks to rapid developments in robotics and artificial intelligence (AI). The task, however, remains challenging due to the effects of attenuation, noise, distortion, and reverberation. Following the success of the CHiME-3 Challenge, which attracted 25 inte...


Multi-Channel Speech Recognition: LSTMs All the Way Through

Long Short-Term Memory recurrent neural networks (LSTMs) have demonstrable advantages on a variety of sequential learning tasks. In this paper we demonstrate an LSTM “triple threat” system for speech recognition, where LSTMs drive the three main subsystems: microphone array processing, acoustic modeling, and language modeling. This LSTM trifecta is applied to the CHiME-4 distant recognition cha...




Publication date: 2017