Separation of Desired Speech from Interfering Speech Reverberating in a Room
Abstract
A new algorithm has been developed to separate desired speech from interfering speech reverberating in a room. The algorithm treats this problem as a two-channel adaptive noise cancellation problem. Applying a least-squares error criterion within the adaptive noise canceler framework reduces it to a classical system identification problem. The solution estimates the room's reverberant system and recovers the desired speech by filtering the interfering speech with the estimated filter and subtracting the result. System identification is a well-established research field; however, this thesis project is distinguished by two major features. First, the system involves a large number of parameters to be estimated (more than 1000). Second, it handles non-stationary speech signals rather than only stationary noise. Of the many available system identification techniques, spectral analysis estimation is best suited to this problem because it computes efficiently with the FFT and can be applied to any type of room. The proposed algorithm is a spectral analysis estimation method modified to handle non-stationary speech. It is derived directly from maximum likelihood estimation by modeling the desired speech as Gaussian colored noise whose power spectral density is allowed to change from frame to frame. Filtering is performed with the overlap-save method. The algorithm operates on finite-length frames and is implemented recursively, which makes real-time processing possible and lets it adapt to changes in the room's acoustic environment. In each frame, the filter estimate is updated frequency by frequency. The algorithm was implemented as a computer program.
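The frequency-domain estimator described above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the thesis' implementation. It assumes the primary signal is y = s + h * x, where x is the interfering speech available on the second channel and h is a causal room response of at most filt_len taps. If the desired speech S_k(ω) in frame k is modeled as zero-mean Gaussian with power spectral density P_k(ω), maximizing the likelihood at each frequency gives the weighted least-squares estimate H(ω) = [Σ_k λ^(K-k) Y_k(ω) X_k*(ω) / P_k(ω)] / [Σ_k λ^(K-k) |X_k(ω)|² / P_k(ω)]. The forgetting factor λ (forget), the Hann analysis window, estimating P_k from the smoothed residual of the previous estimate, the numerical floors, and the function name cancel_interference are all hypothetical choices.

```python
import numpy as np


def cancel_interference(y, x, filt_len=32, fft_len=512, forget=0.98, smooth_bins=9):
    """Hypothetical frame-recursive sketch: estimate the room response h from the
    interfering signal x to the primary signal y, then return (y - h * x, h).

    Assumed model (not taken from the thesis): y = s + h * x, with h causal,
    at most filt_len taps long, and fft_len > filt_len.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)[: len(y)]
    hop = fft_len - filt_len + 1            # new output samples per overlap-save block
    n_bins = fft_len // 2 + 1
    win = np.hanning(fft_len)               # analysis window, used only for estimation
    kernel = np.ones(smooth_bins) / smooth_bins

    # Prepend filt_len - 1 zeros so every block carries filt_len - 1 samples of history.
    n_blocks = int(np.ceil(len(y) / hop))
    pad_len = n_blocks * hop + filt_len - 1
    x_pad = np.zeros(pad_len)
    y_pad = np.zeros(pad_len)
    x_pad[filt_len - 1:filt_len - 1 + len(x)] = x
    y_pad[filt_len - 1:filt_len - 1 + len(y)] = y

    num = np.zeros(n_bins, dtype=complex)   # weighted running sum of Y X*
    den = np.zeros(n_bins)                  # weighted running sum of |X|^2
    H = np.zeros(n_bins, dtype=complex)
    h = np.zeros(filt_len)
    out = np.zeros(n_blocks * hop)

    for b in range(n_blocks):
        start = b * hop
        x_seg = x_pad[start:start + fft_len]
        y_seg = y_pad[start:start + fft_len]
        X = np.fft.rfft(win * x_seg)
        Y = np.fft.rfft(win * y_seg)

        # Frame-dependent PSD of the desired speech (assumption): residual of the
        # previous estimate, smoothed across neighbouring frequency bins.
        psd = np.convolve(np.abs(Y - H * X) ** 2, kernel, mode="same")
        weight = 1.0 / np.maximum(psd, 1e-12)

        # Recursive, frequency-by-frequency weighted least-squares update; this is
        # the Gaussian ML estimate when psd equals the true per-bin variance.
        num = forget * num + weight * Y * np.conj(X)
        den = forget * den + weight * np.abs(X) ** 2
        H = num / np.maximum(den, 1e-12)

        # Keep only filt_len taps of the impulse response (assumed room length).
        h = np.fft.irfft(H, n=fft_len)[:filt_len]

        # Overlap-save: the last `hop` samples of the circular convolution equal
        # the linear convolution h * x for this block.
        filtered = np.fft.irfft(np.fft.rfft(x_seg) * np.fft.rfft(h, n=fft_len), n=fft_len)
        out[start:start + hop] = y_seg[filt_len - 1:] - filtered[filt_len - 1:]

    return out[:len(y)], h
```

For instance, calling cancel_interference(y, x) on a mixture built with a 32-tap delayed-delta response returns the cleaned signal and a 32-tap estimate whose largest tap should sit at the chosen delay. The window is applied only to the spectra used for estimation; the overlap-save filtering uses unwindowed blocks, so the subtraction is an exact linear convolution with the current estimate.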
Experiments with synthetic speech data were performed using two types of room transfer functions: a 32-point delayed delta function and a 1024-point room response measured experimentally in a room. The algorithm achieved an output signal-to-noise ratio above 20 dB for input speech at three different signal-to-noise ratios (6 dB, 0 dB, and -12 dB). For actual recorded speech data, however, the results were too unclear to present in this thesis, and more work is needed for this case.
Thesis Supervisor: Bruce R. Musicus
Title: Assistant Professor of Electrical Engineering & Computer Science
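The reported figures compare input and output signal-to-noise ratios. A minimal sketch of that experimental setup, assuming SNR means the energy of the desired speech divided by the energy of everything else in the signal (the thesis' exact definition is not given here): it builds a 32-point delayed delta response (the 20-sample delay is an assumption), scales the reverberated interference so the mixture reaches a chosen input SNR, and measures the result. White-noise stand-ins replace real speech, and the names delayed_delta, snr_db and mix_at_snr are illustrative.

```python
import numpy as np


def delayed_delta(length=32, delay=20):
    """Hypothetical test room response: a unit impulse delayed by `delay` samples."""
    h = np.zeros(length)
    h[delay] = 1.0
    return h


def snr_db(clean, observed):
    """Energy of the clean signal over the energy of the error, in dB."""
    err = observed - clean
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(err ** 2))


def mix_at_snr(s, x, h, target_db):
    """Return y = s + g * (h * x), with gain g chosen so snr_db(s, y) == target_db."""
    interference = np.convolve(x, h)[: len(s)]
    g = np.sqrt(np.sum(s ** 2) / (np.sum(interference ** 2) * 10.0 ** (target_db / 10.0)))
    return s + g * interference


rng = np.random.default_rng(1)
s = rng.standard_normal(16000)   # stand-in for the desired speech
x = rng.standard_normal(16000)   # stand-in for the interfering speech
h = delayed_delta()
for target in (6.0, 0.0, -12.0):
    y = mix_at_snr(s, x, h, target)
    print(f"target {target:+.0f} dB, measured {snr_db(s, y):.2f} dB")
```

Applying snr_db to a processor's output instead of the mixture gives the corresponding output SNR.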
Similar resources
On the Use of Artificial Reverberation for ASR in Highly Reverberant Environments
In this paper, we discuss the use of artificial room reverberation methods to increase the performance of automatic speech recognition (ASR) systems in highly reverberant enclosures. Our approach consists in training acoustic models on artificially reverberated speech material. In order to obtain the desired reverberated speech training database, we propose to use a reverberating filter whose i...
Signal enhancement using beamforming and nonstationarity with applications to speech
We consider a sensor array located in an enclosure, where arbitrary transfer functions (TFs) relate the source signal and the sensors. The array is used for enhancing a signal contaminated by interference. Constrained minimum power adaptive beamforming, which has been suggested by Frost and, in particular, the generalized sidelobe canceler (GSC) version, which has been developed by Griffiths an...
A corpus-based approach for robust ASR in reverberant environments
In this paper, we discuss the use of artificial room reverberation to increase the performance of automatic speech recognition (ASR) systems in reverberant enclosures. Our approach consists in training acoustic models on artificially reverberated speech material. In order to obtain the desired reverberated speech training database, we propose to use a reverberating filter whose impulse response...
A spatio-temporal speech enhancement scheme for robust speech recognition in noisy environments
A new speech enhancement scheme is presented integrating spatial and temporal signal processing methods for robust speech recognition in noisy environments. The scheme first separates spatially localized point sources from noisy speech signals recorded by two microphones. Blind source separation algorithms assuming no a priori knowledge about the sources involved are applied in this spatial pro...
Multi-Channel l1 Regularized Convex Speech Enhancement Model and Fast Computation by the Split Bregman Method
A convex speech enhancement (CSE) method is presented based on convex optimization and pause detection of the speech sources. Channel spatial difference is identified for enhancing each speech source individually while suppressing other interfering sources. Sparse unmixing filters indicating channel spatial differences are sought by l1 norm regularization and the split Bregman method. A subdivi...
Publication date: 2015