ideal binary mask

Binaural Source Separation in Non-ideal Reverberant Environments

2007

Sylvia Schulz, Thorsten Herfet

This paper proposes a framework for separating several speech sources in non-ideal, reverberant environments. A movable human dummy head residing in a normal office room is used to model the conditions humans experience when listening to complex auditory scenes. Before the source separation takes place, the dummy head explores the auditory scene and extracts characteristics the same way as...

Ideal Ratio Mask Estimation Using Deep Neural Networks for Monaural Speech Segregation in Noisy Reverberant Conditions

2017

Xu Li, Junfeng Li, Yonghong Yan

Monaural speech segregation is an important problem in robust speech processing and has been formulated as a supervised learning problem. In supervised learning methods, the ideal binary mask (IBM) is usually used as the target because of its simplicity and large speech intelligibility gains. Recently, the ideal ratio mask (IRM) has been found to improve the speech quality over the IBM. However...

Binaural segregation in multisource reverberant environments.

Journal: The Journal of the Acoustical Society of America, 2006

Nicoleta Roman, Soundararajan Srinivasan, DeLiang Wang

In a natural environment, speech signals are degraded by both reverberation and concurrent noise sources. While human listening is robust under these conditions using only two ears, current two-microphone algorithms perform poorly. The psychological process of figure-ground segregation suggests that the target signal is perceived as a foreground while the remaining stimuli are perceived as a ba...

Outcome measures based on classification performance fail to predict the intelligibility of binary-masked speech.

Journal: The Journal of the Acoustical Society of America, 2016

Abigail Anne Kressner, Tobias May, Christopher J. Rozell

To date, the most commonly used outcome measure for assessing ideal binary mask estimation algorithms is based on the difference between the hit rate and the false alarm rate (H-FA). Recently, the error distribution has been shown to substantially affect intelligibility. However, H-FA treats each mask unit independently and does not take into account how errors are distributed. Alternatively, a...
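The H-FA measure named in this abstract can be illustrated with a minimal sketch. It assumes masks are flattened into lists of 0/1 values; the function name `hit_minus_false_alarm` and the example masks are hypothetical and are not taken from the paper.

```python
def hit_minus_false_alarm(ideal_mask, estimated_mask):
    """Return (hit_rate, false_alarm_rate, h_fa) for two flat 0/1 masks.

    Hit rate: fraction of target-dominated units (ideal == 1) that the
    estimate also labels 1. False alarm rate: fraction of masker-dominated
    units (ideal == 0) that the estimate wrongly labels 1.
    """
    pairs = list(zip(ideal_mask, estimated_mask))
    hits = sum(1 for i, e in pairs if i == 1 and e == 1)
    false_alarms = sum(1 for i, e in pairs if i == 0 and e == 1)
    n_target = sum(1 for i, _ in pairs if i == 1)
    n_masker = len(pairs) - n_target
    hit_rate = hits / n_target if n_target else 0.0
    fa_rate = false_alarms / n_masker if n_masker else 0.0
    return hit_rate, fa_rate, hit_rate - fa_rate


# Hypothetical masks: both estimates miss two target units, but in
# different positions (adjacent versus scattered).
ideal = [1, 1, 1, 1, 0, 0, 0, 0]
clustered_errors = [0, 0, 1, 1, 0, 0, 0, 0]
scattered_errors = [0, 1, 0, 1, 0, 0, 0, 0]

print(hit_minus_false_alarm(ideal, clustered_errors))
print(hit_minus_false_alarm(ideal, scattered_errors))
```

Both estimates receive the same score because each unit is counted on its own, which is the property the abstract identifies as a limitation: the measure cannot tell clustered errors from scattered ones.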

Real-Time Speech Separation & Intelligibility Enhancement on a Field Programmable Gate Array Platform

2012

Valerie S. Hanson, Kofi M. Odame

In this paper, we present a real-time implementation of the ideal binary-mask algorithm, which is a promising approach for enhancing speech intelligibility. Our implementation is hardware efficient, making it suitable for embedded biomedical devices such as hearing aids and cochlear implants. We tested our algorithm implementation on an FPGA platform, and produced results that verify that it ef...

On the Role of Binary Mask Pattern in Automatic Speech Recognition

2012

Arun Narayanan, DeLiang Wang

Processing noisy signals using the ideal binary mask has been shown to improve automatic speech recognition (ASR) performance. In this paper, we present the first study that investigates the role of mask patterns in ASR under varying signal-to-noise ratios (SNR), noise conditions and mask definitions. Binary masks are typically computed either by comparing the local SNR within a time-frequency u...
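The local-SNR definition of a binary mask mentioned in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the paper: the premixed target and noise energies per time-frequency unit are assumed to be available as nested lists, and the function name `ideal_binary_mask`, the threshold parameter `lc_db`, and the example values are hypothetical.

```python
import math


def ideal_binary_mask(target_energy, noise_energy, lc_db=0.0):
    """Label each time-frequency unit 1 if its local SNR exceeds lc_db, else 0.

    target_energy and noise_energy are same-shaped nested lists holding the
    (non-negative) energy of the premixed target and noise in each unit.
    """
    mask = []
    for target_row, noise_row in zip(target_energy, noise_energy):
        row = []
        for t, n in zip(target_row, noise_row):
            if t == 0:
                keep = False            # no target energy at all
            elif n == 0:
                keep = True             # target present, no noise
            else:
                keep = 10.0 * math.log10(t / n) > lc_db
            row.append(1 if keep else 0)
        mask.append(row)
    return mask


# Hypothetical 2x2 energy grids (rows: frames, columns: frequency channels).
target = [[4.0, 1.0], [0.5, 9.0]]
noise = [[1.0, 2.0], [0.5, 0.0]]

print(ideal_binary_mask(target, noise))              # threshold of 0 dB
print(ideal_binary_mask(target, noise, lc_db=-6.0))  # more permissive threshold
```

Lowering the threshold retains more units, which is one way a change in mask definition alters the mask pattern.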

Binary Mask Programmable Hologram

Journal: Optics Express, 2012

Binary and ratio time-frequency masks for robust speech recognition

Journal: Speech Communication, 2006

Soundararajan Srinivasan, Nicoleta Roman, DeLiang Wang

A time-varying Wiener filter extracts a speech signal from a mixture using the a priori signal-to-noise ratio in a local time-frequency unit. We estimate this ratio using a binaural processor and derive a ratio time-frequency mask. This mask is used to extract the speech, which is then fed to a conventional speech recognizer operating in the cepstral domain. We compare the performance of this s...
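The relationship between the a priori SNR and a ratio mask can be sketched with the textbook Wiener gain, SNR / (1 + SNR), applied per time-frequency unit. This is an assumption-laden illustration: the abstract does not give the authors' exact formulation or their binaural SNR estimator, so the SNR values here are simply given, and the names `wiener_ratio_mask` and `apply_mask` are hypothetical.

```python
def wiener_ratio_mask(a_priori_snr):
    """Map linear (not dB) a priori SNR values to gains in [0, 1).

    Uses the standard Wiener gain snr / (1 + snr) for each unit.
    """
    return [[snr / (1.0 + snr) for snr in row] for row in a_priori_snr]


def apply_mask(mask, mixture_energy):
    """Weight each time-frequency unit of the mixture by its mask value."""
    return [
        [m * x for m, x in zip(mask_row, mix_row)]
        for mask_row, mix_row in zip(mask, mixture_energy)
    ]


# Hypothetical linear SNRs for one frame with three frequency channels.
snr = [[0.0, 1.0, 3.0]]
mask = wiener_ratio_mask(snr)
print(mask)                                  # [[0.0, 0.5, 0.75]]
print(apply_mask(mask, [[2.0, 2.0, 4.0]]))   # [[0.0, 1.0, 3.0]]
```

Unlike a binary mask, the ratio mask attenuates each unit in proportion to its SNR: a unit at 0 dB (linear SNR of 1) is scaled by 0.5 rather than being kept or discarded outright.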

Deep Karaoke: Extracting Vocals from Musical Mixtures Using a Convolutional Deep Neural Network

2015

Andrew J. R. Simpson, Gerard Roma, Mark D. Plumbley

Identification and extraction of singing voice from within musical mixtures is a key challenge in source separation and machine audition. Recently, deep neural networks (DNN) have been used to estimate 'ideal' binary masks for carefully controlled cocktail party speech separation problems. However, it is not yet known whether these methods are capable of generalizing to the discrimination of vo...

On binary and ratio time-frequency masks for robust speech recognition

2004

Soundararajan Srinivasan, Nicoleta Roman, DeLiang Wang

A time-varying Wiener filter extracts the speech signal from a noisy mixture using the a priori signal-to-noise ratio in a local time-frequency unit. We estimate this ratio using a binaural processor and derive a ratio time-frequency mask. This mask is used to extract the speech signal, which is then fed to a conventional speech recognizer operating in the cepstral domain. We compare the perfor...
