ideal binary mask

A Mask Estimation Method Integrating Data Field Model for Speech Enhancement

2017

Xianyun Wang Changchun Bao Feng Bao

In most approaches based on computational auditory scene analysis (CASA), the ideal binary mask (IBM) is often used for noise reduction. However, it is almost impossible to obtain the true IBM. Errors in IBM estimation may severely violate the smooth evolution of speech because energy is missing in many speech-dominated time-frequency (TF) units. To reduce the error, the ideal ratio ...


Blind Dereverberation of Audio Signals

2008

Graham Grindlay

This project examines the problem of single channel blind dereverberation. After estimating the T60 value, a time-domain binary masking approach was used to remove regions of the signal that were largely dominated by reverberant energy. Performance of the system was examined for several different classes of audio (hand clapping, drums, and speech) and for varying amounts of reverberation. In ad...


A data-driven approach for estimating the time-frequency binary mask

2009

Gibak Kim Philipos C. Loizou

The ideal binary mask, often used in robust speech recognition applications, requires an estimate of the local SNR in each time-frequency (T-F) unit. A data-driven approach is proposed for estimating the instantaneous SNR of each T-F unit. By assuming that the a priori SNR and a posteriori SNR are uniformly distributed within a small region, the instantaneous SNR is estimated by minimizing the l...
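The dependence of the ideal binary mask on the local SNR can be illustrated with a minimal sketch. It assumes the clean speech and noise energies of every T-F unit are known (which is what makes the mask "ideal") and keeps a unit when its local SNR exceeds a local criterion. The function name `ideal_binary_mask`, the parameter `lc_db`, and the toy energies are illustrative assumptions, not taken from the paper.

```python
import math

def ideal_binary_mask(speech_energy, noise_energy, lc_db=0.0):
    """Return 1 for T-F units whose local SNR (dB) exceeds lc_db, else 0.

    speech_energy and noise_energy are equally sized 2-D lists
    (frames x frequency bins) of non-negative energies.
    lc_db is a hypothetical name for the local SNR criterion.
    """
    mask = []
    for s_row, n_row in zip(speech_energy, noise_energy):
        row = []
        for s, n in zip(s_row, n_row):
            if s <= 0.0:
                row.append(0)          # no speech energy: discard the unit
            elif n <= 0.0:
                row.append(1)          # speech without noise: keep the unit
            else:
                snr_db = 10.0 * math.log10(s / n)
                row.append(1 if snr_db > lc_db else 0)
        mask.append(row)
    return mask

# Toy 2x2 example (made-up energies, for illustration only).
speech = [[4.0, 0.1], [1.0, 9.0]]
noise = [[1.0, 1.0], [2.0, 3.0]]
mask = ideal_binary_mask(speech, noise)
```

Raising `lc_db` makes the mask more selective; in a blind setting the energies are unavailable, which is why the local SNR has to be estimated.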


Time-Frequency Trade-offs for Audio Source Separation with Binary Masks

Journal: CoRR 2015

Andrew J. R. Simpson

The short-time Fourier transform (STFT) provides the foundation of binary-mask based audio source separation approaches. In computing a spectrogram, the STFT window size parameterizes the trade-off between time and frequency resolution. However, it is not yet known how this parameter affects the operation of the binary mask in terms of separation quality for real-world signals such as speech or...
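The time-frequency trade-off controlled by the STFT window size can be made concrete with a short sketch. For a window of N samples at sampling rate fs, one frame spans N / fs seconds while adjacent DFT bins are fs / N Hz apart, so improving one resolution degrades the other. The function name and the 16 kHz sampling rate below are illustrative assumptions, not values from the paper.

```python
def stft_resolution(window_size, sample_rate):
    """Return (time span in seconds, bin spacing in Hz) for one STFT window."""
    if window_size <= 0 or sample_rate <= 0:
        raise ValueError("window_size and sample_rate must be positive")
    time_span_s = window_size / sample_rate    # duration covered by one frame
    bin_spacing_hz = sample_rate / window_size  # distance between DFT bins
    return time_span_s, bin_spacing_hz

# Compare a short and a long window at an assumed 16 kHz sampling rate.
short_window = stft_resolution(256, 16000)
long_window = stft_resolution(1024, 16000)
```

The product of the two values is always 1, which is why a binary mask computed on a long window resolves harmonics well but smears onsets, and vice versa.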


Binary Mask Estimation for Improved Speech Intelligibility in Reverberant Environments

2012

Oldooz Hazrati Jaewook Lee Philipos C. Loizou

A blind (non-ideal) time-frequency (T-F) masking technique is proposed for suppressing reverberation. A binary mask is estimated at each T-F unit by extracting a single variance-based feature from the reverberant signal and comparing its value against an adaptive threshold. The performance of the estimated binary mask is evaluated using intelligibility listening tests with hearing impaired list...


Speaker separation using visually-derived binary masks

2013

Faheem Khan Ben P. Milner

This paper is concerned with the problem of single-channel speaker separation and exploits visual speech information to aid the separation process. Audio from a mixture of speakers is received from a single microphone and to supplement this, video from each speaker in the mixture is also captured. The visual features are used to create a time-frequency binary mask that identifies regions where ...


Classification based binaural dereverberation

2013

Nicoleta Roman Michael I. Mandel

Reverberation has a detrimental effect on speech perception in terms of both quality and intelligibility, as late reflections smear temporal and spectral cues. The ideal binary mask, which is an established computational approach to sound separation, was recently extended to remove reverberation. Experiments with both normal-hearing and hearing-impaired listeners have shown significant i...


On the Window-disjoint-orthogonality of Speech Sources in Reverberant Humanoid Scenarios

2008

Sylvia Schulz Thorsten Herfet

Many speech source separation approaches are based on the assumption that speech sources are orthogonal in the time-frequency domain. The target speech source is demixed from the mixture by applying the ideal binary mask to the mixture. So far, the time-frequency orthogonality of speech sources has been investigated in detail only for anechoic and artificially mixed speech mixtures. This paper evaluates how ...
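How well a binary mask exploits time-frequency orthogonality can be quantified with a simple score. The sketch below uses one common formulation of approximate W-disjoint orthogonality: the target energy preserved by the mask minus the interferer energy it lets through, normalized by the total target energy. Whether the paper uses exactly this measure is an assumption; the function name and toy energies are illustrative.

```python
def wdo_score(mask, target_energy, interferer_energy):
    """Approximate W-disjoint orthogonality of a binary mask.

    All arguments are flat, equally long lists over T-F units.
    Returns (kept target energy - kept interferer energy) / total target energy,
    which equals 1.0 when the sources do not overlap and the mask is ideal.
    """
    total_target = sum(target_energy)
    if total_target <= 0.0:
        raise ValueError("target must contain some energy")
    kept_target = sum(m * s for m, s in zip(mask, target_energy))
    kept_interferer = sum(m * n for m, n in zip(mask, interferer_energy))
    return (kept_target - kept_interferer) / total_target

# Perfectly disjoint sources: the mask recovers the target completely.
disjoint = wdo_score([1, 0], [1.0, 0.0], [0.0, 1.0])

# Overlapping sources (made-up energies): the score drops below 1.
overlapping = wdo_score([1, 0, 0, 1], [4.0, 0.1, 1.0, 9.0], [1.0, 1.0, 2.0, 3.0])
```

Reverberation spreads each source over more T-F units, so a score of this kind would be expected to decrease as overlap grows.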


Blind Adaptive Mask to Improve Intelligibility of Non-Stationary Noisy Speech

Journal: IEEE Signal Processing Letters 2021

This letter proposes a novel blind acoustic mask (BAM) designed to adaptively detect noise components and preserve target speech segments in the time domain. A robust standard deviation estimator is applied to the non-stationary noisy signal to identify masking elements. The main contribution of the proposed solution is the use of these statistics to derive adaptive information that defines and selects samples with a lower noise proportion. Thu...


Dual-Channel Cosine Function Based ITD Estimation for Robust Speech Separation

2017

Xuliang Li Zhaogui Ding Weifeng Li Qingmin Liao

In speech separation tasks, many separation methods are limited to closely spaced microphones, which means that they cannot cope with phase wrap-around. In this paper, we present a novel speech separation scheme using two microphones that does not have this restriction. The technique utilizes the estimation of interaural time difference (ITD) statistics and bin...
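The link between microphone spacing and phase wrap-around can be sketched with a short calculation. For two microphones a distance d apart, the largest possible inter-microphone delay is d / c, so the phase difference 2πf·d/c stays within ±π only for frequencies below c / (2d). The function names and the speed of sound of 343 m/s are assumptions for illustration; this is not the paper's ITD estimator.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def max_unambiguous_frequency(mic_spacing_m, speed_of_sound=SPEED_OF_SOUND):
    """Highest frequency (Hz) whose inter-microphone phase cannot wrap."""
    if mic_spacing_m <= 0.0:
        raise ValueError("mic_spacing_m must be positive")
    return speed_of_sound / (2.0 * mic_spacing_m)

def phase_can_wrap(frequency_hz, mic_spacing_m, speed_of_sound=SPEED_OF_SOUND):
    """True if the phase difference at frequency_hz may exceed pi radians."""
    return frequency_hz > max_unambiguous_frequency(mic_spacing_m, speed_of_sound)

# A 2 cm spacing is unambiguous up to about 8.6 kHz, while a 15 cm spacing
# already wraps above roughly 1.1 kHz.
close_limit = max_unambiguous_frequency(0.02)
wide_limit = max_unambiguous_frequency(0.15)
```

This is why delay estimates taken directly from phase differences become ambiguous for widely spaced microphones.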
