Integration of deep learning with expectation maximization for spatial cue-based speech separation in reverberant conditions
Abstract
In this paper, we formulate a blind source separation (BSS) framework that integrates a U-Net based deep learning network with a probabilistic spatial machine learning expectation maximization (EM) algorithm for separating speech in reverberant conditions. Our proposed model uses a pre-trained convolutional neural network, U-Net, for clustering the interaural level difference (ILD) cues and the EM algorithm for clustering the interaural phase difference (IPD) cues. The integrated model exploits the complementary strengths of the two approaches to BSS: the strong modeling power of supervised neural networks and the ease of unsupervised machine learning algorithms, whose few parameters can be estimated on as little as a single segment of an audio mixture. The results show an average improvement of 4.3 dB in signal to distortion ratio (SDR) and 4.3% in short time intelligibility (STOI) over the EM-based MESSL-GS (model-based expectation–maximization source separation and localization with garbage source), and of 4.5 dB in SDR and 8% in STOI over the U-Net based SONET, under conditions ranging from anechoic to those mostly encountered in the real world.
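The abstract does not say how the ILD and IPD cues are computed. A common definition, assumed here, takes them per time-frequency bin from the left and right short-time spectra: the ILD as the magnitude ratio in dB and the IPD as the phase of the cross-spectrum. The sketch below is only an illustration of that definition; the function names `stft` and `interaural_cues`, the frame size, and the hop size are hypothetical choices, not the authors' settings.

```python
import numpy as np


def stft(x, n_fft=512, hop=128):
    """Hann-windowed STFT; returns an array of shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)


def interaural_cues(left, right, eps=1e-12):
    """Return (ILD in dB, IPD in radians) for every time-frequency bin.

    Assumed definitions (not taken from the paper):
      ILD = 20 * log10(|L| / |R|)
      IPD = angle(L * conj(R)), wrapped to (-pi, pi]
    """
    L, R = stft(left), stft(right)
    # eps guards against division by zero and log of zero in silent bins.
    ild_db = 20.0 * np.log10((np.abs(L) + eps) / (np.abs(R) + eps))
    ipd_rad = np.angle(L * np.conj(R))
    return ild_db, ipd_rad


# Toy input: the right channel is the left channel at half amplitude,
# so the level difference is constant and there is no phase difference.
rng = np.random.default_rng(0)
left = rng.standard_normal(16000)
right = 0.5 * left
ild_db, ipd_rad = interaural_cues(left, right)
```

For this toy input the ILD is about 6.02 dB in every bin (a factor of two in amplitude) and the IPD is zero. In a framework of the kind the abstract describes, maps like `ild_db` and `ipd_rad` would be the inputs to the two clustering stages.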
Similar resources
Binaural Reverberant Speech Separation Based on Deep Neural Networks
Supervised learning has exhibited great potential for speech separation in recent years. In this paper, we focus on separating target speech in reverberant conditions from binaural inputs using supervised learning. Specifically, a deep neural network (DNN) is constructed to map from both spectral and spatial features to a training target. For spectral feature extraction, we first convert binaura...
Deep Ensemble Learning for Monaural Speech Separation
Monaural speech separation is a fundamental problem in robust speech processing. Recently, deep neural network (DNN) based speech separation methods, which predict either clean speech or an ideal time-frequency mask, have demonstrated remarkable performance improvement. However, a single DNN with a given window length does not leverage contextual information sufficiently, and the differences be...
A Feature Study for Masking-Based Reverberant Speech Separation
Monaural speech separation in reverberant conditions is very challenging. In masking-based separation, features extracted from speech mixtures are employed to predict a time-frequency mask. Robust feature extraction is crucial for the performance of supervised speech separation in adverse acoustic environments. Using objective speech intelligibility as the metric, we investigate a wide variety ...
Separation of Underdetermined Reverberant Speech Mixtures by Monaural, Binaural and Statistical Cue Combination
Underdetermined reverberant speech separation is a challenging problem in source separation that has received considerable attention in both computational auditory scene analysis (CASA) and blind source separation (BSS). Recent studies suggest that, in general, the performance of frequency domain BSS methods suffers from the permutation problem across frequencies which degrades in high reverbera...
Expectation-maximization analysis of spatial time series
Expectation maximization (EM) is used to estimate the parameters of a Gaussian Mixture Model for spatial time series data. The method is presented as an alternative and complement to Empirical Orthogonal Function (EOF) analysis. The resulting weights, associating time points with component distributions, are used to distinguish physical regimes. The method is applied to equatorial Pacific sea s...
Journal
Journal title: Applied Acoustics
Year: 2021
ISSN: ['0003-682X', '1872-910X']
DOI: https://doi.org/10.1016/j.apacoust.2021.108048