Multi-Channel Multi-Frame ADL-MVDR for Target Speech Separation
Authors
Abstract
Many purely neural network based speech separation approaches have been proposed to improve objective assessment scores, but they often introduce nonlinear distortions that are harmful to modern automatic speech recognition (ASR) systems. Minimum variance distortionless response (MVDR) filters are often adopted to remove nonlinear distortions; however, conventional neural mask-based MVDR systems still result in relatively high levels of residual noise. Moreover, the matrix inverse involved in the MVDR solution is sometimes numerically unstable during joint training with neural networks. In this study, we propose a multi-channel multi-frame (MCMF) all deep learning (ADL)-MVDR approach for target speech separation, which extends our preliminary multi-channel ADL-MVDR approach. The proposed MCMF ADL-MVDR system addresses linear and nonlinear distortions. Spatio-temporal cross correlations are also fully utilized in the proposed approach. The proposed systems are evaluated using a Mandarin audio-visual corpus and are compared with several state-of-the-art approaches. Experimental results demonstrate the superiority of the proposed systems under different scenarios and across several objective evaluation metrics, including ASR performance.
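For orientation, the matrix inverse mentioned in the abstract can be seen in the textbook form of the MVDR beamformer. The block below is a sketch of that standard formulation only, not necessarily the notation or exact variant used in the paper; the symbols (weight vector w, noise covariance matrix Phi_NN, steering vector v, frequency index f) are assumed here for illustration.

```latex
% Textbook MVDR: minimize output noise power subject to a
% distortionless constraint on the target direction.
% Assumed symbols: \mathbf{w}(f) filter weights, \boldsymbol{\Phi}_{NN}(f)
% noise covariance matrix, \mathbf{v}(f) target steering vector.
\begin{align}
  \mathbf{w}_{\mathrm{MVDR}}(f)
    &= \arg\min_{\mathbf{w}} \;
       \mathbf{w}^{\mathsf{H}} \boldsymbol{\Phi}_{NN}(f)\, \mathbf{w}
       \quad \text{s.t.} \quad
       \mathbf{w}^{\mathsf{H}} \mathbf{v}(f) = 1, \\
  \mathbf{w}_{\mathrm{MVDR}}(f)
    &= \frac{\boldsymbol{\Phi}_{NN}^{-1}(f)\, \mathbf{v}(f)}
            {\mathbf{v}^{\mathsf{H}}(f)\, \boldsymbol{\Phi}_{NN}^{-1}(f)\, \mathbf{v}(f)}.
\end{align}
```

The explicit inverse of the noise covariance matrix in the closed-form solution is the step the abstract describes as sometimes numerically unstable during joint training.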
Similar resources
Tight frame approximation for multi-frames and super-frames
In this thesis, a generator for a multi-frame or super-frame generated under the action of a projective unitary representation of discrete countable groups is studied. Examples of such frames are Gabor multi-frames, Gabor super-frames, and frames for shift-invariant subspaces. We show that there exists a unique normalized tight multi-frame (super-frame) generator that has minimum distance from it. Similar problems for dual frames are also considered, and some ...
Two-stage multi-target joint learning for monaural speech separation
Recently, supervised speech separation has been extensively studied and shown considerable promise. Due to the temporal continuity of speech, speech auditory features and separation targets present prominent spectro-temporal structures and strong correlations over the time-frequency (T-F) domain, which can be exploited for speech separation. However, many supervised speech separation methods in...
Multi-Target Ensemble Learning for Monaural Speech Separation
Speech separation can be formulated as a supervised learning problem where a machine is trained to cast the acoustic features of the noisy speech to a time-frequency mask, or the spectrum of the clean speech. These two categories of speech separation methods can be generally referred to as the masking-based and the mapping-based methods, but none of them can perfectly estimate the clean speech, si...
A Multi-Stage, Multi-Channel Processing System for Overlapping Speech Separation in a Real Scenario
This paper addresses the problem of overlapping speech separation in a noisy room using a microphone array. The presented approach proposes a multistage processing framework to separate the desired sources and reduce the corruptive effects of noise, reverberation and interference. More specifically, 1) a beamformer separates the sources based on their location diversities, 2) a postfilter maxim...
Multi-channel speech separation with soft time-frequency masking
This paper addresses the problem of separating concurrent speech through a spatial filtering stage and a subsequent time-frequency masking stage. These stages complement each other by first exploiting the spatial diversity and then making use of the fact that different speech signals rarely occupy the same frequency bins at a time. The novelty of the paper consists in the use of auditorymotivat...
Journal
Journal title: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Year: 2021
ISSN: 2329-9304, 2329-9290
DOI: https://doi.org/10.1109/taslp.2021.3129335