Towards single-channel unsupervised source separation of speech mixtures: the layered harmonics/formants separation-tracking model
Abstract
Speaker models for blind source separation are typically based on HMMs consisting of vast numbers of states to capture source spectral variation, and are trained on large amounts of isolated speech. Since observations can be similar between sources, inference relies on sequential constraints from the state transition matrix, which are, however, quite weak. To avoid these problems, we propose a strategy of capturing local deformations of the time-frequency energy distribution. Since consecutive spectral frames are highly correlated, each frame can be accurately described as a nonuniform deformation of its predecessor. A smooth pattern of deformations is indicative of a single speaker, while cliffs in the deformation fields may indicate a speaker switch. Further, the log-spectrum of speech can be decomposed into two additive layers, separately describing the harmonics and the formant structure. We model smooth deformations as hidden transformation variables in both layers, using MRFs whose observations are overlapping subwindows of the log-spectrum, assumed to be a noisy sum of the two layers. Loopy belief propagation provides efficient inference. Without any pre-trained speech or speaker models, this approach can be used to fill in missing time-frequency observations, and the local entropy of the deformation fields indicates source boundaries for separation.
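As a rough, non-authoritative illustration of the deformation-field idea, the sketch below matches small overlapping frequency subwindows of each log-spectrum frame against shifted versions of its predecessor and measures the entropy of the resulting shift distribution. Exhaustive local matching with a softmax stands in for the paper's two-layer MRF and loopy belief propagation, and all names and parameters (`deformation_entropy`, the STFT sizes, `max_shift`) are assumptions of this sketch, not the authors' implementation.

```python
# A minimal sketch of the deformation-field idea (not the paper's full
# MRF / loopy-BP model): each log-spectrum frame is matched against its
# predecessor by locally shifting small frequency subwindows, and the
# entropy of the resulting shift "posterior" is a cue for source boundaries.
import numpy as np
from scipy.signal import stft

def deformation_entropy(x, fs, win=64, hop=16, max_shift=4):
    """Per-frame mean entropy of local frequency-shift distributions."""
    _, _, Z = stft(x, fs=fs, nperseg=512, noverlap=384)
    S = np.log(np.abs(Z) + 1e-8)            # log-magnitude spectrogram (F x T)
    F, T = S.shape
    starts = range(0, F - win, hop)          # overlapping subwindows in frequency
    ent = np.zeros(T)
    for t in range(1, T):
        h = []
        for f0 in starts:
            ref = S[f0:f0 + win, t]
            # score each candidate shift of the previous frame's subwindow
            scores = np.array([
                -np.sum((ref - S[f0 + d:f0 + d + win, t - 1]) ** 2)
                for d in range(-max_shift, max_shift + 1)
                if 0 <= f0 + d and f0 + d + win <= F
            ])
            p = np.exp(scores - scores.max())
            p /= p.sum()                     # soft "posterior" over shifts
            h.append(-np.sum(p * np.log(p + 1e-12)))
        ent[t] = np.mean(h)                  # smooth fields -> low entropy
    return ent
```

Frames where this mean entropy spikes correspond to "cliffs" in the deformation field and are candidate source boundaries; smooth, low-entropy stretches suggest a single speaker.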
Similar resources
Reconstructing individual monophonic instruments from musical mixtures using scene completion
Monaural sound source separation is the process of separating sound sources from a single-channel mixture. In mixtures of pitched musical instruments, the problem of overlapping harmonics poses a significant challenge to source separation and reconstruction. One standard method to resolve overlapped harmonics is based on the assumption that harmonics of the same source have correlated amplitude...
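As a minimal sketch of the correlated-amplitude assumption mentioned above (not the cited paper's scene-completion method), the snippet below greedily groups harmonic amplitude envelopes by pairwise correlation; the function name and the 0.9 threshold are illustrative assumptions.

```python
# Group harmonics whose amplitude envelopes rise and fall together,
# treating each group as belonging to one source.
import numpy as np

def group_by_envelope_correlation(envelopes, thresh=0.9):
    """Greedily cluster harmonic amplitude envelopes (H x T) by correlation."""
    H = len(envelopes)
    C = np.corrcoef(envelopes)               # pairwise envelope correlations
    groups, assigned = [], set()
    for h in range(H):
        if h in assigned:
            continue
        group = [h] + [k for k in range(h + 1, H)
                       if k not in assigned and C[h, k] > thresh]
        assigned.update(group)
        groups.append(group)
    return groups
```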
Denoising through source separation and minimum tracking
In this paper, we develop a multi-channel noise reduction algorithm based on blind source separation (BSS). In contrast to general BSS algorithms that attempt to recover all the signals, we explicitly estimate only the speech signal. By tracking the minimum of the spectral density of the microphone signals, noise-only segments are identified. The coefficients of the unmixing matrix that are nec...
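A minimal sketch of minimum tracking in the spirit of minimum statistics, not the cited paper's exact algorithm: a running minimum of the recursively smoothed power spectrum serves as a noise-floor estimate, and frames whose bins sit near that floor are flagged as noise-only. The smoothing constant, window length, and thresholds are assumptions of this sketch.

```python
# Flag noise-only frames by tracking the minimum of the smoothed
# spectral density in each frequency bin.
import numpy as np

def noise_only_frames(P, win=100, alpha=0.9, margin=2.0):
    """P: power spectrogram (F x T). Returns a boolean mask over frames."""
    F, T = P.shape
    smooth = np.empty_like(P)
    smooth[:, 0] = P[:, 0]
    for t in range(1, T):                     # recursive smoothing per bin
        smooth[:, t] = alpha * smooth[:, t - 1] + (1 - alpha) * P[:, t]
    floor = np.empty_like(P)
    for t in range(T):                        # sliding-window minimum
        floor[:, t] = smooth[:, max(0, t - win + 1):t + 1].min(axis=1)
    # a frame is "noise-only" if most bins sit near the tracked minimum
    near_floor = smooth < margin * floor
    return near_floor.mean(axis=0) > 0.8
```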
Self-adaption in single-channel source separation
Single-channel source separation (SCSS) usually uses pre-trained source-specific models to separate the sources. These models capture the characteristics of each source and perform well when they match the test conditions. In this paper, we extend the applicability of SCSS. We develop an EM-like iterative adaptation algorithm that is capable of adapting the pre-trained models to the changed char...
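As a generic, hedged illustration of EM-style model adaptation (the abstract above is truncated, so this is not the cited paper's actual algorithm): a few EM iterations re-estimate the means of a pre-trained diagonal-covariance GMM on test data while keeping variances and weights fixed, nudging the model toward changed conditions. All shapes and the iteration count are assumptions.

```python
# Adapt only the means of a pre-trained diagonal GMM to new data.
import numpy as np

def adapt_gmm_means(X, means, variances, weights, n_iter=5):
    """X: (N, D) frames; means/variances: (K, D); weights: (K,)."""
    for _ in range(n_iter):
        # E-step: responsibilities under diagonal Gaussians
        logp = -0.5 * (((X[:, None, :] - means) ** 2) / variances
                       + np.log(2 * np.pi * variances)).sum(-1)
        logp += np.log(weights)
        logp -= logp.max(axis=1, keepdims=True)
        R = np.exp(logp)
        R /= R.sum(axis=1, keepdims=True)     # (N, K)
        # M-step: update only the means, keeping variances/weights fixed
        means = (R.T @ X) / R.sum(axis=0)[:, None]
    return means
```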
Multi-channel Source Separation by Beamforming Trained with Factorial HMMs
Speaker separation has conventionally been treated as a problem of Blind Source Separation (BSS). This approach does not utilize any knowledge of the statistical characteristics of the signals to be separated, relying mainly on the independence between the various signals to separate them. Maximum-likelihood techniques, on the other hand, utilize knowledge of the a priori probability distributi...
Source-Filter-Based Single-Channel Speech Separation Using Pitch Information
In this paper, we investigate the source–filter-based approach for single-channel speech separation. We incorporate source-driven aspects into the model-driven method via multi-pitch estimation. For multi-pitch estimation, the factorial HMM is utilized. For modeling the vocal tract filters, either vector quantization (VQ) or non-negative matrix factorization (NMF) is considered. For both methods, the fi...
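Since NMF is one of the two vocal-tract modeling options named above, here is a minimal NMF sketch using the classic multiplicative updates for the Euclidean objective; the rank, iteration count, and seeding are illustrative assumptions, and this is not the cited paper's full source-filter system.

```python
# Factor a nonnegative spectrogram into spectral templates and activations.
import numpy as np

def nmf(V, rank=20, n_iter=200, eps=1e-9):
    """Factor V (F x T) as W (F x rank) @ H (rank x T), all nonnegative."""
    rng = np.random.default_rng(0)
    F, T = V.shape
    W = rng.random((F, rank)) + eps
    H = rng.random((rank, T)) + eps
    for _ in range(n_iter):                   # Euclidean-distance updates
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ (H @ H.T) + eps)
    return W, H
```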
Publication date: 2004