Speech Reconstruction from Binary Masked Spectrograms Using Vector Quantized Speaker Models

Authors

  • Michael K. Jensen
  • Søren Skou Nielsen
Abstract

Several source separation techniques apply binary masks to spectrograms to separate two or more speakers from one another. This thesis investigates how to obtain the best-quality signal when reconstructing speech from masked spectrograms using vector quantized speaker models, and examines the advantages and disadvantages of this approach. It also investigates signal reestimation from a spectrogram using several algorithms. Vector quantization of speakers can improve on binary masked spectrograms, but the approach is not shown to produce high-quality speech. The thesis further concludes that phase information is very important for high-quality speech reconstruction, and suggests parameters for optimal phase reestimation.
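
As a rough illustration of the two ideas named in the abstract, the sketch below builds a binary mask from two magnitude spectrogram frames and then fills the masked-out bins from the nearest entry in a small codebook. It is not the thesis's actual procedure: the function names `ideal_binary_mask` and `vq_fill`, the toy codebook, and the choice of matching only on the bins the mask kept are all assumptions made for this example, and phase is ignored entirely.

```python
import numpy as np

def ideal_binary_mask(mag_target, mag_interferer):
    """Return 1.0 where the target dominates a time-frequency bin, else 0.0."""
    return (mag_target > mag_interferer).astype(float)

def vq_fill(masked_frame, mask_frame, codebook):
    """Fill masked-out bins from the nearest codeword (hypothetical helper).

    The distance is computed only over bins the mask kept, since the
    zeroed bins carry no information about the target speaker.
    """
    dists = np.sum(mask_frame * (codebook - masked_frame) ** 2, axis=1)
    best = codebook[np.argmin(dists)]
    # Keep observed bins as they are; borrow the rest from the codeword.
    return np.where(mask_frame == 1, masked_frame, best)

# Toy data: one frame with 4 frequency bins, magnitudes only.
codebook = np.array([[1.0, 2.0, 3.0, 4.0],
                     [4.0, 3.0, 2.0, 1.0]])   # assumed speaker model
target = np.array([1.0, 2.0, 3.0, 4.0])
interferer = np.array([0.5, 5.0, 0.5, 5.0])

mask = ideal_binary_mask(target, interferer)
masked = mask * target
restored = vq_fill(masked, mask, codebook)
```

In this toy case the interferer dominates the second and fourth bins, so those are zeroed by the mask and then recovered from the first codeword. A real system would train the codebook on speech from the speaker and would still need to estimate phase before resynthesis.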

Similar resources

Evaluation of EMD-Based Speaker Recognition Using ISCSLP2006 Chinese Speaker Recognition Evaluation Corpus

In this paper, we present the evaluation results of our proposed text-independent speaker recognition method based on the Earth Mover’s Distance (EMD), using the ISCSLP2006 Chinese speaker recognition evaluation corpus developed by the Chinese Corpus Consortium (CCC). The EMD-based speaker recognition (EMD-SR) was originally designed to apply to a distributed speaker identification system, in which ...

Maximum Likelihood and Maximum a Posteriori Adaptation for Distributed Speaker Recognition Systems

We apply the ETSI’s DSR standard to speaker verification over telephone networks and investigate the effect of extracting spectral features from different stages of the ETSI’s front-end on speaker verification performance. We also evaluate two approaches to creating speaker models, namely maximum likelihood (ML) and maximum a posteriori (MAP), in the context of distributed speaker verification....

Speaker Identification using Spectrograms of Varying Frame Sizes

In this paper, a text-dependent speaker recognition algorithm based on spectrograms is proposed. The spectrograms have been generated using the Discrete Fourier Transform for varying frame sizes with 25% and 50% overlap between speech frames. Feature vector extraction has been done by using the row mean vector of the spectrograms. For feature matching, two distance measures, namely Euclidean distanc...

A Comparative Study of Gender and Age Classification in Speech Signals

Accurate gender classification is useful in speech and speaker recognition as well as speech emotion classification, because a better performance has been reported when separate acoustic models are employed for males and females. Gender classification is also apparent in face recognition, video summarization, human-robot interaction, etc. Although gender classification is rather mature in a...

Speaker Identification Method Using Earth Mover's Distance for CCC Speaker Recognition Evaluation 2006

In this paper, we present a non-parametric speaker identification method using Earth Mover’s Distance (EMD) designed for text-independent speaker identification and its evaluation results for CCC Speaker Recognition Evaluation 2006, organized by the Chinese Corpus Consortium (CCC) for the th International Symposium on Chinese Spoken Language Processing (ISCSLP 2006). EMD-based speaker identifica...

Publication date: 2006