gmm model

Efficient Gaussian mixture model evaluation in voice conversion

2006

Jilei Tian Jani Nurminen Victor Popa

Voice conversion refers to the adaptation of the characteristics of a source speaker's voice to those of a target speaker. Gaussian mixture models (GMMs) have been found to be efficient in the voice conversion task. The GMM parameters are estimated from a training set with the goal of minimizing the mean squared error (MSE) between the transformed and target vectors. Obviously, the quality of the ...

Deep bottleneck network based i-vector representation for language identification

2015

Yan Song Xinhai Hong Bing Jiang Ruilian Cui Ian Vince McLoughlin Li-Rong Dai

This paper presents a unified i-vector framework for language identification (LID) based on deep bottleneck networks (DBN) trained for automatic speech recognition (ASR). The framework covers both front-end feature extraction and back-end modeling stages. The outputs from different layers of a DBN are exploited to improve the effectiveness of the i-vector representation through incorporating a mi...

Estimating Mutual Information Using Gaussian Mixture Model for Feature Ranking and Selection [IJCNN2046]

2006

Tian Lan Deniz Erdogmus Umut Ozertem Yonghong Huang

Feature selection is a critical step for pattern recognition and many other applications. Typically, feature selection strategies can be categorized into wrapper and filter approaches. The filter approach has attracted much attention because of its flexibility and computational efficiency. Previously, we developed an ICA-MI framework for feature selection, in which the Mutual Information (MI) ...

Four-layer categorization scheme of fast GMM computation techniques in large vocabulary continuous speech recognition systems

2004

Arthur Chan Mosur Ravishankar Alexander I. Rudnicky Jahanzeb Sherwani

Large vocabulary continuous speech recognition systems are known to be computationally intensive. A major bottleneck is the Gaussian mixture model (GMM) computation, and various techniques have been proposed to address this problem. We present a systematic study of fast GMM computation techniques. As there are a large number of these and it is impractical to exhaustively evaluate all of them, we...

Formant frequency prediction from MFCC vectors in noisy environments

2005

Jonathan Darch Ben P. Milner Saeed Vaseghi

This paper proposes a method of predicting the formant frequencies of a frame of speech from its mel-frequency cepstral coefficient (MFCC) representation. Prediction is achieved through the creation of a Gaussian mixture model (GMM) which models the joint density of formant frequencies and MFCCs. Using this GMM and an input MFCC vector, a maximum a posteriori (MAP) prediction of the formant fre...
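
As a rough illustration of predicting one quantity from another with a joint-density GMM, the sketch below uses one common reading of MAP prediction: pick the mixture component with the highest posterior given the input, then return that component's conditional mean. It is a minimal sketch, not the authors' method. It assumes a scalar input feature and a scalar output, and the names `map_predict` and `components` as well as all parameter values are hypothetical.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def map_predict(x, components):
    """Predict y from x using a joint GMM over (x, y).

    Each component is a dict with weight w, means mx and my,
    input variance vxx, and cross-covariance vxy (all hypothetical values).
    """
    # The posterior over components given x is proportional to w_k * p(x | k).
    scores = [c["w"] * gaussian_pdf(x, c["mx"], c["vxx"]) for c in components]
    best = max(range(len(components)), key=lambda k: scores[k])
    c = components[best]
    # Conditional mean of y given x within the selected component.
    return c["my"] + c["vxy"] / c["vxx"] * (x - c["mx"])

# Two hand-set components standing in for a trained joint model.
components = [
    {"w": 0.5, "mx": 0.0, "my": 500.0, "vxx": 1.0, "vxy": 5.0},
    {"w": 0.5, "mx": 4.0, "my": 1500.0, "vxx": 1.0, "vxy": -5.0},
]
print(map_predict(0.2, components))
print(map_predict(4.2, components))
```

In practice the input would be a full feature vector, so the variance ratio becomes a matrix product with the inverse input covariance, and the parameters would come from EM training rather than being set by hand.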

Acoustic-to-articulatory inversion mapping with Gaussian mixture model

2004

Tomoki Toda Alan W. Black Keiichi Tokuda

This paper describes acoustic-to-articulatory inversion mapping using a Gaussian Mixture Model (GMM). The correspondence between an acoustic parameter and an articulatory parameter is modeled by the GMM trained using parallel acoustic-articulatory data. We measure the performance of the GMM-based mapping and investigate the effectiveness of using multiple acoustic frames as an input feature and us...

Two-class signal segmentation for speech/music detection in audio tracks

1999

Mouhamadou Seck Frédéric Bimbot Didier Zugaj Bernard Delyon

We present a technique for the segmentation of a sound track into two classes of segments. Each frame of signal is preprocessed by extracting cepstral coefficients and their first order derivatives. For each class, the distribution of the frame parameter vectors is modeled by a Gaussian Mixture Model (GMM). GMM order is selected using two criteria: the Minimum Description Length (MDL) criterion ...
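
The idea of modeling each class with its own GMM can be sketched as a likelihood comparison: score a run of frames under both class models and keep the label with the higher total log-likelihood. This is only an illustrative sketch under simplifying assumptions, not the segmentation procedure of the paper. It uses scalar features instead of cepstral vectors, hand-set model parameters instead of trained ones, and hypothetical names (`gmm_log_likelihood`, `classify_segment`, `speech_gmm`, `music_gmm`). Model order selection is not shown.

```python
import math

def gmm_log_likelihood(x, gmm):
    """Log-density of scalar x under a 1-D GMM given as (weight, mean, variance) triples."""
    terms = [
        math.log(w) - 0.5 * math.log(2.0 * math.pi * var) - 0.5 * (x - mean) ** 2 / var
        for w, mean, var in gmm
    ]
    top = max(terms)
    # Log-sum-exp keeps the sum stable when individual terms are very small.
    return top + math.log(sum(math.exp(t - top) for t in terms))

def classify_segment(frames, speech_gmm, music_gmm):
    """Label a run of frames by comparing summed log-likelihoods under each class model."""
    speech_score = sum(gmm_log_likelihood(x, speech_gmm) for x in frames)
    music_score = sum(gmm_log_likelihood(x, music_gmm) for x in frames)
    return "speech" if speech_score > music_score else "music"

# Hypothetical class models: (weight, mean, variance) per component.
speech_gmm = [(0.6, -1.0, 0.5), (0.4, 0.5, 0.8)]
music_gmm = [(0.5, 3.0, 1.0), (0.5, 5.0, 1.5)]
print(classify_segment([-0.8, 0.1, -1.2], speech_gmm, music_gmm))  # speech
print(classify_segment([3.5, 4.8, 2.9], speech_gmm, music_gmm))    # music
```

Summing log-likelihoods over several frames treats the frames as independent, which is a simplification but makes the decision far less noisy than labeling each frame on its own.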

Natural-emotion GMM transformation algorithm for emotional speaker recognition

2007

Zhenyu Shan Yingchun Yang Ruizhi Ye

One of the largest challenges in speaker recognition is dealing with the speaker-emotion variability problem. Nowadays, compensation techniques are the main solutions to this problem. These methods require eliciting all kinds of emotional speech from each speaker, which is not user-friendly in applications. Therefore the basic problem is how to get the distribution of speakers’ emotion speech and ho...

Supervised/Unsupervised Voice Activity Detectors for Text-dependent Speaker Recognition on the RSR2015 Corpus

2014

Md Jahangir Alam Patrick Kenny Pierre Ouellet Themos Stafylakis Pierre Dumouchel

Voice activity detection, i.e., discrimination of the speech/nonspeech segments in a speech signal, is an important enabling technology for a variety of speech-based applications including speaker recognition. In this work we provide a performance evaluation of the following supervised and unsupervised VAD algorithms in the context of text-dependent speaker recognition on the RSR2015 (Robus...

A Transformed System GMM Estimator for Dynamic Panel Data Models

2014

Xiaojin Sun Richard A. Ashley Suqin Ge Kazuhiko Hayakawa Kwok Ping Tsang

The system GMM estimator developed by Blundell and Bond (1998) for dynamic panel data models has been widely used in empirical work; however, it does not perform well with weak instruments. This paper proposes a variation on the system GMM estimator, based on a simple transformation of the dependent variable. Simulation results indicate that, in finite samples, this transformed system GMM estim...
