speaker transformation

On-line incremental speaker adaptation with automatic speaker change detection

2000

Zhipeng Zhang Sadaoki Furui Katsutoshi Ohtsuki

In order to improve the performance of speech recognition systems when speakers change frequently and each of them utters a series of several sentences, a new unsupervised, online and incremental speaker adaptation technique combined with automatic detection of speaker changes is proposed. The speaker change is detected by comparing likelihoods using speaker-independent and speaker-adaptive GMM...

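The abstract says only that speaker changes are detected by comparing likelihoods under speaker-independent (SI) and speaker-adaptive (SA) GMMs. The exact decision rule is cut off. The sketch below assumes diagonal-covariance GMMs and a hypothetical rule: a change is flagged when the SA model's average per-frame log-likelihood no longer exceeds the SI model's by more than a threshold. The function names and the `threshold` parameter are illustrative and do not come from the paper.

```python
import math
from typing import List, Sequence, Tuple

# A diagonal-covariance GMM component: (weight, means, variances).
Component = Tuple[float, Sequence[float], Sequence[float]]


def frame_log_likelihood(gmm: List[Component], x: Sequence[float]) -> float:
    """log p(x | GMM) for one feature frame, combined with log-sum-exp."""
    log_terms = []
    for weight, means, variances in gmm:
        ll = math.log(weight)
        for xi, mu, var in zip(x, means, variances):
            ll -= 0.5 * (math.log(2.0 * math.pi * var) + (xi - mu) ** 2 / var)
        log_terms.append(ll)
    peak = max(log_terms)
    return peak + math.log(sum(math.exp(t - peak) for t in log_terms))


def average_log_likelihood(gmm: List[Component], frames) -> float:
    """Mean per-frame log-likelihood of an utterance."""
    return sum(frame_log_likelihood(gmm, f) for f in frames) / len(frames)


def speaker_changed(si_gmm, sa_gmm, frames, threshold: float = 0.0) -> bool:
    """Hypothetical rule (assumption): report a speaker change when the
    speaker-adaptive GMM no longer beats the speaker-independent GMM by
    more than `threshold` in average log-likelihood per frame."""
    gain = average_log_likelihood(sa_gmm, frames) - average_log_likelihood(si_gmm, frames)
    return gain <= threshold
```

With a broad SI model and an SA model adapted toward the current speaker, frames from the same speaker keep the SA model ahead. Frames from a different speaker make it fall behind, which triggers a change and would, in an incremental scheme, reset adaptation.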

Speaker Normalization with All-pass Transforms (Center for Language and Speech Processing, 72)

1998

John W. McDonough Alan V. Oppenheim Philip E. Gill Walter Murray John R. Deller John G. Proakis

Speaker normalization is a process in which the short-time features of speech from a given speaker are transformed so as to better match some speaker independent model. Vocal tract length normalization (VTLN) is a popular speaker normalization scheme wherein the frequency axis of the short-time spectrum associated with a particular speaker’s speech is rescaled or warped prior to the extraction ...

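VTLN rescales, or warps, the frequency axis before features are extracted. The title refers to all-pass transforms. One well-known instance is the warping induced by the first-order all-pass (bilinear) filter, where a single parameter alpha acts as a speaker-specific warp factor. It is shown here only as an illustration, not as the thesis's specific formulation. The function name and the restriction to a first-order filter are assumptions.

```python
import math


def allpass_warp(omega: float, alpha: float) -> float:
    """Map a normalized frequency omega in [0, pi] through the phase response
    of the first-order all-pass H(z) = (z**-1 - alpha) / (1 - alpha * z**-1).

    For |alpha| < 1 the mapping is monotonic with fixed endpoints
    0 -> 0 and pi -> pi; alpha = 0 gives the identity (no warping)."""
    if not -1.0 < alpha < 1.0:
        raise ValueError("alpha must satisfy |alpha| < 1")
    # 1 - alpha*cos(omega) > 0 for |alpha| < 1, so atan2 equals atan(y / x) here.
    return omega + 2.0 * math.atan2(alpha * math.sin(omega),
                                    1.0 - alpha * math.cos(omega))
```

Positive alpha pushes interior frequencies upward and negative alpha pulls them downward, while both band edges stay fixed. That behavior is what makes a one-parameter family convenient for per-speaker normalization.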

Probabilistic feature-based transformation for speaker verification over telephone networks

Journal: Neurocomputing, 2007

Man-Wai Mak Kwok-Kwong Yiu Sun-Yuan Kung

Feature transformation aims to reduce the effects of channel and handset distortion in telephone-based speaker verification. This paper compares several feature transformation techniques and evaluates their verification performance and computation time under the 2000 NIST speaker recognition evaluation protocol. Techniques compared include feature mapping (FM), stochastic feature transformation ...

Online Bayesian tree-structured transformation of HMMs with optimal model selection for speaker adaptation

Journal: IEEE Trans. Speech and Audio Processing, 2001

Shaojun Wang Yunxin Zhao

This paper presents a new recursive Bayesian learning approach for transformation parameter estimation in speaker adaptation. Our goal is to incrementally transform or adapt a set of hidden Markov model (HMM) parameters for a new speaker and gain large performance improvement from a small amount of adaptation data. By constructing a clustering tree of HMM Gaussian mixture components, the linear...

Average-Voice-Based Speech Synthesis

2006

Junichi Yamagishi

This thesis describes a novel speech synthesis framework, "Average-Voice-based Speech Synthesis." Using this framework, synthetic speech of arbitrary target speakers can be obtained robustly and steadily even when very few speech samples are available for the target speaker. The framework consists of a speaker normalization algorithm for the parameter cluster...

Robustness in ASR: An Experimental Study of the Interrelationship between Discriminant Feature-Space Transformation, Speaker Normalization and Environment Compensation

2007

Alireza Keyvani

This thesis addresses the general problem of maintaining robust automatic speech recognition (ASR) performance under diverse speaker populations, channel conditions, and acoustic environments. To this end, the thesis analyzes the interactions between environment compensation techniques, frequency warping based speaker normalization, and discriminant feature-space transformation (DFT). These int...

Semi-Supervised Speaker Adaptation for In-Vehicle Speech Recognition with Deep Neural Networks

2016

Wonkyum Lee Kyu J. Han Ian R. Lane

In this paper, we present a new i-vector based speaker adaptation method for automatic speech recognition with deep neural networks, focusing on in-vehicle scenarios. Rather than appending i-vectors to acoustic feature vectors to form concatenated input vectors for adapting neural network acoustic model parameters, our proposed method performs feature-space transformation with smaller ...

Speaker adaptation in transformation space using two-dimensional PCA

2010

Yongwon Jeong Young Rok Song Hyung Soon Kim

This paper describes a principled application of two-dimensional principal component analysis (2DPCA) to the decomposition of transformation matrices of maximum likelihood linear regression (MLLR) and its application to speaker adaptation using the bases derived from the analysis. Our previous work applied 2DPCA to speaker-dependent (SD) models to obtain the bases for state space. In this work, ...

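2DPCA analyzes each matrix as a whole instead of flattening it into a long vector. The sketch below follows the standard 2DPCA recipe: average the matrices, accumulate the n x n scatter G = (1/M) sum_j (A_j - mean)^T (A_j - mean), and take its leading eigenvector as a projection axis. Toy 2 x 2 matrices stand in for per-speaker MLLR transforms. Several choices are assumptions: the column-side scatter, power iteration as the eigensolver, and all names. The abstract is truncated before it explains how the paper builds adaptation bases from this decomposition, so that step is not reproduced.

```python
import math
from typing import List

Matrix = List[List[float]]


def mean_matrix(mats: List[Matrix]) -> Matrix:
    """Element-wise mean of equally sized matrices."""
    rows, cols = len(mats[0]), len(mats[0][0])
    return [[sum(m[i][j] for m in mats) / len(mats) for j in range(cols)]
            for i in range(rows)]


def image_covariance(mats: List[Matrix]) -> Matrix:
    """2DPCA scatter G = (1/M) * sum_j (A_j - mean)^T (A_j - mean), size n x n."""
    mean = mean_matrix(mats)
    rows, cols = len(mean), len(mean[0])
    g = [[0.0] * cols for _ in range(cols)]
    for m in mats:
        d = [[m[i][j] - mean[i][j] for j in range(cols)] for i in range(rows)]
        for p in range(cols):
            for q in range(cols):
                g[p][q] += sum(d[i][p] * d[i][q] for i in range(rows)) / len(mats)
    return g


def leading_axis(g: Matrix, iters: int = 200) -> List[float]:
    """Top eigenvector of a symmetric PSD matrix via power iteration.
    (Starts from all-ones; a start orthogonal to the top eigenvector would fail.)"""
    v = [1.0] * len(g)
    for _ in range(iters):
        w = [sum(g[p][q] * v[q] for q in range(len(g))) for p in range(len(g))]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def project(a: Matrix, axis: List[float]) -> List[float]:
    """2DPCA feature vector y = A x for a single projection axis x."""
    return [sum(row[q] * axis[q] for q in range(len(axis))) for row in a]
```

In this toy data all variation lives in the first column, so the leading axis selects that column and each matrix is summarized by one short vector.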

Learning Factorized Transforms for Unsupervised Adaptation of LSTM-RNN Acoustic Models

2017

Lahiru Samarakoon Brian Kan-Wing Mak Khe Chai Sim

Factorized Hidden Layer (FHL) adaptation has been proposed for speaker adaptation of deep neural network (DNN) based acoustic models. In FHL adaptation, a speaker-dependent (SD) transformation matrix and an SD bias are included in addition to the standard affine transformation. The SD transformation is a linear combination of rank-1 matrices whereas the SD bias is a linear combination of vector...

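The FHL description above can be written as a speaker-dependent affine map: W_s = W + sum_k d_k u_k v_k^T (a linear combination of rank-1 matrices) and b_s = b + sum_k d_k c_k (a linear combination of bias vectors). The sketch assumes that one speaker vector d weights both the rank-1 matrices and the bias vectors, and that the nonlinearity is applied by the caller. These choices and all names are illustrative. The paper applies this inside LSTM-RNN acoustic models, which this toy single-layer example does not model.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def outer(u: Sequence[float], v: Sequence[float]) -> Matrix:
    """Rank-1 matrix u v^T."""
    return [[ui * vj for vj in v] for ui in u]


def fhl_affine(x, weight, bias, speaker_vec, left_bases, right_bases, bias_bases):
    """Speaker-dependent affine output z = W_s x + b_s with
    W_s = W + sum_k d_k * u_k v_k^T and b_s = b + sum_k d_k * c_k.
    The inputs `weight` and `bias` are not modified."""
    rows, cols = len(weight), len(weight[0])
    w_s = [row[:] for row in weight]
    b_s = list(bias)
    for d, u, v, c in zip(speaker_vec, left_bases, right_bases, bias_bases):
        rank1 = outer(u, v)
        for i in range(rows):
            b_s[i] += d * c[i]
            for j in range(cols):
                w_s[i][j] += d * rank1[i][j]
    return [sum(w_s[i][j] * x[j] for j in range(cols)) + b_s[i] for i in range(rows)]
```

Only the speaker vector d changes from one speaker to the next. The bases u_k, v_k and c_k are shared, which keeps the per-speaker parameter count small compared with estimating a full matrix for every speaker.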

On the use of orthogonal GMM in speaker recognition

1999

Li Liu Jialong He

Gaussian mixture modeling (GMM) techniques are increasingly used for both speaker identification and verification. Most of these models assume diagonal covariance matrices. Although empirically any distribution can be approximated with a diagonal GMM, a large number of mixture components is usually needed to obtain a good approximation. A consequence of using a large GMM is that its ...

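The abstract argues that a diagonal GMM needs many components to represent correlated features. One common remedy is to rotate the features with an orthogonal matrix of covariance eigenvectors, so that a diagonal model loses less. It appears here only as an illustration, because the paper's own method is cut off. The sketch uses a closed-form 2-D rotation for brevity, which is an assumption. Real cepstral features would need a full eigendecomposition, possibly one per mixture component.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def covariance_2d(points: List[Point]) -> Tuple[float, float, float]:
    """Return (var_x, cov_xy, var_y) using the 1/N (maximum-likelihood) normalization."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return sxx, sxy, syy


def decorrelate_2d(points: List[Point]) -> List[Point]:
    """Rotate points onto the eigenvectors of their covariance matrix.
    The rotation is orthogonal, so total variance is preserved while the
    cross-covariance becomes (numerically) zero."""
    sxx, sxy, syy = covariance_2d(points)
    # Principal-axis angle: tan(2*theta) = 2*sxy / (sxx - syy).
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x + s * y, -s * x + c * y) for x, y in points]
```

After the rotation, a diagonal covariance loses nothing for this single Gaussian. In the original axes, a diagonal model would have to ignore the strong x-y correlation.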