Search results for: speaker transformation

Number of results: 242055

2000
Zhipeng Zhang Sadaoki Furui Katsutoshi Ohtsuki

In order to improve the performance of speech recognition systems when speakers change frequently and each of them utters a series of several sentences, a new unsupervised, online and incremental speaker adaptation technique combined with automatic detection of speaker changes is proposed. The speaker change is detected by comparing likelihoods using speaker-independent and speaker-adaptive GMM...

1998
John W. McDonough Alan V. Oppenheim Philip E. Gill Walter Murray John R. Deller John G. Proakis

Speaker normalization is a process in which the short-time features of speech from a given speaker are transformed so as to better match some speaker independent model. Vocal tract length normalization (VTLN) is a popular speaker normalization scheme wherein the frequency axis of the short-time spectrum associated with a particular speaker’s speech is rescaled or warped prior to the extraction ...
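The frequency-axis rescaling described in this abstract can be illustrated with a minimal sketch. It assumes a piecewise-linear warp, which is one common VTLN variant and not necessarily the one used in this paper; the function name `warp_frequency` and the parameters `alpha`, `nyquist_hz`, and `cutoff_ratio` are hypothetical names chosen for illustration.

```python
def warp_frequency(freq_hz, alpha, nyquist_hz=8000.0, cutoff_ratio=0.85):
    """Piecewise-linear frequency warp (illustrative assumption, not the paper's method).

    Below a cutoff the frequency is scaled by the speaker-specific factor alpha;
    above it, a second linear segment maps the Nyquist frequency onto itself so
    the warped axis still covers the full band.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if not 0.0 <= freq_hz <= nyquist_hz:
        raise ValueError("freq_hz must lie in [0, nyquist_hz]")

    # Shrink the cutoff for alpha > 1 so the warped cutoff stays below Nyquist.
    cutoff_hz = cutoff_ratio * nyquist_hz * min(1.0, 1.0 / alpha)

    if freq_hz <= cutoff_hz:
        return alpha * freq_hz

    # Second segment: from (cutoff, alpha * cutoff) to (nyquist, nyquist).
    slope = (nyquist_hz - alpha * cutoff_hz) / (nyquist_hz - cutoff_hz)
    return alpha * cutoff_hz + slope * (freq_hz - cutoff_hz)
```

In a typical setup the warp would be applied to the filterbank center frequencies before feature extraction, with `alpha` chosen per speaker (often from a small grid around 1.0) to maximize likelihood under the speaker-independent model.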

Journal: Neurocomputing 2007
Man-Wai Mak Kwok-Kwong Yiu Sun-Yuan Kung

Feature transformation aims to reduce the effects of channel- and handset-distortion in telephone-based speaker verification. This paper compares several feature transformation techniques and evaluates their verification performance and computation time under the 2000 NIST speaker recognition evaluation protocol. Techniques compared include feature mapping (FM), stochastic feature transformation ...

Journal: IEEE Trans. Speech and Audio Processing 2001
Shaojun Wang Yunxin Zhao

This paper presents a new recursive Bayesian learning approach for transformation parameter estimation in speaker adaptation. Our goal is to incrementally transform or adapt a set of hidden Markov model (HMM) parameters for a new speaker and gain a large performance improvement from a small amount of adaptation data. By constructing a clustering tree of HMM Gaussian mixture components, the linear...

2006
Junichi Yamagishi

This thesis describes a novel speech synthesis framework, "Average-Voice-based Speech Synthesis." With this framework, synthetic speech of arbitrary target speakers can be obtained robustly and steadily even when very little speech data is available for the target speaker. The framework consists of a speaker normalization algorithm for the parameter cluster...

2007
Alireza Keyvani

This thesis addresses the general problem of maintaining robust automatic speech recognition (ASR) performance under diverse speaker populations, channel conditions, and acoustic environments. To this end, the thesis analyzes the interactions between environment compensation techniques, frequency warping based speaker normalization, and discriminant feature-space transformation (DFT). These int...

2016
Wonkyum Lee Kyu J. Han Ian R. Lane

In this paper, we present a new i-vector based speaker adaptation method for automatic speech recognition with deep neural networks, focusing on in-vehicle scenarios. Rather than appending i-vectors to acoustic feature vectors to form concatenated input vectors for adapting neural network acoustic model parameters, our proposed method performs feature-space transformation with smaller ...

2010
Yongwon Jeong Young Rok Song Hyung Soon Kim

This paper describes a principled application of two-dimensional principal component analysis (2DPCA) to the decomposition of transformation matrices of maximum likelihood linear regression (MLLR) and its application to speaker adaptation using the bases derived from the analysis. Our previous work applied 2DPCA to speaker-dependent (SD) models to obtain the bases for state space. In this work, ...

2017
Lahiru Samarakoon Brian Kan-Wing Mak Khe Chai Sim

Factorized Hidden Layer (FHL) adaptation has been proposed for speaker adaptation of deep neural network (DNN) based acoustic models. In FHL adaptation, a speaker-dependent (SD) transformation matrix and an SD bias are included in addition to the standard affine transformation. The SD transformation is a linear combination of rank-1 matrices whereas the SD bias is a linear combination of vector...
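The layer structure described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract is truncated, so the sketch assumes the SD bias is a weighted sum of basis vectors, and every name here (`fhl_preactivation`, `gammas`, `zs`, `d`, `bias_basis`, `v`) is hypothetical. It computes the pre-activation output of one layer using plain Python lists.

```python
def fhl_preactivation(x, W, b, gammas, zs, d, bias_basis, v):
    """Pre-activation of one hidden layer with speaker-dependent (SD) terms.

    Assumed form (illustrative only):
        W_s = W + sum_k d[k] * outer(gammas[k], zs[k])   # rank-1 matrices
        b_s = b + sum_k v[k] * bias_basis[k]             # basis vectors
        y   = W_s @ x + b_s
    W is a list of rows (out_dim x in_dim); d and v are per-speaker weights.
    """
    out_dim, in_dim = len(W), len(W[0])

    # Speaker-dependent weight matrix: standard weights plus rank-1 updates.
    W_s = [
        [
            W[i][j] + sum(d_k * g[i] * z[j] for d_k, g, z in zip(d, gammas, zs))
            for j in range(in_dim)
        ]
        for i in range(out_dim)
    ]

    # Speaker-dependent bias: standard bias plus weighted basis vectors.
    b_s = [
        b[i] + sum(v_k * u[i] for v_k, u in zip(v, bias_basis))
        for i in range(out_dim)
    ]

    # Standard affine transformation with the adapted parameters.
    return [sum(W_s[i][j] * x[j] for j in range(in_dim)) + b_s[i] for i in range(out_dim)]
```

Setting all entries of `d` and `v` to zero recovers the standard affine transformation, so under this assumed form only the small per-speaker weight vectors need to be estimated during adaptation.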

1999
Li Liu Jialong He

Gaussian mixture modeling (GMM) techniques are increasingly being used for both speaker identification and verification. Most of these models assume diagonal covariance matrices. Although empirically any distribution can be approximated with a diagonal GMM, a large number of mixture components are usually needed to obtain a good approximation. A consequence of using a large GMM is that its ...
