speaker transformation

Connectionist Speaker Normalization and Its Applications to Speech Recognition

2013

X. D. Huang K. F. Lee

Speaker normalization may have a significant impact on both speaker-adaptive and speaker-independent speech recognition. In this paper, a codeword-dependent neural network (CDNN) is presented for speaker normalization. The network is used as a nonlinear mapping function to transform speech data between two speakers. The mapping function is characterized by two important properties. First, the ass...

Speaker normalization through constrained MLLR based transforms

2004

Diego Giuliani Matteo Gerosa Fabio Brugnara

In this paper, a novel speaker normalization method is presented and compared to a well-known vocal tract length normalization method. With this method, acoustic observations of training and testing speakers are mapped into a normalized acoustic space through speaker-specific transformations with the aim of reducing inter-speaker acoustic variability. For each speaker, an affine transformation ...
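The mapping described in this abstract can be illustrated with a minimal sketch that applies one speaker-specific affine transformation, x' = A x + b, to every acoustic feature vector of a speaker. The abstract is truncated before it says how the transformation is estimated, so the matrix `A`, the offset `b`, the helper names, and the 2-dimensional toy frames below are all hypothetical placeholders, not the authors' implementation.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def apply_affine(A: Matrix, b: Vector, x: Vector) -> Vector:
    """Return A x + b for a single feature vector x."""
    return [
        sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + b_i
        for row, b_i in zip(A, b)
    ]


def normalize_speaker(frames: List[Vector], A: Matrix, b: Vector) -> List[Vector]:
    """Map all frames of one speaker into the normalized space."""
    return [apply_affine(A, b, x) for x in frames]


# Hypothetical speaker-specific transform and toy 2-dimensional frames.
# In practice A and b would be estimated per speaker; that step is not shown.
A = [[2.0, 0.0], [0.0, 0.5]]
b = [1.0, -1.0]
frames = [[1.0, 2.0], [0.0, 4.0]]

normalized = normalize_speaker(frames, A, b)
print(normalized)
```

The same transformation is applied to every frame of a given speaker, which is what makes it speaker-specific rather than frame-specific.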

Speaker Adaptation Using Multiple Reference Speakers

1989

Francis Kubala Richard M. Schwartz Chris Barry

We introduce a new technique for using the speech of multiple reference speakers as a basis for speaker adaptation in large vocabulary continuous speech recognition. In contrast to other methods that use a pooled reference model, this technique normalizes the training speech from multiple reference speakers to a single common feature space before pooling it. The normalized and pooled speech can...

Linear transformation approaches to many-to-one voice conversion

2010

Chie Hayashida Tomoki Toda Yamato Ohtani Hiroshi Saruwatari Kiyohiro Shikano

In this paper, we present linear transformation algorithms for many-to-one voice conversion (VC). Many-to-one VC is a technique for converting an arbitrary source speaker’s voice into the target speaker’s voice. A conversion model previously developed between many prestored source speakers and the target speaker is adapted to a new source speaker in an unsupervised manner. In this study, w...

ACOUSTIC MODEL ADAPTATION FOR AUTOMATIC SPEECH RECOGNITION AND ANIMAL VOCALIZATION CLASSIFICATION

2009

Jidong Tao

Jidong Tao, B.Eng., M.S., Marquette University, 2009. Automatic speech recognition (ASR) converts human speech to readable text. Acoustic model adaptation, also called speaker adaptation, is one of the most promising techniques in ASR for improving recognition accuracy. Adaptation works by tuning a g...

Regularized-MLLR speaker adaptation for computer-assisted language learning system

2010

Dean Luo Yu Qiao Nobuaki Minematsu Yutaka Yamauchi Keikichi Hirose

In this paper, we propose a novel speaker adaptation technique, regularized-MLLR, for Computer-Assisted Language Learning (CALL) systems. This method uses a linear combination of a group of teachers’ transformation matrices to represent each target learner’s transformation matrix, thus avoiding the over-adaptation problem, in which erroneous pronunciations come to be judged as good pronunciations afte...
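The central idea in this abstract, representing a learner's transformation matrix as a linear combination of teachers' matrices, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how the combination weights are estimated, so `weights` is taken as given, and the function name and the 2x2 toy matrices are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def combine_transforms(teacher_transforms: List[Matrix], weights: List[float]) -> Matrix:
    """Return sum_k weights[k] * teacher_transforms[k] (element-wise)."""
    if not teacher_transforms or len(teacher_transforms) != len(weights):
        raise ValueError("need one weight per teacher transform")
    rows = len(teacher_transforms[0])
    cols = len(teacher_transforms[0][0])
    combined = [[0.0] * cols for _ in range(rows)]
    for W, alpha in zip(teacher_transforms, weights):
        for i in range(rows):
            for j in range(cols):
                combined[i][j] += alpha * W[i][j]
    return combined


# Hypothetical teacher matrices and weights (assumed given, not estimated here).
teachers = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[3.0, 2.0], [0.0, 1.0]],
]
weights = [0.5, 0.5]

learner_transform = combine_transforms(teachers, weights)
print(learner_transform)
```

Because the learner's matrix is restricted to the span of the teachers' matrices, it has only as many free parameters as there are teachers, which is one way to read the abstract's claim about avoiding over-adaptation.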

Speaker Adaptation Using Multiple Reference Speakers

2006

We introduce a new technique for using the speech of multiple reference speakers as a basis for speaker adaptation in large vocabulary continuous speech recognition. In contrast to other methods that use a pooled reference model, this technique normalizes the training speech from multiple reference speakers to a single common feature space before pooling it. The normalized and pooled speech can...

Multi-Grained Modeling with Pattern Specific Maximum

2003

Upendra V. Chaudhari

We present a transformation-based, multi-grained data modeling technique in the context of text-independent speaker recognition, aimed at mitigating difficulties caused by sparse training and test data. Both identification and verification are addressed, where we view the entire population as divided into the target population and its complement, which we refer to as the background population. F...

A comparison of methods for speaker-dependent pronunciation tuning for text-to-speech synthesis

2005

Gabriel Webster Tina Burrows Kate Knill

Unit-based text-to-speech (TTS) systems typically use a set of speech recordings that have been phonetically transcribed to create a large set of phonetic units. During synthesis, pronunciations for input text are generated and used to guide the selection of a sequence of phonetic units. The style of these system pronunciations must match the style of the phonetic transcriptions of the recorded...

Structural speaker adaptation using maximum a posteriori approach and a Gaussian distributions merging technique

2003

Olivier Bellot Driss Matrouf Pascal Nocera Georges Linarès Jean-François Bonastre

The aim of speaker adaptation techniques is to enhance speaker-independent acoustic models so that their recognition accuracy comes as close as possible to that obtained with speaker-dependent models. Recently, a technique based on a hierarchical structure and the maximum a posteriori criterion was proposed (SMAP). In this paper, as in SMAP, we assume that the acoustic model parameters are o...
