Speaker Adaptation from Limited Training in the BBN BYBLOS Speech Recognition System
Abstract
The BBN BYBLOS continuous speech recognition system has been used to develop a method of speaker adaptation from limited training. The key step in the method is the estimation of a probabilistic spectral mapping between a prototype speaker, for whom there exists a well-trained speaker-dependent hidden Markov model (HMM), and a target speaker, for whom only a small amount of training speech is available. The mapping defines a set of transformation matrices which are used to modify the parameters of the prototype model. The resulting transformed model is then used as an approximation to a well-trained model for the target speaker. We review the techniques employed to accomplish this transformation and present the results of experiments conducted on the DARPA Resource Management database.
1. Introduction
Soon after a speech recognition system begins operation, small amounts of new speech data become available to the system as spoken utterances are successfully transcribed to text. This data is of potentially great value to the system because it contains detailed information on the current state of the speaker and the environment. The purpose of rapid speaker adaptation is to use such small samples of speech to improve the recognition performance of the system.
Speaker adaptation offers other benefits as well. For applications which cannot tolerate the initial training expense of high-performance speaker-dependent models, adaptation can trade off peak performance for rapid training of the system. For typical experimental systems being investigated today on a 1000-word continuous speech task domain, speaker-dependent training uses 30 minutes of speech (600 sentences), while the adaptation methods described here use only 2 minutes (40 sentences).
For applications in which an initial speaker-independent model fails to perform adequately because of a change in the environment or the task domain that is not represented in the training data, adaptation can use an economical initial model generated from the speaker-dependent training of a single prototype speaker. Again, looking at typical systems today, speaker-independent models train on 3 1/2 hours of speech (4200 sentences), while adaptation can use a speaker-dependent model trained from 30 minutes (600 sentences).
In this paper, we describe the speaker-adaptive capabilities of the BBN BYBLOS continuous speech recognition system. Our basic approach to the problem is described first in section 2. Two methods for estimating the speaker transformation are described in section 3. In section 4 we present our latest results on a standard testbed database.
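As a rough illustration of how a transformation matrix can modify the parameters of a prototype model, the sketch below assumes a discrete-density HMM whose states hold probability distributions over vector-quantization codewords. That assumption, the single matrix (the abstract speaks of a set of matrices), the count-based estimate, the uniform fallback, and all names (`normalize_rows`, `transform_pdf`, `cooccurrence`) are hypothetical and are not taken from this excerpt. Each row of the matrix is read as the probability of a target-speaker codeword given a prototype-speaker codeword.

```python
from typing import List

Matrix = List[List[float]]


def normalize_rows(counts: Matrix) -> Matrix:
    """Turn co-occurrence counts into a row-stochastic mapping matrix."""
    result = []
    for row in counts:
        total = sum(row)
        if total == 0:
            # Assumption: with no evidence for a prototype codeword,
            # fall back to a uniform distribution over target codewords.
            result.append([1.0 / len(row)] * len(row))
        else:
            result.append([c / total for c in row])
    return result


def transform_pdf(proto_pdf: List[float], mapping: Matrix) -> List[float]:
    """Map a prototype-speaker discrete pdf onto the target codebook.

    target[j] = sum over i of proto_pdf[i] * mapping[i][j]
    """
    n_target = len(mapping[0])
    return [
        sum(proto_pdf[i] * mapping[i][j] for i in range(len(proto_pdf)))
        for j in range(n_target)
    ]


# Hypothetical counts of (prototype codeword i, target codeword j) pairs,
# e.g. gathered from aligned speech of the two speakers.
cooccurrence = [
    [8.0, 2.0, 0.0],
    [1.0, 6.0, 3.0],
    [0.0, 0.0, 0.0],
]
mapping = normalize_rows(cooccurrence)

# One state's output distribution in the prototype model.
proto_pdf = [0.5, 0.5, 0.0]
target_pdf = transform_pdf(proto_pdf, mapping)
print(target_pdf)
```

Because every row of the mapping sums to one, the transformed distribution also sums to one, so it can stand in directly for the prototype distribution in the adapted model.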
Related resources
Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation
A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...
BYBLOS Speech Recognition Benchmark Results
This paper presents speech recognition test results from the BBN BYBLOS system on the Feb 91 DARPA benchmarks in both the Resource Management (RM) and the Air Travel Information System (ATIS) domains. In the RM test, we report on speaker-independent (SI) recognition performance for the standard training condition using 109 speakers and for our recently proposed SI model made from only 12 traini...
The BBN BYBLOS Continuous Speech Recognition System
In this paper we describe the algorithms used in the BBN BYBLOS Continuous Speech Recognition system. The BYBLOS system uses context-dependent hidden Markov models of phonemes to provide a robust model of phonetic coarticulation. We provide an update of the ongoing research aimed at improving the recognition accuracy. In the first experiment we confirm the large improvement in accuracy that can...
The 2000 BBN Byblos LVCSR system
This paper describes the 2000 BBN Byblos Large Vocabulary Continuous Speech Recognition (LVCSR) system. We briefly outline the training and decoding procedures used in the system, and explain in detail the new features we have added to the system in the past year. These new features include multiple adaptation stages, parallel path rescoring, and a new word confidence system. Word error rate re...
Publication date: 1989