SRI November 1993 CSR Spoke Evaluation
Authors: Mitchel Weintraub, Leonardo Neumeyer and Vassilios Digalakis
SRI International, Speech Technology and Research Laboratory, Menlo Park, CA 94025

Abstract

In this paper we present SRI’s results on the 1993 ARPA CSR Spoke Evaluations. This evaluation used the same HMM acoustic models as SRI’s hub system: gender-dependent genonic HMMs. The system was made robust by modifying the front-end algorithms that estimate the cepstral features; the HMM models themselves were not modified. The robust front end used a wide bandwidth (100-6400 Hz) and estimated the cepstral coefficients using a series of algorithms that had little effect on the Sennheiser features while making the secondary-microphone features look more like the Sennheiser features. The decoder used SRI’s DECIPHER™ speech recognition system [1-5] with a progressive-search multipass HMM system, and used the Lincoln Lab 5K NVP trigram language model.

1. SYSTEM DESIGN HIGHLIGHTS

An overview of the SRI robust system design used for Spokes S5, S6 and S7 is shown in Figure 1.

Figure 1: SRI’s Robust CSR Block Diagram

2. RESULTS HIGHLIGHTS

SRI’s results on Spoke S5 demonstrated that for unknown workstation microphones, there was an overall increase in the word-error rate of 27% over the Sennheiser microphone. For the Audio-Technica microphone in Spoke S6, there was an 8% increase in word-error rate over the Sennheiser microphone, but the difference in performance between the two microphones was not significant. This is the first time that no significant increase in word error was observed when training with a high-quality close-talking microphone and testing with a secondary microphone.

SRI’s experiments demonstrated that unknown-microphone algorithms can outperform known-microphone algorithms.
By designing a speech-recognition system that uses information about many different microphones, the system can be made robust enough that small amounts of information about a new environment are not sufficient to improve its performance. The recognition analogy is that a speaker-independent system with lots of training can outperform a speaker-dependent system with only limited training. For this reason, SRI’s system for Spokes S6 and S7 used our best robust system from Spoke S5. This has led to some confusion for the P0 condition for these spokes. In summary, for Spokes S6 and S7 we did not think that it was necessary to adapt to the new microphone conditions, as our microphone-independent system was robust to channel and noise conditions.

3. GENDER/MICROPHONE SELECTION

The gender selection algorithm consisted of a two-stage process. The first stage was a fast initial gender decision that used a single-state HMM model for each gender (one state for male speech, one state for female speech). Each state used 256 Gaussian mixtures to represent the speech features. The features used for initial gender determination were the baseline zero-mean cepstral features C1-C12 augmented with pitch information.

After the initial gender selection, progressive-search word lattices [3] were generated with speech-recognition models of the initial gender. These word lattices were used to score the input utterance with a full-HMM system of each gender. The full-HMM models were then used to make the final gender selection based on HMM probability. If the HMM models reversed the decision of the earlier classifier, then new progressive-search word lattices were generated for this sentence.
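The two-stage gender selection described above can be sketched roughly as follows. This is a minimal illustration, not SRI's implementation: it assumes diagonal-covariance Gaussian mixtures for the single-state models, treats each feature vector as a plain list of floats, and uses two hypothetical callables, `make_lattice` and `score_with_full_hmm`, as stand-ins for the progressive-search lattice generator and the full-HMM scorer, neither of which is specified at this level in the text.

```python
import math

# A mixture is a list of (weight, means, variances) components.
# Diagonal covariances are an assumption made for this sketch.


def log_gaussian(frame, means, variances):
    """Log density of one frame under a diagonal-covariance Gaussian."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(frame, means, variances)
    )


def gmm_log_likelihood(frames, mixture):
    """Total log-likelihood of an utterance under a single-state mixture model."""
    total = 0.0
    for frame in frames:
        terms = [math.log(w) + log_gaussian(frame, m, v) for w, m, v in mixture]
        peak = max(terms)  # log-sum-exp for numerical stability
        total += peak + math.log(sum(math.exp(t - peak) for t in terms))
    return total


def select_gender(frames, gender_mixtures, make_lattice, score_with_full_hmm):
    """Return (initial_gender, final_gender, lattice).

    make_lattice(frames, gender) and score_with_full_hmm(frames, lattice, gender)
    are hypothetical placeholders supplied by the caller.
    """
    # Stage 1: fast decision with one single-state mixture model per gender.
    initial = max(
        gender_mixtures,
        key=lambda g: gmm_log_likelihood(frames, gender_mixtures[g]),
    )
    # Word lattice built with models of the initially selected gender.
    lattice = make_lattice(frames, initial)
    # Stage 2: score the utterance with the full-HMM system of each gender.
    final = max(
        gender_mixtures,
        key=lambda g: score_with_full_hmm(frames, lattice, g),
    )
    # If the full HMMs reverse the first decision, rebuild the lattice.
    if final != initial:
        lattice = make_lattice(frames, final)
    return initial, final, lattice
```

In this sketch the lattice is regenerated only when the second stage disagrees with the first, which mirrors the description above: the fast classifier avoids building lattices for both genders on every utterance, and the more expensive full-HMM comparison acts as a check on it.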
Related resources
The Hub and Spoke Paradigm for CSR Evaluation
In this paper, we introduce the new paradigm used in the most recent ARPA-sponsored Continuous Speech Recognition (CSR) evaluation and then discuss the important features of the test design. The 1993 CSR evaluation was organized in a novel fashion in an attempt to accommodate research over a broad variety of important problems in CSR while maintaining a clear program-wide research focus. Further...
NIST-ARPA Interagency Agreement: Human Language Technology Program
PROJECT GOALS 1. To coordinate the design, development and distribution of speech and natural language corpora for the ARPA Spoken Language research community, and the use of these corpora for technology development and evaluation. 2. To design, coordinate the implementation of, and analyze the results of performance assessment benchmark tests for ARPA's speech recognition and spoken language u...
CSR Data Collection
The CSR Development and Evaluation Spokes data collection task yielded 4435 development test utterances from 30 speakers and 4878 evaluation test utterances from a different set of 30 speakers. The development test data covered eight different spoke conditions, each of which had its own distinct combination of subject, prompt text, microphone and recording environment requirements. Similarly, t...
Socially Responsible Investment-based Portfolio Selection Problems with Fuzziness
This paper considers several portfolio selection problems considering Socially Responsible Investment (SRI), which is the most important measure to sustain continuous developments of companies by performing environment-friendliness and suitable social activity, and which is also essential for avoiding the latent risk. Corporate Social Responsibility (CSR) is presented as linguistic and ambiguou...
Developments in Large Vocabulary Dictation: The LIMSI Nov 94 NAB System
In this paper we report on our development work in large-vocabulary, American English continuous speech dictation on the ARPA NAB task in preparation for the November 1994 evaluation. We have experimented with (1) alternative analyses for the acoustic front end, (2) the use of an enlarged vocabulary of 65k words so as to reduce the number of errors due to out-of-vocabulary words, (3) extension...
Publication date: 1993