Combined Systems for Automatic Phonetic Transcription of Proper Nouns

2008

Antoine Laurent Téva Merlin Sylvain Meignier Yannick Estève Paul Deléglise

Large vocabulary automatic speech recognition (ASR) technologies perform well in known, controlled contexts. However, recognition of proper nouns is commonly considered a difficult task. Accurate phonetic transcription of a proper noun is difficult to obtain, although it can be one of the most important resources for a recognition system. In this article, we propose methods of automatic phone...

Improving the performance of a Dutch CSR by modeling within-word and cross-word pronunciation variation

Journal: Speech Communication 1999

Judith M. Kessens Mirjam Wester Helmer Strik

This article describes how the performance of a Dutch continuous speech recognizer was improved by modeling pronunciation variation. We propose a general procedure for modeling pronunciation variation. In short, it consists of adding pronunciation variants to the lexicon, retraining phone models and using language models to which the pronunciation variants have been added. First, within-word pr...

The Effect of Postlexical Deletion on Automatic Speech Recognition in Fast Spontaneously Spoken Zulu

2016

Ewald van der Westhuizen Thomas Niesler

We consider the phenomenon of postlexical deletion in fast spontaneously spoken isiZulu speech and its implication for automatic speech recognition (ASR). Analysis of hand-crafted transcripts of fast spontaneous speech recorded from broadcast media indicates that postlexical deletion, especially of vowels, is common in isiZulu. We show that ASR performance can be increased by inclusion of pronu...

Recent advances in ASR applied to an Arabic transcription system for Al-Jazeera

2014

Patrick Cardinal Ahmed M. Ali Najim Dehak Yu Zhang Tuka Al Hanai Yifan Zhang James R. Glass Stephan Vogel

This paper describes a detailed comparison of several state-of-the-art speech recognition techniques applied to a limited Arabic broadcast news dataset. The different approaches were all trained on 50 hours of transcribed audio from the Al-Jazeera news channel. The best results were obtained using i-vector-based speaker adaptation in a training scenario using the Minimum Phone Error (MPE) criteri...

Time-dependent cross-probability model for multi-environment model based LInear normalization

2006

Luis Buera Eduardo Lleida Juan Arturo Nolazco-Flores Antonio Miguel Alfonso Ortega

In a previous work, Multi-Environment Model based LInear Normalization (MEMLIN) was presented and proved effective in compensating for environment mismatch. MEMLIN is an empirical feature vector normalization which models clean and noisy spaces by Gaussian Mixture Models (GMMs). In this algorithm, the probability of the clean model Gaussian, given the noisy model one and the noisy featur...

An approach to efficient generation of high-accuracy and compact error-corrective models for speech recognition

2007

Takanobu Oba Takaaki Hori Atsushi Nakamura

This paper focuses on an error-corrective method through reranking of hypotheses in speech recognition. Some recent work investigated corrective models that can be used to rescore hypotheses so that a hypothesis with a smaller error rate has a higher score. Discriminative training such as the perceptron algorithm can be used to estimate such corrective models. In discriminative training, how to cho...

Measuring the acceptable word error rate of machine-generated webcast transcripts

2006

Cosmin Munteanu Gerald Penn Ronald Baecker Elaine Toms David James

The increased availability of broadband connections has recently led to an increase in the use of Internet broadcasting (webcasting). Most webcasts are archived and accessed numerous times retrospectively. One of the hurdles users face when browsing and skimming through archives is the lack of text transcripts of the audio channel of the webcast archive. In this paper, we proposed a procedure f...

Combining Acoustic Data Driven G2P and Letter-to-Sound Rules for Under Resource Lexicon Generation

2012

Ramya Rasipuram Mathew Magimai-Doss

In a recent work, we proposed an acoustic data-driven grapheme-to-phoneme (G2P) conversion approach, where the probabilistic relationship between graphemes and phonemes learned through acoustic data is used along with the orthographic transcription of words to infer the phoneme sequence. In this paper, we extend our studies to the under-resourced lexicon development problem. More precisely, given a...

Estimating speech recognition error rate without acoustic test data

2003

Yonggang Deng Milind Mahajan Alex Acero

We address the problem of estimating the word error rate (WER) of an automatic speech recognition (ASR) system without using acoustic test data. This is an important problem faced by the designers of new applications which use ASR. A quick estimate of WER early in the design cycle can be used to guide the decisions involving dialog strategy and grammar design. Our approach involves estim...

Robust Online Multi-Channel Speech Recognition

2016

Markus Kitza Albert Zeyer Ralf Schlüter Jahn Heymann Reinhold Häb-Umbach

In this paper we present a system for robust online far-field multi-channel speech recognition with minimal assumptions on microphone configuration and target location. We employ an online-enabled Generalized Eigenvalue (GEV) beamformer and a Long Short-Term Memory (LSTM) network to robustly calculate the signal statistics necessary for the beamforming operation in the front-end. After multiple ...
