Extracting word-pronunciation pairs from comparable set of text and speech

نویسندگان

  • Tetsuro Sasada
  • Shinsuke Mori
  • Tatsuya Kawahara
چکیده

One of the problems in text-to-speech (TTS) systems and speech-to-text (STT) systems is pronunciation estimation of unknown words. In this paper, we propose a method for extracting unknown words and their pronunciations from similar sets of Japanese text data and speech data. Out-of-vocabulary words are extracted from text with a stochastic model and pronunciations hypotheses are generated. These entries are verified by conducting automatic speech recognition on audio data. In this work, we use news articles and broadcast TV news covering similar topics. Most extracted pairs turned out to be correct according to a human judges. We also tested the TTS frontend enhanced with these entries on other web news articles, and observed an improvement in the pronunciation estimation accuracy of 9.2% (relative). The proposed method can be used to realize a spoken language processing system that acquires and updates its lexicon automatically.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Incorporating Pronunciation Variation into Extraction of Transliterated-term Pairs from Web Corpora

A novel approach to automatically extracting transliterated-term pairs from Web corpora is proposed in this paper. One of the most important issues addressed is that of taking pronunciation variation into account. Pronunciation variation is a phenomenon of pronunciation ambiguity that seriously affects the term transliteration and hence affects those results produced by transliteration processe...

متن کامل

Pronunciation dependent language models

Speech recognition systems are conventionally broken up into phonemic acoustic models, pronouncing dictionaries in terms of the phonemic units in the acoustic model and language models in terms of lexical units from the pronouncing dictionary. Here we explore a new method for incorporating pronunciation probabilities into recognition systems by moving them from the pronouncing lexicon into the ...

متن کامل

New word learning for spoken document processing through discovery of comparable texts from external resources

This paper presents a new out-of-vocabulary (OOV) word learning approach that dynamically extends the pronunciation lexicon and the language model for large vocabulary continuous speech recognition (LVCSR) in spoken document retrieval (SDR) systems. Based on the assumption that the graphemes as well as the n-gram statistics of the OOV words can be effectively learned from other contemporary or ...

متن کامل

Predicting Word Pronunciation in Japanese

This paper addresses the problem of predicting the pronunciation of Japanese words, especially those that are newly created and therefore not in the dictionary. This is an important task for many applications including text-to-speech and text input method, and is also challenging, because Japanese kanji (ideographic) characters typically have multiple possible pronunciations. We approach this p...

متن کامل

Korean large vocabulary continuous speech recognition with morpheme-based recognition units

In Korean writing, a space is placed between two adjacent word-phrases, each of which generally corresponds to two or three words in English in a semantic sense. If the word-phrase is used as a recognition unit for Korean large vocabulary continuous speech recognition (LVCSR), the out-of-vocabulary (OOV) rate becomes very large. If a morpheme or a syllable is used instead, a severe inter-morphe...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008