The use of subword linguistic modeling for multiple tasks in speech recognition

Author

  • Stephanie Seneff
Abstract

Over the past several years, I have been conducting research on subword modeling in speech recognition. The research is most specifically aimed at the difficult task of identifying and characterizing unknown words, although the proposed framework also has utility in other recognition tasks such as phonological and prosodic modeling. The approach exploits the linguistic substructure of words by describing graphemic, phonemic, phonological, syllabic, and morphemic constraints through a set of context-free rules, and supporting the resulting parse trees with a corpus-trained probability model. A derived finite-state transducer representation forms a natural means for integrating the trained model into a recognizer search. This paper describes several research projects I have been engaged in, together with my students and associates, aimed at exploring ways in which recognition tasks can benefit from such formal modeling of word substructure. These include phonological modeling, hierarchical duration modeling, sound-to-letter and letter-to-sound mapping, and automatic acquisition of unknown words in a speech understanding system. Results of several experiments in these areas are summarized here. © 2003 Published by Elsevier B.V.

Similar resources

Allophone-based acoustic modeling for Persian phoneme recognition

Phoneme recognition is one of the fundamental phases of automatic speech recognition. Coarticulation, which refers to the integration of sounds, is one of the important obstacles in phoneme recognition. In other words, each phone is influenced and changed by the characteristics of its neighboring phones, and coarticulation is responsible for most of these changes. The idea of modeling the effects o...

An STD System for OOV Query Terms Integrating Multiple STD Results of Various Subword units

We have been proposing a Spoken Term Detection (STD) method for Out-Of-Vocabulary (OOV) query terms that integrates various subword recognition results using monophone, triphone, demiphone, one-third phone, and sub-phonetic segment (SPS) models. In the proposed method, subword-based ASR (Automatic Speech Recognition) is performed for all spoken documents and subword recognition results are generate...

Improved Subword Modeling for WFST-Based Speech Recognition

Because in agglutinative languages the number of observed word forms is very high, subword units are often utilized in speech recognition. However, the proper use of subword units requires careful consideration of details such as silence modeling, position-dependent phones, and combination of the units. In this paper, we implement subword modeling in the Kaldi toolkit by creating modified lexic...

Improved Bayesian Training for Context-Dependent Modeling in Continuous Persian Speech Recognition

Context-dependent modeling is a widely used technique for better phone modeling in continuous speech recognition. While different types of context-dependent models have been used, triphones have been known as the most effective ones. In this paper, a Maximum a Posteriori (MAP) estimation approach has been used to estimate the parameters of the untied triphone model set used in data-driven clust...

Pronunciation Lexicon Development for Under-Resourced Languages Using Automatically Derived Subword Units: A Case Study on Scottish Gaelic

Developing a phonetic lexicon for a language requires linguistic knowledge as well as human effort, which may not be available, particularly for under-resourced languages. To avoid the need for the linguistic knowledge, acoustic information can be used to automatically obtain the subword units and the associated pronunciations. Towards that, the present paper investigates the potential of a rec...

Journal title:
  • Speech Communication

Volume 42

Publication date: 2004