Automatically Generated Models for Unknown Words

Author

  • G. Sagerer
Abstract

Recognizing spontaneous speech in particular requires coping with unknown words. We present an approach to unknown word detection that is integrated into a standard HMM speech recognizer. From the context-dependent sub-word units (e.g. triphones) found in the training database, a generic word model can be derived automatically, using the context restrictions to form valid sequences of sub-word units. This generic word model combines automatically derived knowledge about the phonotactics of the language under consideration with the modelling quality of context-dependent acoustic units. Unknown words are detected by adding this model to the recognizer's lexicon. We present results of experiments carried out on a large German spontaneous speech recognition task.
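
The abstract does not spell out how the context restrictions are applied, so the following is only a minimal sketch under stated assumptions: triphones are represented as hypothetical `(left, center, right)` tuples, a triphone `(l, c, r)` is assumed to permit any successor of the form `(c, r, x)`, and `sil` is an assumed word-boundary context. The acoustic HMMs themselves are omitted; the sketch covers only the graph of permitted triphone sequences that a generic word model of this kind could be built on.

```python
from collections import defaultdict


def build_successor_graph(triphones):
    """Map each triphone (left, center, right) to the triphones allowed to follow it.

    Assumed context restriction: (l, c, r) may be followed by (c, r, x), i.e. the
    successor's left context and center must match the current center and right context.
    """
    by_left_center = defaultdict(list)
    for tri in triphones:
        left, center, _right = tri
        by_left_center[(left, center)].append(tri)
    return {
        tri: sorted(by_left_center.get((tri[1], tri[2]), []))
        for tri in triphones
    }


def is_valid_sequence(sequence, graph):
    """Check that every triphone is known and every adjacent pair is a permitted transition."""
    if any(tri not in graph for tri in sequence):
        return False
    return all(nxt in graph[cur] for cur, nxt in zip(sequence, sequence[1:]))


# Hypothetical inventory; "sil" as a word-boundary context is an assumption.
inventory = [
    ("sil", "h", "a"),
    ("h", "a", "l"),
    ("a", "l", "o"),
    ("l", "o", "sil"),
]
graph = build_successor_graph(inventory)

print(is_valid_sequence(inventory, graph))                     # True
print(is_valid_sequence([inventory[0], inventory[2]], graph))  # False
```

Because the graph is derived only from triphones observed in training, any path through it respects the phonotactics seen in the data while still covering words that are absent from the lexicon.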

Similar resources

Post Mortem Parsing with Unknown Lexical Items using Morphological Recognition Syntactic Information and a Closed Class Lexicon

The importance of dealing with unknown words in natural language processing (NLP) is growing as NLP systems are used in more and more applications. The ability to parse sentences containing unknown words will make a parsing system more robust and flexible. The use of syntactic parsing rules provides constraints on the possible lexical categories of unknown words. A lexicon of closed class words also o...

Statistical Identification of English Loanwords in Korean Using Automatically Generated Training Data

This paper describes an accurate, extensible method for automatically classifying unknown foreign words that requires minimal monolingual resources and no bilingual training data (which is often difficult to obtain for an arbitrary language pair). We use a small set of phonologically-based transliteration rules to generate a potentially unlimited amount of pseudo-data that can be used to train ...

Tauira: A tool for acquiring unknown words in a dialogue context

This paper describes a tool for acquiring unknown words, which operates in a bilingual human-machine dialogue system. When the user’s utterance includes a word which is not in the system’s lexicon, the system initiates a subdialogue to find out about the new word, by querying the user about the syntactic validity of a number of example sentences generated automatically from the grammar’s test s...

An Ensemble Model of Word-based and Character-based Models for Japanese and Chinese Input Method

Since the Japanese and Chinese languages have too many characters to be entered directly with a standard keyboard, input methods that enable users to enter these characters are required. Recently, input methods based on statistical models have become popular because of their accuracy and ease of maintenance. Most of them adopt word-based models because they utilize word-segmented c...

Grapheme-to-phoneme Conv Morphologica

This paper presents a new approach to grapheme-to-phoneme conversion based on morphology. With this approach, high accuracy can be obtained, although a transcription is not produced for every word. The principle of this approach is to automatically decompose an existing pronunciation lexicon into morpheme-like units called pseudo-morphological units. The pronunciation of the pseudo-morphol...

Publication date: 1996