Automatically Generated Models for Unknown Words
Abstract
Recognizing spontaneous speech in particular requires coping with unknown words. We present an approach to unknown word detection that is integrated into a standard HMM speech recognizer. From the context-dependent sub-word units (e.g. triphones) found in the training database, a generic word model can be derived automatically, using the context restrictions to form valid sequences of sub-word units. This generic word model combines automatically derived knowledge about the phonotactics of the language under consideration with the modelling quality of context-dependent acoustic units. Unknown words are detected by adding this model to the recognizer's lexicon. We present results of experiments carried out on a large German spontaneous speech recognition task.
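The context restriction described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the boundary symbol `#`, the tuple representation of triphones, and all function names are assumptions made for illustration, and the acoustic HMMs attached to each unit are left out. The idea shown is only that a triphone l-c+r may be followed by any triphone of the form c-r+x seen in training, so phone sequences absent from the lexicon can still form a valid path.

```python
from collections import defaultdict

BOUNDARY = "#"  # hypothetical word-boundary symbol, not taken from the paper


def triphones_of(phones):
    """Return the (left, center, right) triphones of one phone sequence."""
    padded = [BOUNDARY, *phones, BOUNDARY]
    return [tuple(padded[i:i + 3]) for i in range(len(padded) - 2)]


def inventory_from_lexicon(words):
    """Collect every triphone that occurs in the training pronunciations."""
    return {tri for phones in words for tri in triphones_of(phones)}


def build_generic_word_model(inventory):
    """Link each triphone l-c+r to all seen triphones of the form c-r+x."""
    by_left_center = defaultdict(set)
    for left, center, right in inventory:
        by_left_center[(left, center)].add((left, center, right))
    return {
        tri: set(by_left_center.get((tri[1], tri[2]), ()))
        for tri in inventory
    }


def is_valid_path(successors, phones):
    """Check whether a phone sequence is a path through the generic model."""
    path = triphones_of(phones)
    if not path or any(tri not in successors for tri in path):
        return False
    # Consecutive triphones of one sequence always agree on context, so this
    # check mirrors the arcs a decoder would follow rather than adding a
    # further restriction beyond membership in the inventory.
    return all(b in successors[a] for a, b in zip(path, path[1:]))
```

With a toy training lexicon containing only the pronunciations `h a l` and `a l t`, the sequence `h a l t` is accepted even though it was never seen as a word, because each of its triphones occurs in training; `t a l` is rejected because the triphone `#-t+a` does not occur. In a recognizer, a graph of this kind would be added to the lexicon as a single entry that competes with the known words.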
Related resources
Post Mortem Parsing with Unknown Lexical Items using Morphological Recognition, Syntactic Information and a Closed Class Lexicon
The importance of dealing with unknown words in natural language processing (NLP) is growing as NLP systems are used in more and more applications. The ability to parse sentences containing unknown words will make a parsing system more robust and flexible. The use of syntactic parsing rules provides constraints on the possible lexical categories of unknown words. A lexicon of closed class words also o...
Statistical Identification of English Loanwords in Korean Using Automatically Generated Training Data
This paper describes an accurate, extensible method for automatically classifying unknown foreign words that requires minimal monolingual resources and no bilingual training data (which is often difficult to obtain for an arbitrary language pair). We use a small set of phonologically-based transliteration rules to generate a potentially unlimited amount of pseudo-data that can be used to train ...
Tauira: A tool for acquiring unknown words in a dialogue context
This paper describes a tool for acquiring unknown words, which operates in a bilingual human-machine dialogue system. When the user’s utterance includes a word which is not in the system’s lexicon, the system initiates a subdialogue to find out about the new word, by querying the user about the syntactic validity of a number of example sentences generated automatically from the grammar’s test s...
An Ensemble Model of Word-based and Character-based Models for Japanese and Chinese Input Method
Since Japanese and Chinese languages have too many characters to be input directly using a standard keyboard, input methods for these languages that enable users to input the characters are required. Recently, input methods based on statistical models have become popular because of their accuracy and ease of maintenance. Most of them adopt word-based models because they utilize word-segmented c...
Grapheme-to-phoneme Conv Morphologica
This paper presents a new approach for grapheme-to-phoneme conversion based on morphology. With this approach, a high accuracy can be obtained, although a transcription is not achieved for all words. The principle of this approach is to automatically decompose an existing pronunciation lexicon into morpheme-similar units called pseudo-morphological units. The pronunciation of the pseudo-morphol...
Publication date: 1996