A multi-class approach for modelling out-of-vocabulary words

نویسندگان

  • Issam Bazzi
  • James R. Glass
چکیده

In this paper we present a multi-class extension to our approach for modelling out-of-vocabulary (OOV) words [1]. Instead of augmenting the word search space with a single OOV model, we add several OOV models, one for each class of words. We present two approaches for designing the OOV word classes. The first approach relies on using common part-of-speech tags. The second approach is a data-driven two-step clustering procedure, where the first step uses agglomerative clustering to derive an initial class assignment, while the second step uses iterative clustering to move words from one class to another in order to reduce the model perplexity. We present experiments within the JUPITER weather information domain. Results show that the multi-class model significantly improves performance over using a single OOV class. For an OOV detection rate of 70%, the false alarm rate is reduced from 5.3% for a single class to 2.9% for an eight-class model.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Multi-class Approach for Modelling

In this paper we present a multi-class extension to our approach for modelling out-of-vocabulary (OOV) words [1]. Instead of augmenting the word search space with a single OOV model, we add several OOV models, one for each class of words. We present two approaches for designing the OOV word classes. The first approach relies on using common part-of-speech tags. The second approach is a data-dri...

متن کامل

Modelling Out-of-Vocabulary Words for Robust Speech Recognition

This thesis concerns the problem of unknown or out-of-vocabulary (OOV) words in continuous speech recognition. Most of today's state-of-the-art speech recognition systems can recognize only words that belong to some predefined finite word vocabulary. When encountering an OOV word, a speech recognizer erroneously substitutes the OOV word with a similarly sounding word from its vocabulary. Furthe...

متن کامل

Investigation on language modelling approaches for open vocabulary speech recognition

By definition, words that are not present in a recognition vocabulary are called out-of-vocabulary (OOV) words. Recognition of unseen or new words is an important feature that is always desired in any real-world large vocabulary continuous speech recognition (LVCSR) system. However, human languages are complex in nature due to wide varieties of morphological richness such as inflections, deriva...

متن کامل

Multi Class-based n-gram Language Model for New Words Using Web Data

Out-of-vocabulary (OOV) words cause a serious problem for automatic speech recognition (ASR) system. Not only it will be miss-recognized as an in-vocabulary word with similar phonetics, but the error will also affect nearby words to make errors. Language models (LMs) for most of open vocabulary ASR systems treat OOV words as one entity, ignoring the linguistic information. In this paper we pres...

متن کامل

Word Type Effects on L2 Word Retrieval and Learning: Homonym versus Synonym Vocabulary Instruction

The purpose of this study was twofold: (a) to assess the retention of two word types (synonyms and homonyms) in the short term memory, and (b) to investigate the effect of these word types on word learning by asking learners to learn their Persian meanings. A total of 73 Iranian language learners studying English translation participated in the study. For the first purpose, 36 freshmen from an ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2002