word class

An information theoretic approach for using word cluster information in natural language call routing

2003

Li Li, Feng Liu, Wu Chou

In this paper, an information theoretic approach for using word clusters in natural language call routing (NLCR) is proposed. This approach utilizes an automatic word class clustering algorithm to generate word classes from the word based training corpus. In our approach, the information gain (IG) based term selection is used to combine both word term and word class information in NLCR. A joint...
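
As a rough illustration of information-gain-based term selection over a mixed vocabulary of words and word classes, the sketch below scores each term by how much its presence reduces the entropy of the routing destination. The toy calls, the `CLASS_` token naming, and the function names are assumptions made for this example; they are not taken from the paper, whose exact formulation is cut off in the abstract above.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a list of non-negative counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """IG of a binary 'term present' feature with respect to the labels.

    docs   -- list of token sets, one per call (hypothetical toy format)
    labels -- routing destination for each call
    """
    n = len(docs)
    base = entropy(list(Counter(labels).values()))
    with_term = Counter(l for d, l in zip(docs, labels) if term in d)
    without_term = Counter(l for d, l in zip(docs, labels) if term not in d)
    n_with = sum(with_term.values())
    n_without = n - n_with
    conditional = (n_with / n) * entropy(list(with_term.values())) \
        + (n_without / n) * entropy(list(without_term.values()))
    return base - conditional


# Toy data: each call holds word tokens plus an assumed word-class token.
docs = [
    {"check", "balance", "CLASS_account"},
    {"balance", "account", "CLASS_account"},
    {"pay", "bill", "CLASS_payment"},
    {"bill", "due", "CLASS_payment"},
]
labels = ["balance", "balance", "billing", "billing"]

# Rank word terms and class terms together by information gain.
vocabulary = sorted(set().union(*docs))
ranked = sorted(vocabulary, key=lambda t: information_gain(docs, labels, t), reverse=True)
```

Because word tokens and class tokens are scored on the same scale, a single ranked list can be cut at any threshold to pick a combined feature set.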

Class-based language model adaptation using mixtures of word-class weights

2000

Gareth Moore, Steve J. Young

This paper describes the use of a weighted mixture of class-based n-gram language models to perform topic adaptation. By using a fixed class n-gram history and variable word-given-class probabilities, we obtain large improvements in the performance of the class-based language model, giving it similar accuracy to a word n-gram model, and an associated small but statistically significant improvemen...
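
One common way to write such a model, given here as a sketch and not necessarily the exact form used in the paper, keeps a single class n-gram term fixed and mixes K topic-specific word-given-class distributions. The symbols c_i (the class of word w_i), K, and the weights λ_k are notation assumed for this illustration.

```latex
P(w_i \mid w_{i-n+1}^{\,i-1})
  \approx P(c_i \mid c_{i-n+1}^{\,i-1}) \,
          \sum_{k=1}^{K} \lambda_k \, P_k(w_i \mid c_i),
\qquad \lambda_k \ge 0, \quad \sum_{k=1}^{K} \lambda_k = 1
```

Under this reading, adapting to a topic only requires re-estimating the weights λ_k, while the class history term stays unchanged.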

Features for factored language models for code-Switching speech

2014

Heike Adel, Katrin Kirchhoff, Dominic Telaar, Ngoc Thang Vu, Tim Schlippe, Tanja Schultz

This paper presents investigations of features which can be used to predict Code-Switching speech. For this task, factored language models are applied and implemented into a state-of-the-art decoder. Different possible factors, such as words, part-of-speech tags, Brown word clusters, open class words and open class word clusters are explored. We find that Brown word clusters, part-of-speech tag...

Hierarchical clustering of word class distributions

2012

Grzegorz Chrupała

We propose an unsupervised approach to POS tagging where first we associate each word type with a probability distribution over word classes using Latent Dirichlet Allocation. Then we create a hierarchical clustering of the word types: we use an agglomerative clustering algorithm where the distance between clusters is defined as the Jensen-Shannon divergence between the probability distributions...
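
A minimal sketch of the distance computation, assuming each word type is already represented as a discrete distribution over word classes: the code computes the Jensen-Shannon divergence (base 2) and finds the closest pair, which is the pair an agglomerative step would merge first. The toy distributions and function names are assumptions for this example; how the paper represents a merged cluster is not visible in the truncated abstract, so that step is left out.

```python
import math
from itertools import combinations


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: mean KL of p and q to their midpoint m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def closest_pair(distributions):
    """Return the two keys whose distributions have the smallest JS divergence."""
    return min(
        combinations(distributions, 2),
        key=lambda pair: js_divergence(distributions[pair[0]], distributions[pair[1]]),
    )


# Toy word-type distributions over two hypothetical word classes.
word_types = {
    "cat": [0.9, 0.1],
    "dog": [0.8, 0.2],
    "run": [0.1, 0.9],
}
first_merge = closest_pair(word_types)
```

Unlike the KL divergence, this measure is symmetric and stays finite when one distribution assigns zero probability to a class, which makes it convenient as a clustering distance.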

Nomen est omen: Investigating the dominance of nouns in word comprehension with eye movement analyses.

2009

Marco R. Furtner, John F. Rauthmann, Pierre Sachse

Although nouns are easily learned in early stages of lexical development, their role in adult word and text comprehension remains unexplored thus far. To investigate the role of different word classes (open-class words: nouns, adjectives, verbs; closed-class words: pronouns, prepositions, conjunctions, etc.), 141 participants read a transposed German text while their eye movements were recorded. Subseque...

Word Class Functions for Syntactic-Semantic Analysis

1997

Hermann Helbig, Sven Hartrumpf

Appeared in Proceedings of the 2nd International Conference on Recent Advances in Natural Language Processing (RANLP’97), pp. 312–317, 1997. In this paper, Analysis with Word Class Functions (WCFA) is presented as a paradigm for syntactic-semantic analysis of natural language. The main characteristics of this approach are: word-orientation, the central role of word class functions, two phases o...

Class-specific Word Embedding through Linear Compositionality

2017

Sicong Kuang, Brian D. Davison

English linguist John Rupert Firth has a famous saying “you shall know a word by the company it keeps.” Most word representation learning models are based on this assumption that a word’s semantic meaning can be learned from the context in which it resides. The context is defined as a small unordered number of words surrounding the target word. Research has shown that context alone provides lim...

Word class driven synthesis of prosodic annotations

1996

Simon Arnfield

Prosody is an important aspect of speech that current text to speech synthesis systems fail to mimic in a convincing or natural way[1, 2, 3, 4]. This paper describes research on a partial system for prosodic synthesis using easily derived low level syntactic information. A computer program has been developed that can annotate unseen text with prosodic stress and tone marks using the sequence of...

A Class-based Approach to Word Alignment

Journal: Computational Linguistics 1997

Sue J. Ker, Jason S. Chang

This paper presents an algorithm capable of identifying the translation for each word in a bilingual corpus. Previously proposed methods rely heavily on word-based statistics. Under a word-based approach, frequent words with a consistent translation can be aligned at a high rate of precision. However, words that are less frequent or exhibit diverse translations generally do not have statistical...
