Search results for: word clustering

Number of results: 205729

2010
Andrew Skabar Khaled Abdalgader

Measuring similarity between sentences plays an important role in textual applications such as document summarization and question answering. While various sentence similarity measures have recently been proposed, these measures typically only take into account word importance by virtue of inverse document frequency (IDF) weighting. IDF values are based on global information compiled over a lar...
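As a point of reference for the IDF weighting mentioned in this abstract, here is a minimal sketch of the standard formula idf(w) = log(N / df(w)), where N is the number of documents and df(w) is the number of documents containing w. The toy corpus and the names `idf_weights` and `toy_docs` are assumptions for illustration and are not taken from the paper.

```python
import math
from collections import Counter

def idf_weights(docs):
    """Return idf(w) = log(N / df(w)) for every word in a list of token lists."""
    n_docs = len(docs)
    # Document frequency: count each word at most once per document.
    doc_freq = Counter(word for doc in docs for word in set(doc))
    return {word: math.log(n_docs / df) for word, df in doc_freq.items()}

# Hypothetical toy corpus of three tokenized sentences.
toy_docs = [
    ["the", "words", "are", "clustered"],
    ["the", "sentences", "are", "compared"],
    ["the", "clustering", "of", "words"],
]
weights = idf_weights(toy_docs)
```

A word that appears in every document ("the") gets weight 0, while a word that appears in only one document ("clustering") gets the largest weight, log 3.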

2015
Ines Turki Khemakhem Salma Jamoussi Abdelmajid Ben Hamadou

Clustering words is a widely used technique in statistical natural language processing. It requires syntactic, semantic, and contextual features. Semantic clustering in particular is attracting considerable interest. It consists of grouping a set of words that express the same idea or share the same semantic properties. In this paper, we present a new method to integrate semantic classes in a Statistica...

2006
Yehang Zhu Guanzhong Dai Benjamin C. M. Fung Dejun Mu

This paper presents a new document clustering method based on frequent co-occurring words. We first employ Singular Value Decomposition, and then group the words into clusters, called word representatives, which substitute for the corresponding words in the original documents. Next, we extract the frequent word representative sets with Apriori. Subsequently, each document is designated to a basic...

2011
Valentin I. Spitkovsky Hiyan Alshawi Angel X. Chang Daniel Jurafsky

We show that categories induced by unsupervised word clustering can surpass the performance of gold part-of-speech tags in dependency grammar induction. Unlike classic clustering algorithms, our method allows a word to have different tags in different contexts. In an ablative analysis, we first demonstrate that this context-dependence is crucial to the superior performance of gold tags — requir...

2004
Ronald N. Kostoff Joel Block

The presence of trivial words in text databases can impact record or concept (words/phrases) clustering adversely. Additionally, the determination of whether a word/phrase is trivial is context-dependent. The objective of the present paper is to demonstrate a context-dependent trivial word filter to improve clustering quality. Factor analysis was used as a context-dependent trivial word filte...

2015
Lihua Xue Guiping Zhang Qiaoli Zhou Na Ye

Since professional technical literature includes large numbers of complex noun phrases, identifying those phrases has important practical value for tasks such as machine translation. Through analysis of those phrases in Chinese-English bilingual sentence pairs from aircraft technical publications, we present an annotation specification based on the existing specification to label those phra...

Journal: JASIS 1992
James A. Thom Justin Zobel

It is common to model the distribution of words in text by measures such as the Poisson approximation. However, these measures ignore effects such as clustering: our analysis of document collections demonstrates that the Poisson approximation can significantly overestimate the probability that a document contains a word. Based on our analysis, we propose a new model for distribution of words in...
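To make the overestimation claim concrete, here is a minimal sketch under the usual Poisson assumption: if a word is expected to occur λ times in a document, the probability that the document contains it at least once is 1 - exp(-λ). The toy collection and the names `poisson_prob_contains` and `observed_fraction` are assumptions for illustration, and this is not the model the authors propose.

```python
import math

def poisson_prob_contains(term_count, total_tokens, doc_length):
    """P(word occurs at least once in a document) under a Poisson model."""
    # Expected number of occurrences in a document of this length.
    rate = term_count / total_tokens * doc_length
    return 1.0 - math.exp(-rate)

def observed_fraction(docs, word):
    """Fraction of documents that actually contain the word."""
    return sum(1 for doc in docs if word in doc) / len(docs)

# Hypothetical collection: "cluster" occurs 3 times, all in one document.
docs = [
    ["cluster", "cluster", "cluster", "word"],
    ["word", "text", "text", "text"],
    ["text", "model", "model", "word"],
    ["model", "text", "word", "text"],
]
total_tokens = sum(len(doc) for doc in docs)
term_count = sum(doc.count("cluster") for doc in docs)

estimate = poisson_prob_contains(term_count, total_tokens, doc_length=4)
observed = observed_fraction(docs, "cluster")
```

Because all occurrences of "cluster" are bunched into one document, the Poisson estimate (about 0.53) is well above the observed fraction (0.25), which is the kind of clustering effect the abstract describes.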

2010
Tynan Smith

The Restaurant Game is part of a project to develop an AI system that can play a video game with a human or another AI just by using annotated recordings of humans playing the game as examples. The Restaurant Game is a simple two-player restaurant simulation in which characters are instructed to act out a typical interaction between a customer and a waitress. We have collected about 10,000 recor...

2016
Anne Cocos Chris Callison-Burch

Automatically generated databases of English paraphrases have the drawback that they return a single list of paraphrases for an input word or phrase. This means that all senses of polysemous words are grouped together, unlike WordNet which partitions different senses into separate synsets. We present a new method for clustering paraphrases by word sense, and apply it to the Paraphrase Database ...

2006
Alan L. Ritter James W. Hearne Philip A. Nelson

We discuss various methods which have been applied to grouping words into syntactic and semantic categories, primarily how they deal with the problems of sparsity and computational complexity. We then present a method of distributional clustering, and discuss the parallelization of the most computationally intensive part of this process.
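The general idea of distributional clustering can be sketched as follows: each word is represented by counts of its left and right neighbours, and words with similar context vectors are grouped together. This is a generic illustration, not the method of the paper; the greedy grouping rule, the 0.5 threshold, and the names `context_vectors`, `cosine`, and `cluster_words` are all assumptions.

```python
import math
from collections import Counter, defaultdict

def context_vectors(sentences, window=1):
    """Count left/right neighbours of each word within the given window."""
    vectors = defaultdict(Counter)
    for sent in sentences:
        for i, word in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    side = "L" if j < i else "R"
                    vectors[word][(side, sent[j])] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def cluster_words(vectors, threshold=0.5):
    """Greedy grouping: join the first cluster whose members are all similar."""
    clusters = []
    for word in sorted(vectors):
        for members in clusters:
            if all(cosine(vectors[word], vectors[m]) >= threshold for m in members):
                members.append(word)
                break
        else:
            clusters.append([word])
    return clusters

# Hypothetical toy corpus.
sentences = [
    ["the", "cat", "sleeps"],
    ["the", "dog", "sleeps"],
    ["a", "cat", "eats"],
    ["a", "dog", "eats"],
]
clusters = cluster_words(context_vectors(sentences))
```

On this toy corpus the determiners, nouns, and verbs end up in three separate groups. The all-pairs similarity computation is quadratic in vocabulary size, which is why work on real corpora has to address sparsity and computational cost, as the abstract notes.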
