Search results for: word clustering

Number of results: 205729

2010
Hua Xu Bing Liu Longhua Qian Guodong Zhou

Recent studies on word sense induction (WSI) mainly concentrate on European languages. Chinese word sense induction is becoming popular as it presents a new challenge to WSI. In this paper, we propose a feature-based approach to this problem using the spectral clustering algorithm. We also compare various clustering algorithms and similarity metrics. Experimental results show that our system ac...

2006
Jinxiu Chen Dong-Hong Ji Chew Lim Tan Zheng-Yu Niu

We present an unsupervised learning approach to disambiguate various relations between named entities using lexical and syntactic features from their contexts. It works by calculating eigenvectors of an adjacency graph’s Laplacian to recover a submanifold of data from a high-dimensional space and then performing cluster number estimation on the eigenvectors. This method can address ...

2004
Amruta Purandare Ted Pedersen

Word sense discrimination is an unsupervised clustering problem that seeks to discover which instances of a word (or words) are used with the same meaning. This is done strictly based on information found in raw corpora, without using any sense-tagged text or other existing knowledge sources. Our particular focus is to systematically compare the efficacy of a range of lexical features, context represent...

2004
Amruta Purandare Ted Pedersen

SenseClusters is a freely available word sense discrimination system that takes a purely unsupervised clustering approach. It uses no knowledge other than what is available in a raw unstructured corpus, and clusters instances of a given target word based only on their mutual contextual similarities. It is a complete system that provides support for feature selection from large corpora, several ...

2010
Yuxiang Jia Shiwen Yu Zhengyan Chen

Word Sense Induction (WSI) is an important topic in the natural language processing area. For the bakeoff task Chinese Word Sense Induction (CWSI), this paper proposes two systems using basic clustering algorithms, k-means and agglomerative clustering. Experimental results show that k-means achieves better performance. Based only on the data provided by the task organizers, the two systems get FSc...

2009
Marie Candito Benoît Crabbé

We present a semi-supervised method to improve statistical parsing performance. We focus on the well-known problem of lexical data sparseness and present experiments on word clustering prior to parsing. We use a combination of lexicon-aided morphological clustering that preserves tagging ambiguity, and unsupervised word clustering, trained on a large unannotated corpus. We apply these clustering...

Journal: IJIRR 2012
Manjeet Rege Josan Koruthu Reynold Bailey

In text analytics (Srivastava & Sahami 2009), document clustering refers to the problem of automatically grouping documents into different groups (known as clusters), such that documents in one cluster are similar to each other while being dissimilar from the ones in a different cluster. Typically, the dataset is represented using the vector model in which a set of m documents with n unique wor...
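The vector model mentioned in this abstract can be illustrated with a minimal sketch. It assumes whitespace tokenization and raw term counts, which the truncated abstract does not specify. The helper names `build_term_document_matrix` and `cosine_similarity` and the sample documents are hypothetical and are not taken from the paper.

```python
from collections import Counter
import math


def build_term_document_matrix(documents):
    """Return (vocabulary, matrix); matrix[i][j] counts word j in document i.

    Assumption: lowercased whitespace tokenization and raw counts.
    """
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        matrix.append([counts[word] for word in vocabulary])
    return vocabulary, matrix


def cosine_similarity(u, v):
    """Cosine of the angle between two count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus: m = 3 documents over n unique words.
docs = [
    "word clustering groups similar words",
    "clustering groups similar documents",
    "parsing uses syntactic trees",
]
vocab, matrix = build_term_document_matrix(docs)
```

Documents that share vocabulary, such as the first two above, get a higher cosine similarity than documents with no words in common, which is the signal a clustering algorithm would then group on.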

2016
Leon Derczynski Sean Chester

Brown clustering is an established technique, used in hundreds of computational linguistics papers each year, to group word types that have similar distributional information. It is unsupervised and can be used to create powerful word representations for machine learning. Despite its improbable success relative to more complex methods, few have investigated whether Brown clustering has really b...

2016
Jing Kong Alex Scott Georg M. Goerg

Uncovering common themes from a large number of unorganized search queries is a primary step to mine insights about aggregated user interests. Common topic modeling techniques for document modeling often face sparsity problems with search query data as these are much shorter than documents. We present two novel techniques that can discover semantically meaningful topics in search queries: i) wo...

2014
PARUHUM SILALAHI

Clustering techniques are often used to group text documents. Modeling and graph-based representation of the document clustering process can be done using the Document Index Graph (DIG) algorithm. This study aims to implement the DIG algorithm to design the digraph structures used for graphical representation of the web document clustering process. The data used is the REUTERS-21578 ...
