word clustering

Search results for: word clustering

Number of results: 205729

Soochow University: Description and Analysis of the Chinese Word Sense Induction System for CLP2010

2010

Hua Xu Bing Liu Longhua Qian Guodong Zhou

Recent studies on word sense induction (WSI) mainly concentrate on European languages; Chinese word sense induction is becoming popular as it presents a new challenge to WSI. In this paper, we propose a feature-based approach to this problem using the spectral clustering algorithm. We also compare various clustering algorithms and similarity metrics. Experimental results show that our system ac...

Unsupervised Relation Disambiguation with Order Identification Capabilities

2006

Jinxiu Chen Dong-Hong Ji Chew Lim Tan Zheng-Yu Niu

We present an unsupervised learning approach to disambiguate various relations between named entities using various lexical and syntactic features from the contexts. It works by calculating eigenvectors of an adjacency graph’s Laplacian to recover a submanifold of data from a high-dimensional space and then performing cluster number estimation on the eigenvectors. This method can address ...

Discriminating Among Word Meanings by Identifying Similar Contexts

2004

Amruta Purandare Ted Pedersen

Word sense discrimination is an unsupervised clustering problem, which seeks to discover which instances of a word (or words) are used with the same meaning. This is done strictly based on information found in raw corpora, without using any sense-tagged text or other existing knowledge sources. Our particular focus is to systematically compare the efficacy of a range of lexical features, context represent...

SenseClusters - Finding Clusters that Represent Word Senses

2004

Amruta Purandare Ted Pedersen

SenseClusters is a freely available word sense discrimination system that takes a purely unsupervised clustering approach. It uses no knowledge other than what is available in a raw unstructured corpus, and clusters instances of a given target word based only on their mutual contextual similarities. It is a complete system that provides support for feature selection from large corpora, several ...
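
As a rough illustration of clustering instances "based only on their mutual contextual similarities", the sketch below builds bag-of-words vectors for a few contexts of one target word and computes their pairwise cosine similarities. It is not the SenseClusters implementation: the function names, the whitespace tokenization, and the example contexts are all assumptions made for this example.

```python
import math
from collections import Counter


def context_vector(context: str) -> Counter:
    """Bag-of-words count vector for one context of the target word."""
    return Counter(context.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    # Counter returns 0 for missing words, so only shared words contribute.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical contexts for the target word "bank" (not taken from any corpus).
contexts = [
    "deposit money at the bank branch",
    "the bank approved the loan and the deposit",
    "fishing on the river bank near the water",
]
vectors = [context_vector(c) for c in contexts]

# Pairwise similarity matrix that a clustering algorithm could consume.
similarity = [[cosine_similarity(u, v) for v in vectors] for u in vectors]
```

Here the two financial contexts score higher with each other than either does with the river context, which is the signal a clustering step would then group on. A real system would typically filter stop words and use richer features than raw counts.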

Chinese Word Sense Induction with Basic Clustering Algorithms

2010

Yuxiang Jia Shiwen Yu Zhengyan Chen

Word Sense Induction (WSI) is an important topic in natural language processing. For the bakeoff task Chinese Word Sense Induction (CWSI), this paper proposes two systems using basic clustering algorithms, k-means and agglomerative clustering. Experimental results show that k-means achieves a better performance. Based only on the data provided by the task organizers, the two systems get FSc...
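
For readers unfamiliar with k-means, one of the two basic algorithms named above, here is a minimal sketch of Lloyd's algorithm in plain Python. It does not reproduce the paper's systems or features: the 2-D instance vectors are invented, and seeding the centroids with the first k points is an assumption made only to keep the run deterministic.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(dim) / n for dim in zip(*points)]


def k_means(points: List[Vector], k: int, max_iter: int = 100) -> List[int]:
    """Lloyd's algorithm; returns a cluster index for each point."""
    # Assumption: seed centroids with the first k points (deterministic).
    centroids = [list(p) for p in points[:k]]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no assignment changed
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean(members)
    return labels


# Hypothetical 2-D feature vectors for six instances of one target word.
instances = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9), (0.1, 0.2), (4.9, 5.0)]
labels = k_means(instances, k=2)
```

With these toy vectors the instances alternate between two well-separated groups, so `labels` alternates between the two cluster indices. In practice the number of clusters and the initialization both affect the result.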

Improving generative statistical parsing with semi-supervised word clustering

2009

Marie Candito Benoît Crabbé

We present a semi-supervised method to improve statistical parsing performance. We focus on the well-known problem of lexical data sparseness and present experiments of word clustering prior to parsing. We use a combination of lexicon-aided morphological clustering that preserves tagging ambiguity, and unsupervised word clustering, trained on a large unannotated corpus. We apply these clustering...

On Knowledge-Enhanced Document Clustering

Journal: IJIRR 2012

Manjeet Rege Josan Koruthu Reynold Bailey

In text analytics (Srivastava & Sahami 2009), document clustering refers to the problem of automatically grouping documents into different groups (known as clusters), such that documents in one cluster are similar to each other while being dissimilar from the ones in a different cluster. Typically, the dataset is represented using the vector model in which a set of m documents with n unique wor...
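
The vector model mentioned above can be pictured as an m x n matrix with one row per document and one column per unique word. The sketch below builds such a matrix of raw counts. It is only an illustration: the function name, the whitespace tokenization, and the two example documents are assumptions, and the abstract does not say which term weighting the authors use.

```python
from typing import List, Tuple


def document_term_matrix(docs: List[str]) -> Tuple[List[str], List[List[int]]]:
    """Return the sorted vocabulary and an m x n matrix of word counts."""
    tokenized = [doc.lower().split() for doc in docs]
    # n unique words, sorted so that column order is reproducible.
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    index = {word: j for j, word in enumerate(vocabulary)}
    # One row per document (m rows), one column per vocabulary word (n columns).
    matrix = [[0] * len(vocabulary) for _ in docs]
    for i, tokens in enumerate(tokenized):
        for word in tokens:
            matrix[i][index[word]] += 1
    return vocabulary, matrix


# Hypothetical two-document collection.
docs = ["word clustering groups words", "document clustering groups documents"]
vocabulary, matrix = document_term_matrix(docs)
```

Each row of `matrix` is the vector for one document, so any vector-based clustering algorithm can operate on the rows directly.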

Generalised Brown Clustering and Roll-Up Feature Generation

2016

Leon Derczynski Sean Chester

Brown clustering is an established technique, used in hundreds of computational linguistics papers each year, to group word types that have similar distributional information. It is unsupervised and can be used to create powerful word representations for machine learning. Despite its improbable success relative to more complex methods, few have investigated whether Brown clustering has really b...

Improving semantic topic clustering for search queries with word co-occurrence and bigraph co-clustering

2016

Jing Kong Alex Scott Georg M. Goerg

Uncovering common themes from a large number of unorganized search queries is a primary step to mine insights about aggregated user interests. Common topic modeling techniques for document modeling often face sparsity problems with search query data as these are much shorter than documents. We present two novel techniques that can discover semantically meaningful topics in search queries: i) wo...

Web Document Clustering through Metafile Generation for Digraph Structure Using Document Index Graph

2014

Paruhum Silalahi

Clustering techniques are often used to group text documents. Graph-based modeling and representation of the document clustering process can be done using the Document Index Graph (DIG) algorithm. This study aims to implement the DIG algorithm to design the digraph structure used for graphical representation of the web document clustering process. The data used is the REUTERS-21578 ...
