word clustering

Search results for: word clustering

Number of results: 205729

BIRA: Improved Predictive Exchange Word Clustering

2016

Jon Dehdari, Liling Tan, Josef van Genabith

Word clusters are useful for many NLP tasks, including training neural network language models, but the current growth in dataset size is outpacing the ability of word clusterers to handle it. Little attention has been paid thus far to inducing high-quality word clusters at a large scale. The predictive exchange algorithm is quite scalable, but sometimes does not provide as good perplexity as othe...
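To make the predictive exchange algorithm concrete, here is a minimal sketch, assuming the predictive class bigram model P(w_i | w_{i-1}) = P(c(w_i) | w_{i-1}) · P(w_i | c(w_i)) with maximum-likelihood counts. The function names `predictive_log_likelihood` and `exchange`, the round-robin initialisation and the stopping rule are illustrative assumptions, not BIRA's implementation; in particular, the sketch recomputes the whole objective for every candidate move, whereas scalable implementations update counts incrementally.

```python
import math
from collections import Counter


def predictive_log_likelihood(bigrams, word2class):
    """Log-likelihood of P(w_i | w_{i-1}) = P(c(w_i) | w_{i-1}) * P(w_i | c(w_i))
    with maximum-likelihood estimates, given bigram counts {(v, w): n}."""
    hist = Counter()         # N(v): occurrences of v as the conditioning word
    word_right = Counter()   # N(w): occurrences of w as the predicted word
    word_class = Counter()   # N(v, c): how often a word of class c follows v
    class_right = Counter()  # N(c): occurrences of class c as the predicted class
    for (v, w), n in bigrams.items():
        c = word2class[w]
        hist[v] += n
        word_right[w] += n
        word_class[(v, c)] += n
        class_right[c] += n
    ll = sum(n * math.log(n / hist[v]) for (v, _), n in word_class.items())
    ll += sum(n * math.log(n / class_right[word2class[w]]) for w, n in word_right.items())
    return ll


def exchange(tokens, num_classes, max_iter=10):
    """Greedy exchange: move each word to the class that most increases the
    log-likelihood; stop when a full pass over the vocabulary moves no word."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    vocab = [w for w, _ in Counter(tokens).most_common()]
    # Hypothetical deterministic initialisation: deal words round-robin by frequency rank.
    word2class = {w: i % num_classes for i, w in enumerate(vocab)}
    best = predictive_log_likelihood(bigrams, word2class)
    for _ in range(max_iter):
        moved = False
        for w in vocab:
            original = best_class = word2class[w]
            for c in range(num_classes):
                if c == original:
                    continue
                word2class[w] = c
                # Naive full recomputation, kept for clarity rather than speed.
                ll = predictive_log_likelihood(bigrams, word2class)
                if ll > best + 1e-12:
                    best, best_class = ll, c
            word2class[w] = best_class
            moved = moved or best_class != original
        if not moved:
            break
    return word2class, best
```

For example, `exchange("the cat sat on the mat the dog sat on the rug".split(), 2)` returns a word-to-class mapping and its log-likelihood. Because a move is accepted only when it strictly improves the objective, the log-likelihood never decreases from the initial assignment.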

Word Sense Induction & Disambiguation Using Hierarchical Random Graphs

2010

Ioannis P. Klapaftis, Suresh Manandhar

Graph-based methods have gained attention in many areas of Natural Language Processing (NLP), including Word Sense Disambiguation (WSD), text summarization, keyword extraction and others. Most of the work in these areas formulates the problem in a graph-based setting and applies unsupervised graph clustering to obtain a set of clusters. Recent studies suggest that graphs often exhibit a hierarchi...
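The abstract does not say which flat graph clustering algorithm the earlier work uses, and the paper itself is about hierarchical random graphs, which are not shown here. As a swapped-in illustration of "unsupervised graph clustering" over a word graph, the sketch below implements Chinese Whispers, a label-propagation algorithm by Biemann that is often used for word sense induction. Deterministic node order and tie-breaking towards the smallest label are assumptions made so the result is reproducible.

```python
from collections import defaultdict


def chinese_whispers(edges, iterations=20):
    """Cluster an undirected weighted graph given as {(u, v): weight}."""
    graph = defaultdict(dict)
    for (u, v), w in edges.items():
        graph[u][v] = graph[u].get(v, 0) + w
        graph[v][u] = graph[v].get(u, 0) + w
    nodes = sorted(graph)
    label = {n: i for i, n in enumerate(nodes)}  # every node starts in its own class
    for _ in range(iterations):
        changed = False
        for n in nodes:
            scores = defaultdict(float)
            for nb, w in graph[n].items():
                scores[label[nb]] += w
            # Adopt the label with the highest total edge weight; ties go to the smallest label.
            best = min(scores, key=lambda lab: (-scores[lab], lab))
            if best != label[n]:
                label[n] = best
                changed = True
        if not changed:
            break
    clusters = defaultdict(set)
    for n, lab in label.items():
        clusters[lab].add(n)
    return sorted(clusters.values(), key=sorted)
```

On a toy co-occurrence graph made of two triangles, {bank, money, loan} and {river, water, shore}, the labels converge to one cluster per triangle, which is the behaviour a sense-inducing graph clusterer is expected to show on well-separated neighbourhoods.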

Using Pseudowords for Algorithm Comparison: An Evaluation Framework for Graph-based Word Sense Induction

2017

Flavio Massimiliano Cecchini, Christian Biemann, Martin Riedl

In this paper we define two parallel data sets based on pseudowords, extracted from the same corpus. Both consist of word-centered graphs for each of 1225 different pseudowords, and use first-order co-occurrences and second-order semantic similarities, respectively. We propose an evaluation framework on these data sets for graph-based Word Sense Induction (WSI) focused on the case of coarseg...
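Pseudowords work because conflating two real words into one artificial token creates an ambiguity whose gold "senses" are known for free. A minimal sketch, assuming tokenised sentences and a simple purity score for comparing induced clusters with the gold labels; the function names and the choice of purity are illustrative assumptions, not the evaluation framework defined in the paper.

```python
from collections import Counter, defaultdict


def make_pseudoword_corpus(sentences, words):
    """Replace each of `words` with one artificial ambiguous token and keep the
    replaced word as the gold 'sense' of that occurrence."""
    pseudo = "_".join(words)
    targets = set(words)
    corpus, gold = [], []
    for tokens in sentences:
        new_tokens = []
        for tok in tokens:
            if tok in targets:
                gold.append(tok)          # gold sense = the original word
                new_tokens.append(pseudo)
            else:
                new_tokens.append(tok)
        corpus.append(new_tokens)
    return pseudo, corpus, gold


def purity(predicted, gold):
    """Credit each induced cluster with its most frequent gold sense."""
    by_cluster = defaultdict(Counter)
    for p, g in zip(predicted, gold):
        by_cluster[p][g] += 1
    return sum(c.most_common(1)[0][1] for c in by_cluster.values()) / len(gold)
```

For instance, conflating "banana" and "door" turns "a door and a banana" into "a banana_door and a banana_door", with gold senses `["door", "banana"]` for those two occurrences; an induction system that separates them perfectly scores a purity of 1.0.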

Word Clustering and Disambiguation Based on Co-occurrence Data

2002

Hang Li

We address the problem of clustering words (or constructing a thesaurus) based on co-occurrence data, and of conducting syntactic disambiguation by using the acquired word classes. We view the clustering problem as that of estimating a class-based probability distribution specifying the joint probabilities of word pairs. We propose an efficient algorithm based on the Minimum Description Length (MDL...
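A class-based joint distribution over word pairs can be written as P(w1, w2) = P(C1, C2) · P(w1 | C1) · P(w2 | C2), where C1 and C2 are the classes of the two words. The sketch below only shows how a fixed clustering defines such a distribution from co-occurrence pairs; it does not implement the MDL-based search for the clustering. Estimating P(w | C) by relative frequency within the class is an assumption of this sketch, and `class_based_joint` is a hypothetical name.

```python
from collections import Counter


def class_based_joint(pairs, class1, class2):
    """Return prob(w1, w2) = P(C1, C2) * P(w1 | C1) * P(w2 | C2), all estimated
    by relative frequency from the observed (w1, w2) pairs."""
    total = len(pairs)
    class_pair = Counter((class1[a], class2[b]) for a, b in pairs)
    w1_count = Counter(a for a, _ in pairs)
    w2_count = Counter(b for _, b in pairs)
    c1_count = Counter(class1[a] for a, _ in pairs)
    c2_count = Counter(class2[b] for _, b in pairs)

    def prob(a, b):
        ca, cb = class1[a], class2[b]
        p_classes = class_pair[(ca, cb)] / total
        return p_classes * (w1_count[a] / c1_count[ca]) * (w2_count[b] / c2_count[cb])

    return prob
```

The design point is that an unseen pair such as ("sip", "coffee") still receives non-zero probability whenever its two classes co-occur, which is what makes the acquired classes useful for disambiguation.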

Exploring Word Embeddings and Character N-Grams for Author Clustering

2016

Yunita Sari, Mark Stevenson

We present our system for the PAN 2016 Author Clustering task. Our software uses simple character n-grams to represent the document collection. We then run K-Means clustering optimized using the Silhouette Coefficient. Our system yields competitive results and requires only a short runtime. Character n-grams can capture a wide range of information, making them effective for authorship attribution...
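The pipeline in this abstract (character n-gram vectors, K-Means, Silhouette Coefficient) can be sketched in plain Python. The abstract is truncated before saying exactly how the silhouette is used; this sketch assumes it selects the number of clusters k. Character trigrams, L2-normalised counts, Euclidean distance and farthest-first initialisation are also assumptions, not the authors' reported settings.

```python
import math
from collections import Counter
from statistics import mean


def char_ngrams(text, n=3):
    """L2-normalised bag of character n-grams."""
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    norm = math.sqrt(sum(v * v for v in counts.values())) or 1.0
    return {g: v / norm for g, v in counts.items()}


def dist(a, b):
    """Euclidean distance between two sparse vectors stored as dicts."""
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in set(a) | set(b)))


def centroid(vectors):
    total = Counter()
    for v in vectors:
        total.update(v)
    return {k: x / len(vectors) for k, x in total.items()}


def kmeans(vecs, k, iters=50):
    centers = [vecs[0]]  # farthest-first initialisation keeps the sketch deterministic
    while len(centers) < k:
        centers.append(max(vecs, key=lambda v: min(dist(v, c) for c in centers)))
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda j: dist(v, centers[j])) for v in vecs]
        if new == labels:
            break
        labels = new
        for j in range(k):
            members = [v for v, lab in zip(vecs, labels) if lab == j]
            if members:
                centers[j] = centroid(members)
    return labels


def silhouette(vecs, labels):
    """Mean silhouette coefficient; -1.0 if fewer than two clusters are present."""
    if len(set(labels)) < 2:
        return -1.0
    scores = []
    for i, v in enumerate(vecs):
        same = [dist(v, u) for j, u in enumerate(vecs) if j != i and labels[j] == labels[i]]
        if not same:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = mean(same)
        b = min(mean([dist(v, u) for j, u in enumerate(vecs) if labels[j] == c])
                for c in set(labels) - {labels[i]})
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return mean(scores)


def cluster_documents(texts, max_k=5, n=3):
    """Return (silhouette, k, labels) for the best k in 2..min(max_k, len(texts) - 1),
    or None when there are fewer than three texts."""
    vecs = [char_ngrams(t, n) for t in texts]
    best = None
    for k in range(2, min(max_k, len(texts) - 1) + 1):
        labels = kmeans(vecs, k)
        score = silhouette(vecs, labels)
        if best is None or score > best[0]:
            best = (score, k, labels)
    return best
```

Keeping the vectors sparse (dicts keyed by n-gram) matches the observation that character n-gram spaces are large but each document touches only a small part of them.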

Word Clustering for Collocation-Based Word Sense Disambiguation

2007

Peng Jin, Xu Sun, Yunfang Wu, Shiwen Yu

The main disadvantage of collocation-based word sense disambiguation is that recall is low, although precision is relatively high. How can recall be improved without decreasing precision? In this paper, we investigate a word-class approach to extend the collocation list, which is constructed from the manually sense-tagged corpus. But the word classes are obtained from a larger-scale corpus which i...
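The word-class idea can be illustrated as follows: each collocate in the sense-tagged list lends its sense to the other words in its class, so contexts that contain only those class-mates can now be disambiguated (higher recall). The rule that a new word is added only when all seed collocates in its class agree on one sense is an assumption of this sketch, meant to protect precision; the paper's actual extension criteria are not given in the abstract, and both function names are hypothetical.

```python
from collections import Counter, defaultdict


def extend_collocations(collocations, word2class):
    """Extend {collocate: sense} for one target word with class-mates of each
    collocate, skipping words whose class contains seeds with different senses."""
    members = defaultdict(set)
    for w, c in word2class.items():
        members[c].add(w)
    votes = defaultdict(set)
    for collocate, sense in collocations.items():
        if collocate not in word2class:
            continue
        for w in members[word2class[collocate]]:
            if w not in collocations:
                votes[w].add(sense)
    extended = dict(collocations)
    for w, senses in votes.items():
        if len(senses) == 1:  # conflicting evidence is dropped to protect precision
            extended[w] = next(iter(senses))
    return extended


def disambiguate(context, collocations):
    """Majority vote over collocates found in the context; None if there are none."""
    found = Counter(collocations[w] for w in context if w in collocations)
    return found.most_common(1)[0][0] if found else None
```

With a seed list {"deposit": finance, "river": geography} and a class that groups "river" with "stream", the context "they fished in the stream" gets no answer from the seed list but is labelled geography by the extended one.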

Word Sense Disambiguation Based on Word Sense Clustering

2006

Henry Anaya-Sánchez, Aurora Pons-Porrata, Rafael Berlanga Llavori

Applying Spectral Clustering for Chinese Word Sense Induction

2010

Zhengyan He, Yang Song, Houfeng Wang

Sense induction is the process of identifying the sense of a word given its context, and is often treated as a clustering task. This paper explores the use of a spectral clustering method, which incorporates word features and n-gram features, to determine which cluster the word belongs to; each cluster represents one sense in the given document set.
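A minimal sketch of spectral clustering over occurrence feature vectors, in the normalised style of Ng, Jordan and Weiss: build a Gaussian affinity matrix, normalise it as D^{-1/2} W D^{-1/2}, embed each occurrence with the top k eigenvectors, row-normalise, then run k-means. The abstract does not specify the affinity, the bandwidth `sigma` or how word and n-gram features are combined, so those choices are assumptions; the input here is simply one numeric feature row per occurrence.

```python
import numpy as np


def spectral_clustering(features, k, sigma=1.0, iters=100):
    """Normalised spectral clustering (Ng-Jordan-Weiss style) of feature rows."""
    X = np.asarray(features, dtype=float)
    # Gaussian affinity between occurrence vectors, no self-loops.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))  # assumes no isolated points
    M = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order: keep the k largest.
    _, eigvecs = np.linalg.eigh(M)
    U = eigvecs[:, -k:]
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    # Plain k-means on the rows of U, with farthest-first initialisation.
    centers = [U[0]]
    while len(centers) < k:
        dmin = np.min([((U - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(U[int(np.argmax(dmin))])
    C = np.array(centers)
    labels = None
    for _ in range(iters):
        new = ((U[:, None, :] - C[None, :, :]) ** 2).sum(-1).argmin(axis=1)
        if labels is not None and np.array_equal(new, labels):
            break
        labels = new
        for j in range(k):
            if np.any(labels == j):
                C[j] = U[labels == j].mean(axis=0)
    return labels
```

Each returned label plays the role of one induced sense. The eigenvector embedding is what lets this method separate groups that are connected through the affinity graph but not necessarily compact in the original feature space.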

On efficient training of word classes and their application to recurrent neural network language models

2015

Rami Botros, Kazuki Irie, Martin Sundermeyer, Hermann Ney

In this paper, we investigated various word clustering methods by studying two clustering algorithms, Brown clustering and the exchange algorithm, and three objective functions derived from different class-based language models (CBLM): two-sided, predictive and conditional models. In particular, we focused on the implementation of the exchange algorithm with improved speed. In total, we compared s...
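To show what one of these objective functions looks like, the sketch below computes the log-likelihood of the two-sided class bigram model P(w_i | w_{i-1}) = P(c_i | c_{i-1}) · P(w_i | c_i) for a fixed clustering, with maximum-likelihood estimates; an exchange or Brown-style search would try to maximise this value. The conditional model is not shown because the abstract does not define it, and the function name is hypothetical.

```python
import math
from collections import Counter


def two_sided_log_likelihood(tokens, word2class):
    """MLE log-likelihood of P(w_i | w_{i-1}) = P(c_i | c_{i-1}) * P(w_i | c_i)."""
    bigrams = list(zip(tokens, tokens[1:]))
    class_bigram = Counter((word2class[v], word2class[w]) for v, w in bigrams)
    class_left = Counter(word2class[v] for v, _ in bigrams)    # classes as history
    word_right = Counter(w for _, w in bigrams)                 # words as prediction
    class_right = Counter(word2class[w] for _, w in bigrams)    # classes as prediction
    # Class transition term: sum over (g, h) of N(g, h) * log(N(g, h) / N_left(g)).
    ll = sum(n * math.log(n / class_left[g]) for (g, _), n in class_bigram.items())
    # Emission term: sum over w of N(w) * log(N(w) / N(c(w))).
    ll += sum(n * math.log(n / class_right[word2class[w]]) for w, n in word_right.items())
    return ll
```

Two limiting cases are useful sanity checks: with one class per word the value equals the word bigram log-likelihood, and with a single class it equals the unigram log-likelihood of the predicted words.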

ISCAS: A System for Chinese Word Sense Induction Based on K-means Algorithm

2010

Zhenzhong Zhang, Le Sun, Wenbo Li

This paper presents an unsupervised method for automatic Chinese word sense induction. The algorithm is based on clustering similar words according to the contexts in which they occur. First, the target word that needs to be disambiguated is represented as the vector of its contexts. Then, the matrix constituted by the vectors of target words is reconstructed through singular value decompositio...
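The steps described here (context vectors for each occurrence of the target word, an SVD-based reconstruction of the matrix, then K-means as in the title) can be sketched with NumPy. Bag-of-words contexts, L2 row normalisation, the reconstruction rank and farthest-first k-means initialisation are all assumptions of this sketch, since the abstract is cut off before giving those details.

```python
import numpy as np


def context_matrix(occurrences):
    """One row per occurrence of the target word: bag-of-words counts of its context."""
    vocab = sorted({w for ctx in occurrences for w in ctx})
    index = {w: i for i, w in enumerate(vocab)}
    X = np.zeros((len(occurrences), len(vocab)))
    for row, ctx in enumerate(occurrences):
        for w in ctx:
            X[row, index[w]] += 1.0
    return X


def kmeans(Z, k, iters=100):
    centers = [Z[0]]  # farthest-first initialisation for a deterministic sketch
    while len(centers) < k:
        dmin = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[int(np.argmax(dmin))])
    C = np.array(centers, dtype=float)
    labels = None
    for _ in range(iters):
        new = ((Z[:, None, :] - C[None, :, :]) ** 2).sum(-1).argmin(axis=1)
        if labels is not None and np.array_equal(new, labels):
            break
        labels = new
        for j in range(k):
            if np.any(labels == j):
                C[j] = Z[labels == j].mean(axis=0)
    return labels


def induce_senses(occurrences, k, rank=2):
    X = context_matrix(occurrences)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # assumes non-empty contexts
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    # Rank-`rank` reconstruction of the occurrence-by-context matrix.
    X_r = (U[:, :rank] * S[:rank]) @ Vt[:rank]
    return kmeans(X_r, k)
```

The low-rank reconstruction smooths the sparse context counts, so occurrences whose contexts share related words end up close together even when they share few exact words; each k-means cluster is then read as one induced sense.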
