word clustering

Multilingual Word Sense Induction to Improve Web Search Result Clustering

2015

Lorenzo Albano Domenico Beneventano Sonia Bergamaschi

In [12], a novel approach to Web search result clustering based on Word Sense Induction, i.e. the automatic discovery of word senses from raw text, was presented. Key to the proposed approach is the idea of first automatically inducing senses for the target query and then clustering the search results based on their semantic similarity to the induced word senses. In [1] we proposed an innov...

Hierarchical clustering of word class distributions

2012

Grzegorz Chrupała

We propose an unsupervised approach to POS tagging in which we first associate each word type with a probability distribution over word classes using Latent Dirichlet Allocation. Then we create a hierarchical clustering of the word types: we use an agglomerative clustering algorithm where the distance between clusters is defined as the Jensen-Shannon divergence between the probability distributions...
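The clustering step described in this abstract can be illustrated with a minimal sketch. It assumes that each word type already has a distribution over word classes (the toy values below are invented), and that a merged cluster is represented by the size-weighted average of its members' distributions. That averaging rule, and the names `js_divergence` and `agglomerate`, are assumptions made for illustration, not details taken from the paper.

```python
import math
from itertools import combinations

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions (base 2, range [0, 1])."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def agglomerate(dists):
    """Greedy bottom-up clustering of word types.

    dists maps each word to its distribution over word classes.
    Returns the list of merges, in the order they were performed.
    Assumption: a cluster's distribution is the size-weighted mean of its members.
    """
    clusters = {(w,): (list(p), 1) for w, p in dists.items()}
    merges = []
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest Jensen-Shannon divergence.
        a, b = min(
            combinations(clusters, 2),
            key=lambda ab: js_divergence(clusters[ab[0]][0], clusters[ab[1]][0]),
        )
        (pa, na), (pb, nb) = clusters.pop(a), clusters.pop(b)
        merged = [(na * x + nb * y) / (na + nb) for x, y in zip(pa, pb)]
        clusters[a + b] = (merged, na + nb)
        merges.append((a, b))
    return merges

# Toy word-class distributions (invented): two noun-like words and one verb-like word.
word_class_dists = {"cat": [0.9, 0.1], "dog": [0.8, 0.2], "run": [0.1, 0.9]}
for left, right in agglomerate(word_class_dists):
    print(left, "+", right)
```

With these toy values, the two noun-like words are merged first because their distributions are closest under the divergence. The sequence of merges defines the hierarchy.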

Semantic Word Clusters Using Signed Spectral Clustering

2017

João Sedoc Jean Gallier Dean P. Foster Lyle H. Ungar

Vector space representations of words capture many aspects of word similarity, but such methods tend to produce vector spaces in which antonyms (as well as synonyms) are close to each other. For spectral clustering using such word embeddings, words are points in a vector space where synonyms are linked with positive weights, while antonyms are linked with negative weights. We present a new sign...
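To make the idea of positive and negative edge weights concrete, here is a small sketch of one common construction for signed graphs: the signed Laplacian, in which the degree of a node is the sum of the absolute values of its edge weights. Whether this matches the exact formulation used by the authors is an assumption, since the abstract is truncated; the word graph and the function names are invented for illustration.

```python
def signed_laplacian(W):
    """Signed Laplacian L = D - W, where D_ii = sum_j |W_ij| (absolute-value degree)."""
    n = len(W)
    L = []
    for i in range(n):
        degree = sum(abs(w) for w in W[i])
        L.append([(degree if i == j else 0) - W[i][j] for j in range(n)])
    return L

def quadratic_form(L, x):
    """Compute x^T L x; small values mean the labelling x agrees with the edge signs."""
    n = len(x)
    return sum(x[i] * L[i][j] * x[j] for i in range(n) for j in range(n))

# Hypothetical word graph: hot/warm are synonyms (+1), both are antonyms of cold (-1).
words = ["hot", "warm", "cold"]
W = [
    [0, 1, -1],
    [1, 0, -1],
    [-1, -1, 0],
]
L = signed_laplacian(W)

# Labelling that puts synonyms together and antonyms apart incurs no penalty.
print(quadratic_form(L, [1, 1, -1]))  # 0
# Labelling that puts all three words in one group is penalised.
print(quadratic_form(L, [1, 1, 1]))   # 8
```

The quadratic form rewards labellings in which positively linked words share a sign and negatively linked words take opposite signs. Spectral methods relax this to real-valued vectors and take eigenvectors of `L` associated with the smallest eigenvalues.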

Update Legal Documents Using Hierarchical Ranking Models and Word Clustering

2010

Minh Quang Nhat Pham Minh Le Nguyen Akira Shimazu

Our research addresses the task of updating legal documents when new information emerges. In this paper, we apply a hierarchical ranking model to the task of updating legal documents. Word clustering features are incorporated into the ranking models to exploit semantic relations between words. Experimental results on legal data built from the United States Code show that the hierarchical ranking...

A Probabilistic Topic Model Based on Local Word Relationships in Overlapping Windows

Journal: Signal and Data Processing 2019

Marzieh Rahimi, Morteza Zahedi, Hoda Mashayekhi

A probabilistic topic model assumes that documents are generated through a process involving topics and then tries to reverse this process, given the documents and extract topics. A topic is usually assumed to be a distribution over words. LDA is one of the first and most popular topic models introduced so far. In the document generation process assumed by LDA, each document is a distribution o...
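The document generation process that a topic model tries to reverse can be sketched for standard LDA as follows: draw a topic mixture for the document from a Dirichlet prior, then for each word position draw a topic from that mixture and a word from the chosen topic. This is a minimal sketch of the textbook LDA process only, not of the model proposed in this paper; the vocabulary, topic distributions and hyperparameters are invented for illustration.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one sample from Dirichlet(alpha) by normalising independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(topics, vocab, alpha, length, rng):
    """Generate one document under the standard LDA generative process.

    topics[k] is topic k's distribution over vocab; alpha is the Dirichlet prior
    over topic mixtures. Returns the sampled topic mixture and the word list.
    """
    theta = sample_dirichlet(alpha, rng)  # per-document topic mixture
    words = []
    for _ in range(length):
        z = rng.choices(range(len(topics)), weights=theta)[0]  # topic for this position
        words.append(rng.choices(vocab, weights=topics[z])[0])  # word from that topic
    return theta, words

# Invented toy setup: two topics over a four-word vocabulary.
vocab = ["cluster", "word", "court", "law"]
topics = [
    [0.5, 0.4, 0.05, 0.05],  # a "clustering" topic
    [0.05, 0.05, 0.4, 0.5],  # a "legal" topic
]
rng = random.Random(0)
theta, words = generate_document(topics, vocab, alpha=[0.5, 0.5], length=8, rng=rng)
print(theta)
print(words)
```

Inference runs in the opposite direction: given only the words, it estimates the topic distributions and the per-document mixtures.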

A modified K-means clustering algorithm for use in isolated word recognition

Journal: IEEE Trans. Acoustics, Speech, and Signal Processing 1985

Jay G. Wilpon Lawrence R. Rabiner

Studies of isolated word recognition systems have shown that a set of carefully chosen templates can be used to bring the performance of speaker-independent systems up to that of systems trained to the individual speaker. The earliest work in this area used a sophisticated set of pattern recognition algorithms in a human-interactive mode to create the set of templates (multiple patterns) for ea...

Constrained Coclustering for Textual Documents

2010

Yangqiu Song Shimei Pan Shixia Liu Furu Wei Michelle X. Zhou Weihong Qian

In this paper, we present a constrained co-clustering approach for clustering textual documents. Our approach combines the benefits of information-theoretic co-clustering and constrained clustering. We use a two-sided hidden Markov random field (HMRF) to model both the document and word constraints. We also develop an alternating expectation maximization (EM) algorithm to optimize the constrain...

Developing the Persian Version of the Homophone Meaning Generation Test

Journal: Medical Journal of the Islamic Republic of Iran

Mona Ebrahimipour (Department of Speech Therapy, School of Rehabilitation, Iran University of Medical Sciences, Tehran, Iran); Mohammad Reza Motamed (Department of Neurology, Iran University of Medical Sciences, Tehran, Iran); Hassan Ashayeri (Department of Basic Sciences in Rehabilitation, School of Rehabilitation, Iran University of Medical Sciences, Tehran, Iran); Yahya Modarresi (Department of Linguistics, Human Sciences and Cultural Education Institute, Tehran, Iran); Mohammad Kamali (Department of Basic Sciences in Rehabilitation, Iran University of Medical Sciences, School of Rehabilitation Sciences, Tehran, Iran)

Background: Finding the right word is a necessity in communication, and its evaluation has always been a challenging clinical issue, suggesting the need for valid and reliable measurements. The Homophone Meaning Generation Test (HMGT) can measure the ability to switch between verbal concepts, which is required in word retrieval. The purpose of this study was to adapt and validate the Persian ve...

Clustering of Polysemic Words

2006

Laurent Cicurel Stephan Bloehdorn Philipp Cimiano

In this paper, we propose an approach for constructing clusters of related terms that may be used for deriving formal conceptual structures at a later stage. In contrast to previous approaches in this direction, we explicitly take into account the fact that words can have different, possibly even unrelated, meanings. To account for such ambiguities in word meaning, we consider two alternative s...