word clustering

Modeling Word Senses With Fuzzy Clustering

2003

Erik Velldal

This thesis describes a clustering approach to automatically inferring soft semantic classes and characterizing senses of a set of Norwegian nouns. The words are represented by way of their distribution in text, identified as local contexts in the form of lexical-syntactic relations. Through a shallow processing step the context features are extracted for lemmatized word forms in syntactically ...
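As a rough illustration of what "soft" class membership means, the sketch below computes fuzzy c-means style membership degrees for one item against several cluster centroids. The abstract does not say which fuzzy algorithm the thesis uses, so fuzzy c-means is an assumption here, and the function name `fuzzy_memberships`, the fuzzifier `m`, and the toy 2-D vectors are all hypothetical.

```python
import math

def fuzzy_memberships(point, centroids, m=2.0):
    """Fuzzy c-means membership of one point in each cluster (values sum to 1)."""
    dists = [math.dist(point, c) for c in centroids]
    # A point lying exactly on one or more centroids belongs fully to them.
    if 0.0 in dists:
        hits = dists.count(0.0)
        return [1.0 / hits if d == 0.0 else 0.0 for d in dists]
    exponent = 2.0 / (m - 1.0)
    # Standard update: u_i = 1 / sum_j (d_i / d_j)^(2 / (m - 1))
    return [1.0 / sum((d_i / d_j) ** exponent for d_j in dists) for d_i in dists]

# Toy feature vectors (made up): one word, two class centroids.
centroids = [(0.0, 0.0), (4.0, 0.0)]
word_vector = (1.0, 0.0)
print([round(u, 3) for u in fuzzy_memberships(word_vector, centroids)])  # → [0.9, 0.1]
```

Unlike a hard assignment, the word keeps a small degree of membership in the second class, which is the kind of graded output a sense-characterization step could build on.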

Constrained Text Clustering Using Word Trigrams

2012

M. Eduardo Ares, Álvaro Barreiro

In recent years the field of Constrained Clustering has emerged, proposing clustering algorithms that can accommodate domain information to obtain a better final grouping. This information is usually provided as pairwise constraints, whose acquisition from humans can be costly. In this paper we propose a novel method based on word n-grams to automatically extract positive co...
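One way to picture n-gram-derived pairwise constraints is the minimal sketch below: any two documents that share at least one word trigram are proposed as a pair that should end up in the same cluster. This is an assumption about the general idea, not the paper's actual procedure (the abstract is truncated), and the helper names `word_trigrams` and `must_link_pairs`, the `min_shared` threshold, and the sample documents are hypothetical.

```python
from itertools import combinations

def word_trigrams(text):
    """Return the set of word trigrams in a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + 3]) for i in range(len(tokens) - 2)}

def must_link_pairs(docs, min_shared=1):
    """Propose a pair (i, j) for every two documents sharing >= min_shared trigrams."""
    grams = [word_trigrams(d) for d in docs]
    pairs = []
    for i, j in combinations(range(len(docs)), 2):
        if len(grams[i] & grams[j]) >= min_shared:
            pairs.append((i, j))
    return pairs

docs = [
    "the central bank raised interest rates again",
    "analysts expect the central bank raised interest further",
    "the football team won the final match",
]
print(must_link_pairs(docs))  # → [(0, 1)]
```

The resulting pairs could then be handed to any constrained clustering algorithm that accepts pairwise constraints, in place of constraints collected from human annotators.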

New Word Vector Representation for Semantic Clustering

Journal: TAL, 2009

Salma Jamoussi

ABSTRACT. The idea we defend in this article is that meaningful semantic concepts can be obtained through automatic classification methods. To this end, we begin by proposing measures that quantify the semantic relations between words. We then use unsupervised classification methods to build the concepts in a ...

Clustering multilingual documents by estimating text-to-text semantic relatedness

2010

Dani Yogatama

This thesis is about multilingual document clustering through estimating semantic relatedness between multilingual texts. Specifically, we focus on the task of clustering multilingual documents with very limited or no supervisory information. We present two approaches to address the problem: a comparable-corpora based approach and a web-searches based approach. Our first approach derives pairwi...

Bilingual Word Spectral Clustering for Statistical Machine Translation

2005

Bing Zhao, Eric P. Xing, Alexander H. Waibel

In this paper, a variant of a spectral clustering algorithm is proposed for bilingual word clustering. The proposed algorithm generates the two sets of clusters for both languages efficiently with high semantic correlation within monolingual clusters, and high translation quality across the clusters between two languages. Each cluster level translation is considered as a bilingual concept, whic...

Document Representation with Statistical Word Senses in Cross-Lingual Document Clustering

Journal: IJPRAI, 2015

Guoyu Tang, Yunqing Xia, Erik Cambria, Peng Jin, Thomas Fang Zheng

Cross-lingual document clustering is the task of automatically organizing a large collection of multilingual documents into a few clusters, depending on their content or topic. It is well known that the language barrier and translation ambiguity are two challenging issues for cross-lingual document representation. To this end, we propose to represent cross-lingual documents through statistical wor...

Analysing the Semantic Change Based on Word Embedding

2016

Xuanyi Liao, Guang Cheng

This paper presents an approach to analysing changes in word meaning based on word embeddings, which are a more general method of quantifying words than earlier ones. By analysing similar words and clustering in different periods, semantic change can be detected. We analysed the trend of semantic change with a density-based clustering method called DBSCAN. Statistics and data visualization is ...
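For readers unfamiliar with DBSCAN, the following is a minimal pure-Python sketch of the algorithm applied to toy vectors. It is not the authors' implementation: the Euclidean distance, the `eps` and `min_pts` values, and the made-up 2-D points standing in for word embeddings are all assumptions chosen for illustration.

```python
import math

def region_query(points, idx, eps):
    """Indices of all points within eps of points[idx] (including idx itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[idx], q) <= eps]

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE          # may later be claimed as a border point
            continue
        cluster += 1                   # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point: joins but does not expand
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep expanding
    return labels

# Toy 2-D "embeddings" (made up): two dense groups and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
          (10.0, 0.0)]
print(dbscan(points, eps=0.5, min_pts=3))  # → [0, 0, 0, 1, 1, 1, -1]
```

Running the same clustering on the neighbours of a word in embeddings trained on different periods, and comparing the resulting groups, is one plausible way to surface a shift in meaning.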

Interactive Clustering Techniques for Selecting Speaker-Independent Reference Templates for Isolated Word Recognition

2002

Jay G. Wilpon

It is demonstrated that clustering can be a powerful tool for selecting reference templates for speaker-independent word recognition. We describe a set of clustering techniques specifically designed for this purpose. These interactive procedures identify coarse structure, fine structure, overlap of, and outliers from clusters. The techniques have been applied to a large speech data base consisti...

Word Clustering Based on Un-LP Algorithm

2014

Jiguang Liang, Xiaofei Zhou, Yue Hu, Li Guo, Shuo Bai

Word clustering, which generalizes specific features, groups words in the same syntactic or semantic category together. It is an effective approach to reducing feature dimensionality and feature sparseness, which is clearly useful for many NLP applications. This paper proposes an unsupervised label propagation algorithm (Un-LP) for word clustering which uses multi-exemplars to represent a cl...

Novel weighting scheme for unsupervised language model adaptation using latent dirichlet allocation

2010

Md. Akmal Haidar, Douglas D. O'Shaughnessy

A new approach for computing weights of topic models in language model (LM) adaptation is introduced. We formed topic clusters by a hard-clustering method assigning one topic to one document based on the maximum number of words chosen from a topic for that document in Latent Dirichlet Allocation (LDA) analysis. The new weighting idea is that the unigram count of the topic generated by hard-clus...
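The hard-clustering step described above can be sketched as follows, assuming the per-token topic assignments from an LDA run are already available as `(word, topic)` pairs. The function names `hard_assign` and `topic_unigram_counts`, the pooling of counts per topic, and the toy data are hypothetical; the abstract is truncated before it explains how the counts enter the weighting, so that part is not shown.

```python
from collections import Counter

def hard_assign(doc_topic_assignments):
    """Map each document to the topic assigned to the largest number of its tokens."""
    return [Counter(z for _, z in doc).most_common(1)[0][0]
            for doc in doc_topic_assignments]

def topic_unigram_counts(doc_topic_assignments):
    """Pool the word counts of all documents hard-assigned to the same topic."""
    counts = {}
    doc_topics = hard_assign(doc_topic_assignments)
    for doc, topic in zip(doc_topic_assignments, doc_topics):
        counts.setdefault(topic, Counter()).update(w for w, _ in doc)
    return counts

# Toy input (made up): each document is a list of (word, LDA topic id) pairs.
docs = [
    [("stock", 0), ("market", 0), ("game", 1)],
    [("team", 1), ("game", 1), ("market", 0)],
    [("stock", 0), ("price", 0)],
]
print(hard_assign(docs))                        # → [0, 1, 0]
print(topic_unigram_counts(docs)[0]["stock"])   # → 2
```

Each document contributes all of its words to exactly one topic cluster, which is what distinguishes this hard assignment from LDA's usual mixed-membership view of a document.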
