Word clustering

Multitopic Text Clustering and Cluster Labeling Using Contextualized Word Embeddings

Journal: Radio Electronics, Computer Science, Control 2020

Text mining without document context

Journal: Inf. Process. Manage. 2006

Eric SanJuan, Fidelia Ibekwe-Sanjuan

We consider a challenging clustering task: the clustering of multi-word terms without document co-occurrence information in order to form coherent groups of topics. For this task, we developed a methodology taking as input multi-word terms and lexico-syntactic relations between them. Our clustering algorithm, named CPCL, is implemented in the TermWatch system. We compared CPCL to other existing ...
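The abstract stops before describing CPCL itself, so the following is only a minimal Python sketch of the simplest form of the task it describes: grouping terms using relations between them, with no document co-occurrence. It links any two related terms and returns the connected components (a single-link grouping via union-find). The function name `cluster_terms` and the input format of `(term, term)` pairs are hypothetical, and this is not the CPCL algorithm.

```python
from collections import defaultdict


def cluster_terms(terms, relations):
    """Group terms into the connected components of a relation graph.

    terms: iterable of (multi-word) term strings.
    relations: iterable of (term_a, term_b) pairs standing in for
    lexico-syntactic links between terms (hypothetical input format).
    """
    parent = {t: t for t in terms}

    def find(t):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for a, b in relations:
        # Terms that only appear in a relation still become nodes.
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two components

    groups = defaultdict(list)
    for t in parent:
        groups[find(t)].append(t)
    # Sort inside and across groups so the output is deterministic.
    return sorted(sorted(g) for g in groups.values())


terms = ["word clustering", "document clustering", "word sense",
         "word sense induction", "term sequence"]
relations = [("word sense", "word sense induction"),
             ("word clustering", "document clustering")]
print(cluster_terms(terms, relations))
```

Connected components are prone to chaining: a single spurious relation merges two otherwise unrelated groups.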

Semantic is Beautiful: Clustering and Diversifying Search Results with Graph-based Word Sense Induction

2012

Roberto Navigli

Web search result clustering aims to facilitate information search on the Web. Rather than presenting the results of a query as a flat list, these are grouped on the basis of their similarity and subsequently shown to the user as a list of possibly labeled clusters. Each cluster is supposed to represent a different meaning of the input query, thus taking into account the language ambiguity issu...

A Practical Solution to the Problem of Automatic Word Sense Induction

2004

Reinhard Rapp

Recent studies in word sense induction are based on clustering global co-occurrence vectors, i.e. vectors that reflect the overall behavior of a word in a corpus. If a word is semantically ambiguous, this means that these vectors are mixtures of all its senses. Inducing a word’s senses therefore involves the difficult problem of recovering the sense vectors from the mixtures. In this paper we a...
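To make the mixture problem concrete, here is a minimal Python sketch that builds global co-occurrence vectors by pooling context counts over every occurrence of a word, plus the cosine similarity commonly used to compare such vectors. The window size of 2 and the use of raw counts (no association weighting) are assumptions for illustration; the excerpt does not say how Rapp builds his vectors.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Build one global co-occurrence vector per word type.

    Each vector counts the words seen within `window` tokens of any
    occurrence of the word, pooled over the whole corpus.
    """
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[k] for k, count in u.items() if k in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


sentences = [
    "the river bank was muddy".split(),
    "the bank approved the loan".split(),
]
vectors = cooccurrence_vectors(sentences)
# "bank" pools contexts from both senses into a single vector.
print(vectors["bank"])
```

In this toy corpus the vector for `bank` contains both `river` and `approved`, i.e., the contexts of two senses are summed into one vector, which is exactly the mixture the abstract says must be undone.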

Offline Language-free Writer Identification based on Speeded-up Robust Features

Journal: International Journal of Engineering 2015

Manoj Sharma, Vijaypal Dhaka

This article proposes an offline, language-free writer identification method based on speeded-up robust features (SURF) that proceeds through training, enrollment, and identification stages. In all stages, an isotropic Box filter is first used to segment the handwritten text image into word regions (WRs). Then, the SURF descriptors (SUDs) of each word region and the corresponding scales and orientations (SOs) are extr...

Improving Distributed Representation of Word Sense via WordNet Gloss Composition and Context Clustering

2015

Tao Chen, Ruifeng Xu, Yulan He, Xuan Wang

In recent years, there has been an increasing interest in learning a distributed representation of word sense. Traditional context clustering based models usually require careful tuning of model parameters, and typically perform worse on infrequent word senses. This paper presents a novel approach which addresses these limitations by first initializing the word sense embeddings through learning...

Assessment of Deep Word Knowledge in Elementary and Advanced Iranian EFL Learners: A Comparison of Selective and Productive WAT Tasks

Thesis: Ministry of Science, Research and Technology - Urmia University - Literature Research Institute 1393 (Solar Hijri)

Sedigheh Jalili Aghdam, Karim Sadeghi

Testing plays a vital role in any language teaching program. It allows teachers and stakeholders, including program administrators, parents, admissions officers, and prospective employers, to be assured that learners are progressing according to an accepted standard (Douglas, 2010). The problems currently facing language testers have both practical and theoretical implications but the first i...

CS 224N: Natural Language Processing

2005

David Kale

The objective of this project is to analyze the performance of a class-based language model and compare it to the performance of traditional n-gram language models. Class-based language models are well studied, as is the use of clustering to learn classes of words. However, it seems fairly standard across the literature to use hard clustering, i.e., to assign each word to a single class and then to ...
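For reference, the standard class-based bigram factorization under hard clustering is P(w_i | w_{i-1}) = P(w_i | c_i) P(c_i | c_{i-1}), where c_i is the single class assigned to w_i. The minimal sketch below estimates both factors by unsmoothed maximum likelihood. The class assignments are hand-picked assumptions standing in for classes that would normally be learned by clustering, and `train_class_bigram` is a hypothetical helper name, not code from the project.

```python
from collections import Counter


def train_class_bigram(tokens, word2class):
    """Estimate a class-based bigram model by maximum likelihood.

    Uses P(w_i | w_{i-1}) = P(w_i | c_i) * P(c_i | c_{i-1}), where
    word2class is a hard clustering: every word maps to exactly one class.
    No smoothing is applied.
    """
    classes = [word2class[w] for w in tokens]
    word_counts = Counter(tokens)
    class_counts = Counter(classes)
    class_bigrams = Counter(zip(classes, classes[1:]))
    history_counts = Counter(classes[:-1])  # classes that have a successor

    def prob(prev_word, word):
        c_prev, c = word2class[prev_word], word2class[word]
        if history_counts[c_prev] == 0 or class_counts[c] == 0:
            return 0.0
        p_word_given_class = word_counts[word] / class_counts[c]
        p_class_transition = class_bigrams[(c_prev, c)] / history_counts[c_prev]
        return p_word_given_class * p_class_transition

    return prob


tokens = "the cat sat the dog ran".split()
word2class = {"the": "DET", "cat": "N", "dog": "N", "sat": "V", "ran": "V"}
prob = train_class_bigram(tokens, word2class)
# "cat ran" never occurs, but the class transition N -> V does.
print(prob("cat", "ran"))  # → 0.5
```

The example shows the usual motivation for classes: a word bigram never seen in training still receives probability through its class transition, whereas a plain bigram model would assign it zero without smoothing.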

A Dynamic Programming Approach To Document Clustering Based On Term Sequence Alignment

2013

Muhammad Rafi, Mohammad Shahid Shaikh

Document clustering is an unsupervised machine learning technique that, when provided with a large document corpus, automatically subdivides it into meaningful smaller sub-collections called clusters. Currently, document clustering algorithms use sequences of words (terms) to compactly represent documents and define a similarity function based on these sequences. We believe that the word sequence is...
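The excerpt is cut off before the authors' alignment method is described, so as a stand-in here is a common dynamic-programming measure over term sequences: the longest common subsequence (LCS) of two documents' word sequences, normalized by their combined length. The tokenization (lowercase, whitespace split), the normalization 2 * LCS / (|a| + |b|), and the function names are assumptions; the score is not claimed to be the similarity function from the paper.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    m, n = len(a), len(b)
    # dp[i][j] holds the LCS length of a[:i] and b[:j].
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1  # extend a shared term
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])  # skip a term
    return dp[m][n]


def sequence_similarity(doc_a, doc_b):
    """Similarity in [0, 1] based on the terms two documents share in order."""
    a, b = doc_a.lower().split(), doc_b.lower().split()
    if not a or not b:
        return 0.0
    return 2 * lcs_length(a, b) / (len(a) + len(b))


score = sequence_similarity(
    "document clustering groups similar documents",
    "clustering groups documents by topic",
)
print(score)
```

Here the shared ordered terms are `clustering`, `groups`, `documents`, so the score is 2 * 3 / 10. Any clustering algorithm that accepts pairwise similarities, such as hierarchical agglomerative clustering, could consume this score; the quadratic table makes it costly for long documents.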

Using Sense Clustering for the Disambiguation of Words (pp. 23-28)

Journal: Polibits 2009

Henry Anaya-Sánchez, Aurora Pons-Porrata, Rafael Berlanga Llavori

Clustering methods have been used extensively in many information processing tasks to capture unknown object categories. This paper presents an approach to Word Sense Disambiguation based on clustering. The underlying idea is that clustering word senses provides a useful way to discover semantically related senses. We evaluate our proposal regarding both fine- and ...