word clustering

Search results for: word clustering

Number of results: 205729

Discovering Word Senses from Text Using Random Indexing

2008

Niladri Chatterjee Shiwali Mohan

Random Indexing is a novel technique for dimensionality reduction when creating a Word Space model from a given text. This paper explores the possible application of Random Indexing to discovering word senses from text. The words appearing in the text are plotted onto a multi-dimensional Word Space using Random Indexing. The geometric distance between words is used as an indicator of their ...


Optimization of Word Sense Disambiguation Using Clustering in Weka

2012

Neetu Sharma

In the Natural Language Processing (NLP) community, Word Sense Disambiguation (WSD) has been described as the task of selecting the appropriate meaning (sense) for a given word in a text or discourse, where this meaning is distinguishable from other senses potentially attributable to that word. These senses can be seen as the target labels of a classification problem. Clustering and classifica...


Triplet-Based Chinese Word Sense Induction

2010

Zhao Liu Xipeng Qiu Xuanjing Huang

This paper describes the implementation of our system at the CLP 2010 bakeoff on Chinese word sense induction. We first extract the triplets for the target word in each sentence, then use the intersection of all related words of these triplets from the Internet. We use the related words to construct feature vectors for the sentence. Finally, we discriminate the word senses by clustering the sentences...


Employing Word Representations and Regularization for Domain Adaptation of Relation Extraction

2014

Thien Huu Nguyen Ralph Grishman

Relation extraction suffers from a performance loss when a model is applied to out-of-domain data. This has fostered the development of domain adaptation techniques for relation extraction. This paper evaluates word embeddings and clustering on adapting feature-based relation extraction systems. We systematically explore various ways to apply word embeddings and show the best adaptation improve...


Speech recognition in car noise environments using multiple models according to noise masking levels

1998

Myung Gyu Song Hoi In Jung Kab-Jong Shim Hyung Soon Kim

In speech recognition for real-world applications, the performance degradation caused by the mismatch between training and testing environments must be overcome. In this paper, to reduce this mismatch, we provide a hybrid method of spectral subtraction and residual noise masking. We also employ a multiple-model approach to obtain improved robustness over various noise environments. In t...


Who spoke when? - automatic segmentation and clustering for determining speaker turns

1999

S. E. Johnson

The problem of labelling speaker turns by automatically segmenting and clustering a continuous audio stream is addressed. A new clustering scheme is presented and evaluated using a clustering efficiency score which treats both agglomerative and divisive clustering strategies equally. Results show an efficiency of 70% can be obtained on both manually and automatically derived segments on the 1996 Hu...


An information theoretic approach for using word cluster information in natural language call routing

2003

Li Li Feng Liu Wu Chou

In this paper, an information theoretic approach for using word clusters in natural language call routing (NLCR) is proposed. This approach utilizes an automatic word class clustering algorithm to generate word classes from the word based training corpus. In our approach, the information gain (IG) based term selection is used to combine both word term and word class information in NLCR. A joint...


Approaches for the Clustering of Geographic Metadata and the Automatic Detection of Quasi-Spatial Dataset Series

Journal: ISPRS International Journal of Geo-Information 2022

The discrete representation of resources in geospatial catalogues affects their information retrieval performance. The performance could be improved by using automatically generated clusters of related resources, which we name quasi-spatial dataset series. This work evaluates whether a clustering process can create such series using only textual information from metadata elements. We assess the combination of different kinds t...


The Intellectual Structure of Knowledge in the Field of Distance Education Using the Co-Word analyses

Journal: Future of Medical Education Journal 2018

Faramarz Soheili Hamid Maleki Mahmoud Ekrami Somaye Rajabzade

Background: Co-word analysis is one of the content analysis methods used in scientometric studies and in mapping the scientific structure of various fields. The purpose of the present research is to map the structure of distance education using co-word analysis. Methods: The research method is content analysis using co-word analysis. The research population consists of 31607 documents indexed in the...


Inference of Markovian properties of molecular sequences from NGS data and applications to comparative genomics

Journal: Bioinformatics 2016

Jie Ren Kai Song Minghua Deng Gesine Reinert Charles H. Cannon Fengzhu Sun

MOTIVATION: Next-generation sequencing (NGS) technologies generate large amounts of short read data for many different organisms. The fact that NGS reads are generally short makes it challenging to assemble the reads and reconstruct the original genome sequence. For clustering genomes using such NGS data, word-count based alignment-free sequence comparison is a promising approach, but for this a...

