word clustering

Search results for: word clustering

Number of results: 205729

Sense-Level Semantic Clustering of Hashtags in Social Media

2016

Ali Javed Byung Suk Lee

We enhance the accuracy of the currently available semantic hashtag clustering method, which leverages hashtag semantics extracted from dictionaries such as WordNet and Wikipedia. While immune to the uncontrolled and often sparse usage of hashtags, the current method distinguishes hashtag semantics only at the word level. Unfortunately, a word can have multiple senses representing the exact sem...


Enhanced Word Classing for Recurrent Neural Network Language

2013

Yujing Si Zhen Zhang Ta Li Jielin Pan Yonghong Yan

The Recurrent Neural Network Language Model (RNNLM) has recently been shown to outperform the conventional N-gram LM as well as many other competing advanced language model techniques. However, the computational complexity of the RNNLM is much higher than that of the conventional N-gram LM. As a result, the Class-based RNNLM (CRNNLM) is usually employed to speed up both the training and testing phases of the RNNLM. In pr...


Clustering WordNet word senses

2003

Eneko Agirre Oier Lopez de Lacalle

This paper presents the results of a set of methods to cluster WordNet word senses. The methods rely on different information sources: confusion matrixes from Senseval-2 Word Sense Disambiguation systems, translation similarities, hand-tagged examples of the target word senses and examples obtained automatically from the web for the target word senses. The clustering results have been evaluated...


Multi-class composite N-gram based on connection direction

1999

Hirofumi Yamamoto Yoshinori Sagisaka

A new word-clustering technique is proposed to efficiently build statistically salient class 2-grams from language corpora. By splitting word neighboring characteristics into word-preceding and following directions, multiple (two-dimensional) word classes are assigned to each word. In each side, word classes are merged into larger clusters independently according to preceding or following word ...


TC-DWA: Text Clustering with Dual Word-Level Augmentation

Journal: Proceedings of the ... AAAI Conference on Artificial Intelligence, 2023

Pre-trained language models, e.g., ELMo and BERT, have recently achieved promising performance improvements in a wide range of NLP tasks, because they can output strong contextualized embedded features of words. Inspired by their great success, in this paper we target fine-tuning them to effectively handle the text clustering task, i.e., a classic and fundamental challenge in machine learning. According...


On Clustering Algorithms: Applications in Word-Embedding Documents

Journal: Journal of Computers, 2019


Features of Distributional Method for Indonesian Word Clustering

Journal: Jurnal Edukasi dan Penelitian Informatika (JEPIN), 2019


Hierarchical Latent Word Clustering

Journal: CoRR, 2016

Halid Ziya Yerebakan Fitsum A. Reda Yiqiang Zhan Yoshihisa Shinagawa

This paper presents a new Bayesian non-parametric model by extending the usage of Hierarchical Latent Dirichlet Allocation to extract tree structured word clusters from text data. The inference algorithm of the model collects words in a cluster if they share similar distribution over documents. In our experiments, we observed meaningful hierarchical structures on NIPS corpus and radiology repor...


Scaling Up Word Clustering

2016

Jon Dehdari Liling Tan Josef van Genabith

Word clusters improve performance in many NLP tasks including training neural network language models, but current increases in datasets are outpacing the ability of word clusterers to handle them. In this paper we present a novel bidirectional, interpolated, refining, and alternating (BIRA) predictive exchange algorithm and introduce ClusterCat, a clusterer based on this algorithm. We show tha...


Language Model Based on Word Clustering

2006

Lichi Yuan

The category-based statistical language model is an important method for addressing the sparse data problem. However, this model has two bottlenecks: (1) word clustering, since it is hard to find a suitable clustering method that performs well without a large amount of computation; and (2) class-based methods always lose some predictive ability when adapting to text from different domains. The...
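
To illustrate why a category-based model helps with sparse data, here is a minimal sketch of the standard class-based bigram factorization, P(w_i | w_{i-1}) ≈ P(c_i | c_{i-1}) · P(w_i | c_i), estimated by maximum likelihood. It is not the method of the paper above: the function name `train_class_bigram`, the toy corpus, and the hand-written word-to-class map are all assumptions made for illustration.

```python
from collections import Counter

def train_class_bigram(tokens, word2class):
    """Return a function estimating P(word | prev_word) through word classes.

    Hypothetical helper: uses P(c_i | c_{i-1}) * P(w_i | c_i) with plain
    maximum-likelihood counts and no smoothing.
    """
    classes = [word2class[w] for w in tokens]
    class_count = Counter(classes)                      # occurrences of each class
    word_count = Counter(tokens)                        # occurrences of each word
    class_bigram = Counter(zip(classes, classes[1:]))   # class-to-class transitions
    prev_count = Counter(classes[:-1])                  # classes that have a successor

    def prob(prev_word, word):
        c_prev, c = word2class[prev_word], word2class[word]
        if prev_count[c_prev] == 0 or class_count[c] == 0:
            return 0.0
        p_class = class_bigram[(c_prev, c)] / prev_count[c_prev]
        p_word = word_count[word] / class_count[c]
        return p_class * p_word

    return prob

# Toy corpus and an assumed, hand-written clustering of words into classes.
tokens = "the cat sat the dog ran the cat ran".split()
word2class = {"the": "DET", "cat": "NOUN", "dog": "NOUN",
              "sat": "VERB", "ran": "VERB"}

prob = train_class_bigram(tokens, word2class)
p_seen = prob("cat", "sat")     # "cat sat" occurs in the corpus
p_unseen = prob("dog", "sat")   # "dog sat" never occurs in the corpus
print(p_seen, p_unseen)
```

Because "dog" and "cat" share a class, the word pair "dog sat" receives a nonzero probability even though it never appears in the toy corpus, whereas a word-level maximum-likelihood bigram would assign it zero. The quality of such a model depends on the clustering, which is the first bottleneck the abstract mentions.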

