text clustering

Search results for: text clustering

Number of results: 264479

Clustering for Photo Retrieval at Image CLEF

2008

Diana Inkpen, Marc Stogaitis, François DeGuire, Muath Alzghool

This paper presents the first participation of the University of Ottawa group in the Photo Retrieval task at Image CLEF 2008. Our system uses Lucene for text indexing and LIRE for image indexing. We experiment with several clustering methods in order to retrieve images from diverse clusters. The clustering methods are: k-means clustering, hierarchical clustering, and our own method based on Wor...

Textmining and Organization in Large Corpus

2005

Wei Ning, Jan Larsen

Nowadays a document corpus commonly contains more than 5000 documents. It is almost impossible for a reader to read through all the documents in the corpus and find the relevant information in a couple of minutes. In this master thesis project we propose text clustering as a potential solution for organizing a large document corpus. As a sub-field of data mining, text mining is to discover...

Semantic smoothing for text clustering

Journal: Knowl.-Based Syst. 2013

Jamal Abdul Nasir, Iraklis Varlamis, Asim Karim, George Tsatsaronis

In this paper we present a new semantic smoothing vector space kernel (S-VSM) for text document clustering. In the suggested approach, semantic relatedness between words is used to smooth the similarity and the representation of text documents. The basic hypothesis examined is that considering semantic relatedness between two text documents may improve the performance of the text document clust...
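The idea of smoothing document similarity with word relatedness can be sketched as follows. This is a minimal illustration, not the S-VSM kernel itself, whose exact definition is not given in the excerpt above. It assumes documents are sparse term-weight dictionaries and that relatedness scores between word pairs are already available; the function names and the scores below are hypothetical.

```python
import math


def smoothed_similarity(d1, d2, relatedness):
    """Sum of w1 * S(t1, t2) * w2 over all term pairs of two documents.

    S is 1.0 for identical terms, the given relatedness score for listed
    pairs, and 0.0 otherwise (assumption made for this sketch).
    """
    total = 0.0
    for t1, w1 in d1.items():
        for t2, w2 in d2.items():
            if t1 == t2:
                s = 1.0
            else:
                s = relatedness.get((t1, t2), relatedness.get((t2, t1), 0.0))
            total += w1 * s * w2
    return total


def normalized_similarity(d1, d2, relatedness):
    """Cosine-style normalization of the smoothed similarity."""
    denom = math.sqrt(
        smoothed_similarity(d1, d1, relatedness)
        * smoothed_similarity(d2, d2, relatedness)
    )
    return smoothed_similarity(d1, d2, relatedness) / denom if denom else 0.0


# Two documents with no terms in common.
doc_a = {"car": 1.0, "engine": 1.0}
doc_b = {"automobile": 1.0, "motor": 1.0}

# Made-up relatedness scores, for illustration only.
relatedness = {("car", "automobile"): 0.9, ("engine", "motor"): 0.8}

plain = normalized_similarity(doc_a, doc_b, {})              # no smoothing
smoothed = normalized_similarity(doc_a, doc_b, relatedness)  # with smoothing
```

Without smoothing the two documents look unrelated because they share no terms; with the relatedness scores, the synonym pairs contribute to the similarity.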

Text Clustering with Random Indexing

2009

Magnus Rosell, Stefan Arnborg

This project explored how the language technology method Random Indexing can be used for clustering texts from Swedish newspapers. The resulting Random Indexing based representation yields results similar to those of an ordinary representation when the number of clusters matches the real categories. With an increased number of clusters the Random Indexing based representation yields better results than ...
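For readers unfamiliar with Random Indexing, the sketch below shows one common way to build such a representation. It is an assumption-laden illustration, not the setup used in the project: the dimensionality, the number of nonzero entries, the sliding context window, and the function names are all hypothetical choices.

```python
import random


def index_vector(dim, nonzeros, rng):
    """Sparse ternary random vector: a few +1/-1 entries, the rest zero."""
    vec = [0] * dim
    for k, pos in enumerate(rng.sample(range(dim), nonzeros)):
        vec[pos] = 1 if k % 2 == 0 else -1
    return vec


def random_indexing(docs, dim=64, nonzeros=4, window=2, seed=0):
    """Return one dense vector per document.

    Each word gets a fixed random index vector. A word's context vector is
    the sum of the index vectors of its neighbours within the window. A
    document vector is the sum of the context vectors of its words.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: index_vector(dim, nonzeros, rng) for w in vocab}
    context = {w: [0] * dim for w in vocab}

    for doc in docs:
        for i, word in enumerate(doc):
            lo, hi = max(0, i - window), min(len(doc), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    for k, v in enumerate(index[doc[j]]):
                        context[word][k] += v

    doc_vectors = []
    for doc in docs:
        vec = [0] * dim
        for word in doc:
            for k, v in enumerate(context[word]):
                vec[k] += v
        doc_vectors.append(vec)
    return doc_vectors


# Toy tokenized documents, for illustration only.
docs = [
    ["stock", "market", "falls"],
    ["market", "stock", "rises"],
    ["team", "wins", "match"],
]
doc_vectors = random_indexing(docs)
```

The resulting fixed-length vectors can then be passed to any clustering algorithm in place of an ordinary term-document representation.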

WordNet-Based Text Document Clustering

2004

Julian Sedding, Dimitar Kazakov

Text document clustering can greatly simplify browsing large collections of documents by reorganizing them into a smaller number of manageable clusters. Algorithms to solve this task exist; however, the algorithms are only as good as the data they work on. Problems include ambiguity and synonymy, the former allowing for erroneous groupings and the latter causing similarities between documents t...

Clustering Concept Hierarchies from Text

2004

Philipp Cimiano, Andreas Hotho, Steffen Staab

We present a novel approach to learning taxonomies or concept hierarchies from text. The approach is based on Formal Concept Analysis, a method mainly used for the analysis of data, i.e. for investigating and processing explicitly given information. Our approach is based on the distributional hypothesis, i.e. that nouns or terms are similar to the extent to which they share contexts. F...

Text clustering for topic detection

2004

Young-Woo Seo, Katia Sycara

The world wide web represents vast stores of information. However, the sheer amount of such information makes it practically impossible for any human user to be aware of much of it. Therefore, it would be very helpful to have a system that automatically discovers relevant, yet previously unknown information, and reports it to users in human-readable form. As the first attempt to accomplish such...

String Vector based AHC as Approach to Word Clustering

2016

Taeho Jo

In this research, we propose the string vector based AHC (Agglomerative Hierarchical Clustering) algorithm as an approach to word clustering. In previous work on text clustering, encoding texts into string vectors successfully improved clustering performance, which motivated this research. In this research, we encode words into string vectors,...

Optimal Clustering Scheme For Repeated Bisection Partitional Algorithm

2013

Kalyani Desikan, Hannah Grace

Text clustering divides a set of texts into clusters such that texts within each cluster are similar in content. It may be used to uncover the structure and content of unknown text sets as well as to give new perspectives on familiar ones. The focus of this paper is to experimentally evaluate the quality of clusters obtained using partitional clustering algorithms that employ different clusteri...
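A repeated bisection scheme of the kind named in the title can be sketched as below. This is only an illustration under stated assumptions: the largest cluster is always the one split, and each split uses a cosine-based 2-means step with deterministic seeding. The criterion functions the paper evaluates are not shown in the excerpt, and all function names here are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def centroid(vectors):
    """Unit-length mean direction of a list of vectors."""
    return normalize([sum(col) for col in zip(*vectors)])


def bisect(vectors, ids, iterations=10):
    """Split one cluster (a list of ids, length >= 2) into two parts."""
    first = ids[0]
    # Seed the second centre with the member least similar to the first.
    second = min(ids[1:], key=lambda i: dot(vectors[first], vectors[i]))
    centers = [vectors[first], vectors[second]]
    # Fallback split, used only if an iteration would empty one side.
    left, right = [first], [i for i in ids if i != first]
    for _ in range(iterations):
        new_left, new_right = [], []
        for i in ids:
            if dot(vectors[i], centers[0]) >= dot(vectors[i], centers[1]):
                new_left.append(i)
            else:
                new_right.append(i)
        if not new_left or not new_right:
            break
        left, right = new_left, new_right
        centers = [
            centroid([vectors[i] for i in left]),
            centroid([vectors[i] for i in right]),
        ]
    return left, right


def repeated_bisection(raw_vectors, k):
    """Split the largest cluster in two until k clusters exist."""
    vectors = [normalize(v) for v in raw_vectors]
    clusters = [list(range(len(vectors)))]
    while len(clusters) < k:
        clusters.sort(key=len, reverse=True)
        largest = clusters.pop(0)
        if len(largest) < 2:
            clusters.append(largest)
            break
        clusters.extend(bisect(vectors, largest))
    return clusters


# Toy term-count vectors for six documents on three topics.
term_counts = [
    [2, 1, 0, 0, 0, 0],
    [1, 2, 0, 0, 0, 0],
    [0, 0, 2, 1, 0, 0],
    [0, 0, 1, 2, 0, 0],
    [0, 0, 0, 0, 2, 1],
    [0, 0, 0, 0, 1, 2],
]
clusters = repeated_bisection(term_counts, 3)
```

Choosing which cluster to split, and how to score a split, is exactly where different criterion functions would plug in; the largest-cluster rule here is the simplest option.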

Fuzzy Binarization and Segmentation of Text Images for OPCR

1997

Mario I. Chacon

This paper presents the results of a new approach for binarization of text images. The new technique uses the fuzzy C-means clustering algorithm to simulate the clustering performed by the human visual system. The clustering process was applied to eleven text images. All of them have black text, but the background changes in color. As a means of comparison, binarization using the image histogram w...
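Binarization with fuzzy C-means can be illustrated with a small sketch. It assumes grayscale intensities in a flat list, two clusters (dark text and lighter background), and a fuzzifier of m = 2; none of these settings come from the paper, and the function names are hypothetical.

```python
def memberships_for(values, centers, m):
    """Fuzzy membership of each value in each cluster (rows sum to 1)."""
    exponent = 2.0 / (m - 1.0)
    rows = []
    for x in values:
        dists = [abs(x - c) for c in centers]
        if 0.0 in dists:
            # The value coincides with a centre: full membership there.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            rows.append([h / sum(hits) for h in hits])
        else:
            rows.append(
                [1.0 / sum((d_i / d_j) ** exponent for d_j in dists) for d_i in dists]
            )
    return rows


def fuzzy_c_means_1d(values, centers, m=2.0, iterations=50):
    """Alternate membership and centre updates for a fixed number of steps."""
    for _ in range(iterations):
        u = memberships_for(values, centers, m)
        new_centers = []
        for k, old in enumerate(centers):
            weights = [row[k] ** m for row in u]
            total = sum(weights)
            new_centers.append(
                sum(w * x for w, x in zip(weights, values)) / total if total else old
            )
        centers = new_centers
    return centers, memberships_for(values, centers, m)


def binarize(pixels, m=2.0):
    """Return 1 for pixels assigned to the darker cluster (text), else 0."""
    centers, u = fuzzy_c_means_1d(pixels, [min(pixels), max(pixels)], m)
    dark = 0 if centers[0] <= centers[1] else 1
    return [1 if row[dark] >= 0.5 else 0 for row in u]


# Toy intensities: dark text pixels mixed with a light background.
pixels = [10, 12, 15, 200, 210, 220, 205, 14]
mask = binarize(pixels)
```

Because memberships are graded rather than hard, the 0.5 cut-off could be replaced by a different threshold when the text and background intensities overlap.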