Search results for: text clustering

Number of results: 264479

2008
Diana Inkpen Marc Stogaitis François DeGuire Muath Alzghool

This paper presents the first participation of the University of Ottawa group in the Photo Retrieval task at ImageCLEF 2008. Our system uses Lucene for text indexing and LIRE for image indexing. We experiment with several clustering methods in order to retrieve images from diverse clusters. The clustering methods are: k-means clustering, hierarchical clustering, and our own method based on Wor...

2005
Wei Ning Jan Larsen

Nowadays a typical document corpus might contain more than 5000 documents. It is almost impossible for a reader to read through all documents within the corpus and find relevant information in a couple of minutes. In this master thesis project we propose text clustering as a potential solution to organizing a large document corpus. As a sub-field of data mining, text mining is to discover...

Journal: Knowl.-Based Syst. 2013
Jamal Abdul Nasir Iraklis Varlamis Asim Karim George Tsatsaronis

In this paper we present a new semantic smoothing vector space kernel (S-VSM) for text document clustering. In the suggested approach, semantic relatedness between words is used to smooth the similarity and the representation of text documents. The basic hypothesis examined is that considering semantic relatedness between two text documents may improve the performance of the text document clust...

2009
Magnus Rosell Stefan Arnborg

This project explored how the language technology method Random Indexing can be used for clustering texts from Swedish newspapers. The resulting Random Indexing based representation yields results similar to those of an ordinary representation when the number of clusters matches the real categories. With an increased number of clusters the Random Indexing based representation yields better results than ...

2004
Julian Sedding Dimitar Kazakov

Text document clustering can greatly simplify browsing large collections of documents by reorganizing them into a smaller number of manageable clusters. Algorithms to solve this task exist; however, the algorithms are only as good as the data they work on. Problems include ambiguity and synonymy, the former allowing for erroneous groupings and the latter causing similarities between documents t...

2004
Philipp Cimiano Andreas Hotho Steffen Staab

We present a novel approach to learning taxonomies or concept hierarchies from text. The approach is based on Formal Concept Analysis, a method mainly used for the analysis of data, i.e. for investigating and processing explicitly given information. Our approach is based on the distributional hypothesis, i.e. that nouns or terms are similar to the extent to which they share contexts. F...

2004
Young-Woo Seo Katia Sycara

The world wide web represents vast stores of information. However, the sheer amount of such information makes it practically impossible for any human user to be aware of much of it. Therefore, it would be very helpful to have a system that automatically discovers relevant, yet previously unknown information, and reports it to users in human-readable form. As the first attempt to accomplish such...

2016
Taeho Jo

In this research, we propose the string vector based AHC (Agglomerative Hierarchical Clustering) algorithm as an approach to word clustering. In previous work on text clustering, encoding texts into string vectors succeeded in improving the performance of text clustering; this provided the motivation for this research. In this research, we encode words into string vectors,...

2013
Kalyani Desikan Hannah Grace

Text clustering divides a set of texts into clusters such that texts within each cluster are similar in content. It may be used to uncover the structure and content of unknown text sets as well as to give new perspectives on familiar ones. The focus of this paper is to experimentally evaluate the quality of clusters obtained using partitional clustering algorithms that employ different clusteri...

1997
Mario I. Chacon

This paper presents the results of a new approach for binarization of text images. The new technique uses the fuzzy C-means clustering algorithm to simulate the clustering performed by the human visual system. The clustering process was applied to eleven text images. All of them have black text, but the background changes in color. As a means of comparison, binarization using the image histogram w...
