Analysis of Documents Clustering Using Sampled Agglomerative Technique
Authors
Abstract
This paper proposes a document clustering algorithm that adapts a sampling-based pruning strategy to simplify hierarchical clustering. The algorithm can be applied to any text document data set whose entries can be embedded in a high-dimensional Euclidean space, so that every document is represented as a vector of real numbers. The paper reports the results of an experimental study of the proposed technique and illustrates its performance in terms of cluster quality.
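The abstract does not spell out the algorithm, so the following is only a minimal sketch of one common way to combine sampling with agglomerative clustering, not the authors' method. It assumes that a random sample of document vectors is clustered bottom-up with centroid linkage and that every document is then assigned to the nearest resulting centroid. The function names (`agglomerate`, `sampled_agglomerative`) and the parameters `k`, `sample_size`, and `seed` are hypothetical.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def agglomerate(vectors, k):
    """Naive centroid-linkage agglomerative clustering.

    Starts with one cluster per vector and repeatedly merges the two
    clusters whose centroids are closest, until k clusters remain.
    Returns the k cluster centroids.
    """
    clusters = [[v] for v in vectors]
    while len(clusters) > k:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = euclidean(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return [centroid(c) for c in clusters]


def sampled_agglomerative(vectors, k, sample_size, seed=0):
    """Assumed scheme: cluster a random sample, then label every vector.

    The expensive pairwise merging runs only on the sample; the full
    data set is touched once, in the nearest-centroid assignment.
    """
    rng = random.Random(seed)
    sample = rng.sample(vectors, min(sample_size, len(vectors)))
    centroids = agglomerate(sample, k)
    labels = [
        min(range(len(centroids)), key=lambda c: euclidean(v, centroids[c]))
        for v in vectors
    ]
    return labels, centroids
```

Under these assumptions, calling `sampled_agglomerative(docs, k=2, sample_size=4)` on a list of document vectors returns one cluster label per document plus the centroids found on the sample. The appeal of such a scheme is that the quadratic merging cost depends on the sample size rather than on the size of the full collection.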
Similar resources
Document Retrieval using Hierarchical Agglomerative Clustering with Multi-view point Similarity Measure Based on Correlation: Performance Analysis
Clustering is one of the most interesting and important tools for research in data mining and other disciplines. The aim of clustering is to find the relationships among data objects and classify them into meaningful subgroups. The effectiveness of a clustering algorithm depends on the appropriateness of the similarity measure used to compare the data. This pap...
A Visualization Approach to Automatic Text Documents Categorization Based on HAC
The ability to visualize documents as clusters is essential. Even the best data summarization technique will be misleading if the summarized data is poorly represented or visualized. As proposed in many studies, clustering techniques are applied and results are produced when documents are grouped into clusters. However, in some cases, a user may want to know t...
A Multi-Agent System for Distributed Cluster Analysis
One of the approaches used to improve the accuracy and relevancy in information retrieval is cluster analysis. Clustering methods determine relationships among text documents, and allow the determination of similar groups or clusters of documents. These methods are computationally expensive, thereby limiting their use to a relatively small set of documents. This paper describes a multi-agent sy...
Comparative Study on Context-Based Document Clustering
Clustering is an automatic learning technique aimed at grouping a set of objects into subsets or clusters. Objects in the same cluster should be as similar as possible, whereas objects in one cluster should be as dissimilar as possible from objects in the other clusters. Document clustering has become an increasingly important task in analysing huge documents. The challenging aspect to analyse ...
Comparison of Hierarchical Agglomerative Algorithms for Clustering Medical Documents
The extensive amount of data stored in medical documents requires methods that help users find what they are looking for effectively by organizing large amounts of information into a small number of meaningful clusters. The produced clusters contain groups of objects that are more similar to each other than to the members of any other group. Thus, the aim of high-quality document clus...
Journal title:
Volume, issue
Pages -
Publication date: 2008