Performance of Parallel K-Means Algorithms in Java
Authors
Abstract
K-means is a well-known clustering algorithm often used for its simplicity and potential efficiency. Its properties and limitations have been investigated by many works reported in the literature. K-means, though, suffers from computational problems when dealing with large datasets with many dimensions and a great number of clusters. Therefore, many authors have proposed and experimented with different techniques for the parallel execution of K-means. This paper describes a novel approach to parallel K-means which, today, is based on commodity multicore machines with shared memory. Two reference implementations in Java are developed and their performances are compared. The first one is structured according to a map/reduce schema that leverages the built-in multi-threaded concurrency automatically provided by Java to parallel streams. The second one, allocated on the available cores, exploits the parallel programming model of the Theatre actor system, which is control-based, totally lock-free, and purposely relies on threads as coarse-grain “programming-in-the-large” units. The experimental results confirm that some good execution performance can be achieved through the implicit and intuitive use of Java concurrency in parallel streams. However, better execution performance can be guaranteed by the modular Theatre implementation, which proves more adequate for an exploitation of the computational resources.
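The map/reduce schema over parallel streams mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the class `ParallelKMeansSketch`, its nested `Partial` accumulator, and the methods `step` and `fit` are hypothetical names chosen for illustration, and the sketch assumes squared Euclidean distance and points stored as `double[]` arrays. Each point is mapped to its nearest centroid, per-cluster sums and counts are reduced through `Stream.collect`, and the new centroids are then averaged sequentially.

```java
import java.util.Arrays;

/** Hypothetical sketch: K-means iterations as a map/reduce over a parallel stream. */
final class ParallelKMeansSketch {

    /** Partial result of the reduction: per-cluster coordinate sums and point counts. */
    static final class Partial {
        final double[][] sums;
        final int[] counts;

        Partial(int k, int dim) {
            sums = new double[k][dim];
            counts = new int[k];
        }

        /** Map side: assign one point to its nearest centroid and accumulate it. */
        void add(double[] point, double[][] centroids) {
            int c = nearest(point, centroids);
            for (int d = 0; d < point.length; d++) {
                sums[c][d] += point[d];
            }
            counts[c]++;
        }

        /** Reduce side: fold another partial result into this one. */
        void merge(Partial other) {
            for (int c = 0; c < counts.length; c++) {
                counts[c] += other.counts[c];
                for (int d = 0; d < sums[c].length; d++) {
                    sums[c][d] += other.sums[c][d];
                }
            }
        }
    }

    /** Index of the centroid closest to the point (squared Euclidean distance). */
    static int nearest(double[] point, double[][] centroids) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centroids.length; c++) {
            double dist = 0.0;
            for (int d = 0; d < point.length; d++) {
                double diff = point[d] - centroids[c][d];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    /** One iteration: parallel assignment and summation, then sequential averaging. */
    static double[][] step(double[][] points, double[][] centroids) {
        int k = centroids.length;
        int dim = centroids[0].length;
        // Each worker thread fills its own Partial; the stream merges them at the end.
        Partial total = Arrays.stream(points).parallel()
                .collect(() -> new Partial(k, dim),
                         (partial, p) -> partial.add(p, centroids),
                         Partial::merge);
        double[][] next = new double[k][dim];
        for (int c = 0; c < k; c++) {
            for (int d = 0; d < dim; d++) {
                // An empty cluster keeps its previous centroid (an assumption of this sketch).
                next[c][d] = total.counts[c] == 0
                        ? centroids[c][d]
                        : total.sums[c][d] / total.counts[c];
            }
        }
        return next;
    }

    /** Repeats step() until no coordinate moves by tolerance or more, or maxIterations is reached. */
    static double[][] fit(double[][] points, double[][] initial, int maxIterations, double tolerance) {
        double[][] centroids = initial;
        for (int i = 0; i < maxIterations; i++) {
            double[][] next = step(points, centroids);
            double shift = 0.0;
            for (int c = 0; c < next.length; c++) {
                for (int d = 0; d < next[c].length; d++) {
                    shift = Math.max(shift, Math.abs(next[c][d] - centroids[c][d]));
                }
            }
            centroids = next;
            if (shift < tolerance) {
                break;
            }
        }
        return centroids;
    }
}
```

Because each worker thread accumulates into its own `Partial` and results are only combined in `merge`, the sketch needs no explicit locks. The order of floating-point additions may vary between runs, so results should be compared with a tolerance rather than for exact equality.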
Similar resources
Comparative Analysis of Parallel K Means and Parallel Fuzzy C Means Cluster Algorithms
In this paper, we give a short review of recent developments in clustering. Clustering is the process of grouping data, where the grouping is established by finding similarities between data based on their characteristics. Such groups are termed clusters. Clustering is a procedure for organizing objects into groups, or clustering them together, based on the principle of maximizing the intra-c...
Comparing k-means clusters on parallel Persian-English corpus
This paper compares clusters of aligned Persian and English texts obtained with the k-means method. Text clustering has many applications in various fields of natural language processing. So far, much research on English document clustering has been carried out. The question now arises whether those results extend to other languages. Since the goal of document clustering is grouping of docum...
Irregular Parallel Algorithms in JAVA
The nested data-parallel programming model supports the design and implementation of irregular parallel algorithms. This paper describes work in progress to incorporate nested data parallelism into the object model of Java by developing a library of collection classes and adding a forall statement to the language. The collection classes provide parallel implementations of operations on the coll...
Pthread Parallel K-means
K-means is a popular non-hierarchical method for clustering large datasets. The time requirements increase linearly with the size of the dataset, which makes it particularly suited for extremely large datasets such as those found in digital libraries. The method was developed by MacQueen [4] in 1967. In our project we take a uniprocessor k-means algorithm and implement a parallel k-means algorith...
Journal
Journal title: Algorithms
Year: 2022
ISSN: 1999-4893
DOI: https://doi.org/10.3390/a15040117