Clustering Stream Data by Exploring the Evolution of Density Mountain
Authors
Abstract
Stream clustering is a fundamental problem in many streaming data analysis applications. Compared with classical batch-mode clustering, stream clustering poses two key challenges: (i) given that input data change continuously, how can clustering results be updated incrementally and efficiently? (ii) given that clusters continuously evolve as the data evolve, how can cluster evolution activities be captured? Unfortunately, most existing stream clustering algorithms can neither update the clustering result in real time nor track the evolution of clusters. In this paper, we propose a stream clustering algorithm, EDMStream, that explores the Evolution of Density Mountain. The density mountain is used to abstract the data distribution, and changes in it indicate how the data distribution evolves. We track the evolution of clusters by monitoring the changes of density mountains. We further provide efficient data structures and filtering schemes to ensure that density mountains are updated in real time, which makes online clustering possible. Experimental results on synthetic and real datasets show that, compared with state-of-the-art stream clustering algorithms such as D-Stream, DenStream, DBSTREAM and MR-Stream, our algorithm responds to a cluster update much faster (roughly 7-15x faster than the best of the competitors) while achieving comparable cluster quality. Furthermore, EDMStream can successfully capture cluster evolution activities.
Similar resources
Adaptive Stream Clustering Using Incremental Graph Maintenance
Challenges for clustering streaming data are getting continuously more sophisticated. This trend is driven by the emerging requirements of the application where those algorithms are used and the properties of the stream itself. Some of these properties are the continuous data arrival, the time-critical processing of objects, the evolution of the data streams, the presence of outliers and th...
Evolution-Based Clustering Technique for Data Streams with Uncertainty
The evolution-based stream clustering method supports the monitoring and change detection of clustering structures. This paper presented HUE-Stream which extends E-Stream and E-Stream++ by introducing a distance function, cluster representation and histogram management for the different types of clustering structure evolution. Compared with UMicro and LuMicro, HUE-Stream produces higher cluster...
LeaDen-Stream: A Leader Density-Based Clustering Algorithm over Evolving Data Stream
Clustering evolving data streams is important to be performed in a limited time with a reasonable quality. The existing micro clustering based methods do not consider the distribution of data points inside the micro cluster. We propose LeaDen-Stream (Leader Density-based clustering algorithm over evolving data Stream
MuDi-Stream: A multi density clustering algorithm for evolving data stream
Density-based method has emerged as a worthwhile class for clustering data streams. Recently, a number of density-based algorithms have been developed for clustering data streams. However, existing density-based data stream clustering algorithms are not without problem. There is a dramatic decrease in the quality of clustering when there is a range in density of data. In this paper, a new metho...
Improved Automatic Clustering Using a Multi-Objective Evolutionary Algorithm With New Validity measure and application to Credit Scoring
In data mining, clustering is one of the important issues for separation and classification with groups like unsupervised data. In this paper, an attempt has been made to improve and optimize the application of clustering heuristic methods such as Genetic, PSO algorithm, Artificial bee colony algorithm, Harmony Search algorithm and Differential Evolution on the unlabeled data of an Iranian bank...
Journal title:
- PVLDB
Volume 11, Issue
Pages -
Publication date: 2017