Density Peaks Clustering with Differential Privacy
Abstract
Density peaks clustering (DPC) is a recent and well-known density-based clustering algorithm. Compared with other algorithms, it is well suited to finding clusters of arbitrary shape. However, when the cluster centers and cluster sizes are released exactly in a cluster analysis, an attacker can deduce sensitive points from known points. To the best of our knowledge, this is the first work to apply privacy protection to DPC. In this paper, we propose a density peaks clustering privacy protection (DPCP) model that obtains clustering results without revealing the data by means of differential privacy. Privacy is achieved by adding Laplace noise to the local density ρ and the distance δ. However, adding noise to the data set directly raises the computational complexity to O(n²) and yields inaccurate clustering results. We therefore draw on the divide-and-conquer strategy. First, we partition the data set into relatively independent groups using a Voronoi diagram and then add noise within each group, using MapReduce for parallel computation to improve efficiency. Second, based on the principle that privacy budgets can be accumulated (superimposed) over high-dimensional data, we introduce an (ε₁ + ε₂)-differential privacy protection model, where ε₁ and ε₂ protect ρ and δ, respectively. We preserve the accuracy of the computation through data replication and filtering. Finally, we provide a performance analysis and a privacy proof of our solution, supported by extensive experiments.
CCS Concepts: •Security and privacy → Domain-specific security and privacy architectures;
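The abstract names the ingredients of DPCP (local density ρ, distance δ, Laplace noise under budgets ε₁ and ε₂, and a Voronoi partition) without giving formulas. The Python sketch below shows one way these pieces could fit together. It assumes the standard cutoff-kernel DPC definitions: ρᵢ counts the other points within a cutoff d_c, and δᵢ is the distance to the nearest point of higher density. The helper names, the cutoff `d_c`, the Voronoi seed points, and the sensitivities `sens_rho` and `sens_delta` are hypothetical placeholders, not values from the paper. The real sensitivities of ρ and δ would have to be derived separately, and the MapReduce, replication and filtering steps are omitted. Under the assumed sensitivities, releasing ρ with ε₁ and δ with ε₂ costs ε₁ + ε₂ in total by sequential composition.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two points given as coordinate tuples."""
    return math.dist(a, b)


def voronoi_groups(points, seeds):
    """Assign each point index to the Voronoi cell of its nearest seed."""
    groups = [[] for _ in seeds]
    for i, p in enumerate(points):
        nearest = min(range(len(seeds)), key=lambda k: dist(p, seeds[k]))
        groups[nearest].append(i)
    return groups


def local_density(points, d_c):
    """Cutoff-kernel density: rho_i = number of other points closer than d_c."""
    n = len(points)
    return [sum(1 for j in range(n) if j != i and dist(points[i], points[j]) < d_c)
            for i in range(n)]


def min_higher_distance(points, rho):
    """delta_i = distance to the nearest point with strictly higher density.

    A point with no denser point gets its largest distance to any other point.
    """
    n = len(points)
    delta = []
    for i in range(n):
        higher = [dist(points[i], points[j]) for j in range(n) if rho[j] > rho[i]]
        if higher:
            delta.append(min(higher))
        else:
            others = [dist(points[i], points[j]) for j in range(n) if j != i]
            delta.append(max(others, default=0.0))
    return delta


def laplace(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_rho_delta(points, d_c, eps1, eps2, sens_rho=1.0, sens_delta=1.0, seed=None):
    """Release rho with budget eps1 and delta with budget eps2 (Laplace mechanism).

    sens_rho and sens_delta are ASSUMED sensitivities (hypothetical defaults).
    Under those assumptions, sequential composition gives eps1 + eps2 in total.
    """
    rng = random.Random(seed)
    rho = local_density(points, d_c)
    delta = min_higher_distance(points, rho)
    noisy_rho = [r + laplace(sens_rho / eps1, rng) for r in rho]
    noisy_delta = [d + laplace(sens_delta / eps2, rng) for d in delta]
    return noisy_rho, noisy_delta


if __name__ == "__main__":
    # Toy data (hypothetical): two well-separated groups of points.
    pts = [(0.0, 0.0), (0.1, 0.0), (-0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
    for group in voronoi_groups(pts, seeds=[(0.0, 0.0), (5.0, 5.0)]):
        sub = [pts[i] for i in group]
        print(group, noisy_rho_delta(sub, d_c=0.15, eps1=0.5, eps2=0.5, seed=1))
```

In the demo, each Voronoi group is perturbed independently. This loosely mirrors the per-group processing the abstract describes, where groups could be handed to separate parallel (for example, MapReduce) tasks. Cluster centers would then be chosen, as in ordinary DPC, among points with both large noisy ρ and large noisy δ.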
Similar articles
Preserving Privacy for Interesting Location Pattern Mining from Trajectory Data
One main concern for individuals participating in the data collection of personal location history records (i.e., trajectories) is the disclosure of their location and related information when a user queries for statistical or pattern mining results such as frequent locations derived from these records. In this paper, we investigate how one can achieve the privacy goal that the inclusion of his...
Privacy-Integrated Graph Clustering Through Differential Privacy
Data mining tasks like graph clustering can automatically process a large amount of data and retrieve valuable information. However, publishing such graph clustering results also involves privacy risks. In particular, linking the result with available background knowledge can disclose private information of the data set. The strong privacy guarantees of the differential privacy model allow copi...
BangA: An Efficient and Flexible Generalization-Based Algorithm for Privacy Preserving Data Publication
Privacy-Preserving Data Publishing (PPDP) has become a critical issue for companies and organizations that would release their data. k-Anonymization was proposed as a first generalization model to guarantee against identity disclosure of individual records in a data set. Point access methods (PAMs) are not well studied for the problem of data anonymization. In this article, we propose yet anoth...
DFC: Density Fragment Clustering without Peaks
The density peaks clustering (DPC) algorithm is a novel density-based clustering approach. Outliers can be spotted and excluded automatically, and clusters can be found regardless of the shape and of dimensionality of the space in which they are embedded. However, it still has problems when processing a complex data set with irregular shapes and varying densities to get a good clustering result...
A Link Density Clustering Algorithm based on Automatically Selecting Density Peaks For Overlapping Community Detection
In this paper, we proposed a link density clustering method for overlapping community detection based on density peaks. We firstly use an extended cosine link distance metric to reflect the relationship of links. Then we introduce a clustering algorithm with fast search for solving the link clustering problem by density peaks with box plot strategy to determine the cluster centres automatically...
Publication date: 2017