Repeated Record Ordering for Constrained Size Clustering

Author

Abstract:

Data clustering is one of the main techniques in data mining, with many applications in computer science, biology, and the social sciences. Constrained clustering is a type of clustering in which side information provided by the user is incorporated into existing clustering algorithms. One well-researched constrained clustering technique is microaggregation, in which the algorithm must divide the dataset into groups of at least k members each, where k is a user-defined parameter. The main application of microaggregation is Statistical Disclosure Control (SDC) for privacy-preserving data publishing. A microaggregation algorithm is evaluated by the within-group sum of squared errors (SSE). Unfortunately, the optimal microaggregation problem has been proved NP-hard in general, although the univariate special case can be solved optimally in polynomial time. Many heuristics for the general problem build on the univariate case, and these techniques must first arrange the multivariate records in a sequence. This paper proposes a novel record-ordering method. Starting from the result of a conventional clustering algorithm, the proposed method repeatedly places the multivariate records in a sequence and then clusters them again, stopping when no further improvement is achieved. Extensive experiments with different parameters and datasets confirm the effectiveness of the proposed method.
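The abstract describes the method only at a high level. The following minimal Python/NumPy sketch illustrates the ingredients it names: SSE as the quality measure, an optimal split of an ordered sequence into contiguous groups of k to 2k-1 records (the kind of dynamic program that solves the univariate case exactly when the records are sorted by value), and a loop that re-sequences the records from the current grouping and re-partitions until SSE stops improving. The starting point (sorting along the first principal component), the reordering rule in `reorder`, and all function names are hypothetical stand-ins chosen for illustration; they are not the paper's algorithm.

```python
import numpy as np


def sse(X, labels):
    """Within-group sum of squared errors: squared distances to each group mean."""
    total = 0.0
    for g in np.unique(labels):
        G = X[labels == g]
        total += float(((G - G.mean(axis=0)) ** 2).sum())
    return total


def partition_sequence(X, order, k):
    """Optimally split the records, taken in the given order, into contiguous
    groups of size k..2k-1 by dynamic programming. Returns one label per record."""
    Y = X[order]
    n = len(Y)
    if n < k:
        raise ValueError("need at least k records")
    # Prefix sums let us get the SSE of any contiguous segment in O(d).
    s1 = np.vstack([np.zeros(Y.shape[1]), np.cumsum(Y, axis=0)])
    s2 = np.concatenate([[0.0], np.cumsum((Y ** 2).sum(axis=1))])

    def cost(a, b):  # SSE of segment Y[a:b]
        v = s1[b] - s1[a]
        return s2[b] - s2[a] - float(v @ v) / (b - a)

    best = np.full(n + 1, np.inf)  # best[i]: minimal SSE for the first i records
    best[0] = 0.0
    cut = np.zeros(n + 1, dtype=int)  # start of the last group in that solution
    for i in range(k, n + 1):
        for size in range(k, min(2 * k - 1, i) + 1):
            c = best[i - size] + cost(i - size, i)
            if c < best[i]:
                best[i], cut[i] = c, i - size
    labels = np.empty(n, dtype=int)
    i, g = n, 0
    while i > 0:  # walk back through the chosen cuts
        labels[order[cut[i]:i]] = g
        i, g = cut[i], g + 1
    return labels


def reorder(X, labels):
    """HYPOTHETICAL sequencing rule: chain group centroids greedily by nearest
    neighbour and list each group contiguously, sorted along the chain direction."""
    groups = [np.flatnonzero(labels == g) for g in np.unique(labels)]
    cents = np.array([X[idx].mean(axis=0) for idx in groups])
    start = int(np.argmax(((cents - X.mean(axis=0)) ** 2).sum(axis=1)))
    chain, left = [start], set(range(len(groups))) - {start}
    while left:
        last = cents[chain[-1]]
        nxt = min(left, key=lambda j: float(((cents[j] - last) ** 2).sum()))
        chain.append(nxt)
        left.remove(nxt)
    order = []
    for pos, g in enumerate(chain):
        if len(chain) == 1:
            direction = np.zeros(X.shape[1])
        elif pos + 1 < len(chain):
            direction = cents[chain[pos + 1]] - cents[g]
        else:
            direction = cents[g] - cents[chain[pos - 1]]
        idx = groups[g]
        order.extend(idx[np.argsort(X[idx] @ direction, kind="stable")])
    return np.array(order)


def repeated_ordering(X, k, max_iter=50):
    """Sequence, partition, re-sequence, ... until SSE no longer decreases."""
    X = np.asarray(X, dtype=float)
    # ASSUMED initial clustering: optimal split of the first-principal-component order.
    Xc = X - X.mean(axis=0)
    pc1 = np.linalg.svd(Xc, full_matrices=False)[2][0]
    labels = partition_sequence(X, np.argsort(Xc @ pc1, kind="stable"), k)
    best = sse(X, labels)
    for _ in range(max_iter):
        new = partition_sequence(X, reorder(X, labels), k)
        new_sse = sse(X, new)
        if new_sse >= best - 1e-12:  # no improvement: stop
            break
        labels, best = new, new_sse
    return labels, best


rng = np.random.default_rng(0)
labels, total = repeated_ordering(rng.normal(size=(60, 3)), k=3)
print(total, np.bincount(labels).min())
```

Because `reorder` lists every current group as a contiguous block and those groups already have between k and 2k-1 members, the previous grouping is always a feasible split of the new sequence. The dynamic program therefore never returns a higher SSE, and the loop terminates once an iteration brings no strict improvement.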


Similar resources

Partitioning Complex Networks via Size-Constrained Clustering

The most commonly used method to tackle the graph partitioning problem in practice is the multilevel approach. During a coarsening phase, a multilevel graph partitioning algorithm reduces the graph size by iteratively contracting nodes and edges until the graph is small enough to be partitioned by some other algorithm. A partition of the input graph is then constructed by successively transferr...


Size constrained clustering problems in fixed dimension

Clustering or cluster analysis [1] is a classical method in unsupervised learning and one of the most used techniques in statistical data analysis. Clustering has a wide range of applications in many areas like pattern recognition, medical diagnostics, data mining, biology, market research and image analysis among others. A cluster is a set of data points that in some sense are similar to each ...


Constrained Ordering

We investigate the problem of finding a total order of a finite set that satisfies various local ordering constraints. Depending on the admitted constraints, we provide an efficient algorithm or prove NP-completeness. To this end, we define a reduction technique and discuss its properties.


Size-constrained 2-clustering in the plane with Manhattan distance

We present an algorithm for the 2-clustering problem with cluster size constraints in the plane assuming the ℓ1-norm, that works in O(n log n) time and O(n) space. Such a procedure also solves a full version of the problem, computing the optimal solutions for all possible constraints on cluster sizes. The algorithm is based on a separation result concerning the clusters of any optimal solution of th...


Evolving Variable-Ordering Heuristics for Constrained Optimisation

In this paper we present and evaluate an evolutionary approach for learning new constraint satisfaction algorithms, specifically for MAX-SAT optimisation problems. Our approach offers two significant advantages over existing methods: it allows the evolution of more complex combinations of heuristics, and it can identify fruitful synergies among heuristics. Using four different classes of MAX-S...


Parallel Genetic Algorithms for Constrained Ordering Problems

This paper proposes two different parallel genetic algorithms (PGAs) for constrained ordering problems. Constrained ordering problems are constraint optimization problems (COPs) for which it is possible to represent a candidate solution as a permutation of objects. A decoder is used to decode this permutation into an instantiation of the COP variables. Two examples of such constrained ordering prob...

Journal

Volume 33, Issue 7

Pages: -

Publication date: 2020-07-01
