COLARM: Cost-based Optimization for Localized Association Rule Mining
Abstract
Association rule mining typically focuses on discovering global rules that are valid across the entire dataset. Yet local rules, which are valid only for subsets of the dataset and may differ significantly from the global rules, are often of great importance to analysts. In this work, we tackle this overlooked problem of online mining of localized association rules. We enable analysts to interactively mine rules that are hidden in the global context yet locally significant. To this end, we design a compact multidimensional itemset-based data partitioning (MIP-index). The MIP-index offers efficient mining performance by utilizing precomputed results, while still giving the user the flexibility to select any data subset of interest at run-time. We design a suite of alternative execution strategies for processing such localized mining requests. We propose optimization principles such as selection push-up, supported R-tree filter, and differential treatment of contained and partially overlapped MIPs. We analytically and experimentally demonstrate that different execution strategies are effective for different query scenarios. Given a localized mining query, our COLARM query optimizer takes a cost-based approach to identify the best execution strategy. Through extensive experiments on benchmark data sets, we demonstrate that the COLARM optimizer is highly accurate in online plan selection and that it discovers localized rules (otherwise hidden in the global context) across a diverse range of localized mining requests.
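To make the notion of a rule that is "hidden in the global context yet locally significant" concrete, the following is a minimal sketch on toy data. It is not the MIP-index or the COLARM optimizer: it simply recomputes support and confidence by brute force over a user-selected subset. The records, the `region` attribute, the item names, and the `MIN_SUPPORT` threshold are all hypothetical and chosen only for illustration.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical toy data: (region, basket). None of this comes from the paper.
Record = Tuple[str, FrozenSet[str]]

RECORDS: List[Record] = [
    ("north", frozenset({"tea", "honey"})),
    ("north", frozenset({"tea", "honey"})),
    ("north", frozenset({"tea"})),
    ("south", frozenset({"coffee"})),
    ("south", frozenset({"coffee", "milk"})),
    ("south", frozenset({"coffee"})),
    ("south", frozenset({"tea"})),
    ("south", frozenset({"coffee", "milk"})),
    ("south", frozenset({"coffee"})),
    ("south", frozenset({"tea", "milk"})),
]

MIN_SUPPORT = 0.3  # assumed threshold, for illustration only


def support(records: List[Record], itemset: FrozenSet[str]) -> float:
    """Fraction of records whose basket contains every item in itemset."""
    if not records:
        return 0.0
    hits = sum(1 for _, basket in records if itemset <= basket)
    return hits / len(records)


def confidence(records: List[Record],
               antecedent: FrozenSet[str],
               consequent: FrozenSet[str]) -> float:
    """Estimate P(consequent | antecedent) over the given records."""
    base = sum(1 for _, basket in records if antecedent <= basket)
    if base == 0:
        return 0.0
    both = sum(1 for _, basket in records
               if (antecedent | consequent) <= basket)
    return both / base


# Candidate rule: tea -> honey
TEA = frozenset({"tea"})
HONEY = frozenset({"honey"})
RULE_ITEMS = TEA | HONEY

# Global context: all records.
global_support = support(RECORDS, RULE_ITEMS)
global_confidence = confidence(RECORDS, TEA, HONEY)

# Local context: the subset an analyst might select at run-time.
local_records = [r for r in RECORDS if r[0] == "north"]
local_support = support(local_records, RULE_ITEMS)
local_confidence = confidence(local_records, TEA, HONEY)

print(f"global: support={global_support:.2f}, confidence={global_confidence:.2f}")
print(f"local : support={local_support:.2f}, confidence={local_confidence:.2f}")
print("frequent globally:", global_support >= MIN_SUPPORT)
print("frequent locally :", local_support >= MIN_SUPPORT)
```

In this toy setting the rule falls below the assumed threshold over the whole dataset but clears it within the selected subset. The brute-force rescan shown here is exactly the per-request cost that precomputed results are meant to avoid.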
Similar resources
A Novel Method for Selecting the Supplier Based on Association Rule Mining
One of the important problems in supply chain management is supplier selection. A company holds massive data from various departments, so extracting knowledge from that data is complicated. Many researchers have addressed this problem with methods such as fuzzy set theory, goal programming, multi-objective programming, linear programming, mixed integer programming, analyti...
Optimizing Membership Functions using Learning Automata for Fuzzy Association Rule Mining
Transactions in web data often consist of quantitative data, suggesting that fuzzy set theory can be used to represent such data. The time spent by users on each web page, one type of web data, is modeled as a trapezoidal membership function (TMF) and can be used to evaluate user browsing behavior. The quality of mining fuzzy association rules depends on membership functions and since t...
S3PSO: Students’ Performance Prediction Based on Particle Swarm Optimization
Nowadays, new methods are required to take advantage of the rich and extensive gold mine of data, given the vast amount of data created particularly by educational systems. Data mining algorithms have been used in educational systems, especially e-learning systems, due to the broad usage of these systems. Providing a model to predict final student results in an educational course is a reason for usi...
Data sanitization in association rule mining based on impact factor
Data sanitization is a process used to promote the sharing of transactional databases among organizations and businesses; it alleviates the concerns of individuals and organizations regarding the disclosure of sensitive patterns. It transforms the source database into a released database so that counterparts cannot discover the sensitive patterns, and so data confidentiality is preserved ag...
Predator-Miner: Ad hoc Mining of Associations Rules within a Database Management System
In this demonstration, we present a prototype system, Predator-Miner, which extends Predator with a relational-like association rule mining operator to support data mining operations. Predator-Miner allows a user to combine association rule mining queries with SQL queries. This approach towards tight integration differs from existing techniques of using user-defined functions (UDFs), stored pro...
Publication date: 2014