Mining Top-K Multidimensional Gradients

Authors

  • Ronnie Alves
  • Orlando Belo
  • Joel Ribeiro
Abstract

Several business applications, such as market basket analysis, clickstream analysis, fraud detection and churn migration analysis, demand gradient data analysis. By employing gradient data analysis one is able to identify trends and outliers and to answer “what-if” questions over large databases. Gradient queries were first introduced by Imielinski et al. [1] as the cubegrade problem. The main idea is to detect interesting changes in a multidimensional space (MDS). Thus, changes in a set of measures (aggregates) are associated with changes in sector characteristics (dimensions). An MDS contains a huge number of cells, which poses a great challenge for mining gradient cells in a reasonable time. Dong et al. [2] have proposed gradient constraints to reduce the computational costs involved in such queries. Even when such constraints are used on large databases, the number of interesting cases to evaluate is still large. In this work, we are interested in exploring the best cases (Top-K cells) of interesting multidimensional gradients. There are several studies on Top-K queries, but preference queries with multidimensional selection were introduced quite recently by Dong et al. [9]. Furthermore, traditional Top-K methods work well in the presence of convex functions, whereas gradients are non-convex. We have revisited iceberg cubing for complex measures, since it is the basis for mining gradient cells. We also propose a gradient-based cubing strategy to evaluate interesting gradient regions in MDS. Thus, the main challenge is to find maximum gradient regions (MGRs) that maximize the task of mining Top-K gradient cells. Our performance study indicates that our strategy is effective at finding the most interesting gradients in multidimensional space.

Similar resources

Similarity Multidimensional Indexing

The multidimensional k-NN (k nearest neighbors) query problem arises in a large variety of database applications, including information retrieval, natural language processing, and data mining. To solve it efficiently, a database needs an indexing structure that supports this kind of search. However, an exact solution is hardly feasible in multidimensional space. In this paper we describe and analyze an...

K²-Treaps: Range Top-k Queries in Compact Space

Efficient processing of top-k queries on multidimensional grids is a common requirement in information retrieval and data mining, for example in OLAP cubes. We introduce a data structure, the K²-treap, that represents grids in compact form and supports efficient prioritized range queries. We compare the K²-treap with state-of-the-art solutions on synthetic and real-world datasets, showing that it...

Ensemble-based Top-k Recommender System Considering Incomplete Data

Recommender systems have been widely used in e-commerce applications. They are a subclass of information filtering systems, used either to predict whether a user will prefer an item (prediction problem) or to identify a set of k items that will be of interest to the user (Top-k recommendation problem). Demanding sufficient ratings to make robust predictions and suggesting qualified recommendations are two si...

Mining significant change patterns in multidimensional spaces

In this paper, we present a new OLAP mining method for exploring interesting trend patterns. Our main goal is to mine the most significant (Top-K) changes in multidimensional spaces (MDS) by applying a gradient-based cubing strategy. The challenge is then to find maximum gradient regions, which maximise the task of detecting Top-K gradient cells. Several heuristics are also introduced to prune MDS...

Mining Contrast Subspaces

In this paper, we tackle a novel problem of mining contrast subspaces. Given a set of multidimensional objects in two classes C+ and C− and a query object o, we want to find the top-k subspaces S that maximize the ratio of the likelihood of o in C+ to that in C−. We demonstrate that this problem has important applications and, at the same time, is very challenging. It does not even allow polynomia...

Publication date: 2007