Efficient Algorithms for Repairing Inconsistent Dimensions in Data Warehouses
Abstract
Dimensions in Data Warehouses (DWs) are usually modeled as a hierarchical set of categories called the dimension schema. To guarantee summarizability, that is, the ability to use pre-computed answers at lower levels to compute answers at higher levels, a dimension is required to be strict and covering: every element of the dimension must be connected to exactly one ancestor in each of its ancestor categories. In practice, the rollup relations of a dimension need to be reclassified to correct errors or to adapt the data to changes, and after these operations the dimension may become non-strict. A minimal r-repair is a new dimension that is strict and covering, is obtained from the original dimension through a minimum number of changes, and preserves the reclassifications. In the general case, finding an r-repair for a dimension is NP-complete. We present efficient polynomial-time algorithms to compute a single r-repair for dimensions that contain one conflicting level and become inconsistent after one reclassification of elements.
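The strict-and-covering condition, and how a single reclassification can break it, can be illustrated with a small Python sketch. Everything here is a hypothetical toy model, not the authors' code or data: the `Dimension` class, the store/city/region/country example, and the `single_change_repairs` helper are assumptions made for illustration. `violations()` reports every element that does not reach exactly one ancestor in some ancestor category. `single_change_repairs` naively tries every one-edge change that leaves the fixed reclassification untouched; it only finds repairs of size one and is not the polynomial-time algorithm presented in the paper.

```python
import copy


class Dimension:
    """Hypothetical toy model of a DW dimension (not the paper's structure).

    schema:   category -> set of parent categories (the dimension schema)
    category: element  -> the category it belongs to
    rollup:   element  -> set of parent elements (the rollup relations)
    """

    def __init__(self, schema, category, rollup):
        self.schema = schema
        self.category = category
        self.rollup = {e: set(ps) for e, ps in rollup.items()}

    def ancestor_categories(self, cat):
        """All categories reachable from `cat` in the schema, excluding `cat`."""
        seen, stack = set(), list(self.schema.get(cat, ()))
        while stack:
            c = stack.pop()
            if c not in seen:
                seen.add(c)
                stack.extend(self.schema.get(c, ()))
        return seen

    def ancestors_in(self, elem, cat):
        """Elements of category `cat` reachable from `elem` via rollup edges."""
        found, seen, stack = set(), set(), list(self.rollup.get(elem, ()))
        while stack:
            e = stack.pop()
            if e in seen:
                continue
            seen.add(e)
            if self.category[e] == cat:
                found.add(e)
            stack.extend(self.rollup.get(e, ()))
        return found

    def violations(self):
        """(element, category, ancestors) where the element lacks exactly one
        ancestor: more than one breaks strictness, zero breaks covering."""
        out = []
        for e in sorted(self.category):
            for anc_cat in sorted(self.ancestor_categories(self.category[e])):
                anc = self.ancestors_in(e, anc_cat)
                if len(anc) != 1:
                    out.append((e, anc_cat, sorted(anc)))
        return out

    def reclassify(self, elem, old_parent, new_parent):
        """Replace the rollup edge elem -> old_parent with elem -> new_parent."""
        self.rollup[elem].discard(old_parent)
        self.rollup[elem].add(new_parent)


def single_change_repairs(dim, fixed):
    """Naive search over single edge changes (edges in `fixed` are kept)
    after which the dimension is strict and covering."""
    repairs = []
    for e in sorted(dim.rollup):
        for p in sorted(dim.rollup[e]):
            if (e, p) in fixed:
                continue
            same_cat = sorted(x for x, c in dim.category.items()
                              if c == dim.category[p] and x != p)
            for q in same_cat:
                trial = copy.deepcopy(dim)  # never mutate the input dimension
                trial.reclassify(e, p, q)
                if not trial.violations():
                    repairs.append((e, p, q))
    return repairs


def example_dimension():
    """Toy dimension: Store rolls up to Country through both City and Region."""
    schema = {"Store": {"City", "Region"}, "City": {"Country"},
              "Region": {"Country"}, "Country": {"All"}, "All": set()}
    category = {"s1": "Store", "c1": "City", "r1": "Region",
                "UK": "Country", "FR": "Country", "all": "All"}
    rollup = {"s1": {"c1", "r1"}, "c1": {"UK"}, "r1": {"UK"},
              "UK": {"all"}, "FR": {"all"}}
    return Dimension(schema, category, rollup)


dim = example_dimension()
print(dim.violations())                            # [] : strict and covering
dim.reclassify("c1", "UK", "FR")                   # one reclassification
print(dim.violations())                            # [('s1', 'Country', ['FR', 'UK'])]
print(single_change_repairs(dim, {("c1", "FR")}))  # [('r1', 'UK', 'FR')]
```

In this toy data, moving city `c1` from `UK` to `FR` leaves store `s1` with two countries (one through `c1`, one through `r1`), so `Country` is the conflicting level. Re-parenting region `r1` under `FR` is the only single change that restores strictness while keeping the reclassification.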
Similar papers
Logic Programs for Repairing Inconsistent Dimensions in Data Warehouses
A Data Warehouse (DW) is a data repository that integrates data from multiple sources and organizes the data according to a set of data structures called dimensions. Each dimension provides a perspective upon which the data can be viewed. In order to support efficient query processing, a dimension is usually required to satisfy different classes of integrity constraints. In this paper, ...
Repairing inconsistent dimensions in data warehouses
A dimension in a Data Warehouse (DW) is a set of elements connected by a hierarchical relationship. The elements are used to view summaries of data at different levels of abstraction. In order to support efficient processing of such summaries, a dimension is usually required to satisfy different classes of integrity constraints. In scenarios where the constraints properly capture the semanti...
Efficient Approximation Algorithms for Point-set Diameter in Higher Dimensions
We study the problem of computing the diameter of a set of $n$ points in $d$-dimensional Euclidean space for a fixed dimension $d$, and propose a new $(1+\varepsilon)$-approximation algorithm with $O(n + 1/\varepsilon^{d-1})$ time and $O(n)$ space, where $0 < \varepsilon \leqslant 1$. We also show that the proposed algorithm can be modified to a $(1+O(\varepsilon))$-approximation algorithm with $O(n+...
Dynamic cubing for hierarchical multidimensional data space
Data warehouses have been used in many applications for quite a long time. Traditionally, new data in these warehouses is loaded through offline bulk updates, which implies that the latest data is not always available for analysis. This, however, is not acceptable in many modern applications (such as intelligent buildings, smart grids, etc.) that require the latest data for decision making. These mod...
Efficient Aggregation Algorithms for Compressed Data Warehouses
Aggregation and cube are important operations for online analytical processing (OLAP). Many efficient algorithms to compute aggregation and cube for relational OLAP have been developed. Some work has been done on efficiently computing cube for multidimensional data warehouses that store data sets in multidimensional arrays rather than in tables. However, to our knowledge, there is nothing to d...
Publication date: 2013