Declustering Large Multidimensional Data Sets for Range Queries over Heterogeneous Disks
نویسندگان
چکیده
Declustering is a technique to distribute data sets over multiple disks so that future retrievals can be well balanced over the disks and be performed in parallel. Although disk heterogeneity often exists in systems like clusters, most work on declustering has focused only on homogeneous environments. In this work, we investigate the declustering problem for a heterogeneous disk environment using virtual servers, and propose novel approaches for deciding the number of virtual servers and the mapping between virtual servers and physical disks. Our experimental results show that by combining our algorithm for choosing the number of virtual servers with a greedy algorithm for mapping virtual servers to disks, users can expect range query retrieval performance within 4% of the optimum achievable in practice on average, in all configurations studied. Compared to the intuitively natural approach to the problem based on randomly assigning virtual servers to physical disks, this represents an improvement of 8–31% in average fetch ratio, as well as a 21–42% reduction in the standard deviation of retrieval performance for small queries.
منابع مشابه
Scalability Analysis of Declustering Methods for Multidimensional Range Queries
Efficient storage and retrieval of multiattribute data sets has become one of the essential requirements for many data-intensive applications. The Cartesian product file has been known as an effective multiattribute file structure for partial-match and best-match queries. Several heuristic methods have been developed to decluster Cartesian product files across multiple disks to obtain high perf...
متن کاملLatin Hypercubes: A Class of Multidimensional Declustering Techniques
The I/O subsystem is widely accepted as one of the principal bottlenecks for high performance parallel databases systems. The emergence of parallel I/O architectures has made the problem of data declustering, i.e. fragmenting a le of records and allocating the pieces to different disks, one of prime importance. This is evident from the growing activity in this area. In this study we focus only ...
متن کاملA Hierarchical Technique for Constructing Efficient Declustering Schemes for Range Queries
Multi-disk systems, coupled with declustering schemes, have been widely used in various applications to improve I/O performance by enabling parallel disk accesses. A declustering scheme determines how data blocks should be placed among multiple disks to maximize the parallelism. We focus on the problem of declustering grid-structured multidimensional data with the objective of reducing the resp...
متن کاملMultidimensional Declustering Schemes Using Golden Ratio and Kronecker Sequences
We propose a new declustering scheme for allocating uniform multidimensional data among parallel disks. The scheme, aimed at reducing disk access time for range queries, is based on Golden Ratio Sequences for two dimensions and Kronecker Sequences for higher dimensions. Using exhaustive simulation, we show that, in two dimensions, the worst-case (additive) deviation of the scheme from the optim...
متن کاملEfficient retrieval of multidimensional datasets through parallel I/O
Many scientific and engineering applications process large multidimensional datasets. An important access pattern for these applications is the retrieval of data corresponding to ranges of values in multiple dimensions. Performance is limited by disks largely due to high disk latencies. Tiling and distributing the data across multiple disks is an effective technique for improving performance th...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2003