A Hierarchical Technique for Constructing Efficient Declustering Schemes for Range Queries

Authors

  • Randeep Bhatia
  • Rakesh K. Sinha
  • Chung-Min Chen
Abstract

Multi-disk systems, coupled with declustering schemes, have been widely used in various applications to improve I/O performance by enabling parallel disk accesses. A declustering scheme determines how data blocks should be placed among multiple disks to maximize parallelism. We focus on the problem of declustering grid-structured multidimensional data with the objective of reducing the response time for range queries. Because of the combinatorial nature of the problem, it is not computationally feasible to perform an exhaustive search for the best scheme for large values of M (the number of disks). In this paper, we present an efficient technique for building well-performing declustering schemes for large values of M, based on known good declustering schemes for small values of M. We analyze the performance of the declustering schemes generated by this hierarchical technique, giving tight bounds on their query response times. For example, we show that, in two dimensions, optimal declustering schemes for M1 and M2 disks can be used to construct a scheme for M1 × M2 disks whose response time, expressed as the maximum number of data blocks to be retrieved from any one disk, is at most five more than the optimal response time. Our technique generalizes to any value of M in two dimensions and to selected values of M in higher dimensions. We also present simulation results to show the effectiveness of these schemes in practice.
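
The response-time measure used in the abstract can be made concrete with a small sketch. The code below is not the paper's hierarchical construction; it assumes the classic Disk Modulo placement (block (i, j) goes to disk (i + j) mod M) purely as a stand-in scheme, and all function names are hypothetical. It computes the response time of a range query as the maximum number of blocks any one disk must serve, and compares it with the lower bound ceil(|Q| / M) that no scheme can beat.

```python
from itertools import product
from math import ceil


def disk_modulo(i, j, m):
    """Illustrative placement (assumption): block (i, j) -> disk (i + j) mod m."""
    return (i + j) % m


def response_time(scheme, m, row0, col0, rows, cols):
    """Maximum number of blocks a single disk serves for a rows x cols range query
    whose top-left block is (row0, col0)."""
    load = [0] * m
    for i, j in product(range(row0, row0 + rows), range(col0, col0 + cols)):
        load[scheme(i, j, m)] += 1
    return max(load)


def optimal_response_time(m, rows, cols):
    """Lower bound on response time: the blocks spread perfectly evenly over m disks."""
    return ceil(rows * cols / m)


def worst_additive_error(scheme, m, n):
    """Largest gap between a scheme's response time and the lower bound,
    taken over every range query that fits in an n x n grid (brute force)."""
    worst = 0
    for rows, cols in product(range(1, n + 1), repeat=2):
        for row0, col0 in product(range(n - rows + 1), range(n - cols + 1)):
            gap = (response_time(scheme, m, row0, col0, rows, cols)
                   - optimal_response_time(m, rows, cols))
            worst = max(worst, gap)
    return worst


if __name__ == "__main__":
    # A 2 x 2 query on 4 disks: two of the four blocks land on the same disk.
    print(response_time(disk_modulo, 4, 0, 0, 2, 2), optimal_response_time(4, 2, 2))
    print(worst_additive_error(disk_modulo, 4, 4))
```

The brute-force search in `worst_additive_error` is only practical for small grids, which mirrors the abstract's point that exhaustive search does not scale to large M.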

Similar resources

Threshold-based declustering

Declustering techniques reduce query response time through parallel I/O by distributing data among multiple devices. Except for a few cases it is not possible to find declustering schemes that are optimal for all spatial range queries. As a result of this, most of the research on declustering has focused on finding schemes with low worst case additive error. However, additive error based scheme...

Efficient retrieval of multidimensional datasets through parallel I/O

Many scientific and engineering applications process large multidimensional datasets. An important access pattern for these applications is the retrieval of data corresponding to ranges of values in multiple dimensions. Performance is limited by disks largely due to high disk latencies. Tiling and distributing the data across multiple disks is an effective technique for improving performance th...

Efficient Disk Allocation for Fast Similarity Searching

As databases increasingly integrate non-textual information it is becoming necessary to support efficient similarity searching in addition to range searching. Recently, declustering techniques have been proposed for improving the performance of similarity searches through parallel I/O. In this paper, we propose a new scheme which provides good declustering for similarity searching. In particular...

Concentric Hyperspaces and Disk Allocation for Fast Parallel Range Searching

Data partitioning and declustering have been extensively used in the past to parallelize I/O for range queries. Numerous declustering and disk allocation techniques have been proposed in the literature. However, most of these techniques were primarily designed for two-dimensional data and for balanced partitioning of the data space. As databases increasingly integrate multimedia information in ...

Selective Replicated Declustering for Arbitrary Queries

Data declustering is used to minimize query response times in data-intensive applications. In this technique, the query retrieval process is parallelized by distributing the data among several disks, which is useful in applications such as geographic information systems that access huge amounts of data. Declustering with replication is an extension of declustering with possible data replicas in the...

Journal title:
  • Comput. J.

Volume 46  Issue

Pages  -

Publication date 2003