Skew Handling in the DBS3 Parallel Database System

Authors

  • Luc Bouganim
  • Daniela Florescu
  • Benoît Dageville

Abstract

The gains of parallel query execution can be limited by high start-up time, interference between execution entities, and poor load balancing. In this paper, we present a solution that reduces these limitations in DBS3, a shared-memory parallel database system. This solution combines static data partitioning and dynamic processor allocation to adapt to the execution context. It makes DBS3 almost insensitive to data skew and allows the degree of parallelism to be decoupled from the degree of data partitioning. To address the problem of load balancing in the presence of data skew, we analyze three important factors that influence the behavior of our parallel execution model: skew factor, degree of parallelism, and degree of partitioning. We report on experiments that vary these three parameters with the DBS3 prototype on a 72-node KSR1 multiprocessor. The results demonstrate high performance gains, even with highly skewed data.
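The following minimal Python simulation sketch shows how the three factors interact. It is not the DBS3 scheduler. It assumes the following:

- A Zipf-like skew model, in which partition i receives tuples in proportion to 1/i^skew.
- A cost of one time unit per tuple.
- Hypothetical helpers named `zipf_partition_sizes`, `static_makespan` and `dynamic_makespan`.

Static allocation binds partition i to thread i mod n before execution. Dynamic allocation lets an idle thread take the next partition from a shared queue, largest first. Threads are allocated independently of partitions, so the degree of parallelism n can differ from the degree of partitioning P.

```python
import heapq
from typing import List


def zipf_partition_sizes(total_tuples: int, num_partitions: int, skew: float) -> List[int]:
    """Hypothetical skew model (an assumption, not from the paper):
    partition i (1-based) receives tuples in proportion to 1 / i**skew.
    skew = 0 yields equal partitions; larger values concentrate tuples in the first partitions.
    """
    weights = [1.0 / (i ** skew) for i in range(1, num_partitions + 1)]
    total_weight = sum(weights)
    sizes = [int(total_tuples * w / total_weight) for w in weights]
    sizes[0] += total_tuples - sum(sizes)  # give the rounding remainder to the first (largest) partition
    return sizes


def static_makespan(sizes: List[int], num_threads: int) -> int:
    """Static allocation: partition i is bound to thread i % num_threads before execution."""
    loads = [0] * num_threads
    for i, size in enumerate(sizes):
        loads[i % num_threads] += size
    return max(loads)


def dynamic_makespan(sizes: List[int], num_threads: int) -> int:
    """Dynamic allocation: whenever a thread becomes idle it takes the next partition
    from a shared queue ordered largest first. Assumed cost model: one time unit per tuple."""
    finish_times = [0] * num_threads  # a list of equal values is already a valid min-heap
    for size in sorted(sizes, reverse=True):
        earliest = heapq.heappop(finish_times)  # the thread that becomes idle first
        heapq.heappush(finish_times, earliest + size)
    return max(finish_times)


if __name__ == "__main__":
    total, threads = 1_000_000, 8
    for partitions in (8, 32, 128):
        sizes = zipf_partition_sizes(total, partitions, skew=1.0)
        print(f"P={partitions:4d}  static={static_makespan(sizes, threads):7d}  "
              f"dynamic={dynamic_makespan(sizes, threads):7d}  ideal={total // threads}")
```

Under these assumptions, when P equals n, both strategies are bounded by the largest partition. Raising P above n shrinks the largest partition, so dynamic allocation tends to approach the ideal total/n. Largest-first ordering (LPT scheduling) is one common heuristic, chosen here only for simplicity.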

Similar sources

Adaptive Parallel Query Execution in DBS3

DBS3 (Database System for Shared Store) [1] is a parallel database system for shared-memory multiprocessors [7]. It has been implemented on an Encore Multimax (10 processors) and on a Kendall Square Research KSR1 (72 processors). In a shared-memory architecture, each processor has uniform access to the entire database through a global main memory. Thus, the parallel scheduler, which allocates pro...

Parallel Query Processing in DBS3

In this paper, we describe our approach to the compile-time optimization and parallelization of queries for execution in DBS3, a shared-memory parallel database system. Our approach enables exploring a search space large enough to include zigzag trees which are intermediate between left-deep and right-deep trees. Zigzag trees are shown to provide better response time than right-deep trees in th...

Practical Skew Handling in Parallel Joins

We present an approach to dealing with skew in parallel joins in database systems. Our approach is easily implementable within current parallel DBMS, and performs well on skewed data without degrading the performance of the system on non-skewed data. The main idea is to use multiple algorithms, each specialized for a different degree of skew, and to use a small sample of the relations being join...

Efficient Outer Join Data Skew Handling in Parallel DBMS

Large enterprises have been relying on parallel database management systems (PDBMS) to process their ever-increasing data volume and complex queries. The scalability and performance of a PDBMS come from load balancing on all nodes in the system. Skewed processing will significantly slow down query response time and degrade the overall system performance. Business intelligence tools used by ent...

Handling Data Skew in Multiprocessor Database Computers Using Partition Tuning

Shared-nothing multiprocessor architecture is known to be more scalable to support very large databases. Compared to other join strategies, a hash-based join algorithm is particularly efficient and easily parallelized for this computation model. However, this hardware structure is very sensitive to the data skew problem. Unless the parallel hash join algorithm includes some load balancing mec...

Publication date: 1996