Towards Eliminating Random I/O in Hash Joins
نویسندگان
چکیده
The widening performance gap between CPU and disk is signiicant for hash join performance. Most current hash join methods try to reduce the volume of data transferred between memory and disk. In this paper , we try to reduce hash-join times by reducing random I/O. We study how current algorithms incur random I/O, and propose a new hash join method, Seq + , that converts much of the random I/O to sequential I/O. Seq + uses a new organization for hash buckets on disk, and larger input and output buuer sizes. We introduce the technique of batch writes to reduce the bucket-write cost, and the concepts of write-and read-groups of hash buckets to reduce the bucket-read cost. We derive a cost model for our method, and present formulas for choosing various algorithm parameters, including input and output buuer sizes. Our performance study shows that the new hash join method performs many times better than current algorithms under various environments. Since our cost functions underestimate the cost of current algorithms and overestimate the cost of Seq + , the actual performance gain of Seq + is likely to be even greater.
منابع مشابه
Towards Eliminating Random 1 / 0 in Hash Joins
The widening performance gap between CPU and disk is significant for hash join performance. Most current hash join methods try t o reduce the volume of data transferred between memory and disk. In this paper, we try to reduce hash-join times b y reducing random I/O. We study how current algorithms incur random I/O, and propose a new hash join method, Seq+, that converts much of the random 1/0 t...
متن کاملMemory-Contention Responsive Hash Joins
In order to maximize system performance in environments with fluctuating memory contention, memory-intensive algorithms such as hash join must gracefully adapt to variations in available memory. Mixed workloads, creating fluctuations of erratic frequency and magnitude, make responsiveness to memory contention particularly important. Previous studies on adaptable hash joins have focused on lower...
متن کاملAn Adaptive Hash Join Algorithm for Multiuser Environments
As main memory becomes a cheaper resource, hash joins are an alternative to the traditional methods of performing equi-joins: nested loop and merge joins. This paper introduces a modified, adaptive hash join method that is designed to work with dynamic changes in the amount of available memory. The general idea of the algorithm is to regulate resource usage of a hash join in a way that allows i...
متن کاملMemory-Efficient Hash Joins
We present new hash tables for joins, and a hash join based on them, that consumes far less memory and is usually faster than recently published in-memory joins. Our hash join is not restricted to outer tables that fit wholly in memory. Key to this hash join is a new concise hash table (CHT), a linear probing hash table that has 100% fill factor, and uses a sparse bitmap with embedded populatio...
متن کاملHash Join Algorithms in a Multiuser Environment
As main memory becomes a cheaper resource, hash joins are an alternative to the traditional methods of perfonning equi-joins: nested loop and merge joins. This paper introduces a modified, adaptive hash join method that is designed to work with dynamic changes in the amount of available memory. The general idea of the algorithm is to regulate resource usage of a hash join in a way that allows i...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 1996