Buffering and Read-Ahead Strategies for External Mergesort
Authors
Abstract
The elapsed time for external mergesort is normally dominated by I/O time. This paper focuses on reducing I/O time during the merge phase. Three new buffering and read-ahead strategies are proposed: equal buffering, extended forecasting, and clustering. They exploit the fact that virtually all modern disks perform caching and sequential read-ahead. The latter two also collect information during run formation (the last key of each run block), which is then used to preplan reading. For random input data, extended forecasting and clustering were found to reduce merge time by 30% compared with traditional double buffering. Clustering exploits any temporal skew in input runs to further reduce the number of seeks. Authors' current address: Microsoft, One Microsoft Way, Redmond, WA 98052-6399, U.S.A.
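The abstract mentions recording the last key of each run block and using it to preplan reading. The sketch below illustrates only that general forecasting idea and is not the paper's algorithm: a block is exhausted when its last key has been output, so the successor of the block with the smallest last key is the next one the merge needs. The function name `forecast_read_order`, the `last_keys` layout, and the tie-breaking by run index are assumptions made for illustration.

```python
from typing import List, Tuple


def forecast_read_order(last_keys: List[List[int]]) -> List[Tuple[int, int]]:
    """Return (run, block) pairs in the order a merge would need them.

    last_keys[r][b] is the last (largest) key stored in block b of sorted
    run r, as recorded during run formation. Hypothetical layout.
    """
    # The merge cannot start until the first block of every run is in memory.
    order = [(r, 0) for r, run in enumerate(last_keys) if run]

    # Block b+1 of run r is needed once block b of that run is exhausted,
    # and blocks are exhausted in increasing order of their last key.
    successors = [
        (run[b], r, b + 1)
        for r, run in enumerate(last_keys)
        for b in range(len(run) - 1)
    ]
    successors.sort()  # ties broken by run index (an arbitrary assumption)

    order.extend((r, b) for _, r, b in successors)
    return order


if __name__ == "__main__":
    # Three runs; each inner list holds the last key of each block in that run.
    plan = forecast_read_order([[10, 40, 90], [20, 30, 95], [50, 60]])
    print(plan)
```

With the whole sequence known before merging starts, a reader could issue reads ahead of need or group nearby blocks into one request. How the paper's strategies actually schedule and buffer those reads is not described in the abstract.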
Similar resources
Speeding up External Mergesort
External mergesort is normally implemented so that each run is stored contiguously on disk and blocks of data are read exactly in the order they are needed during merging. We investigate two ideas for improving the performance of external mergesort: interleaved layout and a new reading strategy. Interleaved layout places blocks from different runs in consecutive disk addresses. This is done in t...
Prefetching with Multiple Disks for External Mergesort: Simulation and Analysis
With the increase in the size of main memory in computer systems, multiple disks and aggressive prefetching can be employed to significantly reduce I/O time. Two prefetching strategies, intra-run and inter-run, for external merging using multiple disks are studied. Their performance is evaluated using simulation, and simple analytical expressions are derived to explain their asymptotic behavior. The r...
Design and Performance Tradeoffs in Clustered Video Servers
In this paper, we investigate the suitability of clustered architectures for designing scalable multimedia servers. Specifically, we evaluate the effects of: (i) architectural design of the cluster, (ii) the size of the unit of data interleaving, and (iii) read-ahead buffering and scheduling on the real-time performance guarantees provided by the server. To analyze the effects of these paramete...
Simple Randomized Mergesort on Parallel Disks
We consider the problem of sorting a file of N records on the D-disk model of parallel I/O in which there are two sources of parallelism. Records are transferred to and from disk concurrently in blocks of B contiguous records. In each I/O operation, up to one block can be transferred to or from each of the D disks in parallel. We propose a simple, efficient, randomized mergesort algorithm calle...
Optimal Bidding Strategies of GENCOs in Day-Ahead Energy and Spinning Reserve Markets Based on Hybrid GA-Heuristic Optimization Algorithm
In an electricity market, every generation company (GENCO) attempts to maximize profit according to other participants' bidding behaviors and power system operating conditions. The goal of this study is to examine the optimal bidding strategy problem for GENCOs in energy and spinning reserve markets based on a hybrid GA-heuristic optimization algorithm. The heuristic optimization algorithm used...
Publication date: 1998