Program Optimization Based on Compile-time Cache Performance Prediction (Parallel Processing Letters)
Abstract
We present a novel, compile-time method for determining the cache performance of the loop nests in a program. The cache hit-rates are produced by applying the reference string, determined during compilation, to an architecturally parameterized cache simulator. We also describe a heuristic that uses this method for compile-time optimization of loop ranges in iteration-space blocking. The results of the loop program optimizations are presented for different parallel program benchmarks and various processor architectures, such as IBM SP1 RS/6000, the SuperSPARC, and the Intel i860.
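The general idea of feeding a loop nest's reference string to a parameterized cache simulator can be illustrated with a minimal Python sketch. It is not the authors' simulator or heuristic, which the abstract does not specify. It assumes a set-associative cache with LRU replacement, a blocked matrix-transpose loop nest as the example workload, and an exhaustive search over candidate block sizes. All names (`hit_rate`, `blocked_transpose_refs`, `best_block`) and the cache parameters are hypothetical.

```python
from collections import OrderedDict


def hit_rate(addresses, cache_bytes, line_bytes, ways):
    """Apply a reference string (byte addresses) to a set-associative LRU cache.

    The cache geometry is passed in as parameters, so the same reference
    string can be evaluated against different target architectures.
    """
    num_sets = cache_bytes // (line_bytes * ways)
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = total = 0
    for addr in addresses:
        line = addr // line_bytes
        cache_set = sets[line % num_sets]
        total += 1
        if line in cache_set:
            hits += 1
            cache_set.move_to_end(line)        # mark as most recently used
        else:
            if len(cache_set) >= ways:
                cache_set.popitem(last=False)  # evict least recently used line
            cache_set[line] = True
    return hits / total if total else 0.0


def blocked_transpose_refs(n, block, elem_bytes=8):
    """Reference string for B[j][i] = A[i][j] under iteration-space blocking.

    Assumption: A and B are n x n row-major arrays stored back to back.
    """
    base_a, base_b = 0, n * n * elem_bytes
    for ii in range(0, n, block):
        for jj in range(0, n, block):
            for i in range(ii, min(ii + block, n)):
                for j in range(jj, min(jj + block, n)):
                    yield base_a + (i * n + j) * elem_bytes  # read A[i][j]
                    yield base_b + (j * n + i) * elem_bytes  # write B[j][i]


def best_block(n, candidates, **cache):
    """Pick the candidate block size with the highest simulated hit rate."""
    return max(
        candidates,
        key=lambda b: hit_rate(blocked_transpose_refs(n, b), **cache),
    )
```

For example, `best_block(64, [8, 64], cache_bytes=4096, line_bytes=32, ways=4)` compares an 8 x 8 blocking against the unblocked loop nest for a hypothetical 4 KB, 4-way cache. A real compile-time heuristic would likely prune the search rather than simulate every candidate.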
Similar resources
Program Optimization Based on Compile-Time Cache Performance Prediction
We present a novel, compile-time method for determining the cache performance of the loop nests in a program. The cache-miss rates are produced by applying the program's reference string of a loop nest, determined during compilation, to an architecturally parameterized cache simulator. The obtained cache-miss rates correlate well with the performance of the loop nests on actual target machines....
Tiling for Parallel Execution - Optimizing Node Cache Performance (Parallel Processing Letters, © World Scientific Publishing Company)
Tiling has been used by parallelizing compilers to define fine-grain parallel tasks and to optimize cache performance. In this paper we present a novel compile-time technique, called miss-driven cache simulation, for determining the tile size that achieves the highest cache hit-rate. The widening disparity between the processor's peak instruction rate and the main memory access time in modern processo...
Models for Performance Prediction of Cache Coherence Protocols
Key words: Cache coherence, distributed shared memory, memory access behavior, analytical performance prediction, performance evaluation, dynamic hybrid protocols. In a modern shared memory multiprocessor, it is possible to support more than one protocol for maintaining cache coherence. Possible candidates might be based on the Write-Back/Invalidate, Write-Through/Invalidate, and Write-Update ...
Procedure Cloning and Integration for Converting Parallelism from Coarse to Fine Grain
This paper introduces a method for improving program run-time performance by gathering work in an application and executing it efficiently in an integrated thread. Our methods extend whole-program optimization by expanding the scope of the compiler through a combination of software thread integration and procedure cloning. In each experiment we integrate a frequently executed procedure with its...
Design and Implementation of a Lightweight Dynamic Optimization System
Many opportunities exist to improve micro-architectural performance due to performance events that are difficult to optimize at static compile time. Cache misses and branch mis-prediction patterns may vary for different micro-architectures using different inputs. Dynamic optimization provides an approach to address these and other performance events at runtime. This paper describes a software s...
Publication date: 1996