Program Optimization Based on Compile-time Cache Performance Prediction (Parallel Processing Letters)

Authors

  • Wesley K. Kaplow
  • Boleslaw K. Szymanski
Abstract

We present a novel, compile-time method for determining the cache performance of the loop nests in a program. The cache hit-rates are produced by applying the reference string, determined during compilation, to an architecturally parameterized cache simulator. We also describe a heuristic that uses this method for compile-time optimization of loop ranges in iteration-space blocking. The results of the loop program optimizations are presented for different parallel program benchmarks and various processor architectures, such as IBM SP1 RS/6000, the SuperSPARC, and the Intel i860.
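The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' simulator: the function names (`simulate_cache`, `blocked_transpose_refs`, `best_block`), the LRU replacement policy, the blocked matrix transpose used as the loop nest, and the cache parameters in the usage note are all assumptions chosen for illustration, and none of them models the processors named above. The sketch generates the reference string of a blocked loop nest, feeds it to a cache model parameterized by size, line size, and associativity, and picks the block size with the highest simulated hit rate.

```python
from collections import OrderedDict


def simulate_cache(addresses, cache_bytes, line_bytes, ways):
    """Return the hit rate of a set-associative LRU cache (assumed policy)
    over an iterable of byte addresses."""
    num_sets = cache_bytes // (line_bytes * ways)
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = total = 0
    for addr in addresses:
        line = addr // line_bytes          # memory line holding this address
        cache_set = sets[line % num_sets]  # set the line maps to
        total += 1
        if line in cache_set:
            hits += 1
            cache_set.move_to_end(line)    # mark as most recently used
        else:
            if len(cache_set) >= ways:
                cache_set.popitem(last=False)  # evict least recently used
            cache_set[line] = True
    return hits / total if total else 0.0


def blocked_transpose_refs(n, block, elem_bytes=8):
    """Yield the reference string of a blocked B[j][i] = A[i][j] loop nest
    (hypothetical example nest; A and B are stored contiguously)."""
    base_a, base_b = 0, n * n * elem_bytes
    for ii in range(0, n, block):
        for jj in range(0, n, block):
            for i in range(ii, min(ii + block, n)):
                for j in range(jj, min(jj + block, n)):
                    yield base_a + (i * n + j) * elem_bytes  # read A[i][j]
                    yield base_b + (j * n + i) * elem_bytes  # write B[j][i]


def best_block(n, candidates, **cache):
    """Simulate each candidate block size; return the best one and all rates."""
    rates = {b: simulate_cache(blocked_transpose_refs(n, b), **cache)
             for b in candidates}
    return max(rates, key=rates.get), rates
```

A call such as `best_block(64, [4, 8, 16, 64], cache_bytes=4096, line_bytes=32, ways=1)` returns the chosen block size together with the simulated hit rate of every candidate. An exhaustive search over a small candidate list is used here only to keep the sketch short; the heuristic described in the abstract is not reproduced.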

Similar resources

Program Optimization Based on Compile-Time Cache Performance Prediction

We present a novel, compile-time method for determining the cache performance of the loop nests in a program. The cache-miss rates are produced by applying the program's reference string of a loop nest, determined during compilation, to an architecturally parameterized cache simulator. The obtained cache-miss rates correlate well with the performance of the loop nests on actual target machines....

Tiling for Parallel Execution - Optimizing Node Cache Performance (Parallel Processing Letters, © World Scientific Publishing Company)

Tiling has been used by parallelizing compilers to define fine-grain parallel tasks and to optimize cache performance. In this paper we present a novel compile-time technique, called miss-driven cache simulation, for determining tile size that achieves the highest cache hit-rate. The widening disparity between the processor's peak instruction rate and the main memory access time in modern processo...

Models for Performance Prediction of Cache Coherence Protocols

Key words: Cache coherence, distributed shared memory, memory access behavior, analytical performance prediction, performance evaluation, dynamic hybrid protocols. In a modern shared memory multiprocessor, it is possible to support more than one protocol for maintaining cache coherence. Possible candidates might be based on the Write-Back/Invalidate, Write-Through/Invalidate, and Write-Update ...

Procedure Cloning and Integration for Converting Parallelism from Coarse to Fine Grain

This paper introduces a method for improving program run-time performance by gathering work in an application and executing it efficiently in an integrated thread. Our methods extend whole-program optimization by expanding the scope of the compiler through a combination of software thread integration and procedure cloning. In each experiment we integrate a frequently executed procedure with its...

Design and Implementation of a Lightweight Dynamic Optimization System

Many opportunities exist to improve micro-architectural performance due to performance events that are difficult to optimize at static compile time. Cache misses and branch mis-prediction patterns may vary for different micro-architectures using different inputs. Dynamic optimization provides an approach to address these and other performance events at runtime. This paper describes a software s...

Publication date: 1996