On the Scalability of Loop Tiling Techniques

Authors

  • David G. Wonnacott
  • Michelle Mills Strout

Abstract

The polyhedral model has proven to be a valuable tool for improving memory locality and exploiting parallelism in dense array codes. This model is expressive enough to describe transformations of imperfectly nested loops and to capture a variety of program transformations, including many approaches to loop tiling. Tools such as the highly successful PLuTo automatic parallelizer have provided empirical confirmation of the success of polyhedral-based optimization, through experiments in which a number of benchmarks have been executed on machines with small- to medium-scale parallelism. In anticipation of ever higher degrees of parallelism, we have explored the impact of various loop tiling strategies on the asymptotic degree of available parallelism. In our analysis, we consider "weak scaling" as described by Gustafson, i.e., scaling in which the data set size grows linearly with the number of processors available. Some, but not all, of the approaches to tiling provide weak scaling. In particular, the tiling currently performed by PLuTo does not scale in this sense. In this article, we review approaches to loop tiling in the published literature, focusing on both scalability and implementation status. We find that fully scalable tilings are not available in general-purpose tools, and we call upon the polyhedral compilation community to focus on questions of asymptotic scalability. Finally, we identify ongoing work that may resolve this issue.
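To make "available parallelism" under tiling concrete, here is a minimal Python sketch under stated assumptions. It is not the analysis performed in this article, nor the tiling PLuTo generates. It assumes a hypothetical 2D recurrence `A[i][j] = A[i-1][j] + A[i][j-1]`, cuts its n × n iteration space into square B × B tiles, and executes the tiles in wavefront (anti-diagonal) order. Tiles on the same wavefront do not depend on one another, so the number of tiles per wavefront is the degree of parallelism this simple schedule exposes. The function names `sequential` and `tiled_wavefront` are illustrative only.

```python
# Hypothetical example: rectangular tiling of a 2D recurrence with a
# wavefront schedule over tiles. Not the article's method or PLuTo's output.
# Assumed recurrence: A[i][j] = A[i-1][j] + A[i][j-1], with row 0 and
# column 0 fixed at 1 (so A[i][j] equals the binomial coefficient C(i+j, i)).


def sequential(n):
    """Untiled reference: plain row-major traversal of the n x n space."""
    a = [[1] * n for _ in range(n)]
    for i in range(1, n):
        for j in range(1, n):
            a[i][j] = a[i - 1][j] + a[i][j - 1]
    return a


def tiled_wavefront(n, b):
    """Run the same recurrence tile by tile with b x b tiles.

    Tiles are visited in wavefronts w = ti + tj. Every dependence of a tile
    points to tiles (ti-1, tj) or (ti, tj-1), which lie on wavefront w-1, so
    all tiles within one wavefront are mutually independent. Returns the
    result array and the number of tiles in each wavefront.
    """
    a = [[1] * n for _ in range(n)]
    nt = (n + b - 1) // b  # number of tiles per dimension (ceil(n / b))
    widths = []
    for w in range(2 * nt - 1):
        tiles = [(ti, w - ti) for ti in range(nt) if 0 <= w - ti < nt]
        widths.append(len(tiles))
        # These tiles could run concurrently; here they run one after another.
        for ti, tj in tiles:
            # Row-major order inside a tile satisfies intra-tile dependences.
            for i in range(max(1, ti * b), min(n, (ti + 1) * b)):
                for j in range(max(1, tj * b), min(n, (tj + 1) * b)):
                    a[i][j] = a[i - 1][j] + a[i][j - 1]
    return a, widths


if __name__ == "__main__":
    ref = sequential(10)
    tiled, widths = tiled_wavefront(10, 3)
    print(ref == tiled, widths)
```

For example, `tiled_wavefront(10, 3)` reproduces `sequential(10)` exactly and reports wavefront widths `[1, 2, 3, 4, 3, 2, 1]`. In this toy model the widest wavefront holds ⌈n/B⌉ tiles, so how much parallelism the schedule exposes depends on how the problem size and the tile size grow together.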


Similar articles

International Workshop on Polyhedral Compilation Techniques



Loop Parallelization Techniques for FPGA Accelerator Synthesis

Current tools for High-Level Synthesis (HLS) excel at exploiting Instruction-Level Parallelism (ILP). The support for Data-Level Parallelism (DLP), one of the key advantages of Field Programmable Gate Arrays (FPGAs), is in contrast very limited. This work examines the exploitation of DLP on FPGAs using code generation for C-based HLS of image filters and streaming pipelines. In addition to well...


Effective Automatic Data Allocation for Parallelization of Affine Loop Nests

This paper proposes techniques for data allocation and computation mapping when compiling affine loop nest sequences for distributed-memory clusters. Techniques for transformation and detection of parallelism, and generation of communication sets relying on the polyhedral framework already exist. However, these recent approaches used a simple strategy to map computation to nodes – typically bloc...


Combining Retiming and Scheduling Techniques for Loop Parallelization and Loop Tiling (Ecole Normale Supérieure De Lyon)

Tiling is a technique used for exploiting medium-grain parallelism in nested loops. It relies on a first step that detects sets of permutable nested loops. All algorithms developed so far consider the statements of the loop body as a single block; in other words, they are not able to take advantage of the structure of dependences between different statements. In this report we overcome this limitation b...


A Cost-Effective Implementation of Multilevel Tiling

This paper presents a new cost-effective algorithm to compute exact loop bounds when multilevel tiling is applied to a loop nest having affine functions as bounds (nonrectangular loop nest). Traditionally, exact loop bounds computation has not been performed because its complexity is doubly exponential on the number of loops in the multilevel tiled code and, therefore, for certain classes of lo...




Publication date: 2012