Efficient Program Partitioning Based on Compiler Controlled Communication

Authors

Ram Subramanian

Santosh Pande

Abstract

In this paper, we present an efficient framework for intraprocedural, performance-based program partitioning of sequential loop nests. Because static dependence analysis is limited, especially across procedure boundaries, many loop nests are identified as sequential, yet the task parallelism available among them could still be exploited. This parallelism is quite limited, so using it effectively requires performance-based program analysis and partitioning that carefully model the interaction between the loop nests and the characteristics of the underlying architecture. We propose a compiler-driven approach that configures the underlying architecture to support a given communication mechanism. We then devise an iterative program partitioning algorithm that generates efficient partitions by analyzing the interaction between the effective cost of communication and the corresponding partitions. We model the problem as partitioning a directed acyclic task graph (DAG) in which each node corresponds to a sequential loop nest and each edge denotes both a precedence constraint and the communication (data transfer) between two loop nests. We introduce behavioral edges between edges and nodes of the task graph to capture, through parametric functions, the interactions between computation and communication. We present an efficient iterative partitioning algorithm that uses the behavioral-edge-augmented PDG to compute and improve the schedule incrementally. Using our framework on applications that exhibit this type of parallelism, we demonstrate significant performance improvements (a factor of 10 in many cases).
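The abstract does not spell out the partitioning algorithm, so the following is only a rough illustration of the underlying model, not the authors' method. It represents each sequential loop nest as a DAG node with a computation cost and each edge with a data volume, and it charges a communication cost only when an edge crosses two partitions. The communication model (a fixed `startup` latency plus `volume / bandwidth`) is an assumption standing in for the paper's parametric behavioral edges, and the improvement loop is Sarkar-style edge zeroing, swapped in as a simple stand-in for the paper's iterative algorithm. The names `comm_cost`, `makespan`, `partition`, `startup`, and `bandwidth` are all hypothetical. The sketch requires Python 3.9+ for `graphlib`.

```python
from collections import defaultdict
from graphlib import TopologicalSorter


def comm_cost(volume, startup, bandwidth):
    """Hypothetical parametric model: fixed startup latency plus volume / bandwidth."""
    return startup + volume / bandwidth


def makespan(compute, edges, cluster, startup, bandwidth):
    """Schedule length when each partition runs its loop nests one after another.

    compute: {node: computation cost}; edges: {(src, dst): data volume};
    cluster: {node: partition label}. Communication is charged only when an
    edge crosses partitions; an edge inside one partition costs nothing.
    """
    preds = defaultdict(list)
    for (u, v), vol in edges.items():
        preds[v].append((u, vol))
    graph = {n: [u for u, _ in preds[n]] for n in compute}  # node -> predecessors
    finish = {}
    cluster_free = defaultdict(float)  # time at which each partition becomes idle
    for n in TopologicalSorter(graph).static_order():
        start = cluster_free[cluster[n]]
        for u, vol in preds[n]:
            delay = 0.0 if cluster[u] == cluster[n] else comm_cost(vol, startup, bandwidth)
            start = max(start, finish[u] + delay)
        finish[n] = start + compute[n]
        cluster_free[cluster[n]] = finish[n]
    return max(finish.values())


def partition(compute, edges, startup, bandwidth):
    """Iterative Sarkar-style edge zeroing (a stand-in, not the paper's algorithm).

    Start with one partition per loop nest, then repeatedly try to merge the two
    partitions joined by an edge, heaviest edges first, keeping a merge whenever
    the schedule length does not grow. Each accepted merge removes one partition,
    so the loop terminates.
    """
    cluster = {n: n for n in compute}
    best = makespan(compute, edges, cluster, startup, bandwidth)
    ordered = sorted(edges.items(), key=lambda kv: -kv[1])
    improved = True
    while improved:
        improved = False
        for (u, v), _ in ordered:
            cu, cv = cluster[u], cluster[v]
            if cu == cv:
                continue
            trial = {n: (cu if c == cv else c) for n, c in cluster.items()}
            t = makespan(compute, edges, trial, startup, bandwidth)
            if t <= best:
                cluster, best, improved = trial, t, True
    return cluster, best


# Hypothetical diamond-shaped task graph: loop nest A feeds B and C, which both feed D.
compute = {"A": 10, "B": 20, "C": 20, "D": 10}
edges = {("A", "B"): 100, ("A", "C"): 100, ("B", "D"): 100, ("C", "D"): 100}

for startup, bandwidth in [(50, 10), (0, 100)]:
    parts, length = partition(compute, edges, startup, bandwidth)
    print(startup, bandwidth, sorted(set(parts.values())), length)
```

With an expensive communication startup, the sketch ends up placing all four loop nests in one partition, because serializing them is cheaper than paying for any cross-partition transfer. With cheap communication, it keeps C in its own partition running alongside the others. This mirrors, in a much simplified form, the abstract's point that the best partition depends on how communication cost interacts with the partition.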


Similar resources

Compiling Dataflow into Threads: Efficient Compiler-controlled Multithreading for Lenient Parallel Languages

Powerful non-strict parallel languages require fast dynamic scheduling. This thesis explores how the need for multithreaded execution can be addressed as a compilation problem, to achieve switching rates approaching what hardware mechanisms might provide. Compiler-controlled multithreading is examined through compilation of a lenient parallel language, ID90, for a threaded abstract machine, TAM...


Chain-based Scheduling: Part I - Loop Transformations and Code Generation

Chain-based scheduling [1] is an efficient partitioning and scheduling scheme for nested loops on distributed-memory multicomputers. The idea is to take advantage of the regular data dependence structure of a nested loop to overlap and pipeline the communication and computation. Most partitioning and scheduling algorithms proposed for nested loops on multicomputers [1,2,3] are graph algorithms on t...


A Dimension Independent General Partitioning Algorithm to Support HPF (Re)distribution Directives

A General Partitioning Algorithm can provide a powerful way of computing local and global addressing for different distributions on different processor topologies. Therefore, it can be used to design an efficient algorithm for run-time (re)distributions in array-based languages such as High Performance Fortran (HPF). Efficient redistributions are essential in many compute and communication intensive a...


Processor Tagged Descriptors: A Data Structure for Compiling for Distributed-Memory Multicomputers

The computation partitioning, communication analysis, and optimization phases performed during compilation for distributed-memory multicomputers require an efficient way of describing distributed sets of iterations and regions of data. Processor Tagged Descriptors (PTDs) provide these capabilities through a single set representation parameterized by the processor location for each dimension of a ...


Automatic Parallelization of Non-uniform Dependences

This report summarizes our current experiences with Automatic Program Parallelization tools for converting sequential Fortran code for use on a multiprocessor computer. A number of such tools were evaluated, including Parafrase, Adaptor, PAT, Petit and the SUIF compiler package. We evaluated the suitability of such tools for parallelizing Computational Fluid Dynamics code supplied by the Army R...



Publication date: 2000
