On Automatic Parallelization of Irregular Reductions on Scalable Shared Memory Systems
Authors
Abstract
This paper presents a new parallelization method for reductions of arrays with subscripted subscripts on scalable shared-memory multiprocessors. The mapping of computations is based on a conflict-free write distribution of the reduction vector across the processors. The proposed method is general, scalable, and easy to implement in a compiler. A performance evaluation and a comparison with existing techniques are presented. The experimental results show the proposed method to be a clear alternative to the array expansion and privatized buffer methods commonly used in state-of-the-art parallelizing compilers such as Polaris or SUIF.
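The idea of distributing writes to the reduction vector without conflicts can be illustrated with a small sketch. It is not the paper's algorithm, only a generic owner-computes illustration under stated assumptions: the reduction vector is split into contiguous blocks, each processor owns one block, and a processor applies only those updates whose target index falls in its own block. The names `owner_of`, `irregular_reduction`, and `sequential_reduction` are hypothetical, the processors are simulated by a sequential loop, and every simulated processor scans all iterations, where a real implementation would presumably precompute the iterations each owner needs.

```python
from typing import List, Sequence


def owner_of(index: int, n: int, num_procs: int) -> int:
    """Block distribution (an assumption): processor that owns entry `index`."""
    block = (n + num_procs - 1) // num_procs  # ceiling division
    return index // block


def irregular_reduction(f: Sequence[int], x: Sequence[float],
                        n: int, num_procs: int) -> List[float]:
    """Compute a[f[i]] += x[i] with writes partitioned by owner.

    Each pass of the outer loop stands in for one processor. Because a
    processor writes only to entries it owns, no two processors ever
    write to the same entry, so no locks or private copies are needed.
    """
    a = [0.0] * n
    for p in range(num_procs):          # simulated processor p
        for i in range(len(f)):         # scans every iteration (sketch only)
            if owner_of(f[i], n, num_procs) == p:
                a[f[i]] += x[i]
    return a


def sequential_reduction(f: Sequence[int], x: Sequence[float],
                         n: int) -> List[float]:
    """Reference loop: the irregular reduction as written in serial code."""
    a = [0.0] * n
    for i, idx in enumerate(f):
        a[idx] += x[i]
    return a


if __name__ == "__main__":
    f = [3, 0, 3, 1, 5, 0]              # subscript array (indirection)
    x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # contributions
    print(irregular_reduction(f, x, n=6, num_procs=3))
    print(sequential_reduction(f, x, n=6))
```

By contrast, array expansion and privatized buffers give each processor its own copy of the reduction vector and merge the copies afterwards, which costs extra memory proportional to the number of processors.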
Similar resources
Scalable Automatic Parallelization of Irregular Reductions on Shared Memory Multiprocessors
This paper presents a new parallelization method for reductions of arrays with subscripted subscripts on scalable shared memory multiprocessors. The mapping of computations is based on grouping reduction loop iterations into sets that are further distributed across processors. Iterations belonging to the same set are chosen in such a way that they update different entries in the reduction array. Tha...
Efficient Parallelization of Unstructured Reductions on Shared Memory Parallel Architectures
This paper presents a new parallelization method for an efficient implementation of unstructured array reductions on shared memory parallel machines with OpenMP. This method is strongly related to parallelization techniques for irregular reductions on distributed memory machines as employed in the context of High Performance Fortran. By exploiting data locality, synchronization is minimized witho...
Efficient Parallelization of Unstructured Reductions on Shared Memory Parallel Architectures
This paper presents a new parallelization method for an efficient implementation of unstructured array reductions on shared memory parallel machines with OpenMP. This method is strongly related to parallelization techniques for irregular reductions on distributed memory machines as employed in the context of High Performance Fortran. By exploiting data locality, synchronization is minimized with...
Automatic Parallelization for Non-cache Coherent Multiprocessors
Although much work has been done on parallelizing compilers for cache coherent shared memory multiprocessors and message-passing multiprocessors, there is relatively little research on parallelizing compilers for noncache coherent multiprocessors with global address space. In this paper, we present a preliminary study on automatic parallelization for the Cray T3D, a commercial scalable machine ...
Parallelizing Irregular Applications through the YAPPA Compilation Framework
Modern High Performance Computing (HPC) clusters are composed of hundreds of nodes integrating multicore processors with advanced cache hierarchies. These systems can reach several petaflops of peak performance, but are optimized for floating point intensive applications, and regular, localizable data structures. The network interconnection of these systems is optimized for bulk, synchronous tra...
Journal title:
Volume, issue:
Pages: -
Publication date: 1999