Optimizing Charm++ over MPI
Authors
Abstract
Charm++ may employ any of a myriad of network-specific APIs for handling communication, which are usually promoted as being faster than its catch-all MPI module. Such a performance difference not only causes development effort to be spent on tuning vendor-specific APIs but also discourages hybrid Charm++/MPI applications. We investigate this disparity across several machines and applications, ranging from small InfiniBand clusters to Blue Gene/Q supercomputers and from synthetic benchmarks to large-scale biochemistry codes. We demonstrate the use of one feature from the recent MPI-3 standard to bridge this gap where applicable, and we discuss what can be done today.
Similar resources
Charm++ & MPI: Combining the Best of Both Worlds
MPI and Charm++ embody two distinct perspectives for writing parallel programs. While MPI provides a processor-centric, user-driven model for developing parallel codes, Charm++ supports work-centric, overdecomposition-based, system-driven parallel programming. One or the other can be the best or most natural fit for distinct modules that constitute a parallel application. In this paper, we prese...
Optimizing Point-to-Point Communication between Adaptive MPI Endpoints in Shared Memory
Correspondence *Sam White, Email: [email protected] Abstract Adaptive MPI is an implementation of the MPI standard that supports the virtualization of ranks as user-level threads, rather than OS processes. In this work, we optimize the communication performance of AMPI based on the locality of the endpoints communicating within a cluster of SMP nodes. We differentiate between point-to-point mes...
Object-Based Adaptive Load Balancing for MPI Programs
Parallel Computational Science and Engineering (CSE) applications often exhibit irregular structure and dynamic load patterns. Many such applications have been developed using procedural languages (e.g. Fortran) in the message passing parallel programming paradigm (e.g. MPI) for distributed memory machines. Incorporating dynamic load balancing techniques at the application level involves significan...
Handling Transient and Persistent Imbalance Together in Distributed and Shared Memory
The recent trend of rapid increase in the number of cores per chip has resulted in a vast amount of on-node parallelism. Not only is the number of cores per node increasing substantially, but the cores are also becoming heterogeneous. The high variability in the performance of hardware components introduces imbalance due to heterogeneity. Applications are also becoming more complex, resultin...
Optimizing MPI Collectives for X1
Traditionally, MPI collective operations have been based on point-to-point messages, with possible optimizations for system topologies and communication protocols. The Cray X1 scatter/gather hardware and shared memory mapping features allow for significantly different approaches to MPI collectives, leading to substantial performance gains over standard methods, especially for short message length...
Publication date: 2013