Formal Analysis of MPI-Based Parallel Programs: Present and Future∗

Authors

  • Ganesh Gopalakrishnan
  • Robert M. Kirby
  • Stephen Siegel
  • Rajeev Thakur
  • William Gropp
  • Ewing Lusk
  • Bronis R. de Supinski
  • Martin Schulz
  • Greg Bronevetsky
Abstract

The Message Passing Interface (MPI) is the dominant programming model for high-performance computing. Applications that use over 100,000 cores are currently running on large HPC systems. Unfortunately, MPI usage presents a number of correctness challenges that can result in wasted simulation cycles and errant computations. In this article, we describe recently developed formal and semi-formal approaches for verifying MPI applications, highlighting the scalability/coverage tradeoffs each one makes. We concentrate on MPI because of its ubiquity and because similar issues will arise when developing verification techniques for other performance-oriented concurrency APIs. We conclude with a look at the inevitability of multiple concurrency models in future parallel systems and point out why the HPC community urgently needs increased participation from formal methods researchers.

∗ Supported in part by Microsoft, NSF CNS-0509379, CCF-0811429, CCF-0903408, CCF-0953210, CCF-0733035, and DOE ASCR DE-AC02-06CH11357. Accepted 6/1/11; to appear in CACM.

1 MPI and the Importance of Correctness

The Message Passing Interface (MPI) Standard was originally developed around 1993 by the MPI Forum, a group of vendors, parallel programming researchers, and computational scientists. The document is not issued by an official standards organization, but it has become a de facto standard through near-universal adoption. Development of MPI continues, with MPI-2.2 released in 2009. The standard has been published on the web [1] and as a book, and several other books based on it have appeared (e.g., [2, 3]). Implementations are available in open-source form [4, 5], from software vendors such as Microsoft and Intel, and from every vendor of high-performance computing systems. MPI is widely cited; Google Scholar returned 28,400 hits for “+MPI +"Message Passing Interface"” in September 2010.

MPI is ubiquitous: it runs everywhere, from embedded systems to the largest supercomputers. Many MPI programs represent dozens, if not hundreds, of person-years of development, including calibration for accuracy and performance tuning. MPI was designed to support highly scalable computing; applications that use over 100,000 cores are currently running on systems such as the IBM Blue Gene/P (Figure 1) and the Cray XT5. Scientists and engineers all over the world use MPI in thousands of applications, such as investigations of alternative energy sources and weather simulations. For high-performance computing, MPI is by far the dominant programming model; most (at some centers, all) applications running on supercomputers use MPI. MPI is considered a requirement even for exascale systems, at least by application developers [6].

Unfortunately, the MPI debugging methods available to these application developers are wasteful and ultimately unreliable. Existing MPI testing tools seldom provide coverage guarantees: they examine essentially equivalent execution sequences, which reduces testing efficiency. These methods fare even worse at large problem scales.
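To make the coverage problem concrete, the sketch below (our illustration, not an excerpt from the article) shows a schedule-dependent bug built around MPI's wildcard receive, MPI_ANY_SOURCE. Two distinct executions exist, depending on which sender matches the first receive; a conventional test run typically exercises only one of them, whereas the formal techniques surveyed here aim to cover every relevant match order.

    /* Illustrative schedule-dependent MPI bug (run with mpiexec -n 3). */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, buf = 0;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            MPI_Status st;
            /* First wildcard receive: can match rank 1 OR rank 2. */
            MPI_Recv(&buf, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &st);
            int result = buf;  /* BUG: silently assumes rank 1's message arrived first */
            MPI_Recv(&buf, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &st);
            printf("result = %d (intended to be 10)\n", result);
        } else {
            buf = rank * 10;   /* rank 1 sends 10; rank 2 sends 20 */
            MPI_Send(&buf, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        }
        MPI_Finalize();
        return 0;
    }

Rerunning such a test thousands of times may always observe the same match order; a dynamic verifier instead forces both orders, exposing the wrong-answer schedule deterministically.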
Consider the costs of HPC bugs: (i) a high-end HPC center costs hundreds of millions of dollars to commission, and its machines become obsolete within six years; (ii) at many of these centers, over $3 million is spent on electricity alone each year; and (iii) research teams apply for computer time through competitive proposals, spending years planning their experiments. In addition to these development costs, one must add the costs to society of relying on potentially defective software to inform decisions on issues of great importance, such as climate change.

Because the stakes are so high, formal methods can play an important role in debugging and verifying MPI applications. In this paper, we survey existing techniques, clearly describing the pros and cons of each approach. These methods have value far beyond MPI, addressing the general needs of future concurrency application developers, who will inevitably use a variety of low-level concurrency APIs.

Overview of MPI: Historically, parallel systems have used either message passing or shared memory for communication. Compared with other message-passing systems noted for their parsimony, MPI supports a large number of cohesively engineered features essential for designing large-scale simulations. MPI-2.2 [1] specifies over 300 functions, although most developers use only a few dozen of them in any given application. MPI programs consist of one or more threads of execution with private memories (MPI processes) that communicate by sending and receiving messages.
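As a point of reference, here is a minimal but complete MPI program (our sketch, using only six core MPI calls) illustrating this structure: every nonzero rank sends its rank number to rank 0, which prints the values as they arrive.

    /* Minimal complete MPI program: gather rank numbers at process 0. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, size;
        MPI_Init(&argc, &argv);                   /* start the MPI runtime   */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);     /* this process's identity */
        MPI_Comm_size(MPI_COMM_WORLD, &size);     /* total number of ranks   */

        if (rank == 0) {
            for (int src = 1; src < size; src++) {
                int val;
                MPI_Recv(&val, 1, MPI_INT, src, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                printf("rank 0 received %d from rank %d\n", val, src);
            }
        } else {
            MPI_Send(&rank, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        }
        MPI_Finalize();                           /* shut down cleanly       */
        return 0;
    }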
