EGSC HPCC General Information: Message Passing Interface (MPI)

The Message Passing Interface (MPI)

The Message Passing Interface (MPI) is a specification for coordinating multiple copies of the same program running in parallel on multiple nodes of a cluster of computers. The problem which MPI was developed to solve is this: How can a programmer or programming team, using a mostly familiar language and tools, develop a single program that can be deployed in parallel, and benefit from that deployment, on a possibly changeable number of independent but networked computers? To solve this problem, MPI defines a library of routines which can be called from existing programming languages, enabling relatively simple development of parallel-processing and parallel-computation software. These relatively high-level routines are implemented through an infrastructure which allows programs built with MPI to communicate with one another by passing messages back and forth over the underlying cluster network. The MPI message-passing infrastructure is external to, and independent of, the programs which make use of it.
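The pattern described above can be illustrated with a small toy program. The sketch below does not use MPI at all: it simulates the idea inside one Python process, using threads in place of cluster nodes and standard-library queues in place of the network. Every name in it (block_range, worker, run, mailboxes) is hypothetical and chosen only for illustration. What it shares with a real MPI program is the shape: every copy runs the same code, each copy learns its own rank and the total number of copies (in MPI, via MPI_Comm_rank and MPI_Comm_size), and the copies cooperate only by sending and receiving messages (in MPI, for example via MPI_Send and MPI_Recv).

```python
import queue
import threading


def block_range(rank, size, n):
    """Return the half-open range [start, stop) of range(n) owned by `rank`."""
    base, extra = divmod(n, size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def worker(rank, size, n, mailboxes, results):
    """The single program that every simulated process runs."""
    # Each copy works only on its own share of the problem.
    start, stop = block_range(rank, size, n)
    partial = sum(i * i for i in range(start, stop))

    if rank == 0:
        # Rank 0 receives one message from every other rank and combines them.
        total = partial
        for _ in range(size - 1):
            _sender, value = mailboxes[0].get()  # blocking receive
            total += value
        results["total"] = total
    else:
        # Every other rank sends its partial result to rank 0.
        mailboxes[0].put((rank, partial))


def run(size, n):
    """Launch `size` copies of `worker`; return the sum of squares below `n`."""
    mailboxes = [queue.Queue() for _ in range(size)]  # one inbox per rank
    results = {}
    threads = [
        threading.Thread(target=worker, args=(r, size, n, mailboxes, results))
        for r in range(size)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results["total"]


if __name__ == "__main__":
    # The same program is launched with two different numbers of copies.
    print(run(size=4, n=1000))
    print(run(size=7, n=1000))
```

Because each copy derives its share of the work from its rank and the total count, the same program gives the same answer whether it is launched as four copies or seven. This is the sense in which a single MPI program can be deployed on a changeable number of computers. In a real MPI job the copies would be separate processes on separate nodes, and the MPI run-time infrastructure would carry the messages.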

In general, computation is rich with opportunities for parallelization and with varieties of parallelization. This is both a blessing and a curse: there are so many kinds of parallel and distributed processing that, on approaching any particular problem, we may find ourselves overwhelmed by the possibilities. (Parallel and Distributed Computing by Claudia Leopold provides a thorough overview of the many alternatives to serial processing and computation. A full citation is in the bibliography.) If we speak of patterns of parallel processing (or paradigms, or models), then we can say that MPI provides a natural and efficient means of expressing parallel programs which match some, but far from all, of those patterns. MPI enables parallel solutions for a large subset of computational problems. Many if not all of these problems are also amenable to alternative solutions, some of which will almost certainly be more efficient at run time, though not necessarily once the time and other costs of development are considered.

Implementations of MPI have been made available on the EGSC HPCC because MPI provides what is probably the most robust and general adjunct to Linux software development capabilities for taking advantage of any Beowulf cluster's parallelism in order to carry out and complete computational tasks quickly. Another widely used facility for parallelization, Parallel Virtual Machine (PVM), has advantages for some classes of problems and may be made available on the EGSC HPCC in the future. Please see the following Web sites for more information on MPI and PVM:

Implementations of MPI

Developing and deploying software using MPI requires both (i) a set of compile-time tools and (ii) run-time support. When we refer to an "implementation" of MPI, we mean two things together: the libraries and other materials needed to compile programs which can run in parallel, and the run-time facilities which provide message transport and coordination among those programs. The MPI run-time facilities for Linux clusters generally require at least one support process to be running on each node of the cluster involved in a parallel MPI job or task. Open-source and commercial implementations of MPI are widely available for C, FORTRAN, and C++. Other languages are supported by implementations in less widespread use. Programmers should be aware that a program written for one implementation of MPI may not run under another implementation without recompilation and possibly some modification. Some implementations of MPI require that all cluster nodes have access to a shared file system, such as a drive mounted through the Network File System (NFS). The EGSC HPCC does not provide a shared file system because the use of such systems unacceptably degrades cluster performance.

LAM/MPI and MPICH2

Version 7.1.1 of the LAM implementation of MPI, developed at Indiana University, has been made available on the EGSC HPCC because it is widely used, stable, and acceptably efficient for many tasks. The MPICH2 implementation of MPI, developed at the Argonne National Laboratory (ANL), has been chosen for two reasons: it allows development of parallel programs which are readily transportable across cluster and hardware types, and it is likely to be enhanced and supported, given the authoritative role of the ANL in the development of the MPI specifications. Neither of these implementations requires the cluster nodes to share a file system. Both support Version 1.1 of the MPI Standard and most of Version 2.0 of the MPI Standard. The slow performance of Version 1.x.x of MPICH has been corrected in MPICH2, which was written from scratch.

Please refer to this Web site's Help pages for information about starting up the LAM/MPI or MPICH2 run-time infrastructure under your user account.