
The Future of MPI

William Gropp, Argonne National Laboratory, www.mcs.anl.gov/~gropp


The Success of MPI

  • Applications

    • Most recent Gordon Bell prize winners use MPI

  • Libraries

    • Growing collection of powerful software components

  • Tools

    • Performance tracing (Vampir, Jumpshot, etc.)

    • Debugging (Totalview, etc.)

  • Results

    • Papers: http://www.mcs.anl.gov/mpi/papers

  • Clusters

    • Ubiquitous parallel computing


Why Was MPI Successful?

  • It addresses all of the following issues:

    • Portability

    • Performance

    • Simplicity and Symmetry

    • Modularity

    • Composability

    • Completeness


Portability and Performance

  • Portability does not require a “lowest common denominator” approach

    • Good design allows the use of special, performance enhancing features without requiring hardware support

    • MPI’s nonblocking message-passing semantics allow, but do not require, “zero-copy” data transfers (a sketch follows this list)

  • BTW, the right phrase is “greatest common denominator”
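As an illustration, here is a minimal sketch of the nonblocking pattern, assuming two processes (the buffer size and tag are arbitrary). Because the buffer may not be touched until the operation completes, the implementation is free to move the data without an intermediate copy:

    #include <mpi.h>

    /* Minimal nonblocking exchange: rank 0 sends, rank 1 receives. */
    int main(int argc, char *argv[])
    {
        int rank;
        double buf[1000];
        MPI_Request req;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            /* buf must not be modified until MPI_Wait completes;
               this freedom is what permits zero-copy transfers */
            MPI_Isend(buf, 1000, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, &req);
            /* ... overlap useful computation here ... */
            MPI_Wait(&req, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Irecv(buf, 1000, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &req);
            MPI_Wait(&req, MPI_STATUS_IGNORE);
        }

        MPI_Finalize();
        return 0;
    }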


Simplicity and Symmetry

  • MPI is organized around a small number of concepts

    • The number of routines is not a good measure of complexity

    • Fortran

      • Large number of intrinsic functions

    • C and Java runtimes are large

    • Development Frameworks

      • Hundreds to thousands of methods

    • This doesn’t bother millions of programmers


Measuring Complexity

  • Complexity should be measured in the number of concepts, not functions or size of the manual

  • MPI is organized around a few powerful concepts (a minimal program using them follows this list)

    • Point-to-point message passing

    • Datatypes

    • Blocking and nonblocking buffer handling

    • Communication contexts and process groups
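To make this concrete, here is a minimal sketch, assuming two processes, that touches most of these concepts in a dozen lines: a communicator (process group plus context), a datatype, and point-to-point calls:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int rank, value;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* process group + context */

        if (rank == 0) {
            value = 42;
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);  /* datatype */
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("received %d\n", value);
        }

        MPI_Finalize();
        return 0;
    }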


Elegance of Design

  • MPI often uses one concept to solve multiple problems

  • Example: Datatypes

    • Describe noncontiguous data transfers, necessary for performance

    • Describe data formats, necessary for heterogeneous systems

  • “Proof” of elegance:

    • Datatypes were exactly what was needed for the high-performance I/O added in MPI-2 (see the sketch below)
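For example, a derived datatype can describe one column of a row-major matrix, so a single send moves the noncontiguous data without a user-level pack. A sketch; the 100x100 size is arbitrary:

    #include <mpi.h>

    void send_column(double a[100][100], int col, int dest, MPI_Comm comm)
    {
        MPI_Datatype column;

        /* 100 blocks of 1 double, with a stride of 100 doubles */
        MPI_Type_vector(100, 1, 100, MPI_DOUBLE, &column);
        MPI_Type_commit(&column);

        MPI_Send(&a[0][col], 1, column, dest, 0, comm);

        MPI_Type_free(&column);
    }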


Parallel I/O

  • Collective model provides high I/O performance

    • Matches applications’ most general view: objects distributed among processes

  • MPI Datatypes extend I/O model to noncontiguous data in both memory and file

    • Unix readv/writev apply only to memory, not to the file layout (a collective-write sketch follows)
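A minimal sketch of the collective model, assuming each process holds one contiguous block of a shared file; the file view is itself described with MPI datatypes, and the collective write lets the library aggregate requests across processes:

    #include <mpi.h>

    void write_block(double *local, int n, MPI_Comm comm, const char *fname)
    {
        int rank;
        MPI_File fh;

        MPI_Comm_rank(comm, &rank);
        MPI_File_open(comm, fname, MPI_MODE_CREATE | MPI_MODE_WRONLY,
                      MPI_INFO_NULL, &fh);
        /* place this process's block at its offset in the shared file */
        MPI_File_set_view(fh, (MPI_Offset)rank * n * sizeof(double),
                          MPI_DOUBLE, MPI_DOUBLE, "native", MPI_INFO_NULL);
        MPI_File_write_all(fh, local, n, MPI_DOUBLE, MPI_STATUS_IGNORE);
        MPI_File_close(&fh);
    }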


Parallel I/O Performance with MPI-IO

[Charts: I/O bandwidth for structured mesh I/O and unstructured grid I/O, comparing POSIX-style I/O with MPI-IO; POSIX-style I/O is too slow to show on the unstructured grid chart.]


Modularity

  • Modern algorithms are hierarchical

    • Do not assume that all operations involve all or only one process

    • Provide tools that don’t limit the user

  • Modern software is built from components

    • MPI designed to support libraries

    • Many applications have no explicit MPI calls; all MPI is contained within well-designed libraries (the pattern is sketched below)
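A sketch of that library pattern (the names lib_init and lib_handle are hypothetical): the library duplicates the user's communicator, so its internal messages can never be confused with the application's own traffic:

    #include <mpi.h>

    typedef struct {
        MPI_Comm comm;   /* the library's private communication context */
    } lib_handle;

    int lib_init(MPI_Comm user_comm, lib_handle *h)
    {
        /* same process group, new context */
        return MPI_Comm_dup(user_comm, &h->comm);
    }

    int lib_finalize(lib_handle *h)
    {
        return MPI_Comm_free(&h->comm);
    }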


Composability

  • Environments are built from components

    • Compilers, libraries, runtime systems

    • MPI designed to “play well with others”

  • MPI exploits the newest advances in compilers

    • … without ever talking to compiler writers

    • OpenMP is an example (a hybrid MPI+OpenMP sketch follows)
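A minimal sketch of that composition: MPI between processes, OpenMP threads within each process, with MPI_Init_thread declaring how much thread support the program needs:

    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int rank, provided;

        /* FUNNELED: only the main thread will make MPI calls */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        #pragma omp parallel
        printf("rank %d, thread %d\n", rank, omp_get_thread_num());

        MPI_Finalize();
        return 0;
    }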


Completeness

  • MPI provides a complete parallel programming model and avoids simplifications that limit the model

    • Contrast: models that require that synchronization occur only collectively, for all processes or tasks

  • Make sure that the functionality is there when the user needs it

    • Don’t force the user to start over with a new programming model when a new feature is needed


Is Ease of Use the Overriding Goal?

  • MPI often described as “the assembly language of parallel programming”

  • C and Fortran have been described as “portable assembly languages”

  • Ease of use is important. But completeness is more important.

    • Don’t force users to switch to a different approach as their application evolves


Lessons From MPI

  • A general programming model for high-performance technical computing must address many issues to succeed

  • Even that is not enough. Also needs:

    • Good design

    • Buy-in by the community

    • Effective implementations

  • MPI achieved these through an Open Standards Process


An Open and Balanced Process

  • Balanced representation from

    • Users

      • What users want and need

        • Including correctness

    • Implementers (Vendors)

      • What can be provided

        • Many MPI features determined by implementation needs

    • Researchers

      • Directions and Futures

        • MPI planned for interoperation with OpenMP before OpenMP was conceived

        • Support for libraries strongly influenced by research


Where Next?

  • Improving MPI

    • Simplifying and enhancing the expression of MPI programs

  • Improving MPI Implementations

    • Performance

    • Performance

    • Performance

  • New Directions

    • What can displace (or complement) MPI? (Yesterday’s panel presentation on programming-model projects and tomorrow’s panel on the future of supercomputing)


Improving MPI

  • Simpler interfaces

    • Use compiler or precompiler techniques to support simpler, integrated syntax

    • Fortran 95 arrays, datatypes in C/C++

  • Eliminate function calls

    • Use program analysis and transformation to inline operations

  • More tools for correctness and performance debugging

    • The MPI profiling interface is a good start (a PMPI wrapper is sketched after this list)

    • Debugger interface used by Totalview is an example of tool development

    • Effort to provide a common interface to internal performance data, such as idle time waiting for a message

  • Changes to MPI

    • E.g., MPI-2 RMA lacks a read-modify-write

    • But don’t hold your breath

      • These require research and experimentation before they are ready for a standardization process
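To illustrate the profiling interface mentioned above: every MPI routine is also callable by a PMPI_ name, so a tool can interpose its own wrappers without source changes or access to library internals. A minimal sketch (the counter is hypothetical; signatures follow the const-correct MPI-3 bindings):

    #include <mpi.h>
    #include <stdio.h>

    static int send_count = 0;

    /* intercepts the application's MPI_Send calls */
    int MPI_Send(const void *buf, int count, MPI_Datatype datatype,
                 int dest, int tag, MPI_Comm comm)
    {
        send_count++;
        return PMPI_Send(buf, count, datatype, dest, tag, comm);
    }

    int MPI_Finalize(void)
    {
        printf("MPI_Send called %d times\n", send_count);
        return PMPI_Finalize();
    }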


Improving MPI Implementations

  • Faster Point-to-point

    • Some current implementations make unnecessary copies

  • Collective operations

    • Better algorithms exist (a tree-based broadcast is sketched after this list)

      • SMP optimizations

      • Scatter-gather broadcast, reduce, etc.

  • Optimizing for new hardware

    • RDMA networks

    • NIC-enabled remote atomic operations

  • Wide area networks

    • Optimizations for high latency

    • Speculative sends

    • Quality of service extensions (through MPI attributes)

  • Massive scaling

    • Many implementations optimize internal buffers for modest numbers of processes

    • Some MPI routines (e.g., MPI_Graph_create) do not have scalable definitions
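As an example of a better collective algorithm, a binomial-tree broadcast built from point-to-point calls completes in about log2(p) steps instead of the p-1 sequential sends of a naive root-sends-to-all loop. A sketch only; production implementations add SMP awareness and scatter/allgather variants for long messages:

    #include <mpi.h>

    void tree_bcast(void *buf, int count, MPI_Datatype type,
                    int root, MPI_Comm comm)
    {
        int rank, size, rel, mask;

        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &size);
        rel = (rank - root + size) % size;   /* rank relative to the root */

        /* non-roots receive once from their parent in the tree */
        for (mask = 1; mask < size; mask <<= 1) {
            if (rel & mask) {
                MPI_Recv(buf, count, type, (rank - mask + size) % size,
                         0, comm, MPI_STATUS_IGNORE);
                break;
            }
        }
        /* then everyone forwards to its children, largest subtree first */
        for (mask >>= 1; mask > 0; mask >>= 1) {
            if (rel + mask < size)
                MPI_Send(buf, count, type, (rank + mask) % size, 0, comm);
        }
    }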


More Improvements for MPI Implementations

  • Reduce latency

    • Automatic techniques to compress code paths

    • Closer match to hardware capabilities

  • Improve RMA

    • Many current implementations are at best merely functional

  • Parallel I/O, particularly for clusters

    • Communication aggregation

    • Reliability in the presence of faults

  • Fault tolerance

    • Exploit MPI Intercommunicators to generalize the two-party model

  • Thread safe and efficient implementations

    • Lock-free design

    • Software engineering for a common MPI implementation source tree

  • Many groups working on improved MPI implementations

    • MPICH-2 is an all-new and efficient implementation

      • Includes many of these ideas

      • Designed, as MPICH was, to encourage others to experiment and extend MPI


What’s New in MPICH2

  • Beta-test version available for groups that expect to perform research on MPI implementations with MPICH2

    • Version 0.92 released last Friday

  • Contains

    • All of MPI-1, MPI-I/O, service functions from MPI-2, active-target RMA

    • C, C++, Fortran 77 bindings

    • Example devices for TCP, Infiniband, shared memory

    • Documentation

  • Passes extensive correctness tests

    • Intel test suite (as corrected); good unit test suite

    • MPICH test suite; adequate system test suite

    • Notre Dame C++ tests, based on IBM C test suite

    • Passes more tests than MPICH1 


MPICH2 Research

  • The all-new implementation is our vehicle for research in

    • Thread safety and efficiency (e.g., avoid thread locks)

    • Optimized MPI datatypes

    • Optimized Remote Memory Access (RMA)

    • High Scalability (64K MPI processes and more)

    • Exploiting Remote Direct Memory Access (RDMA) capable networks

    • All of MPI-2, including dynamic process management, parallel I/O, RMA

    • Usability and Robustness

      • Software engineering techniques that automate and simplify creating and maintaining a solid, user-friendly implementation

      • Allow extensive runtime error checking but do not require it

      • Integrated performance debugging

    • Clean interfaces to other system components such as scalable process managers


Some Target Platforms

  • Clusters (TCP, UDP, Infiniband, Myrinet, Proprietary Interconnects, …)

  • Clusters of SMPs

  • Grids (UDP, TCP, Globus I/O, …)

  • Cray Red Storm

  • BlueGene/x

    • 64K processors; 64K address spaces

    • ANL/IBM developing MPI for BG/L

  • QCDoC

  • Cray X1 (at least I/O)

  • Other systems


(Logical) Structure of MPICH-2

[Architecture diagram: MPICH-2 is layered over three internal interfaces. PMI (process management), with implementations including MPD, remshell, fork, bproc, Windows, and Unix (python). ADIO (parallel I/O), over PVFS, NFS, XFS, HFS, SFS, and other parallel file systems. ADI-3 (communication), via a channel interface and a multi-method device, with targets including TCP, shared memory, Infiniband, Myrinet, Portals, BG/L, and other NICs. Components are marked as existing, in progress, or left to others and vendors.]


Conclusions

  • The Future of MPI is Bright!

    • Higher-performance implementations

    • More libraries and applications

    • Better tools for developing and tuning MPI programs

    • Leverage of complementary technologies

  • Full MPI-2 implementations will become common

    • Several already exist; many Earth Simulator (ES) applications use MPI RMA

