
The Future of MPI

William GroppArgonne National Laboratorywww.mcs.anl.gov/~gropp


The Success of MPI

  • Applications

    • Most recent Gordon Bell prize winners use MPI

  • Libraries

    • Growing collection of powerful software components

  • Tools

    • Performance tracing (Vampir, Jumpshot, etc.)

    • Debugging (Totalview, etc.)

  • Results

    • Papers: http://www.mcs.anl.gov/mpi/papers

  • Clusters

    • Ubiquitous parallel computing


Why Was MPI Successful?

  • It addresses all of the following issues:

    • Portability

    • Performance

    • Simplicity and Symmetry

    • Modularity

    • Composability

    • Completeness


Portability and Performance

  • Portability does not require a “lowest common denominator” approach

    • Good design allows the use of special, performance-enhancing features without requiring hardware support

    • MPI’s nonblocking message-passing semantics allow, but do not require, “zero-copy” data transfers (see the sketch below)

  • Incidentally, the right phrase is “greatest common denominator,” not “lowest”
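
Not from the slides, but a minimal C sketch of the point above (run with two processes): between MPI_Isend and the matching MPI_Wait the implementation owns the buffer, so a capable implementation can move the data directly out of user memory with no intermediate copy, while a simpler one may fall back to internal buffering.

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        double buf[1000];
        int rank, i;
        MPI_Request req;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            for (i = 0; i < 1000; i++) buf[i] = i;
            /* Post the send; a capable implementation may transfer buf
               directly from user memory, with no intermediate copy. */
            MPI_Isend(buf, 1000, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, &req);
            /* ... useful computation may overlap the transfer here ... */
            MPI_Wait(&req, MPI_STATUS_IGNORE);  /* buf reusable after this */
        } else if (rank == 1) {
            MPI_Irecv(buf, 1000, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &req);
            MPI_Wait(&req, MPI_STATUS_IGNORE);
        }

        MPI_Finalize();
        return 0;
    }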


Simplicity and Symmetry

  • MPI is organized around a small number of concepts

    • The number of routines is not a good measure of complexity

    • Fortran

      • Large number of intrinsic functions

    • C and Java runtimes are large

    • Development Frameworks

      • Hundreds to thousands of methods

    • This doesn’t bother millions of programmers


Measuring Complexity

  • Complexity should be measured in the number of concepts, not functions or size of the manual

  • MPI is organized around a few powerful concepts

    • Point-to-point message passing

    • Datatypes

    • Blocking and nonblocking buffer handling

    • Communication contexts and process groups (see the sketch below)
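
As a concrete illustration (a hypothetical sketch, not from the presentation), the following C program touches two of these concepts: point-to-point message passing, and a separate communication context obtained with MPI_Comm_dup, the mechanism that keeps a library's traffic from ever colliding with the application's.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, value = 0;
        MPI_Comm libcomm;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* A library duplicates the communicator it is given; messages
           in libcomm live in their own context and can never be
           mistaken for application traffic on MPI_COMM_WORLD. */
        MPI_Comm_dup(MPI_COMM_WORLD, &libcomm);

        if (rank == 0) {
            value = 42;
            MPI_Send(&value, 1, MPI_INT, 1, 0, libcomm);
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, libcomm, MPI_STATUS_IGNORE);
            printf("rank 1 received %d\n", value);
        }

        MPI_Comm_free(&libcomm);
        MPI_Finalize();
        return 0;
    }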


Elegance of Design

  • MPI often uses one concept to solve multiple problems

  • Example: Datatypes

    • Describe noncontiguous data transfers, necessary for performance

    • Describe data formats, necessary for heterogeneous systems

  • “Proof” of elegance:

    • Datatypes proved to be exactly what was needed for the high-performance I/O added in MPI-2 (see the sketch below)
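
A hedged illustration of the datatype idea (mine, not the slide's): MPI_Type_vector describes one column of a row-major matrix, so a single send moves noncontiguous data without manual packing; the same description mechanism is what MPI-IO reuses for file layouts.

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        double a[8][8];          /* row-major 8x8 matrix */
        int rank, i, j;
        MPI_Datatype column;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        for (i = 0; i < 8; i++)
            for (j = 0; j < 8; j++)
                a[i][j] = (rank == 0) ? i * 8 + j : 0.0;

        /* One column: 8 blocks of 1 double, separated by a stride
           of 8 doubles (one matrix row). */
        MPI_Type_vector(8, 1, 8, MPI_DOUBLE, &column);
        MPI_Type_commit(&column);

        if (rank == 0)           /* send column 0 without packing */
            MPI_Send(&a[0][0], 1, column, 1, 0, MPI_COMM_WORLD);
        else if (rank == 1)
            MPI_Recv(&a[0][0], 1, column, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);

        MPI_Type_free(&column);
        MPI_Finalize();
        return 0;
    }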


Parallel I/O

  • Collective model provides high I/O performance

    • Matches applications’ most general view: objects distributed among processes

  • MPI Datatypes extend I/O model to noncontiguous data in both memory and file

    • Unix readv/writev describe noncontiguity only in memory, not in the file (see the sketch below)
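
For comparison, a minimal MPI-IO sketch (the file name out.dat and the 100-ints-per-process layout are my assumptions): each process declares its region of the shared file with a file view, and the collective write gives the implementation the global picture it needs to merge requests.

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int buf[100], rank, i;
        MPI_File fh;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        for (i = 0; i < 100; i++) buf[i] = rank;

        MPI_File_open(MPI_COMM_WORLD, "out.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY,
                      MPI_INFO_NULL, &fh);
        /* The file view says where this process's data lives in the
           shared file ... */
        MPI_File_set_view(fh, (MPI_Offset)rank * 100 * sizeof(int),
                          MPI_INT, MPI_INT, "native", MPI_INFO_NULL);
        /* ... and the collective write lets the implementation merge
           and schedule all processes' requests together. */
        MPI_File_write_all(fh, buf, 100, MPI_INT, MPI_STATUS_IGNORE);
        MPI_File_close(&fh);

        MPI_Finalize();
        return 0;
    }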


Parallel I/O Performance with MPI-IO

[Chart: measured I/O performance with MPI-IO for structured mesh I/O and unstructured grid I/O, compared with POSIX-style I/O; POSIX was too slow to show.]


Modularity

  • Modern algorithms are hierarchical

    • Do not assume that all operations involve all or only one process

    • Provide tools that don’t limit the user

  • Modern software is built from components

    • MPI designed to support libraries

    • Many applications have no explicit MPI calls; all MPI is contained within well-designed libraries


Composability

  • Environments are built from components

    • Compilers, libraries, runtime systems

    • MPI designed to “play well with others”

  • MPI exploits the newest advances in compilers

    • … without ever talking to compiler writers

    • OpenMP is an example


Completeness

  • MPI provides a complete parallel programming model and avoids simplifications that limit the model

    • Contrast: models that require synchronization to occur only collectively, across all processes or tasks

  • Make sure that the functionality is there when the user needs it

    • Don’t force the user to start over with a new programming model when a new feature is needed


Is Ease of Use the Overriding Goal?

  • MPI often described as “the assembly language of parallel programming”

  • C and Fortran have been described as “portable assembly languages”

  • Ease of use is important. But completeness is more important.

    • Don’t force users to switch to a different approach as their application evolves


Lessons From MPI

  • A general programming model for high-performance technical computing must address many issues to succeed

  • Even that is not enough; it also needs:

    • Good design

    • Buy-in by the community

    • Effective implementations

  • MPI achieved these through an Open Standards Process


An Open and Balanced Process

  • Balanced representation from

    • Users

      • What users want and need

        • Including correctness

    • Implementers (Vendors)

      • What can be provided

        • Many MPI features determined by implementation needs

    • Researchers

      • Directions and Futures

        • MPI planned for interoperation with OpenMP before OpenMP was conceived

        • Support for libraries strongly influenced by research


Where Next?

  • Improving MPI

    • Simplifying and enhancing the expression of MPI programs

  • Improving MPI Implementations

    • Performance

    • Performance

    • Performance

  • New Directions

    • What can displace (or complement) MPI? (See yesterday’s panel on programming-models projects and tomorrow’s panel on the future of supercomputing)


Improving MPI

  • Simpler interfaces

    • Use compiler or precompiler techniques to support simpler, integrated syntax

    • Fortran 95 arrays, datatypes in C/C++

  • Eliminate function calls

    • Use program analysis and transformation to inline operations

  • More tools for correctness and performance debugging

    • MPI profiling interface is a good start (see the wrapper sketch after this list)

    • Debugger interface used by Totalview is an example of tool development

    • Effort to provide a common interface to internal performance data, such as idle time waiting for a message

  • Changes to MPI

    • E.g., MPI-2 RMA lacks a read-modify-write

    • But don’t hold your breath

      • These require research and experimentation before they are ready for a standardization process
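
To show how little the profiling interface demands of a tool builder, here is a sketch of a wrapper library (the counter and its name are hypothetical): every MPI routine is also callable under the PMPI_ prefix, so a tool intercepts calls simply by being linked ahead of the MPI library, with no change to the application.

    #include <mpi.h>

    static int send_count = 0;   /* hypothetical statistic for this sketch */

    /* Interpose on MPI_Send; the "real" routine remains reachable
       as PMPI_Send. */
    int MPI_Send(void *buf, int count, MPI_Datatype datatype,
                 int dest, int tag, MPI_Comm comm)
    {
        send_count++;
        return PMPI_Send(buf, count, datatype, dest, tag, comm);
    }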


Improving MPI Implementations

  • Faster Point-to-point

    • Some current implementations make unnecessary copies

  • Collective operations

    • Better algorithms exist

      • SMP optimizations

      • Scatter-gather broadcast, reduce, etc. (see the sketch after this list)

  • Optimizing for new hardware

    • RDMA networks

    • NIC-enabled remote atomic operations

  • Wide area networks

    • Optimizations for high latency

    • Speculative sends

    • Quality of service extensions (through MPI attributes)

  • Massive scaling

    • Many implementations optimize internal buffers for modest numbers of processes

    • Some MPI routines (e.g., MPI_Graph_create) do not have scalable definitions
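
As an example of the “better algorithms” point above, here is a sketch of a scatter-gather broadcast (an illustration of the idea, not MPICH’s actual code; it assumes an int buffer whose length divides evenly among the processes and a root of rank 0). For long messages this keeps every link busy instead of funneling the whole buffer down a tree.

    #include <mpi.h>

    void bcast_scatter_allgather(int *buf, int count, MPI_Comm comm)
    {
        int nprocs, rank, chunk;

        MPI_Comm_size(comm, &nprocs);
        MPI_Comm_rank(comm, &rank);
        chunk = count / nprocs;

        /* Step 1: the root scatters one chunk to each process. */
        if (rank == 0)
            MPI_Scatter(buf, chunk, MPI_INT, MPI_IN_PLACE, chunk,
                        MPI_INT, 0, comm);
        else
            MPI_Scatter(NULL, chunk, MPI_INT, buf + rank * chunk,
                        chunk, MPI_INT, 0, comm);

        /* Step 2: each process's chunk is already in place, so an
           in-place allgather completes the broadcast. */
        MPI_Allgather(MPI_IN_PLACE, 0, MPI_DATATYPE_NULL, buf, chunk,
                      MPI_INT, comm);
    }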


More Improvements for MPI Implementations

  • Reduce latency

    • Automatic techniques to compress code paths

    • Closer match to hardware capabilities

  • Improve RMA

    • Many current implementations are, at best, merely functional

  • Parallel I/O, particularly for clusters

    • Communication aggregation

    • Reliability in the presence of faults

  • Fault tolerance

    • Exploit MPI intercommunicators to generalize the two-party model (see the sketch after this list)

  • Thread safe and efficient implementations

    • Lock-free design

    • Software engineering for common MPI implementation source tree

  • Many groups working on improved MPI implementations

    • MPICH-2 is an all-new and efficient implementation

      • Includes many of these ideas

      • Designed, as MPICH was, to encourage others to experiment and extend MPI
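
A sketch of the intercommunicator idea (the executable name “worker” is hypothetical): MPI_Comm_spawn creates a second group of processes and returns an intercommunicator to it, a natural boundary at which a fault-tolerant design can contain failures in one group without taking down the other.

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Comm workers;    /* intercommunicator to the spawned group */
        int errcodes[4];

        MPI_Init(&argc, &argv);

        /* Spawn a separate group of four worker processes; the two
           groups are connected only through the intercommunicator. */
        MPI_Comm_spawn("worker", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
                       0, MPI_COMM_SELF, &workers, errcodes);

        /* ... exchange work with the remote group via 'workers' ... */

        MPI_Comm_free(&workers);
        MPI_Finalize();
        return 0;
    }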


What’s New in MPICH2

  • Beta-test version available for groups that expect to perform research on MPI implementations with MPICH2

    • Version 0.92 released last Friday

  • Contains

    • All of MPI-1, MPI-I/O, service functions from MPI-2, active-target RMA

    • C, C++, Fortran 77 bindings

    • Example devices for TCP, Infiniband, shared memory

    • Documentation

  • Passes extensive correctness tests

    • Intel test suite (as corrected); good unit test suite

    • MPICH test suite; adequate system test suite

    • Notre Dame C++ tests, based on IBM C test suite

    • Passes more tests than MPICH1 


MPICH2 Research

  • The all-new implementation is our vehicle for research in:

    • Thread safety and efficiency (e.g., avoiding thread locks); see the sketch after this list

    • Optimized MPI datatypes

    • Optimized Remote Memory Access (RMA)

    • High Scalability (64K MPI processes and more)

    • Exploiting Remote Direct Memory Access (RDMA) capable networks

    • All of MPI-2, including dynamic process management, parallel I/O, RMA

    • Usability and Robustness

      • Software engineering techniques that automate and simplify creating and maintaining a solid, user-friendly implementation

      • Allow extensive runtime error checking but do not require it

      • Integrated performance debugging

    • Clean interfaces to other system components such as scalable process managers
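
On the thread-safety item, the MPI-2 interface already lets an application negotiate the support level it needs; a minimal sketch:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int provided;

        /* Request full multithreaded operation; the implementation
           reports the level it actually provides, which may be less. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

        if (provided < MPI_THREAD_MULTIPLE)
            printf("MPI_THREAD_MULTIPLE unavailable (level %d)\n",
                   provided);

        MPI_Finalize();
        return 0;
    }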


Some Target Platforms

  • Clusters (TCP, UDP, Infiniband, Myrinet, Proprietary Interconnects, …)

  • Clusters of SMPs

  • Grids (UDP, TCP, Globus I/O, …)

  • Cray Red Storm

  • BlueGene/x

    • 64K processors; 64K address spaces

    • ANL/IBM developing MPI for BG/L

  • QCDoC

  • Cray X1 (at least I/O)

  • Other systems


(Logical) Structure of MPICH-2

[Architecture diagram: MPICH-2 is built on three internal interfaces. ADI-3 (communication) provides a channel interface with devices for TCP, shared memory, Infiniband, Myrinet and other NICs, Portals, BG/L, and a multi-method device. ADIO (I/O) targets PVFS, NFS, XFS, HFS, SFS, and other parallel file systems. PMI (process management) targets MPD, fork, remshell, bproc, and Windows and Unix (Python) process managers. Components are marked as existing, in progress, or left for others (e.g., vendors).]

Conclusions

  • The Future of MPI is Bright!

    • Higher-performance implementations

    • More libraries and applications

    • Better tools for developing and tuning MPI programs

    • Leverage of complementary technologies

  • Full MPI-2 implementations will become common

    • Several already exist; many Earth Simulator applications use MPI RMA

