The Future of MPI

William Gropp
Argonne National Laboratory
www.mcs.anl.gov/~gropp

The Success of MPI
  • Applications
    • Most recent Gordon Bell prize winners use MPI
  • Libraries
    • Growing collection of powerful software components
  • Tools
    • Performance tracing (Vampir, Jumpshot, etc.)
    • Debugging (Totalview, etc.)
  • Results
    • Papers: http://www.mcs.anl.gov/mpi/papers
  • Clusters
    • Ubiquitous parallel computing

Why Was MPI Successful?
  • It addresses all of the following issues:
    • Portability
    • Performance
    • Simplicity and Symmetry
    • Modularity
    • Composability
    • Completeness

Portability and Performance
  • Portability does not require a “lowest common denominator” approach
    • Good design allows the use of special, performance-enhancing features without requiring hardware support
    • MPI’s nonblocking message-passing semantics allow, but do not require, “zero-copy” data transfers (see the sketch below)
  • Strictly speaking, the right phrase is “greatest common denominator”
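As a concrete illustration (mine, not from the original slides), here is a minimal sketch of the nonblocking pattern in C. Because MPI_Isend only initiates the transfer, an implementation is free to move the data directly out of the user buffer (zero-copy) where the hardware supports it, or to stage it through internal buffers where it does not. Run with at least two processes, e.g. mpiexec -n 2.

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        double buf[1024] = { 0 };
        MPI_Request req;

        if (rank == 0) {
            /* Initiate the send; the implementation may transfer directly
               from buf or stage the data through an internal buffer. */
            MPI_Isend(buf, 1024, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, &req);
            /* ... overlap computation here, without modifying buf ... */
            MPI_Wait(&req, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Irecv(buf, 1024, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &req);
            MPI_Wait(&req, MPI_STATUS_IGNORE);
        }

        MPI_Finalize();
        return 0;
    }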

Simplicity and Symmetry
  • MPI is organized around a small number of concepts
    • The number of routines is not a good measure of complexity
    • Fortran has a large number of intrinsic functions
    • The C and Java runtime libraries are large
    • Development frameworks often have hundreds to thousands of methods
    • None of this bothers millions of programmers

Measuring Complexity
  • Complexity should be measured by the number of concepts, not the number of functions or the size of the manual
  • MPI is organized around a few powerful concepts
    • Point-to-point message passing
    • Datatypes
    • Blocking and nonblocking buffer handling
    • Communication contexts and process groups (see the sketch below)
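As an illustration of the last concept (a sketch of mine, not from the slides): each communicator carries its own context and process group, so a message sent on one communicator can never be matched by a receive on another. The grouping into “rows” of four processes below is arbitrary.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Group processes into "rows" of 4; each row becomes its own
           communicator, i.e., its own process group and context. */
        MPI_Comm row;
        MPI_Comm_split(MPI_COMM_WORLD, rank / 4, rank, &row);

        int row_rank;
        MPI_Comm_rank(row, &row_rank);
        printf("world rank %d is rank %d in its row\n", rank, row_rank);

        MPI_Comm_free(&row);
        MPI_Finalize();
        return 0;
    }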

Elegance of Design
  • MPI often uses one concept to solve multiple problems
  • Example: Datatypes
    • Describe noncontiguous data transfers, necessary for performance
    • Describe data formats, necessary for heterogeneous systems
  • “Proof” of elegance:
    • Datatypes turned out to be exactly what was needed for high-performance I/O, added in MPI-2 (see the sketch below)
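A minimal sketch of the datatype idea (mine, not from the slides): a single MPI_Type_vector describes a noncontiguous column of a row-major array, and the same handle then drives an ordinary message transfer; in MPI-2, the same datatype machinery describes file layouts as well. Run with two processes.

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        double a[10][10];            /* row-major 10x10 array */
        for (int i = 0; i < 10; i++)
            for (int j = 0; j < 10; j++)
                a[i][j] = (rank == 0) ? i * 10 + j : -1.0;

        /* One column = 10 blocks of 1 double, with a stride of 10. */
        MPI_Datatype column;
        MPI_Type_vector(10, 1, 10, MPI_DOUBLE, &column);
        MPI_Type_commit(&column);

        /* The datatype lets MPI move the noncontiguous column without
           the user packing it into a temporary buffer. */
        if (rank == 0)
            MPI_Send(&a[0][0], 1, column, 1, 0, MPI_COMM_WORLD);
        else if (rank == 1)
            MPI_Recv(&a[0][0], 1, column, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);

        MPI_Type_free(&column);
        MPI_Finalize();
        return 0;
    }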

Parallel I/O
  • The collective model provides high I/O performance
    • Matches an application’s most general view of its data: objects, distributed among processes
  • MPI datatypes extend the I/O model to noncontiguous data in both memory and file (see the sketch below)
    • Unix readv/writev apply only to memory
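A hedged sketch of collective I/O with MPI-IO, assuming a 1-D block-distributed array (the file name and sizes are illustrative): each process sets a file view at its own offset, and all processes write with one collective call, which the implementation can aggregate into large, well-formed requests.

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Each process owns one contiguous block of the global array. */
        double local[1000];
        for (int i = 0; i < 1000; i++)
            local[i] = rank;

        MPI_File fh;
        MPI_File_open(MPI_COMM_WORLD, "out.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY,
                      MPI_INFO_NULL, &fh);

        /* Give each rank a view starting at its block's offset, then
           write collectively so the library can optimize the access. */
        MPI_Offset disp = (MPI_Offset)rank * 1000 * sizeof(double);
        MPI_File_set_view(fh, disp, MPI_DOUBLE, MPI_DOUBLE,
                          "native", MPI_INFO_NULL);
        MPI_File_write_all(fh, local, 1000, MPI_DOUBLE,
                           MPI_STATUS_IGNORE);

        MPI_File_close(&fh);
        MPI_Finalize();
        return 0;
    }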

Parallel I/O Performance with MPI-IO

[Charts: I/O bandwidth for structured mesh I/O and unstructured grid I/O, comparing MPI-IO with POSIX-style I/O; in the unstructured grid case, POSIX-style I/O is too slow to show.]

Modularity
  • Modern algorithms are hierarchical
    • Do not assume that all operations involve all, or only one, process
    • Provide tools that don’t limit the user
  • Modern software is built from components
    • MPI was designed to support libraries (see the sketch below)
    • Many applications have no explicit MPI calls; all MPI is contained within well-designed libraries
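One illustration of that library support (the solver names here are hypothetical, not from the slides): a library duplicates the caller’s communicator when it is initialized, so its internal messages travel in a private context and can never be confused with the application’s own communication.

    #include <mpi.h>

    /* A hypothetical library handle. */
    typedef struct {
        MPI_Comm comm;   /* private communicator for the library */
    } solver_t;

    /* MPI_Comm_dup gives the library its own communication context
       on the same group of processes. */
    int solver_init(solver_t *s, MPI_Comm user_comm)
    {
        return MPI_Comm_dup(user_comm, &s->comm);
    }

    int solver_finalize(solver_t *s)
    {
        return MPI_Comm_free(&s->comm);
    }

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        solver_t s;
        solver_init(&s, MPI_COMM_WORLD);
        /* ... the library communicates on s.comm, the application on
           MPI_COMM_WORLD, with no possibility of interference ... */
        solver_finalize(&s);

        MPI_Finalize();
        return 0;
    }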

Composability
  • Environments are built from components
    • Compilers, libraries, runtime systems
    • MPI was designed to “play well with others”
  • MPI exploits the newest advances in compilers
    • … without ever talking to compiler writers
    • OpenMP is an example (see the sketch below)
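A minimal sketch of that composability, assuming an OpenMP-capable compiler (e.g., mpicc -fopenmp): MPI processes and OpenMP threads coexist without either specification knowing about the other. MPI_Init_thread, from MPI-2, lets the program declare how its threads will call MPI.

    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        /* Request a thread level where only the main thread makes
           MPI calls; the implementation reports what it provides. */
        int provided;
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* OpenMP parallelism inside each MPI process. */
        #pragma omp parallel
        {
            printf("rank %d, thread %d\n", rank, omp_get_thread_num());
        }

        MPI_Finalize();
        return 0;
    }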

Completeness
  • MPI provides a complete parallel programming model and avoids simplifications that limit it
    • Contrast: models that require synchronization to occur only collectively, across all processes or tasks
  • Make sure the functionality is there when the user needs it
    • Don’t force the user to start over with a new programming model when a new feature is needed

Is Ease of Use the Overriding Goal?
  • MPI often described as “the assembly language of parallel programming”
  • C and Fortran have been described as “portable assembly languages”
  • Ease of use is important. But completeness is more important.
    • Don’t force users to switch to a different approach as their application evolves

Lessons From MPI
  • A general programming model for high-performance technical computing must address many issues to succeed
  • Even that is not enough. It also needs:
    • Good design
    • Buy-in by the community
    • Effective implementations
  • MPI achieved these through an Open Standards Process

An Open and Balanced Process
  • Balanced representation from:
    • Users
      • What users want and need
        • Including correctness
    • Implementers (vendors)
      • What can be provided
        • Many MPI features were determined by implementation needs
    • Researchers
      • Directions and futures
        • MPI planned for interoperation with OpenMP before OpenMP was conceived
        • Support for libraries was strongly influenced by research

Where Next?
  • Improving MPI
    • Simplifying and enhancing the expression of MPI programs
  • Improving MPI Implementations
    • Performance
    • Performance
    • Performance
  • New Directions
    • What can displace (or complement) MPI? (Yesterday’s panel presentation on the programming models project and tomorrow’s panel on the future of supercomputing)

Improving MPI
  • Simpler interfaces
    • Use compiler or precompiler techniques to support simpler, integrated syntax
    • Fortran 95 arrays; datatypes in C/C++
  • Eliminate function calls
    • Use program analysis and transformation to inline operations
  • More tools for correctness and performance debugging
    • The MPI profiling interface is a good start (see the sketch after this list)
    • The debugger interface used by TotalView is an example of tool development
    • An effort is under way to provide a common interface to internal performance data, such as idle time spent waiting for a message
  • Changes to MPI
    • E.g., MPI-2 RMA lacks a read-modify-write operation
    • But don’t hold your breath
      • Such changes require research and experimentation before they are ready for a standardization process
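A minimal sketch of the profiling interface mentioned above (the counter is illustrative, and the MPI_Send signature is shown in its modern const-qualified form): every MPI routine is also available under a PMPI_ name, so a tool can intercept a call simply by defining the MPI_ version itself and forwarding to the PMPI_ version. Linked in front of the MPI library, this counts sends without touching the application.

    #include <mpi.h>
    #include <stdio.h>

    static long send_count = 0;

    /* Intercept MPI_Send: record the call, then forward to PMPI_Send. */
    int MPI_Send(const void *buf, int count, MPI_Datatype datatype,
                 int dest, int tag, MPI_Comm comm)
    {
        send_count++;
        return PMPI_Send(buf, count, datatype, dest, tag, comm);
    }

    /* Report at shutdown by wrapping MPI_Finalize the same way. */
    int MPI_Finalize(void)
    {
        printf("MPI_Send called %ld times\n", send_count);
        return PMPI_Finalize();
    }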

Improving MPI Implementations
  • Faster Point-to-point
    • Some current implementations make unnecessary copies
  • Collective operations
    • Better algorithms exist
      • SMP optimizations
      • Scatter-gather broadcast, reduce, etc.
  • Optimizing for new hardware
    • RDMA networks
    • NIC-enabled remote atomic operations
  • Wide area networks
    • Optimizations for high latency
    • Speculative sends
    • Quality of service extensions (through MPI attributes)
  • Massive scaling
    • Many implementations optimize internal buffers for modest numbers of processes
    • Some MPI routines (e.g., MPI_Graph_create) do not have scalable definitions

More Improvements for MPI Implementations
  • Reduce latency
    • Automatic techniques to compress code paths
    • Closer match to hardware capabilities
  • Improve RMA
    • Many current implementations are, at best, merely functional
  • Parallel I/O, particularly for clusters
    • Communication aggregation
    • Reliability in the presence of faults
  • Fault tolerance
    • Exploit MPI intercommunicators to generalize the two-party model
  • Thread-safe and efficient implementations
    • Lock-free design
    • Software engineering for a common MPI implementation source tree
  • Many groups are working on improved MPI implementations
    • MPICH-2 is an all-new, efficient implementation
      • Includes many of these ideas
      • Designed, as MPICH was, to encourage others to experiment with and extend MPI

What’s New in MPICH2
  • A beta-test version is available for groups that expect to perform research on MPI implementations with MPICH2
    • Version 0.92 was released last Friday
  • Contains
    • All of MPI-1, MPI-I/O, service functions from MPI-2, and active-target RMA
    • C, C++, and Fortran 77 bindings
    • Example devices for TCP, Infiniband, and shared memory
    • Documentation
  • Passes extensive correctness tests
    • Intel test suite (as corrected), a good unit test suite
    • MPICH test suite, an adequate system test suite
    • Notre Dame C++ tests, based on the IBM C test suite
    • Passes more tests than MPICH1

MPICH2 Research
  • The all-new implementation is our vehicle for research in:
    • Thread safety and efficiency (e.g., avoiding thread locks)
    • Optimized MPI datatypes
    • Optimized Remote Memory Access (RMA)
    • High scalability (64K MPI processes and more)
    • Exploiting Remote Direct Memory Access (RDMA) capable networks
    • All of MPI-2, including dynamic process management, parallel I/O, and RMA
    • Usability and robustness
      • Software engineering techniques that automate and simplify creating and maintaining a solid, user-friendly implementation
      • Allow extensive runtime error checking, but do not require it
      • Integrated performance debugging
    • Clean interfaces to other system components, such as scalable process managers

Some Target Platforms
  • Clusters (TCP, UDP, Infiniband, Myrinet, Proprietary Interconnects, …)
  • Clusters of SMPs
  • Grids (UDP, TCP, Globus I/O, …)
  • Cray Red Storm
  • BlueGene/x
    • 64K processors; 64K address spaces
    • ANL/IBM developing MPI for BG/L
  • QCDOC
  • Cray X1 (at least I/O)
  • Other systems

(Logical) Structure of MPICH-2

[Block diagram: MPICH-2 rests on three internal interfaces. PMI (the process management interface), with implementations including MPD (Unix, written in Python), remshell, fork, bproc, Windows, and vendor systems. ADIO (the abstract device for I/O), with drivers for PVFS, NFS, XFS, HFS, SFS, and other parallel file systems. ADI-3 (the third-generation abstract device interface for communication), including a channel interface with TCP, shared-memory, Infiniband, Myrinet and other NIC, Portals, BG/L, and multi-method implementations; some components exist, others are in progress.]

Conclusions
  • The Future of MPI is Bright!
    • Higher-performance implementations
    • More libraries and applications
    • Better tools for developing and tuning MPI programs
    • Leverage of complementary technologies
  • Full MPI-2 implementations will become common
    • Several already exist; many Earth Simulator (ES) applications use MPI RMA