This resource provides a comprehensive overview of the Message Passing Interface (MPI), essential for parallel programming. It covers the basics of parallel computing, point-to-point communications, and the collective communication features of MPI. The agenda includes slides from a lecture by Hanjun Kin of Princeton, along with advanced MPI features and practical examples. Major challenges in computational science, such as weather forecasting and big data, are discussed, emphasizing the need for efficient parallel processing. Additional resources and guides for MPI are also included.
MPI: Message Passing Interface. Yvon Kermarrec
More readings • “Parallel programming with MPI”, Peter Pacheco, Morgan Kaufmann Publishers • LAM/MPI User Guide: http://www.lam-mpi.org/tutorials/lam/ • The MPI standard is available from http://www.mpi-forum.org/
Agenda • Part 0 – the context • Slides extracted from a lecture by Hanjun Kin, Princeton U. • Part 1 – Introduction • Basics of Parallel Computing • Six-function MPI • Point-to-Point Communications • Part 2 – Advanced features of MPI • Collective Communication • Part 3 – Examples and how to program an MPI application
Serial Computing • A 1,000-piece puzzle • Takes 10 hours
Parallelism on Shared Memory • Orange and brown share the puzzle on the same table • Takes 6 hours (not 5, due to communication & contention)
The more, the better?? • Lack of seats (Resource limit) • More contention among people
Parallelism on Distributed Systems • Scalable seats (Scalable Resource) • Less contention from private memory spaces
How to share the puzzle? • DSM (Distributed Shared Memory) • Message Passing
DSM (Distributed Shared Memory) • Provides shared memory physically or virtually • Pros - Easy to use • Cons - Limited Scalability, High coherence overhead
Message Passing • Pros – Scalable, Flexible • Cons – Often considered harder to program than DSM
Agenda • Part 0 – the context • Slides extracted from a lecture by Hanjun Kin, Princeton U. • Part 1 – Introduction • Basics of Parallel Computing • Six-function MPI • Point-to-Point Communications • Part 2 – Advanced features of MPI • Collective Communication • Part 3 – Examples and how to program an MPI application
We need more computational power • The weather forecast example by P. Pacheco: • Suppose we wish to predict the weather over the United States and Canada for the next 48 hours • Also suppose that we want to model the atmosphere from sea level to an altitude of 20 km • If we use a cubical grid with each cube measuring 0.1 km on a side, the atmosphere contains 2.0 x 10^7 km^2 x 20 km x 10^3 cubes per km^3 = 4 x 10^11 grid points • Suppose we need to execute 100 instructions per grid point per hourly step over the 48 hours: we need 4 x 10^13 x 48 operations • If our computer executes 10^9 op/sec, we need about 23 days
The need for parallel programming • We face numerous challenges in science (biology, simulation, earthquakes, …) and we cannot build fast enough computers… • Data can be big (big data…) and memory is rather limited • Processors can do a lot… but to address figures such as those mentioned above, programming smarter is not enough
The need for parallel machines • We can build parallel machines, but there is still a huge amount of work to be done: • decide on and implement an interconnection network for the processors and memory modules, • design and implement system software for the hardware, • design algorithms and data structures to solve our problem, • divide the algorithms and data structures into subproblems, • identify the communications and data exchanges, • assign subproblems to processors
The need for parallel machines • Flynn's taxonomy (or how to work more!) • SISD: Single Instruction – Single Data: the common, classical machine • SIMD: Single Instruction – Multiple Data: the same instruction is carried out simultaneously on multiple data items • MIMD: Multiple Instructions – Multiple Data • SPMD: Single Program – Multiple Data: the same program is replicated and run on different data
The need for parallel machines • We can build one parallel computer… but that would be very expensive, time- and energy-consuming, and hard to maintain • We may want to integrate what is available in the labs: aggregate the available computing resources and reuse ordinary machines • US Dept. of Energy and the PVM project (Parallel Virtual Machine), from 1989
MPI: Message Passing Interface? • MPI: an interface • A message-passing library specification • extended message-passing model • not a language or compiler specification • not a specific implementation or product • For parallel computers, clusters, and heterogeneous networks • A rich set of features • Designed to provide access to advanced parallel hardware for end users, library writers, and tool developers
MPI? • An international product • Early vendor systems (Intel's NX, IBM's EUI, TMC's CMMD) were not portable • Early portable systems (PVM, p4, TCGMSG, Chameleon) were mainly research efforts • They were rather limited… and lacked vendor support • They were not implemented at the most efficient level • The MPI Forum was organized in 1992 with broad participation by: • vendors: IBM, Intel, TMC, SGI, Convex, … • users: application scientists and library writers
How big is the MPI library? • Huge (125 functions)… • Basic (6 functions) • But only a subset is needed to program a distributed application
Environments for parallel programming • Upshot, Jumpshot, and MPE tools: http://www.mcs.anl.gov/research/projects/perfvis/software/viewers/ • Pallas VAMPIR: http://www.vampir.eu/ • ParaGraph: http://www.ncsa.uiuc.edu/Apps/MCS/ParaGraph/ParaGraph.html
A Minimal MPI Program in C

#include "mpi.h"
#include <stdio.h>

int main( int argc, char *argv[] )
{
    MPI_Init( &argc, &argv );
    printf( "Hello, world!\n" );
    MPI_Finalize();
    return 0;
}
Finding Out About the Environment • Two important questions that arise early in a parallel program are: • How many processes are participating in this computation? • Which one am I? • MPI provides functions to answer these questions: • MPI_Comm_size reports the number of processes. • MPI_Comm_rank reports the rank, a number between 0 and size-1, identifying the calling process
Better Hello (C)

#include "mpi.h"
#include <stdio.h>

int main( int argc, char *argv[] )
{
    int rank, size;

    MPI_Init( &argc, &argv );
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );
    MPI_Comm_size( MPI_COMM_WORLD, &size );
    printf( "I am %d of %d\n", rank, size );
    MPI_Finalize();
    return 0;
}
Some Basic Concepts • Processes can be collected into groups. • Each message is sent in a context, and must be received in the same context. • A group and context together form a communicator. • A process is identified by its rank in the group associated with a communicator. • There is a default communicator whose group contains all initial processes, called MPI_COMM_WORLD.
MPI Datatypes • The data in a message to be sent or received is described by a triple (address, count, datatype) • An MPI datatype is recursively defined as: • predefined, corresponding to a data type from the language (e.g., MPI_INT, MPI_DOUBLE_PRECISION) • a contiguous array of MPI datatypes • an indexed array of blocks of datatypes • an arbitrary structure of datatypes • There are MPI functions to construct custom datatypes, such as an array of (int, float) pairs, or a row of a matrix stored columnwise.
Basic MPI types

MPI datatype         C datatype
MPI_CHAR             char
MPI_SIGNED_CHAR      signed char
MPI_UNSIGNED_CHAR    unsigned char
MPI_SHORT            signed short
MPI_UNSIGNED_SHORT   unsigned short
MPI_INT              signed int
MPI_UNSIGNED         unsigned int
MPI_LONG             signed long
MPI_UNSIGNED_LONG    unsigned long
MPI_FLOAT            float
MPI_DOUBLE           double
MPI_LONG_DOUBLE      long double
MPI Tags • Messages are sent with an accompanying user-defined integer tag, to assist the receiving process in identifying the message. • Messages can be screened at the receiving end by specifying a specific tag, or not screened by specifying MPI_ANY_TAG as the tag in a receive. • Some non-MPI message-passing systems have called tags “message types”. MPI calls them tags to avoid confusion with datatypes.
MPI blocking send

int MPI_Send(void *start, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)

• The message buffer is described by (start, count, datatype).
• dest is the rank of the target process in the given communicator.
• tag is the message identification number.
MPI Basic (Blocking) Receive

int MPI_Recv(void *start, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Status *status)

• Waits until a matching (on source and tag) message is received from the system, and the buffer can be used.
• source is a rank in the communicator specified by comm, or MPI_ANY_SOURCE.
• status contains further information.
• Receiving fewer than count occurrences of datatype is OK, but receiving more is an error.
Retrieving Further Information • status is a data structure allocated in the user's program. • In C:

int recvd_tag, recvd_from, recvd_count;
MPI_Status status;

MPI_Recv(..., MPI_ANY_SOURCE, MPI_ANY_TAG, ..., &status);
recvd_tag  = status.MPI_TAG;
recvd_from = status.MPI_SOURCE;
MPI_Get_count( &status, datatype, &recvd_count );
More info • A receive operation may accept messages from an arbitrary sender, but a send operation must specify a unique receiver. • Source equals destination is allowed, that is, a process can send a message to itself.
Why is MPI simple? • Many parallel programs can be written using just these six functions, only two of which are non-trivial: • MPI_INIT • MPI_FINALIZE • MPI_COMM_SIZE • MPI_COMM_RANK • MPI_SEND • MPI_RECV
Simple full example

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    const int tag = 42;            /* Message tag */
    int id, ntasks, source_id, dest_id, err, i;
    MPI_Status status;
    int msg[2];                    /* Message array */

    err = MPI_Init(&argc, &argv);  /* Initialize MPI */
    if (err != MPI_SUCCESS) {
        printf("MPI initialization failed!\n");
        exit(1);
    }
    err = MPI_Comm_size(MPI_COMM_WORLD, &ntasks);  /* Get number of tasks */
    err = MPI_Comm_rank(MPI_COMM_WORLD, &id);      /* Get id of this process */
    if (ntasks < 2) {
        printf("You have to use at least 2 processors to run this program\n");
        MPI_Finalize();            /* Quit if there is only one processor */
        exit(0);
    }
Simple full example (Cont.)

    if (id == 0) {    /* Process 0 (the receiver) does this */
        for (i = 1; i < ntasks; i++) {
            err = MPI_Recv(msg, 2, MPI_INT, MPI_ANY_SOURCE, tag,
                           MPI_COMM_WORLD, &status);   /* Receive a message */
            source_id = status.MPI_SOURCE;             /* Get id of sender */
            printf("Received message %d %d from process %d\n",
                   msg[0], msg[1], source_id);
        }
    } else {          /* Processes 1 to N-1 (the senders) do this */
        msg[0] = id;       /* Put own identifier in the message */
        msg[1] = ntasks;   /* and total number of processes */
        dest_id = 0;       /* Destination address */
        err = MPI_Send(msg, 2, MPI_INT, dest_id, tag, MPI_COMM_WORLD);
    }

    err = MPI_Finalize();  /* Terminate MPI */
    if (id == 0) printf("Ready\n");
    return 0;
}
Agenda • Part 0 – the context • Slides extracted from a lecture by Hanjun Kin, Princeton U. • Part 1 – Introduction • Basics of Parallel Computing • Six-function MPI • Point-to-Point Communications • Part 2 – Advanced features of MPI • Collective Communication • Part 3 – Examples and how to program an MPI application
Collective communications • A single call handles the communication between all the processes in a communicator • There are 3 types of collective communications • Data movement (e.g. MPI_Bcast) • Reduction (e.g. MPI_Reduce) • Synchronization (e.g. MPI_Barrier)
Broadcast

int MPI_Bcast(void *buffer, int count, MPI_Datatype datatype, int root, MPI_Comm comm)

• One process (root) sends its data to all the other processes in the same communicator
• Must be called by all the processes with the same arguments
Gather

int MPI_Gather(void *sendbuf, int sendcnt, MPI_Datatype sendtype, void *recvbuf, int recvcnt, MPI_Datatype recvtype, int root, MPI_Comm comm)

• One process (root) collects data from all the other processes in the same communicator
• Must be called by all the processes with the same arguments
Gather to All

int MPI_Allgather(void *sendbuf, int sendcnt, MPI_Datatype sendtype, void *recvbuf, int recvcnt, MPI_Datatype recvtype, MPI_Comm comm)

• Every process collects data from all the other processes in the same communicator
• Must be called by all the processes with the same arguments
Reduction

int MPI_Reduce(void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, int root, MPI_Comm comm)

• One process (root) collects data from all the other processes in the same communicator and combines it with the operation op
• Predefined operations include MPI_SUM, MPI_MIN, MPI_MAX, MPI_PROD, logical AND, OR, XOR, and a few more
• MPI_Op_create(): user-defined operator
Synchronization

int MPI_Barrier(MPI_Comm comm)

#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, nprocs;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Barrier(MPI_COMM_WORLD);
    printf("Hello, world. I am %d of %d\n", rank, nprocs);
    MPI_Finalize();
    return 0;
}
Examples… • Master and slaves
For more functions… • http://www.mpi-forum.org • http://www.llnl.gov/computing/tutorials/mpi/ • http://www.nersc.gov/nusers/help/tutorials/mpi/intro/ • http://www-unix.mcs.anl.gov/mpi/tutorial/ • MPICH (http://www-unix.mcs.anl.gov/mpi/mpich/) • Open MPI (http://www.open-mpi.org/) • http://w3.pppl.gov/~ethier/MPI_OpenMP_2011.pdf