
Message-Passing Computing Collective patterns and MPI routines 2 - Synchronizing Computations


Presentation Transcript


  1. Message-Passing Computing: Collective patterns and MPI routines 2 - Synchronizing Computations • Barrier implementations • Safety and deadlock • Safe MPI routines 6a.1 ITCS 4/5145 Parallel Computing, UNC-Charlotte, B. Wilkinson, Jan 4, 2014, slides6d.ppt

  2. Recap on synchronization: Our collective data-transfer patterns do not specify whether processes are synchronized (that is an implementation detail). The MPI routines that implement these patterns do not generally synchronize processes. According to the MPI standard, collective data-transfer MPI routines have the same semantics as if individual MPI_Send()s and MPI_Recv()s were used. This is unfortunate, as the actual behaviour then depends on how the routines are implemented; different implementations, and different network configurations, may do things differently. (Check against the newer MPI version 3.)

  3. Synchronizing processes: Needed in many applications to ensure that parallel processes start at the same point and that complete data is available for processes to work on. Synchronization should be avoided where possible, as it delays processes, but sometimes it is unavoidable. Some algorithms, such as iterative computations, require the previous iteration's values to compute the next iteration's values, so the previous values must be computed first. (This constraint can be relaxed for increased performance in some cases; see later in the course.)

  4. Synchronous Message Passing: Routines that return only when the message transfer has completed. Synchronous send routine • Returns only after the message has been received (a matching receive has been posted). In MPI, the MPI_Ssend() routine. Synchronous receive routine • Waits until the message it is expecting arrives. In MPI, this is actually the regular MPI_Recv() routine.

  5. Synchronous Message Passing Synchronous message-passing routines intrinsically perform two actions: • They transfer data and • They synchronize processes.

  6. Synchronous Ssend() and recv() using a 3-way protocol - possible implementation. In this case the send waits until the complete message can be accepted by the receiving process before sending it, so no external message buffer is needed. [Figure: time-lines for Process 1 and Process 2. (a) When Ssend() occurs before recv(): Process 1 issues a request to send and is suspended until Process 2 reaches its recv() and returns an acknowledgment; the message is then transferred and both processes continue. (b) When recv() occurs before Ssend(): Process 2 is suspended at its recv() until the request to send arrives; after the acknowledgment the message is transferred and both processes continue.] Again, in MPI the actual implementation is not specified in the standard.

  7. Parameters of synchronous send (same as blocking send): MPI_Ssend(buf, count, datatype, dest, tag, comm) - buf: address of send buffer; count: number of items to send; datatype: datatype of each item; dest: rank of destination process; tag: message tag; comm: communicator.

  8. Asynchronous Message Passing • Routines that do not wait for actions to complete before returning. Usually require local storage for messages. • More than one version depending upon the actual semantics for returning. • In general, they do not synchronize processes but allow processes to move forward sooner. • Must be used with care.

  9. MPI Definitions of Blocking and Non-Blocking • Blocking - routines return after their local actions complete, though the message transfer may not have been completed. Sometimes called locally blocking. • Non-blocking - routines return immediately (asynchronous). Non-blocking routines assume that the data storage used for the transfer is not modified by subsequent statements before it has been used for the transfer, and it is left to the programmer to ensure this. The terms blocking/non-blocking may have different interpretations in other systems.

  10. MPI blocking routines - block until local actions complete. • Blocking send - MPI_Send() - blocks only until the message is on its way; the user can modify the send buffer after it returns. • Blocking receive - MPI_Recv() - blocks until the message arrives.

  11. MPI Nonblocking Routines • Non-blocking send - MPI_Isend() - will return "immediately", even before the source location is safe to be altered. The user should not modify the send buffer until the communication completes. • Non-blocking receive - MPI_Irecv() - will return even if there is no message to accept yet. The user should not modify the receive buffer until the communication completes.

  12. Nonblocking Routine Formats: MPI_Isend(buf,count,datatype,dest,tag,comm,req) MPI_Irecv(buf,count,datatype,source,tag,comm,req) Completion is detected by MPI_Wait() and MPI_Test(). MPI_Wait(req,status) waits until the operation has completed and only then returns. MPI_Test(req,flag,status) returns immediately, with flag set to indicate whether the operation has completed at that time.

  13. Example: To send an integer x from process 0 to process 1 and allow process 0 to continue (myrank, msgtag, an MPI_Request req1 and an MPI_Status status are assumed to be declared earlier):
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);  /* find rank */
if (myrank == 0) {
    int x = 123;                         /* value to send */
    MPI_Isend(&x, 1, MPI_INT, 1, msgtag, MPI_COMM_WORLD, &req1);
    compute();                           /* overlap computation with the send */
    MPI_Wait(&req1, &status);            /* now safe to reuse or discard x */
} else if (myrank == 1) {
    int x;
    MPI_Recv(&x, 1, MPI_INT, 0, msgtag, MPI_COMM_WORLD, &status);
}
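
The receiving side can likewise avoid blocking by posting MPI_Irecv() and polling with MPI_Test(). A minimal sketch, assuming msgtag is defined as above and do_other_work() is an illustrative local work routine (not from the original slides):
int x;
int flag = 0;
MPI_Request req;
MPI_Status status;
/* Post the receive, then overlap useful local work with the communication */
MPI_Irecv(&x, 1, MPI_INT, 0, msgtag, MPI_COMM_WORLD, &req);
while (!flag) {
    do_other_work();                  /* local work while the message is in flight */
    MPI_Test(&req, &flag, &status);   /* has the receive completed yet? */
}
/* The message has now arrived; x is safe to use */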

  14. How message-passing routines can return before the message transfer has completed: a message buffer is needed between source and destination to hold the message. [Figure: Process 1 calls send() and continues; the message is held in a message buffer until Process 2 calls recv() and reads it from the buffer.]

  15. Asynchronous (blocking) routines changing to synchronous routines • Message buffers are only of finite length. • A point could be reached where a send routine is held up because all available buffer space is exhausted. • Then the send routine will wait until storage becomes available again - i.e. the routine will behave as a synchronous routine.

  16. Barrier Synchronization: The basic mechanism for synchronizing processes - inserted at the point in each process where it must wait. All processes can only continue from this point when all processes have reached it (or, in some systems, when a stated number of processes have reached this point). 6a.16

  17. MPI Barrier: MPI_Barrier(comm) - a barrier whose only parameter is the communicator. It is called by each process in the group, blocking until all members of the group have reached the barrier call, and only returning then.

  18. MPI_Barrier use with time stamps: A common use of a barrier is to synchronize processes before taking a time stamp.
MPI_Barrier(MPI_COMM_WORLD);
start_time = MPI_Wtime();
/* ... do work ... */
MPI_Barrier(MPI_COMM_WORLD);
end_time = MPI_Wtime();
The 2nd barrier is not always needed if synchronization is already present, for example a gather into the root: once the root has the correct data, it does not matter what the other processes are still doing - we have the answer.

  19. Internal Implementation of a Barrier Routine: centralized counter implementation (a linear barrier) - O(P) complexity with P processes. 6.5

  20. Reentrant code: Good barrier implementations must take into account that the same barrier routine might be used more than once in a process (re-entrant code). It might be possible for a process to enter the barrier for a second time before other processes have left the barrier for the first time. One solution would be to have a different counter variable for each instance of the barrier - somewhat akin to different lock variables for each critical section in shared-memory programming; see later. 6.20

  21. Two-phase solution for re-entrant barriers - have two phases: • A process enters the arrival phase and does not leave this phase until all processes have arrived in it. • Then processes move to the departure phase and are released. If these processes reach another barrier, they will enter and remain in the second arrival phase until all have arrived a second time. 6.21

  22. Counter barrier with two phases: Still O(P) complexity with P processes - in fact double the number of messages, since each process both reports its arrival and waits to be released. 6.22
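
A minimal sketch of such a two-phase counter barrier, with rank 0 acting as the counter process; the function name and tag are illustrative assumptions, not the actual MPI_Barrier() implementation:
#include <mpi.h>

#define BARRIER_TAG 99                        /* illustrative tag */

/* Two-phase counter barrier: O(P) messages in each phase */
void counter_barrier(int rank, int P, MPI_Comm comm)
{
    int dummy = 0;
    MPI_Status status;
    if (rank == 0) {                          /* counter process */
        for (int i = 1; i < P; i++)           /* arrival phase: count P-1 messages */
            MPI_Recv(&dummy, 1, MPI_INT, MPI_ANY_SOURCE, BARRIER_TAG, comm, &status);
        for (int i = 1; i < P; i++)           /* departure phase: release everyone */
            MPI_Send(&dummy, 1, MPI_INT, i, BARRIER_TAG, comm);
    } else {
        MPI_Send(&dummy, 1, MPI_INT, 0, BARRIER_TAG, comm);            /* I have arrived */
        MPI_Recv(&dummy, 1, MPI_INT, 0, BARRIER_TAG, comm, &status);   /* wait for release */
    }
}
The 2(P-1) messages correspond to the doubling noted above.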

  23. Tree Implementation: More efficient than a counter - O(log P) steps. Suppose 8 processes, P0-P7: 1st stage: P1 sends a message to P0, P3 to P2, P5 to P4 and P7 to P6 (each when it reaches its barrier). 2nd stage: P2 sends a message to P0 (P2 and P3 have reached their barriers) and P6 to P4 (P6 and P7 have reached theirs). 3rd stage: P4 sends a message to P0 (P4, P5, P6 and P7 have reached their barriers). P0 terminates the arrival phase when it reaches its own barrier and has received the message from P4. Release is done with a reverse tree construction. 6.23
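
A hedged sketch of this arrival phase, assuming the number of processes is a power of two; the function name and tag are illustrative, and the release phase would use the same tree with the message directions reversed:
#include <mpi.h>

/* Tree barrier, arrival phase: O(log P) steps.                          */
/* At step s (s = 1, 2, 4, ...), every process whose rank has bit s set  */
/* reports to the process s below it, then waits to be released.         */
void tree_barrier_arrival(int rank, int P, int tag, MPI_Comm comm)
{
    int dummy = 0;
    MPI_Status status;
    for (int s = 1; s < P; s <<= 1) {
        if (rank & s) {                       /* e.g. step 1: P1->P0, P3->P2, P5->P4, P7->P6 */
            MPI_Send(&dummy, 1, MPI_INT, rank - s, tag, comm);
            break;                            /* this process has arrived */
        } else if (rank + s < P) {
            MPI_Recv(&dummy, 1, MPI_INT, rank + s, tag, comm, &status);
        }
    }
    /* When rank 0 leaves the loop, every process has reached the barrier */
}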

  24. Tree barrier 6.24

  25. Butterfly Barrier: The pattern means that pairs of processes synchronize with each other. Each pairwise synchronization point could be done with one send/recv pair. 6.25
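
A hedged sketch of a butterfly barrier, assuming the number of processes P is a power of two and using MPI_Sendrecv() (covered on slide 33) for each pairwise exchange so that neither partner can deadlock; the function name and tag are illustrative:
#include <mpi.h>

/* Butterfly barrier: log2(P) stages; at stage s each process pairs with */
/* the process whose rank differs only in bit s (rank XOR s).            */
void butterfly_barrier(int rank, int P, int tag, MPI_Comm comm)
{
    int token_out = 0, token_in;
    MPI_Status status;
    for (int s = 1; s < P; s <<= 1) {
        int partner = rank ^ s;               /* pairwise synchronization */
        MPI_Sendrecv(&token_out, 1, MPI_INT, partner, tag,
                     &token_in,  1, MPI_INT, partner, tag, comm, &status);
    }
    /* After the final stage every process has, directly or transitively, */
    /* synchronized with every other process.                             */
}
Replacing the dummy tokens with the processes' data buffers turns the same pattern into the all-gather operation of the next two slides.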

  26. Using the Butterfly Barrier Pattern for an all-gather operation: If each synchronization point were two send/recv pairs exchanging data (one in each direction), the data from each process can be distributed to every other process. After the 1st step, P0 and P1 have the data of P0 and P1; P2 and P3 have the data of P2 and P3; P4 and P5 have the data of P4 and P5; P6 and P7 have the data of P6 and P7. After the 2nd step, P0, P1, P2 and P3 have the data of P0-P3, and P4, P5, P6 and P7 have the data of P4-P7. After the 3rd step, all of P0-P7 have the data of all of P0-P7.

  27. Using the Butterfly Barrier Pattern to do an all-gather operation. [Figure: the butterfly exchange pattern among processes P0-P7.] 6.27

  28. Local Synchronization and Data Transfer: Suppose a process Pi needs to be synchronized with, and to exchange data with, process Pi-1. One could consider a matched send/recv pair in each process; synchronous send()s are needed if synchronization is required as well as data transfer. This is not a perfect three-process barrier, because process Pi-1 only synchronizes with Pi and continues as soon as Pi allows; similarly, process Pi+1 only synchronizes with Pi. A sketch of one such pairwise exchange follows. 6.28
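
A minimal sketch for one of the two neighbour pairs (Pi with Pi-1); the variable names mine, theirs and the tag are illustrative, rank is the calling process's rank, and the exchange with Pi+1 would be analogous:
/* Pairwise synchronization plus data exchange between Pi and Pi-1.     */
/* Synchronous sends give synchronization as well as data transfer;     */
/* the two processes use opposite orders so that neither waits forever. */
int mine = rank, theirs;
int tag = 0;
MPI_Status status;
if (rank == i) {                                         /* process Pi   */
    MPI_Ssend(&mine, 1, MPI_INT, i - 1, tag, MPI_COMM_WORLD);
    MPI_Recv(&theirs, 1, MPI_INT, i - 1, tag, MPI_COMM_WORLD, &status);
} else if (rank == i - 1) {                              /* process Pi-1 */
    MPI_Recv(&theirs, 1, MPI_INT, i, tag, MPI_COMM_WORLD, &status);
    MPI_Ssend(&mine, 1, MPI_INT, i, tag, MPI_COMM_WORLD);
}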

  29. Safety and Deadlock: When all processes send their messages first and then receive all of their messages, the code is "unsafe" because it relies upon buffering in the send()s. The amount of buffering is not specified in MPI. If insufficient storage is available, a send routine may be delayed from returning until storage becomes available or until the message can be sent without buffering. A locally blocking send() could then behave as a synchronous send(), only returning when the matching recv() is executed. Since a matching recv() would never be executed if all the send()s behaved synchronously, deadlock would occur. 6.29

  30. Making the code safe: Alternate the order of the send()s and recv()s in adjacent processes so that only one process of each communicating pair performs its send()s first. Then even synchronous send()s would not cause deadlock. Example: in a linear pipeline, deadlock can be avoided by arranging for the even-numbered processes to perform their sends first and the odd-numbered processes to perform their receives first (see the sketch below). A good way to test for safety is to replace the message-passing routines in a program with synchronous versions. 6.30
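
A minimal sketch of this even/odd ordering, assuming P processes exchange values with their right-hand neighbours; variable and tag names are illustrative:
/* Linear pipeline exchange made safe by ordering: even-numbered        */
/* processes send first, odd-numbered processes receive first, so every */
/* synchronous send meets a receive that is posted without waiting on   */
/* another send - no circular wait, hence no deadlock.                  */
int mine = rank, theirs;
int tag = 0;
MPI_Status status;
if (rank % 2 == 0) {                       /* even process: send, then receive */
    if (rank + 1 < P) {
        MPI_Ssend(&mine, 1, MPI_INT, rank + 1, tag, MPI_COMM_WORLD);
        MPI_Recv(&theirs, 1, MPI_INT, rank + 1, tag, MPI_COMM_WORLD, &status);
    }
} else {                                   /* odd process: receive, then send  */
    MPI_Recv(&theirs, 1, MPI_INT, rank - 1, tag, MPI_COMM_WORLD, &status);
    MPI_Ssend(&mine, 1, MPI_INT, rank - 1, tag, MPI_COMM_WORLD);
}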

  31. MPI Safe Message Passing Routines: MPI offers several methods for safe communication, for example the combined send-receive routines described on the following slides. 6.31

  32. Combined deadlock-free blocking sendrecv() routines: MPI provides MPI_Sendrecv() and MPI_Sendrecv_replace() (with 12 parameters!). 6.32

  33. MPI_Sendrecv(): Combines a blocking send with a blocking receive operation without deadlock. Source and destination can be the same or different.
int MPI_Sendrecv(void *sendbuf, int sendcount, MPI_Datatype sendtype,
                 int dest, int sendtag,                 /* parameters for the send    */
                 void *recvbuf, int recvcount, MPI_Datatype recvtype,
                 int source, int recvtag,               /* parameters for the receive */
                 MPI_Comm comm, MPI_Status *status)     /* same communicator for both */
6.33
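
A short usage sketch: a deadlock-free ring shift in which every process sends its value to its right-hand neighbour and receives one from its left-hand neighbour in a single call (variable and tag names are illustrative; rank and P are assumed to hold the process rank and the number of processes):
int mine = rank, theirs;
int tag = 0;
int right = (rank + 1) % P;
int left  = (rank - 1 + P) % P;
MPI_Status status;
MPI_Sendrecv(&mine,   1, MPI_INT, right, tag,     /* send parameters    */
             &theirs, 1, MPI_INT, left,  tag,     /* receive parameters */
             MPI_COMM_WORLD, &status);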

  34. MPI Version 3 • Approved Sept 21, 2012 • http://www.mpi-forum.org/docs/docs.html • Extension of the previous MPI 2.2 • A major update of MPI • Collective routines added, including: • Nonblocking collective operations • New one-sided communication operations • Neighborhood collectives • C++ bindings dropped, as they added little! (Not done in class. You can still use the C routines in C++ programs.)

  35. Questions
