
Parallel Processing (CS 667) Lecture 9: Advanced Point to Point Communication


Presentation Transcript


  1. Parallel Processing (CS 667) Lecture 9: Advanced Point to Point Communication
  Jeremy R. Johnson
  *Parts of this lecture were derived from Chapter 13 of Pacheco.

  2. Introduction
  • Objective: To further examine message-passing communication patterns.
  • Topics:
    • Implementing Allgather
      • Ring
      • Hypercube
    • Non-blocking send/recv
      • MPI_Isend
      • MPI_Wait
      • MPI_Test

  3. Broadcast/Reduce Ring
  [Diagram: the stages of a broadcast/reduce on a ring of processes P0–P3, the message being passed to the next neighbor at each stage.]
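  The diagram can be realized with plain MPI_Send/MPI_Recv. Below is a minimal sketch (not the lecture's code) of a broadcast from rank 0 around a ring; the function name Bcast_ring and the single-float payload are illustrative assumptions.

  #include <mpi.h>

  /* Sketch: broadcast one float from rank 0 around a ring.
     Every rank except the root first receives from its predecessor,
     then every rank except the last forwards to its successor. */
  void Bcast_ring(float* x, MPI_Comm comm) {
    int p, my_rank;
    MPI_Status status;
    MPI_Comm_size(comm, &p);
    MPI_Comm_rank(comm, &my_rank);
    int successor   = (my_rank + 1) % p;
    int predecessor = (my_rank - 1 + p) % p;
    if (my_rank != 0)
      MPI_Recv(x, 1, MPI_FLOAT, predecessor, 0, comm, &status);
    if (successor != 0)
      MPI_Send(x, 1, MPI_FLOAT, successor, 0, comm);
  }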

  4. Bi-directional Broadcast Ring
  [Diagram: broadcast on a ring of processes P0–P3 with messages forwarded in both directions around the ring.]

  5. Allgather Ring
  [Diagram: allgather on a ring of processes P0–P3; at each stage every process forwards the most recently received block to its successor, so after p-1 stages every process holds x0, x1, x2, x3.]

  6. Allgather
  int MPI_Allgather(
      void*        send_data    /* in  */,
      int          send_count   /* in  */,
      MPI_Datatype send_type    /* in  */,
      void*        recv_data    /* out */,
      int          recv_count   /* in  */,
      MPI_Datatype recv_type    /* in  */,
      MPI_Comm     communicator /* in  */)
  [Diagram: process i holds block xi; the call gathers all blocks x0, x1, x2, x3 onto every process.]
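  A small usage sketch of MPI_Allgather corresponding to the picture above; the block size of 4 and the rank-valued data are illustrative assumptions, not from the slides.

  #include <stdlib.h>
  #include <mpi.h>

  int main(int argc, char* argv[]) {
    int p, my_rank, i, blocksize = 4;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &p);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    float* x = malloc(blocksize * sizeof(float));      /* local block     */
    float* y = malloc(p * blocksize * sizeof(float));  /* gathered result */
    for (i = 0; i < blocksize; i++)
      x[i] = (float) my_rank;
    MPI_Allgather(x, blocksize, MPI_FLOAT,
                  y, blocksize, MPI_FLOAT, MPI_COMM_WORLD);
    /* every rank now holds rank 0's block, then rank 1's block, ... */
    free(x); free(y);
    MPI_Finalize();
    return 0;
  }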

  7. Allgather_ring
  void Allgather_ring(float x[], int blocksize, float y[], MPI_Comm comm) {
    int i, p, my_rank;
    int successor, predecessor;
    int send_offset, recv_offset;
    MPI_Status status;

    MPI_Comm_size(comm, &p);
    MPI_Comm_rank(comm, &my_rank);
    /* copy this process's block into its slot of the output array */
    for (i = 0; i < blocksize; i++)
      y[i + my_rank*blocksize] = x[i];
    successor   = (my_rank + 1) % p;
    predecessor = (my_rank - 1 + p) % p;

  8. Allgather_ring (continued)
    /* circulate blocks around the ring: at step i send the block received
       at step i-1 and receive the next block from the predecessor */
    for (i = 0; i < p-1; i++) {
      send_offset = ((my_rank - i + p) % p) * blocksize;
      recv_offset = ((my_rank - i - 1 + p) % p) * blocksize;
      MPI_Send(y + send_offset, blocksize, MPI_FLOAT, successor, 0, comm);
      MPI_Recv(y + recv_offset, blocksize, MPI_FLOAT, predecessor, 0,
               comm, &status);
    }
  }
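  The slides do not show a driver; a minimal test harness along these lines could be used, assuming Allgather_ring above is in the same file, at most 64 processes, and a block size of 2 (all arbitrary choices).

  #include <stdio.h>
  #include <mpi.h>

  int main(int argc, char* argv[]) {
    int p, my_rank, i, blocksize = 2;
    float x[2], y[2 * 64];                 /* assumes p <= 64 */
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &p);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    for (i = 0; i < blocksize; i++)
      x[i] = (float)(my_rank * blocksize + i);
    Allgather_ring(x, blocksize, y, MPI_COMM_WORLD);
    if (my_rank == 0)
      for (i = 0; i < p * blocksize; i++)
        printf("y[%d] = %.1f\n", i, y[i]);
    MPI_Finalize();
    return 0;
  }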

  9. Hypercube
  [Diagram: 1-, 2-, and 3-dimensional hypercubes with nodes labeled in binary (0, 1; 00–11; 000–111).]
  • Graph (recursively defined)
  • An n-dimensional cube has 2^n nodes, each connected to n neighbors
  • Binary labels of adjacent nodes differ in exactly one bit
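  Because adjacent labels differ in one bit, the neighbors of a node are found by XOR-ing its label with each power of two. A small illustration (node 5 in a 3-cube is an arbitrary choice):

  #include <stdio.h>

  int main(void) {
    int d = 3, node = 5;                       /* node 101 in a 3-cube */
    for (int i = 0; i < d; i++)
      printf("neighbor across dimension %d: %d\n", i, node ^ (1 << i));
    /* prints 4 (100), 7 (111), 1 (001) */
    return 0;
  }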

  10. Broadcast/Reduce
  [Diagram: broadcast/reduce on a 3-dimensional hypercube with nodes labeled 000–111.]

  11. Allgather
  [Diagram: allgather on a 3-dimensional hypercube with nodes labeled 000–111.]

  12. Allgather
  [Diagram: successive stages of allgather on eight nodes (0–7), with nodes paired across one hypercube dimension per stage.]

  13. Allgather_cube
  void Allgather_cube(float x[], int blocksize, float y[], MPI_Comm comm) {
    int i, d, p, my_rank;
    unsigned eor_bit, and_bits;
    int stage, partner;
    MPI_Datatype hole_type;
    int send_offset, recv_offset;
    MPI_Status status;
    int log_base2(int p);    /* helper defined elsewhere */

    MPI_Comm_size(comm, &p);
    MPI_Comm_rank(comm, &my_rank);
    for (i = 0; i < blocksize; i++)
      y[i + my_rank*blocksize] = x[i];
    d = log_base2(p);
    eor_bit  = 1 << (d-1);
    and_bits = (1 << d) - 1;

  14. Allgather_cube (continued)
    for (stage = 0; stage < d; stage++) {
      partner = my_rank ^ eor_bit;
      send_offset = (my_rank & and_bits) * blocksize;
      recv_offset = (partner & and_bits) * blocksize;
      /* derived datatype describing the strided blocks gathered so far */
      MPI_Type_vector(1 << stage, blocksize,
                      (1 << (d-stage)) * blocksize, MPI_FLOAT, &hole_type);
      MPI_Type_commit(&hole_type);
      MPI_Send(y + send_offset, 1, hole_type, partner, 0, comm);
      MPI_Recv(y + recv_offset, 1, hole_type, partner, 0, comm, &status);
      MPI_Type_free(&hole_type);
      eor_bit  = eor_bit >> 1;
      and_bits = and_bits >> 1;
    }
  }

  15. Buffering Assumption
  • The previous code is not safe: it relies on the system providing enough buffer space for the sends to complete before the matching receives are posted; otherwise it can deadlock.
  • MPI_Sendrecv can be used to guarantee that deadlock does not occur (see the sketch after the next slide).

  16. SendRecv
  int MPI_Sendrecv(
      void*        send_buf     /* in  */,
      int          send_count   /* in  */,
      MPI_Datatype send_type    /* in  */,
      int          dest         /* in  */,
      int          send_tag     /* in  */,
      void*        recv_buf     /* out */,
      int          recv_count   /* in  */,
      MPI_Datatype recv_type    /* in  */,
      int          source       /* in  */,
      int          recv_tag     /* in  */,
      MPI_Comm     communicator /* in  */,
      MPI_Status*  status       /* out */)
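  For example, Allgather_ring (slides 7–8) can be made deadlock-free by replacing the Send/Recv pair with a single MPI_Sendrecv, as sketched below; the function name Allgather_ring_safe is illustrative, the logic follows the earlier code.

  #include <mpi.h>

  /* Sketch: Allgather_ring with MPI_Sendrecv, so no buffering assumption. */
  void Allgather_ring_safe(float x[], int blocksize, float y[], MPI_Comm comm) {
    int i, p, my_rank, successor, predecessor, send_offset, recv_offset;
    MPI_Status status;
    MPI_Comm_size(comm, &p);
    MPI_Comm_rank(comm, &my_rank);
    for (i = 0; i < blocksize; i++)
      y[i + my_rank*blocksize] = x[i];
    successor   = (my_rank + 1) % p;
    predecessor = (my_rank - 1 + p) % p;
    for (i = 0; i < p-1; i++) {
      send_offset = ((my_rank - i + p) % p) * blocksize;
      recv_offset = ((my_rank - i - 1 + p) % p) * blocksize;
      MPI_Sendrecv(y + send_offset, blocksize, MPI_FLOAT, successor,   0,
                   y + recv_offset, blocksize, MPI_FLOAT, predecessor, 0,
                   comm, &status);
    }
  }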

  17. SendRecvReplace
  int MPI_Sendrecv_replace(
      void*        buffer       /* in/out */,
      int          count        /* in  */,
      MPI_Datatype datatype     /* in  */,
      int          dest         /* in  */,
      int          send_tag     /* in  */,
      int          source       /* in  */,
      int          recv_tag     /* in  */,
      MPI_Comm     communicator /* in  */,
      MPI_Status*  status       /* out */)
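  A minimal sketch of a circular shift with MPI_Sendrecv_replace, reusing one buffer for both the outgoing and incoming value; the function name Shift_ring and the single-float payload are illustrative assumptions.

  #include <mpi.h>

  /* Sketch: shift one float around the ring in place. */
  void Shift_ring(float* value, MPI_Comm comm) {
    int p, my_rank;
    MPI_Status status;
    MPI_Comm_size(comm, &p);
    MPI_Comm_rank(comm, &my_rank);
    int successor   = (my_rank + 1) % p;
    int predecessor = (my_rank - 1 + p) % p;
    MPI_Sendrecv_replace(value, 1, MPI_FLOAT,
                         successor,   0,     /* dest,   send tag */
                         predecessor, 0,     /* source, recv tag */
                         comm, &status);
    /* *value now holds the predecessor's contribution */
  }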

  18. Nonblocking Send/Recv
  • Allows overlap of communication and computation: the call does not wait for the buffer to be copied or for the receive to occur.
  • The communication is posted and can be tested later for completion.
  int MPI_Isend(              /* I = immediate */
      void*        buffer   /* in  */,
      int          count    /* in  */,
      MPI_Datatype datatype /* in  */,
      int          dest     /* in  */,
      int          tag      /* in  */,
      MPI_Comm     comm     /* in  */,
      MPI_Request* request  /* out */)

  19. Nonblocking Send/Recv (continued)
  int MPI_Irecv(
      void*        buffer   /* out */,
      int          count    /* in  */,
      MPI_Datatype datatype /* in  */,
      int          source   /* in  */,
      int          tag      /* in  */,
      MPI_Comm     comm     /* in  */,
      MPI_Request* request  /* out */)

  int MPI_Wait(
      MPI_Request* request  /* in/out */,
      MPI_Status*  status   /* out */)

  int MPI_Test(MPI_Request* request, int* flag, MPI_Status* status);
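  A minimal sketch of the post/compute/wait pattern these calls enable; the function name exchange_overlap and its arguments are illustrative, not from the slides.

  #include <mpi.h>

  /* Exchange one block with the ring neighbors while other work proceeds. */
  void exchange_overlap(float* send_buf, float* recv_buf, int count,
                        int successor, int predecessor, MPI_Comm comm) {
    MPI_Request send_request, recv_request;
    MPI_Status status;
    MPI_Irecv(recv_buf, count, MPI_FLOAT, predecessor, 0, comm, &recv_request);
    MPI_Isend(send_buf, count, MPI_FLOAT, successor, 0, comm, &send_request);
    /* ... computation that touches neither buffer can run here,
       optionally polling for completion with MPI_Test ... */
    MPI_Wait(&recv_request, &status);   /* recv_buf is now valid  */
    MPI_Wait(&send_request, &status);   /* send_buf may be reused */
  }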

  20. Allgather_ring (Overlapped)
    send_offset = my_rank * blocksize;      /* this rank's own block first */
    recv_offset = ((my_rank - 1 + p) % p) * blocksize;
    for (i = 0; i < p-1; i++) {
      MPI_Isend(y + send_offset, blocksize, MPI_FLOAT, successor, 0,
                comm, &send_request);
      MPI_Irecv(y + recv_offset, blocksize, MPI_FLOAT, predecessor, 0,
                comm, &recv_request);
      /* compute the next iteration's offsets while communication proceeds */
      send_offset = ((my_rank - i - 1 + p) % p) * blocksize;
      recv_offset = ((my_rank - i - 2 + p) % p) * blocksize;
      MPI_Wait(&send_request, &status);
      MPI_Wait(&recv_request, &status);
    }

  21. Allgather
  int MPI_Allgather(
      void*        send_data    /* in  */,
      int          send_count   /* in  */,
      MPI_Datatype send_type    /* in  */,
      void*        recv_data    /* out */,
      int          recv_count   /* in  */,
      MPI_Datatype recv_type    /* in  */,
      MPI_Comm     communicator /* in  */)
  [Diagram: as on slide 6, process i holds block xi and the call gathers all blocks onto every process.]

  22. Alltoall
  int MPI_Alltoall(
      void*        send_buffer  /* in  */,
      int          send_count   /* in  */,
      MPI_Datatype send_type    /* in  */,
      void*        recv_buffer  /* out */,
      int          recv_count   /* in  */,
      MPI_Datatype recv_type    /* in  */,
      MPI_Comm     communicator /* in  */)
  [Diagram: before the call process i holds blocks i0, i1, i2, i3; afterwards it holds blocks 0i, 1i, 2i, 3i (a block-wise transpose across processes).]
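  A usage sketch of MPI_Alltoall matching the "ij" labels in the diagram; one int per destination rank and the limit of 64 processes are illustrative assumptions.

  #include <stdio.h>
  #include <mpi.h>

  int main(int argc, char* argv[]) {
    int p, my_rank, j;
    int send[64], recv[64];                /* assumes p <= 64 */
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &p);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    for (j = 0; j < p; j++)
      send[j] = 10 * my_rank + j;          /* block "ij" on process i */
    MPI_Alltoall(send, 1, MPI_INT, recv, 1, MPI_INT, MPI_COMM_WORLD);
    /* recv[j] on process i now holds 10*j + i, i.e. block "ji" */
    printf("rank %d: recv[0] = %d\n", my_rank, recv[0]);
    MPI_Finalize();
    return 0;
  }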

  23. Alltoall
  • A sequence of permutations implemented with MPI_Sendrecv.

  24. Alltoall (2-way)
  • A sequence of permutations implemented with MPI_Sendrecv, exchanging blocks in both directions at each step (see the sketch below).
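  One way to realize such a sequence of permutations with MPI_Sendrecv is sketched below; the schedule partner = (my_rank ± i) mod p is a simple choice and not necessarily the exact one on the slides.

  #include <mpi.h>

  /* Sketch: all-to-all as p pairwise exchanges with MPI_Sendrecv.
     At step i each rank sends its block destined for (my_rank + i) % p
     and receives the block coming from (my_rank - i + p) % p. */
  void Alltoall_sendrecv(int send[], int recv[], int blocksize, MPI_Comm comm) {
    int p, my_rank, i;
    MPI_Status status;
    MPI_Comm_size(comm, &p);
    MPI_Comm_rank(comm, &my_rank);
    for (i = 0; i < p; i++) {
      int dest   = (my_rank + i) % p;
      int source = (my_rank - i + p) % p;
      MPI_Sendrecv(send + dest * blocksize,   blocksize, MPI_INT, dest,   0,
                   recv + source * blocksize, blocksize, MPI_INT, source, 0,
                   comm, &status);
    }
  }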

  25. Communication Modes
  • Synchronous (MPI_Ssend): completes only after the matching receive has started.
  • Ready (MPI_Rsend): the matching receive must already be posted.
  • Buffered (MPI_Bsend): the user provides the buffer space.
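  Hedged sketches of the corresponding send calls (MPI_Ssend, MPI_Rsend, MPI_Bsend); the tags and the helper name send_modes_demo are illustrative, and matching receives are assumed to exist on the destination rank.

  #include <stdlib.h>
  #include <mpi.h>

  void send_modes_demo(float* buf, int count, int dest, MPI_Comm comm) {
    /* Synchronous: completes only after the matching receive has started. */
    MPI_Ssend(buf, count, MPI_FLOAT, dest, 0, comm);

    /* Buffered: completes locally, using buffer space the user attaches. */
    int size;
    MPI_Pack_size(count, MPI_FLOAT, comm, &size);
    size += MPI_BSEND_OVERHEAD;
    char* bspace = malloc(size);
    MPI_Buffer_attach(bspace, size);
    MPI_Bsend(buf, count, MPI_FLOAT, dest, 1, comm);
    MPI_Buffer_detach(&bspace, &size);
    free(bspace);

    /* Ready: correct only if the matching receive is already posted. */
    MPI_Rsend(buf, count, MPI_FLOAT, dest, 2, comm);
  }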
