
Friday, October 20, 2006



Presentation Transcript


  1. Friday, October 20, 2006 “Work expands to fill the time available for its completion.” • Parkinson’s 1st Law

  2. MPI_Recv(void *buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Status *status) MPI_Get_count(MPI_Status *status, MPI_Datatype datatype, int *count_recvd) • Returns the number of entries received in the count_recvd variable.
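
A minimal sketch of how these two calls fit together; the buffer size, tag, and the sender/receiver ranks are illustrative assumptions rather than part of the slides:

    #include <mpi.h>
    #include <stdio.h>

    #define MAX_ELEMS 100   /* assumed upper bound on the message size */

    int main(int argc, char *argv[]) {
        int rank, buf[MAX_ELEMS], count_recvd;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 1) {
            /* Send fewer elements than the receiver's buffer can hold. */
            int n = 42;
            for (int i = 0; i < n; i++) buf[i] = i;
            MPI_Send(buf, n, MPI_INT, 0, 0, MPI_COMM_WORLD);
        } else if (rank == 0) {
            /* Receive up to MAX_ELEMS ints from any source with tag 0. */
            MPI_Recv(buf, MAX_ELEMS, MPI_INT, MPI_ANY_SOURCE, 0,
                     MPI_COMM_WORLD, &status);
            /* Ask the status object how many entries actually arrived. */
            MPI_Get_count(&status, MPI_INT, &count_recvd);
            printf("Received %d ints from rank %d\n",
                   count_recvd, status.MPI_SOURCE);
        }

        MPI_Finalize();
        return 0;
    }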

  3. Matrix Vector Multiplication • n x n matrix A • Vector b • x=Ab • p processing elements • Suppose A is distributed row-wise (n/p rows per process) • Each process computes different portion of x

  4. Matrix Vector Multiplication (Figure: initial distribution; colors represent data distributed on different processes; each process holds n/p rows of A plus its parts of b and x.)

  5. Matrix Vector Multiplication (Figure: colors represent that all parts of b are required by each process; each process holds n/p rows of A.)

  6. Matrix Vector Multiplication (All parts of b are required by each process) • Which collective operation can we use?

  7. Matrix Vector Multiplication (All parts of b are required by each process)

  8. Collective communication

  9. Matrix Vector Multiplication • n x n matrix A • Vector b • x=Ab • p processing elements • Suppose A is distributed column-wise (n/p columns per process) • Each process computes a different portion of x.

  10. Matrix Vector Multiplication (Figure: initial distribution; colors represent data distributed on different processes; each process holds n/p columns of A.)

  11. Partial sums calculated by each process (Figure: with n/p columns of A per process, each process computes a partial version of the full result vector x, e.g. a partial x0.)

  12. MPI_Reduce (Figure: count=4, dest=1; Tasks 0 through 3 each contribute a buffer of 4 elements, and the reduced result is placed on Task 1.) Element-wise reduction can be done.
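
A minimal sketch matching the figure's parameters (count=4, result on task 1); the values placed in the send buffer are arbitrary assumptions:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[]) {
        int rank;
        double sendbuf[4], recvbuf[4];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        for (int i = 0; i < 4; i++)
            sendbuf[i] = rank + i;      /* arbitrary per-task values */

        /* Element-wise sum across all tasks; only rank 1 gets the result. */
        MPI_Reduce(sendbuf, recvbuf, 4, MPI_DOUBLE, MPI_SUM, 1, MPI_COMM_WORLD);

        if (rank == 1)
            printf("Reduced: %g %g %g %g\n",
                   recvbuf[0], recvbuf[1], recvbuf[2], recvbuf[3]);

        MPI_Finalize();
        return 0;
    }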

  13. Row-wise distribution requires one MPI_Allgather operation. • Column-wise distribution requires MPI_Reduce and MPI_Scatter operations.
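
A sketch of the row-wise version, assuming n is divisible by p and that each process already holds its n/p rows of A and n/p entries of b; the function and variable names are illustrative, not from the slides:

    #include <mpi.h>
    #include <stdlib.h>

    /* Row-wise matrix-vector multiply: MPI_Allgather replicates the full b
     * on every process, then each process computes its n/p entries of x. */
    void row_wise_matvec(const double *A_local, /* (n/p) x n rows of A     */
                         double *b_local,       /* n/p local entries of b  */
                         double *x_local,       /* n/p entries of x output */
                         int n, MPI_Comm comm)
    {
        int p;
        MPI_Comm_size(comm, &p);
        int nloc = n / p;

        double *b_full = malloc(n * sizeof(double));

        /* Every process gathers all pieces of b. */
        MPI_Allgather(b_local, nloc, MPI_DOUBLE,
                      b_full, nloc, MPI_DOUBLE, comm);

        /* Local multiply over the owned rows. */
        for (int i = 0; i < nloc; i++) {
            double sum = 0.0;
            for (int j = 0; j < n; j++)
                sum += A_local[i * n + j] * b_full[j];
            x_local[i] = sum;
        }

        free(b_full);
    }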

  14. Matrix Matrix Multiplication • A and B are nxn matrices • p is the number of processing elements • The matrices are partitioned into blocks of size n/√p x n/√p

  15. 16 processes, each represented by a different color. Different portions of the nxn matrices are divided among these processes. (Figure: A, B, C.)

  16. 16 processes, each represented by a different color. Different portions of the nxn matrices are divided among these processes. (Figure: A, B, C.) BUT! To compute Ci,j we need all sub-matrices Ai,k and Bk,j for 0 <= k < √p

  17. To compute Ci,j we need all sub-matrices Ai,k and Bk,j for 0 <= k < √p • All-to-all broadcast of matrix A’s blocks in each row • All-to-all broadcast of matrix B’s blocks in each column
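
One possible way to realize these two all-to-all broadcasts is to build a communicator per row and per column (here with MPI_Comm_split; MPI_Cart_sub would work equally well on a Cartesian communicator) and call MPI_Allgather inside each. The buffer and parameter names are assumptions:

    #include <mpi.h>

    /* Gather all q = sqrt(p) blocks of A from this process's row and all
     * q blocks of B from its column; nb is the assumed block dimension. */
    void gather_row_col_blocks(double *A_blk, double *B_blk,
                               double *A_row,  /* q blocks of A (row)    */
                               double *B_col,  /* q blocks of B (column) */
                               int nb, int q, MPI_Comm comm)
    {
        int rank;
        MPI_Comm row_comm, col_comm;
        MPI_Comm_rank(comm, &rank);

        int my_row = rank / q;
        int my_col = rank % q;

        /* Processes sharing a row (resp. column) land in the same communicator. */
        MPI_Comm_split(comm, my_row, my_col, &row_comm);
        MPI_Comm_split(comm, my_col, my_row, &col_comm);

        /* All-to-all broadcast: every process gets every A block of its row
         * and every B block of its column. */
        MPI_Allgather(A_blk, nb * nb, MPI_DOUBLE, A_row, nb * nb, MPI_DOUBLE, row_comm);
        MPI_Allgather(B_blk, nb * nb, MPI_DOUBLE, B_col, nb * nb, MPI_DOUBLE, col_comm);

        MPI_Comm_free(&row_comm);
        MPI_Comm_free(&col_comm);
    }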

  18. Cannon’s Algorithm • Memory-efficient version of the previous algorithm. Each process in the ith row requires all √p sub-matrices Ai,k, 0 <= k < √p • Schedule the computation so that the √p processes in the ith row use different Ai,k at any given time

  19. 16 processes, each represented by a different color. Different portions of the nxn matrices are divided among these processes. (Figure: A, B.)

  20. (Figure: A, B, C.)

  21. Cannon’s Algorithm (Figure: A, B, C.) To compute C0,0 we need all sub-matrices A0,k and Bk,0 for 0 <= k < √p

  22. Cannon’s Algorithm (Figure: shift the blocks of A left and the blocks of B up.)

  23. Cannon’s Algorithm (Figure: shift the blocks of A left and the blocks of B up.)

  24. Cannon’s Algorithm (Figure: shift left, shift up.) The sequence of √p sub-matrix multiplications is done.

  25. (Figure: A, B, C.)

  26. A0,1 and B0,1 should not be multiplied! (Figure: A, B, C.)

  27. Some initial alignment is required! (Figure: A, B, C.)

  28. • Shift all sub-matrices Ai,j to the left (with wraparound) by i steps • Shift all sub-matrices Bi,j up (with wraparound) by j steps • After these circular shift operations, process Pi,j holds sub-matrices Ai,(j+i)mod√p and B(i+j)mod√p,j (Figure: A, B, C.)

  29. After initial alignment: (Figure: A, B.)
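
A sketch of the whole procedure, assuming a periodic √p x √p Cartesian communicator and square blocks of size nb x nb stored row-major; the function and variable names (cannon, local_matmul, nb) are assumptions, not from the slides:

    #include <mpi.h>
    #include <string.h>

    /* Multiply-accumulate two local nb x nb blocks: C += A * B. */
    static void local_matmul(double *C, const double *A, const double *B, int nb)
    {
        for (int i = 0; i < nb; i++)
            for (int k = 0; k < nb; k++)
                for (int j = 0; j < nb; j++)
                    C[i*nb + j] += A[i*nb + k] * B[k*nb + j];
    }

    /* Cannon's algorithm on a periodic sqrt(p) x sqrt(p) Cartesian grid. */
    void cannon(double *A_loc, double *B_loc, double *C_loc, int nb,
                MPI_Comm comm_cart)
    {
        int dims[2], periods[2], coords[2];
        MPI_Cart_get(comm_cart, 2, dims, periods, coords);
        int q = dims[0];                 /* q = sqrt(p) */
        int left, right, up, down;
        MPI_Status st;

        memset(C_loc, 0, (size_t)nb * nb * sizeof(double));

        /* Initial alignment: shift row i of A left by i, column j of B up by j. */
        MPI_Cart_shift(comm_cart, 1, -coords[0], &right, &left);
        MPI_Sendrecv_replace(A_loc, nb*nb, MPI_DOUBLE, left, 0, right, 0,
                             comm_cart, &st);
        MPI_Cart_shift(comm_cart, 0, -coords[1], &down, &up);
        MPI_Sendrecv_replace(B_loc, nb*nb, MPI_DOUBLE, up, 0, down, 0,
                             comm_cart, &st);

        /* sqrt(p) steps: multiply local blocks, then shift A left and B up by 1. */
        for (int step = 0; step < q; step++) {
            local_matmul(C_loc, A_loc, B_loc, nb);
            MPI_Cart_shift(comm_cart, 1, -1, &right, &left);
            MPI_Sendrecv_replace(A_loc, nb*nb, MPI_DOUBLE, left, 0, right, 0,
                                 comm_cart, &st);
            MPI_Cart_shift(comm_cart, 0, -1, &down, &up);
            MPI_Sendrecv_replace(B_loc, nb*nb, MPI_DOUBLE, up, 0, down, 0,
                                 comm_cart, &st);
        }
        /* A_loc and B_loc end up shifted; a final re-alignment could restore them. */
    }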

  30. Topologies • Many computational science and engineering problems use a series of matrix or grid operations. • The dimensions of the matrices or grids are often determined by the physical problems. • Frequently in multiprocessing, these matrices or grids are partitioned, or domain-decomposed, so that each partition is assigned to a process.

  31. Topologies • MPI uses linear ordering and views processes in a 1-D topology. • Although it is still possible to refer to each of the partitions by a linear rank number, a mapping of the linear process rank to a higher-dimensional virtual rank numbering facilitates a much clearer and more natural computational representation.

  32. Topologies • To address these needs, the MPI library provides topology routines. • Interacting processes are then identified by their coordinates in that topology.

  33. Topologies • Each MPI process is mapped to a position in the higher-dimensional topology. (Figure: different ways to map a set of processes to a two-dimensional grid. (a) and (b) show a row- and column-wise mapping of these processes, (c) shows a mapping that follows a space-filling curve (dotted line), and (d) shows a mapping in which neighboring processes are directly connected in a hypercube.)

  34. Topologies • Ideally, the mapping would be determined by the interaction among processes and the connectivity of the physical processors. • However, the mechanism MPI uses for assigning ranks does not rely on information about the interconnection network. • Reason: architecture-independent advantages of MPI (otherwise different mappings would have to be specified for different interconnection networks). • It is left to the MPI library to find an appropriate mapping that reduces the cost of sending and receiving messages.

  35. MPI allows specification of virtual process topologies in terms of a graph. • Each node in the graph corresponds to a process, and an edge exists between two nodes if they communicate with each other. • The most common topologies are Cartesian topologies (one-, two-, or higher-dimensional grids).

  36. Creating and Using Cartesian Topologies • We can create Cartesian topologies using the function: int MPI_Cart_create(MPI_Comm comm_old, int ndims, int *dims, int *periods, int reorder, MPI_Comm *comm_cart)
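
A minimal usage sketch, assuming 16 processes and a 4 x 4 periodic grid; the dimension and period values are illustrative:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[]) {
        MPI_Comm comm_cart;
        int dims[2]    = {4, 4};   /* 4 x 4 process grid (16 processes assumed) */
        int periods[2] = {1, 1};   /* wraparound in both dimensions */
        int reorder    = 1;        /* let MPI pick ranks that suit the network */
        int old_rank, new_rank;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &old_rank);

        /* Collective call: every process in MPI_COMM_WORLD must participate. */
        MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, reorder, &comm_cart);

        MPI_Comm_rank(comm_cart, &new_rank);
        printf("World rank %d has rank %d in the Cartesian communicator\n",
               old_rank, new_rank);

        MPI_Finalize();
        return 0;
    }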

  37. With processes renamed in a 2D grid topology, we are able to assign or distribute work, or distinguish among the processes, by their grid coordinates rather than by their linear process ranks.

  38. MPI_CART_CREATE is a collective communication function. • It must be called by all processes in the group.

  39. Creating and Using Cartesian Topologies • Since sending and receiving messages still require (one-dimensional) ranks, MPI provides routines to convert ranks to Cartesian coordinates and vice-versa. int MPI_Cart_coords(MPI_Comm comm_cart, int rank, int maxdims, int *coords) int MPI_Cart_rank(MPI_Comm comm_cart, int *coords, int *rank)
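
A small sketch of the round trip between ranks and coordinates, assuming a 2-D Cartesian communicator created as on the previous slides:

    #include <mpi.h>
    #include <stdio.h>

    /* Convert this process's linear rank to grid coordinates and back. */
    void show_rank_coord_mapping(MPI_Comm comm_cart)
    {
        int rank, back, coords[2];

        MPI_Comm_rank(comm_cart, &rank);

        /* Linear rank -> (row, col) coordinates. */
        MPI_Cart_coords(comm_cart, rank, 2, coords);

        /* (row, col) coordinates -> linear rank; should round-trip. */
        MPI_Cart_rank(comm_cart, coords, &back);

        printf("Rank %d <-> coordinates (%d, %d), round-trip rank %d\n",
               rank, coords[0], coords[1], back);
    }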

  40. Creating and Using Cartesian Topologies • The most common operation on Cartesian topologies is shifting data along a dimension of the topology. int MPI_Cart_shift(MPI_Comm comm_cart, int dir, int s_step, int *rank_source, int *rank_dest) • MPI_CART_SHIFT is used to find two "nearby" neighbors of the calling process along a specific direction of an N-dimensional Cartesian topology. • This direction is specified by the input argument dir to MPI_CART_SHIFT. • The two neighbors are called the "source" and "destination" ranks.
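
A minimal sketch of a one-step shift on a periodic 1-D topology, pairing MPI_Cart_shift with MPI_Sendrecv; the ring size and the data sent are illustrative assumptions:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[]) {
        MPI_Comm ring;
        int p, dims[1], periods[1] = {1};     /* periodic: a ring */
        int rank, rank_source, rank_dest, sendval, recvval;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &p);
        dims[0] = p;
        MPI_Cart_create(MPI_COMM_WORLD, 1, dims, periods, 1, &ring);

        MPI_Comm_rank(ring, &rank);
        /* Shift by +1 along dimension 0: receive from the "left" neighbor
         * (rank_source) and send to the "right" neighbor (rank_dest). */
        MPI_Cart_shift(ring, 0, 1, &rank_source, &rank_dest);

        sendval = rank;
        MPI_Sendrecv(&sendval, 1, MPI_INT, rank_dest, 0,
                     &recvval, 1, MPI_INT, rank_source, 0, ring, &status);
        printf("Rank %d received %d from rank %d\n", rank, recvval, rank_source);

        MPI_Finalize();
        return 0;
    }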
