Presentation Transcript

  1. Friday, October 20, 2006 “Work expands to fill the time available for its completion.” • Parkinson’s 1st Law

  2. MPI_Recv(void *buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Status *status) MPI_Get_count(MPI_Status *status, MPI_Datatype datatype, int *count_recvd) • Returns the number of entries received in the count_recvd variable.
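
A minimal sketch (not part of the original slides) of how these two calls work together: rank 0 sends 5 ints, rank 1 posts a 100-int buffer, and MPI_Get_count reports how many ints actually arrived.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, buf[100], nrecvd;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            int data[5] = {1, 2, 3, 4, 5};
            MPI_Send(data, 5, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            /* count (100) is only an upper bound; the message may be shorter */
            MPI_Recv(buf, 100, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
            MPI_Get_count(&status, MPI_INT, &nrecvd);
            printf("received %d ints\n", nrecvd);   /* prints: received 5 ints */
        }

        MPI_Finalize();
        return 0;
    }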

  3. Matrix Vector Multiplication • n x n matrix A • Vector b • x = Ab • p processing elements • Suppose A is distributed row-wise (n/p rows per process) • Each process computes a different portion of x

  4. Matrix Vector Multiplication (figure: initial distribution; colors represent data on different processes; each process holds n/p rows of A and the corresponding pieces of b and x)

  5. Matrix Vector Multiplication (figure: colors show that all parts of b are required by each process)

  6. Matrix Vector Multiplication (All parts of b are required by each process) • Which collective operation can we use?

  7. Matrix Vector Multiplication (figure: all parts of b are required by each process)

  8. Collective communication
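
Slide 13 below names the collective: MPI_Allgather. As a hedged sketch (not from the slides), the row-wise algorithm looks like the following; it assumes n is divisible by p, and the names matvec_rowwise, local_A, local_b, and local_x are illustrative.

    #include <mpi.h>
    #include <stdlib.h>

    /* Row-wise matrix-vector multiply, assuming n % p == 0. Each process
       owns n/p rows of A (local_A, row-major) and the matching n/p
       entries of b (local_b) and of the result x (local_x). */
    void matvec_rowwise(const double *local_A, const double *local_b,
                        double *local_x, int n, int p, MPI_Comm comm) {
        int nlocal = n / p;
        double *b = malloc(n * sizeof(double));

        /* each process contributes its n/p entries and gets all of b back */
        MPI_Allgather(local_b, nlocal, MPI_DOUBLE,
                      b, nlocal, MPI_DOUBLE, comm);

        for (int i = 0; i < nlocal; i++) {      /* my rows of A */
            local_x[i] = 0.0;
            for (int j = 0; j < n; j++)
                local_x[i] += local_A[i * n + j] * b[j];
        }
        free(b);
    }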

  9. Matrix Vector Multiplication • n x n matrix A • Vector b • x = Ab • p processing elements • Suppose A is distributed column-wise (n/p columns per process) • Each process computes a different portion of x.

  10. Matrix Vector Multiplication (figure: initial column-wise distribution; colors represent data on different processes; each process holds n/p columns of A)

  11. Partial sums calculated by each process (figure: each process computes a full-length partial result, e.g. partial x0, from its n/p columns of A)

  12. MPI_Reduce (figure: element-wise reduction with count=4 and dest=1; the four-element buffers of Tasks 0 to 3 are reduced onto Task 1) Element-wise reduction can be done.
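
A minimal sketch matching the figure (the buffer contents are illustrative): every task contributes a 4-element buffer, and task 1 receives the element-wise sums.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank;
        double sendbuf[4], recvbuf[4];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        for (int k = 0; k < 4; k++)
            sendbuf[k] = rank + k;              /* each task's contribution */

        /* element-wise sum of the 4-element buffers, delivered to task 1 */
        MPI_Reduce(sendbuf, recvbuf, 4, MPI_DOUBLE, MPI_SUM, 1, MPI_COMM_WORLD);

        if (rank == 1)
            printf("%g %g %g %g\n",
                   recvbuf[0], recvbuf[1], recvbuf[2], recvbuf[3]);

        MPI_Finalize();
        return 0;
    }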

  13. Row-wise distribution requires one MPI_Allgather operation. • Column-wise distribution requires MPI_Reduce and MPI_Scatter operations.
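
A sketch of the column-wise algorithm, assuming n is divisible by p. The slide describes an MPI_Reduce followed by an MPI_Scatter; MPI_Reduce_scatter, used here, fuses those two steps into one collective. The names matvec_colwise, local_A, local_b, and local_x are illustrative.

    #include <mpi.h>
    #include <stdlib.h>

    /* Column-wise matrix-vector multiply, assuming n % p == 0. Each
       process owns n/p columns of A and the matching n/p entries of b,
       computes a full-length partial result, then MPI_Reduce_scatter
       sums the partials and leaves each process its n/p entries of x. */
    void matvec_colwise(const double *local_A, const double *local_b,
                        double *local_x, int n, int p, MPI_Comm comm) {
        int nlocal = n / p;
        double *partial = malloc(n * sizeof(double));
        int *recvcounts = malloc(p * sizeof(int));

        for (int i = 0; i < n; i++) {           /* partial sums over my cols */
            partial[i] = 0.0;
            for (int j = 0; j < nlocal; j++)
                partial[i] += local_A[i * nlocal + j] * local_b[j];
        }
        for (int r = 0; r < p; r++)
            recvcounts[r] = nlocal;             /* n/p result entries each */

        MPI_Reduce_scatter(partial, local_x, recvcounts, MPI_DOUBLE,
                           MPI_SUM, comm);
        free(partial);
        free(recvcounts);
    }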

  14. Matrix Matrix Multiplication • A and B are n x n matrices • p is the number of processing elements • The matrices are partitioned into blocks of size n/√p x n/√p

  15. 16 processes, each represented by a different color. Different portions of the n x n matrices A, B, and C are divided among these processes. (figure)

  16. 16 processes, each represented by a different color. Different portions of the n x n matrices A, B, and C are divided among these processes. (figure) BUT! To compute Ci,j we need all sub-matrices Ai,k and Bk,j for 0 <= k < √p

  17. To compute Ci,j we need all sub-matrices Ai,k and Bk,j for 0 <= k < √p • All-to-all broadcast of matrix A's blocks in each row • All-to-all broadcast of matrix B's blocks in each column
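
One common way to realize these broadcasts (an assumption on my part; the slides do not spell out the implementation) is to split the √p x √p process grid into row and column communicators with MPI_Comm_split and run MPI_Allgather in each. The sketch assumes ranks are laid out row-major, and the names gather_blocks, myA, myB, rowA, colB are illustrative.

    #include <mpi.h>

    /* Gather all A-blocks in my row and all B-blocks in my column.
       block_elems is the entry count of one n/√p x n/√p block; rowA and
       colB must each hold sqrt_p * block_elems entries. */
    void gather_blocks(double *myA, double *myB, double *rowA, double *colB,
                       int block_elems, int sqrt_p, MPI_Comm comm) {
        int rank;
        MPI_Comm row_comm, col_comm;

        MPI_Comm_rank(comm, &rank);
        int row = rank / sqrt_p, col = rank % sqrt_p;

        MPI_Comm_split(comm, row, col, &row_comm);  /* same row, same comm */
        MPI_Comm_split(comm, col, row, &col_comm);  /* same col, same comm */

        /* all-to-all broadcast of blocks == allgather of blocks */
        MPI_Allgather(myA, block_elems, MPI_DOUBLE,
                      rowA, block_elems, MPI_DOUBLE, row_comm);
        MPI_Allgather(myB, block_elems, MPI_DOUBLE,
                      colB, block_elems, MPI_DOUBLE, col_comm);

        MPI_Comm_free(&row_comm);
        MPI_Comm_free(&col_comm);
    }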

  18. Cannon's Algorithm • Memory-efficient version of the previous algorithm. Each process in the ith row requires all √p sub-matrices Ai,k, 0 <= k < √p • Schedule the computation so that the √p processes in the ith row use different Ai,k at any given time

  19. 16 processes, each represented by a different color. Different portions of the n x n matrices A and B are divided among these processes. (figure)

  20. (figure: block layouts of A, B, and C)

  21. Cannon's Algorithm (figure: A, B, C) To compute C0,0 we need all sub-matrices A0,k and Bk,0 for 0 <= k < √p

  22. Cannon's Algorithm (figure: shift A left, shift B up)

  23. Cannon's Algorithm (figure: shift A left, shift B up)

  24. Cannon's Algorithm (figure: shift A left, shift B up) A sequence of √p sub-matrix multiplications is done.

  25. (figure: A, B, C)

  26. A0,1 and B0,1 should not be multiplied! (figure: A, B, C)

  27. (figure: A, B, C) Some initial alignment is required!

  28. (figure: A, B, C) Shift all sub-matrices Ai,j to the left (with wraparound) by i steps. Shift all sub-matrices Bi,j up (with wraparound) by j steps. After the circular shift operations, Pi,j has sub-matrices Ai,(j+i) mod √p and B(i+j) mod √p, j

  29. After initial alignment: (figure: aligned blocks of A and B)

  30. Topologies • Many computational science and engineering problems use a series of matrix or grid operations. • The dimensions of the matrices or grids are often determined by the physical problems. • Frequently in multiprocessing, these matrices or grids are partitioned, or domain-decomposed, so that each partition is assigned to a process.

  31. Topologies • MPI uses a linear ordering and views processes in a 1-D topology. • Although it is still possible to refer to each partition by its linear rank number, mapping the linear process ranks to a higher-dimensional virtual rank numbering gives a much clearer and more natural computational representation.

  32. Topologies • To address this need, the MPI library provides topology routines. • Interacting processes are then identified by their coordinates in that topology.

  33. Topologies • Each MPI process is mapped into the higher-dimensional topology. (figure) Different ways to map a set of processes onto a two-dimensional grid: (a) and (b) show row- and column-wise mappings of the processes, (c) shows a mapping that follows a space-filling curve (dotted line), and (d) shows a mapping in which neighboring processes are directly connected in a hypercube.

  34. Topologies • Ideally, the mapping would be determined by the interaction among processes and the connectivity of the physical processors. • However, the mechanism for assigning ranks in MPI does not use information about the interconnection network. • Reason: this keeps MPI architecture-independent (otherwise different mappings would have to be specified for different interconnection networks). • It is left to the MPI library to find an appropriate mapping that reduces the cost of sending and receiving messages.

  35. MPI allows specification of virtual process topologies in terms of a graph • Each node in the graph corresponds to a process, and an edge exists between two nodes if they communicate with each other. • The most common topologies are Cartesian topologies (one-, two-, or higher-dimensional grids)
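
A hedged sketch of the graph form (not from the slides): a 4-process ring built with MPI_Graph_create. index holds the cumulative neighbor counts through each node and edges lists each node's neighbors in order; the example assumes exactly 4 processes.

    #include <mpi.h>

    int main(int argc, char **argv) {
        int index[4] = {2, 4, 6, 8};                  /* 2 neighbors each */
        int edges[8] = {1, 3,  0, 2,  1, 3,  0, 2};   /* ring 0-1-2-3-0 */
        MPI_Comm ring;

        MPI_Init(&argc, &argv);
        /* reorder = 1 lets the library renumber ranks for the hardware */
        MPI_Graph_create(MPI_COMM_WORLD, 4, index, edges, 1, &ring);
        /* ... communicate among neighbors using 'ring' ... */
        MPI_Comm_free(&ring);
        MPI_Finalize();
        return 0;
    }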

  36. Creating and Using Cartesian Topologies • We can create Cartesian topologies using the function: int MPI_Cart_create(MPI_Comm comm_old, int ndims, int *dims, int *periods, int reorder, MPI_Comm *comm_cart)
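
A minimal sketch of a call: create a periodic (wraparound) 2-D grid, letting MPI_Dims_create pick a near-square factorization of the process count.

    #include <mpi.h>

    int main(int argc, char **argv) {
        int p, dims[2] = {0, 0}, periods[2] = {1, 1};  /* wraparound grid */
        MPI_Comm comm_2d;

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &p);

        MPI_Dims_create(p, 2, dims);    /* near-square factorization of p */
        /* reorder = 1 lets the library renumber ranks for the hardware */
        MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &comm_2d);

        MPI_Comm_free(&comm_2d);
        MPI_Finalize();
        return 0;
    }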

  37. With processes renamed in a 2-D grid topology, we can assign or distribute work, or distinguish among the processes, by their grid coordinates rather than by their linear process ranks.

  38. MPI_CART_CREATE is a collective communication function. • It must be called by all processes in the group.

  39. Creating and Using Cartesian Topologies • Since sending and receiving messages still require (one-dimensional) ranks, MPI provides routines to convert ranks to Cartesian coordinates and vice versa. int MPI_Cart_coords(MPI_Comm comm_cart, int rank, int maxdims, int *coords) int MPI_Cart_rank(MPI_Comm comm_cart, int *coords, int *rank)
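
A sketch of the round trip, assuming comm_2d is a 2-D Cartesian communicator like the one created above (the helper name print_my_coords is illustrative):

    #include <mpi.h>
    #include <stdio.h>

    void print_my_coords(MPI_Comm comm_2d) {
        int rank, coords[2], rank_back;

        MPI_Comm_rank(comm_2d, &rank);
        MPI_Cart_coords(comm_2d, rank, 2, coords);   /* rank -> (row, col) */
        MPI_Cart_rank(comm_2d, coords, &rank_back);  /* (row, col) -> rank */
        printf("rank %d = grid (%d, %d), back to rank %d\n",
               rank, coords[0], coords[1], rank_back);
    }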

  40. Creating and Using Cartesian Topologies • The most common operation on Cartesian topologies is shifting data along a dimension of the topology. int MPI_Cart_shift(MPI_Comm comm_cart, int dir, int s_step, int *rank_source, int *rank_dest) • MPI_CART_SHIFT is used to find the two "nearby" neighbors of the calling process along a specific dimension of an N-dimensional Cartesian topology. • The dimension is specified by the input argument dir, and the displacement by s_step. • The two neighbors are returned as the "source" and "destination" ranks.
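
To tie the topology routines back to Cannon's algorithm (slides 18 to 29), here is a hedged skeleton, not a definitive implementation: the initial alignment shifts A-row i left by i and B-column j up by j, then √p multiply-and-shift steps circulate the blocks with MPI_Sendrecv_replace. It assumes a periodic √p x √p grid (as created above), that the C-blocks start zeroed, and that the names a, b, c, nb (blocks and block edge length) are illustrative.

    #include <mpi.h>

    void cannon(double *a, double *b, double *c, int nb, MPI_Comm comm_2d) {
        int coords[2], dims[2], periods[2];
        int left, right, up, down, sqrt_p;
        MPI_Status status;

        MPI_Cart_get(comm_2d, 2, dims, periods, coords);
        sqrt_p = dims[0];

        /* initial alignment: shift my A-block left by my row index i,
           and my B-block up by my column index j (with wraparound) */
        MPI_Cart_shift(comm_2d, 1, -coords[0], &right, &left);
        MPI_Sendrecv_replace(a, nb * nb, MPI_DOUBLE, left, 0, right, 0,
                             comm_2d, &status);
        MPI_Cart_shift(comm_2d, 0, -coords[1], &down, &up);
        MPI_Sendrecv_replace(b, nb * nb, MPI_DOUBLE, up, 0, down, 0,
                             comm_2d, &status);

        /* single-step neighbors for the main loop */
        MPI_Cart_shift(comm_2d, 1, -1, &right, &left);
        MPI_Cart_shift(comm_2d, 0, -1, &down, &up);

        for (int step = 0; step < sqrt_p; step++) {
            /* c += a * b on the local nb x nb blocks */
            for (int i = 0; i < nb; i++)
                for (int k = 0; k < nb; k++)
                    for (int j = 0; j < nb; j++)
                        c[i * nb + j] += a[i * nb + k] * b[k * nb + j];

            /* shift A one block left and B one block up, with wraparound */
            MPI_Sendrecv_replace(a, nb * nb, MPI_DOUBLE, left, 0, right, 0,
                                 comm_2d, &status);
            MPI_Sendrecv_replace(b, nb * nb, MPI_DOUBLE, up, 0, down, 0,
                                 comm_2d, &status);
        }
    }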