##### Friday, October 20, 2006


“Work expands to fill the time available for its completion.” (Parkinson’s 1st Law)

`MPI_Recv(void *buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Status *status)`

`MPI_Get_count(MPI_Status *status, MPI_Datatype datatype, int *count_recvd)`

- Returns the number of entries actually received (which may be fewer than `count`, the capacity of `buf`) in the `count_recvd` variable.

**Matrix Vector Multiplication**
- n x n matrix A
- Vector b
- x = Ab
- p processing elements
- Suppose A is distributed row-wise (n/p rows per process)
- Each process computes a different portion of x

**Matrix Vector Multiplication** (initial distribution; colors represent data distributed on different processes)

[Figure: A divided into blocks of n/p rows, with b and x partitioned across the processes]

**Matrix Vector Multiplication** (colors represent that all parts of b are required by each process)

[Figure: each process's n/p rows of A need the whole vector b to produce its part of x]

**Matrix Vector Multiplication** (all parts of b are required by each process)
- Which collective operation can we use?

**Matrix Vector Multiplication**
- n x n matrix A
- Vector b
- x = Ab
- p processing elements
- Suppose A is distributed column-wise (n/p columns per process)
- Each process should end up with a different portion of x.

**Matrix Vector Multiplication** (initial distribution; colors represent data distributed on different processes)

[Figure: A divided into blocks of n/p columns, with b and x partitioned across the processes]

**Partial sums calculated by each process**

[Figure: n/p columns of A times the matching part of b give each process a full-length partial vector (partial x0, ...); x is the sum of these partial vectors]

**MPI_Reduce**

[Figure: Tasks 0, 1, 2, and 3 each contribute a 4-element buffer (count=4); the element-wise result is delivered to Task 1 (dest=1)]
- Element-wise reduction can be done.

**Summary**
- Row-wise requires one MPI_Allgather operation.
- Column-wise requires MPI_Reduce and MPI_Scatter operations (MPI also provides MPI_Reduce_scatter, which combines the two).

**Matrix Matrix Multiplication**
- A and B are n x n matrices
- p is the number of processing elements
- The matrices are partitioned into blocks of size $n/\sqrt{p} \times n/\sqrt{p}$

**16 processes, each represented by a different color**

Different portions of the n x n matrices are divided among these processes.

[Figure: matrices A, B, and C, each divided into a 4 x 4 grid of blocks, one block per process]
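To make the data dependence of the block partitioning concrete, here is a minimal serial sketch in C (not MPI code) that computes C one block at a time. The function names `block_matmul` and `block_mac`, the row-major storage, and the assumption that √p divides n are illustrative choices; block $C_{i,j}$ stands in for the result owned by process $P_{i,j}$.

```c
#include <assert.h>
#include <string.h>

/* Accumulate A(i,k) * B(k,j) into C(i,j), where X(r,c) denotes the
   bs x bs block in block-row r, block-column c of an n x n row-major matrix. */
static void block_mac(const double *A, const double *B, double *C,
                      int n, int bs, int i, int j, int k)
{
    for (int r = 0; r < bs; r++) {
        for (int c = 0; c < bs; c++) {
            double sum = 0.0;
            for (int t = 0; t < bs; t++)
                sum += A[(i * bs + r) * n + (k * bs + t)] *
                       B[(k * bs + t) * n + (j * bs + c)];
            C[(i * bs + r) * n + (j * bs + c)] += sum;
        }
    }
}

/* C = A * B on a q x q grid of blocks (q plays the role of sqrt(p)).
   Block C(i,j) stands for the result owned by "process" P(i,j); computing it
   needs every A(i,k) in block-row i and every B(k,j) in block-column j. */
void block_matmul(const double *A, const double *B, double *C, int n, int q)
{
    assert(q > 0 && n % q == 0);   /* assumption: sqrt(p) divides n */
    int bs = n / q;
    memset(C, 0, sizeof(double) * (size_t)n * (size_t)n);
    for (int i = 0; i < q; i++)
        for (int j = 0; j < q; j++)
            for (int k = 0; k < q; k++)   /* the blocks P(i,j) must gather */
                block_mac(A, B, C, n, bs, i, j, k);
}
```

Calling `block_matmul(A, B, C, 8, 4)` mirrors the 16-process layout: a 4 x 4 grid of 2 x 2 blocks. The innermost `k` loop is exactly the set of blocks $A_{i,k}$ and $B_{k,j}$ that process $P_{i,j}$ would have to obtain from other processes in a distributed version.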
BUT! To compute $C_{i,j}$ we need all sub-matrices $A_{i,k}$ and $B_{k,j}$ for $0 \le k < \sqrt{p}$:
- All-to-all broadcast of matrix A's blocks in each row
- All-to-all broadcast of matrix B's blocks in each column

**Cannon's Algorithm**
- A memory-efficient version of the previous algorithm. Each process in the ith row requires all √p sub-matrices $A_{i,k}$, $0 \le k < \sqrt{p}$.
- Schedule the computation so that the √p processes in the ith row use different $A_{i,k}$ at any given time.

**Cannon's Algorithm**

[Figure: block grids of A, B, and C]

To compute $C_{0,0}$ we need all sub-matrices $A_{0,k}$ and $B_{k,0}$ for $0 \le k < \sqrt{p}$.

**Cannon's Algorithm**

[Figure, repeated over three slides: at each step the blocks of A shift left and the blocks of B shift up]

A sequence of √p sub-matrix multiplications is done.

Some initial alignment is required!

[Figure: block grids of A, B, and C]

- Shift all sub-matrices $A_{i,j}$ to the left (with wraparound) by $i$ steps.
- Shift all sub-matrices $B_{i,j}$ up (with wraparound) by $j$ steps.
- After the circular shift operations, $P_{i,j}$ has sub-matrices $A_{i,(j+i) \bmod \sqrt{p}}$ and $B_{(i+j) \bmod \sqrt{p},\,j}$.
- Both blocks now share the inner index $k = (i+j) \bmod \sqrt{p}$, and each later shift (A one step left, B one step up) advances $k$ by one, so after √p multiply-and-shift steps $P_{i,j}$ has accumulated $C_{i,j} = \sum_{k} A_{i,k} B_{k,j}$.

**Topologies**
- Many computational science and engineering problems use a series of matrix or grid operations.
- The dimensions of the matrices or grids are often determined by the physical problems.
- Frequently in multiprocessing, these matrices or grids are partitioned, or domain-decomposed, so that each partition is assigned to a process.

**Topologies**
- MPI uses a linear ordering and views processes in a 1-D topology.
- Although it is still possible to refer to each partition by a linear rank number, mapping the linear process ranks to a higher-dimensional virtual rank numbering gives a clearer and more natural representation of the computation.

**Topologies**
- To address this need, the MPI library provides topology routines.
- Interacting processes can then be identified by their coordinates in that topology.

**Topologies**
- Each MPI process is mapped into the higher-dimensional topology.

[Figure: Different ways to map a set of processes to a two-dimensional grid. (a) and (b) show a row- and column-wise mapping of these processes, (c) shows a mapping that follows a space-filling curve (dotted line), and (d) shows a mapping in which neighboring processes are directly connected in a hypercube.]

**Topologies**
- Ideally, the mapping would be determined by the interaction among processes and the connectivity of the physical processors.
- However, MPI's mechanism for assigning ranks does not use information about the interconnection network.
- Reason: architecture independence is one of MPI's advantages (otherwise different mappings would have to be specified for different interconnection networks).
- It is left to the MPI library to find an appropriate mapping that reduces the cost of sending and receiving messages.

**MPI allows specification of virtual process topologies in terms of a graph**
- Each node in the graph corresponds to a process, and an edge exists between two nodes if they communicate with each other.
- The most common topologies are Cartesian topologies (one-, two-, or higher-dimensional grids).

**Creating and Using Cartesian Topologies**
- We can create Cartesian topologies using the function:
  `int MPI_Cart_create(MPI_Comm comm_old, int ndims, int *dims, int *periods, int reorder, MPI_Comm *comm_cart)`
- `dims` gives the number of processes along each of the `ndims` dimensions, `periods` states whether each dimension wraps around, and a nonzero `reorder` lets MPI renumber the processes in `comm_cart` (for example, to better match the physical network).

With processes renamed in a 2-D grid topology, we can assign or distribute work, or distinguish among the processes, by their grid coordinates rather than by their linear process ranks.

**MPI_Cart_create is a collective communication function.**
- It must be called by all processes in the group.

**Creating and Using Cartesian Topologies**
- Since sending and receiving messages still requires (one-dimensional) ranks, MPI provides routines to convert ranks to Cartesian coordinates and vice versa:
  - `int MPI_Cart_coords(MPI_Comm comm_cart, int rank, int maxdims, int *coords)`
  - `int MPI_Cart_rank(MPI_Comm comm_cart, int *coords, int *rank)`

**Creating and Using Cartesian Topologies**
- The most common operation on Cartesian topologies is shifting data along a dimension of the topology:
  `int MPI_Cart_shift(MPI_Comm comm_cart, int dir, int s_step, int *rank_source, int *rank_dest)`
- MPI_Cart_shift is used to find the two "nearby" neighbors of the calling process along a specific direction of an N-dimensional Cartesian topology.
- This direction is specified by the input argument `dir`, and the distance by `s_step`.
- The two neighbors are called the "source" and "destination" ranks: `rank_dest` is `s_step` positions ahead along `dir`, and `rank_source` is `s_step` positions behind. On a non-periodic boundary, a missing neighbor is returned as `MPI_PROC_NULL`.
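The rank/coordinate conversions and the shift rule above can be modeled without an MPI installation. The sketch below is plain C, not the MPI API: `cart_rank`, `cart_coords`, `cart_shift`, `MAX_DIMS`, and `PROC_NULL` are hypothetical stand-ins for MPI_Cart_rank, MPI_Cart_coords, MPI_Cart_shift, and MPI_PROC_NULL. It assumes the row-major numbering the MPI standard uses for Cartesian communicators; ranks here are ranks in `comm_cart`, so any renumbering relative to `comm_old` caused by `reorder` does not affect it.

```c
#include <assert.h>

#define PROC_NULL (-1)   /* hypothetical stand-in for MPI_PROC_NULL */
#define MAX_DIMS 8       /* this model supports up to 8 dimensions */

/* Row-major mapping from coordinates to a linear rank in the Cartesian
   communicator (the last dimension varies fastest). */
int cart_rank(int ndims, const int *dims, const int *coords)
{
    int rank = 0;
    for (int d = 0; d < ndims; d++) {
        assert(coords[d] >= 0 && coords[d] < dims[d]);
        rank = rank * dims[d] + coords[d];
    }
    return rank;
}

/* Inverse of cart_rank: recover the coordinates of a linear rank. */
void cart_coords(int ndims, const int *dims, int rank, int *coords)
{
    for (int d = ndims - 1; d >= 0; d--) {
        coords[d] = rank % dims[d];
        rank /= dims[d];
    }
}

/* Neighbors of `rank` along dimension `dir` at displacement `disp`:
   dest is disp steps ahead, source is disp steps behind. Periodic
   dimensions wrap around; otherwise off-grid neighbors are PROC_NULL. */
void cart_shift(int ndims, const int *dims, const int *periods, int rank,
                int dir, int disp, int *source, int *dest)
{
    int coords[MAX_DIMS];
    assert(ndims <= MAX_DIMS && dir >= 0 && dir < ndims);
    cart_coords(ndims, dims, rank, coords);
    int home = coords[dir];

    for (int side = 0; side < 2; side++) {
        int *out = (side == 0) ? source : dest;
        int c = (side == 0) ? home - disp : home + disp;
        if (periods[dir]) {
            c = ((c % dims[dir]) + dims[dir]) % dims[dir];  /* wraparound */
        } else if (c < 0 || c >= dims[dir]) {
            *out = PROC_NULL;                              /* off the grid */
            continue;
        }
        coords[dir] = c;
        *out = cart_rank(ndims, dims, coords);
    }
}
```

As one possible use, Cannon's initial alignment on a periodic √p x √p grid could have process row i call `MPI_Cart_shift(comm_cart, 1, -i, &src, &dst)` and then exchange its A block with `MPI_Sendrecv_replace`, sending to `dst` and receiving from `src`. In the model, `cart_shift(2, dims, periods, 6, 1, -1, &src, &dst)` on a periodic 4 x 4 grid yields `src == 7` and `dst == 5`, so $P_{1,2}$ receives $A_{1,3}$, consistent with $A_{i,(j+i) \bmod \sqrt{p}}$.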