Prepare for the upcoming project proposal, due March 13, which requires 2 pages of text, a 1-page plan of action, and three references. There is no class on March 13; place proposals in the instructor's box. Paper presentations are scheduled for March 11. The lecture's case study examines data-distribution methods for matrix multiplication, comparing one-dimensional and two-dimensional decompositions by their communication costs, and asks whether a single algorithm can serve every distribution.
CS 584 Lecture 20 • Assignment • Glenda program • Project Proposal is coming up! (March 13) • 2 pages text + 1 page plan of action • 3 references • No class March 13 • Put your project proposal in my box. • Paper presentations on March 11 (Tom Abbott)
Case Study: Matrix Multiply • Goal: data-distribution neutral • Three basic ways to distribute • row • column • submatrix • Question: does our library need a different algorithm for each distribution?
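To make the three distributions concrete, here is a minimal sketch that maps a matrix element to the processor that owns it. The function name `owner` and the block layouts (contiguous bands for row/column, a square processor grid for submatrix) are assumptions for illustration, not the library's API.

```python
import math


def owner(i, j, n, p, scheme):
    """Rank of the processor that owns element (i, j) of an n x n matrix.

    Hypothetical helper. Assumes p divides n for "row" and "column", and that
    p is a perfect square whose square root divides n for "submatrix".
    """
    if scheme == "row":
        return i // (n // p)          # contiguous bands of n/p rows
    if scheme == "column":
        return j // (n // p)          # contiguous bands of n/p columns
    if scheme == "submatrix":
        q = math.isqrt(p)             # processors arranged as a q x q grid
        b = n // q                    # each processor owns a b x b block
        return (i // b) * q + (j // b)
    raise ValueError(f"unknown scheme: {scheme}")


if __name__ == "__main__":
    # Print the owner map of a 4 x 4 matrix on 4 processors for each scheme.
    for scheme in ("row", "column", "submatrix"):
        print(scheme)
        for i in range(4):
            print(" ".join(str(owner(i, j, 4, 4, scheme)) for j in range(4)))
```

A data-distribution-neutral library would have to produce correct results whichever of these maps the caller's data follows.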
Analytical Model • Compare the two algorithms • Ignore the computation costs • What are the communication costs?
One Dimensional Decomposition • Each processor "owns" the black portion of the matrices shown in the figure • To compute its owned portion of the answer, each processor requires all of A • This requirement affects the choice of data distribution.
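The sketch below simulates the one-dimensional scheme serially, assuming (since the figure is missing) that each processor owns a contiguous block of columns of B and C. It shows why every processor needs all of A: its column block of C is A times its own column block of B. The helper names are hypothetical.

```python
def matmul(X, Y):
    """Plain triple-loop product of list-of-lists matrices."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def one_d_multiply(A, B, p):
    """Simulate p processors, each owning n/p columns of B and C (assumed layout)."""
    n = len(A)
    w = n // p                                     # columns per processor
    c_blocks = []
    for rank in range(p):
        b_local = [row[rank * w:(rank + 1) * w] for row in B]
        # The local result needs every row and column of A, not just a piece.
        c_blocks.append(matmul(A, b_local))
    # Reassemble C by concatenating each processor's columns row by row.
    return [[x for blk in c_blocks for x in blk[i]] for i in range(n)]


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    print(one_d_multiply(A, B, 2))
```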
1-D Decomp.
$$T = (P - 1)\left(t_s + t_w\,\frac{N^2}{P}\right)$$
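One way to read this formula: each processor must receive the other P - 1 blocks of A, each holding N²/P words, and every message pays a startup cost t_s plus t_w per word. A direct transcription as a function follows; the parameter values in the example are hypothetical, in arbitrary time units.

```python
def t_1d(P, N, ts, tw):
    """Communication time of the 1-D algorithm: (P - 1)(ts + tw * N^2 / P)."""
    return (P - 1) * (ts + tw * N**2 / P)


if __name__ == "__main__":
    # Hypothetical machine: startup 100, per-word cost 1, N = 1024.
    for P in (4, 16, 64):
        print(P, t_1d(P, 1024, 100, 1))
```

Note that the per-message volume shrinks as P grows, but the number of messages grows linearly in P, so the total volume received per processor stays close to N².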
Two Dimensional Decomposition • Requires less data per processor • Algorithm can be performed stepwise.
• Broadcast an A submatrix to the other processors in its row • Compute • Rotate the B submatrix upward
Algorithm
Set B' = Blocal
for j = 0 to sqrt(P) - 1
    in each row i, the [(i+j) mod sqrt(P)]th task broadcasts A' = Alocal to the other tasks in the row
    accumulate A' * B'
    send B' to upward neighbor (may be skipped when j = sqrt(P) - 1)
done
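The broadcast-multiply-rotate steps can be checked with a serial simulation on a sqrt(P) x sqrt(P) grid of blocks. This is a sketch under stated assumptions: blocks are held in nested lists, "broadcast" and "send upward" are modeled as reads and a row rotation, and the loop runs sqrt(P) stages so that every product A_ik * B_kj is accumulated, skipping the final rotation. Function names are hypothetical.

```python
import math


def matmul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def split(M, q):
    """Cut an n x n matrix into a q x q grid of (n/q) x (n/q) blocks."""
    b = len(M) // q
    return [[[row[c * b:(c + 1) * b] for row in M[r * b:(r + 1) * b]]
             for c in range(q)] for r in range(q)]


def join(blocks):
    """Inverse of split: stitch a q x q grid of blocks back into one matrix."""
    q, b = len(blocks), len(blocks[0][0])
    return [[blocks[r][c][i][j] for c in range(q) for j in range(b)]
            for r in range(q) for i in range(b)]


def fox_multiply(A, B, P):
    q = math.isqrt(P)
    assert q * q == P and len(A) % q == 0, "assumes square grid dividing N"
    b = len(A) // q
    a_loc = split(A, q)
    b_cur = split(B, q)                            # B' = Blocal on every task
    c_loc = [[[[0] * b for _ in range(b)] for _ in range(q)] for _ in range(q)]
    for j in range(q):
        for i in range(q):
            a_bcast = a_loc[i][(i + j) % q]        # A' broadcast along row i
            for c in range(q):
                prod = matmul(a_bcast, b_cur[i][c])   # accumulate A' * B'
                for r in range(b):
                    for s in range(b):
                        c_loc[i][c][r][s] += prod[r][s]
        if j < q - 1:
            # Each task sends B' upward, so row i now holds what row i+1 had.
            b_cur = [b_cur[(i + 1) % q] for i in range(q)]
    return join(c_loc)


if __name__ == "__main__":
    A = [[i * 4 + j for j in range(4)] for i in range(4)]
    B = [[(i + 2 * j) % 5 for j in range(4)] for i in range(4)]
    print(fox_multiply(A, B, 4) == matmul(A, B))
```

At stage j, task (i, c) holds B block ((i + j) mod sqrt(P), c) and receives A block (i, (i + j) mod sqrt(P)), so the sqrt(P) stages together sum over every inner index.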
2-D Decomp.
$$T = \left(\sqrt{P} - 1\right)\left(1 + \frac{\log P}{2}\right)\left(t_s + t_w\,\frac{N^2}{P}\right)$$
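A plausible reading of the factors: each stage broadcasts an N²/P-word block within a row of sqrt(P) tasks, which a tree broadcast does in log sqrt(P) = (log P)/2 steps, and then shifts B' once, giving the 1 + (log P)/2 term. The sketch below assumes base-2 logarithms and that P is a perfect square; the example values are hypothetical.

```python
import math


def t_2d(P, N, ts, tw):
    """(sqrt(P) - 1)(1 + log2(P) / 2)(ts + tw * N^2 / P), assuming log base 2."""
    q = math.isqrt(P)                 # side of the square processor grid
    return (q - 1) * (1 + math.log2(P) / 2) * (ts + tw * N**2 / P)


if __name__ == "__main__":
    for P in (4, 16, 64):
        print(P, t_2d(P, 1024, 100, 1))
```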
Redistribution • If we have only one algorithm, we may need to redistribute the data • How much does this cost?
Redistribution
$$T = \left(\sqrt{P} - 1\right)\left(t_s + t_w\,\frac{N^2}{P\sqrt{P}}\right)$$
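One interpretation consistent with this formula, for converting a 1-D layout to a 2-D one: each processor's N²/P-word strip is cut into sqrt(P) pieces of N²/(P·sqrt(P)) words, one per block column; it keeps one piece and sends the other sqrt(P) - 1. The function below is a direct transcription with hypothetical example values.

```python
import math


def t_redist(P, N, ts, tw):
    """(sqrt(P) - 1)(ts + tw * N^2 / (P * sqrt(P))), assuming P is a perfect square."""
    q = math.isqrt(P)
    return (q - 1) * (ts + tw * N**2 / (P * q))


if __name__ == "__main__":
    for P in (4, 16, 64):
        print(P, t_redist(P, 1024, 100, 1))
```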
Analysis • Performance analysis shows that the two-dimensional decomposition is always better (for P > 1) • So our matrix multiply needs only one algorithm • We might need a redistribution algorithm to be fully data-distribution neutral • However, this is not the best algorithm.
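The claim can be spot-checked numerically by coding the three cost models from this lecture and comparing 1-D against 2-D plus redistribution. The machine parameters below are hypothetical, logs are assumed base 2, and P is assumed to be a perfect square. Algebraically, P - 1 = (sqrt(P) - 1)(sqrt(P) + 1), and (log P)/2 < sqrt(P), so the 2-D cost alone never exceeds the 1-D cost.

```python
import math


def t_1d(P, N, ts, tw):
    return (P - 1) * (ts + tw * N**2 / P)


def t_2d(P, N, ts, tw):
    q = math.isqrt(P)
    return (q - 1) * (1 + math.log2(P) / 2) * (ts + tw * N**2 / P)


def t_redist(P, N, ts, tw):
    q = math.isqrt(P)
    return (q - 1) * (ts + tw * N**2 / (P * q))


def compare(Ps, N=1024, ts=100, tw=1):
    """Rows of (P, 1-D cost, 2-D cost, 2-D cost + redistribution)."""
    return [(P, t_1d(P, N, ts, tw), t_2d(P, N, ts, tw),
             t_2d(P, N, ts, tw) + t_redist(P, N, ts, tw)) for P in Ps]


if __name__ == "__main__":
    for row in compare((4, 16, 64, 256, 1024)):
        print(row)
```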
Systolic Algorithm
$$T = 2\left(\sqrt{P} - 1\right)\left(t_s + t_w\,\frac{N^2}{P}\right)$$
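A reading consistent with the factor of 2: in each of the sqrt(P) - 1 stages, every task passes one block to a neighbor in each of two directions instead of broadcasting, so the log P term disappears. The sketch compares this model with the 2-D broadcast model under hypothetical parameters and base-2 logs; the two agree at P = 4, and the systolic cost is lower for larger P.

```python
import math


def t_systolic(P, N, ts, tw):
    """2 (sqrt(P) - 1)(ts + tw * N^2 / P), assuming P is a perfect square."""
    q = math.isqrt(P)
    return 2 * (q - 1) * (ts + tw * N**2 / P)


def t_2d(P, N, ts, tw):
    q = math.isqrt(P)
    return (q - 1) * (1 + math.log2(P) / 2) * (ts + tw * N**2 / P)


if __name__ == "__main__":
    for P in (4, 16, 64, 256):
        print(P, t_2d(P, 1024, 100, 1), t_systolic(P, 1024, 100, 1))
```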