CS 584 Lecture 20

virote

Presentation Transcript


  1. CS 584 Lecture 20 • Assignment • Glenda program • Project Proposal is coming up! (March 13) • 2 pages text + 1 page plan of action • 3 references • No class March 13 • Put your project proposal in my box. • Paper presentations on March 11 (Tom Abbott)

  2. Module Composition

  3. Case Study: Matrix Multiply • Goal: Data-distribution neutral • Three basic ways to distribute • row • column • submatrix • Question? • Does our library need different algorithms?

  4. Analytical Model • Compare the two algorithms • Ignore the computation costs • What are the communication costs?

  5. One-Dimensional Decomposition • Each processor "owns" the black portion • To compute the owned portion of the answer, each processor requires all of A. • This affects data distribution.

  6. 1-D Decomp. $T = (P - 1)\left(t_s + t_w \frac{N^2}{P}\right)$

  7. Two-Dimensional Decomposition • Requires less data per processor • Algorithm can be performed stepwise.

  8. • Broadcast an A submatrix to the other processors in the row • Compute • Rotate the B submatrix upwards

  9. Algorithm
       Set B' = Blocal
       for j = 0 to sqrt(P) - 2
         in each row I, the [(I+j) mod sqrt(P)]th task broadcasts A' = Alocal to the other tasks in the row
         accumulate A' * B'
         send B' to upward neighbor
       done
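The broadcast, compute, rotate steps above can be simulated on a single machine. The following is a minimal sketch, not the lecture's implementation: the helper names (`split_blocks`, `join_blocks`, `broadcast_multiply_roll`) are hypothetical, the tasks are modelled as entries of a q x q grid of blocks with q = sqrt(P), and "broadcast" and "send" are plain list operations rather than message passing. The sketch runs q accumulate steps, one more than the slide's loop bound gives, because each block of the result is a sum of q block products.

```python
def matmul(X, Y):
    """Plain triple-loop product of two square matrices (lists of lists)."""
    n = len(X)
    return [[sum(X[r][t] * Y[t][c] for t in range(n)) for c in range(n)]
            for r in range(n)]

def split_blocks(M, q):
    """Cut an n x n matrix into a q x q grid of (n/q) x (n/q) blocks."""
    b = len(M) // q
    return [[[row[k * b:(k + 1) * b] for row in M[i * b:(i + 1) * b]]
             for k in range(q)] for i in range(q)]

def join_blocks(blocks):
    """Inverse of split_blocks: glue the block grid back into one matrix."""
    q, b = len(blocks), len(blocks[0][0])
    return [sum((blocks[i][k][r] for k in range(q)), [])
            for i in range(q) for r in range(b)]

def broadcast_multiply_roll(A, B, q):
    """Simulate the 2-D algorithm on a q x q task grid (assumes q divides n)."""
    b = len(A) // q
    a_blk = split_blocks(A, q)          # Alocal for every task (i, k)
    b_cur = split_blocks(B, q)          # B' starts as Blocal
    c_blk = [[[[0] * b for _ in range(b)] for _ in range(q)] for _ in range(q)]
    for j in range(q):
        for i in range(q):
            # Task (i, (i + j) mod q) "broadcasts" its A block along row i.
            a_bcast = a_blk[i][(i + j) % q]
            for k in range(q):
                # Each task in the row accumulates A' * B'.
                prod = matmul(a_bcast, b_cur[i][k])
                for r in range(b):
                    for c in range(b):
                        c_blk[i][k][r][c] += prod[r][c]
        # Every task sends B' to its upward neighbour (with wraparound),
        # so task (i, k) now holds what task (i + 1, k) held.
        b_cur = [[b_cur[(i + 1) % q][k] for k in range(q)] for i in range(q)]
    return join_blocks(c_blk)
```

For example, `broadcast_multiply_roll(A, B, 2)` on two 4 x 4 matrices models P = 4 tasks, each owning a 2 x 2 block, and should agree with `matmul(A, B)`.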

  10. 2-D Decomp. $T = \left(\sqrt{P} - 1\right)\left(1 + \frac{\log P}{2}\right)\left(t_s + t_w \frac{N^2}{P}\right)$

  11. Redistribution • If we only have one algorithm, we may need to redistribute the data • How much does this cost?

  12. Redistribution $T = \left(\sqrt{P} - 1\right)\left(t_s + t_w \frac{N^2}{P\sqrt{P}}\right)$

  13. Analysis • Performance analysis reveals that the two-dimensional decomposition is always better. • So our matrix multiply only needs one algorithm • Might need a redistribution algorithm to be totally data-distribution neutral • However, this is not the best algorithm.
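The "always better" claim can be checked with a short worked comparison. This sketch assumes the 1-D and 2-D communication costs are read as written out below, with the square roots restored from the garbled slides and the logarithm taken base 2 (a tree broadcast). Both readings are assumptions, not something the transcript states.

```latex
\begin{align*}
T_{1D} &= (P-1)\left(t_s + t_w\frac{N^2}{P}\right)
        = (\sqrt{P}-1)(\sqrt{P}+1)\left(t_s + t_w\frac{N^2}{P}\right),\\
T_{2D} &= (\sqrt{P}-1)\left(1+\frac{\log_2 P}{2}\right)\left(t_s + t_w\frac{N^2}{P}\right)
        = (\sqrt{P}-1)\left(1+\log_2\sqrt{P}\right)\left(t_s + t_w\frac{N^2}{P}\right),\\
\frac{T_{1D}}{T_{2D}} &= \frac{\sqrt{P}+1}{1+\log_2\sqrt{P}} > 1
  \quad\text{for } P > 1, \text{ because } x > \log_2 x \text{ for every } x > 0 .
\end{align*}
```

Under these assumptions the ratio depends only on P, so the 2-D algorithm communicates less than the 1-D one for any message startup cost, per-word cost, and matrix size.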

  14. Systolic Algorithm $T = 2\left(\sqrt{P} - 1\right)\left(t_s + t_w \frac{N^2}{P}\right)$
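To compare the four communication models numerically, the formulas can be written as small functions. This is a sketch under stated assumptions: the function names are hypothetical, P is taken to be a perfect square, the logarithm is base 2, and the square roots are a reading of the garbled slide equations rather than something the transcript confirms.

```python
import math

def t_1d(P, N, ts, tw):
    """1-D decomposition: (P - 1) messages of N^2 / P words each."""
    return (P - 1) * (ts + tw * N**2 / P)

def t_2d(P, N, ts, tw):
    """2-D broadcast-and-rotate algorithm (assumed log base 2)."""
    q = math.sqrt(P)
    return (q - 1) * (1 + math.log2(P) / 2) * (ts + tw * N**2 / P)

def t_redistribute(P, N, ts, tw):
    """Redistribution: (sqrt(P) - 1) messages of N^2 / (P sqrt(P)) words."""
    q = math.sqrt(P)
    return (q - 1) * (ts + tw * N**2 / (P * q))

def t_systolic(P, N, ts, tw):
    """Systolic algorithm: 2 (sqrt(P) - 1) nearest-neighbour messages."""
    q = math.sqrt(P)
    return 2 * (q - 1) * (ts + tw * N**2 / P)

if __name__ == "__main__":
    # Illustrative parameter values only; ts and tw are in arbitrary time units.
    N, ts, tw = 1024, 100.0, 1.0
    for P in (4, 16, 64, 256):
        print(P,
              round(t_1d(P, N, ts, tw)),
              round(t_2d(P, N, ts, tw) + t_redistribute(P, N, ts, tw)),
              round(t_systolic(P, N, ts, tw)))
```

With these formulas the 2-D cost stays below the 1-D cost even after adding a redistribution step, and the systolic cost drops below the 2-D broadcast cost once P exceeds 4, which is consistent with the remark that the 2-D broadcast algorithm is not the best one.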
