Design of parallel algorithms

Design of parallel algorithms: linear equations. Jari Porras. A system of linear equations $a_{0,0}x_0 + \dots + a_{0,n-1}x_{n-1} = b_0$, ..., $a_{n-1,0}x_0 + \dots + a_{n-1,n-1}x_{n-1} = b_{n-1}$, written $Ax = b$, is usually solved in 2 stages: reduction to an upper triangular system $Ux = y$, then back-substitution for $x_{n-1}, \dots, x_0$.

Presentation Transcript


  1. Design of parallel algorithms: Linear equations. Jari Porras

  2. Linear equations $a_{0,0}x_0 + \dots + a_{0,n-1}x_{n-1} = b_0$, ..., $a_{n-1,0}x_0 + \dots + a_{n-1,n-1}x_{n-1} = b_{n-1}$ • In matrix form, $Ax = b$ • Usually solved in 2 stages • reduce to an upper triangular system $Ux = y$ • back-substitution for $x_{n-1}, \dots, x_0$ • Gaussian elimination

  3. Gaussian elimination

  4. Gaussian elimination

  5. Gaussian elimination • Gaussian elimination requires approximately • $n^2/2$ divisions (line 6) • $(n^3/3) - (n^2/2)$ subtractions and multiplications (line 12) • Sequential run time approximately $2n^3/3$ • How is Gaussian elimination performed in parallel?
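The operation counts above can be checked with a minimal sequential sketch. The function name `gaussian_eliminate` and the counters are illustrative assumptions, not the listing from the slides; the sketch normalizes each pivot row (the division step) and then eliminates below it, without pivoting.

```python
def gaussian_eliminate(A, b):
    """Reduce Ax = b to a unit upper triangular system Ux = y (no pivoting).

    Returns (U, y, divisions, mult_subs), where the two counters track
    the divisions on matrix entries and the multiply-subtract pairs
    on matrix entries. Hypothetical helper for illustration only.
    """
    n = len(A)
    U = [row[:] for row in A]  # work on copies, leave the inputs untouched
    y = list(b)
    divisions = 0
    mult_subs = 0
    for k in range(n):
        pivot = U[k][k]
        if pivot == 0:
            raise ZeroDivisionError("zero pivot: partial pivoting is needed")
        # Division step: scale the part of row k to the right of the diagonal.
        for j in range(k + 1, n):
            U[k][j] /= pivot
            divisions += 1
        y[k] /= pivot
        U[k][k] = 1.0
        # Elimination step: clear column k in all rows below row k.
        for i in range(k + 1, n):
            factor = U[i][k]
            for j in range(k + 1, n):
                U[i][j] -= factor * U[k][j]
                mult_subs += 1
            y[i] -= factor * y[k]
            U[i][k] = 0.0
    return U, y, divisions, mult_subs
```

For an n x n matrix the counters come out as n(n-1)/2 divisions and (n-1)n(2n-1)/6 multiply-subtract pairs, which match the leading terms $n^2/2$ and $n^3/3 - n^2/2$ quoted on the slide.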

  6. Parallel Gaussian elimination • Row/column striping vs. checkerboarding? • Block vs. cyclic striping? • Number of processors $p < n$, $p = n$, $p > n$ • Active processors? • Required steps?

  8. Analysis • 1st step (division) • kth iteration requires $n - k - 1$ divisions at processor $P_k$ • 2nd step (one-to-all broadcast) • $(t_s + t_w(n - k - 1)) \log n$ time on a hypercube • 3rd step (elimination) • kth iteration requires $n - k - 1$ multiplications and subtractions at every processor $P_i$ with $i > k$ • $T_P = \frac{3}{2} n(n-1) + t_s n \log n + \frac{1}{2} t_w n(n-1) \log n$
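The expression for the parallel run time follows from summing the three per-iteration costs over $k = 0, \dots, n-1$. The derivation below is a sketch that assumes one row per processor ($p = n$) and unit time per arithmetic operation.

```latex
\begin{align*}
\text{computation:}\quad
  & \sum_{k=0}^{n-1} \bigl[(n-k-1) + 2(n-k-1)\bigr]
    = 3 \sum_{k=0}^{n-1} (n-k-1)
    = \tfrac{3}{2}\, n(n-1) \\
\text{communication:}\quad
  & \sum_{k=0}^{n-1} \bigl(t_s + t_w (n-k-1)\bigr) \log n
    = t_s\, n \log n + \tfrac{1}{2}\, t_w\, n(n-1) \log n \\
\text{total:}\quad
  & T_P = \tfrac{3}{2}\, n(n-1) + t_s\, n \log n
        + \tfrac{1}{2}\, t_w\, n(n-1) \log n
\end{align*}
```

The first bracket counts the $n-k-1$ divisions, and the second counts the $n-k-1$ multiplications plus $n-k-1$ subtractions of the elimination step.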

  9. Analysis • Not cost-optimal, since $pT_P = \Theta(n^3 \log n)$ while the sequential run time is $\Theta(n^3)$ • What is the main reason? • Inefficient parallelization? • What could be done?
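A small numeric sketch makes the lack of cost-optimality visible. It assumes $p = n$, evaluates the run time formula from the previous slide, and compares the cost $pT_P$ with the sequential run time $2n^3/3$. The function names and the default values $t_s = t_w = 1$ are assumptions chosen only for illustration.

```python
import math


def parallel_time(n, ts=1.0, tw=1.0):
    """T_P from the analysis slide, with one row per processor (p = n)."""
    log_n = math.log2(n)
    return 1.5 * n * (n - 1) + ts * n * log_n + 0.5 * tw * n * (n - 1) * log_n


def cost_ratio(n, ts=1.0, tw=1.0):
    """Ratio of the parallel cost p * T_P to the sequential time 2n^3/3."""
    p = n
    sequential = 2 * n**3 / 3
    return p * parallel_time(n, ts, tw) / sequential


# The ratio keeps growing with n (roughly like log n) instead of
# levelling off at a constant, so the cost is not Theta(n^3).
for n in (64, 1024, 16384):
    print(n, round(cost_ratio(n), 2))
```

A cost-optimal formulation would keep this ratio bounded by a constant as $n$ grows.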

  12. Analysis • Pipelined operation • the n iterations overlap instead of running one after another • the last iteration starts in the nth step and is completed in constant time (it changes only the bottom right corner element) • $\Theta(n)$ steps • Each step takes $O(n)$ time • Thus parallel run time is $O(n^2)$ and cost is $\Theta(n^3)$ • Cost-optimal!

  13. $p < n$? • Block striping • several rows per processor • Does the activity change? • Block vs. cyclic striping

  16. Analysis • With block striping • a processor whose rows all belong to the active part performs $(n - k - 1)n/p$ multiplications and subtractions • if the pipelined version is used, the number of arithmetic operations, $2(n - k - 1)n/p$, is higher than the number of words communicated, $n - k - 1$ • computation dominates • parallel run time approximately $n^3/p$
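The difference between block and cyclic striping can be sketched by counting how many active rows (rows below the pivot row k) each processor owns. The helper names below are hypothetical, and the block mapping assumes that p divides n.

```python
def owner_block(row, n, p):
    """Block striping: processor i owns rows i*n/p .. (i+1)*n/p - 1."""
    return row // (n // p)  # assumes p divides n


def owner_cyclic(row, n, p):
    """Cyclic striping: rows are dealt out to processors round-robin."""
    return row % p


def active_rows(n, p, k, owner):
    """Number of rows below the pivot row k held by each processor."""
    counts = [0] * p
    for row in range(k + 1, n):
        counts[owner(row, n, p)] += 1
    return counts


def max_work(n, p, k, owner):
    """Multiply-subtract pairs done by the busiest processor in iteration k."""
    return max(active_rows(n, p, k, owner)) * (n - k - 1)
```

With n = 8, p = 4 and k = 3, `active_rows` gives `[0, 0, 2, 2]` for the block mapping and `[1, 1, 1, 1]` for the cyclic mapping: block striping leaves half of the processors idle, while cyclic striping keeps all of them busy. The busiest block-striped processor does $(n - k - 1)n/p = 8$ multiply-subtract pairs, which agrees with the count on the slide.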

  17. Checkerboard partitioning • Use an $n \times n$ mesh • Same approach as before, but • requires two broadcasts (row-wise and column-wise) • Analyse the cost-optimality • How about pipelining?

  19. Pipelined checkerboard

  20. Pipelined checkerboard

  21. $p < n^2$ • Map the matrix onto a $\sqrt{p} \times \sqrt{p}$ mesh by using block checkerboard partitioning • Remember the effect of active processors! • Each processor performs $n^2/p$ multiplications and subtractions and communicates $n/\sqrt{p}$ words • computation dominates!

  24. Partial pivoting • The basic algorithm fails if any element on the diagonal is zero • Partial pivoting helps • select the row that has the largest-magnitude element in the current column and exchange it with the current row • What is the effect on the partitioning strategy? • How about pipelining?
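The row selection and exchange described above can be sketched as a helper that runs at the start of iteration k. The name `pivot_rows` and the in-place list representation are assumptions made for this example.

```python
def pivot_rows(A, b, k):
    """Partial pivoting for iteration k, performed in place.

    Finds the row at or below k with the largest absolute value in
    column k, swaps it with row k (in both A and b), and returns the
    index of the row that was chosen. Hypothetical helper.
    """
    n = len(A)
    pivot_row = max(range(k, n), key=lambda i: abs(A[i][k]))
    if A[pivot_row][k] == 0:
        raise ValueError("column is zero at and below the diagonal: matrix is singular")
    if pivot_row != k:
        A[k], A[pivot_row] = A[pivot_row], A[k]
        b[k], b[pivot_row] = b[pivot_row], b[k]
    return pivot_row
```

Under row striping, the search over column k touches one element per row and so spans all processors holding active rows, and the exchange moves a row between two of them. That is why pivoting interacts with the partitioning and pipelining questions raised on the slide.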

  25. Back-substitution • The second stage of solving linear equations • Back-substitution is used to determine the vector x • Complexity $\Theta(n^2)$ • use the same partitioning scheme that suits Gaussian elimination
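A minimal sequential sketch of back-substitution, assuming U is upper triangular with nonzero diagonal entries (the function name `back_substitute` is illustrative):

```python
def back_substitute(U, y):
    """Solve Ux = y for upper triangular U, from x[n-1] down to x[0]."""
    n = len(U)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        # Subtract the contributions of the already computed x[i+1..n-1].
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / U[i][i]
    return x
```

Row i needs n - i - 1 multiplications and subtractions, so the total work is about $n^2/2$ of each, consistent with the $n^2$ complexity stated on the slide.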

  26. Back-substitution
