CS 290H Lecture 15 GESP concluded

ismael

Presentation Transcript


  1. CS 290H Lecture 15: GESP concluded
     • Final presentations for survey projects next Tue and Thu
     • 20-minute talk with at least 5 min for questions and discussion
     • Email me with your preferred day – first come first served
     • Course evaluations at end of class today

  2. SuperLU-dist: GE with static pivoting [Li, Demmel]
     • Target: Distributed-memory multiprocessors
     • Goal: No pivoting during numeric factorization

  3. SuperLU-dist: Distributed static data structure
     [Figure: process(or) mesh and block cyclic matrix layout of L and U, with blocks labeled by process number 0-5]

  4. GESP: Gaussian elimination with static pivoting
     • PA = LU
     • Sparse, nonsymmetric A
     • P is chosen numerically in advance, not by partial pivoting!
     • After choosing P, can permute PA symmetrically for sparsity: $Q(PA)Q^T = LU$

  5. SuperLU-dist: GE with static pivoting [Li, Demmel]
     • Target: Distributed-memory multiprocessors
     • Goal: No pivoting during numeric factorization
     • Permute A unsymmetrically to have large elements on the diagonal (using weighted bipartite matching)
     • Scale rows and columns to equilibrate
     • Permute A symmetrically for sparsity
     • Factor A = LU with no pivoting, fixing up small pivots:
       • if $|a_{ii}| < \varepsilon \cdot \|A\|$ then replace $a_{ii}$ by $\varepsilon^{1/2} \cdot \|A\|$
     • Solve for x using the triangular factors: Ly = b, Ux = y
     • Improve solution by iterative refinement
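
The small-pivot fix-up in the factorization step can be sketched on a dense matrix. This is only an illustration under stated assumptions, not SuperLU-dist's implementation: the function name `lu_static_pivot`, the use of the infinity norm for ||A||, and keeping the sign of the replaced pivot are all choices made here, and the real code works on distributed sparse supernodal blocks.

```python
import math

def lu_static_pivot(A, eps=2.0 ** -52):
    """Factor A = L*U with no row swaps, perturbing tiny pivots.

    A is a list of row lists. Returns (L, U, n_fixed), where n_fixed
    counts the pivots that were replaced.
    """
    n = len(A)
    # Assumption: ||A|| is the infinity norm (max absolute row sum);
    # the slide does not say which norm is meant.
    norm_a = max(sum(abs(v) for v in row) for row in A)
    U = [row[:] for row in A]
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    n_fixed = 0
    for k in range(n):
        if abs(U[k][k]) < eps * norm_a:
            # Replace the tiny pivot by sqrt(eps)*||A||.
            # Assumption: keep the sign of the original pivot.
            sign = -1.0 if U[k][k] < 0 else 1.0
            U[k][k] = sign * math.sqrt(eps) * norm_a
            n_fixed += 1
        for i in range(k + 1, n):
            L[i][k] = U[i][k] / U[k][k]
            for j in range(k, n):
                U[i][j] -= L[i][k] * U[k][j]
    return L, U, n_fixed
```

Because a replaced pivot means L*U is the factorization of a slightly perturbed matrix rather than of A itself, the solve is followed by iterative refinement, as the last bullet says.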

  7. Row permutation for heavy diagonal [Duff, Koster]
     • Represent A as a weighted, undirected bipartite graph (one node for each row and one node for each column)
     • Find matching (set of independent edges) with maximum product of weights
     • Permute rows to place matching on diagonal
     • Matching algorithm also gives a row and column scaling to make all diagonal elements = 1 and all off-diagonal elements <= 1 in magnitude
     [Figure: bipartite graph on rows 1-5 and columns 1-5, with matrices A and PA]
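
To make the matching objective concrete, here is a brute-force sketch that tries every row permutation of a tiny matrix and keeps the one with the largest product of diagonal magnitudes. It is not the Duff-Koster algorithm: real codes solve this as a weighted bipartite matching problem, while this version costs O(n!) and is only meant for hand-sized examples. The function names are hypothetical, and the scaling step is not shown.

```python
from itertools import permutations

def max_product_row_permutation(A):
    """Return (perm, prod): perm[j] is the row matched to column j,
    chosen to maximize the product of |A[perm[j]][j]|."""
    n = len(A)
    best_perm, best_prod = None, 0.0
    for perm in permutations(range(n)):
        prod = 1.0
        for j in range(n):
            prod *= abs(A[perm[j]][j])
        if prod > best_prod:
            best_perm, best_prod = perm, prod
    # best_perm stays None if every permutation hits a zero entry.
    return best_perm, best_prod

def permute_rows(A, perm):
    """Form PA: row j of the result is row perm[j] of A."""
    return [A[i][:] for i in perm]

A = [[0.0, 2.0, 0.0],
     [3.0, 0.0, 1.0],
     [0.0, 1.0, 4.0]]
perm, prod = max_product_row_permutation(A)
PA = permute_rows(A, perm)
```

For this A the only permutation with a nonzero diagonal swaps the first two rows, so PA has 3, 2, 4 on its diagonal.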

  12. Iterative refinement to improve solution
      Usually 0 to 3 steps are enough
      • Iterate:
        • r = b – A*x
        • backerr = $\max_i \bigl( |r_i| / (|A| \cdot |x| + |b|)_i \bigr)$
        • if backerr < ε or backerr > lasterr/2 then stop iterating
        • solve L*U*dx = r
        • x = x + dx
        • lasterr = backerr
        • repeat
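
The loop above translates almost line for line into code. The sketch below uses dense lists and a unit-lower-triangular L; the names `lu_solve` and `refine`, the `max_steps` safety cap, and the deliberately perturbed U in the example are assumptions for illustration, not part of the lecture.

```python
def lu_solve(L, U, b):
    """Solve L*U*x = b, with L unit lower triangular and U upper triangular."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):                      # forward solve: L y = b
        y[i] = b[i] - sum(L[i][j] * y[j] for j in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):            # back solve: U x = y
        x[i] = (y[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x

def refine(A, L, U, b, eps=2.0 ** -52, max_steps=20):
    """Iterative refinement with the componentwise backward error test."""
    n = len(b)
    x = lu_solve(L, U, b)
    lasterr = float("inf")
    backerr = float("inf")
    steps = 0
    while steps < max_steps:                # cap is an added safeguard
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        denom = [sum(abs(A[i][j]) * abs(x[j]) for j in range(n)) + abs(b[i])
                 for i in range(n)]
        backerr = max((abs(r[i]) / denom[i] for i in range(n) if denom[i] > 0),
                      default=0.0)
        if backerr < eps or backerr > lasterr / 2:
            break                           # converged, or no longer halving
        dx = lu_solve(L, U, r)
        x = [x[i] + dx[i] for i in range(n)]
        lasterr = backerr
        steps += 1
    return x, backerr, steps

# A = [[4, 1], [2, 3]] has exact factors L, U with U[0][0] = 4.
# Perturbing that pivot to 4.1 mimics a factorization of a nearby matrix.
A = [[4.0, 1.0], [2.0, 3.0]]
L = [[1.0, 0.0], [0.5, 1.0]]
U = [[4.1, 1.0], [0.0, 2.5]]
x, backerr, steps = refine(A, L, U, [1.0, 2.0])
```

The perturbation here is far larger than a real pivot fix-up would introduce, so this example takes more steps than the 0 to 3 quoted on the slide; it still converges to the solution of the original system, x = (0.1, 0.6).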

  13. Convergence analysis of iterative refinement
      Let $C = I - A(LU)^{-1}$ [ so $A = (I - C) \cdot (LU)$ ]
      $x_1 = (LU)^{-1} b$
      $r_1 = b - A x_1 = (I - A(LU)^{-1}) b = C b$
      $dx_1 = (LU)^{-1} r_1 = (LU)^{-1} C b$
      $x_2 = x_1 + dx_1 = (LU)^{-1} (I + C) b$
      $r_2 = b - A x_2 = (I - (I - C) \cdot (I + C)) b = C^2 b$
      . . .
      In general, $r_k = b - A x_k = C^k b$
      Thus $r_k \to 0$ if |largest eigenvalue of C| < 1.
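
The step from $r_2$ to the general $r_k$ is an induction on the update $x_{k+1} = x_k + dx_k$ with $dx_k = (LU)^{-1} r_k$. A short sketch, using only the definitions on the slide:

```latex
\begin{align*}
r_{k+1} &= b - A x_{k+1} = b - A\bigl(x_k + (LU)^{-1} r_k\bigr) \\
        &= r_k - A (LU)^{-1} r_k = \bigl(I - A(LU)^{-1}\bigr) r_k = C\, r_k, \\
r_k     &= C^{k-1} r_1 = C^{k} b .
\end{align*}
```

So $r_k \to 0$ for every b exactly when the spectral radius of C is below 1. A sufficient check is $\|C\| < 1$ in any induced norm, since $\|r_k\| \le \|C\|^k \|b\|$.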

  15. Directed graph
      • A is square, unsymmetric, nonzero diagonal
      • Edges from rows to columns
      • Symmetric permutations $PAP^T$
      [Figure: matrix A and its directed graph G(A) on vertices 1-7]

  16. Undirected graph, ignoring edge directions
      • Overestimates the nonzero structure of A
      • Sparse GESP can use symmetric permutations (min degree, nested dissection) of this graph
      [Figure: matrix $A+A^T$ and its undirected graph $G(A+A^T)$ on vertices 1-7]

  17. Symbolic factorization of undirected graph
      • Overestimates the nonzero structure of L+U
      [Figure: chol($A+A^T$) and its filled graph $G^+(A+A^T)$ on vertices 1-7]

  18. Symbolic factorization of directed graph
      • Add fill edge a -> b if there is a path from a to b through lower-numbered vertices.
      • Sparser than $G^+(A+A^T)$ in general.
      • But what’s a good ordering for $G^+(A)$?
      [Figure: matrix A, factors L+U, and the filled directed graph $G^+(A)$ on vertices 1-7]
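
The fill rule can be checked on a small example by eliminating vertices in order: when vertex k is eliminated, every later vertex i with an edge i -> k gains an edge to every later vertex j with an edge k -> j. The sketch below is a naive set-based version for illustration only. The function names and the 0-based vertex numbering are assumptions made here, and an edge (i, j) stands for a nonzero in row i, column j.

```python
def symbolic_fill(n, edges):
    """Return the edge set of the filled graph (original plus fill edges)
    for elimination order 0, 1, ..., n-1."""
    filled = set(edges)
    for k in range(n):
        preds = [i for i in range(k + 1, n) if (i, k) in filled]
        succs = [j for j in range(k + 1, n) if (k, j) in filled]
        for i in preds:
            for j in succs:
                if i != j:
                    filled.add((i, j))      # path i -> k -> j creates fill
    return filled

def symmetrize(edges):
    """Edge set of G(A + A^T): every edge gets its reverse."""
    return set(edges) | {(j, i) for (i, j) in edges}

# Column 0 has off-diagonal nonzeros in rows 1 and 2; row 0 has none.
edges = {(1, 0), (2, 0)}
directed = symbolic_fill(3, edges)
undirected = symbolic_fill(3, symmetrize(edges))
```

Here the directed model predicts no fill at all, while the symmetrized model adds the edges (1, 2) and (2, 1), which illustrates why $G^+(A+A^T)$ is an overestimate.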

  19. Question: Preordering for GESP
      • Use directed graph model, less well understood than symmetric factorization
      • Symmetric: bottom-up, top-down, hybrids
      • Nonsymmetric: mostly bottom-up
      • Symmetric: finding the best ordering is NP-complete, but approximation theory is based on graph partitioning (separators)
      • Nonsymmetric: no approximation theory is known; partitioning is not the whole story
      • Good approximations and efficient algorithms both remain to be discovered

  20. Remarks on nonsymmetric GE
      • Multifrontal tends to be faster but use more memory
      • Unsymmetric-pattern multifrontal
        • Lots more complicated, not simple elimination tree
        • Sequential and SMP versions in UMFPACK and WSMP (see web links)
        • Distributed-memory unsymmetric-pattern multifrontal is a research topic
      • Combinatorial preliminaries are important: ordering, etree, symbolic factorization, matching, scheduling
        • not well understood in many ways
        • also, mostly not done in parallel
      • Not mentioned: symmetric indefinite problems
      • Direct-methods technology is also used in preconditioners for iterative methods
