CS 290H Lecture 13: Column approximate minimum degree; Other approaches to nonsymmetric LU
  • Final project progress report due today
  • Homework 3 due this Sunday 21 November
  • Read “Computing the block triangular form of a sparse matrix” (reader #6)
Column Preordering for Sparsity

[Figure: block diagram of the factorization PAQᵀ = LU]

  • PAQᵀ = LU: Q preorders columns for sparsity, P is row pivoting
  • Column permutation of A ↔ symmetric permutation of AᵀA (or G∩(A))

Column Intersection Graph

[Figure: a 5x5 matrix A, the pattern of AᵀA, and the column intersection graph G∩(A) on vertices 1-5]

  • G∩(A) = G(AᵀA) if no cancellation (otherwise G∩(A) ⊇ G(AᵀA))
  • Permuting the rows of A does not change G∩(A)

Filled Column Intersection Graph

[Figure: the matrix A, the filled pattern chol(AᵀA), and the filled column intersection graph G∩⁺(A) on vertices 1-5]

  • G∩⁺(A) = symbolic Cholesky factor of AᵀA
  • In PA = LU, G(U) ⊆ G∩⁺(A) and G(L) ⊆ G∩⁺(A)
  • Tighter bound on L from symbolic QR
  • Bounds are best possible if A is strong Hall

Column Preordering for Sparsity

[Figure: block diagram of PAQᵀ = LU, repeated]

  • PAQᵀ = LU: Q preorders columns for sparsity, P is row pivoting
  • Column permutation of A ↔ symmetric permutation of AᵀA (or G∩(A))
  • Symmetric ordering: approximate minimum degree
  • But forming AᵀA is expensive (sometimes bigger than L+U)

Column Approximate Minimum Degree [Matlab 6]

[Figure: the augmented matrix aug(A), built from A and Aᵀ with "row" and "col" node blocks, and its graph G(aug(A))]

  • Eliminate "row" nodes of aug(A) first
  • Then eliminate "col" nodes by approximate min degree
  • 4x speed and 1/3 better ordering than Matlab-5 min degree; 2x speed of AMD on AᵀA
  • Can also use other orderings, e.g. nested dissection on aug(A)
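The first two bullets can be simulated without ever forming AᵀA: eliminating a vertex of a graph joins its neighbors into a clique, so eliminating every "row" node of the bipartite graph of A leaves exactly G∩(A) on the "col" nodes. A sketch on a made-up 4x3 pattern:

```python
import numpy as np
from itertools import combinations

# Made-up 4x3 pattern; "row" nodes r0..r3, "col" nodes c0..c2.
A = np.array([[1, 1, 0],
              [0, 1, 1],
              [1, 0, 1],
              [0, 0, 1]])
m, n = A.shape

# In the bipartite graph of A, row node i is adjacent to column node j
# iff A[i, j] != 0.  Eliminating a vertex joins its neighbors into a
# clique, so eliminating all row nodes leaves a graph on columns only.
col_adj = {j: set() for j in range(n)}
for i in range(m):                         # eliminate the "row" nodes first
    cols = np.flatnonzero(A[i])
    for j, k in combinations(cols, 2):     # its column neighbors form a clique
        col_adj[int(j)].add(int(k))
        col_adj[int(k)].add(int(j))

# What remains on the column nodes is exactly G∩(A), the graph of AᵀA:
AtA = (A.T @ A) != 0
for j in range(n):
    assert col_adj[j] == set(int(i) for i in np.flatnonzero(AtA[j])) - {j}
```

COLAMD's actual data structure is more refined (it tracks degrees approximately and never stores the cliques explicitly), but this is the graph-theoretic content of the bullets.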

Column Elimination Tree

[Figure: the matrix A, the filled pattern chol(AᵀA), and the column elimination tree T∩(A) on vertices 1-5]

  • Elimination tree of AᵀA (if no cancellation)
  • Depth-first spanning tree of G∩⁺(A)
  • Represents column dependencies in various factorizations
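The column elimination tree can be computed by running the standard elimination-tree (etree) algorithm, with ancestor path compression, on the pattern of AᵀA. A dense-pattern Python sketch on a made-up matrix:

```python
import numpy as np

def etree(S):
    """Parent vector of the elimination tree of symmetric-pattern S."""
    n = S.shape[0]
    parent = [-1] * n
    ancestor = [-1] * n                      # path-compressed ancestor links
    for k in range(n):
        for i in np.flatnonzero(S[k, :k]):   # nonzeros left of the diagonal
            r = int(i)
            while ancestor[r] not in (-1, k):  # climb to the current root
                nxt = ancestor[r]
                ancestor[r] = k              # compress the path toward k
                r = nxt
            if ancestor[r] == -1:            # r has no parent yet: it is k
                ancestor[r] = k
                parent[r] = k
    return parent

# Column elimination tree T∩(A) = etree of the AᵀA pattern (made-up A):
A = np.array([[1, 1, 0],
              [0, 1, 1],
              [1, 0, 1]])
col_etree = etree((A.T @ A) != 0)            # here all columns intersect,
                                             # so the tree is the path [1, 2, -1]
```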

Column Dependencies in PA = LU

[Figure: the subtree T[k] of the column elimination tree, containing column j below column k]

  • If column j modifies column k, then j ∈ T[k], the subtree rooted at k.
  • If A is strong Hall then, for some pivot sequence, every column modifies its parent in T∩(A).
Shared Memory SuperLU-MT
  • 1D data layout across processors
  • Dynamic assignment of panel tasks to processors
  • Task tree follows column elimination tree
  • Two sources of parallelism:
    • Independent subtrees
    • Pipelining dependent panel tasks
  • Single processor “BLAS 2.5” SuperLU kernel
  • Good speedup for 8-16 processors
  • Scalability limited by 1D data layout
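For context, the sequential SuperLU kernel mentioned above is also what SciPy wraps; its `permc_spec` argument selects the column preordering discussed earlier. A usage sketch on a made-up system (assumes SciPy is available):

```python
import numpy as np
from scipy.sparse import csc_matrix
from scipy.sparse.linalg import splu

# Made-up sparse system; permc_spec="COLAMD" requests the column
# approximate-minimum-degree preordering described above.
A = csc_matrix(np.array([[4., 0., 1., 0.],
                         [0., 3., 0., 2.],
                         [1., 0., 5., 0.],
                         [0., 2., 0., 6.]]))
lu = splu(A, permc_spec="COLAMD")
x = lu.solve(np.array([1., 2., 3., 4.]))
assert np.allclose(A @ x, [1., 2., 3., 4.])
```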
SuperLU-MT Performance Highlight (1999)

[Figure: results for a 3-D flow calculation (matrix EX11, order 16614)]

Left-looking Column LU Factorization

for column j = 1 to n do
    solve ( L 0 ; L I ) ( uj ; lj ) = aj  for uj, lj
    pivot: swap ujj and an elt of lj
    scale: lj = lj / ujj

[Figure: the matrices L, U, and A with column j highlighted]

  • Column j of A becomes column j of L and U
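The loop above can be imitated in dense arithmetic: each iteration solves a triangular system with the columns of L computed so far, pivots on the largest entry of lj, then scales. A NumPy sketch, not SuperLU's actual sparse data structures:

```python
import numpy as np

def left_looking_lu(A):
    """Left-looking LU with partial pivoting, dense sketch.
    Returns P, L, U with P @ A = L @ U (A assumed square, nonsingular)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    L = np.eye(n)
    U = np.zeros((n, n))
    perm = np.arange(n)
    for j in range(n):
        a = A[perm, j]                         # column j in current row order
        for k in range(j):                     # solve with known columns of L
            a[k + 1:] -= L[k + 1:, k] * a[k]
        p = j + int(np.argmax(np.abs(a[j:])))  # pivot: largest elt of lj
        a[[j, p]] = a[[p, j]]                  # swap ujj with that element
        perm[[j, p]] = perm[[p, j]]
        L[[j, p], :j] = L[[p, j], :j]          # keep earlier columns consistent
        U[:j + 1, j] = a[:j + 1]               # top of the column goes to U
        L[j + 1:, j] = a[j + 1:] / a[j]        # scale: lj = lj / ujj
    return np.eye(n)[perm], L, U

A0 = np.array([[0., 1., 2.], [3., 4., 5.], [6., 7., 9.]])
P, L, U = left_looking_lu(A0)
assert np.allclose(P @ A0, L @ U)
```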
Symmetric-pattern multifrontal factorization

[Figure: a 9x9 matrix A with symmetric pattern, its graph G(A), and its elimination tree T(A)]
Symmetric-pattern multifrontal factorization

[Figure: G(A) and T(A) for the 9x9 example]

For each node of T from leaves to root:

  • Sum own row/col of A with children's Update matrices into Frontal matrix
  • Eliminate current variable from Frontal matrix, to get Update matrix
  • Pass Update matrix to parent
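The three steps above can be played out in dense arithmetic. The sketch below does the SPD/Cholesky case for simplicity (the symmetric-pattern LU version carries a row and a column per variable); frontal index sets are discovered on the fly, and each update matrix is passed to its smallest remaining index, which is the elimination-tree parent:

```python
import numpy as np

def multifrontal_cholesky(A):
    """Multifrontal Cholesky of an SPD matrix A (dense-arithmetic sketch)."""
    n = A.shape[0]
    L = np.zeros_like(A, dtype=float)
    pending = {j: [] for j in range(n)}      # update matrices waiting at node j
    for j in range(n):                       # leaves to root = 0 .. n-1 here
        # frontal index set: row/col j of A plus children's update indices
        idx = set(int(i) for i in np.flatnonzero(A[j:, j]) + j) | {j}
        for rows, _ in pending[j]:
            idx |= set(rows)
        idx = sorted(idx)
        pos = {i: t for t, i in enumerate(idx)}
        # assemble the frontal matrix: own row/col of A, extend-add updates
        F = np.zeros((len(idx), len(idx)))
        F[0, :] = A[j, idx]
        F[:, 0] = A[idx, j]
        for rows, U in pending[j]:
            t = [pos[i] for i in rows]
            F[np.ix_(t, t)] += U
        # eliminate variable j: one dense Cholesky step on F
        L[idx, j] = F[:, 0] / np.sqrt(F[0, 0])
        if len(idx) > 1:
            update = F[1:, 1:] - np.outer(F[1:, 0], F[1:, 0]) / F[0, 0]
            pending[idx[1]].append((idx[1:], update))  # idx[1] = etree parent
    return L

# Made-up SPD test matrix (diagonally dominant, symmetric):
A = np.array([[4., 1., 0., 1.],
              [1., 5., 2., 0.],
              [0., 2., 6., 1.],
              [1., 0., 1., 7.]])
L = multifrontal_cholesky(A)
assert np.allclose(L @ L.T, A)
```

The `pending` lists mirror the stack of update matrices from the slides; a real multifrontal code precomputes the tree and uses supernodes to amalgamate fronts.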
Symmetric-pattern multifrontal factorization

[Figure: eliminating node 1; frontal matrix F1 = A1 on indices {1, 3, 7} yields update matrix U1 on {3, 7}]

For each node of T from leaves to root:

  • Sum own row/col of A with children's Update matrices into Frontal matrix
  • Eliminate current variable from Frontal matrix, to get Update matrix
  • Pass Update matrix to parent
Symmetric-pattern multifrontal factorization

[Figure: F1 = A1 → U1 as before; eliminating node 2, frontal matrix F2 = A2 on indices {2, 3, 9} yields update matrix U2 on {3, 9}]

For each node of T from leaves to root:

  • Sum own row/col of A with children's Update matrices into Frontal matrix
  • Eliminate current variable from Frontal matrix, to get Update matrix
  • Pass Update matrix to parent
Symmetric-pattern multifrontal factorization

[Figure: eliminating node 3; frontal matrix F3 = A3 + U1 + U2 on indices {3, 7, 8, 9} yields update matrix U3 on {7, 8, 9}]
Symmetric-pattern multifrontal factorization

[Figure: the filled graph G⁺(A), the elimination tree T(A), and the factors L+U for the 9x9 example]
Symmetric-pattern multifrontal factorization

[Figure: G(A) and T(A) for the 9x9 example]
  • Really uses supernodes, not nodes
  • All arithmetic happens on dense square matrices.
  • Needs extra memory for a stack of pending update matrices
  • Potential parallelism:
    • between independent tree branches
    • parallel dense ops on frontal matrix
MUMPS: distributed-memory multifrontal [Amestoy, Duff, L'Excellent, Koster, Tuma]
  • Symmetric-pattern multifrontal factorization
  • Parallelism both from tree and by sharing dense ops
  • Dynamic scheduling of dense op sharing
  • Symmetric preordering
  • For nonsymmetric matrices:
    • optional weighted matching for heavy diagonal
    • expand nonzero pattern to be symmetric
    • numerical pivoting only within supernodes if possible (doesn’t change pattern)
    • failed pivots are passed up the tree in the update matrix