Similarities in the Structure Prediction of Sparse QR Factorization and Sparse LU Factorization with Partial Pivoting

Similarities in the Structure Prediction ofSparse QR Factorization and Sparse LU Factorization with Partial Pivoting Laura Grigori INRIA – FRANCE Joint work with John Gilbert, UCSB

Talk Outline • Introduction • Sparse direct methods. • Combinatorial tools • Directed graphs (expose path structure in Cholesky and LU factorization). • Bipartite graphs (expose path structure in QR and LU factorization with partial pivoting). • Structure prediction • Consider reducible matrices. • Present tight bounds for the factors Q, R. • Identify tight bounds for the factors L, U of PA=LU. • Conclusions and future work • Algorithms, experiments.

U 1 2 3 4 L 5 6 7 Sparse LU Factorization for i = 1 to n-1 for j = i+1 to n A(j,i) = A(j,i)/A(i,i) for k = i+1 to n st A(i,k) != 0 for j = i+1 to n st A(j,i) != 0 A(j,k) = A(j,k) - A(j,i) * A(i,k) • Order equations and variables. • Structure prediction and symbolic factorization: Identify nonzero structure of L, U factors. • Numeric factorization (most time consuming): Use supernodes to exploit memory hierarchy. Benefit from additional parallelism due to sparsity. • Triangular solves + iterative refinement.

Sparse QR factorization by Householder transformations • Apply orthogonal transformations to annihilate subdiagonal entries • Householder transformation • Traditional Householder algorithm eliminates columns of A from left to right (right-looking) using one Householder transformation for each column.

Sparse Direct Methods • The fill-in is more important for QR than for LU. • Since , the QR factorization of A is related to the Cholesky factorization of . • The structure of Q, R (for A=QR factorization) and L, U (for A = LU factorization) can be determined before the numerical computation. • For PA = LU, nonzero structure of L and U is determined during the numerical factorization. • A structure prediction phase allows to identify bounds of L and U. • Studied by George, Gilbert, Hare, Johnson, Liu, Ng, Olesky, Pothen … for LU, matrices are irreducible (strong Hall property). • Our contribution: identified tight bounds for matrices that are reducible (Hall property).

1 2 3 4 1’ 2’ 3’ 4’ A 2 1 3 4 G(A) Directed graphs, bipartite graphs, matchings • A is n x n sparse, unsymmetric matrix. Directed graph G (A): - n vertices. - edge (i,j) for each nonzero Aij. • A is m x n sparse, unsymmetric matrix. Bipartite graph H(A): - m row vertices and n column vertices. - edge (r,c) for each nonzero Arc. Matching M on H(A): - set of edges, no two of which are adjacent. - M is a column complete matching if it has n edges. - M is a perfect matching if it has m=n edges. 1’ 1 2’ 2 3’ 3 4’ 4 H(A)

Hall property • A is Hall if every n-by-k submatrix (1  k  n) has at least k nonzero rows. • A is strong Hall if every n-by-k submatrix (1  k  n) has at least k+1 nonzero rows. • Facts: • Every nonsingular matrix is a Hall matrix. • A Hall matrix is reducible, while a strong Hall matrix is irreducible. • The rows (or columns) of a Hall matrix can be permuted so that the permuted matrix has a zero-free diagonal. • A bipartite graph has a column-complete matching if and only if it has the Hall property. • Coleman/Edenbrandt/Gilbert (‘86).

Results in Structure Prediction A – strong Hall matrix - Filled column intersection graph G(A): symbolic Cholesky factor of ATA; G(U)  G(A) and G(L)  G(A) bound is best possible for the structure of U and R. A – Hall matrix Use Dulmage-Mendelsohn to decompose A into strong Hall submatrices. Structure prediction for Hall matrices. Tight prediction for Q and R [Hare, Johnson, Olesky, van den Driessche, Pothen] -It identifies zeros forced by the orthogonality required in Q. Tight bounds for L and U of PA = LU [Grigori, Gilbert] -It identifies numerical cancellations, related to submatrices of A that are structurally singular and that force entries in L and U to be zero.

A 4 Hall Sets • A is a m x n Hall matrix with m ≥ n. Definition Hall Set [Hare et al. 93]: k columns form a Hall set if they have colectively nonzeros in exactly k rows. Notation: Sw = Hall set of maximum cardinality in the first w columns of A. sw = row vertices covered by columns in Sw. Lemma: Let S be a Hall set of A, and M be a column complete matching of H(A). Then each column vertex c of S is matched by M to a row vertex r’ of s. 1 2 3 4 5 6 Example: A4= A[1’:6’][1:4] Sw={3}, sw={3’} H(A4)-{3} is a strong Hall graph. M={(5’1),(6’2),(3’3),(2’4)} 1’ 1’ 1 2’ 2’ 2 3’ 3’ 3 4’ 4’ 4 5’ 5’ 6’ 6’

1 2 3 4 5 6 1’ 2’ 3’ 4’ 5’ 6’ A Sparse QR Factorization [Hare et al.] (1) Suppose:-A is Hall matrix factored as A=QR. Identify zeros forced by the orthogonality required in Q. Rely on two facts: • Qj span{A1, . . . , Aj} and • Qj is orthogonal to A1, . . . , Aj-1. Example: fill element. bogus fill. 1 2 3 4 5 6 1 2 3 4 5 6 1’ 1’ 2’ 2’ 3’ 3’ 4’ 4’ 5’ 5’ 6’ 6’ Q R

1 2 3 4 5 6 1’ 2’ 3’ 4’ 5’ 6’ A Sparse QR Factorization [Hare et al.] (2) Suppose:the QR factorization arrives at computation of column c of Q. Result: H(A) contains a path T[r’:c] with vertices in Sc-1 υ sc-1 => Qrc = 0 regardless the nonzero values of A. Example: S2 υ s2= {2 } υ { 2’ } => Q23 = 0. - Qj is orthogonal to every Ah  Sc-1. - The columns Ah[sc-1] of Sc-1 span a subspace of dimension |sc-1|. =>Qj[sc-1] is orthogonal to every vector in the above subspace, and so must be zero. 1 2 3 4 5 6 1 2 3 4 5 6 1’ 1’ 2’ 2’ 3’ 3’ 4’ 4’ 5’ 5’ 6’ 6’ Q R

Sparse QR Factorization [Hare et al.] (3) Structure prediction for Q: Theorem:H is graph of a Hall matrix and there is a path { r’,c1, …, r’t ,c} with intermediate column vertices in {1,..,c-1} and no vertex in Sc-1 υ sc-1 => there is A with H(A) = H and of rank n such that A = QR and Qrc≠ 0. Structure prediction for R: Theorem: H is graph of a Hall matrix and the structure of Q is determined as above. => the structure of R is given by the upper triangular part of .

H(A2) H(B) 1’ 1 1’ 1 2’ 2 2’ 2 6’ 3 Characterization of fill during LU elimination Suppose:-A is Hall matrix factored as PA=LU. Result: H(A) contains a path T[r’:c] with intermediate vertices in Sc-1 υ sc-1 => for any permutation P with r’ permuted in position q’, Lqc = 0. Example: There is a path T[6’:3] = { 6’,1,1’,3 } with vertex in S2 υ s2= {2 } υ { 1’ } => L63 = 0 regardless the nonzero values of A. 1 2 3 4 5 6 A2= A[1’:2’][1:2] B = A[1’,2’,6’][1:3] L63 = det B / det A2 B is structurally singular =>L63 = 0 1’ 2’ 3’ 4’ 5’ 6’ A

Characterization of fill during LU elimination (cont.) • A2 is nonsingular => there is a perfect matching M2 in A2. Proof by contradiction: suppose L63 ≠ 0 • B is nonsingular => there is a perfect matching MB in B. • 6’, 3 are vertices matched by MB, but not matched by M2. Trace a path from 6’ to 3 as follows: 1. i = 0, r’0 = 6’. 2. (r’i , ci) MB, and r’i s2, then ciS2. 3. stop if ci = 5. 4. (ci, r’i+1) M2 and ciS2 , then r’i+1 s2. 5. i = i + 1 and go to 2. Contradiction The path traced has no intermediate vertices in S2 υ s2= {2 } υ { 1’ } => 1 2 3 4 5 6 1’ 2’ H(A2) H(B) 3’ 1’ 1 1’ 1 4’ 2’ 2 2’ 2 5’ 6’ 3 6’ A

1 2 3 4 5 6 7 1’ 2’ 3’ 4’ 5’ 6’ 7’ Upper bound for nonzero structure of L Theorem: H is graph of a Hall matrix with nonzero diagonal and there is a path { r’,c1, …, r’t ,c} with intermediate column vertices in {1,..,c-1} and no vertex in Sc-1 υ sc-1 => there is P and A such that r’ is permuted in some position q≥c and Lqc≠ 0. Example: Path T={7’,4,2’,1,5’,5}, S4 = {3}, s4 = {3’} Find P and A such that 7’ is permuted in position q and Lq5≠ 0. Approach: Build a matching M that defines a permutation P. 1’ 1 2’ 2 3’ 3 4’ 4 5’ 5 6’ 7’ H(A)

1 2 3 4 5 6 7 5’ 2’ 3’ 4’ 1’ 6’ 7’ Upper bound for nonzero structure of L (cont.) Choose values of A: nonzeros corresponding to edges of M to be larger than n, all other to be between 0 and 1, all Hall square submatrices to be nonsingular. =>Pivots during the first 4 steps of elimination correspond to edges of M. PA[1’:4’][1:4] is Hall matrix - matching M. PA[1’:4’,7’][1:5] is Hall matrix – use M and the path T to obtain a perfect matching. => Element L75≠ 0. 1’ 1 1’ 1 2’ 2 2’ 2 3’ 3 3’ 3 4’ 4 4’ 4 5 5’ 5’ 6’ 6’ 7’ 7’ H(A) H(A)

Upper bound for nonzero structure of U Theorem: H is graph of a Hall matrix with nonzero diagonal, w<c two column vtcs. there is a path { r’,c1, …, r’t ,c} with intermediate column vertices in {1,..,w-1} and no vertex in Sw-1 υ sw-1 => there is P and A such that r’ is permuted in some position s ≤ c and Usc≠0. Example:S1 = S2 = ø, S3 = S4 = {2,3}. Path {5’,1,2’,4} in H(A) and w=3 satisfy the conditions => There is P and A such that 5’ is permuted in position 3’ and U34≠0. 1 2 3 4 5 1 2 3 4 5 1’ 2’ 1’ 1 2’ 3’ 2’ 2 3’ 5’ 3’ 3 4’ 4’ 4’ 4 5’ 1’ 5’ 5 A H(A) PA

Algorithms • Similarities with symbolic QR for Hall matrices [Hare et al. 93, Pothen]. • Compute Hall sets of maximum cardinality Sj ,sj, for 1 ≤ j ≤ n-1. - complexity dominated by maximal transversal [Hare et al.]. • Use Hall sets to compute fill during LU with partial pivoting: - to be investigated. • Use Hall sets to compute upper bounds for the factors L and U: - complexity , where h is the number of distinct (nonempty) maximal Hall sets Sj , 1 ≤ j ≤ n-1 [Hare et al.].

Conclusions Results: • Characterized fill for the LU factorization with partial pivoting. • Determined tight exact bounds for the structure of L and U. Future work: • Study algorithms for the efficient computation of the bounds. • Compute the number of numerical cancellations, improvements in memory bounds and dependencies estimations. • Compare with decomposing A into Strong Hall submatrices.

Structure Prediction for L and U Upper bound for L: Theorem:H is graph of a Hall matrix with nonzero diagonal and there is a path { r’,c1, …, r’t ,c} with intermediate column vertices in {1,..,c-1} and no vertex in Sc-1 υ sc-1 • there is P and A such that r’ is permuted in some position q≥c and Lqc≠ 0. - The nonzero structure of PQ is a tight bound for the structure of L. Upper bound for U: Theorem:H is graph of a Hall matrix with nonzero diagonal, w<c two column vertices and there is a path { r’,c1, …, r’t ,c} with intermediate column vertices in {1,..,w-1} and no vertex in Sw-1 υ sw-1 => there is P and A such that r’ is permuted in some position s ≤ c and Usc≠0. - The nonzero structure of R is a tight bound for the nonzero structure of U.

Similarities in the Structure Prediction of Sparse QR Factorization and Sparse LU Factorization with Partial Pivoting

Similarities in the Structure Prediction of Sparse QR Factorization and Sparse LU Factorization with Partial Pivoting

Presentation Transcript

Scalabilities Issues in Sparse Factorization and Triangular Solution

6. LU Factorization

Linear Systems LU Factorization

GPU Sparse LU Factorization and Its Application in Circuit Simulation

Lecture 5 Parallel Sparse Factorization, Triangular Solution

Optimizing LU Factorization in Cilk ++

Parallelization of the Classic Gram-Schmidt QR-Factorization

Factorization-based Sparse Solvers and Preconditioners

Performance Analysis of Parallel Sparse LU Factorization on CPUs and GPUs

Block LU Factorization Lecture 24

Iterative Methods and QR Factorization

Linear Least Squares QR Factorization

Tuning Pruning in Sparse Non-negative Matrix Factorization

Method of LU Factorization

An Out-of-Core Sparse Symmetric Indefinite Factorization Method

Probabilistic Sparse Matrix Factorization

Lecture 4 Sparse Factorization: Data-flow Organization

Sparse LU Factorization for Parallel Circuit Simulation on GPU

Online Learning for Matrix Factorization and Sparse Coding

householder QR factorization Slide with matlab code

LU Factorization