
CS 290H Administrivia: April 2, 2008



1. CS 290H Administrivia: April 2, 2008

• Course web site: www.cs.ucsb.edu/~gilbert/cs290
• Join the email (Google) discussion group!! (see web site)
• Homework 1 is due next Monday (see web site)
• Reading in Davis:
  • Review Ch 1 (definitions).
  • Skim Ch 2 (you don't have to read all the code in detail).
  • Read Chapter 3 (sparse triangular solves).
• I have a few copies of Davis (at a discount :)

2. Compressed Sparse Matrix Storage

[Figure: the same matrix in full storage and in CSC storage, with the three CSC arrays labeled value, row, and colstart.]

• Full storage:
  • 2-dimensional array.
  • (nrows * ncols) memory.
• Sparse storage:
  • Compressed storage by columns (CSC).
  • Three 1-dimensional arrays (value, row, colstart).
  • (2*nzs + ncols + 1) memory.
  • Similarly, CSR (compressed by rows).
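
A minimal C sketch of the CSC layout makes the memory count concrete. The struct and names below (csc, colstart, csc_alloc) are illustrative, not Davis's CSparse API, and error checking is omitted.

    #include <stdlib.h>

    /* Column j's nonzeros live at positions colstart[j] .. colstart[j+1]-1
       of row[] and value[]; colstart has ncols+1 entries. */
    typedef struct {
        int nrows, ncols, nzmax;
        int *colstart;    /* ncols+1 column pointers          */
        int *row;         /* row index of each stored nonzero */
        double *value;    /* numerical value of each nonzero  */
    } csc;

    csc *csc_alloc(int nrows, int ncols, int nzmax) {
        csc *A = malloc(sizeof(csc));
        A->nrows = nrows; A->ncols = ncols; A->nzmax = nzmax;
        A->colstart = malloc((ncols + 1) * sizeof(int));
        A->row      = malloc(nzmax * sizeof(int));
        A->value    = malloc(nzmax * sizeof(double));
        return A;    /* 2*nzmax + ncols + 1 words of index storage plus the
                        values, matching the count on the slide */
    }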

3. Matrix-Matrix Multiplication: C = A * B

    C(:, :) = 0;
    for i = 1:n
        for j = 1:n
            for k = 1:n
                C(i, j) = C(i, j) + A(i, k) * B(k, j);

• The n³ scalar updates can be done in any order.
• Six possible algorithms: ijk, ikj, jik, jki, kij, kji (lots more if you think about blocking for cache).
• Goal is O(nonzero flops) time for sparse A, B, C.
• Even time = O(n²) is too slow!

4. Organizations of Matrix Multiplication

• Outer product:
    for k = 1:n
        C = C + A(:, k) * B(k, :)
• Inner product:
    for i = 1:n
        for j = 1:n
            C(i, j) = A(i, :) * B(:, j)
• Column by column:
    for j = 1:n
        for k where B(k, j) ≠ 0
            C(:, j) = C(:, j) + A(:, k) * B(k, j)

Barriers to O(flops) work:
  - Inserting updates into C is too slow
  - n² loop iterations cost too much if C is sparse
Fixes:
  - Loop k only over the nonzeros in column j of B
  - Use a sparse accumulator (SPA) for the column updates
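
As a bridge to the SPA, here is a hedged C sketch of the column-by-column organization that already loops k only over the nonzeros of B(:, j) but still uses a plain dense work column: the O(n) clear and gather per column are exactly what keeps it at O(n²) overall. The csc struct and the emit callback are illustrative names, not a library API.

    /* Illustrative CSC type (see the sketch under slide 2). */
    typedef struct { int nrows, ncols; int *colstart, *row; double *value; } csc;

    /* C = A*B column by column: k runs only over nonzeros of B(:, j),
       but the work column w is a plain dense array, so every column
       still pays an O(n) clear and an O(n) gather. */
    void spmm_dense_column(const csc *A, const csc *B,
                           double *w, /* length A->nrows */
                           void (*emit)(int i, int j, double cij)) {
        for (int j = 0; j < B->ncols; j++) {
            for (int i = 0; i < A->nrows; i++) w[i] = 0.0;        /* O(n) */
            for (int p = B->colstart[j]; p < B->colstart[j+1]; p++) {
                int k = B->row[p];                   /* B(k, j) != 0 */
                for (int q = A->colstart[k]; q < A->colstart[k+1]; q++)
                    w[A->row[q]] += A->value[q] * B->value[p];
            }
            for (int i = 0; i < A->nrows; i++)                    /* O(n) */
                if (w[i] != 0.0) emit(i, j, w[i]);
        }
    }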

5. Sparse Accumulator (SPA)

• Abstract data type for a single sparse matrix column
• Operations:
  • initialize spa                         O(n) time & O(n) space
  • spa = spa + (scalar) * (CSC vector)    O(nnz(spa)) time
  • (CSC vector) = spa                     O(nnz(spa)) time
  • spa = 0                                O(nnz(spa)) time
  • … possibly other ops

6. Sparse Accumulator (SPA), continued

• Same abstract data type and operation costs as on the previous slide.
• Standard implementation (many variants):
  • dense n-element floating-point array "value"
  • dense n-element boolean array "is-nonzero"
  • linked structure to sequence through the nonzeros
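
A hedged C sketch of that standard implementation: the "linked structure" is realized here as a simple unordered index list, one of the many variants the slide mentions; all names (spa, spa_axpy, spa_gather_clear) are made up for illustration.

    #include <stdlib.h>

    typedef struct {
        int n, nnz;
        double *value;   /* dense: current numerical values      */
        char *occupied;  /* dense: "is-nonzero" flags            */
        int *nzlist;     /* unordered row indices of the entries */
    } spa;

    spa *spa_init(int n) {                  /* O(n) time and space, done once */
        spa *s = malloc(sizeof(spa));
        s->n = n; s->nnz = 0;
        s->value = calloc(n, sizeof(double));
        s->occupied = calloc(n, 1);
        s->nzlist = malloc(n * sizeof(int));
        return s;
    }

    /* spa = spa + alpha * (sparse vector), the vector given as parallel
       arrays: touches only the vector's nonzeros. */
    void spa_axpy(spa *s, double alpha, int nz,
                  const int *rows, const double *vals) {
        for (int p = 0; p < nz; p++) {
            int i = rows[p];
            if (!s->occupied[i]) { s->occupied[i] = 1; s->nzlist[s->nnz++] = i; }
            s->value[i] += alpha * vals[p];
        }
    }

    /* (CSC vector) = spa, then spa = 0: both in O(nnz(spa)) time,
       because only the listed entries are visited. */
    int spa_gather_clear(spa *s, int *rows_out, double *vals_out) {
        int nz = s->nnz;
        for (int p = 0; p < nz; p++) {
            int i = s->nzlist[p];
            rows_out[p] = i; vals_out[p] = s->value[i];
            s->value[i] = 0.0; s->occupied[i] = 0;
        }
        s->nnz = 0;
        return nz;
    }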

7. CSC Sparse Matrix Multiplication with SPA

    for j = 1:n
        C(:, j) = A * B(:, j)

[Figure: C = A x B computed column by column; each column update is scattered/accumulated into the SPA, which is then gathered into column j of C.]

• All matrix columns and vectors are stored compressed except the SPA.
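
Putting the pieces together, here is a hedged C sketch of one column of the SPA-based multiply, with the SPA inlined as three arrays (w, occ, list) so the block stands alone; all names are illustrative. Because the gather and clear walk only the recorded nonzeros, the whole multiply runs in O(flops) time after the one-time O(n) initialization of w and occ to zero.

    typedef struct { int nrows, ncols; int *colstart, *row; double *value; } csc;

    /* Compute C(:, j) = A * B(:, j).  w, occ, list are the SPA; w and occ
       must be all zero on entry and are restored to zero before returning.
       Row indices and values of C(:, j) are written to crow/cval; the
       return value is nnz(C(:, j)). */
    int spa_matmul_column(const csc *A, const csc *B, int j,
                          double *w, char *occ, int *list,
                          int *crow, double *cval) {
        int nnz = 0;
        for (int p = B->colstart[j]; p < B->colstart[j+1]; p++) {
            int k = B->row[p];                       /* B(k, j) != 0 */
            double bkj = B->value[p];
            for (int q = A->colstart[k]; q < A->colstart[k+1]; q++) {
                int i = A->row[q];                   /* scatter/accumulate */
                if (!occ[i]) { occ[i] = 1; list[nnz++] = i; }
                w[i] += A->value[q] * bkj;
            }
        }
        for (int p = 0; p < nnz; p++) {              /* gather, then clear */
            int i = list[p];
            crow[p] = i; cval[p] = w[i];
            w[i] = 0.0; occ[i] = 0;
        }
        return nnz;
    }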

  8. 3 7 1 3 7 1 6 8 6 8 4 10 4 10 9 2 9 2 5 5 Graphs and Sparse Matrices: Cholesky factorization Fill:new nonzeros in factor Symmetric Gaussian elimination: for j = 1 to n add edges between j’s higher-numbered neighbors G+(A)[chordal] G(A)

9. Symmetric Positive Definite Systems: A = LLᵀ

• Preorder
  • Independent of numerics
• Symbolic factorization
  • Elimination tree
  • Nonzero counts
  • Supernodes
  • Nonzero structure of L
  (preorder and symbolic factorization together: O(#nonzeros in A), almost)
• Numeric factorization: O(#flops)
  • Static data structure
  • Supernodes use BLAS3 to reduce memory traffic
• Triangular solves: O(#nonzeros in L)
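
The Chapter 3 reading pairs naturally with the last line. A hedged C sketch of the forward solve Lx = b with a dense right-hand side, under the common convention that each CSC column stores its diagonal entry first: every nonzero of L is touched exactly once, which is where the O(#nonzeros in L) bound comes from. Types and names are illustrative, not a library API.

    typedef struct { int n; int *colstart, *row; double *value; } csc_lower;

    /* Solve L*x = b for lower triangular L in CSC form; on entry x holds
       the dense b, on return it holds the solution.  Assumes each column's
       first stored entry is the diagonal L(j, j). */
    void lower_solve(const csc_lower *L, double *x) {
        for (int j = 0; j < L->n; j++) {
            x[j] /= L->value[L->colstart[j]];         /* divide by L(j, j)   */
            for (int p = L->colstart[j] + 1; p < L->colstart[j+1]; p++)
                x[L->row[p]] -= L->value[p] * x[j];   /* update rows below j */
        }
    }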
