
Sparse Matrices and Combinatorial Algorithms in Star-P: Status and Plans (April 8, 2005)


Presentation Transcript


  1. Sparse Matrices and Combinatorial Algorithms in Star-P: Status and Plans (April 8, 2005)

  2. Matlab's sparse matrix design principles
  • All operations should give the same results for sparse and full matrices (almost all)
  • Sparse matrices are never created automatically, but once created they propagate
  • Performance is important -- but usability, simplicity, completeness, and robustness are more important
  • Storage for a sparse matrix should be O(nonzeros)
  • Time for a sparse operation should be O(flops) (as nearly as possible)

  3. Matlab's sparse matrix design principles (continued)
  • (Repeats the design principles from the previous slide, adding:)
  • Star-P dsparse matrices: same principles, some different tradeoffs

  4. Sparse matrix operations
  • dsparse layout, same semantics as ddense
  • For now, only row distribution
  • Matrix operators: +, -, max, etc.
  • Matrix indexing and concatenation:
      A(1:3, [4 5 2]) = [ B(:, 7) C ];
  • A \ b by direct methods

  5. Star-P sparse data structure
  • Full:
      • 2-dimensional array of real or complex numbers
      • (nrows * ncols) memory
  • Sparse:
      • compressed row storage
      • about (2*nzs + nrows) memory
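
  For concreteness, here is how a compressed-row view of a small matrix can be reconstructed in ordinary Matlab. This is purely illustrative; the names rowptr, colind, and vals are not part of Star-P, whose internal layout is not exposed at this level:

      % Reconstruct a CSR view of a small sparse matrix.
      A = sparse([1 1 2 3 3], [1 3 2 1 3], [10 20 30 40 50], 3, 3);

      [i, j, v] = find(A);                  % triples come back in column order
      [i, p] = sort(i);                     % stable sort: reorder triples by row
      j = j(p);  v = v(p);

      rowptr = cumsum([1; accumarray(i, 1, [3 1])]);  % row pointers (nrows+1 entries)
      colind = j;                           % column indices, nnz entries
      vals   = v;                           % values, nnz entries
      % Row r holds vals(rowptr(r) : rowptr(r+1)-1), so total storage is
      % about 2*nnz + nrows, matching the slide.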

  6. Star-P distributed sparse data structure
  • Rows are partitioned across processors P0 ... Pn
  • Each processor stores:
      • # of local nonzeros
      • range of local rows
      • nonzeros in CSR form

  7. Sorting in Star-P
  • [V, perm] = sort(V)
  • Useful primitive for some sparse array algorithms, e.g. the sparse constructor
  • Star-P uses a parallel sample sort
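
  For illustration, a serial Matlab sketch of the sample-sort idea, with p standing in for the processor count. The function name, the oversampling factor of 8, and the bucketing details are made up for the example; Star-P's actual sample sort is an MPI code:

      function [S, perm] = sample_sort_sketch(V, p)
      % Serial sketch of sample sort: pick splitters from a random sample,
      % bucket the data, then sort each bucket "locally".
      n = numel(V);
      samp = sort(V(randi(n, p*8, 1)));           % oversample the input
      splitters = samp(round((1:p-1) * 8));       % p-1 splitters
      bucket = 1 + sum(bsxfun(@gt, V(:), splitters(:)'), 2);  % destination bucket
      S = []; perm = [];
      for b = 1:p                                 % each "processor" sorts its bucket
          idx = find(bucket == b);
          [sv, sp] = sort(V(idx));
          S = [S; sv(:)];  perm = [perm; idx(sp)];
      end
      end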

  8. The sparse( ) constructor
  • A = sparse(I, J, V, nr, nc)
  • Input: ddense vectors I, J, V, and dimensions nr, nc
  • A(I(k), J(k)) = V(k) for each k
  • Sort triples (i, j, v) by (i, j)
  • Adjust for desired row distribution
  • Locally convert to compressed row indices
  • Sum values with duplicate indices
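
  A serial Matlab sketch of that pipeline (the row-distribution step is a no-op here, and the built-in sparse already sums duplicates, so this is purely illustrative of the steps above):

      function A = sparse_from_triples(I, J, V, nr, nc)
      % Sketch of the constructor pipeline: sort triples, combine duplicates,
      % compress.  In Star-P the sort is the parallel sample sort, and the
      % triples are range-partitioned by rows before local compression.
      V = V(:);
      [IJ, p] = sortrows([I(:) J(:)]);        % sort triples by (i, j)
      V = V(p);
      keep = [true; any(diff(IJ) ~= 0, 2)];   % marks the first triple of each (i,j)
      grp  = cumsum(keep);                    % group number per triple
      V    = accumarray(grp, V);              % sum values with duplicate indices
      IJ   = IJ(keep, :);
      A    = sparse(IJ(:,1), IJ(:,2), V, nr, nc);
      end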

  9. Sparse matrix times dense vector
  • y = A * x
  • The first call to matvec caches a communication schedule for matrix A; later calls to multiply any vector by A reuse the cached schedule
  • Communication and computation overlap
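
  From the user's point of view the caching is invisible; repeated multiplies by the same matrix simply get cheaper after the first one. For example, in a power iteration every pass through the loop reuses the schedule built on the first multiply (plain Matlab shown; A is assumed given):

      x = rand(size(A, 1), 1);
      for k = 1:50
          y = A * x;          % first call builds the schedule; later calls reuse it
          x = y / norm(y);
      end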

  10. Sparse linear systems
  • x = A \ b
  • Matrix division uses MPI-based direct solvers:
      • SuperLU_dist: nonsymmetric, static pivoting
      • MUMPS: nonsymmetric multifrontal
      • PSPASES: Cholesky
  • Solver selection: ppsetoption('SparseDirectSolver', 'SUPERLU')
  • Iterative solvers implemented in Star-P
  • Ongoing work on preconditioners
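
  Usage is a one-liner before the solve. The 'SUPERLU' option string comes from the slide; the exact strings for the other two solvers are an assumption here:

      ppsetoption('SparseDirectSolver', 'SUPERLU');  % 'MUMPS' / 'PSPASES' assumed analogous
      x = A \ b;
      relres = norm(b - A*x) / norm(b);              % sanity-check the residual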

  11. Sparse eigenvalue solvers
  • [V, D] = eigs(A); etc.
  • Implemented via MPI-based PARPACK [Lehoucq, Maschhoff, Sorensen, Yang]
  • Work in progress

  12. Combinatorial Scientific Computing
  • Sparse matrix methods; machine learning; search and information retrieval; adaptive multilevel modeling; geometric meshing; computational biology; ...
  • How will combinatorial methods be used by people who don't understand them in complete detail?

  13. Analogy: Matrix division in Matlab
  • x = A \ b;
  • Works for either full or sparse A
  • Is A square? If not => use QR to solve the least squares problem
  • Is A triangular or permuted triangular? If so => sparse triangular solve
  • Is A symmetric with positive diagonal elements? If so => attempt Cholesky after symmetric minimum degree ordering
  • Otherwise => use LU on A(:, colamd(A))
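
  The dispatch can be sketched in Matlab as below. The four helper functions are hypothetical names for the branches listed above, not real Matlab functions, and the real polyalgorithm has more cases and more careful tests:

      function x = backslash_sketch(A, b)
      % Schematic of the backslash polyalgorithm (simplified).
      [m, n] = size(A);
      if m ~= n
          x = qr_least_squares(A, b);           % rectangular: QR least squares
      elseif is_permuted_triangular(A)
          x = permuted_triangular_solve(A, b);  % O(nnz) substitution
      elseif isequal(A, A') && all(diag(A) > 0)
          x = cholesky_after_symmmd(A, b);      % symmetric min degree + Cholesky
      else
          x = lu_after_colamd(A, b);            % LU on A(:, colamd(A))
      end
      end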

  14. Combinatorics in Star-P
  • A sparse matrix language is a good start on primitives for combinatorial computing:
      • Random-access indexing: A(i,j)
      • Neighbor sequencing: find(A(i,:))
      • Sparse table construction: sparse(I, J, V)
  • Other primitives we are building: sorting, searching, pointer-jumping, connectivity and paths, ...
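
  For example, with an n-by-n sparse adjacency matrix standing in for a directed graph (I, J, i, j, and n assumed given):

      A = sparse(I, J, 1, n, n);    % sparse table construction: adjacency from edge lists
      e = A(i, j);                  % random-access indexing: is (i,j) an edge?
      nbrs = find(A(i, :));         % neighbor sequencing: out-neighbors of vertex i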

  15. Matching and depth-first search in Matlab
  • dmperm: Dulmage-Mendelsohn decomposition
  • Square, full-rank A:
      • [p, q, r] = dmperm(A);
      • A(p,q) is block upper triangular with nonzero diagonal
      • also gives strongly connected components of a directed graph
      • also gives connected components of an undirected graph
  • Arbitrary A:
      • [p, q, r, s] = dmperm(A);
      • maximum-size matching in a bipartite graph
      • minimum-size vertex cover in a bipartite graph
      • decomposition into strong Hall blocks
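
  For instance, the standard Matlab idiom for strongly connected components adds the identity so the diagonal is nonzero:

      % Strongly connected components of the directed graph on n vertices
      % whose adjacency matrix is A:
      [p, q, r] = dmperm(A + speye(n));
      % Component k is the vertex set p(r(k) : r(k+1)-1);
      % A(p,q) is block upper triangular with the components on the diagonal.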

  16. Connected components (work in progress)
  • Sequential Matlab uses depth-first search (dmperm), which doesn't parallelize well
  • Shiloach-Vishkin algorithm:
      repeat
        Link every (super)vertex to a random neighbor
        Shrink each tree to a supervertex by pointer jumping
      until no further change
  • Hybrid SV / search method under construction
  • Other potential graph kernels:
      • Breadth-first search (after Husbands, LBNL)
      • Bipartite matching (after Riedy, UCB)
      • Strongly connected components (after Pinar, LBNL)
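
  A serial Matlab sketch of the hook-and-jump loop. To keep the example deterministic it hooks each supervertex to its minimum-label neighbor rather than a random one, but the structure (hook, then shrink by pointer jumping, repeat until no change) is the same:

      function D = components_sv_sketch(A)
      % Connected components of the undirected graph with symmetric
      % n-by-n adjacency matrix A, Shiloach-Vishkin style.
      n = size(A, 1);
      [i, j] = find(A);
      D = (1:n)';                            % initial label: every vertex alone
      while true
          Dold = D;
          % Hook: each supervertex points at its smallest neighboring label
          P = min((1:n)', accumarray(D(i), D(j), [n 1], @min, n+1));
          % Shrink: pointer jumping collapses each tree to a star
          while any(P(P) ~= P)
              P = P(P);
          end
          D = P(D);
          if isequal(D, Dold), break; end    % no change: components found
      end
      end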

  17. SSCA #2 -- Overview
  • Scalable data generator
  • Kernel 1 -- Graph Construction
  • Kernel 2 -- Classify Large Sets (Search)
  • Kernel 3 -- Subgraph Extraction
  • Kernel 4 -- Graph Clustering / Partitioning
  • Randomly sized "cliques"
  • Most intra-clique edges present
  • Random inter-clique edges, gradually "thinning" with distance
  • Integer and character string edge labels
  • Randomized vertex numbers

  18. HPCS Benchmark Spectrum: SSCA #2

  19. Scalable Data Generator
  • Produce edge tuples containing the start vertex, end vertex, and weight for each directed edge
  • Hierarchical nature, with random-sized cliques connected by edges between some of the cliques
  • Inter-clique edges are assigned using a random distribution that represents a hierarchical thinning as vertices are further apart
  • Weights are either random integer values or randomly selected character strings
  • Random permutations to avoid induced locality:
      • Vertex numbers -- local processor memory locality
      • Parallel -- global memory locality
  • Data sets must be developed in their entirety before graph construction
  • May be parallelized -- data tuple vertex naming schemes must be globally consistent
  • Data generator will not be timed
  • Statistics will be saved for use in later verification stages

  20. Kernel 1 -- Graph Construction
  • Construct the multigraph in a format usable by all subsequent kernels
  • No subsequent modifications will be permitted to benefit specific kernels
  • Graph construction will be timed

  21. Kernel 2 -- Sort on Selected Edge Weights
  • Sort on edge weights to determine the vertex pairs with:
      • the largest integer weights
      • a desired string weight
  • The two vertex-pair lists will be saved for use in the subsequent kernel:
      • start set SI for integer weights
      • start set SC for character-string weights
  • Sorting weights will be timed

  22. Kernel 3 -- Extract Subgraphs
  • Extract a series of subgraphs formed by following paths of length k from a start set of initial vertices
  • The set of initial vertices will be determined from the previous kernel
  • Produce a series of subgraphs consisting of vertices and edges on paths of length k from the vertex-pair start sets SI and SC
  • A possible computational kernel for graph extraction is breadth-first search
  • Extracting the subgraphs will be timed

  23. Kernel 4 -- Partition Graph Using a Clustering Algorithm
  • Apply a graph clustering algorithm to partition the vertices of the graph into subgraphs no larger than a maximum size, minimizing the number of edges that must be "cut", e.g.:
      • Kernighan and Lin
      • Sangiovanni-Vincentelli, Chen, and Chua
  • The output of this kernel is a description of the clusters and of the edges that form the interconnecting "network"
  • This step should identify the structure in the graph specified in the data generation phase
  • The implementation of this kernel must not use a priori knowledge of the details of the data generator or of the statistics collected during graph generation
  • Identifying the clusters and the interconnecting "network" will be timed

  24. Concise SSCA#2: a serial Matlab implementation of SSCA#2
  • Coding style is simple and "data parallel"
  • Our starting point for the Star-P implementation
  • Edge labels are only integers, not strings yet

  25. Data generation
  • Using the Lincoln Labs serial Matlab generator at present
  • Data-parallel Star-P generator under construction
  • Generates "edge triples" as distributed dense vectors

  26. Kernel 1: Graph construction
  • cSSCA#2 K1: ~30 lines of executable serial Matlab
  • The multigraph is a cell array of simple directed graphs
  • Graphs are dsparse matrices, created by sparse( )
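
  A sketch of how such a multigraph can be assembled from edge triples I, J, V on n vertices: parallel edges between the same vertex pair go into successive simple graphs. The G.edgeWeights field matches the Kernel 2 code on the next slide, but the occurrence-splitting logic here is an illustrative reconstruction, not the cSSCA#2 source:

      % Split duplicate (i,j) pairs across layers of simple graphs.
      V = V(:);
      [IJ, p] = sortrows([I(:) J(:)]);
      V = V(p);
      newgrp = [true; any(diff(IJ) ~= 0, 2)];   % start of each (i,j) group
      grp   = cumsum(newgrp);
      first = find(newgrp);
      occ   = (1:numel(V))' - first(grp) + 1;   % occurrence number within its group
      nlayers = max(occ);
      G.edgeWeights = cell(1, nlayers);
      for k = 1:nlayers
          sel = (occ == k);
          G.edgeWeights{k} = sparse(IJ(sel,1), IJ(sel,2), V(sel), n, n);
      end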

  27. Kernel 2: Search in large sets
  • cSSCA#2 K2: ~12 lines of executable serial Matlab
  • Uses max( ) and find( ) to locate edges of max weight

      ng = length(G.edgeWeights);    % number of graphs
      maxwt = 0;
      for g = 1:ng
          maxwt = max(maxwt, max(max(G.edgeWeights{g})));
      end
      intedges = zeros(0, 2);
      for g = 1:ng
          [intstart, intend] = find(G.edgeWeights{g} == maxwt);
          intedges = [intedges ; intstart intend];
      end

  28. Kernel 3: Subgraph extraction
  • cSSCA#2 K3: ~25 lines of executable serial Matlab
  • Returns subgraphs consisting of vertices and edges within a specified distance of specified starting vertices
  • Sparse matrix-vector product for breadth-first search
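
  A sketch of the matvec-based breadth-first search (not the cSSCA#2 source; G is an n-by-n sparse adjacency matrix, s a start vertex, and k the path length from Kernel 3):

      function sub = bfs_subgraph(G, s, k)
      % Level-synchronous BFS: one sparse matvec per level.
      n = size(G, 1);
      reached = sparse(s, 1, 1, n, 1);        % 0/1 indicator of reached vertices
      front   = reached;
      for step = 1:k
          front = spones(G' * front);         % all out-neighbors of the frontier
          front = max(front - reached, 0);    % keep only newly reached vertices
          reached = reached + front;
      end
      sub = G(find(reached), find(reached));  % induced subgraph within distance k
      end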

  29. Kernel 4: Partitioning / clustering
  • cSSCA#2 K4: serial prototype exists, not yet parallel
  • Different algorithm from the Lincoln Labs executable spec, which uses seed-growing breadth-first search
  • cSSCA#2 uses spectral methods for clustering, based on eigenvectors (Fiedler vectors) of the graph
  • One version seeks small balanced edge cuts (graph partitioning); another seeks clusters with low isoperimetric number (clustering)
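
  A minimal Fiedler-vector bisection sketch in the same spirit (standard spectral partitioning, not the cSSCA#2 code; numerical care such as deflating the constant eigenvector is omitted). A is a symmetric sparse adjacency matrix:

      n = size(A, 1);
      d = full(sum(A, 2));
      L = spdiags(d, 0, n, n) - A;        % combinatorial graph Laplacian
      [V, D] = eigs(L, 2, 'SA');          % two algebraically smallest eigenpairs
      [evals, ix] = sort(diag(D));
      f = V(:, ix(2));                    % Fiedler vector (2nd smallest eigenvalue)
      part = f > median(f);               % median split yields a balanced cut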
