
Monte Carlo Linear Algebra Techniques and Their Parallelization


Presentation Transcript


  1. Monte Carlo Linear Algebra Techniques and Their Parallelization Ashok Srinivasan Computer Science Florida State University www.cs.fsu.edu/~asriniva

  2. Outline • Background • Monte Carlo Matrix-vector multiplication • MC linear solvers • Non-diagonal splitting • Dense implementation • Sparse implementation • Parallelization • Conclusions and future work

  3. Background • MC linear solvers are old! • von Neumann and Ulam (1950) • Were not competitive with deterministic techniques • Advantages of MC • Can give approximate solutions fast • Feasible in applications such as preconditioning, graph partitioning, information retrieval, pattern recognition, etc. • Can yield selected components fast • Are very latency tolerant

  4. Matrix-vector multiplication • Compute $C^j h$, where $C \in \mathbb{R}^{n \times n}$ • Choose a probability matrix $P$ and weight matrix $W$ such that $C_{ij} = P_{ij} W_{ij}$, and an initial distribution $p$ with weights $w$ such that $h_i = p_i w_i$ • Take a random walk $k_0 \to k_1 \to \cdots \to k_j$ based on these probabilities: $k_0$ is drawn with probability $p_{k_0}$, and each transition $k_{i-1} \to k_i$ occurs with probability $P_{k_i k_{i-1}}$ • Define random variables $X_i$: $X_0 = w_{k_0}$, and $X_i = X_{i-1} W_{k_i k_{i-1}}$ • Then $E(X_j \delta_{i k_j}) = (C^j h)_i$ • Each random walk thus updates an estimate of the $k_j$th component of $C^j h$ • Convergence rate is independent of $n$
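
A minimal sketch of this estimator (not from the slides), assuming NumPy, with probabilities taken proportional to entry magnitudes, which is one common choice; the function name and parameters are illustrative:

```python
import numpy as np

def mc_power_apply(C, h, j, n_walks=100_000, seed=None):
    """Estimate C^j h by random walks: C_ij = P_ij * W_ij, h_i = p_i * w_i.
    Assumes h is not the zero vector and C has no all-zero columns."""
    rng = np.random.default_rng(seed)
    n = C.shape[0]
    p0 = np.abs(h) / np.abs(h).sum()          # initial distribution p
    P = np.abs(C) / np.abs(C).sum(axis=0)     # P[i, k]: prob of step k -> i
    est = np.zeros(n)
    for _ in range(n_walks):
        k = rng.choice(n, p=p0)
        X = h[k] / p0[k]                      # X_0 = w_{k_0}
        for _ in range(j):
            k_new = rng.choice(n, p=P[:, k])
            X *= C[k_new, k] / P[k_new, k]    # X_i = X_{i-1} W_{k_i k_{i-1}}
            k = k_new
        est[k] += X                           # tally X_j at the end state k_j
    return est / n_walks                      # est[i] ~ (C^j h)_i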

  5. Matrix-vector multiplication ... continued • $\sum_{j=0}^{m} C^j h$ too can be estimated similarly • We will need $\sum_{j=0}^{m} (BC)^j B h$ • It can be estimated using probabilities on both matrices, $B$ and $C$: the walk $k_0 \to k_1 \to \cdots \to k_{2m+1}$ alternates between them, updating the estimate at the odd-numbered states $k_1, k_3, \ldots, k_{2m+1}$ • The length of the random walk is twice that of the previous case

  6. MC linear solvers • Solve $Ax = b$ • Split $A$ as $A = N - M$ • Write the fixed-point iteration $x_{m+1} = N^{-1} M x_m + N^{-1} b = C x_m + h$ • If we choose $x_0 = h$, then $x_m = \sum_{j=0}^{m} C^j h$ • Estimate this sum using the Markov chain technique mentioned earlier
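
For reference, a few deterministic lines, assuming NumPy, confirming that the iterate with $x_0 = h$ is exactly the partial Neumann sum the walks estimate (the helper name is illustrative):

```python
import numpy as np

def neumann_partial_sum(C, h, m):
    """x_m = sum_{j=0}^m C^j h, accumulated term by term."""
    x = h.copy()
    term = h.copy()
    for _ in range(m):
        term = C @ term        # next power: C^j h
        x = x + term
    return x                   # equals m steps of x <- C x + h from x_0 = h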

  7. Current techniques • Choose $N$ to be a diagonal matrix • This ensures efficient computation of $C$ • $C$ is sparse when $A$ is sparse • Example: $N$ = diagonal of $A$ yields the Jacobi iteration, and the corresponding MC estimate
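
A sketch of the resulting solver under this diagonal (Jacobi) splitting, assuming NumPy; mc_linear_solve, its defaults, and the magnitude-proportional probabilities are illustrative choices, not prescribed by the slides:

```python
import numpy as np

def mc_linear_solve(A, b, m=20, n_walks=50_000, seed=None):
    """Estimate x_m = sum_{j=0}^m C^j h with N = diag(A), C = N^{-1} M, h = N^{-1} b."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    d = np.diag(A)
    C = -(A - np.diag(d)) / d[:, None]       # Jacobi iteration matrix, zero diagonal
    h = b / d
    p0 = np.abs(h) / np.abs(h).sum()
    colsum = np.abs(C).sum(axis=0)
    x = np.zeros(n)
    for _ in range(n_walks):
        k = rng.choice(n, p=p0)
        X = h[k] / p0[k]
        x[k] += X                            # j = 0 contribution
        for _ in range(m):
            if colsum[k] == 0:               # dead end: all later terms are zero
                break
            P_col = np.abs(C[:, k]) / colsum[k]
            k_new = rng.choice(n, p=P_col)
            X *= C[k_new, k] / P_col[k_new]
            k = k_new
            x[k] += X                        # contribution to the j-th term
    return x / n_walks
```

Each walk contributes to every partial term $C^j h$ along its path, so a single length-$m$ walk updates the whole truncated sum.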

  8. Properties of MC linear solvers • MC techniques estimate the result of a stationary iteration • Errors from the iterative process • Errors from MC • Reduce the error by • Variance reduction techniques • Residual correction • Choosing a better iterative scheme!

  9. Non-diagonal splitting • Observations • It is possible to construct an efficient MC technique for specific splittings, even if explicit construction of $C$ is computationally expensive • It may be possible to represent $C$ sparsely in implicit form, even when $C$ itself is not sparse

  10. Our example • Choose $N$ to be the diagonal and sub-diagonal of $A$: $N$ is lower bidiagonal, with diagonal entries $d_1, \ldots, d_n$ and sub-diagonal entries $s_2, \ldots, s_n$, and its inverse $N^{-1}$ is a dense lower triangular matrix • Computing $C = N^{-1} M$ explicitly is too expensive • Compute $x_m = \sum_{j=0}^{m} (N^{-1} M)^j N^{-1} b$ instead

  11. Computing $N^{-1}$ • Using $O(n)$ storage and precomputation time, any element of $N^{-1}$ can be computed in constant time • Define $T(1) = 1$ and $T(i+1) = T(i)\, s_{i+1} / d_{i+1}$ • Then $N^{-1}_{ij} = 0$ if $i < j$; $1/d_i$ if $i = j$; $(-1)^{i-j}\, T(i) / (T(j)\, d_j)$ otherwise • The entire $N^{-1}$, if needed, can be computed in $O(n^2)$ time
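
A direct transcription of these formulas, assuming NumPy and 0-indexing; the remark about log-scale storage is an assumption about numerical practice, not from the slides:

```python
import numpy as np

def precompute_T(d, s):
    """T(1) = 1, T(i+1) = T(i) * s_{i+1}/d_{i+1}: O(n) time and storage.
    d[i] and s[i] are the diagonal and sub-diagonal of N (s[0] unused).
    For very long recurrences T can under/overflow; storing log|T| and
    signs is a standard remedy (not shown here)."""
    n = len(d)
    T = np.empty(n)
    T[0] = 1.0
    for i in range(1, n):
        T[i] = T[i - 1] * s[i] / d[i]
    return T

def Ninv(i, j, d, T):
    """Element (i, j) of N^{-1} in constant time (0-indexed)."""
    if i < j:
        return 0.0
    if i == j:
        return 1.0 / d[i]
    return ((-1) ** (i - j)) * T[i] / (T[j] * d[j])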

  12. Dense implementation • Compute $N^{-1}$ and store it in $O(n^2)$ space • Choose probabilities proportional to the weights of the elements • Use the alias method to sample: precomputation time is proportional to the number of elements, and each sample is generated in constant time • Estimate $\sum_{j=0}^{m} (N^{-1} M)^j N^{-1} b$
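
The alias method referred to here is the standard Walker/Vose construction; a compact sketch, assuming NumPy (the class name is illustrative):

```python
import numpy as np

class AliasSampler:
    """Walker/Vose alias method: O(n) setup, O(1) per sample."""
    def __init__(self, probs, seed=None):
        self.rng = np.random.default_rng(seed)
        p = np.asarray(probs, dtype=float)
        n = len(p)
        scaled = p * n / p.sum()
        self.prob = np.zeros(n)
        self.alias = np.zeros(n, dtype=int)
        small = [i for i in range(n) if scaled[i] < 1.0]
        large = [i for i in range(n) if scaled[i] >= 1.0]
        while small and large:
            s, l = small.pop(), large.pop()
            self.prob[s] = scaled[s]          # keep s with prob scaled[s]
            self.alias[s] = l                 # otherwise redirect to l
            scaled[l] -= 1.0 - scaled[s]      # l donated the leftover mass
            (small if scaled[l] < 1.0 else large).append(l)
        for i in small + large:               # leftovers are exactly full
            self.prob[i] = 1.0

    def sample(self):
        i = self.rng.integers(len(self.prob))
        return int(i) if self.rng.random() < self.prob[i] else int(self.alias[i])
```

Setup is linear in the number of stored probabilities, and each sample costs one integer and one uniform draw, matching the constant-time claim above.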

  13. Experimental results (walk length = 2)

  14. Sparse implementation • We cannot use $O(n^2)$ space or time! • A sparse implementation for $M$ is simple • Sparse representation of $N^{-1}$: choose $P_{ij} = 0$ if $i < j$, and $1/(n-j+1)$ otherwise, i.e., the row is sampled from the uniform distribution • Choose $W_{ij} = N^{-1}_{ij} / P_{ij}$ • Any $W_{ij}$ can be determined in constant time • Minor modifications are needed when $s_i = 0$
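
A sketch of one such column sample, assuming NumPy and reusing the Ninv helper from the slide-11 sketch; with 0-indexing the uniform probability becomes $1/(n-j)$ rather than the slide's 1-indexed $1/(n-j+1)$:

```python
import numpy as np

def sample_column_uniform(j, n, d, T, rng):
    """Draw a row i >= j of column j of N^{-1} uniformly and return
    (i, W_ij) with W_ij = N^{-1}_ij / P_ij.  Ninv is the constant-time
    element routine from the earlier sketch."""
    i = int(rng.integers(j, n))    # uniform over the n - j rows i >= j
    P_ij = 1.0 / (n - j)
    return i, Ninv(i, j, d, T) / P_ij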

  15. Experimental results (walk length = 2)

  16. Geometric sampling • The magnitudes of the elements of $N^{-1}$ fall off roughly geometrically down a column, so the probabilities are better approximated by geometric sampling • Choose $P_{ij} = 0$ if $i < j$, and approximately $p_j (1-p_j)^{i-j}$ otherwise • If $x \sim \mathrm{Uniform}(0,1)$, then $\lceil \log(1-x)/\log(1-p) \rceil - 1 \sim \mathrm{Geometric}(p)$ with support $\{0, 1, 2, \ldots\}$ • Choose $W_{ij} = N^{-1}_{ij} / P_{ij}$
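
A sketch of the geometric variant, again assuming NumPy and the Ninv helper; the slide does not say how to handle offsets that run past the last row, so the redraw-and-renormalize tail handling below is an assumption:

```python
import numpy as np

def sample_column_geometric(j, n, p, d, T, rng):
    """Draw offset i - j ~ Geometric(p) on {0, 1, ...} by inverse CDF,
    redrawing offsets that fall past the last row, and return (i, W_ij)."""
    while True:
        x = rng.random()
        # ceil(log(1-x)/log(1-p)) - 1; max() guards the x == 0 corner case
        offset = max(int(np.ceil(np.log1p(-x) / np.log1p(-p))) - 1, 0)
        if j + offset < n:
            break
    i = j + offset
    # Redrawing truncates the geometric to rows j..n-1, so P_ij must be
    # renormalized by the retained probability mass.
    mass = 1.0 - (1.0 - p) ** (n - j)
    P_ij = p * (1.0 - p) ** offset / mass
    return i, Ninv(i, j, d, T) / P_ij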

  17. Experimental results (walk length = 2)

  18. Parallelization • [Diagram: Proc 1 with RNG 1, Proc 2 with RNG 2, Proc 3 with RNG 3; per-processor estimates 23.7, 23.2, 23.6 are combined into 23.5] • MC is “embarrassingly” parallel • Identical algorithms are run independently on each processor, with the random number sequences alone being different
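
A sketch of this replicated scheme, assuming NumPy and Python's multiprocessing in place of the MPI/OpenMP runs on the next slide; the test system and mc_linear_solve (the slide-7 sketch, assumed importable by the workers) are illustrative:

```python
import numpy as np
from multiprocessing import Pool

# Illustrative diagonally dominant test system (not from the slides).
n = 50
setup = np.random.default_rng(0)
A = np.eye(n) * n + 0.1 * setup.standard_normal((n, n))
b = setup.standard_normal(n)

def run_replica(stream):
    # Identical algorithm on every worker; only the RNG stream differs.
    return mc_linear_solve(A, b, m=10, n_walks=10_000, seed=stream)

if __name__ == "__main__":
    # SeedSequence.spawn yields statistically independent streams,
    # one per "processor".
    streams = np.random.SeedSequence(2003).spawn(4)
    with Pool(4) as pool:
        x = np.mean(pool.map(run_replica, streams), axis=0)
```

Averaging independent replicas leaves the estimate unbiased while shrinking its variance in proportion to the number of processors.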

  19. MPI vs OpenMP on Origin 2000 • Cache misses cause poor performance of the OpenMP parallelization

  20. Conclusions and future work • Demonstrated that it is possible to have effective MC implementations with non-diagonal splittings too • Need to extend this to better iterative schemes • Non-replicated parallelization needs to be considered
