
A Multigrid Solver for Boundary Value Problems Using Programmable Graphics Hardware



Presentation Transcript


  1. A Multigrid Solver for Boundary Value Problems Using Programmable Graphics Hardware • Nolan Goodnight, Cliff Woolley, Gregory Lewin, David Luebke, Greg Humphreys • University of Virginia

  2. General-Purpose GPU Programming • Why do we port algorithms to the GPU? • How much faster can we expect it to be, really? • What is the challenge in porting?

  3. Case Study • Problem: Implement a Boundary Value Problem (BVP) solver using the GPU • Could benefit an entire class of scientific and engineering applications, e.g.: • Heat transfer • Fluid flow

  4. Related Work • Krüger and Westermann: Linear Algebra Operators for GPU Implementation of Numerical Algorithms • Bolz et al.: Sparse Matrix Solvers on the GPU: Conjugate Gradients and Multigrid • Very similar to our system • Developed concurrently • Complementary approach

  5. Driving problem: Fluid mechanics sim • Problem domain is a warped disc [Figure: warped disc domain, labeled "regular grid"]

  6. BVPs: Background • Boundary value problems are sometimes governed by PDEs of the form: $L\phi = f$ • $L$ is some operator • $\phi$ is the problem domain • $f$ is a forcing function (source term) • Given $L$ and $f$, solve for $\phi$.

  7. BVPs: Example Heat Transfer • Find a steady-state temperature distribution $T$ in a solid of thermal conductivity $k$ with thermal source $S$ • This requires solving a Poisson equation of the form: $k\nabla^2 T = -S$ • This is a BVP where $L$ is the Laplacian operator $\nabla^2$ • All our applications require a Poisson solver.

  8. BVPs: Solving • Most such problems cannot be solved analytically • Instead, discretize onto a grid to form a set of linear equations, then solve: • Direct elimination • Gauss-Seidel iteration • Conjugate-gradient • Strongly implicit procedures • Multigrid method
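As a rough illustration of "discretize onto a grid, then solve", the sketch below applies Gauss-Seidel iteration to the Poisson equation on a square grid. It is not the presenters' code: the function name `gauss_seidel_poisson`, the zero (Dirichlet) boundary values, and the standard 5-point Laplacian stencil are all assumptions chosen for illustration.

```python
def gauss_seidel_poisson(f, h, sweeps):
    """Approximately solve laplacian(u) = f on an n x n grid with spacing h.

    Assumption for this sketch: u = 0 on the whole boundary.
    """
    n = len(f)
    u = [[0.0] * n for _ in range(n)]
    for _ in range(sweeps):
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                # 5-point stencil:
                # (u_N + u_S + u_W + u_E - 4 u_C) / h^2 = f, solved for u_C.
                # Updated neighbors are used immediately (Gauss-Seidel).
                u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                  + u[i][j - 1] + u[i][j + 1]
                                  - h * h * f[i][j])
    return u


if __name__ == "__main__":
    n = 9
    h = 1.0 / (n - 1)
    source = [[-1.0] * n for _ in range(n)]  # constant source term
    u = gauss_seidel_poisson(source, h, sweeps=500)
    print("value at grid center:", u[n // 2][n // 2])
```

Each sweep touches every grid point once, and the number of sweeps needed grows with grid size, which is the cost that multigrid is meant to reduce.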

  9. Multigrid method • Iteratively corrects an approximation to the solution • Operates at multiple grid resolutions • Low-resolution grids are used to correct higher-resolution grids recursively • Very fast, especially for large grids: O(n)

  10. Multigrid method • Use coarser grid levels to recursively correct an approximation to the solution • Algorithm: • smooth • residual ($L\phi_i - f$) • restrict • recurse • interpolate [Figure: stencil weights for the kernels, with entries 1, -4, 1/2, 1/4, 1/8, 1/16]
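The five steps listed above (smooth, residual, restrict, recurse, interpolate) can be sketched as a recursive V-cycle. To keep the example short it is one-dimensional, solving u'' = f with u = 0 at both ends, whereas the talk works on 2D grids. The function names, the choice of two Gauss-Seidel sweeps per level, full-weighting restriction (1/4, 1/2, 1/4), and linear interpolation are assumptions for illustration, not details confirmed by the slides. The residual follows the slide's sign convention (operator applied to the current iterate, minus f), so the coarse-grid correction is subtracted.

```python
def smooth(u, f, h, sweeps):
    """Gauss-Seidel sweeps for u'' = f (in place)."""
    for _ in range(sweeps):
        for i in range(1, len(u) - 1):
            u[i] = 0.5 * (u[i - 1] + u[i + 1] - h * h * f[i])


def residual(u, f, h):
    """Residual L u - f at interior points, zero at the two ends."""
    r = [0.0] * len(u)
    for i in range(1, len(u) - 1):
        r[i] = (u[i - 1] - 2.0 * u[i] + u[i + 1]) / (h * h) - f[i]
    return r


def restrict(fine):
    """Full-weighting restriction to a grid with half as many intervals."""
    nc = (len(fine) - 1) // 2 + 1
    coarse = [0.0] * nc
    for i in range(1, nc - 1):
        coarse[i] = 0.25 * fine[2 * i - 1] + 0.5 * fine[2 * i] + 0.25 * fine[2 * i + 1]
    return coarse


def interpolate(coarse):
    """Linear interpolation to a grid with twice as many intervals."""
    fine = [0.0] * (2 * (len(coarse) - 1) + 1)
    for i in range(len(coarse)):
        fine[2 * i] = coarse[i]
    for i in range(len(coarse) - 1):
        fine[2 * i + 1] = 0.5 * (coarse[i] + coarse[i + 1])
    return fine


def v_cycle(u, f, h):
    """One V-cycle; len(u) must be 2**k + 1. Updates u in place and returns it."""
    if len(u) <= 3:
        # Coarsest grid: a single interior point, solved exactly.
        u[1] = 0.5 * (u[0] + u[2] - h * h * f[1])
        return u
    smooth(u, f, h, 2)                                       # smooth
    r = residual(u, f, h)                                    # residual
    rc = restrict(r)                                         # restrict
    correction = v_cycle([0.0] * len(rc), rc, 2.0 * h)       # recurse
    fine_correction = interpolate(correction)                # interpolate
    for i in range(1, len(u) - 1):
        u[i] -= fine_correction[i]
    smooth(u, f, h, 2)
    return u
```

Calling `v_cycle` repeatedly on the same `u` refines the approximation; each cycle visits every level once, so its cost is proportional to the number of fine-grid points.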

  11. Implementation • For each step of the algorithm: • Bind as texture maps the buffers that contain the necessary data • Set the target buffer for rendering • Activate a fragment program that performs the necessary kernel computation • Render a grid-sized quad with multitexturing [Figure labels: source buffer texture, render target buffer, fragment program]
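The per-step recipe above can be emulated on the CPU to make the data flow concrete. In this sketch, plain nested lists stand in for texture buffers, a Python function stands in for the fragment program, and `render_pass` stands in for rendering a grid-sized quad. All names here (`render_pass`, `make_jacobi_kernel`) are hypothetical, and the Jacobi kernel is only one example of a kernel such a pass might run.

```python
def render_pass(kernel, sources, width, height):
    """Emulate one pass: run `kernel` once per output cell ("fragment").

    `sources` plays the role of the bound textures; the returned grid plays
    the role of the render target and never aliases a source buffer.
    """
    target = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            target[y][x] = kernel(sources, x, y)
    return target


def make_jacobi_kernel(h):
    """Build a "fragment program" that does one Jacobi update for laplacian(u) = f."""
    def kernel(sources, x, y):
        u, f = sources
        height, width = len(u), len(u[0])
        if x == 0 or y == 0 or x == width - 1 or y == height - 1:
            return u[y][x]  # assumption: boundary values are held fixed
        return 0.25 * (u[y][x - 1] + u[y][x + 1]
                       + u[y - 1][x] + u[y + 1][x]
                       - h * h * f[y][x])
    return kernel


if __name__ == "__main__":
    size = 5
    u = [[1.0 if x in (0, size - 1) or y in (0, size - 1) else 0.0
          for x in range(size)] for y in range(size)]
    f = [[0.0] * size for _ in range(size)]
    kernel = make_jacobi_kernel(h=1.0)
    for _ in range(50):
        # Ping-pong: the previous target becomes the next source.
        u = render_pass(kernel, (u, f), size, size)
    print("center after 50 passes:", u[2][2])
```

Because each pass reads only from its sources and writes only to a fresh target, every output cell can be computed independently, which is the property that maps onto fragment processing.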

  12. Optimizing the Solver • Detect steady-state natively on GPU • Minimize shader length • Special-case whenever possible • Avoid context-switching

  13. Optimizing the Solver: Steady-state • How to detect convergence? • $L_1$ norm - average error • $L_2$ norm - RMS error (common in visual sim) • $L_\infty$ norm - max error (common in sci/eng apps) • Can use occlusion query! [Plot: seconds to steady state vs. grid size]
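The three convergence measures above can be written out directly. The sketch below computes them on the CPU for a 2D error grid, using the slide's definitions (average, RMS, max). The `exceeds_tolerance_count` helper is one plausible reading of the occlusion-query idea, stated here as an assumption: if cells whose error is within tolerance are discarded, then counting the surviving cells and comparing with zero is equivalent to a max-error test. Function names are hypothetical.

```python
import math


def error_norms(error):
    """Return (L1, L2, Linf) of a 2D grid: average, RMS, and max absolute error."""
    values = [abs(v) for row in error for v in row]
    count = len(values)
    l1 = sum(values) / count
    l2 = math.sqrt(sum(v * v for v in values) / count)
    linf = max(values)
    return l1, l2, linf


def exceeds_tolerance_count(error, tol):
    """Count cells whose absolute error is above tol (stand-in for a sample count)."""
    return sum(1 for row in error for v in row if abs(v) > tol)


def converged(error, tol):
    """Max-norm convergence test: no cell exceeds the tolerance."""
    return exceeds_tolerance_count(error, tol) == 0


if __name__ == "__main__":
    grid = [[0.0, 3.0], [-4.0, 0.0]]
    print(error_norms(grid))
    print(converged(grid, tol=5.0), converged(grid, tol=1.0))
```

The count-based test needs only a single integer back from the device rather than the whole error grid, which is presumably why it is attractive on graphics hardware.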

  14. Optimizing the Solver: Shader length • Minimize number of registers used • Vectorize as much as possible • Use the rasterizer to perform computations of linearly-varying values • Pre-compute invariants on CPU

  15. Optimizing the Solver: Special-case • Fast-path vs. slow-path • write several variants of each fragment program to handle boundary cases • eliminates conditionals in the fragment program • equivalent to avoiding CPU inner-loop branching [Figure: fast path, no boundaries; slow path with boundaries]

  16. Optimizing the Solver: Special-case • Fast-path vs. slow-path • write several variants of each fragment program to handle boundary cases • eliminates conditionals in the fragment program • equivalent to avoiding CPU inner-loop branching [Plot: seconds per V-cycle vs. grid size]
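The fast-path/slow-path split can be illustrated with a CPU analogue. In this sketch the interior of the grid is processed by a kernel with no bounds checks, and only the one-cell-wide border uses a variant that clamps neighbor lookups to the grid edge. The clamped lookup is an assumed boundary treatment picked for the example, and the names `jacobi_fast`, `jacobi_slow`, and `jacobi_pass` are hypothetical.

```python
def jacobi_fast(u, f, h, x, y):
    """Fast path: interior cells only, so no bounds checks are needed."""
    return 0.25 * (u[y][x - 1] + u[y][x + 1]
                   + u[y - 1][x] + u[y + 1][x]
                   - h * h * f[y][x])


def jacobi_slow(u, f, h, x, y):
    """Slow path: clamp neighbor indices to the grid (assumed boundary rule)."""
    height, width = len(u), len(u[0])
    left, right = max(x - 1, 0), min(x + 1, width - 1)
    up, down = max(y - 1, 0), min(y + 1, height - 1)
    return 0.25 * (u[y][left] + u[y][right]
                   + u[up][x] + u[down][x]
                   - h * h * f[y][x])


def jacobi_pass(u, f, h):
    """One Jacobi pass that dispatches by region instead of branching per cell."""
    height, width = len(u), len(u[0])
    out = [[0.0] * width for _ in range(height)]
    # Interior region: the bulk of the cells, branch-free kernel.
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            out[y][x] = jacobi_fast(u, f, h, x, y)
    # Top and bottom rows, then left and right columns: boundary kernel.
    for x in range(width):
        for y in (0, height - 1):
            out[y][x] = jacobi_slow(u, f, h, x, y)
    for y in range(1, height - 1):
        for x in (0, width - 1):
            out[y][x] = jacobi_slow(u, f, h, x, y)
    return out
```

The result matches what the slow kernel alone would produce on every cell, since clamping has no effect in the interior; the split only moves the conditionals out of the common case.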

  17. Optimizing the Solver: Context-switching • Find the best packing of data from multiple grid levels into the pbuffer surfaces

  18. Optimizing the Solver: Context-switching • Find the best packing of data from multiple grid levels into the pbuffer surfaces

  19. Optimizing the Solver: Context-switching • Find the best packing of data from multiple grid levels into the pbuffer surfaces

  20. Optimizing the Solver: Context-switching • Remove context switching • Can introduce operations with undefined results: reading/writing same surface • Why do we need to do this? • Can we get away with it? • What about superbuffers?

  21. Data Layout • Performance [Plot: seconds to steady state vs. grid size]

  22. Data Layout • Possible additional vectorization: Stacked domain • Compute 4 values at a time • Requires source, residual, solution values to be in different buffers • Complicates boundary calculations • Adds setup and teardown overhead

  23. Results: CPU vs. GPU • Performance [Plot: seconds to steady state vs. grid size]

  24. Conclusions What we need going forward: • Superbuffers • or: Universal support for multiple-surface pbuffers • or: Cheap context switching • Developer tools • Debugging tools • Documentation • Global accumulator • Ever increasing amounts of precision, memory • Textures bigger than 2048 on a side

  25. Acknowledgements • Hardware: David Kirk, Matt Papakipos • Driver Support: Nick Triantos, Pat Brown, Stephen Ehmann • Fragment Programming: James Percy, Matt Pharr • General-purpose GPU: Mark Harris, Aaron Lefohn, Ian Buck • Funding: NSF Award #0092793
