
ME964 High Performance Computing for Engineering Applications

ME964 High Performance Computing for Engineering Applications. Outlining Midterm Projects. Topic 3: GPU-based FEA. Topic 4: GPU Direct Solver for Sparse Linear Algebra. March 01, 2011. “The real problem is not whether machines think but whether men do.” B. F. Skinner.


Presentation Transcript


  1. ME964 High Performance Computing for Engineering Applications • Outlining Midterm Projects • Topic 3: GPU-based FEA • Topic 4: GPU Direct Solver for Sparse Linear Algebra • March 01, 2011 • “The real problem is not whether machines think but whether men do.” B. F. Skinner • © Dan Negrut, 2011, ME964 UW-Madison

  2. Before We Get Started… • Last time • Midterm Project topics 1 and 2 • Discrete Element Method on the GPU. Area coordinator: Toby Heyn • Collision Detection on the GPU. Area coordinator: Arman Pazouki • Today • Midterm Project topics 3 and 4 • Finite Element Method on the GPU. Area coordinators: Prof. Suresh and Naresh Khude • Sparse direct solver on the GPU (Cholesky). Area coordinator: Dan Negrut • Midterm Project Related Issues • Midterm Project is due on 04/13 at 11:59 PM (use Learn@UW drop-box) • Intermediate report due on 03/22 at 11:59 PM (use the same Learn@UW drop-box) • Each area coordinator • Will provide a test problem for you to test your GPU implementation • Will also assist you with questions related to the non-programming aspects (the “theory”) behind the topic you chose • You can continue your Midterm Project (MP) and have it become your Final Project (FP) • In this case you will be expected to show how the FP implementation is superior to your MP implementation • Other issues • HW5 due tonight at 11:59 PM • Use Learn@UW drop-box to submit homework

  3. Finite Element Analysis on the GPU? • Krishnan Suresh, Associate Professor • suresh@engr.wisc.edu

  4. Finite Element Analysis • Computer simulation of engineering models • Physics: • Structural, thermal, fluid, … • Mode: • Static, modal, transient • Linear, non-linear, multi-physics

  5. Why GPU? • Hours or even days of CPU time. [Gordon; JPL]

  6. Question • Can one exploit graphics processing units (GPUs) to speed up Finite Element analysis?

  7. Structural Static FEA • Model → Discretize → Element Stiffness → Assemble/Solve → Post-process

  8. FEA: Variations • Model → Discretize → Element Stiffness → Assemble/Solve → Post-process • Variations: Tet/Hex/…, Order/Hybrid, Direct/Iterative, Nonlinear, Optimization

  9. FEA: Challenges • Model → Discretize → Element Stiffness → Assemble/Solve → Post-process • Variations: Tet/Hex/…, Order/Hybrid, Direct/Iterative, Nonlinear, Optimization • Accuracy • Automation • Speed

  10. Typical Bottleneck • Model → Discretize → Element Stiffness → Assemble/Solve → Post-process

  11. GPU & Engineering Analysis: Discretization • Model → Discretize • CPU or GPU? Not a good candidate for GPU!? • Data: Small b-rep (+) • Logic: Complex (-) • Threads: Few (-)

  12. Element Stiffness • Variants: Hex 2nd Order Element Stiffness, Hex Hybrid Element Stiffness • Model (CPU) → Discretize (CPU) → Element Stiffness (GPU?) • Data: O(N) (+/-) • Logic: Simple (+) • Threads: N (+)

  13. Stiffness: Hex 2nd Order (8 Corners, 27 Nodes) • 8 corners ~ 100 bytes of data (x, y, z) • 27 nodes ~ M = 81 DOF (u, v, w) • k_ij ~ Gaussian integration • 30 flops
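
The counts on this slide, and the idea of building stiffness entries k_ij by Gaussian integration, can be illustrated with a much smaller element. The sketch below is not the 27-node hexahedron from the slide: it swaps in a hypothetical 3-node (quadratic) 1-D bar so the result can be checked by hand. The 4-byte coordinate size, the function name `bar3_stiffness`, and the material values are assumptions made for illustration.

```python
import math

# Slide arithmetic, assuming 4-byte (single precision) coordinates.
corner_bytes = 8 * 3 * 4   # 8 corners x (x, y, z) x 4 bytes = 96, roughly 100 bytes
element_dof = 27 * 3       # 27 nodes x (u, v, w) = 81 DOF


def bar3_stiffness(E, A, L):
    """Stiffness of a 3-node quadratic 1-D bar via 2-point Gauss-Legendre quadrature.

    Hypothetical stand-in for the hex element: same idea (integrate B^T E B
    numerically), far fewer entries.
    """
    g = 1.0 / math.sqrt(3.0)
    gauss = [(-g, 1.0), (g, 1.0)]   # (point xi, weight) on [-1, 1]
    jac = L / 2.0                   # dx/dxi for a straight, evenly spaced element
    k = [[0.0] * 3 for _ in range(3)]
    for xi, w in gauss:
        # Shape-function derivatives with respect to x (nodes at xi = -1, 0, 1).
        dN = [(xi - 0.5) / jac, (-2.0 * xi) / jac, (xi + 0.5) / jac]
        for i in range(3):
            for j in range(3):
                k[i][j] += w * E * A * dN[i] * dN[j] * jac
    return k


# With E*A/(3L) = 1 the matrix is close to [[7, -8, 1], [-8, 16, -8], [1, -8, 7]].
k = bar3_stiffness(E=3.0, A=1.0, L=1.0)
```

Each element's stiffness depends only on that element's own data, which is consistent with the "Threads: N (+)" remark on slide 12: one could plausibly assign one element per GPU thread.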

  14. Typical Bottleneck • Model → Discretize → Element Stiffness → Assemble/Solve

  15. Direct vs. Iterative • K is sparse and usually symmetric positive definite (SPD) • Iterative • Direct • GPU variation: assembly-free • Note: NVIDIA offers the CuBLAS-3 dense matrix library
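
For a symmetric positive definite K, a direct solve typically means a Cholesky factorization followed by two triangular solves. The sketch below shows that sequence on a small dense matrix. It is a CPU-only illustration, not a sparse or GPU implementation, and the names `cholesky` and `solve_spd` are hypothetical.

```python
import math


def cholesky(K):
    """Return lower-triangular L with K = L L^T for a symmetric positive definite K."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = K[j][j] - sum(L[j][p] ** 2 for p in range(j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        L[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            s = sum(L[i][p] * L[j][p] for p in range(j))
            L[i][j] = (K[i][j] - s) / L[j][j]
    return L


def solve_spd(K, f):
    """Solve K u = f: forward substitution (L y = f), then back substitution (L^T u = y)."""
    n = len(K)
    L = cholesky(K)
    y = [0.0] * n
    for i in range(n):
        y[i] = (f[i] - sum(L[i][p] * y[p] for p in range(i))) / L[i][i]
    u = [0.0] * n
    for i in reversed(range(n)):
        u[i] = (y[i] - sum(L[p][i] * u[p] for p in range(i + 1, n))) / L[i][i]
    return u


# Small SPD test matrix; the right-hand side is K times [1, 1, 1].
K = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
u = solve_spd(K, [3.0, 2.0, 3.0])
```

A sparse direct solver follows the same steps but must also manage fill-in, the new nonzeros that appear in L, which is a large part of what makes a GPU version challenging.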

  16. Direct Sparse on GPU (1) (2006)

  17. Direct Sparse on GPU (1)

  18. Direct Sparse on GPU (1)

  19. Direct Sparse on GPU (2) (2008)

  20. Direct Sparse on GPU (2)

  21. Iterative Sparse on GPU (1) (2008) • Jacobi preconditioned conjugate gradient • ATI GPU • Speed-up: 3.5
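
As a reminder of what a Jacobi preconditioned conjugate gradient iteration involves, here is a minimal CPU sketch on a dense list-of-lists matrix. It is not the GPU code behind the reported speed-up; the function names, tolerance, and test matrix are assumptions for illustration.

```python
def matvec(K, x):
    """Dense matrix-vector product K x."""
    return [sum(kij * xj for kij, xj in zip(row, x)) for row in K]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def jacobi_pcg(K, f, tol=1e-10, max_iter=200):
    """Solve K u = f for SPD K with CG preconditioned by the inverse of diag(K)."""
    n = len(f)
    u = [0.0] * n
    f_norm = dot(f, f) ** 0.5
    if f_norm == 0.0:
        return u, 0
    inv_diag = [1.0 / K[i][i] for i in range(n)]   # Jacobi preconditioner
    r = list(f)                                    # residual f - K u for u = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]
    p = list(z)
    rz = dot(r, z)
    for it in range(1, max_iter + 1):
        Kp = matvec(K, p)
        alpha = rz / dot(p, Kp)
        u = [ui + alpha * pi for ui, pi in zip(u, p)]
        r = [ri - alpha * kpi for ri, kpi in zip(r, Kp)]
        if dot(r, r) ** 0.5 <= tol * f_norm:
            return u, it
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return u, max_iter


# Right-hand side chosen as K times [1, 2, 3].
K = [[4.0, -1.0, 0.0], [-1.0, 3.0, -1.0], [0.0, -1.0, 2.0]]
u, iterations = jacobi_pcg(K, [2.0, 2.0, 4.0])
```

The only operations per iteration are one matrix-vector product, a few dot products, and elementwise vector updates, which is why the method maps well onto data-parallel hardware.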

  22. Iterative Sparse on GPU (2) • Double precision real world SpMV • CPU (2.3 GHz Dual Xeon): 1 GFLOPS • GPU (GTX 280): 16 GFLOPS • Speedup ~ 16
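
SpMV is the sparse matrix-vector product y = K x. The slide does not say which storage format was benchmarked; the sketch below assumes the common compressed sparse row (CSR) layout, and the array names are hypothetical.

```python
def csr_spmv(row_ptr, col_idx, values, x):
    """Compute y = K x for K stored in CSR form.

    row_ptr[r]..row_ptr[r+1] delimits the nonzeros of row r inside
    col_idx (column numbers) and values (matrix entries).
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):          # rows are independent of each other
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# CSR storage of [[4, -1, 0], [-1, 4, -1], [0, -1, 4]].
row_ptr = [0, 2, 5, 7]
col_idx = [0, 1, 0, 1, 2, 1, 2]
values = [4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0]
y = csr_spmv(row_ptr, col_idx, values, [1.0, 2.0, 3.0])

# Speed-up quoted on the slide: GPU throughput divided by CPU throughput.
speedup = 16.0 / 1.0
```

Because each row's sum is independent, one plausible GPU mapping is one thread (or one group of threads) per row; memory access patterns, rather than arithmetic, usually determine the achieved throughput.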

  23. FEA/GPU Class Projects? • Complete < 6 weeks • Important (publishable) • Pilot code

  24. FEA/GPU Class Projects? • GPU Friendly Preconditioners for Thin Structures • Research papers • OpenCL and ViennaCL Pilot Code • Topology Optimization • Research papers • CUDA code • Others • Can discuss …

  25. Thin Structure?

  26. Thin Structure? Large K

  27. Preconditioners? • Iterative Methods: • GPU methods available for K*u • Typical preconditioners: simple Jacobi, … • Poor preconditioner … slow convergence • Objective: • GPU friendly preconditioner for thin structures

  28. Research Publication

  29. Basic Idea

  30. Algorithm

  31. Why Preconditioner?

  32. Why Double Precision?

  33. How Expensive is Preconditioner?

  34. GPU Friendly • Speed-up with Preconditioner • Speed-up without Preconditioner

  35. FEA/GPU Class Projects? • GPU Friendly Preconditioners for Thin Structures • Research papers • OpenCL and ViennaCL Pilot Code • Topology Optimization • Research papers • CUDA code • Others • Can discuss …

  36. D Topology Optimization • V = 50% • Stiffest topology for a given volume? • Where to remove material? [Sigmund 2001] • Multi Objective + Topology Optimization = MOTO

  37. Demo • Matlab code: www.ersl.wisc.edu

  38. Pareto Optimal Designs • Purely Pareto optimal

  39. Comparison D

  40. SIMP Pareto-Method 3-D

  41. 3-D GPU Implementation • Multi-grid Topology Optimization on the GPU (IDETC conf. 2011)

  42. Motivation for Topic 4: Sparse Direct Solver

  43. Nomenclature & Simplifying Assumptions

  44. The Schur Complement Problem in Multi-Body Dynamics Applications

  45. Formulation Framework • Position: • Orientation: Euler parameters, • Translational Velocity: • Angular velocities

  46. Constrained Equations of Motion

  47. Numerical Solution of the Newton-Euler Constrained Equations of Motion • One has to solve a set of Differential Algebraic Equations (DAEs) to find the time evolution of a mechanical system • Most often the numerical solution of the DAEs requires the solution of a linear system of the form:

  48. Approach Followed • First solve the “Reduced System” for : • Then recover accelerations
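
The equations on these slides did not survive transcription, so the sketch below rests on a loudly stated assumption: that the linear system has the standard saddle-point form [M Φᵀ; Φ 0][a; λ] = [F; γ], with a diagonal mass matrix M, constraint Jacobian Φ, accelerations a, and Lagrange multipliers λ. Under that assumption the reduced (Schur complement) system is (Φ M⁻¹ Φᵀ) λ = Φ M⁻¹ F − γ, after which a = M⁻¹(F − Φᵀ λ). All symbols, signs, and function names here are illustrative and may differ from the lecture's notation.

```python
def solve_dense(A, b):
    """Gaussian elimination without pivoting; adequate for a small SPD matrix."""
    n = len(b)
    A = [row[:] for row in A]
    b = b[:]
    for c in range(n):
        for r in range(c + 1, n):
            m = A[r][c] / A[c][c]
            for j in range(c, n):
                A[r][j] -= m * A[c][j]
            b[r] -= m * b[c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(A[i][j] * x[j] for j in range(i + 1, n))) / A[i][i]
    return x


def solve_constrained(mass, Phi, F, gamma):
    """Solve the assumed saddle-point system via its reduced system.

    mass: diagonal of M; Phi: list of constraint rows; returns (a, lam).
    """
    n, m = len(mass), len(Phi)
    minv_f = [F[i] / mass[i] for i in range(n)]
    # Reduced matrix N = Phi M^-1 Phi^T (SPD when Phi has full row rank).
    N = [[sum(Phi[r][i] * Phi[c][i] / mass[i] for i in range(n))
          for c in range(m)] for r in range(m)]
    rhs = [sum(Phi[r][i] * minv_f[i] for i in range(n)) - gamma[r]
           for r in range(m)]
    lam = solve_dense(N, rhs)                      # step 1: reduced system
    a = [(F[i] - sum(Phi[r][i] * lam[r] for r in range(m))) / mass[i]
         for i in range(n)]                        # step 2: recover accelerations
    return a, lam


mass = [2.0, 1.0, 1.0]
Phi = [[1.0, 1.0, 0.0], [0.0, 1.0, -1.0]]
F = [2.0, 3.0, 1.0]
gamma = [0.0, 0.0]
a, lam = solve_constrained(mass, Phi, F, gamma)
```

The reduced matrix has one row per constraint rather than one per coordinate, so the expensive solve is smaller. This sketch solves it with a dense direct method; the slides go on to discuss an iterative solution of the reduced system.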

  49. Iterative Solution of the Reduced System • Define positive definite Reduced Matrix • Preconditioned Conjugate Gradient • requires computation at time of • requires preconditioning:

  50. Computing Time step n, iteration (k): • A thread is associated with each body • We’ll look at how thread 9 does its share of work to compute
