
Parallelizing finite element PDE solvers in an object-oriented framework


Presentation Transcript


  1. Parallelizing finite element PDE solvers in an object-oriented framework Xing Cai Department of Informatics University of Oslo

  2. Outline of the Talk • Introduction & background • 3 parallelization approaches • Implementation aspects • Numerical experiments

  3. The Scientific Software Group Department of Informatics, University of Oslo http://www.ifi.uio.no/~tpv Faculty, post docs, Ph.D. students and part-time members: Aslak Tveito, Glenn Terje Lines, Linda Ingebrigtsen, Are Magnus Bruaset (NO), Hans Petter Langtangen, Aicha Bounaim, Ola Skavhaug, Øyvind Hjelle (SINTEF), Xing Cai, Wen Chen, Joakim Sundnes, Bjørn Fredrik Nielsen (NR), Åsmund Ødegård, Kent Andre Mardal, Knut Andreas Lie (SINTEF), Tom Thorvaldsen

  4. Projects • Simulation of electrical activity in the human heart • Simulation of the diastolic left ventricle • Numerical methods for option pricing • Software for numerical solution of PDEs • Scientific computing using a Linux cluster • Finite element modelling of ultrasound wave propagation • Multi-physics models by domain decomposition methods • Scripting techniques for scientific computing • Numerical modelling of reactive fluid flow in porous media http://www.ifi.uio.no/~tpv

  5. Diffpack • O-O software environment for scientific computation (C++) • Rich collection of PDE solution components - portable, flexible, extensible • http://www.nobjects.com • H.P.Langtangen, Computational Partial Differential Equations, Springer 1999

  6. The Diffpack Philosophy [Diagram: PDE application areas — structural mechanics, porous media flow, water waves, aerodynamics, stochastic PDEs, incompressible flow, heat transfer and other PDE applications — built on core components: I/O, Grid, Field, FDM, FEM, Matrix, Vector, Ax=b]

  7. The Question Starting point: a sequential PDE solver. How to do the parallelization? We need • a good parallelization strategy • a good and simple implementation of the strategy Resulting parallel solvers should have • good parallel efficiency • good overall numerical performance

  8. A generic finite element PDE solver • Time stepping t0, t1, t2… • Spatial discretization (computational grid) • Solution of nonlinear problems • Solution of linearized problems • Iterative solution of Ax=b

  9. An observation • The computation-intensive part is the iterative solution of Ax=b • A parallel finite element PDE solver needs to run the linear algebra operations in parallel • vector addition • inner-product of two vectors • matrix-vector product

  10. Several parallelization options • Automatic compiler parallelization • Loop-level parallelization (special compilation directives) • Domain decomposition • divide-and-conquer • fully distributed computing • flexible • high parallel efficiency

  11. A natural parallelization of PDE solvers • The global solution domain is partitioned into many smaller subdomains • Each subdomain works as a "unit", with its own sub-matrices and sub-vectors • No need to create the global matrices and vectors physically • The global linear algebra operations can be realized by local operations + inter-processor communication

  12. Grid partition

  13. Linear-algebra level parallelization • An SPMD model • Reuse of existing code for local linear algebra operations • New code needed only for the parallelization-specific tasks • grid partition (non-overlapping, overlapping) • inter-processor communication routines

  14. Object orientation • An add-on "toolbox" containing all the parallelization-specific code • The "toolbox" has many high-level routines • The existing sequential libraries are slightly modified to include a "dummy" interface, thus incorporating "fake" inter-processor communications • A seamless coupling between the huge sequential libraries and the add-on toolbox

  15. Straightforward Parallelization • Develop a sequential simulator, without paying attention to parallelism • Follow the Diffpack coding standards • Use the add-on toolbox for parallel computing • Add a few new statements for transformation to a parallel simulator

  16. A Simple Coding Example

  GridPartAdm* adm;   // access to parallelization functionality
  LinEqAdm* lineq;    // administrator for linear system & solver
  // ...
  #ifdef PARALLEL_CODE
    adm->scan (menu);
    adm->prepareSubgrids ();
    adm->prepareCommunication ();
    lineq->attachCommAdm (*adm);
  #endif
  // ...
  lineq->solve ();

  Accompanying menu settings:
  set subdomain list = DEFAULT
  set global grid = grid1.file
  set partition-algorithm = METIS
  set number of overlaps = 0

  17. Solving an elliptic PDE Highly unstructured grid Discontinuity in the coefficient K

  18. Measurements 130,561 degrees of freedom Overlapping subgrids Global BiCGStab using (block) ILU prec.

  19. Parallel Vortex-Shedding Simulation The incompressible Navier-Stokes equations, solved by a pressure correction method

  20. Simulation Snapshots Pressure

  21. Some CPU Measurements The pressure equation is solved by the CG method with “subdomain-wise” MILU prec.

  22. Animated Pressure Field

  23. Domain Decomposition • Solution of the original large problem through iteratively solving many smaller subproblems • Can be used as a solution method or as a preconditioner • Flexibility -- localized treatment of irregular geometries, singularities, etc. • Very efficient numerical methods -- even on sequential computers • Suitable for coarse-grained parallelization

  24. Overlapping DD Example: Solving the Poisson problem on the unit square

  25. Observations • DD is a good parallelization strategy • The approach is not PDE-specific • A program for the original global problem can be reused (modulo B.C.) for each subdomain • Must communicate overlapping point values • No need for global data • Data distribution implied • Explicit temporal schemes are a special case where no iteration is needed (“exact DD”)

  26. Goals for the Implementation • Reuse sequential solver as subdomain solver • Add DD management and communication as separate modules • Collect common operations in generic library modules • Flexibility and portability • Simplified parallelization process for the end-user

  27. Generic Programming Framework

  28. Making the Simulator Parallel

  class SimulatorP : public SubdomainFEMSolver, public Simulator
  {
    // … just a small amount of code
    virtual void createLocalMatrix ()
    { Simulator::makeSystem (); }
  };

  [Class diagram: Administrator, SubdomainSimulator, SubdomainFEMSolver, Simulator, SimulatorP]

  29. Application • Poisson equation on unit square • DD as the global solution method • Subdomain solvers use CG+FFT • Fixed number of subdomains M=32 (independent of P) • Straightforward parallelization of an existing simulator P: number of processors

  30. A large scale problem Solving an elliptic boundary value problem on an unstructured grid

  31. Combined Approach • Use a CG-like method as basic solver (i.e. use a parallelized Diffpack linear solver) • Use DD as preconditioner (i.e. SimulatorP is invoked as a preconditioning solve) • Combine with coarse grid correction • CG-like method + DD prec. is normally faster than DD as a basic solver

  32. Elasticity • Test case: 2D linear elasticity, 241 x 241 global grid. • Vector equation • Straightforward parallelization based on an existing Diffpack simulator

  33. 2D Linear Elasticity • BiCGStab + DD prec. as global solver • Multigrid V-cycle in subdomain solves • I: number of global BiCGStab iterations needed • P: number of processors (P=#subdomains)

  34. 2D Linear Elasticity

  35. Two-Phase Porous Media Flow [Saturation equation (SEQ) and pressure equation (PEQ) omitted] • BiCGStab + DD prec. for the global pressure equation • Multigrid V-cycle in subdomain solves

  36. Two-phase Porous Media Flow History of water saturation propagation

  37. Nonlinear Water Waves • Fully nonlinear 3D water waves • Primary unknowns:

  38. Nonlinear Water Waves • CG + DD prec. for global solver • Multigrid V-cycle as subdomain solver • Fixed number of subdomains M=16 (independent of P) • Subgrids from partition of a global 41x41x41 grid

  39. Parallel Simulation of 3D Acoustic Field • A Linux cluster: 48 Pentium III 500 MHz processors, 100 Mbit/s interconnect • SGI Cray Origin 2000: MIPS R10000 • Linear-algebra level (LAL) parallelization; 2 cases: • Linear model (linear wave equation), solved with an explicit method • Nonlinear model, solved with an implicit method

  40. Mathematical Nonlinear Model

  41. Results - Linear Model

  42. Results - Nonlinear Model

  43. Summary • Goal: provide software and programming rules for easy parallelization of sequential simulators • Applicable to a wide range of PDE problems • Three parallelization approaches: • parallelization at the linear algebra level: "automatic" parallelization • domain decomposition: very flexible, with the algorithm visible in compact code • combined approach • Performance: satisfactory speed-up http://www.ifi.uio.no/~tpv
