Presentation Transcript


  1. Scaling the High Order Method Modeling Environment (HOMME) on Blue Gene/L. Dr. Richard D. Loft, Scientific Computing Division, National Center for Atmospheric Research, loft@ucar.edu

  2. Outline: • Scalable system: Blue Gene/L • Scalable model: HOMME • Explicit dynamics: using SFCs and process mapping to get scalability • Extensions to POP (Dennis) • Limitations and future directions

  3. Blue Gene/L: PetaFlops prototype • Blue Gene/L’s petascale “DNA”: • Massive parallelism (up to 130K cores) • Low power per core (~12 W/core) • High component reliability • High packaging density (2048 PEs/rack) • Dedicated reduction network (solver scalability) • Conventional programming model (usability): xlf90 and xlc compilers, MPI

  4. Blue Gene/L @ NCAR “Frost”

  5. HOMME Framework

  6. HOMME Project Participants • Core Development Team (all NCAR) • John Dennis (POP scalability) • Jim Edwards • Ram Nair • Amik St-Cyr • Steve Thomas (talking about timestepping schemes) • Henry Tufo • Collaborators: • Hae-Won Choi, UCB postdoc • Jack Chen, UCB postdoc • Vani Cheruvu, NCAR ASP postdoc • Mike Levy, UCB graduate student • Michael Oberg, UCB undergraduate student • Phil Rasch, NCAR • Mark Taylor, Sandia • Theron Voran, UCB graduate student • Funding from NSF and DOE

  7. HOMME Framework • HOMME = High-Order Method Modeling Environment • Framework for developing scalable and efficient General Atmospheric Circulation Models (GACMs) to support climate science. • Serves as a prototype for the Community Atmosphere Model (CAM) component of the Community Climate System Model (CCSM). • Designed for high-order methods (e.g., spectral element and discontinuous Galerkin methods) on the cubed-sphere. • Configurable for shallow water and (dry/moist) hydrostatic primitive equations. • Support for: • explicit and semi-implicit time stepping • several vertical discretization schemes (e.g., the Lin vertical Lagrangian method) • geometrically non-conforming elements and dynamically adaptive meshes (AMR)

  8. Advantages of High-Order Methods • Algorithmic advantages: • h-p element-based method on quadrilaterals (Ne x N) • Exponential convergence in polynomial degree (N) • Computational advantages: • Naturally cache-blocked N x N computations • Nearest-neighbor communication between elements (explicit) • Well suited to parallel microprocessor-based systems
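To make the cache-blocking point concrete, the following is a minimal sketch (illustrative C, not HOMME's actual Fortran source) of the element-local tensor-product derivative that dominates the explicit dynamics: every operation touches only an N x N block of data, and a typical N = 8 keeps that block resident in cache.

    /* Element-local gradient on an N x N Gauss-Lobatto grid using the 1-D
     * derivative matrix D: du/dxi = D u, du/deta = u D^T.  Sketch only. */
    #define N 8

    void gradient(const double D[N][N],   /* 1-D derivative matrix          */
                  const double u[N][N],   /* nodal field on one element     */
                  double du_dxi[N][N],    /* derivative in first direction  */
                  double du_deta[N][N])   /* derivative in second direction */
    {
        for (int i = 0; i < N; i++) {
            for (int j = 0; j < N; j++) {
                double s1 = 0.0, s2 = 0.0;
                for (int k = 0; k < N; k++) {
                    s1 += D[i][k] * u[k][j];   /* (D u)(i,j)   */
                    s2 += D[j][k] * u[i][k];   /* (u D^T)(i,j) */
                }
                du_dxi[i][j]  = s1;
                du_deta[i][j] = s2;
            }
        }
    }

Only the element-boundary values produced by such kernels need to be exchanged with neighboring elements, which is what keeps the explicit communication pattern nearest-neighbor.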

  9. Geometry: the Cubed-Sphere • [Figure: cubed-sphere grid with Ne = 16, shaded by degree of non-uniformity.] • The sphere is decomposed into 6 identical regions using a central projection (Sadourny, 1972) with an equiangular grid (Rancic et al., 1996). • Avoids pole problems; the grid is quasi-uniform. • Non-orthogonal curvilinear coordinate system with identical metric terms.
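For concreteness, here is a small sketch (illustrative C; assumes a unit sphere and the face centred on the +x axis, and is not HOMME source) of the equiangular central projection the slide refers to: face coordinates (alpha, beta) in [-pi/4, pi/4] are projected onto the sphere through the cube face.

    #include <math.h>
    #include <stdio.h>

    /* Equiangular face coordinates -> point on the unit sphere, for the
     * face centred on the +x axis.  The other five faces follow by
     * permuting/negating the Cartesian components. */
    static void equiangular_to_sphere(double alpha, double beta,
                                      double *x, double *y, double *z)
    {
        double X = tan(alpha);             /* central (gnomonic) projection */
        double Y = tan(beta);
        double r = sqrt(1.0 + X*X + Y*Y);  /* cube-face point to sphere     */
        *x = 1.0 / r;
        *y = X / r;
        *z = Y / r;
    }

    int main(void)
    {
        double x, y, z;
        equiangular_to_sphere(0.5, -0.3, &x, &y, &z);  /* a sample point */
        printf("%f %f %f\n", x, y, z);
        return 0;
    }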

  10. Computational Mesh • Element: a quadrilateral “patch” of N x N gridpoints on a Gauss-Lobatto grid; typically N = 8. • Cube: Ne = number of elements along each face edge, giving 6 x Ne x Ne elements in total.
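As a quick back-of-envelope check (standard spectral-element bookkeeping, not numbers quoted on the slide), the counts for a given (Ne, N) follow directly; the Ne = 128, N = 8 configuration used in the benchmark slides later in the talk works out to roughly 98 thousand elements and about 4.8 million unique horizontal gridpoints (columns).

    #include <stdio.h>

    int main(void)
    {
        long Ne = 128, N = 8;                     /* resolution of the BG/L runs   */
        long nelem   = 6 * Ne * Ne;               /* elements on the cubed-sphere  */
        long npts    = nelem * N * N;             /* points incl. duplicated edges */
        long nunique = 6 * Ne * Ne * (N - 1) * (N - 1) + 2;
                                                  /* unique points, assuming a
                                                     conforming mesh with shared
                                                     edge/corner points (Euler)    */
        printf("elements: %ld  stored points: %ld  unique points: %ld\n",
               nelem, npts, nunique);
        return 0;
    }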

  11. Key Points • Only C0 continuity or flux conservation is enforced across element interfaces. • Locally the mesh is structured with solution, data, and geometry expressed as sums of Nth-order tensor-product Lagrange polynomials based on the Gauss or Gauss-Lobatto quadrature points. • Globally the mesh is an unstructured array of deformed quadrilaterals (layered in 3D). • Exponential convergence (large N ideal for transitional flows because of minimal numerical dispersion and dissipation). • Geometrically nonconforming formulation provides additional meshing flexibility and adaptivity.

  12. Domain Decomposition • Mapping the elements to processors is achieved using Hilbert space-filling curves (Sagan, 1994; Dennis et al., 2006). • Generates the best partitions when Ne = 2^n 3^m, where n and m are non-negative integers. • (Metis and Chaco have also been examined, but SFCs have proven superior at large processor counts.)
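The payoff of the SFC ordering is that load balancing becomes a 1-D problem: once the curve has put the 6 x Ne x Ne elements into a list, each processor simply takes a contiguous chunk of it, which stays spatially compact and keeps halo exchanges local. A minimal sketch (illustrative C, not HOMME source; the processor count is an arbitrary example):

    #include <stdio.h>

    /* First and last+1 SFC positions owned by MPI rank `rank`. */
    static void sfc_chunk(long nelem, int nproc, int rank, long *begin, long *end)
    {
        long base = nelem / nproc, extra = nelem % nproc;
        /* the first `extra` ranks each get one additional element */
        *begin = rank * base + (rank < extra ? rank : extra);
        *end   = *begin + base + (rank < extra ? 1 : 0);
    }

    int main(void)
    {
        long nelem = 6L * 128 * 128;   /* cubed-sphere with Ne = 128  */
        int  nproc = 32768;            /* example processor count     */
        long b, e;
        sfc_chunk(nelem, nproc, 0, &b, &e);
        printf("rank 0 owns SFC elements [%ld, %ld)\n", b, e);
        return 0;
    }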

  13. Partitioning a cube-sphere on 8 processors

  14. Partitioning a cubed-sphere on 8 processors

  15. Mapping SFC Domains to the Torus Network • Must map the 1-D list of SFC domains to MPI processes on the 3-D torus intelligently: • Need to maximize torus locality. • Need to minimize wire contention. • Basic idea: snake processes through the torus as well.

  16. Default “Lexical” Mapping (2-D Example), Coprocessor Mode. One rank per node, placed in row-major order across the node grid:
       0  1  2  3
       4  5  6  7
       8  9 10 11
      12 13 14 15

  17. Default “Lexical” Mapping (2-D Example), Virtual Node Mode. Two ranks per node (shown as pairs); the default mapping puts ranks i and i+16 on the same node, so the two domains sharing a node are far apart on the space-filling curve:
      (0,16)  (1,17)  (2,18)  (3,19)
      (4,20)  (5,21)  (6,22)  (7,23)
      (8,24)  (9,25)  (10,26) (11,27)
      (12,28) (13,29) (14,30) (15,31)

  18. Desirable “Grouped” Mapping (2-D Example), Virtual Node Mode. Consecutive ranks are paired on each node, so the two domains sharing a node are neighbors on the curve:
      (0,1)   (2,3)   (4,5)   (6,7)
      (8,9)   (10,11) (12,13) (14,15)
      (16,17) (18,19) (20,21) (22,23)
      (24,25) (26,27) (28,29) (30,31)

  19. 2x2 “Snaked” Mapping (2-D Example), Virtual Node Mode. Eight consecutive ranks fill each 2x2 block of nodes (two ranks per node), snaking within the block, and the blocks tile the node grid:
      (0,1)   (6,7)   (8,9)   (14,15)
      (2,3)   (4,5)   (10,11) (12,13)
      (16,17) (22,23) (24,25) (30,31)
      (18,19) (20,21) (26,27) (28,29)

  20. 2x2 “Snaked” Mapping (2-D Example), Virtual Node Mode (same layout as the previous slide).
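A small sketch of how such a snaked placement can be generated programmatically (an illustration, not the tool used in this work; the node-grid dimensions NX, NY and the one-coordinate-line-per-rank output are assumptions in the spirit of a torus map file):

    #include <stdio.h>

    #define NX 4   /* node-grid dimensions for the 2-D example above */
    #define NY 4

    int main(void)
    {
        /* Walk 2x2 blocks of nodes in row-major block order; within each
         * block, snake down the first column and back up the second,
         * assigning two consecutive ranks to each node.  Line r of the
         * output gives (x, y, cpu) for MPI rank r. */
        for (int by = 0; by < NY; by += 2) {
            for (int bx = 0; bx < NX; bx += 2) {
                int order[4][2] = { {bx, by},     {bx, by + 1},
                                    {bx + 1, by + 1}, {bx + 1, by} };
                for (int i = 0; i < 4; i++)
                    for (int cpu = 0; cpu < 2; cpu++)
                        printf("%d %d %d\n", order[i][0], order[i][1], cpu);
            }
        }
        return 0;
    }

Running this with NX = NY = 4 reproduces the rank layout shown on the previous two slides.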

  21. BG/L HOMME: Moist Dynamics. [Plot: sustained MFLOPS per processor for the moist Held-Suarez test; coprocessor mode (CPM) sustains 8 TFLOPS, and virtual node mode (VNM) improves further due to the snaked mapping.] Explicit integration, Δt = 4 seconds; 6 x 128 x 128 elements, 96 vertical levels.

  22. BG/L HOMME: Moist Dynamics with Physics. [Plot: sustained MFLOPS per processor for an aquaplanet run with Emanuel physics; 11.3 TFLOPS sustained.] Explicit integration, Δt = 4 seconds; 6 x 128 x 128 elements, 40 vertical levels.

  23. Limitations of this work / future directions • Explicit result: the integration rate at ~10 km resolution is too low for useful climate work. • Solution: solvers and preconditioners (Thomas); some progress here, but no data on large systems yet. • Lots of parallelism within the 3-D elements is left to exploit, particularly in physics. • Solution: redistribution of work between the physics and dynamics components.
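To see the shape of the explicit-rate limitation, here is a back-of-envelope sketch; every number in it is an assumed, illustrative value (none are taken from the talk), and the point is only the structure of the estimate: the explicit step is CFL-limited by fast gravity waves on the smallest grid spacing, and a step of a few seconds caps the simulated-years-per-day throughput well below what climate integrations need.

    #include <stdio.h>

    int main(void)
    {
        double dx_min = 2.0e3;    /* assumed smallest point spacing [m]        */
        double c_gw   = 340.0;    /* external gravity-wave speed [m/s]         */
        double cfl    = 0.5;      /* assumed stability margin                  */
        double dt     = cfl * dx_min / c_gw;     /* explicit step, roughly 3 s */

        double wall_per_step = 0.1;              /* assumed wall-clock seconds
                                                    per model time step        */
        double sim_years_per_day =
            (86400.0 / wall_per_step) * dt / (86400.0 * 365.0);

        printf("dt = %.1f s  ->  %.2f simulated years per wall-clock day\n",
               dt, sim_years_per_day);
        return 0;
    }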

  24. Logical View of AGCM Dynamics/Physics Coupling. [Figure: schematic of the coupling between the DYNAMICS and PHYSICS components.]

  25. Hardware View of Dynamics/Physics Coupling on Blue Gene/L. Begin with elements laid out for the dynamics scheme.

  26. Hardware View of Dynamics/Physics Coupling on Blue Gene/L Begin scattering the columns

  27. Hardware View of CRCP-HOMME Coupling on Blue Gene/L Continue scattering columns

  28. Hardware View of CRCP-HOMME Coupling on Blue Gene/L Continue scattering columns

  29. Hardware View of CRCP Physics Layout on Blue Gene/L. [Figure: physics columns distributed over a simplified 2-D BG/L topology; colors denote the elements the columns came from.]
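The idea sketched in slides 24-29 is that physics columns are independent of one another, so they can be dealt out evenly across all processors regardless of which element owns them, then gathered back for the next dynamics step. A minimal sketch of that assignment (an assumption about the approach, not HOMME source; a real implementation would move the data with something like MPI_Alltoallv):

    #include <stdio.h>

    /* Hypothetical round-robin assignment: the rank that computes physics
     * for the column with global index gcol. */
    static int physics_owner(long gcol, int nranks)
    {
        return (int)(gcol % nranks);   /* deal columns out like cards */
    }

    int main(void)
    {
        long ncols  = 4816898;   /* unique columns for Ne = 128, N = 8 */
        int  nranks = 32768;     /* example processor count            */
        printf("rank of column 0: %d, columns per rank: about %ld\n",
               physics_owner(0, nranks), ncols / nranks);
        return 0;
    }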

  30. Questions?
