

The ALEGRA Production Application: Strategy, Challenges and Progress Toward Next Generation Platforms. Richard R. Drake, Dept 1443 - Computational Multiphysics, Sandia National Laboratories.





Presentation Transcript


  1. The ALEGRA Production Application: Strategy, Challenges and Progress Toward Next Generation Platforms
Richard R. Drake, Dept 1443 - Computational Multiphysics, Sandia National Laboratories

ALEGRA is a large, highly capable, option-rich production application solving coupled multi-physics PDEs modeling magnetohydrodynamics, electromechanics, stochastic damage modeling, and detailed interface mechanics in high strain rate regimes on unstructured meshes in an ALE framework. Nearly all the algorithms must accept dynamic, mixed-material elements, which are modified by remeshing, interface reconstruction, and advection components.

Recent trends in computing hardware have forced application developers to think about how to address and improve performance on traditional CPUs and to look forward to next generation platforms. Core to the ALEGRA performance strategy is to improve and rewrite loop bodies to conform with the requirements of high performance kernels, such as accessing data in array form, no pointer dereferencing, no function calls, and thread safety. Achieving this, however, requires changes to the underlying infrastructure.

We report on recent progress in the infrastructure to support array-based data access and iteration over mesh objects, and show the effects on performance on traditional platforms. We also discuss the practical realities and cost estimates of moving an existing full-featured production application like ALEGRA toward running effectively on future platforms while remaining maintainable at the same time.

Algorithms & Abstractions for Assembly in PDE Codes, May 12-14, 2014

  2. ALEGRA: Shock Hydro & MHD
• 20 years of development & evolution
• Operator split, multi-physics
• Includes explicit and implicit PDE solvers
• 2 and 3 spatial dimensions
• Core hydro is multi-material Lagrangian plus remap
• An XFEM capability is maturing
• 650k LOC (not including libraries, such as Trilinos)
• Mix of research, development, and production capabilities
• Extensive material model choices
[Figures: 2D Magnetics; 3D Resistive MHD; Shock hydro]

  3. Some ALEGRA Core Algorithms
• Mixed material cell treatment
• Remap
• Remesh
• Material interface reconstruction
• Material & field advection
• Dynamic topology
• Extended Finite Element Method (XFEM)
• Spatial refinement/unrefinement
• Flexible set of material models comprising each material
• Central difference and midpoint time integration options
[Figures: Swept volume & intersection remap; Material interface reconstruction; XFEM requires topological enrichment]
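The central difference time integration option above can be sketched as a leapfrog update. This is a generic illustration, not ALEGRA's implementation; the function and variable names are invented:

```cpp
#include <cstddef>
#include <vector>

// Central-difference (leapfrog) step:
//   v^{n+1/2} = v^{n-1/2} + dt * a^n   (kick)
//   x^{n+1}   = x^n + dt * v^{n+1/2}   (drift)
void leapfrog_step(std::vector<double>& x,
                   std::vector<double>& v_half,
                   const std::vector<double>& accel,
                   double dt) {
    for (std::size_t i = 0; i < x.size(); ++i) {
        v_half[i] += dt * accel[i];   // advance midpoint velocity
        x[i]      += dt * v_half[i];  // advance position
    }
}
```

Note the loop body touches only flat arrays, which already matches the kernel requirements listed later in the talk.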

  4. NEVADA Infrastructure (A Framework)
Components: In-Situ Viz, Input Parsing, In-Situ Processing, Remesh, Spatial Adaptivity, Interface Reconstruction, Structured Mesh, XFEM Adaptivity, Advection, Unstructured Mesh, Materials, Load Balancing, Halo Comm, Physics Algorithms, Contact, Field I/O
Everything depends on the “Mesh”

  5. Performance
• We need to run faster!
  • Customer needs
  • NW needs
  • Optics (marketing)
• Can’t rely on faster CPUs anymore!
• It has become clear that:
  • There is no performance silver bullet
  • Application software must change
  • This will require a resource shift
[Figure: Muzia, 2D; chart labels 60%, 56%]

  6. The ALEGRA Performance Strategy
Work in the present but aim for the future.
• Focus on foundational concepts
  • Accessing bulk data in array form
  • Limit pointer dereferencing
  • Limit function calls (non-inlined)
  • Minimize data reads/writes
  • Thread safety
• Incrementally reimplement algorithms
  • Remesh, interface reconstruction, advection
  • Lagrangian step pieces
  • Matrix assembly coding
  • Time step size computation
• Refactor support infrastructure
  • Enable array-based access
  • Enable flat index-based iteration
  • Enable thread safety (colorings?)
• Consider new algorithms
  • Alternate formulations
  • New/different algorithms [Komatitsch]
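One common way to realize the “thread safety (colorings?)” item above is a greedy element coloring: no two elements of the same color share a node, so each color's loop can be threaded without atomics. A minimal sketch under that assumption (illustrative only, not ALEGRA's code):

```cpp
#include <cstddef>
#include <vector>

// Greedy coloring: give each element the smallest color not already
// used by an earlier element that shares a node with it.
std::vector<int> greedy_color(const std::vector<std::vector<int>>& elem_nodes) {
    std::vector<int> color(elem_nodes.size(), -1);
    for (std::size_t e = 0; e < elem_nodes.size(); ++e) {
        std::vector<char> used(elem_nodes.size() + 1, 0);
        for (std::size_t f = 0; f < e; ++f)         // earlier, already-colored elements
            for (int nf : elem_nodes[f])
                for (int ne : elem_nodes[e])
                    if (nf == ne)                   // f and e touch the same node:
                        used[color[f]] = 1;         // f's color is unavailable
        int c = 0;
        while (used[c]) ++c;
        color[e] = c;
    }
    return color;
}
```

Elements within one color can then be assembled in parallel; colors are processed one after another.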

  7. Progress in Data Layout
“Transpose” the storage: in the object-based layout, each mesh object holds its own field values (v1, v2, v3, v4, ...); in the array-based layout, each field is one contiguous array (a “double**”) indexed by the object's integer index, obj_idx.
Common, existing access pattern: ndVector_Var( CURCOOR )
Becomes, in object layout: nddata[ CURCOOR ]
In array layout: nddata[ CURCOOR ][ ndobj_idx ]
• Object-based layout has more direct access to memory.
• Array-based layout has better cache & TLB behavior.
• Depending on the algorithm and problem size, the better memory behavior may or may not offset the extra dereferencing.
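The two layouts can be illustrated with a toy node field. The struct and function names here are invented for illustration, not taken from ALEGRA:

```cpp
#include <cstddef>
#include <vector>

// Object-based layout ("array of structs"): each object owns its fields.
struct NodeObj {
    double coord[3];
    double vel[3];
};

// Array-based layout ("struct of arrays"): one contiguous array per
// field component, indexed by the object's integer index.
struct NodeFields {
    std::vector<double> coord_x, coord_y, coord_z;
    std::vector<double> vel_x, vel_y, vel_z;
};

// The same position update written against both layouts.
void advance_obj(std::vector<NodeObj>& nodes, double dt) {
    for (NodeObj& n : nodes)
        for (int d = 0; d < 3; ++d)
            n.coord[d] += dt * n.vel[d];      // strided field access
}

void advance_arr(NodeFields& f, double dt, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {     // unit-stride per field:
        f.coord_x[i] += dt * f.vel_x[i];      // better cache/TLB behavior
        f.coord_y[i] += dt * f.vel_y[i];      // and easy vectorization
        f.coord_z[i] += dt * f.vel_z[i];
    }
}
```

The array version pays one extra indirection to reach each field array, which is the trade-off the bullets above describe.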

  8. Speedups: Object- versus Array-Based
• Comparisons of unmodified versus array-based code
• Intel chips: RedSky = Nehalem, TLCC2 = Sandy Bridge
• The memory behavior wins over the extra offset in many cases.

  9. Algorithms Should Use the Arrays Directly
Object-based access (oversimplified, hypothetical loop):

  Element * el = 0;
  TOTAL_ELEMENT_LOOP(el) {
    const Vector vara = el->Vector_Var( VARA_IDX );
    Vector & varb = el->Vector_Var( VARB_IDX );
    el->Vector_Var( VARA_IDX ) += varb;
    el->Scalar_Var( VARC_IDX ) = vara * varb;
  }

Array-based access:

  ArrayView<Vector> vara = mesh->getField( VARA_IDX );
  ArrayView<Vector> varb = mesh->getField( VARB_IDX );
  ArrayView<double> varc = mesh->getField( VARC_IDX );
  Element * el = 0;
  TOTAL_ELEMENT_LOOP(el) {
    const int ei = el->Idx();
    const Vector va = vara[ei];
    vara[ei] += varb[ei];
    varc[ei] = va * varb[ei];
  }
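A minimal non-owning view in the spirit of the ArrayView used above could look as follows. This is a hypothetical sketch; ALEGRA's actual class will differ:

```cpp
#include <cassert>
#include <cstddef>

// Non-owning, bounds-checked view over a contiguous field array.
// Indexing compiles down to a single pointer offset, so loop bodies
// touch only flat memory with no per-object dereferencing.
template <class T>
class ArrayView {
    T* data_;
    std::size_t size_;
public:
    ArrayView(T* data, std::size_t size) : data_(data), size_(size) {}
    T& operator[](std::size_t i) { assert(i < size_); return data_[i]; }
    const T& operator[](std::size_t i) const { assert(i < size_); return data_[i]; }
    std::size_t size() const { return size_; }
};
```

Because the view holds only a pointer and a length, the compiler can keep the base address in a register across the whole loop, which is part of what makes the array-based form faster.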

  10. Object List & Iteration Improvements
Convert doubly linked lists of mesh objects into index sets: instead of list nodes pointing at their data, store integer offsets into the data arrays.

Can now do this:

  for ( int i = 0; i < N; ++i ) {
    int ni = index_list[i];
    vel[ni] = old_vel[ni] + dt * accl[ni];
    ...
  }

• Index-based mesh object storage
• Enables iteration without dereferencing objects
• Performance comparison shows no improvement
• Algorithms would have to take advantage first
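The index-set loop above, written as a self-contained function: the field names vel, old_vel, and accl come from the slide, while the wrapper itself is illustrative.

```cpp
#include <cstddef>
#include <vector>

// Gather-style update over an index set: no mesh objects are
// dereferenced, only integer offsets into flat field arrays.
void advance_velocities(const std::vector<int>& index_list,
                        std::vector<double>& vel,
                        const std::vector<double>& old_vel,
                        const std::vector<double>& accl,
                        double dt) {
    const int N = static_cast<int>(index_list.size());
    for (int i = 0; i < N; ++i) {
        const int ni = index_list[i];   // flat index from the set
        vel[ni] = old_vel[ni] + dt * accl[ni];
    }
}
```

As the slide notes, swapping the storage alone buys little; the win comes once algorithms are rewritten to iterate this way.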

  11. Object Ordering Exploration
• Order elements by a space filling curve [wikipedia]
• Order nodes by first-touch element loop
• Goal: improve cache locality via mesh object ordering
• Result: no speedups over default ordering
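One common way to realize the space filling curve ordering is to sort elements by the Morton (Z-order) code of their quantized centroids. A 2D sketch of that idea (generic, not ALEGRA's code):

```cpp
#include <algorithm>
#include <cstdint>
#include <cstddef>
#include <utility>
#include <vector>

// Spread the low 16 bits of v into the even bit positions.
static std::uint32_t part1by1(std::uint32_t v) {
    v &= 0x0000ffff;
    v = (v | (v << 8)) & 0x00ff00ff;
    v = (v | (v << 4)) & 0x0f0f0f0f;
    v = (v | (v << 2)) & 0x33333333;
    v = (v | (v << 1)) & 0x55555555;
    return v;
}

// Interleave x and y bits into a single Z-order key.
std::uint32_t morton2d(std::uint32_t x, std::uint32_t y) {
    return part1by1(x) | (part1by1(y) << 1);
}

// Permutation of element ids ordered along the Z curve,
// given quantized (x, y) centroids.
std::vector<int> zorder(
    const std::vector<std::pair<std::uint32_t, std::uint32_t>>& centroids) {
    std::vector<int> perm(centroids.size());
    for (std::size_t i = 0; i < perm.size(); ++i) perm[i] = static_cast<int>(i);
    std::sort(perm.begin(), perm.end(), [&](int a, int b) {
        return morton2d(centroids[a].first, centroids[a].second)
             < morton2d(centroids[b].first, centroids[b].second);
    });
    return perm;
}
```

Elements adjacent along the curve tend to be spatially adjacent, so their shared nodes stay warm in cache; the slide's result suggests this only pays off once the data layout and loops are array-based.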

  12. Summary
• ALEGRA has adopted a low-risk performance strategy
• Main concept: incrementally rewrite algorithms toward NGP standards
• Progress made on support infrastructure
  • Array-based field data
  • Integer index set object looping
• 1.4X speedup realized on realistic simulations
• Work continues on infrastructure & algorithms
  • Data: topology storage, integer field data, material data
  • Algorithms: remap, Lagrangian step
