
Massively Parallel Magnetohydrodynamics on the Cray XT3



  1. Massively Parallel Magnetohydrodynamics on the Cray XT3 Joshua Breslau and Jin Chen Princeton Plasma Physics Laboratory Cray XT3 Technical Workshop Nashville, TN February 28, 2007

  2. Motivation: Modeling Magnetic Confinement Fusion Experiments
[Figures: NSTX (Spherical Torus); ITER (Advanced Tokamak); NCSX (Compact Stellarator).]

  3. Characteristics of Magnetic Confinement Fusion Experiments
• Multispecies hydrogen plasma at ~10^8 °C (Te and Ti may differ): low collisionality, high electrical conductivity.
• Toroidal topology with complex boundary geometry.
• Strong toroidal magnetic field giving highly anisotropic transport and low β.
• Rotational transform gives nested flux surfaces.
• Spatial scales range from the electron skin depth, ~10^-4 m, to the major radius, ~6 m.
• Time scales range from the Alfvén wave transit time (~μs) to the discharge time, ~100 s.
• Susceptible to microinstabilities, leading to loss of energy confinement, and macroinstabilities, leading to large-scale rearrangement of the plasma and possible disruption.

  4. Extended MHD Equations
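[The slide's equation set appears as images in the original deck and is not reproduced in the transcript. For reference, a representative resistive/extended MHD system is written out below; this is an illustrative standard form, not necessarily M3D's exact formulation, which includes additional two-fluid, gyroviscous, and kinetic-ion terms described in its published references.]

% Representative extended MHD system (illustrative only).
\begin{align}
  \frac{\partial n}{\partial t} + \nabla\cdot(n\mathbf{v}) &= 0
    && \text{continuity} \\
  \rho\left(\frac{\partial \mathbf{v}}{\partial t}
    + \mathbf{v}\cdot\nabla\mathbf{v}\right)
    &= \mathbf{J}\times\mathbf{B} - \nabla p + \mu\nabla^{2}\mathbf{v}
    && \text{momentum} \\
  \mathbf{E} + \mathbf{v}\times\mathbf{B}
    &= \eta\mathbf{J}
    + \frac{1}{ne}\left(\mathbf{J}\times\mathbf{B} - \nabla p_{e}\right)
    && \text{generalized Ohm's law} \\
  \frac{\partial \mathbf{B}}{\partial t} = -\nabla\times\mathbf{E},
  \qquad \mu_{0}\mathbf{J} &= \nabla\times\mathbf{B},
  \qquad \nabla\cdot\mathbf{B} = 0
    && \text{pre-Maxwell equations} \\
  \frac{\partial p}{\partial t} + \mathbf{v}\cdot\nabla p
    &= -\gamma p\,\nabla\cdot\mathbf{v}
    + \nabla\cdot\left(\kappa_{\parallel}\nabla_{\parallel}T
    + \kappa_{\perp}\nabla_{\perp}T\right)
    && \text{pressure, anisotropic conduction}
\end{align}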

  5. The M3D Code
M3D (Multi-level 3D) is a 3D nonlinear extended MHD code in toroidal geometry, maintained by a multi-institutional collaboration and designed for the study of macroscopic instabilities in tokamaks and stellarators.
• Physics models include ideal and resistive MHD; two-fluid; or hybrid with kinetic ions.
• Field and velocity variables are expressed in terms of potentials, keeping B divergence-free and separating the compressible and incompressible components of the flow.
• Uses linear, 2nd-, or 3rd-order finite elements in-plane on an unstructured triangular mesh.
• Uses 4th-order finite differences between planes, or pseudo-spectral derivatives.
• Partially implicit treatment allows efficient advance over dissipative time scales but requires small time steps relative to τA.
• Linear and nonlinear modes of operation are available.
• The PETSc library is used for parallelization and for linear solves with Krylov methods.
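[For orientation, the potential formulation mentioned above rests on two standard ideas: a divergence-free field can be generated from potentials, and a flow can be split into incompressible and compressible parts. The schematic decomposition below is illustrative only and does not use M3D's actual variable definitions, which are given in the M3D papers.]

% Schematic decomposition (illustrative; not M3D's exact variables):
\begin{align}
  \mathbf{B} &= \nabla\times\mathbf{A}
  &&\Rightarrow\ \nabla\cdot\mathbf{B} = 0 \ \text{identically}, \\
  \mathbf{v} &= \underbrace{\nabla\times\boldsymbol{\psi}}_{\text{incompressible}}
             \;+\; \underbrace{\nabla\chi}_{\text{compressible}}
  &&\Rightarrow\ \nabla\cdot\mathbf{v} = \nabla^{2}\chi .
\end{align}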

  6. M3D Mesh
[Figures: single plane; full torus.]
Radial zones align with flux surfaces. At least 2n+1 toroidal planes are needed to resolve toroidal mode number n (the n = 0 component plus sine and cosine components for each mode from 1 to n).

  7. Domain Decomposition
[Figure annotations: D = 1, F = 5; D = 3, F = 3; B = 16.]
Toroidal decomposition (overhead view): linear solves are independent on each processor.
Poloidal decomposition (cross-section view): linear solves are parallel over processors.
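[As an illustration of how a two-dimensional toroidal x poloidal processor layout can be organized, the sketch below splits MPI_COMM_WORLD into per-plane-group and per-poloidal-position communicators. This is a hypothetical example, not M3D's source; the communicator names and the assumption of 4 poloidal processors per group are illustrative.]

/* Hypothetical sketch of a toroidal x poloidal processor decomposition.
 * Not taken from M3D; names and layout are illustrative assumptions. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    const int npol = 4;               /* assumed poloidal processors per plane group */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int tor_id = rank / npol;         /* which toroidal plane group this rank belongs to */
    int pol_id = rank % npol;         /* position within the poloidal group */

    /* Ranks sharing a plane group: used for the in-plane (poloidal) solves. */
    MPI_Comm pol_comm;
    MPI_Comm_split(MPI_COMM_WORLD, tor_id, pol_id, &pol_comm);

    /* Ranks with the same poloidal position: used for toroidal coupling. */
    MPI_Comm tor_comm;
    MPI_Comm_split(MPI_COMM_WORLD, pol_id, tor_id, &tor_comm);

    printf("rank %d of %d: toroidal group %d, poloidal position %d\n",
           rank, size, tor_id, pol_id);

    MPI_Comm_free(&pol_comm);
    MPI_Comm_free(&tor_comm);
    MPI_Finalize();
    return 0;
}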

  8. Porting M3D to the XT3
• Previously run on the Cray T3E, IBM SP, and SGI Origin 2000.
• Few modifications to the source code were necessary for the new platform.
• Installation of the HYPRE preconditioner in the PETSc library enabled faster inversion of the symmetric form of the linear operators using CG (a hypothetical solver setup is sketched below).
• Reducing interprocessor communication was key to improving scaling.
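[The following is a minimal sketch of the kind of PETSc configuration that bullet describes: a conjugate-gradient KSP preconditioned with HYPRE's BoomerAMG. It is illustrative only; the matrix is a 1D Laplacian stand-in rather than an M3D operator, error checking is omitted for brevity, and a PETSc build configured with HYPRE is assumed.]

/* Minimal PETSc sketch: CG + HYPRE/BoomerAMG on a symmetric model problem.
 * Illustrative only -- not M3D code; requires PETSc built with HYPRE. */
#include <petscksp.h>

int main(int argc, char **argv)
{
    Mat      A;
    Vec      x, b;
    KSP      ksp;
    PC       pc;
    PetscInt i, n = 1000, Istart, Iend;

    PetscInitialize(&argc, &argv, NULL, NULL);

    /* Assemble a symmetric positive-definite tridiagonal matrix. */
    MatCreate(PETSC_COMM_WORLD, &A);
    MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);
    MatSetFromOptions(A);
    MatSetUp(A);
    MatGetOwnershipRange(A, &Istart, &Iend);
    for (i = Istart; i < Iend; i++) {
        if (i > 0)     MatSetValue(A, i, i - 1, -1.0, INSERT_VALUES);
        if (i < n - 1) MatSetValue(A, i, i + 1, -1.0, INSERT_VALUES);
        MatSetValue(A, i, i, 2.0, INSERT_VALUES);
    }
    MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
    MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

    MatCreateVecs(A, &x, &b);
    VecSet(b, 1.0);

    /* CG Krylov solver with HYPRE BoomerAMG preconditioning. */
    KSPCreate(PETSC_COMM_WORLD, &ksp);
    KSPSetOperators(ksp, A, A);
    KSPSetType(ksp, KSPCG);
    KSPGetPC(ksp, &pc);
    PCSetType(pc, PCHYPRE);
    PCHYPRESetType(pc, "boomeramg");
    KSPSetFromOptions(ksp);        /* allow run-time overrides, e.g. -ksp_rtol */
    KSPSolve(ksp, b, x);

    KSPDestroy(&ksp);
    MatDestroy(&A);
    VecDestroy(&x);
    VecDestroy(&b);
    PetscFinalize();
    return 0;
}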

  9. 1D Weak Scaling, Single Core (SN)
Poloidal domain decomposition: 560 radial zones (626,081 vertices/plane), 16 poloidal processors, 4 planes/processor, 4 to 320 toroidal processors.
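[The scaling curves themselves are in the plots, which are not reproduced in the transcript. For reference, weak-scaling efficiency at fixed work per processor is conventionally defined as below; this definition is an assumption of this note, not something stated on the slide.]

% Conventional weak-scaling efficiency at fixed work per processor (assumed definition):
E_{\mathrm{weak}}(P) \;=\; \frac{t(P_{0})}{t(P)},
\qquad P \ge P_{0},\quad \text{problem size} \propto P .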

  10. 1D Weak Scaling, Dual Core (VN)
398 radial zones (316,013 vertices/plane), 16 poloidal processors, 4 planes/processor, 4 to 640 toroidal processors.

  11. 3D Weak Scaling, Single Core
Smallest run has 64 planes and 160,000 vertices/plane on 16 toroidal x 4 poloidal processors. Successive runs increase the number of poloidal processors by 4 and the number of toroidal processors by 12, while maintaining 4 planes and 40,000 vertices per processor.

  12. 3D Strong Scaling, Single Core
All runs have 32 planes with 474,151 vertices/plane. The smallest has 8 toroidal x 12 poloidal processors. Successive runs increase the number of poloidal processors by 6 and double the number of toroidal processors.
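[For the strong-scaling series on this and the next two slides, the problem size is held fixed while the processor count grows. The conventional speedup and parallel-efficiency definitions are given below for reference; these definitions are assumed here, not quoted from the slides.]

% Conventional strong-scaling metrics at fixed problem size (assumed definitions):
S(P) \;=\; \frac{t(P_{0})}{t(P)},
\qquad
E_{\mathrm{strong}}(P) \;=\; \frac{P_{0}\,t(P_{0})}{P\,t(P)} .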

  13. 3D Strong Scaling, Single Core
All runs have 128 planes with 474,151 vertices/plane. The smallest has 32 toroidal x 12 poloidal processors. Successive runs increase the number of poloidal processors by 6 and double the number of toroidal processors.

  14. 3D Strong Scaling, Single Core
All runs have 208 planes with 474,151 vertices/plane. The smallest has 52 toroidal x 12 poloidal processors. Successive runs increase the number of poloidal processors by 6 and double the number of toroidal processors.

  15. Sample Application: CDX Sawteeth
• Small laboratory tokamak.
• Oscillations in the X-ray signal during discharge are consistent with a sudden outward shift of hot plasma.
• Objective: predict the effect and the conditions for onset of the instability.
[Figure labels: high-temperature core; low-temperature m=1, n=1 island; X-point (site of reconnection); q=1 surface (inversion radius).]

  16. Initialization
• Equilibrium taken from a transport-timescale code.
• β ~ 3.3%
• q0 ≈ 0.922
• Sawtooth instability is predicted when q0 is sufficiently below 1.
[Figure: toroidal current density of the equilibrium.]
Linear n=1 eigenmode: growth rate γτA ≈ 6 × 10^-4.
[Figure panels: perturbed temperature; perturbed current density; velocity stream function.]

  17. Nonlinear Results
24 planes, 79 radial grids, 221,856 vertices; 24 toroidal x 6 poloidal processors = 144 Jaguar CPUs (VN mode); 13,920 CPU-hours (96:40 wall-clock hours on 144 CPUs).
[Figures: kinetic energy by toroidal mode number; Poincaré sections.]

  18. Conclusions
• The XT3 has been a productive environment for tokamak simulations with M3D.
• Improved scaling can be expected with the faster interconnects on the XT4.
• Scaling to thousands of processors has been demonstrated, but may be impractical for real applications while the code remains explicit.
