
Massively Parallel Magnetohydrodynamics on the Cray XT3



  1. Massively Parallel Magnetohydrodynamics on the Cray XT3 Joshua Breslau and Jin Chen Princeton Plasma Physics Laboratory Cray XT3 Technical Workshop Nashville, TN February 28, 2007

  2. Motivation: Modeling Magnetic Confinement Fusion Experiments
[Figures: NSTX (Spherical Torus); ITER (Advanced Tokamak); NCSX (Compact Stellarator).]

  3. Characteristics of Magnetic Confinement Fusion Experiments
• Multispecies hydrogen plasma at ~10^8 °C (Te and Ti may differ): low collisionality, high electrical conductivity.
• Toroidal topology with complex boundary geometry.
• Strong toroidal magnetic field giving highly anisotropic transport and low β.
• Rotational transform gives nested flux surfaces.
• Spatial scales range from the electron skin depth, ~10^-4 m, to the major radius, ~6 m.
• Time scales range from the Alfvén wave transit time (~μs) to the discharge time, ~100 s.
• Susceptible to microinstabilities, leading to loss of energy confinement, and macroinstabilities, leading to large-scale rearrangement of the plasma and possible disruption.

  4. Extended MHD Equations
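[The slide's equation set appears as images in the original deck and is not reproduced in the transcript. For reference, a representative resistive/extended MHD system is written out below; this is an illustrative standard form, not necessarily M3D's exact formulation, which includes additional two-fluid, gyroviscous, and kinetic-ion terms described in its published references.]

% Representative extended MHD system (illustrative only).
\begin{align}
  \frac{\partial n}{\partial t} + \nabla\cdot(n\mathbf{v}) &= 0
    && \text{continuity} \\
  \rho\left(\frac{\partial \mathbf{v}}{\partial t}
    + \mathbf{v}\cdot\nabla\mathbf{v}\right)
    &= \mathbf{J}\times\mathbf{B} - \nabla p + \mu\nabla^{2}\mathbf{v}
    && \text{momentum} \\
  \mathbf{E} + \mathbf{v}\times\mathbf{B}
    &= \eta\mathbf{J}
    + \frac{1}{ne}\left(\mathbf{J}\times\mathbf{B} - \nabla p_{e}\right)
    && \text{generalized Ohm's law} \\
  \frac{\partial \mathbf{B}}{\partial t} = -\nabla\times\mathbf{E},
  \qquad \mu_{0}\mathbf{J} &= \nabla\times\mathbf{B},
  \qquad \nabla\cdot\mathbf{B} = 0
    && \text{pre-Maxwell equations} \\
  \frac{\partial p}{\partial t} + \mathbf{v}\cdot\nabla p
    &= -\gamma p\,\nabla\cdot\mathbf{v}
    + \nabla\cdot\left(\kappa_{\parallel}\nabla_{\parallel}T
    + \kappa_{\perp}\nabla_{\perp}T\right)
    && \text{pressure, anisotropic conduction}
\end{align}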

  5. The M3D Code
M3D (Multi-level 3D) is a 3D nonlinear extended MHD code in toroidal geometry, maintained by a multi-institutional collaboration and designed for the study of macroscopic instabilities in tokamaks and stellarators.
• Physics models include ideal and resistive MHD; two-fluid; or hybrid with kinetic ions.
• Field and velocity variables are expressed in terms of potentials, keeping B divergence-free and separating the compressible and incompressible components of the flow.
• Uses linear, 2nd-, or 3rd-order finite elements in-plane on an unstructured triangular mesh.
• Uses 4th-order finite differences between planes, or pseudo-spectral derivatives.
• Partially implicit treatment allows efficient advance over dissipative time scales but requires small time steps relative to τA.
• Linear and nonlinear modes of operation are available.
• The PETSc library is used for parallelization and for linear solves with Krylov methods.
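[For orientation, the potential formulation mentioned above rests on two standard ideas: a divergence-free field can be generated from potentials, and a flow can be split into incompressible and compressible parts. The schematic decomposition below is illustrative only and does not use M3D's actual variable definitions, which are given in the M3D papers.]

% Schematic decomposition (illustrative; not M3D's exact variables):
\begin{align}
  \mathbf{B} &= \nabla\times\mathbf{A}
  &&\Rightarrow\ \nabla\cdot\mathbf{B} = 0 \ \text{identically}, \\
  \mathbf{v} &= \underbrace{\nabla\times\boldsymbol{\psi}}_{\text{incompressible}}
             \;+\; \underbrace{\nabla\chi}_{\text{compressible}}
  &&\Rightarrow\ \nabla\cdot\mathbf{v} = \nabla^{2}\chi .
\end{align}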

  6. M3D Mesh
[Figures: single plane; full torus.]
Radial zones align with flux surfaces. At least 2n+1 toroidal planes are needed to resolve toroidal mode number n (the n = 0 component plus sine and cosine components for each mode from 1 to n).

  7. Domain Decomposition
[Figure annotations: D = 1, F = 5; D = 3, F = 3; B = 16.]
Toroidal decomposition (overhead view): linear solves are independent on each processor.
Poloidal decomposition (cross-section view): linear solves are parallel over processors.
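[As an illustration of how a two-dimensional toroidal x poloidal processor layout can be organized, the sketch below splits MPI_COMM_WORLD into per-plane-group and per-poloidal-position communicators. This is a hypothetical example, not M3D's source; the communicator names and the assumption of 4 poloidal processors per group are illustrative.]

/* Hypothetical sketch of a toroidal x poloidal processor decomposition.
 * Not taken from M3D; names and layout are illustrative assumptions. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    const int npol = 4;               /* assumed poloidal processors per plane group */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int tor_id = rank / npol;         /* which toroidal plane group this rank belongs to */
    int pol_id = rank % npol;         /* position within the poloidal group */

    /* Ranks sharing a plane group: used for the in-plane (poloidal) solves. */
    MPI_Comm pol_comm;
    MPI_Comm_split(MPI_COMM_WORLD, tor_id, pol_id, &pol_comm);

    /* Ranks with the same poloidal position: used for toroidal coupling. */
    MPI_Comm tor_comm;
    MPI_Comm_split(MPI_COMM_WORLD, pol_id, tor_id, &tor_comm);

    printf("rank %d of %d: toroidal group %d, poloidal position %d\n",
           rank, size, tor_id, pol_id);

    MPI_Comm_free(&pol_comm);
    MPI_Comm_free(&tor_comm);
    MPI_Finalize();
    return 0;
}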

  8. Porting M3D to the XT3
• Previously run on the Cray T3E, IBM SP, and SGI Origin 2000.
• Few modifications to the source code were necessary for the new platform.
• Installation of the HYPRE preconditioner in the PETSc library enabled faster inversion of the symmetric form of the linear operators using CG (a hypothetical solver setup is sketched below).
• Reducing interprocessor communication was key to improving scaling.
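[The following is a minimal sketch of the kind of PETSc configuration that bullet describes: a conjugate-gradient KSP preconditioned with HYPRE's BoomerAMG. It is illustrative only; the matrix is a 1D Laplacian stand-in rather than an M3D operator, error checking is omitted for brevity, and a PETSc build configured with HYPRE is assumed.]

/* Minimal PETSc sketch: CG + HYPRE/BoomerAMG on a symmetric model problem.
 * Illustrative only -- not M3D code; requires PETSc built with HYPRE. */
#include <petscksp.h>

int main(int argc, char **argv)
{
    Mat      A;
    Vec      x, b;
    KSP      ksp;
    PC       pc;
    PetscInt i, n = 1000, Istart, Iend;

    PetscInitialize(&argc, &argv, NULL, NULL);

    /* Assemble a symmetric positive-definite tridiagonal matrix. */
    MatCreate(PETSC_COMM_WORLD, &A);
    MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);
    MatSetFromOptions(A);
    MatSetUp(A);
    MatGetOwnershipRange(A, &Istart, &Iend);
    for (i = Istart; i < Iend; i++) {
        if (i > 0)     MatSetValue(A, i, i - 1, -1.0, INSERT_VALUES);
        if (i < n - 1) MatSetValue(A, i, i + 1, -1.0, INSERT_VALUES);
        MatSetValue(A, i, i, 2.0, INSERT_VALUES);
    }
    MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
    MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

    MatCreateVecs(A, &x, &b);
    VecSet(b, 1.0);

    /* CG Krylov solver with HYPRE BoomerAMG preconditioning. */
    KSPCreate(PETSC_COMM_WORLD, &ksp);
    KSPSetOperators(ksp, A, A);
    KSPSetType(ksp, KSPCG);
    KSPGetPC(ksp, &pc);
    PCSetType(pc, PCHYPRE);
    PCHYPRESetType(pc, "boomeramg");
    KSPSetFromOptions(ksp);        /* allow run-time overrides, e.g. -ksp_rtol */
    KSPSolve(ksp, b, x);

    KSPDestroy(&ksp);
    MatDestroy(&A);
    VecDestroy(&x);
    VecDestroy(&b);
    PetscFinalize();
    return 0;
}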

  9. 1D Weak Scaling, Single Core (SN)
Poloidal domain decomposition: 560 radial zones (626,081 vertices/plane), 16 poloidal processors, 4 planes/processor, 4 to 320 toroidal processors.
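[The scaling curves themselves are in the plots, which are not reproduced in the transcript. For reference, weak-scaling efficiency at fixed work per processor is conventionally defined as below; this definition is an assumption of this note, not something stated on the slide.]

% Conventional weak-scaling efficiency at fixed work per processor (assumed definition):
E_{\mathrm{weak}}(P) \;=\; \frac{t(P_{0})}{t(P)},
\qquad P \ge P_{0},\quad \text{problem size} \propto P .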

  10. 1D Weak Scaling, Dual Core (VN)
398 radial zones (316,013 vertices/plane), 16 poloidal processors, 4 planes/processor, 4 to 640 toroidal processors.

  11. 3D Weak Scaling, Single Core
Smallest run has 64 planes and 160,000 vertices/plane on 16 toroidal x 4 poloidal processors. Successive runs increase the number of poloidal processors by 4 and the number of toroidal processors by 12, while maintaining 4 planes and 40,000 vertices per processor.

  12. 3D Strong Scaling, Single Core
All runs have 32 planes with 474,151 vertices/plane. The smallest has 8 toroidal x 12 poloidal processors. Successive runs increase the number of poloidal processors by 6 and double the number of toroidal processors.
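[For the strong-scaling series on this and the next two slides, the problem size is held fixed while the processor count grows. The conventional speedup and parallel-efficiency definitions are given below for reference; these definitions are assumed here, not quoted from the slides.]

% Conventional strong-scaling metrics at fixed problem size (assumed definitions):
S(P) \;=\; \frac{t(P_{0})}{t(P)},
\qquad
E_{\mathrm{strong}}(P) \;=\; \frac{P_{0}\,t(P_{0})}{P\,t(P)} .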

  13. 3D Strong Scaling, Single Core
All runs have 128 planes with 474,151 vertices/plane. The smallest has 32 toroidal x 12 poloidal processors. Successive runs increase the number of poloidal processors by 6 and double the number of toroidal processors.

  14. 3D Strong Scaling, Single Core
All runs have 208 planes with 474,151 vertices/plane. The smallest has 52 toroidal x 12 poloidal processors. Successive runs increase the number of poloidal processors by 6 and double the number of toroidal processors.

  15. Sample Application: CDX Sawteeth
• Small laboratory tokamak.
• Oscillations in the X-ray signal during discharge are consistent with a sudden outward shift of hot plasma.
• Objective: predict the effect and the conditions for onset of the instability.
[Figure labels: high-temperature core; low-temperature m=1, n=1 island; X-point (site of reconnection); q=1 surface (inversion radius).]

  16. Initialization
• Equilibrium taken from a transport-timescale code.
• β ~ 3.3%
• q0 ≈ 0.922
• Sawtooth instability is predicted when q0 is sufficiently below 1.
[Figure: toroidal current density of the equilibrium.]
Linear n=1 eigenmode: growth rate γτA ≈ 6 × 10^-4.
[Figure panels: perturbed temperature; perturbed current density; velocity stream function.]

  17. Nonlinear Results
24 planes, 79 radial grids, 221,856 vertices; 24 toroidal x 6 poloidal processors = 144 Jaguar CPUs (VN mode); 13,920 CPU-hours (96:40 wall-clock hours on 144 CPUs).
[Figures: kinetic energy by toroidal mode number; Poincaré sections.]

  18. Conclusions
• The XT3 has been a productive environment for tokamak simulations with M3D.
• Improved scaling can be expected with the faster interconnects on the XT4.
• Scaling to thousands of processors has been demonstrated, but may be impractical for real applications while the code remains explicit.
