
Stratified Magnetohydrodynamics Accelerated Using GPUs: SMAUG


Presentation Transcript


  1. Stratified Magnetohydrodynamics Accelerated Using GPUs: SMAUG

  2. The Sheffield Advanced Code • The Sheffield Advanced Code (SAC) is a novel, fully non-linear MHD code based on the Versatile Advection Code (VAC) • Designed for simulations of linear and non-linear wave propagation • in gravitationally strongly stratified, magnetised plasma • Shelyag, S.; Fedun, V.; Erdélyi, R., Astronomy and Astrophysics, Vol. 486, Issue 2, 2008, pp. 655-662

  3. Full Perturbed MHD Equations for Stratified Media
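
  The equation images from this slide are not reproduced in the transcript. As a hedged sketch of the general structure used in SAC (following Shelyag, Fedun & Erdélyi 2008), each variable is split into a static background part (subscript b) and a perturbation (tilde), e.g. \rho = \rho_b + \tilde{\rho}, and the perturbed continuity and induction equations then take a form similar to:

  \[ \frac{\partial \tilde{\rho}}{\partial t} + \nabla \cdot \left[ (\rho_b + \tilde{\rho})\,\mathbf{v} \right] = 0, \qquad \frac{\partial \tilde{\mathbf{B}}}{\partial t} = \nabla \times \left[ \mathbf{v} \times (\mathbf{B}_b + \tilde{\mathbf{B}}) \right] \]

  The momentum and energy equations are perturbed in the same way, with the stratification entering through source terms involving the background gravity and background magnetic field.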

  4. Numerical Diffusion • Central differencing can generate numerical instabilities • It is difficult to find solutions for shocked systems • We define a hyperviscosity parameter as the ratio of the third-order to the first-order forward difference of a variable • By tracking the evolution of the hyperviscosity we can identify numerical noise and apply smoothing where necessary (a sketch of the ratio is given below)
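
  A minimal sketch of the ratio described above, assuming the maxima of the third-order and first-order forward differences are compared along one grid direction (the function and variable names are illustrative, not the SMAUG implementation):

    #include <math.h>

    /* Hyperviscosity indicator: ratio of the maximum third-order forward
       difference to the maximum first-order forward difference of w along
       one grid direction. Larger values flag numerical noise that needs
       smoothing. */
    double hyperviscosity_ratio(const double *w, int n)
    {
        double max_d3 = 0.0, max_d1 = 0.0;
        for (int i = 0; i < n - 3; i++) {
            double d3 = fabs(w[i + 3] - 3.0 * w[i + 2] + 3.0 * w[i + 1] - w[i]);
            double d1 = fabs(w[i + 1] - w[i]);
            if (d3 > max_d3) max_d3 = d3;
            if (d1 > max_d1) max_d1 = d1;
        }
        return (max_d1 > 0.0) ? max_d3 / max_d1 : 0.0;
    }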

  5. Why MHD Using GPUs? • Consider a simplified 2D problem • Solving the flux equation • Derivatives computed using central differencing • Time stepping using Runge-Kutta • Excellent scaling with GPUs, but central differencing requires numerical stabilisation • Stabilisation with GPUs is trickier: it requires a reduction/maximum routine and an additional, larger mesh (a central-difference kernel is sketched below)
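
  A minimal CUDA sketch of the kind of central-difference stencil referred to above (the grid layout, kernel and variable names are assumptions for illustration, not the SMAUG kernels):

    /* Central difference of w with respect to x on a uniform 2D grid,
       stored row-major with nx points per row. Interior points only;
       boundary and ghost cells are assumed to be handled elsewhere. */
    __global__ void central_diff_x(const double *w, double *dwdx,
                                   int nx, int ny, double dx)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        int j = blockIdx.y * blockDim.y + threadIdx.y;
        if (i > 0 && i < nx - 1 && j < ny) {
            int idx = j * nx + i;
            dwdx[idx] = (w[idx + 1] - w[idx - 1]) / (2.0 * dx);
        }
    }

  The stabilisation step additionally needs a maximum taken over the whole mesh (for example of the hyperviscosity indicator), which on the GPU is implemented as a parallel reduction rather than a simple loop; this is what makes stabilisation trickier than the stencil update itself.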

  6. Halo Messaging • Consider a 2D model: for simplicity, distribute the layers over a line of processes • Distribute the rows over the processors, N/nproc rows per processor; every processor stores all N columns • Each processor has a “ghost” layer, used in the calculation of the update and obtained from the neighbouring processors • Each processor passes its top and bottom layers to the neighbouring processors, where they become those neighbours’ ghost layers • SMAUG-MPI implements messaging using a 2D halo model for 2D problems and a 3D halo model for 3D problems (a sketch of the neighbour lookup is given below)
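
  A hedged sketch of how the neighbouring ranks for such a halo exchange can be obtained with a Cartesian communicator (a standard MPI pattern; SMAUG's actual decomposition code may differ):

    #include <mpi.h>

    /* Build a 1D Cartesian decomposition of the rows and look up the
       ranks that hold the layers below and above this process.
       At the domain boundaries MPI returns MPI_PROC_NULL, so the
       subsequent sends/receives become no-ops there. */
    void find_neighbours(int nproc, int *rank_below, int *rank_above)
    {
        MPI_Comm cart;
        int dims[1]    = { nproc };
        int periods[1] = { 0 };      /* non-periodic decomposition direction */
        MPI_Cart_create(MPI_COMM_WORLD, 1, dims, periods, 0, &cart);
        /* displacement of 1 along dimension 0: source below, destination above */
        MPI_Cart_shift(cart, 0, 1, rank_below, rank_above);
    }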

  7. [Diagram: halo exchange between Processors 1-4 over a domain of N+1 rows. Each processor sends its top layer to the processor above and its bottom layer to the processor below, receiving the corresponding layers (labelled p1min, p2max, p2min, p3max) into its ghost rows.]

  8. MPI Implementation • Based on the halo messaging technique employed in the SAC code:

  void exchange_halo(vector v) {
      gather halo data from v into gpu_buffer1
      cudaMemcpy(host_buffer1, gpu_buffer1, ...);
      MPI_Isend(host_buffer1, ..., destination, ...);
      MPI_Irecv(host_buffer2, ..., source, ...);
      MPI_Waitall(...);
      cudaMemcpy(gpu_buffer2, host_buffer2, ...);
      scatter halo data from gpu_buffer2 to halo regions in v
  }
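
  Filled in as a minimal, self-contained sketch (buffer names, counts, tags and neighbour ranks are hypothetical; the pseudocode above is the authoritative description):

    #include <mpi.h>
    #include <cuda_runtime.h>

    /* Host-staged halo exchange: device halo -> host buffer -> MPI ->
       host buffer -> device halo. The halo is assumed to have already
       been packed into gpu_buffer1 by a device-side gather kernel. */
    void exchange_halo_host_staged(double *gpu_buffer1, double *gpu_buffer2,
                                   double *host_buffer1, double *host_buffer2,
                                   int halo_count, int destination, int source)
    {
        MPI_Request req[2];
        size_t bytes = halo_count * sizeof(double);

        cudaMemcpy(host_buffer1, gpu_buffer1, bytes, cudaMemcpyDeviceToHost);
        MPI_Isend(host_buffer1, halo_count, MPI_DOUBLE, destination, 0,
                  MPI_COMM_WORLD, &req[0]);
        MPI_Irecv(host_buffer2, halo_count, MPI_DOUBLE, source, 0,
                  MPI_COMM_WORLD, &req[1]);
        MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
        cudaMemcpy(gpu_buffer2, host_buffer2, bytes, cudaMemcpyHostToDevice);
        /* a scatter kernel then copies gpu_buffer2 into the halo regions of v */
    }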

  9. Halo Messaging with GPU Direct • Simpler, faster call structure:

  void exchange_halo(vector v) {
      gather halo data from v into gpu_buffer1
      MPI_Isend(gpu_buffer1, ..., destination, ...);
      MPI_Irecv(gpu_buffer2, ..., source, ...);
      MPI_Waitall(...);
      scatter halo data from gpu_buffer2 to halo regions in v
  }
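
  The corresponding sketch with a CUDA-aware MPI library (again with hypothetical buffer and rank names): the device pointers are handed straight to MPI, removing the two cudaMemcpy staging copies of the host-staged version.

    /* GPU Direct halo exchange: with a CUDA-aware MPI build, device
       buffers can be passed directly to the send/receive calls. */
    void exchange_halo_gpudirect(double *gpu_buffer1, double *gpu_buffer2,
                                 int halo_count, int destination, int source)
    {
        MPI_Request req[2];
        MPI_Isend(gpu_buffer1, halo_count, MPI_DOUBLE, destination, 0,
                  MPI_COMM_WORLD, &req[0]);
        MPI_Irecv(gpu_buffer2, halo_count, MPI_DOUBLE, source, 0,
                  MPI_COMM_WORLD, &req[1]);
        MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
        /* a scatter kernel then unpacks gpu_buffer2 into the halo regions of v */
    }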

  10. Progress with the MPI Implementation • Successfully running two-dimensional models under GPU Direct • Wilkes GPU cluster at the University of Cambridge • N8 GPU facility, Iceberg • The 2D MPI version is verified • Currently optimising communications performance under GPU Direct • The 3D MPI version has already been implemented but still requires testing

  11. Orszag-Tang Test: 200x200 model at t = 0.1, 0.26, 0.42 and 0.58 s
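
  For context, the Orszag-Tang vortex is a standard 2D MHD benchmark; a commonly used set of initial conditions (conventions differ between codes, so the precise values used for this test are an assumption) is, on the periodic unit square with \gamma = 5/3:

  \[ \rho = \frac{25}{36\pi}, \quad p = \frac{5}{12\pi}, \quad \mathbf{v} = (-\sin 2\pi y,\; \sin 2\pi x), \quad \mathbf{B} = \frac{1}{\sqrt{4\pi}}\,(-\sin 2\pi y,\; \sin 4\pi x) \]

  The flow rapidly develops shocks and fine-scale current sheets, which is why it is used here to exercise the hyperdiffusion and shock handling described earlier.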

  12. A Model of Wave Propagation in the Magnetised Solar Atmosphere. The model features a flux tube with a torsional driver, within a fully stratified quiet solar atmosphere based on VALIIIC. The grid size is 128x128x128, representing a box in the solar atmosphere of dimensions 1.5x2x2 Mm. The flux tube has a magnetic field strength of 1000 G. Driver amplitude: 200 km/s.

  13. Timing for Orszag-Tang Using SAC/SMAUG with Different Architectures

  14. Performance Results (Hyperdiffusion disabled) • Timings in seconds for 100 iterations (Orszag-Tang test)

  15. Performance Results (With Hyperdiffusion enabled) • Timings in seconds for 100 iterations (Orszag-Tang test)

  16. Conclusions • We have demonstrated that we can successfully compute large problems by distributing them across multiple GPUs • For 2D problems, the performance of messaging with and without GPU Direct is similar; this is expected to change when 3D models are tested • It is likely that much of the communications overhead arises from the routines used to transfer data within GPU memory • Performance enhancements are possible through modification of the application architecture • Further work is needed with larger models for comparison with the x86 MPI implementation • The algorithm has been implemented in 3D; testing of 3D models will be undertaken over the forthcoming weeks
