
M3D Scaling studies

This study focuses on optimizing M3D calculations for petascale computers, including strong and weak scaling tests and strategies for improving data copy efficiency.


Presentation Transcript


  1. M3D Scaling studies. Jin Chen, M3D Group. M3D meetings, Sept 08, 2006 and Sept 29, 2006

  2. Motivation. To prepare for petascale calculations:
       Total M3D Time = 726.879607
       M3d-Par     13392   2.0653e+02
       KSPSolve      241   2.7470e+02
     • KSPSolve solves the linear systems arising from the finite-element discretization of elliptic equations
     • m3d2par and Par2m3d copy data back and forth between the Fortran data distribution and the PETSc data layout
     Their efficiency is critical for optimization on petascale computers.
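To see how much of the run these two items take, the slide's numbers can be turned into fractions of the total. The sketch below assumes that the last number listed after each name is a time in the same units as Total M3D Time, which the slide does not state explicitly; the variable names are illustrative only.

```python
# Timings copied from the slide; units are ASSUMED to match Total M3D Time.
total_time = 726.879607
event_times = {
    "M3d-Par": 2.0653e+02,
    "KSPSolve": 2.7470e+02,
}

# Fraction of the total run time spent in each event.
fractions = {name: t / total_time for name, t in event_times.items()}
combined = sum(fractions.values())

for name, frac in fractions.items():
    print(f"{name:8s} {100 * frac:5.1f}% of total")
print(f"combined {100 * combined:5.1f}% of total")
```

Under that assumption, the data copy accounts for roughly 28% of the run and the linear solve for roughly 38%, about two thirds together.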

  3. Outline • 3D (r,θ,φ) strong scaling • 3D (r,θ,φ) weak scaling • 1D (φ) weak scaling • 2D (r,θ) weak scaling • Solutions
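For reference, the two kinds of test in the outline are usually summarized by a parallel efficiency. The sketch below uses the common textbook definitions; the function names are illustrative and the timings are hypothetical placeholders, not measurements from the M3D runs.

```python
def strong_scaling_efficiency(t_base, p_base, t, p):
    """Fixed total problem size: the ideal time falls as p_base / p."""
    return (t_base * p_base) / (t * p)


def weak_scaling_efficiency(t_base, t):
    """Fixed work per processor: the ideal time stays constant."""
    return t_base / t


# HYPOTHETICAL (processors, seconds) pairs, not taken from the slides.
strong_runs = [(64, 100.0), (128, 55.0), (256, 32.0)]
weak_runs = [(64, 100.0), (128, 108.0), (256, 125.0)]

p0, t0 = strong_runs[0]
strong_eff = [strong_scaling_efficiency(t0, p0, t, p) for p, t in strong_runs]
weak_eff = [weak_scaling_efficiency(weak_runs[0][1], t) for _, t in weak_runs]

for (p, _), e in zip(strong_runs, strong_eff):
    print(f"strong scaling, p={p:4d}: efficiency {e:.2f}")
for (p, _), e in zip(weak_runs, weak_eff):
    print(f"weak scaling,   p={p:4d}: efficiency {e:.2f}")
```

An efficiency of 1.0 means ideal scaling in both cases; values below 1.0 indicate overhead that grows with the processor count.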

  4. 3D (r,θ,φ) strong scaling. nstp=20. Configuration Parameters:
       A    32    32    32
       B     8     8     8
       C   436   436   436
       D     2     2     2
       E     5     5     5
       F    11    17    23

  5. 3D (r,θ,φ) weak scaling. nstp=20. Configuration Parameters:
       A    16    32    64
       B     4     8    16
       C   283   356   398
       D     1     2     2
       E     4     5     6
       F     4     7    11

  6. 1D (φ) weak scaling. nstp=100. Configuration Parameters:
     Changing:
          A     B
         16     4
         32     8
         64    16
        128    32
        256    64
        512   128
       1024   256
       2048   512
     Fixing: C=201, D=1, E=4, F=4
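The A and B values on this slide follow a simple pattern: A doubles from 16 to 2048 and B equals A/4 in every listed pair. The sketch below regenerates the sweep from that observed pattern; the relationship is read off the listed values rather than stated on the slide, and the dictionary layout is purely illustrative.

```python
# Parameters held fixed on the slide.
fixed = {"C": 201, "D": 1, "E": 4, "F": 4}

# A doubles from 16 to 2048; B = A / 4 is the pattern OBSERVED in the
# listed pairs, not a rule given in the presentation.
sweep = []
a = 16
while a <= 2048:
    sweep.append({"A": a, "B": a // 4, **fixed})
    a *= 2

for cfg in sweep:
    print(cfg)
```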

  7. 2D (r,θ) weak scaling. nstp=100. Configuration Parameters:
     Fixing: A=16, B=4
     Changing:
         C     D     E     F
       283     1     4     4
       356     2     5     7
       398     2     6    11
       427     2     7    15
       446     2     8    19
       459     2     9    23
       469     2    10    27
       479     2    11    31
       487     2    12    35
       496     2    13    39
       501     2    14    43
       505     2    15    47

  8. Strategy to improve M3D-PAR data copy • Reduce toroidal ghost changes (m3d2par, m3d2part) • Use a different poloidal partition to reduce poloidal ghost changes: 2 times faster on Seaborg
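Why the shape of a partition affects the amount of ghost data can be shown with a simple count. The sketch below uses a structured n × n grid purely as an analogy: the slide does not describe the M3D poloidal mesh or the partitions actually compared, and the function names are hypothetical.

```python
def ghost_points_strips(n, p):
    """n x n grid cut into p horizontal strips.

    There are p - 1 internal cuts, and each cut exchanges one row of
    n points in both directions.
    """
    return 2 * (p - 1) * n


def ghost_points_blocks(n, px, py):
    """n x n grid cut into px x py rectangular blocks (edge neighbours only)."""
    vertical_cuts = (px - 1) * n    # points along all vertical cut lines
    horizontal_cuts = (py - 1) * n  # points along all horizontal cut lines
    return 2 * (vertical_cuts + horizontal_cuts)


n, p = 256, 16
strips = ghost_points_strips(n, p)
blocks = ghost_points_blocks(n, 4, 4)
print(f"{p} strips: {strips} ghost points per exchange")
print(f"4 x 4 blocks: {blocks} ghost points per exchange")
```

For the same processor count, the more compact blocks exchange fewer ghost points than the strips in this toy model, which is the general effect a better partition aims for.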

  9. Strategy to improve M3D-PAR data copy – II

  10. 1D (φ) weak scaling for G Fu on Jaguar R=201

  11. 2D (r,θ) weak scaling – improved 9/29/06

  12. 3D (r,θ,φ) weak scaling – improved 9/29/06

  13. Problem fixes on Jaguar
      • Runtime memory limitation
      • Solution: use only 1 processor per node
          yod -SN m3dp_fsymm_opt.x …
      • Code crashes when the number of processors increases from 2048 to 3076 or 4096:
          module load gmalloc
          link -gmalloc as the last library when building m3dp.x
      • Waits are too long when debugging code
      • We need dedicated time to fix bugs that appear only on large numbers of processors.
      • Fortran static array (stack)
          yod -SN -stack 500M m3dp_fsymm_opt.x …

  14. BGL • I got an account.
