
NAMD Parallel Performance on Ranger: MPI Tuning



Presentation Transcript


  1. NAMD Parallel Performance on Ranger: MPI Tuning
     Philip Blood, Scientific Specialist, Pittsburgh Supercomputing Center

  2. NAMD
  • NAMD (NAnoscale Molecular Dynamics) is a highly scalable molecular dynamics code used ubiquitously on the TeraGrid and other HPC systems
  • Improving parallel performance without touching the code:
    • Tune NAMD input parameters for specific systems
    • Tune MPI implementations for NAMD's message-driven Charm++ parallel framework
  Image courtesy of Ivaylo Ivanov and J. Andrew McCammon, UCSD

  3. Results of Tuning: ApoA1 Benchmark
  • 92,224 atoms (protein + water)
  • 1 angstrom PME grid
  • PME every 4 timesteps (1 fs timestep)
  • NVE ensemble
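The benchmark settings listed above correspond roughly to the following NAMD configuration fragment. This is a sketch only: the structure/coordinate file names are placeholders, and any keyword not implied by the slide (e.g. cutoffs) is omitted.

```
# ApoA1 benchmark: sketch of the NAMD settings implied by the slide.
# File names below are assumed placeholders, not taken from the slide.
structure          apoa1.psf
coordinates        apoa1.pdb

timestep           1.0    ;# 1 fs timestep
PME                on
PMEGridSpacing     1.0    ;# 1 angstrom PME grid
fullElectFrequency 4      ;# full electrostatics (PME) every 4 timesteps

# NVE: no thermostat or barostat keywords are set.
```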

  4. Results of Initial Tuning: Actual User Simulation

  5. Tuning OpenMPI 1.2.6 to Improve NAMD Scaling on Ranger
  • Eager message passing must be tuned so that it efficiently handles the message-driven execution used by Charm++ in NAMD.
  • The following pertains to the OpenMPI 1.2.6 installation and default settings on Ranger:
  • Set CPU and memory affinity: export OMPI_MCA_mpi_paffinity_alone=1
  • Turn off the use of RDMA for eager message passing over InfiniBand: export OMPI_MCA_btl_openib_use_eager_rdma=0
  • Increase the eager limit over InfiniBand from 12K to just under 32K: export OMPI_MCA_btl_openib_eager_limit=32767
    • Setting it one byte shy of 32K is significant, perhaps because btl_openib_min_send_size is 32768; this and the max send size could be other parameters to try tweaking.
  • Increase the self eager limit: export OMPI_MCA_btl_self_eager_limit=32767
  • Increase the sm (shared-memory) eager limit: export OMPI_MCA_btl_sm_eager_limit=32767
    • You may need to reduce this to OMPI_MCA_btl_sm_eager_limit=16384 at higher processor counts if you run out of memory.
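Collected into one place, the OpenMPI settings above form a job-script fragment like the following. The commented launch line is an assumption based on typical Ranger usage (ibrun launcher, placeholder binary path), not something stated on the slide.

```shell
#!/bin/bash
# OpenMPI 1.2.6 tuning for NAMD on Ranger, as listed on the slide.

# Bind each MPI rank to a CPU and its local memory.
export OMPI_MCA_mpi_paffinity_alone=1

# Disable RDMA for eager message passing over InfiniBand.
export OMPI_MCA_btl_openib_use_eager_rdma=0

# Raise the eager limits from the defaults to one byte shy of 32K.
export OMPI_MCA_btl_openib_eager_limit=32767
export OMPI_MCA_btl_self_eager_limit=32767
export OMPI_MCA_btl_sm_eager_limit=32767  # reduce to 16384 at high core counts if memory runs out

# Launch NAMD (launcher and binary path are assumptions, not from the slide):
# ibrun /path/to/namd2 apoa1.namd
```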

  6. Tuning MVAPICH 1.0.1 for NAMD on Ranger
  • For these benchmarks:
  • Turn off RDMA for short messages: VIADEV_ADAPTIVE_RDMA_LIMIT=0
  • Set the overall eager message size limit: VIADEV_RENDEZVOUS_THRESHOLD=50000
  • Set the eager limit and buffer size for intranode (shared-memory) communication: VIADEV_SMP_EAGERSIZE=64 and VIADEV_SMPI_LENGTH_QUEUE=256
  • No tacc_affinity; at some processor counts tacc_affinity may help
  • At higher processor counts you may need to adjust VIADEV_SMP_EAGERSIZE and VIADEV_SMPI_LENGTH_QUEUE
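The MVAPICH settings above can likewise be collected into a job-script fragment. This is a sketch: the commented launch line is an assumption, and the VIADEV_* values are exactly those the slide reports for these benchmarks.

```shell
#!/bin/bash
# MVAPICH 1.0.1 tuning for NAMD on Ranger, as listed on the slide.

# Disable RDMA for short messages.
export VIADEV_ADAPTIVE_RDMA_LIMIT=0

# Overall eager/rendezvous message size threshold.
export VIADEV_RENDEZVOUS_THRESHOLD=50000

# Eager limit and buffer size for intranode (shared-memory) communication;
# these may need adjusting at higher processor counts.
export VIADEV_SMP_EAGERSIZE=64
export VIADEV_SMPI_LENGTH_QUEUE=256

# Note: these benchmarks ran without tacc_affinity, though it may help
# at some processor counts.
# ibrun /path/to/namd2 apoa1.namd   # launch line is an assumption
```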
