
NAMD and BG/L

Chee Wai Lee (cheelee@uiuc.edu), Parallel Programming Laboratory, Computer Science Department, University of Illinois at Urbana-Champaign. http://charm.cs.uiuc.edu


Presentation Transcript


  1. NAMD and BG/L
     Chee Wai Lee (cheelee@uiuc.edu)
     Parallel Programming Laboratory, Computer Science Department
     University of Illinois at Urbana-Champaign
     http://charm.cs.uiuc.edu

  2. Outline
     • BG/L Platform overview
     • Optimization Efforts: Context
     • Optimization Efforts: Approaches
       • Topology Awareness
       • Load Balancing
       • Parallelism
       • Computation/Communication Overlap
     • Results

  3. Bluegene/L Platform Review
     • Hardware characteristics:
       • PowerPC 440 700 MHz 32-bit processors
       • 2 processors per node, no cache coherence
       • 4 MB L3 cache
       • 512 MB memory per node
       • 6 outgoing FIFO links per node
       • 3D torus interconnect
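The six links per node and the 3D torus fit together: each node has one link in the plus and minus direction of each dimension, and the links wrap around at the edges. The sketch below is only an illustration of that geometry. The function names and the 8 x 8 x 8 partition size are assumptions, not anything taken from the slides.

```python
def torus_hops(a, b, dims):
    """Minimum number of link hops between nodes a and b on a 3D torus."""
    hops = 0
    for x, y, n in zip(a, b, dims):
        d = abs(x - y) % n
        hops += min(d, n - d)  # the wraparound link may give the shorter path
    return hops


def torus_neighbors(node, dims):
    """The six nodes one link away: +/-1 in each of the three dimensions."""
    result = []
    for axis in range(3):
        for step in (-1, 1):
            nxt = list(node)
            nxt[axis] = (nxt[axis] + step) % dims[axis]
            result.append(tuple(nxt))
    return result


if __name__ == "__main__":
    dims = (8, 8, 8)  # hypothetical 512-node partition
    print(torus_hops((0, 0, 0), (7, 0, 0), dims))
    print(torus_neighbors((0, 0, 0), dims))
```

On a torus the farthest node along one dimension is only half that dimension away, which is why placing communicating objects on nearby nodes matters for the topology-aware mapping discussed later.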

  4. Bluegene/L Platform Review (2)
     • Other characteristics:
       • Microkernel on compute nodes, minimal OS interference.

  5. Outline
     • BG/L Platform overview
     • Optimization Efforts: Context
     • Optimization Efforts: Approaches
       • Topology Awareness
       • Load Balancing
       • Parallelism
       • Computation/Communication Overlap
     • Results

  6. Objectives
     • Scale the 92,000-atom benchmark apoa1 as far as possible.
     • Understand the scaling issues involved on the BG/L machine.

  7. Outline
     • BG/L Platform overview
     • Optimization Efforts: Context
     • Optimization Efforts: Approaches
       • Topology Awareness
       • Load Balancing
       • Parallelism
       • Computation/Communication Overlap
     • Results

  8. Topology Awareness
     • Distribute Patches according to the topology.
       • Logically align the NAMD 3D patch grid to BG/L's processor grid.
       • The patch grid is divided by an Orthogonal Recursive Bisection (ORB) scheme.
       • The processor grid is divided in similar proportions, and each processor subgrid is assigned to the corresponding patch subgrid.
     • Topology-aware spanning tree for multicasts.
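The ORB-based mapping described on this slide can be sketched roughly as follows. This is a minimal illustration, not NAMD's implementation: it assumes every patch carries equal load (so each bisection is a cut by patch count), and the function names and grid sizes are hypothetical. Both grids are half-open boxes `(lo, hi)`; each cut of the patch box is mirrored by a cut of the processor box in the same proportion.

```python
from itertools import product


def extent(box, axis):
    lo, hi = box
    return hi[axis] - lo[axis]


def split(box, axis, cut_len):
    """Cut a half-open box cut_len cells from its low face along axis."""
    lo, hi = box
    cut = lo[axis] + cut_len
    left_hi = hi[:axis] + (cut,) + hi[axis + 1:]
    right_lo = lo[:axis] + (cut,) + lo[axis + 1:]
    return (lo, left_hi), (right_lo, hi)


def cells(box):
    lo, hi = box
    return product(*(range(l, h) for l, h in zip(lo, hi)))


def orb_map(patch_box, proc_box, mapping):
    # Only cut along axes where both boxes are still divisible.
    axes = [a for a in range(3)
            if extent(patch_box, a) > 1 and extent(proc_box, a) > 1]
    if not axes:
        # Base case (a simplification): every remaining patch goes to the
        # low corner processor of the matching processor sub-box.
        for patch in cells(patch_box):
            mapping[patch] = proc_box[0]
        return
    axis = max(axes, key=lambda a: extent(patch_box, a))
    n_patch, n_proc = extent(patch_box, axis), extent(proc_box, axis)
    patch_cut = n_patch // 2  # equal-load assumption: bisect by count
    # Cut the processor box in the same proportion, keeping both halves non-empty.
    proc_cut = max(1, min(n_proc - 1, round(n_proc * patch_cut / n_patch)))
    patch_lo, patch_hi = split(patch_box, axis, patch_cut)
    proc_lo, proc_hi = split(proc_box, axis, proc_cut)
    orb_map(patch_lo, proc_lo, mapping)
    orb_map(patch_hi, proc_hi, mapping)


def map_patches(patch_dims, proc_dims):
    mapping = {}
    origin = (0, 0, 0)
    orb_map((origin, tuple(patch_dims)), (origin, tuple(proc_dims)), mapping)
    return mapping


if __name__ == "__main__":
    # Hypothetical sizes: a 4x4x4 patch grid on an 8x8x8 processor grid.
    placement = map_patches((4, 4, 4), (8, 8, 8))
    print(placement[(1, 2, 3)])
```

Because both boxes are cut together, patches that are adjacent in the simulation box end up on processors that are close in the torus. A load-weighted bisection would replace the `n_patch // 2` cut with a cut at the median of measured load.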

  9. Load Balancing
     • Framework optimizations
       • Memory footprint had to be reduced to accommodate the desired number of processors.
       • Spanning tree implemented to handle the large number of incoming messages to PE 0.
     • Spread non-migratable work better
       • Bonded computations (e.g., dihedrals) are allocated to processors that hold no Patch work where possible.
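The reason a spanning tree relieves PE 0 can be shown with a small model. The sketch below assumes a simple k-ary tree over PE ranks rooted at PE 0; the tree shape, the branching factor, and the function names are illustrative assumptions, since the slides do not say how the tree is built. Each PE combines its own value with its children's and forwards one message to its parent, so no PE receives more than `k` messages.

```python
def parent(pe, k):
    """Parent of pe in a k-ary tree rooted at PE 0 (None for the root)."""
    return None if pe == 0 else (pe - 1) // k


def children(pe, n, k):
    """Children of pe among n PEs in the same k-ary tree."""
    first = k * pe + 1
    return list(range(first, min(first + k, n)))


def tree_reduce(values, k):
    """Sum one value per PE up the tree.

    Returns (total at PE 0, largest number of messages any PE received).
    """
    n = len(values)
    partial = list(values)
    incoming = [0] * n
    # Children always have a higher rank than their parent, so walking the
    # ranks downwards handles every child before its parent forwards.
    for pe in range(n - 1, 0, -1):
        p = parent(pe, k)
        partial[p] += partial[pe]
        incoming[p] += 1
    return partial[0], max(incoming)


if __name__ == "__main__":
    total, worst = tree_reduce(list(range(1024)), k=4)
    print(total, worst)
```

Without the tree, PE 0 would receive one message from each of the other PEs; with it, the fan-in at every PE is bounded by the branching factor at the cost of a logarithmic number of forwarding steps.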

  10. More Parallelism
     • 2-away computation: Patches interact with neighbors of neighbors.
       • User-tunable configuration option.
     • Break up compute objects.
       • Another user-tunable configuration option.
     • Balance tradeoffs in grainsize vs. overheads.
     • PME pencil decomposition efforts.
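A quick count shows why 2-away computation exposes more parallelism. The sketch below assumes 2-away interaction in all three dimensions and ignores boundary effects; the function name is hypothetical. It enumerates the grid offsets of the patches a given patch interacts with.

```python
from itertools import product


def k_away_offsets(k):
    """Grid offsets of all patches within k cells in each dimension, excluding self."""
    return [o for o in product(range(-k, k + 1), repeat=3) if o != (0, 0, 0)]


if __name__ == "__main__":
    print(len(k_away_offsets(1)))  # neighbors only
    print(len(k_away_offsets(2)))  # neighbors of neighbors as well
```

A patch has 26 neighbors under 1-away interaction and 124 under 2-away. More, smaller patch pairs mean more objects to spread across processors, which is the grainsize versus overhead tradeoff the slide mentions.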

  11. Overlap of Computation and Communication
     • Hurt by the lack of cache coherence.
     • One processor can serve as a communication co-processor if the L1 caches are flushed for large messages, but the flushing hurts performance too much.
     • Make use of the FIFO link buffers: every so often in NAMD's outer loop, we make AdvanceCommunication() calls.

  12. Outline
     • BG/L Platform overview
     • Optimization Efforts: Context
     • Optimization Efforts: Approaches
     • Results

  13. Results

      Nodes  Processors  Mode  Time (watson)
         32          32  co    347 ms
        128         128  co    97.2 ms
        512         512  co    23.7 ms
       1024        1024  co    13.8 ms
       2048        2048  co    8.6 ms
       4096        4096  co    6.2 ms

      Scaling to 8192 processors was achieved at 5.2 ms per step.
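The timings above can be turned into speedup and parallel efficiency figures. The sketch below is a worked calculation only: it takes the 32-processor run as the baseline (an assumption, since no single-processor time is given) and includes the 8192-processor figure even though its mode is not stated on the slide.

```python
# (processors, time per step in ms), copied from the results slide.
RESULTS = [
    (32, 347.0),
    (128, 97.2),
    (512, 23.7),
    (1024, 13.8),
    (2048, 8.6),
    (4096, 6.2),
    (8192, 5.2),  # mode not stated on the slide
]


def scaling(results):
    """Speedup and efficiency of each run relative to the first entry."""
    base_p, base_t = results[0]
    rows = []
    for procs, time_ms in results:
        speedup = base_t / time_ms
        efficiency = speedup / (procs / base_p)
        rows.append((procs, speedup, efficiency))
    return rows


if __name__ == "__main__":
    for procs, speedup, efficiency in scaling(RESULTS):
        print(f"{procs:5d}  speedup {speedup:6.2f}  efficiency {efficiency:5.1%}")
```

Relative to the 32-processor run, efficiency stays near 90% up to 512 processors and drops to roughly 26% at 8192, where each processor handles only about 11 of the 92,000 atoms.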
