

## Application-specific Topology-aware Mapping for Three Dimensional Topologies



Abhinav Bhatelé, Laxmikant V. Kalé

**Outline**

- Motivation
- The Mapping Problem
- Static Mapping: 3D Stencil
- Load Balancing: NAMD
- Future Work

**Network latency**

The network latency for wormhole routing is

$$\frac{L_f}{B} \times D + \frac{L}{B}$$

where $L_f$ is the length of each flit, $B$ is the bandwidth, $D$ is the number of hops, and $L$ is the length of the message.

Lionel M. Ni and Philip K. McKinley, "A Survey of Wormhole Routing Techniques in Direct Networks", *Computer*, vol. 26, no. 2, pp. 62-76, 1993.

**Message Latencies**

(Figure: message latencies; NN = near neighbor, RND = random.)

**Hardware Latencies**

- Blue Gene/L
  - Near neighbor: < 1 µs
  - Worst case: 7 µs
- Blue Gene/P
  - Near neighbor: < 1 µs
  - Worst case: 5 µs
- Corresponding differences for MPI messages

**Topology-aware mapping**

- Problem: given an object communication graph and a processor graph, find an optimal mapping that
  - minimizes communication, and
  - ensures load balance
- Metric for communication traffic
  - Hop-bytes = number of links (hops) traversed × message size, summed over all messages

**Machine Topology**

- Information required at runtime
  - Number of processors in the allocated partition
  - Number of processors along each dimension
  - Physical coordinates of each processor

**Communication Graph**

- Static
  - 3D Stencil: regular communication graph
- Dynamic
  - Molecular dynamics application
  - Changes as atoms migrate from one processor to another

**Dynamic Graph: NAMD**

- Molecular dynamics (MD) application
- Simulation box is a 3D cell full of atoms

**Load Balancing in NAMD**

- Measurement-based (Charm++)
  - Principle of persistence
- Patches are statically mapped
  - Orthogonal recursive bisection
- Computes can be migrated
- The load-balancing framework gathers the communication information
- Goal
  - Minimize communication
  - Maximize load balance

**Old strategy**

- Greedy approach
- Pick the heaviest compute
- Place it on a processor that holds one of its patches, or
- on a processor that already has a compute for this patch

**Hop-bytes**

(Figure annotated ~17%.)

**Future Work**

- Reasons for contention
  - Heavy communication exceeding bandwidth
  - Link contention (such as in deterministic routing)
- Use UPC/PAPI on Blue Gene/L and P
- Automatic mapping
  - Initial static mapping
  - Use case: meshing applications
- Extend work on the Charm++ load balancers
  - Section-multicast-aware load balancers
  - Useful in matrix multiplication
- Optimization on other topologies
  - SiCortex (Kautz graph)
  - InfiniBand clusters (fat-tree)

**Summary**

- Topology mapping helps!
  - Especially for heavily communication-bound applications
- Static mapping
- Dynamic mapping during load balancing
- Automatic mapping to relieve the user
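
The wormhole-routing latency model quoted from Ni and McKinley can be evaluated directly. This is an illustrative sketch only: the function name `wormhole_latency` and the flit size, bandwidth, and message sizes are assumed values, not measurements from Blue Gene/L or P, and the model ignores contention and software overhead.

```python
def wormhole_latency(flit_bytes: float, bandwidth: float, hops: int, msg_bytes: float) -> float:
    """Contention-free wormhole-routing latency in seconds: (Lf/B)*D + L/B.

    flit_bytes = Lf, bandwidth = B (bytes/s), hops = D, msg_bytes = L.
    """
    return (flit_bytes / bandwidth) * hops + msg_bytes / bandwidth


# Illustrative (assumed) parameters: 32-byte flits on a 100 MB/s link.
FLIT = 32
BW = 100e6

for size in (64, 65536):
    near = wormhole_latency(FLIT, BW, hops=1, msg_bytes=size)
    far = wormhole_latency(FLIT, BW, hops=10, msg_bytes=size)
    print(f"{size:>6} B: 1 hop = {near * 1e6:.3f} us, 10 hops = {far * 1e6:.3f} us")
```

With these assumed numbers, going from 1 to 10 hops quadruples the latency of a 64-byte message but changes a 64 KiB message by less than 1%: in this contention-free model each hop adds only $L_f/B$, so distance matters mainly for short messages, and link contention is not captured at all.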
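
The hop-bytes metric from the topology-aware mapping slide can be computed from a list of messages and a placement of objects on processors. This is a minimal sketch assuming a 3D torus with wraparound links and minimal routing, so the hop count between two processors is the sum of the shortest ring distances in each dimension; `torus_hops` and `hop_bytes` are hypothetical helper names.

```python
from typing import Dict, Hashable, Iterable, Tuple

Coord = Tuple[int, int, int]


def torus_hops(a: Coord, b: Coord, dims: Coord) -> int:
    """Shortest hop count between processors a and b on a 3D torus."""
    hops = 0
    for ai, bi, n in zip(a, b, dims):
        d = abs(ai - bi)
        hops += min(d, n - d)  # go either way around the ring in this dimension
    return hops


def hop_bytes(messages: Iterable[Tuple[Hashable, Hashable, int]],
              placement: Dict[Hashable, Coord], dims: Coord) -> int:
    """Sum of (hops traversed) x (message size in bytes) over all messages."""
    return sum(torus_hops(placement[src], placement[dst], dims) * size
               for src, dst, size in messages)


dims = (4, 4, 4)
placement = {"a": (0, 0, 0), "b": (1, 0, 0), "c": (3, 3, 3)}
messages = [("a", "b", 100), ("a", "c", 50)]
print(hop_bytes(messages, placement, dims))
```

Object `c` looks far from `a` in coordinates, but the wraparound links put it only 3 hops away, so the total is 100 × 1 + 50 × 3 = 250 hop-bytes.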
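
The three runtime quantities on the Machine Topology slide are related: given the number of processors along each dimension and a fixed rank ordering, each processor's physical coordinates follow from its rank. The slides do not say how this information is obtained on the machine; the sketch below assumes a hypothetical `TorusTopology` class and an XYZ ordering (X varies fastest), which is only one of several orderings a job launcher might use.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TorusTopology:
    """Hypothetical description of an allocated 3D partition."""
    dims: Tuple[int, int, int]  # processors along X, Y, Z

    @property
    def num_procs(self) -> int:
        x, y, z = self.dims
        return x * y * z

    def rank_to_coords(self, rank: int) -> Tuple[int, int, int]:
        # Assumed XYZ ordering: X varies fastest, then Y, then Z.
        x_dim, y_dim, _ = self.dims
        return (rank % x_dim, (rank // x_dim) % y_dim, rank // (x_dim * y_dim))

    def coords_to_rank(self, coords: Tuple[int, int, int]) -> int:
        x, y, z = coords
        x_dim, y_dim, _ = self.dims
        return x + x_dim * (y + y_dim * z)


topo = TorusTopology((8, 4, 2))
print(topo.num_procs, topo.rank_to_coords(9), topo.rank_to_coords(63))
```

Keeping both directions of the conversion lets a mapping algorithm reason in physical coordinates and then translate its decisions back to ranks.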
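
For the static 3D stencil, a natural topology-aware mapping places the object grid directly onto a processor torus of the same shape. The sketch below assumes one object per processor, periodic boundaries, and equal-size messages to all six neighbors (the slides give none of these details), and compares that mapping with a random permutation using hop-bytes; all helper names are hypothetical.

```python
import itertools
import random


def torus_hops(a, b, dims):
    """Shortest hop count on a 3D torus with wraparound links."""
    return sum(min(abs(x - y), n - abs(x - y)) for x, y, n in zip(a, b, dims))


def stencil_messages(dims, size):
    """Each object (i, j, k) sends `size` bytes to its six periodic neighbors."""
    msgs = []
    for obj in itertools.product(*(range(n) for n in dims)):
        for axis in range(3):
            for step in (-1, 1):
                nbr = list(obj)
                nbr[axis] = (nbr[axis] + step) % dims[axis]
                msgs.append((obj, tuple(nbr), size))
    return msgs


def hop_bytes(msgs, placement, dims):
    return sum(torus_hops(placement[s], placement[d], dims) * sz for s, d, sz in msgs)


dims = (4, 4, 4)
objects = list(itertools.product(*(range(n) for n in dims)))
msgs = stencil_messages(dims, size=1024)

identity = {o: o for o in objects}  # object (i, j, k) -> processor at (i, j, k)
shuffled = objects[:]
random.Random(0).shuffle(shuffled)
randomized = dict(zip(objects, shuffled))  # object -> random processor

print("identity:", hop_bytes(msgs, identity, dims))
print("random:  ", hop_bytes(msgs, randomized, dims))
```

With the identity mapping every message travels exactly one hop, so hop-bytes is 64 × 6 × 1024 = 393,216. With one object per processor, every message crosses at least one link, so this is the minimum, and any other placement, including the random one, can only match or exceed it.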
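
The static patch mapping uses orthogonal recursive bisection (ORB). A generic ORB over patch coordinates can be sketched as follows: cut the set perpendicular to the dimension with the largest spread, divide the points in proportion to the number of parts on each side, and recurse. The slides do not show how NAMD couples this bisection to the processor torus; the function name `orb` and the proportional split rule are assumptions.

```python
import itertools
from typing import List, Sequence, Tuple

Point = Tuple[int, int, int]


def orb(points: Sequence[Point], num_parts: int) -> List[List[Point]]:
    """Split points into num_parts groups by orthogonal recursive bisection."""
    if not 1 <= num_parts <= len(points):
        raise ValueError("need 1 <= num_parts <= len(points)")
    if num_parts == 1:
        return [list(points)]
    # Cut perpendicular to the axis along which the points are most spread out.
    axis = max(range(3), key=lambda a: max(p[a] for p in points) - min(p[a] for p in points))
    ordered = sorted(points, key=lambda p: p[axis])
    left_parts = num_parts // 2
    cut = len(ordered) * left_parts // num_parts  # points proportional to parts
    return orb(ordered[:cut], left_parts) + orb(ordered[cut:], num_parts - left_parts)


patches = list(itertools.product(range(4), repeat=3))  # a 4 x 4 x 4 grid of patches
groups = orb(patches, 8)
for proc, group in enumerate(groups):
    print(proc, min(group), len(group))
```

On a 4 × 4 × 4 grid split into 8 parts, each group comes out as a compact 2 × 2 × 2 block, which keeps neighboring patches together.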
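
The old greedy strategy can be sketched as below. Each compute is assumed to carry a measured load and the patches it works on. The sketch considers only processors that hold one of those patches or already hold a compute for one of them, and picks the least-loaded candidate. That tie-breaking rule, the data layout, and the name `greedy_map` are assumptions, and the sketch omits any overload threshold.

```python
from typing import Dict, List, Set, Tuple


def greedy_map(computes: Dict[str, Tuple[float, Tuple[str, ...]]],
               patch_proc: Dict[str, int],
               num_procs: int) -> Tuple[Dict[str, int], List[float]]:
    """Place computes heaviest-first next to their patches (greedy sketch)."""
    load = [0.0] * num_procs
    # For each patch: processors holding the patch or a compute that uses it.
    near: Dict[str, Set[int]] = {p: {proc} for p, proc in patch_proc.items()}
    assignment: Dict[str, int] = {}
    for name, (cost, patches) in sorted(computes.items(), key=lambda kv: -kv[1][0]):
        candidates = set().union(*(near[p] for p in patches))
        proc = min(candidates, key=lambda c: (load[c], c))  # least loaded, then lowest id
        assignment[name] = proc
        load[proc] += cost
        for p in patches:
            near[p].add(proc)  # this processor now has a compute for patch p
    return assignment, load


patch_proc = {"A": 0, "B": 1, "C": 2, "D": 3}
computes = {
    "c1": (4.0, ("A", "B")),
    "c2": (3.0, ("A", "C")),
    "c3": (2.0, ("B", "D")),
    "c4": (1.0, ("C", "D")),
}
assignment, load = greedy_map(computes, patch_proc, num_procs=4)
print(assignment)
print(load)
```

Restricting candidates to processors near a compute's patches keeps communication local, but this restriction can leave load imbalanced when those processors are already busy.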
