
Characterizing NAS Benchmark Performance on Shared Heterogeneous Networks

Jaspal Subhlok, Shreenivasa Venkataramaiah, Amitoj Singh (University of Houston)
Heterogeneous Computing Workshop, April 15, 2002


Presentation Transcript


  1. Characterizing NAS Benchmark Performance on Shared Heterogeneous Networks
Jaspal Subhlok, Shreenivasa Venkataramaiah, Amitoj Singh, University of Houston
Heterogeneous Computing Workshop, April 15, 2002

  2. Mapping/Adapting Distributed Applications on Networks
[Figure: an application task graph (Model, Data, Pre, Sim 1, Sim 2, Vis, Stream) to be mapped onto a network]

  3. Automatic node selection: select 4 nodes for execution; the choice is easy.
[Figure: a network of compute nodes m-1 through m-8 and routers, with busy nodes and a congested route marked, and the 4 selected nodes highlighted]

  4. Automatic node selection: select 5 nodes; the choice depends on the application.
[Figure: the same network of compute nodes m-1 through m-8, busy nodes, and a congested route, with a different set of 5 selected nodes highlighted]

  5. Mapping/Adapting Distributed Applications on Networks
[Figure: application task graph mapped onto a network, as on slide 2]
• Discover application characteristics and model performance in a shared heterogeneous environment
• Discover network structure and available resources (e.g., NWS, REMOS)
• Algorithms to map/remap applications to networks

  6. Methodology for Building an Application Performance Signature
Performance signature = a model that predicts application execution time under given network conditions.
• Execute the application on a controlled testbed
• Measure system-level activity during execution, such as CPU, communication, and memory usage
• Analyze and discover program-level activity (message sizes, sequences, synchronization waits)
• Develop a performance signature
• No access to source code or libraries is assumed
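The idea of a performance signature can be sketched in code. This is a hypothetical, deliberately simple model form: total time is CPU work stretched by the available CPU share, plus measured traffic divided by available bandwidth. The class name, fields, and numbers are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch of a "performance signature": predict execution
# time from benchmarked resource usage under given CPU and bandwidth
# availability. The linear form and all numbers are assumptions.

from dataclasses import dataclass

@dataclass
class PerformanceSignature:
    compute_time: float   # seconds of CPU work measured on a dedicated node
    bytes_on_link: float  # traffic measured on the busiest link (bytes)

    def predict(self, cpu_share: float, bandwidth: float) -> float:
        """Estimated execution time when the application gets only
        `cpu_share` of a node and `bandwidth` bytes/s on the link."""
        return self.compute_time / cpu_share + self.bytes_on_link / bandwidth

# Example: 100 s of compute and 125 MB of traffic measured on the testbed
sig = PerformanceSignature(compute_time=100.0, bytes_on_link=125e6)
print(sig.predict(1.0, 12.5e6))  # dedicated run at 100 Mbps: 110.0 s
print(sig.predict(1.0, 1.25e6))  # link reduced to 10 Mbps: 200.0 s
```

A real signature would also have to account for message sequences and synchronization waits, which this additive form ignores.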

  7. Discovering Application Characteristics
[Figure: the executable application code is benchmarked on a controlled testbed (500 MHz Pentium Duos connected by a crossbar Ethernet switch with 100 Mbps links) and analyzed; the result is modeled as a performance signature]
• Capture patterns of CPU loads and traffic during execution

  8. Results in This Paper
[Figure: the executable application code is benchmarked on the controlled testbed (500 MHz Pentium Duos, crossbar Ethernet switch, 100 Mbps links), then its performance is measured with resource sharing]
• Capture patterns of CPU loads and traffic during execution
Demonstrate that measured resource usage on a testbed is a good predictor of performance on a shared network for the NAS benchmarks.

  9. Experiment Procedure
• Resource utilization of the NAS benchmarks measured on a dedicated testbed
• CPU probes based on the "top" and "vmstat" utilities
• Bandwidth measured using "iptraf", "tcpdump", and SNMP queries
• Performance of the NAS benchmarks measured with competing loads and limited bandwidth
• dummynet and NISTnet employed to limit bandwidth
• All measurements presented are on 500 MHz Pentium Duos, a 100 Mbps network, TCP/IP, and FreeBSD
• All results are for the Class A MPI NAS benchmarks
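A CPU probe in the spirit of the ones above can be sketched as a small parser over `vmstat` output. This assumes a FreeBSD-style layout in which the last three columns of each data row are the user/system/idle percentages; column order differs across systems and versions, so treat the field positions as an assumption.

```python
# Minimal sketch of a CPU probe: parse one vmstat data row and return
# the non-idle CPU fraction. Assumes the last three columns are the
# user/system/idle percentages (FreeBSD-style layout; this varies).

def cpu_busy_fraction(vmstat_line: str) -> float:
    """Non-idle CPU fraction parsed from one vmstat data row."""
    fields = vmstat_line.split()
    user, system, idle = (int(f) for f in fields[-3:])
    return (user + system) / 100.0

# Example data row (fields before the last three are arbitrary here)
sample = "0 0 0  112M  23M  140  0  0  0  150  90  30  271  45  15  40"
print(cpu_busy_fraction(sample))  # 0.6, i.e. ~60% utilization
```

In practice the probe would sample such rows at a fixed interval during the benchmark run to obtain the CPU-load pattern over time.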

  10. Discovered Communication Structure of the NAS Benchmarks
[Figure: communication graphs among four nodes (0 through 3) for each benchmark: BT, CG, IS, LU, MG, SP, and EP]
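One plausible way to recover such a communication graph from passive monitoring is to threshold a measured node-to-node traffic matrix. The representation, threshold, and traffic numbers below are assumptions for illustration, not the paper's method or the benchmarks' actual topologies.

```python
# Illustrative sketch (assumed representation): derive a communication
# graph from a measured traffic matrix by keeping directed edges that
# carry a non-trivial share of the total bytes.

def comm_graph(traffic, threshold=0.05):
    """Directed edges (i, j) whose bytes exceed `threshold` of total traffic."""
    total = sum(sum(row) for row in traffic)
    return {(i, j)
            for i, row in enumerate(traffic)
            for j, b in enumerate(row)
            if i != j and b > threshold * total}

# 4-node example: chain-style traffic, plus a negligible stray flow
t = [[0, 40, 0, 1],
     [40, 0, 40, 0],
     [0, 40, 0, 40],
     [1, 0, 40, 0]]
print(sorted(comm_graph(t)))  # the chain edges; the 1-byte flows drop out
```

Thresholding matters because control traffic (barriers, small acknowledgements) touches many node pairs without defining the benchmark's dominant communication structure.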

  11. Performance with Competing Computation Loads
• Increases beyond 50% are due to the lack of coordinated (gang) scheduling and synchronization
• Correlation between low CPU utilization and a smaller increase in execution time (e.g., MG shows only ~60% CPU utilization)
• Execution time is lower if the least busy node has the competing load (a 20% difference in the busyness level for CG)
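The correlation between CPU utilization and slowdown admits a back-of-the-envelope model (my assumption, not the paper's): under fair scheduling with one CPU-bound competitor, the compute phases run at half speed while idle/communication phases are unaffected, so the relative increase in execution time roughly equals the dedicated-run CPU utilization.

```python
# Assumed first-order model: with one competing load and fair CPU
# sharing, compute phases (fraction = utilization) take 2x as long,
# so the relative execution-time increase equals the utilization.

def predicted_increase(cpu_utilization: float) -> float:
    """Relative execution-time increase under one competing CPU load."""
    return cpu_utilization * (2.0 - 1.0)

print(predicted_increase(0.60))  # MG at ~60% utilization: smaller increase
print(predicted_increase(1.00))  # a fully CPU-bound code: up to 2x time
```

The slide's observation that increases exceed 50% shows this model is only a floor: without gang scheduling, synchronization waits amplify the slowdown beyond what fair sharing alone predicts.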

  12. Performance with Limited Bandwidth (reduced from 100 to 10 Mbps) on One Link
Close correlation between link utilization and performance with a shared or slow link.

  13. Performance with Limited Bandwidth (reduced from 100 to 10 Mbps) on All Links
Close correlation between total network traffic and performance with all shared or slow links.
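The correlation reported on these two slides suggests a simple estimate (an assumed form, not the paper's model): the extra communication time from a bandwidth cut is the measured bytes divided by the new bandwidth, minus the same bytes divided by the old bandwidth.

```python
# Illustrative estimate (assumed form): added transfer time when a
# link's bandwidth drops, given the bytes the application pushed over
# that link in the dedicated testbed run. Bandwidths are in bytes/s.

def extra_comm_time(bytes_on_link: float, bw_old: float, bw_new: float) -> float:
    """Added seconds of transfer time after the bandwidth reduction."""
    return bytes_on_link / bw_new - bytes_on_link / bw_old

# 250 MB over a link cut from 100 Mbps (12.5e6 B/s) to 10 Mbps (1.25e6 B/s)
print(extra_comm_time(250e6, 12.5e6, 1.25e6))  # 180.0 extra seconds
```

This ignores overlap of communication with computation, which is one reason a measured signature beats a pure formula.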

  14. Results and Conclusions (not the last slide)
• Computation and communication patterns can be captured by passive, nearly non-intrusive monitoring
• Benchmarked resource usage patterns are a strong indicator of performance with sharing
• strong correlation between application traffic and performance with low-bandwidth links
• CPU utilization during normal execution is a good indicator of performance with node sharing
• Synchronization and timing effects were not dominant for the NAS benchmarks

  15. Discussion and Ongoing Work (the last slide)
• Capture application-level data exchange patterns from network probes (e.g., MPI message sequences and sizes)
• slowdown differs for different message sizes
• Infer the main synchronization/waiting patterns
• impact of unbalanced execution and lack of gang scheduling
• Capture the impact of the CPU scheduling policy for accurate prediction with sharing
• policies try to compensate for waits
The goal is to build a quantitative "performance signature" to estimate execution time under any given network conditions, and to use it in a resource management prototype system.
