
NAVO MSRC PET Program Towards More Meaningful Machine Comparisons


Presentation Transcript


  1. NAVO MSRC PET Program: Towards More Meaningful Machine Comparisons • Dr. Allan Snavely • PMaC (Performance Modeling & Characterization) Group Leader • www.sdsc.edu/PMaC • SDSC

  2. PMaC Mission • To bring scientific rigor to the art of performance prediction • for procurement • for architectural tradeoffs • for guiding applications to the best-suited machine • for performance tuning

  3. PMaC Mission • To bridge the gap between benchmarks and cycle-accurate simulation • Benchmarks have dubious relevance to real apps, particularly on future machines • Cycle-accurate simulations take too long

  4. Projects • MAPS (Memory Access Patterns) • memory subsystem & interconnect signatures • MetaSim • an on-the-fly simulator for playing “what if?” (4 orders of magnitude faster than cycle-accurate simulation) • Pseudocode Cache Simulator • Scientific Application Loop Set • Terascale Application Information • IDC HPC List

  5. People • Dr. Allan Snavely, Group Leader • Dr. Laura Carrington, Xiaofeng Gao (MAPS) • Dr. Stuart Johnson (Pseudocode simulator) • Dr. Larry Carter (senior technical advisor) • Dr. Wayne Pfeiffer (Scientific Application Loop Set) • Nicole Wolter (Paraver/Dimemas) • Dr. Bob Leary (resident mathematician)

  6. What’s wrong with benchmarks? • May anti-correlate with actual performance [1]

  [1] John L. Gustafson and Rajat Todi, “Conventional Benchmarks as a Sample of the Performance Spectrum,” Ames Laboratory, USDOE

  7. PMaC Methods • Performance modeling via separation of concerns • Machine signatures • Application profiles • Convolution methods

  8. L1 cache: 8192 words, 128-way, 16-word blocks • TLB: 131072 words (4 KB pages), 2-way • L2 cache: 1048576 words, 4-way, 16-word blocks
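
These parameters are exactly the kind of “machine signature” slide 7 refers to. Below is a minimal sketch of how such a signature might be encoded for a model to consume; the schema and field names are our own invention, and only the numeric values come from the slide.

```python
# Hypothetical machine-signature encoding; schema/field names are
# illustrative, parameter values are taken from the slide above.
machine_signature = {
    "L1":  {"size_words": 8_192,     "assoc": 128, "block_words": 16},
    "TLB": {"size_words": 131_072,   "assoc": 2,   "page_bytes": 4_096},
    "L2":  {"size_words": 1_048_576, "assoc": 4,   "block_words": 16},
}
```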

  9. MAPS • Useful in its own right for more meaningful machine comparisons at a glance • Work going forward to port to Compaq TCS1, SX-5, T90, Sv1, MTA, Sun HPC 10K, Origin, others? • Provides input to MetaSim (next)
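
The MAPS probes themselves are not shown in the slides, but the core experiment is easy to sketch: time a fixed number of references against working sets of increasing size, so the sustained rate steps down as each cache level overflows. The toy below is our own illustration; a real probe would be carefully written compiled code, since pure-Python timing is dominated by interpreter overhead and is only indicative.

```python
# Toy MAPS-style probe: sustained reference rate vs. working-set size.
import time

def probe(n_words, accesses=2_000_000):
    data = list(range(n_words))
    t0 = time.perf_counter()
    s, i = 0, 0
    for _ in range(accesses):      # stride-1 sweep, wrapping around
        s += data[i]
        i += 1
        if i == n_words:
            i = 0
    dt = time.perf_counter() - t0
    return accesses / dt           # references per second

for size in (1 << 10, 1 << 14, 1 << 18, 1 << 22):
    print(f"{size:>9} words: {probe(size):.2e} refs/s")
```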

  10. Meta-Sim: a meta-simulator tool

  11. Meta-Sim • Takes two inputs: • a program • a description of a machine • Consumes instrumented trace data “on-the-fly” • 100-fold slowdown (as opposed to 1M-fold!) • Performs an automated predictive convolution

  12. Meta-Sim • Models caches and TLB • any number of levels • arbitrary sizes, line lengths, associativities • Does its accounting at the basic-block level • Looks for memory access patterns
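
Meta-Sim’s internals are not given in the slides, but one level of the kind of cache it models (set-associative, arbitrary size/line/associativity, with LRU replacement assumed here) fits in a few lines. Everything below, from the class name to the stand-in trace, is our own sketch.

```python
# Minimal set-associative LRU cache level, fed word addresses on the fly.
from collections import OrderedDict

class CacheLevel:
    def __init__(self, size_words, assoc, block_words):
        self.block = block_words
        self.assoc = assoc
        self.n_sets = size_words // (assoc * block_words)
        self.sets = [OrderedDict() for _ in range(self.n_sets)]
        self.hits = self.misses = 0

    def access(self, addr):
        blk = addr // self.block              # block tag
        s = self.sets[blk % self.n_sets]      # home set
        if blk in s:                          # hit: refresh LRU position
            s.move_to_end(blk)
            self.hits += 1
            return True
        self.misses += 1                      # miss: install, evict LRU
        s[blk] = True
        if len(s) > self.assoc:
            s.popitem(last=False)
        return False

# e.g. the L1 from slide 8: 8192 words, 128-way, 16-word blocks
l1 = CacheLevel(8192, 128, 16)
for addr in range(0, 100_000, 4):             # stand-in for a real trace
    l1.access(addr)
print(l1.hits, l1.misses)
```

Stacking several such levels (L1, TLB, L2 with slide 8’s parameters) and sending each level’s misses to the next would give the multi-level, per-basic-block accounting the slide describes.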

  13. A (simplistic) Convolution

  $\mathrm{MFLOPS} = \sum_{i=1}^{n} \mathrm{Wt}_{\mathrm{BB}_i} \times \mathrm{Rate}_{\mathrm{BB}_i} \times \mathrm{Intensity}_{\mathrm{BB}_i}$

  where
  • Wt_BBi = percentage of total memory references in basic block i
  • Rate_BBi = sustained rate of memory references for basic block i
  • Intensity_BBi = ratio of floating-point ops to memory ops for basic block i
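
The formula transcribes directly into code. The per-block numbers below are invented purely to exercise it; in practice they would come from the basic-block accounting and MAPS-style rate measurements described earlier.

```python
# Simplistic convolution: MFLOPS = sum_i weight_i * rate_i * intensity_i
# (weight, rate, intensity) per basic block, all values hypothetical:
#   weight    = fraction of total memory references in the block
#   rate      = sustained memory references/second for the block
#   intensity = floating-point ops per memory op in the block
basic_blocks = [
    (0.50, 2.0e8, 1.5),
    (0.30, 5.0e7, 0.5),
    (0.20, 1.0e8, 2.0),
]

mflops = sum(w * r * x for (w, r, x) in basic_blocks) / 1e6
print(f"predicted rate: {mflops:.1f} MFLOPS")
```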

  14. How to determine the rate of memory access for a BB? • sum = sum + a(k)*b(colidx(k)) • Even if only 33% of the memory references in a BB fall out to main memory (MM), they may slow the whole BB down to the speed of MM accesses • Why? The indirect load b(colidx(k)) depends on first loading colidx(k), so the MM misses sit on the critical path and cannot be overlapped with the cache hits (see the sketch below)
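
One way to see it numerically: compare the per-iteration cost when hits and misses overlap against the cost when the dependence chain through colidx(k) serializes them. The latencies below are assumptions for illustration, not measurements.

```python
# Why ~33% main-memory references can drag a basic block to MM speed.
t_l1 = 2.0     # cycles per L1 hit (assumed)
t_mm = 200.0   # cycles per main-memory access (assumed)
f_mm = 0.33    # fraction of references falling out to main memory

# Optimistic: misses overlap with hits, cost is a simple weighted mean.
t_mean = (1 - f_mm) * t_l1 + f_mm * t_mm

# Pessimistic: b(colidx(k)) cannot issue until colidx(k) arrives, so each
# iteration's critical path holds a full MM access that nothing can hide.
t_chain = t_mm

print(f"weighted mean:    {t_mean:5.1f} cycles/iteration")
print(f"dependence chain: {t_chain:5.1f} cycles/iteration")  # ~MM speed
```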

  15. Results

  16. Results

  17. Occam’s Razor • Only add complexity if required to explain observed phenomena • Observation: this approach is just as accurate as SMTSIM (Tullsen, Snavely, et al.) but 4 orders of magnitude faster!

  18. Conventional Benchmarks as a Sample of the Performance Spectrum

  19. Apps Results

  20. Apps Results

  21. Apps as a Sample of the Performance Spectrum (?)

  22. Work going forward • Development of probes à la MAPS for floating-point and integer functional-unit issue, logical operations, and I/O • Increase the sophistication of the convolutions as required to fit observed facts • Big goal: a robust set of metrics and methods for performance modeling and characterization

  23. PMaC Thanks Our Sponsors • Now includes a DOE SciDAC award (SUPREME) • Support from the HPC Users Forum • DoD HPC Modernization was the first to fund us, and their vision made this work possible
