
Efficient HPC Data Motion via Scratchpad Memory



Presentation Transcript


  1. Efficient HPC Data Motion via Scratchpad Memory Kayla Seager, Ananta Tiwari, Michael Laurenzano, Joshua Peraza, Pietro Cicotti, Laura Carrington

  2. Question 1: Do HPC workloads benefit from software-managed scratchpads? YES! If so, how will we manage it?

  3. Outline • Motivation • Scratchpad Background • Simulation Framework and Methodology • Initial Study • Current Direction

  4. Outline • Motivation • Scratchpad Background • Simulation Framework and Methodology • Initial Study • Current Direction

  5. Problem: HPC Powerwall • Can't scale old systems • Powerwall already reached by petaflop systems • Must redesign for power savings • Efficiency must increase by 2x Source: Exascale Report (Kogge, 2008)

  6. How to get Energy Savings • Redesign Hardware • Simpler hardware • Transfer complexity to software • Minimize expensive data movement • Memory slower • More cores=more contention • HPC codes have large working set sizes

  7. Outline • Motivation • Scratchpad Background • Simulation Framework and Methodology • Initial Study • Current Direction

  8. What is a Scratchpad? • Scratchpad (SPM)? • Local memory (like a cache) • SPM: software-allocated memory • Simpler hardware • [Figure: cache (tag array, memory array, decoder) vs. SPM (memory array, decoder)]

  9. Scratchpad Allocation • Dynamic • Move block of code • Iterate over code • Move another block • Static: Move block of code once • Strategies • Knapsack • Graph Coloring • register allocation problem
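
The knapsack strategy named on slide 9 can be sketched as a 0/1 knapsack: each candidate block has a size and an estimated benefit, and a static allocator picks the subset that fits in the scratchpad with the highest total benefit. This is only an illustration under assumptions; the function name `knapsack_allocate`, the block names, and the size and benefit values are hypothetical and do not come from the presentation.

```python
def knapsack_allocate(blocks, capacity):
    """Pick blocks that maximize total benefit within capacity (0/1 knapsack).

    blocks: list of (name, size, benefit) tuples with integer sizes.
    Returns (best_benefit, sorted list of chosen block names).
    """
    # best[c] holds (benefit, chosen names) achievable with capacity c
    best = [(0, [])] * (capacity + 1)
    for name, size, benefit in blocks:
        # Walk capacity downward so each block is placed at most once
        for c in range(capacity, size - 1, -1):
            candidate = best[c - size][0] + benefit
            if candidate > best[c][0]:
                best[c] = (candidate, best[c - size][1] + [name])
    total, chosen = best[capacity]
    return total, sorted(chosen)


# Hypothetical blocks: (name, size in SPM units, estimated benefit)
example_blocks = [("A", 4, 10), ("B", 3, 7), ("C", 2, 6)]
example_result = knapsack_allocate(example_blocks, 5)
```

With a capacity of 5 units, blocks B and C together outweigh block A alone, so the sketch selects B and C.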

  10. The Idea: Less Data Movement • Scratchpad saves energy • Allocation burden now on software • Less complexity on hardware • Move only what you use • Uses temporal locality • Cache • Spatial locality can fail: superfluous data movement (spatial locality is built into cache design; note the 8-word line size in most architectures) • [Figure: words A, B, C, D, E moved into cache]
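
The superfluous data movement described on slide 10 can be illustrated with a rough count of words moved. The sketch below assumes an idealized cache with unlimited capacity (one full 8-word line is fetched per distinct line touched) and an idealized scratchpad that moves only the distinct words actually used. Both models and the function names are assumptions made for illustration, not the simulator used in the study.

```python
LINE_WORDS = 8  # words per cache line, following the 8-word line size on the slide


def words_moved_cache(addresses, line_words=LINE_WORDS):
    """Words moved by an idealized cache: one full line per distinct line touched."""
    lines = {addr // line_words for addr in addresses}
    return len(lines) * line_words


def words_moved_spm(addresses):
    """Words moved by an idealized scratchpad: only the distinct words used."""
    return len(set(addresses))


# A stride-8 access pattern touches exactly one word in each of 8 lines
strided = list(range(0, 64, 8))
cache_words = words_moved_cache(strided)  # 8 lines * 8 words
spm_words = words_moved_spm(strided)      # 8 words
```

For this stride the cache model moves eight times as many words as the scratchpad model; for a unit-stride pattern the two counts would be equal.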

  11. Implication of Scratchpads • Current use: Embedded Systems • Smaller working set size • Predictable code • GPUs • Coding overhead • Issue: HPC codes • Large unpredictable codes • How to generalize codes? • How to make it practical and efficient

  12. Outline • Motivation • Scratchpad Background • Simulation Framework and Methodology • Initial Study • Current Direction

  13. Question 2: Are there computation patterns which get the most benefit from SPM?

  14. Why idioms? • Pattern of computation/memory access • Characterize Application Data Movement • Metric to compare different scientific codes (good coverage) • Easy to port HPC Code

  15. The Methodology • Idiom characterization study: which idioms favor SPM vs. cache • Find idioms in HPC codes • Port SPM-favorable idioms in HPC codes to scratchpad

  16. Tool: PEBIL • Binary instrumentation tool • Executable binary => identify basic blocks => cache simulation • Cache simulator built on top of PEBIL • User-defined cache structures • Profiles executables (hit/miss) • [Figure: Stage 1, PEBIL splits the executable binary into basic blocks (Block1, Block2); Stage 2, the cache simulation outputs {#hits} and {#misses} for each block]
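
The per-block hit/miss profile described on slide 16 can be sketched with a small fully associative LRU cache model. This is not PEBIL's simulator or its API; the function `simulate_lru`, the trace format of `(block_id, byte_address)` pairs, and the 64-byte line size are all assumptions chosen for illustration.

```python
from collections import OrderedDict, defaultdict


def simulate_lru(trace, num_lines, line_bytes=64):
    """Count hits and misses per basic block on a fully associative LRU cache.

    trace: iterable of (block_id, byte_address) pairs.
    Returns {block_id: [hits, misses]}.
    """
    cache = OrderedDict()  # resident line numbers, least recently used first
    stats = defaultdict(lambda: [0, 0])
    for block_id, addr in trace:
        line = addr // line_bytes
        if line in cache:
            cache.move_to_end(line)        # mark as most recently used
            stats[block_id][0] += 1
        else:
            stats[block_id][1] += 1
            cache[line] = None
            if len(cache) > num_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return dict(stats)


# Hypothetical trace: two basic blocks touching three cache lines
example_trace = [("Block1", 0), ("Block1", 8), ("Block2", 64),
                 ("Block2", 128), ("Block1", 0)]
example_stats = simulate_lru(example_trace, num_lines=2)
```

With only two lines of capacity, the final access from Block1 misses because its line was evicted by Block2's accesses.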

  17. Simulation Environment

  18. Cache/SPM only • [Figure: Stage 1, the executable binary is split into Block1 and Block2; Stage 2, both blocks are simulated on the cache and on the SPM]

  19. Hybrid System • [Figure: Stage 1, the executable binary is split into Block1 and Block2; Stage 2, the blocks are divided between SPM and cache in a hybrid system]

  20. Tool: PIR (find Idioms in HPC) • Used for: automatically identifying idioms in large-scale HPC applications • Input: Idioms.txt • Idioms are defined using a pattern language • Output: • Idioms matched to source line number • [Figure: Gather matched to Loop1, Transpose matched to Loop2]

  21. Outline • Motivation • Scratchpad Background • Simulation Framework and Methodology • Initial Study • Current Direction

  22. Under the hood: HPC Results • Fundamental question: Is there a benefit of SPM for HPC codes? • Simulate full apps on cache and SPM • Use simple heuristic to define the mappings • Simulate on hybrid • Pitfalls: • Sometimes SPM moves more than cache: LRU

  23. Metrics • Data Moved = (Cache Misses) × (Cache Line Size) • Data Movement Ratio = (SPM Data Movement) / (Cache Data Movement)
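
The two metrics on slide 23 reduce to simple arithmetic, sketched below. The miss count, line size, and SPM byte count are made-up numbers for illustration only, and the `savings` helper (one minus the ratio) is an assumed interpretation of "data movement saved" rather than a definition given in the slides.

```python
def data_moved(cache_misses, line_size_bytes):
    """Data Moved = (Cache Misses) * (Cache Line Size)."""
    return cache_misses * line_size_bytes


def data_movement_ratio(spm_bytes, cache_bytes):
    """Data Movement Ratio = (SPM Data Movement) / (Cache Data Movement)."""
    return spm_bytes / cache_bytes


def savings(ratio):
    """Assumed reading of 'data movement saved': the fraction not moved by SPM."""
    return 1.0 - ratio


# Hypothetical inputs: 1000 misses on 64-byte lines, 48000 bytes moved by the SPM
cache_bytes = data_moved(1000, 64)
ratio = data_movement_ratio(48000, cache_bytes)
saved = savings(ratio)
```

A ratio below 1 means the scratchpad moved less data than the cache; a ratio above 1 corresponds to the pitfall where the SPM moves more.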

  24. HPC Applications • Graph500 • Construct and traverse weighted undirected graph • HYCOM • Ocean model: hybrid isopycnal-sigma-pressure, generalized coordinate • SMG2000 • Parallel semi-coarsening multigrid solver • Sequoia Benchmarks • SPHOT • Monte Carlo photon transport code • UMT • Unstructured-mesh deterministic radiation transport code • AMG2006 • Algebraic multigrid linear system solver for unstructured mesh

  25. HPC Results

  26. Question 1: Do HPC workloads benefit from software-managed scratchpads? YES!

  27. Idiom Gather/Scatter
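
For readers unfamiliar with the idiom named on slide 27, a minimal sketch of gather and scatter follows. Gather reads through an index array (`dst[i] = src[idx[i]]`) and scatter writes through one (`dst[idx[i]] = src[i]`). The function names and sample data are illustrative and are not taken from the presentation or from PIR's pattern language.

```python
def gather(src, idx):
    """Gather: dst[i] = src[idx[i]] (indirect, possibly irregular reads)."""
    return [src[j] for j in idx]


def scatter(src, idx, size, fill=0):
    """Scatter: dst[idx[i]] = src[i] (indirect, possibly irregular writes)."""
    dst = [fill] * size
    for i, j in enumerate(idx):
        dst[j] = src[i]
    return dst


gathered = gather([10, 20, 30, 40], [3, 0, 2])
scattered = scatter([1, 2, 3], [2, 0, 3], size=4)
```

Because the index array decides which elements are touched, neighbouring words in a cache line are often unused, which is the situation where moving only the needed words can help.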

  28. Using Methodology for HYCOM • Gather Idiom: Prefers SPM • Find gather in HYCOM: 33 instances • Port Idiom Blocks: Hybrid Structure • Port Gather Basic Blocks to SPM • Rest on Cache • Result: HYCOM (ocean modeling code) savings of 20% in data motion

  29. Outline • Motivation • Scratchpad Background • Simulation Framework and Methodology • Initial Study • Current Direction

  30. Real SPM for PEBIL? • Extension of PEBIL Simulator • Fully associative cache • Rethink replacement policy • Dynamic Allocation Scheme • Idioms determine loops for allocation • Reuse distance library • Track how often used • Track distance of use • Example: in the access sequence A B C A, the reuse distance of A is 2
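
The reuse distance example on slide 30 (sequence A B C A, distance 2) can be reproduced with a short sketch. It counts the distinct elements accessed between two consecutive uses of the same element. This is a simple quadratic-time illustration under that assumed definition, not the reuse distance library mentioned on the slide.

```python
def reuse_distances(trace):
    """Return one entry per access: the number of distinct elements touched
    since the previous access to the same element, or None on first use."""
    last_seen = {}
    result = []
    for i, addr in enumerate(trace):
        if addr in last_seen:
            between = trace[last_seen[addr] + 1:i]
            result.append(len(set(between)))
        else:
            result.append(None)
        last_seen[addr] = i
    return result


slide_example = reuse_distances(["A", "B", "C", "A"])
```

The last entry for the slide's sequence is 2, because B and C are the two distinct elements accessed between the two uses of A.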

  31. Results Summary • SPM • Simpler Hardware • Efficient Data Movement • Developed Methodology for SPM • Idiom characterization • Idiom identification in HPC codes • Port SPM hotspots • 20% Data Movement Savings for HYCOM • Scratchpad shows potential • Good when spatial locality fails • HPC applications • SPM only: Average 22% Data Movement Saved • Hybrid: Average 39% Max 69% Data Movement Saved • 4x Improvement for Gather idiom • Current work on creating SPM for PEBIL

  32. Acknowledgements • PMaC team • Laura Carrington • Ananta Tiwari • Michael Laurenzano • Pietro Cicotti • Mitesh Meswani • Dedicated to: Allan Snavely

  33. EXTRA

  34. Idioms: Strided Access i=i+stride

  35. Looking Forward • Idiom Driven Allocation • PIR determines loops for allocation • Pre-allocated array for SPM • Pointers to loops: trigger replacement • Mimic Dynamic Compiler Replacement Policy
