
Framework for Profile-Analysis Data-Layout Optimizations

Shai Rubin (University of Wisconsin), Ras Bodik (University of Wisconsin), Trishul Chilimbi (Microsoft Research).


Presentation Transcript


  1. Framework for Profile-Analysis Data-Layout Optimizations
     Shai Rubin (University of Wisconsin), Ras Bodik (University of Wisconsin), Trishul Chilimbi (Microsoft Research)

  2. Data Layout Optimization (What)
     • Reference sequence: A.x, B, A.z (repeated).
     • [Figure: original vs. modified data layout for this sequence, shown for cache blocks and for memory pages. Memory hierarchy in the figure: CPU, cache (1 cycle), memory (10^2 cycles), disk (10^6 cycles).]
     • DL optimization: increase the spatial locality of data to prevent memory faults.

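  3. Data Layout Optimization (How)
     • Pipeline: Program → reference summary → data layout optimizer (over the layout space) → data layout → enforce layout → Program′.
     • Scientific (array based) programs: reference summary from array dependence analysis (static); optimizer is optimal for simple loops and yields an optimal layout; layout enforced at compile time.
     • General purpose (pointer based) programs: reference summary from a reference trace (dynamic); optimizer is a heuristic and yields a “good” layout; layout enforced at (1) compile time or (2) runtime.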

  4. Problems with Current Data-Layout Optimization
     • Computationally hard to find the optimal layout [Petrank].
     • Computationally hard to approximate the optimal layout [Petrank].
     • Implication: heuristics are not robust; they will not work for all programs.
     • From our experience with heuristics:
       • Field Reordering [Chilimbi PLDI’99]: no improvement (on perl).
       • Custom Memory Allocator [Seidl ASPLOS’98]: degrades performance (on espresso).
     • Our approach: replace the heuristic with a feedback-driven search.

  5. Searching For a Data Layout
     • Problem: perform a search in the data layout space.
     • Look for:
       • a “good” layout.
       • an “easy” to enforce layout.
     • Search advantage: robust; for each program it finds a “good” layout.
     • [Figure: the data layout space, marking the current program data layout, the optimal data layout, the region of “good” layouts, and the “good” + “easy” to enforce layouts.]

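  6. Is Search Practical?
     • Not clear.
     • [Figure: the search loop. The possible layouts and the reference trace feed the optimizer (heuristic), which produces a data layout. Enforcing the layout requires Edit and Compile; evaluating it requires Execute and Evaluate. A Continue? test then either repeats the loop or ends.]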

  7. Outline
     • Background and Problem Definition
     • Search is a solution, but may not be practical
     • Making the search practical
     • Applications
     • Summary

  8. Framework for Data Layout Optimization: Making the Search Practical
     • Inputs: reference trace T; layout space LS (e.g., Field Reordering, Linearization, Class Splitting); search strategy SS.
     • Compress(T) → CST (compressed symbolic trace).
     • Data Object Analysis: DOA(CST, LS) → NLS (narrowed layout space).
     • Data layout search engine:
       • Layout Selector: LS(NLS, B, CST, SS) → DL (data layout).
       • Enforce Layout: AL(DL, CST) → NT (new trace).
       • Evaluate: Simulate(NT) → B (benefit).
       • Continue(B): continue the search or end.
     • Output: “good” and enforceable layouts.
     • [Figure: framework diagram. The Edit, Compile, Execute, Evaluate steps of the loop on slide 6 are still shown next to Enforce Layout and Evaluate.]

  9. Trace Representation
     • Problem: the reference trace cannot be easily manipulated because it is too large (>10GB, >100M references).
     • Solution: a compressed trace (using a modified SEQUITUR).
     • Example:
       • Trace: acbcbcbcbdbdbdbde
       • SEQUITUR representation: S → acBBBAAe, B → bc, A → CC, C → bd
     • Representation advantages:
       • Compact; fits into main memory [Chilimbi PLDI’01].
       • Exposes repetitions (we use this later).
       • It produces a symbolic trace (i.e., a terminal is a data object).
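As a quick check of the slide 9 example, the grammar S → acBBBAAe, B → bc, A → CC, C → bd can be expanded back into the trace. This is a minimal sketch that only decompresses a given grammar; it does not implement SEQUITUR's grammar inference or the authors' modified version. The names `GRAMMAR` and `expand` are illustrative, not from the presentation.

```python
# Grammar from slide 9: uppercase symbols are rules, lowercase symbols are
# terminals (each terminal stands for one data object in the symbolic trace).
GRAMMAR = {"S": "acBBBAAe", "B": "bc", "A": "CC", "C": "bd"}


def expand(symbol, grammar=GRAMMAR):
    """Recursively replace rule symbols by their right-hand sides."""
    if symbol not in grammar:          # terminal: already a trace element
        return symbol
    return "".join(expand(s, grammar) for s in grammar[symbol])


trace = expand("S")
grammar_size = sum(len(rhs) for rhs in GRAMMAR.values())
print(trace)
print(len(trace), "references represented by", grammar_size, "grammar symbols")
```

The saving is small on a 17-reference toy trace; the slide's point is that the same idea lets a very large trace fit in main memory while keeping its repetitions explicit.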

  10. Framework for Data-Layout Optimization
      • [Figure: the framework diagram from slide 8, shown again. Layout space: Field Reordering, Linearization, Class Splitting. Steps: Compress(T) → CST; DOA(CST, LS) → NLS; LS(NLS, B, CST, SS) → DL; Enforce Layout EL(DL, CST) → CST′; Simulate(NT) → B; Continue(B). Of the original loop only Compile remains next to Enforce Layout. Output: “good” and enforceable layouts.]

  11. Avoid Re-compilation
      • Problem: data layout evaluation = edit + compilation + simulation.
      • Solution: “pretend” that the program was edited and compiled.
      • Symbolic trace + data layout → concrete address trace.
      • Example: the single symbolic trace A.x, B, A.z, B under the layout A.x → 30, A.z → 34, B → 20 gives the new concrete trace 30, 20, 34, 20. The figure also shows a second layout, A.x → 10, A.z → 14, B → 20.
      • [Figure: the user (optimizer) path of Edit program, Compile, Run (simulate) next to the framework path of Enforce Layout, Simulate; both yield 30, 20, 34, 20.]
      • Simple, but crucial for an efficient search.
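The "symbolic trace + data layout → concrete address trace" step on slide 11 amounts to a lookup per reference. A minimal sketch, using the two layouts and the symbolic trace shown on the slide; the function name `enforce_layout` and the dictionary representation of a layout are assumptions for illustration.

```python
def enforce_layout(symbolic_trace, layout):
    """Map each symbolic reference (a data object) to its address under `layout`."""
    return [layout[obj] for obj in symbolic_trace]


# Symbolic trace and the two layouts that appear on slide 11.
symbolic_trace = ["A.x", "B", "A.z", "B"]
layout_a = {"A.x": 10, "A.z": 14, "B": 20}
layout_b = {"A.x": 30, "A.z": 34, "B": 20}

concrete_a = enforce_layout(symbolic_trace, layout_a)
concrete_b = enforce_layout(symbolic_trace, layout_b)
print(concrete_a)
print(concrete_b)
```

Because only the layout dictionary changes between candidates, the same symbolic trace can be reused for every layout the search tries, with no edit or recompilation of the program.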

  12. Framework for Data-Layout Optimization
      • [Figure: the framework diagram from slide 8, shown again. Layout space: Field Reordering, Linearization, Class Splitting. Steps: Compress(T) → CST; DOA(CST, LS) → NLS; LS(NLS, B, CST, SS) → DL; Enforce Layout EL(DL, CST) → CST′; Simulate(CST′) → B; Continue(B). Output: “good” and enforceable layouts.]

  13. Memoization: Efficient Trace Simulation
      • Evaluation using simulation: MissRate_T = Simulate(T).
      • Problem: simulation of the whole trace (T) is too expensive.
      • Solution: avoid re-simulation of repeated sub-traces.
      • Memoization:
        • Simulate each “low level” rule and compute its memoization value.
        • For cache simulation: memoization value = CacheState [CS].
        • Recursively compose memoization values for “higher” rules.
      • Example (SEQUITUR representation): S → BBBAA, B → bc, A → CC, C → bd; T: bcbcbcbdbdbdbd.
        • CS_C = Simulate′(C)
        • CS_B = Simulate′(B)
        • CS_A = CS_C ∘ CS_C
        • CS_S = CS_B ∘ CS_B ∘ CS_B ∘ CS_A ∘ CS_A
        • MissRate_T = [expression not captured in the transcript]
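The idea of not re-simulating repeated sub-traces can be sketched on the slide 13 grammar (S → BBBAA, B → bc, A → CC, C → bd). The slide composes per-rule cache-state summaries; the sketch below swaps in a simpler stand-in, memoizing each rule's result keyed by the cache state on entry, so a rule is simulated once per distinct entry state. The direct-mapped cache, its parameters, and the addresses in `LAYOUT` are all hypothetical.

```python
from functools import lru_cache

GRAMMAR = {"S": "BBBAA", "B": "bc", "A": "CC", "C": "bd"}  # from slide 13
LAYOUT = {"b": 0, "c": 64, "d": 128}   # hypothetical byte addresses
BLOCK_SIZE = 64                        # hypothetical cache parameters
NUM_SETS = 2                           # direct-mapped: one block per set
EMPTY = (None,) * NUM_SETS             # cache state = tuple of resident blocks


def access(state, address):
    """Simulate one reference; return (misses, state_after)."""
    block = address // BLOCK_SIZE
    idx = block % NUM_SETS
    if state[idx] == block:
        return 0, state
    return 1, state[:idx] + (block,) + state[idx + 1:]


@lru_cache(maxsize=None)
def simulate(symbol, state):
    """Memoized simulation of a grammar symbol starting from `state`."""
    if symbol not in GRAMMAR:                    # terminal: a data object
        return access(state, LAYOUT[symbol])
    misses = 0
    for child in GRAMMAR[symbol]:                # compose the children's results
        child_misses, state = simulate(child, state)
        misses += child_misses
    return misses, state


def expand(symbol):
    """Decompress a symbol into the flat trace it represents."""
    if symbol not in GRAMMAR:
        return symbol
    return "".join(expand(s) for s in GRAMMAR[symbol])


def simulate_flat(trace, state=EMPTY):
    """Reference implementation: simulate every reference one by one."""
    misses = 0
    for ref in trace:
        m, state = access(state, LAYOUT[ref])
        misses += m
    return misses, state


misses, final_state = simulate("S", EMPTY)
trace = expand("S")
print(f"miss rate = {misses}/{len(trace)}")
```

Keying on the entry state keeps the memoized result exact for this toy cache; it saves work only when a rule is re-entered in a state already seen, which is why a composable summary, as the slide describes, is the more general formulation.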

  14. Outline
      • Background and Problem Definition
      • Search is a solution, but may not be feasible
      • Making the search practical:
        • Trace representation
        • Avoid recompilation
        • Efficient simulation
      • Applications
      • Summary

  15. Framework Application (1)
      • Application: an implementation of the framework that searches in a sub-space of the layout space.
      • Field Reordering:
        • Objective: reduce the number of cache misses.
        • Sub-space: all possible (legal) orders of fields in (heap) objects.
        • Our search strategy: (almost) exhaustive search.
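An exhaustive search over field orders can be sketched as: enumerate every order, derive field offsets, map the symbolic trace to cache blocks, and keep the order with the fewest misses. Everything concrete here is an assumption for illustration: the field names and sizes, the 32-byte block, the one-block LRU cache, and the simplification that an access touches only the block holding a field's first byte. The presentation's search is "(almost) exhaustive" and uses the framework's simulator, not this toy cost function.

```python
from itertools import permutations

FIELD_SIZES = {"x": 8, "buf": 56, "z": 8}   # hypothetical object fields (bytes)
BLOCK_SIZE = 32                             # hypothetical cache block size
TRACE = ["x", "z"] * 3                      # hypothetical symbolic field trace


def offsets(order):
    """Lay fields out back to back in the given order."""
    result, offset = {}, 0
    for field in order:
        result[field] = offset
        offset += FIELD_SIZES[field]
    return result


def count_misses(order, trace, num_blocks=1):
    """Misses in a tiny fully associative LRU cache holding `num_blocks` blocks."""
    layout = offsets(order)
    cache, misses = [], 0                   # cache is ordered from LRU to MRU
    for field in trace:
        block = layout[field] // BLOCK_SIZE
        if block in cache:
            cache.remove(block)             # hit: refresh recency below
        else:
            misses += 1
            if len(cache) == num_blocks:
                cache.pop(0)                # evict the least recently used block
        cache.append(block)
    return misses


def best_order(trace):
    """Exhaustively evaluate every field order and return the cheapest."""
    return min(permutations(FIELD_SIZES), key=lambda o: count_misses(o, trace))


best = best_order(TRACE)
print(best, count_misses(best, TRACE))
```

With n fields there are n! orders, so plain enumeration is only workable for small objects or after the layout space has been narrowed.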

  16. Field Reordering: Exhaustive Search
      • We compared:
        • The best field order found by our iterative search.
        • Field orders produced by existing heuristics:
          • Fields Temporal Affinity [Chilimbi PLDI’99].
          • Fields Access Frequency [Truong PACT’98].
      • Runtime improvement: 0%-4.5%.

  17. Custom Memory Allocator (CMA)
      • Objective: reduce the number of page faults.
      • [Figure: reference trace ABABA under two allocators, plotted as address vs. time. Poor locality: A on Page 1 and B on Page 2. Good locality: A and B on the same page.]
      • CMA can work well if it has a good placement function: one that assigns dynamically allocated heap objects to memory pages (heaps).

  18. CMA Placement Function (PF)
      • PF maps objects to heaps: PF(heap object) → int. It is consulted inside malloc(size s){ }.
      • Example decision tree on Size: size ≥ 24 → heap 1; size < 24 → heap 2.
      • How can we find a placement function using our framework?
        • A placement function defines a data layout.
        • Learn by measuring the benefits of its data layout.
        • How: use a learning algorithm.
      • [Figure: Profile(heap objects) → runtime attributes (profiling information); a decision tree learner produces PF(Attributes) → int; the framework is used to evaluate the PF.]
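A size-based placement function and a way to score the layout it induces can be sketched as follows. The threshold 24 is the one that appears on slide 18; everything else is assumed for illustration: the heap numbering, the bump allocation into disjoint regions, the object sizes, and the use of page switches along the trace as a rough proxy for page-fault cost. No decision-tree learner is implemented; the sketch only shows how a candidate PF could be evaluated.

```python
PAGE_SIZE = 4096                      # hypothetical page size (bytes)
HEAP_BASE = {1: 0, 2: 1 << 20}        # hypothetical disjoint heap regions


def make_pf(threshold):
    """Placement function: a one-node decision tree on the allocation size."""
    def pf(size):
        return 1 if size >= threshold else 2
    return pf


def place(allocations, pf):
    """Bump-allocate each (name, size) in the heap chosen by `pf`."""
    next_free = dict(HEAP_BASE)
    address = {}
    for name, size in allocations:
        heap = pf(size)
        address[name] = next_free[heap]
        next_free[heap] += size
    return address


def page_switches(trace, address):
    """Count consecutive references that land on different pages."""
    pages = [address[obj] // PAGE_SIZE for obj in trace]
    return sum(1 for prev, cur in zip(pages, pages[1:]) if prev != cur)


# Two small hot objects with a large cold object allocated between them.
allocations = [("A", 16), ("C", 8000), ("B", 16)]
trace = list("ABABA")                 # reference trace from slide 17

single_heap = place(allocations, make_pf(0))    # every object goes to heap 1
split_heaps = place(allocations, make_pf(24))   # size >= 24 -> heap 1, else heap 2
print(page_switches(trace, single_heap), page_switches(trace, split_heaps))
```

A learner would generate candidate placement functions from profiled attributes and keep the one whose induced layout scores best under the framework's evaluation.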

  19. CMA Results
      • Footnote 1: relative to original working set size.

  20. Contributions and Future Work
      • Formulate data layout optimization as a search process.
      • Build a framework for an efficient search process.
      • Improve existing optimizations; enable new optimizations.
      • Framework limitations:
        • Difficult to handle very large traces (>0.5B references).
        • Requires some guidance from the programmer (search strategy).
      • Future work:
        • Advanced search strategies that combine several optimizations.
        • Other, non-data-layout optimizations, e.g., prefetching.
