
Learning for Optimizing Compilers

John Cavazos, Architecture and Language Implementation Lab, University of Massachusetts, Amherst. Thesis Seminar. Compiler writers have a difficult task: optimizations are NP-hard, computer architectures are complex, and computer architects need rapid evaluation.


Presentation Transcript


  1. Learning for Optimizing Compilers • John Cavazos • Architecture and Language Implementation Lab • University of Massachusetts, Amherst • Thesis Seminar

  2. Motivation • Compiler writers have a difficult task • Optimizations are NP-hard • Computer architectures are complex • Computer architects need rapid evaluation • Generating heuristics manually is slow, complicated, and ad hoc

  3. Propose Supervised Learning • Induces heuristics automatically • Training examples • (a, b, c, …, z, label) • a, b, c, …, z: properties of problem • label: proper decision to make • Two objectives: • Minimize error • Prefer less complicated function • LOCO (Learning for Optimizing COmpilers)
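A minimal sketch of the two objectives on this slide, assuming a toy setting that is not from the talk: each candidate heuristic carries a hand-assigned size, and the learner picks the candidate minimizing training errors plus a size penalty. The names `pick_rule` and `size_penalty`, the labels, and the example data are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# A training example is (features a, b, c, ..., label).
Example = Tuple[Sequence[float], str]
# A candidate rule is (description, size, predict function). Hypothetical format.
Rule = Tuple[str, int, Callable[[Sequence[float]], str]]


def training_error(predict: Callable[[Sequence[float]], str],
                   examples: List[Example]) -> int:
    """Count examples whose predicted label differs from the true label."""
    return sum(1 for features, label in examples if predict(features) != label)


def pick_rule(candidates: List[Rule], examples: List[Example],
              size_penalty: float = 0.5) -> Rule:
    """Objective 1: minimize error. Objective 2: prefer the smaller rule."""
    return min(
        candidates,
        key=lambda rule: training_error(rule[2], examples) + size_penalty * rule[1],
    )


# Made-up data: two features per example, label is the proper decision.
examples = [((2.0, 0.0), "skip"), ((40.0, 1.0), "apply"),
            ((35.0, 0.0), "apply"), ((5.0, 1.0), "skip")]
candidates = [
    ("always skip", 0, lambda f: "skip"),
    ("f[0] > 20", 1, lambda f: "apply" if f[0] > 20 else "skip"),
    ("f[0] > 20 and f[1] >= 0", 2,
     lambda f: "apply" if f[0] > 20 and f[1] >= 0 else "skip"),
]
best = pick_rule(candidates, examples)
print(best[0])
```

The second and third candidates both fit the data perfectly; the penalty term is what makes the shorter one win.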

  4. Benefits of Supervised Learning • Heuristic construction sped up • Determines relative importance of features • Effective heuristics • Comparable to hand-tuned heuristics • Theoretically sound • Traditional approach ad hoc

  5. Taxonomy of Compiler Heuristics • What Order to Apply Optimizations: Phase-ordering heuristics • When to Optimize: Filters • Which Optimization Algorithm to Apply: Hybrid Optimizations • How to Optimize: Priority Functions

  6. The LOCO Methodology • Determine class of heuristic • Generate raw data • Instrument compiler • Process raw data • Thresholds • Generates training data • Induce heuristic • Integrate into compiler

  7. The LOCO Methodology [diagram] • Instrumented Compiler • Generate raw learning data • Process raw data (Thresholding) • Training Set • Supervised Learning (Rule induction) • Induces heuristic • Production Compiler

  8. Experimental Setup • Java JIT compiler • Jikes RVM 2.0.2 • PowerPC 533 MHz G4, model 7410 • Case Study 1: SPEC JVM benchmarks • Case Study 2: Scientific benchmarks • Scheduling improves by 4% or more

  9. Case Study 1 Hybrid Register Allocation

  10. Motivation • Register Allocation: important • Effective use of registers • Different Algorithms to choose from • Graph coloring: possibly expensive • Linear scan: not always effective • Which algorithm to apply?

  11. Solution • Features predict which algorithm to use • Heuristic function controls allocator • Reduces cost significantly • Retains most benefit • Successful with simple features • Applicable to other optimizations

  12. Hybrid Register Allocation

  13. Features of Methods

  14. Hybrid Register Allocation

  15. Inducing Heuristic Controller • For each block generate raw training data • Features of method • Additional spills incurred • Cost of allocation algorithms • Process raw data to generate training set • Leave-one-out cross-validation • Output of LOCO = heuristic controller
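A minimal sketch of the leave-one-out cross-validation step, under one assumption the slide does not state: that training instances are grouped by benchmark and one whole benchmark is held out per round. The majority-label learner is only a placeholder for whatever LOCO actually induces, and the benchmark names and instances are made up.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Instance = Tuple[Sequence[float], str]          # (features, label)
Heuristic = Callable[[Sequence[float]], str]


def induce_majority(train: List[Instance]) -> Heuristic:
    """Placeholder learner: always predicts the most common training label."""
    majority = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda features: majority


def leave_one_out(groups: Dict[str, List[Instance]],
                  induce: Callable[[List[Instance]], Heuristic]) -> Dict[str, float]:
    """Train on all groups but one, then measure accuracy on the held-out group."""
    accuracy = {}
    for held_out, test_set in groups.items():
        train = [inst for name, insts in groups.items()
                 if name != held_out for inst in insts]
        heuristic = induce(train)
        hits = sum(1 for features, label in test_set
                   if heuristic(features) == label)
        accuracy[held_out] = hits / len(test_set)
    return accuracy


# Hypothetical benchmarks with made-up instances.
groups = {
    "bench_a": [((1.0,), "LS"), ((2.0,), "LS")],
    "bench_b": [((3.0,), "LS"), ((4.0,), "GC")],
    "bench_c": [((5.0,), "LS"), ((6.0,), "LS"), ((7.0,), "GC")],
}
accuracy = leave_one_out(groups, induce_majority)
print(accuracy)
```

Holding out a whole group means the heuristic is always evaluated on code it never saw during training.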

  16. Labeling Training Instances • Two factors: • Cost of register allocation • Spill benefit of different allocators • Prefer graph coloring • If benefit above threshold • Prefer linear scan • If graph coloring cost above threshold • No spill benefit

  17. Motivation for Threshold Technique • Noise reduction technique • Simplifies learning • Removes cases of fine distinction • Separation by a threshold gap • For example: • T = 10%: model estimates improvement by 10%

  18. Thresholding [diagram] • Regions: Linear Scan, Graph Coloring, No Instance • Spill Threshold (8192) • Cost Threshold (0.5)

  19. Labeling Training Instances • If (LS_Spill - GC_Spill > Spill_Threshold) Print "GC"; // High Spill Benefit • Else If (LS_Cost/GC_Cost > Cost_Threshold) Print "LS"; // High Cost • Else If (LS_Spill - GC_Spill <= 0) Print "LS"; // No Spill Benefit • Else { /* No Label */ } // Skip Training Instance
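The labeling rule on this slide can be sketched as a small Python function. The conditions are transcribed as written on the slide. The default thresholds reuse the values 8192 and 0.5 shown on the thresholding slide, which is an assumption about how those numbers are used. The function name and the requirement that `gc_cost` be positive are also assumptions.

```python
from typing import Optional


def label_instance(ls_spill: float, gc_spill: float,
                   ls_cost: float, gc_cost: float,
                   spill_threshold: float = 8192,
                   cost_threshold: float = 0.5) -> Optional[str]:
    """Return "GC", "LS", or None (instance is skipped), following the slide.

    ls_* are measurements for linear scan, gc_* for graph coloring.
    Assumes gc_cost > 0 so the cost ratio is defined.
    """
    spill_benefit = ls_spill - gc_spill
    if spill_benefit > spill_threshold:       # high spill benefit
        return "GC"
    if ls_cost / gc_cost > cost_threshold:    # high cost, as written on the slide
        return "LS"
    if spill_benefit <= 0:                    # no spill benefit
        return "LS"
    return None                               # fine distinction: no label


# Made-up measurements, one per branch.
print(label_instance(20000, 1000, 1.0, 4.0))   # GC
print(label_instance(5000, 1000, 3.0, 4.0))    # LS
print(label_instance(1000, 1000, 1.0, 4.0))    # LS
print(label_instance(5000, 1000, 1.0, 4.0))    # None
```

Instances that return `None` are left out of the training set, which is the noise-reduction effect the threshold technique aims for.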

  20. Threshold Example

  21. Spill Loads (Opt Level 3, 8 Regs)

  22. Benchmark Running Times (Opt Level 3, 8 Regs)

  23. Register Allocation Stats (Opt Level 3, 8 Regs)

  24. Register Allocation Cost (Opt Level 3, 8 Regs)

  25. Hybrid Register Allocation is Successful • Significantly reduce register allocation time • Reduced allocation time by 60% • Preserve benefit of graph coloring • Achieved 93% of graph coloring benefit • LOCO effective for this heuristic

  26. Case Study 2: Instruction Scheduling Filters

  27. Motivation • Instruction scheduling: important • Improvements over 15% • But: • Expensive • Frequently not beneficial • Problem: Can we predict which blocks benefit from scheduling?

  28. Solution • Features of block predict when to schedule • Heuristic controls scheduling • Reduces cost of scheduling • Retains benefit of scheduling • Successful with simple features • Filter for applying scheduler

  29. An Optimization Filter

  30. Features of Block

  31. Inducing a Filter • Construct cheap-to-compute features of a block • Obtain training instances that include: • Features of the block • Labels (Scheduling benefit to block) • Induce a filter using LOCO • We used rule induction • Use the filter to control when compiler schedules
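A minimal sketch of the last step, using a filter to control when the compiler schedules. The feature names, the rule inside `hypothetical_filter`, and the stand-in scheduler are all invented for illustration; the transcript does not list the actual block features or the induced rules.

```python
from typing import Callable, Dict, List, Tuple

Block = Dict[str, float]   # hypothetical cheap-to-compute features of a block


def hypothetical_filter(block: Block) -> bool:
    """Stand-in for an induced rule; the real rules are not in the transcript."""
    return block["num_instructions"] >= 8 and block["load_fraction"] > 0.2


def compile_blocks(blocks: List[Block],
                   should_schedule: Callable[[Block], bool],
                   schedule: Callable[[Block], Block]) -> Tuple[List[Block], int]:
    """Run the scheduler only on blocks the filter accepts."""
    output, scheduled = [], 0
    for block in blocks:
        if should_schedule(block):
            output.append(schedule(block))
            scheduled += 1
        else:
            output.append(block)   # skipped: no scheduling cost paid
    return output, scheduled


def mark_scheduled(block: Block) -> Block:
    """Stand-in scheduler: returns a copy of the block tagged as scheduled."""
    return {**block, "scheduled": 1.0}


blocks = [
    {"num_instructions": 3, "load_fraction": 0.5},
    {"num_instructions": 12, "load_fraction": 0.4},
    {"num_instructions": 20, "load_fraction": 0.1},
]
result, count = compile_blocks(blocks, hypothetical_filter, mark_scheduled)
print(count, "of", len(blocks), "blocks scheduled")
```

Because the filter runs on every block, its features need to be much cheaper to compute than the scheduling pass it guards.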

  32. Block Timing Estimator • Estimate of cycles to execute block • Simple model of real machine • Determines cost of block in isolation • Relative cycle differences important • Not absolute cycle counts
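A minimal sketch of how estimated cycle counts could be turned into labels with a threshold gap. The exact rule is an assumption: a block is labeled as worth scheduling when the estimated relative improvement reaches T, labeled as not worth it when there is no improvement, and dropped in between. The 10% default echoes the T=10% example from the threshold slide. Function and label names are hypothetical.

```python
from typing import Optional


def label_block(cycles_unscheduled: int, cycles_scheduled: int,
                threshold: float = 0.10) -> Optional[str]:
    """Label a block from estimated cycle counts, or return None to skip it.

    Only the relative difference between the two estimates is used,
    not the absolute cycle counts.
    """
    if cycles_unscheduled <= 0:
        return None                            # no usable estimate
    improvement = (cycles_unscheduled - cycles_scheduled) / cycles_unscheduled
    if improvement >= threshold:
        return "schedule"                      # clear benefit
    if improvement <= 0.0:
        return "skip"                          # no benefit
    return None                                # inside the gap: no training instance


# Made-up estimates.
print(label_block(100, 85))    # schedule
print(label_block(100, 100))   # skip
print(label_block(100, 95))    # None
```

Using a ratio means a simple machine model is enough, as long as it ranks the scheduled and unscheduled versions of a block consistently.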

  33. Labeling using Thresholds

  34. Running Time with Filtering

  35. Running Time with Filtering

  36. Running Time with Filtering

  37. Scheduling Time with Filtering

  38. Scheduling Time with Filtering

  39. Filtering Statistics

  40. Filters are Successful • Significantly reduce scheduling time • Reduced scheduling time by 75% • Preserve benefit of scheduling • Achieved 93% of scheduling benefit • LOCO effective for this heuristic

  41. Related Work • Supervised learning • Loop-unrolling and tiling • Genetic algorithms • Hyperblocks, reg allocation, prefetching (MIT) • Application-specific compilation strategy (Rice) • Reinforcement learning • Used to induce heuristic for scheduling (UMass) • We argue LOCO is better

  42. Future Work • More work on filters • Inlining and SSA-based opts • More work on hybrid optimizations • Garbage collection • More work on priority functions • Register allocation spill heuristic • Use LOCO anywhere a heuristic is used

  43. Conclusion • LOCO effective at constructing heuristics • Faster than most alternatives • LOCO can lead to insights • More readable than other alternatives • LOCO heuristics competitive • Comparable to hand-tuned heuristics • LOCO easier to use

  44. Spill Loads (Opt Level 1, 8 Regs)

  45. Register Allocation Cost (Opt Level 1, 8 Regs)

  46. Benchmark Running Times (Opt Level 1, 8 Regs)
