Learning for Optimizing Compilers

John Cavazos, Architecture and Language Implementation Lab, University of Massachusetts, Amherst. Thesis Seminar.

Motivation • Compiler writers have a difficult task • Optimizations are NP-hard • Computer architectures are complex • Computer architects need rapid evaluation • Generating heuristics manually is slow, complicated, and ad hoc

Propose Supervised Learning • Induces heuristics automatically • Training examples • a,b,c,…,z label • a,b,c,…,z : properties of problem • label : proper decision to make • Two objectives: • Minimize error • Prefer less complicated function • LOCO (Learning for Optimizing COmpilers)

Benefits of Supervised Learning • Heuristic construction sped up • Determines relative importance of features • Effective heuristics • Comparable to hand-tuned heuristics • Theoretically sound • Traditional approach ad hoc

Taxonomy of Compiler Heuristics • What Order to Apply Optimizations: Phase-ordering heuristics • When to Optimize: Filters • Which Optimization Algorithm to Apply: Hybrid Optimizations • How to Optimize: Priority Functions

The LOCO Methodology • Determine class of heuristic • Generate raw data • Instrument compiler • Process raw data • Thresholds • Generates training data • Induce heuristic • Integrate into compiler

The LOCO Methodology (diagram) • Instrumented Compiler: generates raw learning data • Process raw data (Thresholding): produces the LOCO Training Set • Supervised Learning (rule induction): induces heuristic • Production Compiler

Experimental Setup • Java JIT compiler • Jikes RVM 2.0.2 • PowerPC 533 MHz G4, model 7410 • Case Study 1: SPEC JVM benchmarks • Case Study 2: Scientific benchmarks • Scheduling improves by 4% or more

Case Study 1 Hybrid Register Allocation

Motivation • Register Allocation: important • Effective use of registers • Different Algorithms to choose from • Graph coloring: possibly expensive • Linear scan: not always effective • Which algorithm to apply?

Solution • Features predict which algorithm to use • Heuristic function controls allocator • Reduces cost significantly • Retains most benefit • Successful with simple features • Applicable to other optimizations

Hybrid Register Allocation

Features of Methods

Hybrid Register Allocation

Inducing Heuristic Controller • For each block generate raw training data • Features of method • Additional spills incurred • Cost of allocation algorithms • Process raw data to generate training set • Leave-one-out cross-validation • Output of LOCO = heuristic controller
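The leave-one-out step can be sketched as follows. This is a minimal illustration, assuming the unit held out is a whole benchmark (the slide does not say). The benchmark names, the feature name `num_insts`, and the helper `leave_one_out` are hypothetical.

```python
from typing import Dict, List, Tuple

# One training instance: (feature values, label). Names are illustrative only.
Instance = Tuple[Dict[str, float], str]


def leave_one_out(
    data: Dict[str, List[Instance]],
) -> List[Tuple[str, List[Instance], List[Instance]]]:
    """Build one fold per benchmark: train on all others, test on the held-out one."""
    folds = []
    for held_out in data:
        train = [
            inst
            for name, insts in data.items()
            if name != held_out
            for inst in insts
        ]
        folds.append((held_out, train, list(data[held_out])))
    return folds


# Hypothetical benchmarks and instances, not data from the talk.
data = {
    "bench_a": [({"num_insts": 12.0}, "LS")],
    "bench_b": [({"num_insts": 340.0}, "GC"), ({"num_insts": 8.0}, "LS")],
    "bench_c": [({"num_insts": 95.0}, "GC")],
}
folds = leave_one_out(data)
```

Each fold keeps the held-out benchmark entirely out of the training set, so a heuristic is never evaluated on code it was induced from.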

Labeling Training Instances • Two factors: • Cost of register allocation • Spill benefit of different allocators • Prefer graph coloring • If benefit above threshold • Prefer linear scan • If graph coloring cost above threshold • No spill benefit

Motivation for Threshold Technique • Noise reduction technique • Simplifies learning • Removes cases of fine distinction • Separation by a threshold gap • For example: • T=10% model estimates improvement by 10%
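A threshold gap of this kind might look like the sketch below. It assumes that an instance is labeled positive when the estimated improvement reaches T, negative when there is no improvement, and dropped when it falls in between. The function name, label strings, and cost arguments are hypothetical and do not come from the talk.

```python
from typing import Optional


def label_by_threshold(
    base_cost: float, optimized_cost: float, t: float = 0.10
) -> Optional[str]:
    """Label an instance by relative improvement, leaving a gap between classes.

    Returns "apply" when the improvement is at least t, "skip" when there is
    no improvement, and None for the fine distinctions in between, which are
    left out of the training set.
    """
    if base_cost <= 0:
        raise ValueError("base_cost must be positive")
    improvement = (base_cost - optimized_cost) / base_cost
    if improvement >= t:
        return "apply"
    if improvement <= 0.0:
        return "skip"
    return None  # inside the gap: not used for training
```

With T = 10%, an estimated drop from 100 to 85 cycles is labeled "apply", no change is labeled "skip", and a drop from 100 to 95 produces no training instance.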

Thresholding (figure) • Regions: Linear Scan, Graph Coloring, No Instance • Spill Threshold (8192) • Cost Threshold (0.5)

Labeling Training Instances
If (LS_Spill - GC_Spill > Spill_Threshold) Print "GC";    // High Spill Benefit
Else If (LS_Cost/GC_Cost > Cost_Threshold) Print "LS";    // High Cost
Else If (LS_Spill - GC_Spill <= 0) Print "LS";            // No Spill Benefit
Else { /* No Label */ }                                    // Skip Training Instance
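The labeling rule above can be written as a runnable function. This sketch keeps the comparisons exactly as they appear on the slide and takes the default thresholds (8192 and 0.5) from the Thresholding slide. The function and parameter names are assumptions made for illustration.

```python
from typing import Optional

SPILL_THRESHOLD = 8192  # value shown on the Thresholding slide
COST_THRESHOLD = 0.5    # value shown on the Thresholding slide


def label_instance(
    ls_spill: float,
    gc_spill: float,
    ls_cost: float,
    gc_cost: float,
    spill_threshold: float = SPILL_THRESHOLD,
    cost_threshold: float = COST_THRESHOLD,
) -> Optional[str]:
    """Return "GC", "LS", or None (skip), following the slide's rule order.

    Assumes gc_cost > 0; the slide does not say how a zero cost is handled.
    """
    if ls_spill - gc_spill > spill_threshold:
        return "GC"  # high spill benefit
    if ls_cost / gc_cost > cost_threshold:
        return "LS"  # cost test, as written on the slide
    if ls_spill - gc_spill <= 0:
        return "LS"  # no spill benefit
    return None      # no label: skip this training instance
```

The order of the tests matters: a large spill benefit wins over the cost test, and an instance that passes none of the three tests is dropped rather than labeled.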

Threshold Example

Spill Loads (Opt Level 3, 8 Regs)

Benchmark Running Times (Opt Level 3, 8 Regs)

Register Allocation Stats (Opt Level 3, 8 Regs)

Register Allocation Cost (Opt Level 3, 8 Regs)

Hybrid Register Allocation is Successful • Significantly reduce register allocation time • Reduced allocation time by 60% • Preserve benefit of graph coloring • Achieved 93% of graph coloring benefit • LOCO effective for this heuristic

Case Study 2: Instruction Scheduling Filters

Motivation • Instruction scheduling: important • Improvements over 15% • But: • Expensive • Frequently not beneficial • Problem: Can we predict which blocks benefit from scheduling?

Solution • Features of block predict when to schedule • Heuristic controls scheduling • Reduces cost of scheduling • Retains benefit of scheduling • Successful with simple features • Filter for applying scheduler

An Optimization Filter

Features of Block

Inducing a Filter • Construct cheap-to-compute features of a block • Obtain training instances that include: • Features of the block • Labels (Scheduling benefit to block) • Induce a filter using LOCO • We used rule induction • Use the filter to control when compiler schedules
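To make the induce-then-filter flow concrete, here is a minimal sketch. It does not use the rule induction from the talk. As a much simpler stand-in it learns a single rule of the form "schedule if feature > cutoff" (a decision stump). The feature names `bb_len` and `loads`, the training data, and both function names are hypothetical.

```python
from typing import Dict, List, Tuple

# (block features, True if scheduling benefited the block). Illustrative only.
Instance = Tuple[Dict[str, float], bool]
Rule = Tuple[str, float]  # (feature name, cutoff)


def induce_stump(instances: List[Instance]) -> Rule:
    """Pick the (feature, cutoff) with the fewest training errors.

    Stand-in learner: one rule "schedule if feature > cutoff". Ties keep the
    first candidate found. Assumes a non-empty training set.
    """
    best = None  # (errors, feature, cutoff)
    for feat in sorted(instances[0][0]):
        for cutoff in sorted({feats[feat] for feats, _ in instances}):
            errors = sum(
                (feats[feat] > cutoff) != label for feats, label in instances
            )
            if best is None or errors < best[0]:
                best = (errors, feat, cutoff)
    return best[1], best[2]


def should_schedule(rule: Rule, features: Dict[str, float]) -> bool:
    """The filter: decide whether the compiler should schedule this block."""
    feat, cutoff = rule
    return features[feat] > cutoff


# Hypothetical training instances, not data from the talk.
training = [
    ({"bb_len": 3.0, "loads": 0.0}, False),
    ({"bb_len": 5.0, "loads": 1.0}, False),
    ({"bb_len": 14.0, "loads": 1.0}, True),
    ({"bb_len": 22.0, "loads": 6.0}, True),
]
rule = induce_stump(training)
```

In a compiler, `should_schedule` would be called once per block before the scheduler runs, so the features it reads have to be much cheaper to compute than scheduling itself.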

Block Timing Estimator • Estimate of cycles to execute block • Simple model of real machine • Determines cost of block in isolation • Relative cycle differences important • Not absolute cycle counts

Labeling using Thresholds

Running Time with Filtering

Scheduling Time with Filtering

Filtering Statistics

Filters are Successful • Significantly reduce scheduling time • Reduced scheduling time by 75% • Preserve benefit of scheduling • Achieved 93% of scheduling benefit • LOCO effective for this heuristic

Related Work • Supervised learning • Loop-unrolling and tiling • Genetic algorithms • Hyperblocks, reg allocation, prefetching (MIT) • Application-specific compilation strategy (Rice) • Reinforcement learning • Used to induce heuristic for scheduling (UMass) • We argue LOCO is better

Future Work • More work on filters • Inlining and SSA-based opts • More work on hybrid optimizations • Garbage collection • More work on priority functions • Register allocation spill heuristic • Use LOCO anywhere a heuristic is used

Conclusion • LOCO effective at constructing heuristics • Faster than most alternatives • LOCO can lead to insights • More readable than other alternatives • LOCO heuristics competitive • Comparable to hand-tuned heuristics • LOCO easier to use

Spill Loads (Opt Level 1, 8 Regs)

Register Allocation Cost (Opt Level 1, 8 Regs)

Benchmark Running Times (Opt Level 1, 8 Regs)
