Presentation Transcript

Improving Compiler Heuristics with Machine Learning

Mark Stephenson

Una-May O’Reilly

Martin C. Martin

Saman Amarasinghe

Massachusetts Institute of Technology

System Complexities
  • Compiler complexity
    • Open Research Compiler
      • ~3.5 million lines of C/C++ code
    • Trimaran’s compiler
      • ~ 800,000 lines of C code
  • Architecture Complexity
NP-Completeness
  • Many compiler problems are NP-complete
    • Thus, implementations can’t be optimal
  • Compiler writers rely on heuristics
    • In practice, heuristics perform well
    • …but, require a lot of tweaking
  • Heuristics often have a focal point
    • Rely on a single priority function
Priority Functions
  • A heuristic’s Achilles heel
    • A single priority or cost function often dictates the efficacy of a heuristic
  • Priority functions rank the options available to a compiler heuristic
    • List scheduling (choosing which instruction in the worklist to schedule first)
    • Graph coloring register allocation (selecting nodes to spill)
    • Hyperblock formation (selecting paths to include)
  • Any priority function is legal
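
To make the idea concrete, a list scheduler's priority function might rank ready instructions by their height on the critical path. A minimal Python sketch, with illustrative field names rather than any particular compiler's API:

    # Minimal sketch: a list-scheduling priority function (field names are illustrative).
    def priority(instr):
        # Prefer instructions on the longest dependence chain to the end of the block.
        return instr.critical_path_height

    def pick_next(ready_list):
        # The surrounding heuristic is unchanged; only the ranking it consults varies.
        return max(ready_list, key=priority)
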
Our Proposal
  • Use machine-learning techniques to automatically search the priority function space
    • Increases compiler’s performance
    • Reduces compiler design complexity
Qualities of Priority Functions
  • Can focus on a small portion of an optimization algorithm
    • Don’t need to worry about legality checking
  • Small change can yield big payoffs
  • Clear specification in terms of input/output
  • Prevalent in compiler heuristics
An Example Optimization: Hyperblock Scheduling
  • Conditional execution is potentially very expensive on a modern architecture
  • Modern processors try to dynamically predict the outcome of the condition
    • This works great for predictable branches…
    • But some conditions can’t be predicted
  • If the prediction is wrong, a lot of time is wasted
Example Optimization: Hyperblock Scheduling

[Figure: execution of if (a[1] == 0) … else … plotted as time vs. machine resources, assuming a[1] is 0; a branch misprediction wastes a large amount of time]


Example Optimization: Hyperblock Scheduling (using predication)

[Figure: the same example with both sides of the branch predicated and scheduled together across the machine's resources, assuming a[1] is 0]

Processor simply discards results of instructions that weren't supposed to be run

Example Optimization: Hyperblock Scheduling
  • There are unclear tradeoffs
    • In some situations, hyperblocks are faster than traditional execution
    • In others, hyperblocks impair performance
  • Many factors affect this decision
    • Accuracy of branch predictor
    • Availability of parallel execution resources
    • Effectiveness of the compiler’s scheduler
    • Parallelizability and predictability of the program
  • Hard to model
Example Optimization: Hyperblock Scheduling [Mahlke]
  • Find regions of control flow that can be predicated
  • Enumerate paths of control in region
    • Exponential, but in practice it’s okay
  • Prioritize paths based on four path characteristics
    • The priority function we want to optimize
  • Add paths to hyperblock in priority order
Trimaran's Priority Function
  • Favor frequently executed paths
  • Favor short paths
  • Penalize paths with hazards
  • Favor parallel paths
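
A hedged sketch of what a priority function over these four characteristics could look like; the feature names and weights below are illustrative assumptions, not Trimaran's actual formula:

    # Illustrative only: not Trimaran's real priority function.
    def path_priority(path):
        score = path.exec_ratio                       # favor frequently executed paths
        score *= 1.0 / (1.0 + path.num_ops)           # favor short paths
        score *= 0.25 if path.has_hazard else 1.0     # penalize paths with hazards
        score *= 1.0 + path.ops_per_cycle             # favor parallel (wide) paths
        return score
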
Our Approach
  • Trimaran uses four characteristics
  • What are the important characteristics of a hyperblock formation priority function?
  • Our approach: Extract all the characteristics you can think of and let a learning technique find the priority function
Genetic Programming

[Figure: a candidate priority function drawn as an expression tree over the operators *, /, -, the features predictability and num_ops, and the constants 2.3 and 4.1, roughly (* (/ predictability num_ops) (- 2.3 4.1))]

  • Searching algorithm analogous to Darwinian evolution
    • Maintain a population of expressions
    • Selection
      • The fittest expressions in the population are more likely to reproduce
    • Reproduction
      • Crossing over subexpressions of two expressions
    • Mutation
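
To make these operators concrete, here is a minimal Python sketch of one way such expression trees could be represented, mutated, and selected. The representation, operator set, and probabilities are illustrative assumptions, not the actual system:

    import random

    # An expression is a nested tuple, e.g. ('mul', ('div', 'predictability', 'num_ops'), 2.3).
    OPS = ('add', 'sub', 'mul', 'div')
    TERMINALS = ('predictability', 'num_ops', 'exec_ratio', 2.3, 4.1)

    def random_expr(depth=3):
        # Grow a random expression tree.
        if depth == 0 or random.random() < 0.3:
            return random.choice(TERMINALS)
        return (random.choice(OPS), random_expr(depth - 1), random_expr(depth - 1))

    def mutate(expr, depth=3):
        # Replace a randomly chosen subtree with a freshly grown one.
        if not isinstance(expr, tuple) or random.random() < 0.2:
            return random_expr(depth)
        op, left, right = expr
        if random.random() < 0.5:
            return (op, mutate(left, depth - 1), right)
        return (op, left, mutate(right, depth - 1))

    def select(population, fitnesses, k=2):
        # Tournament selection: fitter expressions are more likely to reproduce.
        contenders = random.sample(list(zip(population, fitnesses)), k)
        return max(contenders, key=lambda pair: pair[1])[0]
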
General Flow
  • Randomly generated initial population seeded with the compiler writer's best guess

[Flowchart: Create initial population (initial solutions) → Evaluation → done? → Selection → Create Variants → back to Evaluation]

General Flow
  • Compiler is modified to use the given expression as a priority function
  • Each expression is evaluated by compiling and running the benchmark(s)
  • Fitness is the relative speedup over Trimaran’s priority function on the benchmark(s)
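
In outline, evaluating one candidate might look like the sketch below; compile_with() and run() are hypothetical stand-ins for invoking the modified Trimaran toolchain and simulator, not real commands:

    # Hypothetical driver: compile_with() and run() stand in for the real toolchain.
    def fitness(expr, benchmarks, baseline_cycles):
        speedups = []
        for bench in benchmarks:
            binary = compile_with(bench, priority_expr=expr)   # compiler uses expr as its priority function
            cycles = run(binary)                               # simulated cycle count for this binary
            speedups.append(baseline_cycles[bench] / cycles)   # speedup relative to Trimaran's default
        return sum(speedups) / len(speedups)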


General Flow
  • Just as with Natural Selection, the fittest individuals are more likely to survive


General Flow
  • Use crossover and mutation to generate new expressions
  • And thus, generate new compilers


Experimental Setup
  • Collect results using Trimaran
    • Simulate a VLIW machine
      • 64 GPRs, 64 FPRs, 128 PRs
      • 4 fully pipelined integer FUs
      • 2 fully pipelined floating point FUs
      • 2 memory units (L1/L2/L3 latencies: 2, 7, 35 cycles)
  • Replace priority functions in IMPACT with our GP expression parser and evaluator
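
A GP expression evaluator of this kind can be very small. A toy Python sketch over the nested-tuple form used in the earlier sketch, not the actual IMPACT plumbing:

    # Toy evaluator for nested-tuple expressions; `features` maps names like 'num_ops' to values.
    def evaluate(expr, features):
        if isinstance(expr, (int, float)):
            return expr
        if isinstance(expr, str):
            return features[expr]
        op, a, b = expr
        x, y = evaluate(a, features), evaluate(b, features)
        if op == 'add': return x + y
        if op == 'sub': return x - y
        if op == 'mul': return x * y
        if op == 'div': return x / y if y != 0 else 0.0   # protected division, a common GP convention
        raise ValueError('unknown operator: %s' % op)
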
Outline of Results
  • High-water mark
    • Create compilers specific to a given application and a given data set
    • Essentially partially evaluating the application
  • Application-specific compilers
    • Compiler trained for a given application and data set, but run with an alternate data set
  • General-purpose compiler
    • Compiler trained on multiple applications and tested on an unrelated set of applications
Training the Priority Function

[Figure: training flow; benchmark sources A.c, B.c, C.c, D.c are fed through the compiler using a candidate priority function, and the resulting executables A, B, C, D are run to measure fitness]
Hyperblock Results: Application-Specific Compilers (High-Water Mark)

[Bar chart: speedup for each benchmark (toast, huff_dec, huff_enc, rawdaudio, mpeg2dec, rawcaudio, g721encode, g721decode, 129.compress) and the average, shown for the training input and a novel input; the two average speedups are 1.54 and 1.23]

Example evolved priority functions:
(add (sub (cmul (gt (cmul $b0 0.8982 $d17)…$d7)) (cmul $b0 0.6183 $d28)))
(add (div $d20 $d5) (tern $b2 $d0 $d9))

Hyperblock Results: General-Purpose Compiler

[Bar chart: speedup for each benchmark (toast, codrle4, huff_enc, huff_dec, decodrle4, mpeg2dec, rawdaudio, rawcaudio, 124.m88ksim, g721decode, g721encode, 129.compress) and the average, shown for the train and novel data sets; the two average speedups are 1.44 and 1.25]
Validation of Generality: Testing General-Purpose Applicability

[Bar chart: speedup on benchmarks unrelated to the training set (art, rasta, djpeg, 130.li, unepic, osdemo, mipmap, 085.cc1, 132.ijpeg, 052.alvinn, 147.vortex, 023.eqntott); the average speedup is 1.09]
Running Time
  • Application specific compilers
    • ~1 day using 15 processors
  • General-purpose compilers
    • Dynamic Subset Selection [Gathercole]
      • Run on a subset of the training benchmarks at a time
    • Memoize fitnesses
    • ~1 week using 15 processors
  • This is a one-time process!
    • Performed by the compiler vendor
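
A much-simplified sketch of how fitness memoization and subset selection might be wired together, reusing the hypothetical fitness() helper from the earlier sketch. Real dynamic subset selection biases the subset toward hard benchmarks; here it is just a random subset:

    import random

    _fitness_cache = {}

    def cached_fitness(expr, benchmarks, baseline_cycles):
        # Memoize: an identical expression on an identical subset is never re-evaluated.
        key = (repr(expr), tuple(sorted(benchmarks)))
        if key not in _fitness_cache:
            _fitness_cache[key] = fitness(expr, benchmarks, baseline_cycles)
        return _fitness_cache[key]

    def training_subset(all_benchmarks, k=4):
        # Simplified subset selection: evaluate each generation on only k benchmarks at a time.
        return random.sample(all_benchmarks, k)
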
GP Hyperblock Solutions: General Purpose

The best general-purpose expression found (shown over several slides, each highlighting a different term):

(add
  (sub (mul exec_ratio_mean 0.8720) 0.9400)
  (mul 0.4762
    (cmul (not has_pointer_deref)
      (mul 0.6727 num_paths)
      (mul 1.1609
        (add
          (sub (mul (div num_ops dependence_height) 10.8240)
               exec_ratio)
          (sub (mul (cmul has_unsafe_jsr predict_product_mean 0.9838)
                    (sub 1.1039 num_ops_max))
               (sub (mul dependence_height_mean num_branches_max)
                    num_paths)))))))

  • One subexpression is an intron that doesn't affect the solution
  • Favor paths that don't have pointer dereferences
  • Favor highly parallel (fat) paths
  • If a path calls a subroutine that may have side effects, penalize it

Our Proposal
  • Use machine-learning techniques to automatically search the priority function space
    • Increases compiler’s performance
    • Reduces compiler design complexity
Eliminate the Human from the Loop
  • So far we have tried to improve existing priority functions
    • Still, a lot of person-hours were spent creating the initial priority functions
  • Observation: the human-created priority functions are often eliminated in the 1st generation
  • What if we start from a completely random population (no human-generated seed)?
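
Dropping the human seed is a one-line change in the setup. A sketch building on the random_expr() helper from the earlier GP sketch (the population size here is an arbitrary illustrative value):

    # No hand-written seed: every individual in generation 0 is a random expression.
    def initial_population(size=400, max_depth=4):
        return [random_expr(max_depth) for _ in range(size)]
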
Another Example: Register Allocation
  • An old, established problem
    • Hundreds of papers on the subject
    • Priority-Based Register Allocation [Chow, Hennessy]
    • Uses a priority function to determine the worth of allocating a register
  • Let’s throw our GP system at the problem and see what it comes up with
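
For reference, the Chow/Hennessy priority is roughly the estimated benefit of keeping a live range in a register, normalized by the size of the live range. A hedged sketch of that shape; the field names and costs are illustrative, not the paper's exact cost model:

    LOAD_COST, STORE_COST = 2, 1   # illustrative memory-operation costs

    # Rough shape of a Chow/Hennessy-style priority function.
    def live_range_priority(live_range):
        # Estimated cycles saved by keeping the live range in a register...
        savings = sum(b.frequency * (b.uses * LOAD_COST + b.defs * STORE_COST)
                      for b in live_range.blocks)
        # ...normalized by how much of the code the live range spans.
        return savings / max(len(live_range.blocks), 1)
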
Register Allocation Results: General-Purpose Compiler

[Bar chart: speedup for each benchmark (huff_enc, huff_dec, rawdaudio, rawcaudio, mpeg2dec, g721encode, g721decode, 129.compress) and the average, shown for the train and novel data sets; the average speedup is 1.03 for both]

Validation of Generality: Testing General-Purpose Applicability

[Bar chart: speedup on unrelated benchmarks (130.li, djpeg, unepic, codrle4, 085.cc1, 132.ijpeg, decodrle4, 147.vortex, 023.eqntott, 124.m88ksim); the average speedup is 1.02]
Importance of Priority Functions: Speedup over a Constant Priority Function

[Bar chart: for each benchmark (huff_enc, huff_dec, rawcaudio, rawdaudio, mpeg2dec, g721encode, g721decode, 129.compress) and the average, speedup relative to a constant priority function, shown for Trimaran's hand-written function and for GP's evolved function; the two average speedups are 1.36 and 1.32]

Advantages our System Provides
  • Engineers can focus their energy on more important tasks
  • Can quickly retune for architectural changes
  • Can quickly retune when compiler changes
  • Can provide compilers catered toward specific application suites
    • e.g., a customer may want a compiler that excels on scientific benchmarks
Related Work
  • Calder et al. [TOPLAS-19]
    • Fine-tuned static branch prediction heuristics
    • Requires a priori classification by a supervisor
  • Monsifrot et al. [AIMSA-02]
    • Classify loops based on amenability to unrolling
    • Also used a priori classification
  • Cooper et al. [Journal of Supercomputing-02]
    • Use GAs to solve phase ordering problems
Conclusion
  • This was both a performance talk and a complexity talk
  • Take a huge compiler, optimize one priority function with GP and get exciting speedups
  • Take a well-known heuristic and create priority functions for it from scratch
  • There’s a lot left to do
Why Genetic Programming?
  • Many learning techniques rely on having pre-classified data (labels)
    • e.g., statistical learning, neural networks, decision trees
  • Priority functions require another approach
    • Reinforcement learning
    • Unsupervised learning
  • Several techniques that might work well
    • e.g., hill climbing, active learning, simulated annealing
Why Genetic Programming?
  • Benefits of GP
    • Capable of searching high-dimensional spaces
    • It is a distributed algorithm
    • The solutions are human readable
  • Nevertheless…there are other learning techniques that may also perform well
Genetic Programming: Reproduction

[Figure: subtree crossover; two parent expression trees built from the operators *, /, - and the terminals predictability, num_ops, branches, 1.1, 2.3, 4.1, 7.5 exchange subtrees to produce offspring]
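
A minimal sketch of the subtree crossover this figure depicts, again over the nested-tuple representation from the earlier sketches (illustrative only, not the actual implementation):

    import random

    def random_subtree(expr):
        # Walk a random path down the tree and return the node we stop at.
        while isinstance(expr, tuple) and random.random() < 0.5:
            expr = random.choice(expr[1:])
        return expr

    def crossover(parent_a, parent_b):
        # Child = parent_a with one randomly chosen subtree replaced by a subtree of parent_b.
        if not isinstance(parent_a, tuple) or random.random() < 0.3:
            return random_subtree(parent_b)
        op, left, right = parent_a
        if random.random() < 0.5:
            return (op, crossover(left, parent_b), right)
        return (op, left, crossover(right, parent_b))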