
Learning A Better Compiler


Presentation Transcript


  1. Learning A Better Compiler: Predicting Unroll Factors using Supervised Classification, and Integrating CPU and L2 Cache Voltage Scaling using Machine Learning

  2. Predicting Unroll Factors • Loop unrolling is sensitive to the unroll factor • Current solution: expert design • Difficult: hand-tuned heuristics • Must be rewritten frequently • Instead: predict parameters with machine learning • Easy: data collection takes ~1 week • No human time required • Algorithm does not change with the compiler

  3. Loop Unrolling • Combines the bodies of multiple iterations into one • Fewer iterations → less branching • Enables other transformations: • Exposes adjacent memory locations • Allows instruction reordering across iterations
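
As an illustration (a minimal Python sketch, not compiler output), unrolling the loop below by a factor of 4 trades branch tests for a larger body and exposes the adjacent memory accesses the slide mentions:

```python
b = list(range(8))
c = list(range(8))
a = [0] * 8
n = len(a)

# Original loop: one loop-bound test and branch per iteration.
for i in range(n):
    a[i] = b[i] + c[i]

# Unrolled by a factor of 4: one test per four element updates, and
# the four adjacent memory accesses become visible to later passes
# (e.g. for reordering). Assumes n is divisible by 4; a real compiler
# would also emit a cleanup loop for the remainder.
for i in range(0, n, 4):
    a[i]     = b[i]     + c[i]
    a[i + 1] = b[i + 1] + c[i + 1]
    a[i + 2] = b[i + 2] + c[i + 2]
    a[i + 3] = b[i + 3] + c[i + 3]
```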

  4. Unroll Factors • How many iterations to combine? • Too few? • Provides little benefit • Too many? • Increased cache pressure • Increased live ranges → register pressure

  5. Optimal Unroll Factors

  6. Classification Problems • Input: a vector of features • e.g. nest depth, # of branches, # of ops • Output: a class • e.g. unroll factor, 1-8 • No prior knowledge required about: • Meaning of features/classes • Relevance of features • Relationships between features
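
Concretely, each loop becomes one training example: a numeric feature vector paired with the unroll factor that performed best. A hypothetical encoding (feature names from the slides, values invented for illustration):

```python
# One hypothetical training example: a loop described by a few of the
# features above, labeled with its best-observed unroll factor (1-8).
x = [2,    # loop nest level
     1,    # number of branches in the body
     14]   # number of operations in the body
y = 4      # class label: the unroll factor that ran fastest
```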

  7. Nearest Neighbors • Paper describes a kernel density estimator • All dimensions normalized to [0,1] • Given a test point p: • Consider training points “close” to p • Within a fixed distance, e.g. 0.3 • Majority vote among the qualifying training points
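
A minimal sketch of that fixed-radius vote (the paper's kernel density estimator additionally weights neighbors; the training data here is invented):

```python
import math
from collections import Counter

def predict(p, train, radius=0.3):
    """Majority vote among training points within `radius` of p.
    Assumes every feature dimension is already normalized to [0, 1]."""
    votes = [label for x, label in train if math.dist(x, p) <= radius]
    if not votes:
        return None  # no neighbor close enough; caller picks a default
    return Counter(votes).most_common(1)[0][0]

# Invented training set: (normalized feature vector, unroll factor).
train = [([0.10, 0.20], 2), ([0.15, 0.25], 2), ([0.80, 0.90], 8)]
print(predict([0.12, 0.22], train))  # -> 2
```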

  8. Nearest Neighbors

  9. Support Vector Machine • Assume two classes (easily generalized) • Transform the data • Make the classes linearly separable • Find the line that maximizes the separation margin • For a test point: • Perform the same transformation • Classify based on the learned line
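
A sketch of the same idea using scikit-learn (not the paper's implementation); the RBF kernel plays the role of the non-linear transformation, and the data is invented:

```python
from sklearn.svm import SVC

# Invented training data: normalized feature vectors and unroll factors.
X = [[0.10, 0.20], [0.15, 0.25], [0.80, 0.90], [0.75, 0.85]]
y = [2, 2, 8, 8]

# The RBF kernel implicitly maps the data into a space where the
# classes are (closer to) linearly separable; SVC then finds the
# maximum-margin separating hyperplane. Multi-class labels (factors
# 1-8) are handled by combining pairwise binary classifiers.
clf = SVC(kernel="rbf", C=1.0).fit(X, y)
print(clf.predict([[0.12, 0.22]]))  # -> [2]
```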

  10. Maximal Margin

  11. Non-Linear SVM

  12. Some Features (38 total) • # operands • Live range size • Critical path length • # operations • Known trip count • # floating-point ops • Loop nest level • # branches • # memory ops • Instruction fan-in in DAG • # instructions • Language: C, Fortran • # implicit instructions • & more

  13. Results: No Software Parallelism

  14. Results: With Software Parallelism

  15. Big Idea: Easy Maintenance • Performance improvements are modest • Sometimes worse, sometimes much better • Usually little change • Requires no re-tuning when the compiler changes • Gathering data takes ~1 week, with no human time • General mechanism • Can be applied to all parameters • No model of the system needed • Can be applied to new transformations where expert knowledge is unavailable

  16. Integrated CPU and L2 Cache Voltage Scaling using Machine Learning

  17. Dynamic Voltage Control • Monitor the system • When activity is low, reduce power • Also reduces computational capacity • May cost more total energy if the work takes longer

  18. Multiple Clock Domains • Adjust separate components independently • Better performance/power trade-off • E.g. a CPU-bound application may be able to decrease power to memory and cache without affecting performance • Requires a more complex dynamic voltage management (DVM) policy

  19. Motivation • Applications go through phases • Frequency/voltages should change too • Focus on core, L2 cache • Consume large fraction of total power • Best policy may change over time • On battery: conserve power • Plugged in: maximize performance

  20. Learning a DVM Policy • Compiler automatically instruments code • Inserts sampling code to record performance counters • Instrumentation is used only to gather training data • Use machine learning to create the policy • Implement the policy in a microcontroller

  21. ML Parameters • Features: • Clock cycles per instruction (CPI) • L2 accesses per instruction (L2PI) • Memory accesses per instruction • Select the voltage that minimizes either: • Total energy • Energy × delay
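
A sketch of how the target could be chosen, with invented per-interval measurements; the setting that minimizes the chosen objective becomes the training label:

```python
# Invented measurements for one program interval: for each candidate
# (core setting, L2 setting) pair, the observed energy (J) and delay (s).
measurements = {
    ("core 1.2V/2.0GHz", "L2 1.0V/1.0GHz"): (1.8, 0.010),
    ("core 1.0V/1.0GHz", "L2 1.0V/1.0GHz"): (1.2, 0.014),
    ("core 0.9V/0.8GHz", "L2 0.8V/0.5GHz"): (1.0, 0.022),
}

def best_setting(measurements, metric="energy_delay"):
    """Return the setting minimizing total energy or energy*delay."""
    def cost(energy_delay):
        energy, delay = energy_delay
        return energy if metric == "energy" else energy * delay
    return min(measurements, key=lambda s: cost(measurements[s]))

print(best_setting(measurements))            # minimize energy*delay
print(best_setting(measurements, "energy"))  # minimize total energy
```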

  22. Machine Learning Algorithm • Automatically learns a set of if-then rules • E.g.: if (L2PI >= 1) and (CPI <= 0) then f_cache = 1 GHz • Compact and expressive • Can be implemented in hardware
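
Such a rule table maps directly onto a few nested comparisons, which is what makes a hardware or microcontroller implementation cheap. A sketch in the style of the slide's example rule, with all thresholds and return values invented:

```python
def l2_frequency(l2pi, cpi, mpi):
    """Illustrative learned rule set (thresholds invented). Returns
    the L2 cache frequency in GHz for the next interval, based on the
    per-instruction counter features from the previous slide."""
    if l2pi >= 1.0 and cpi <= 1.0:
        return 1.0   # cache-heavy, core not stalled: keep the L2 fast
    if mpi >= 0.5:
        return 0.5   # memory-bound: L2 speed matters less
    return 0.25      # compute-bound: scale the L2 down to save power

print(l2_frequency(l2pi=1.2, cpi=0.8, mpi=0.1))  # -> 1.0
```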

  23. Results • Compared to independently managing core and L2: • Saves 22% on average, 46% max • Learns effective rules from few features • Compiler modifications instrument code • Learned policy offline • Implemented policy in microcontroller

  24. Conclusion • Machine learning derives models from data automatically • Allows easy maintenance of heuristics • Creates models that are more effective than hand-tuned ones
