
Learning A Better Compiler


Presentation Transcript


  1. Learning A Better Compiler: Predicting Unroll Factors using Supervised Classification, and Integrating CPU and L2 Cache Voltage Scaling using Machine Learning

  2. Predicting Unroll Factors • Loop unrolling is sensitive to the unroll factor • Current solution: expert design • Difficult: hand-tuned heuristics • Must be rewritten frequently • Instead: predict parameters with machine learning • Easy: data collection takes ~1 week • No human time required • Algorithm does not change with the compiler

  3. Loop Unrolling • Combines the bodies of multiple iterations into one • Fewer iterations → less branching • Enables other transformations: • Exposes adjacent memory locations • Allows instruction reordering across iterations
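
As an illustration (a minimal Python sketch, not compiler output), unrolling the loop below by a factor of 4 trades branch tests for a larger body and exposes the adjacent memory accesses the slide mentions:

```python
b = list(range(8))
c = list(range(8))
a = [0] * 8
n = len(a)

# Original loop: one loop-bound test and branch per iteration.
for i in range(n):
    a[i] = b[i] + c[i]

# Unrolled by a factor of 4: one test per four element updates, and
# the four adjacent memory accesses become visible to later passes
# (e.g. for reordering). Assumes n is divisible by 4; a real compiler
# would also emit a cleanup loop for the remainder.
for i in range(0, n, 4):
    a[i]     = b[i]     + c[i]
    a[i + 1] = b[i + 1] + c[i + 1]
    a[i + 2] = b[i + 2] + c[i + 2]
    a[i + 3] = b[i + 3] + c[i + 3]
```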

  4. Unroll Factors • How many iterations to combine? • Too few? • Provides little benefit • Too many? • Increased cache pressure • Increased live ranges → register pressure

  5. Optimal Unroll Factors

  6. Classification Problems • Input: a vector of features • e.g. nest depth, # of branches, # of ops • Output: a class • e.g. unroll factor, 1-8 • No prior knowledge required about: • Meaning of features/classes • Relevance of features • Relationships between features
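
Concretely, each loop becomes one training example: a numeric feature vector paired with the unroll factor that performed best. A hypothetical encoding (feature names from the slides, values invented for illustration):

```python
# One hypothetical training example: a loop described by a few of the
# features above, labeled with its best-observed unroll factor (1-8).
x = [2,    # loop nest level
     1,    # number of branches in the body
     14]   # number of operations in the body
y = 4      # class label: the unroll factor that ran fastest
```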

  7. Nearest Neighbors • Paper describes a kernel density estimator • All dimensions normalized to [0,1] • Given a test point p: • Consider training points “close” to p • Within a fixed distance, e.g. 0.3 • Majority vote among the qualifying training points
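
A minimal sketch of that fixed-radius vote (the paper's kernel density estimator additionally weights neighbors; the training data here is invented):

```python
import math
from collections import Counter

def predict(p, train, radius=0.3):
    """Majority vote among training points within `radius` of p.
    Assumes every feature dimension is already normalized to [0, 1]."""
    votes = [label for x, label in train if math.dist(x, p) <= radius]
    if not votes:
        return None  # no neighbor close enough; caller picks a default
    return Counter(votes).most_common(1)[0][0]

# Invented training set: (normalized feature vector, unroll factor).
train = [([0.10, 0.20], 2), ([0.15, 0.25], 2), ([0.80, 0.90], 8)]
print(predict([0.12, 0.22], train))  # -> 2
```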

  8. Nearest Neighbors

  9. Support Vector Machine • Assume two classes (easily generalized) • Transform the data • Make the classes linearly separable • Find the line that maximizes the separation margin • For a test point: • Perform the same transformation • Classify based on the learned line
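
A sketch of the same idea using scikit-learn (not the paper's implementation); the RBF kernel plays the role of the non-linear transformation, and the data is invented:

```python
from sklearn.svm import SVC

# Invented training data: normalized feature vectors and unroll factors.
X = [[0.10, 0.20], [0.15, 0.25], [0.80, 0.90], [0.75, 0.85]]
y = [2, 2, 8, 8]

# The RBF kernel implicitly maps the data into a space where the
# classes are (closer to) linearly separable; SVC then finds the
# maximum-margin separating hyperplane. Multi-class labels (factors
# 1-8) are handled by combining pairwise binary classifiers.
clf = SVC(kernel="rbf", C=1.0).fit(X, y)
print(clf.predict([[0.12, 0.22]]))  # -> [2]
```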

  10. Maximal Margin

  11. Non-Linear SVM

  12. Some Features (38 total) • # operands • Live range size • Critical path length • # operations • Known trip count • # floating-point ops • Loop nest level • # branches • # memory ops • Instruction fan-in in DAG • # instructions • Language: C, Fortran • # implicit instructions • & more

  13. Results: No Software Parallelism

  14. Results: With Software Parallelism

  15. Big Idea: Easy Maintenance • Performance improvements are modest • Sometimes worse, sometimes much better • Usually little change • Requires no re-tuning when the compiler changes • Gathering data takes ~1 week, with no human time • General mechanism • Can be applied to all parameters • No model of the system needed • Can be applied to new transformations where expert knowledge is unavailable

  16. Integrated CPU and L2 Cache Voltage Scaling using Machine Learning

  17. Dynamic Voltage Control • Monitor the system • When activity is low, reduce power • Also reduces computational capacity • May cost more total energy if the work takes longer

  18. Multiple Clock Domains • Adjust separate components independently • Better performance/power trade-off • E.g. a CPU-bound application may be able to decrease power to memory and cache without affecting performance • Requires a more complex dynamic voltage management (DVM) policy

  19. Motivation • Applications go through phases • Frequency/voltages should change too • Focus on core, L2 cache • Consume large fraction of total power • Best policy may change over time • On battery: conserve power • Plugged in: maximize performance

  20. Learning a DVM Policy • Compiler automatically instruments code • Inserts sampling code to record performance counters • Instrumentation is used only to gather training data • Use machine learning to create the policy • Implement the policy in a microcontroller

  21. ML Parameters • Features: • Clock cycles per instruction (CPI) • L2 accesses per instruction (L2PI) • Memory accesses per instruction • Select the voltage that minimizes either: • Total energy • Energy × delay
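
A sketch of how the target could be chosen, with invented per-interval measurements; the setting that minimizes the chosen objective becomes the training label:

```python
# Invented measurements for one program interval: for each candidate
# (core setting, L2 setting) pair, the observed energy (J) and delay (s).
measurements = {
    ("core 1.2V/2.0GHz", "L2 1.0V/1.0GHz"): (1.8, 0.010),
    ("core 1.0V/1.0GHz", "L2 1.0V/1.0GHz"): (1.2, 0.014),
    ("core 0.9V/0.8GHz", "L2 0.8V/0.5GHz"): (1.0, 0.022),
}

def best_setting(measurements, metric="energy_delay"):
    """Return the setting minimizing total energy or energy*delay."""
    def cost(energy_delay):
        energy, delay = energy_delay
        return energy if metric == "energy" else energy * delay
    return min(measurements, key=lambda s: cost(measurements[s]))

print(best_setting(measurements))            # minimize energy*delay
print(best_setting(measurements, "energy"))  # minimize total energy
```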

  22. Machine Learning Algorithm • Automatically learns a set of if-then rules • E.g.: if (L2PI >= 1) and (CPI <= 0) then f_cache = 1 GHz • Compact and expressive • Can be implemented in hardware
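
Such a rule table maps directly onto a few nested comparisons, which is what makes a hardware or microcontroller implementation cheap. A sketch in the style of the slide's example rule, with all thresholds and return values invented:

```python
def l2_frequency(l2pi, cpi, mpi):
    """Illustrative learned rule set (thresholds invented). Returns
    the L2 cache frequency in GHz for the next interval, based on the
    per-instruction counter features from the previous slide."""
    if l2pi >= 1.0 and cpi <= 1.0:
        return 1.0   # cache-heavy, core not stalled: keep the L2 fast
    if mpi >= 0.5:
        return 0.5   # memory-bound: L2 speed matters less
    return 0.25      # compute-bound: scale the L2 down to save power

print(l2_frequency(l2pi=1.2, cpi=0.8, mpi=0.1))  # -> 1.0
```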

  23. Results • Compared to independently managing core and L2: • Saves 22% on average, 46% max • Learns effective rules from few features • Compiler modifications instrument code • Learned policy offline • Implemented policy in microcontroller

  24. Conclusion • Machine learning derives models from data automatically • Allows easy maintenance of heuristics • Creates models that are more effective than hand-tuned ones
