

Feature Selection and Policy Optimization for Distributed Instruction Placement Using Reinforcement Learning. Katherine E. Coons, Behnam Robatmili, Matthew E. Taylor, Bertrand A. Maher, Doug Burger, Kathryn S. McKinley.



Presentation Transcript


  1. Feature Selection and Policy Optimization for Distributed Instruction Placement Using Reinforcement Learning Katherine E. Coons, Behnam Robatmili, Matthew E. Taylor, Bertrand A. Maher, Doug Burger, Kathryn S. McKinley

  2. Motivation
  • Programmer time is expensive
  • Time-to-market is short
  • The compiler is a key component for performance
  • Performance depends on hard-to-tune heuristics: function inlining, hyperblock formation, loop unrolling, instruction scheduling, register allocation
  Machine learning can help.

  3. Machine Learning for Compilers
  • Learning to schedule (NIPS '97, PLDI '04)
  • Meta Optimization (PLDI '03)
  • Automatically tuning inlining heuristics (Supercomputing '05)
  • Predicting unroll factors (CGO '05)
  • Machine learning for iterative optimization (CGO '06)
  Focus on feature selection; learn something about the problem.

  4. Overview
  [Overview diagram: an initial feature set is reduced via correlation and lasso regression (feature selection); reinforcement learning with NEAT produces specialized and general solutions; clustering groups blocks and classification yields classifier solutions (data mining: unsupervised and supervised learning).]

  5. Compiling for TRIPS
  [Diagram: source code is compiled into a control flow graph of hyperblocks (HB1–HB4); each hyperblock becomes a dataflow graph of instructions (add, mul) reading and writing registers (R1, R2); blocks are mapped onto the TRIPS execution substrate of register, data cache, execution, and control tiles.]

  6. TRIPS Scheduling Overview
  • Static placement, dynamic issue
  [Diagram: the scheduler maps each dataflow graph instruction (add, mul, ld, br) onto the topology of register (R1, R2), data cache (D0, D1), execution, and control tiles.]
  128! scheduling possibilities.

  7. Spatial Path Scheduling

  Schedule(block, topology) {
    initialize known anchor points
    while (not all instructions scheduled) {
      for (each instruction i in open list) {
        for (each available location n) {
          calculate placement cost for (i, n)
          keep track of n with min placement cost
        }
        keep track of i with highest min placement cost
      }
      schedule i with highest min placement cost
    }
  }

  [Diagram: "calculate placement cost for (i, n)" is implemented by a neural-network function approximator that maps features to a placement cost; legend shows input, hidden, and output nodes with example edge weights (0.7, 1.6, 0.2, -0.3, 1.1, 0.4, ...).]
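The greedy loop on slide 7 can be sketched in Python. This is a minimal illustration, not the paper's implementation: `placement_cost` stands in for the learned cost function, and the instructions, locations, and cost used in the example are made up.

```python
def schedule(instructions, locations, placement_cost):
    """Greedy spatial path scheduling sketch: each round, find every
    unscheduled instruction's cheapest location, then commit the
    instruction whose cheapest option is most expensive (highest min
    placement cost), i.e. the most constrained instruction first."""
    placement = {}
    open_list = list(instructions)
    while open_list:
        best = None  # (min_cost, instruction, location); ties break arbitrarily
        for i in open_list:
            # cheapest available location for this instruction
            cost, loc = min((placement_cost(i, n), n) for n in locations)
            if best is None or cost > best[0]:
                best = (cost, i, loc)
        _, i, loc = best
        placement[i] = loc
        open_list.remove(i)
    return placement

# Toy example: cost is the distance between instruction and location index.
result = schedule([0, 1, 2], [0, 1, 2], lambda i, n: abs(i - n))
```

With this toy cost each instruction lands on its matching location; the interesting behaviour appears when instructions compete for the same cheap tiles.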

  8. Overview (repeated roadmap slide; see slide 4)

  9. Feature Selection
  • Features are important for reinforcement learning
  • Implemented 64 features:
  • Loop features (nesting depth)
  • Block features (fullness)
  • Instruction features (latency)
  • Tile features (row)
  • Instruction/tile features (critical path length)
  • Reduced feature set size via correlation and lasso regression

  10. Feature Selection via the Lasso
  • Goal: rank features by their effect on performance when used in the placement cost
  • Feature coefficients serve as performance predictors
  • Dimensionality reduction: selects the subset of variables that exhibits the strongest effects
  • The lasso forces weak coefficients to zero
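The property slide 10 relies on, that the lasso drives weak coefficients exactly to zero, can be demonstrated with a small pure-Python sketch. This is not the paper's code: it assumes the standard lasso objective (1/2n)·||y − Xw||² + α·||w||₁ solved by cyclic coordinate descent, and the tiny dataset is synthetic.

```python
def lasso_coordinate_descent(X, y, alpha, iters=200):
    """Minimise (1/2n)*||y - Xw||^2 + alpha*||w||_1 by cyclic
    coordinate descent with soft-thresholding."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(iters):
        for j in range(d):
            rho, z = 0.0, 0.0
            for i in range(n):
                # prediction using every feature except j
                pred_others = sum(X[i][k] * w[k] for k in range(d) if k != j)
                rho += X[i][j] * (y[i] - pred_others)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            # soft threshold: weak features are forced exactly to zero
            if rho > alpha:
                w[j] = (rho - alpha) / z
            elif rho < -alpha:
                w[j] = (rho + alpha) / z
            else:
                w[j] = 0.0
    return w

# Feature 0 predicts y (y = 2*x0); feature 1 is irrelevant noise.
X = [[1, 0.1], [2, -0.1], [3, 0.1], [4, -0.1]]
y = [2, 4, 6, 8]
weights = lasso_coordinate_descent(X, y, alpha=0.1)
```

The irrelevant feature's coefficient comes out exactly 0.0, which is what makes the coefficients usable as a feature-selection signal.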

  11. Lasso Input Data Generation
  [Diagram: features (critical path length, latency, link utilization, tile utilization, max resource usage, local inputs, remote siblings, ...) are calculated for a dataflow graph placed on the topology, multiplied by coefficients (0.7, 1.6, 0.2, -0.3, ...), and summed into a placement cost, e.g. 1.7.]
  Placement cost: PC(i, l) = Σ (k = 1..n) coeff_k × FV_k
  where n = number of features, i = instruction being placed, l = location under consideration, PC(i, l) = placement cost for i at l, and FV_k = kth feature value.
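The linear placement cost on slide 11 is just a weighted sum of feature values; a sketch, with made-up coefficients and feature values rather than the paper's:

```python
def placement_cost(coeffs, feature_values):
    """PC(i, l) = sum_k coeff_k * FV_k, where the feature values are
    computed for instruction i at candidate location l."""
    return sum(c * fv for c, fv in zip(coeffs, feature_values))

# Illustrative numbers only (not taken from the slides' diagrams):
pc = placement_cost([0.7, 1.6, 0.2, -0.3], [1.0, 0.5, 2.0, 1.0])
```

During lasso input generation, many (coefficient vector, measured speedup) pairs of this form become the regression's training data.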

  12. Feature Prioritization
  [Diagram: each data point pairs a coefficient vector with a measured speedup (e.g., speedups of 1.0, 0.7, 0.9, 0.8, 0.6, 1.1, ...); from these, features are prioritized: tile number, local inputs, criticality, remote inputs, link utilization, remote siblings, loop-carried dependence, critical path length, is-load, ...]

  13. Feature Selection Overview
  Initial features (64) → lasso regression → prioritized features (64) → prune correlated features → pruned features (52) → prune based on lasso priority → final feature set (11)
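The "prune correlated features" step of slide 13 can be sketched as follows. This is one plausible scheme, not the paper's exact procedure: it keeps a feature only if it is not highly correlated (Pearson) with an already-kept, higher-priority feature, and the 0.95 threshold is an assumed value.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def prune_correlated(columns, threshold=0.95):
    """Keep a feature column only if it is not highly correlated with any
    already-kept column (earlier columns win, mirroring a priority order)."""
    kept = []
    for j, col in enumerate(columns):
        if all(abs(pearson(col, columns[k])) <= threshold for k in kept):
            kept.append(j)
    return kept

# Toy: feature 1 is an exact multiple of feature 0; feature 2 is independent.
cols = [[1, 2, 3, 4], [2, 4, 6, 8], [1, -1, 1, -1]]
kept = prune_correlated(cols)
```

Here the duplicate feature is dropped and the independent one survives, which is the shape of the 64 → 52 reduction on the slide.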

  14. Overview (repeated roadmap slide; see slide 4)

  15. NEAT
  • Genetic algorithm that evolves neural networks
  • Modifies the topology of the network as well as its weights
  • Standard crossover and mutation operators
  • "Complexification" operators: add-node and add-link mutations
  [Diagram: legend of input, hidden, and output nodes; examples of an add-node mutation and an add-link mutation.]
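The add-node complexification operator from slide 15 can be sketched in Python. The genome encoding (a list of connection dicts) is an illustrative assumption, not NEAT's actual data structures; the key idea, splitting a connection so behaviour is initially preserved, is standard NEAT.

```python
def add_node_mutation(connections, split_index, new_node_id):
    """NEAT-style 'add node': disable one connection and replace it with
    two connections through a new hidden node. The incoming link gets
    weight 1.0 and the outgoing link inherits the old weight, so the
    network's behaviour is initially (nearly) unchanged."""
    conns = [dict(c) for c in connections]  # don't mutate the parent genome
    old = conns[split_index]
    old['enabled'] = False
    conns.append({'in': old['in'], 'out': new_node_id,
                  'weight': 1.0, 'enabled': True})
    conns.append({'in': new_node_id, 'out': old['out'],
                  'weight': old['weight'], 'enabled': True})
    return conns

# Toy genome: a single connection from input 0 to output 1.
genome = [{'in': 0, 'out': 1, 'weight': 0.5, 'enabled': True}]
mutated = add_node_mutation(genome, 0, new_node_id=2)
```

Starting from minimal networks and growing them this way is why complexification favors parsimony and reduces training time.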

  16. Why NEAT?
  • Popular, publicly available, well supported
  • Nine different implementations
  • Active user group of about 350
  • Domain-independent
  • Large search spaces tractable
  • Complexification reduces training time
  • Inherently favors parsimony
  • Relatively little parameter tuning required
  • Solutions are reusable

  17. Training NEAT
  [Diagram: schedule using each network → run the program → assign fitnesses (geomean of speedup) → evolve the population via crossover and add-link / add-node mutations; legend of input, hidden, and output nodes.]
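The evaluation step of the training loop on slide 17 can be sketched as follows. The `run` callback is a stand-in for the full compile-schedule-execute pipeline, and the constant-speedup "networks" in the example are purely illustrative.

```python
import math

def geomean(xs):
    """Geometric mean: the fitness metric (geomean of speedups)."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

def evaluate_population(networks, benchmarks, run):
    """One generation's evaluation: schedule every benchmark with each
    network, run it, and score the network by geomean speedup.
    run(net, bench) must return that benchmark's speedup."""
    return [geomean([run(net, b) for b in benchmarks]) for net in networks]

# Toy stand-in: each 'network' yields the same speedup on every benchmark.
fitnesses = evaluate_population(
    networks=[1.0, 1.1],
    benchmarks=['a', 'b'],
    run=lambda net, bench: net)
```

The geometric mean (rather than the arithmetic mean) keeps one benchmark's large speedup from masking slowdowns elsewhere, which matters when a single network must score well across many programs.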

  18. Example Network
  [Diagram: a learned network computing the placement cost from the features is-load, is-store, local inputs, criticality, tile utilization, remote siblings, critical path length, and loop-carried dependence, with example edge weights (0.1, 0.5, 0.8, 0.7, -1.2, -1.1, 1.7, -4.1, 0.9, ...); legend of input, hidden, and output nodes.]

  19. Overview (repeated roadmap slide; see slide 4)

  20. Grouping Blocks
  • Different blocks may require different placement features/heuristics
  • 12% speedup with specialized heuristics
  • Less than 1% speedup with general heuristics
  • Choose the heuristic based on block characteristics:
  • Cluster blocks that perform well with the same networks
  • Classify based on block characteristics
  • Learn different solutions for different groups
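The classify-then-dispatch idea from slide 20 can be sketched with a nearest-centroid classifier. This is one plausible scheme under stated assumptions, not the paper's method: the feature vectors and centroids are invented, and the real system may classify quite differently.

```python
def nearest_centroid(block_features, centroids):
    """Assign a block to the cluster whose centroid is closest in
    Euclidean distance; each cluster then uses its own specialized
    scheduling heuristic."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(centroids)),
               key=lambda c: dist2(block_features, centroids[c]))

# Toy centroids for three block classes (values illustrative):
centroids = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
cls = nearest_centroid([0.9, 0.1], centroids)
```

At compile time, `cls` would index into a table of specialized networks, so each block is scheduled by the heuristic trained for blocks like it.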

  21. Overview (repeated roadmap slide; see slide 4)

  22. Experimental Setup
  • All tests performed on the TRIPS prototype system
  • Fitness: geomean of speedup in cycles
  • 64 features before feature selection, 11 after
  • Population size = 264 networks
  • 100 generations per NEAT run
  • Compared with a simulated annealing scheduler
  • 47 small benchmarks: SPEC2000 kernels, EEMBC benchmarks, signal processing kernels from the GMTI radar suite, vector add, fast Fourier transform

  23. Feature Selection Results
  [Chart: training across four benchmarks with the initial and lasso-selected feature sets.]

  24. Simulated Annealing vs. NEAT
  [Chart: geomean of speedup over the programmer-designed heuristic for 47 specialized solutions; the two approaches show 2% and 8% improvements.]

  25. General Solutions and Classification
  • General solution across all 47 benchmarks
  • Geomean of speedup = 1.00 after 100 generations
  • Required approximately one month
  • Classification: three classes, trained two
  • Geomean of speedup = 1.03 after 4 generations
  • Required approximately two days
  • New benchmarks see little speedup

  26. Conclusions
  • Feature selection is important; incorporate performance metrics
  • NEAT is useful for optimizing compiler heuristics
  • Well supported, little parameter tuning
  • Very useful for specialized solutions
  • More work needed to find good general solutions
  • Open questions:
  • What can learned heuristics teach us?
  • Can we simultaneously learn different heuristics?
  • How can we learn better general heuristics?

  27. Questions?
