Feature Selection and Policy Optimization for Distributed Instruction Placement Using Reinforcement Learning


Feature Selection and Policy Optimization for Distributed Instruction Placement Using Reinforcement Learning

Katherine E. Coons,

Behnam Robatmili, Matthew E. Taylor, Bertrand A. Maher, Doug Burger, Kathryn S. McKinley



Motivation

  • Programmer time is expensive

  • Time-to-market is short

  • Compiler is a key component for performance

  • Performance depends on hard-to-tune heuristics

    • Function inlining, hyperblock formation, loop unrolling, instruction scheduling, register allocation

Machine learning can help



Machine Learning for Compilers

  • Learning to schedule (NIPS ’97, PLDI ’04)

  • Meta Optimization (PLDI ’03)

  • Automatically tuning inlining heuristics (Supercomputing ’05)

  • Predicting unroll factors (CGO ’05)

  • Machine learning for iterative optimization (CGO ’06)

Focus on feature selection: learn something about the problem


Overview

[Roadmap diagram: an initial feature set is narrowed by feature selection (correlation and lasso regression, i.e., data mining) to a reduced feature set; NEAT (reinforcement learning) then learns specialized and general solutions from it; grouping blocks by clustering and classification (unsupervised and supervised learning) yields classifier solutions.]


Compiling for TRIPS

[Figure: source code is compiled into a block control flow graph of hyperblocks (HB1-HB4); each hyperblock is a dataflow graph of instructions (add, mul) reading and writing registers (R1, R2), which the compiler maps onto the execution substrate. Legend: register tiles, data cache tiles, execution tiles, control tile.]

TRIPS Scheduling Overview

[Figure: static placement, dynamic issue. The scheduler takes a block's dataflow graph (register reads and writes, adds, muls, loads, a branch) and the topology of register, data cache, execution, and control tiles, and produces a placement of each instruction onto a tile.]

128! scheduling possibilities

Spatial Path Scheduling

Schedule(block, topology) {
    initialize known anchor points
    while (not all instructions scheduled) {
        for (each instruction i in the open list) {
            for (each available location n) {
                calculate placement cost for (i, n)
                keep track of n with min placement cost
            }
            keep track of i with highest min placement cost
        }
        schedule i with highest min placement cost
    }
}

[Figure: the step "calculate placement cost for (i, n)" is implemented by a function approximator: a neural network whose input nodes are feature values, whose hidden nodes combine them through weighted links (weights such as 0.7, 1.6, 0.2, -0.3), and whose single output node is the placement cost.]
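A minimal sketch of this greedy loop in Python (an illustration, not the TRIPS scheduler's actual code): placement_cost is whatever cost function is in use (e.g., the learned network), and block.anchor_points, block.instructions, and topology.available_locations() are hypothetical helpers standing in for the real scheduler's data structures.

def schedule(block, topology, placement_cost):
    """Greedy spatial path scheduling: repeatedly place the most constrained
    instruction, i.e. the one whose cheapest available location costs the most."""
    placement = dict(block.anchor_points)        # known anchors (e.g., register tiles)
    open_list = [i for i in block.instructions if i not in placement]
    while open_list:
        best_cost, best_instr, best_loc = float("-inf"), None, None
        for i in open_list:
            # Find the cheapest available location for this instruction.
            min_cost, min_loc = float("inf"), None
            for n in topology.available_locations():
                cost = placement_cost(i, n)
                if cost < min_cost:
                    min_cost, min_loc = cost, n
            # Keep the instruction whose best option is the most expensive.
            if min_cost > best_cost:
                best_cost, best_instr, best_loc = min_cost, i, min_loc
        placement[best_instr] = best_loc
        open_list.remove(best_instr)
    return placement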


Overview (roadmap diagram repeated)



Feature Selection

  • Choosing good features is important for reinforcement learning

  • Implemented 64 features

    • Loop features (nesting depth)

    • Block features (fullness)

    • Instruction features (latency)

    • Tile features (row)

    • Instruction/tile features (critical path length)

  • Reduced feature set size

    • Correlation

    • Lasso regression



Feature Selection via the Lasso

  • Goal: Rank features by effect on performance when used in placement cost

  • Feature coefficients as performance predictors

  • Dimensionality reduction

    • Subset of variables that exhibits strongest effects

    • The lasso forces weak features' coefficients to zero


Lasso Input Data Generation

[Figure: a dataflow graph is scheduled onto the topology using a linear placement-cost function; the example shows a placement cost of 1.7 and coefficients such as 0.7, 1.6, 0.2, -0.3 over features including critical path length, latency, link utilization, tile utilization, max resource usage, local inputs, and remote siblings.]

With n = number of features, i = instruction being placed, l = location under consideration, PC(i, l) = placement cost for i at l, and FVk = kth feature value, the placement cost is the weighted sum

    PC(i, l) = sum over k = 1..n of coeffk * FVk(i, l)

Scheduling with one such cost function and measuring the resulting speedup yields one data point (a coefficient vector paired with a speedup), as shown on the next slide.


Feature Prioritization

Prioritized features: tile number, local inputs, criticality, remote inputs, link utilization, remote siblings, loop-carried dependence, critical path length, is load

[Figure: each data point pairs one cost function's coefficient vector (coeff0 ... coeff9) with the speedup it produced (values such as 1.0, 0.7, 0.9, 0.8, 0.6, 1.1); fitting the lasso over these points produces the feature ranking.]
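A minimal sketch of this step, assuming scikit-learn (the talk does not name a tool) and hypothetical data files: each row of X is one cost function's coefficient vector, y holds the measured speedups, and the lasso's surviving coefficients rank the features by their effect on performance.

import numpy as np
from sklearn.linear_model import Lasso

# Illustrative subset of the features; the real run used all 64.
feature_names = ["tile number", "local inputs", "criticality", "remote inputs",
                 "link utilization", "remote siblings", "loop-carried dep.",
                 "critical path length", "is load"]

# Hypothetical files: each row of X is the coefficient vector of one linear
# placement-cost function; y is the speedup measured when scheduling with it.
X = np.load("cost_function_coeffs.npy")
y = np.load("speedups.npy")

lasso = Lasso(alpha=0.05)   # larger alpha forces more coefficients to exactly zero
lasso.fit(X, y)

# Features whose coefficients survive are the strongest performance predictors.
for name, w in sorted(zip(feature_names, lasso.coef_), key=lambda t: -abs(t[1])):
    print(f"{name:22s} {w:+.3f}")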


Feature Selection Overview

Initial features (64) → lasso regression → prioritized features (64) → prune correlated features → pruned features (52) → prune based on lasso priority → final feature set (11)

Overview (roadmap diagram repeated)


NEAT

[Figure: add-node and add-link mutations grow a network of input, hidden, and output nodes.]

  • NEAT (NeuroEvolution of Augmenting Topologies): a genetic algorithm that evolves neural networks

    • Modifies topology of network as well as weights

    • Standard crossover, mutation operators

    • “Complexification” operators



Why NEAT?

  • Popular, publicly available, well-supported

    • Nine different implementations

    • Active user group of about 350

  • Domain-independent

  • Large search spaces tractable

    • Complexification reduces training time

    • Inherently favors parsimony

  • Relatively little parameter tuning required

  • Solutions are reusable


Training NEAT

[Figure: the training loop. Schedule every benchmark using each network in the population; run the programs; assign each network a fitness (the geomean of its speedups); evolve the population via crossover plus add-link and add-node mutations; repeat.]
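A minimal sketch of this loop using the neat-python package (an assumption; the talk does not say which of the nine NEAT implementations was used). schedule_and_run is a hypothetical helper that schedules every benchmark with the given network as its placement-cost function, runs them, and returns the speedups; "neat_config" is a hypothetical config file holding population size, mutation rates, and so on.

import math
import neat  # the neat-python package (assumed implementation)

def geomean(xs):
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

def eval_genomes(genomes, config):
    for genome_id, genome in genomes:
        net = neat.nn.FeedForwardNetwork.create(genome, config)
        speedups = schedule_and_run(net)    # hypothetical: schedule, run, measure
        genome.fitness = geomean(speedups)  # fitness = geomean of speedup

config = neat.Config(neat.DefaultGenome, neat.DefaultReproduction,
                     neat.DefaultSpeciesSet, neat.DefaultStagnation,
                     "neat_config")
population = neat.Population(config)
best_genome = population.run(eval_genomes, 100)   # 100 generations per run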


Example Network

[Figure: an evolved network mapping eight input features (is load, is store, local inputs, criticality, tile utilization, remote siblings, critical path length, loop-carried dependence) through hidden nodes, with link weights such as 0.5, 0.8, -1.2, 1.7, and -4.1, to a single output: the placement cost.]
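A minimal sketch of how such a network would be queried during scheduling, assuming neat-python's FeedForwardNetwork and a hypothetical feature_vector helper that computes the eight feature values for an (instruction, location) pair; this plays the role of the placement_cost function in the scheduling sketch earlier.

def placement_cost(net, instruction, location):
    # Hypothetical helper: the eight feature values, in the order the
    # network's inputs expect (is load, is store, local inputs, ...).
    features = feature_vector(instruction, location)
    return net.activate(features)[0]   # single output node: the placement cost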


Overview (roadmap diagram repeated)



Grouping Blocks

  • Different blocks may require different placement features/heuristics

    • 12% speedup with specialized heuristics

    • Less than 1% speedup with general heuristics

  • Choose the heuristic based on block characteristics (see the sketch after this list)

    • Cluster blocks that perform well with the same networks

    • Classify new blocks based on their characteristics

    • Learn different solutions for different groups
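A minimal sketch of this grouping step, assuming scikit-learn (not named in the talk): per_network_speedups holds, for every block, the speedup each candidate network achieves on it; blocks that do well with the same networks land in the same cluster, and a classifier then predicts the cluster from block characteristics alone.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

# Hypothetical data files.
per_network_speedups = np.load("block_by_network_speedups.npy")  # blocks x networks
block_features = np.load("block_characteristics.npy")            # e.g. size, loop depth

# Cluster blocks that perform well with the same networks (three classes, as in the talk).
clusters = KMeans(n_clusters=3).fit_predict(per_network_speedups)

# Classify based on block characteristics, so unseen blocks can pick a heuristic.
classifier = DecisionTreeClassifier().fit(block_features, clusters)

# A specialized placement heuristic is then learned per cluster; at compile time
# the classifier routes each block to the heuristic for its predicted group.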


Overview (roadmap diagram repeated)



Experimental Setup

  • All tests performed on TRIPS prototype system

  • Fitness: Geomean of speedup in cycles

  • 64 features before feature selection, 11 after

  • Population size = 264 networks

  • 100 generations per NEAT run

  • Compared with simulated annealing scheduler

  • 47 small benchmarks

    • SPEC2000 kernels

    • EEMBC benchmarks

    • Signal processing kernels from GMTI radar suite

    • Vector add, fast Fourier transform



Feature Selection Results

[Chart: training across four benchmarks with the initial and lasso-selected feature sets.]


Simulated Annealing vs. NEAT

[Chart: speedup over the programmer-designed heuristic for 47 specialized solutions, reported as the geomean of speedup; annotations mark a 2% improvement and an 8% improvement.]



General Solutions and Classification

  • General solution across all 47 benchmarks

    • Geomean of speedup = 1.00 after 100 generations

    • Required approximately one month

  • Classification

    • Three classes, trained two

    • Geomean of speedup = 1.03 after 4 generations

    • Required approximately two days

  • New benchmarks see little speedup



Conclusions

  • Feature selection is important

    • Incorporate performance metrics

  • NEAT useful for optimizing compiler heuristics

    • Well supported, little parameter tuning

    • Very useful for specialized solutions

    • More work needed to find good general solutions

  • Open questions

    • What can learned heuristics teach us?

    • Can we simultaneously learn different heuristics?

    • How can we learn better general heuristics?



Questions?

