
High-level Synthesis: Scheduling, Allocation, Assignment

Note: Several slides in this lecture are from Prof. Miodrag Potkonjak, UCLA CS


Overview

  • High Level Synthesis

  • Scheduling, Allocation and Assignment

  • Estimations

  • Transformations



Allocation, Assignment, and Scheduling

Techniques Well Understood and Mature


Scheduling and Assignment

[Figure: operations scheduled into control steps and assigned to functional units]



ASAP Scheduling Algorithm
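The algorithm itself appears as a figure on the slide; as a stand-in, here is a minimal Python sketch of ASAP scheduling, assuming unit-delay operations and a graph given as a map from each node to its predecessors (a hypothetical interface, not the slide's pseudocode):

```python
def asap_schedule(preds):
    """ASAP: place each operation in the earliest control step allowed by
    its predecessors (unit delay assumed for every operation)."""
    start = {}
    unscheduled = set(preds)
    while unscheduled:
        for v in sorted(unscheduled):
            if all(p in start for p in preds[v]):
                # earliest step = one past the latest-finishing predecessor
                start[v] = 1 + max((start[p] for p in preds[v]), default=0)
                unscheduled.remove(v)
                break
        else:
            raise ValueError("dependence graph has a cycle")
    return start

print(asap_schedule({'a': [], 'b': [], 'c': ['a', 'b'], 'd': ['c']}))
# -> {'a': 1, 'b': 1, 'c': 2, 'd': 3}
```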



ASAP Scheduling Example


ASAP: Another Example

[Figure: sequence graph and its ASAP schedule]



ALAP Scheduling Algorithm
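Again the slide gives the algorithm as a figure; a matching sketch under the same assumptions as the ASAP code above, with the latency constraint passed in. The difference between an operation's ALAP and ASAP start times is its mobility:

```python
def alap_schedule(preds, latency):
    """ALAP: place each operation in the latest control step that still
    lets all successors finish by the latency bound (unit delays)."""
    succs = {v: [] for v in preds}
    for v, ps in preds.items():
        for p in ps:
            succs[p].append(v)          # invert the predecessor map
    start = {}
    unscheduled = set(preds)
    while unscheduled:
        for v in sorted(unscheduled):
            if all(s in start for s in succs[v]):
                # latest step = one before the earliest-starting successor
                start[v] = min((start[s] - 1 for s in succs[v]), default=latency)
                unscheduled.remove(v)
                break
        else:
            raise ValueError("dependence graph has a cycle")
    return start
```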



ALAP Scheduling Example


ALAP: Another Example

[Figure: sequence graph and its ALAP schedule (latency constraint = 4)]



Observation about ALAP & ASAP

  • No priority is given to nodes on critical path

  • As a result, less critical nodes may be scheduled ahead of critical nodes

  • No problem if unlimited hardware

  • However, if resources are limited, the less critical nodes may block the critical nodes and thus produce inferior schedules

  • List scheduling techniques overcome this problem by utilizing a more global node selection criterion



List Scheduling and Assignment



List Scheduling Algorithm using Decreasing Criticalness Criterion
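The slide presents the algorithm as a figure; the sketch below captures the idea under stated assumptions: unit delays, a single resource type with n_units instances, and criticalness taken as the longest-path distance to a sink (the slide's exact priority definition may differ):

```python
def list_schedule(preds, n_units):
    """List scheduling: at each step, start the most critical ready ops
    on the available units (unit delays, one resource type assumed)."""
    succs = {v: [] for v in preds}
    for v, ps in preds.items():
        for p in ps:
            succs[p].append(v)

    crit = {}
    def criticalness(v):                 # longest path from v to a sink
        if v not in crit:
            crit[v] = 1 + max((criticalness(s) for s in succs[v]), default=0)
        return crit[v]

    start, step = {}, 0
    while len(start) < len(preds):
        step += 1
        ready = [v for v in preds if v not in start and
                 all(p in start and start[p] < step for p in preds[v])]
        for v in sorted(ready, key=criticalness, reverse=True)[:n_units]:
            start[v] = step              # most critical first, up to n_units
    return start
```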



Scheduling

  • NP-complete Problem

  • Optimal

  • Heuristics - Iterative Improvements

  • Heuristics – Constructive

  • Various versions of the problem

    • Unconstrained minimum latency

    • Resource-constrained minimum latency

    • Timing constrained

  • If all resources identical, reduced to multiprocessor scheduling

    • Minimum latency multiprocessor problem is intractable


    Scheduling - Optimal Techniques

    • Integer Linear Programming

    • Branch and Bound



    Integer Linear Programming

    • Given: an integer-valued matrix A (m × n) and

      vectors B = ( b1, b2, … , bm ), C = ( c1, c2, … , cn )

    • Minimize: C^T X

    • Subject to:

      A X ≥ B

      X = ( x1, x2, … , xn ) is an integer-valued vector



    Integer Linear Programming

    • Problem: For a set of (dependent) computations {t1,t2,...,tn}, find the minimum number of units needed to complete the execution by k control steps.

    • Integer linear programming:

      Let y0 be an integer variable.

      For each control step i ( 1 ≤ i ≤ k ):

      define variable xij as

      xij = 1, if computation tj is executed in the ith control step,

      xij = 0, otherwise.

      define variable yi = xi1 + xi2 + ... + xin .



    Integer Linear Programming

    • Integer linear programming:

      For each computation dependency "ti has to be done before tj", introduce a constraint:

      k·x1i + (k-1)·x2i + ... + xki ≥ k·x1j + (k-1)·x2j + ... + xkj + 1   (*)

      Minimize: y0

      Subject to: x1i + x2i + ... + xki = 1 for all 1 ≤ i ≤ n

      yi ≤ y0 for all 1 ≤ i ≤ k

      all computation dependency constraints of type (*)
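    Collected in one place, the formulation reads as follows (a reconstruction in standard notation from the definitions above, with x_lj = 1 iff computation tj executes in control step l):

```latex
\begin{align*}
\min\quad & y_0\\
\text{s.t.}\quad
  & \sum_{l=1}^{k} x_{lj} = 1, && 1 \le j \le n,\\
  & y_l = \sum_{j=1}^{n} x_{lj} \le y_0, && 1 \le l \le k,\\
  & \sum_{l=1}^{k} (k-l+1)\,x_{li} \;\ge\; \sum_{l=1}^{k} (k-l+1)\,x_{lj} + 1,
    && \text{for each dependency } t_i \prec t_j.
\end{align*}
```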


    An Example

    [Figure: dependence graph over computations c1–c6]

    6 computations

    3 control steps



    An Example

    • Introduce variables:

      • xij for 1 ≤ i ≤ 3, 1 ≤ j ≤ 6

      • yi = xi1 + xi2 + xi3 + xi4 + xi5 + xi6 for 1 ≤ i ≤ 3

      • y0

    • Dependency constraints: e.g. execute c1 before c4

      3x11 + 2x21 + x31 ≥ 3x14 + 2x24 + x34 + 1

    • Execution constraints:

      x1i + x2i + x3i = 1 for 1 ≤ i ≤ 6



    An Example

    • Minimize: y0

    • Subject to: yi ≤ y0 for all 1 ≤ i ≤ 3

      dependency constraints

      execution constraints

    • One solution: y0 = 2

      x11 = 1, x12 = 1,

      x23 = 1, x24 = 1,

      x35 = 1, x36 = 1.

      All other xij = 0
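    This instance is small enough to hand to an off-the-shelf ILP solver. A sketch using PuLP (assumed installed via pip install pulp); only the c1-before-c4 edge named above is encoded, so treat it as a template for the full dependence graph of the figure:

```python
import pulp

k, n = 3, 6                                   # control steps, computations
prob = pulp.LpProblem("min_units", pulp.LpMinimize)
x = {(i, j): pulp.LpVariable(f"x_{i}_{j}", cat="Binary")
     for i in range(1, k + 1) for j in range(1, n + 1)}
y0 = pulp.LpVariable("y0", lowBound=0, cat="Integer")

prob += y0                                    # objective: number of units
for j in range(1, n + 1):                     # each computation runs exactly once
    prob += pulp.lpSum(x[i, j] for i in range(1, k + 1)) == 1
for i in range(1, k + 1):                     # units used in step i <= y0
    prob += pulp.lpSum(x[i, j] for j in range(1, n + 1)) <= y0
# dependency: c1 before c4 (add the figure's remaining edges the same way)
prob += (pulp.lpSum((k - i + 1) * x[i, 1] for i in range(1, k + 1))
         >= pulp.lpSum((k - i + 1) * x[i, 4] for i in range(1, k + 1)) + 1)

prob.solve()
print(pulp.value(y0))                         # 2.0, matching the slide
```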



    ILP Model of Scheduling

    • Binary decision variables xil

      • i = 0, 1, …, n

      • l = 1, 2, …, λ + 1 (λ = latency bound)

    • Start time is unique



    ILP Model of Scheduling (contd.)

    • Sequencing relationships must be satisfied

    • Resource bounds must be met

      • let upper bound on # of resources of type k be ak



    Minimum-Latency Scheduling Under Resource Constraints

    • Let t be the vector whose entries are start times

    • Formal ILP model



    Example

    • Two types of resources

      • Multiplier

      • ALU

        • Addition

        • Subtraction

        • Comparison

    • Both take 1 cycle execution time



    Example (contd.)

    • Heuristic (list scheduling) gives latency = 4 steps

    • Use ALAP and ASAP (with no resource constraints) to get bounds on start times

      • ASAP matches latency of heuristic

        • so heuristic is optimum, but let us ignore it!

    • Constraints?



    Example (contd.)

    • Start time is unique



    Example (contd.)

    • Sequencing constraints

      • note: only non-trivial ones listed

        • those with more than one possible start time for at least one operation



    Example (contd.)

    • Resource constraints



    Example (contd.)

    • Consider c = [0, 0, …, 1]^T

      • Minimum latency schedule

      • since the sink has no mobility (xn,5 = 1), any feasible schedule is optimum

    • Consider c = [1, 1, …, 1]^T

      • finds earliest start times for all operations

      • equivalently, minimizes the sum of all start times



    Example Solution: Optimum Schedule Under Resource Constraint



    Example (contd.)

    • Assume multiplier costs 5 units of area, and ALU costs 1 unit of area

    • Same uniqueness and sequencing constraints as before

    • Resource constraints are in terms of unknown variables a1 and a2

      • a1 = # of multipliers

      • a2 = # of ALUs



    Example (contd.)

    • Resource constraints



    Example Solution

    • Minimize c^T a = 5·a1 + 1·a2

    • Solution with cost 12



    Precedence-constrained Multiprocessor Scheduling

    • All operations done by the same type of resource

      • intractable problem

      • intractable even if all operations have unit delay



    Scheduling - Iterative Improvement

    • Kernighan-Lin (deterministic)

    • Simulated Annealing

    • Lottery Iterative Improvement

    • Neural Networks

    • Genetic Algorithms

    • Tabu Search



    Scheduling - Constructive Techniques

    • Most Constrained

    • Least Constraining



    Force Directed Scheduling

    • Goal is to reduce hardware by balancing concurrency

    • Iterative algorithm, one operation scheduled per iteration

    • Information (i.e. speed & area) fed back into scheduler



    The Force Directed Scheduling Algorithm


    Step 1

    • Determine ASAP and ALAP schedules

    [Figure: ASAP and ALAP schedules of the example graph (multiplies, subtracts, adds, compares)]

    Step 2

    • Determine Time Frame of each op

      • Length of box ~ Possible execution cycles

      • Width of box ~ Probability of assignment

      • Uniform distribution, Area assigned = 1

    [Figure: time frames over C-steps 1–4; an op with two candidate steps gets probability 1/2 per step, with three candidate steps 1/3]

    Step 3

    • Create Distribution Graphs

      • Sum of probabilities of each Op type: DG(i) = Σ Prob(Op, i)

      • Indicates concurrency of similar Ops

    [Figure: DG for Multiply and DG for Add, Sub, Comp over C-steps 1–4]
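    Tying Steps 1–3 together: given the ASAP and ALAP start times computed earlier, each operation's time frame and the distribution graph follow directly. A minimal sketch with uniform probabilities, as on the slide:

```python
def distribution_graph(asap, alap, ops, n_steps):
    """DG(i) = sum of Prob(op, i) over all ops of one type."""
    dg = [0.0] * n_steps
    for op in ops:
        lo, hi = asap[op], alap[op]     # time frame [ASAP step, ALAP step]
        p = 1.0 / (hi - lo + 1)         # uniform distribution, area = 1
        for step in range(lo, hi + 1):
            dg[step - 1] += p           # control steps are 1-indexed
    return dg
```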



    Diff Eq Example: Precedence Graph Recalled



    Diff Eq Example: Time Frame & Probability Calculation



    Diff Eq Example: DG Calculation


    Conditional Statements

    • Operations in different branches are mutually exclusive

    • Operations of same type can be overlapped onto DG

    • Probability of most likely operation is added to DG

    [Figure: fork/join branches of the example; the branch operations overlap in the DG for Add]



    Self Forces

    • Scheduling an operation will affect overall concurrency

    • Every operation has a 'self force' for every C-step of its time frame

    • Analogous to the effect of a spring: f = Kx

    • Desirable scheduling will have negative self force

      • Will achieve better concurrency (lower potential energy)

    Force(i) = DG(i) * Δx(i)

    DG(i) ~ current Distribution Graph value

    Δx(i) ~ change in the operation's probability in C-step i

    Self Force(j) = Σi [ Force(i) ]


    Example

    [Figure: time frames over C-steps 1–4 and the resulting DG for Multiply]

    • Attempt to schedule the multiply in C-step 1

    • Self Force(1) = Force(1) + Force(2)

    • = ( DG(1) * Δx(1) ) + ( DG(2) * Δx(2) )

    • = [ 2.833 * (+0.5) + 2.333 * (-0.5) ] = +0.25

    • This is positive; scheduling the multiply in the first C-step would be bad
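    The arithmetic can be checked directly (a minimal sketch; the DG values are read off the multiply DG above):

```python
# Self force for scheduling the multiply in C-step 1 (values from the slide).
dg = {1: 2.833, 2: 2.333}      # DG for multiply in C-steps 1 and 2
dx = {1: +0.5, 2: -0.5}        # probability shifts: 1/2 -> 1 in step 1, 1/2 -> 0 in step 2
self_force = sum(dg[i] * dx[i] for i in (1, 2))
print(round(self_force, 2))    # 0.25 -> positive, so C-step 1 is a bad choice
```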



    Diff Eq Example: Self Force for Node 4


    Predecessor & Successor Forces

    [Figure: a chain of linked operations whose time frames shift when one is scheduled]

    • Scheduling an operation may affect the time frames of other linked operations

    • This may negate the benefits of the desired assignment

    • Predecessor/Successor Forces = Sum of Self Forces of any implicitly scheduled operations



    Diff Eq Example: Successor Force on Node 4

    • If node 4 scheduled in step 1

      • no effect on time frame for successor node 8

    • Total force = Force4(1) = +0.25

    • If node 4 scheduled in step 2

      • causes node 8 to be scheduled into step 3

      • must calculate successor force



    Diff Eq Example: Final Time Frame and Schedule



    Diff Eq Example: Final DG



    Lookahead

    • Temporarily modify the constant DG(i) to include the effect of the iteration being considered

      Force(i) = temp_DG(i) * Δx(i)

      temp_DG(i) = DG(i) + Δx(i)/3

    • Consider the previous example:

      Self Force(1) = ( DG(1) + Δx(1)/3 ) · Δx(1) + ( DG(2) + Δx(2)/3 ) · Δx(2)

      = 0.5 · (2.833 + 0.5/3) − 0.5 · (2.333 − 0.5/3)

      = +0.41667

    • This is even worse than before



    Minimization of Bus Costs

    • Basic algorithm suitable for narrow class of problems

    • Algorithm can be refined to consider “cost” factors

    • Number of buses ~ number of concurrent data transfers

    • Number of buses = maximum transfers in any C-step

    • Create modified DG to include transfers: Transfer DG

      Trans DG(i) = Σ [ Prob(op, i) * Opn_No_InOuts ]

      Opn_No_InOuts ~ combined distinct in/outputs for Op

    • Calculate Force with this DG and add to Self Force


    Minimization of Register Costs

    [Figure: storage distribution for a value s, showing its ASAP, maximum, and ALAP lifetimes across C-steps 0–5]

    • Minimum registers required is given by the largest number of data arcs crossing a C-step boundary

    • Create storage operations at the output of any operation that transfers a value to a destination in a later C-step

    • Generate Storage DG for these “operations”

    • Length of storage operation depends on final schedule
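    The register lower bound in the first bullet above is straightforward to compute once a schedule is fixed. A minimal sketch, assuming each value's lifetime is encoded as a hypothetical (birth step, death step) pair:

```python
# Minimum registers = the largest number of value lifetimes crossing any
# C-step boundary (boundary b lies between steps b and b+1).
def min_registers(lifetimes, n_steps):
    return max(sum(1 for birth, death in lifetimes if birth <= b < death)
               for b in range(1, n_steps))

# Hypothetical example: three values alive across various boundaries.
print(min_registers([(1, 3), (1, 2), (2, 4)], n_steps=4))  # -> 2
```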


    Minimization of Register Costs (contd.)

    [Figure: register usage of the ASAP schedule (7 registers minimum) vs. the force-directed schedule (5 registers minimum)]

    • avg life and storage DG(i): defined by the formulas on the slide, with separate cases for no overlap vs. overlap between the ASAP and ALAP lifetimes

    • Calculate and add "Storage" Force to Self Force


    Pipelining

    [Figure: functional pipelining of two overlapped iterations (instances 1–4 and 1'–4') with the cut-and-superimposed DG for Multiply (steps 1',3 and 2',4); structural pipelining within a two-stage multiplier]

    • Functional Pipelining

      • Pipelining across multiple operations

      • Must balance distribution across groups of concurrent C-steps

      • Cut DG horizontally and superimpose

      • Finally perform regular Force Directed Scheduling

    • Structural Pipelining

      • Pipelining within an operation

      • For non-data-dependent operations, only the first C-step need be considered
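    A sketch of the "cut DG horizontally and superimpose" step for functional pipelining, assuming the DG is given as a per-C-step list and the initiation interval is known (a hypothetical helper, matching the figure's grouping of steps 1',3 and 2',4):

```python
# Superimpose DG values of C-steps that execute concurrently across
# overlapped loop iterations (interval = C-steps between iteration starts).
def pipelined_dg(dg, interval):
    out = [0.0] * interval
    for step, value in enumerate(dg):   # index 0 is C-step 1
        out[step % interval] += value   # e.g. steps 1 and 3 share units
    return out

print(pipelined_dg([2.833, 2.333, 0.833, 0.0], interval=2))
```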



    Other Optimizations

    • Local timing constraints

      • Insert dummy timing operations -> Restricted time frames

    • Multiclass FU’s

      • Create multiclass DG by summing probabilities of relevant ops

    • Multistep/Chained operations.

      • Carry propagation delay information with operation

      • Extend time frames into other C-steps as required

    • Hardware constraints

      • Use Force as priority function in list scheduling algorithms



    Scheduling using Simulated Annealing

    Reference:

    Devadas, S. and Newton, A. R., "Algorithms for hardware allocation in data path synthesis," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 8, no. 7, pp. 768–781, July 1989.


    Simulated Annealing

    [Figure: local search over a solution space; the cost function has multiple local minima]


    Statistical Mechanics and Combinatorial Optimization

    State {ri} (configuration: a set of atomic positions)

    Weight e^(−E({ri}) / kB·T) -- the Boltzmann distribution

    E({ri}): energy of the configuration

    kB: Boltzmann constant

    T: temperature

    Low-temperature limit?


    Analogy

    Physical System ↔ Optimization Problem

    • State (configuration) ↔ Solution

    • Energy ↔ Cost Function

    • Ground State ↔ Optimal Solution

    • Rapid Quenching ↔ Iterative Improvement

    • Careful Annealing ↔ Simulated Annealing


    Generic Simulated Annealing Algorithm

    1. Get an initial solution S

    2. Get an initial temperature T > 0

    3. While not yet 'frozen' do the following:

       3.1 For 1 ≤ i ≤ L, do the following:

           3.1.1 Pick a random neighbor S' of S

           3.1.2 Let Δ = cost(S') − cost(S)

           3.1.3 If Δ ≤ 0 (downhill move), set S = S'

           3.1.4 If Δ > 0 (uphill move), set S = S' with probability e^(−Δ/T)

       3.2 Set T = rT (reduce temperature)

    4. Return S
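    Transcribed into Python (cost and neighbor are problem-specific callables supplied by the caller; the defaults for T, r, L, and the freezing threshold are illustrative):

```python
import math
import random

def simulated_annealing(s, cost, neighbor, t=100.0, r=0.95, l=50, t_freeze=0.01):
    while t > t_freeze:                        # step 3: not yet 'frozen'
        for _ in range(l):                     # step 3.1
            s2 = neighbor(s)                   # 3.1.1: random neighbor of S
            delta = cost(s2) - cost(s)         # 3.1.2
            if delta <= 0:                     # 3.1.3: downhill move, accept
                s = s2
            elif random.random() < math.exp(-delta / t):
                s = s2                         # 3.1.4: uphill with prob e^(-delta/T)
        t *= r                                 # 3.2: reduce temperature
    return s                                   # step 4
```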



    Basic Ingredients for S.A.

    • Solution Space

    • Neighborhood Structure

    • Cost Function

    • Annealing Schedule



    Observation

    • All scheduling algorithms we have discussed so far are critical path schedulers

    • They can only generate schedules for iteration period larger than or equal to the critical path

    • They only exploit concurrency within a single iteration, and only utilize the intra-iteration precedence constraints



    Example

    • Can one do better than iteration period of 4?

      • Pipelining + retiming can reduce critical path to 3, and also the # of functional units

    • Approaches

      • Transformations followed by scheduling

      • Transformations integrated with scheduling



    Conclusions

    • High Level Synthesis

    • Connects Behavioral Description and Structural Description

    • Scheduling, Estimations, Transformations

    • High Level of Abstraction, High Impact on the Final Design

