Optimal Polynomial-Time Interprocedural Register Allocation for High-Level Synthesis Using SSA Form

csda


Philip Brisk

Ajay K. Verma

Paolo Ienne


Outline

  • Register Allocation Overview

  • Interprocedural Register Allocation

  • Related Work

  • SSA Form With Launch and Landing Pads

  • Optimal Solution

  • Experimental Results

  • Conclusion


Modeling Register Allocation

  • For Procedure Pi…

    • Build interference graph Gi = (Vi, Ei)

      • Vi – One vertex for each variable

      • Ei – Edge between each pair of interfering variables

        • Two variables interfere if their lifetimes overlap

    • Compute the chromatic number χ(Gi)

      • Color assignment = Register assignment

      • NP-Complete in general
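As a rough illustration of the model on this slide, the sketch below builds Gi from variable lifetimes. It assumes, for simplicity, that each lifetime is one half-open interval of program points (true for straight-line code, not for general control flow); the variable names and intervals are made up. Coloring is left out here because, as the slide notes, computing χ(Gi) is NP-complete in general.

```python
from itertools import combinations


def build_interference_graph(lifetimes):
    """Return {variable: set of interfering variables}.

    lifetimes maps each variable to a half-open interval (start, end) of
    program points. Assumption: one interval per variable, which only
    models straight-line code.
    """
    graph = {v: set() for v in lifetimes}
    for a, b in combinations(lifetimes, 2):
        (s1, e1), (s2, e2) = lifetimes[a], lifetimes[b]
        if s1 < e2 and s2 < e1:  # the two lifetimes overlap
            graph[a].add(b)
            graph[b].add(a)
    return graph


# Hypothetical lifetimes for three variables of one procedure Pi.
g = build_interference_graph({"X": (0, 4), "Y": (2, 6), "Z": (5, 8)})
print({v: sorted(n) for v, n in g.items()})
# {'X': ['Y'], 'Y': ['X', 'Z'], 'Z': ['Y']}
```

X and Z never overlap, so in this tiny case two colors (registers) suffice.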


Local Interferences

  • Local Interferences – Single Procedure

    • Overlapping lifetimes

    • Static Single Assignment (SSA) Form

      • Interference graph is chordal

[Figure: lifetimes of variables X, Y, and Z and the resulting interference graph]
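To make "chordal" concrete, here is a sketch of the classic test attributed to Tarjan and Yannakakis: a graph is chordal exactly when the reverse of a maximum cardinality search (MCS) order is a perfect elimination ordering, i.e. each vertex's earlier-visited neighbours form a clique. The helper names and the two small graphs are our own illustrations; this does not build SSA form, and it is not the authors' tooling.

```python
def mcs_order(graph):
    """Maximum cardinality search: repeatedly visit the unvisited vertex
    with the most already-visited neighbours (simple O(V^2) selection)."""
    weight = {v: 0 for v in graph}
    order = []
    while weight:
        v = max(weight, key=weight.get)
        order.append(v)
        del weight[v]
        for u in graph[v]:
            if u in weight:
                weight[u] += 1
    return order


def is_chordal(graph):
    """True iff every vertex's earlier-visited neighbours (in MCS order) form a clique."""
    order = mcs_order(graph)
    pos = {v: i for i, v in enumerate(order)}
    for v in order:
        earlier = [u for u in graph[v] if pos[u] < pos[v]]
        for i, a in enumerate(earlier):
            for b in earlier[i + 1:]:
                if b not in graph[a]:
                    return False
    return True


# A 4-cycle has no chord, so it is not chordal; adding the chord a-c fixes that.
square = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
chorded = {**square, "a": {"b", "c", "d"}, "c": {"a", "b", "d"}}
print(is_chordal(square), is_chordal(chorded))  # False True
```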


Global Interferences

  • Global Interferences

    • Variable V is live across a call to procedure P

    • V interferes with EVERY local variable in P

      • And all variables in all procedures reachable from P

        • Must consider all paths through the Call Graph

Main:

  V ← …
  Call P
  … ← V

P:

  Call Q

Q:

Call graph: Main → P → Q
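The rule above ("V interferes with every local variable in P and in all procedures reachable from P") can be sketched as a reachability query on the call graph. The call graph Main → P → Q follows the slide. The local variables a, b, and c are hypothetical, and the function names are our own.

```python
def reachable(call_graph, start):
    """Procedures reachable from `start` in the call graph (start included)."""
    seen, stack = set(), [start]
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(call_graph.get(p, ()))
    return seen


def global_interferences(v, callee, call_graph, locals_of):
    """Edges (v, w) created because v is live across a call to `callee`."""
    return {(v, w)
            for p in reachable(call_graph, callee)
            for w in locals_of.get(p, ())}


call_graph = {"Main": ["P"], "P": ["Q"], "Q": []}
locals_of = {"Main": ["V"], "P": ["a"], "Q": ["b", "c"]}  # hypothetical locals
print(sorted(global_interferences("V", "P", call_graph, locals_of)))
# [('V', 'a'), ('V', 'b'), ('V', 'c')]
```

A depth-first walk visits each reachable procedure once, so all call-graph paths are covered without enumerating them individually.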


Global Interferences and Recursion

  • Fact:

    • No register can hold a local variable across a recursive function call

      • Runtime stack is required

      • Some exceptions (e.g. static local variables)

        • Ignored here

  • Call Graph

    • Compute strongly connected components (SCCs)

    • Collapse each SCC into a single node

    • Resulting “Augmented Component Graph” is acyclic
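The slides do not say which SCC algorithm is used. Below is a sketch using Tarjan's algorithm, one standard choice, followed by the collapse into an acyclic component graph. The call graph, with A and B mutually recursive, is a hypothetical example.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm; returns the SCCs as frozensets."""
    index, low, on_stack, stack, sccs = {}, {}, set(), [], []
    counter = [0]

    def visit(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of an SCC
            comp = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.add(w)
                if w == v:
                    break
            sccs.append(frozenset(comp))

    for v in graph:
        if v not in index:
            visit(v)
    return sccs


def component_graph(graph):
    """Collapse each SCC into one node; the result has no cycles."""
    sccs = strongly_connected_components(graph)
    comp_of = {v: c for c in sccs for v in c}
    dag = {c: set() for c in sccs}
    for v, callees in graph.items():
        for w in callees:
            if comp_of[v] != comp_of[w]:
                dag[comp_of[v]].add(comp_of[w])
    return dag


# Hypothetical call graph: A and B call each other (mutual recursion).
calls = {"Main": ["A"], "A": ["B"], "B": ["A", "C"], "C": []}
for comp, succs in component_graph(calls).items():
    print(sorted(comp), "->", [sorted(s) for s in succs])
```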


Interprocedural Register Allocation

  • Interprocedural Interference Graph (IIG)

    • Undirected graph G = (V, E)

    • V – All variables in all procedures

    • E – Local AND global interferences

    • Compute chromatic number χ(G)


Related Work

  • Interprocedural Register Allocation in HLS

    • Color IIG with heuristic [Vemuri et al., TODAES ’02]

      • IIG is large

      • Polynomial heuristics are still slow

    • Scalable Approach [Beidas and Zhu, ASP-DAC ’05]

      • Color each procedure individually

        • Use any heuristic you want

        • Use any intermediate representation you want

      • Propagate global interferences at call points

        • IIG is never built


Contribution

  • Interprocedural register allocation

    • Optimal, polynomial-time algorithm

    • Scalable

      • IIG is never built

        • If built, it would be chordal

      • Each Procedure colored individually

        • SSA Form – interference graph is chordal

      • Special case of [Beidas and Zhu, ASP-DAC ’05]

        • Top-down color propagation

        • Novel SSA-based intermediate representation

        • Chordal color assignment (with offset)


Preallocation of Global Registers

  • Global registers hold variables that are live across procedure calls

    • How many do we need?

  • Notation (from the slide figure):

    • P – Set of procedures in the application

    • Pi – Procedure

    • ck – Call point in Pi (ck: Call Pj)

    • L(ck) – Set of variables live across ck

Preallocation of Global Registers

  • Compute δ – the number of variables live…

    • At the entry of a procedure

    • Across a call point

  • At the entry of procedure Pi, take the maximum over the m call points c1, …, cm that call Pi (δ = 0 for the root of the call graph):

$$\delta_i = \max_{1 \le k \le m} \delta_k$$

  • Across a call point ck located in Pi, once δi is known:

$$\delta_k = \delta_i + |L(c_k)|$$


Example

  • Call graph: P1 calls P2 at c7 and c8, P3 at c9, P4 at c10, and P6 at c11; P2 calls P5 at c12; P3 calls P5 at c13; P4 calls P6 at c14

  • Variables live across each call point:

      ci        c7   c8   c9   c10   c11   c12   c13   c14
      |L(ci)|   1    2    3    2     5     3     3     2

  • δ1 = 0

  • Call points in P1:

      δ7 = |L(c7)| + δ1 = 1 + 0 = 1
      δ8 = |L(c8)| + δ1 = 2 + 0 = 2
      δ9 = |L(c9)| + δ1 = 3 + 0 = 3
      δ10 = |L(c10)| + δ1 = 2 + 0 = 2
      δ11 = |L(c11)| + δ1 = 5 + 0 = 5

  • Entries of P2, P3, P4:

      δ2 = MAX{δ7, δ8} = MAX{1, 2} = 2
      δ3 = MAX{δ9} = MAX{3} = 3
      δ4 = MAX{δ10} = MAX{2} = 2

  • Call points in P2, P3, P4:

      δ12 = |L(c12)| + δ2 = 3 + 2 = 5
      δ13 = |L(c13)| + δ3 = 3 + 3 = 6
      δ14 = |L(c14)| + δ4 = 2 + 2 = 4

  • Entries of P5, P6:

      δ5 = MAX{δ12, δ13} = MAX{5, 6} = 6
      δ6 = MAX{δ11, δ14} = MAX{5, 4} = 5
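The arithmetic in this example can be checked mechanically by evaluating the two δ recurrences in topological order of the call graph, so that every caller is finished before its callees. This is a sketch under the assumption that recursion has already been collapsed (the graph is acyclic; `TopologicalSorter` raises `CycleError` otherwise). The data structures and function name are our own; the numbers are the slide's.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Example data from the slide: call point k -> (caller, callee, |L(ck)|).
# Procedures are numbered 1..6 and call points 7..14, as on the slide.
calls = {
    7: (1, 2, 1), 8: (1, 2, 2), 9: (1, 3, 3), 10: (1, 4, 2),
    11: (1, 6, 5), 12: (2, 5, 3), 13: (3, 5, 3), 14: (4, 6, 2),
}


def compute_deltas(procedures, calls):
    """Evaluate delta for every procedure and call point of an acyclic call graph."""
    callers = {p: set() for p in procedures}
    for caller, callee, _ in calls.values():
        callers[callee].add(caller)
    proc_delta, call_delta = {}, {}
    # static_order() yields each procedure after all of its callers.
    for p in TopologicalSorter(callers).static_order():
        incoming = [call_delta[k] for k, (_, callee, _) in calls.items() if callee == p]
        proc_delta[p] = max(incoming, default=0)  # no callers (root): delta = 0
        for k, (caller, _, live) in calls.items():
            if caller == p:
                call_delta[k] = proc_delta[p] + live  # delta_k = delta_i + |L(ck)|
    return proc_delta, call_delta


delta, call_delta = compute_deltas(range(1, 7), calls)
N = max(delta.values())  # number of global registers T1..TN
print(sorted(delta.items()), N)
# [(1, 0), (2, 2), (3, 3), (4, 2), (5, 6), (6, 5)] 6
```

Each call point is touched a constant number of times per procedure in this simple version; indexing calls by caller and callee would make the pass linear in the size of the call graph.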


Preallocation of Global Registers

  • When procedure Pi is called:

    • At most δi variables are live across the calls leading to Pi

      • This holds for every path in the call graph

    • How can we ensure that all variables live across calls leading to Pi are assigned to the right register?

N = MAX{δi} – Number of global registers allocated

T = {T1, …, TN}


Launch and Landing Pads

  • Procedure Pi calls Pj (m = δi)

    • Assign variables live across calls leading to Pi to T1…Tm

    • Let ck be the call point; n = |L(ck)|

      • Launch Pad

        • Parallel copy placed before the call

          $$(T_{m+1}, \ldots, T_{m+n}) \leftarrow \psi(L(c_k))$$

      • Landing Pad

        • Copy the values back after the call

          $$L(c_k) \leftarrow \psi(T_{m+1}, \ldots, T_{m+n})$$
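A small sketch of the register numbering used by the pads: with m = δi global registers already occupied, the n variables in L(ck) go to T(m+1) through T(m+n) before the call and come back afterwards. Representing a parallel copy as a list of (destination, source) pairs is our assumption for illustration, and the variables x, y, z are hypothetical.

```python
def launch_and_landing_pads(m, live_across):
    """Return (launch, landing) parallel copies for one call point ck.

    m           -- delta_i of the calling procedure Pi
    live_across -- ordered list of the variables in L(ck)
    Each copy is a list of (destination, source) pairs that are meant to
    execute simultaneously.
    """
    n = len(live_across)
    regs = [f"T{m + j}" for j in range(1, n + 1)]  # T_{m+1} .. T_{m+n}
    launch = list(zip(regs, live_across))   # T_{m+j} <- v_j, before the call
    landing = list(zip(live_across, regs))  # v_j <- T_{m+j}, after the call
    return launch, landing


launch, landing = launch_and_landing_pads(2, ["x", "y", "z"])
print("launch: ", ", ".join(f"{d} <- {s}" for d, s in launch))
print("landing:", ", ".join(f"{d} <- {s}" for d, s in landing))
# launch:  T3 <- x, T4 <- y, T5 <- z
# landing: x <- T3, y <- T4, z <- T5
```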


Theoretical Consequences of Launch and Landing Pads

  • Theorem:

    • All global interferences involve at least one global register

  • Corollary:

    • Local variables in distinct procedures do not interfere

  • Corollary:

    • No local variable in “main” has a global interference

  • Theorem:

    • Every variable defined locally in Pi (m = δi)

      • Interferes with global registers T1…Tm

      • Does NOT interfere with global registers Tm+1, …, TN

        => Can assign local vars in Pi to global registers Tm+1, …, TN


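Reducing the Chromatic Number

Procedure: A

  V ← …
  Call B
  W ← …
  … ← V
  X ← …
  … ← W
  Y ← …
  … ← X
  Call B
  … ← Y

Procedure: B

  Z ← …
  … ← Z

Interference graph: V–W, W–X, X–Y (local to A); V–Z and Y–Z (global, since V and Y are live across calls to B). Together these edges form the odd cycle V–W–X–Y–Z–V, which is not chordal.

Chromatic Number = 3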


Reducing the Chromatic Number

Procedure: A

  V ← …
  T1 ← Ψ(V)
  Call B
  V ← Ψ⁻¹(T1)
  W ← …
  … ← V
  X ← …
  … ← W
  Y ← …
  … ← X
  T1 ← Ψ(Y)
  Call B
  Y ← Ψ⁻¹(T1)
  … ← Y

Procedure: B

  Z ← …
  … ← Z

[Figure: interference graph over V, W, X, Y, Z, and T1]

Chromatic Number = 2
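The two chromatic numbers can be confirmed by exhaustive search. The edge lists are our reading of the two code fragments. Like the slide's figure, they treat V and Y as single vertices; splitting them into SSA versions adds no further edges. With the pads, only T1 is live across the calls to B.

```python
from itertools import product


def chromatic_number(vertices, edges):
    """Smallest k with a proper k-coloring (exhaustive search, tiny graphs only)."""
    for k in range(1, len(vertices) + 1):
        for colors in product(range(k), repeat=len(vertices)):
            color = dict(zip(vertices, colors))
            if all(color[a] != color[b] for a, b in edges):
                return k
    return 0  # graph with no vertices


# Without launch/landing pads: V and Y are live across "Call B", so both
# interfere with Z, closing the odd cycle V-W-X-Y-Z-V.
before = (["V", "W", "X", "Y", "Z"],
          [("V", "W"), ("W", "X"), ("X", "Y"), ("V", "Z"), ("Y", "Z")])

# With launch/landing pads: only T1 is live across "Call B".
after = (["V", "W", "X", "Y", "Z", "T1"],
         [("V", "W"), ("W", "X"), ("X", "Y"), ("T1", "Z")])

print(chromatic_number(*before), chromatic_number(*after))  # 3 2
```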


Characterizing the IIG

  • Theorem:

    • T is a clique in the IIG

  • Theorem:

    • IIG is chordal

  • Theorem:

    • Chromatic Number of the IIG is: R = MAX{δi + χ(Gi)}


Example

  • Global registers T1…T6 (N = 6) form a CLIQUE

  • Local interference graphs G1…G6, with δ1 = 0, δ2 = 2, δ3 = 3, δ4 = 2, δ5 = 6, δ6 = 5

  • Global interference: Tj interferes with each local variable in Gi, for 1 ≤ j ≤ δi


Coloring Algorithm

  • Use SSA form with launch and landing pads (SSA+LLP), but DON’T build the IIG

  • For Pi, colors in the range 1..δi are unavailable

  • Color the local (chordal) interference graph Gi of Pi

    • Complexity: O(|Vi| + |Ei|)

  • For each vertex in Pi, replace color c with c + δi

    • Complexity: O(|Vi|)
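The slides do not say how each chordal Gi is colored. One standard option, sketched here, is greedy coloring in maximum cardinality search (MCS) order, which is optimal on chordal graphs. This simple version selects vertices in O(V²); a bucket-based MCS reaches the O(|Vi| + |Ei|) bound on the slide. The function names and the example graph are hypothetical.

```python
def mcs_order(graph):
    """Maximum cardinality search (simple O(V^2) selection, not the bucketed version)."""
    weight = {v: 0 for v in graph}
    order = []
    while weight:
        v = max(weight, key=weight.get)
        order.append(v)
        del weight[v]
        for u in graph[v]:
            if u in weight:
                weight[u] += 1
    return order


def color_chordal(graph):
    """Greedy coloring in MCS order (colors start at 1); optimal if graph is chordal."""
    color = {}
    for v in mcs_order(graph):
        used = {color[u] for u in graph[v] if u in color}
        c = 1
        while c in used:
            c += 1
        color[v] = c
    return color


def allocate_procedure(graph, delta_i):
    """Colors 1..delta_i are reserved for T1..T_delta_i, so shift local colors up."""
    return {v: c + delta_i for v, c in color_chordal(graph).items()}


# Hypothetical local interference graph Gi: triangle a-b-c plus d attached to c.
g_i = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
result = allocate_procedure(g_i, delta_i=2)
print(sorted(result.items()))  # [('a', 3), ('b', 4), ('c', 5), ('d', 3)]
```

Here χ(Gi) = 3 and δi = 2, so the highest color used is δi + χ(Gi) = 5, which is this procedure's term in R = MAX{δi + χ(Gi)}.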


Experiments

  • Applications taken from Mediabench and MiBench

    • Written in C

    • Compiled Using Machine SUIF

  • Optimal color assignment

  • Compare to heuristics

    • Color Palette Propagation

      • Top-Down, Bottom-Up [Beidas and Zhu, ASP-DAC’05]

        • Heuristic Color Assignment [Matula and Beck, JACM ’83]


Registers Allocated (Normalized to Optimal)


Runtime (Normalized to Optimal)


Runtime of Pegwit (Normalized to Optimal)


Limitations

  • Global Variables

    • Interfere with all variables in the program

    • Lifetime can still be analyzed

  • Static Local Variables

    • Initialized on first access

    • Hold their values across function calls

  • Function Pointers

    • Resolution is NP-Complete


Conclusion

  • Interprocedural register allocation in HLS

    • Optimal, polynomial-time algorithm

      • Uses SSA Form + Launch/Landing Pads

      • IIG is a chordal graph

      • Scalable – no need to build IIG

      • Significantly faster than sub-optimal heuristics

  • A few limitations

    • Global variables, local static variables

    • Function pointers

      • Resolution is NP-Complete


Related Work

  • Register Allocation in HLS

    • Clique Partitioning/Coloring Problem

      • [Tseng and Siewiorek, ’86]

    • Scheduled DFGs – Interval Graphs

      • [Kurdahi and Parker, ’87]

    • Scheduled Cyclic DFGs – Circular Arc Graphs

      • (NP-Complete)

      • [Stok, ’92]

    • Restrictions on Variable Lifetimes – Chordal Graphs

      • [Springer and Thomas, ’94]

    • Static Single Assignment Form – Chordal Graphs

      • [Brisk et al. 2005/6], [Hack and Goos, 2005/6],

        [Bouchez et al. 2005]

