Optimal Polynomial-Time Interprocedural Register Allocation for High-Level Synthesis Using SSA Form

Philip Brisk

Ajay K. Verma

Paolo Ienne

Outline
  • Register Allocation Overview
  • Interprocedural Register Allocation
  • Related Work
  • SSA Form With Launch and Landing Pads
  • Optimal Solution
  • Experimental Results
  • Conclusion
Modeling Register Allocation
  • For Procedure Pi…
    • Build interference graph Gi = (Vi, Ei)
      • Vi – One vertex for each variable
      • Ei – Edge between each pair of interfering variables
        • Two variables interfere if their lifetimes overlap
    • Compute the chromatic number χ(Gi)
      • Color assignment = Register assignment
      • NP-Complete in general
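As a minimal sketch of the modeling step, assume each variable's lifetime is a single half-open interval of instruction indices. That simplification only holds for straight-line code, and the function name and the sample lifetimes are hypothetical, not taken from the slides.

```python
from itertools import combinations

def build_interference_graph(lifetimes):
    """Return (vertices, edges) for a dict name -> (start, end), end exclusive."""
    vertices = set(lifetimes)
    edges = set()
    for a, b in combinations(sorted(lifetimes), 2):
        (sa, ea), (sb, eb) = lifetimes[a], lifetimes[b]
        # Two half-open intervals overlap iff each starts before the other ends.
        if sa < eb and sb < ea:
            edges.add((a, b))
    return vertices, edges

# Hypothetical lifetimes: X overlaps Y, Y overlaps Z, X and Z are disjoint.
lifetimes = {"X": (0, 3), "Y": (2, 5), "Z": (4, 6)}
vertices, edges = build_interference_graph(lifetimes)
print(sorted(edges))  # [('X', 'Y'), ('Y', 'Z')]
```

Here X and Z can share a register, so two colors suffice for this small graph.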
Local Interferences
  • Local Interferences – Single Procedure
    • Overlapping lifetimes
    • Static Single Assignment (SSA) Form
      • Interference graph is chordal

[Figure: lifetimes of variables X, Y, and Z and their interference graph]
Global Interferences
  • Global Interferences
    • Variable V is live across a call to procedure P
    • V interferes with EVERY local variable in P
      • And all variables in all procedures reachable from P
        • Must consider all paths through the Call Graph

Main:
  V ← …
  Call P
  … ← V
P:
  Call Q
Q:

[Figure: call graph Main → P → Q]
Global Interferences and Recursion
  • Fact:
    • No register can hold a local variable across a recursive function call
      • Runtime stack is required
      • Some exceptions (e.g. static local variables)
        • Ignored here
  • Call Graph
    • Compute strongly connected components (SCCs)
    • Collapse each SCC into a single node
    • Resulting “Augmented Component Graph” is acyclic
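The collapse step can be sketched with Tarjan's SCC algorithm. This is one possible way to do it, not necessarily the authors' implementation; the call graph and all names below are hypothetical.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm; graph maps each procedure to the procedures it calls."""
    index, low, on_stack, stack = {}, {}, set(), []
    components = []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of an SCC
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            components.append(frozenset(component))

    for v in graph:
        if v not in index:
            visit(v)
    return components

def collapse(graph):
    """Collapse each SCC into a single node; the result is acyclic."""
    comp_of = {v: c for c in strongly_connected_components(graph) for v in c}
    dag = {c: set() for c in comp_of.values()}
    for v, callees in graph.items():
        for w in callees:
            if comp_of[v] != comp_of[w]:
                dag[comp_of[v]].add(comp_of[w])
    return dag

# Hypothetical call graph in which P and Q are mutually recursive.
calls = {"Main": ["P"], "P": ["Q"], "Q": ["P", "R"], "R": []}
dag = collapse(calls)
for node, succs in dag.items():
    print(sorted(node), "->", [sorted(s) for s in succs])
```

In this sketch P and Q end up in one node, so the remaining graph Main → {P, Q} → R has no cycles.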
Interprocedural Register Allocation
  • Interprocedural Interference Graph (IIG)
    • Undirected graph G = (V, E)
    • V – All variables in all procedures
    • E – Local AND global interferences
    • Compute chromatic number χ(G)
Related Work
  • Interprocedural Register Allocation in HLS
    • Color IIG with heuristic [Vemuri et al., TODAES ’02]
      • IIG is large
      • Polynomial heuristics are still slow
    • Scalable Approach [Beidas and Zhu, ASP-DAC ’05]
      • Color each procedure individually
        • Use any heuristic you want
        • Use any intermediate representation you want
      • Propagate global interferences at call points
        • IIG is never built
Contribution
  • Interprocedural register allocation
    • Optimal, polynomial-time algorithm
    • Scalable
      • IIG is never built
        • If built, it would be chordal
      • Each Procedure colored individually
        • SSA Form – interference graph is chordal
      • Special case of [Beidas and Zhu, ASP-DAC ’05]
        • Top-down color propagation
        • Novel SSA-based intermediate representation
        • Chordal color assignment (with offset)
Preallocation of Global Registers
  • Global registers hold variables that are live across procedure calls
    • How many do we need?
  • Notation
    • P – Set of procedures in the application
    • Pi – Procedure
    • ck – Call point (e.g. ck: Call Pj, placed in Pi)
    • L(ck) – Set of variables live across ck
Preallocation of Global Registers
  • Compute: δ – Number of variables live…
    • At the entry of a procedure
    • Across a call point
  • For each call point ck in procedure Pi (δi is known):
    • δk = δi + |L(ck)|
  • For procedure Pi called from call points c1, …, cm:
    • δi = MAX {δk}, 1 ≤ k ≤ m (i.e. over all points that call Pi)

Example
  • Call points (caller → callee) and number of variables live across each:
    • c7: P1 → P2, |L(c7)| = 1
    • c8: P1 → P2, |L(c8)| = 2
    • c9: P1 → P3, |L(c9)| = 3
    • c10: P1 → P4, |L(c10)| = 2
    • c11: P1 → P6, |L(c11)| = 5
    • c12: P2 → P5, |L(c12)| = 3
    • c13: P3 → P5, |L(c13)| = 3
    • c14: P4 → P6, |L(c14)| = 2
  • Root procedure: δ1 = 0
  • Call points in P1:
    • δ7 = |L(c7)| + δ1 = 1 + 0 = 1
    • δ8 = |L(c8)| + δ1 = 2 + 0 = 2
    • δ9 = |L(c9)| + δ1 = 3 + 0 = 3
    • δ10 = |L(c10)| + δ1 = 2 + 0 = 2
    • δ11 = |L(c11)| + δ1 = 5 + 0 = 5
  • Procedures called from P1:
    • δ2 = MAX{δ7, δ8} = MAX{1, 2} = 2
    • δ3 = MAX{δ9} = MAX{3} = 3
    • δ4 = MAX{δ10} = MAX{2} = 2
  • Call points in P2, P3, P4:
    • δ12 = |L(c12)| + δ2 = 3 + 2 = 5
    • δ13 = |L(c13)| + δ3 = 3 + 3 = 6
    • δ14 = |L(c14)| + δ4 = 2 + 2 = 4
  • Remaining procedures:
    • δ5 = MAX{δ12, δ13} = MAX{5, 6} = 6
    • δ6 = MAX{δ11, δ14} = MAX{5, 4} = 5
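The two recurrences can be checked against the example with a short memoized sketch. The call-point table is read off the example's equations; the function and variable names are hypothetical, and the code assumes the call graph is acyclic (SCCs already collapsed).

```python
from functools import lru_cache

# Call points from the example: name -> (caller, callee, |L(c)|).
call_points = {
    "c7":  ("P1", "P2", 1),
    "c8":  ("P1", "P2", 2),
    "c9":  ("P1", "P3", 3),
    "c10": ("P1", "P4", 2),
    "c11": ("P1", "P6", 5),
    "c12": ("P2", "P5", 3),
    "c13": ("P3", "P5", 3),
    "c14": ("P4", "P6", 2),
}

@lru_cache(maxsize=None)
def delta_proc(proc):
    """delta at procedure entry: max over all call points that call proc."""
    incoming = [c for c, (_, callee, _) in call_points.items() if callee == proc]
    # A procedure that nobody calls (the root) has delta = 0.
    return max((delta_call(c) for c in incoming), default=0)

@lru_cache(maxsize=None)
def delta_call(c):
    """delta across call point c: the caller's delta plus |L(c)|."""
    caller, _, live = call_points[c]
    return delta_proc(caller) + live

deltas = {p: delta_proc(p) for p in ["P1", "P2", "P3", "P4", "P5", "P6"]}
print(deltas)  # {'P1': 0, 'P2': 2, 'P3': 3, 'P4': 2, 'P5': 6, 'P6': 5}
```

Memoization keeps the work proportional to the number of procedures and call points, since each δ is evaluated once.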

Preallocation of Global Registers
  • When Procedure Pi is called…
    • At most δi variables are live across calls leading to Pi
      • Holds for every path in the call graph
    • How to ensure that all variables live across calls leading to Pi are assigned to the right register?
  • N = MAX {δi} over all Pi ∈ P – Number of global registers allocated
  • T = {T1, …, TN} – The global registers

Launch and Landing Pads
  • Procedure Pi calls Pj; (m = δi)
    • Assign variables live across calls leading to Pi to T1…Tm
    • Let ck be the call point; n = |L(ck)|
      • Launch Pad
        • Parallel copy placed before the call

(Tm+1, …, Tm+n) ← ψ(L(ck))

      • Landing Pad
        • Copy the values back after the call

L(ck) ← ψ((Tm+1, …, Tm+n))
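As an illustration, the copies for one call point could be generated as follows. This is a sketch under stated assumptions: the helper name, the fixed ordering of L(ck), and the sample variables are hypothetical, and the parallel copy is simply listed as (destination, source) pairs.

```python
def make_pads(delta_i, live_across_call):
    """Return (launch, landing) copy lists for one call point.

    delta_i: number of global registers already occupied on entry to the caller.
    live_across_call: the variables in L(ck), in some fixed order.
    """
    m = delta_i
    # Variables in L(ck) go to the next free global registers T(m+1)..T(m+n).
    targets = [f"T{m + j}" for j in range(1, len(live_across_call) + 1)]
    launch = list(zip(targets, live_across_call))   # (destination, source)
    landing = list(zip(live_across_call, targets))  # (destination, source)
    return launch, landing

# Hypothetical call point in a procedure with delta = 2 and L(ck) = {a, b, c}.
launch, landing = make_pads(2, ["a", "b", "c"])
for dst, src in launch:
    print(f"{dst} <- {src}")  # launch pad, placed before the call
print("Call Pj")
for dst, src in landing:
    print(f"{dst} <- {src}")  # landing pad, placed after the call
```

With δi = 2, the three live variables occupy T3, T4, and T5 during the call, leaving T1 and T2 untouched for values live across calls further up the call graph.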

Theoretical Consequences of Launch and Landing Pads
  • Theorem:
    • All global interferences involve at least one global register
  • Corollary:
    • Local variables in distinct procedures do not interfere
  • Corollary:
    • No local variable in “main” has a global interference
  • Theorem:
    • Every variable defined locally in Pi (m = δi)
      • Interferes with global registers T1…Tm
      • Does NOT interfere with global registers Tm+1, … TN

=> Can assign local vars in Pi to global registers Tm+1, … TN

Reducing the Chromatic Number

Procedure: A
  V ← …
  Call B
  W ← …
  … ← V
  X ← …
  … ← W
  Y ← …
  … ← X
  Call B
  … ← Y

Procedure: B
  Z ← …
  … ← Z

[Figure: variable lifetimes and interference graph over V, W, X, Y, Z]

Chromatic Number = 3

Reducing the Chromatic Number

Procedure: A
  V ← …
  T1 ← Ψ(V)
  Call B
  V ← Ψ⁻¹(T1)
  W ← …
  … ← V
  X ← …
  … ← W
  Y ← …
  … ← X
  T1 ← Ψ(Y)
  Call B
  Y ← Ψ⁻¹(T1)
  … ← Y

Procedure: B
  Z ← …
  … ← Z

[Figure: variable lifetimes and interference graph over V, W, X, Y, Z, T1]

Chromatic Number = 2
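Both chromatic numbers can be reproduced by brute force on such small graphs. The edge lists below are inferred from the two versions of procedures A and B and are an assumption, since the original figures did not survive extraction: without pads, V and Y are live across the calls to B and so interfere with Z; with pads, only T1 is live across the calls.

```python
from itertools import product

def chromatic_number(vertices, edges):
    """Smallest k admitting a proper k-coloring (brute force, tiny graphs only)."""
    vertices = list(vertices)
    for k in range(1, len(vertices) + 1):
        for colors in product(range(k), repeat=len(vertices)):
            assignment = dict(zip(vertices, colors))
            if all(assignment[a] != assignment[b] for a, b in edges):
                return k
    return 0  # graph with no vertices

# Without launch/landing pads: V and Y interfere with Z (assumed edges).
before_vertices = ["V", "W", "X", "Y", "Z"]
before_edges = [("V", "W"), ("W", "X"), ("X", "Y"), ("V", "Z"), ("Y", "Z")]

# With pads: only T1 is live across the calls to B (assumed edges).
after_vertices = ["V", "W", "X", "Y", "Z", "T1"]
after_edges = [("V", "W"), ("W", "X"), ("X", "Y"), ("T1", "Z")]

print(chromatic_number(before_vertices, before_edges))  # 3
print(chromatic_number(after_vertices, after_edges))    # 2
```

Under these assumed edges the first graph is a cycle of length five, which needs three colors; the pads break the cycle.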

Characterizing the IIG
  • Theorem:
    • T is a clique in the IIG
  • Theorem:
    • IIG is chordal
  • Theorem:
    • Chromatic Number of the IIG is: R = MAX{δi + χ(Gi)} over all Pi ∈ P
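Example
  • N = 6: global registers T1, …, T6 form a clique
  • Local interference graphs G1, …, G6
    • δ1 = 0, δ2 = 2, δ3 = 3, δ4 = 2, δ5 = 6, δ6 = 5
  • Global interference
    • Tj interferes with each local variable in Gi (for 1 ≤ j ≤ δi)

[Figure: clique T1…T6 connected to the local interference graphs G1…G6]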

Coloring Algorithm
  • Use SSA+LLP Form, but DON’T build the IIG
  • For Pi, colors in the range 1..δi are unavailable
  • Color the local (chordal) interference graph Gi of Pi
    • Complexity: O(|Vi| + |Ei|)
  • For each vertex in Pi, replace color c with c + δi
    • Complexity: O(|Vi|)
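The per-procedure step can be sketched as follows, assuming maximum cardinality search followed by greedy coloring as the chordal coloring routine (a standard technique, not confirmed by the slides as the authors' choice). The search here is a simple quadratic version; the linear bound quoted above needs a bucket-based implementation. The graph and the value of δ are hypothetical.

```python
def mcs_order(adj):
    """Maximum cardinality search: repeatedly pick the unvisited vertex with
    the most visited neighbors. On a chordal graph, the already-visited
    neighbors of each vertex form a clique."""
    weight = {v: 0 for v in adj}
    order = []
    while weight:
        v = max(weight, key=weight.get)
        del weight[v]
        order.append(v)
        for w in adj[v]:
            if w in weight:
                weight[w] += 1
    return order

def color_procedure(adj, delta):
    """Greedy-color a chordal local interference graph, then shift by delta."""
    color = {}
    for v in mcs_order(adj):
        used = {color[w] for w in adj[v] if w in color}
        c = 1
        while c in used:  # smallest color not used by a colored neighbor
            c += 1
        color[v] = c
    # Colors 1..delta are unavailable, so local color c becomes c + delta.
    return {v: c + delta for v, c in color.items()}

# Hypothetical procedure with delta = 2: triangle a-b-c plus d attached to c.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
registers = color_procedure(adj, delta=2)
print(registers)
```

For this graph χ(Gi) = 3, so the highest register index used is δi + χ(Gi) = 5, which is this procedure's contribution to R = MAX{δi + χ(Gi)}.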
Experiments
  • Applications taken from Mediabench and MiBench
    • Written in C
    • Compiled Using Machine SUIF
  • Optimal color assignment
  • Compare to heuristics
    • Color Palette Propagation
      • Top-Down, Bottom-Up [Beidas and Zhu, ASP-DAC’05]
        • Heuristic Color Assignment [Matula and Beck, JACM ’83]
Limitations
  • Global Variables
    • Interfere with all variables in the program
    • Lifetime can still be analyzed
  • Static Local Variables
    • Initialized on first access
    • Hold their values across function calls
  • Function Pointers
    • Resolution is NP-Complete
Conclusion
  • Interprocedural register allocation in HLS
    • Optimal, polynomial-time algorithm
      • Uses SSA Form + Launch/Landing Pads
      • IIG is a chordal graph
      • Scalable – no need to build IIG
      • Significantly faster than sub-optimal heuristics
  • A few limitations
    • Global variables, local static variables
    • Function pointers
      • Resolution is NP-Complete
Related Work
  • Register Allocation in HLS
    • Clique Partitioning/Coloring Problem
      • [Tseng and Siewiorek, ’86]
    • Scheduled DFGs – Interval Graphs
      • [Kurdahi and Parker, ’87]
    • Scheduled Cyclic DFGs – Circular Arc Graphs
      • (NP-Complete)
      • [Stok, ’92]
    • Restrictions on Variable Lifetimes – Chordal Graphs
      • [Springer and Thomas, ’94]
    • Static Single Assignment Form – Chordal Graphs
      • [Brisk et al. 2005/6], [Hack and Goos, 2005/6], [Bouchez et al. 2005]
