
# Philip Brisk - PowerPoint PPT Presentation



### Optimal Polynomial-Time Interprocedural Register Allocation for High-Level Synthesis Using SSA Form

Philip Brisk

Ajay K. Verma

Paolo Ienne

• Register Allocation Overview

• Interprocedural Register Allocation

• Related Work

• SSA Form With Launch and Landing Pads

• Optimal Solution

• Experimental Results

• Conclusion

• For Procedure Pi…

• Build interference graph Gi = (Vi, Ei)

• Vi – One vertex for each variable

• Ei – Edge between each pair of interfering variables

• Two variables interfere if their lifetimes overlap

• Compute the chromatic number χ(Gi)

• Color assignment = Register assignment

• NP-Complete in general

• Local Interferences – Single Procedure

• Static Single Assignment (SSA) Form

• Interference graph is chordal

[Figure: example code that defines and uses the variables X, Y, and Z, shown with the resulting interference graph]
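The steps above (build the interference graph, then color it) can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: lifetimes are given as half-open intervals of instruction indices, the intervals for X, Y, and Z are hypothetical, and all function names are illustrative. The graph is colored greedily in maximum-cardinality-search order, which is known to give an optimal coloring when the graph is chordal.

```python
from itertools import combinations

def build_interference_graph(lifetimes):
    """lifetimes: {variable: (start, end)} as half-open intervals [start, end)."""
    graph = {v: set() for v in lifetimes}
    for a, b in combinations(lifetimes, 2):
        (s1, e1), (s2, e2) = lifetimes[a], lifetimes[b]
        if s1 < e2 and s2 < e1:  # the two lifetimes overlap
            graph[a].add(b)
            graph[b].add(a)
    return graph

def mcs_order(graph):
    """Maximum cardinality search: repeatedly visit the vertex that has
    the most already-visited neighbors."""
    weight = {v: 0 for v in graph}
    order = []
    while weight:
        v = max(weight, key=weight.get)
        order.append(v)
        del weight[v]
        for n in graph[v]:
            if n in weight:
                weight[n] += 1
    return order

def color_graph(graph):
    """Greedy coloring in MCS order; colors are integers starting at 0."""
    color = {}
    for v in mcs_order(graph):
        used = {color[n] for n in graph[v] if n in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color

# Hypothetical lifetimes: X overlaps Y, Y overlaps Z, X and Z are disjoint.
lifetimes = {"X": (0, 3), "Y": (1, 5), "Z": (3, 6)}
graph = build_interference_graph(lifetimes)
registers = color_graph(graph)
```

With these intervals X and Z never overlap, so they can share a register and two registers suffice.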

• Global Interferences

• Variable V is live across a call to procedure P

• V interferes with EVERY local variable in P

• And all variables in all procedures reachable from P

• Must consider all paths through the Call Graph

[Figure: Main defines V, calls P, then uses V; P calls Q. Call graph: Main → P → Q]
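The rule above (a variable live across a call interferes with every local variable in every procedure reachable from the callee) can be sketched as a reachability walk over the call graph. This is an illustrative Python sketch only; the local variable names `p1` and `q1` and both function names are hypothetical.

```python
def reachable(call_graph, start):
    """Procedures reachable from `start` in the call graph, including `start`."""
    seen, stack = set(), [start]
    while stack:
        proc = stack.pop()
        if proc not in seen:
            seen.add(proc)
            stack.extend(call_graph.get(proc, ()))
    return seen

def global_interferences(call_graph, local_vars, live_across):
    """live_across: (variable, callee) pairs, one per variable live across a call.
    Returns the set of global interference edges as frozensets of two names."""
    edges = set()
    for var, callee in live_across:
        for proc in reachable(call_graph, callee):
            for local in local_vars.get(proc, ()):
                edges.add(frozenset((var, local)))
    return edges

# Main calls P, P calls Q; V is live across the call to P.
call_graph = {"Main": ["P"], "P": ["Q"], "Q": []}
local_vars = {"Main": ["V"], "P": ["p1"], "Q": ["q1"]}  # p1, q1 are hypothetical
edges = global_interferences(call_graph, local_vars, [("V", "P")])
```

Here V picks up an edge to the local of P and, through the call chain, to the local of Q as well.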

• Fact:

• No register can hold a local variable across a recursive function call

• Runtime stack is required

• Some exceptions (e.g. static local variables)

• Ignored here

• Call Graph

• Compute strongly connected components (SCCs)

• Collapse each SCC into a single node

• Resulting “Augmented Component Graph” is acyclic
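Collapsing SCCs can be sketched as follows. The slides do not say which SCC algorithm is used; this sketch assumes Kosaraju's algorithm, and the call graph (with mutually recursive `f` and `g`) is a hypothetical example.

```python
def strongly_connected_components(graph):
    """Kosaraju's algorithm. graph: {node: [successors]}.
    Returns {node: representative node of its SCC}."""
    order, seen = [], set()

    def visit(node):  # first pass: record DFS finishing order
        seen.add(node)
        for succ in graph.get(node, ()):
            if succ not in seen:
                visit(succ)
        order.append(node)

    for node in graph:
        if node not in seen:
            visit(node)

    reverse = {node: [] for node in graph}
    for node, succs in graph.items():
        for succ in succs:
            reverse.setdefault(succ, []).append(node)

    component = {}
    for root in reversed(order):  # second pass on the reversed graph
        if root in component:
            continue
        component[root] = root
        stack = [root]
        while stack:
            node = stack.pop()
            for pred in reverse.get(node, ()):
                if pred not in component:
                    component[pred] = root
                    stack.append(pred)
    return component

def condense(graph):
    """Collapse each SCC into one node; the result is acyclic."""
    comp = strongly_connected_components(graph)
    dag = {c: set() for c in set(comp.values())}
    for node, succs in graph.items():
        for succ in succs:
            if comp[node] != comp[succ]:
                dag[comp[node]].add(comp[succ])
    return comp, dag

# Hypothetical call graph: f and g are mutually recursive.
calls = {"main": ["f", "g"], "f": ["g"], "g": ["f", "h"], "h": []}
comp, dag = condense(calls)
```

After condensation `f` and `g` share one node, and the remaining edges run main → {f, g} → h.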

• Interprocedural Interference Graph (IIG)

• Undirected graph G = (V, E)

• V – All variables in all procedures

• E – Local AND global interferences

• Compute chromatic number χ(G)

• Interprocedural Register Allocation in HLS

• Color IIG with heuristic [Vemuri et al., TODAES ’02]

• IIG is large

• Polynomial heuristics are still slow

• Scalable Approach [Beidas and Zhu, ASP-DAC ’05]

• Color each procedure individually

• Use any heuristic you want

• Use any intermediate representation you want

• Propagate global interferences at call points

• IIG is never built

• Interprocedural register allocation

• Optimal, polynomial-time algorithm

• Scalable

• IIG is never built

• If built, it would be chordal

• Each Procedure colored individually

• SSA Form – interference graph is chordal

• Special case of [Beidas and Zhu, ASP-DAC ’05]

• Top-down color propagation

• Novel SSA-based intermediate representation

• Chordal color assignment (with offset)

• P – Set of procedures in the application

• Pi – Procedure

• ck – Call point (e.g. ck: Call Pj)

• L(ck) – Set of variables live across ck

Preallocation of Global Registers

• Global registers hold variables that are live across procedure calls

• How many do we need?


• Compute: δ – Number of variables live…

• At the entry of a procedure

• Across a call point

• For a call point ck in procedure Pi (δi is known):

δk = δi + |L(ck)|

• For a procedure Pi called from call points c1…cm:

δi = MAX {δk}, 1 ≤ k ≤ m (i.e. over all points that call Pi)

Example

[Figure: call graph over procedures P1…P6 with call points c7…c14, annotated step by step with the δ values derived below]

|L(ci)| for c7, c8, …, c14: 1, 2, 3, 2, 5, 3, 3, 2

δ1 = 0

δ7 = |L(c7)| + δ1 = 1 + 0 = 1

δ8 = |L(c8)| + δ1 = 2 + 0 = 2

δ9 = |L(c9)| + δ1 = 3 + 0 = 3

δ10 = |L(c10)| + δ1 = 2 + 0 = 2

δ11 = |L(c11)| + δ1 = 5 + 0 = 5

δ2 = MAX{δ7, δ8} = MAX{1, 2} = 2

δ3 = MAX{δ9} = MAX{3} = 3

δ4 = MAX{δ10} = MAX{2} = 2

δ12 = |L(c12)| + δ2 = 3 + 2 = 5

δ13 = |L(c13)| + δ3 = 3 + 3 = 6

δ14 = |L(c14)| + δ4 = 2 + 2 = 4

δ5 = MAX{δ12, δ13} = MAX{5, 6} = 6

δ6 = MAX{δ11, δ14} = MAX{5, 4} = 5
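The δ recurrences can be evaluated in one top-down pass over the acyclic call graph. The sketch below is one possible way to do it, not the authors' code. It uses the call points and |L(ck)| values of the example; the caller and callee of each call point are read off the δ equations, and the function name `compute_delta` is illustrative. It relies on `graphlib.TopologicalSorter` from the Python standard library (3.9 or later).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Call point -> (caller, callee, |L(ck)|), taken from the example.
call_points = {
    "c7": ("P1", "P2", 1), "c8": ("P1", "P2", 2), "c9": ("P1", "P3", 3),
    "c10": ("P1", "P4", 2), "c11": ("P1", "P6", 5), "c12": ("P2", "P5", 3),
    "c13": ("P3", "P5", 3), "c14": ("P4", "P6", 2),
}

def compute_delta(call_points):
    """Return δ for every procedure and every call point."""
    callers = {}  # procedure -> set of procedures that call it
    for caller, callee, _ in call_points.values():
        callers.setdefault(callee, set()).add(caller)
        callers.setdefault(caller, set())
    delta = {}
    # static_order() yields every caller before any procedure it calls.
    for proc in TopologicalSorter(callers).static_order():
        incoming = [c for c, (_, callee, _) in call_points.items() if callee == proc]
        # δi = MAX over all call points that call Pi (0 if there are none).
        delta[proc] = max((delta[c] for c in incoming), default=0)
        for c, (caller, _, live) in call_points.items():
            if caller == proc:
                delta[c] = delta[proc] + live  # δk = δi + |L(ck)|
    return delta

delta = compute_delta(call_points)
num_global_registers = max(delta.values())  # N = MAX{δi}
```

For the example this reproduces δ2 = 2, δ3 = 3, δ4 = 2, δ5 = 6, δ6 = 5, and N = 6.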

Preallocation of Global Registers

• When procedure Pi is called…

• At most δi variables live across calls leading to Pi

• Holds for every path in the call graph

• How to ensure that all variables live across calls leading to Pi are assigned to the right register?

N = MAX {δi} – Number of global registers allocated

T = {T1, …, TN}

• Procedure Pi calls Pj; (m = δi)

• Assign variables live across calls leading to Pi to T1…Tm

• Let ck be the call point; n = |L(ck)|

• Parallel copy placed before the call

(Tm+1…Tm+n) ← ψ(L(ck))

• Copy the values back after the call

L(ck) ← ψ(Tm+1…Tm+n)
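The copies placed before and after a call can be generated mechanically once δi is known. The following Python sketch only illustrates the register numbering; the function name, the pair representation of a copy, and the variable names `a`, `b`, `c` are assumptions made for this example.

```python
def launch_and_landing_pads(delta_i, live_vars):
    """For one call point ck in Pi, with m = δi and n = |L(ck)|,
    return the parallel copies as lists of (destination, source) pairs."""
    m = delta_i
    regs = [f"T{m + 1 + k}" for k in range(len(live_vars))]  # Tm+1 … Tm+n
    launch = list(zip(regs, live_vars))    # before the call: T ← variable
    landing = list(zip(live_vars, regs))   # after the call: variable ← T
    return launch, landing

# Hypothetical call point: δi = 2 and three variables live across the call.
launch, landing = launch_and_landing_pads(2, ["a", "b", "c"])
```

With δi = 2 the live variables occupy T3, T4, and T5, leaving T1 and T2 to the variables live across the calls that led to Pi.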

• Theorem:

• All global interferences involve at least one global register

• Corollary:

• Local variables in distinct procedures do not interfere

• Corollary:

• No local variable in “main” has a global interference

• Theorem:

• Every variable defined locally in Pi (m = δi)

• Interferes with global registers T1…Tm

• Does NOT interfere with global registers Tm+1, … TN

=> Can assign local vars in Pi to global registers Tm+1, … TN

Procedure: A

V ← …

Call B

W ← …

… ← V

X ← …

… ← W

Y ← …

… ← X

Call B

… ← Y

Procedure: B

Z ← …

… ← Z

[Figure: interprocedural interference graph over V, W, X, Y, and Z]

Chromatic Number = 3

Procedure: A

V ← …

T1 ← Ψ(V)

Call B

V ← Ψ⁻¹(T1)

W ← …

… ← V

X ← …

… ← W

Y ← …

… ← X

T1 ← Ψ(Y)

Call B

Y ← Ψ⁻¹(T1)

… ← Y

Procedure: B

Z ← …

… ← Z

[Figure: interprocedural interference graph over V, W, X, Y, Z, and T1 after the copies are inserted]

Chromatic Number = 2


Characterizing the IIG

• Theorem:

• T is a clique in the IIG

• Theorem:

• IIG is chordal

• Theorem:

• Chromatic Number of the IIG is: R = MAX{δi + χ(Gi)}

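[Figure: structure of the IIG for the example, with N = 6. The global registers T1…T6 form a clique, and the local interference graphs G1…G6 (δ1 = 0, δ2 = 2, δ3 = 3, δ4 = 2, δ5 = 6, δ6 = 5) attach to it through global interferences: Tj interferes with each local variable in Gi for 1 ≤ j ≤ δi.]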

• Use SSA+LLP Form, but DON’T build the IIG

• For Pi, colors in the range 1..δi are unavailable

• Color the local (chordal) interference graph Gi of Pi

• Complexity: O(|Vi| + |Ei|)

• For each vertex in Pi, replace color c with c + δi

• Complexity: O(|Vi|)
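The two steps above (color each local graph, then shift by δi) can be sketched as follows. This is a minimal sketch and not the authors' implementation: the simple maximum-cardinality-search loop here does not reach the linear bound stated on the slide, the local interference edges of A are read off the example listing, and δA = 0 and δB = 1 are assumptions derived from that example.

```python
def chordal_coloring(graph):
    """Greedy coloring in maximum-cardinality-search order; colors start at 1.
    This is optimal when `graph` is chordal."""
    weight = {v: 0 for v in graph}
    color = {}
    while weight:
        v = max(weight, key=weight.get)  # most already-colored neighbors
        del weight[v]
        used = {color[n] for n in graph[v] if n in color}
        c = 1
        while c in used:
            c += 1
        color[v] = c
        for n in graph[v]:
            if n in weight:
                weight[n] += 1
    return color

def allocate(local_graphs, delta):
    """Color every procedure on its own, then replace color c with c + δi."""
    assignment = {}
    for proc, graph in local_graphs.items():
        for var, c in chordal_coloring(graph).items():
            assignment[(proc, var)] = c + delta[proc]
    return assignment

# Assumed inputs for the A/B example: local interferences only.
local_graphs = {
    "A": {"V": {"W"}, "W": {"V", "X"}, "X": {"W", "Y"}, "Y": {"X"}},
    "B": {"Z": set()},
}
delta = {"A": 0, "B": 1}
assignment = allocate(local_graphs, delta)
registers_needed = max(assignment.values())  # R = MAX{δi + χ(Gi)}
```

Z lands in register 2 because register 1 is reserved for the value held across the call to B, and the total matches the chromatic number of 2 shown for the transformed example.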

• Applications taken from Mediabench and MiBench

• Written in C

• Compiled Using Machine SUIF

• Optimal color assignment

• Compare to heuristics

• Color Palette Propagation

• Top-Down, Bottom-Up [Beidas and Zhu, ASP-DAC’05]

• Heuristic Color Assignment [Matula and Beck, JACM ’83]

[Figure: Registers Allocated (Normalized to Optimal)]

[Figure: Runtime (Normalized to Optimal)]

[Figure: Runtime of Pegwit (Normalized to Optimal)]

• Global Variables

• Interfere with all variables in the program

• Lifetime can still be analyzed

• Static Local Variables

• Initialized on first access

• Hold their values across function calls

• Function Pointers

• Resolution is NP-Complete

• Interprocedural register allocation in HLS

• Optimal, polynomial-time algorithm

• Uses SSA Form + Launch/Landing Pads

• IIG is a chordal graph

• Scalable – no need to build IIG

• Significantly faster than sub-optimal heuristics

• A few limitations

• Global variables, local static variables

• Function pointers

• Resolution is NP-Complete

• Register Allocation in HLS

• Clique Partitioning/Coloring Problem

• [Tseng and Siewiorek, ’86]

• Scheduled DFGs – Interval Graphs

• [Kurdahi and Parker, ’87]

• Scheduled Cyclic DFGs – Circular Arc Graphs

• (NP-Complete)

• [Stok, ’92]

• Restrictions on Variable Lifetimes – Chordal Graphs

• [Springer and Thomas, ’94]

• Static Single Assignment Form – Chordal Graphs

• [Brisk et al. 2005/6], [Hack and Goos, 2005/6], [Bouchez et al. 2005]