### Optimal Polynomial-Time Interprocedural Register Allocation for High-Level Synthesis Using SSA Form


Philip Brisk

Ajay K. Verma

Paolo Ienne

Outline

- Register Allocation Overview
- Interprocedural Register Allocation
- Related Work
- SSA Form With Launch and Landing Pads
- Optimal Solution
- Experimental Results
- Conclusion

Modeling Register Allocation

- For Procedure Pi…
  - Build interference graph Gi = (Vi, Ei)
    - Vi – One vertex for each variable
    - Ei – Edge between each pair of interfering variables
      - Two variables interfere if their lifetimes overlap
  - Compute the chromatic number χ(Gi)
    - Color assignment = Register assignment
    - NP-Complete in general
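
As a rough illustration of the interference-graph model, the sketch below builds Gi from variable lifetimes. It assumes each lifetime is a single half-open interval `[start, end)` of schedule steps, which is a simplification; the variable names and intervals are hypothetical.

```python
from itertools import combinations

def build_interference_graph(lifetimes):
    """Return (vertices, edges) for lifetimes given as var -> (start, end)."""
    vertices = set(lifetimes)
    edges = set()
    for a, b in combinations(sorted(lifetimes), 2):
        a_start, a_end = lifetimes[a]
        b_start, b_end = lifetimes[b]
        # Two half-open intervals overlap when each starts before the other ends.
        if a_start < b_end and b_start < a_end:
            edges.add((a, b))
    return vertices, edges

# Hypothetical lifetimes: x and y overlap, y and z overlap, x and z do not.
lifetimes = {"x": (0, 4), "y": (2, 6), "z": (5, 8)}
vertices, edges = build_interference_graph(lifetimes)
```

Here `x` and `z` never overlap, so they may share a register.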

Local Interferences

- Local Interferences – Single Procedure
  - Overlapping lifetimes
- Static Single Assignment (SSA) Form
  - Interference graph is chordal

[Figure: example with variables X, Y, and Z]

Global Interferences

- Global Interferences
  - Variable V is live across a call to procedure P
  - V interferes with EVERY local variable in P
  - And all variables in all procedures reachable from P
    - Must consider all paths through the Call Graph

[Figure: in Main, V appears before and after "Call P"; P contains "Call Q". Call graph: Main → P → Q]
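
A minimal sketch of how global interferences could be enumerated for the Main → P → Q example, assuming an acyclic call graph stored as a dictionary. The local variable names `p1` and `q1` are hypothetical placeholders for whatever P and Q define.

```python
def reachable(call_graph, start):
    """Procedures reachable from `start`, including `start` itself."""
    seen = set()
    stack = [start]
    while stack:
        proc = stack.pop()
        if proc in seen:
            continue
        seen.add(proc)
        stack.extend(call_graph.get(proc, ()))
    return seen

def global_interferences(call_graph, locals_of, live_across):
    """live_across is a list of (variable, callee) pairs."""
    edges = set()
    for var, callee in live_across:
        for proc in reachable(call_graph, callee):
            for local in locals_of[proc]:
                edges.add((var, local))
    return edges

call_graph = {"Main": ["P"], "P": ["Q"], "Q": []}
locals_of = {"Main": ["V"], "P": ["p1"], "Q": ["q1"]}  # p1, q1 are hypothetical
edges = global_interferences(call_graph, locals_of, [("V", "P")])
```

V interferes with the locals of P and, through the call to Q, with the locals of Q as well.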

Global Interferences and Recursion

- Fact:
  - No register can hold a local variable across a recursive function call
    - Runtime stack is required
    - Some exceptions (e.g. static local variables)
      - Ignored here
- Call Graph
  - Compute strongly connected components (SCCs)
  - Collapse each SCC into a single node
  - Resulting “Augmented Component Graph” is acyclic
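
The SCC collapse can be sketched as follows. This uses Tarjan's algorithm, which is one standard choice and not necessarily what the authors used; the call graph and procedure names are hypothetical, and every procedure is assumed to appear as a key.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm (recursive); returns a list of frozensets."""
    index, low = {}, {}
    stack, on_stack = [], set()
    components = []
    counter = 0

    def visit(v):
        nonlocal counter
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            comp = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.add(w)
                if w == v:
                    break
            components.append(frozenset(comp))

    for v in graph:
        if v not in index:
            visit(v)
    return components

def condense(graph):
    """Collapse each SCC into one node; the result is acyclic."""
    comps = strongly_connected_components(graph)
    owner = {v: c for c in comps for v in c}
    dag = {c: set() for c in comps}
    for v, callees in graph.items():
        for w in callees:
            if owner[v] != owner[w]:
                dag[owner[v]].add(owner[w])
    return dag

# Hypothetical call graph: f and g are mutually recursive.
call_graph = {"main": ["f", "h"], "f": ["g"], "g": ["f", "h"], "h": []}
dag = condense(call_graph)
```

The mutually recursive pair `f`, `g` ends up as a single node, so the remaining graph has no cycles.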

Interprocedural Register Allocation

- Interprocedural Interference Graph (IIG)
  - Undirected graph G = (V, E)
    - V – All variables in all procedures
    - E – Local AND global interferences
  - Compute chromatic number χ(G)

Related Work

- Interprocedural Register Allocation in HLS
  - Color IIG with heuristic [Vemuri et al., TODAES ’02]
    - IIG is large
    - Polynomial heuristics are still slow
  - Scalable Approach [Beidas and Zhu, ASP-DAC ’05]
    - Color each procedure individually
      - Use any heuristic you want
      - Use any intermediate representation you want
    - Propagate global interferences at call points
    - IIG is never built

Contribution

- Interprocedural register allocation
  - Optimal, polynomial-time algorithm
  - Scalable
    - IIG is never built
      - If built, it would be chordal
    - Each procedure colored individually
      - SSA Form – interference graph is chordal
- Special case of [Beidas and Zhu, ASP-DAC ’05]
  - Top-down color propagation
  - Novel SSA-based intermediate representation
  - Chordal color assignment (with offset)

Preallocation of Global Registers

- Global registers hold variables that are live across procedure calls
- How many do we need?

Notation:

- P – Set of procedures in the application
- Pi – Procedure
- ck – Call point (in the figure, ck: Call Pj appears inside Pi)
- L(ck) – Set of variables live across ck

Preallocation of Global Registers

- Compute: δ – Number of variables live…
  - At the entry of a procedure
  - Across a call point
- For a procedure Pi that is called from m call points:
  - δi = MAX{δk}, 1 ≤ k ≤ m (i.e. over all points that call Pi)
- For a call point ck inside procedure Pi (δi is known):
  - δk = δi + |L(ck)|

Example

[Figure: call graph of procedures P1–P6 with call points c7–c14, annotated step by step with the δ values]

| ci | \|L(ci)\| |
|-----|-----|
| c7 | 1 |
| c8 | 2 |
| c9 | 3 |
| c10 | 2 |
| c11 | 5 |
| c12 | 3 |
| c13 | 3 |
| c14 | 2 |

Call points (δk = |L(ck)| + δi, where Pi is the procedure containing ck):

- δ7 = |L(c7)| + δ1 = 1 + 0 = 1
- δ8 = |L(c8)| + δ1 = 2 + 0 = 2
- δ9 = |L(c9)| + δ1 = 3 + 0 = 3
- δ10 = |L(c10)| + δ1 = 2 + 0 = 2
- δ11 = |L(c11)| + δ1 = 5 + 0 = 5
- δ12 = |L(c12)| + δ2 = 3 + 2 = 5
- δ13 = |L(c13)| + δ3 = 3 + 3 = 6
- δ14 = |L(c14)| + δ4 = 2 + 2 = 4

Procedures (δi = MAX over the call points that call Pi):

- δ1 = 0
- δ2 = MAX{δ7, δ8} = MAX{1, 2} = 2
- δ3 = MAX{δ9} = MAX{3} = 3
- δ4 = MAX{δ10} = MAX{2} = 2
- δ5 = MAX{δ12, δ13} = MAX{5, 6} = 6
- δ6 = MAX{δ11, δ14} = MAX{5, 4} = 5
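
The δ recurrences for this example can be checked with a short script. This is only a sketch: the caller and callee of each call point are read off the equations in the slides (for instance, δ12 = |L(c12)| + δ2 places c12 inside P2, and δ5 = MAX{δ12, δ13} makes P5 its callee), the dictionary layout is an assumption, and the call graph is assumed to be acyclic.

```python
from functools import lru_cache

# call point -> (caller, callee, |L(ck)|), as read from the example
CALL_POINTS = {
    "c7": ("P1", "P2", 1),
    "c8": ("P1", "P2", 2),
    "c9": ("P1", "P3", 3),
    "c10": ("P1", "P4", 2),
    "c11": ("P1", "P6", 5),
    "c12": ("P2", "P5", 3),
    "c13": ("P3", "P5", 3),
    "c14": ("P4", "P6", 2),
}

@lru_cache(maxsize=None)
def delta_proc(proc):
    """delta_i = MAX over all call points that call the procedure (0 if none)."""
    incoming = [c for c, (_, callee, _) in CALL_POINTS.items() if callee == proc]
    if not incoming:
        return 0
    return max(delta_call(c) for c in incoming)

@lru_cache(maxsize=None)
def delta_call(call_point):
    """delta_k = delta of the calling procedure + |L(ck)|."""
    caller, _, live = CALL_POINTS[call_point]
    return delta_proc(caller) + live

deltas = {p: delta_proc(p) for p in ("P1", "P2", "P3", "P4", "P5", "P6")}
num_global_registers = max(deltas.values())  # N = MAX{delta_i}
```

The result reproduces the values on the slides, with N = 6 global registers.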

Preallocation of Global Registers

- When procedure Pi is called…
  - At most δi variables are live across calls leading to Pi
  - Holds for every path in the call graph
- How to ensure that all variables live across calls leading to Pi are assigned to the right register?

N = MAX{δi} over all Pi ∈ P – Number of global registers allocated

T = {T1, …, TN}

Launch and Landing Pads

- Procedure Pi calls Pj (m = δi)
  - Assign variables live across calls leading to Pi to T1…Tm
  - Let ck be the call point; n = |L(ck)|
- Launch Pad
  - Parallel copy placed before the call: (Tm+1…Tm+n) ← ψ(L(ck))
- Landing Pad
  - Copy the values back after the call: L(ck) ← ψ(Tm+1…Tm+n)
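
A small sketch of how the two pads for one call point might be generated. Each parallel copy is modeled as a list of `(destination, source)` pairs that are understood to execute simultaneously; the function name, the variable names, and the order in which live variables are mapped to registers are all assumptions.

```python
def make_pads(delta_i, live_vars):
    """Return (launch, landing) parallel copies for a call point in Pi.

    With m = delta_i and n = len(live_vars), the live variables are
    placed in global registers T(m+1) .. T(m+n) around the call.
    """
    m = delta_i
    regs = [f"T{m + k}" for k in range(1, len(live_vars) + 1)]
    launch = list(zip(regs, live_vars))    # before the call: Tm+k <- variable
    landing = list(zip(live_vars, regs))   # after the call: variable <- Tm+k
    return launch, landing

# Hypothetical call point in a procedure with delta_i = 2 and L(ck) = {a, b, c}
launch, landing = make_pads(2, ["a", "b", "c"])
```

Registers T1 and T2 are left alone because they already hold variables that are live across the calls leading to Pi.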

Theoretical Consequences of Launch and Landing Pads

- Theorem:
  - All global interferences involve at least one global register
- Corollary:
  - Local variables in distinct procedures do not interfere
- Corollary:
  - No local variable in “main” has a global interference
- Theorem:
  - Every variable defined locally in Pi (m = δi)
    - Interferes with global registers T1…Tm
    - Does NOT interfere with global registers Tm+1, …, TN
    - ⇒ Local variables in Pi can be assigned to global registers Tm+1, …, TN

Reducing the Chromatic Number

Procedure A: V … ; Call B ; W … ; … V ; X … ; … W ; Y … ; … X ; Call B ; … Y

Procedure B: Z … ; … Z

[Figure: lifetimes and interference graph for V, W, X, Y, and Z]

Chromatic Number = 3

Reducing the Chromatic Number

Procedure A: V … ; T1 ← Ψ(V) ; Call B ; V ← Ψ⁻¹(T1) ; W … ; … V ; X … ; … W ; Y … ; … X ; T1 ← Ψ(Y) ; Call B ; Y ← Ψ⁻¹(T1) ; … Y

Procedure B: Z … ; … Z

[Figure: lifetimes and interference graph for V, W, X, Y, Z, and T1]

Chromatic Number = 2
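
The drop from 3 to 2 colors can be checked by brute force. The edge lists below are one reading of the two versions of procedures A and B, not data taken from the slides: without pads, V and Y are live across the calls to B and so interfere with Z, closing a 5-cycle; with pads, only T1 is live across the calls.

```python
from itertools import product

def chromatic_number(vertices, edges):
    """Smallest k admitting a proper k-coloring (exhaustive; tiny graphs only)."""
    vertices = list(vertices)
    for k in range(1, len(vertices) + 1):
        for colors in product(range(k), repeat=len(vertices)):
            coloring = dict(zip(vertices, colors))
            if all(coloring[a] != coloring[b] for a, b in edges):
                return k
    return 0  # empty graph

# Assumed interference edges without launch and landing pads (a 5-cycle).
before = [("V", "W"), ("W", "X"), ("X", "Y"), ("V", "Z"), ("Y", "Z")]
# Assumed interference edges with pads: only T1 interferes with Z.
after = [("V", "W"), ("W", "X"), ("X", "Y"), ("T1", "Z")]

chi_before = chromatic_number("VWXYZ", before)
chi_after = chromatic_number(["V", "W", "X", "Y", "Z", "T1"], after)
```

The 5-cycle is also not chordal, whereas the graph with pads is a path plus one isolated edge.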

Characterizing the IIG

- Theorem:
  - T is a clique in the IIG
- Theorem:
  - IIG is chordal
- Theorem:
  - Chromatic number of the IIG is R = MAX{δi + χ(Gi)} over all Pi ∈ P

Example

- N = 6: the global registers T1, …, T6 form a clique
- Local interference graphs G1, …, G6, with δ1 = 0, δ2 = 2, δ3 = 3, δ4 = 2, δ5 = 6, δ6 = 5
- Global interference: Tj interferes with each local variable in Gi for 1 ≤ j ≤ δi

Coloring Algorithm

- Use SSA + launch/landing pad (LLP) form, but DON’T build the IIG
- For Pi, colors in the range 1..δi are unavailable
  - Color the local (chordal) interference graph Gi of Pi
    - Complexity: O(|Vi| + |Ei|)
  - For each vertex in Pi, replace color c with c + δi
    - Complexity: O(|Vi|)
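
The two steps can be sketched as below. The slides do not say which chordal coloring procedure is used; this sketch assumes greedy coloring along a maximum cardinality search order, a standard method that is optimal on chordal graphs. For brevity it picks the next vertex with `max`, so it runs in quadratic rather than linear time. The graph and the value of δi are hypothetical.

```python
def color_chordal(vertices, edges):
    """Greedy coloring in maximum cardinality search order (colors start at 1)."""
    neighbors = {v: set() for v in vertices}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    weight = {v: 0 for v in vertices}  # number of already-colored neighbors
    color = {}
    while weight:
        v = max(weight, key=weight.get)
        del weight[v]
        used = {color[u] for u in neighbors[v] if u in color}
        c = 1
        while c in used:
            c += 1
        color[v] = c
        for u in neighbors[v]:
            if u in weight:
                weight[u] += 1
    return color

def allocate(vertices, edges, delta_i):
    """Color Gi locally, then shift every color past the delta_i reserved ones."""
    local = color_chordal(vertices, edges)
    return {v: c + delta_i for v, c in local.items()}

# Hypothetical Gi: triangle a-b-c with d attached to c, and delta_i = 2.
local_vertices = ["a", "b", "c", "d"]
local_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
registers = allocate(local_vertices, local_edges, 2)
```

In this example χ(Gi) = 3, so the procedure uses registers 3 through 5, which agrees with δi + χ(Gi) = 5.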

Experiments

- Applications taken from Mediabench and MiBench
  - Written in C
  - Compiled using Machine SUIF
- Optimal color assignment
- Compare to heuristics
  - Color Palette Propagation
    - Top-Down, Bottom-Up [Beidas and Zhu, ASP-DAC ’05]
  - Heuristic Color Assignment [Matula and Beck, JACM ’83]

Registers Allocated (Normalized to Optimal)

Runtime (Normalized to Optimal)

Runtime of Pegwit (Normalized to Optimal)

Limitations

- Global Variables
  - Interfere with all variables in the program
  - Lifetime can still be analyzed
- Static Local Variables
  - Initialized on first access
  - Hold their values across function calls
- Function Pointers
  - Resolution is NP-Complete

Conclusion

- Interprocedural register allocation in HLS
  - Optimal, polynomial-time algorithm
    - Uses SSA Form + Launch/Landing Pads
    - IIG is a chordal graph
    - Scalable – no need to build IIG
    - Significantly faster than sub-optimal heuristics
- A few limitations
  - Global variables, local static variables
  - Function pointers
    - Resolution is NP-Complete

Related Work

- Register Allocation in HLS
  - Clique Partitioning/Coloring Problem
    - [Tseng and Siewiorek, ’86]
  - Scheduled DFGs – Interval Graphs
    - [Kurdahi and Parker, ’87]
  - Scheduled Cyclic DFGs – Circular Arc Graphs
    - (NP-Complete)
    - [Stok, ’92]
  - Restrictions on Variable Lifetimes – Chordal Graphs
    - [Springer and Thomas, ’94]
  - Static Single Assignment Form – Chordal Graphs
    - [Brisk et al. 2005/6], [Hack and Goos, 2005/6], [Bouchez et al. 2005]
