Register Pressure in Instruction Level Parallelism
TOUATI SidAhmedAli

Outline
Prologue
Part one: Basic Blocks
Part two: Simple Innermost Loops
Epilogue

Memory Bottleneck
From [Lin et al 01], in HPCA 2001. Simulated performance on an Alpha 21364 processor (1.6 GHz). Recent Compaq compiler (peak optimization compiler flags).
Thesis defense
To tolerate
Scheduling + Register Allocation
We do not advocate this method
First Strategy: Register Pressure Management
[Figure: flow diagram with the boxes Register Constraints, Register Pressure Management, Modified DDG, Register Allocation and Code Scheduling]
Minimize Critical Path Increase
Second Strategy: Schedule Independent Register Allocation
[Figure: flow diagram with the boxes Register Constraints, Early Register Allocation, Allocated DDG and Code Scheduling]
Minimize Critical Path Increase
Local Register Requirement
[Figure: example DAG built from ld, +, x and st operations, with labels 1 to 12]
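The register requirement of a basic block depends on the schedule chosen for it. The sketch below illustrates this under stated assumptions: a value is taken to occupy a register from the cycle after its definition up to and including the cycle of its last reader, and the five-operation DAG (three loads summed by two additions) is a hypothetical example, not the DAG of the slide. The names `register_need`, `schedule` and `consumers` are introduced here for illustration only.

```python
def register_need(schedule, consumers):
    """Maximum number of simultaneously live values for a given schedule.

    schedule:  dict mapping each operation to its issue cycle.
    consumers: dict mapping each value-producing operation to the list
               of operations that read its result.
    """
    # Assumption: a value is live on cycles def+1 .. last_read (inclusive).
    intervals = []
    for op, readers in consumers.items():
        if readers:
            start = schedule[op]
            end = max(schedule[r] for r in readers)
            intervals.append((start, end))

    need = 0
    for t in range(min(schedule.values()), max(schedule.values()) + 1):
        live = sum(1 for start, end in intervals if start < t <= end)
        need = max(need, live)
    return need


# Hypothetical DAG: d = a + b, e = d + c, where a, b, c are loads.
consumers = {"a": ["d"], "b": ["d"], "c": ["e"], "d": ["e"]}

early_loads = {"a": 0, "b": 1, "c": 2, "d": 3, "e": 4}
late_load = {"a": 0, "b": 1, "d": 2, "c": 3, "e": 4}

print(register_need(early_loads, consumers))  # 3
print(register_need(late_load, consumers))    # 2
```

Both schedules respect the same dependences, yet delaying the third load lowers the requirement from 3 to 2 registers.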
Killing Function...
[Figure: example DAG built from ld, +, x and st operations, shown with its killing function]
Disjoint Value DAG: interval order
Saturating Killing Set
[Figure: labels S, T, T’ and TT’, with three groups of descendant values]
Example of Early RA
[Figure: example DAG built from ld, + and x operations]
Register Allocation is a minimal chain decomposition
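A minimal chain decomposition of a partial order can be computed with a maximum bipartite matching, which is the textbook route and not necessarily the algorithm used in the thesis. The sketch below assumes the order is given already transitively closed, with `before[u]` holding every value that may reuse the register of `u`; each resulting chain then stands for one register. The function name and the four-value example are hypothetical.

```python
def min_chain_decomposition(nodes, before):
    """Split a partial order into the fewest chains.

    nodes:  list of values.
    before: dict u -> set of values v with u < v in the (transitively
            closed) order, i.e. v may reuse the register of u.
    """
    succ_of = {}   # chain link u -> v chosen by the matching
    pred_of = {}   # reverse link v -> u

    def augment(u, seen):
        # Kuhn's augmenting-path step: try to give u a successor.
        for v in before.get(u, ()):
            if v in seen:
                continue
            seen.add(v)
            if v not in pred_of or augment(pred_of[v], seen):
                succ_of[u] = v
                pred_of[v] = u
                return True
        return False

    for u in nodes:
        augment(u, set())

    # Every value without a matched predecessor starts a chain.
    chains = []
    for u in nodes:
        if u not in pred_of:
            chain = [u]
            while chain[-1] in succ_of:
                chain.append(succ_of[chain[-1]])
            chains.append(chain)
    return chains


# Hypothetical order: a < c < d, a < d, b < d.
nodes = ["a", "b", "c", "d"]
before = {"a": {"c", "d"}, "b": {"d"}, "c": {"d"}}
chains = min_chain_decomposition(nodes, before)
print(len(chains))  # 2
```

The number of chains equals the number of values minus the size of the matching, so a larger matching means fewer registers.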
RS and RF are analysed before ILP scheduling: the DAG becomes free from register constraints.
RS management maximizes the register requirement in order to minimize the number of introduced false dependences.
RF analysis makes it possible to check whether spill code is unnecessary.
Our heuristics are nearly optimal (empirical results).
Software Pipelining Motif
[Figure: three successive iterations (1st, 2nd, 3rd) of the operations a, b, c, d and e, with period h; axes: iterations and time (0 to 17); a motif table with columns 2, 1, 0 and rows 0 to 3; other labels: cn, rn, L]
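The relation between one iteration's schedule and the repeating motif can be sketched as follows, assuming that successive iterations are issued exactly h cycles apart. The offsets given to the operations a to e are hypothetical and are not read from the slide; `issue_cycle` and `motif_position` are names introduced for this illustration.

```python
def issue_cycle(offset, iteration, h):
    """Cycle at which an operation of a given iteration is issued."""
    return offset + iteration * h


def motif_position(offset, h):
    """(stage, row) of an operation inside the repeating motif."""
    return divmod(offset, h)


# Hypothetical offsets within one iteration, with period h = 4.
offsets = {"a": 0, "b": 1, "c": 4, "d": 6, "e": 9}
h = 4
for op, off in offsets.items():
    stage, row = motif_position(off, h)
    cycles = [issue_cycle(off, i, h) for i in range(3)]
    print(op, "stage", stage, "row", row, "cycles", cycles)
```

An operation with offset 9 and h = 4 falls in stage 2, row 1 of the motif, and its third instance is issued at cycle 17.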
Cyclic Register Requirement
[Figure: lifetimes of the values v1, v2 and v3 over the iterations It i, It i+1 and It i+2, with h=4; time axis from 0 to 20; a second view with positions 0 to 3]
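In a periodic schedule, lifetimes from different iterations overlap, so the register requirement is counted per slot of the period. The sketch below folds each lifetime modulo h and takes the busiest slot. It assumes that a value occupies a register on cycles start+1 to end of its own iteration, and the three lifetimes are hypothetical, not those drawn on the slide.

```python
def cyclic_register_need(lifetimes, h):
    """Largest number of simultaneously live values in steady state.

    lifetimes: dict value -> (start, end) for one iteration; the value is
               assumed to occupy a register during cycles start+1 .. end.
    h:         period of the schedule, in cycles.
    """
    pressure = [0] * h
    for start, end in lifetimes.values():
        # A lifetime longer than h overlaps with itself across iterations,
        # which the modulo fold counts once per overlapping instance.
        for cycle in range(start + 1, end + 1):
            pressure[cycle % h] += 1
    return max(pressure)


# Hypothetical lifetimes for three values, with h = 4.
lifetimes = {"v1": (0, 3), "v2": (1, 7), "v3": (2, 4)}
print(cyclic_register_need(lifetimes, 4))  # 4
```

Here v2 lives for six cycles, longer than the period, so two of its instances are alive at once in slots 2 and 3.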
In_fraction_of_h Intervals
[Figure: intervals of the values v1, v2 and v3, with h=4]
Motivating Example
[Figure: operations u and v over the iterations It i, It i+1 and following iterations, with registers R1 and R2]
Rotating Register Files
[Figure: axes iteration and physical registers (r0 to r5); period h; label R1=…]
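The renaming performed by a rotating register file can be sketched as a modular mapping from logical to physical registers. The model below follows the Itanium-style convention, in which the rotation base is decremented at each iteration; whether the slide uses the same direction is an assumption. The function name and the file size of 6 (matching r0 to r5) are chosen for illustration.

```python
def physical_register(logical, iteration, size):
    """Physical register behind a logical name at a given iteration.

    Assumption: the rotation base decreases by one per iteration, so a
    value written to logical register k is read back as k+1 one
    iteration later.
    """
    return (logical - iteration) % size


size = 6  # physical registers r0 .. r5
for it in range(3):
    print("iteration", it, "R1 -> r%d" % physical_register(1, it, size))
```

With this mapping, logical R1 at iteration 0 and logical R2 at iteration 1 name the same physical register r1, so each iteration writes its own copy of R1 without any explicit move.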
Theorem [Touati 2002]: if the reuse arcs are fixed statically, then computing the distances so as to minimize the register requirement under a fixed execution rate is a problem whose constraint matrix is totally unimodular.
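The practical weight of total unimodularity comes from a standard result of integer programming, recalled here as a reminder and not as part of the theorem itself. The matrix $A$, the vectors $b$ and $c$ and the variable $x$ are generic placeholders; the actual constraint system of the theorem is not reproduced here.

```latex
A \text{ totally unimodular},\quad b \in \mathbb{Z}^{m}
\;\Longrightarrow\;
\min\{\, c^{\top}x : Ax \le b,\ x \in \mathbb{Z}^{n} \,\}
=
\min\{\, c^{\top}x : Ax \le b,\ x \in \mathbb{R}^{n} \,\}
\quad \text{whenever the right-hand minimum is finite.}
```

Under the theorem's hypothesis, this suggests that integer distances can be obtained from an ordinary linear program, without branch and bound.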
Under the same II, Hamiltonian SIRA needs at most one more register than SIRA, and only in very few cases.