WCET-aware Register Allocation based on Integer-Linear Programming
Heiko Falk, Norman Schmitz, Florian Schmoll
TU Dortmund, Computer Science 12, Design Automation for Embedded Systems
Outline
• Introduction
• State of the Art in Compiler Design
• Register Allocation
• Traditional ILP-based Register Allocation
  • ILP Model
  • Limitations
• WCET-aware Register Allocation using ILP
  • Model of the WCET
  • Model of Pipeline-Related Spill Costs
• Results
• Summary & Future Work
Current State of the Art in Compiler Design
Objective Function of Compiler Optimizations
• Usually the reduction of Average-Case Execution Times (ACET): accelerate a "typical" execution of a program using "typical" input data
• No statements about WCETs are possible
Optimization Strategy
• Naive: current compilers lack a precise ACET timing model
• An optimization is applied if it seems "promising"
• The effect of an optimization on a program's ACET is entirely unknown to the compiler itself
• ACET optimizations are not useful for WCET minimization
Register Allocation
Goals
• Considered the most important compiler optimization
• Registers are the fastest and most efficient memories
• Register allocation should make optimal use of the registers
Tasks
• Before register allocation, assembly code uses virtual registers (VREGs)
• Map all (potentially many) VREGs to the (usually few) physical registers (PHREGs) of a processor
• Insert memory loads and stores (spill code) whenever the VREGs don't fit into the register file
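The second task, inserting spill code, can be illustrated with a minimal Python sketch of the rewriting step alone. It assumes that the allocation decision has already been made: a hypothetical map `phreg_of` holds the PHREG of each VREG, and any VREG missing from it counts as spilled. Instructions are modeled by a hypothetical `Instr` record, each stack slot is named after its VREG (`[v0]`), and spilled operands are routed through scratch registers taken from a given list. The sketch is not how any particular compiler implements spilling.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    op: str          # mnemonic, e.g. "add", "ld", "st"
    dst: list[str]   # written operands
    src: list[str]   # read operands

    def __str__(self) -> str:
        return f"{self.op} {','.join(self.dst + self.src)}"


def rewrite(code: list[Instr], phreg_of: dict[str, str], scratch: list[str]) -> list[Instr]:
    """Replace VREGs by PHREGs; VREGs without a PHREG live in a stack slot."""
    out: list[Instr] = []
    for ins in code:
        temp = {}              # spilled VREG -> scratch PHREG used by this instruction
        free = list(scratch)   # assumption: enough scratch PHREGs are reserved

        def phys(v: str) -> str:
            if v in phreg_of:
                return phreg_of[v]
            if v not in temp:
                temp[v] = free.pop(0)
            return temp[v]

        srcs = [phys(v) for v in ins.src]
        for v, r in temp.items():      # spill load for each spilled source VREG
            out.append(Instr("ld", [r], [f"[{v}]"]))
        dsts = [phys(v) for v in ins.dst]
        out.append(Instr(ins.op, dsts, srcs))
        for v in ins.dst:              # spill store after each spilled definition
            if v not in phreg_of:
                out.append(Instr("st", [f"[{v}]"], [temp[v]]))
    return out
```

Suppose v0 is spilled and v1 and v2 sit in d1 and d2. In that case `add v0,v1,v2` becomes `add d15,d1,d2` followed by the spill store `st [v0],d15`.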
Well-Known Register Allocators
Graph Coloring [P. Briggs, Register Allocation via Graph Coloring, 1992]
• De-facto standard approach nowadays
• Heuristics decide about allocation and spill code generation
• Fast approach of moderate complexity
• Spill heuristic might lead to poor code quality
Register Allocation via Integer-Linear Programming (ILP) [D. W. Goodwin, K. D. Wilken, Optimal and Near-optimal Global Register Allocation Using 0-1 Integer Programming, 1996]
• Formal mathematical model of allocation and spilling
• Achieves minimal spill code overhead, i.e. minimizes the total number of spill instructions
• Relatively high complexity, but optimal quality
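The heuristic nature of graph coloring shows up clearly in a compact sketch of the simplify/select scheme, here in the optimistic variant usually associated with Briggs. The interference-graph format, the spill metric (spill cost divided by current degree) and the name `color` are assumptions chosen for illustration. Note that the spilled VREG is picked by this metric and does not come from a global objective.

```python
def color(interference: dict[str, set[str]], k: int, spill_cost: dict[str, float]):
    """Simplify/select coloring with k colors (PHREGs); returns (colors, spilled).

    Assumes a symmetric interference graph and k >= 1.
    """
    remaining = {v: set(ns) for v, ns in interference.items()}
    stack: list[str] = []
    while remaining:
        low = [v for v in remaining if len(remaining[v]) < k]
        if low:
            v = min(low)   # trivially colorable node: remove it
        else:              # heuristic choice of a potential spill candidate
            v = min(remaining, key=lambda u: spill_cost[u] / len(remaining[u]))
        stack.append(v)
        for n in remaining.pop(v):
            remaining[n].discard(v)
    colors: dict[str, int] = {}
    spilled: list[str] = []
    while stack:           # select: reinsert nodes in reverse removal order
        v = stack.pop()
        taken = {colors[n] for n in interference[v] if n in colors}
        free = [c for c in range(k) if c not in taken]
        if free:
            colors[v] = free[0]
        else:
            spilled.append(v)   # optimistic coloring failed: v is spilled
    return colors, spilled
```

With three mutually interfering VREGs and only two PHREGs, the sketch spills whichever VREG has the cheapest cost-per-degree ratio. It makes no attempt to consider whether that spill code ends up on a program's worst-case path.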
Traditional ILP-based Register Allocation
Variables
• Allocation decisions: decision variables map VREGs to PHREGs
• Spilling decisions
Constraints guarantee the correctness of allocation and spilling decisions, e.g. they
• ensure that each VREG is assigned to at least one PHREG,
• ensure that at most one VREG can be assigned to a single PHREG,
• ...
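The two example constraints can be written down in one simplified form, using hypothetical notation that is not taken from the slides. Let V be the set of VREGs, P the set of PHREGs, and E the set of VREG pairs that are live at the same time. The sketch assumes that each VREG keeps a single register assignment for the whole function, whereas full models typically make allocation decisions per program point:

```latex
\begin{align*}
  x_{v,p} &\in \{0,1\} && \text{VREG } v \text{ is mapped to PHREG } p\\
  s_v     &\in \{0,1\} && \text{VREG } v \text{ is spilled}\\
  \sum_{p \in P} x_{v,p} + s_v &\ge 1 && \forall v \in V\\
  x_{u,p} + x_{v,p} &\le 1 && \forall (u,v) \in E,\ p \in P
\end{align*}
```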
Traditional ILP-based Register Allocation
Objective Function
• Minimizes spill code-related overhead
• Under the assumption that each spill instruction contributes the same constant amount to the objective function
• Example: minimization of spill-related code size
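The constant-cost objective can be made concrete on a toy instance. The Python sketch below stands in for an ILP solver by enumerating every 0-1 choice of a tiny model exhaustively. Each VREG either receives one PHREG or is spilled, and a spilled VREG v adds `cost_per_instr * spill_instrs[v]` to the objective. The function name, the per-VREG spill-instruction counts and the idea of treating each spill instruction as, for example, a fixed number of bytes are all hypothetical.

```python
from itertools import product


def min_spill_allocation(vregs, interferes, phregs, spill_instrs, cost_per_instr=1):
    """Exhaustively solve a tiny 0-1 allocation model (stand-in for an ILP solver)."""
    best = None
    for choice in product([None, *phregs], repeat=len(vregs)):
        alloc = dict(zip(vregs, choice))   # None means "spilled"
        # constraint: simultaneously live VREGs must not share a PHREG
        if any(alloc[u] is not None and alloc[u] == alloc[v] for u, v in interferes):
            continue
        # objective: every spill instruction adds the same constant cost
        cost = sum(cost_per_instr * spill_instrs[v] for v in vregs if alloc[v] is None)
        if best is None or cost < best[0]:
            best = (cost, alloc)
    return best
```

Because every spill instruction carries the same weight, the cheapest solution is simply the one with the fewest spill instructions. Where those instructions execute has no influence on the result.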
WCET Minimization via ILP-based Allocation?
Limitation of the traditional approach
• Assumption: each spill instruction contributes the same constant amount to the objective function
• This assumption only holds for trivial objectives such as code size
Challenges
• How to model and minimize the Worst-Case Execution Time (WCET) as a non-trivial objective?
• How to deal with complex processor pipelines that execute spill instructions in parallel with other code?
Challenge 1: ILP Model of the WCET
The Worst-Case Execution Path (WCEP)
• WCET of a program = length of the program's longest execution path (WCEP)
• WCET minimization: optimize only those parts of a program that lie on the WCEP
• Code optimization off the WCEP will not reduce the WCET
• Only those spill-related decision variables that actually lie on the WCEP must contribute to the ILP's objective function
• But: spilling decisions affect the WCET of basic blocks and thus the WCEP within a program
• How can the WCEP be modeled via ILP depending on spill-related decision variables?
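A small Python sketch shows how spilling can move the WCEP. It computes the longest path through an acyclic CFG from per-block WCETs and is only the graph computation behind the idea, not the ILP model itself. The CFG, the block WCETs and the function name `wcep` are hypothetical. Python's standard `graphlib.TopologicalSorter` orders the blocks.

```python
from graphlib import TopologicalSorter


def wcep(succ: dict[str, list[str]], cost: dict[str, int], entry: str):
    """Longest path in an acyclic CFG: returns (WCET, blocks on the WCEP)."""
    w, nxt = {}, {}
    # succ is passed as the "predecessor" map, so successors are emitted first
    for b in TopologicalSorter(succ).static_order():
        best = max(succ.get(b, []), key=lambda s: w[s], default=None)
        nxt[b] = best
        w[b] = cost[b] + (w[best] if best is not None else 0)
    path, b = [], entry
    while b is not None:
        path.append(b)
        b = nxt[b]
    return w[entry], path
```

In a diamond A → {B, C} → D with WCETs 5, 10, 3 and 2, the WCEP is A, B, D (17 cycles). If spill code raises C's WCET to 12, the WCEP switches to A, C, D (19 cycles). This is why the spill decisions and the WCEP have to be modeled together.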
Spill Code-dependent Costs
• Costs of a basic block: an ILP variable models the block's WCET depending on the WCET of potentially inserted spill code
• It equals the block's WCET without any spill code, plus the WCET of all spill code inside the block
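In the simplest case, each spill instruction has a fixed WCET, and the block cost becomes a linear expression. The notation here is hypothetical. c_b is the cost variable of block b, ĉ_b is b's WCET without spill code, S_b is the set of candidate spill instructions in b, ĉ_s is the WCET of spill instruction s, and x_s ∈ {0,1} indicates whether s is actually generated:

```latex
\[
  c_b \;=\; \hat{c}_b \;+\; \sum_{s \in S_b} \hat{c}_s \, x_s
\]
```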
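Intraprocedural Control Flow
• Modeling of a function's control flow
Acyclic sub-graphs:
• A variable per basic block models the WCET of the longest path starting at that block, e.g. at A
(Reducible) loops:
• Treat the body of the inner-most loop like an acyclic sub-graph
• Fold the loop into a single node
• Assign costs to the folded loop
• Continue with the next inner-most loop
[Figure: example control flow graph with basic blocks A, B, C, D and E; loop L consists of B, C and D and is folded into a single node]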
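Folding can be sketched in Python on a hypothetical CFG modeled on the example: A → B, B → C, B → D, C → D, the back edge D → B, and D → E. The slide's formula for the folded loop's cost is not preserved, so this sketch assumes a common choice: the loop bound multiplied by the WCET of the longest path through the body starting at the loop header. All names, WCETs and the bound of 10 are hypothetical.

```python
def longest(succ, cost, start):
    """WCET of the longest path from start in an acyclic graph (memoised recursion)."""
    memo = {}

    def w(b):
        if b not in memo:
            memo[b] = cost[b] + max((w(s) for s in succ.get(b, [])), default=0)
        return memo[b]

    return w(start)


def fold_loop(succ, cost, body, header, bound, name):
    """Fold a reducible inner-most loop into one node costing bound * body WCET (assumed)."""
    # acyclic body: drop edges leaving the loop and back edges to the header
    body_succ = {b: [s for s in succ.get(b, []) if s in body and s != header] for b in body}
    new_cost = {name: bound * longest(body_succ, cost, header)}
    new_cost.update({b: c for b, c in cost.items() if b not in body})
    # redirect edges entering the loop to the folded node ...
    new_succ = {b: sorted({name if s in body else s for s in ss})
                for b, ss in succ.items() if b not in body}
    # ... and let the folded node inherit all edges leaving the loop
    new_succ[name] = sorted({s for b in body for s in succ.get(b, []) if s not in body})
    return new_succ, new_cost
```

With WCETs A=2, B=3, C=4, D=1, E=2, the longest body path B, C, D costs 8, so the folded loop L costs 80. The function's WCET from A is then 2 + 80 + 2 = 84. A nested loop would be handled by calling `fold_loop` again on the already folded graph.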
Objective Function
• WCET of the entire function:
• Each function has a dedicated entry block
• A variable models the WCET of the longest path within the function starting at its entry block
• A variable models the WCET of the entire function
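These variables fit together in one possible set of longest-path constraints, written with hypothetical notation. Let B_f and E_f be the blocks and edges of function f's CFG after all loops have been folded, c_b the cost of block (or folded loop) b, w_b the WCET of the longest path starting at b, and e_f the entry block:

```latex
\begin{align*}
  w_b &\ge c_b + w_{b'} && \forall (b,b') \in E_f\\
  w_b &\ge c_b          && \forall b \in B_f\\
  W_f &= w_{e_f}\\
  &\text{minimize } W_f
\end{align*}
```

By these constraints, w_b is at least the cost of every path that starts at b, so W_f can never be smaller than the WCET of the longest path from the entry. Minimization then pushes W_f down to exactly that value. As a result, only spill decisions on the WCEP affect the objective.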
Challenge 2: Pipeline-Related Spill Costs
Example: The Infineon TriCore Pipelines
• Integer I-pipeline: executes usual integer ALU instructions
• Load/store LS-pipeline: executes memory loads/stores and address arithmetic
• Ideal case: one I- and one LS-instruction are executed in parallel within the same clock cycle
• However, hazards can prevent this, e.g.:
    add d0,d1,d2; # I-instruction: d0 = d1 + d2
    ld d0,[a0];   # LS-instruction: d0 = mem[a0]
  Both instructions write d0, a WAW (write after write) hazard, so the ld is stalled by 1 cycle.
• (Some even more subtle cases of the TriCore pipelines are omitted here.)
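A toy Python model makes the issue rule concrete. It is deliberately simplified and is not the TriCore's real timing behavior. Each instruction is given as a hypothetical pair (pipeline, destination register or None). An I-instruction may pair with an immediately following LS-instruction in the same cycle, unless both write the same register. Every other instruction takes one issue cycle of its own, and no other latencies are modeled.

```python
def issue_cycles(seq):
    """Toy dual-issue model: seq holds (pipeline, destination register or None)."""
    cycles, k = 0, 0
    while k < len(seq):
        pipe, dst = seq[k]
        # I followed by LS may share one cycle, unless a WAW hazard on dst exists
        pairs = (pipe == "I" and k + 1 < len(seq) and seq[k + 1][0] == "LS"
                 and seq[k + 1][1] != dst)
        k += 2 if pairs else 1
        cycles += 1
    return cycles
```

Under this model, `add d0,d1,d2` followed by `ld d0,[a0]` needs 2 cycles, whereas a load into a different register pairs with the add in 1 cycle. A spill load's cost therefore depends on the instruction before it and on the registers both of them write.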
ILP Example for Costs of Spill Instruction s (i denotes the instruction preceding s)
Case 1
    st [a1],d1; # i: mem[a1] = d1
    ld d0,[a0]; # s: d0 = mem[a0]
• If i is an LS-instruction, i and s cannot execute in parallel:
• s costs 1 cycle if s is actually generated
Case 2
    add d0,d1,d2; # i: d0 = d1 + d2
    ld d0,[a0];   # s: d0 = mem[a0]
• If s is a spill load and i is an I-instruction:
• s costs 1 cycle if s is actually generated and a WAW hazard between i and s exists via a common PHREG
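Both cases can be expressed as linear constraints, sketched here with hypothetical notation. c_s is the pipeline-related cost of s, x_s ∈ {0,1} indicates that s is generated, and a_{i,p}, a_{s,p} ∈ {0,1} indicate that i or s, respectively, writes PHREG p:

```latex
\begin{align*}
  \text{Case 1:}\quad & c_s = x_s\\
  \text{Case 2:}\quad & c_s \ge x_s + a_{i,p} + a_{s,p} - 2 \quad \forall p \in P,
                        \qquad c_s \ge 0
\end{align*}
```

In Case 2, the right-hand side reaches 1 only if s is generated and both instructions write the same PHREG p. This is the standard linearization of a logical AND. Because c_s enters the WCET of s's basic block, which is being minimized, the solver has no reason to raise c_s above this lower bound.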
Results – Worst-Case Execution Times
[H. Falk, WCET-aware Register Allocation based on Graph Coloring, DAC 2009]
[Chart: relative WCET estimates per benchmark; annotations: 98%, x2, 80%, 19%]
• Compiler: WCC at optimization level -O3 (42 optimizations)
• Target processor: TriCore TC1796
• 100%: WCET_EST using Graph Coloring
Results – Average-Case Execution Times
• Compiler: WCC at optimization level -O3 (42 optimizations)
• Target processor: TriCore TC1796
• 100%: ACET using Graph Coloring
Results – CPU Runtimes
ILP-based Allocator
• Runtimes range from 1 CPU second to 54:08 CPU minutes
• Including WCET analysis and ILP solver
• Average runtime for 55 benchmarks: 3:33 CPU minutes
WCET-aware Graph Coloring
• Average runtime for 55 benchmarks: 4:13 CPU minutes
• Reason: performs a costly WCET analysis after register allocation for each individual basic block
Summary & Future Work
Summary
• Current state of the art: compilers are unaware of timing and use naive optimization strategies
• Standard register allocators are unaware of worst-case properties
• They may thus generate spill code along the WCEP
• WCET-aware ILP-based register allocation: sophisticated models of the WCET and of pipeline-related spill costs
• Average WCET reduction over 55 benchmarks: 20.2%
• Outperforms WCET-aware graph coloring by a factor of 2
Future Work
• Reduce the runtimes of the ILP-based register allocator
• Improve code quality further by integrating rematerialization