This talk presents a global progressive register allocator. Register allocation must map an unbounded number of program variables onto a limited number of processor registers, with slow memory as the fallback. The problem is modeled as a global Multi-Commodity Network Flow (MCNF) problem in which edge costs and node capacities capture spill costs and register limits. Because the problem is NP-hard, the allocator uses Lagrangian relaxation: prices computed by iterative subgradient optimization direct two heuristic allocators toward better solutions and yield a bound on the distance from optimal, although finding an optimal solution is not guaranteed. The allocator is implemented in GCC 3.4.3 targeting x86 and evaluated for code size on more than 10,000 functions. Code size improves as the allocator is given more time, at a compile-time cost of roughly 10x.
A Global Progressive Register Allocator
David Ryan Koes, Seth Copen Goldstein
Carnegie Mellon University
{dkoes,seth}@cs.cmu.edu
Register Allocation Problem
• unbounded number of program variables
• limited number of processor registers (eax, ebx, ecx, edx, esi, edi, esp, ebp) + slow memory
• concerns for the register allocator: spill code optimization, register preferences, rematerialization, live range splitting, memory operands
Example program:
…
v = 1
w = v + 3
x = w + v
u = v
t = u + x
print(x);
print(w);
print(t);
print(u);
…
A More Principled Register Allocator
• fully utilize machine description
• explicit and expressive model of costs of allocation for given architecture
• optimal solutions
[Figure: machine description feeding reg alloc]
Multi-commodity Network Flow: An Expressive Model
• Given network (directed graph) with
  • cost and capacity on each edge
  • sources & sinks for multiple commodities
• Find lowest cost flow of commodities
• NP-complete for integer flows
Example: edges have unit capacity
[Figure: small network carrying two commodities, a and b]
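The definition above can be illustrated with a brute-force sketch. It assumes each commodity ships exactly one unit along a single path, and it simply enumerates every combination of simple paths, so it is only usable on toy networks. The network, node names, and function names are hypothetical and are not taken from the slides.

```python
from itertools import product


def simple_paths(edges, src, dst, path=None):
    """Yield every simple path from src to dst as a list of nodes."""
    path = path or [src]
    if src == dst:
        yield path
        return
    for (u, v) in edges:
        if u == src and v not in path:
            yield from simple_paths(edges, v, dst, path + [v])


def min_cost_multiflow(edges, commodities):
    """edges: {(u, v): (cost, capacity)}; commodities: {name: (source, sink)}.

    Returns (total cost, {name: path}) for the cheapest combination of paths
    that respects every edge capacity, or None if no combination is feasible.
    """
    names = list(commodities)
    choices = [list(simple_paths(edges, *commodities[n])) for n in names]
    best = None
    for combo in product(*choices):
        load, cost = {}, 0
        for path in combo:
            for e in zip(path, path[1:]):
                load[e] = load.get(e, 0) + 1
                cost += edges[e][0]
        feasible = all(load[e] <= edges[e][1] for e in load)
        if feasible and (best is None or cost < best[0]):
            best = (cost, dict(zip(names, combo)))
    return best


# Hypothetical unit-capacity network: both commodities prefer the free
# shared edge (m, n), but only one of them fits through it.
edges = {
    ("sa", "m"): (0, 1), ("sb", "m"): (0, 1), ("m", "n"): (0, 1),
    ("n", "ta"): (0, 1), ("n", "tb"): (0, 1),
    ("sa", "ta"): (1, 1), ("sb", "tb"): (1, 1),
}
result = min_cost_multiflow(edges, {"a": ("sa", "ta"), "b": ("sb", "tb")})
print(result)
```

Exhaustive enumeration grows exponentially with the number of commodities, which is consistent with the NP-completeness noted on the slide and is why a relaxation-based method is used instead.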
Register Allocation as a MCNF
Variables -> Commodities
Variable Definition -> Source
Variable Last Use -> Sink
Allocation Classes (Reg/Mem/Const) -> Nodes
Register Limits -> Node Capacities
Spill Costs -> Edge Costs
Allocation -> Flow
Also need anti-variables to model persistent memory
[Figure: network for variable a with nodes r0, r1, and mem]
Example
Source Code
int example(int a, int b)
{
  int d = 1;
  int c = a - b;
  return c+d;
}
Pre-alloc Assembly
MOVE 1 -> d
SUB a,b -> c
ADD c,d -> c
MOVE c -> r0
Costs labeled in the network figure: load cost, insn pref cost, mem access cost
Control Flow
• MCNF can only represent straight-line code
• need to link together networks from basic blocks
• New nodes to handle block entry/exit constraints: Split, Normal, Merge
[Figure: split, normal, and merge nodes for variable a in %eax and in mem]
A More Principled Register Allocator
• fully utilize machine description
• explicit and expressive model of costs of allocation for given architecture: Global MCNF
• optimal solutions: NP-hard, so use progressive solution technique
• Technique: Lagrangian relaxation directed allocators
[Figure: machine description feeding reg alloc; allocation quality plotted against compile time]
Solution Procedure
• Compute Lagrangian prices using iterative subgradient optimization
  • guaranteed to converge to "optimal" prices
  • for the linear relaxation of the problem
• Prices used by allocator to find solution
  • solution improves as prices converge
  • two allocators
    • iterative heuristic allocator
    • simultaneous heuristic allocator
Solution Procedure
• Advantages
  • iterative nature makes the allocator progressive
  • Lagrangian relaxation theory provides a means of computing a good lower bound
  • Can compute optimality bound
• Disadvantages
  • No guarantee of finding optimal solution
  • Optimality bound poor if integrality gap large (99% of the time integrality gap = 0)
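The price computation and lower bound described in the Solution Procedure slides can be sketched on a toy problem. This is a minimal illustration, not the authors' implementation. It assumes one register (r0) plus memory, a cost of 3 for any load or store, an extra cost of 1 for each use served from memory, and a 1/k step size. All names and constants are hypothetical.

```python
CLASSES = ("r0", "mem")   # hypothetical toy target: one register plus memory
CAPACITY = {"r0": 1}      # each register holds one variable per program point
MOVE_COST = 3             # assumed cost of a load or store
MEM_USE_COST = 1          # assumed extra cost of a use served from memory


def cheapest_path(uses, prices):
    """Cheapest class sequence for one variable, given current prices."""
    def node_cost(t, c):
        if c == "mem":
            return float(MEM_USE_COST if uses[t] else 0)
        return prices[t][c]  # a register node costs only its current price

    best = {c: (node_cost(0, c), [c]) for c in CLASSES}
    for t in range(1, len(uses)):
        step = {}
        for c in CLASSES:
            options = [
                (cost + (0 if p == c else MOVE_COST) + node_cost(t, c), path + [c])
                for p, (cost, path) in best.items()
            ]
            step[c] = min(options, key=lambda o: o[0])
        best = step
    return min(best.values(), key=lambda o: o[0])


def subgradient(variables, n_points, iterations=50):
    """Return (prices, best lower bound) for the relaxed capacity constraints."""
    prices = [{r: 0.0 for r in CAPACITY} for _ in range(n_points)]
    best_bound = float("-inf")
    for k in range(1, iterations + 1):
        paths = {v: cheapest_path(uses, prices) for v, uses in variables.items()}
        # Lagrangian value: priced path costs minus the price of available capacity.
        bound = sum(cost for cost, _ in paths.values()) - sum(
            prices[t][r] * CAPACITY[r] for t in range(n_points) for r in CAPACITY
        )
        best_bound = max(best_bound, bound)
        step = 1.0 / k  # diminishing step size (an assumption of this sketch)
        for t in range(n_points):
            for r in CAPACITY:
                usage = sum(1 for _, path in paths.values() if path[t] == r)
                # Raise the price of over-used nodes, lower it for under-used ones.
                prices[t][r] = max(0.0, prices[t][r] + step * (usage - CAPACITY[r]))
    return prices, best_bound


# Two variables, both used at all three program points, competing for r0.
variables = {"a": [True, True, True], "b": [True, True, True]}
prices, lower_bound = subgradient(variables, n_points=3)
print(lower_bound)
```

In this toy instance the best integer allocation costs 3 (one variable stays in r0, the other stays in memory), so the reported bound can never exceed 3. Comparing such a bound with the cost of an allocation actually found is what gives the optimality bound mentioned above.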
Iterative Heuristic Allocator
• Edges to/from memory cost 3
• Allocation order: a, b, c, d
• Cost: a = 0, b = 4, c = 0, d = -2; Total: 2
[Figure: priced network with variables a, b, c, d allocated one at a time]
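One plausible reading of the iterative heuristic allocator is sketched below: variables are allocated one at a time in a fixed order, each along its cheapest priced path, and register nodes claimed by earlier variables are blocked. The cost model (any change of class costs 3, a use served from memory costs 1) and all function names are assumptions made for illustration.

```python
import math

MOVE_COST = 3      # edges to/from memory cost 3, as on the slide
MEM_USE_COST = 1   # assumed extra cost of a use served from memory


def shortest_path(uses, registers, prices, taken):
    """Cheapest class sequence for one variable; occupied register nodes are blocked."""
    classes = list(registers) + ["mem"]

    def priced(t, c):
        if c == "mem":
            return float(MEM_USE_COST if uses[t] else 0)
        if (t, c) in taken:
            return math.inf  # node already used by an earlier variable
        return prices.get((t, c), 0.0)

    def real(t, c):
        return MEM_USE_COST if (c == "mem" and uses[t]) else 0

    # best[c] = (priced cost, real cost, path) of the cheapest path ending in c
    best = {c: (priced(0, c), real(0, c), [c]) for c in classes}
    for t in range(1, len(uses)):
        step = {}
        for c in classes:
            options = []
            for p in classes:
                pc, rc, path = best[p]
                move = 0 if p == c else MOVE_COST  # simplification: any class change
                options.append((pc + move + priced(t, c), rc + move + real(t, c), path + [c]))
            step[c] = min(options, key=lambda o: o[0])
        best = step
    return min(best.values(), key=lambda o: o[0])


def iterative_allocate(order, uses, registers, prices):
    """Allocate variables in the given order; return (allocation, real total cost)."""
    taken, allocation, total = set(), {}, 0
    for var in order:
        _, real_cost, path = shortest_path(uses[var], registers, prices, taken)
        allocation[var] = path
        total += real_cost
        taken.update((t, c) for t, c in enumerate(path) if c != "mem")
    return allocation, total


# Variable a is used once, b at every point; a is allocated first.
uses = {"a": [True, False, False], "b": [True, True, True]}
alloc_free, cost_free = iterative_allocate(["a", "b"], uses, ["r0"], {})
priced_r0 = {(t, "r0"): 0.5 for t in range(3)}
alloc_priced, cost_priced = iterative_allocate(["a", "b"], uses, ["r0"], priced_r0)
print(cost_free, cost_priced)
```

With zero prices, a greedily takes r0 and forces b into memory. With a modest price on r0, a is steered to memory and leaves the register for b, which is the sense in which prices let an early variable account for later ones.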
Simultaneous Heuristic Allocator
• Edges to/from memory cost 3
• Current cost: -1, -3, -2
[Figure: priced network with two nodes crossed out]
Evaluation
• Implemented in GCC 3.4.3 targeting x86
• Optimize for code size
  • perfect static evaluation
  • important metric in its own right
• MediaBench, MiBench, SPEC95, SPEC2000
  • over 10,000 functions
Progressiveness
[Figure: plot with series for the default allocator, the graph allocator, and CPLEX; annotated values: default allocator 1121, graph allocator 1422]
Code Size
• Progressive!
Optimality
[Figure: plot with series "Proven optimality" and "Proven maximum distance from optimal"]
Compile Time Slowdown :-(
• 10x slower
A More Principled Register Allocator
• fully utilize machine description
• explicit and expressive model of costs of allocation for given architecture: Global MCNF
• optimal solutions: approach optimality using progressive solution technique (Lagrangian directed allocators)
[Figure: machine description feeding reg alloc]
Questions?
Accuracy of the Model
• Global MCNF model correctly predicts costs of register allocation within 2% for 71% of functions compiled
Compile Time Asymptotic Complexity
• one iteration: O(nv)