A Global Progressive Register Allocator

David Ryan Koes, Seth Copen Goldstein
Carnegie Mellon University
{dkoes,seth}@cs.cmu.edu

Register Allocation Problem
• unbounded number of program variables
• limited number of processor registers (x86: eax, ebx, ecx, edx, esi, edi, esp, ebp) + slow memory
• the register allocator must also handle:
  • spill code optimization
  • register preferences
  • rematerialization
  • live range splitting
  • memory operands
Example:
  …
  v = 1
  w = v + 3
  x = w + v
  u = v
  t = u + x
  print(x); print(w); print(t); print(u);
  …

A More Principled Register Allocator
• fully utilize the machine description
• explicit and expressive model of the costs of allocation for a given architecture
• optimal solutions
[Figure: machine description feeding the register allocator]

Multi-commodity Network Flow: An Expressive Model
• Given a network (directed graph) with
  • a cost and capacity on each edge
  • sources & sinks for multiple commodities
• Find the lowest-cost flow of commodities
• NP-complete for integer flows
[Figure: example in which all edges have unit capacity; commodities a and b, edge labels 1 and 0]
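To make the model concrete, here is a minimal sketch of integer MCNF on a tiny hypothetical network (the node names and costs are invented for illustration). Two commodities both prefer a free shared edge that has unit capacity, so they cannot both use it. Brute-force enumeration of path choices finds the cheapest feasible routing. This approach only works because the instance is tiny; it does not scale, which is consistent with the problem being NP-complete for integer flows.

```python
from itertools import product

# Hypothetical toy network: edge -> (cost, capacity). Both commodities can
# use the shared, free edge (u, v), but it has unit capacity.
EDGES = {
    ("a_src", "u"): (0, 1), ("b_src", "u"): (0, 1),
    ("u", "v"): (0, 1),
    ("v", "a_snk"): (0, 1), ("v", "b_snk"): (0, 1),
    ("a_src", "a_snk"): (1, 1),   # a's private detour
    ("b_src", "b_snk"): (2, 1),   # b's private, pricier detour
}

# Candidate source-to-sink paths for each commodity, as edge lists.
PATHS = {
    "a": [[("a_src", "u"), ("u", "v"), ("v", "a_snk")], [("a_src", "a_snk")]],
    "b": [[("b_src", "u"), ("u", "v"), ("v", "b_snk")], [("b_src", "b_snk")]],
}

def solve_integer_mcnf(edges, paths):
    """Brute-force the lowest-cost integer flow, one unit per commodity."""
    best = None
    names = list(paths)
    for choice in product(*(paths[n] for n in names)):
        load = {}
        for path in choice:
            for e in path:
                load[e] = load.get(e, 0) + 1
        if any(load[e] > edges[e][1] for e in load):
            continue  # this combination violates an edge capacity
        cost = sum(edges[e][0] for path in choice for e in path)
        if best is None or cost < best[0]:
            best = (cost, dict(zip(names, choice)))
    return best

cost, routing = solve_integer_mcnf(EDGES, PATHS)
print(cost)  # 1: b takes the shared edge, a detours
```

Each commodity would individually pick the shared edge. The capacity constraint couples the commodities, and that coupling is what makes the integer problem hard.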

Register Allocation as an MCNF
• Variables → Commodities
• Variable Definition → Source
• Variable Last Use → Sink
• Nodes → Allocation Classes (Reg/Mem/Const)
• Register Limits → Node Capacities
• Spill Costs → Edge Costs
• Allocation → Flow
• Also need anti-variables to model persistent memory
[Figure: flow of variable a through allocation-class nodes r0, r1, and mem, with edge costs 1 and 3]
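For a single variable, the mapping above reduces to a shortest path through a layered network, with one layer of allocation-class nodes per program point. The following is a minimal sketch under hypothetical costs: staying in place is free, a register-to-register move costs 1, and any edge to or from memory costs 3. A class with no spare capacity at a point is simply unavailable. The real model routes all variables at once, so node capacities couple them; this sketch routes only one.

```python
CLASSES = ("r0", "r1", "mem")

def edge_cost(src, dst):
    # Hypothetical costs: staying put is free, a register-to-register
    # move costs 1, and any edge to or from memory costs 3.
    if src == dst:
        return 0
    return 3 if "mem" in (src, dst) else 1

def route_variable(available, start, end_classes):
    """Cheapest path for one variable through the layered network.

    available[p] is the set of classes with spare capacity at point p,
    start is the class at the definition (point 0), and end_classes are
    the acceptable classes at the last use.
    """
    INF = float("inf")
    cost = {c: (0 if c == start else INF) for c in CLASSES}
    paths = {c: [c] for c in CLASSES}
    for p in range(1, len(available)):
        new_cost, new_paths = {}, {}
        for dst in CLASSES:
            if dst not in available[p]:  # zero capacity: node unusable
                new_cost[dst], new_paths[dst] = INF, []
                continue
            src = min(CLASSES, key=lambda s: cost[s] + edge_cost(s, dst))
            new_cost[dst] = cost[src] + edge_cost(src, dst)
            new_paths[dst] = paths[src] + [dst]
        cost, paths = new_cost, new_paths
    best = min(end_classes, key=lambda c: cost[c])
    return cost[best], paths[best]

# Point 0: a is defined in r0. Point 1: r0 is occupied by another
# variable. Point 2: last use, which (hypothetically) needs a register.
available = [{"r0", "r1", "mem"}, {"r1", "mem"}, {"r0", "r1", "mem"}]
print(route_variable(available, "r0", {"r0", "r1"}))  # (1, ['r0', 'r1', 'r1'])
```

Moving to r1 (cost 1) beats spilling to memory (cost 3), so the cheapest flow for a stays in registers.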

Example
Source Code:
  int example(int a, int b) {
    int d = 1;
    int c = a - b;
    return c+d;
  }
Pre-alloc Assembly:
  MOVE 1 -> d
  SUB a,b -> c
  ADD c,d -> c
  MOVE c -> r0
[Figure annotations: load cost, insn pref cost (instruction preference cost), mem access cost]
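In the MCNF mapping, each variable's definition is a source and its last use is a sink. The sketch below encodes the example's pre-alloc assembly as hypothetical (opcode, sources, destination) tuples and recovers these points. It makes the following simplifying assumptions: parameters are live on entry (point -1), constants and the physical register r0 are not commodities, and the two-address redefinition of c by ADD continues the same live range.

```python
# The slide's pre-alloc assembly, encoded (hypothetically) as
# (opcode, source operands, destination).
INSNS = [
    ("MOVE", ["1"], "d"),
    ("SUB", ["a", "b"], "c"),
    ("ADD", ["c", "d"], "c"),
    ("MOVE", ["c"], "r0"),
]

def sources_and_sinks(insns, params, physical_regs):
    """Map each variable to (definition point, last-use point).

    Parameters are live on entry, so their definition point is -1.
    Constants and physical registers are not commodities.
    """
    defined = {p: -1 for p in params}
    last_use = {}
    for i, (_opcode, srcs, dst) in enumerate(insns):
        for s in srcs:
            if s in defined:            # skips constants such as "1"
                last_use[s] = i
        if dst not in defined and dst not in physical_regs:
            defined[dst] = i            # first definition is the source node
    return {v: (defined[v], last_use.get(v)) for v in defined}

print(sources_and_sinks(INSNS, ["a", "b"], {"r0"}))
# {'a': (-1, 1), 'b': (-1, 1), 'd': (0, 2), 'c': (1, 3)}
```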

Control Flow
• MCNF can only represent straight-line code
• need to link together networks from basic blocks
• new nodes handle block entry/exit constraints
[Figure: split, normal, and merge boundary nodes linking block networks; variable a shown in %eax and in mem]
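The entry/exit constraint can be pictured as follows, assuming the boundary nodes require a variable's allocation on exit from a predecessor block to match its allocation on entry to each successor. A split node has one exit feeding several entries; a merge node has several exits feeding one entry. The checker below is a hypothetical sketch of that constraint only, not part of the allocator. It uses an invented diamond CFG and reports the CFG edges where the locations disagree.

```python
# Hypothetical diamond CFG: B0 branches to B1 and B2, which merge at B3.
CFG_EDGES = [("B0", "B1"), ("B0", "B2"), ("B1", "B3"), ("B2", "B3")]

# Where variable a sits on entry to and exit from each block (illustrative).
ENTRY = {"B0": {"a": "%eax"}, "B1": {"a": "%eax"}, "B2": {"a": "%eax"}, "B3": {"a": "mem"}}
EXIT = {"B0": {"a": "%eax"}, "B1": {"a": "%eax"}, "B2": {"a": "mem"}, "B3": {"a": "mem"}}

def boundary_violations(cfg_edges, exit_loc, entry_loc, var):
    """CFG edges where var's exit location differs from the successor's entry location."""
    return [(pred, succ) for pred, succ in cfg_edges
            if exit_loc[pred][var] != entry_loc[succ][var]]

print(boundary_violations(CFG_EDGES, EXIT, ENTRY, "a"))  # [('B1', 'B3')]
```

At the merge into B3, B2 delivers a in memory but B1 delivers it in %eax. A consistent global flow must resolve this mismatch, for example by spilling a before leaving B1.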

A More Principled Register Allocator
• fully utilize the machine description
• explicit and expressive model of the costs of allocation for a given architecture: Global MCNF
• optimal solutions: NP-hard, so use a progressive solution technique
• Technique: Lagrangian-relaxation-directed allocators
[Figure: machine description feeding the register allocator; trade-off between allocation quality and compile time]

Solution Procedure
• Compute Lagrangian prices using iterative subgradient optimization
  • guaranteed to converge to "optimal" prices
  • for the linear relaxation of the problem
• Prices are used by the allocator to find a solution
  • the solution improves as the prices converge
• Two allocators:
  • iterative heuristic allocator
  • simultaneous heuristic allocator
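The price computation can be illustrated on a toy instance that is entirely hypothetical. Two variables compete for one register, and keeping a variable in memory costs 3 for a and 2 for b. Relaxing the register-capacity constraint with a price λ lets each variable choose independently. The subgradient (register usage minus capacity) then raises or lowers λ with a diminishing step size. This is a sketch of generic subgradient optimization, not the allocator's actual code.

```python
# Hypothetical toy instance: two variables compete for one register.
SPILL_COST = {"a": 3.0, "b": 2.0}   # cost if kept in memory instead
REG_CAPACITY = 1

def relaxed_solve(price):
    """Each variable picks its cheapest class independently under the price."""
    usage, total = 0, 0.0
    for cost in SPILL_COST.values():
        if price < cost:      # register (cost 0 + price) is cheaper
            usage += 1
            total += price
        else:
            total += cost
    return usage, total - price * REG_CAPACITY  # Lagrangian dual value

def subgradient(iterations=50):
    price, best_bound = 0.0, float("-inf")
    for t in range(iterations):
        usage, bound = relaxed_solve(price)
        best_bound = max(best_bound, bound)
        step = 1.0 / (t + 1)                      # diminishing step size
        # Move the price along the subgradient; prices stay nonnegative.
        price = max(0.0, price + step * (usage - REG_CAPACITY))
    return price, best_bound

price, bound = subgradient()
print(round(price, 3), round(bound, 3))  # 2.083 2.0
```

Once the price lies between b's spill cost and a's, only a takes the register. The dual value then equals 2, the cost of the best integer allocation (a in the register, b spilled), so this toy instance has an integrality gap of 0.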

Solution Procedure
• Advantages
  • iterative nature → progressive
  • Lagrangian relaxation theory provides a means of computing a good lower bound
  • can compute an optimality bound
• Disadvantages
  • no guarantee of finding an optimal solution
  • the optimality bound is poor if the integrality gap is large
  • but 99% of the time, the integrality gap = 0
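The optimality bound follows from standard Lagrangian duality. As a sketch, write the problem as minimizing c^T x over individual-commodity flows x ∈ X, subject to the shared capacity constraints Ax ≤ b. This notation is assumed here and does not appear on the slides. For any prices λ ≥ 0:

```latex
L(\lambda) = \min_{x \in X} \; c^{\top}x + \lambda^{\top}(Ax - b)
  \;\le\; c^{\top}x^{*} \;\le\; c^{\top}\hat{x}
```

Here x* is an optimal integer allocation and x̂ is any allocation found by a heuristic allocator. The first inequality holds because x* is feasible, so λ^T(Ax* − b) ≤ 0. Consequently, x̂ is provably within c^T x̂ − L(λ) of optimal. When the integrality gap is 0, the best L(λ) can reach c^T x*.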

Iterative Heuristic Allocator
[Figure: example network for variables a, b, c, d; values shown: 0, 4, 0, -2]
• edges to/from memory cost 3
• allocation order: a, b, c, d
• total cost: 2
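The following greedy sketch captures the spirit of the iterative allocator. Variables are allocated one at a time in a fixed order. Each variable takes the class with the lowest price-adjusted cost among classes that still have capacity, and allocated variables consume that capacity. The real allocator routes each variable along a path through the whole network. This sketch collapses the network to a single program point, and its costs, prices, and capacities are hypothetical.

```python
def iterative_allocate(order, base_cost, price, capacity):
    """Allocate variables in order; each takes its cheapest priced class with room."""
    remaining = dict(capacity)
    assignment, total = {}, 0
    for v in order:
        options = [c for c in base_cost[v] if remaining[c] > 0]
        # Prices steer the choice, but only the true cost is accumulated.
        best = min(options, key=lambda c: base_cost[v][c] + price.get(c, 0))
        remaining[best] -= 1
        assignment[v] = best
        total += base_cost[v][best]
    return assignment, total

# Hypothetical instance: one register r0, ample memory.
BASE = {"a": {"r0": 0, "mem": 1}, "b": {"r0": 0, "mem": 3}}
CAP = {"r0": 1, "mem": 10}
print(iterative_allocate(["a", "b"], BASE, {}, CAP))         # ({'a': 'r0', 'b': 'mem'}, 3)
print(iterative_allocate(["a", "b"], BASE, {"r0": 2}, CAP))  # ({'a': 'mem', 'b': 'r0'}, 1)
```

Without prices, a grabs the register first and b pays the larger spill cost. With a price of 2 on r0, a voluntarily spills, which leaves the register for b and lowers the total cost.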

Simultaneous Heuristic Allocator
[Figure: example network with current costs -1, -3, -2]
• edges to/from memory cost 3

Evaluation
• Implemented in gcc 3.4.3 targeting x86
• Optimize for code size
  • perfect static evaluation
  • important metric in its own right
• Benchmarks: MediaBench, MiBench, Spec95, Spec2000
  • over 10,000 functions

Progressiveness
[Figure: progressiveness compared with the default allocator (1121), the graph allocator (1422), and CPLEX]

Progressiveness
[Figure: progressiveness; series: graph allocator, default allocator, CPLEX]

Code Size
Progressive!

Optimality
[Figure: proven optimality and proven maximum distance from optimal]

Compile Time Slowdown
10x slower :-(

A More Principled Register Allocator
• fully utilize the machine description
• explicit and expressive model of the costs of allocation for a given architecture: Global MCNF
• optimal solutions: approach optimality using a progressive solution technique (Lagrangian-directed allocators)
[Figure: machine description feeding the register allocator]

Questions?

Accuracy of the Model
The global MCNF model correctly predicts the cost of register allocation to within 2% for 71% of the functions compiled.

Code Size

Compile Time: Asymptotic Complexity
• one iteration: O(nv)

Code Performance

