Linear Scan Register Allocation

Linear Scan Register Allocation: Massimiliano Poletto, Vivek Sarkar
A Fast, Memory-Efficient Register Allocation Framework for Embedded Systems: Sathyanarayanan Thammanur, Santosh Pande

Linear Scan Register Allocation • NOT based on graph coloring • faster than algorithms based on graph coloring • scans all the live ranges in a single pass, allocating registers to variables in a greedy fashion. • useful in situations where both compile time & code quality are important

Linear Scan Register Allocation Program model – • intermediate representation that consists of RTL-like quads or pseudo-instructions. • Register candidates (live ranges) represented by an unbounded set of variable names or virtual registers. • variables are not live on entry to the start node.

Linear Scan Register Allocation Assumptions – • intermediate representation pseudo-instructions are numbered according to some order, for example: • the order in which pseudo-instructions appear in the intermediate representation, or • depth-first order. • the choice of instruction ordering does not affect the correctness of the algorithm, • but it may affect the quality of allocation.

Linear Scan Register Allocation Live Interval • [i,j] is a live interval for variable v if there is no instruction with number j´ > j such that v is live at j´, and there is no instruction with number i´ < i such that v is live at i´ • a conservative approximation of live ranges • there may be subranges of [i,j] in which v is not live • trivial live interval for any variable – [1,N], where N is the number of pseudo-instructions
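The live interval definition above can be illustrated with a small Python sketch. It assumes per-instruction liveness sets have already been computed by a separate data-flow pass; the function name `live_intervals` and the input layout are hypothetical, chosen only for illustration.

```python
def live_intervals(live_sets):
    """Return {variable: (start, end)} from per-instruction liveness.

    live_sets[k] is the set of variables live at instruction k + 1
    (instructions are numbered from 1, as on the slide).
    """
    intervals = {}
    for number, live in enumerate(live_sets, start=1):
        for v in live:
            # Keep the first instruction where v is live as the start,
            # and extend the end to the latest instruction seen so far.
            start, _ = intervals.get(v, (number, number))
            intervals[v] = (start, number)
    return intervals


# "a" is not live at instruction 3, but its interval still covers it:
# the interval is a conservative approximation of the live range.
live_sets = [{"a"}, {"a", "b"}, {"b"}, {"a", "b"}, {"b"}]
intervals = live_intervals(live_sets)
print(intervals["a"], intervals["b"])  # (1, 4) (2, 5)
```

The hole at instruction 3 for `a` is exactly the kind of subrange the slide mentions: the interval [1,4] still reserves a register for `a` there.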

Linear Scan Register Allocation The Linear Scan Algorithm • compute the live intervals. • live intervals are stored in a list sorted in order of increasing start point. • At each step, the algorithm maintains a list, active, of live intervals that overlap the current point and have been placed in registers. • active list is sorted in order of increasing end point.

LinearScanRegisterAllocation
  active ← {}
  foreach live interval i, in order of increasing start point
    ExpireOldIntervals(i)
    if length(active) = R then
      SpillAtInterval(i)
    else
      register[i] ← a register removed from pool of free registers
      add i to active, sorted by increasing end point

ExpireOldIntervals(i)
  foreach interval j in active, in order of increasing end point
    if endpoint[j] ≥ startpoint[i] then
      return
    remove j from active
    add register[j] to pool of free registers

SpillAtInterval(i)
  spill ← last interval in active
  if endpoint[spill] > endpoint[i] then
    register[i] ← register[spill]
    location[spill] ← new stack location
    remove spill from active
    add i to active, sorted by increasing end point
  else
    location[i] ← new stack location
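The pseudocode above can be sketched in Python roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `linear_scan`, the integer register and stack-slot numbering, and the use of a sorted Python list for `active` are all assumptions, and the sketch assumes at least one register.

```python
import bisect


def linear_scan(intervals, num_regs):
    """intervals: {name: (start, end)}; num_regs must be >= 1.

    Returns (register, location): variables kept in registers and
    variables spilled to (hypothetical) numbered stack slots.
    """
    free = list(range(num_regs - 1, -1, -1))  # pop() hands out register 0 first
    active = []      # (end, name) pairs, kept sorted by increasing end point
    register = {}
    location = {}
    next_slot = 0
    for name, (start, end) in sorted(intervals.items(), key=lambda kv: kv[1][0]):
        # ExpireOldIntervals: release registers of intervals that ended
        # before the current start point.
        while active and active[0][0] < start:
            _, old = active.pop(0)
            free.append(register[old])
        if len(active) == num_regs:
            # SpillAtInterval: spill whichever interval ends last.
            spill_end, spill = active[-1]
            if spill_end > end:
                register[name] = register.pop(spill)
                location[spill] = next_slot
                active.pop()
                bisect.insort(active, (end, name))
            else:
                location[name] = next_slot
            next_slot += 1
        else:
            register[name] = free.pop()
            bisect.insort(active, (end, name))
    return register, location


example = {"A": (1, 4), "B": (2, 10), "C": (3, 5), "D": (6, 8), "E": (9, 12)}
register, location = linear_scan(example, num_regs=2)
print(register)  # {'A': 0, 'C': 1, 'D': 1, 'E': 1}
print(location)  # {'B': 0}
```

With two registers, B is spilled when C arrives because B ends last among the active intervals; once spilled, B stays in memory for its whole interval.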

Linear Scan Register Allocation An Example

Linear Scan Register Allocation Complexity • O(V) if the number of registers R is treated as a constant, where V is the number of live intervals • but R can be large! • worst-case execution time is dictated by the time taken to insert into active • O(log R) per insertion if a balanced binary tree is used, giving O(V log R) overall • O(R) per insertion if a linear search finds the insertion point • worst-case complexity with linear search – O(V * R)

Linear Scan Register Allocation Evaluation • two different infrastructures – • one to measure compile-time performance, and • one to measure the run-time performance of the generated code. • ICODE infrastructure • to evaluate compile time performance • SUIF infrastructure • to evaluate run time performance


A Fast, Memory-Efficient Register Allocation Framework for Embedded Systems • usage density based register allocator • usage density: represents both the frequency and the density of uses. • geared towards embedded systems, where speed, code size & memory requirements are of equal concern. • does not use live range or live interval analysis.

A Fast, Memory-Efficient Register Allocation Framework for Embedded Systems Goal : optimize the following parameters - • speed of execution of generated code, • speed of the allocator, • size of the generated code, • size of the allocator, and • amount of memory required (memory footprint) during the allocation.

A Fast, Memory-Efficient Register Allocation Framework for Embedded Systems Graph-coloring based allocators – • summarize liveness info in terms of an interference graph. • heuristically attempt to color the graph • quality of the generated code is high • cost (in terms of speed & space) increases as the size of the interference graph increases. • prioritize the quality of generated code over speed of compilation & memory requirements.

A Fast, Memory-Efficient Register Allocation Framework for Embedded Systems Linear scan register allocation – • tries to detect and resolve conflicts locally • operates faster than graph coloring • quality of the generated code suffers • a spilled variable cannot be reassigned to a register • memory requirements are lower than those of graph-coloring based allocators • still has quadratic memory requirements due to the need to maintain live intervals.

A Fast, Memory-Efficient Register Allocation Framework for Embedded Systems Tradeoffs in Register Allocation – • is it necessary to expend effort in finding the live ranges and forming live intervals in order to make good spill decisions? • combine the effects of the frequencies of references and their density/sparsity to emulate the notion of interfering live intervals: this gives the usage density. • usage density information is linear in program size, which reduces memory demands during allocation.

A Fast, Memory-Efficient Register Allocation Framework for Embedded Systems Overview • keep track of the usage density of variables at any given point of a program • allocate registers to variables that have a high usage density. • Usage density of a variable x at any point p is the ratio of the total number of uses of a value since its last definition to the average distance between the uses. • keep the variables with highest usage densities in registers until that point

A Fast, Memory-Efficient Register Allocation Framework for Embedded Systems Usage density based register allocation – • traverse the CFG in topological order. • usage information about each variable at different program points is maintained in a table called the usage density table • for each variable, last use statement, total number of uses since the last definition, average distance between the uses, basic block where used last, and usage density are maintained.

A Fast, Memory-Efficient Register Allocation Framework for Embedded Systems calculating the usage information – • traverse the CFG in topological order • for each definition of a variable, • reset to zero- • total number of uses • average distance • last use basic block • set last use to current instruction label • initialize usage density to zero

A Fast, Memory-Efficient Register Allocation Framework for Embedded Systems • for each use of a variable, update the following – • last use: this is updated to the statement number corresponding to the statement where the use of the variable occurred. • total distance: This is the distance between the last use and the corresponding definition(s). If there is only one definition, the distance is just the total number of instructions that elapse between the definition and the use. For multiple ones, the average distance is calculated from the definition points to the join point where the corresponding SSA merged definition is located. From this point, simple distance is calculated to each of the uses and is added to the average distance found earlier to get the total distance. • total number of uses: This is incremented by 1. • average distance: This is updated to ratio of total distance to total number of uses. • usage density: This is updated to the ratio of total number of uses to average distance. • active window: An active window of a variable is a program point until which its usage density would remain equal to or higher than its current value if a use of that variable were to occur within that window.
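The per-variable bookkeeping described above can be sketched in Python for the simplest case. This is only an illustration under stated assumptions: it handles a single reaching definition (no SSA merge points), omits the last-use basic block field, and the names `UsageEntry`, `def_stmt`, `on_definition` and `on_use` are hypothetical. The definition point is stored separately so that the total distance can be measured from it.

```python
from dataclasses import dataclass


@dataclass
class UsageEntry:
    """One row of a usage density table (single-definition case only)."""
    def_stmt: int = 0            # assumption: remembered to measure distances
    last_use: int = 0
    total_uses: int = 0
    total_distance: int = 0
    average_distance: float = 0.0
    usage_density: float = 0.0


def on_definition(entry, stmt):
    # Reset the counters; the last use is set to the defining statement.
    entry.def_stmt = stmt
    entry.last_use = stmt
    entry.total_uses = 0
    entry.total_distance = 0
    entry.average_distance = 0.0
    entry.usage_density = 0.0


def on_use(entry, stmt):
    entry.last_use = stmt
    # Single definition: instructions elapsed between definition and use.
    entry.total_distance = stmt - entry.def_stmt
    entry.total_uses += 1
    entry.average_distance = entry.total_distance / entry.total_uses
    # Guard against a use at the defining statement (distance zero).
    if entry.average_distance > 0:
        entry.usage_density = entry.total_uses / entry.average_distance
    else:
        entry.usage_density = 0.0


x = UsageEntry()
on_definition(x, 10)
densities = []
for stmt in (12, 14, 16):
    on_use(x, stmt)
    densities.append(x.usage_density)
print(densities)  # [0.5, 1.0, 1.5]
```

Evenly spaced uses keep the average distance constant at 2.0 here, so the density grows with each additional use.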

A Fast, Memory-Efficient Register Allocation Framework for Embedded Systems The Algorithm

at a program point p
  if _use(p) = x
    if (r = _free_register()) != Φ
      _allot(r, x)
    else
      for all y ∈ V, s.t. y is in a register
        if p ∉ _active_window(y)
          _update_usg_dens(y)
      min = _min_usg_dens(V)
      if _usg_dens(x) > _usg_dens(min)
        _allot(_reg(min), x)
      endif
    endif
  if _def(p) = x
    _reset(x)

A Fast, Memory-Efficient Register Allocation Framework for Embedded Systems An Example


A Fast, Memory-Efficient Register Allocation Framework for Embedded Systems LEMMA 1. The active window d(p) of a variable at program point p of its use must obey the relation d(p) ≤ 2·ad(p) + 1/ud(p), where ad(p) is the average distance and ud(p) is the usage density at program point p. COROLLARY 1. For simplicity of calculation, it is safe to use an active window size equal to twice the average distance at program point p. Implications of Corollary 1 – • calculate the usage density of a variable only at the points of its uses • at other points, calculate usage densities only on demand • recalculate usage densities of only those variables for which the current program point is outside their active windows • spill the variables with minimum usage densities.
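The bound in Lemma 1 can be checked numerically with a short Python sketch. It assumes the single-definition case, where the usage density after n uses spread over a total distance D is n / (D / n) = n² / D; the helper names `density_after_use` and `lemma_bound` are hypothetical.

```python
def density_after_use(total_uses, total_distance, d):
    """Usage density if the next use occurs d instructions after the last use."""
    n = total_uses + 1
    return n * n / (total_distance + d)


def lemma_bound(average_distance, usage_density):
    """Upper bound on the active window from Lemma 1."""
    return 2 * average_distance + 1 / usage_density


total_uses, total_distance = 3, 6
ad = total_distance / total_uses     # average distance: 2.0
ud = total_uses / ad                 # usage density: 1.5
bound = lemma_bound(ad, ud)          # 2 * 2.0 + 1 / 1.5, about 4.67
window = 2 * ad                      # Corollary 1: simplified window of 4.0

# A use within the bound keeps the density at or above its current value;
# a use beyond the bound lets it drop.
print(density_after_use(total_uses, total_distance, 4) >= ud)  # True
print(density_after_use(total_uses, total_distance, 5) >= ud)  # False
```

Under this assumption, the condition (n + 1)² / (D + d) ≥ n² / D rearranges to d ≤ 2·D/n + D/n², which is the lemma's 2·ad(p) + 1/ud(p); dropping the 1/ud(p) term gives the smaller, safe window of Corollary 1.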

A Fast, Memory-Efficient Register Allocation Framework for Embedded Systems Evaluation and Comparison Performance evaluation was done with respect to the following parameters: • the compile time needed by the allocator, • the execution time of the generated code, • size of the generated code, including the number of loads/stores generated, • size of the allocator itself, and • the amount of dynamic memory required during the allocation for different benchmark suites. • All experiments were carried out on an unloaded Sun Ultra 5 Workstation. • Times measured are the sums of system and user times returned by the UNIX getrusage system call.

A Fast, Memory-Efficient Register Allocation Framework for Embedded Systems Speed and Code Quality –

A Fast, Memory-Efficient Register Allocation Framework for Embedded Systems Memory Requirements – We evaluate the space efficiency of each of the allocators by comparing each of the following: • static instructions generated by each of the allocators for the various benchmarks, • dynamic memory required for the operation of each of the allocators, and • binary size of the allocators.

A Fast, Memory-Efficient Register Allocation Framework for Embedded Systems Conclusion - • usage density based allocation is a simple, fast technique for embedded systems. • keeps the compilation time close to that of linear scan • The usage density of a variable is an indicator of the frequency as well as the distribution of the uses of the variable at a program point and allows performing effective register allocation without the use of traditional live range or live interval information.

A Fast, Memory-Efficient Register Allocation Framework for Embedded Systems • The memory requirements, in terms of the size of the generated code, the size of the allocator, and the amount of dynamic memory used during its operation, are lower than those of the other allocators. • The amount of information used by the usage density algorithm is linearly proportional to program size. • The algorithm allows lazy computation of usage densities using the property of an active window.

A Fast, Memory-Efficient Register Allocation Framework for Embedded Systems Thank You
