1 / 40

CSE P501 – Compiler Construction

CSE P501 – Compiler Construction. Register allocation constraints Local allocation Fast, but poorer code Global allocation Register coloring. k. IR typically assumes infinite number of virtual registers Real machine has k physical registers available x86: k = 8 x64: k = 16 Goals

alesia
Download Presentation

CSE P501 – Compiler Construction

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CSE P501 – Compiler Construction Register allocation constraints Local allocation Fast, but poorer code Global allocation Register coloring Jim Hogg - UW - CSE - P501

  2. k • IR typically assumes infinite number of virtual registers • Real machine has k physical registers available • x86: k = 8 • x64: k = 16 • Goals • Produce correct code that uses k or less registers • Minimize added loads and stores • Minimize space needed for spilled values • Do this efficiently – O(n), O(n log n), maybe O(n2) Jim Hogg - UW - CSE - P501

  3. Register Allocation • At each point in the code, pick which values to keep in registers • Insert code to move values between registers and memory • No additional transformations – scheduling already ran • But will usually re-run scheduling afterwards (to include the effects of spilling) • Minimize inserted save/restores ("spill" code) Jim Hogg - UW - CSE - P501

  4. Allocation vs Assignment • Allocation: which values to keep in registers? • Assignment: which specific registers? • Compiler must do both Jim Hogg - UW - CSE - P501

  5. Local Register Allocation • Applied to basic blocks (ie: "local") • Produces good register usage inside a block • But can have inefficiencies at boundaries between blocks • Two variations: top-down, bottom-up Jim Hogg - UW - CSE - P501

  6. "Top-down" Local Allocation • We assume a RISC target machine ("load/store architecture") • load operands into two registers (if not already there) • perform instruction (eg: add, lshift) • store result back to memory (if required) • ie: operands for instructions MUST be in registers, rather than memory • (note, by contrast, that x86 allows operands directly from memory) • Principle: keep most heavily used values in registers • Priority = # of times virtual-register used (destination and/or source) in block • If more virtual registers than physical(the common case) • Reserve some registers for values allocated to memory • Need enough to address and load two operands and store result (R1 = R2 + R3) • Other registers dedicated to “hot” values • But are tied up for entire block with particular value, even if only needed for part of the block Jim Hogg - UW - CSE - P501

  7. "Bottom-up" Local Allocation, 1 • Keep a list of free physical registers • initially all physical registers at beginning of block • except frame-pointer, stack-pointer - pre-allocated • Scan code, instruction-by-instruction • Allocate a register when one is needed • if one is free, allocate it • otherwise, pick a victim - eg: VR3 using R7 • save contents of R7 • allocate R7to requesting virtual register • Emit instructions to use physical register • Free physical register as soon as possible • In x = y op z, free y and z if no longer needed before allocating x Jim Hogg - UW - CSE - P501

  8. Bottom-up Local Allocation, 2 • How to pick a victim? • Scan ahead thru code • Look for next use of each currently-allocated-to virtual register • Pick the virtual register whose next use is furthest off. Why? • Save the victim register's to memory (restore later) Example: CPU with 4 available registers Jim Hogg - UW - CSE - P501

  9. Local "bottom-up" Register Allocation, -1 • ; load v2 from memory • ; load v3 from memory • v1 = v2 + v3 • ; load v5, v6 from memory • v4 = v5 - v6 • v7 = v2 - 29 • ; load v9 from memory • v8 = - v9 • v10 = v6 * v4 • v11 = v10 - v3 • Still in LIR. So lots (too many!) virtual registers required (v2, etc). • Grey instructions (1,2,4,7) load operands from memory into virtual registers. • We will ignore these going forward. Focus on mapping virtual to physical. Jim Hogg - UW - CSE - P501

  10. Local "bottom-up" Register Allocation, 0 vRegNextRef pRegvReg v1 1 v2 1 v3 1 v4 2 v5 2 v6 2 v7 3 v8 4 v9 4 v10 5 v11 6 R1 - R2 - R3 - R4 - • v1 = v2 + v3 • v4 = v5 - v6 • v7 = v2 - 29 • v8 = - v9 • v10 = v6 * v4 • v11 = v10 - v3 Jim Hogg - UW - CSE - P501

  11. Local "bottom-up" Register Allocation, 1 vRegNextRef pRegvReg v1 1 v2 13 v3 16 v4 2 v5 2 v6 2 v7 3 v8 4 v9 4 v10 5 v11 6 R1 v2 R2 v3 R3 v1 R4 - • v1 = v2 + v3 • v4 = v5 - v6 • v7 = v2 - 29 • v8 = - v9 • v10 = v6 * v4 • v11 = v10 - v3 R3 = R1 + R2 Jim Hogg - UW - CSE - P501

  12. Local "bottom-up" Register Allocation, 2 vRegNextRef pRegvReg v1  v2 3 v3 6 v4 25 v5 2 v6 25 v7 3 v8 4 v9 4 v10 5 v11 6 R1 v2 R2 v3v4 R3 v1v6 R4 v5 • v1 = v2 + v3 • v4 = v5 - v6 • v7 = v2 - 29 • v8 = - v9 • v10 = v6 * v4 • v11 = v10 - v3 R3 = R1 + R2 ; spill R3 ; spill R2? - no - still clean R2 = R4 - R3 Jim Hogg - UW - CSE - P501

  13. Local "bottom-up" Register Allocation, 3 vRegNextRef pRegvReg v1  v2 3 v3 6 v4 5 v5  v6 5 v7 3 v8 4 v9 4 v10 5 v11 6 R1 v2 R2 v4 R3 v6 R4 v5v7 • v1 = v2 + v3 • v4 = v5 - v6 • v7 = v2 - 29 • v8 = - v9 • v10 = v6 * v4 • v11 = v10 - v3 R3 = R1 + R2 ; spill R3 ; spill R2? - no! R2 = R4 - R3 ; spill R4? - no! R4 = R1 - 29 And so on . . . Jim Hogg - UW - CSE - P501

  14. Bottom-Up Allocator • Invented about once per decade • Sheldon Best, 1955, for Fortran I • LasloBelady, 1965, for analyzing paging algorithms • William Harrison, 1975, ECS compiler work • Chris Fraser, 1989, LCC compiler • Vincenzo Liberatore, 1997, Rutgers • Will be reinvented again, no doubt • Many arguments for optimality of this Jim Hogg - UW - CSE - P501

  15. Global Register Allocation • Standard technique is graph coloring • Use control-graph and dataflow-graph to derive interference graph • Nodes are live ranges (not registers!) • Edge between nodes when live at same time • Connected nodes cannot use same register • Then color the nodes in the graph • Two nodes connected by an edge may not have same color (ie: be allocated to same physical register) • If more than k colors are needed, insert spill code Jim Hogg - UW - CSE - P501

  16. Graph Coloring, 1 1 2 3 4 5 • Assign a color to each node, such that • No adjacent nodes have same color Jim Hogg - UW - CSE - P501

  17. Graph Coloring, 2 1 2 3 4 5 • 2 colors are enough for this graph • called a 2-coloring • this graph's chromatic number = 2 Jim Hogg - UW - CSE - P501

  18. Graph Coloring, 3 1 1 2 2 3 4 3 4 5 5 small changes forces a 3rd color Jim Hogg - UW - CSE - P501

  19. Live Ranges • Live Range of a variable is between a def and all subsequent possible uses • if a < 10, then a is live lines 1..5 • if a < 20, then a is live lines 1..10 • otherwise, a is live lines 1..14 • Mmm - clearly nonsense! We need a flowgraph for a correct specification • Note: line 14: a goes dead on RHS; d comes live on LHS. So death and birth happen 'half-way thru' the instruction • This example is simplified: variables are never re-def'd. And there are no temps 1 a = ... 2 b = ... 3 c = ... 4 a = a + b + c 5 if a < 10 then 6 d = c + 9 7 print(c) 8 elseif a < 20 then 9 e = 10 10 d = e + a 11 print(e) 12 else 13 f = 12 14 d = f + a 15 print(f) 16 endif 17 print(d) Jim Hogg - UW - CSE - P501

  20. Live Ranges in Flowgraphs a a = ... b = ... c = ... a = a + b + c b c a = ... b = ... c = ... a = a + b + c if a < 10 then d = c + 9 print(c) elseif a < 20 then e = 10 d = e + a print(e) else f = 12 d = f + a print(f) endif print(d) a < 10 d = c + 8 print(c) a < 20 e = 10 d = e + a print(e) f = 12 d = f + a print(f) print(d) Jim Hogg - UW - CSE - P501

  21. Register Interference Graph b f a d e c • If live range of 2 variables overlap, or interfere, they cannot use same register • Eg: a and b cannot both be stored in register R5 • Eg: c and ecan be stored in register R5 (at different, non-overlapping times) • How many registers will we need? - Guesses? Jim Hogg - UW - CSE - P501

  22. Interference Graph - Coloring b f a Node Degree a 3 b 2 c 3 d 3 e 2 f 2 d e c • Problem: color this interference graph, where each color represents a different physical register • Solution: simplify graph: recursively remove easiest nodes (lowest degree) Jim Hogg - UW - CSE - P501

  23. Interference Graph - Remove b b f a Node Degree a 2 b 2 c 2 d 3 e 2 f 2 d e c Removed: b Jim Hogg - UW - CSE - P501

  24. Interference Graph - Remove a b f a Node Degree a 2 b 2 c 1 d 3 e 1 f 1 d e c Removed: b a Jim Hogg - UW - CSE - P501

  25. Interference Graph - Remove c b f a Node Degree a 2 b 2 c 1 d 2 e 1 f 1 d e c Removed: b a c Jim Hogg - UW - CSE - P501

  26. Interference Graph - Remove e b f a Node Degree a 2 b 2 c 1 d 2 e 1 f 1 d e c Removed: b a c e Jim Hogg - UW - CSE - P501

  27. Interference Graph - Color! b f a d e c Removed: b a c e Jim Hogg - UW - CSE - P501

  28. Interference Graph - Add e b f a d e c Removed: b a c Jim Hogg - UW - CSE - P501

  29. Interference Graph - Add c b f a d e c Removed: b a Jim Hogg - UW - CSE - P501

  30. Interference Graph - Add a b f a d e c Removed: b Jim Hogg - UW - CSE - P501

  31. Interference Graph - Add b b f a d e c • 3 colors are enough • eg: red = R1, green = R2, blue = R3 • If less registers available - need to spill Jim Hogg - UW - CSE - P501

  32. Live Ranges: More Realistic Example Register Interval • varp = ... • vw = [varp + 0] • v2 = 2 • vx = [varp + @x] • vy = [varp + @y] • vz = [varp + @z] • vw = vw * v2 • vw = vw * vx • vw = vw * vy • vw = vw * vz • [varp + 0] = vw varp 1..11 vw 2..7 vw 7..8 vw 8..9 vw 9..10 vw 10..11 v2 3..7 vx 4..8 vy 5..9 vz 6..10 In this example, virtual registers are re-def'd. Eg: vw participates in 5 live ranges Eg: vw [7..8] interferes with vx [4..8] but not vw [9..10] But no control flow!

  33. Build Live Ranges, 1 • Use dataflow information to build interference graph • Nodes = live ranges • Add an edge for each pair of live ranges that overlap • Beware copy instructions: • rx = ry does not create interference between rxand ry : • can use same register if the ranges do not otherwise interfere Jim Hogg - UW - CSE - P501

  34. Build Live Ranges, 2 • Construct the interference graph • Find live ranges – SSA • Build SSA form of IR • Live range initially includes single SSA name' • At a -function, form union of live ranges • Either rewrite code to use live range names or keep a mapping between SSA names and live-range names Jim Hogg - UW - CSE - P501

  35. Simplify & Color Algorithm Simplify while graph cannot be colored remove easiest node (fewest neighbors = lowest degree) if degree of easiest node > k insert spill code remember that node into stack recalculate degree of each remaining node Color color the sole remaining node while stack is non-empty pop node from stack color newly expanded graph Jim Hogg - UW - CSE - P501

  36. Coalescing Live Ranges • Idea: if two live ranges are connected by a copy operation (rx = ry) do not otherwise interfere, then coalesce • Rewrite all references to rx to use ry • Remove the copy instruction • Then fix up interference graph Jim Hogg - UW - CSE - P501

  37. Advantages of Coalescing? • Makes the code smaller, faster (no copy operation) • Shrinks number of live ranges • Reduces the degree of any live range that interfered with both live ranges rx and ry • But: coalescing two live ranges can prevent coalescing of others, so ordering matters • coalesce most frequently executed ranges first (eg: inner loops) • Can have a substantial payoff – do it! Jim Hogg - UW - CSE - P501

  38. Overall Structure More Coalescing Possible Find live ranges Build int. graph Coalesce Spill Costs Find Coloring No Spills Spills Insert Spills Jim Hogg - UW - CSE - P501

  39. Real-Life Complications • Need to deal with irregularities in the register set • Some operations require dedicated registers • Eg: idivin x86 • Eg: split address/data registers in M68k • Register conventions like function results • Eg: use of registers across calls (return in EAX) • Different register classes • Eg: integer, floating-point, SIMD • Overlapping registers • Eg: x86 AL, AH, AX, EAX, RAX • Model by precoloring nodes, adding constraints in the graph, etc. Jim Hogg - UW - CSE - P501

  40. Graph Representation • Recall: • Register allocation is one of the most important optimizations • Finding optimum solution for typical interference graph will take a million years • So we use heuristics to find a good solution a million times each day • Interference graph representation drives the time & space requirements for the allocator (may even dominate the entire compiler) • Not unknown to have ~5k nodes and ~1M edges Jim Hogg - UW - CSE - P501

More Related