
Register Allocation and Spilling via Graph Coloring

Register Allocation and Spilling via Graph Coloring. G. J. Chaitin IBM Research, 1982. Motivation. Before the register allocation phase, the compiler assumes that there are an unlimited number of general purpose registers


Presentation Transcript


  1. Register Allocation and Spilling via Graph Coloring G. J. Chaitin IBM Research, 1982

  2. Motivation • Before the register allocation phase, the compiler assumes that there is an unlimited number of general-purpose registers • The symbolic registers must be mapped to real registers in a way that avoids conflicts • Symbolic registers that cannot be mapped to real registers must be spilled to memory • We need an algorithm to map registers with minimal spilling cost

  3. Paper Overview • Register allocation overview • Subsumption algorithm • Interference graph coloring algorithm • Spilling algorithm

  4. Register Allocation Steps • Determine which registers are live at any point in the intermediate language (IL) program • Build a register interference graph • Nodes represent symbolic registers • Edges represent a conflict between symbolic registers • Subsumption: eliminate unnecessary register copies • Find a 32-coloring of the interference graph • Decide which registers to spill if necessary
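
The graph-building step can be sketched as follows. This is a minimal illustration, not the paper's construction: it assumes liveness has already been computed and is given as one set of live symbolic registers per program point, and it treats two registers as interfering whenever they are live at the same point. The function name `build_interference_graph` and the sample live sets are hypothetical.

```python
from itertools import combinations

def build_interference_graph(live_sets):
    """Return {register: set of interfering registers}.

    Simplifying assumption: two symbolic registers interfere when they
    appear together in the live set of at least one program point.
    """
    graph = {}
    for live in live_sets:
        for reg in live:
            graph.setdefault(reg, set())      # every register gets a node
        for a, b in combinations(sorted(live), 2):
            graph[a].add(b)                   # edges are undirected
            graph[b].add(a)
    return graph

# Hypothetical live sets for a four-point program.
live_sets = [{"a"}, {"a", "b"}, {"b", "c"}, {"c"}]
graph = build_interference_graph(live_sets)
```

Here `a` and `c` are never live together, so they get no edge and could share a real register.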

  5. Subsumption • If the source and destination of a register copy do not interfere, they may be coalesced into a single node • For each register copy in IL, determine whether the registers interfere • If not, coalesce the two nodes into one • After first pass, rewrite IL code • Repeat until no more coalescing is possible
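
The coalescing loop above might look like the sketch below. It assumes the interference graph is a dictionary of neighbour sets and that register copies are given as `(dst, src)` pairs; both representations, the name `coalesce`, and the example edges are assumptions made for illustration. Instead of rewriting the IL and rebuilding the graph after each pass, the sketch merges edge sets in place, which is a conservative approximation.

```python
def coalesce(graph, copies):
    """Merge the source and destination of each copy when they do not interfere.

    graph:  {node: set of neighbours}; mutated in place.
    copies: list of (dst, src) register copies (hypothetical IL form).
    Returns a mapping from every original register to its merged name.
    """
    alias = {n: n for n in graph}

    def find(n):
        while alias[n] != n:
            n = alias[n]
        return n

    changed = True
    while changed:                  # repeat until no more coalescing is possible
        changed = False
        for dst, src in copies:
            a, b = find(dst), find(src)
            if a == b or b in graph[a]:
                continue            # already merged, or the two interfere
            for n in graph.pop(b):  # merge b into a: a inherits b's edges
                graph[n].discard(b)
                graph[n].add(a)
                graph[a].add(n)
            alias[b] = a
            changed = True
    return {n: find(n) for n in alias}

# Hypothetical graph: A conflicts with B, and C conflicts with D.
graph = {"A": {"B"}, "B": {"A"}, "C": {"D"}, "D": {"C"}}
mapping = coalesce(graph, [("A", "D"), ("B", "C")])
```

With these assumed edges, D is folded into A and C into B, leaving a two-node graph in which the merged nodes still interfere.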

  6. Subsumption Example • Figure: interference graph with nodes A, B, C, D

  7. Subsumption Example • Figure: the graph after coalescing, with nodes AD and BC

  8. Finding a 32-Coloring • Each symbolic register is assigned a color representing a real register • The coloring succeeds if no two adjacent nodes have the same color • Assume that G has a node N with degree < 32 • Then G is 32-colorable iff the reduced graph from which N and all its edges have been omitted is 32-colorable • The algorithm repeatedly removes nodes of degree < 32 until all nodes have been removed • The algorithm fails if nodes remain but none of them has degree < 32
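
A small sketch of this remove-then-color procedure, generalized from 32 to an arbitrary number of colors `k`. The function name `color_graph`, the dictionary-of-sets graph representation, and the example graphs are assumptions made for illustration. Nodes are colored in the reverse of their removal order, which works because each node had fewer than `k` neighbours left when it was removed.

```python
def color_graph(graph, k):
    """Try to k-color `graph` ({node: set of neighbours}).

    Returns {node: color} on success, or None if removal gets stuck
    because every remaining node has degree >= k.
    """
    work = {n: set(adj) for n, adj in graph.items()}   # private copy
    stack = []
    while work:
        node = next((n for n in sorted(work) if len(work[n]) < k), None)
        if node is None:
            return None                    # stuck: spilling would be needed
        for m in work.pop(node):           # remove the node and its edges
            work[m].discard(node)
        stack.append(node)
    colors = {}
    while stack:                           # re-insert in reverse removal order
        node = stack.pop()
        used = {colors[m] for m in graph[node] if m in colors}
        colors[node] = next(c for c in range(k) if c not in used)
    return colors

# Hypothetical graphs: a 4-cycle and the complete graph on four nodes.
cycle = {"A": {"B", "D"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"A", "C"}}
k4 = {n: set("ABCD") - {n} for n in "ABCD"}
cycle_colors = color_graph(cycle, 3)
k4_colors = color_graph(k4, 3)
```

The 4-cycle is colored successfully with three colors, while the complete graph makes the procedure fail because every node has degree 3.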

  9. 3-Coloring Example • Figure: graph with nodes A, B, C, D

  10. Spilling • If the 32-coloring fails, then nodes must be spilled to memory • Spilled registers are stored to memory, then reloaded briefly when their values are needed • Every time spill code is generated, the interference graph must be rebuilt • Usually recoloring succeeds after spilling, but sometimes several passes are required

  11. Spilling • Choosing an optimal set of registers to spill is an NP-complete problem • Heuristic: spill the node that minimizes (cost of spilling) / (degree of node) • Cost of spilling = number of definition points + number of use points, with each point weighted by its execution frequency • In some cases, a spilled node can be reloaded and kept in a register for an extended interval
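
The cost-over-degree heuristic can be illustrated with the sketch below. It assumes each symbolic register comes with a list of estimated execution frequencies, one entry per definition or use point, so that the spill cost is simply their sum. The names `spill_cost` and `choose_spill` and all numbers are hypothetical.

```python
def spill_cost(point_freqs):
    """Sum of the estimated execution frequencies of all def and use points."""
    return sum(point_freqs)

def choose_spill(graph, point_freqs):
    """Pick the node with the smallest (spill cost / degree) ratio.

    Isolated nodes are skipped: they never block coloring, and skipping
    them avoids dividing by zero. Ties are broken by node name.
    Raises ValueError if the graph has no edges at all.
    """
    candidates = [n for n in graph if graph[n]]
    return min(
        candidates,
        key=lambda n: (spill_cost(point_freqs[n]) / len(graph[n]), n),
    )

# Hypothetical interference graph and per-point frequencies.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c"},
}
point_freqs = {"a": [10, 10], "b": [1, 1, 10], "c": [10, 10], "d": [1, 1]}
victim = choose_spill(graph, point_freqs)
```

In this example `d` has ratio 2 / 2 = 1, the lowest of the four, so it is the one selected for spilling.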

  12. Conclusion • The graph coloring and spilling algorithms should produce faster code • The register allocation algorithm is efficient • Graph coloring takes O(N) time • But uses O(N^2) space

  13. Compile-time Copy Elimination Peter Schnorf Mahadevan Ganapathi John Hennessy Stanford, 1993

  14. Motivation • Single assignment languages simplify dependency checking • Which simplifies automatic detection and exploitation of parallelism • But single-assignment languages require a large number of copies • Previous implementations eliminate copies at runtime • Increased efficiency if copies can be eliminated at compile time

  15. Paper Overview • Single-assignment languages • Code generation • Compile-time copy elimination techniques • Substitution • Pattern matching • Substructure sharing • Substructure targeting • Results – success! • Eliminated all copies in bubble sort

  16. Single-assignment languages • Functional languages (LISP, Haskell, SISAL) • Simpler dependency checking • True dependencies – write, read • b = f(c), a = f(b) • Anti-dependencies – read, write • a = f(b), b = f(c) • Output dependencies – write, write • a = f(b), a = f(c) • Aliasing • caused by pointers, array indexes • To avoid aliasing, all inputs and outputs are passed by value
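
The three dependency kinds listed above can be checked mechanically for a pair of statements. The sketch below assumes each statement is described as `(written_variable, set_of_read_variables)` and that the first statement executes before the second; the function name `classify` and this encoding are hypothetical.

```python
def classify(first, second):
    """Return the set of dependency kinds between two statements.

    Each statement is (written_var, set_of_read_vars); `first` runs
    before `second`.
    """
    w1, r1 = first
    w2, r2 = second
    kinds = set()
    if w1 in r2:
        kinds.add("true")     # write, then read
    if w2 in r1:
        kinds.add("anti")     # read, then write
    if w1 == w2:
        kinds.add("output")   # write, then write
    return kinds

true_dep = classify(("b", {"c"}), ("a", {"b"}))    # b = f(c), a = f(b)
anti_dep = classify(("a", {"b"}), ("b", {"c"}))    # a = f(b), b = f(c)
output_dep = classify(("a", {"b"}), ("a", {"c"}))  # a = f(b), a = f(c)
```

In a single-assignment program each variable is written only once, so the anti and output cases cannot arise between named variables, which is why only true dependencies need to be tracked.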

  17. Example – Swap(A,i,j) • Data flow diagram • Edges transport values • Simple nodes are operations • Pick any feasible node evaluation order at random • Naïve implementation • Each edge has its own memory • Swap uses 5 array copies! • Optimized implementation • Swap array updates are done in-place • Figure: data-flow graph with nodes Input, AElement, AElement, AReplace, AReplace
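
To make the contrast concrete, here is a sketch of both strategies on Python lists. The edge layout in `swap_naive` is an assumption: the input array feeds two AElement nodes and one AReplace node, the first AReplace feeds the second, and the second feeds the result. Under that assumed layout the count comes to five copies, which is consistent with the slide but is not taken from the paper. All function names are hypothetical.

```python
def swap_naive(a, i, j):
    """Give every array-carrying edge its own memory (assumed edge layout).

    Returns (swapped_array, number_of_array_copies).
    """
    copies = 0

    def edge(arr):
        nonlocal copies
        copies += 1
        return list(arr)        # each edge holds a private copy

    x = edge(a)[i]              # Input -> AElement
    y = edge(a)[j]              # Input -> AElement
    t = edge(a)                 # Input -> AReplace
    t[i] = y
    u = edge(t)                 # AReplace -> AReplace
    u[j] = x
    return edge(u), copies      # AReplace -> result

def swap_in_place(a, i, j):
    """Optimized form: both updates reuse the input array's memory."""
    a[i], a[j] = a[j], a[i]
    return a

naive_result, naive_copies = swap_naive([1, 2, 3], 0, 2)
in_place_result = swap_in_place([1, 2, 3], 0, 2)
```

Both versions produce the same swapped array; only the naive one allocates new arrays along the way.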

  18. Example: BubbleSort(A) • Compound nodes represent control flow • Loops are implemented using recursion to avoid multiple assignment of the iteration variable • Naïve implementation • Bubble sort requires O(n^2) array copies • Optimized implementation • All array updates are done in place • But parallelism is decreased

  19. Code Generation Overview • Input is from compiler front-end • IF1: intermediate data-flow graph representation • Code generator eliminates copies • Output is in C • Compiled into machine code using an optimized C compiler

  20. Vertical Substitution • If input and output have the same type and size, they can share memory • Updates are done in-place • Figure: Swap data-flow graph with nodes Input, 1 AElement, 2 AElement, 3 AReplace, 4 AReplace

  21. Horizontal Substitution • If an output has several destinations, the output edges can share memory • Figure: Swap data-flow graph with nodes Input, 1 AElement, 2 AElement, 3 AReplace, 4 AReplace

  22. Horizontal and Vertical Substitution • Horizontal and vertical substitution can interfere with each other • A node along the substitution chain modifies the shared object before its last use • Edges can be marked as read-only if they are shared and this is not the last use

  23. Horizontal and Vertical Substitution • Figure: two Swap data-flow graphs, one numbered 1 AElement, 2 AElement, 3 AReplace, 4 AReplace and the other numbered 1 AElement, 3 AElement, 2 AReplace, 4 AReplace

  24. Interprocedural Substitution • Previous discussion concerned simple nodes that can be analyzed at compiler design time • Information about a function is needed in order to use substitution • Does the function modify an input? • Will an input be chained to an output?

  25. Intersubgraph Substitution • Substitution analysis is done for each construct • Same basic principles

  26. Determining the Evaluation Order • Evaluation order can impact efficiency of substitution • Naïve implementation selects the next node to evaluate at random • Hints tell algorithm which nodes should be evaluated before and after other nodes if possible • Hints are ad hoc?

  27. Pattern Matching • Replace hard-to-optimize pieces of code • Patterns are language-specific • Patterns are detected using “ad hoc” methods

  28. Substructure Sharing • Allow substructures to be referenced without copies • AElement can be treated as a NoOp • Happens after substitution analysis – less important • Same principles as substitution analysis

  29. Substructure Targeting • Allow structures to be built from substructures without copies • Similar to substructure sharing

  30. Results • Compared the optimizations against the naïve implementation • The optimizations eliminate all copies for bubble sort • Informal comparison to a run-time optimizer shows improvements

  31. Results

  32. Conclusions • Substitution, pattern matching and substructure sharing can almost eliminate unnecessary copies in a single assignment language. • Copy elimination no longer has to be done at run-time. • Single assignment languages should be more efficient for parallel programs.
