1 / 72

Introduction to Compilers

Introduction to Compilers. Ben Livshits. Based in part of Stanford class slides from http ://infolab.stanford.edu/~ullman/dragon/w06/w06.html. Organization. Really basic stuff Flow Graphs Constant Folding Global Common Subexpressions Induction Variables/Reduction in Strength

darice
Download Presentation

Introduction to Compilers

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Introduction to Compilers Ben Livshits Based in part of Stanford class slides from http://infolab.stanford.edu/~ullman/dragon/w06/w06.html

  2. Organization • Really basic stuff • Flow Graphs • Constant Folding • Global Common Subexpressions • Induction Variables/Reduction in Strength • Data-flow analysis • Proving Little Theorems • Data-Flow Equations • Major Examples • Pointer analysis

  3. Compiler Organization

  4. Really Basic Stuff Flow Graphs Constant Folding Global Common Subexpressions Induction Variables/Reduction in Strength

  5. Dawn of Code Optimization • A never-published Stanford technical report by Fran Allen in 1968 • Flow graphs of intermediate code • Key things worth doing

  6. Intermediate Code for (i=0; i<n; i++) A[i] = 1; • Intermediate code exposes optimizable constructs we cannot see at source-code level. • Make flow explicit by breaking into basic blocks = sequences of steps with entry at beginning, exit at end.

  7. i = 0 if i>=n goto … t1 = 8*i A[t1] = 1 i = i+1 Basic Blocks for (i=0; i<n; i++) A[i] = 1;

  8. Induction Variables • x is an induction variable in a loop if it takes on a linear sequence of values each time through the loop. • Common case: loop index like i and computed array index like t1. • Eliminate “superfluous” induction variables. • Replace multiplication by addition (reduction in strength ).

  9. i = 0 if i>=n goto … t1 = 8*i A[t1] = 1 i = i+1 Example t1 = 0 n1 = 8*n if t1>=n1 goto … A[t1] = 1 t1 = t1+8

  10. Loop-Invariant Code Motion • Sometimes, a computation is done each time around a loop. • Move it before the loop to save n-1 computations. • Be careful: could n=0? I.e., the loop is typically executed 0 times.

  11. Example i = 0 i = 0 t1 = y+z if i>=n goto … if i>=n goto … t1 = y+z x = x+t1 i = i+1 x = x+t1 i = i+1

  12. Constant Folding • Sometimes a variable has a known constant value at a point. • If so, replacing the variable by the constant simplifies and speeds-up the code. • Easy within a basic block; harder across blocks.

  13. i = 0 n = 100 if i>=n goto … t1 = 8*i A[t1] = 1 i = i+1 Example t1 = 0 if t1>=800 goto … A[t1] = 1 t1 = t1+8

  14. Global Common Subexpressions • Suppose block B has a computation of x+y. • Suppose we are sure that when we reach this computation, we are sure to have: • Computed x+y, and • Not subsequently reassigned x or y. • Then we can hold the value of x+y and use it in B.

  15. a = x+y t = x+y a = t b = x+y t = x+y b = t c = x+y c = t Example

  16. t = x+y a = t t = x+y b = t c = t Example --- Even Better t = x+y a = t b = t t = x+y b = t c = t

  17. Data-Flow Analysis Proving Little Theorems Data-Flow Equations Major Examples

  18. An Obvious Theorem boolean x = true; while (x) { . . . // no change to x } • Doesn’t terminate. • Proof: only assignment to x is at top, so x is always true.

  19. As a Flow Graph x = true if x == true “body”

  20. Formulation: Reaching Definitions • Each place some variable x is assigned is a definition. • Ask: for this use of x, where could x last have been defined. • In our example: only at x=true.

  21. d2 d1 Example: Reaching Definitions d1: x = true d1 d2 if x == true d1 d2: a = 10

  22. Clincher • Since at x == true, d1 is the only definition of x that reaches, it must be that x is true at that point. • The conditional is not really a conditional and can be replaced by a branch.

  23. Not Always That Easy int i = 2; int j = 3; while (i != j) { if (i < j) i += 2; else j += 2; } • We’ll develop techniques for this problem, but later …

  24. d1 d2 d3 d4 d2, d3, d4 d1, d3, d4 d1, d2, d3, d4 d1, d2, d3, d4 The Flow Graph d1: i = 2 d2: j = 3 if i != j d1, d2, d3, d4 if i < j d3: i = i+2 d4: j = j+2

  25. DFA Is Sometimes Insufficient • In this example, i can be defined in two places, and j in two places. • No obvious way to discover that i!=j is always true. • But OK, because reaching definitions is sufficient to catch most opportunities for constant folding (replacement of a variable by its only possible value).

  26. Be Conservative! • (Code optimization only) • It’s OK to discover a subset of the opportunities to make some code-improving transformation. • It’s notOK to think you have an opportunity that you don’t really have.

  27. Example: Be Conservative boolean x = true; while (x) { . . . *p = false; . . . } • Is it possible that p points to x?

  28. Another def of x d2 As a Flow Graph d1: x = true d1 if x == true d2: *p = false

  29. Possible Resolution • Just as data-flow analysis of “reaching definitions” can tell what definitions of x might reach a point, another DFA can eliminate cases where p definitely does not point to x. • Example: the only definition of p is p = &y and there is no possibility that y is an alias of x.

  30. Reaching Definitions Formalized • A definition d of a variable x is said to reach a point p in a flow graph if: • Every path from the entry of the flow graph to p has d on the path, and • After the last occurrence of d there is no possibility that x is redefined.

  31. Data-Flow Equations --- (1) • A basic block can generate a definition. • A basic block can either • Kill a definition of x if it surely redefines x. • Transmit a definition if it may not redefine the same variable(s) as that definition.

  32. Data-Flow Equations --- (2) • Variables: • IN(B) = set of definitions reaching the beginning of block B. • OUT(B) = set of definitions reaching the end of B.

  33. Data-Flow Equations --- (3) • Two kinds of equations: • Confluence equations : IN(B) in terms of outs of predecessors of B. • Transfer equations : OUT(B) in terms of of IN(B) and what goes on in block B.

  34. {d1, d2} {d2, d3} Confluence Equations IN(B) = ∪predecessors P of B OUT(P) P1 P2 {d1, d2, d3} B

  35. Transfer Equations • Generate a definition in the block if its variable is not definitely rewritten later in the basic block. • Kill a definition if its variable is definitely rewritten in the block. • An internal definition may be both killed and generated.

  36. Example: Gen and Kill IN = {d2(x), d3(y), d3(z), d5(y), d6(y), d7(z)} Kill includes {d1(x), d2(x), d3(y), d5(y), d6(y),…} d1: y = 3 d2: x = y+z d3: *p = 10 d4: y = 5 Gen = {d2(x), d3(x), d3(z),…, d4(y)} OUT = {d2(x), d3(x), d3(z),…, d4(y), d7(z)}

  37. Transfer Function for a Block • For any block B: OUT(B) = (IN(B) – Kill(B)) ∪Gen(B)

  38. Iterative Solution to Equations • For an n-block flow graph, there are 2n equations in 2n unknowns. • Alas, the solution is not unique. • Use iterative solution to get the least fixed-point. • Identifies any def that might reach a point.

  39. Iterative Solution --- (2) IN(entry) = ∅; for each block B do OUT(B)= ∅; while (changes occur) do for each block B do { IN(B) = ∪predecessors P of B OUT(P); OUT(B) = (IN(B) – Kill(B)) ∪Gen(B); }

  40. IN(B1) = {} OUT(B1) = { IN(B2) = { d1, OUT(B2) = { IN(B3) = { d1, OUT(B3) = { Example: Reaching Definitions d1: x = 5 B1 d1} d2} if x == 10 B2 d1, d2} d2} d2: x = 15 B3 d2}

  41. Aside: Notice the Conservatism • Not only the most conservative assumption about when a def is killed or gen’d. • Also the conservative assumption that any path in the flow graph can actually be taken.

  42. Everything Else About Data Flow Analysis Flow- and Context-Sensitivity Logical Representation Pointer Analysis Interprocedural Analysis

  43. Three Levels of Sensitivity • In DFA so far, we have cared about where in the program we are. • Called flow-sensitivity. • But we didn’t care how we got there. • Called context-sensitivity. • We could even care about neither. • Example: where could x ever be defined in this program?

  44. Flow/Context Insensitivity • Not so bad when program units are small (few assignments to any variable). • Example: Java code often consists of many small methods. • Remember: you can distinguish variables by their full name, e.g., class.method.block.identifier.

  45. Context Sensitivity • Can distinguish paths to a given point. • Example: If we remembered paths, we would not have the problem in the constant-propagation framework where x+y = 5 but neither x nor y is constant over all paths.

  46. The Example Again x = 2 y = 3 x = 3 y = 2 z = x+y

  47. An Interprocedural Example int id(int x) {return x;} void p() {a=2; b=id(a);…} void q() {c=3; d=id(c);…} • If we distinguish p calling id from q calling id, then we can discover b=2 and d=3. • Otherwise, we think b, d = {2, 3}.

  48. Context-Sensitivity --- (2) • Loops and recursive calls lead to an infinite number of contexts. • Generally used only for interprocedural analysis, so forget about loops. • Need to collapse strong components of the calling graph to a single group. • “Context” becomes the sequence of groups on the calling stack.

  49. Example: Calling Graph t Contexts: Green Green, pink Green, yellow Green, pink, yellow r s p q main

  50. Comparative Complexity • Insensitive: proportional to size of program (number of variables). • Flow-Sensitive: size of program, squared (points times variables). • Context-Sensitive: worst-case exponential in program size (acyclic paths through the code).

More Related