1 / 33

Data-Flow Analysis II

Data-Flow Analysis II. CS 671 March 13, 2008. Data-Flow Analysis. Gather conservative, approximate information about what a program does Result: some property that holds every time the instruction executes The Data-Flow Abstraction Execution of an instruction transforms program state

benjamin
Download Presentation

Data-Flow Analysis II

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Data-Flow Analysis II CS 671 March 13, 2008

  2. Data-Flow Analysis • Gather conservative, approximate information about what a program does • Result: some property that holds every time the instruction executes • The Data-Flow Abstraction • Execution of an instruction transforms program state • To analyze a program, we must consider all possible sequences of program points (paths) • Summarize all possible program states with finite set of facts • Limitation: may consider some infeasible paths

  3. The General Approach • Setting up and solving systems of equations that relate information at various points in the program • such as out[S] = gen[S] È ( in[S] - kill[S] ) where • S is a statement • in[S] and out[S] are information before and after S • gen[S] and kill[S] are information generated and killed by S • definition of in, out, gen, and kill depends on the desired information

  4. Data-Flow Analysis (cont.) • Properties: • either a forward analysis (out as function of in) or • a backward analysis (in as a function of out). • either an “along some path” problem or • an “along all paths” problem. • Data-flow analysis must be conservative • Definitions: • point between two statements (or before the first statements and after the last) • path is a sequence of consecutive points in the control-flow graph

  5. Example – Live Variables • Steps: • Set up live sets for each program point • Instantiate equations • Solve equations if (c) x = y+1 y = 2*z if (d) x = y+z z = 1 z = x

  6. Example • Program points L1 if (c) L2 L3 x = y+1 y = 2*z if (d) L4 L5 L6 L7 x = y+z L8 L9 z = 1 L10 L11 z = x L12

  7. Example L1 if (c) 1 L2 L3 x = y+1 y = 2*z if (d) 2 L4 3 L5 4 L6 L7 5 x = y+z L8 L9 z = 1 6 L10 L11 7 z = x L12

  8. in[I] = ( out[I] – def[I] )  use[I] out[B] = in[B’] B’  succ(B) Example L1 = L2 = L3 = L4 = L5 = L6 = L7 = L8 = L9 = L10 = L11 = L12 = L1 = { } if (c) 1 L2 = { } L3 = { } x = y+1 y = 2*z if (d) 2 L4 = { } 3 L5 = { } 4 L6 = { } L7 = { } 5 x = y+z L8 = { } L9 = { } z = 1 6 L10 = { } L11 = { } 7 z = x L12 = { }

  9. More Terminology • Successors • Succ(B1) = • Succ(B2) = • Succ(B3) = • Predecessors • Pred(B2) = • Pred(B3) = • Pred(B4) = B1 B2 B3 B4 • Branch node – more than one successor • Join node – more than one predecessor

  10. Dominators • Dominance is a binary relation on the flow graph nodes that allows us to easily find loops • Node d dominates node i (d dom i) if every possible execution path from entry to i includes d • Dominance is: • Reflexive – every node dominates itself • Transitive – if a dom b and b dom c, then a dom c • Antisymmetric – if a dom b and b dom a then a=b • dom(entry) = • dom(b1) = • dom(b2) = • dom(b3) = • dom(b4) = • dom(b5) = • dom(b6) = • dom(exit) = entry B1 B2 B3 B4 B5 B6 exit

  11. Immediate dominators • Idom(b) – a iff (a  b) and (a dom b) and there does not exist a node c such that (a dom c) and (c dom b) with c different than a and b • Idom of a node is unique • Idom relationship forms a tree whose root is the entry node • idom(b1) = • idom(b2) = • idom(b3) = • idom(b4) = • idom(b5) = • idom(b6) = • idom(exit) = entry B1 B2 B3 B4 B5 B6 exit • Flow graph

  12. Strict Dominators and Postdominators • (d sdom i) if d dominates i and d  i • (p pdom i) if every possible execution path from i to exit includes p entry • pdom(entry) = • pdom(b1) = • pdom(b2) = • pdom(b3) = • pdom(b4) = • pdom(b5) = • pdom(b6) = B1 B2 B3 B4 B5 B6 exit • Flow graph

  13. Loops • Back edge – edge whose head dominates its tail • Loop containing this type of back edge is a natural loop • i.e. it has a single external entry point • For back edge b  c the loop header is c entry B1 • Natural loops = • Loop header (B3  B1) = • Loop header (B2  B2) = B2 B3 exit

  14. Quicksort Example • How might we optimize this code? i := m-1 j := n t1 := 4*n v := a[t1] b1 i := i+1 t2 := 4*i t3 := a[t2] if t3 < v goto b2 b2 j := j-1 t4 := 4*j t5 := a[t4] if t5 > v goto b3 b3 t6 :=4*i x := a[t6] t7 := 4*i t8 := 4*j t9 := a[t8] a[t7] :=t9 t10 := 4*j a[t10] := x t11 := 4*i x := a[t11] t12 := 4*i t13 := 4*n t14 := a[t13] a[t12] := t14 t15 := 4*n a[t15] := x if i >= j goto b6 b4 b5 b6 [Quicksort] (i, j, v, x variables are needed outside)

  15. Reaching Definitions • Informally: • determine if a particular definition (e.g. “x” in “x = 5”) may reach a given point in the program • Why reaching definitions may be useful: x := 5 y := x + 2 if “x := 5” is the only definition reaching “y := x+2”, it can be simplified to “y := 7” (constant propagation)

  16. Reaching Definitions • Definition of a variable X: • is a statements that assigns (or may assign) a value to X • unambiguous: X := 3 • ambiguous: foo(X) or *Y := 3 • A definition d reaches a point p : • if there is a path from the point immediately following d to p, • such that d is not killed along that path. • A definition d of variable X is killed along path p • if there is another definition of X along p.

  17. Reaching Definitions (cont.) • Has the following properties: • forward analysis • “along some path” problem • Is conservative in that: • definition d may not define variable X • along a path p, there is another definition of X, but this other definition is ambiguous • definition d may be killed along infeasible paths

  18. 1 1 2 3 1-2-3 2-3 Data-Flow Analysis: Structured Programs • Most programs are structured: • sequence of statements • if-then-else construct • while-loops (including for-loops, loops with breaks,...) • For these programs, we may use an inductive (syntax driven) approach: 1-2-3

  19. d: a=b+c S S Reaching Definitions for Structured Programs gen[S] = {d} kill[S] = All-defs-of-a - {d} out[S] = gen[S]È ( in[S] - kill[S] ) gen[S] = gen[S2]È ( gen[S1] - kill[S2] ) kill[S] = kill[S2]È ( kill[S1] - gen[S2] ) in[S1] = in[S] in[S2] = out[S1] out[S] = out[S2] S1 S2

  20. S S Reaching Definitions for Structured Programs (cont.) gen[S] = gen[S1]Ègen[S2] kill[S] = kill[S1]Ç kill[S2] in[S1] = in[S2] = in[S] out[S] = out[S1]È out[S2] S1 S2 gen[S] = gen[S1] kill[S] = kill[S1] in[S1] = in[S]È gen[S1] out[S] = out[S1] S1

  21. Iterative Solution: Data-Flow Equations • Inductive approach only applicable to structured programs • because utilizes the structure of the program to synthesize & distribute the data-flow information • Need a general technique: Iterative Approach • compute the gen/kill sets of each statement / basic block • initialize the in/out sets • repetitively compute out/in sets until a steady state is reached

  22. Reaching Definitions • Reaching definitions: • set of definitions that may reach (along one or more paths) a given point • gen[S]: definition d is in gen[S] if d may reach the end of S, independently of whether it reaches the beginning of S. • kill[S]: the set of definitions that never reach the end of S, even if they reach the beginning. • Equations: • in[S] =È (P a predecessor of S) out[P ] • out[S] = gen[S] È ( in[S] - kill[S] )

  23. Reaching Definitions (cont.) • Algorithm: for each basic block B: out[B] := gen[B]; (1) do change := false; for each basic block B do in[B] =È (P a predecessor of B) out[P ]; (2) old-out = out[B]; (3) out[B] = gen[B] È (in[B] - kill[B]); (4) if (out[B] != old-out) then change := true; (5) end while change

  24. gen[b1] := {d1, d2, d3} kill[b1] := {d4, d5, d6, d7} i := m-1 d1 j := n d2 a := u1 d3 b1 gen[b2] := {} kill[b2] := {} gen[b3] := {} kill[b3] := {} i := i+1 d4 j := j-1 d5 b2 gen[b4] := {} kill[b4] := {} a := u2 d6 b3 i := u3 d7 b4 Example for Reaching Definitions Compute gen/kill and iterate (visiting order: b1, b2, b3, b4) b1 b2 b3 b4 initial in[B] 000 0000 000 0000 000 0000 000 0000 out[B] 000 0000 000 0000 000 0000 000 0000 pass1 in[B] 000 0000 000 0000 000 0000 000 0000 out[B] 000 0000 000 0000 000 0000 000 0000 pass2 in[B] 000 0000 000 0000 000 0000 000 0000 out[B] 000 0000 000 0000 000 0000 000 0000 pass3 in[B] 000 0000 000 0000 000 0000 000 0000 out[B] 000 0000 000 0000 000 0000 000 0000

  25. Generalizations: Other Data-Flow Analyses • Reaching definitions is a (forward; some-path) analysis • For backward analysis: • interchange in / out sets in the previous algorithm, lines (1-5) • For all-path analysis: • intersection is substituted for union in line (2)

  26. Common Subexpression Elimination • Rule used to eliminate subexpression within a basic block • The subexpression was already defined • The value of the subexpression is not modified • i.e. none of the values needed to compute the subexpression are redefined • What about eliminating subexpressions across basic blocks?

  27. Available Expressions • An expression x+y is available at a point p: • if every path from the initial node to p evaluates x+y, and • after the last such evaluation, prior to reaching p, there are no subsequent assignments to x or y. • Definitions: • forward, all-path, • e-gen[S]: expressions definitely generated by S, • e.g. “z := x+y”: expression “x+y” is generated • e-kill[S]: expressions that may be killed by S • e.g. “z := x+y”: all expression containing “z” are killed. • order: compute e-gen and then e-kill, e.g. “x:= x+y”

  28. Available Expressions (cont.) • Algorithm: for each basic block B: out[B] := e-gen[B]; (1) do change := false; for each basic block B do in[B] = Ç (P a predecessor of B) out[P]; (2) old-out = out[B]; (3) out[B] = e-gen[B] È (in[B] - e-kill[B]); (4) if (out[B] != old-out) then change := true; (5) end while change difference: line (2), use intersection instead of union

  29. Pointer Analysis • Identify the memory locations that may be addressed by a pointer • may be formalized as a system of data-flow equations. • Simple programming model: • pointer to integer (or float, arrays of integer, arrays of float) • no pointer to pointers allowed • Definitions: • in[S]: the set of pairs (p, a), where p is a pointer, a is a variables, and p might point to a before statement S. • out[S]: the set of pairs (p, a), where p might point to a after statement S. • gen[S]: the new pairs (p, a) generated by the statement S. • kill[S]: the pairs (p, a) killed by the statement S.

  30. S: a=b+c S: p = &a S: p = q Pointer Analysis (cont.) input set gen[S ] = { } kill[S ] = { } input set gen[S ] = { (p, a) } kill[S, input set ] = { (p, b) | (p, b) is in input set } input set gen[S, input set ] = { (p, b) | (q, b) is in input set } kill[S, input set ] = { (p, b) | (p, b) is in input set }

  31. Pointer Analysis (cont.) • Algorithm: for each basic block B: out[B] := gen[B ]; (1) do change := false; for each basic block B do in[B] =È (P a predecessor of B) out[P]; (2) old-out = out[B]; (3) out[B] = gen[B, in[B] ] È (in[B] - kill[B, in[B] ] ) (4) if (out[B] != old-out) then change := true; (5) end while change difference: line (4): gen and kill are functions of B and in[B].

  32. Performance of Iterative Solutions • Global analysis may be memory-space / computing intensive • May be reduced by • using bitvector representations for sets • analyzing only relevant variables • e.g. temporary variables may be ignored • synthesizing data-flow within basic block • mixing inductive and iterative solutions • suitably ordering the basic block • e.g. depth first order is good for forward analysis • limiting scope • may reduce the precision of analysis

  33. Summary • Iterative algorithm: • solve data-flow problem for arbitrary control flow graph • To solve a new data-flow problem: • define gen/kill accordingly • determine properties: • forward / backward • some-path / all-path

More Related