Data-Flow Analysis II




CS 671

March 13, 2008

Data-Flow Analysis
  • Gather conservative, approximate information about what a program does
  • Result: some property that holds every time the instruction executes
  • The Data-Flow Abstraction
  • Execution of an instruction transforms program state
  • To analyze a program, we must consider all possible sequences of program points (paths)
  • Summarize all possible program states with a finite set of facts
    • Limitation: may consider some infeasible paths
The General Approach
  • Setting up and solving systems of equations that relate information at various points in the program
  • such as out[S] = gen[S] ∪ ( in[S] - kill[S] ) where
      • S is a statement
      • in[S] and out[S] are information before and after S
      • gen[S] and kill[S] are information generated and killed by S
  • definition of in, out, gen, and kill depends on the desired information
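
As a minimal sketch of the equation above, the transfer function of a single statement can be written with Python sets. The fact names `d1`, `d2`, `d3` are hypothetical placeholders, not taken from the slides.

```python
def transfer(in_set, gen, kill):
    """out[S] = gen[S] ∪ (in[S] - kill[S]) for one statement S."""
    return gen | (in_set - kill)


# Hypothetical facts: S generates d3 and kills d1.
in_s = frozenset({"d1", "d2"})
out_s = transfer(in_s, gen=frozenset({"d3"}), kill=frozenset({"d1"}))
print(sorted(out_s))
```

What the facts mean (definitions, expressions, variables) depends on the analysis; only the shape of the equation stays the same.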
Data-Flow Analysis (cont.)
  • Properties:
    • either a forward analysis (out as function of in) or
    • a backward analysis (in as a function of out).
    • either an “along some path” problem or
    • an “along all paths” problem.
    • Data-flow analysis must be conservative
  • Definitions:
    • point: a location between two adjacent statements (or before the first statement, or after the last)
    • path: a sequence of consecutive points in the control-flow graph
Example – Live Variables
  • Steps:
    • Set up live sets for each program point
    • Instantiate equations
    • Solve equations

[Figure: control-flow graph of the example program, containing the statements if (c); x = y+1; y = 2*z; if (d); x = y+z; z = 1; z = x]

Example
  • Program points

[Figure: the same control-flow graph annotated with program points L1 through L12]

Example

[Figure: the same control-flow graph with its basic blocks numbered 1 through 7 and program points L1 through L12]

Example

in[I] = ( out[I] - def[I] ) ∪ use[I]

out[B] = ∪ (B’ a successor of B) in[B’]

  • Live sets to solve for: L1 = , L2 = , ... , L12 =
  • Initially every live set is empty: L1 = L2 = ... = L12 = { }

[Figure: the control-flow graph with basic blocks 1 through 7, each program point L1 through L12 labeled with its initial live set { }]
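
The per-instruction liveness equation can be sketched for a straight-line run of statements by walking backward from the end. The two statements come from the example; treating them as an isolated block and the choice of live-out set are assumptions made only for illustration.

```python
def live_before(instrs, live_out):
    """Walk a straight-line block backward.

    instrs is a list of (defs, uses) pairs; the result holds the live set
    immediately before each instruction, in program order.
    """
    live = set(live_out)
    result = []
    for defs, uses in reversed(instrs):
        live = (live - defs) | uses  # in[I] = (out[I] - def[I]) ∪ use[I]
        result.append(frozenset(live))
    result.reverse()
    return result


block = [
    ({"x"}, {"y"}),  # x = y+1
    ({"y"}, {"z"}),  # y = 2*z
]
# Assumption: x, y and z are all live after the block.
before = live_before(block, live_out={"x", "y", "z"})
for (defs, uses), live in zip(block, before):
    print(sorted(live))
```

Solving the full example additionally needs the out[B] equation at every branch, which depends on the edges of the control-flow graph.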

More Terminology
  • Successors
  • Succ(B1) =
  • Succ(B2) =
  • Succ(B3) =
  • Predecessors
  • Pred(B2) =
  • Pred(B3) =
  • Pred(B4) =

[Figure: flow graph with nodes B1, B2, B3, and B4]

  • Branch node – more than one successor
  • Join node – more than one predecessor
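
A small sketch of these terms, assuming a hypothetical diamond-shaped graph over B1 to B4 (the edges of the slide's figure are not reproduced here): predecessors are obtained by inverting the successor map, and branch and join nodes fall out of the list lengths.

```python
# Hypothetical edges: B1 branches to B2 and B3, which both flow into B4.
succ = {"B1": ["B2", "B3"], "B2": ["B4"], "B3": ["B4"], "B4": []}


def predecessors(succ):
    """Invert a successor map into a predecessor map."""
    pred = {n: [] for n in succ}
    for n, targets in succ.items():
        for t in targets:
            pred[t].append(n)
    return pred


pred = predecessors(succ)
branch_nodes = [n for n in succ if len(succ[n]) > 1]  # more than one successor
join_nodes = [n for n in pred if len(pred[n]) > 1]    # more than one predecessor
print(pred, branch_nodes, join_nodes)
```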
Dominators
  • Dominance is a binary relation on the flow graph nodes that allows us to easily find loops
  • Node d dominates node i (d dom i) if every possible execution path from entry to i includes d
  • Dominance is:
    • Reflexive – every node dominates itself
    • Transitive – if a dom b and b dom c, then a dom c
    • Antisymmetric – if a dom b and b dom a then a=b
  • dom(entry) =
  • dom(b1) =
  • dom(b2) =
  • dom(b3) =
  • dom(b4) =
  • dom(b5) =
  • dom(b6) =
  • dom(exit) =

[Figure: flow graph with nodes entry, B1, B2, B3, B4, B5, B6, and exit]
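
Dominator sets can themselves be computed as a forward, all-paths data-flow problem: dom(n) = {n} ∪ the intersection of dom(p) over all predecessors p. The following is a minimal sketch on a hypothetical graph (not the slide's figure), assuming every node is reachable from entry.

```python
def dominators(pred, entry):
    """Iteratively compute dom(n) for every node of a flow graph."""
    nodes = set(pred)
    dom = {n: set(nodes) for n in nodes}  # start from "everything dominates n"
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            new = {n} | set.intersection(*(dom[p] for p in pred[n]))
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


# Hypothetical graph: entry -> B1, B1 -> B2, B1 -> B3, B2 -> B4, B3 -> B4,
# B4 -> B1 (loop), B4 -> exit.
pred = {
    "entry": [],
    "B1": ["entry", "B4"],
    "B2": ["B1"],
    "B3": ["B1"],
    "B4": ["B2", "B3"],
    "exit": ["B4"],
}
dom = dominators(pred, "entry")
for n in pred:
    print(n, sorted(dom[n]))
```

In this graph neither B2 nor B3 dominates B4, because B4 can be reached through either one.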

Immediate dominators
  • idom(b) = a iff (a ≠ b) and (a dom b) and there does not exist a node c, different from a and b, such that (a dom c) and (c dom b)
  • Idom of a node is unique
  • Idom relationship forms a tree whose root is the entry node
  • idom(b1) =
  • idom(b2) =
  • idom(b3) =
  • idom(b4) =
  • idom(b5) =
  • idom(b6) =
  • idom(exit) =

[Figure: flow graph with nodes entry, B1, B2, B3, B4, B5, B6, and exit]
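
Given full dominator sets, the immediate dominator can be picked out directly: the strict dominators of a node form a chain, so the closest one is the one with the largest dominator set of its own. This is a sketch under that observation; the dominator sets below belong to a hypothetical graph and are written out by hand.

```python
def immediate_dominators(dom, entry):
    """Derive idom(n) from complete dominator sets."""
    idom = {}
    for n, ds in dom.items():
        if n == entry:
            continue  # the entry node has no immediate dominator
        strict = ds - {n}
        # The closest strict dominator is dominated by all the others,
        # so its own dominator set is the largest.
        idom[n] = max(strict, key=lambda d: len(dom[d]))
    return idom


# Hand-written dominator sets for a hypothetical graph (assumption).
dom = {
    "entry": {"entry"},
    "B1": {"entry", "B1"},
    "B2": {"entry", "B1", "B2"},
    "B3": {"entry", "B1", "B3"},
    "B4": {"entry", "B1", "B4"},
    "exit": {"entry", "B1", "B4", "exit"},
}
idom = immediate_dominators(dom, "entry")
print(idom)
```

Reading `idom` as parent pointers gives the dominator tree rooted at the entry node.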
Strict Dominators and Postdominators
  • (d sdom i) if d dominates i and d ≠ i
  • (p pdom i) if every possible execution path from i to exit includes p

  • pdom(entry) =
  • pdom(b1) =
  • pdom(b2) =
  • pdom(b3) =
  • pdom(b4) =
  • pdom(b5) =
  • pdom(b6) =

[Figure: flow graph with nodes entry, B1, B2, B3, B4, B5, B6, and exit]
Loops
  • Back edge – edge whose head dominates its tail
  • Loop containing this type of back edge is a natural loop
    • i.e. it has a single external entry point
  • For back edge b → c the loop header is c

  • Natural loops =
  • Loop header (B3 → B1) =
  • Loop header (B2 → B2) =

[Figure: flow graph with nodes entry, B1, B2, B3, and exit]
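
A sketch of back-edge detection and natural-loop construction follows. Only the two back edges B3 → B1 and B2 → B2 come from the slide; the remaining edges and the hand-written dominator sets are assumptions chosen to be consistent with them.

```python
# Assumed edges: entry -> B1 -> B2 -> B3 -> exit, plus B2 -> B2 and B3 -> B1.
succ = {
    "entry": ["B1"],
    "B1": ["B2"],
    "B2": ["B2", "B3"],
    "B3": ["B1", "exit"],
    "exit": [],
}
pred = {n: [] for n in succ}
for n, targets in succ.items():
    for t in targets:
        pred[t].append(n)

# Dominator sets for this graph, written out by hand (assumption).
dom = {
    "entry": {"entry"},
    "B1": {"entry", "B1"},
    "B2": {"entry", "B1", "B2"},
    "B3": {"entry", "B1", "B2", "B3"},
    "exit": {"entry", "B1", "B2", "B3", "exit"},
}

# An edge tail -> head is a back edge when head dominates tail.
back_edges = [(t, h) for t in succ for h in succ[t] if h in dom[t]]


def natural_loop(pred, tail, header):
    """Header plus every node that reaches tail without passing through header."""
    loop = {header}
    stack = [tail]
    while stack:
        n = stack.pop()
        if n not in loop:
            loop.add(n)
            stack.extend(pred[n])
    return loop


loops = {edge: natural_loop(pred, *edge) for edge in back_edges}
print(back_edges, loops)
```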

Quicksort Example
  • How might we optimize this code?

b1:
    i := m-1
    j := n
    t1 := 4*n
    v := a[t1]

b2:
    i := i+1
    t2 := 4*i
    t3 := a[t2]
    if t3 < v goto b2

b3:
    j := j-1
    t4 := 4*j
    t5 := a[t4]
    if t5 > v goto b3

b4:
    if i >= j goto b6

b5:
    t6 := 4*i
    x := a[t6]
    t7 := 4*i
    t8 := 4*j
    t9 := a[t8]
    a[t7] := t9
    t10 := 4*j
    a[t10] := x

b6:
    t11 := 4*i
    x := a[t11]
    t12 := 4*i
    t13 := 4*n
    t14 := a[t13]
    a[t12] := t14
    t15 := 4*n
    a[t15] := x

(Quicksort fragment; the variables i, j, v, and x are needed outside this code.)

Reaching Definitions
  • Informally:
    • determine if a particular definition (e.g. “x” in “x = 5”) may reach a given point in the program
  • Why reaching definitions may be useful:

x := 5

y := x + 2

If “x := 5” is the only definition reaching “y := x+2”, the latter can be simplified to “y := 7” (constant propagation).

Reaching Definitions
  • Definition of a variable X:
    • is a statement that assigns (or may assign) a value to X
    • unambiguous: X := 3
    • ambiguous: foo(X) or *Y := 3
  • A definition d reaches a point p :
    • if there is a path from the point immediately following d to p,
    • such that d is not killed along that path.
  • A definition d of variable X is killed along path p
    • if there is another definition of X along p.
Reaching Definitions (cont.)
  • Has the following properties:
    • forward analysis
    • “along some path” problem
  • Is conservative in that:
    • definition d may not define variable X
    • along a path p, there is another definition of X, but this other definition is ambiguous
    • definition d may be killed along infeasible paths

Data-Flow Analysis: Structured Programs
  • Most programs are structured:
    • sequence of statements
    • if-then-else construct
    • while-loops (including for-loops, loops with breaks,...)
  • For these programs, we may use an inductive (syntax-driven) approach: the data-flow information for a construct is computed from that of its component statements

Reaching Definitions for Structured Programs

Single assignment: S is d: a = b+c

    gen[S] = {d}
    kill[S] = All-defs-of-a - {d}
    out[S] = gen[S] ∪ ( in[S] - kill[S] )

Sequence: S is S1 followed by S2

    gen[S] = gen[S2] ∪ ( gen[S1] - kill[S2] )
    kill[S] = kill[S2] ∪ ( kill[S1] - gen[S2] )
    in[S1] = in[S]
    in[S2] = out[S1]
    out[S] = out[S2]

Reaching Definitions for Structured Programs (cont.)

If-then-else: S is a choice between S1 and S2

    gen[S] = gen[S1] ∪ gen[S2]
    kill[S] = kill[S1] ∩ kill[S2]
    in[S1] = in[S2] = in[S]
    out[S] = out[S1] ∪ out[S2]

Loop: S is a loop with body S1

    gen[S] = gen[S1]
    kill[S] = kill[S1]
    in[S1] = in[S] ∪ gen[S1]
    out[S] = out[S1]
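
The inductive gen/kill rules for structured programs can be sketched as a recursive function over a small syntax tree. The tuple encoding (`"def"`, `"seq"`, `"if"`, `"loop"`) and the example program are hypothetical; only the set equations come from the slides.

```python
def gen_kill(stmt, defs_of):
    """Return (gen, kill) for a structured statement.

    stmt is ("def", d, var), ("seq", s1, s2), ("if", s1, s2) or ("loop", s1);
    defs_of maps each variable to the set of all its definitions.
    """
    kind = stmt[0]
    if kind == "def":
        _, d, var = stmt
        return {d}, defs_of[var] - {d}
    if kind == "seq":
        g1, k1 = gen_kill(stmt[1], defs_of)
        g2, k2 = gen_kill(stmt[2], defs_of)
        return g2 | (g1 - k2), k2 | (k1 - g2)
    if kind == "if":
        g1, k1 = gen_kill(stmt[1], defs_of)
        g2, k2 = gen_kill(stmt[2], defs_of)
        return g1 | g2, k1 & k2
    if kind == "loop":
        return gen_kill(stmt[1], defs_of)
    raise ValueError(f"unknown statement kind: {kind}")


# Hypothetical program: d1 defines a; then either d2 defines a or d3 defines b.
defs_of = {"a": {"d1", "d2"}, "b": {"d3"}}
program = ("seq", ("def", "d1", "a"), ("if", ("def", "d2", "a"), ("def", "d3", "b")))
gen, kill = gen_kill(program, defs_of)
print(sorted(gen), sorted(kill))
```

Here d1 survives the whole program because only one branch of the if redefines a, so the if kills nothing.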

Iterative Solution: Data-Flow Equations
  • The inductive approach is only applicable to structured programs
    • because it uses the structure of the program to synthesize and distribute the data-flow information
  • Need a general technique: Iterative Approach
    • compute the gen/kill sets of each statement / basic block
    • initialize the in/out sets
    • repeatedly recompute the out/in sets until a steady state (fixed point) is reached
Reaching Definitions
  • Reaching definitions:
    • set of definitions that may reach (along one or more paths) a given point
    • gen[S]: definition d is in gen[S] if d may reach the end of S, independently of whether it reaches the beginning of S.
    • kill[S]: the set of definitions that never reach the end of S, even if they reach the beginning.
  • Equations:
    • in[S] = ∪ (P a predecessor of S) out[P]
    • out[S] = gen[S] ∪ ( in[S] - kill[S] )
Reaching Definitions (cont.)
  • Algorithm:

for each basic block B: out[B] := gen[B];                        (1)
do
    change := false;
    for each basic block B do
        in[B] := ∪ (P a predecessor of B) out[P];                (2)
        old-out := out[B];                                       (3)
        out[B] := gen[B] ∪ (in[B] - kill[B]);                    (4)
        if (out[B] != old-out) then change := true;              (5)
    end
while change
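
The algorithm above translates almost line for line into Python sets; the numbered comments refer to the numbered lines of the pseudocode. The three-block graph and its gen/kill sets are hypothetical and serve only to exercise the loop.

```python
def reaching_definitions(blocks, pred, gen, kill):
    """Iterate in/out sets to a fixed point (forward, some-path analysis)."""
    out = {b: set(gen[b]) for b in blocks}                    # (1)
    in_ = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            in_[b] = set().union(*(out[p] for p in pred[b]))  # (2)
            old_out = out[b]                                  # (3)
            out[b] = gen[b] | (in_[b] - kill[b])              # (4)
            if out[b] != old_out:                             # (5)
                changed = True
    return in_, out


# Hypothetical graph: B1 -> B2, B2 -> B2 (self loop), B2 -> B3.
blocks = ["B1", "B2", "B3"]
pred = {"B1": [], "B2": ["B1", "B2"], "B3": ["B2"]}
gen = {"B1": {"d1"}, "B2": {"d2"}, "B3": set()}
kill = {"B1": {"d2"}, "B2": {"d1"}, "B3": set()}
in_sets, out_sets = reaching_definitions(blocks, pred, gen, kill)
print(in_sets, out_sets)
```

Both d1 and d2 reach the top of B2 (one from B1, one around the loop), but only d2 leaves it.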

Example for Reaching Definitions

Definitions by basic block:

    b1:   d1: i := m-1     d2: j := n     d3: a := u1
    b2:   d4: i := i+1     d5: j := j-1
    b3:   d6: a := u2
    b4:   d7: i := u3

    gen[b1] := {d1, d2, d3}     kill[b1] := {d4, d5, d6, d7}
    gen[b2] := {d4, d5}         kill[b2] := {d1, d2, d7}
    gen[b3] := {d6}             kill[b3] := {d3}
    gen[b4] := {d7}             kill[b4] := {d1, d4}

Compute gen/kill and iterate (visiting order: b1, b2, b3, b4). Each table entry is a bit vector over d1 ... d7; the table is a blank worksheet to be filled in pass by pass.

                      b1          b2          b3          b4
    initial  in[B]    000 0000    000 0000    000 0000    000 0000
             out[B]   000 0000    000 0000    000 0000    000 0000
    pass1    in[B]    000 0000    000 0000    000 0000    000 0000
             out[B]   000 0000    000 0000    000 0000    000 0000
    pass2    in[B]    000 0000    000 0000    000 0000    000 0000
             out[B]   000 0000    000 0000    000 0000    000 0000
    pass3    in[B]    000 0000    000 0000    000 0000    000 0000
             out[B]   000 0000    000 0000    000 0000    000 0000
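
The worksheet can be checked mechanically with a bit-vector version of the iterative algorithm. The gen and kill sets below follow from the definitions d1 through d7 by the usual rule (a block kills every other definition of the variables it defines). The control-flow edges are an assumption, since the figure's edges are not reproduced in the text: b1 → b2, b2 → b3, b2 → b4, b3 → b4, b4 → b2.

```python
DEFS = ["d1", "d2", "d3", "d4", "d5", "d6", "d7"]


def bits(*names):
    """Pack definition names into an integer bit vector."""
    v = 0
    for n in names:
        v |= 1 << DEFS.index(n)
    return v


def show(v):
    """Format a bit vector as on the worksheet, d1 leftmost."""
    s = "".join("1" if (v >> i) & 1 else "0" for i in range(len(DEFS)))
    return s[:3] + " " + s[3:]


def solve(order, pred, gen, kill):
    out = dict(gen)
    in_ = {b: 0 for b in order}
    changed = True
    while changed:
        changed = False
        for b in order:
            acc = 0
            for p in pred[b]:
                acc |= out[p]                       # union is bitwise OR
            in_[b] = acc
            new_out = gen[b] | (acc & ~kill[b])     # difference is AND NOT
            if new_out != out[b]:
                out[b] = new_out
                changed = True
    return in_, out


order = ["b1", "b2", "b3", "b4"]
pred = {"b1": [], "b2": ["b1", "b4"], "b3": ["b2"], "b4": ["b2", "b3"]}  # assumed edges
gen = {"b1": bits("d1", "d2", "d3"), "b2": bits("d4", "d5"),
       "b3": bits("d6"), "b4": bits("d7")}
kill = {"b1": bits("d4", "d5", "d6", "d7"), "b2": bits("d1", "d2", "d7"),
        "b3": bits("d3"), "b4": bits("d1", "d4")}
in_sets, out_sets = solve(order, pred, gen, kill)
for b in order:
    print(b, show(in_sets[b]), show(out_sets[b]))
```

With different edges the final vectors would differ, so the printed values are only meaningful for the assumed graph.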

Generalizations: Other Data-Flow Analyses
  • Reaching definitions is a (forward; some-path) analysis
  • For backward analysis:
    • interchange the in / out sets in the previous algorithm, lines (1-5), and take successors instead of predecessors in line (2)
  • For all-path analysis:
    • intersection is substituted for union in line (2)
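
These two switches can be sketched as parameters of one generic solver. The function name, its parameters, and the three-statement example are hypothetical; the example runs the solver backward with use as gen and def as kill, which gives live variables.

```python
def solve(nodes, edges, gen, kill, forward=True, some_path=True):
    """Generic gen/kill solver.

    Returns (inputs, outputs): inputs[n] is the set fed into n's transfer
    function (in[n] when forward, out[n] when backward) and outputs[n] is
    the result (out[n] when forward, in[n] when backward).
    """
    feed = {n: [] for n in nodes}
    for a, b in edges:
        if forward:
            feed[b].append(a)   # predecessors feed a node
        else:
            feed[a].append(b)   # successors feed a node
    outputs = {n: set(gen[n]) for n in nodes}
    inputs = {n: set() for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            incoming = [outputs[m] for m in feed[n]]
            if not incoming:
                inputs[n] = set()
            elif some_path:
                inputs[n] = set().union(*incoming)
            else:
                inputs[n] = set.intersection(*incoming)
            new = gen[n] | (inputs[n] - kill[n])
            if new != outputs[n]:
                outputs[n] = new
                changed = True
    return inputs, outputs


# Live variables on a straight line: n1: x = y+1, n2: y = 2*z, n3: z = x.
nodes = ["n1", "n2", "n3"]
edges = [("n1", "n2"), ("n2", "n3")]
use = {"n1": {"y"}, "n2": {"z"}, "n3": {"x"}}
defs = {"n1": {"x"}, "n2": {"y"}, "n3": {"z"}}
live_out, live_in = solve(nodes, edges, use, defs, forward=False)
print(live_in, live_out)
```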
Common Subexpression Elimination
  • Rule used to eliminate subexpression within a basic block
    • The subexpression was already defined
    • The value of the subexpression is not modified
      • i.e. none of the values needed to compute the subexpression are redefined
  • What about eliminating subexpressions across basic blocks?
Available Expressions
  • An expression x+y is available at a point p:
    • if every path from the initial node to p evaluates x+y, and
    • after the last such evaluation, prior to reaching p, there are no subsequent assignments to x or y.
  • Definitions:
    • forward, all-path,
    • e-gen[S]: expressions definitely generated by S,
      • e.g. “z := x+y”: expression “x+y” is generated
    • e-kill[S]: expressions that may be killed by S
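      • e.g. “z := x+y”: all expressions containing “z” are killed.
    • order: apply e-gen first and then e-kill, e.g. for “x := x+y” the expression “x+y” is generated and then killed by the assignment to x, so it is not available afterward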
Available Expressions (cont.)
  • Algorithm:

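for each basic block B: out[B] := e-gen[B];                      (1)
do
    change := false;
    for each basic block B do
        in[B] := ∩ (P a predecessor of B) out[P];                (2)
        old-out := out[B];                                       (3)
        out[B] := e-gen[B] ∪ (in[B] - e-kill[B]);                (4)
        if (out[B] != old-out) then change := true;              (5)
    end
while change

difference: line (2), use intersection instead of union. The entry block has no predecessors, so its in set is taken to be empty. Initializing out[B] to e-gen[B] is safe, but initializing every non-entry block to the set of all expressions minus e-kill[B] generally finds more available expressions in programs with loops.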
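
A minimal sketch of this algorithm in Python, following the slide's initialization and treating a block without predecessors as having an empty in set. The diamond-shaped graph and its e-gen/e-kill sets are hypothetical.

```python
def available_expressions(blocks, pred, e_gen, e_kill):
    """Forward, all-path analysis: meet is intersection over predecessors."""
    out = {b: set(e_gen[b]) for b in blocks}                      # (1)
    in_ = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            outs = [out[p] for p in pred[b]]
            # No predecessors (entry block): nothing is available on entry.
            in_[b] = set.intersection(*outs) if outs else set()   # (2)
            new_out = e_gen[b] | (in_[b] - e_kill[b])             # (4)
            if new_out != out[b]:                                 # (3), (5)
                out[b] = new_out
                changed = True
    return in_, out


# Hypothetical diamond: B1 -> B2, B1 -> B3, B2 -> B4, B3 -> B4.
blocks = ["B1", "B2", "B3", "B4"]
pred = {"B1": [], "B2": ["B1"], "B3": ["B1"], "B4": ["B2", "B3"]}
e_gen = {"B1": {"x+y"}, "B2": {"a*b"}, "B3": {"a*b"}, "B4": set()}
e_kill = {"B1": set(), "B2": {"x+y"}, "B3": set(), "B4": set()}
in_sets, out_sets = available_expressions(blocks, pred, e_gen, e_kill)
print(in_sets["B4"])
```

At B4 only a*b is available: it is computed on both paths, while x+y is killed on the path through B2.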

Pointer Analysis
  • Identify the memory locations that may be addressed by a pointer
    • may be formalized as a system of data-flow equations.
  • Simple programming model:
    • pointer to integer (or float, arrays of integer, arrays of float)
    • no pointer to pointers allowed
  • Definitions:
    • in[S]: the set of pairs (p, a), where p is a pointer, a is a variable, and p might point to a before statement S.
    • out[S]: the set of pairs (p, a), where p might point to a after statement S.
    • gen[S]: the new pairs (p, a) generated by the statement S.
    • kill[S]: the pairs (p, a) killed by the statement S.
Pointer Analysis (cont.)

In each case the input set is in[S].

S: a = b+c

    gen[S] = { }
    kill[S] = { }

S: p = &a

    gen[S] = { (p, a) }
    kill[S, input set] = { (p, b) | (p, b) is in input set }

S: p = q

    gen[S, input set] = { (p, b) | (q, b) is in input set }
    kill[S, input set] = { (p, b) | (p, b) is in input set }

Pointer Analysis (cont.)
  • Algorithm:

for each basic block B: out[B] := gen[B];                        (1)
do
    change := false;
    for each basic block B do
        in[B] := ∪ (P a predecessor of B) out[P];                (2)
        old-out := out[B];                                       (3)
        out[B] := gen[B, in[B]] ∪ (in[B] - kill[B, in[B]]);      (4)
        if (out[B] != old-out) then change := true;              (5)
    end
while change

difference: line (4): gen and kill are functions of B and in[B].
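
A sketch of the pointer analysis with input-dependent gen and kill follows. It assumes one statement per basic block and a hypothetical tuple encoding of statements (`"addr"` for p = &a, `"copy"` for p = q, `"other"` for anything that assigns no pointer); the four-block program is also made up for illustration.

```python
def transfer(stmt, in_set):
    """out = gen[S, in] ∪ (in - kill[S, in]) for one statement."""
    kind = stmt[0]
    if kind == "addr":        # p = &a
        _, p, a = stmt
        gen = {(p, a)}
        kill = {(x, b) for (x, b) in in_set if x == p}
    elif kind == "copy":      # p = q
        _, p, q = stmt
        gen = {(p, b) for (x, b) in in_set if x == q}
        kill = {(x, b) for (x, b) in in_set if x == p}
    else:                     # no pointer is assigned
        gen, kill = set(), set()
    return gen | (in_set - kill)


def pointer_analysis(blocks, pred, stmt):
    out = {b: transfer(stmt[b], set()) for b in blocks}       # (1)
    in_ = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            in_[b] = set().union(*(out[p] for p in pred[b]))  # (2)
            new_out = transfer(stmt[b], in_[b])               # (4)
            if new_out != out[b]:                             # (3), (5)
                out[b] = new_out
                changed = True
    return in_, out


# Hypothetical program: B1: p = &a; then B2: p = &b or B3: a = b+c; then B4: q = p.
blocks = ["B1", "B2", "B3", "B4"]
pred = {"B1": [], "B2": ["B1"], "B3": ["B1"], "B4": ["B2", "B3"]}
stmt = {"B1": ("addr", "p", "a"), "B2": ("addr", "p", "b"),
        "B3": ("other",), "B4": ("copy", "q", "p")}
in_sets, out_sets = pointer_analysis(blocks, pred, stmt)
print(sorted(out_sets["B4"]))
```

After B4, q may point to either a or b, because p may point to either one at the join.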

Performance of Iterative Solutions
  • Global analysis may be memory-space / computing intensive
  • May be reduced by
    • using bitvector representations for sets
    • analyzing only relevant variables
      • e.g. temporary variables may be ignored
    • synthesizing data-flow information within each basic block
    • mixing inductive and iterative solutions
    • suitably ordering the basic blocks
      • e.g. depth first order is good for forward analysis
    • limiting scope
      • may reduce the precision of analysis
Summary
  • Iterative algorithm:
    • solve data-flow problem for arbitrary control flow graph
  • To solve a new data-flow problem:
    • define gen/kill accordingly
    • determine properties:
      • forward / backward
      • some-path / all-path