Data-Flow Analysis II


# Data-Flow Analysis II - PowerPoint PPT Presentation

Data-Flow Analysis II. CS 671 March 13, 2008. Data-Flow Analysis. Gather conservative, approximate information about what a program does Result: some property that holds every time the instruction executes The Data-Flow Abstraction Execution of an instruction transforms program state

Presentation Transcript

### Data-Flow Analysis II

CS 671

March 13, 2008

Data-Flow Analysis
• Gather conservative, approximate information about what a program does
• Result: some property that holds every time the instruction executes
• The Data-Flow Abstraction
• Execution of an instruction transforms program state
• To analyze a program, we must consider all possible sequences of program points (paths)
• Summarize all possible program states with a finite set of facts
• Limitation: may consider some infeasible paths
The General Approach
• Setting up and solving systems of equations that relate information at various points in the program
• such as out[S] = gen[S] ∪ (in[S] - kill[S]), where
• S is a statement
• in[S] and out[S] are information before and after S
• gen[S] and kill[S] are information generated and killed by S
• definition of in, out, gen, and kill depends on the desired information
Data-Flow Analysis (cont.)
• Properties:
• either a forward analysis (out as function of in) or
• a backward analysis (in as a function of out).
• either an “along some path” problem or
• an “along all paths” problem.
• Data-flow analysis must be conservative
• Definitions:
• A point lies between two adjacent statements (or before the first statement or after the last)
• A path is a sequence of consecutive points in the control-flow graph
Example – Live Variables
• Steps:
• Set up live sets for each program point
• Instantiate equations
• Solve equations

if (c)

x = y+1

y = 2*z

if (d)

x = y+z

z = 1

z = x

Example
• Program points

[Figure: the program as a flow graph, with program points L1-L12 marked between its statements]

Example

[Figure: the same flow graph with its seven statements numbered 1-7 and program points L1-L12]

in[I] = (out[I] - def[I]) ∪ use[I]

out[B] = ∪ in[B′], over all B′ ∈ succ(B)

Example

L1 =

L2 =

L3 =

L4 =

L5 =

L6 =

L7 =

L8 =

L9 =

L10 =

L11 =

L12 =

[Figure: the same flow graph with every set L1-L12 initialized to { }]
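The equations above can be solved with a small fixed-point loop. This is a minimal sketch, assuming the seven statements are numbered 1 to 7 in program order and that the flow graph follows the program's nesting: the false branch of if (c) goes straight to z = x, and both branches of if (d) reach z = 1. The names USE_DEF, SUCC, and live_variables are illustrative only.

```python
# Live-variable analysis for the seven-statement example (a sketch).
# Assumed numbering: 1 if (c), 2 x = y+1, 3 y = 2*z, 4 if (d),
# 5 x = y+z, 6 z = 1, 7 z = x.

# (use, def) sets for each statement
USE_DEF = {
    1: ({"c"}, set()),
    2: ({"y"}, {"x"}),
    3: ({"z"}, {"y"}),
    4: ({"d"}, set()),
    5: ({"y", "z"}, {"x"}),
    6: (set(), {"z"}),
    7: ({"x"}, {"z"}),
}

# Assumed successor lists; statement 7 flows to the exit.
SUCC = {1: [2, 7], 2: [3], 3: [4], 4: [5, 6], 5: [6], 6: [7], 7: []}


def live_variables(use_def, succ):
    """Solve in[I] = (out[I] - def[I]) | use[I], out[B] = union of in over succ(B)."""
    live_in = {n: set() for n in use_def}   # every set starts empty
    live_out = {n: set() for n in use_def}
    changed = True
    while changed:
        changed = False
        # Visiting statements back to front suits a backward analysis.
        for n in sorted(use_def, reverse=True):
            use, defs = use_def[n]
            out = set().union(*(live_in[s] for s in succ[n]))
            new_in = (out - defs) | use
            if out != live_out[n] or new_in != live_in[n]:
                live_out[n], live_in[n] = out, new_in
                changed = True
    return live_in, live_out


live_in, live_out = live_variables(USE_DEF, SUCC)
for n in sorted(USE_DEF):
    print(n, "in:", sorted(live_in[n]), "out:", sorted(live_out[n]))
```

Under these assumptions x is live on entry: when c is false, z = x reads x before anything assigns it.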

More Terminology
• Successors
• Succ(B1) =
• Succ(B2) =
• Succ(B3) =
• Predecessors
• Pred(B2) =
• Pred(B3) =
• Pred(B4) =

[Figure: flow graph with blocks B1-B4]

• Branch node – more than one successor
• Join node – more than one predecessor
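Predecessor lists and the branch/join classification follow mechanically from the successor lists. The sketch below uses a hypothetical diamond-shaped graph, since the slide's own edges did not survive extraction; SUCC and predecessors are illustrative names.

```python
# Hypothetical four-block flow graph (a diamond): B1 branches to B2 and B3,
# which join again at B4.
SUCC = {"B1": ["B2", "B3"], "B2": ["B4"], "B3": ["B4"], "B4": []}


def predecessors(succ):
    """Invert a successor map: P is in Pred(B) exactly when B is in Succ(P)."""
    pred = {b: [] for b in succ}
    for b, targets in succ.items():
        for t in targets:
            pred[t].append(b)
    return pred


PRED = predecessors(SUCC)
branch_nodes = [b for b in SUCC if len(SUCC[b]) > 1]   # more than one successor
join_nodes = [b for b in PRED if len(PRED[b]) > 1]     # more than one predecessor
print("Pred:", PRED)
print("branch nodes:", branch_nodes, "join nodes:", join_nodes)
```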
Dominators
• Dominance is a binary relation on the flow graph nodes that allows us to easily find loops
• Node d dominates node i (d dom i) if every possible execution path from entry to i includes d
• Dominance is:
• Reflexive – every node dominates itself
• Transitive – if a dom b and b dom c, then a dom c
• Antisymmetric – if a dom b and b dom a then a=b
• dom(entry) =
• dom(b1) =
• dom(b2) =
• dom(b3) =
• dom(b4) =
• dom(b5) =
• dom(b6) =
• dom(exit) =

[Figure: flow graph with nodes entry, B1-B6, and exit]
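Dominator sets can be computed with the same iterative machinery used for data-flow problems: Dom(entry) = {entry} and, for every other node n, Dom(n) = {n} ∪ (intersection of Dom(p) over the predecessors p of n), starting from "all nodes" and shrinking to a fixed point (a forward, all-paths problem). This is a sketch on a hypothetical graph, since the slide's edges were lost in extraction; it assumes every node is reachable from entry.

```python
# Hypothetical flow graph; B5 -> B4 closes a loop.
SUCC = {
    "entry": ["B1"],
    "B1": ["B2", "B3"],
    "B2": ["B4"],
    "B3": ["B4"],
    "B4": ["B5", "B6"],
    "B5": ["B4"],
    "B6": ["exit"],
    "exit": [],
}


def dominators(succ, entry):
    """Iterate Dom(n) = {n} | intersection of Dom(p) over predecessors p."""
    nodes = list(succ)
    pred = {n: [] for n in nodes}
    for n, targets in succ.items():
        for t in targets:
            pred[t].append(n)
    dom = {n: set(nodes) for n in nodes}  # optimistic start: every node
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n == entry:
                continue
            new = set(nodes)
            for p in pred[n]:
                new &= dom[p]
            new.add(n)
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


DOM = dominators(SUCC, "entry")
for n in SUCC:
    print(f"dom({n}) =", sorted(DOM[n]))
```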

Immediate dominators
• idom(b) = a iff a ≠ b, a dom b, and there is no node c, distinct from a and b, such that a dom c and c dom b
• Idom of a node is unique
• Idom relationship forms a tree whose root is the entry node
• idom(b1) =
• idom(b2) =
• idom(b3) =
• idom(b4) =
• idom(b5) =
• idom(b6) =
• idom(exit) =

[Flow graph with nodes entry, B1-B6, and exit]
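Because the strict dominators of a node form a chain, idom(b) is the strict dominator of b that every other strict dominator of b also dominates. The sketch below reads immediate dominators off precomputed dominator sets; those sets belong to a hypothetical graph (entry → B1 → {B2, B3} → B4 → {B5, B6}, B5 → B4, B6 → exit), not to the slide's figure.

```python
# Dominator sets of a hypothetical flow graph.
DOM = {
    "entry": {"entry"},
    "B1": {"entry", "B1"},
    "B2": {"entry", "B1", "B2"},
    "B3": {"entry", "B1", "B3"},
    "B4": {"entry", "B1", "B4"},
    "B5": {"entry", "B1", "B4", "B5"},
    "B6": {"entry", "B1", "B4", "B6"},
    "exit": {"entry", "B1", "B4", "B6", "exit"},
}


def immediate_dominators(dom):
    """Map each node except the root to its immediate dominator."""
    idom = {}
    for b, doms in dom.items():
        strict = doms - {b}
        for a in strict:
            # a is idom(b) when every strict dominator c of b also dominates a.
            if all(c in dom[a] for c in strict):
                idom[b] = a
                break
    return idom


IDOM = immediate_dominators(DOM)
for b, a in IDOM.items():
    print(f"idom({b}) = {a}")
```

Reading each idom(b) = a as a tree edge from a to b gives the dominator tree rooted at entry.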
Strict Dominators and Postdominators
• (d sdom i) if d dominates i and d ≠ i
• (p pdom i) if every possible execution path from i to exit includes p


• pdom(entry) =
• pdom(b1) =
• pdom(b2) =
• pdom(b3) =
• pdom(b4) =
• pdom(b5) =
• pdom(b6) =

[Flow graph with nodes entry, B1-B6, and exit]
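Postdominance is dominance on the reversed flow graph, so the dominator iteration carries over with successors in place of predecessors and exit in place of entry. A sketch on a hypothetical graph (the slide's edges were lost), assuming every node can reach exit:

```python
# Hypothetical flow graph; B5 -> B4 closes a loop.
SUCC = {
    "entry": ["B1"],
    "B1": ["B2", "B3"],
    "B2": ["B4"],
    "B3": ["B4"],
    "B4": ["B5", "B6"],
    "B5": ["B4"],
    "B6": ["exit"],
    "exit": [],
}


def postdominators(succ, exit_node):
    """Iterate pdom(n) = {n} | intersection of pdom(s) over successors s."""
    nodes = list(succ)
    pdom = {n: set(nodes) for n in nodes}  # optimistic start: every node
    pdom[exit_node] = {exit_node}
    changed = True
    while changed:
        changed = False
        for n in reversed(nodes):   # roughly exit-to-entry order
            if n == exit_node:
                continue
            new = set(nodes)
            for s in succ[n]:
                new &= pdom[s]
            new.add(n)
            if new != pdom[n]:
                pdom[n] = new
                changed = True
    return pdom


PDOM = postdominators(SUCC, "exit")
for n in SUCC:
    print(f"pdom({n}) =", sorted(PDOM[n]))
```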
Loops
• Back edge – edge whose head dominates its tail
• The loop formed by such a back edge is a natural loop
• i.e. it has a single external entry point (its header)
• For a back edge b → c, the loop header is c

• Natural loops =
• Loop header (B3 → B1) =
• Loop header (B2 → B2) =

[Flow graph with nodes entry, B1, B2, B3, and exit]
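Back edges and natural loops can be found once dominators are known: an edge t → h is a back edge when h dom t, and its natural loop is h plus every node that reaches t without passing through h. The slide names only the back edges B3 → B1 and B2 → B2, so the other edges below (entry → B1, B1 → B2, B2 → B3, B3 → exit) are an assumption; the helper names are illustrative.

```python
# Assumed edges: the slide names only the back edges B3 -> B1 and B2 -> B2.
SUCC = {
    "entry": ["B1"],
    "B1": ["B2"],
    "B2": ["B2", "B3"],
    "B3": ["B1", "exit"],
    "exit": [],
}


def predecessors(succ):
    pred = {n: [] for n in succ}
    for n, targets in succ.items():
        for t in targets:
            pred[t].append(n)
    return pred


def dominators(succ, pred, entry):
    """Iterative Dom(n) = {n} | intersection of Dom(p) over predecessors p."""
    nodes = list(succ)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n == entry:
                continue
            new = set(nodes)
            for p in pred[n]:
                new &= dom[p]
            new.add(n)
            if new != dom[n]:
                dom[n], changed = new, True
    return dom


def natural_loop(tail, header, pred):
    """Header plus every node that reaches tail without passing through header."""
    body = {header, tail}
    stack = [tail] if tail != header else []
    while stack:
        n = stack.pop()
        for p in pred[n]:
            if p not in body:
                body.add(p)
                stack.append(p)
    return body


PRED = predecessors(SUCC)
DOM = dominators(SUCC, PRED, "entry")
# t -> h is a back edge when the head h dominates the tail t.
back_edges = [(t, h) for t in SUCC for h in SUCC[t] if h in DOM[t]]
loops = {(t, h): natural_loop(t, h, PRED) for t, h in back_edges}
for (t, h), body in loops.items():
    print(f"{t} -> {h}: header {h}, loop {sorted(body)}")
```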

Quicksort Example
• How might we optimize this code?

b1: i := m-1; j := n; t1 := 4*n; v := a[t1]

b2: i := i+1; t2 := 4*i; t3 := a[t2]; if t3 < v goto b2

b3: j := j-1; t4 := 4*j; t5 := a[t4]; if t5 > v goto b3

b4: if i >= j goto b6

b5: t6 := 4*i; x := a[t6]; t7 := 4*i; t8 := 4*j; t9 := a[t8]; a[t7] := t9; t10 := 4*j; a[t10] := x

b6: t11 := 4*i; x := a[t11]; t12 := 4*i; t13 := 4*n; t14 := a[t13]; a[t12] := t14; t15 := 4*n; a[t15] := x

[Quicksort] (i, j, v, and x are needed outside this code)

Reaching Definitions
• Informally:
• determine if a particular definition (e.g. “x” in “x = 5”) may reach a given point in the program
• Why reaching definitions may be useful:

x := 5

y := x + 2

If “x := 5” is the only definition reaching “y := x+2”, the latter can be simplified to “y := 7” (constant propagation).

Reaching Definitions
• Definition of a variable X:
• is a statement that assigns (or may assign) a value to X
• unambiguous: X := 3
• ambiguous: foo(X) or *Y := 3
• A definition d reaches a point p :
• if there is a path from the point immediately following d to p,
• such that d is not killed along that path.
• A definition d of variable X is killed along path p
• if there is another definition of X along p.
Reaching Definitions (cont.)
• Has the following properties:
• forward analysis
• “along some path” problem
• Is conservative in that:
• definition d may not define variable X
• along a path p, there is another definition of X, but this other definition is ambiguous
• definition d may be killed along infeasible paths


Data-Flow Analysis: Structured Programs
• Most programs are structured:
• sequence of statements
• if-then-else construct
• while-loops (including for-loops, loops with breaks,...)
• For these programs, we may use an inductive (syntax driven) approach:

Reaching Definitions for Structured Programs

Assignment S: d: a=b+c

gen[S] = {d}

kill[S] = All-defs-of-a - {d}

out[S] = gen[S] ∪ (in[S] - kill[S])

Sequence S: S1; S2

gen[S] = gen[S2] ∪ (gen[S1] - kill[S2])

kill[S] = kill[S2] ∪ (kill[S1] - gen[S2])

in[S1] = in[S]

in[S2] = out[S1]

out[S] = out[S2]

Reaching Definitions for Structured Programs (cont.)

If-then-else S, with branches S1 and S2

gen[S] = gen[S1] ∪ gen[S2]

kill[S] = kill[S1] ∩ kill[S2]

in[S1] = in[S2] = in[S]

out[S] = out[S1] ∪ out[S2]

Loop S, with body S1

gen[S] = gen[S1]

kill[S] = kill[S1]

in[S1] = in[S] ∪ gen[S1]

out[S] = out[S1]
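The syntax-directed rules compose bottom-up, so gen/kill for a whole structured program can be synthesized from its parts. A minimal sketch of the rules for assignment, sequence, if-then-else, and loop; the GenKill class, the helper names, and the example program (d1 and d2 assign a, d3 assigns b) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenKill:
    gen: frozenset
    kill: frozenset


def assign(d, all_defs_of_var):
    """d: a = b+c  gives gen = {d}, kill = All-defs-of-a - {d}."""
    return GenKill(frozenset({d}), frozenset(all_defs_of_var) - {d})


def seq(s1, s2):
    """S = S1; S2."""
    return GenKill(s2.gen | (s1.gen - s2.kill), s2.kill | (s1.kill - s2.gen))


def if_then_else(s1, s2):
    """Either branch may run: union of gens, intersection of kills."""
    return GenKill(s1.gen | s2.gen, s1.kill & s2.kill)


def loop(s1):
    """A loop has the same gen/kill as its body."""
    return GenKill(s1.gen, s1.kill)


# Hypothetical program: d1 and d2 both assign a, d3 assigns b.
DEFS_A, DEFS_B = {"d1", "d2"}, {"d3"}
d1, d2, d3 = assign("d1", DEFS_A), assign("d2", DEFS_A), assign("d3", DEFS_B)

both = seq(d1, d2)                  # d1; d2
either = if_then_else(d1, d2)       # if (...) d1 else d2
program = loop(seq(either, d3))     # while (...) { if (...) d1 else d2; d3 }
print(sorted(both.gen), sorted(both.kill))
print(sorted(either.gen), sorted(either.kill))
print(sorted(program.gen), sorted(program.kill))
```

In the sequence d1; d2 the second assignment to a kills the first, while in the if-then-else neither definition is killed on every path, so kill[S] is empty.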

Iterative Solution: Data-Flow Equations
• The inductive approach is applicable only to structured programs
• because it uses the structure of the program to synthesize and distribute the data-flow information
• Need a general technique: the iterative approach
• compute the gen/kill sets of each statement / basic block
• initialize the in/out sets
• repeatedly compute the in/out sets until a steady state (fixed point) is reached
Reaching Definitions
• Reaching definitions:
• set of definitions that may reach (along one or more paths) a given point
• gen[S]: definition d is in gen[S] if d may reach the end of S, independently of whether it reaches the beginning of S.
• kill[S]: the set of definitions that never reach the end of S, even if they reach the beginning.
• Equations:
• in[S] = ∪ out[P], over all predecessors P of S
• out[S] = gen[S] ∪ (in[S] - kill[S])
Reaching Definitions (cont.)
• Algorithm:

for each basic block B: out[B] := gen[B]; (1)

do

change := false;

for each basic block B do

in[B] := ∪ out[P], over all predecessors P of B; (2)

old-out := out[B]; (3)

out[B] := gen[B] ∪ (in[B] - kill[B]); (4)

if (out[B] != old-out) then change := true; (5)

end

while change
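The algorithm translates almost line for line into Python with sets. A minimal sketch: the comments map each step to lines (1)-(5), `inn` stands in for in[B] because `in` is a Python keyword, and the two-block example (d1: x := 0 in B1, d2: x := x+1 in a self-looping B2) is hypothetical.

```python
def reaching_definitions(blocks, pred, gen, kill):
    """Round-robin iterative solver for reaching definitions."""
    out = {b: set(gen[b]) for b in blocks}                    # (1)
    inn = {b: set() for b in blocks}
    change = True
    while change:
        change = False
        for b in blocks:
            inn[b] = set().union(*(out[p] for p in pred[b]))  # (2)
            old_out = out[b]                                  # (3)
            out[b] = gen[b] | (inn[b] - kill[b])              # (4)
            if out[b] != old_out:                             # (5)
                change = True
    return inn, out


# Hypothetical example: B1 = {d1: x := 0}, B2 = {d2: x := x+1}, B2 loops to itself.
BLOCKS = ["B1", "B2"]
PRED = {"B1": [], "B2": ["B1", "B2"]}
GEN = {"B1": {"d1"}, "B2": {"d2"}}
KILL = {"B1": {"d2"}, "B2": {"d1"}}

inn, out = reaching_definitions(BLOCKS, PRED, GEN, KILL)
print("in:", inn)
print("out:", out)
```

Both definitions of x reach the top of B2: d1 on loop entry and d2 around the back edge.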

b1: d1: i := m-1; d2: j := n; d3: a := u1

gen[b1] := {d1, d2, d3}

kill[b1] := {d4, d5, d6, d7}

b2: d4: i := i+1; d5: j := j-1

gen[b2] := {}

kill[b2] := {}

b3: d6: a := u2

gen[b3] := {}

kill[b3] := {}

b4: d7: i := u3

gen[b4] := {}

kill[b4] := {}
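Filling in the remaining gen/kill sets is mechanical: walk each block in order, treating every definition as a one-statement S2 in the sequence rule. A sketch using the seven definitions listed above; the BLOCKS encoding and the gen_kill name are illustrative. For b1 it reproduces the slide's gen[b1] and kill[b1].

```python
# (definition, variable) pairs per block, from the example above.
BLOCKS = {
    "b1": [("d1", "i"), ("d2", "j"), ("d3", "a")],
    "b2": [("d4", "i"), ("d5", "j")],
    "b3": [("d6", "a")],
    "b4": [("d7", "i")],
}


def gen_kill(blocks):
    """gen/kill per block, applying the sequence rule one definition at a time."""
    defs_of = {}                      # variable -> every definition of it
    for stmts in blocks.values():
        for d, var in stmts:
            defs_of.setdefault(var, set()).add(d)
    gen, kill = {}, {}
    for b, stmts in blocks.items():
        g, k = set(), set()
        for d, var in stmts:
            others = defs_of[var] - {d}   # kill set of the single definition d
            g = {d} | (g - others)        # gen[S] = gen[S2] | (gen[S1] - kill[S2])
            k = others | (k - {d})        # kill[S] = kill[S2] | (kill[S1] - gen[S2])
        gen[b], kill[b] = g, k
    return gen, kill


GEN, KILL = gen_kill(BLOCKS)
for b in BLOCKS:
    print(f"gen[{b}] = {sorted(GEN[b])}  kill[{b}] = {sorted(KILL[b])}")
```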

Example for Reaching Definitions

Compute gen/kill and iterate (visiting order: b1, b2, b3, b4)

| Pass | Set | b1 | b2 | b3 | b4 |
|---|---|---|---|---|---|
| initial | in[B] | 000 0000 | 000 0000 | 000 0000 | 000 0000 |
| initial | out[B] | 000 0000 | 000 0000 | 000 0000 | 000 0000 |
| pass1 | in[B] | 000 0000 | 000 0000 | 000 0000 | 000 0000 |
| pass1 | out[B] | 000 0000 | 000 0000 | 000 0000 | 000 0000 |
| pass2 | in[B] | 000 0000 | 000 0000 | 000 0000 | 000 0000 |
| pass2 | out[B] | 000 0000 | 000 0000 | 000 0000 | 000 0000 |
| pass3 | in[B] | 000 0000 | 000 0000 | 000 0000 | 000 0000 |
| pass3 | out[B] | 000 0000 | 000 0000 | 000 0000 | 000 0000 |
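The table's bit vectors can be produced by running the algorithm with integers as sets, where union is OR and set difference is AND NOT. This is a sketch under a loudly stated assumption: the flow-graph arrows did not survive extraction, so the edges b1 → b2, b2 → b3, b2 → b4, b3 → b4, b4 → b2 are assumed. gen/kill for b2-b4 are derived from definitions d1-d7; mask, fmt, and show are illustrative names.

```python
N_DEFS = 7


def mask(*ks):
    """Bit vector with d_k set for each k; d1 is the leftmost of 7 bits."""
    m = 0
    for k in ks:
        m |= 1 << (N_DEFS - k)
    return m


def fmt(v):
    """Render as 'd1d2d3 d4d5d6d7', the grouping used in the table."""
    s = format(v, "07b")
    return s[:3] + " " + s[3:]


GEN = {"b1": mask(1, 2, 3), "b2": mask(4, 5), "b3": mask(6), "b4": mask(7)}
KILL = {"b1": mask(4, 5, 6, 7), "b2": mask(1, 2, 7), "b3": mask(3), "b4": mask(1, 4)}
# ASSUMED edges b1->b2, b2->b3, b2->b4, b3->b4, b4->b2 (not recoverable from the slide).
PRED = {"b1": [], "b2": ["b1", "b4"], "b3": ["b2"], "b4": ["b2", "b3"]}
ORDER = ["b1", "b2", "b3", "b4"]          # visiting order from the slide

inn = {b: 0 for b in ORDER}
out = {b: GEN[b] for b in ORDER}          # line (1): out[B] := gen[B]


def show(label):
    print(f"{label:8} in :", *(fmt(inn[b]) for b in ORDER))
    print(f"{'':8} out:", *(fmt(out[b]) for b in ORDER))


show("initial")
passes, change = 0, True
while change:
    change = False
    passes += 1
    for b in ORDER:
        inn[b] = 0
        for p in PRED[b]:
            inn[b] |= out[p]                        # union = bitwise OR
        new_out = GEN[b] | (inn[b] & ~KILL[b])      # difference = AND NOT
        if new_out != out[b]:
            out[b], change = new_out, True
    show(f"pass{passes}")
```

With these assumed edges, out[b2] still changes in pass 2, and pass 3 only confirms that nothing changes.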

Generalizations: Other Data-Flow Analyses
• Reaching definitions is a (forward; some-path) analysis
• For backward analysis:
• interchange in/out sets (and predecessors/successors) in lines (1)-(5) of the previous algorithm
• For all-path analysis:
• intersection is substituted for union in line (2)
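Both generalizations fit one parameterized solver. A sketch, assuming facts are plain Python sets; the function solve and its parameters are hypothetical. For all-path problems the sets start from the full universe (gen ∪ (U - kill)) rather than from gen alone, and blocks with no incoming neighbours take a boundary value.

```python
def solve(nodes, succ, gen, kill, universe, boundary, forward=True, may=True):
    """Generic round-robin gen/kill solver.

    forward=False swaps in/out and predecessors/successors;
    may=False replaces union with intersection in the meet.
    Returns (before, after): (in, out) if forward, (out, in) if backward.
    """
    pred = {n: [] for n in nodes}
    for n in nodes:
        for s in succ[n]:
            pred[s].append(n)
    sources = pred if forward else succ
    order = list(nodes) if forward else list(reversed(nodes))
    # All-path problems start from the full set, some-path problems from empty.
    start = set() if may else set(universe)
    after = {n: gen[n] | (start - kill[n]) for n in nodes}
    before = {n: set() for n in nodes}
    change = True
    while change:
        change = False
        for n in order:
            facts = [after[s] for s in sources[n]]
            if not facts:                      # entry (forward) or exit (backward)
                merged = set(boundary)
            elif may:
                merged = set().union(*facts)
            else:
                merged = set.intersection(*facts)
            before[n] = merged
            new_after = gen[n] | (merged - kill[n])
            if new_after != after[n]:
                after[n], change = new_after, True
    return before, after


# Forward, all-path (available expressions) on a hypothetical loop:
# B1 computes x+y, B2 is the loop header, B3 the body.
avail_in, avail_out = solve(
    ["B1", "B2", "B3"], {"B1": ["B2"], "B2": ["B3"], "B3": ["B2"]},
    gen={"B1": {"x+y"}, "B2": set(), "B3": set()},
    kill={"B1": set(), "B2": set(), "B3": set()},
    universe={"x+y"}, boundary=set(), forward=True, may=False)

# Backward, some-path (liveness: gen = use, kill = def) on B1 = {x := 1}, B2 = {print(x)}.
live_out, live_in = solve(
    ["B1", "B2"], {"B1": ["B2"], "B2": []},
    gen={"B1": set(), "B2": {"x"}},
    kill={"B1": {"x"}, "B2": set()},
    universe={"x"}, boundary=set(), forward=False, may=True)
print(avail_in, live_in, live_out)
```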
Common Subexpression Elimination
• Rule used to eliminate a subexpression within a basic block:
• The subexpression was already defined
• The value of the subexpression is not modified
• i.e. none of the values needed to compute the subexpression are redefined
• What about eliminating subexpressions across basic blocks?
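Within one block, the rule can be applied in a single forward pass that remembers which name currently holds each expression. A minimal sketch run on the statements t6 := 4*i through a[t10] := x from the quicksort example; the tuple encoding, the function names, and the rule that any store to a[...] invalidates every remembered load from a are assumptions of this sketch.

```python
# Each statement is (destination, right-hand side). Right-hand sides are
# either a tuple (left, op, right), with op "[]" meaning an array load,
# or a plain name for a copy/store.
STATEMENTS = [
    ("t6", ("4", "*", "i")),
    ("x", ("a", "[]", "t6")),
    ("t7", ("4", "*", "i")),
    ("t8", ("4", "*", "j")),
    ("t9", ("a", "[]", "t8")),
    ("a[t7]", "t9"),
    ("t10", ("4", "*", "j")),
    ("a[t10]", "x"),
]


def local_cse(block):
    avail = {}    # expression -> name that currently holds its value
    result = []
    for dest, rhs in block:
        expr = rhs if isinstance(rhs, tuple) else None
        # Reuse an earlier value if the same expression is still available.
        result.append((dest, avail[expr] if expr in avail else rhs))
        if "[" in dest:
            # Store a[...] := ...: forget every remembered load from that array.
            array = dest.split("[")[0]
            avail = {e: h for e, h in avail.items()
                     if not (e[1] == "[]" and e[0] == array)}
        else:
            # Scalar assignment: forget expressions that read dest or live in dest.
            avail = {e: h for e, h in avail.items()
                     if dest not in (e[0], e[2]) and h != dest}
            if expr is not None and dest not in (expr[0], expr[2]):
                avail.setdefault(expr, dest)
    return result


def show(rhs):
    if isinstance(rhs, str):
        return rhs
    left, op, right = rhs
    return f"{left}[{right}]" if op == "[]" else f"{left}{op}{right}"


CSE = local_cse(STATEMENTS)
for dest, rhs in CSE:
    print(f"{dest} := {show(rhs)}")
```

The pass turns t7 := 4*i into t7 := t6 and t10 := 4*j into t10 := t8; it never looks across block boundaries, which is exactly the question the slide ends on.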
Available Expressions
• An expression x+y is available at a point p:
• if every path from the initial node to p evaluates x+y, and
• after the last such evaluation, prior to reaching p, there are no subsequent assignments to x or y.
• Definitions:
• forward, all-path,
• e-gen[S]: expressions definitely generated by S,
• e.g. “z := x+y”: expression “x+y” is generated
• e-kill[S]: expressions that may be killed by S
• e.g. “z := x+y”: all expressions containing “z” are killed.
• order: compute e-gen first, then e-kill; e.g. “x := x+y” generates x+y, but the assignment to x then kills it
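The ordering rule matters precisely for statements such as x := x+y. A sketch of per-block e-gen/e-kill, assuming a hypothetical universe of three expressions encoded as (left, op, right) tuples; egen_ekill is an illustrative name.

```python
# Expressions are (left, op, right) tuples; statements are (target, expression).
ALL_EXPRS = {("x", "+", "y"), ("z", "*", "w"), ("x", "-", "w")}


def egen_ekill(stmts, all_exprs):
    """e-gen/e-kill for a basic block, one statement at a time."""
    egen, ekill = set(), set()
    for target, expr in stmts:
        # First the statement generates its right-hand side...
        egen.add(expr)
        ekill.discard(expr)
        # ...then assigning target kills every expression that reads it.
        killed = {e for e in all_exprs if target in (e[0], e[2])}
        egen -= killed
        ekill |= killed
    return egen, ekill


BLOCK = [
    ("z", ("x", "+", "y")),   # z := x+y   generates x+y, kills z*w
    ("x", ("x", "+", "y")),   # x := x+y   generates x+y, then kills it
    ("u", ("z", "*", "w")),   # u := z*w   generates z*w with the new z
]
EGEN, EKILL = egen_ekill(BLOCK, ALL_EXPRS)
print("e-gen:", EGEN)
print("e-kill:", EKILL)
```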
Available Expressions (cont.)
• Algorithm:

for each basic block B: out[B] := e-gen[B]; (1)

do

change := false;

for each basic block B do

in[B] := ∩ out[P], over all predecessors P of B; (2)

old-out := out[B]; (3)

out[B] := e-gen[B] ∪ (in[B] - e-kill[B]); (4)

if (out[B] != old-out) then change := true; (5)

end

while change

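difference: line (2) uses intersection instead of union; the entry block, which has no predecessors, gets in[B] = { }. Initializing out[B] := e-gen[B] in line (1) is safe but can miss expressions that remain available around loops; initializing out[B] := U - e-kill[B] (U = all expressions) for every block except the entry yields the most precise (maximal fixed-point) solution.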

Pointer Analysis
• Identify the memory locations that may be addressed by a pointer
• may be formalized as a system of data-flow equations.
• Simple programming model:
• pointer to integer (or float, arrays of integer, arrays of float)
• no pointer to pointers allowed
• Definitions:
• in[S]: the set of pairs (p, a), where p is a pointer, a is a variable, and p might point to a before statement S.
• out[S]: the set of pairs (p, a), where p might point to a after statement S.
• gen[S]: the new pairs (p, a) generated by the statement S.
• kill[S]: the pairs (p, a) killed by the statement S.

Pointer Analysis (cont.)

S: a=b+c

gen[S] = { }

kill[S] = { }

S: p = &a

gen[S] = { (p, a) }

kill[S, input set] = { (p, b) | (p, b) is in input set }

S: p = q

gen[S, input set] = { (p, b) | (q, b) is in input set }

kill[S, input set] = { (p, b) | (p, b) is in input set }

Pointer Analysis (cont.)
• Algorithm:

for each basic block B: out[B] := gen[B]; (1)

do

change := false;

for each basic block B do

in[B] := ∪ out[P], over all predecessors P of B; (2)

old-out := out[B]; (3)

out[B] := gen[B, in[B]] ∪ (in[B] - kill[B, in[B]]); (4)

if (out[B] != old-out) then change := true; (5)

end

while change

difference: line (4): gen and kill are functions of B and in[B].
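Because gen and kill now read in[B], a block's transfer function is easiest to express as the statement rules applied one statement at a time. A minimal sketch under the slide's simple model (no pointers to pointers); the tuple encoding of statements, the helper names, and the four-block program are hypothetical, and line (1) is approximated by running each block on an empty input set.

```python
def transfer(stmt, in_set):
    """out = gen[S, in] | (in - kill[S, in]) for the three statement forms."""
    kind, lhs, rhs = stmt
    if kind == "addr":                                   # p = &a
        gen = {(lhs, rhs)}
        kill = {(p, t) for (p, t) in in_set if p == lhs}
    elif kind == "copy":                                 # p = q
        gen = {(lhs, t) for (q, t) in in_set if q == rhs}
        kill = {(p, t) for (p, t) in in_set if p == lhs}
    else:                                                # a = b+c: no pointer effect
        gen, kill = set(), set()
    return gen | (in_set - kill)


def run_block(stmts, in_set):
    for s in stmts:
        in_set = transfer(s, in_set)
    return in_set


# Hypothetical if-then-else: B0 tests, B1/B2 are the arms, B3 is the join.
BLOCKS = {
    "B0": [],
    "B1": [("addr", "p", "a")],                        # p = &a
    "B2": [("addr", "p", "b")],                        # p = &b
    "B3": [("copy", "q", "p"), ("addr", "p", "c"),     # q = p; p = &c;
           ("arith", "x", "y+z")],                     # x = y+z
}
PRED = {"B0": [], "B1": ["B0"], "B2": ["B0"], "B3": ["B1", "B2"]}

inn = {b: set() for b in BLOCKS}
out = {b: run_block(BLOCKS[b], set()) for b in BLOCKS}    # (1)
change = True
while change:
    change = False
    for b in BLOCKS:
        inn[b] = set().union(*(out[p] for p in PRED[b]))  # (2)
        new_out = run_block(BLOCKS[b], inn[b])            # (4)
        if new_out != out[b]:                             # (3) and (5)
            out[b], change = new_out, True
print("in[B3]:", inn["B3"])
print("out[B3]:", out["B3"])
```

At the join, p may point to a or b, so q = p yields both (q, a) and (q, b), and p = &c then replaces every pair for p.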

Performance of Iterative Solutions
• Global analysis may be memory- and compute-intensive
• May be reduced by
• using bitvector representations for sets
• analyzing only relevant variables
• e.g. temporary variables may be ignored
• synthesizing data-flow within basic block
• mixing inductive and iterative solutions
• suitably ordering the basic blocks
• e.g. depth first order is good for forward analysis
• limiting scope
• may reduce the precision of analysis
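For a forward problem, the depth-first ordering usually meant here is reverse postorder: each block is visited after its predecessors except along back edges, so most information propagates in a single pass. A sketch on a hypothetical loop-shaped graph; the recursive search is fine for small graphs, while large ones would want an explicit stack.

```python
# Hypothetical flow graph; B4 -> B1 is the back edge.
SUCC = {
    "entry": ["B1"],
    "B1": ["B2", "B3"],
    "B2": ["B4"],
    "B3": ["B4"],
    "B4": ["B1", "exit"],
    "exit": [],
}


def reverse_postorder(succ, entry):
    """Depth-first search from entry, then reverse the postorder."""
    seen, post = set(), []

    def dfs(n):
        seen.add(n)
        for s in succ[n]:
            if s not in seen:
                dfs(s)
        post.append(n)   # a node is finished after all its descendants

    dfs(entry)
    return post[::-1]


RPO = reverse_postorder(SUCC, "entry")
print(RPO)
```

For a backward problem, the reverse of this order (postorder) plays the same role.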
Summary
• Iterative algorithm:
• solve data-flow problem for arbitrary control flow graph
• To solve a new data-flow problem:
• define gen/kill accordingly
• determine properties:
• forward / backward
• some-path / all-path