
# Data-Flow Analysis II CS 671

Data-Flow Analysis II. CS 671 March 13, 2008. Data-Flow Analysis. Gather conservative, approximate information about what a program does Result: some property that holds every time the instruction executes The Data-Flow Abstraction Execution of an instruction transforms program state


### Data-Flow Analysis II

CS 671

March 13, 2008

• Gather conservative, approximate information about what a program does

• Result: some property that holds every time the instruction executes

• The Data-Flow Abstraction

• Execution of an instruction transforms program state

• To analyze a program, we must consider all possible sequences of program points (paths)

• Summarize all possible program states with a finite set of facts

• Limitation: may consider some infeasible paths

• Setting up and solving systems of equations that relate information at various points in the program

• such as out[S] = gen[S] ∪ ( in[S] - kill[S] ) where

• S is a statement

• in[S] and out[S] are information before and after S

• gen[S] and kill[S] are information generated and killed by S

• definition of in, out, gen, and kill depends on the desired information

• Properties:

• either a forward analysis (out as function of in) or

• a backward analysis (in as a function of out).

• either an “along some path” problem or

• an “along all paths” problem.

• Data-flow analysis must be conservative
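As a rough illustration of the gen/kill equation above, the sketch below applies out[S] = gen[S] ∪ ( in[S] - kill[S] ) to Python sets. The helper name `transfer` and the facts d1 to d3 are made up for this example and do not come from the slides.

```python
def transfer(gen, kill, in_set):
    """Forward transfer function: out[S] = gen[S] | (in[S] - kill[S])."""
    return gen | (in_set - kill)

# Hypothetical statement S that generates d3 and kills d1.
in_s = {"d1", "d2"}
out_s = transfer(gen={"d3"}, kill={"d1"}, in_set=in_s)
print(sorted(out_s))  # ['d2', 'd3']
```

For a backward analysis the same shape is used with the roles of in and out exchanged.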

• Definitions:

• a point lies between two statements (or before the first statement, or after the last)

• a path is a sequence of consecutive points in the control-flow graph

• Steps:

• Set up live sets for each program point

• Instantiate equations

• Solve equations

if (c)

x = y+1

y = 2*z

if (d)

x = y+z

z = 1

z = x

• Program points

[Figure: flow graph of the example program, with program points L1 to L12 marked between statements and nodes numbered 1 to 7]

in[I] = ( out[I] - def[I] ) ∪ use[I]

out[B] = ∪ (B′ a successor of B) in[B′]
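The liveness equations can be tried on a straight-line sequence, where out[I] is simply the in set of the next instruction. This is a minimal sketch under assumptions: the three instructions are taken from the example program but strung together as one hypothetical block, and the set assumed live at the end is invented for illustration.

```python
def liveness(instrs, live_at_end):
    """Backward pass applying in[I] = (out[I] - def[I]) | use[I]."""
    live = set(live_at_end)
    live_in = []
    for defined, used in reversed(instrs):
        live = (live - {defined}) | used
        live_in.append(set(live))
    live_in.reverse()
    return live_in

# (defined variable, used variables) for: x = y+1; y = 2*z; z = x
block = [("x", {"y"}), ("y", {"z"}), ("z", {"x"})]
result = liveness(block, live_at_end={"y", "z"})  # assumed live-out set
for (defined, _), live in zip(block, result):
    print("live before the assignment to", defined, ":", sorted(live))
```

With branches, out of a node becomes the union of the in sets of its successors, and the pass is repeated until nothing changes.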

Example

[Figure: the example flow graph with a live set, initially { }, to be computed at each program point L1 to L12]

• Successors

• Succ(B1) =

• Succ(B2) =

• Succ(B3) =

• Predecessors

• Pred(B2) =

• Pred(B3) =

• Pred(B4) =

[Figure: flow graph with nodes B1, B2, B3, and B4]

• Branch node – more than one successor

• Join node – more than one predecessor

• Dominance is a binary relation on the flow graph nodes that allows us to easily find loops

• Node d dominates node i (d dom i) if every possible execution path from entry to i includes d

• Dominance is:

• Reflexive – every node dominates itself

• Transitive – if a dom b and b dom c, then a dom c

• Antisymmetric – if a dom b and b dom a then a=b

• dom(entry) =

• dom(b1) =

• dom(b2) =

• dom(b3) =

• dom(b4) =

• dom(b5) =

• dom(b6) =

• dom(exit) =

[Figure: flow graph with nodes entry, B1 to B6, and exit]
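One common way to compute dominator sets is to start from "every node dominates every node" and shrink the sets until they stop changing. The sketch below does this for a small hypothetical flow graph (nodes n1 to n4); it is not the graph from the figure, and it assumes every node is reachable from the entry and appears as a key of `succ`.

```python
def dominators(succ, entry):
    """Iteratively compute dom(n) for every node of a flow graph."""
    nodes = set(succ)
    pred = {n: set() for n in nodes}
    for n, targets in succ.items():
        for t in targets:
            pred[t].add(n)
    # Start from the largest candidate sets and shrink them.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            # d dom n iff d == n or d dominates every predecessor of n
            new = {n} | set.intersection(*(dom[p] for p in pred[n]))
            if new != dom[n]:
                dom[n], changed = new, True
    return dom

# Hypothetical graph: a diamond n1 -> {n2, n3} -> n4 with an edge n4 -> n1.
succ = {"entry": ["n1"], "n1": ["n2", "n3"], "n2": ["n4"],
        "n3": ["n4"], "n4": ["n1", "exit"], "exit": []}
dom = dominators(succ, "entry")
for node in succ:
    print(node, sorted(dom[node]))
```

Neither n2 nor n3 dominates n4, because each can be bypassed through the other branch.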

• Idom(b) = a iff (a ≠ b) and (a dom b) and there does not exist a node c such that (a dom c) and (c dom b) with c different from a and b

• Idom of a node is unique

• Idom relationship forms a tree whose root is the entry node

• idom(b1) =

• idom(b2) =

• idom(b3) =

• idom(b4) =

• idom(b5) =

• idom(b6) =

• idom(exit) =

[Figure: flow graph with nodes entry, B1 to B6, and exit]
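Given the dominator sets, the immediate dominator can be read off directly: the strict dominators of a node form a chain, and the closest one is the one with the largest dominator set of its own. The sketch below assumes the dominator sets of a small hypothetical graph (nodes n1 to n4, not the figure's graph) are already known.

```python
def immediate_dominators(dom, entry):
    """Pick, for each node, its closest strict dominator."""
    idom = {}
    for node, dominators_of_node in dom.items():
        if node == entry:
            continue  # the entry node is the root of the tree
        strict = dominators_of_node - {node}
        # The closest strict dominator is dominated by all the others,
        # so it has the largest dominator set.
        idom[node] = max(strict, key=lambda d: len(dom[d]))
    return idom

# Assumed dominator sets for a hypothetical diamond-shaped graph.
dom = {
    "entry": {"entry"},
    "n1": {"entry", "n1"},
    "n2": {"entry", "n1", "n2"},
    "n3": {"entry", "n1", "n3"},
    "n4": {"entry", "n1", "n4"},
    "exit": {"entry", "n1", "n4", "exit"},
}
idom = immediate_dominators(dom, "entry")
print(idom)
```

Reading `idom` as child-to-parent links gives the dominator tree rooted at the entry node.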

• Flow graph

• (d sdom i) if d dominates i and d ≠ i

• (p pdom i) if every possible execution path from i to exit includes p

• pdom(entry) =

• pdom(b1) =

• pdom(b2) =

• pdom(b3) =

• pdom(b4) =

• pdom(b5) =

• pdom(b6) =

[Figure: flow graph with nodes entry, B1 to B6, and exit]

• Flow graph

• Back edge – edge whose head dominates its tail

• Loop containing this type of back edge is a natural loop

• i.e. it has a single external entry point

• For back edge b → c the loop header is c

• Natural loops =

• Loop header (B3 → B1) =

• Loop header (B2 → B2) =

[Figure: flow graph with nodes entry, B1, B2, B3, and exit]
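The natural loop of a back edge can be collected by walking predecessors backward from the tail until the header is reached. In the sketch below the two back edges B3 → B1 and B2 → B2 come from the slide, but the remaining edges are an assumption, since the arrows of the figure are not in the text.

```python
def natural_loop(pred, tail, header):
    """Nodes of the natural loop of the back edge tail -> header."""
    loop = {header, tail}
    stack = [tail] if tail != header else []
    while stack:
        node = stack.pop()
        for p in pred[node]:
            if p not in loop:  # the header is already in, so the walk stops there
                loop.add(p)
                stack.append(p)
    return loop

# ASSUMED edges: entry->B1, B1->B2, B2->B2, B2->B3, B3->B1, B3->exit
pred = {"entry": [], "B1": ["entry", "B3"], "B2": ["B1", "B2"],
        "B3": ["B2"], "exit": ["B3"]}
outer = natural_loop(pred, tail="B3", header="B1")
inner = natural_loop(pred, tail="B2", header="B2")
print(sorted(outer), sorted(inner))
```

Under these assumed edges the self-loop on B2 is nested inside the loop headed by B1.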

• How might we optimize this code?

i := m-1

j := n

t1 := 4*n

v := a[t1]

b1

i := i+1

t2 := 4*i

t3 := a[t2]

if t3 < v goto b2

b2

j := j-1

t4 := 4*j

t5 := a[t4]

if t5 > v goto b3

b3

t6 := 4*i

x := a[t6]

t7 := 4*i

t8 := 4*j

t9 := a[t8]

a[t7] := t9

t10 := 4*j

a[t10] := x

t11 := 4*i

x := a[t11]

t12 := 4*i

t13 := 4*n

t14 := a[t13]

a[t12] := t14

t15 := 4*n

a[t15] := x

if i >= j goto b6

b4

b5

b6

[Quicksort] (i, j, v, x variables are needed outside)

• Informally:

• determine if a particular definition (e.g. “x” in “x = 5”) may reach a given point in the program

• Why reaching definitions may be useful:

x := 5

y := x + 2

If “x := 5” is the only definition reaching “y := x+2”, the latter can be simplified to “y := 7” (constant propagation).

• Definition of a variable X:

• is a statement that assigns (or may assign) a value to X

• unambiguous: X := 3

• ambiguous: foo(X) or *Y := 3

• A definition d reaches a point p :

• if there is a path from the point immediately following d to p,

• such that d is not killed along that path.

• A definition d of variable X is killed along path p

• if there is another definition of X along p.

• Has the following properties:

• forward analysis

• “along some path” problem

• Is conservative in that:

• definition d may not define variable X

• along a path p, there is another definition of X, but this other definition is ambiguous

• definition d may be killed along infeasible paths


Data-Flow Analysis: Structured Programs

• Most programs are structured:

• sequence of statements

• if-then-else construct

• while-loops (including for-loops, loops with breaks,...)

• For these programs, we may use an inductive (syntax driven) approach:

Reaching Definitions for Structured Programs

• S is a single definition d of variable a:

• gen[S] = {d}

• kill[S] = All-defs-of-a - {d}

• out[S] = gen[S] ∪ ( in[S] - kill[S] )

• S is a sequence S1; S2:

• gen[S] = gen[S2] ∪ ( gen[S1] - kill[S2] )

• kill[S] = kill[S2] ∪ ( kill[S1] - gen[S2] )

• in[S1] = in[S]

• in[S2] = out[S1]

• out[S] = out[S2]

Reaching Definitions for Structured Programs (cont.)

• S is an if-then-else with branches S1 and S2:

• gen[S] = gen[S1] ∪ gen[S2]

• kill[S] = kill[S1] ∩ kill[S2]

• in[S1] = in[S2] = in[S]

• out[S] = out[S1] ∪ out[S2]

• S is a loop with body S1:

• gen[S] = gen[S1]

• kill[S] = kill[S1]

• in[S1] = in[S] ∪ gen[S1]

• out[S] = out[S1]
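The inductive rules for structured programs compose naturally as functions on (gen, kill) pairs. The following is a sketch only: the function names and the three definitions d1, d2 (of variable a) and d3 (of variable b) are hypothetical.

```python
def definition(d, all_defs_of_var):
    """Single definition d: gen = {d}, kill = the other definitions of that variable."""
    return ({d}, all_defs_of_var - {d})

def sequence(s1, s2):
    (g1, k1), (g2, k2) = s1, s2
    return (g2 | (g1 - k2), k2 | (k1 - g2))

def if_else(s1, s2):
    (g1, k1), (g2, k2) = s1, s2
    return (g1 | g2, k1 & k2)

def loop(body):
    return body  # gen[S] = gen[S1], kill[S] = kill[S1]

def out_of(s, in_set):
    gen, kill = s
    return gen | (in_set - kill)

defs_a, defs_b = {"d1", "d2"}, {"d3"}
s1 = definition("d1", defs_a)   # d1: a := ...
s2 = definition("d2", defs_a)   # d2: a := ...
s3 = definition("d3", defs_b)   # d3: b := ...

# if (...) d1 else d3; followed by d2
program = sequence(if_else(s1, s3), s2)
print(program, out_of(program, {"d1"}))
```

Here d1 is killed by the trailing d2 on every path, while d3 survives because nothing after the branch redefines b.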

• Inductive approach only applicable to structured programs

• because it uses the structure of the program to synthesize and distribute the data-flow information

• Need a general technique: Iterative Approach

• compute the gen/kill sets of each statement / basic block

• initialize the in/out sets

• repeatedly recompute the out/in sets until a steady state is reached

• Reaching definitions:

• set of definitions that may reach (along one or more paths) a given point

• gen[S]: definition d is in gen[S] if d may reach the end of S, independently of whether it reaches the beginning of S.

• kill[S]: the set of definitions that never reach the end of S, even if they reach the beginning.

• Equations:

• in[S] = ∪ (P a predecessor of S) out[P]

• out[S] = gen[S] ∪ ( in[S] - kill[S] )

• Algorithm:

for each basic block B: out[B] := gen[B]; (1)

do

change := false;

for each basic block B do

in[B] = ∪ (P a predecessor of B) out[P]; (2)

old-out = out[B]; (3)

out[B] = gen[B] ∪ (in[B] - kill[B]); (4)

if (out[B] != old-out) then change := true; (5)

end

while change

b1: i := m-1 (d1); j := n (d2); a := u1 (d3)

gen[b1] := {d1, d2, d3}

kill[b1] := {d4, d5, d6, d7}

b2: i := i+1 (d4); j := j-1 (d5)

gen[b2] := {d4, d5}

kill[b2] := {d1, d2, d7}

b3: a := u2 (d6)

gen[b3] := {d6}

kill[b3] := {d3}

b4: i := u3 (d7)

gen[b4] := {d7}

kill[b4] := {d1, d4}

Example for Reaching Definitions

Compute gen/kill and iterate (visiting order: b1, b2, b3, b4)

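| | | b1 | b2 | b3 | b4 |
| --- | --- | --- | --- | --- | --- |
| initial | in[B] | 000 0000 | 000 0000 | 000 0000 | 000 0000 |
| initial | out[B] | 000 0000 | 000 0000 | 000 0000 | 000 0000 |
| pass1 | in[B] | 000 0000 | 000 0000 | 000 0000 | 000 0000 |
| pass1 | out[B] | 000 0000 | 000 0000 | 000 0000 | 000 0000 |
| pass2 | in[B] | 000 0000 | 000 0000 | 000 0000 | 000 0000 |
| pass2 | out[B] | 000 0000 | 000 0000 | 000 0000 | 000 0000 |
| pass3 | in[B] | 000 0000 | 000 0000 | 000 0000 | 000 0000 |
| pass3 | out[B] | 000 0000 | 000 0000 | 000 0000 | 000 0000 |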
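The iteration can be run mechanically with bitvectors, one bit per definition with d1 leftmost. The sketch below follows lines (1) to (5) of the algorithm, using gen/kill sets derived from the seven definitions. The flow-graph edges are an assumption (b1→b2, b2→b3, b2→b4, b3→b4, b4→b2), because the arrows of the figure are not in the text; with different edges the bit patterns would differ.

```python
N = 7  # definitions d1..d7; d1 is the leftmost bit

def bits(*defs):
    """Bitmask with one bit set per definition number."""
    mask = 0
    for d in defs:
        mask |= 1 << (N - d)
    return mask

def show(mask):
    s = format(mask, "07b")
    return s[:3] + " " + s[3:]

gen = {"b1": bits(1, 2, 3), "b2": bits(4, 5), "b3": bits(6), "b4": bits(7)}
kill = {"b1": bits(4, 5, 6, 7), "b2": bits(1, 2, 7), "b3": bits(3), "b4": bits(1, 4)}
# ASSUMED predecessor lists (not given in the text).
pred = {"b1": [], "b2": ["b1", "b4"], "b3": ["b2"], "b4": ["b2", "b3"]}
order = ["b1", "b2", "b3", "b4"]

out = {b: gen[b] for b in order}                     # line (1)
in_ = {b: 0 for b in order}
passes = 0
change = True
while change:
    change = False
    passes += 1
    for b in order:
        in_[b] = 0
        for p in pred[b]:
            in_[b] |= out[p]                         # line (2): union
        old_out = out[b]                             # line (3)
        out[b] = gen[b] | (in_[b] & ~kill[b])        # line (4)
        if out[b] != old_out:                        # line (5)
            change = True
    print("pass", passes, [(b, show(in_[b]), show(out[b])) for b in order])
```

Under the assumed edges the sets stop changing in the third pass, and in[b2] ends as 111 0111: every definition except d4 may reach the top of b2.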

• Reaching definitions is a (forward; some-path) analysis

• For backward analysis:

• interchange in / out sets in the previous algorithm, lines (1-5)

• For all-path analysis:

• intersection is substituted for union in line (2)

• Rule used to eliminate subexpression within a basic block

• The subexpression was already defined

• The value of the subexpression is not modified

• i.e. none of the values needed to compute the subexpression are redefined

• What about eliminating subexpressions across basic blocks?

• An expression x+y is available at a point p:

• if every path from the initial node to p evaluates x+y, and

• after the last such evaluation, prior to reaching p, there are no subsequent assignments to x or y.

• Definitions:

• forward, all-path,

• e-gen[S]: expressions definitely generated by S,

• e.g. “z := x+y”: expression “x+y” is generated

• e-kill[S]: expressions that may be killed by S

• e.g. “z := x+y”: all expressions containing “z” are killed.

• order: compute e-gen and then e-kill, e.g. “x:= x+y”
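The ordering rule (generate first, then kill) matters for statements such as “x := x+y”. A possible way to compute the local sets of a basic block is sketched below. The tuple encoding of expressions as (operator, left, right) and the sample block are assumptions made for illustration.

```python
def local_sets(block, universe):
    """Return (e_gen, e_kill) for a list of (target, expression) statements."""
    e_gen, e_kill = set(), set()
    for target, expr in block:
        e_gen.add(expr)  # first generate the computed expression ...
        killed = {e for e in universe | {expr} if target in (e[1], e[2])}
        e_gen -= killed  # ... then kill everything that mentions the target
        e_kill |= killed
    return e_gen, e_kill

universe = {("+", "a", "b"), ("*", "a", "c"), ("+", "b", "c")}

# x := a+b; a := b+c
gen1, kill1 = local_sets([("x", ("+", "a", "b")), ("a", ("+", "b", "c"))], universe)

# x := x+y generates x+y and immediately kills it again
gen2, kill2 = local_sets([("x", ("+", "x", "y"))], {("+", "x", "y")})
print(gen1, kill1, gen2, kill2)
```

An expression that is killed and later recomputed in the same block ends up in both sets here. That is harmless, because out[B] = e-gen[B] ∪ (in[B] - e-kill[B]) adds it back.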

• Algorithm:

for each basic block B: out[B] := e-gen[B]; (1)

do

change := false;

for each basic block B do

in[B] = ∩ (P a predecessor of B) out[P]; (2)

old-out = out[B]; (3)

out[B] = e-gen[B] ∪ (in[B] - e-kill[B]); (4)

if (out[B] != old-out) then change := true; (5)

end

while change

difference: line (2), use intersection instead of union
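A runnable sketch of the all-path iteration follows. It deviates from the pseudocode in one labeled respect: non-entry blocks start from the full set of expressions rather than from e-gen, a common choice that keeps the intersection from discarding expressions too early around loops; the entry block has no predecessors, so its in set is empty. The four-block diamond and its e-gen/e-kill sets are hypothetical, and every non-entry block is assumed to have at least one predecessor.

```python
def available_expressions(blocks, pred, entry, universe):
    """blocks maps a name to (e_gen, e_kill); returns (in, out) dictionaries."""
    out = {b: set(universe) for b in blocks}   # optimistic start (assumption)
    in_ = {b: set() for b in blocks}
    out[entry] = set(blocks[entry][0])         # nothing is available on entry
    changed = True
    while changed:
        changed = False
        for b in blocks:
            if b == entry:
                continue
            e_gen, e_kill = blocks[b]
            in_[b] = set.intersection(*(out[p] for p in pred[b]))  # all paths
            new_out = e_gen | (in_[b] - e_kill)
            if new_out != out[b]:
                out[b], changed = new_out, True
    return in_, out

universe = {"a+b", "c+d"}
blocks = {
    "B1": ({"a+b"}, set()),
    "B2": ({"c+d"}, set()),
    "B3": ({"c+d"}, {"a+b"}),   # e.g. computes c+d, then assigns to a
    "B4": (set(), set()),
}
pred = {"B1": [], "B2": ["B1"], "B3": ["B1"], "B4": ["B2", "B3"]}
avail_in, avail_out = available_expressions(blocks, pred, "B1", universe)
print(avail_in["B4"])
```

At the join B4 only c+d is available: a+b reaches it along the B2 path but is killed along the B3 path.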

• Identify the memory locations that may be addressed by a pointer

• may be formalized as a system of data-flow equations.

• Simple programming model:

• pointer to integer (or float, arrays of integer, arrays of float)

• no pointer to pointers allowed

• Definitions:

• in[S]: the set of pairs (p, a), where p is a pointer, a is a variable, and p might point to a before statement S.

• out[S]: the set of pairs (p, a), where p might point to a after statement S.

• gen[S]: the new pairs (p, a) generated by the statement S.

• kill[S]: the pairs (p, a) killed by the statement S.

Pointer Analysis (cont.)

• S: p = &a

• gen[S] = { (p, a) }

• kill[S, input set] = { (p, b) | (p, b) is in input set }

• S: p = q

• gen[S, input set] = { (p, b) | (q, b) is in input set }

• kill[S, input set] = { (p, b) | (p, b) is in input set }

• S: any other statement

• gen[S] = { }

• kill[S] = { }

• Algorithm:

for each basic block B: out[B] := gen[B ]; (1)

do

change := false;

for each basic block B do

in[B] = ∪ (P a predecessor of B) out[P]; (2)

old-out = out[B]; (3)

out[B] = gen[B, in[B]] ∪ (in[B] - kill[B, in[B]]); (4)

if (out[B] != old-out) then change := true; (5)

end

while change

difference: line (4): gen and kill are functions of B and in[B].
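Because gen and kill depend on the input set, the transfer function for pointer statements is easiest to write as one function of the statement and in[S]. The sketch below covers the two statement forms of the simple model. The tuple encoding ("addr", p, a) for p = &a and ("copy", p, q) for p = q is an assumption for illustration.

```python
def transfer(stmt, in_set):
    """out[S] = gen[S, in] | (in - kill[S, in]) over (pointer, variable) pairs."""
    kind, p, rhs = stmt
    if kind == "addr":        # p = &a
        gen = {(p, rhs)}
    elif kind == "copy":      # p = q: p may point wherever q may point
        gen = {(p, b) for (q, b) in in_set if q == rhs}
    else:                     # any other statement leaves the pairs alone
        return set(in_set)
    kill = {(x, b) for (x, b) in in_set if x == p}
    return gen | (in_set - kill)

before = {("p", "a"), ("q", "b"), ("q", "c")}
after_copy = transfer(("copy", "p", "q"), before)   # p = q
after_addr = transfer(("addr", "p", "d"), before)   # p = &d
print(sorted(after_copy), sorted(after_addr))
```

Plugging this function into line (4) of the iterative algorithm gives the block-level analysis, applied statement by statement within each block.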

• Global analysis may be memory-space / computing intensive

• May be reduced by

• using bitvector representations for sets

• analyzing only relevant variables

• e.g. temporary variables may be ignored

• synthesizing data-flow within basic block

• mixing inductive and iterative solutions

• suitably ordering the basic blocks

• e.g. depth first order is good for forward analysis

• limiting scope

• may reduce the precision of analysis

• Iterative algorithm:

• solves data-flow problems for arbitrary control-flow graphs

• To solve a new data-flow problem:

• define gen/kill accordingly

• determine properties:

• forward / backward

• some-path / all-path