- 517 Views
- Uploaded on

Download Presentation
## PowerPoint Slideshow about 'Data-Flow Analysis II' - benjamin

**An Image/Link below is provided (as is) to download presentation**

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript

Data-Flow Analysis

- Gather conservative, approximate information about what a program does
- Result: some property that holds every time the instruction executes
- The Data-Flow Abstraction
- Execution of an instruction transforms program state
- To analyze a program, we must consider all possible sequences of program points (paths)
- Summarize all possible program states with finite set of facts
- Limitation: may consider some infeasible paths

The General Approach

- Setting up and solving systems of equations that relate information at various points in the program
- such as out[S] = gen[S] È ( in[S] - kill[S] ) where
- S is a statement
- in[S] and out[S] are information before and after S
- gen[S] and kill[S] are information generated and killed by S
- definition of in, out, gen, and kill depends on the desired information

Data-Flow Analysis (cont.)

- Properties:
- either a forward analysis (out as function of in) or
- a backward analysis (in as a function of out).
- either an “along some path” problem or
- an “along all paths” problem.
- Data-flow analysis must be conservative
- Definitions:
- point between two statements (or before the first statements and after the last)
- path is a sequence of consecutive points in the control-flow graph

Example – Live Variables

- Steps:
- Set up live sets for each program point
- Instantiate equations
- Solve equations

if (c)

x = y+1

y = 2*z

if (d)

x = y+z

z = 1

z = x

in[I] = ( out[I] – def[I] ) use[I]

out[B] = in[B’]

B’ succ(B)

ExampleL1 =

L2 =

L3 =

L4 =

L5 =

L6 =

L7 =

L8 =

L9 =

L10 =

L11 =

L12 =

L1 = { }

if (c)

1

L2 = { }

L3 = { }

x = y+1

y = 2*z

if (d)

2

L4 = { }

3

L5 = { }

4

L6 = { }

L7 = { }

5

x = y+z

L8 = { }

L9 = { }

z = 1

6

L10 = { }

L11 = { }

7

z = x

L12 = { }

More Terminology

- Successors
- Succ(B1) =
- Succ(B2) =
- Succ(B3) =
- Predecessors
- Pred(B2) =
- Pred(B3) =
- Pred(B4) =

B1

B2

B3

B4

- Branch node – more than one successor
- Join node – more than one predecessor

Dominators

- Dominance is a binary relation on the flow graph nodes that allows us to easily find loops
- Node d dominates node i (d dom i) if every possible execution path from entry to i includes d
- Dominance is:
- Reflexive – every node dominates itself
- Transitive – if a dom b and b dom c, then a dom c
- Antisymmetric – if a dom b and b dom a then a=b

- dom(entry) =
- dom(b1) =
- dom(b2) =
- dom(b3) =
- dom(b4) =
- dom(b5) =
- dom(b6) =
- dom(exit) =

entry

B1

B2

B3

B4

B5

B6

exit

Immediate dominators

- Idom(b) – a iff (a b) and (a dom b) and there does not exist a node c such that (a dom c) and (c dom b) with c different than a and b
- Idom of a node is unique
- Idom relationship forms a tree whose root is the entry node

- idom(b1) =
- idom(b2) =
- idom(b3) =
- idom(b4) =
- idom(b5) =
- idom(b6) =
- idom(exit) =

entry

B1

B2

B3

B4

B5

B6

exit

- Flow graph

Strict Dominators and Postdominators

- (d sdom i) if d dominates i and d i
- (p pdom i) if every possible execution path from i to exit includes p

entry

- pdom(entry) =
- pdom(b1) =
- pdom(b2) =
- pdom(b3) =
- pdom(b4) =
- pdom(b5) =
- pdom(b6) =

B1

B2

B3

B4

B5

B6

exit

- Flow graph

Loops

- Back edge – edge whose head dominates its tail
- Loop containing this type of back edge is a natural loop
- i.e. it has a single external entry point
- For back edge b c the loop header is c

entry

B1

- Natural loops =
- Loop header (B3 B1) =
- Loop header (B2 B2) =

B2

B3

exit

Quicksort Example

- How might we optimize this code?

i := m-1

j := n

t1 := 4*n

v := a[t1]

b1

i := i+1

t2 := 4*i

t3 := a[t2]

if t3 < v goto b2

b2

j := j-1

t4 := 4*j

t5 := a[t4]

if t5 > v goto b3

b3

t6 :=4*i

x := a[t6]

t7 := 4*i

t8 := 4*j

t9 := a[t8]

a[t7] :=t9

t10 := 4*j

a[t10] := x

t11 := 4*i

x := a[t11]

t12 := 4*i

t13 := 4*n

t14 := a[t13]

a[t12] := t14

t15 := 4*n

a[t15] := x

if i >= j goto b6

b4

b5

b6

[Quicksort] (i, j, v, x variables are needed outside)

Reaching Definitions

- Informally:
- determine if a particular definition (e.g. “x” in “x = 5”) may reach a given point in the program
- Why reaching definitions may be useful:

x := 5

y := x + 2

if “x := 5” is the only definition reaching “y := x+2”,

it can be simplified to “y := 7”

(constant propagation)

Reaching Definitions

- Definition of a variable X:
- is a statements that assigns (or may assign) a value to X
- unambiguous: X := 3
- ambiguous: foo(X) or *Y := 3
- A definition d reaches a point p :
- if there is a path from the point immediately following d to p,
- such that d is not killed along that path.
- A definition d of variable X is killed along path p
- if there is another definition of X along p.

Reaching Definitions (cont.)

- Has the following properties:
- forward analysis
- “along some path” problem
- Is conservative in that:
- definition d may not define variable X
- along a path p, there is another definition of X, but this other definition is ambiguous
- definition d may be killed along infeasible paths

1

1

2

3

1-2-3

2-3

Data-Flow Analysis: Structured Programs- Most programs are structured:
- sequence of statements
- if-then-else construct
- while-loops (including for-loops, loops with breaks,...)
- For these programs, we may use an inductive (syntax driven) approach:

1-2-3

d: a=b+c

S

S

Reaching Definitions for Structured Programsgen[S] = {d}

kill[S] = All-defs-of-a - {d}

out[S] = gen[S]È ( in[S] - kill[S] )

gen[S] = gen[S2]È ( gen[S1] - kill[S2] )

kill[S] = kill[S2]È ( kill[S1] - gen[S2] )

in[S1] = in[S]

in[S2] = out[S1]

out[S] = out[S2]

S1

S2

S

S

Reaching Definitions for Structured Programs (cont.)gen[S] = gen[S1]Ègen[S2]

kill[S] = kill[S1]Ç kill[S2]

in[S1] = in[S2] = in[S]

out[S] = out[S1]È out[S2]

S1

S2

gen[S] = gen[S1]

kill[S] = kill[S1]

in[S1] = in[S]È gen[S1]

out[S] = out[S1]

S1

Iterative Solution: Data-Flow Equations

- Inductive approach only applicable to structured programs
- because utilizes the structure of the program to synthesize & distribute the data-flow information
- Need a general technique: Iterative Approach
- compute the gen/kill sets of each statement / basic block
- initialize the in/out sets
- repetitively compute out/in sets until a steady state is reached

Reaching Definitions

- Reaching definitions:
- set of definitions that may reach (along one or more paths) a given point
- gen[S]: definition d is in gen[S] if d may reach the end of S, independently of whether it reaches the beginning of S.
- kill[S]: the set of definitions that never reach the end of S, even if they reach the beginning.
- Equations:
- in[S] =È (P a predecessor of S) out[P ]
- out[S] = gen[S] È ( in[S] - kill[S] )

Reaching Definitions (cont.)

- Algorithm:

for each basic block B: out[B] := gen[B]; (1)

do

change := false;

for each basic block B do

in[B] =È (P a predecessor of B) out[P ]; (2)

old-out = out[B]; (3)

out[B] = gen[B] È (in[B] - kill[B]); (4)

if (out[B] != old-out) then change := true; (5)

end

while change

gen[b1] := {d1, d2, d3}

kill[b1] := {d4, d5, d6, d7}

i := m-1 d1

j := n d2

a := u1 d3

b1

gen[b2] := {}

kill[b2] := {}

gen[b3] := {}

kill[b3] := {}

i := i+1 d4

j := j-1 d5

b2

gen[b4] := {}

kill[b4] := {}

a := u2 d6

b3

i := u3 d7

b4

Example for Reaching DefinitionsCompute gen/kill and iterate (visiting order: b1, b2, b3, b4)

b1

b2

b3

b4

initial

in[B]

000 0000

000 0000

000 0000

000 0000

out[B]

000 0000

000 0000

000 0000

000 0000

pass1

in[B]

000 0000

000 0000

000 0000

000 0000

out[B]

000 0000

000 0000

000 0000

000 0000

pass2

in[B]

000 0000

000 0000

000 0000

000 0000

out[B]

000 0000

000 0000

000 0000

000 0000

pass3

in[B]

000 0000

000 0000

000 0000

000 0000

out[B]

000 0000

000 0000

000 0000

000 0000

Generalizations: Other Data-Flow Analyses

- Reaching definitions is a (forward; some-path) analysis
- For backward analysis:
- interchange in / out sets in the previous algorithm, lines (1-5)
- For all-path analysis:
- intersection is substituted for union in line (2)

Common Subexpression Elimination

- Rule used to eliminate subexpression within a basic block
- The subexpression was already defined
- The value of the subexpression is not modified
- i.e. none of the values needed to compute the subexpression are redefined
- What about eliminating subexpressions across basic blocks?

Available Expressions

- An expression x+y is available at a point p:
- if every path from the initial node to p evaluates x+y, and
- after the last such evaluation, prior to reaching p, there are no subsequent assignments to x or y.
- Definitions:
- forward, all-path,
- e-gen[S]: expressions definitely generated by S,
- e.g. “z := x+y”: expression “x+y” is generated
- e-kill[S]: expressions that may be killed by S
- e.g. “z := x+y”: all expression containing “z” are killed.
- order: compute e-gen and then e-kill, e.g. “x:= x+y”

Available Expressions (cont.)

- Algorithm:

for each basic block B: out[B] := e-gen[B]; (1)

do

change := false;

for each basic block B do

in[B] = Ç (P a predecessor of B) out[P]; (2)

old-out = out[B]; (3)

out[B] = e-gen[B] È (in[B] - e-kill[B]); (4)

if (out[B] != old-out) then change := true; (5)

end

while change

difference: line (2), use intersection instead of union

Pointer Analysis

- Identify the memory locations that may be addressed by a pointer
- may be formalized as a system of data-flow equations.
- Simple programming model:
- pointer to integer (or float, arrays of integer, arrays of float)
- no pointer to pointers allowed
- Definitions:
- in[S]: the set of pairs (p, a), where p is a pointer, a is a variables, and p might point to a before statement S.
- out[S]: the set of pairs (p, a), where p might point to a after statement S.
- gen[S]: the new pairs (p, a) generated by the statement S.
- kill[S]: the pairs (p, a) killed by the statement S.

S: a=b+c

S: p = &a

S: p = q

Pointer Analysis (cont.)input set

gen[S ] = { }

kill[S ] = { }

input set

gen[S ] = { (p, a) }

kill[S, input set ] = { (p, b)

| (p, b) is in input set }

input set

gen[S, input set ] = { (p, b)

| (q, b) is in input set }

kill[S, input set ] = { (p, b)

| (p, b) is in input set }

Pointer Analysis (cont.)

- Algorithm:

for each basic block B: out[B] := gen[B ]; (1)

do

change := false;

for each basic block B do

in[B] =È (P a predecessor of B) out[P]; (2)

old-out = out[B]; (3)

out[B] = gen[B, in[B] ] È (in[B] - kill[B, in[B] ] ) (4)

if (out[B] != old-out) then change := true; (5)

end

while change

difference: line (4): gen and kill are functions of B and in[B].

Performance of Iterative Solutions

- Global analysis may be memory-space / computing intensive
- May be reduced by
- using bitvector representations for sets
- analyzing only relevant variables
- e.g. temporary variables may be ignored
- synthesizing data-flow within basic block
- mixing inductive and iterative solutions
- suitably ordering the basic block
- e.g. depth first order is good for forward analysis
- limiting scope
- may reduce the precision of analysis

Summary

- Iterative algorithm:
- solve data-flow problem for arbitrary control flow graph
- To solve a new data-flow problem:
- define gen/kill accordingly
- determine properties:
- forward / backward
- some-path / all-path

Download Presentation

Connecting to Server..