


Data-Flow Analysis II

CS 671

March 13, 2008


Data-Flow Analysis

  • Gather conservative, approximate information about what a program does

  • Result: some property that holds every time the instruction executes

  • The Data-Flow Abstraction

  • Execution of an instruction transforms program state

  • To analyze a program, we must consider all possible sequences of program points (paths)

  • Summarize all possible program states with a finite set of facts

    • Limitation: may consider some infeasible paths


The General Approach

  • Set up and solve systems of equations that relate information at various points in the program

  • For example, out[S] = gen[S] ∪ ( in[S] - kill[S] ), where

    • S is a statement

    • in[S] and out[S] are the information before and after S

    • gen[S] and kill[S] are the information generated and killed by S

  • The definitions of in, out, gen, and kill depend on the desired information
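
The equation above can be sketched with Python sets. This is only an illustration: the function name `transfer` and the definition labels d1, d2, d3 are hypothetical and do not come from the slides.

```python
def transfer(gen, kill, in_set):
    """Apply out[S] = gen[S] ∪ (in[S] − kill[S]) for one statement S."""
    return gen | (in_set - kill)


# Hypothetical facts: d1 and d2 both define x, d3 defines y.
in_s = {"d1", "d3"}
out_s = transfer(gen={"d2"}, kill={"d1"}, in_set=in_s)
print(sorted(out_s))  # ['d2', 'd3']
```

The statement generates d2, kills the older definition d1 of the same variable, and lets d3 pass through unchanged.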


    Data-Flow Analysis (cont.)

    • Properties:

      • either a forward analysis (out as function of in) or

      • a backward analysis (in as a function of out).

      • either an “along some path” problem or

      • an “along all paths” problem.

      • Data-flow analysis must be conservative

    • Definitions:

      • A point lies between two statements (or before the first statement, or after the last)

      • A path is a sequence of consecutive points in the control-flow graph


    Example – Live Variables

    • Steps:

      • Set up live sets for each program point

      • Instantiate equations

      • Solve equations

    if (c)

    x = y+1

    y = 2*z

    if (d)

    x = y+z

    z = 1

    z = x


    Example

    • Program points

    [Figure: control-flow graph of the code above, with program points L1 to L12 placed before and after the statements and the basic blocks numbered 1 to 7]


    Example

    in[I] = ( out[I] – def[I] ) ∪ use[I]

    out[B] = ∪ in[B’] over all B’ ∈ succ(B)

    [Figure: the same control-flow graph, with each live set L1 to L12 initialized to { }]
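
As a rough sketch of the per-instruction equation in[I] = (out[I] – def[I]) ∪ use[I], the snippet below walks three statements from the example backwards. The helper name `live_sets` and the chosen `live_out` set are assumptions made for illustration; the slide does not give the live set at the end of this block.

```python
def live_sets(instrs, live_out):
    """Return the live set before each instruction, walking backwards.

    instrs is a list of (def_set, use_set) pairs in program order.
    """
    live = set(live_out)
    before = []
    for defs, uses in reversed(instrs):
        live = (live - defs) | uses      # in[I] = (out[I] − def[I]) ∪ use[I]
        before.append(set(live))
    before.reverse()
    return before


# Three statements from the example; live_out is an assumed value.
block = [
    ({"x"}, {"y"}),   # x = y+1
    ({"y"}, {"z"}),   # y = 2*z
    (set(), {"d"}),   # if (d)
]
for live in live_sets(block, live_out={"x", "y", "z"}):
    print(sorted(live))
```

Because liveness is a backward analysis, the walk starts from the set assumed live after the last statement and ends with the set live before the first.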


    More Terminology

    • Successors

    • Succ(B1) =

    • Succ(B2) =

    • Succ(B3) =

    • Predecessors

    • Pred(B2) =

    • Pred(B3) =

    • Pred(B4) =

    [Figure: flow graph with nodes B1, B2, B3, and B4]

    • Branch node – more than one successor

    • Join node – more than one predecessor


    Dominators

    • Dominance is a binary relation on the flow graph nodes that allows us to easily find loops

    • Node d dominates node i (d dom i) if every possible execution path from entry to i includes d

    • Dominance is:

      • Reflexive – every node dominates itself

      • Transitive – if a dom b and b dom c, then a dom c

      • Antisymmetric – if a dom b and b dom a then a=b

    • dom(entry) =

    • dom(b1) =

    • dom(b2) =

    • dom(b3) =

    • dom(b4) =

    • dom(b5) =

    • dom(b6) =

    • dom(exit) =

    [Figure: flow graph with nodes entry, B1 to B6, and exit]
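
One common way to compute dominator sets is a simple fixed-point iteration: dom(n) = {n} ∪ (intersection of dom(p) over the predecessors p of n). The sketch below uses a small hypothetical diamond-shaped graph, since the edges of the slide's flow graph are not part of this transcript; it also assumes every node is reachable from the entry.

```python
def dominators(succ, entry):
    """Iteratively compute dom(n) for every node of a flow graph.

    succ maps each node to a list of successors; every node must be a key.
    """
    nodes = set(succ)
    pred = {n: set() for n in nodes}
    for n, targets in succ.items():
        for t in targets:
            pred[t].add(n)

    # Start from "everything dominates everything" and shrink.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            pred_doms = [dom[p] for p in pred[n]]
            new = {n} | (set.intersection(*pred_doms) if pred_doms else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


# Hypothetical graph: entry -> a; a -> b, c; b -> d; c -> d; d -> exit
succ = {"entry": ["a"], "a": ["b", "c"], "b": ["d"],
        "c": ["d"], "d": ["exit"], "exit": []}
dom = dominators(succ, "entry")
print(sorted(dom["d"]))  # ['a', 'd', 'entry']
```

Neither b nor c dominates d, because each can be bypassed through the other branch.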


    Immediate dominators

    • idom(b) = a iff (a ≠ b) and (a dom b) and there is no node c, different from a and b, such that (a dom c) and (c dom b)

    • Idom of a node is unique

    • Idom relationship forms a tree whose root is the entry node

    • idom(b1) =

    • idom(b2) =

    • idom(b3) =

    • idom(b4) =

    • idom(b5) =

    • idom(b6) =

    • idom(exit) =

    [Figure: flow graph with nodes entry, B1 to B6, and exit]
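
Given dominator sets, the immediate dominator can be picked out directly from the definition: the strict dominators of a node form a chain, and the closest one is the one with the largest dominator set of its own. The sketch below assumes a hypothetical graph (entry -> a; a -> b, c; b -> d; c -> d; d -> exit) whose dominator sets are written out by hand; it is not the slide's graph.

```python
def immediate_dominators(dom, entry):
    """Derive idom(n) from full dominator sets dom[n]."""
    idom = {}
    for n, ds in dom.items():
        if n == entry:
            continue                      # the entry node has no idom
        strict = ds - {n}
        # The closest strict dominator is dominated by all the others,
        # so it has the largest dominator set.
        idom[n] = max(strict, key=lambda s: len(dom[s]))
    return idom


# Dominator sets of the hypothetical graph described above.
dom = {
    "entry": {"entry"},
    "a": {"entry", "a"},
    "b": {"entry", "a", "b"},
    "c": {"entry", "a", "c"},
    "d": {"entry", "a", "d"},
    "exit": {"entry", "a", "d", "exit"},
}
idom = immediate_dominators(dom, "entry")
print(idom["d"], idom["exit"])  # a d
```

Reading `idom` as parent links gives the dominator tree rooted at the entry node.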


    Strict Dominators and Postdominators

    • (d sdom i) if d dominates i and d ≠ i

    • (p pdom i) if every possible execution path from i to exit includes p

    • pdom(entry) =

    • pdom(b1) =

    • pdom(b2) =

    • pdom(b3) =

    • pdom(b4) =

    • pdom(b5) =

    • pdom(b6) =

    [Figure: flow graph with nodes entry, B1 to B6, and exit]


    Loops

    • Back edge – edge whose head dominates its tail

    • Loop containing this type of back edge is a natural loop

      • i.e. it has a single external entry point

    • For back edge b → c the loop header is c

    • Natural loops =

    • Loop header (B3 → B1) =

    • Loop header (B2 → B2) =

    [Figure: flow graph with nodes entry, B1, B2, B3, and exit]
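
A possible way to find back edges and their natural loops is sketched below. The back edges B3 → B1 and B2 → B2 come from the slide, but the remaining edges (entry → B1, B1 → B2, B2 → B3, B3 → exit) and the hand-written dominator sets are assumptions, since the figure's edges are missing from this transcript.

```python
def natural_loops(succ, dom):
    """Map each back edge (tail, head) to the node set of its natural loop."""
    pred = {n: [] for n in succ}
    for n, targets in succ.items():
        for t in targets:
            pred[t].append(n)

    loops = {}
    for tail, targets in succ.items():
        for head in targets:
            if head in dom[tail]:         # back edge: head dominates tail
                body = {head}
                stack = [tail]
                while stack:              # walk predecessors, stopping at head
                    n = stack.pop()
                    if n not in body:
                        body.add(n)
                        stack.extend(pred[n])
                loops[(tail, head)] = body
    return loops


# Assumed edges; only the two back edges are given on the slide.
succ = {"entry": ["B1"], "B1": ["B2"], "B2": ["B2", "B3"],
        "B3": ["B1", "exit"], "exit": []}
dom = {"entry": {"entry"}, "B1": {"entry", "B1"},
       "B2": {"entry", "B1", "B2"}, "B3": {"entry", "B1", "B2", "B3"},
       "exit": {"entry", "B1", "B2", "B3", "exit"}}
for (tail, head), body in sorted(natural_loops(succ, dom).items()):
    print(f"{tail} -> {head}: header {head}, body {sorted(body)}")
```

Under these assumed edges the self-loop on B2 is nested inside the loop headed by B1.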


    Quicksort Example

    • How might we optimize this code?

    b1:
      i := m-1
      j := n
      t1 := 4*n
      v := a[t1]

    b2:
      i := i+1
      t2 := 4*i
      t3 := a[t2]
      if t3 < v goto b2

    b3:
      j := j-1
      t4 := 4*j
      t5 := a[t4]
      if t5 > v goto b3

    b4:
      if i >= j goto b6

    b5:
      t6 := 4*i
      x := a[t6]
      t7 := 4*i
      t8 := 4*j
      t9 := a[t8]
      a[t7] := t9
      t10 := 4*j
      a[t10] := x

    b6:
      t11 := 4*i
      x := a[t11]
      t12 := 4*i
      t13 := 4*n
      t14 := a[t13]
      a[t12] := t14
      t15 := 4*n
      a[t15] := x

    [Quicksort] (the variables i, j, v, and x are needed outside this fragment)


    Reaching Definitions

    • Informally:

      • determine if a particular definition (e.g. “x” in “x = 5”) may reach a given point in the program

    • Why reaching definitions may be useful:

    x := 5

    y := x + 2

    if “x := 5” is the only definition reaching “y := x+2”,

    it can be simplified to “y := 7”

    (constant propagation)


    Reaching Definitions

    • Definition of a variable X:

      • is a statement that assigns (or may assign) a value to X

      • unambiguous: X := 3

      • ambiguous: foo(X) or *Y := 3

    • A definition d reaches a point p :

      • if there is a path from the point immediately following d to p,

      • such that d is not killed along that path.

    • A definition d of variable X is killed along path p

      • if there is another definition of X along p.


    Reaching Definitions (cont.)

    • Has the following properties:

      • forward analysis

      • “along some path” problem

    • Is conservative in that:

      • definition d may not define variable X

      • along a path p, there is another definition of X, but this other definition is ambiguous

      • definition d may be killed along infeasible paths



    Data-Flow Analysis: Structured Programs

    • Most programs are structured:

      • sequence of statements

      • if-then-else construct

      • while-loops (including for-loops, loops with breaks,...)

    • For these programs, we may use an inductive (syntax-driven) approach


    Reaching Definitions for Structured Programs

    • Single assignment: S is d: a = b+c

      gen[S] = {d}

      kill[S] = All-defs-of-a - {d}

      out[S] = gen[S] ∪ ( in[S] - kill[S] )

    • Sequence: S is S1 followed by S2

      gen[S] = gen[S2] ∪ ( gen[S1] - kill[S2] )

      kill[S] = kill[S2] ∪ ( kill[S1] - gen[S2] )

      in[S1] = in[S]

      in[S2] = out[S1]

      out[S] = out[S2]


    Reaching Definitions for Structured Programs (cont.)

    • If-then-else: S branches to S1 or S2

      gen[S] = gen[S1] ∪ gen[S2]

      kill[S] = kill[S1] ∩ kill[S2]

      in[S1] = in[S2] = in[S]

      out[S] = out[S1] ∪ out[S2]

    • Loop: S repeats the body S1

      gen[S] = gen[S1]

      kill[S] = kill[S1]

      in[S1] = in[S] ∪ gen[S1]

      out[S] = out[S1]
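
The inductive gen/kill rules for structured programs can be sketched as a recursive function over a small syntax tree. The tuple encoding ("assign", "seq", "if", "while") and the `defs_of` table are hypothetical conveniences for this illustration, not something the slides define.

```python
def gen_kill(stmt, defs_of):
    """Return (gen, kill) for a structured statement, bottom-up."""
    kind = stmt[0]
    if kind == "assign":                  # ("assign", label, variable)
        _, d, var = stmt
        return {d}, defs_of[var] - {d}
    if kind == "seq":                     # ("seq", S1, S2)
        g1, k1 = gen_kill(stmt[1], defs_of)
        g2, k2 = gen_kill(stmt[2], defs_of)
        return g2 | (g1 - k2), k2 | (k1 - g2)
    if kind == "if":                      # ("if", S1, S2)
        g1, k1 = gen_kill(stmt[1], defs_of)
        g2, k2 = gen_kill(stmt[2], defs_of)
        return g1 | g2, k1 & k2
    if kind == "while":                   # ("while", S1)
        return gen_kill(stmt[1], defs_of)
    raise ValueError(f"unknown statement kind: {kind}")


# Hypothetical program: d1: a = ...; if (...) d2: b = ... else d3: a = ...
defs_of = {"a": {"d1", "d3"}, "b": {"d2"}}
program = ("seq", ("assign", "d1", "a"),
                  ("if", ("assign", "d2", "b"), ("assign", "d3", "a")))
gen, kill = gen_kill(program, defs_of)
print(sorted(gen), sorted(kill))  # ['d1', 'd2', 'd3'] []
```

Here d1 still reaches the end of the program because only one of the two branches redefines a, which is why the if rule intersects the kill sets.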


    Iterative Solution: Data-Flow Equations

    • The inductive approach is only applicable to structured programs

      • because it uses the structure of the program to synthesize and distribute the data-flow information

    • Need a general technique: Iterative Approach

      • compute the gen/kill sets of each statement / basic block

      • initialize the in/out sets

      • repeatedly recompute the out/in sets until a steady state (fixed point) is reached


    Reaching Definitions

    • Reaching definitions:

      • set of definitions that may reach (along one or more paths) a given point

      • gen[S]: definition d is in gen[S] if d may reach the end of S, independently of whether it reaches the beginning of S.

      • kill[S]: the set of definitions that never reach the end of S, even if they reach the beginning.

    • Equations:

      • in[S] = ∪ out[P] over all predecessors P of S

      • out[S] = gen[S] ∪ ( in[S] - kill[S] )


    Reaching Definitions (cont.)

    • Algorithm:

      for each basic block B: out[B] := gen[B];                  (1)
      do
        change := false;
        for each basic block B do
          in[B] := ∪ out[P] over all predecessors P of B;        (2)
          old-out := out[B];                                     (3)
          out[B] := gen[B] ∪ (in[B] - kill[B]);                  (4)
          if (out[B] != old-out) then change := true;            (5)
        end
      while change


    Example for Reaching Definitions

    b1:  i := m-1   (d1)
         j := n     (d2)
         a := u1    (d3)
         gen[b1] := {d1, d2, d3}
         kill[b1] := {d4, d5, d6, d7}

    b2:  i := i+1   (d4)
         j := j-1   (d5)
         gen[b2] := {d4, d5}
         kill[b2] := {d1, d2, d7}

    b3:  a := u2    (d6)
         gen[b3] := {d6}
         kill[b3] := {d3}

    b4:  i := u3    (d7)
         gen[b4] := {d7}
         kill[b4] := {d1, d4}

    Compute gen/kill and iterate (visiting order: b1, b2, b3, b4)

    Each bit vector lists d1 d2 d3, then d4 d5 d6 d7. The entries below are unfilled placeholders.

                       b1         b2         b3         b4
    initial  in[B]     000 0000   000 0000   000 0000   000 0000
             out[B]    000 0000   000 0000   000 0000   000 0000
    pass1    in[B]     000 0000   000 0000   000 0000   000 0000
             out[B]    000 0000   000 0000   000 0000   000 0000
    pass2    in[B]     000 0000   000 0000   000 0000   000 0000
             out[B]    000 0000   000 0000   000 0000   000 0000
    pass3    in[B]     000 0000   000 0000   000 0000   000 0000
             out[B]    000 0000   000 0000   000 0000   000 0000
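
The iterative algorithm, lines (1) to (5), can be run on this example with the sketch below. Two things are assumptions rather than facts from the slides: the flow-graph edges (b1 → b2, b2 → b3, b2 → b4, b3 → b4, b4 → b2), which are not recoverable from this transcript, and the gen/kill sets for b2 to b4, which are derived here from the listed statements.

```python
def reaching_definitions(blocks, pred, gen, kill):
    out = {b: set(gen[b]) for b in blocks}                        # line (1)
    in_ = {b: set() for b in blocks}
    passes = 0
    change = True
    while change:
        change = False
        passes += 1
        for b in blocks:
            in_[b] = set().union(*(out[p] for p in pred[b]))      # line (2)
            new_out = gen[b] | (in_[b] - kill[b])                 # line (4)
            if new_out != out[b]:                                 # lines (3), (5)
                out[b] = new_out
                change = True
    return in_, out, passes


def bits(s):
    """Render a set of definitions as a d1..d7 bit vector, e.g. '001 1110'."""
    v = "".join("1" if f"d{i}" in s else "0" for i in range(1, 8))
    return v[:3] + " " + v[3:]


blocks = ["b1", "b2", "b3", "b4"]
# ASSUMED edges, expressed as predecessor lists.
pred = {"b1": [], "b2": ["b1", "b4"], "b3": ["b2"], "b4": ["b2", "b3"]}
gen = {"b1": {"d1", "d2", "d3"}, "b2": {"d4", "d5"},
       "b3": {"d6"}, "b4": {"d7"}}
kill = {"b1": {"d4", "d5", "d6", "d7"}, "b2": {"d1", "d2", "d7"},
        "b3": {"d3"}, "b4": {"d1", "d4"}}

in_, out, passes = reaching_definitions(blocks, pred, gen, kill)
for b in blocks:
    print(b, "in", bits(in_[b]), "out", bits(out[b]))
```

With these assumed edges and the visiting order b1, b2, b3, b4, nothing changes during the third pass, so the loop stops there.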


    Generalizations: Other Data-Flow Analyses

    • Reaching definitions is a (forward; some-path) analysis

    • For backward analysis:

      • interchange the in and out sets (and predecessors with successors) in lines (1)-(5) of the previous algorithm

    • For all-path analysis:

      • intersection is substituted for union in line (2)


    Common Subexpression Elimination

    • Rules for eliminating a subexpression within a basic block:

      • The subexpression was already computed

      • The value of the subexpression has not been modified since

        • i.e. none of the values needed to compute the subexpression are redefined

    • What about eliminating subexpressions across basic blocks?


    Available Expressions

    • An expression x+y is available at a point p:

      • if every path from the initial node to p evaluates x+y, and

      • after the last such evaluation, prior to reaching p, there are no subsequent assignments to x or y.

    • Definitions:

      • forward, all-path,

      • e-gen[S]: expressions definitely generated by S,

        • e.g. “z := x+y”: expression “x+y” is generated

      • e-kill[S]: expressions that may be killed by S

        • e.g. “z := x+y”: all expressions containing “z” are killed.

      • order: apply e-gen first and then e-kill, so that “x := x+y” does not leave “x+y” available
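
The ordering rule (generate first, then kill) can be sketched for a whole basic block as follows. Expressions are encoded as hypothetical (left, operator, right) tuples, and `universe` stands for the set of all expressions in the program; both are assumptions made for this illustration.

```python
def block_gen_kill(stmts, universe):
    """Compute (e_gen, e_kill) for a block of 'target := expr' statements."""
    e_gen, e_kill = set(), set()
    for target, expr in stmts:
        # Step 1: the statement generates its right-hand side.
        e_gen.add(expr)
        e_kill.discard(expr)
        # Step 2: assigning to target kills every expression that uses it.
        killed = {e for e in universe if target in (e[0], e[2])}
        e_gen -= killed
        e_kill |= killed
    return e_gen, e_kill


x_plus_y = ("x", "+", "y")
z_times_w = ("z", "*", "w")
universe = {x_plus_y, z_times_w}

print(block_gen_kill([("z", x_plus_y)], universe))
print(block_gen_kill([("z", x_plus_y), ("x", x_plus_y)], universe))
```

The first block leaves x+y available and kills z*w. In the second block the final statement x := x+y generates x+y and then immediately kills it, so e_gen ends up empty.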


    Available Expressions (cont.)

    • Algorithm:

      for each basic block B: out[B] := e-gen[B];                (1)
      do
        change := false;
        for each basic block B do
          in[B] := ∩ out[P] over all predecessors P of B;        (2)
          old-out := out[B];                                     (3)
          out[B] := e-gen[B] ∪ (in[B] - e-kill[B]);              (4)
          if (out[B] != old-out) then change := true;            (5)
        end
      while change

      Difference from reaching definitions: line (2) uses intersection instead of union
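
A minimal sketch of this all-path iteration is given below. The `init_out` parameter is an addition for this illustration, so that the slide's initialization (out[B] := e-gen[B]) can be compared with a start where every block initially holds all expressions. The two-block graph is hypothetical, and a block without predecessors is assumed to start with an empty in set.

```python
def available(blocks, pred, e_gen, e_kill, init_out):
    out = {b: set(init_out[b]) for b in blocks}                   # line (1)
    in_ = {b: set() for b in blocks}
    change = True
    while change:
        change = False
        for b in blocks:
            preds = [out[p] for p in pred[b]]
            # line (2); a block with no predecessors gets the empty set
            in_[b] = set.intersection(*preds) if preds else set()
            new_out = e_gen[b] | (in_[b] - e_kill[b])             # line (4)
            if new_out != out[b]:                                 # lines (3), (5)
                out[b] = new_out
                change = True
    return in_, out


blocks = ["b1", "b2"]
pred = {"b1": [], "b2": ["b1", "b2"]}     # b2 loops back to itself
e_gen = {"b1": {"x+y"}, "b2": set()}      # b1 computes x+y
e_kill = {"b1": set(), "b2": set()}       # nothing assigns x or y

in_slide, _ = available(blocks, pred, e_gen, e_kill, init_out=e_gen)
full = {"b1": {"x+y"}, "b2": {"x+y"}}
in_full, _ = available(blocks, pred, e_gen, e_kill, init_out=full)
print(in_slide["b2"], in_full["b2"])      # set() {'x+y'}
```

Both results satisfy the equations. Starting from e-gen is safe but reports x+y as unavailable inside the loop, which is one reason textbook presentations often start the non-entry blocks from the full set of expressions.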


    Pointer Analysis

    • Identify the memory locations that may be addressed by a pointer

      • may be formalized as a system of data-flow equations.

    • Simple programming model:

      • pointer to integer (or float, arrays of integer, arrays of float)

      • no pointer to pointers allowed

    • Definitions:

      • in[S]: the set of pairs (p, a), where p is a pointer, a is a variable, and p might point to a before statement S.

      • out[S]: the set of pairs (p, a), where p might point to a after statement S.

      • gen[S]: the new pairs (p, a) generated by the statement S.

      • kill[S]: the pairs (p, a) killed by the statement S.


    Pointer Analysis (cont.)

    • S: a = b+c

      gen[S] = { }

      kill[S] = { }

    • S: p = &a

      gen[S] = { (p, a) }

      kill[S, input set] = { (p, b) | (p, b) is in input set }

    • S: p = q

      gen[S, input set] = { (p, b) | (q, b) is in input set }

      kill[S, input set] = { (p, b) | (p, b) is in input set }


    Pointer Analysis (cont.)

    • Algorithm:

      for each basic block B: out[B] := gen[B];                            (1)
      do
        change := false;
        for each basic block B do
          in[B] := ∪ out[P] over all predecessors P of B;                  (2)
          old-out := out[B];                                               (3)
          out[B] := gen[B, in[B]] ∪ (in[B] - kill[B, in[B]]);              (4)
          if (out[B] != old-out) then change := true;                      (5)
        end
      while change

      Difference from reaching definitions: in line (4), gen and kill are functions of B and in[B].
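
To illustrate why gen and kill depend on the input set, here is a sketch of the per-statement transfer function for the three statement forms. The tuple encoding ("addr", "copy", "other") and the small branching example are hypothetical.

```python
def transfer(stmt, in_set):
    """out[S] = gen[S, in] ∪ (in − kill[S, in]) for points-to pairs (p, a)."""
    kind = stmt[0]
    if kind == "addr":                    # ("addr", p, a) means p = &a
        _, p, a = stmt
        gen = {(p, a)}
        kill = {(ptr, tgt) for (ptr, tgt) in in_set if ptr == p}
    elif kind == "copy":                  # ("copy", p, q) means p = q
        _, p, q = stmt
        gen = {(p, tgt) for (ptr, tgt) in in_set if ptr == q}
        kill = {(ptr, tgt) for (ptr, tgt) in in_set if ptr == p}
    else:                                 # e.g. a = b+c: no pointer effect
        gen, kill = set(), set()
    return gen | (in_set - kill)


# Hypothetical program: if (...) p = &a else p = &b;  q = p;  p = &c
then_out = transfer(("addr", "p", "a"), set())
else_out = transfer(("addr", "p", "b"), set())
joined = then_out | else_out              # some-path problem: union at the join
after_copy = transfer(("copy", "q", "p"), joined)
after_addr = transfer(("addr", "p", "c"), after_copy)
print(sorted(after_addr))  # [('p', 'c'), ('q', 'a'), ('q', 'b')]
```

After the join p may point to a or b, so the copy q = p gives q both targets, and the later p = &c replaces only the pairs for p.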


    Performance of Iterative Solutions

    • Global analysis may be memory- and compute-intensive

    • The cost may be reduced by

      • using bitvector representations for sets

      • analyzing only relevant variables

        • e.g. temporary variables may be ignored

      • summarizing the data flow within each basic block

      • mixing inductive and iterative solutions

      • suitably ordering the basic blocks

        • e.g. depth-first order is good for forward analysis

      • limiting the scope of the analysis

        • may reduce the precision of the analysis
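
As a sketch of the bitvector idea, a set of definitions can be packed into a single integer, so that union, intersection, and difference become single bitwise operations. The definition names and the sample gen/kill/in sets below are hypothetical values chosen for illustration.

```python
DEFS = ["d1", "d2", "d3", "d4", "d5", "d6", "d7"]


def to_bits(names):
    """Pack a collection of definition names into an integer bit vector."""
    v = 0
    for n in names:
        v |= 1 << DEFS.index(n)
    return v


def to_names(bits):
    """Unpack an integer bit vector back into a list of names."""
    return [d for i, d in enumerate(DEFS) if (bits >> i) & 1]


gen = to_bits(["d4", "d5"])
kill = to_bits(["d1", "d2", "d7"])
in_ = to_bits(["d1", "d2", "d3", "d5", "d6", "d7"])

out = gen | (in_ & ~kill)      # out = gen ∪ (in − kill)
print(to_names(out))           # ['d3', 'd4', 'd5', 'd6']
```

Set difference is written as `in_ & ~kill`, so the whole transfer function costs a few machine-word operations per block when the number of facts fits in a word.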


    Summary

    • Iterative algorithm:

      • solves a data-flow problem for an arbitrary control-flow graph

    • To solve a new data-flow problem:

      • define gen/kill accordingly

      • determine properties:

        • forward / backward

        • some-path / all-path

