CMPUT680

CMPUT680 Topic G: Static Single-Assignment Form José Nelson Amaral CMPUT 680 - Compiler Design and Optimization

Reading Material Chapter 19 of the “Tiger book” (with a grain of salt!!). Bilardi, G., Pingali, K., “The Static Single Assignment Form and its Computation,” unpublished? (citeseer). Cytron, R., Ferrante, J., Rosen, B. K., Wegman, M. N., Zadeck, F. K., “An Efficient Method of Computing Static Single Assignment Form,” ACM Symposium on Principles of Programming Languages (PoPL), pp. 25-35, Austin, TX, Jan., 1989. Cytron, R., Ferrante, J., Rosen, B. K., Wegman, M. N., “Efficiently Computing Static Single Assignment Form and the Control Dependence Graph,”ACM Transactions on Programming Languages and Systems (TOPLAS), Vol. 13, No. 4, October, 1991, pp. 451-490. Sreedhar, V. C., Gao, G. R., “A Linear Time Algorithm for Placing -Nodes,” ACM Symposium on Principles of Programming Languages (PoPL), pp. 62-73, 1995. CMPUT 680 - Compiler Design and Optimization

Static Single-Assignment Form Each variable has only one definition in the program text. This single staticdefinition can be in a loop and may be executed many times. Thus even in a program expressed in SSA, a variable can be dynamically defined many times. CMPUT 680 - Compiler Design and Optimization

Advantages of SSA Simpler dataflow analysis No need to use use-def/def-use chains, which requires N∗M space for N uses and M definitions SSA form relates in a useful way with dominance structures. SSA simplifies algorithms that construct interference graphs. CMPUT 680 - Compiler Design and Optimization

Main Goal of SSA • To create a sparse representation of the flow of values in a computer program. • Prevents propagation of a value through regions of the program that do not use it. • Factorize value propagation • If p definitions must be propagated to q uses, a dense flow representation requires p×q edges. An SSA representation requires only p+q edges. CMPUT 680 - Compiler Design and Optimization

SSA Form in Control-Flow Path Merges Is this code in SSA form? B1 b ← M[x] a ← 0 No, two definitions of a appear in the code (in B1 and B3) B2 if b<4 How can we transform this code into a code in SSA form? B3 a ← b We can create two versions of a, one for B1 and another for B3. B4 c ← a + b CMPUT 680 - Compiler Design and Optimization

We define a fictional function that “knows” which control path was taken to reach the basic block B4: SSA Form in Control-Flow Path Merges But which version should we use in B4 now? B1 b ← M[x] a1 ← 0 B2 if b<4 B3 a2 ← b B4 c ←a?+ b CMPUT 680 - Compiler Design and Optimization

We define a fictional function that “knows” which control path was taken to reach the basic block B4: SSA Form in Control-Flow Path Merges But which version should we use in B4 now? B1 b ← M[x] a1 ← 0 B2 if b<4 B3 a2 ← b B4 a3 ← ψ(a2,a1) c ← a3 + b CMPUT 680 - Compiler Design and Optimization

SSA Form in Control-Flow Path Merges B1 b ← M[x] a1 ← 0 In some compilers the actual representation of this Φ function is: Φ(a1,a2,B2,B3) B2 if b<4 B3 a2 ← b B4 a3 ← ϕ(a2,a1) c ← a3 + b CMPUT 680 - Compiler Design and Optimization

A Loop Example a ← 0 a0 ← undef b0 ← undef c0 ← undef a1 ← 0 b ← a+1 c ← c+b a ← b*2 if a < N b1 ← a?+1 c1 ← c?+b1 a2 ← b1*2 if a2 < N return return CMPUT 680 - Compiler Design and Optimization

A Loop Example a ← 0 a0 ← undef b0 ← undef c0 ← undef a1 ← 0 b ← a+1 c ← c+b a ← b*2 if a < N a3 ← Φ(a1,a2) b1 ← a3+1 c1 ← c?+b1 a2 ← b1*2 if a2 < N return return CMPUT 680 - Compiler Design and Optimization

A Loop Example a ← 0 a0 ← undef b0 ← undef c0 ← undef a1 ← 0 b ← a+1 c ← c+b a ← b*2 if a < N a3 ← Φ(a1,a2) c2 ← Φ(c0,c1) b1 ← a3+1 c1 ← c2+b1 a2 ← b1*2 if a2 < N return return CMPUT 680 - Compiler Design and Optimization

A Loop Example a ← 0 a0 ← undef b0 ← undef c0 ← undef a1 ← 0 b ← a+1 c ← c+b a ← b*2 if a < N a3 ← ϕ(a1,a2) c2 ← ϕ(c0,c1) b2 ← ϕ(b0,b1) b1 ← a3+1 c1 ← c2+b1 a2 ← b1*2 if a2 < N return Φ(b0,b1) is not necessary because b2 is never used. But the phase that generates Φ functions does not know it. Unnecessary Φ functions are later eliminated by dead code elimination. return CMPUT 680 - Compiler Design and Optimization

The Φ Function How can we implement a Φ function that “knows” which control path was taken? Answer 1: We don’t!! The Φ function is used only to connect use to definitions during optimization, but is never implemented. Answer 2: If we must execute the Φ function, we can implement it by inserting MOVE instructions in all control paths. CMPUT 680 - Compiler Design and Optimization

Criteria for Inserting Φ Functions We could insert one Φ function for each variable at every joinpoint (a point in the CFG with more than one predecessor). But that would be wasteful. What criteria should we use to insert a Φ function for a variable a at nodez of the CFG? Intuitively, we should add a function Φ if there are two definitions of a that can reach the point z through distinct paths. CMPUT 680 - Compiler Design and Optimization

Path Convergence Criterion (Cytron-Ferrante/89) Insert a Φ function for a variable a at a node z if allthe following conditions are true: 1. There is a block x that defines a 2. There is a block y≠ x that defines a 3. There are paths x→z and y→z 4. Paths x→z and y→z don’t have any nodes in common other than z 5. The node z does not appear in bothx→z and y→z prior to the end, but it may appear in one or the other. Note: The start node contains an implicit definition of every variable. CMPUT 680 - Compiler Design and Optimization

Examples y y y y x x x x a ← ⋅⋅⋅ a ← ⋅⋅⋅ a ← ⋅⋅⋅ a ← ⋅⋅⋅ a ← ⋅⋅⋅ a ← ⋅⋅⋅ a ← ⋅⋅⋅ a ← ⋅⋅⋅ z z z z CMPUT 680 - Compiler Design and Optimization

Φ-Candidates are Join Nodes Notice that according to the path convergence criterion, the node z that will receive the Φ function must be a join node. z is the first node that joins the paths Pxz and Pyz. CMPUT 680 - Compiler Design and Optimization

This algorithm is extremely costly, because it requires the examination of every triple of nodes x, y, z and every path from x to z and from y to z. Can we do better? Iterated Path-Convergence Criterion The Φ function itself is a definition of a. Therefore the path-convergence criterion is a set of equations that must be satisfied. whilethere are nodes x, y, z satisfying conditions 1-5 andz does not contain a Φ function for a do inserta← Φ(a0, a1, …, an) at node z CMPUT 680 - Compiler Design and Optimization

The SSA Conversion Problem For each variable x defined in a CFG G=(V,E), given the set of nodes S ⊆ V such that each S contains adefinition for x, find the minimal set J(S) of nodes that requires a Φ(xi,xj) function. By definition, the START node defines all the variables, therefore ∀ S ⊆ V, START ∈ S. If we need to compute Φ nodes for several variables, it may be efficient to precompute data structures based on the CFG. CMPUT 680 - Compiler Design and Optimization

Processing Time for SSA Conversion The performance of an SSA conversion algorithm should be measured by the processing timeTp, the preprocessing spaceSp, and the query timeTq. (Shapiro and Saint 1970): outline an algorithm (Reif and Tarjan 1981): extend the Lengauer-Tarjan dominator algorithm to compute Φ-nodes. (Cytron et al. 1991): show that SSA conversion can use the idea of dominance frontiers, resulting on an O(|V|2) algorithm. (Sreedhar and Gao, 1995): An O(|E|) algorithm, but in private commun. with Pingali in 1996 admits that it is in practice 5 times slower than Cytron et al. CMPUT 680 - Compiler Design and Optimization

Processing Time for SSA Conversion Bilardi, Pingali, 1999:present a generalized framework and a parameterized Augmented Dominator Tree (ADT) algorithm that allows for a space-time tradeoff. They show that Cytron et al. and Gao-Shreedhar are special cases of the ADT algorithm. • Bilardi and Pingali describe three strategies to compute • Φ-placement: • Two-Phase Algorithms • Lock-Step Algorithms • Lazy Algorithms CMPUT 680 - Compiler Design and Optimization

CFG DF Computation DF Graph S Reachability J(S) Two-Phase Algorithms First build the entire Dominance Frontier Graph, then find the nodes reachable from S Simple DF Graph may be quite large CMPUT 680 - Compiler Design and Optimization

Lock-Step Algorithms Performs the reachability computation incrementally while the DF relation is computed. CFG • Avoid storing the DF Graph. • Perform computations at all • nodes of the graph, even though • most are irrelevant • Inneficient when computing the • Φ-nodes for many variables. DF Computation Reachability S J(S) CMPUT 680 - Compiler Design and Optimization

Lazy Algorithms Lazily compute only the portion fo the DF Graph that is needed. Carefully select a portion of the DF Graph to compute eagerly (before it is needed). CFG DF Computation A Two-Phase Algorithm is an extreme case of a lazy algorithm. DF Graph SubGraph Reachability S J(S) CMPUT 680 - Compiler Design and Optimization

Computing a Dominator Tree (n: # of nodes; m: # of edges) (Lowry and Medlock, 1969):Introduce the problem and give an O(n4) algorithm. (Lengauer and Tarjan, 1979):Give a complicated O(mα(m.n)) algorithm [α(m.n) is the inverse Ackermann’s function]. (Harel, 1985):Give a linear time algorithm. (Alstrup, Harel and Thorup, 1997):Give a simpler version of Harel’s algorithm. CMPUT 680 - Compiler Design and Optimization

Dominance Property of the SSA Form In SSA form definitions dominate uses, i.e.: 1. If x is used in a Φ function in block n, then the definition of x dominates every predecessor of n. 2. If x is used in a non-Φ statement in block n, then the definition of x dominates n. CMPUT 680 - Compiler Design and Optimization

The Dominance Frontier A node xdominates a node w if every path from the start node to w must go through x. A node xstrictly dominates a node w if x dominates w and x≠ w. The dominance frontier of a node x is the set of all nodes w such that x dominates a predecessor of w, but x does not strictly dominates w. CMPUT 680 - Compiler Design and Optimization

2 9 5 3 10 6 11 7 12 8 4 Example 1 13 What is the dominance frontier of node 5? CMPUT 680 - Compiler Design and Optimization

2 9 5 3 10 6 11 7 12 8 4 Example 1 13 First we must find all nodes that node 5 dominates. CMPUT 680 - Compiler Design and Optimization

2 9 5 3 10 6 11 7 12 8 4 Example 1 13 A node w is in the dominance frontier of node 5 if 5 dominates a predecessor of w, but 5 does not strictly dominatesw itself. What is the dominance frontier of 5? CMPUT 680 - Compiler Design and Optimization

10 6 11 7 Example 1 2 5 9 3 8 12 4 13 A node w is in the dominance frontier of node 5 if 5 dominates a predecessor of w, but 5 does not strictly dominatesw itself. What is the dominance frontier of 5? CMPUT 680 - Compiler Design and Optimization

10 6 7 11 Example 1 2 5 9 3 8 12 4 DF(5) = {4, 5, 12, 13} 13 A node w is in the dominance frontier of node 5 if 5 dominates a predecessor of w, but 5 does not strictly dominatesw itself. What is the dominance frontier of 5? CMPUT 680 - Compiler Design and Optimization

Dominance Frontier Criterion Dominance Frontier Criterion: If a node x contains a definition of variable a, then any node z in the dominance frontier of x needs a Φ function for a. Can you think of an intuitive explanation for why a node in the dominance frontier of another node must be a join node? CMPUT 680 - Compiler Design and Optimization

10 6 7 11 Example 1 If a node (12) is in the dominance frontier of another node (5), than there must be at least two paths converging to (12). 2 5 9 3 8 12 4 These paths must be non-intersecting, and one of them (5,7,12) must contain a node strictly dominated by (5). 13 CMPUT 680 - Compiler Design and Optimization

Dominator Tree To compute the dominance frontiers, we first compute the dominator tree of the CFG. There is an edge from node x to node y in the dominator tree if node x immediately dominates node y. I.e., xdominatesy≠x, and xdoes not dominate any other dominator of y. Dominator trees can be computed using the Lengauer-Tarjan algorithm(1979). See sec. 19.2 of Appel. CMPUT 680 - Compiler Design and Optimization

6 10 7 11 Example: Dominator Tree 1 2 5 9 Dominator Tree 3 1 8 12 4 2 4 5 12 9 13 3 10 11 13 Control Flow Graph 6 7 8 CMPUT 680 - Compiler Design and Optimization

Local Dominance Frontier Cytron-Ferrante define the local dominance frontier of a node n as: DFlocal[n] = successors of n in the CFG that are not strictly dominated by n CMPUT 680 - Compiler Design and Optimization

10 6 7 11 Example: Local Dominance Frontier In the example, what are the local dominance frontiers of nodes 5, 6 and 7? 1 2 5 9 DFlocal[5] =  DFlocal[6] = {4,8} DFlocal[7] = {8,12} 3 8 12 4 13 Control Flow Graph CMPUT 680 - Compiler Design and Optimization

Dominance Frontier Inherited From Its Children The dominance frontier of a node n is formed by its local dominance frontier plus nodes that are passed up by the children of n in the dominator tree. The contribution of a node c to its parent’s dominance frontier is defined as [Cytron-Ferrante, 1991]: DFup[c] = nodes in the dominance frontier of c that are not strictly dominated by the immediate dominator of c CMPUT 680 - Compiler Design and Optimization

10 6 7 11 Example: Local Dominance Frontier 1 In the example, what are the contributions of nodes 6, 7, and 8 to its parent dominance frontier? 2 5 9 3 8 12 First we compute the DF and the immediate dominator of each node: DF[6] = {4,8}, idom(6)= 5 DF[7] = {8,12}, idom(7)= 5 DF[8] = {5,13}, idom(8)= 5 4 13 Control Flow Graph CMPUT 680 - Compiler Design and Optimization

6 10 11 7 DFup[c] = nodes in the dominance frontier of c that are not strictly dominated by the immediate dominator of c Example: Local Dominance Frontier First we compute the DF and the immediate dominator of each node: DF[6] = {4,8}, idom(6)= 5 DF[7] = {8,12}, idom(7)= 5 DF[8] = {5,13}, idom(8)= 5 1 2 5 9 3 8 12 4 Now we check for the DFup condition: DFup[6] = {4} DFup[7] = {12} DFup[8] = {5,13} 13 Control Flow Graph CMPUT 680 - Compiler Design and Optimization

A note on implementation We want to represent these sets efficiently: DF[6] = {4,8} DF[7] = {8,12} DF[8] = {5,13} If we use bitvectors to represent these sets: DF[6] = 0000 0001 0001 0000 DF[7] = 0001 0001 0000 0000 DF[8] = 0010 0000 0010 0000 CMPUT 680 - Compiler Design and Optimization

Strictly Dominated Sets We can also represent the strictly dominated sets as vectors: SD[1] = 0011 1111 1111 1100 SD[2] = 0000 0000 0000 1000 SD[5] = 0000 0001 1100 0000 SD[9] = 0000 1100 0000 0000 Dominator Tree 1 2 4 5 12 9 13 3 10 11 6 7 8 CMPUT 680 - Compiler Design and Optimization

A note on implementation If we use bitvectors to represent these sets: DF[6] = 0000 0001 0001 0000 DF[7] = 0001 0001 0000 0000 DF[8] = 0010 0000 0010 0000 SD[5] = 0000 0001 1100 0000 DFup[c] = DF[6] ^ ~SD[5] DFup[c] = nodes in the dominance frontier of c that are not strictly dominated by the immediate dominator of c CMPUT 680 - Compiler Design and Optimization

Dominance Frontier Inherited From Its Children The dominance frontier of a node n is formed by its local dominance frontier plus nodes that are passed up by the children of n in the dominator tree. Thus the dominance frontier of a node n is defined as [Cytron-Ferrante, 1991]: CMPUT 680 - Compiler Design and Optimization

10 6 7 11 Example: Local Dominance Frontier 1 What is DF[5]? Remember that: DFlocal[5] =  DFup[6] = {4} DFup[7] = {12} DFup[8] = {5,13} DTchildren[5] = {6,7,8} 2 5 9 3 8 12 4 13 Control Flow Graph CMPUT 680 - Compiler Design and Optimization

6 10 11 7 Example: Local Dominance Frontier 1 What is DF[5]? Remember that: DFlocal[5] = ∅ DFup[6] = {4} DFup[7] = {12} DFup[8] = {5,13} DTchildren[5] = {6,7,8} 2 5 9 3 8 12 4 13 Control Flow Graph Thus, DF[5] = {4, 5, 12, 13} CMPUT 680 - Compiler Design and Optimization

Join Sets In order to insert Φ-nodes for a variable x that is defined in a set of nodes S={n1, n2, …, nk} we need to compute the iterated set of join nodes of S. Given a set of nodes S of a control flow graph G, the set of join nodes of S, J(S), is defined as follows: J(S) ={z∈ G| ∃ two paths Pxz and Pyz in G that have z as its first common node, x ∈ S and y ∈ S} CMPUT 680 - Compiler Design and Optimization

Iterated Join Sets Because a Φ-node is itself a definition of a variable, once we insert Φ-nodes in the join set of S, we need to find out the join set of S ∪ J(S). Thus, Cytron-Ferrante define the iterated join set of a set of nodes S, J+(S), as the limit of the sequence: CMPUT 680 - Compiler Design and Optimization

CMPUT680

CMPUT680

Presentation Transcript

CMPUT680 - Winter 2006

CMPUT680 - Winter 2006

CMPUT680 - Fall 2003

CMPUT680 - Fall 2003

CMPUT680 - Winter 2006

CMPUT680 - Fall 2003

CMPUT680 - Fall 2003

CMPUT680 - Winter 2006

CMPUT680 - Winter 2006

CMPUT680 - Winter 2006

CMPUT680 - Winter 2001

CMPUT680 - Fall 2006

CMPUT680 - Winter 2006