CMPUT680 - Winter 2001

CMPUT680 - Winter 2001. Register Minimization X Register Saturation José Nelson Amaral http://www.cs.ualberta.ca/~amaral/courses/680.


  1. CMPUT680 - Winter 2001 Register Minimization X Register Saturation José Nelson Amaral http://www.cs.ualberta.ca/~amaral/courses/680 CMPUT 680 - Compiler Design and Optimization

  2. Reading List
     Touati, S.-A.-A., “Register Saturation in Superscalar and VLIW Codes,” 10th International Conference on Compiler Construction, Genova, Italy, April 2001, pp. 213-228.
     Touati, S.-A.-A., Thomasset, F., “Register Saturation in Data Dependence Graphs,” Research Report RR-3978, INRIA, July 2000.
     Touati, S.-A.-A., “Optimal Register Saturation in Acyclic Superscalar and VLIW Codes,” Research Report, INRIA, Nov. 2000.

  3. Minimum Register Instruction Sequence (MRIS) Problem. Given the data dependence graph G for a basic block, derive an instruction sequence S for G that is optimal in the sense that its register requirement is minimum.

  4. Intuition for Our Solution. Our intuition is to find subsets of nodes that can definitely share a register, and to use them to inform the instruction sequencing algorithm. [Figure: Data Dependence Graph with nodes a through i.]

  5. Instruction Lineages. An instruction lineage is a sequence of instructions in which a single register is passed from instruction to instruction (except for the last). Example: L1 = [a, b, f, h, i). How can we ensure that instructions a, b, f, and h will be able to share the same register? [Figure: Data Dependence Graph.]

  6. Sequencing Edges. L1 = [a, b, f, h, i). The lineage formation imposes a scheduling restriction on the DDG: the selected heir of a node must be the last node listed among its siblings. Thus the lineage formation inserts sequencing edges in the DDG. [Figure: Augmented Data Dependence Graph.]

  7. Node Height. L1 = [a, b, f, h, i). If the introduction of sequencing edges were to produce a cycle in the DDG, it would be impossible to find a legal instruction sequence. Thus we use the height of the nodes, recomputed after each lineage formation, to select the heir. Ties are broken arbitrarily. [Figure: Augmented Data Dependence Graph.]
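The height recomputation can be sketched as follows. This is a minimal illustration, assuming the data edges implied by the example code given later in these slides (a feeds b, c, d, e; b and c feed f; d and e feed g; f and g feed h; h feeds i) and assuming that forming L1 adds sequencing edges from c, d, and e to the heir b. The function name `heights` is hypothetical.

```python
def heights(succ):
    """Return {node: number of nodes on the longest path from node to a sink}.

    Assumes the graph is acyclic; on a cyclic graph the recursion would not end.
    """
    memo = {}

    def height(node):
        if node not in memo:
            memo[node] = 1 + max((height(s) for s in succ[node]), default=0)
        return memo[node]

    return {node: height(node) for node in succ}


# Data edges of the example DDG (an assumption read off the example code).
DDG = {
    "a": ["b", "c", "d", "e"],
    "b": ["f"], "c": ["f"],
    "d": ["g"], "e": ["g"],
    "f": ["h"], "g": ["h"],
    "h": ["i"], "i": [],
}

# Assumed sequencing edges for L1 = [a, b, f, h, i): b is the heir of a,
# so the siblings c, d, e must be listed before b.
augmented = {node: list(children) for node, children in DDG.items()}
for sibling in "cde":
    augmented[sibling].append("b")

before = heights(DDG)
after = heights(augmented)
print("height of a before:", before["a"])
print("heights of c, d, e after:", [after[n] for n in "cde"])
```

Under these assumptions the recomputed heights of c, d, and e agree with the value quoted on the next slide.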

  8. Lineage Formation. L1 = [a, b, f, h, i). For the next lineage, the highest nodes not yet in a lineage are c, d, and e, all with a height of 5. The remaining lineages are L2 = [c, f), L3 = [e, g, h), and L4 = [d, g). [Figure: Augmented Data Dependence Graph.]

  9. Lineage Interference. L1 = [a, b, f, h, i), L2 = [c, f), L3 = [e, g, h), L4 = [d, g). Two lineages Lu = [u1, u2, …, um) and Lv = [v1, v2, …, vn) definitely overlap if: (i) u1 reaches vn, and (ii) v1 reaches um. [Figure: Augmented Data Dependence Graph.]

  10. Lineage Interference Graph. L1 = [a, b, f, h, i), L2 = [c, f), L3 = [e, g, h), L4 = [d, g). Which lineages does lineage L1 definitely overlap with? How about lineages L2 and L4? [Figure: Augmented Data Dependence Graph and Lineage Interference Graph with nodes L1, L2, L3, L4.]

  11. Lineage Fusion Condition. L1 = [a, b, f, h, i), L2 = [c, f), L3 = [e, g, h), L4 = [d, g). Two lineages Lu = [u1, u2, …, um) and Lv = [v1, v2, …, vn) can be fused into a single lineage if: (i) u1 reaches vn, and (ii) v1 does not reach um. [Figure: Augmented Data Dependence Graph and Lineage Interference Graph.]

  12. Lineage Fusion Condition. L1 = [a, b, f, h, i), L2 = [c, f), L3 = [e, g, h), L4 = [d, g). Which lineages can be fused in the example? d reaches f, and c does not reach g. Thus L4 can be fused with L2 to form L5 = [d, g) ∪ [c, f). [Figure: Augmented Data Dependence Graph and Lineage Interference Graph.]
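The overlap and fusion conditions can be checked mechanically with a reachability test on the augmented DDG. The sketch below is illustrative only: it assumes the data edges implied by the example code later in the slides plus sequencing edges from c, d, and e to b, and the helper names (`reachable`, `overlap`, `can_fuse`) are hypothetical.

```python
from itertools import combinations, permutations

# Augmented DDG: assumed data edges plus assumed sequencing edges c, d, e -> b.
AUG = {
    "a": ["b", "c", "d", "e"],
    "b": ["f"], "c": ["f", "b"],
    "d": ["g", "b"], "e": ["g", "b"],
    "f": ["h"], "g": ["h"],
    "h": ["i"], "i": [],
}

LINEAGES = {
    "L1": ["a", "b", "f", "h", "i"],
    "L2": ["c", "f"],
    "L3": ["e", "g", "h"],
    "L4": ["d", "g"],
}


def reachable(succ, src):
    """Nodes reachable from src by following one or more edges."""
    seen, stack = set(), [src]
    while stack:
        for nxt in succ[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


REACH = {node: reachable(AUG, node) for node in AUG}


def overlap(lu, lv):
    """(i) u1 reaches vn and (ii) v1 reaches um."""
    return lv[-1] in REACH[lu[0]] and lu[-1] in REACH[lv[0]]


def can_fuse(lu, lv):
    """(i) u1 reaches vn and (ii) v1 does not reach um."""
    return lv[-1] in REACH[lu[0]] and lu[-1] not in REACH[lv[0]]


names = sorted(LINEAGES)
lig_edges = [(p, q) for p, q in combinations(names, 2)
             if overlap(LINEAGES[p], LINEAGES[q])]
fusable = [(p, q) for p, q in permutations(names, 2)
           if can_fuse(LINEAGES[p], LINEAGES[q])]
print("LIG edges:", lig_edges)
print("fusable (Lu, Lv) pairs:", fusable)
```

With these assumed edges, L2 and L4 are the only pair that does not definitely overlap, and (L4, L2) is the only ordered pair that satisfies the fusion condition.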

  13. Lineage Fusion. L1 = [a, b, f, h, i), L2 = [c, f), L3 = [e, g, h), L4 = [d, g). When Lu = [u1, u2, …, um) and Lv = [v1, v2, …, vn) are fused: (1) a scheduling edge from um to v1 is introduced in the augmented DDG; (2) Lu and Lv are removed from the LIG; (3) a new lineage Lw = Lu ∪ Lv is inserted in the LIG. [Figure: Augmented Data Dependence Graph and Lineage Interference Graph.]

  14. Lineage Fusion Condition. L1 = [a, b, f, h, i), L3 = [e, g, h), L5 = [d, g) ∪ [c, f). Thus the fusion of L4 with L2 forms L5 = [d, g) ∪ [c, f). How many colors do we need to color the LIG? [Figure: Augmented Data Dependence Graph and Lineage Interference Graph with nodes L1, L3, L5.]

  15. Lineage Fusion Condition. L1 = [a, b, f, h, i), L3 = [e, g, h), L5 = [d, g) ∪ [c, f). We need three colors. Can we find an instruction sequence? [Figure: Augmented Data Dependence Graph and Lineage Interference Graph with nodes L1, L3, L5.]
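The color count can be reproduced with a simple greedy coloring of the fused LIG. This is only a sketch: greedy coloring is a swapped-in technique that is not guaranteed to be optimal on arbitrary graphs, and the adjacency below assumes that L1, L3, and L5 pairwise overlap, as the figure suggests.

```python
def greedy_coloring(adj):
    """Assign each node the smallest color not used by its colored neighbors."""
    color = {}
    # Visit higher-degree nodes first; ties are resolved by name.
    for node in sorted(adj, key=lambda n: (-len(adj[n]), n)):
        used = {color[m] for m in adj[node] if m in color}
        color[node] = next(c for c in range(len(adj)) if c not in used)
    return color


# Assumed LIG after fusing L4 and L2 into L5: a triangle.
LIG = {
    "L1": {"L3", "L5"},
    "L3": {"L1", "L5"},
    "L5": {"L1", "L3"},
}

colors = greedy_coloring(LIG)
print(colors, "->", len(set(colors.values())), "colors")
```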

  16. Sequencing by List Scheduling. Lineages: L1 = [a, b, f, h, i), L3 = [e, g, h), L5 = [d, g) ∪ [c, f). Registers: RA, RB, RC. Sequence so far: (empty). [Figure: Augmented Data Dependence Graph and Lineage Interference Graph with nodes L1, L3, L5.]

  17. Sequencing by List Scheduling (continued). Sequence so far: a

  18. Sequencing by List Scheduling (continued). Sequence so far: a d

  19. Sequencing by List Scheduling (continued). Sequence so far: a d e

  20. Sequencing by List Scheduling (continued). Sequence so far: a d e g

  21. Sequencing by List Scheduling (continued). Sequence so far: a d e g c

  22. Sequencing by List Scheduling (continued). Sequence so far: a d e g c b

  23. Sequencing by List Scheduling (continued). Sequence so far: a d e g c b f

  24. Sequencing by List Scheduling (continued). Sequence so far: a d e g c b f h

  25. Sequencing by List Scheduling (continued). Final sequence: a d e g c b f h i
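The final sequence can be sanity-checked against the augmented DDG. The sketch below assumes the data edges implied by the example code later in the slides, sequencing edges from c, d, and e to b (from L1), and a fusion edge from g to c (from L5). It also assumes a simple liveness model: a value is live from its definition until its last consumer has executed, and a consumer may reuse the register of a value it kills. The helper names are hypothetical.

```python
# Assumed data edges of the example DDG.
DDG = {
    "a": ["b", "c", "d", "e"],
    "b": ["f"], "c": ["f"],
    "d": ["g"], "e": ["g"],
    "f": ["h"], "g": ["h"],
    "h": ["i"], "i": [],
}
# Assumed sequencing edges: siblings before heir b, and the fusion edge g -> c.
SEQUENCING_EDGES = [("c", "b"), ("d", "b"), ("e", "b"), ("g", "c")]
SEQUENCE = list("adegcbfhi")


def respects(order, edges):
    """True if every edge (u, v) has u placed before v in the order."""
    pos = {node: i for i, node in enumerate(order)}
    return all(pos[u] < pos[v] for u, v in edges)


def max_live(order, succ):
    """Peak number of live values, sampled after each instruction."""
    pending = {node: set(succ[node]) for node in succ}
    live, peak = set(), 0
    for node in order:
        for value in list(live):
            pending[value].discard(node)
            if not pending[value]:      # last consumer has just executed
                live.discard(value)
        if pending[node]:               # nodes without consumers hold no value
            live.add(node)
        peak = max(peak, len(live))
    return peak


data_edges = [(u, v) for u in DDG for v in DDG[u]]
legal = respects(SEQUENCE, data_edges + SEQUENCING_EDGES)
registers = max_live(SEQUENCE, DDG)
print("legal:", legal, "registers needed:", registers)
```

Under this liveness model the sequence is legal and never holds more than three values at once, which agrees with the three registers RA, RB, and RC.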

  26. Summary of Our Solution Method
      • A “good” construction algorithm for the LIG (dynamic)
      • An effective heuristic method to calculate the heuristic register bound (HRB)
      • An efficient scheduling method (no backtracking)
      Flow: DDG → form the Lineage Interference Graph (LIG) → derive the HRB → extended list scheduling guided by the HRB → a good instruction sequence

  27. Register Saturation (Touati). Given a data dependence graph G, the register saturation (RS) of G is the maximal register need over all schedules of G. Touati’s strategy is to compute the RS of G and, if the RS exceeds the number of available registers, to reduce the RS by introducing new arcs in G. The intuition is that by using either (1) all available registers or (2) the maximal number of registers that G can use, instruction-level parallelism is maximized.

  28. The HRB and the RS. Govind, Gao, Yang, Amaral, and Zhang had earlier proposed an alternative method: find a heuristic register bound (HRB) and use it to guide a modified list scheduling. Their goal is to find a schedule that uses a minimum number of registers. To compare the two methods, we will apply Touati’s method to Govind et al.’s example, and Govind’s method to Touati’s example.

  29. Potential Killers. To find RS(G), we need to know which operation must kill each generated value. Touati defines the set of operations that are potential killers of the value generated by an operation $u \in G$: $pkill_G(u) = \{\, v \in Cons(u) \mid {\downarrow}v \cap Cons(u) = \{v\} \,\}$, where ${\downarrow}v$ is the set of all descendants of $v$, including $v$, and $w \in Cons(u)$ iff $(u, w)$ is an edge of $G$. Thus a node v is a potential killer of the value generated by a node u if and only if v consumes u and no other descendant of v consumes u.

  30. Potential Killing Graph. The edges of the potential killing graph of a DDG G, $PK(G) = (V, E_{PK})$, are defined as $E_{PK} = \{(u, v) \mid u \in V_R \wedge v \in pkill_G(u)\}$, where $V_R$ is the set of operations that define a value, i.e., operations that need a register.

  31. Govind’s Example: Data Dependence Graph
      (a) t1 := ld(x);
      (b) t2 := t1 + 4;
      (c) t3 := t1 * 8;
      (d) t4 := t1 - 4;
      (e) t5 := t1 / 2;
      (f) t6 := t2 * t3;
      (g) t7 := t4 - t5;
      (h) t8 := t6 * t7;
      (i) st(y, t8);
      [Figure: DDG G with nodes a through i.]

  32. Govind’s Example: Potential Kill Graph. pkillG(a) = {b, c, d, e}; pkillG(b) = {f}; pkillG(c) = {f}; pkillG(d) = {g}; pkillG(e) = {g}; pkillG(f) = {h}; pkillG(g) = {h}; pkillG(h) = {i}. [Figure: DDG G.]
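The potential-killer sets can be computed directly from the definition. The sketch below assumes the DDG edges implied by the example code (each edge goes from the operation that defines a value to an operation that reads it). The second graph, `TOY`, is a hypothetical three-node example added only to show a consumer that is not a potential killer.

```python
def descendants(succ, v):
    """All descendants of v, including v itself."""
    seen, stack = {v}, [v]
    while stack:
        for nxt in succ[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def pkill(succ, u):
    """Consumers v of u such that no other descendant of v also consumes u."""
    cons = set(succ[u])
    return {v for v in cons if descendants(succ, v) & cons == {v}}


# Assumed data edges of Govind's example.
DDG = {
    "a": ["b", "c", "d", "e"],
    "b": ["f"], "c": ["f"],
    "d": ["g"], "e": ["g"],
    "f": ["h"], "g": ["h"],
    "h": ["i"], "i": [],
}

# Hypothetical graph: y and z both read x, and z also reads y.
# y cannot kill x, because its descendant z still needs x.
TOY = {"x": ["y", "z"], "y": ["z"], "z": []}

for node in "abcdefgh":
    print(node, sorted(pkill(DDG, node)))
print("toy:", sorted(pkill(TOY, "x")))
```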

  33. Govind’s Example: Potential Kill Graph. In this example the DDG G and the potential kill graph PK(G) are identical. In general that is not the case. [Figure: DDG G and PK(G), side by side.]

  34. Choosing the Killer. If a node u has more than one potential killer, Touati defines a killing function, k(u), that specifies which one among the potential killers of u will actually kill u. A killing function imposes a scheduling order on the DDG: all other consumers of u in Cons(u) must be scheduled before k(u) is scheduled. To represent these scheduling constraints, Touati defines an extended DAG, Gk, induced by the killing function k.

  35. Govind’s Example: Killing Function. In this example, node a is the only node with multiple potential killers. pkillG(a) = {b, c, d, e}; pkillG(b) = {f}; pkillG(c) = {f}; pkillG(d) = {g}; pkillG(e) = {g}; pkillG(f) = {h}; pkillG(g) = {h}; pkillG(h) = {i}. [Figure: PK(G).]

  36. Govind’s Example: Killing Function. If we choose k(a) = b, we obtain the extended DAG Gk shown in the figure. The potential-killer sets are those listed on the previous slide. [Figure: Gk.]

  37. Selecting a Good Set of Killers... If the killing function for multiple nodes with multiple potential killers is chosen arbitrarily, it might induce cycles in Gk. A valid killing function is one that does not induce cycles in Gk.
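A small check makes the validity condition concrete. The sketch below builds Gk by adding an edge from every other consumer of u to k(u), then tests for cycles with Kahn's algorithm. The four-node graph `TOY` and both killing functions are hypothetical and were chosen only to show one invalid and one valid choice.

```python
def extended_dag(succ, k):
    """Gk: the DDG plus an edge c -> k(u) for every other consumer c of u."""
    gk = {node: set(children) for node, children in succ.items()}
    for u, killer in k.items():
        for consumer in succ[u]:
            if consumer != killer:
                gk[consumer].add(killer)
    return gk


def is_acyclic(succ):
    """Kahn's algorithm: the graph is acyclic iff every node can be removed."""
    indegree = {node: 0 for node in succ}
    for node in succ:
        for child in succ[node]:
            indegree[child] += 1
    ready = [node for node in succ if indegree[node] == 0]
    removed = 0
    while ready:
        node = ready.pop()
        removed += 1
        for child in succ[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    return removed == len(succ)


# Hypothetical DDG: values u1 and u2 are both read by x and by y.
TOY = {"u1": ["x", "y"], "u2": ["x", "y"], "x": [], "y": []}

bad_k = {"u1": "x", "u2": "y"}   # forces y before x and x before y
good_k = {"u1": "x", "u2": "x"}  # forces only y before x

print("bad_k valid: ", is_acyclic(extended_dag(TOY, bad_k)))
print("good_k valid:", is_acyclic(extended_dag(TOY, good_k)))
```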

  38. Avoiding Vengeance... A killer must kill before it has children; thus the descendants of k(u) cannot be simultaneously alive with u. Touati defines the Disjoint Value Graph, $DV_k(G) = (V_R, E_{DV})$, by $E_{DV} = \{(u, v) \mid u, v \in V_R \wedge v \in {\downarrow}_R\, k(u)\}$. An edge (u, v) in DVk(G) means that the live interval of u is always before the live interval of v in any schedule of Gk.

  39. Govind’s Example: Disjoint Value Graph. k(a) = b; k(b) = f; k(c) = f; k(d) = g; k(e) = g; k(f) = h; k(g) = h; k(h) = i. [Figure: Gk and DVk(G); DVk(G) is shown simplified by transitive reduction.]

  40. Register Need and Maximal Antichains. An antichain in a graph $G = (V, E)$ is a set of nodes $A$ such that there are no paths between the nodes in $A$: $A = \{u, v \in V \mid (u, v) \notin E_c \wedge (v, u) \notin E_c\}$, where $E_c$ is the transitive closure of $G$: $(u, v) \in E_c$ iff there exists a path $p = (u, \ldots, v)$ in $G$. The register need of any schedule of Gk is always less than or equal to the size of a maximal antichain in DVk(G).

  41. Govind’s Example: Maximal Antichain. The maximal antichain in this example is AMk = {a, c, d, e}. Thus this graph, with this killing function, can use at most 4 registers. [Figure: DVk(G).]
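For a graph this small, the antichain can be found by brute force. The sketch below assumes the DDG edges implied by the example code, the killing function with k(a) = b, and the reading that the descendants of k(u) include k(u) itself, in line with the earlier definition of descendants. Store i defines no value, so it is left out of VR. The exhaustive search is an illustration, not Touati's algorithm.

```python
from itertools import combinations

DDG = {
    "a": ["b", "c", "d", "e"],
    "b": ["f"], "c": ["f"],
    "d": ["g"], "e": ["g"],
    "f": ["h"], "g": ["h"],
    "h": ["i"], "i": [],
}
VALUES = set("abcdefgh")  # V_R: operations that define a value
K = {"a": "b", "b": "f", "c": "f", "d": "g", "e": "g",
     "f": "h", "g": "h", "h": "i"}


def descendants(succ, v):
    """All descendants of v, including v itself."""
    seen, stack = {v}, [v]
    while stack:
        for nxt in succ[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


# Gk: every other consumer of u must precede k(u).
gk = {node: set(children) for node, children in DDG.items()}
for u, killer in K.items():
    for consumer in DDG[u]:
        if consumer != killer:
            gk[consumer].add(killer)

# DV_k(G): edge (u, v) when v is a value among the descendants of k(u).
dv = {u: descendants(gk, K[u]) & VALUES for u in VALUES}
dv_reach = {u: descendants(dv, u) - {u} for u in VALUES}


def maximum_antichain(nodes):
    """Largest set of nodes with no path between any two of them."""
    for size in range(len(nodes), 0, -1):
        for cand in combinations(sorted(nodes), size):
            if all(v not in dv_reach[u] and u not in dv_reach[v]
                   for u, v in combinations(cand, 2)):
                return set(cand)
    return set()


antichain = maximum_antichain(VALUES)
print(sorted(antichain), len(antichain))
```

Other antichains of the same size may exist; the search returns the first one in alphabetical order.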

  42. Register Saturating Scheduling. Touati proves that, for every valid killing function k, there is always a schedule that makes all the values in the maximal antichain of the disjoint value DAG DVk(G) simultaneously alive.

  43. Saturating Killing Function. To find the register saturation of a DDG, we need to find a killing function that maximizes the maximal antichain in DVk(G). In other words, we need to find a killing function that maximizes the number of nodes that are not connected by a path in DVk(G). Touati calls this the maximizing maximal antichain (MMA) problem. A solution to the MMA problem is a saturating killing function. MMA is NP-complete.

  44. Heuristic to Compute Register Saturation. To compute the register saturation, Touati starts by decomposing the potential kill graph PK(G) into connected bipartite components. A bipartite component, $cb = (S_{cb}, T_{cb}, E_{cb})$, is a graph with a set of source nodes $S_{cb}$, a set of target nodes $T_{cb}$, and a set of edges $E_{cb}$. cb must obey the following conditions: (1) $\nexists\, e, e' \in E_{cb}$ such that $target(e) = source(e')$; (2) if $e \in E_{PK} \wedge e' \in E_{cb} \wedge e, e'$ share an endpoint, then $e \in E_{cb}$.

  45. Bipartite Decomposition of PK(G). A bipartite decomposition of the potential killing graph PK(G) is a set of bipartite components such that, for every edge $e \in PK(G)$, there is a bipartite component cb in the decomposition such that $e \in E_{cb}$. Touati proves that, given a DDG G, there is only one bipartite decomposition of G.

  46. Govind’s Example: Bipartite Decomposition. [Figure: PK(G) and its bipartite decomposition.]
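One way to picture the decomposition is as a grouping of the edges of PK(G). The sketch below rests on an interpretation that the slides do not spell out: two edges are taken to "share an endpoint" when they have the same source or the same target, so that a node acting as a target in one component and as a source in another does not merge the two. It uses a union-find over edges and assumes PK(G) equals the example DDG. The function name is hypothetical.

```python
def bipartite_decomposition(edges):
    """Group edges that share a source or share a target (assumed reading)."""
    parent = list(range(len(edges)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_by_source, first_by_target = {}, {}
    for i, (src, tgt) in enumerate(edges):
        for key, table in ((src, first_by_source), (tgt, first_by_target)):
            if key in table:
                parent[find(i)] = find(table[key])
            else:
                table[key] = i

    groups = {}
    for i, edge in enumerate(edges):
        groups.setdefault(find(i), []).append(edge)
    # Report each component as (sorted sources, sorted targets).
    return [(sorted({s for s, _ in g}), sorted({t for _, t in g}))
            for g in groups.values()]


# PK(G) for Govind's example, assumed identical to the DDG.
PK_EDGES = [
    ("a", "b"), ("a", "c"), ("a", "d"), ("a", "e"),
    ("b", "f"), ("c", "f"), ("d", "g"), ("e", "g"),
    ("f", "h"), ("g", "h"), ("h", "i"),
]

components = sorted(bipartite_decomposition(PK_EDGES))
for sources, targets in components:
    print(sources, "->", targets)
```

Under this reading the example splits into five components, and only the top one has more than one target.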

  47. Saturating Killing Set. Touati defines the Saturating Killing Set of a connected bipartite component cb, SKS(cb), as a subset of the target nodes, $T'_{cb} \subseteq T_{cb}$, such that: (1) all the source nodes, $S_{cb}$, are contained in the union of the predecessors of the nodes in $T'_{cb}$; (2) $T'_{cb}$ contains a minimum number of nodes. Computing the SKS is an NP-complete problem.

  48. Govind’s Example: Saturating Killing Set. In this example the computation of the SKS is trivial. The only component with a non-unitary target set is the top one. The selection of any single node in the set Tcb = {b, c, d, e} covers the set Scb = {a}. Thus the selection can be arbitrary. [Figure: Bipartite Decomposition.]
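For small components the saturating killing set can be found by exhaustive search over target subsets of increasing size. This sketch is exponential in the number of targets and is not Touati's heuristic. The second component, with nodes u1, u2, u3, x, y, z, is hypothetical and is included only because the example's own components are trivial.

```python
from itertools import combinations


def saturating_killing_set(sources, targets, edges):
    """Smallest subset of targets whose predecessors cover every source."""
    preds = {t: {s for s, t2 in edges if t2 == t} for t in targets}
    for size in range(1, len(targets) + 1):
        for cand in combinations(sorted(targets), size):
            covered = set().union(*(preds[t] for t in cand))
            if covered >= set(sources):
                return set(cand)
    return None  # some source has no target at all


# Top component of Govind's example: any single target covers {a}.
top_edges = [("a", "b"), ("a", "c"), ("a", "d"), ("a", "e")]
top = saturating_killing_set({"a"}, set("bcde"), top_edges)

# Hypothetical component where no single target is enough.
toy_edges = [("u1", "x"), ("u2", "x"), ("u2", "y"), ("u3", "y"), ("u3", "z")]
toy = saturating_killing_set({"u1", "u2", "u3"}, {"x", "y", "z"}, toy_edges)

print(sorted(top), sorted(toy))
```

Ties are resolved alphabetically here, which happens to select b for the top component, matching the choice k(a) = b.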

  49. Govind’s Example. As we saw earlier, with k(a) = b the register saturation in Govind’s example is 4, and a schedule that has four values alive at the same time can be found. Using the lineage method, Govind et al. found a schedule for their example that uses three registers. What does Touati’s method do if only three registers are available?
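Because the example is tiny, both numbers can be cross-checked by enumerating every legal instruction sequence. The sketch below assumes the DDG edges implied by the example code and a simple liveness model in which a value is live from its definition until its last consumer has executed, sampled after each instruction. Exhaustive enumeration is only feasible for very small graphs and is not part of either method.

```python
DDG = {
    "a": ["b", "c", "d", "e"],
    "b": ["f"], "c": ["f"],
    "d": ["g"], "e": ["g"],
    "f": ["h"], "g": ["h"],
    "h": ["i"], "i": [],
}


def topological_orders(succ):
    """Yield every topological order of the DAG by backtracking."""
    indegree = {node: 0 for node in succ}
    for node in succ:
        for child in succ[node]:
            indegree[child] += 1
    order, placed = [], set()

    def extend():
        if len(order) == len(succ):
            yield tuple(order)
            return
        for node in sorted(succ):
            if node not in placed and indegree[node] == 0:
                placed.add(node)
                order.append(node)
                for child in succ[node]:
                    indegree[child] -= 1
                yield from extend()
                for child in succ[node]:
                    indegree[child] += 1
                order.pop()
                placed.remove(node)

    yield from extend()


def max_live(order, succ):
    """Peak number of live values, sampled after each instruction."""
    pending = {node: set(succ[node]) for node in succ}
    live, peak = set(), 0
    for node in order:
        for value in list(live):
            pending[value].discard(node)
            if not pending[value]:
                live.discard(value)
        if pending[node]:
            live.add(node)
        peak = max(peak, len(live))
    return peak


needs = [max_live(order, DDG) for order in topological_orders(DDG)]
print("sequences:", len(needs), "min:", min(needs), "max:", max(needs))
```

Under this model the minimum over all sequences matches the three registers found with the lineage method, and the maximum matches the register saturation of 4.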

  50. Reducing RS. Touati proposes an algorithm to reduce the register saturation while trying not to increase the length of the critical path. The algorithm starts by computing the maximal antichain AMk. Then it starts an iterative process in which the first step is to construct the set Uk of all admissible serializations between the saturating values in AMk, together with their costs.
