
CMPUT680 - Fall 2003

Presentation Transcript


  1. CMPUT680 - Fall 2003 Topic J: Wavefront Scheduling José Nelson Amaral http://www.cs.ualberta.ca/~amaral/courses/680 CMPUT 680 - Compiler Design and Optimization

  2. Reading Material: (1) Bharadwaj, J., Menezes, K., McKinsey, C., “Wavefront Scheduling: Path Based Data Representation and Scheduling of Subgraphs,” Proceedings of 32nd International Symposium on Microarchitecture, Dec. 1996, pp. 100-113. (2) Bharadwaj, J., “Method and apparatus for instruction scheduling to reduce negative effects of compensation code,” Patent No. 5,894,576, April 3 1999.

  3. New Concepts: Global Code Scheduler (GCS); Region Formation; Wavefront Scheduling; Path Vectors; Deferred Compensation; P-ready Code Motion.

  4. Scheduling Regions: Similar to Mahlke’s definition, here a region is a subgraph of a control flow graph that has a unique entry node that dominates all the nodes in the region. There is a further restriction that the regions must be acyclic.

  5. JS-nodes: A split node in a CFG is a node that has more than one immediate successor. A join node in a CFG is a node that has more than one immediate predecessor. A Join-Split (JS) edge in a CFG goes from a split node to a join node. [Figure: example CFG fragments with blocks B, C, and D.]

  6. Removal of JS-nodes: The application of the wavefront scheduling technique requires the removal of all JS edges. A JS edge is removed by adding an empty block (called a JS block) between the split node and the join node. [Figure: example CFG with blocks B, C, D before the removal, and with the added block G after it.]
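
The removal step can be sketched in a few lines of Python. This is only an illustration, not the scheduler's actual implementation: the function name `remove_js_edges`, the dictionary representation of the CFG, the generated block names `JS0`, `JS1`, ..., and the three-block graph (B branches to C and D, C falls through to D) are all assumptions made for the example.

```python
def remove_js_edges(succ):
    """Split every JS edge (split node -> join node) with an empty JS block.

    `succ` maps each block name to its list of immediate successors.
    The names JS0, JS1, ... for the new blocks are hypothetical.
    """
    # Collect immediate predecessors so that join nodes can be recognized.
    preds = {}
    for src, targets in succ.items():
        for dst in targets:
            preds.setdefault(dst, []).append(src)

    result = {src: list(targets) for src, targets in succ.items()}
    counter = 0
    for src, targets in succ.items():
        if len(targets) < 2:
            continue  # src is not a split node
        for i, dst in enumerate(targets):
            if len(preds[dst]) > 1:  # dst is a join node, so (src, dst) is a JS edge
                js_block = f"JS{counter}"
                counter += 1
                result[src][i] = js_block   # src now branches to the empty block
                result[js_block] = [dst]    # which falls through to the join node
    return result


# Assumed example: B is a split node, D is a join node, so B -> D is a JS edge.
cfg = {"B": ["C", "D"], "C": ["D"], "D": []}
print(remove_js_edges(cfg))
```

Only the edge from B to D is split here; the edge from C to D is left alone because C has a single successor and is therefore not a split node.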

  7. Interface Blocks: A side entry node is a node in the region that has at least one immediate predecessor in the region, and at least one immediate predecessor outside the region. Which nodes are side entry nodes in the example? [Figure: CFG with blocks B, C, D, E; D is highlighted.]

  8. Interface Blocks: A side exit node is a node in the region that has at least one immediate successor in the region, and at least one immediate successor outside the region. Which nodes are side exit nodes in the example? Answer: C and D. [Figure: CFG with blocks B, C, D, E; C and D are highlighted.]

  9. Interface Blocks: When control enters or leaves the region, GCS may need a block in which to schedule compensation code. Thus an interface block is inserted between two nodes x and y iff: (i) x is outside of the region, y is a side entry node, and there is an edge (x,y), or (ii) y is outside the region, x is a side exit node, and there is an edge (x,y).
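
The two conditions translate directly into a predicate over CFG edges. The sketch below is an illustration under assumptions: the function name `interface_edges`, the edge-list representation, and the outside blocks X, Y, and Z are hypothetical and were chosen so that D is a side entry node and C and D are side exit nodes.

```python
def interface_edges(edges, region):
    """Return the edges (x, y) on which an interface block must be inserted."""
    region = set(region)

    def side_entry(node):
        # At least one immediate predecessor inside and one outside the region.
        sources = [x for x, y in edges if y == node]
        return any(x in region for x in sources) and any(x not in region for x in sources)

    def side_exit(node):
        # At least one immediate successor inside and one outside the region.
        targets = [y for x, y in edges if x == node]
        return any(y in region for y in targets) and any(y not in region for y in targets)

    return [(x, y) for x, y in edges
            if (x not in region and y in region and side_entry(y))     # condition (i)
            or (y not in region and x in region and side_exit(x))]     # condition (ii)


# Assumed example: region {B, C, D, E}; X, Y, Z are hypothetical outside blocks.
EDGES = [("B", "C"), ("B", "D"), ("C", "D"), ("D", "E"),
         ("X", "D"), ("C", "Y"), ("D", "Z")]
print(interface_edges(EDGES, {"B", "C", "D", "E"}))
```

With these assumed edges the function reports three edges, one entering at D and one leaving from each of C and D, so three interface blocks would be needed.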

  10. Interface Blocks: Where do we need interface blocks in the following example? [Figure: CFG with blocks B, C, D, E.]

  11. Interface Blocks: We need three interface blocks. [Figure: the example CFG with blocks B, C, D, E and the added blocks F, G, H.]

  12. Hierarchical Regions: For the global code scheduler, regions are hierarchical: (1) First the code of an innermost loop is selected and scheduled. (2) Then a summary of the data flow and resource usage of the loop is computed, and the loop is converted into a single node in the graph.

  13. Nested Regions: H and I are interface blocks; G, J, and K are JS blocks. [Figure: nested-region example with blocks A, B, C, D, E, F1, F2, F3, G, H, I, J, K.]

  14. Path Vectors: There is a finite number of control paths in an acyclic scheduling region. A path vector is a bit vector in which each bit represents a unique path in the region. A subset of paths is represented by a path vector that has 1 for the paths in the subset and 0 for the paths not in the subset.

  15. Paths in our Example: Paths: P0: ABCDH, P1: ABCDJE, P2: ABGDH, P3: ABGDJE, P4: AFKE, P5: AFI. We can define the subset of all paths that include basic block G as BP(G) = {P2, P3}, and we can represent this set by the block path vector BPV(G) = [0 0 1 1 0 0]. [Figure: example CFG with blocks A through K.]
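
As an illustration, the paths and block path vectors of this example can be computed with a short Python sketch. The successor lists in `SUCC` are inferred from the path listing rather than taken from the figure, and the function names are hypothetical. Bit i of each integer stands for path Pi, and `show` prints the most significant bit first, so P0 is the rightmost entry.

```python
# Successor lists inferred from the six paths listed on the slide (an assumption).
SUCC = {"A": ["B", "F"], "B": ["C", "G"], "C": ["D"], "G": ["D"],
        "D": ["H", "J"], "J": ["E"], "F": ["K", "I"], "K": ["E"],
        "H": [], "E": [], "I": []}


def enumerate_paths(succ, entry):
    """List every entry-to-exit path of an acyclic region, depth first."""
    paths = []

    def walk(node, prefix):
        prefix = prefix + [node]
        if not succ[node]:            # an exit block ends the path
            paths.append("".join(prefix))
            return
        for nxt in succ[node]:
            walk(nxt, prefix)

    walk(entry, [])
    return paths


def block_path_vectors(paths):
    """Map each block to an integer whose bit i is set when path Pi contains it."""
    bpv = {}
    for i, path in enumerate(paths):
        for block in path:
            bpv[block] = bpv.get(block, 0) | (1 << i)
    return bpv


def show(vec, n):
    """Format a path vector as on the slides, e.g. [0 0 1 1 0 0]."""
    return "[" + " ".join(format(vec, f"0{n}b")) + "]"


PATHS = enumerate_paths(SUCC, "A")
BPV = block_path_vectors(PATHS)
print(PATHS)
print("BPV(G) =", show(BPV["G"], len(PATHS)))
```

Storing each vector as one integer keeps the set operations used for control flow relations down to single bitwise instructions.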

  16. Paths in our Example: Paths: P0: ABCDH, P1: ABCDJE, P2: ABGDH, P3: ABGDJE, P4: AFKE, P5: AFI. [Figure: example CFG with blocks A through K.]

  17. Control Flow Relations: We can compute control flow relations such as dominance, post-dominance, control equivalence, disjointness, etc., by performing bitwise operations on these path vectors. If BPV(x) = BPV(y), then blocks x and y are control flow equivalent. If BPV(x) is a superset of BPV(y), then block x either dominates or post-dominates block y.

  18. Paths in our Example: Example 1: What is the relation between blocks B and D? Paths: P0: ABCDH, P1: ABCDJE, P2: ABGDH, P3: ABGDJE, P4: AFKE, P5: AFI. Blocks B and D are control flow equivalent because BPV(B) = BPV(D). [Figure: example CFG with blocks A through K.]

  19. Paths in our Example: Example 2: What is the relation between blocks A and E? Paths: P0: ABCDH, P1: ABCDJE, P2: ABGDH, P3: ABGDJE, P4: AFKE, P5: AFI. Block A either dominates or post-dominates block E because BPV(A) is a superset of BPV(E). [Figure: example CFG with blocks A through K.]

  20. Paths in our Example: Paths: P0: ABCDH, P1: ABCDJE, P2: ABGDH, P3: ABGDJE, P4: AFKE, P5: AFI. Example 3: Likewise, block E either dominates or post-dominates block K because BPV(E) is a superset of BPV(K). [Figure: example CFG with blocks A through K.]
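
The three examples can be checked mechanically with bitwise operations. The sketch below is illustrative only: the helper names `bpv`, `equivalent`, `covers`, and `disjoint` are hypothetical, and the path list is copied from the slides.

```python
PATHS = ["ABCDH", "ABCDJE", "ABGDH", "ABGDJE", "AFKE", "AFI"]  # P0..P5


def bpv(block):
    """Block path vector as an integer; bit i is set when Pi contains the block."""
    return sum(1 << i for i, path in enumerate(PATHS) if block in path)


def equivalent(x, y):
    """Control flow equivalence: both blocks lie on exactly the same paths."""
    return bpv(x) == bpv(y)


def covers(x, y):
    """BPV(x) is a superset of BPV(y): x dominates or post-dominates y."""
    return bpv(x) & bpv(y) == bpv(y)


def disjoint(x, y):
    """No path contains both blocks."""
    return bpv(x) & bpv(y) == 0


print(equivalent("B", "D"))  # Example 1
print(covers("A", "E"))      # Example 2
print(covers("E", "K"))      # Example 3
print(disjoint("C", "G"))    # C and G never execute on the same path
```

Note that the superset test alone does not say which of dominance or post-dominance holds; deciding that would need extra information such as the order of the blocks along the shared paths.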

  21. Problems with Cross-Block Scheduling: Compensation code is code that needs to be scheduled somewhere else to compensate for the execution of an instruction M in a block x. Most cross-block scheduling techniques are not judicious when scheduling compensation code. Suppose that scheduling an instruction M in block x requires compensation code in block y. Most schedulers cannot evaluate how desirable it is to place the compensation code in y. Some schedulers only allow M to be scheduled in x if y has not been scheduled yet.

  22. Wavefront: A scheduling region is an acyclic region with JS edges eliminated and interface blocks added. A wavefront is a strongly independent cut set that partitions a scheduling region in three parts: (1) nodes above the wavefront, (2) nodes on the wavefront, and (3) nodes below the wavefront. The wavefront is strongly independent in the sense that no control flow path flows through more than one node in the wavefront.

  23. Wavefront Dominance Property: The wavefront nodes collectively dominate all the nodes below the wavefront, and collectively post-dominate all the nodes above the wavefront. Consider two blocks in the region: block k is not in the wavefront, and block w is in the wavefront. This property guarantees that when an instruction originally in block k is scheduled in block w, compensation code can be inserted entirely into blocks in the wavefront.

  24. JS-nodes and Strongly Independent Cuts: Can you build a wavefront that includes C and satisfies the conditions of dominance, post-dominance, and no control path including more than one node in the wavefront? First try: {C, F}. This wavefront neither post-dominates A and B nor dominates D, H, J, and E. [Figure: CFG with blocks A, B, C, D, E, F, H, I, J, K.]

  25. JS-nodes and Strongly Independent Cuts: Can you build a wavefront that includes C and satisfies the conditions of dominance, post-dominance, and no control path including more than one node in the wavefront? Second try: {C, D, F}. The path ABCDH includes two nodes in the wavefront; therefore the wavefront is not a strongly independent cut set. [Figure: CFG with blocks A, B, C, D, E, F, H, I, J, K.]

  26. JS-nodes and Strongly Independent Cuts: When the proper JS block is inserted, we can easily find a wavefront that: (1) post-dominates all predecessors, (2) dominates all successors, and (3) is a strongly independent cut set (no control path includes more than one node in the wavefront). [Figure: example CFG with blocks A through K, including the JS block G.]
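
With path vectors, the wavefront conditions reduce to one check: every path must cross exactly one wavefront node. The following sketch illustrates this under assumptions. The function names are hypothetical, and the path list `WITHOUT_G` assumes that, before the JS block G is inserted, B has a direct edge to D.

```python
# Paths of the example before and after the JS block G is inserted on B -> D.
WITHOUT_G = ["ABCDH", "ABCDJE", "ABDH", "ABDJE", "AFKE", "AFI"]   # assumed
WITH_G = ["ABCDH", "ABCDJE", "ABGDH", "ABGDJE", "AFKE", "AFI"]


def bpv(block, paths):
    """Block path vector as an integer; bit i is set when path i contains the block."""
    return sum(1 << i for i, path in enumerate(paths) if block in path)


def is_strongly_independent_cut(nodes, paths):
    """True when every path crosses exactly one of the given nodes."""
    seen = 0
    for node in nodes:
        vec = bpv(node, paths)
        if seen & vec:        # some path crosses two wavefront nodes
            return False
        seen |= vec
    return seen == (1 << len(paths)) - 1   # no path bypasses the wavefront


print(is_strongly_independent_cut("CF", WITHOUT_G))    # first try
print(is_strongly_independent_cut("CDF", WITHOUT_G))   # second try
print(is_strongly_independent_cut("CGF", WITH_G))      # with the JS block G
```

The first try fails because the paths through the direct edge from B to D bypass the wavefront, the second fails because ABCDH crosses both C and D, and {C, G, F} passes once G exists.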

  27. Wavefront Scheduling: In directional scheduling (either top-down or bottom-up) there is a region of code that is already scheduled, another region that is not yet scheduled, and a boundary. In wavefront scheduling, the wavefront is this boundary. The wavefront moves up or down according to the direction of scheduling chosen.

  28. Example of Wavefront Scheduling. [Figure: the example CFG with blocks A through K and the successive wavefronts W0 through W6.]

  29. Deferred Compensation: Consider that an instruction M is originally in block A. If we want to move M downward, we have to schedule M in all paths that contain a use of the variable defined by M. For instance, assume that there is a use of the variable defined by M in G. [Figure: CFG with blocks A, B, C, D, E, F, G.]

  30. Deferred Compensation: Path Summary: P0 = AFG, P1 = ABDEG, P2 = ABCEG. Thus a clone of M must appear in paths P0, P1, and P2. The compensation path vector of an instruction M is the set of all paths that must contain a clone of M when M is not scheduled in its original basic block: CPV(M) = [1 1 1]. [Figure: CFG with blocks A, B, C, D, E, F, G.]

  31. Deferred Compensation: Path Summary: P0 = AFG, P1 = ABDEG, P2 = ABCEG. CPV(M) = [1 1 1]. Assume that we decide that it is desirable to schedule a clone of M, M’, in block F. We update CPV(M) to: CPV(M) = CPV(M) - BPV(F) = [1 1 1] - [0 0 1] = [1 1 0]. The vectors are written with P0 as the rightmost bit. [Figure: the CFG with wavefront W1 and the clone M’ in block F.]

  32. Deferred Compensation: Path Summary: P0 = AFG, P1 = ABDEG, P2 = ABCEG. CPV(M) = [1 1 0]. Assume that at W2 we decide to schedule a clone of M, M’’, in block C. CPV(M) = CPV(M) - BPV(C) = [1 1 0] - [1 0 0] = [0 1 0]. [Figure: the CFG with wavefront W2 and the clone M’ in block F.]

  33. Deferred Compensation: Path Summary: P0 = AFG, P1 = ABDEG, P2 = ABCEG. CPV(M) = [0 1 0]. Now we cannot close block D unless we schedule M. Because BPV(D) is a superset of CPV(M), we know that this is the last compensation copy of M to be scheduled. [Figure: the CFG with wavefront W2, the clone M’ in block F, and the clone M’’ in block C.]
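
The bookkeeping of slides 30 through 33 can be replayed with integers as bit vectors. This is a minimal sketch with hypothetical helper names (`bpv`, `show`, `schedule_clone`). The set difference written as "-" on the slides becomes a bitwise AND with the complement, and P0 is the rightmost bit, as in the slide notation.

```python
PATHS = ["AFG", "ABDEG", "ABCEG"]   # P0, P1, P2
N = len(PATHS)


def bpv(block):
    """Block path vector as an integer; bit i is set when Pi contains the block."""
    return sum(1 << i for i, path in enumerate(PATHS) if block in path)


def show(vec):
    """Format a path vector as on the slides, with P0 as the rightmost bit."""
    return "[" + " ".join(format(vec, f"0{N}b")) + "]"


def schedule_clone(cpv, block):
    """Remove the paths covered by a clone placed in `block` from the CPV."""
    return cpv & ~bpv(block)


cpv = (1 << N) - 1                  # M left block A: every path needs a clone
history = [show(cpv)]
for block in ["F", "C", "D"]:       # clones M', M'', and the final copy in D
    cpv = schedule_clone(cpv, block)
    history.append(show(cpv))
print(history)
```

Once the vector reaches all zeros, no path still lacks a clone of M, which is the condition that makes the copy in D the last one.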

  34. When to Move Code? Bharadwaj, Menezes and McKinsey define the usefulness of moving code from an origin block O to a target block T in terms of the likelihood that control will flow through T and O given that control reaches T.
