
VLIW Speculative Trace Scheduling


Presentation Transcript


    1. VLIW Speculative Trace Scheduling University of Tehran, College of Engineering, Department of Electrical & Computer Eng. Advanced Architecture, Fall 2007. Presented on 10 Dec 2007. My website: http://www.behdadh.net University of Tehran: http://www.ut.ac.ir College of Engineering: http://www.eng.ut.ac.ir Department of Electrical & Computer Eng: http://ece.ut.ac.ir

    2. VLIW Speculative Trace Scheduling 2 VLIW Very Long Instruction Word. Multiple operations are packed into one instruction. Each operation slot is reserved for a fixed function. Constant operation latencies are specified. [1]
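
    A minimal sketch of what one instruction word might look like on a hypothetical 4-slot VLIW machine; the slot mix, field widths and the name vliw_word are illustrative assumptions, not taken from the slides.

        #include <stdint.h>

        /* One very long instruction word on a hypothetical 4-slot machine.
         * Each cycle the processor fetches and issues exactly one vliw_word;
         * every field is an operation for a fixed functional unit, and an
         * unused slot is filled with a nop encoding. */
        typedef struct {
            uint32_t int_alu_op;   /* slot 0: integer ALU operation    */
            uint32_t fp_op;        /* slot 1: floating-point operation */
            uint32_t mem_op;       /* slot 2: load/store operation     */
            uint32_t branch_op;    /* slot 3: branch/control operation */
        } vliw_word;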

    3. VLIW Speculative Trace Scheduling 3 VLIW Cont’d Statically scheduled, versus dynamically scheduled multi-issue superscalars. No hazard detection in hardware. Advantages: Simple hardware implementation as a logical extension to RISC. Compilers have a broader view for extracting ILP. [2] Does not burden run-time execution with any inefficiency. [2] Disadvantages: Compilers must be conservative because they can use only compile-time information. [2] Object code compatibility & size. [1]

    4. VLIW Speculative Trace Scheduling 4 Static Scheduling Idea: try to keep the pipeline full (in single-issue pipelines) [2] or utilize all FUs in each cycle (in VLIW) [1] as much as possible, to extract more ILP and therefore higher parallel speedups. Note 1: many of the following techniques also apply to a simple multi-cycle, single-issue, statically scheduled pipeline, though with lower benefit. Methods: Simple code motion [2] Loop unrolling & loop peeling [2][3] Software pipelining [1][2] Global code scheduling (across basic blocks) [3][4]: trace scheduling, superblock scheduling, hyperblock scheduling, speculative trace scheduling. Basic block: a straight-line sequence of instructions containing no branches. Note 2: these methods are not mutually exclusive; some are extensions of others, and some rely primarily on another method.

    5. VLIW Speculative Trace Scheduling 5 Simple Code Motion Idea: change the order of instructions while preserving dependencies and taking care of FU latencies. Done by constructing a data dependence graph and finding its critical path; critical-path instructions are issued as soon as possible, and the result is compacted into instruction words.
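
    A toy sketch of the critical-path idea; the five operations, their latencies and the dependence edges below are invented for illustration. Each operation's scheduling priority is its longest latency-weighted path to the end of the data dependence graph, and ready operations with the greatest height are issued first.

        #include <stdio.h>

        #define N 5
        /* assumed operation latencies: load, add, add, load, store */
        static const int lat[N] = {3, 1, 1, 3, 1};
        /* dep[i][j] != 0 means operation j depends on operation i */
        static const int dep[N][N] = {
            {0, 1, 0, 0, 0},   /* op0 (load)  feeds op1   */
            {0, 0, 1, 0, 0},   /* op1 (add)   feeds op2   */
            {0, 0, 0, 0, 1},   /* op2 (add)   feeds op4   */
            {0, 0, 1, 0, 0},   /* op3 (load)  feeds op2   */
            {0, 0, 0, 0, 0},   /* op4 (store) is a leaf   */
        };

        /* critical-path height: longest latency-weighted path from v to a leaf */
        static int height(int v) {
            int best = 0;
            for (int w = 0; w < N; w++)
                if (dep[v][w]) {
                    int h = height(w);
                    if (h > best) best = h;
                }
            return lat[v] + best;
        }

        int main(void) {
            for (int v = 0; v < N; v++)
                printf("op %d: priority (critical-path height) = %d\n", v, height(v));
            return 0;
        }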

    6. VLIW Speculative Trace Scheduling 6 Simple Code Motion Example [2] (For a single-issue pipeline.) It is called basic pipeline scheduling in single-issue multi-cycle pipelines.

    7. VLIW Speculative Trace Scheduling 7 Loop Unrolling & Peeling Loop unrolling: applied to a loop that tends to iterate many times (guessed from profiles). [3] It simply replicates the loop body multiple times, adjusting the loop termination code, [2] “to generate long, straight-line code sequences” [2] that expose more ILP. Loop peeling: modifies a loop that tends to iterate only a few times (guessed from profiles). [3] The loop body is replaced by straight-line code consisting of the first several iterations; the original loop body is moved to the end of the function to compute any additional iterations.

    8. VLIW Speculative Trace Scheduling 8 Loop Unrolling Example [1]
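
    A source-level C sketch of the transformation; the loop body and the unroll factor of four are illustrative choices, not the slide's own example. The unrolled body is a long straight-line sequence that the scheduler can pack into wide instructions, with a short cleanup loop for the leftover iterations.

        /* original loop */
        void scale_add(float *x, float *y, float a, long n) {
            for (long i = 0; i < n; i++)
                y[i] = a * x[i] + y[i];
        }

        /* unrolled by 4: four copies of the body per iteration expose more
         * independent operations; the termination code is adjusted and a
         * cleanup loop handles the remaining n % 4 iterations */
        void scale_add_unrolled(float *x, float *y, float a, long n) {
            long i = 0;
            for (; i + 3 < n; i += 4) {
                y[i]     = a * x[i]     + y[i];
                y[i + 1] = a * x[i + 1] + y[i + 1];
                y[i + 2] = a * x[i + 2] + y[i + 2];
                y[i + 3] = a * x[i + 3] + y[i + 3];
            }
            for (; i < n; i++)              /* remainder iterations */
                y[i] = a * x[i] + y[i];
        }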

    9. VLIW Speculative Trace Scheduling 9 Loop Peeling Example [3] Some changes are made for the sake of simplicity.
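
    Likewise, a source-level C sketch of peeling the same illustrative loop; the peel count of two is an assumption. The first iterations become explicit straight-line code, and the rarely executed general loop handles any remaining iterations.

        /* peeled by 2: chosen when profiles say the loop almost always runs
         * at most twice, so the common case needs no loop at all and the
         * general loop below is rarely entered */
        void scale_add_peeled(float *x, float *y, float a, long n) {
            if (n > 0) y[0] = a * x[0] + y[0];
            if (n > 1) y[1] = a * x[1] + y[1];
            for (long i = 2; i < n; i++)    /* rarely executed remainder */
                y[i] = a * x[i] + y[i];
        }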

    10. VLIW Speculative Trace Scheduling 10 Software Pipeline A technique for reorganizing loops such that each iteration in the software-pipelined code is made from instructions chosen from different iterations of the original loop. [2] Try to mimic what happens in a dynamically scheduled pipeline. Software pipelining pays startup/wind-down costs only once per loop, not once per iteration. [1]

    11. VLIW Speculative Trace Scheduling 11 Software Pipeline Example [2]
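
    A hand-written C sketch of the idea for y[i] = a * x[i]; the loop and its three-stage load/multiply/store split are my assumptions. The prologue fills the pipeline once, each kernel pass mixes the store of iteration i-2, the multiply of iteration i-1 and the load of iteration i, and the epilogue drains the pipeline, so the start-up and wind-down costs are paid once per loop.

        void scale_swp(float *x, float *y, float a, long n) {
            if (n < 3) {                    /* too short to pipeline */
                for (long i = 0; i < n; i++) y[i] = a * x[i];
                return;
            }
            /* prologue: fill the pipeline */
            float l0 = x[0];                /* load     for iteration 0 */
            float m  = a * l0;              /* multiply for iteration 0 */
            float l  = x[1];                /* load     for iteration 1 */
            /* kernel: store i-2, multiply i-1, load i -- independent work
             * from three different original iterations in each pass */
            long i;
            for (i = 2; i < n; i++) {
                y[i - 2] = m;
                m = a * l;
                l = x[i];
            }
            /* epilogue: drain the pipeline */
            y[n - 2] = m;
            y[n - 1] = a * l;
        }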

    12. VLIW Speculative Trace Scheduling 12 Trace Scheduling Trace: a sequence of instructions that may include branches but does not include loops. [3] Therefore a trace may consist of one or more basic blocks. A trace can also have more than one entry point as well as more than one exit point. For example, some traces in the control flow graph are: [3] B1,B3; B4; B5,B7; B1,B2; B1,B2,B5,B6,B7.

    13. VLIW Speculative Trace Scheduling 13 Trace Scheduling Cont’d Trace scheduling: find a common path and schedule the traces on that path independently. Scheduling within a trace relies on basic code motion, [3] but now it has a global scope across more than one basic block, enabled by appropriate use of renaming. Compensation code is needed for side entry points (i.e. entries other than the beginning) and side exit points (i.e. exits other than the end). [3] Blocks off the common path may now carry added overhead, so there must be a high probability of taking the common path according to the profile (which may not be clear for some programs). Problems: compensation code is difficult to generate, especially for entry points, [3] and the common path may not be the critical path.

    14. VLIW Speculative Trace Scheduling 14 Trace Scheduling Example [3] For example, suppose that B1,B3,B4,B5,B7 is the most frequently executed path. The traces are therefore: B1,B3; B4; B5,B7. Scheduling in a loop can be extended by use of loop unrolling, loop peeling or software pipelining.

    15. VLIW Speculative Trace Scheduling 15 Trace Scheduling Compensation Example 1 [3] Moving an instruction below a side exit (simple)
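
    A source-level C analogue of this case; the variables and the condition are mine. Moving the add below the side exit frees the scheduler on the trace, and a duplicate of the add is placed on the off-trace path as compensation code so both versions compute the same results.

        /* before: the add sits above the side exit, so the off-trace
         * path (the early return) also sees its result */
        int before_motion(int a, int b, int cond) {
            int x = a + b;
            if (cond) return x;     /* side exit of the trace */
            return x * 2;           /* on-trace continuation  */
        }

        /* after: the add is moved below the side exit, so a compensation
         * copy of it is emitted on the off-trace path */
        int after_motion(int a, int b, int cond) {
            if (cond) {
                int x = a + b;      /* compensation code      */
                return x;
            }
            int x = a + b;          /* original add, now below the exit */
            return x * 2;
        }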

    16. VLIW Speculative Trace Scheduling 16 Trace Scheduling Compensation Example 2 [3] Moving an instruction below a side entry (complex)

    17. VLIW Speculative Trace Scheduling 17 Code Motion in Trace Scheduling In addition to the need for compensation code, there are restrictions on moving code within a trace: [2] the dataflow of the program must not change, and the exception behaviour must be preserved. Correct dataflow is guaranteed by maintaining two kinds of dependency: [2] data dependency and control dependency. There are two solutions for eliminating a control dependency: use predicated instructions (hyperblock scheduling) and remove the branch, or use speculative instructions (speculative scheduling) and speculatively move an instruction before the branch. Before hyperblock and speculative scheduling, it is better to talk about the superblock, which is a natural simplification of trace scheduling.

    18. VLIW Speculative Trace Scheduling 18 Superblock Scheduling Superblock: a trace that has no side entrances. [3] Control may only enter from the top but may leave at one or more exit points, i.e. no side entries. Superblock scheduling: just like trace scheduling, but side entries are eliminated by tail duplication. Tail duplication: [3] a copy is made of the tail portion of the trace, from the first side entrance to the end, and all side entrances are moved to the corresponding duplicate basic blocks. The result is simpler compensation code as well as more freedom for code motion during scheduling, but bigger code, especially if the first side entrance is located near the beginning (code size approximately doubles). A natural simplification of trace scheduling.

    19. VLIW Speculative Trace Scheduling 19 Superblock Scheduling Example [3]
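
    A control-flow sketch in C of tail duplication; the block labels B1..B4 and the operations are hypothetical. The branch that re-entered the trace in its middle is redirected to a duplicated copy of the tail, so the trace B1-B2-B3-B4 becomes a single-entry superblock.

        /* before: the taken branch from B1 re-enters the trace at B3,
         * a side entrance into the middle of the trace B1-B2-B3-B4 */
        int before_tail_dup(int c1, int v) {
            /* B1 */ v += 1;
            if (c1) goto B3;
            /* B2 */ v *= 2;
        B3: /* B3 */ v += 10;
            /* B4 */ return v;
        }

        /* after: the tail B3-B4 is copied and the side entrance is
         * redirected to the copy, so B1-B2-B3-B4 has a single entry */
        int after_tail_dup(int c1, int v) {
            /* B1 */ v += 1;
            if (c1) goto B3_copy;
            /* B2 */ v *= 2;
            /* B3 */ v += 10;
            /* B4 */ return v;
        B3_copy:
            v += 10;                /* duplicate of B3 */
            return v;               /* duplicate of B4 */
        }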

    20. VLIW Speculative Trace Scheduling 20 Hyperblock Scheduling Hyperblock: a collection of basic blocks in which multiple control-flow paths are scheduled as a single unit. [4] Like a superblock, it is a single-entry structure with multiple side exits. [4] Hyperblock scheduling: like superblock scheduling, but it uses predication to form the scheduling scope. [4] By (partially) removing branches, control dependencies become data dependencies, and there is more opportunity to exploit ILP. [2] Needs explicit hardware support in the form of predicated instructions. [2] Larger code size, because a predicate indicator (usually a register) is used in each issue slot. Appropriate when execution paths have similar frequency (according to the profile). [3] Predicated instruction: an instruction that executes when its associated predicate is true; otherwise it becomes a nop.
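
    A source-level C sketch of if-conversion; the operations are made up. The branch is removed, both sides execute, and the predicate selects the result, turning the control dependence into a data dependence. On a real predicated ISA each guarded operation would be one predicated instruction that becomes a nop when its predicate is false.

        /* with a branch: r is control dependent on (a > 0) */
        int branchy(int a, int b, int c) {
            int r;
            if (a > 0) r = b + 1;   /* then-path */
            else       r = c - 1;   /* else-path */
            return r;
        }

        /* if-converted: both paths execute, the predicate selects the result */
        int predicated(int a, int b, int c) {
            int p  = (a > 0);       /* predicate register analogue          */
            int r1 = b + 1;         /* would execute under predicate  p     */
            int r2 = c - 1;         /* would execute under predicate !p     */
            return p ? r1 : r2;     /* select, typically a conditional move */
        }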

    21. VLIW Speculative Trace Scheduling 21 Hyperblock Scheduling Example [3] Hyperblock selection criteria: [3] execution frequency, block size, instruction characteristics. Hyperblock formation: [3] tail duplication, node splitting.

    22. VLIW Speculative Trace Scheduling 22 Speculative Scheduling, 1st Approach Simple form: move instruction(s) above the condition evaluation of a branch during trace scheduling. [3] Register renaming plays a critical role in making speculative scheduling effective. [2] Nullifying / correcting code is added after the condition evaluation to cancel the results of a misprediction if needed. This may add overhead if control flow goes the unpredicted way, so profiling must justify the speculation. Hardware support can help the scheduler: [2] the ISA is extended with instructions marked speculative, together with markers at their original place/slot; the hardware may use a ROB, as in dynamic speculation, and commit speculative instructions when the markers are reached; the compiler is then not forced to add nullifying or correcting code when a misprediction happens.

    23. VLIW Speculative Trace Scheduling 23 Speculative Scheduling, 1st Approach Example [3] I have added compensation code to the original chart.

    24. VLIW Speculative Trace Scheduling 24 Speculative Scheduling, 1st Approach Example [2] The second LD is moved above the branch by use of register renaming. In this case, because R14 is overwritten in the else clause, there is no need for cancelling instruction(s). Afterwards, any use of R1 must be changed to R14.
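
    A source-level C analogue of the same idea; the variable names stand in for the slide's R1/R14. The load from the predicted path is hoisted above the branch into a fresh variable (the renamed register). Because the other path simply overwrites that variable, a wrong guess needs no cancelling code; the remaining concern, a spurious exception from the speculative load, is what the ISA markers and hardware support of the previous slide address.

        int speculative_hoist(const int *p, int cond, int other) {
            int t = *p;             /* speculative load, hoisted above the
                                     * branch into a renamed destination   */
            if (!cond)
                t = other;          /* the other path overwrites t, so a
                                     * misprediction is harmless           */
            return t + 1;           /* later uses read the renamed value   */
        }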

    25. VLIW Speculative Trace Scheduling 25 Speculative Scheduling, 2nd Approach Idea: speculatively execute an entire trace, ignoring all branches (there may be more than one), until the end, and then check correctness. The application is first divided into decision trees and then further split into traces. [4] All decision points are removed from the body of the trace, and extra code is inserted at the tail to check for the correct conditions. [4] Traces are chosen for execution according to the profile. Highly optimized for VLIW.

    26. VLIW Speculative Trace Scheduling 26 Decision Trees A decision tree is a tree of the program's control flow. Each leaf of a decision tree ends in a procedure call or a jump to a different tree. [4] There are no side exits from the interior basic blocks of a decision tree, and there is only one entry point, which is the root of the tree. [4] Similar to a superblock, but it contains path alternatives. Predication can be employed in decision trees, similar to hyperblock scheduling. [4]

    27. VLIW Speculative Trace Scheduling 27 Decision Tree Example [4]

    28. VLIW Speculative Trace Scheduling 28 Traces in Decision Trees [4] The tree file is transformed into a trace file, i.e. each decision tree in the tree file is split into its corresponding traces with different probabilities (specified by the profile). These traces are then scheduled independently on the underlying VLIW processor using list scheduling; note that a trace contains no branches. A tail is attached to each trace to check whether the trace was actually taken by the processor. If the prediction is wrong, a rollback to a safe place (usually the start of the decision tree) is made, and the next most probable trace is taken. The rollback facility can be provided by hardware (shadow registers that are committed if the path was correct) or by software (explicitly saving registers at the start of each trace). For example, in the diagram below, ABCF and ABEI are two traces.
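
    A C sketch of one way the software-rollback variant could look; the tree shape, the conditions and the arithmetic are all invented. The most probable trace runs with its branches removed, the removed conditions are tested only at the tail, and on a wrong guess the saved state is restored and control falls back (here simply to the original branchy code, standing in for taking the next probable trace).

        int run_decision_tree(int a, int b, int x) {
            int saved_x = x;                /* software save at trace start */

            /* most probable trace (say the path A-B-E), branches removed */
            x = x + a;                      /* work from block A */
            x = x * 2;                      /* work from block B */
            x = x - b;                      /* work from block E */
            /* tail: check that the removed branches really go this way */
            if (a > 0 && b < 10)
                return x;                   /* speculation correct: commit */

            /* rollback: restore saved state, run the original branchy code */
            x = saved_x;
            x = x + a;                      /* block A */
            if (a > 0) {
                x = x * 2;                  /* block B */
                if (b < 10) x = x - b;      /* block E */
                else        x = x + b;      /* alternative block */
            } else {
                x = x / 2 + 1;              /* alternative subtree */
            }
            return x;
        }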

    29. VLIW Speculative Trace Scheduling 29 A Trace Example in a Decision Tree [4]

    30. VLIW Speculative Trace Scheduling 30 Simulation and Results [4] Three cases are simulated and the results are compared to case 0. Cases 0 and 1 correspond to the default compiler code, which is based on slightly modified trace scheduling with the use of decision trees. The underlying architecture had dynamic branch prediction. As observed in the results, an average minimum speedup of 1.14 is obtained by applying the method (case 2).

    31. VLIW Speculative Trace Scheduling 31 References & Links [1] Joel Emer, Statically Scheduled ILP, CS & AI Laboratory, MIT, 2005. [2] J.L. Hennessy and D. Patterson, Computer Architecture: A Quantitative Approach, 4th Edition, Chapter 2 and Appendix G, 2007. [3] S.M. Moon, Trace Scheduling, Superblock Scheduling and Hyperblock Scheduling, Seoul National University, 2006. [4] M. Agarwal, S.K. Nandy, J. van Eijndhoven and S. Balakrishnan, Speculative Trace Scheduling in VLIW Processors, IEEE Conf. on VLSI in Computers and Processors, 2002. [5] TriMedia SDE 2 documents, available: http://www.tm1300.com/doc/mainmenu.pdf http://en.wikipedia.org/wiki/TriMedia http://www.trimedia.philips.com/

    32. VLIW Speculative Trace Scheduling 32 Q&A
