1 / 49

Dynamic instruction scheduling

Instr. gets stuck here. IF. ID. EX. M. WB. Dynamic instruction scheduling. Key idea: allow subsequent independent instructions to proceed DIVD F0,F2,F4 ; takes long time ADDD F10,F0,F8 ; stalls waiting for F0 SUBD F12,F8,F13 ; Let this instr. bypass the ADDD

yuri
Download Presentation

Dynamic instruction scheduling

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Instr. gets stuck here IF ID EX M WB Dynamic instruction scheduling Key idea: allow subsequent independent instructions to proceed DIVD F0,F2,F4 ; takes long time ADDD F10,F0,F8 ; stalls waiting for F0 SUBD F12,F8,F13 ; Let this instr. bypass the ADDD • Enables out-of-order execution => out-of-order completion • Two historical schemes used in recent machines: • Scoreboard dates back to CDC 6600 in 1963 • Tomasulo in IBM 360/91 in 1967

  2. ID stage Int Mem FP Add FP Mult1 Read operands IF Issue Write FP Mult2 FP Div Scoreboard Scoreboard pipeline • Issue: Decode and check for structural hazards • Read operands: wait until no data hazard, then read operands • All data hazards are handled by the scoreboard mechanism

  3. Scoreboard complications Out-of-order completion => WAR, WAW hazards • WAR: instruction is stalled in the WB stage until a previous instruction has read the operand • WAW: instruction is stalled in the Issue stage until a previous instruction has written its result Scoreboard keeps track of dependencies and state of operations

  4. Scoreboard functionality Issue: Instruction is issued when: • No structural hazard for a functional unit • No WAW with an instruction in execution • Read: Instruction reads operands when • they become available (RAW) • EX: normal execution • Write: Instruction writes when all previous instructions • have read this operand The scoreboard is updated when an instruction proceeds to a new stage

  5. Data structures in the scoreboard 1.Instruction status—keeps track of in which stage an instruction is. • 2. Functional unit status—Indicates the state of the functional unit (FU). 9 fields for each FU: • Busy: Indicates whether the unit is busy or not • Op: Operation to perform in the unit (e.g. add or sub) • Fi: Destination register name • Fj, Fk: Source register names • Qj, Qk: Name of functional unit producing regs Fj, Fk • Rj, Rk: Flags indicating when Fj and Fk are ready 3. Register result status—Indicates which functional unit will write to each register, if any.

  6. Scoreboard example

  7. Detailed Scoreboard Pipeline Control

  8. Scoreboard example, cycle 1

  9. Issue 2nd load? Scoreboard example, cycle 2

  10. Issue MULT? Scoreboard example, cycle 3

  11. Scoreboard example, cycle 4

  12. Scoreboard example, cycle 5

  13. Scoreboard example, cycle 6

  14. Scoreboard example, cycle 7

  15. Scoreboard example, cycle 8a

  16. Scoreboard example, cycle 8

  17. Read operands for MULT & SUB Issue ADDD? Scoreboard example, cycle 9

  18. SUBD completes execution Scoreboard example, cycle 11

  19. Read operands for DIVD? Scoreboard example, cycle 12

  20. Issue ADDD Scoreboard example, cycle 13

  21. Scoreboard example, cycle 14

  22. Can ADDD write result? Scoreboard example, cycle 16

  23. ADDD stalls, waiting for DIVD to read F6 Resolves a WAR hazard! Scoreboard example, cycle 17

  24. Scoreboard example, cycle 19

  25. Scoreboard example, cycle 20

  26. Scoreboard example, cycle 21

  27. Now ADDD can safely write its result in F6 Scoreboard example, cycle 22

  28. Scoreboard example, cycle 61

  29. Scoreboard example, cycle 62

  30. Limitations with scoreboards The scoreboard technique is limited by: • Number of scoreboard entries (window size) • Number and types of functional units • Number of ports to the register bank • Hazards caused by name dependencies Tomasulo’s algorithm addresses the last two limitations

  31. Tomasulo’s Algorithm In IBM 360/91, 4 years after the CDC 6600 Differences between Tomasulo & Scoreboard: • Control & Buffers distributed with FUs (called reservation stations) vs. centralised in Scoreboard • Register names in instructions replaced by pointers to reservation station buffer (HW register renaming) • Common Data Bus broadcasts results to all FUs • Loads and Stores treated as FUs as well Goal: High performance without compiler support This technique has been adopted in many recent machines (e.g. PowerPC)

  32. Hardware Organization

  33. Three stages of Tomasulo’s Alg. • 1. Issue—get instruction from FP Op Queue • Issue if no structural hazard for a reservation station • Normal bus: data + destination • Common Data Bus: data + source (snooping) • 2. Execution—operate on operands (EX) • Execute when both operands are available; if not ready, watch Common Data Bus (CDB) for result • 3. Write result—finish execution (WB) • Write on CDB to all awaiting functional units;mark reservation station available

  34. Tomasulo example, cycle 0

  35. Tomasulo example, cycle 1

  36. Tomasulo example, cycle 2

  37. Tomasulo example, cycle 3

  38. Tomasulo example, cycle 4

  39. Tomasulo example, cycle 5

  40. Tomasulo example, cycle 6

  41. Tomasulo example, cycle 7

  42. Tomasulo example, cycle 8

  43. Tomasulo example, cycle 10

  44. Tomasulo example, cycle 11

  45. Tomasulo example, cycle 15

  46. Tomasulo example, cycle 16

  47. Tomasulo example, cycle 56

  48. Tomasulo example, cycle 57

  49. Example of WAR hazardsin Tomasulo’s Algorithm Example: LF F6, 34(R2) DIVF F10, F6, F0 ADDF F6, F8, F2 • ADDF can safely finish before DIVF has read register F6 because: • DIVF has renamed register F6 to point at LFs functional unit • LFbroadcasts its result on the Common Data Bus • Register renaming can thus be done: • statically by the compiler • dynamically by the hardware

More Related