Tomasulo Algorithm


thy

Presentation Transcript


  1. Tomasulo Algorithm
  • Control and buffers are distributed with the Function Units (FUs)
  • FU buffers, called “reservation stations” (RSs), hold pending operands
  • Registers in instructions are replaced by values or by pointers to reservation stations; this is called register renaming
    • Avoids WAR and WAW hazards
    • There are more reservation stations than registers, so the hardware can do optimizations that compilers cannot
  • Results go to the FUs from the RSs, not through the registers, over the Common Data Bus (CDB), which broadcasts results to all FUs
  • Loads and stores are treated as FUs with RSs as well
  • Integer instructions can go past branches, allowing FP ops beyond the basic block to enter the FP queue
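The renaming idea can be sketched in a few lines of Python. This is a hypothetical illustration, not code from the slides: the `rename` helper walks the six-instruction example program from the walkthrough slides in issue order, replaces each source register with the tag of its latest producing station (load1, mult1, add1, ... as on the slides), and only then points the destination at the new tag. It deliberately ignores timing; in hardware, a source whose producer has already finished reads the register value instead of a tag. The integer base registers R2 and R3 are assumed always ready.

```python
# Hypothetical sketch: static register renaming of the example program.
# Each entry is (tag, op, dst, srcs); tags follow the slides' RS names.

def rename(program):
    """Return the renamed program and the final register result status."""
    status = {}    # register -> tag of the latest producer issued so far
    renamed = []
    for tag, op, dst, srcs in program:
        # Read sources through the current mapping BEFORE renaming dst,
        # so an instruction that reads and writes one register sees the old producer.
        operands = tuple(status.get(r, r) for r in srcs)
        renamed.append((tag, op, operands))
        status[dst] = tag  # later readers of dst now wait on this tag
    return renamed, status


program = [
    ("load1", "LD",    "F6",  ("R2",)),   # R2/R3: integer base registers, assumed ready
    ("load2", "LD",    "F2",  ("R3",)),
    ("mult1", "MULTD", "F0",  ("F2", "F4")),
    ("add1",  "SUB",   "F8",  ("F6", "F2")),
    ("mult2", "DIVD",  "F10", ("F0", "F6")),
    ("add2",  "ADD",   "F6",  ("F8", "F2")),
]

renamed, status = rename(program)
for tag, op, operands in renamed:
    print(f"{tag:6} {op:6} {operands}")
print(status)
```

DIVD's F6 operand becomes load1, not add2, so the later ADD writing F6 cannot disturb it (no WAR hazard), and the two writers of F6 get distinct tags with the status table ending at add2 (no WAW hazard).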

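  2. [Figure: Tomasulo organization. Instructions enter the FP operation queue from the instruction unit. The FP registers, the load buffers (6 entries, fed from memory), and the store buffers (3 entries, to memory) connect over the operand buses and the operation bus to the reservation stations of the FP adders (3 entries) and the FP multipliers (2 entries). Results return over the Common Data Bus (CDB).]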

  3. Reservation Station Components
  • Op: operation to perform in the unit (e.g., + or -)
  • Vj, Vk: values of the source operands
    • Store buffers have a V field holding the result to be stored
  • Qj, Qk: reservation stations producing the source registers (the value to be written)
    • Note: there are no ready flags as in the Scoreboard; Qj, Qk = 0 means the operand is ready
    • Store buffers only have Qi, the RS producing the result
  • Busy: indicates the reservation station or FU is busy
  • Register result status: indicates which functional unit will write each register, if one exists; blank when no pending instruction will write that register
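One reservation-station entry can be sketched as a Python dataclass. This is a minimal sketch under stated assumptions: `None` stands in for the slide's "Qj, Qk = 0" (no pending producer), and the `issue` helper and the register values in the usage lines are hypothetical, chosen to mirror MULTD F0,F2,F4 issuing while F2 is still pending on load2.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReservationStation:
    """One reservation-station entry with the fields listed on the slide (sketch)."""
    name: str                   # tag this station broadcasts on the CDB, e.g. "mult1"
    busy: bool = False
    op: Optional[str] = None
    vj: Optional[float] = None  # source values; meaningful only when the matching Q is None
    vk: Optional[float] = None
    qj: Optional[str] = None    # producing station's tag; None plays the role of "0" (ready)
    qk: Optional[str] = None

    def ready(self) -> bool:
        # No separate ready flags: an operand is ready exactly when its Q field is empty.
        return self.busy and self.qj is None and self.qk is None


def issue(rs, op, src_j, src_k, dst, regs, reg_status):
    """Hypothetical issue step: fill a free station and update register result status."""
    if rs.busy:
        raise RuntimeError("structural hazard: reservation station is busy")
    rs.busy, rs.op = True, op
    # Each source either waits on a pending producer's tag or copies the register value.
    rs.qj = reg_status.get(src_j)
    rs.vj = regs[src_j] if rs.qj is None else None
    rs.qk = reg_status.get(src_k)
    rs.vk = regs[src_k] if rs.qk is None else None
    reg_status[dst] = rs.name   # dst will now be written by this station


# MULTD F0,F2,F4 issuing while F2 still waits on load2 (register values hypothetical).
regs = {"F2": 0.0, "F4": 2.5}
reg_status = {"F6": "load1", "F2": "load2"}
mult1 = ReservationStation("mult1")
issue(mult1, "MULTD", "F2", "F4", "F0", regs, reg_status)
print(mult1, mult1.ready())
```

Keeping the "ready" test as a property of the Q fields, rather than a stored flag, matches the slide's point that Tomasulo needs no separate ready bits.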

  4. Three Stages of Tomasulo Algorithm
    1. Issue: get an instruction from the FP operation queue
       If a reservation station is free (no structural hazard), control issues the instruction and sends the operands (renames registers)
    2. Execution: operate on the operands (EX)
       When both operands are ready, execute; if not ready, watch the Common Data Bus for the result
    3. Write result: finish execution (WB)
       Write on the Common Data Bus to all awaiting units; mark the reservation station available
  • Common data bus: data + source (“come from” bus)
    • 64 bits of data + 4 bits of functional-unit source address
    • A waiting unit captures the value if the source matches the functional unit it expects to produce the result
    • The producing unit does the broadcast
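The slide gives the CDB width as 64 data bits plus a 4-bit source address. The sketch below pictures that 68-bit word as a Python integer. The bit layout (tag in the high bits) and reserving tag 0 for "no producer" (matching Qj, Qk = 0 on the previous slide) are assumptions, not details from the slides; fifteen nonzero tags would be enough for the 6 load buffers and 5 FP reservation stations in the organization figure. The `pack_cdb`, `unpack_cdb`, and `matches` names are hypothetical.

```python
import struct

TAG_BITS = 4     # functional-unit source address width, per the slide
DATA_BITS = 64   # one IEEE 754 double-precision value


def pack_cdb(tag: int, value: float) -> int:
    """Pack (tag, value) into one 68-bit CDB word; tag in the high bits (assumed layout)."""
    if not 0 < tag < (1 << TAG_BITS):
        raise ValueError("tag must be 1..15; 0 is reserved for 'no producer'")
    data = struct.unpack("<Q", struct.pack("<d", value))[0]  # raw bits of the double
    return (tag << DATA_BITS) | data


def unpack_cdb(word: int) -> tuple:
    """Split a CDB word back into (tag, value)."""
    tag = word >> DATA_BITS
    data = word & ((1 << DATA_BITS) - 1)
    return tag, struct.unpack("<d", struct.pack("<Q", data))[0]


def matches(word: int, expected_tag: int) -> bool:
    """A waiting unit captures the data only if the source tag is the one it expects."""
    return word >> DATA_BITS == expected_tag


word = pack_cdb(3, 1.5)
print(hex(word), unpack_cdb(word), matches(word, 3), matches(word, 5))
```

Because every waiting unit compares the same tag field, one broadcast can satisfy any number of consumers in a single cycle.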

  5. Tomasulo Example: Cycle 0
  Program:
    LD    F6,34(R2)
    LD    F2,45(R3)
    MULTD F0,F2,F4
    SUB   F8,F6,F2
    DIVD  F10,F0,F6
    ADD   F6,F8,F2
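Before stepping through the cycles, it helps to list which hazards the example program contains. The sketch below is a hypothetical helper, not part of the slides: it scans every instruction pair and reports RAW, WAR, and WAW dependences on a register, skipping pairs separated by an intervening write to that register.

```python
# (instruction text, destination, sources) in program order
PROGRAM = [
    ("LD F6,34(R2)",   "F6",  ("R2",)),
    ("LD F2,45(R3)",   "F2",  ("R3",)),
    ("MULTD F0,F2,F4", "F0",  ("F2", "F4")),
    ("SUB F8,F6,F2",   "F8",  ("F6", "F2")),
    ("DIVD F10,F0,F6", "F10", ("F0", "F6")),
    ("ADD F6,F8,F2",   "F6",  ("F8", "F2")),
]


def hazards(program):
    """List (kind, earlier, later, register) for every pair with no intervening write."""
    def written_between(reg, i, j):
        return any(program[k][1] == reg for k in range(i + 1, j))

    found = []
    for i, (ins_i, dst_i, srcs_i) in enumerate(program):
        for j in range(i + 1, len(program)):
            ins_j, dst_j, srcs_j = program[j]
            if dst_i in srcs_j and not written_between(dst_i, i, j):
                found.append(("RAW", ins_i, ins_j, dst_i))   # true dependence
            if dst_j in srcs_i and not written_between(dst_j, i, j):
                found.append(("WAR", ins_i, ins_j, dst_j))   # removed by renaming
            if dst_j == dst_i and not written_between(dst_i, i, j):
                found.append(("WAW", ins_i, ins_j, dst_i))   # removed by renaming
    return found


for kind, earlier, later, reg in hazards(PROGRAM):
    print(f"{kind}: {earlier:15} -> {later:15} on {reg}")
```

For this program the helper finds seven RAW dependences, two WAR hazards (SUB and DIVD read F6 before ADD writes it), and one WAW hazard (both LD F6 and ADD write F6). Only the RAW dependences force waiting in Tomasulo's scheme; register renaming removes the other three.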

  6. Cycle 0
  • FP operation queue: LD F6,34(R2) is next to issue
  • Load buffers, store buffers, and reservation stations: all empty

  7. Cycle 1
  • LD F6,34(R2) issues to load buffer 1 (load1), address 34+R2
  • Register result status: F6 = load1
  • Next in the FP operation queue: LD F2,45(R3)

  8. Cycle 2
  • LD F2,45(R3) issues to load buffer 2 (load2), address 45+R3; load1 still holds 34+R2
  • Register result status: F2 = load2, F6 = load1
  • Next in the FP operation queue: MULTD F0,F2,F4

  9. Cycle 3
  • MULTD F0,F2,F4 issues to multiplier station 1 (mult1): Qj = load2 (F2 still pending), Vk = value of F4
  • load1 now holds Mem[34+R2]; load2 still holds address 45+R3
  • Register result status: F0 = mult1, F2 = load2, F6 = load1
  • Next in the FP operation queue: SUB F8,F6,F2

  10. Cycle 4
  • load1 broadcasts on the CDB (L1: Mem[34+R2]); F6 receives Mem[34+R2] and its status entry clears
  • SUB F8,F6,F2 issues to adder station 1 (add1) with Qj = load1, Qk = load2; add1 captures Mem[34+R2] from the CDB in the same cycle
  • load2 now holds Mem[45+R3]
  • Register result status: F0 = mult1, F2 = load2, F8 = add1
  • Next in the FP operation queue: DIVD F10,F0,F6

  11. Cycle 5
  • load2 broadcasts on the CDB (L2: Mem[45+R3]); F2, add1 (SUB), and mult1 (MULTD) all capture Mem[45+R3], and F2's status entry clears
  • DIVD F10,F0,F6 issues to mult2: Qj = mult1 (F0 pending), Vk = Mem[34+R2] (the current value of F6)
  • Register result status: F0 = mult1, F8 = add1, F10 = mult2
  • Next in the FP operation queue: ADD F6,F8,F2

  12. Cycle 6
  • ADD F6,F8,F2 issues to add2: Qj = add1 (F8 pending), Vk = Mem[45+R3]; F6 is renamed to add2
  • add1 (SUB, Vj = Mem[34+R2], Vk = Mem[45+R3]) and mult1 (MULTD, Vj = Mem[45+R3], Vk = F4) have both operands and begin executing
  • mult2 (DIVD) holds Qj = mult1, Vk = Mem[34+R2]; it already captured the old F6 value, so renaming F6 to add2 causes no WAR hazard
  • Register result status: F0 = mult1, F6 = add2, F8 = add1, F10 = mult2
  • FP operation queue: empty

  13. Cycle 7
  • No change from cycle 6: SUB (add1) and MULTD (mult1) are still executing

  14. Cycle 8
  • add1 broadcasts the SUB result on the CDB (Add1: M()-M()); F8 receives M()-M() and add2 captures it as Vj
  • Register result status: F0 = mult1, F6 = add2, F10 = mult2
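The write-result step in cycle 8 can be sketched as a single broadcast function. This is a minimal sketch, not the slides' hardware: the `RS` class and `broadcast` helper are hypothetical, values are kept symbolic as on the slides, and the station contents are the ones listed for cycles 6 to 8. Every station whose Qj or Qk equals the broadcast tag captures the value, and a register is written only if its result status still names that tag.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RS:
    name: str
    op: str
    vj: Optional[str] = None   # values kept symbolic, as on the slides
    vk: Optional[str] = None
    qj: Optional[str] = None   # tag of the station that will produce the operand
    qk: Optional[str] = None


def broadcast(tag, value, stations, regs, reg_status):
    """Deliver one CDB result to every unit waiting on `tag` (sketch)."""
    for rs in stations:
        if rs.qj == tag:
            rs.vj, rs.qj = value, None
        if rs.qk == tag:
            rs.vk, rs.qk = value, None
    # Only registers whose latest producer is `tag` are written; a register already
    # renamed to a newer station keeps waiting for that station (no WAW overwrite).
    for reg, producer in list(reg_status.items()):
        if producer == tag:
            regs[reg] = value
            del reg_status[reg]


# State just before add1 writes in cycle 8 (symbolic values follow the slides).
add2 = RS("add2", "ADD", qj="add1", vk="Mem[45+R3]")
mult1 = RS("mult1", "MULTD", vj="Mem[45+R3]", vk="F4")
mult2 = RS("mult2", "DIVD", qj="mult1", vk="Mem[34+R2]")
regs = {}
reg_status = {"F0": "mult1", "F6": "add2", "F8": "add1", "F10": "mult2"}

broadcast("add1", "M()-M()", [add2, mult1, mult2], regs, reg_status)
print(add2)
print(regs, reg_status)
```

After the call, add2 holds both operands and can start executing, F8 holds M()-M(), and mult1 and mult2 are untouched because neither was waiting on add1.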

  15. Cycle 9
  • add1 is free; add2 (ADD, Vj = M()-M(), Vk = Mem[45+R3]) has both operands and begins executing
  • mult1 (MULTD) is still executing; mult2 (DIVD) still waits on mult1
  • Register result status: F0 = mult1, F6 = add2, F10 = mult2

  16. Cycle 10
  • No change from cycle 9: ADD (add2) and MULTD (mult1) are still executing

  17. Cycle 11
  • add2 broadcasts the ADD result on the CDB (Add2: (M()-M())+M()); F6 receives (M()-M())+M() and its status entry clears
  • Register result status: F0 = mult1, F10 = mult2

  18. Cycle 12
  • add2 is free; only mult1 (MULTD, executing) and mult2 (DIVD, waiting on mult1) are busy
  • Register result status: F0 = mult1, F10 = mult2

  19. Cycle 13
  • No change from cycle 12: MULTD (mult1) is still executing

  20. Cycle 14
  • No change from cycle 12: MULTD (mult1) is still executing

  21. Cycle 15
  • No change from cycle 12: MULTD (mult1) is still executing

  22. Cycle 16
  • mult1 broadcasts the MULTD result on the CDB (Mult1: M()*F4); F0 receives M()*F4 and mult2 captures it as Vj
  • Register result status: F10 = mult2

  23. Cycle 17
  • mult1 is free; mult2 (DIVD, Vj = M()*F4, Vk = Mem[34+R2]) has both operands and begins executing
  • Register result status: F10 = mult2

  24. Cycle 57
  • mult2 broadcasts the DIVD result on the CDB (Mult2: M()*F4 / M()); F10 receives it, and all six instructions have completed
  • Register result status: empty
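The cycle numbers in the walkthrough follow from a few simple rules, sketched below as a hypothetical timing model rather than a full simulator. Assumptions, inferred from the slides rather than stated on them: one instruction issues per cycle in order and always finds a free station (true here); execution starts the cycle after issue or the cycle after the last operand appears on the CDB, whichever is later; and the cycles from execution start to the CDB write are 2 for loads, 2 for SUB/ADD, 10 for MULTD, and 40 for DIVD. CDB contention is not modeled because no two results in this example are written in the same cycle.

```python
# Cycles from the start of execution to the CDB write, inferred from the slides (assumption).
LATENCY = {"LD": 2, "ADD": 2, "SUB": 2, "MULTD": 10, "DIVD": 40}

# (op, destination, FP sources); loads list no FP sources (base registers assumed ready).
PROGRAM = [
    ("LD",    "F6",  []),
    ("LD",    "F2",  []),
    ("MULTD", "F0",  ["F2", "F4"]),
    ("SUB",   "F8",  ["F6", "F2"]),
    ("DIVD",  "F10", ["F0", "F6"]),
    ("ADD",   "F6",  ["F8", "F2"]),
]


def schedule(program):
    """Return (issue, exec_start, write_result) cycles for each instruction (sketch)."""
    producer = {}   # register result status: register -> index of latest issued writer
    times = []
    for i, (op, dst, srcs) in enumerate(program):
        issue = i + 1  # in-order issue, one per cycle from cycle 1; assumes a free RS
        # An operand captured from the CDB can be used starting the cycle after the write;
        # a register with no recorded writer is already available at issue.
        ready = [times[producer[r]][2] + 1 for r in srcs if r in producer]
        start = max([issue + 1] + ready)
        times.append((issue, start, start + LATENCY[op]))
        producer[dst] = i  # rename dst only after reading the sources
    return times


for (op, dst, _), (iss, start, write) in zip(PROGRAM, schedule(PROGRAM)):
    print(f"{op:6} {dst:4} issue {iss:2}  execute {start:2}-{write - 1:2}  write {write:2}")
```

Under these assumptions the model reproduces the write-result cycles on the slides: 4 and 5 for the loads, 8 for SUB, 11 for ADD, 16 for MULTD, and 57 for DIVD. Because DIVD looks up F6 before ADD renames it, DIVD waits only on MULTD, which is the WAR-free behavior shown in cycles 6 through 17.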
