
Lecture 07: Pipelining Multicycle, MIPS R4000, and More

Kai Bu, kaibu@zju.edu.cn, http://list.zju.edu.cn/kaibu/comparch2016


Presentation Transcript


  1. Lecture 07: Pipelining Multicycle, MIPS R4000, and More • Kai Bu • kaibu@zju.edu.cn • http://list.zju.edu.cn/kaibu/comparch2016

  2. Integer Op in 1 CC IF ID EX MEM WB

  3. What about floating-point operation?

  4. FP Operation • Floating-point (FP) operations take more time than integer operations do • To complete an FP op in 1 CC: a slow clock? a lot of logic in the FP units?

  5. Multicycle FP Operation • The FP pipeline allows a longer latency for operations; two changes over the integer pipeline: repeat EX; use multiple FP functional units

  6. FP Pipeline

  7. Preview • Multicycle FP Operations • Hazards and Forwarding • Example: MIPS R4000 Pipeline

  8. Appendix C.5-C.7

  9. How are FP operations pipelined?

  10. FP Pipeline • Integer unit: loads and stores, integer ALU operations, branches • FP and integer multiplier • FP adder: FP add, FP subtract, FP conversion • FP and integer divider • Changes over the integer pipeline: repeat EX; use multiple FP units

  11. FP Pipeline • EX is not pipelined • Until the previous instruction leaves EX, no other instruction using that functional unit may issue • If an instruction cannot proceed to EX, the entire pipeline behind that instruction will be stalled

  12. Latency & Initiation/Repeat Interval • Latency: the number of intervening cycles between an instruction that produces a result and an instruction that uses the result • Initiation/repeat interval: the number of cycles that must elapse between issuing two operations of a given type

  13. Latency & Initiation/Repeat Interval • Essentially, pipeline latency is 1 cycle less than the depth of the execution pipeline, which is the number of stages from the EX stage to the stage that produces the result
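
The rule on slide 13 can be checked with a few lines of Python. This is only a sketch: the stage counts are assumed from the standard textbook configuration (Appendix C, referenced on slide 8), and `EX_DEPTH` and `latency` are hypothetical names, not part of the lecture.

```python
# Depth of the execution pipeline: number of stages from EX to the stage
# that produces the result (assumed textbook configuration).
EX_DEPTH = {
    "integer ALU": 1,   # EX
    "load": 2,          # EX, MEM
    "FP add": 4,        # A1..A4
    "FP multiply": 7,   # M1..M7
}

def latency(unit: str) -> int:
    """Pipeline latency = depth of the execution pipeline - 1."""
    return EX_DEPTH[unit] - 1

for unit, depth in EX_DEPTH.items():
    print(f"{unit}: depth {depth}, latency {latency(unit)}")
```

Under these assumptions an FP add has latency 3 and an FP multiply latency 6, while an integer ALU result can be used by the very next instruction.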

  14. Generalized FP Pipeline • EX is pipelined (except for the FP divider) • Additional pipeline registers, e.g., ID/A1 • FP divider: 24 CCs
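
A minimal sketch of how the initiation interval limits back-to-back issue on the generalized pipeline. The intervals are assumptions taken from the textbook configuration (1 for the pipelined adder and multiplier, 25 for the unpipelined divider, whose latency is 24); `issue_cycles` is a hypothetical helper that ignores every other kind of hazard.

```python
# Assumed initiation intervals (cycles between two operations of one type).
INITIATION_INTERVAL = {"ADD.D": 1, "MUL.D": 1, "DIV.D": 25}

def issue_cycles(ops, first_cycle=1):
    """Return the earliest cycle in which each op can issue, in program order."""
    next_free = {}   # op type -> first cycle in which that type may issue again
    cycles = []
    cycle = first_cycle
    for op in ops:
        cycle = max(cycle, next_free.get(op, first_cycle))
        cycles.append(cycle)
        next_free[op] = cycle + INITIATION_INTERVAL[op]
        cycle += 1   # in-order issue: at most one instruction per cycle
    return cycles

print(issue_cycles(["MUL.D", "MUL.D", "MUL.D"]))
print(issue_cycles(["DIV.D", "ADD.D", "DIV.D"]))
```

With these numbers, three multiplies issue in consecutive cycles, but a second divide has to wait until the divider is free, and in-order issue holds back everything behind it.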

  15. Generalized FP Pipeline • Example (figure legend): italics mark the stage where data is needed; bold marks the stage where a result is available

  16. Generalized FP Pipeline • Example (figure legend): italics mark the stage where data is needed; bold marks the stage where a result is available; the cycles in between are the intervening cycles

  17. Any FP pipeline hazards?

  18. Structural Hazard • Divider is not fully pipelined – structural hazard

  19. Structural Hazard • Instructions have varying running times, so more than one register write may be required in a cycle - structural hazard
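
The write-port conflict on slide 19 can be reproduced with a small calculation. This is a sketch under stated assumptions: stage counts follow the textbook configuration (M1..M7, A1..A4, one EX stage for loads), there is a single register write port, no instruction stalls, and the fetch cycles in `program` are illustrative values.

```python
from collections import defaultdict

EX_STAGES = {"MUL.D": 7, "ADD.D": 4, "L.D": 1}   # M1..M7, A1..A4, EX

def wb_cycle(fetch_cycle, op):
    """Cycle in which an unstalled instruction reaches WB.

    Stages are IF, ID, the EX stages, MEM, WB, one cycle each."""
    return fetch_cycle + 1 + EX_STAGES[op] + 2

def write_port_conflicts(program):
    """program: list of (fetch_cycle, op).

    Returns {cycle: [ops]} for cycles with more than one register write."""
    by_cycle = defaultdict(list)
    for fetch_cycle, op in program:
        by_cycle[wb_cycle(fetch_cycle, op)].append(op)
    return {c: ops for c, ops in by_cycle.items() if len(ops) > 1}

program = [(1, "MUL.D"), (4, "ADD.D"), (7, "L.D")]
print(write_port_conflicts(program))
```

All three instructions reach WB in cycle 11 even though they were fetched three cycles apart, which is exactly the situation a single write port cannot serve.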

  20. Structural Hazards

  21. Structural Hazards • Interlock Detection • Method 1: track the use of the write port in the ID stage and stall an instruction before it issues :: a shift register tracks when already-issued instructions will use the register file; if the instruction in ID needs to use the register file at the same time, stall it
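
Method 1 can be sketched as a reservation shift register. The class below is an illustration, not the hardware design: `WritePortTracker` and its methods are hypothetical names, and the distances from ID to WB (9 for MUL.D, 6 for ADD.D) assume the textbook stage counts plus MEM and WB.

```python
class WritePortTracker:
    """Bit i of `reserved` is set if the write port is taken i cycles from now."""

    def __init__(self):
        self.reserved = 0

    def tick(self):
        """Advance one clock cycle: every reservation moves one cycle closer."""
        self.reserved >>= 1

    def try_issue(self, cycles_until_wb):
        """Reserve the write port for the instruction in ID.

        Returns False if the slot is already taken, i.e. the instruction stalls."""
        mask = 1 << cycles_until_wb
        if self.reserved & mask:
            return False
        self.reserved |= mask
        return True

tracker = WritePortTracker()
tracker.try_issue(9)            # MUL.D in ID: WB is 9 cycles away
for _ in range(3):
    tracker.tick()              # three cycles later ...
print(tracker.try_issue(6))     # ... ADD.D in ID would collide, so it stalls
tracker.tick()
print(tracker.try_issue(6))     # one cycle later the slot is free
```

The check happens entirely in ID, so an instruction that would collide never issues; the price is the extra shift register and the write-conflict logic.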

  22. Structural Hazards • Interlock Detection • Method 2: stall a conflicting instruction when it tries to enter MEM or WB :: either the issuing or the already-issued instruction could be stalled; give priority to the unit with the longest latency; more complicated, because the stall now arises at MEM or WB rather than in ID

  23. WAW Hazard • Instructions no longer reach WB in order – Write after write (WAW) hazard

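  24. WAW Hazards • If L.D were issued one cycle earlier, it would write F2 one cycle earlier than ADD.D – WAW hazard • What if another instruction uses F2 between them? Then there is no WAW hazard: the RAW dependence on ADD.D stalls the pipeline, so L.D cannot issue until ADD.D completes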
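
A small sketch of the WAW case on slide 24. The stage counts are assumed from the textbook configuration, the fetch cycles are illustrative, and `waw_hazards` is a hypothetical helper that only compares unstalled WB cycles.

```python
EX_STAGES = {"ADD.D": 4, "L.D": 1}   # A1..A4, EX

def wb_cycle(fetch_cycle, op):
    """WB cycle of an unstalled instruction: IF, ID, EX stages, MEM, WB."""
    return fetch_cycle + EX_STAGES[op] + 3

def waw_hazards(program):
    """program: list of (fetch_cycle, op, dest) in program order.

    Returns (earlier_op, later_op, dest) whenever the later instruction
    would write the same register before the earlier one does."""
    hazards = []
    for i, (f1, op1, d1) in enumerate(program):
        for f2, op2, d2 in program[i + 1:]:
            if d1 == d2 and wb_cycle(f2, op2) < wb_cycle(f1, op1):
                hazards.append((op1, op2, d1))
    return hazards

# L.D fetched three cycles after ADD.D: both write F2 in the same cycle
# (a write-port conflict, but not a WAW hazard).
print(waw_hazards([(4, "ADD.D", "F2"), (7, "L.D", "F2")]))
# L.D fetched one cycle earlier: it writes F2 before ADD.D does.
print(waw_hazards([(4, "ADD.D", "F2"), (6, "L.D", "F2")]))
```

If the hazard went undetected, F2 would end up holding the ADD.D result instead of the loaded value, the opposite of program order.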

  25. RAW Hazard • Longer latency of operations – more frequent stalls for read after write (RAW) hazards

  26. RAW Hazards
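
To see why longer latencies mean more RAW stalls, the stall count can be estimated from the latency alone. This is a simplified sketch: the latencies are the assumed textbook values, forwarding is assumed, the consumer is taken to need its operand at the start of its first execution stage, and `raw_stalls` is a hypothetical helper.

```python
# Assumed latencies: intervening cycles needed between producer and consumer.
LATENCY = {"L.D": 1, "ADD.D": 3, "MUL.D": 6}

def raw_stalls(producer, distance):
    """Stall cycles for a dependent instruction issued `distance`
    instructions after the producer (1 = immediately following)."""
    intervening = distance - 1
    return max(0, LATENCY[producer] - intervening)

print(raw_stalls("L.D", 1))     # load followed directly by its consumer
print(raw_stalls("MUL.D", 1))   # multiply followed directly by its consumer
print(raw_stalls("ADD.D", 4))   # three independent instructions in between
```

Every independent instruction placed between producer and consumer removes one stall, which is why scheduling matters more for FP code than for integer code.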

  27. Hazard: Exceptions • Instructions may complete in a different order than they were issued – exceptions

  28. How to detect and solve pipeline hazards?

  29. Hazard Detection in ID • 1. Check for structural hazards: wait until the required functional unit is not busy (only needed for divides); make sure the register write port is available when it will be needed

  30. Hazard Detection in ID • 2. Check for RAW data hazards: wait until the source registers are available when needed, that is, until they are no longer pending destinations of issued instructions

  31. Hazard Detection in ID • 3. Check for WAW data hazards: determine whether any instruction in A1-A4, D, or M1-M7 has the same register destination as this instruction; if so, stall the issue of the instruction in ID
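
The three checks on slides 29-31 can be sketched as a single issue test. This is a simplification under stated assumptions: every in-flight result is treated as not yet available (forwarding timing is ignored), and `Instr`, `can_issue`, and the two boolean flags are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Instr:
    op: str
    dest: Optional[str]
    srcs: Tuple[str, ...] = ()

def can_issue(instr, in_flight, divider_busy, write_port_free):
    """Return (ok, reason) for the instruction currently in ID.

    in_flight: issued instructions still in A1-A4, D, or M1-M7."""
    # 1. Structural hazards: busy divider, or write port taken when needed.
    if instr.op == "DIV.D" and divider_busy:
        return False, "structural: divider busy"
    if instr.dest is not None and not write_port_free:
        return False, "structural: write port taken"
    pending = {i.dest for i in in_flight if i.dest is not None}
    # 2. RAW: a source register is a pending destination.
    if any(src in pending for src in instr.srcs):
        return False, "RAW"
    # 3. WAW: the destination register is a pending destination.
    if instr.dest in pending:
        return False, "WAW"
    return True, "issue"

mul = Instr("MUL.D", "F0", ("F4", "F6"))
print(can_issue(Instr("ADD.D", "F2", ("F0", "F8")), [mul], False, True))
print(can_issue(Instr("L.D", "F0", ("R2",)), [mul], False, True))
```

Doing all three checks in ID keeps the rest of the pipeline simple: once an instruction issues, it never has to stall again.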

  32. Forwarding • Generalized with more sources: EX/MEM, A4/MEM, M7/MEM, D/MEM, MEM/WB -> source registers of an FP instruction

  33. Out-of-order Completion • ADD and SUB complete before DIV • Out-of-order completion: instructions are completing in a different order than they were issued

  34. Out-of-order Completion • How to deal with out-of-order completion? • 1. ignore the problem • 2. buffer the results of an operation until all the operations issued earlier complete • 3. track which operations were in the pipeline and their PCs • 4. issue an instruction only if it is certain that all previous instructions will complete without exception

  35. All in MIPS R4000

  36. MIPS R4000: • 5-stage -> 8-stage • Higher clock rate

  37. IF MIPS R4000: • IF: first half of instruction fetch; PC selection; initiation of instruction cache access;

  38. IS MIPS R4000: • IS: second half of instruction fetch; completion of instruction cache access;

  39. RF MIPS R4000: • RF: instruction decode and register fetch; hazard checking; instruction cache hit detection;

  40. EX MIPS R4000: • EX: execution; effective address calculation; ALU operation; branch-target computation and condition evaluation;

  41. DF MIPS R4000: • DF: data fetch; first half of data cache access;

  42. DS MIPS R4000: • DS: second half of data fetch; completion of data cache access;

  43. TC MIPS R4000: • TC: tag check; determine whether the data cache access hit;

  44. WB MIPS R4000: • WB: write back for loads and register-register operations;
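
The eight stages listed on slides 37-44 can be laid out as a simple timing diagram. This is only an illustrative sketch with no stalls; `pipeline_diagram` and `total_cycles` are hypothetical helpers.

```python
STAGES = ["IF", "IS", "RF", "EX", "DF", "DS", "TC", "WB"]

def pipeline_diagram(n_instructions):
    """One row per instruction; instruction i enters IF in cycle i + 1."""
    return [["--"] * i + STAGES for i in range(n_instructions)]

def total_cycles(n_instructions):
    """Cycles needed to finish n instructions when nothing stalls."""
    return len(STAGES) + n_instructions - 1

for i, row in enumerate(pipeline_diagram(3)):
    print(f"i{i}: " + " ".join(row))
print("cycles:", total_cycles(3))
```

Each row is shifted one cycle to the right, so up to eight instructions are in flight at once.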

  45. Load Delay • 2-cycle load delay

  46. Load Delay • 2-cycle load delay
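
The 2-cycle load delay follows from the stage positions. The sketch below assumes that load data is forwarded at the end of DS and that a dependent instruction needs it at the start of EX; `load_use_stalls` is a hypothetical helper.

```python
STAGES = ["IF", "IS", "RF", "EX", "DF", "DS", "TC", "WB"]

# Data is available at the end of DS, two stages after the load's EX.
LOAD_DELAY = STAGES.index("DS") - STAGES.index("EX")

def load_use_stalls(distance):
    """Stall cycles for an instruction that uses the loaded value and is
    issued `distance` instructions after the load (1 = immediately after)."""
    return max(0, LOAD_DELAY - (distance - 1))

for d in (1, 2, 3):
    print(f"use {d} instruction(s) after the load: {load_use_stalls(d)} stall(s)")
```

Two independent instructions placed after the load hide the delay completely.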

  47. Branch Delay • 3-cycle branch delay: • predicted-not-taken

  48. Branch Delay • 3-cycle branch delay: predicted-not-taken • Figure panels: taken branch; untaken branch
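
A rough sketch of what the 3-cycle branch delay costs. It assumes the textbook description of the R4000 (the branch condition is evaluated in EX, there is one architectural delay slot that is filled usefully, and predicted-not-taken covers the remaining two cycles); the branch frequency and taken fraction in the example are made-up illustrative numbers.

```python
STAGES = ["IF", "IS", "RF", "EX", "DF", "DS", "TC", "WB"]

# The condition is known in EX, three stages after IF.
BRANCH_DELAY = STAGES.index("EX") - STAGES.index("IF")
DELAY_SLOTS = 1   # assumption: one delay slot, always filled usefully

def idle_cycles(taken):
    """Wasted cycles per branch under predicted-not-taken."""
    return BRANCH_DELAY - DELAY_SLOTS if taken else 0

def effective_cpi(base_cpi, branch_freq, taken_frac):
    """CPI including branch idle cycles; all inputs are illustrative."""
    return base_cpi + branch_freq * taken_frac * idle_cycles(True)

print(idle_cycles(True), idle_cycles(False))
print(effective_cpi(1.0, 0.2, 0.6))
```

Under these assumptions an untaken branch costs nothing beyond its delay slot, while every taken branch adds two idle cycles.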

  49. Forwarding • Forwarding sources for ALU operations: ALU/MEM or MEM/WB in the 5-stage pipeline -> EX/DF, DF/DS, DS/TC, TC/WB in the R4000

  50. FP Operations • FP Pipeline • FP unit with three functional units: FP divider, FP multiplier, FP adder • FP operations take from 2 cycles to 112 cycles
