EECS 470

EECS 470. Pipeline Hazards. Lecture 4. Coverage: Appendix A. Basic pipelining. Data hazards: what are they, how do you detect them, and how do you deal with them? Micro-architectural changes: pipeline depth, pipeline width, forwarding ISA.


Presentation Transcript


  1. EECS 470 Pipeline Hazards Lecture 4 Coverage: Appendix A

  2. Basic Pipelining • Data hazards • What are they? • How do you detect them? • How do you deal with them? • Micro-architectural changes • Pipeline depth • Pipeline width • Forwarding ISA

  3. [Figure: five-stage pipelined datapath (Fetch, Decode, Execute, Memory, WB) with pipeline registers IF/ID, ID/EX, EX/Mem, Mem/WB; instruction memory, register file (R0-R7), ALU, and data memory; op and dest fields carried down the pipeline, with instruction bit fields 0-2, 16-18, and 22-24 labeled]

  4. [Figure: the same five-stage datapath with an additional mux in the Execute stage and without the bit-field labels]

  5. [Figure: the same five-stage datapath with three fwd signals added in the Execute stage; the dest fields are no longer shown in the pipeline registers]

  6. Pipeline function for ADD • Fetch: read instruction from memory • Decode: read source operands from reg • Execute: calculate sum • Memory: Pass results to next stage • Writeback: write sum into register file

  7. Data Hazards • add 1 2 3 • nand 3 4 5 • Timeline (one stage per cycle): add: fetch, decode, execute, memory, writeback; nand starts one cycle later: fetch, decode, execute, memory, writeback • If not careful, nand will read the wrong value of R3
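
The timing problem on this slide can be sketched in a few lines. This is only an illustration, assuming cycles are numbered from 1, add is fetched in cycle 1, nand in cycle 2, and nothing stalls; the helper name `stage_cycles` is made up for the example.

```python
STAGES = ["fetch", "decode", "execute", "memory", "writeback"]

def stage_cycles(fetch_cycle):
    """Map each stage to the cycle it occupies, assuming no stalls."""
    return {stage: fetch_cycle + i for i, stage in enumerate(STAGES)}

add = stage_cycles(1)    # add 1 2 3, assumed fetched in cycle 1
nand = stage_cycles(2)   # nand 3 4 5, assumed fetched in cycle 2

# nand reads R3 in decode, but add does not write R3 until writeback.
reads_stale_value = nand["decode"] < add["writeback"]
print(add["writeback"], nand["decode"], reads_stale_value)
```

Under these assumptions nand decodes two cycles before add writes back, which is why it would see the old R3.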

  8. Three approaches to handling data hazards • Avoidance • Make sure there are no hazards in the code • Detect and Stall • If hazards exist, stall the processor until they go away. • Detect and Forward • If hazards exist, fix up the pipeline to get the correct value (if possible)

  9. Handling data hazards: avoid all hazards • Assume the programmer (or the compiler) knows about the processor implementation. • Make sure no hazards exist. • Put noops between any dependent instructions. • Example: add 1 2 3 (write R3 in cycle 5), noop, noop, nand 3 4 5 (read R3 in cycle 6)
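
A compiler-side pass for this approach might look like the sketch below. It is not from the lecture: the tuple layout `(op, regA, regB, dest)`, the name `insert_noops`, and the simplification that every non-noop instruction writes its dest field are assumptions for illustration. The gap of 2 noops is taken from the example on this slide.

```python
from typing import List, Tuple

Instr = Tuple[str, int, int, int]   # (op, regA, regB, dest), a hypothetical encoding
NOOP: Instr = ("noop", 0, 0, 0)
GAP = 2  # noops needed between a producer and its consumer (from the slide's example)

def insert_noops(program: List[Instr], gap: int = GAP) -> List[Instr]:
    """Pad the program so no instruction reads a register written by one of
    the `gap` instructions immediately before it."""
    out: List[Instr] = []
    for op, reg_a, reg_b, dest in program:
        needed = 0
        if op != "noop":
            # Walk back over the most recent `gap` slots, nearest first.
            for distance, prev in enumerate(reversed(out[-gap:]), start=1):
                if prev[0] != "noop" and prev[3] in (reg_a, reg_b):
                    needed = max(needed, gap - distance + 1)
        out.extend([NOOP] * needed)
        out.append((op, reg_a, reg_b, dest))
    return out

padded = insert_noops([("add", 1, 2, 3), ("nand", 3, 4, 5)])
print(padded)
```

For the slide's two-instruction program this yields add, noop, noop, nand. A real pass would also need to know which opcodes actually write a register.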

  10. Problems with this solution • Old programs (legacy code) may not run correctly on new implementations • Longer pipelines need more noops • Programs get larger as noops are included • Especially a problem for machines that try to execute more than one instruction every cycle • Intel EPIC: Often 25% - 40% of instructions are noops • Program execution is slower • CPI is one, but some of the instructions are noops
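
The last bullet can be made concrete with a small calculation. This is a sketch under the assumption that every issued instruction, noop or not, takes exactly one cycle; the function name is made up for the example.

```python
def cycles_per_useful_instruction(noop_fraction: float, cpi: float = 1.0) -> float:
    """Cycles spent per non-noop instruction when a given fraction of the
    issued instructions are noops."""
    return cpi / (1.0 - noop_fraction)

low = cycles_per_useful_instruction(0.25)
high = cycles_per_useful_instruction(0.40)
print(round(low, 2), round(high, 2))
```

With the 25% to 40% noop range quoted above, each useful instruction costs roughly 1.33 to 1.67 cycles even though the nominal CPI is one.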

  11. Handling data hazards: detect and stall • Detection: • Compare regA with previous DestRegs • 3 bit operand fields • Compare regB with previous DestRegs • 3 bit operand fields • Stall: • Keep current instructions in fetch and decode • Pass a noop to execute
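
The detection step amounts to a handful of 3-bit equality comparisons. The sketch below is an illustration only: the function name and the use of `None` for a pipeline slot that writes no register are assumptions, not part of the lecture.

```python
from typing import Optional, Sequence

def hazard_detected(reg_a: int, reg_b: int,
                    prev_dests: Sequence[Optional[int]]) -> bool:
    """Compare the decoding instruction's regA and regB (3-bit fields) with the
    dest registers of earlier instructions still in the pipeline."""
    for dest in prev_dests:
        if dest is None:          # noop or instruction with no destination
            continue
        if (reg_a & 0b111) == (dest & 0b111) or (reg_b & 0b111) == (dest & 0b111):
            return True
    return False

# nand 3 4 5 in decode while add 1 2 3 (dest = 3) is one stage ahead.
stall = hazard_detected(3, 4, [3, None])
print(stall)
```

When the function returns true, the stall logic would hold fetch and decode in place and pass a noop to execute, as the slide describes.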

  12. [Figure: datapath at the end of cycle 1. add 1 2 3 is in IF/ID. The register file shows R1 = 14, R2 = 7, R3 = 10.]

  13. [Figure: datapath at the end of cycle 2. nand 3 4 5 is in IF/ID; add is in ID/EX with operand values 14 and 7 and dest 3.]

  14. [Figure: datapath in the first half of cycle 3, with a hazard detection block added. nand 3 4 5 is in IF/ID; add, with dest 3, is in ID/EX.]

  15. [Figure: hazard detection detail. Comparators check the regA and regB fields from IF/ID against the dest value 3 further down the pipeline; the output is "Hazard detected".]

  16. [Figure: bit-level view of one comparator. The 3-bit fields (011, register 3) match, so the output is 1: hazard detected.]

  17. Handling data hazards: detect and stall the pipeline until ready • Detection: • Compare regA with previous DestReg • 3 bit operand fields • Compare regB with previous DestReg • 3 bit operand fields • Stall: • Keep current instructions in fetch and decode • Pass a noop to execute

  18. [Figure: datapath in the first half of cycle 3. The Hazard signal drives two enable (en) inputs so the instructions in fetch and decode are held; nand 3 4 5 is in IF/ID and add is in ID/EX.]

  19. Handling data hazards: detect and stall the pipeline until ready • Detection: • Compare regA with previous DestReg • 3 bit operand fields • Compare regB with previous DestReg • 3 bit operand fields • Stall: • Keep current instructions in fetch and decode • Pass a noop to execute

  20. [Figure: datapath at the end of cycle 3. nand 3 4 5 is held in IF/ID, a noop has been passed to ID/EX, and add is in EX/Mem with ALU result 21.]

  21. [Figure: datapath in the first half of cycle 4. Hazard is asserted again and the enable (en) inputs hold fetch and decode; nand 3 4 5 is in IF/ID, noop in ID/EX, add in EX/Mem.]

  22. [Figure: datapath at the end of cycle 4. nand 3 4 5 is still in IF/ID; noops are in ID/EX and EX/Mem; add is in Mem/WB with result 21.]

  23. [Figure: datapath in the first half of cycle 5. No hazard is detected; nand 3 4 5 is in IF/ID, noops are in ID/EX and EX/Mem, add is in Mem/WB.]

  24. [Figure: datapath at the end of cycle 5. add 3 7 7 is in IF/ID; nand is in ID/EX; noops are in EX/Mem and Mem/WB. R3 now holds 21.]

  25. No more hazard: stalling • add 1 2 3 • nand 3 4 5 • Timeline: add: fetch, decode, execute, memory, writeback; nand: fetch, decode (hazard), decode (hazard), decode, execute • We are careful to get the right value of R3
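
The stalled timeline for nand can be reproduced with a short simulation. This is a sketch, not the lecture's model: it assumes the dependent instruction repeats decode until the cycle in which the producer writes back, and that a register written in a cycle can be read by decode in that same cycle. The function name is hypothetical.

```python
def stalled_timeline(fetch_cycle, producer_writeback_cycle):
    """Return {cycle: stage} for an instruction that must wait in decode
    until the producer's writeback cycle."""
    timeline = {fetch_cycle: "fetch"}
    cycle = fetch_cycle + 1
    while cycle < producer_writeback_cycle:
        timeline[cycle] = "decode (hazard)"   # stall: a noop goes to execute
        cycle += 1
    timeline[cycle] = "decode"                # the correct value is now readable
    for stage in ("execute", "memory", "writeback"):
        cycle += 1
        timeline[cycle] = stage
    return timeline

# add is fetched in cycle 1 and writes back in cycle 5; nand is fetched in cycle 2.
nand_timeline = stalled_timeline(2, 5)
for cycle in sorted(nand_timeline):
    print(cycle, nand_timeline[cycle])
```

Under these assumptions nand spends cycles 3 and 4 stalled in decode, completes decode in cycle 5, and executes in cycle 6.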

  26. Problems with detect and stall • CPI increases every time a hazard is detected! • Is that necessary? Not always! • Re-route the result of the add to the nand • nand no longer needs to read R3 from reg file • It can get the data later (when it is ready) • This lets us complete the decode this cycle • But we need more control to remember that the data that we aren’t getting from the reg file at this time will be found elsewhere in the pipeline at a later cycle.

  27. Handling data hazards: detect and forward • Detection: same as detect and stall • Except that all 4 hazards are treated differently • i.e., you can’t logical-OR the 4 hazard signals • Forward: • New datapaths to route computed data to where it is needed • New Mux and control to pick the right data
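
One way to picture why the hazard signals cannot simply be ORed together: each source operand needs its own mux select that says where the value should come from. The sketch below assumes two forwarding sources named after the EX/Mem and Mem/WB pipeline registers and gives priority to the most recent producer; the function name and return labels are made up for illustration and may not match the lecture's actual control logic.

```python
from typing import Optional

def forward_select(src_reg: int,
                   ex_mem_dest: Optional[int],
                   mem_wb_dest: Optional[int]) -> str:
    """Pick the source for one ALU operand. None means that pipeline
    register holds no instruction that writes a register."""
    if ex_mem_dest is not None and src_reg == ex_mem_dest:
        return "EX/Mem"      # most recent producer wins
    if mem_wb_dest is not None and src_reg == mem_wb_dest:
        return "Mem/WB"
    return "regfile"         # no hazard: use the value read in decode

# nand 3 4 5 in execute while add 1 2 3 (dest = 3) is in EX/Mem.
sel_a = forward_select(3, ex_mem_dest=3, mem_wb_dest=None)
sel_b = forward_select(4, ex_mem_dest=3, mem_wb_dest=None)
print(sel_a, sel_b)
```

Here regA is forwarded from EX/Mem while regB still comes from the register file, so the two operands need separate control signals.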

  28. [Figure: forwarding datapath in the first half of cycle 3. Hazard is detected; nand 3 4 5 is in IF/ID and add is in ID/EX. Three fwd signals feed the Execute stage.]

  29. [Figure: forwarding datapath at the end of cycle 3. add 6 3 7 is in IF/ID, nand is in ID/EX with hazard flag H1 set, and add is in EX/Mem with ALU result 21.]

  30. [Figure: forwarding datapath in the first half of cycle 4. A new hazard is detected for add 6 3 7 in IF/ID; the forwarded value 21 is selected by the Execute-stage mux for nand.]

  31. [Figure: forwarding datapath at the end of cycle 4. lw 3 6 10 is in IF/ID; add is in ID/EX, nand is in EX/Mem with result -2, add is in Mem/WB with result 21. Hazard flags H2 and H1 are set.]

  32. [Figure: forwarding datapath in the first half of cycle 5. No hazard is detected for lw 3 6 10 in IF/ID; add, nand, and add are in ID/EX, EX/Mem, and Mem/WB.]

  33. [Figure: forwarding datapath at the end of cycle 5. sw 6 2 12 is in IF/ID; lw is in ID/EX, add in EX/Mem, nand in Mem/WB. R3 now holds 21.]

  34. [Figure: forwarding datapath in the first half of cycle 6. Hazard is detected for sw 6 2 12 in IF/ID and the enable (en) inputs are driven to stall; lw is in ID/EX, add in EX/Mem, nand in Mem/WB.]

  35. [Figure: forwarding datapath at the end of cycle 6. sw 6 2 12 is held in IF/ID, a noop is in ID/EX, lw is in EX/Mem, add is in Mem/WB. Hazard flag H2 is set.]

  36. [Figure: forwarding datapath in the first half of cycle 7. Hazard is detected for sw 6 2 12 in IF/ID; noop is in ID/EX, lw in EX/Mem, add in Mem/WB.]

  37. [Figure: forwarding datapath at the end of cycle 7. sw is in ID/EX with hazard flag H3 set, noop is in EX/Mem, lw is in Mem/WB with loaded value 99.]

  38. [Figure: forwarding datapath in the first half of cycle 8. The loaded value 99 is forwarded to the Execute stage for sw (offset 12); noop is in EX/Mem and lw in Mem/WB.]

  39. [Figure: forwarding datapath at the end of cycle 8. sw is in EX/Mem with ALU result 111, noop is in Mem/WB. R6 now holds 99.]

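  40. FP pipeline support [Figure: after fetch and decode, an instruction goes to one of several execution units: I add, FP multiply (stages M1-M7), FP adder (stages A1-A4), or a non-pipelined divide; these are followed by Mem and WB]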

  41. Adding pipeline stages • Pipeline frontend • Fetch, Decode • Pipeline middle • Execute • Pipeline backend • Memory, Writeback

  42. Adding stages to fetch, decode • Delays hazard detection • No change in forwarding paths • No performance penalty with respect to data hazards

  43. Adding stages to execute • Check for structural hazards • ALU not pipelined • Multiple ALU ops completing at same time • Data hazards may cause delays • If multicycle op hasn't computed data before the dependent instruction is ready to execute • Performance penalty for each stall
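
The data-hazard delay from a multicycle operation can be estimated with a one-line formula. This is a rough sketch, assuming full forwarding from the end of the last execute stage and ignoring structural hazards; `latency` is the number of execute stages of the producer and `distance` is how many instructions later the consumer issues (1 means immediately after). Both names are introduced only for this example.

```python
def execute_stalls(latency: int, distance: int) -> int:
    """Stall cycles a dependent instruction needs before it can enter execute,
    given the producer's execute latency and the issue distance between them."""
    return max(0, latency - distance)

print(execute_stalls(1, 1))   # single-cycle ALU op, consumer right behind
print(execute_stalls(7, 1))   # 7-stage multiply (M1-M7), consumer right behind
print(execute_stalls(4, 2))   # 4-stage FP add (A1-A4), one instruction in between
```

With a single-cycle ALU the back-to-back case needs no stall, while a consumer right behind a 7-stage multiply would wait 6 cycles under these assumptions.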

  44. Adding stages to memory, writeback • Instructions ready to execute may need to wait longer for multi-cycle memory stage • Adds more pipeline registers • Thus more source registers to forward • More complex hazard detection • Wider muxes • More control bits to manage muxes

  45. Wider pipelines [Figure: two parallel pipelines, each with fetch, decode, execute, mem, WB] • More complex hazard detection • 2X pipeline registers to forward from • 2X more instructions to check • 2X more destinations (muxes)

  46. Making forwarding explicit • add r1 ← r2, EX/Mem ALU result • Include direct mux controls into the ISA • Hazard detection is now a compiler task • New micro-architecture leads to new ISA • Can reduce some resources • No longer need to build a heavily ported reg file • Ref: TTAs: Missing the ILP complexity wall
