1 / 19

(Recap) Control Hazards

(Recap) Control Hazards. Control (Branch) Hazard. A : beqz r2, label B : --------- label: P : Problem : The outcome of the branch (taken/not taken) is determined deep in the pipeline Do not know whether to execute B or execute P? Solutions :

ramiro
Download Presentation

(Recap) Control Hazards

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. (Recap)Control Hazards

  2. Control (Branch) Hazard A: beqz r2, label B: --------- label: P: Problem: The outcome of the branch (taken/not taken) is determined deep in the pipeline Do not know whether to execute B or execute P? Solutions: • Stall pipeline for required number of cycles • Methods to reduce the branch penalty: 1. Speculatively predict branch outcome and recover if mispredicted 2. Delayed Branch: Always execute instruction(s) following the branch

  3. 1 2 3 4 5 6 7 8 9 IF ID EX MEM WB A IF ID EX MEM WB B P Branch Hazards: Stall pipeline A: beqz r2, label stall stall stall B ----- label: P Execution sequence: A, NOP, NOP, NOP, B ….. orA, NOP, NOP, NOP, P

  4. Reducing Branch Penalty 1. Stall scheme had a branch penalty of 3 cycles. CPI assuming branch frequency of 30% is 1.9 (with no data stalls) 2. Predicting branch not taken: • Speculatively fetch and execute in-line instructions following the branch • If prediction incorrect flush pipeline of speculated instructions • Convert these instructions to NOPs by clearing pipeline registers. • Fetch instructions from the branch target

  5. Predict branch not taken A B IF ID EX MEM WB T = 1 B A C IF ID EX MEM WB T = 2 C B A D IF ID EX MEM WB T = 3 Branch actually not taken: No stalls D C B A E IF ID EX MEM WB T = 4

  6. Predict branch not taken A B IF ID EX MEM WB T = 1 B A C IF ID EX MEM WB T = 2 C B A D IF ID EX MEM WB T = 3 Branch actually taken: FLUSH pipeline Make B,C,D NOPS A P IF ID EX MEM WB T = 4

  7. Reducing Branch Stall Cycles 1- Find out whether a branch is taken earlier in the pipeline. 2- Compute the taken PC earlier in the pipeline. In MIPS: • In MIPS branch instructions BEQZ, BNE, test a register for equality to zero. • This can be completed in the ID cycle by moving the zero test into that cycle. • Both PCs (taken and not taken) must be computed early. • Requires an additional adder because the current ALU is not useable until EX cycle. • This results in just a single cycle stall on branches.

  8. Reducing Branch Penalty (contd … ) 3. Predicting branch taken: • Speculatively fetch and execute instructions at the branch target • But haven’t calculated branch target address in MIPS • MIPS still incurs 1 cycle branch penalty • Other machines: branch target known before outcome

  9. Reducing Branch Penalty (contd … ) 4. Delayed Branch: Branch delay slots: In-line instructions following a branch instruction conditional branch instruction sequential successor1 sequential successor2 …….. sequential successorn branch target if taken Always executed: Usually 1 delay slot (short pipelines) • Compiler tries to put useful instructions in delay slot to mask delay

  10. Delayed Branch • Instruction in branch delay slot is always executed • Compiler (tries to) move a useful instruction into delay slot. • From before the Branch: Always helpful when possible ADD R1, R2, R3 BEQZ R2, L1 BEQZ R2, L1 DELAY SLOT ADD R1, R2, R3 - - L1: L1: • If the ADD instruction were: ADD R2, R1, R3 the move would not be possible

  11. Delayed Branch (b) From the Target: Helps when branch is taken. May duplicate instructions ADD R2, R1, R3 ADD R2, R1, R3 BEQZ R2, L1 BEQZ R2, L2 DELAY SLOT SUB R4, R5, R6 - - L1: SUB R4, R5, R6 L1: SUB R4, R5, R6 L2: L2: Instructions between BEQ and SUB (in fall through) must not useR4.

  12. Delayed Branch ( c ) From Fall Through: Helps when branch is not taken. ADD R2, R1, R3 ADD R2, R1, R3 BEQZ R2, L1 BEQZ R2, L1 DELAY SLOT SUB R4, R5, R6 SUB R4, R5, R6 - - L1: L1: Instructions at target (L1 and after) must not use R4 till set again. • Cancelling (Nullifying) Branch: Branch instruction indicates direction of prediction. If mispredicted the instruction in the delay slot is cancelled. Greater flexibility for compiler to schedule instructions.

  13. Canceling branch • Cancelling (Nullifying) Branch: • Branch instruction indicates direction of prediction. • Mispredicted branch “annuls” the instruction in delay slot. Instruction behaves as a no-op. • Gives more leverage to compiler to select instructions to fill delay slots.

  14. Delayed Branch • Limitations of delayed branch • Compiler may not find appropriate instructions to fill delay slots. Then it fills delay slots with no-ops. • Visible architectural feature – likely to change with new implementations • Pipeline structure is exposed to compiler. Need to know how many delay slots.

  15. Delayed Branch • Compiler effectiveness for single branch delay slot: • Fills about 60% of branch delay slots • About 80% of instructions executed in branch delay slots useful in computation • About 50% (60% x 80%) of slots usefully filled • Delayed Branch downside: As processor go to deeper pipelines and multiple issue, the branch delay grows and need more than one delay slot • Delayed branching has lost popularity compared to more expensive but more flexible dynamic approaches • Growth in available transistors has made dynamic approaches relatively cheaper

  16. Dynamic Branch Prediction • Builds on the premise that history matters • Observe the behavior of branches in previous instances and try to predict future branch behavior • Try to predict the outcome of a branch early on in order to avoid stalls • Branch prediction is critical for multiple issue processors • In an n-issue processor, branches will come n times faster than a single issue processor

  17. T NT NT State 1 Predict Taken State 0 Predict Not Taken T Basic Branch Predictor • Use a 1-bit branch predictor buffer or branch history table • 1 bit of memory stating whether the branch was recently taken or not • Bit entry updated each time the branch instruction is executed

  18. 1-bit Branch Prediction Buffer • Problem – even simplest branches are mispredicted twice LD R1, #5 Loop: LD R2, 0(R5) ADD R2, R2, R4 STORE R2, 0(R5) ADD R5, R5, #4 SUB R1, R1, #1 BNEZ R1, Loop First time: prediction = 0 but the branch is taken  change prediction to 1 miss Time 2, 3, 4: prediction = 1 and the branch is taken Time 5: prediction = 1 but the branch is not taken  change prediction to 0 miss

  19. Dynamic Branch Prediction Accuracy

More Related