
Branch Prediction



Presentation Transcript


  1. Branch Prediction: Static and Dynamic Branch Prediction Techniques

  2. Control Flow Penalty: Why Branch Prediction? (Figure: pipeline front end — PC → Fetch (I-cache) → Fetch Buffer → Decode → Issue Buffer → Execute (Func. Units) → Result Buffer → Commit (Arch. State); the next fetch is started long before the branch is executed.) Modern processors have 10-14 pipeline stages between next-PC calculation and branch resolution, so the work lost when the pipeline makes a wrong prediction is ≈ loop length × pipeline width.
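
The lost-work estimate on the slide can be sketched as a quick calculation (the function name and the example figures are illustrative, not from the slides):

```python
# Rough sketch: work lost on a misprediction is about the number of pipeline
# stages between next-PC calculation and branch resolution (the "loop length")
# times the number of instructions fetched per cycle (the pipeline width).
def misprediction_cost(resolution_stages, pipeline_width):
    """Instruction slots discarded when a branch is mispredicted."""
    return resolution_stages * pipeline_width

# A hypothetical core with 12 stages to resolution and 4-wide fetch
# throws away up to 48 instruction slots per mispredicted branch:
assert misprediction_cost(12, 4) == 48
```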

  3. Branch Penalties in a Superscalar Are Extensive

  4. Reducing Control Flow Penalty • Software solutions • Minimize branches – loop unrolling • Increases the run length • Hardware solutions • Find something else to do – delay slots • Speculate – dynamic branch prediction • Speculative execution of instructions beyond the branch

  5. Branch Prediction • Motivation: • Branch penalties limit performance of deeply pipelined processors • Much worse for superscalar processors • Modern branch predictors have high accuracy (>95%) and can reduce branch penalties significantly • Required hardware support: • Dynamic prediction HW: branch history tables, branch target buffers, etc. • Mispredict recovery mechanisms: • Keep computation results separate from commit • Kill instructions following the branch • Restore state to the state following the branch

  6. Static Branch Prediction – review. Overall probability a branch is taken is ~60-70%, but: backward ~90%, forward ~50% • ISA can attach preferred-direction semantics to branches, e.g., Motorola MC88110: bne0 (preferred taken), beq0 (not taken) • ISA can allow arbitrary choice of statically predicted direction, e.g., HP PA-RISC, Intel IA-64; typically reported as ~80% accurate
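
The backward-taken / forward-not-taken heuristic implied by those statistics (backward branches are usually loops, hence usually taken) can be sketched as follows; the function name is illustrative:

```python
# Sketch of the static backward-taken / forward-not-taken (BTFNT) heuristic:
# a branch whose target lies at a lower address than the branch itself is
# probably a loop back-edge, so predict it taken; predict forward branches
# not taken.
def btfnt_predict(branch_pc, target_pc):
    """Return True (predict taken) iff the branch jumps backward."""
    return target_pc < branch_pc

assert btfnt_predict(0x400, 0x3F0)      # backward branch -> predict taken
assert not btfnt_predict(0x400, 0x420)  # forward branch  -> predict not taken
```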

  7. Branch Prediction Needs • Target address generation • Get register: PC, Link reg, GP reg. • Calculate: +/- offset, auto inc/dec • Target speculation • Condition resolution • Get register: condition code reg, count reg., other reg. • Compare registers • Condition speculation

  8. Target address generation takes time

  9. Condition resolution takes time

  10. Solution: Branch speculation

  11. Branch Prediction Schemes • 2-bit Branch-Prediction Buffer • Branch Target Buffer • Correlating Branch Prediction Buffer • Tournament Branch Predictor • Integrated Instruction Fetch Units • Return Address Predictors (for subroutines, Pentium, Core Duo) • Predicated Execution (Itanium)

  12. Dynamic Branch Prediction – learning based on past behavior • History information: • Incoming stream of branch addresses • Fast outgoing stream of predictions • Correction information returned from the pipeline (the branch predictor receives incoming branches {address}, emits predictions {address, value}, and accepts corrections {address, value})

  13. Branch History Table (BHT) – a table of predictors • Each branch is given its own predictor • The BHT is a table of predictors (Predictor 0 … Predictor 7), each 1 bit or more • Indexed by the PC address of the branch • Problem: in a loop, a 1-bit BHT causes two mispredictions (the average is 9 iterations before exit): • End-of-loop case, when it exits the loop • First time through the loop, when it predicts exit instead of looping • Most schemes use at least 2-bit predictors • Performance = ƒ(accuracy, cost of misprediction) • Misprediction → flush reorder buffer • In the fetch stage of a branch: use the predictor to make a prediction • When the branch completes: update the corresponding predictor
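
The two-mispredictions-per-loop behavior of a 1-bit predictor can be reproduced with a small simulation (a minimal sketch, assuming a 10-iteration loop branch that is taken 9 times and then falls through):

```python
# Minimal 1-bit predictor: the predictor state is simply the last observed
# outcome, so every change of direction costs one misprediction.
def run_1bit(outcomes):
    state = True          # start out predicting taken
    mispredicts = 0
    for taken in outcomes:
        if state != taken:
            mispredicts += 1
        state = taken     # 1-bit scheme: always flip to the last outcome
    return mispredicts

loop = [True] * 9 + [False]     # one execution of a 10-iteration loop
assert run_1bit(loop) == 1      # first run: mispredicts only at loop exit
assert run_1bit(loop * 2) == 3  # rerun the loop: +2 (exit case and the
                                # first iteration predicted as exit)
```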

  14. Branch History Table Organization • The fetch PC indexes the I-cache and, via its k low-order (instruction-aligned) bits, a 2^k-entry BHT with 2 bits per entry • The instruction opcode and offset determine whether the instruction is a branch, the taken/not-taken prediction, and the target PC • Target PC calculation takes time • A 4K-entry BHT with 2 bits/entry gives ~80-90% correct predictions
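
The index computation can be sketched as below (a simplification assuming 4-byte, aligned instructions; parameter names are illustrative). Note that because only k bits of the PC are used, distant branches can alias to the same entry:

```python
# Sketch of BHT indexing: drop the byte-offset bits of the PC, then keep
# the k low-order bits to select one of 2^k entries.
def bht_index(pc, k, inst_bytes=4):
    """Index into a 2^k-entry BHT for a branch at address pc."""
    return (pc // inst_bytes) % (1 << k)

# A 4K-entry BHT (k = 12):
assert bht_index(0x1000_0040, 12) == 0x10
# Two branches 2^12 instructions apart alias to the same entry:
assert bht_index(0x0, 12) == bht_index(4 << 12, 12)
```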

  15. 2-bit Dynamic Branch Prediction – more accurate than 1-bit • Better solution: a 2-bit scheme that changes the prediction only after two consecutive mispredictions • Four states with T/NT transitions: two Predict Taken states and two Predict Not Taken states (red: stop, not taken; green: go, taken) • Adds hysteresis to the decision-making process
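
The four-state machine is equivalent to a 2-bit saturating counter, which can be sketched as follows (states 0-1 predict not taken, 2-3 predict taken; function names are illustrative):

```python
# 2-bit saturating counter: increment on taken, decrement on not taken,
# saturating at 0 and 3. The prediction flips only after two consecutive
# mispredictions -- the hysteresis described on the slide.
def update(counter, taken):
    return min(counter + 1, 3) if taken else max(counter - 1, 0)

def predict(counter):
    return counter >= 2   # 2 or 3 -> predict taken

c = 3                     # strongly taken
c = update(c, False)      # one not-taken outcome: weakly taken (2)
assert predict(c)         # still predicts taken
c = update(c, False)      # second not-taken in a row: weakly not taken (1)
assert not predict(c)     # only now does the prediction flip
```

At a loop-closing branch this removes one of the two mispredictions the 1-bit scheme suffers on re-entry.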

  16. BTB: Branch Address at the Same Time as Prediction • Branch Target Buffer (BTB): the address of the branch indexes the table to get the prediction AND the branch target address (if taken) • At FETCH, the PC of the instruction is compared against the stored branch PCs (with their prediction state bits): • Match: the instruction is a branch; use the predicted PC as the next PC • No match: branch not predicted, proceed normally (next PC = PC + 4) • Only predicted-taken branches and jumps are held in the BTB • The next PC is determined before the branch is fetched and decoded • Later: check the prediction; if wrong, kill the instruction and update the BTB
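
A minimal sketch of that lookup, modeling the BTB as a map from branch PC to predicted target (structure and addresses are illustrative):

```python
# BTB sketch: only predicted-taken branches/jumps are stored, so a hit means
# "redirect fetch to the stored target" and a miss means sequential fetch.
btb = {0x1000: 0x2000}   # one predicted-taken branch at 0x1000 -> 0x2000

def next_fetch_pc(pc):
    """BTB hit -> predicted target; miss -> fall through to PC + 4."""
    return btb.get(pc, pc + 4)

assert next_fetch_pc(0x1000) == 0x2000  # predicted-taken branch
assert next_fetch_pc(0x1004) == 0x1008  # not in BTB: next PC = PC + 4
```

This is why no decoding is needed: the PC alone selects between the target and PC + 4.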

  17. BTB Contains Only Branch & Jump Instructions • The BTB holds information for branch and jump instructions only – it is not updated for other instructions • For all other instructions the next PC is PC + 4 • Achieved without decoding the instruction

  18. Combining the BTB and BHT • Pipeline front end: A (PC generation/mux), P (instruction fetch stage 1, BTB), F (instruction fetch stage 2), B (branch address calc/begin decode, BHT), I (complete decode), J (steer instructions to functional units), R (register file read), E (integer execute) • The BHT, sitting in a later pipeline stage, corrects the BTB when the BTB misses a predicted-taken branch • BTB/BHT are only updated after the branch resolves in the E stage • BTB entries are considerably more expensive than BHT entries, but the BTB redirects fetch earlier in the pipeline and can accelerate indirect branches (JR) • The BHT can hold many more entries and is more accurate

  19. Subroutine Return Stack • Small stack of k entries (typically k = 8-16) to accelerate subroutine returns; more accurate than BTBs for returns • Push the return address when a function call executes • Pop the return address when a subroutine return is decoded
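
A minimal sketch of the return-address stack (the overflow policy of discarding the oldest entry is an assumption, not stated on the slide):

```python
# Return-address stack: calls push their return address, returns pop it to
# predict the return target. Depth is fixed at k entries; on overflow this
# sketch discards the oldest entry.
class ReturnAddressStack:
    def __init__(self, k=8):
        self.k, self.stack = k, []

    def call(self, return_pc):
        if len(self.stack) == self.k:
            self.stack.pop(0)            # overflow: drop oldest entry
        self.stack.append(return_pc)

    def predict_return(self):
        return self.stack.pop() if self.stack else None

ras = ReturnAddressStack()
ras.call(0x104)                          # call at 0x100 returns to 0x104
ras.call(0x204)                          # nested call
assert ras.predict_return() == 0x204     # innermost return predicted first
assert ras.predict_return() == 0x104
```

Because returns target whatever address the matching call came from, a BTB (which remembers only the last target) mispredicts them; the stack follows the call nesting exactly.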

  20. Mispredict Recovery • In-order execution machines: • Instructions issued after the branch cannot write back before the branch resolves • All instructions in the pipeline behind a mispredicted branch are killed

  21. Predicated Execution • Avoid branch prediction by turning branches into conditionally executed instructions: if (x) then A = B op C else NOP • If the condition is false, the instruction neither stores its result nor causes an exception • The expanded ISAs of Alpha, MIPS, PowerPC, and SPARC have conditional move; PA-RISC can annul any following instruction • IA-64: 64 1-bit predicate fields allow conditional execution of any instruction • This transformation is called “if-conversion” • Drawbacks of conditional instructions: • Still takes a clock cycle even if annulled • Stalls if the condition is evaluated late • Complex conditions reduce effectiveness; the condition becomes known late in the pipeline
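
The semantics of a predicated instruction can be sketched as below (a semantic model only, not hardware; the function name is illustrative). The key point is that when the predicate is false the result is not stored and no exception is raised:

```python
# Semantic sketch of one predicated instruction "if (x) A = B op C":
# the operation's result is committed only when the predicate is true;
# a false predicate annuls it, so it neither writes A nor traps.
def predicated_assign(predicate, old_value, compute):
    """Commit compute() only if predicate holds; otherwise keep old_value."""
    return compute() if predicate else old_value

B, C = 6, 7
A = 0
A = predicated_assign(True, A, lambda: B * C)
assert A == 42
A = predicated_assign(False, A, lambda: B // 0)  # annulled: no exception
assert A == 42                                    # and no result stored
```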

  22. Accuracy v. Size (SPEC89)

  23. Dynamic Branch Prediction Summary • Prediction is becoming an important part of scalar execution • Branch History Table: 2 bits for loop accuracy • Correlation: recently executed branches are correlated with the next branch • Tournament predictor: give more resources to competing schemes and pick between them • Branch Target Buffer: include the branch address & prediction • Predicated execution can reduce the number of branches and of mispredicted branches • Return address stack for prediction of indirect jumps
