CSL718 : Pipelined Processors



  1. CSL718 : Pipelined Processors
     Improving Branch Performance – contd.
     21st Jan, 2006
     Anshul Kumar, CSE IITD

  2. Improving Branch Performance
     • Branch Elimination: replace the branch with other instructions
     • Branch Speed Up: reduce the time for computing the CC (condition code) and for TIF (target instruction fetch)
     • Branch Prediction: guess the outcome and proceed, undo if necessary
     • Branch Target Capture: make use of history

  3. Improving Branch Performance (outline as in slide 2): Branch Elimination

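  4. Branch Elimination
     Use conditional/guarded instructions (predicated execution).
     Flowchart: "C : S" means "if condition C is true (T), execute S; if false (F), skip it".
     With a branch:                  With a guarded instruction:
       OP1                             OP1
       BC   CC = Z, +2                 ADD  R3, R2, R1, NZ
       ADD  R3, R2, R1                 OP2
       OP2
     The branch skips the ADD when CC = Z. The guarded ADD takes effect only when CC = NZ, so the branch disappears.
     Examples: HP PA (all integer arithmetic/logical instructions); DEC Alpha, SPARC V9 (conditional move)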
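A minimal Python sketch of this transformation models both sequences on a toy register file. The dictionary register file, the boolean `cc_zero` standing in for the condition code, and the function names are illustrative assumptions. This is not the HP PA or Alpha encoding. Only R1 to R3 and the BC/ADD semantics come from the slide.

```python
def run_with_branch(regs, cc_zero):
    """OP1; BC CC = Z, +2; ADD R3, R2, R1; OP2 (OP1 and OP2 omitted)."""
    regs = dict(regs)
    if not cc_zero:
        # Branch not taken: control falls through to the ADD.
        regs["R3"] = regs["R2"] + regs["R1"]
    # Branch taken (CC = Z): the ADD is skipped and execution resumes at OP2.
    return regs


def run_guarded(regs, cc_zero):
    """OP1; ADD R3, R2, R1, NZ; OP2 (no branch in the instruction stream)."""
    regs = dict(regs)
    result = regs["R2"] + regs["R1"]  # the guarded ADD always flows down the pipeline
    # The NZ guard only decides whether the result is written back to R3.
    regs["R3"] = result if not cc_zero else regs["R3"]
    return regs
```

For either value of the condition code, the two versions leave the same register state. The guarded version, however, has no control transfer to predict or resolve.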

  5. Branch Elimination - contd.
     Pipeline timing (CC: condition code set by OP1):
       OP1:         IF IF IF D AG DF DF DF EX EX
       BC:          IF IF IF D AG TIF TIF TIF
       ADD/OP2:     IF IF IF D' D AG
       ADD (cond):  IF IF IF D AG DF DF DF EX EX

  6. Improving Branch Performance (outline as in slide 2): Branch Speed Up

  7. Branch Speed Up : early target address generation
     • Assume each instruction is a branch
     • Generate the target address while decoding
     • If the target is in the same page, omit address translation
     • After decoding, discard the target address if the instruction is not a branch
     Timing:  BC:  IF IF IF D TIF TIF TIF  (AG done in parallel with D)
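The steps above can be sketched in Python, assuming PC-relative branches and a 4 KB page size. The page size, the function names, and the `is_branch` flag are hypothetical. The same-page test works because a target in the branch's own page shares the physical frame the fetch of the branch has already translated.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def early_target(pc, displacement, page_size=PAGE_SIZE):
    """Decode-stage speculation: treat every instruction as a PC-relative branch.

    Returns (target, needs_translation). Translation can be skipped when the
    target lies in the same page as the branch itself.
    """
    target = pc + displacement
    same_page = pc // page_size == target // page_size
    return target, not same_page


def resolve_after_decode(pc, displacement, is_branch):
    """Keep the early target only if decode confirms a branch; otherwise discard it."""
    target, needs_translation = early_target(pc, displacement)
    return (target, needs_translation) if is_branch else None
```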

  8. Branch Speed Up : increase CC - branch gap
     Increase the gap between the instruction which sets CC and the branch:
     • Early CC setting
     • Delayed branch

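  9. Summary - Branch Speed Up
     Branch penalty (cycles). T = target (taken), I = inline (not taken).
     n = gap between CC setting and the branch (early CC setting) or number of delay slots (delayed branch).

     Scheme             Branch     n=0  n=1  n=2  n=3  n=4  n=5
     Early CC setting   uncond      4    4    4    4    4    4
                        cond (T)    6    5    4    4    4    4
                        cond (I)    5    4    3    2    1    0
     Delayed branch     uncond      4    3    2    1    0    0
                        cond (T)    6    5    4    3    2    1
                        cond (I)    5    4    3    2    1    0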
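The table fits a simple linear reading: each unit of n hides one cycle, down to a floor. The sketch below encodes that reading. The floors are read off the table rather than derived from the pipeline: 4 cycles for an unconditional or taken conditional branch under early CC setting, 0 otherwise. The function names and kind strings are hypothetical.

```python
def early_cc_penalty(kind, n):
    """Penalty (cycles) with a gap of n between the CC-setting instruction and the branch."""
    if kind == "uncond":
        return 4                  # no CC dependence, so the gap does not help
    if kind == "cond_T":
        return max(6 - n, 4)      # the target fetch still costs what an uncond branch costs
    if kind == "cond_I":
        return max(5 - n, 0)      # going inline: the whole CC wait can be hidden
    raise ValueError(kind)


def delayed_branch_penalty(kind, n):
    """Penalty (cycles) with n delay slots filled with useful instructions."""
    base = {"uncond": 4, "cond_T": 6, "cond_I": 5}[kind]
    return max(base - n, 0)       # each filled slot hides one cycle
```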

  10. Delayed Branch with Nullification (also called annulment)
      • The delay slot is used optionally
      • The branch instruction specifies the option
      • The option may be exercised based on the correctness of the branch prediction
      • Helps in better utilization of delay slots
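The Python sketch below illustrates one hypothetical nullification scheme. The branch carries a predicted direction and a nullify option, and the compiler fills the slot from the predicted path. The encoding and the names are assumptions. Real ISAs (for example the SPARC annul bit or PA-RISC nullification) each define their own exact rules.

```python
def delay_slot_commits(predicted_taken, actual_taken, nullify):
    """Decide whether the delay-slot instruction is allowed to complete."""
    if not nullify:
        # Plain delayed branch: the slot always executes, so it must hold an
        # instruction that is correct on both paths (or a NOP).
        return True
    # Nullifying branch: the slot came from the predicted path; squash it if wrong.
    return predicted_taken == actual_taken


def useful_slots(outcomes, predicted_taken):
    """Count delay slots that do useful work over a trace of actual outcomes."""
    return sum(delay_slot_commits(predicted_taken, taken, nullify=True)
               for taken in outcomes)
```

The gain is that the compiler can take the slot instruction from the likely path instead of searching for one that is safe on both paths.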

  11. Improving Branch Performance (outline as in slide 2): Branch Prediction

  12. Branch Prediction
      • Treat conditional branches as unconditional branches / NOP
      • Undo if necessary
      Strategies:
      • Fixed (always guess inline)
      • Static (guess on the basis of instruction type / displacement)
      • Dynamic (guess based on recent history)
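Below is a Python sketch comparing the fixed strategy with one static strategy. For the static side it uses "backward taken, forward not taken" (BTFN), a common displacement-based heuristic. The slide does not say which static rule is meant, and the trace is made up for illustration.

```python
def predict_fixed(displacement):
    """Fixed strategy: always guess inline (not taken), whatever the branch."""
    return False


def predict_btfn(displacement):
    """Static strategy by displacement: backward branches (usually loop-closing)
    are guessed taken, forward branches are guessed inline."""
    return displacement < 0


def accuracy(predictor, trace):
    """trace: list of (displacement, actually_taken) pairs."""
    hits = sum(predictor(disp) == taken for disp, taken in trace)
    return hits / len(trace)
```

On a trace with one loop branch (taken nine times, then exiting) and one forward branch (never taken), the fixed guess is right 6 times out of 15 and BTFN 14 times out of 15.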

  13. Static Branch Prediction
      [Table] Total: 68.2%

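  14. Threshold for static prediction
      Penalty (cycles) by guess and actual outcome:
                  actual T   actual I
      guess T        4          5
      guess I        6          0
      Timing:
        I-1 (sets CC):  IF IF D AG AG DF DF EX EX
        I (branch):     IF IF D AG AG TIF TIF
      With p the probability that the branch goes to the target, guess target if
        $4p + 5(1-p) < 6p + 0(1-p)$, i.e. $5 < 7p$, i.e. $p > 5/7 \approx 0.71$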
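The threshold derivation generalizes to any 2×2 penalty matrix. The Python sketch below solves the inequality symbolically with exact fractions. The function names and the penalty-argument order are hypothetical. Plugging in the slide's penalties (4, 5, 6, 0) gives 5/7.

```python
from fractions import Fraction


def expected_penalty(p, if_taken, if_inline):
    """Average penalty of one guess when the branch goes to target with probability p."""
    return if_taken * p + if_inline * (1 - p)


def target_threshold(tt, ti, it, ii):
    """Probability p above which guessing target beats guessing inline.

    tt, ti: penalty when guessing target and the branch goes to target / inline.
    it, ii: penalty when guessing inline and the branch goes to target / inline.
    Guess target iff tt*p + ti*(1-p) < it*p + ii*(1-p),
    i.e. p * ((it - tt) + (ti - ii)) > ti - ii  (assumes the bracket is positive).
    """
    return Fraction(ti - ii, (it - tt) + (ti - ii))
```

At p = 3/4, guessing target costs 4.25 cycles on average against 4.5 for guessing inline. At p = 1/2 the order flips (4.5 against 3).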

  15. Dynamic Branch Prediction - basic idea
      Predict based on the history of the previous execution of the branch.
        loop: xxx
              xxx
              xxx
              xxx
              BC loop
      2 mispredictions for every occurrence of the loop (one at loop exit, one when the loop is entered again).
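Here is a Python simulation of the loop example, assuming a predictor that remembers only the last outcome of the branch (one bit), initially "not taken". The function name and parameters are hypothetical.

```python
def one_bit_mispredictions(iterations, occurrences, initial=False):
    """Count mispredictions of a last-outcome predictor on a loop-closing branch.

    Each occurrence of the loop executes the branch `iterations` times:
    taken iterations-1 times, then not taken once on exit.
    """
    state = initial                 # last recorded outcome of this branch
    misses = 0
    for _ in range(occurrences):
        for i in range(iterations):
            taken = i < iterations - 1
            if state != taken:
                misses += 1
            state = taken           # remember only the most recent outcome
    return misses
```

Each occurrence costs two misses: the exit (predicted taken) and the first iteration of the next occurrence (predicted not taken). For example, 3 occurrences of a 4-iteration loop give 6 mispredictions.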

  16. Dynamic Branch Prediction - 2 bit prediction scheme
      [State diagram: four states (0 to 3) with transitions on T (taken) and N (not taken); two states predict taken, two predict not taken]

  17. Dynamic Branch Prediction - second scheme
      Predict based on the history of the previous n branches, e.g., if n = 3:
      • 3 branches taken → predict taken
      • 2 branches taken → predict taken
      • 1 branch taken → predict not taken
      • 0 branches taken → predict not taken

  18. Dynamic Branch Prediction - Bimodal predictor
      Maintain saturating counters with states 0, 1, 2, 3: T moves one state up, N one state down, saturating at 0 and 3.
      One counter per branch, or one counter per cache line (merge results if the line holds multiple branches).
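Below is a Python sketch of a bimodal predictor. It assumes a table of 2-bit saturating counters indexed by low-order PC bits, with states 0 and 1 predicting not taken and 2 and 3 predicting taken. The table size, the initial value, and the `pc >> 2` mapping (4-byte instructions) are assumptions.

```python
class BimodalPredictor:
    """Table of 2-bit saturating counters indexed by the branch address."""

    def __init__(self, entries=1024, initial=0):
        self.entries = entries
        self.counters = [initial] * entries

    def _index(self, pc):
        return (pc >> 2) % self.entries   # one counter per (aliased) branch

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(self.counters[i] + 1, 3)   # saturate at 3
        else:
            self.counters[i] = max(self.counters[i] - 1, 0)   # saturate at 0
```

On the 4-iteration loop repeated three times, this predictor makes 5 mispredictions. Two of them are warm-up misses, after which it misses only the loop exit, once per occurrence. A single not-taken outcome moves a saturated counter from 3 to 2, which still predicts taken. Indexing by cache-line address instead of `pc >> 2` gives the one-counter-per-line variant, where branches in the same line share a counter.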

  19. Dynamic Branch Prediction - History of last n occurrences
      Each entry holds the outcome of the last three occurrences of this branch (0: not taken, 1: taken).
      Current entry: 1 1 0 → prediction using majority decision: taken
      Actual outcome 'taken' → updated entry: 1 1 1
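Below is a Python sketch of per-branch history with majority voting. It assumes the newest outcome is stored first, which reproduces the update 1 1 0 → 1 1 1 on a taken outcome. The class name and dictionary storage are hypothetical. With n = 3 the vote matches the rule of slide 17: 2 or 3 taken → predict taken.

```python
class MajorityHistoryPredictor:
    """Keep the last n outcomes of each branch (1 = taken); predict by majority."""

    def __init__(self, n=3):
        self.n = n
        self.history = {}   # branch address -> list of n bits, newest first

    def entry(self, pc):
        return self.history.setdefault(pc, [0] * self.n)

    def predict(self, pc):
        bits = self.entry(pc)
        return sum(bits) * 2 > self.n   # strict majority of taken outcomes

    def update(self, pc, taken):
        bits = self.entry(pc)
        self.history[pc] = [int(taken)] + bits[:-1]   # shift in newest, drop oldest
```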

  20. Dynamic Branch Prediction - storing prediction counters
      Store in a separate buffer, or store in the cache directory (a counter kept with each cache line's directory entry).
      [Diagram: cache directory and cache storage; each cache line has an associated counter]

  21. Correct guesses vs. history length [plot]

  22. Two-Level Prediction
      • Uses two levels of information to make a direction prediction
        • Branch History Table (BHT): last n occurrences
        • Pattern History Table (PHT): saturating 2 bit counters
      • Captures patterned behavior of branches
        • Groups of branches are correlated
        • Particular branches have particular behavior
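A minimal Python sketch of the two levels follows. A per-branch BHT holds the last k outcomes, and that pattern selects a 2-bit counter in a single shared PHT (a PAg-style organization in the slide 25 naming). The history length, the initial counter value, and the class name are assumptions.

```python
class TwoLevelLocalPredictor:
    """Level 1: BHT with the last k outcomes of each branch.
    Level 2: PHT of 2-bit saturating counters indexed by that history."""

    def __init__(self, history_bits=4):
        self.k = history_bits
        self.bht = {}                          # branch address -> history pattern
        self.pht = [1] * (1 << history_bits)   # counters start weakly not taken

    def predict(self, pc):
        pattern = self.bht.get(pc, 0)
        return self.pht[pattern] >= 2

    def update(self, pc, taken):
        pattern = self.bht.get(pc, 0)
        c = self.pht[pattern]
        self.pht[pattern] = min(c + 1, 3) if taken else max(c - 1, 0)
        mask = (1 << self.k) - 1
        self.bht[pc] = ((pattern << 1) | int(taken)) & mask   # newest outcome in low bit
```

A branch that strictly alternates taken/not taken defeats a single counter, but here each history pattern gets its own counter. On 40 alternating outcomes this sketch mispredicts only 3 times, all during warm-up.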

  23. Correlation between branches
        B1: if (x) ...
        B2: if (y) ...
        z = x && y
        B3: if (z) ...
      B3 can be predicted with 100% accuracy based on the outcomes of B1 and B2.
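The Python sketch below checks the correlation claim with a global branch history register (GBHR). The last k outcomes of any branch, together with the branch's identity, select a 2-bit counter (GAp-style in the slide 25 naming). Treating "condition true" as "branch taken", the history length, and all names are assumptions.

```python
class CorrelatingPredictor:
    """Global history of the last k outcomes plus branch identity selects a 2-bit counter."""

    def __init__(self, history_bits=2):
        self.k = history_bits
        self.gbhr = 0      # global history, newest outcome in the low bit
        self.pht = {}      # (branch, history) -> counter, default 1 (weakly not taken)

    def predict(self, branch):
        return self.pht.get((branch, self.gbhr), 1) >= 2

    def update(self, branch, taken):
        key = (branch, self.gbhr)
        c = self.pht.get(key, 1)
        self.pht[key] = min(c + 1, 3) if taken else max(c - 1, 0)
        self.gbhr = ((self.gbhr << 1) | int(taken)) & ((1 << self.k) - 1)


def b3_mispredictions(xy_pairs):
    """Run B1: if (x), B2: if (y), B3: if (x && y); count B3 mispredictions."""
    p = CorrelatingPredictor(history_bits=2)
    misses = 0
    for x, y in xy_pairs:
        z = x and y
        for branch, outcome in (("B1", x), ("B2", y), ("B3", z)):
            if branch == "B3" and p.predict(branch) != outcome:
                misses += 1
            p.update(branch, outcome)
    return misses
```

Consider all four (x, y) combinations repeated ten times. When B3 is reached, the two history bits are exactly the outcomes of B1 and B2, so B3 is mispredicted only once: the first time both are taken. A lone 2-bit counter for B3, starting weakly not taken, would miss every (taken, taken) case in this trace.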

  24. Some Two-level Predictors
      [Diagram: Local predictor: the PC selects a per-branch history in the BHT, which indexes the PHT to give T/NT. Global predictor: the global branch history register (GBHR) indexes the PHT to give T/NT.]
      Bits from the PC and the BHT can be combined to index the PHT.
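Two well-known ways of combining PC bits with history bits are concatenation (often called gselect) and XOR (McFarling's gshare). The slide does not say which one its figure uses. The Python sketch below shows both, assuming 4-byte instructions so that the low two PC bits are dropped; the function names are hypothetical.

```python
def gselect_index(pc, history, pc_bits, hist_bits):
    """Concatenate low-order PC bits with the history bits."""
    pc_part = (pc >> 2) & ((1 << pc_bits) - 1)   # drop byte offset of 4-byte instructions
    hist_part = history & ((1 << hist_bits) - 1)
    return (pc_part << hist_bits) | hist_part


def gshare_index(pc, history, index_bits):
    """XOR low-order PC bits with the history bits (gshare)."""
    mask = (1 << index_bits) - 1
    return ((pc >> 2) ^ history) & mask
```

Concatenation splits the PHT into fixed regions per branch. XOR lets a longer history and more PC bits share one index of the same width, at the cost of more aliasing between unrelated branches.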

  25. Two-level Predictor Classification
      • Yeh and Patt 3-letter naming scheme
      • Type of history collected: G (global), P (per branch), S (per set)
      • PHT type: A (adaptive), S (static)
      • PHT organization: g (global), p (per branch), s (per set)
      • Examples: GAs, PAp, etc.

  26. Improving Branch Performance (outline as in slide 2): Branch Target Capture

  27. Branch Target Capture
      • Branch Target Buffer (BTB): each entry holds instr addr, pred stats, target addr
      • Target Instruction Buffer (TIB): each entry holds instr addr, pred stats, target instr
      Probability of target change < 5%
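Below is a Python sketch of a BTB keyed by instruction address. Each entry holds the last target and a 2-bit counter as its prediction statistics. The counter, the allocate-on-taken policy, and the unbounded dictionary are simplifying assumptions; a real BTB has a fixed number of entries. The names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BTBEntry:
    target: int        # target address captured the last time the branch was taken
    counter: int = 2   # prediction statistics: a 2-bit counter (assumed)


class BranchTargetBuffer:
    """Fully associative and unbounded, as a simplification."""

    def __init__(self):
        self.entries = {}   # instruction address -> BTBEntry

    def next_fetch(self, pc):
        """Predicted target address, or None to continue fetching inline."""
        entry = self.entries.get(pc)
        if entry is None or entry.counter < 2:
            return None
        return entry.target

    def update(self, pc, taken, target):
        entry = self.entries.get(pc)
        if entry is None:
            if taken:  # allocate only for branches seen taken
                self.entries[pc] = BTBEntry(target)
            return
        entry.counter = min(entry.counter + 1, 3) if taken else max(entry.counter - 1, 0)
        if taken:
            entry.target = target  # targets change rarely, so this seldom differs
```

Storing the target instruction instead of (or next to) the target address would turn this into a TIB-style buffer.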

  28. BTB Performance
      Decision:  BTB miss (.4) → go inline;  BTB hit (.6) → go to target
      Result:    after a miss: inline .8, target .2;  after a hit: inline .2, target .8
      Delay:     miss/inline 0, miss/target 5, hit/inline 4, hit/target 0
      Expected delay = .4*.8*0 + .4*.2*5 + .6*.2*4 + .6*.8*0 = 0.88
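The expected-delay calculation can be parameterized as in the Python sketch below. The parameter names are hypothetical. The defaults are the slide's delays, and correct guesses cost 0 as on the slide.

```python
def expected_delay(p_hit, p_inline_given_miss, p_target_given_hit,
                   d_miss_target=5, d_hit_inline=4):
    """Expected branch delay when BTB miss -> go inline and BTB hit -> go to target."""
    # Miss, but the branch goes to the target: pay the miss/target delay.
    miss_wrong = (1 - p_hit) * (1 - p_inline_given_miss) * d_miss_target
    # Hit, but the branch goes inline: pay the hit/inline delay.
    hit_wrong = p_hit * (1 - p_target_given_hit) * d_hit_inline
    return miss_wrong + hit_wrong
```

With the slide's numbers, `expected_delay(0.6, 0.8, 0.8)` evaluates to 0.4 + 0.48 = 0.88, up to floating-point rounding.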

  29. Dynamic information about branch
      • Previous branch decisions (explicit prediction): Branch History Table, BHT
      • Previous target address / instruction (implicit prediction): Branch Target Buffer, BTB; Br Target Addr Cache, BTAC; Target Instr Buffer, TIB; Br Target Instr Cache, BTIC
      • Stored in the cache directory or in a separate buffer
      • These two can be combined

  30. Storing prediction info
      • In cache: a counter stored with each cache line (directory, storage, cache line, counter)
      • In separate buffer: entries of instr addr, pred stats, target

  31. Combined prediction mechanism
      • Explicit: use history bits
      • Implicit: use BTB hit/miss
        • hit → go to target, miss → go inline
      • Combined: BTB hit/miss followed by explicit prediction using history bits. One of the following is commonly used:
        • hit → go to target, miss → explicit prediction
        • miss → go inline, hit → explicit prediction
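The two combined policies above reduce to a pair of one-line decision functions, sketched here in Python. `True` means "go to target". The function names and boolean inputs are hypothetical.

```python
def predict_hit_target(btb_hit, history_says_taken):
    """BTB hit -> go to target; BTB miss -> fall back to explicit history-bit prediction."""
    return True if btb_hit else history_says_taken


def predict_miss_inline(btb_hit, history_says_taken):
    """BTB miss -> go inline; BTB hit -> explicit history-bit prediction decides."""
    return history_says_taken if btb_hit else False
```

The first policy trusts a hit unconditionally and uses the history bits to catch taken branches missing from the BTB. The second trusts a miss and uses the history bits to veto stale BTB entries.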

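  32. Combined prediction
      [Decision trees for the two combined schemes:
       (a) BTB hit → predict T; BTB miss → explicit prediction (I or T)
       (b) BTB miss → predict I; BTB hit → explicit prediction (I or T)
       each prediction leads to an actual outcome of T or I]
      Prediction: T = Target, I = Inline. Actual outcome: T = Target, I = Inline.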

  33. Structure of Tables
      Instruction fetch path with:
      • BHT
      • BTAC
      • BTIC

  34. Compute/fetch scheme (no dynamic branch prediction)
      [Diagram labels: instruction fetch address A; IFAR; I-cache delivering I, I+1, I+2, I+3; Compute BTA; + (next sequential address); IIFA; target instructions BTI, BTI+1, BTI+2, BTI+3]

  35. BHT (Branch History Table)
      [Diagram labels: instruction fetch address; I-cache: 16 K, 4-way set assoc, 128 x 4 lines, 8 instr/line; BHT: 128 x 4 entries, history bits (2 per entry); 4 instr/cycle (4 x 1 instr); prediction logic; decode queue; issue queue; taken / not taken; BTA for a taken guess]

  36. BTAC scheme
      [Diagram labels: instruction fetch address A; IFAR; I-cache delivering I, I+1, I+2, I+3; BTAC entries (BA, BTA); BTA; + (next sequential address); IIFA; target instructions BTI, BTI+1, BTI+2, BTI+3]

  37. BTIC scheme - 1
      [Diagram labels: instruction fetch address A; IFAR; I-cache delivering I; BTIC entries (BA, BTI, BTA+); BTA; + (next sequential address); IIFA; to decoder]

  38. BTIC scheme - 2
      [Diagram labels: instruction fetch address A; IFAR; I-cache delivering I, I+1; BTIC entries (BA, BTI, BTI+1); BTA+ (computed); + (next sequential address); IIFA; to decoder]

  39. Successor index in I-cache
      [Diagram labels: instruction fetch address A; IFAR; I-cache delivering I, I+1, I+2, I+3 with a successor index; next address; IIFA; target instructions BTI, BTI+1, BTI+2, BTI+3]
