Dynamic branch prediction


Ali Azarpeyvand


Tomasulo Review

  • Reservation stations: renaming to a larger set of registers + buffering of source operands

    • Prevents registers from becoming a bottleneck

    • Avoids the WAR and WAW hazards of the scoreboard

    • Allows loop unrolling in HW

  • Not limited to basic blocks (the integer unit gets ahead, beyond branches)

  • Lasting Contributions

    • Dynamic scheduling

    • Register renaming

    • Load/store disambiguation

  • Descendants of the IBM 360/91 include the Pentium II, PowerPC 604, MIPS R10000, HP PA-8000, and Alpha 21264


  • Dynamic Branch Prediction

    • Branch prediction buffer or branch history table

    • Correlating branch predictors

    • Tournament predictors

  • Branch target buffers

  • Integrated instruction fetch unit

  • Return address predictors

Dynamic Branch Prediction

  • Performance = ƒ(accuracy, cost of misprediction)

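  • Branch History Table (branch-prediction buffer) is the simplest scheme

    • The lower bits of the PC index a table of 1-bit values

    • Each bit says whether or not the branch was taken last time

    • No address check: branches whose lower PC bits match share the same entry

  • Problem: in a loop, a 1-bit BHT causes two mispredictions per loop execution (example: 9 iterations before exit → 80% accuracy):

    • At the end of the loop, when it exits instead of looping as before

    • On the first iteration the next time the loop runs, when it predicts exit instead of looping

  • Solution → 2-bit prediction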

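A minimal Python sketch of the loop problem, assuming a single BHT entry (no aliasing) whose bit starts at "not taken", and a loop branch that is taken for 9 iterations and then not taken once at exit. The function name is illustrative only.

```python
def one_bit_mispredictions(outcomes, last_taken=False):
    """Count mispredictions for one 1-bit BHT entry (no aliasing)."""
    misses = 0
    for taken in outcomes:
        if last_taken != taken:   # prediction = outcome seen last time
            misses += 1
        last_taken = taken        # the single bit just records the latest outcome
    return misses


# Loop branch: taken for 9 iterations, then not taken at exit; the loop runs 10 times.
loop_run = [True] * 9 + [False]
outcomes = loop_run * 10
misses = one_bit_mispredictions(outcomes)
print(misses, 1 - misses / len(outcomes))  # 20 0.8
```

Each loop execution costs exactly two mispredictions (the exit, then the first iteration of the next execution), which is where the 80% figure comes from.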
Dynamic Branch Prediction

  • Solution: a 2-bit scheme that changes the prediction only after it mispredicts twice (state diagram):

  • Dark states: stop, predict not taken

  • Light states: go, predict taken

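The same loop run through a 2-bit saturating counter, which is one common way to realize the 2-bit state diagram (the slide's figure may use slightly different transitions). The starting state "weakly taken" and the names below are assumptions of this sketch.

```python
# Counter states; the upper two states predict taken.
STRONG_NT, WEAK_NT, WEAK_T, STRONG_T = range(4)


def two_bit_mispredictions(outcomes, state=WEAK_T):
    """Count mispredictions for one 2-bit saturating counter."""
    misses = 0
    for taken in outcomes:
        if (state >= WEAK_T) != taken:
            misses += 1
        # Step one state toward the actual outcome, saturating at both ends.
        if taken:
            state = min(state + 1, STRONG_T)
        else:
            state = max(state - 1, STRONG_NT)
    return misses


loop_run = [True] * 9 + [False]
outcomes = loop_run * 10
misses = two_bit_mispredictions(outcomes)
print(misses, 1 - misses / len(outcomes))  # 10 0.9
```

The single not-taken outcome at loop exit only moves the counter from "strongly taken" to "weakly taken", so the first iteration of the next execution is still predicted correctly.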
BHT Accuracy

  • Mispredictions happen because either:

    • The guess for that branch was wrong, or

    • The history came from a different branch that maps to the same table index

  • With a 4096-entry table, misprediction rates vary from 1% (nasa7, tomcatv) to 18% (eqntott), with spice at 9% and gcc at 12%

  • A 4096-entry table is about as good as an infinite table (in the Alpha 21164)

  • Branch penalty and branch frequency also matter for overall performance

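A sketch of the second cause (aliasing) in an untagged table of 2-bit counters. It assumes 4-byte instructions, counters initialized to weakly not taken, and two hypothetical branches whose addresses are chosen so they collide in a 4096-entry table; the class and function names are illustrative.

```python
class TwoBitBHT:
    """Untagged table of 2-bit counters indexed by low-order PC bits."""

    def __init__(self, entries):
        assert entries & (entries - 1) == 0, "entries must be a power of two"
        self.mask = entries - 1
        self.counters = [1] * entries  # start weakly not taken

    def index(self, pc):
        # Assumes 4-byte instructions, so the two lowest PC bits are dropped.
        return (pc >> 2) & self.mask

    def predict(self, pc):
        return self.counters[self.index(pc)] >= 2

    def update(self, pc, taken):
        i = self.index(pc)
        c = self.counters[i]
        self.counters[i] = min(c + 1, 3) if taken else max(c - 1, 0)


def count_misses(bht, trace):
    misses = 0
    for pc, taken in trace:
        misses += bht.predict(pc) != taken
        bht.update(pc, taken)
    return misses


# Branch A (always taken) and branch B (never taken) are 4096 * 4 bytes apart,
# so they share an entry in a 4096-entry table but not in an 8192-entry one.
A, B = 0x1000, 0x1000 + 4096 * 4
trace = [(A, True), (B, False)] * 100
print(count_misses(TwoBitBHT(4096), trace))  # 200
print(count_misses(TwoBitBHT(8192), trace))  # 1
```

With the shared entry, each branch keeps pushing the counter back toward its own direction, so every prediction is wrong; with separate entries only A's first execution mispredicts.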
BHT Accuracy

(Figure: misprediction rates with a 4096-entry, 2-bit prediction buffer)

Correlating Branches

  • Hypothesis: recent branches are correlated; that is, the behavior of recently executed branches affects the prediction of the current branch

  • Idea: record the m most recently executed branches as taken or not taken, and use that pattern to select the proper branch history table

  • In general, an (m,n) predictor records the last m branches to select among 2^m history tables, each with n-bit counters

    • The old 2-bit BHT is then a (0,2) predictor


Code from eqntott (SPEC92)

Branch b3 is correlated with branches b1 and b2

Branch Prediction Result

1-bit predictor (d is 0 or 2)

Correlating Prediction Performance

1-bit predictor with 1 bit of correlation, i.e., a (1,1) predictor

Correlating Branches

(2,2) predictor

  • The behavior of recent branches then selects among, say, four predictions for the next branch, and only that prediction is updated

  • Simple implementation:

    • Global history can be stored in a shift register

  • The branch address is concatenated with the global branch history to form the table index.

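A minimal Python sketch of an (m,n) predictor as described: an m-bit global history shift register is concatenated with low-order branch-address bits to pick an n-bit counter. The table size, the weakly-not-taken initial state, 4-byte instructions, and all names are assumptions for illustration. The demo uses one branch that alternates taken/not taken, so its own previous outcomes are what the global history captures.

```python
class CorrelatingPredictor:
    """(m,n) predictor: m bits of global history select among 2^m n-bit counters."""

    def __init__(self, m, n, addr_bits):
        self.m, self.addr_bits = m, addr_bits
        self.max_count = (1 << n) - 1
        self.threshold = 1 << (n - 1)           # predict taken at or above this value
        self.history = 0                         # m-bit global history shift register
        # One n-bit counter per (address bits, history) pair, starting weakly not taken.
        self.counters = [self.threshold - 1] * (1 << (addr_bits + m))

    def _index(self, pc):
        addr = (pc >> 2) & ((1 << self.addr_bits) - 1)
        return (addr << self.m) | self.history   # branch address concatenated with history

    def predict(self, pc):
        return self.counters[self._index(pc)] >= self.threshold

    def update(self, pc, taken):
        i = self._index(pc)
        c = self.counters[i]
        self.counters[i] = min(c + 1, self.max_count) if taken else max(c - 1, 0)
        # Shift the outcome into the global history, keeping only the last m branches.
        self.history = ((self.history << 1) | int(taken)) & ((1 << self.m) - 1)


def run_branch(predictor, pc, outcomes):
    misses = 0
    for taken in outcomes:
        misses += predictor.predict(pc) != taken
        predictor.update(pc, taken)
    return misses


alternating = [i % 2 == 0 for i in range(100)]  # T, NT, T, NT, ...
print(run_branch(CorrelatingPredictor(0, 2, 4), 0x40, alternating))  # 100
print(run_branch(CorrelatingPredictor(2, 2, 4), 0x40, alternating))  # 2
```

The (0,2) predictor (a plain 2-bit BHT) oscillates between its two middle states and is wrong every time, while the (2,2) predictor gives each history pattern its own counter and stops mispredicting after two warm-up misses.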
    Number of Stored Bits

    • For an (m,n) predictor:

      • Bits = 2^m × n × number of prediction entries selected by the branch address

    • Example:

    • 2-bit predictor with 4096 entries:

      • 2^0 × 2 × 4K = 8K bits

    • (2,2) predictor: how many entries give the same 8K bits?

      • 2^2 × 2 × x = 8K → x = 1K

    • Comparison on the next slide

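The storage formula and the equal-budget comparison, written as two small Python helpers (hypothetical names) so other (m,n) configurations can be checked the same way.

```python
def predictor_bits(m, n, entries):
    """Storage for an (m,n) predictor: 2^m * n * entries bits."""
    return (2 ** m) * n * entries


def entries_for_budget(m, n, budget_bits):
    """Address-selected entries an (m,n) predictor gets for a given bit budget."""
    return budget_bits // ((2 ** m) * n)


K = 1024
print(predictor_bits(0, 2, 4 * K) // K)      # 8  (8K bits for the 4096-entry 2-bit BHT)
print(entries_for_budget(2, 2, 8 * K) // K)  # 1  (1K entries for the (2,2) predictor)
```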
    Accuracy of Different Schemes


    (Figure: frequency of mispredictions for a 4096-entry 2-bit BHT, an unlimited-entry 2-bit BHT, and a 1024-entry (2,2) BHT)


    Tournament Branch Predictor

    • Used in the Alpha 21264: tracks both local and global history

    • Intended for mixed types of applications

    • Global history: T/NT history of past k branches, e.g. 0 1 0 1 0 1 (NT T NT T NT T)

    (Figure: tournament predictor with a 12-bit global history)

    Tournament Branch Predictor

    • Local predictor: uses a 10-bit local history and 3-bit counters

    • Global and choice predictors: 4K 2-bit counters each, indexed by the 12-bit global history


    (Figure: local history table (1K × 10) indexing the local counters (1K × 3))




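A simplified Python sketch of a 21264-style tournament predictor. The local sizes (1K × 10-bit histories, 1K × 3-bit counters) and the 12-bit global history follow the slides; the 4K-entry 2-bit global and choice tables, the initial counter values, 4-byte instructions, and training the chooser only when the two components disagree are assumptions of this sketch, which also ignores speculative history update and recovery. The class name is hypothetical.

```python
class TournamentPredictor:
    """Simplified Alpha 21264-style tournament predictor (hypothetical sketch)."""

    def __init__(self):
        self.local_hist = [0] * 1024   # 1K x 10-bit per-branch histories, indexed by PC
        self.local_ctr = [3] * 1024    # 1K x 3-bit counters, start weakly not taken
        self.global_ctr = [1] * 4096   # assumed 4K x 2-bit counters, weakly not taken
        self.choice = [1] * 4096       # assumed 4K x 2-bit chooser: >= 2 means use global
        self.ghist = 0                 # 12-bit global history shift register

    def _components(self, pc):
        li = (pc >> 2) & 0x3FF                                  # assumes 4-byte instructions
        local_pred = self.local_ctr[self.local_hist[li]] >= 4  # MSB of the 3-bit counter
        global_pred = self.global_ctr[self.ghist] >= 2         # MSB of the 2-bit counter
        return li, local_pred, global_pred

    def predict(self, pc):
        _, local_pred, global_pred = self._components(pc)
        return global_pred if self.choice[self.ghist] >= 2 else local_pred

    def update(self, pc, taken):
        li, local_pred, global_pred = self._components(pc)
        g = self.ghist
        # Train the chooser only when the two components disagree.
        if local_pred != global_pred:
            if global_pred == taken:
                self.choice[g] = min(self.choice[g] + 1, 3)
            else:
                self.choice[g] = max(self.choice[g] - 1, 0)
        # Train both components with saturating counters.
        lh = self.local_hist[li]
        if taken:
            self.local_ctr[lh] = min(self.local_ctr[lh] + 1, 7)
            self.global_ctr[g] = min(self.global_ctr[g] + 1, 3)
        else:
            self.local_ctr[lh] = max(self.local_ctr[lh] - 1, 0)
            self.global_ctr[g] = max(self.global_ctr[g] - 1, 0)
        # Shift the outcome into the local and global histories.
        self.local_hist[li] = ((lh << 1) | int(taken)) & 0x3FF
        self.ghist = ((g << 1) | int(taken)) & 0xFFF


p = TournamentPredictor()
misses = []
for i in range(100):
    taken = i % 2 == 0          # one branch that alternates T, NT, T, NT, ...
    misses.append(p.predict(0x400) != taken)
    p.update(0x400, taken)
print(sum(misses[50:]))  # 0
```

On a single alternating branch both components learn the pattern once their histories fill, so the second half of the run is mispredict-free; the chooser only matters for branches where the local and global predictions disagree.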
    Reducing Branch Stalls

    • In MIPS, a branch predicted as taken also needs its target address before fetch can continue

    • High Performance Instruction Delivery

      • Branch target buffer

      • integrated instruction fetch unit

      • predicting return addresses

    Need Address at Same Time as Prediction

    • Branch Target Buffer (BTB): the branch address indexes the buffer to get the prediction AND the branch target address (if taken)


    Prediction accuracy: 90% (for instructions in the buffer)

    Hit rate in the buffer: 90% (for branches predicted taken)

    Taken branch frequency: 60%

    Penalty per misprediction: 2 clock cycles

    Probability (branch in buffer, but actually not taken) = buffer hit rate × misprediction rate = 90% × 10% = 0.09

    Probability (branch not in buffer, but actually taken) = 10% × 60% = 0.06

    Branch penalty = (0.09 + 0.06) × 2 = 0.30 clock cycles per branch

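The same arithmetic as a small Python function, so the inputs can be varied; the function and parameter names are illustrative, and the 2-cycle penalty is the factor used in the example.

```python
def btb_branch_penalty(hit_rate, accuracy, taken_freq, penalty_cycles):
    """Average penalty per branch for the two BTB misprediction cases in the example."""
    # Branch found in the buffer (predicted taken) but actually not taken.
    in_buffer_not_taken = hit_rate * (1 - accuracy)
    # Branch missing from the buffer but actually taken.
    not_in_buffer_taken = (1 - hit_rate) * taken_freq
    return (in_buffer_not_taken + not_in_buffer_taken) * penalty_cycles


penalty = btb_branch_penalty(hit_rate=0.90, accuracy=0.90, taken_freq=0.60, penalty_cycles=2)
print(round(penalty, 2))  # 0.3
```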
    Branch Folding

    • Idea: store one or more target instructions

      • instead of, or in addition to, the predicted target address.

    • Advantages:

      • It allows the branch-target buffer access to take longer than the time between successive instruction fetches

      • It allows an optimization called branch folding

    • Branch Folding:

      • The target instruction from the buffer is substituted for the branch, giving zero-cycle unconditional branches and sometimes zero-cycle conditional branches.

    Branch Target Buffer (summary)

    (Figure: BTB whose entries hold Branch PC, Predicted PC, and prediction state. The PC of the fetched instruction is compared with the stored branch PCs. Yes: the instruction is a branch, so use the predicted PC as the next PC. No: the branch is not predicted, so proceed normally (next PC = PC + 4).)

    • Branch Target Buffer (BTB): the branch address indexes the buffer to get the prediction AND the branch target address (if taken)

      • Note: the stored branch PC must now be checked for a match, since the target address of a different branch cannot be used

    • Example: BTB combined with BHT

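A minimal Python sketch of the BTB lookup with its branch-match check. It assumes a direct-mapped buffer with a hypothetical 512 entries, 4-byte instructions, the full branch PC stored as the tag, and a simple policy of entering taken branches and evicting an entry whose branch turns out not taken; the class and method names are illustrative.

```python
class BranchTargetBuffer:
    """Direct-mapped BTB sketch: each entry stores the branch PC as a tag and its target."""

    def __init__(self, entries=512):
        self.entries = entries
        self.table = {}  # index -> (branch_pc, predicted_pc)

    def _index(self, pc):
        return (pc >> 2) % self.entries  # assumes 4-byte instructions

    def next_pc(self, pc):
        entry = self.table.get(self._index(pc))
        if entry is not None and entry[0] == pc:  # match: a branch predicted taken
            return entry[1]
        return pc + 4                              # no match: proceed normally

    def resolve(self, branch_pc, taken, target_pc):
        i = self._index(branch_pc)
        if taken:
            self.table[i] = (branch_pc, target_pc)  # enter or refresh the taken branch
        elif self.table.get(i, (None,))[0] == branch_pc:
            del self.table[i]                       # evict a branch that was not taken


btb = BranchTargetBuffer()
btb.resolve(0x1000, taken=True, target_pc=0x2000)
print(hex(btb.next_pc(0x1000)))            # 0x2000
print(hex(btb.next_pc(0x1000 + 512 * 4)))  # 0x1804 (same index, different branch PC)
```

Without the tag comparison, the instruction at 0x1800 would be redirected to 0x2000, the other branch's target, which is exactly the error the note above warns about.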
    Return Address Prediction

    • Register-indirect branches: the target address is hard to predict

      • Branch-prediction-buffer techniques do not work well in this situation:

      • Many callers, one callee

      • A single return instruction jumps to many different return addresses (no PC-target correlation)

    • In SPEC89, 85% of such branches are procedure returns

    • Procedures follow a stack discipline, so save return addresses in a small buffer that acts like a stack: 8 to 16 entries give a small miss rate

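A minimal Python sketch of a return address stack, assuming 4-byte instructions and a buffer that drops its oldest entry on overflow; the class name, method names, and the tiny depth used in the demo are hypothetical.

```python
from collections import deque


class ReturnAddressStack:
    """Small return-address buffer used as a stack (hypothetical sketch)."""

    def __init__(self, depth=16):
        # On overflow, deque(maxlen=...) silently drops the oldest (bottom) entry.
        self.stack = deque(maxlen=depth)

    def on_call(self, call_pc, insn_bytes=4):
        # Push the address of the instruction after the call (assumes 4-byte instructions).
        self.stack.append(call_pc + insn_bytes)

    def predict_return(self):
        # Pop the most recent return address; None means no prediction is available.
        return self.stack.pop() if self.stack else None


ras = ReturnAddressStack(depth=2)
for call_pc in (0x100, 0x200, 0x300):  # three nested calls, one more than the depth
    ras.on_call(call_pc)
predictions = [ras.predict_return() for _ in range(3)]
print(predictions)  # [772, 516, None]
```

The two innermost returns (0x304 and 0x204) are predicted correctly; the outermost one was pushed out by the overflow, which is why a depth of 8 to 16 entries is enough to keep the miss rate small for typical call depths.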
    Short Seminar

    • Section 2.10 on Pentium 4, Branch prediction

    • Pentium 4 Tomasulo

    Dynamic Branch Prediction Summary

    • Prediction is becoming an important part of scalar execution.

    • Branch History Table: 2 bits give good accuracy on loops.

    • Correlation: recently executed branches are correlated with the next branch.

      • Either different branches,

      • Or different executions of the same branch.

    • Tournament Predictor: devote more resources to competing predictors and pick between them.

    • Branch Target Buffer: includes the branch address and the prediction.

    • Return address stack for predicting indirect jumps (procedure returns).