Dynamic branch prediction


Ali Azarpeyvand


Tomasulo Review

  • Reservation stations: renaming to a larger set of registers + buffering source operands

    • Prevents registers from becoming a bottleneck

    • Avoids the WAR and WAW hazards of the scoreboard

    • Allows loop unrolling in HW

  • Not limited to basic blocks (integer unit gets ahead, beyond branches)

  • Lasting Contributions

    • Dynamic scheduling

    • Register renaming

    • Load/store disambiguation

  • Descendants of the 360/91 include Pentium II, PowerPC 604, MIPS R10000, HP-PA 8000, and Alpha 21264


Outline

  • Dynamic Branch Prediction

    • Branch prediction buffer or branch history table

    • Correlating branch predictors

    • Tournament predictors

  • Branch target buffers

  • Integrated Instruction fetch unit

  • Return address predictors


Dynamic Branch Prediction

  • Performance = ƒ(accuracy, cost of misprediction)

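  • Branch History Table (branch-prediction buffer) is the simplest scheme

    • Lower bits of the PC index a table of 1-bit values

    • Each bit says whether or not the branch was taken last time

    • No address check

  • Problem: in a loop, a 1-bit BHT causes two mispredictions per loop execution, one at the exit and one on the first iteration of the next execution (example: with 9 iterations before exit, the branch is taken 90% of the time but predicted correctly only 80% of the time)

  • Solution: a 2-bit scheme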


Dynamic Branch Prediction

  • Solution: a 2-bit scheme that changes the prediction only after two mispredictions in a row:

  • [Figure: 2-bit prediction state diagram. Dark states: stop, predict not taken. Light states: go, predict taken.]
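The difference between the 1-bit and 2-bit schemes can be checked with a small simulation. The sketch below is illustrative only: the function name `simulate`, the starting counter state, and the choice of 100 loop executions are assumptions, not part of the slides.

```python
def simulate(outcomes, bits):
    """Count mispredictions of a single n-bit saturating counter on a branch trace."""
    top = (1 << bits) - 1          # highest counter value
    threshold = 1 << (bits - 1)    # predict taken when counter >= threshold
    counter = top                  # assumption: start in the strongest "taken" state
    misses = 0
    for taken in outcomes:
        predicted_taken = counter >= threshold
        if predicted_taken != taken:
            misses += 1
        # Saturating update: move one step toward the actual outcome.
        counter = min(top, counter + 1) if taken else max(0, counter - 1)
    return misses


# Loop branch: taken 9 times, then not taken at the exit; the loop runs 100 times.
trace = ([True] * 9 + [False]) * 100
one_bit = simulate(trace, bits=1)
two_bit = simulate(trace, bits=2)
print(f"1-bit: {one_bit} misses, 2-bit: {two_bit} misses out of {len(trace)}")
```

With one bit the counter simply remembers the last outcome, so every loop exit also spoils the first prediction of the next loop execution. With two bits a single not-taken outcome only weakens the prediction.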


BHT Accuracy

  • Mispredict because either:

    • Wrong guess for that branch

    • Got the history of a different branch when indexing the table

  • With a 4096-entry table, misprediction rates vary from 1% (nasa7, tomcatv) to 18% (eqntott), with spice at 9% and gcc at 12%

  • A 4096-entry table is about as good as an infinite table (in Alpha 21164)

  • Branch penalty and branch frequency are also important
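The second cause of misprediction (reading the history of a different branch) follows from indexing with the low PC bits and doing no address check. A minimal sketch, assuming 4-byte aligned instructions and hypothetical addresses `pc_a` and `pc_b`:

```python
ENTRIES = 4096  # table size from the text; a power of two


def bht_index(pc):
    # Assumption: instructions are 4-byte aligned, so the two lowest PC bits are dropped.
    return (pc >> 2) & (ENTRIES - 1)


pc_a = 0x00401000
pc_b = pc_a + ENTRIES * 4   # a different branch, 16 KiB further on

# Both branches select the same entry, so each one overwrites the other's history.
print(hex(bht_index(pc_a)), hex(bht_index(pc_b)))
```

Because the table stores no tag, the predictor cannot tell these two branches apart.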


BHT Accuracy

4096 entry, two bit prediction


Unlimited Entries


Correlating Branches

  • Hypothesis: recent branches are correlated; that is, behavior of recently executed branches affects prediction of current branch

  • Idea: record m most recently executed branches as taken or not taken, and use that pattern to select the proper branch history table

  • In general, an (m,n) predictor records the last m branches to select between 2^m history tables, each with n-bit counters

    • Old 2-bit BHT is then a (0,2) predictor


Examples

Code from eqntott from SPEC92

b3 is correlated with b1 and b2
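The code from this slide did not survive in the transcript. The sketch below assumes the eqntott fragment that is commonly used to illustrate this point (two tests that reset `aa` and `bb` to 0, followed by a comparison of the two). The variable names and the fragment itself are assumptions, and each branch is modelled here simply as the truth value of its if-condition.

```python
def eqntott_conditions(aa, bb):
    """Return the truth values of the three if-conditions (b1, b2, b3).

    Assumed fragment (not present in the transcript):
        if (aa == 2) aa = 0;     # b1
        if (bb == 2) bb = 0;     # b2
        if (aa != bb) { ... }    # b3
    """
    b1 = aa == 2
    if b1:
        aa = 0
    b2 = bb == 2
    if b2:
        bb = 0
    b3 = aa != bb
    return b1, b2, b3


# Whenever the conditions of b1 and b2 both hold, aa and bb are both 0,
# so the condition of b3 cannot hold.
for aa in range(4):
    for bb in range(4):
        b1, b2, b3 = eqntott_conditions(aa, bb)
        if b1 and b2:
            assert not b3
```

A predictor that looks only at the history of b3 cannot exploit this, whereas one that also sees the outcomes of b1 and b2 can.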


Branch Prediction Result

1-bit predictor (d is 0 or 2)


Correlating Prediction Performance

1-bit predictor with 1 bit of correlation


Correlating Branches

(2,2) predictor

  • The behavior of the two most recent branches selects between four predictions for the next branch, and only the selected prediction is updated

  • Simple implementation:

    • Global history can be stored in a shift register

    • The branch address is concatenated with the global branch history to index the table
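The shift-register implementation can be sketched as follows. This is a minimal model under stated assumptions: the class name `CorrelatingPredictor`, the helper `count_misses`, the all-zero initial state, and the 4-byte instruction alignment are illustrative choices, not taken from the slides.

```python
class CorrelatingPredictor:
    """Sketch of an (m,n) predictor: m bits of global history, n-bit counters."""

    def __init__(self, m, n, entries):
        self.m, self.n, self.entries = m, n, entries
        self.history = 0                        # shift register, newest outcome in bit 0
        self.counters = [0] * (entries << m)    # 2^m counters per entry, all "not taken"

    def _index(self, pc):
        # Concatenate the low branch-address bits with the global history.
        return (((pc >> 2) % self.entries) << self.m) | self.history

    def predict(self, pc):
        return self.counters[self._index(pc)] >= (1 << (self.n - 1))

    def update(self, pc, taken):
        i = self._index(pc)
        top = (1 << self.n) - 1
        c = self.counters[i]
        self.counters[i] = min(top, c + 1) if taken else max(0, c - 1)
        # Shift the outcome into the history and keep only the last m bits.
        self.history = ((self.history << 1) | int(taken)) & ((1 << self.m) - 1)


def count_misses(predictor, outcomes, pc=0x400):
    misses = 0
    for taken in outcomes:
        misses += predictor.predict(pc) != taken
        predictor.update(pc, taken)
    return misses


# A branch that strictly alternates taken / not taken.
pattern = [True, False] * 50
plain = count_misses(CorrelatingPredictor(0, 2, 1024), pattern)  # ordinary 2-bit BHT
corr = count_misses(CorrelatingPredictor(2, 2, 1024), pattern)   # (2,2) predictor
print(plain, corr)
```

On this alternating pattern the (0,2) predictor never settles, while the (2,2) predictor mispredicts only during warm-up, because the previous outcomes identify which phase of the pattern comes next.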


    Number of Stored Bits

    • For an (m,n) predictor:

      • Bits = 2^m * n * number of prediction entries

    • Example:

    • 2-bit predictor with 4096 entries:

      • 2^0 * 2 * 4k = 8k bits

    • (2,2) predictor, how many entries for the same 8k bits:

      • 2^2 * 2 * x = 8k, so x = 1k entries

    • Comparison in the next slide
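The storage arithmetic above can be reproduced in a few lines. The function name `predictor_bits` is an illustrative assumption.

```python
def predictor_bits(m, n, entries):
    """Total state bits of an (m,n) predictor with the given number of entries."""
    return (2 ** m) * n * entries


# Plain 2-bit predictor, i.e. a (0,2) predictor, with 4096 entries.
budget = predictor_bits(0, 2, 4096)

# Number of entries a (2,2) predictor can afford within the same bit budget.
entries_22 = budget // predictor_bits(2, 2, 1)

print(budget, entries_22)
```

Both configurations use the same number of bits, which is what makes the accuracy comparison between them fair.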


    Accuracy of Different Schemes

    [Figure: bar chart of the frequency of mispredictions (axis from 0% to 18%) for three predictors: 4096-entry 2-bit BHT, unlimited-entry 2-bit BHT, and 1024-entry (2,2) BHT.]


    Tournament Branch Predictor

    [Figure: block diagram. The PC feeds a local predictor and the global history feeds a global predictor; a choice predictor drives a mux that selects one of the two NT/T predictions.]

    • Used in Alpha 21264: Track both “local” and global history

    • Intended for mixed types of applications

    • Global history: T/NT history of past k branches, e.g. 0 1 0 1 0 1 (NT T NT T NT T)


    Predictor Select


    Local Predictor Percentage


    Performance Comparison


    Tournament Branch Predictor

    • Local predictor: use 10-bit local history, 3-bit counters

    [Figure: the PC indexes a local history table (1K x 10); the 10-bit history indexes a table of counters (1K x 3), which gives the NT/T prediction.]

    • Global and choice predictors:

    [Figure: the 12-bit global history indexes two tables of counters (4K x 2 each); one gives the NT/T prediction, the other selects local/global.]
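The table sizes listed above can be put together into a small behavioural model. This is a sketch under stated assumptions, not a description of the actual Alpha 21264 logic: the class and helper names, the all-zero initial state, the 4-byte alignment, and the rule of training the chooser only when the two components disagree are all illustrative choices.

```python
def saturate(value, taken, top):
    """Move a saturating counter one step toward the actual outcome."""
    return min(top, value + 1) if taken else max(0, value - 1)


class TournamentPredictor:
    def __init__(self):
        self.local_hist = [0] * 1024   # 1K x 10-bit local histories, indexed by PC
        self.local_ctr = [0] * 1024    # 1K x 3-bit counters, indexed by local history
        self.global_ctr = [0] * 4096   # 4K x 2-bit counters, indexed by global history
        self.choice_ctr = [0] * 4096   # 4K x 2-bit counters; >= 2 means "use global"
        self.ghist = 0                 # 12-bit global history

    def _components(self, pc):
        li = (pc >> 2) & 1023
        lh = self.local_hist[li]
        local = self.local_ctr[lh] >= 4           # upper half of a 3-bit counter
        glob = self.global_ctr[self.ghist] >= 2   # upper half of a 2-bit counter
        return li, lh, local, glob

    def predict(self, pc):
        _, _, local, glob = self._components(pc)
        return glob if self.choice_ctr[self.ghist] >= 2 else local

    def update(self, pc, taken):
        li, lh, local, glob = self._components(pc)
        g = self.ghist
        # Assumption: the chooser is trained only when the components disagree.
        if local != glob:
            self.choice_ctr[g] = saturate(self.choice_ctr[g], glob == taken, 3)
        self.local_ctr[lh] = saturate(self.local_ctr[lh], taken, 7)
        self.global_ctr[g] = saturate(self.global_ctr[g], taken, 3)
        self.local_hist[li] = ((lh << 1) | int(taken)) & 0x3FF
        self.ghist = ((g << 1) | int(taken)) & 0xFFF


# One branch that alternates taken / not taken; count misses after warm-up.
predictor = TournamentPredictor()
late_misses = 0
for i in range(200):
    taken = i % 2 == 0
    if i >= 100 and predictor.predict(0x400) != taken:
        late_misses += 1
    predictor.update(0x400, taken)
print(late_misses)
```

With a single branch both components learn the pattern, so the chooser hardly matters. It becomes important in mixed workloads where some branches follow their own history and others follow the surrounding branches.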


    Reducing Branch Stalls

    • In MIPS, when a branch is predicted as taken

      • We still need the target address

    • High Performance Instruction Delivery

      • Branch target buffer

      • integrated instruction fetch unit

      • predicting return addresses


    Need Address at Same Time as Prediction

    • Branch Target Buffer (BTB): the address of the branch indexes the buffer to get the prediction AND the branch target address (if taken)


    Branch Target Buffer flow chart


    Example

    Prediction accuracy: 90% (for instructions in the buffer)

    Hit rate in the buffer: 90% (for branches predicted taken)

    Taken branch frequency: 60%

    Probability (branch in buffer, but actually not taken) = buffer hit rate × fraction of incorrect predictions = 90% × 10% = 0.09

    Probability (branch not in buffer, but actually taken) = 10% × 60% = 0.06

    Branch penalty = (0.09 + 0.06) × 2 = 0.30 clock cycles
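The arithmetic of this example can be reproduced directly. The variable names are illustrative, and the 2-cycle penalty is read off the factor 2 in the slide's formula.

```python
hit_rate = 0.90        # branch found in the buffer
accuracy = 0.90        # prediction correct, given a buffer hit
taken_freq = 0.60      # fraction of branches that are taken
penalty_cycles = 2     # assumption: the factor 2 in the slide is the penalty in cycles

# In the buffer, but the prediction turns out to be wrong.
p_hit_wrong = hit_rate * (1 - accuracy)

# Not in the buffer, but the branch is actually taken.
p_miss_taken = (1 - hit_rate) * taken_freq

branch_penalty = (p_hit_wrong + p_miss_taken) * penalty_cycles
print(round(p_hit_wrong, 2), round(p_miss_taken, 2), round(branch_penalty, 2))
```

Changing `hit_rate` or `accuracy` shows how sensitive the average penalty is to each term.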


    Branch Folding

    • Idea: store one or more target instructions instead of, or in addition to, the predicted target address

    • Advantages:

      • Allows the branch-target buffer access to take longer than the time between successive instruction fetches

      • Allows an optimization called branch folding

    • Branch folding:

      • Gives zero-cycle unconditional branches, and sometimes zero-cycle conditional branches


    Branch Target Buffer (summary)

    [Figure: during FETCH, the PC of the instruction is compared (=?) with the stored branch PCs; each entry also holds a predicted PC and extra prediction state bits. Yes: the instruction is a branch, use the predicted PC as the next PC. No: branch not predicted, proceed normally (next PC = PC+4).]

    • Branch Target Buffer (BTB): the address of the branch indexes the buffer to get the prediction AND the branch target address (if taken)

      • Note: must check for a branch match now, since we can’t use the wrong branch’s address

    • Example: BTB combined with BHT
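The lookup with its address check can be sketched as a direct-mapped table. The class name, the table size, the policy of storing only taken branches, and the 4-byte instruction size are assumptions made for illustration.

```python
class BranchTargetBuffer:
    """Direct-mapped BTB sketch: each slot holds (branch_pc, predicted_pc) or None."""

    def __init__(self, entries=1024):
        self.entries = entries
        self.table = [None] * entries

    def _slot(self, pc):
        return (pc >> 2) % self.entries

    def next_pc(self, pc):
        entry = self.table[self._slot(pc)]
        # The "=?" comparison: use the entry only if it belongs to this branch.
        if entry is not None and entry[0] == pc:
            return entry[1]
        return pc + 4   # no match: fetch sequentially

    def update(self, pc, taken, target):
        i = self._slot(pc)
        if taken:
            self.table[i] = (pc, target)
        elif self.table[i] is not None and self.table[i][0] == pc:
            self.table[i] = None   # assumption: only taken branches stay in the buffer


btb = BranchTargetBuffer()
btb.update(0x1000, taken=True, target=0x2000)
aliased_pc = 0x1000 + 1024 * 4   # maps to the same slot as 0x1000
print(hex(btb.next_pc(0x1000)), hex(btb.next_pc(aliased_pc)))
```

Unlike the branch history table, a wrong match here would redirect fetch to the wrong address, which is why the stored branch PC is compared before the predicted PC is used.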


    Return Addresses Prediction

    • The target address of a register-indirect branch is hard to predict

      • Branch prediction buffer techniques do not work well in this situation:

      • Many callers, one callee

      • Jump to multiple return addresses from a single address (no PC-target correlation)

    • In SPEC89, 85% of such branches are procedure returns

    • Use stack discipline for procedures: save the return address in a small buffer that acts like a stack; 8 to 16 entries give a small miss rate
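The small stack-like buffer can be modelled in a few lines. The class name, the default depth of 16, and the policy of discarding the oldest entry on overflow are illustrative assumptions.

```python
class ReturnAddressStack:
    """Sketch of a return-address predictor with a fixed number of entries."""

    def __init__(self, depth=16):
        self.depth = depth
        self.stack = []

    def on_call(self, return_pc):
        if len(self.stack) == self.depth:
            self.stack.pop(0)          # assumption: overflow discards the oldest entry
        self.stack.append(return_pc)

    def predict_return(self):
        # Returns None when the buffer is empty (no prediction available).
        return self.stack.pop() if self.stack else None


ras = ReturnAddressStack(depth=8)

# Two different call sites invoke the same callee, one after the other.
ras.on_call(0x1004)
first = ras.predict_return()
ras.on_call(0x2008)
second = ras.predict_return()
print(hex(first), hex(second))
```

The same return instruction gets a different, correct target each time, which a buffer indexed by the return instruction's PC cannot provide.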


    Accuracy of Return Address Predictor


    Short Seminar

    • Section 2.10 on Pentium 4, Branch prediction

    • Pentium 4 Tomasulo


    Dynamic Branch Prediction Summary

    • Prediction is becoming an important part of scalar execution.

    • Branch History Table: 2 bits for loop accuracy.

    • Correlation: recently executed branches are correlated with the next branch.

      • Either different branches.

      • Or different executions of the same branch.

    • Tournament Predictor: devote more resources to competing predictors and pick between them.

    • Branch Target Buffer: includes branch address & prediction.

    • Return address stack for prediction of indirect jumps.

