Dynamic Branch Prediction

Ali Azarpeyvand

Presentation Transcript
Tomasulo Review
  • Reservations stations: renaming to larger set of registers + buffering source operands
    • Prevents registers as bottleneck
    • Avoids WAR, WAW hazards of Scoreboard
    • Allows loop unrolling in HW
  • Not limited to basic blocks (integer unit gets ahead, beyond branches)
  • Lasting Contributions
    • Dynamic scheduling
    • Register renaming
    • Load/store disambiguation
  • Descendants of the 360/91 include Pentium II, PowerPC 604, MIPS R10000, HP PA-8000, and Alpha 21264
Outline
  • Dynamic Branch Prediction
    • Branch prediction buffer or branch history table
    • Correlating branch predictors
    • Tournament predictors
  • Branch target buffers
  • Integrated Instruction fetch unit
  • Return address predictors
Dynamic Branch Prediction
  • Performance = ƒ(accuracy, cost of misprediction)
  • Branch History Table (branch-prediction buffer) is simplest
    • Lower bits of PC address index table of 1-bit values
    • Says whether or not branch taken last time
    • No address check
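  • Problem: in a loop, a 1-bit BHT causes two mispredictions per loop execution, one at loop exit and one on the first iteration of the next execution (example: a branch taken 9 times and then not taken at exit is predicted with only 80% accuracy, although it is taken 90% of the time)
  • Solution: 2-bit scheme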
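The loop example above can be checked with a small simulation. This is a minimal sketch, assuming a single 1-bit entry that is initially "taken" and a loop that is re-entered 100 times; the function name `one_bit_accuracy` is hypothetical.

```python
def one_bit_accuracy(outcomes, initial=True):
    """Simulate one 1-bit BHT entry: predict whatever the branch did last time."""
    prediction = initial
    correct = 0
    for taken in outcomes:
        if prediction == taken:
            correct += 1
        prediction = taken  # the single bit remembers only the last outcome
    return correct / len(outcomes)


# Assumed workload: loop branch taken 9 times, then not taken at exit,
# with the whole loop executed 100 times.
loop_trace = ([True] * 9 + [False]) * 100
print(one_bit_accuracy(loop_trace))  # roughly 0.80
```

Each re-entry of the loop costs two mispredictions: the exit itself, and the first iteration afterwards, because the bit was just flipped to "not taken".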
Dynamic Branch Prediction
  • Solution: 2-bit scheme that changes the prediction only after two mispredictions in a row:
  • Dark: stop, not taken
  • Light: go, taken
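One common realization of the 2-bit scheme is a saturating counter. The sketch below assumes that variant (states 0 and 1 predict not taken, states 2 and 3 predict taken), which may differ in detail from the state diagram on the slide; `two_bit_accuracy` is a hypothetical name.

```python
def two_bit_accuracy(outcomes, state=3):
    """Simulate one 2-bit saturating counter entry.

    States 0-1 predict not taken, states 2-3 predict taken (assumed encoding).
    """
    correct = 0
    for taken in outcomes:
        if (state >= 2) == taken:
            correct += 1
        # Move one step toward the observed outcome, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct / len(outcomes)


# Same assumed loop workload: taken 9 times, not taken at exit, run 100 times.
loop_trace = ([True] * 9 + [False]) * 100
print(two_bit_accuracy(loop_trace))
```

With this counter only the loop exit is mispredicted, because a single not-taken outcome moves the state from 3 to 2 and the prediction stays "taken".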
BHT Accuracy
  • Mispredict because either:
    • Wrong guess for that branch
    • Got branch history of the wrong branch when indexing the table
  • With a 4096-entry table, misprediction rates vary from 1% (nasa7, tomcatv) to 18% (eqntott), with spice at 9% and gcc at 12%
  • A 4096-entry table is about as good as an infinite table (in Alpha 21164)
  • Branch penalty and branch frequency are also important
BHT Accuracy

4096-entry, 2-bit prediction

Correlating Branches
  • Hypothesis: recent branches are correlated; that is, behavior of recently executed branches affects prediction of current branch
  • Idea: record m most recently executed branches as taken or not taken, and use that pattern to select the proper branch history table
  • In general, an (m,n) predictor records the last m branches to select between 2^m history tables, each with n-bit counters
    • Old 2-bit BHT is then a (0,2) predictor
Examples

Code from eqntott from SPEC92

b3 has correlation with b1, b2

Branch Prediction Result

1-bit predictor (d is 0 or 2)

Correlating Prediction Performance

One bit predictor with one bit correlation

Correlating Branches

(2,2) predictor

    • Then behavior of recent branches selects between, say, four predictions of next branch, updating just that prediction
  • Simple implementation:
    • global history can be stored in a shift register

Branch address is concatenated with global branch history and then used as the table index.
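The indexing scheme just described can be sketched as follows. This is an illustrative model, not the hardware design from the slides: the class name, the word-aligned PC assumption (two low bits dropped), and the "weakly taken" initial counter value are all assumptions.

```python
class CorrelatingPredictor:
    """Sketch of an (m, n) predictor: m bits of global history, n-bit counters."""

    def __init__(self, m, n, entries):
        self.m = m
        self.entries = entries
        self.max_count = (1 << n) - 1
        self.threshold = self.max_count // 2   # counter > threshold means "taken"
        self.history = 0                       # m-bit global history shift register
        # One counter per (branch entry, history pattern); start weakly taken.
        self.counters = [self.threshold + 1] * (entries << m)

    def _index(self, pc):
        # Low-order PC bits concatenated with the global history bits.
        return (((pc >> 2) % self.entries) << self.m) | self.history

    def predict(self, pc):
        return self.counters[self._index(pc)] > self.threshold

    def update(self, pc, taken):
        i = self._index(pc)
        c = self.counters[i]
        self.counters[i] = min(c + 1, self.max_count) if taken else max(c - 1, 0)
        if self.m:
            mask = (1 << self.m) - 1
            self.history = ((self.history << 1) | int(taken)) & mask


def accuracy(predictor, trace):
    """Fraction of correct predictions over a list of (pc, taken) pairs."""
    correct = 0
    for pc, taken in trace:
        correct += predictor.predict(pc) == taken
        predictor.update(pc, taken)
    return correct / len(trace)


# Assumed workload: one branch that strictly alternates taken / not taken.
trace = [(0x40, i % 2 == 0) for i in range(1000)]
print(accuracy(CorrelatingPredictor(0, 2, 1024), trace))  # plain 2-bit BHT
print(accuracy(CorrelatingPredictor(2, 2, 1024), trace))  # (2,2) predictor
```

On this alternating pattern the (0,2) predictor is right only about half the time, while the (2,2) predictor learns that the history pattern determines the next outcome and is almost always right.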

Number of Stored Bits
  • For an (m,n) predictor:
    • Bits = 2^m × n × number of prediction entries
  • Example:
  • 2-bit predictor with 4096 entries:
    • 2^0 × 2 × 4K = 8K bits
  • (2,2) predictor, how many entries for 8K bits:
    • 2^2 × 2 × x = 8K, so x = 1K
  • Comparison in the next slide
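The storage formula above is easy to check numerically. A minimal sketch; the function name `predictor_bits` is hypothetical.

```python
def predictor_bits(m, n, entries):
    """Total state bits of an (m, n) predictor with the given number of entries."""
    return (2 ** m) * n * entries


print(predictor_bits(0, 2, 4096))  # 2-bit BHT with 4096 entries: 8192 bits
print(predictor_bits(2, 2, 1024))  # (2,2) predictor with 1024 entries: 8192 bits
```

Both configurations use the same 8K bits, which is what makes the accuracy comparison between them a fair one.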
Accuracy of Different Schemes

[Figure: frequency of mispredictions (axis from 0% to 18%) for three schemes: 4096-entry 2-bit BHT, unlimited-entry 2-bit BHT, and 1024-entry (2,2) BHT]

Tournament Branch Predictor
  • Used in Alpha 21264: tracks both “local” and global history
  • Intended for mixed types of applications
  • Global history: T/NT history of past k branches, e.g. 0 1 0 1 0 1 (NT T NT T NT T)

[Figure: the PC feeds a local predictor and the global history feeds a global predictor; a choice predictor drives a mux that selects between their NT/T outputs]
Tournament Branch Predictor
  • Local predictor: the PC indexes a local history table (1K × 10 bits); the 10-bit local history indexes a table of 3-bit counters (1K × 3) that gives the NT/T prediction
  • Global and choice predictors: the 12-bit global history indexes two tables of 2-bit counters (4K × 2 each); one gives the NT/T prediction, the other the local/global choice
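The table sizes above can be put together into a small behavioral model. This is a sketch under stated assumptions, not the Alpha 21264 logic: the initial counter values, the word-aligned PC indexing, and the rule of training the chooser only when the two predictors disagree are assumptions, and the names `TournamentPredictor` and `bump` are hypothetical.

```python
def bump(counter, taken, max_value):
    """Saturating increment on taken, saturating decrement on not taken."""
    return min(counter + 1, max_value) if taken else max(counter - 1, 0)


class TournamentPredictor:
    def __init__(self):
        self.local_hist = [0] * 1024   # 1K x 10-bit local histories, indexed by PC
        self.local_ctr = [4] * 1024    # 1K x 3-bit counters, indexed by local history
        self.global_ctr = [2] * 4096   # 4K x 2-bit counters, indexed by global history
        self.choice_ctr = [2] * 4096   # 4K x 2-bit counters; >= 2 means "use global"
        self.ghist = 0                 # 12-bit global history

    def _components(self, pc):
        slot = (pc >> 2) % 1024
        local = self.local_ctr[self.local_hist[slot]] >= 4
        glob = self.global_ctr[self.ghist] >= 2
        return slot, local, glob

    def predict(self, pc):
        _, local, glob = self._components(pc)
        return glob if self.choice_ctr[self.ghist] >= 2 else local

    def update(self, pc, taken):
        slot, local, glob = self._components(pc)
        if local != glob:
            # Assumed policy: move the chooser toward whichever predictor was right.
            self.choice_ctr[self.ghist] = bump(
                self.choice_ctr[self.ghist], glob == taken, 3)
        lh = self.local_hist[slot]
        self.local_ctr[lh] = bump(self.local_ctr[lh], taken, 7)
        self.global_ctr[self.ghist] = bump(self.global_ctr[self.ghist], taken, 3)
        self.local_hist[slot] = ((lh << 1) | int(taken)) & 0x3FF
        self.ghist = ((self.ghist << 1) | int(taken)) & 0xFFF


# Assumed workload: one branch that alternates taken / not taken.
tp = TournamentPredictor()
hits = []
for i in range(1000):
    taken = (i % 2 == 0)
    hits.append(tp.predict(0x40) == taken)
    tp.update(0x40, taken)
print(sum(hits[500:]), "of the last 500 predictions correct")
```

After a short warm-up, both history registers settle into two repeating patterns, so either component predicts the alternating branch correctly and the chooser's decision no longer matters.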

Reducing Branch Stalls
  • In MIPS, when a branch is predicted as taken
    • We still need the target address
  • High Performance Instruction Delivery
    • Branch target buffer
    • integrated instruction fetch unit
    • predicting return addresses
Need Address at Same Time as Prediction
  • Branch Target Buffer (BTB): the address of the branch indexes the buffer to get the prediction AND the branch target address (if taken)
Example

Prediction accuracy: 90% (for instructions in the buffer)

Hit rate in the buffer: 90% (for branches predicted taken)

Taken branch frequency: 60%

Probability (branch in buffer, but actually not taken) = buffer hit rate × fraction of incorrect predictions = 90% × 10% = 0.09

Probability (branch not in buffer, but actually taken) = 10% × 60% = 0.06

Branch penalty = (0.09 + 0.06) × 2 = 0.30 clock cycles
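The arithmetic of this example can be reproduced directly. A minimal sketch; the function and parameter names are hypothetical, and the factor 2 from the slide is passed in as the per-misprediction penalty.

```python
def btb_branch_penalty(hit_rate, accuracy, taken_freq, penalty_cycles):
    """Expected penalty per branch, following the two cases in the example."""
    p_in_buffer_mispredicted = hit_rate * (1 - accuracy)
    p_not_in_buffer_taken = (1 - hit_rate) * taken_freq
    return (p_in_buffer_mispredicted + p_not_in_buffer_taken) * penalty_cycles


# Values from the example: 90% hit rate, 90% accuracy, 60% taken, factor 2.
print(round(btb_branch_penalty(0.90, 0.90, 0.60, 2), 2))
```

Varying one argument at a time shows how strongly the result depends on the buffer hit rate compared with the prediction accuracy.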

Branch Folding
  • Idea: to store one or more target instructions
    • instead of, or in addition to, the predicted target address.
  • Advantages:
    • it allows the branch-target buffer access to take longer than the time between successive instruction fetches
    • allows us to perform an optimization called branch folding
  • Branch Folding:
    • zero-cycle unconditional branches, and sometimes zero-cycle conditional branches.
Branch Target Buffer (summary)
  • Branch Target Buffer (BTB): the address of the branch indexes the buffer to get the prediction AND the branch target address (if taken)
    • Note: must check for branch match now, since can’t use wrong branch address
  • Example: BTB combined with BHT

[Figure: the PC of the instruction in FETCH is compared (=?) against the stored branch PCs; each entry holds a branch PC, a predicted PC, and extra prediction state bits. Yes: instruction is a branch, use predicted PC as next PC. No: branch not predicted, proceed normally (next PC = PC+4)]
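The lookup and the tag check can be sketched as a direct-mapped table. This is an illustrative model only: the class name, the table size, the 4-byte instruction size, and the policy of keeping an entry only for branches that were last taken are assumptions.

```python
class BranchTargetBuffer:
    """Direct-mapped BTB sketch: each slot holds (branch_pc, predicted_pc)."""

    def __init__(self, entries=1024):
        self.entries = entries
        self.table = [None] * entries

    def _slot(self, pc):
        return (pc >> 2) % self.entries

    def next_pc(self, pc):
        entry = self.table[self._slot(pc)]
        # Tag check: the stored branch PC must match, otherwise we would
        # redirect fetch using some other branch's target.
        if entry is not None and entry[0] == pc:
            return entry[1]
        return pc + 4  # not predicted as a taken branch: fall through

    def update(self, pc, taken, target):
        i = self._slot(pc)
        if taken:
            self.table[i] = (pc, target)
        elif self.table[i] is not None and self.table[i][0] == pc:
            self.table[i] = None  # assumed policy: drop entries for not-taken branches


btb = BranchTargetBuffer(entries=4)
print(hex(btb.next_pc(0x100)))   # miss: falls through to 0x104
btb.update(0x100, True, 0x200)
print(hex(btb.next_pc(0x100)))   # hit: predicted target 0x200
print(hex(btb.next_pc(0x110)))   # same slot, different branch: tag check fails, 0x114
```

The last lookup shows why the match is required: 0x110 maps to the same slot as 0x100, and without the comparison it would be sent to 0x200.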
Return Addresses Prediction
  • Register-indirect branches: the target address is hard to predict
    • Branch-prediction buffer techniques do not work well in this situation:
    • Many callers, one callee
    • Jump to multiple return addresses from a single address (no PC-target correlation)
  • In SPEC89, 85% of such branches are procedure returns
  • Use stack discipline for procedures: save the return address in a small buffer that acts like a stack; 8 to 16 entries give a small miss rate
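The stack discipline described above can be sketched in a few lines. This is a minimal model under assumptions: the class name, 4-byte instructions (return address = call PC + 4), and dropping the oldest entry on overflow are all illustrative choices.

```python
class ReturnAddressStack:
    """Small return-address predictor that behaves like a bounded stack."""

    def __init__(self, depth=16):
        self.depth = depth
        self.stack = []

    def on_call(self, call_pc):
        if len(self.stack) == self.depth:
            self.stack.pop(0)             # overflow: discard the oldest entry
        self.stack.append(call_pc + 4)    # address of the instruction after the call

    def predict_return(self):
        # Returns None when the stack is empty (no prediction available).
        return self.stack.pop() if self.stack else None


ras = ReturnAddressStack(depth=8)
ras.on_call(0x100)   # caller A calls f
ras.on_call(0x200)   # f calls g
print(hex(ras.predict_return()))  # g returns into f: 0x204
print(hex(ras.predict_return()))  # f returns into A: 0x104
```

Because the prediction depends on call order rather than on the address of the return instruction, the same return instruction is predicted correctly for different callers.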
Short Seminar
  • Section 2.10 on Pentium 4, Branch prediction
  • Pentium 4 Tomasulo
Dynamic Branch Prediction Summary
  • Prediction is becoming an important part of scalar execution.
  • Branch History Table: 2 bits for loop accuracy.
  • Correlation: recently executed branches are correlated with the next branch.
    • Either different branches.
    • Or different executions of the same branch.
  • Tournament Predictor: give more resources to competing predictors and pick between them.
  • Branch Target Buffer: holds the branch target address and the prediction.
  • Return address stack for prediction of indirect jumps.