Lecture 10 : Branch Prediction and Instruction Delivery

vea

Presentation Transcript


  1. Lecture 10: Branch Prediction and Instruction Delivery
     Branch target buffer, return address prediction, tournament predictor, high-performance instruction delivery

  2. Correlating Branch Predictor
     • General form: (m, n) predictor
     • m bits of global history select among 2^m n-bit local predictors for each branch entry
     • Records correlation between m+1 branches
     • Simple implementation: global history can be stored in a shift register
     • Example: (2,2) predictor, with 2-bit global history and 2-bit local predictors
     [Figure: the branch address (4 bits) selects a row of 2-bit per-branch local predictors; the 2-bit global branch history (01 = not taken, then taken) selects which one supplies the prediction]
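The (m, n) scheme on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: the class name, the 4-byte instruction assumption used to form the index, and the all-zero initial counters are assumptions made for the example.

```python
class CorrelatingPredictor:
    """Sketch of an (m, n) predictor: m global-history bits pick one of
    2**m n-bit saturating counters in the row selected by the branch address."""

    def __init__(self, m=2, n=2, index_bits=4):
        self.m = m
        self.index_bits = index_bits
        self.max_count = (1 << n) - 1
        self.history = 0  # m-bit shift register of recent outcomes (1 = taken)
        # One row per branch-address index, one counter per history pattern.
        self.table = [[0] * (1 << m) for _ in range(1 << index_bits)]

    def _row(self, pc):
        # Assumption: 4-byte instructions, so the low two PC bits are dropped.
        return (pc >> 2) & ((1 << self.index_bits) - 1)

    def predict(self, pc):
        counter = self.table[self._row(pc)][self.history]
        return counter > self.max_count // 2  # upper half of range = taken

    def update(self, pc, taken):
        row = self.table[self._row(pc)]
        if taken:
            row[self.history] = min(row[self.history] + 1, self.max_count)
        else:
            row[self.history] = max(row[self.history] - 1, 0)
        # Shift the newest outcome into the global history register.
        self.history = ((self.history << 1) | int(taken)) & ((1 << self.m) - 1)
```

With the default (2,2) configuration, a branch that strictly alternates taken / not taken is mispredicted only while the counters warm up, because the two history patterns 01 and 10 each get their own counter.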

  3. Accuracy of Different Schemes (Figure 3.15, p. 206)
     [Figure: frequency of mispredictions for three schemes: 4096-entry 2-bit BHT, unlimited-entry 2-bit BHT, and 1024-entry (2,2) BHT]

  4. Branch Target Buffer
     • Branch Target Buffer (BTB): the address of the branch indexes the buffer to get the prediction AND the branch target address (if taken)
     • Note: must check for a branch match now, since the target of a different branch can't be used
     • Example: BTB combined with BHT
     [Figure: the PC of the instruction in FETCH is compared (=?) against the Branch PC column; each entry also holds a Predicted PC and extra prediction state bits]
     • Yes: the instruction is a branch; use the predicted PC as the next PC
     • No: branch not predicted; proceed normally (Next PC = PC+4)
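The lookup on this slide can be sketched as a small direct-mapped table. This is an illustrative sketch only: the class and method names, the 16-entry size, and the policy of keeping only taken branches (and omitting the extra prediction state bits) are assumptions, not details from the slide.

```python
class BranchTargetBuffer:
    """Direct-mapped BTB sketch: each entry holds a branch PC (tag) and its target."""

    def __init__(self, entries=16):
        self.entries = entries
        self.tags = [None] * entries   # "Branch PC" column
        self.targets = [0] * entries   # "Predicted PC" column

    def _index(self, pc):
        # Assumption: 4-byte instructions, so the low two PC bits are dropped.
        return (pc >> 2) % self.entries

    def next_pc(self, pc):
        i = self._index(pc)
        if self.tags[i] == pc:         # the "=?" match against the fetch PC
            return self.targets[i]     # hit: use the predicted PC
        return pc + 4                  # miss: proceed normally

    def update(self, pc, taken, target):
        i = self._index(pc)
        if taken:
            self.tags[i] = pc
            self.targets[i] = target
        elif self.tags[i] == pc:
            self.tags[i] = None        # branch fell through: drop its entry
```

The tag comparison is what the slide's note is about: two branches can map to the same entry, and without the match check the second one would be redirected to the first one's target.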

  5. Estimate Branch Penalty
     • Example: BHT correct rate is 95%, BTB hit rate is 95%
     • Average miss penalty is 6 cycles
     • How much is the branch penalty?
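One way to work the exercise is sketched below. The slide does not give a solution, so the model here is an assumption: a branch pays the full miss penalty either when it hits in the BTB but the BHT prediction is wrong, or when it misses in the BTB. The function name is hypothetical.

```python
def branch_penalty(bht_accuracy, btb_hit_rate, miss_penalty):
    """Expected penalty cycles per branch under the assumed model."""
    # Case 1: BTB hit, but the direction prediction is wrong.
    mispredicted_hit = btb_hit_rate * (1.0 - bht_accuracy)
    # Case 2: BTB miss (assumed to always cost the full penalty).
    btb_miss = 1.0 - btb_hit_rate
    return (mispredicted_hit + btb_miss) * miss_penalty


penalty = branch_penalty(bht_accuracy=0.95, btb_hit_rate=0.95, miss_penalty=6)
print(f"{penalty:.3f} cycles per branch")
```

Under these assumptions the penalty is (0.95 × 0.05 + 0.05) × 6, or about 0.585 cycles per branch. A model that charges BTB misses only for taken branches would give a smaller number.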

  6. Return Address Prediction
     • Register-indirect branches: the target address is hard to predict
     • Many callers, one callee
     • Jump to multiple return addresses from a single address (no PC-target correlation)
     • SPEC89: 85% of such branches are procedure returns
     • Since procedures follow a stack discipline, save return addresses in a small buffer that acts like a stack: 8 to 16 entries gives a small miss rate
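The small stack-like buffer described here might look like the following sketch. The class name, the overflow policy (discard the oldest entry), and the 4-byte instruction size are assumptions for illustration.

```python
class ReturnAddressStack:
    """Fixed-depth return address predictor that behaves like a stack."""

    def __init__(self, depth=8):
        self.depth = depth
        self.stack = []

    def on_call(self, call_pc):
        # Assumption: on overflow the oldest return address is discarded.
        if len(self.stack) == self.depth:
            self.stack.pop(0)
        # Assumption: 4-byte instructions, so the return address is call PC + 4.
        self.stack.append(call_pc + 4)

    def predict_return(self):
        # Returns None when the buffer is empty (no prediction available).
        return self.stack.pop() if self.stack else None
```

Because the prediction comes from the call sequence rather than from the PC of the return instruction, the same callee can be predicted correctly for many different callers. Mispredictions arise only when the call depth exceeds the buffer depth.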

  7. Accuracy of Return Address Predictor

  8. Tournament Branch Predictor
     • Used in the Alpha 21264: tracks both “local” and global history
     • Intended for mixed types of applications
     • Global history: T/NT history of the past k branches, e.g. 0 1 0 1 0 1 (NT T NT T NT T)
     [Figure: block diagram with the PC, global history, local predictor, global predictor, choice predictor, and a mux that produces the NT/T prediction]

  9. Tournament Branch Predictor
     • Local predictor: uses 10-bit local history and shared 3-bit counters
       PC → local history table (1K x 10) → 10 bits → counters (1K x 3) → 1 bit (NT/T)
     • Global and choice predictors:
       12-bit global history (e.g. 0 1 0 1 0 1 0 1 0 1 0 1) → counters (4K x 2) → 1 bit (NT/T)
       12-bit global history → counters (4K x 2) → 1 bit (local/global)
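The table sizes on this slide are enough to sketch the structure in Python. This is a simplified illustration, not the 21264's actual logic: the class name, the all-zero initial state, the PC bits used for the local index, and the rule of training the choice counter only when the two predictors disagree are assumptions.

```python
class TournamentPredictor:
    """Sketch using the slide's sizes: 1Kx10 local histories, 1Kx3 local
    counters, 4Kx2 global counters, 4Kx2 choice counters, 12-bit global history."""

    def __init__(self):
        self.local_hist = [0] * 1024   # 10-bit history per entry
        self.local_ctr = [0] * 1024    # 3-bit counters, indexed by local history
        self.global_ctr = [0] * 4096   # 2-bit counters, indexed by global history
        self.choice_ctr = [0] * 4096   # 2-bit counters; >= 2 means "use global"
        self.ghist = 0                 # 12-bit global history register

    @staticmethod
    def _bump(value, up, max_value):
        # Saturating increment or decrement.
        return min(value + 1, max_value) if up else max(value - 1, 0)

    def _lookup(self, pc):
        li = (pc >> 2) & 0x3FF         # assumption: 4-byte instructions
        lh = self.local_hist[li]
        local_pred = self.local_ctr[lh] >= 4
        global_pred = self.global_ctr[self.ghist] >= 2
        return li, lh, local_pred, global_pred

    def predict(self, pc):
        _, _, local_pred, global_pred = self._lookup(pc)
        use_global = self.choice_ctr[self.ghist] >= 2
        return global_pred if use_global else local_pred

    def update(self, pc, taken):
        li, lh, local_pred, global_pred = self._lookup(pc)
        g = self.ghist
        # Train the chooser only when the two predictors disagree.
        if local_pred != global_pred:
            self.choice_ctr[g] = self._bump(self.choice_ctr[g], global_pred == taken, 3)
        self.local_ctr[lh] = self._bump(self.local_ctr[lh], taken, 7)
        self.global_ctr[g] = self._bump(self.global_ctr[g], taken, 3)
        # Shift the outcome into both history registers.
        self.local_hist[li] = ((lh << 1) | int(taken)) & 0x3FF
        self.ghist = ((g << 1) | int(taken)) & 0xFFF
```

A typical use is to call `predict(pc)` at fetch and `update(pc, taken)` once the branch resolves. For a single loop branch with a short repeating pattern, both component predictors learn the pattern after a warm-up period.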

  10. Branch Prediction With n-way Issue
      • Branches will arrive up to n times faster in an n-issue processor
      • Amdahl’s Law => the relative impact of control stalls will be larger with the lower potential CPI of an n-issue processor
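The Amdahl's Law point can be made concrete with a small calculation. All numbers below are hypothetical values chosen for illustration (20% branch frequency, 5% misprediction rate, 6-cycle penalty, ideal CPI of 1.0 versus 0.25); none come from the lecture.

```python
def control_stall_share(ideal_cpi, branch_freq, mispredict_rate, penalty):
    """Fraction of total CPI that is spent on branch misprediction stalls."""
    stall_cpi = branch_freq * mispredict_rate * penalty
    return stall_cpi / (ideal_cpi + stall_cpi)


# Hypothetical workload: same branch behavior on both machines.
single_issue = control_stall_share(1.0, 0.20, 0.05, 6)
four_issue = control_stall_share(0.25, 0.20, 0.05, 6)
print(f"single-issue: {single_issue:.1%} of CPI is control stalls")
print(f"4-issue:      {four_issue:.1%} of CPI is control stalls")
```

The stall cycles per instruction are identical in both cases (0.06 here), but they are a much larger share of the total when the ideal CPI drops to 0.25.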

  11. Integrated Instruction Fetch Units
      • Integrated branch prediction: the branch predictor becomes part of the instruction fetch unit
      • Instruction prefetch: fetch ahead to deliver multiple instructions per cycle
      • Instruction memory access and buffering: may access multiple cache lines in one cycle; use prefetch to hide the cost
      • Another approach: trace cache

  12. Instruction Fetch Unit
      • Fetch predictor: predicts next fetch addresses to avoid fetch delay; may pre-predict branch direction; may be integrated with the I-cache
      • Branch predictor: overrides and trains the fetch predictor
      [Figure: pipeline diagram with Fetch Predictor, I-cache, Fetch, Branch Predictor, Decode/REN, Out-of-order Execution Engine, and In-order commit]

  13. Pitfall: Sometimes bigger and dumber is better
      • The 21264 uses a tournament predictor (29 Kbits)
      • The earlier 21164 uses a simple 2-bit predictor with 2K entries (a total of 4 Kbits)
      • On SPEC95 benchmarks, the 21264 outperforms the 21164:
        • 21264: avg. 11.5 mispredictions per 1000 instructions
        • 21164: avg. 16.5 mispredictions per 1000 instructions
      • Reversed for transaction processing (TP)!
        • 21264: avg. 17 mispredictions per 1000 instructions
        • 21164: avg. 15 mispredictions per 1000 instructions
      • TP code is much larger, and the 21164 holds 2X as many branch predictions based on local behavior (2K vs. 1K local predictor entries in the 21264)
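The storage figures quoted here can be checked against the table sizes listed on slide 9. The short calculation below is only a bookkeeping sketch; the dictionary keys are descriptive labels chosen for the example.

```python
KBIT = 1024

# Table sizes as listed on slide 9 (entries x bits per entry).
alpha_21264_bits = {
    "local history table": 1 * KBIT * 10,
    "local counters": 1 * KBIT * 3,
    "global counters": 4 * KBIT * 2,
    "choice counters": 4 * KBIT * 2,
}
total_21264_kbits = sum(alpha_21264_bits.values()) // KBIT

# 21164: 2K entries of 2-bit counters.
total_21164_kbits = (2 * KBIT * 2) // KBIT

print(f"21264: {total_21264_kbits} Kbits, 21164: {total_21164_kbits} Kbits")
```

The totals match the slide: 10 + 3 + 8 + 8 = 29 Kbits for the 21264 and 4 Kbits for the 21164, so the 21264 spends roughly seven times the storage yet tracks only half as many branches by local behavior.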

  14. Dynamic Branch Prediction Summary
      • Prediction is becoming an important part of scalar execution
      • Branch History Table: 2 bits for loop accuracy
      • Correlation: recently executed branches are correlated with the next branch
        • Either different branches
        • Or different executions of the same branch
      • Tournament Predictor: give more resources to competing solutions and pick between them
      • Branch Target Buffer: includes branch address and prediction
      • Return address stack for prediction of indirect jumps
