
CS 7960-4 Lecture 7

This lecture covers Scott McFarling's paper on combining bimodal, local, and global branch predictors to improve prediction accuracy. Results show that a combination of local and gshare predictors achieves a prediction accuracy of 98.1%. The lecture also lists future work on detecting conflicts, correlations, and common predictions through profiling and compiler analysis.


Presentation Transcript


  1. CS 7960-4 Lecture 7: Combining Branch Predictors. Scott McFarling, WRL Tech. Report TN-36, 1993

  2. Bimodal Branch Prediction
     • Identifies most popular prediction in recent past
     • Updates happen during commit
     (Diagram: the PC supplies a 10-bit index into a table of 1024 entries, each a 2-bit saturating counter.)
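The bimodal scheme on this slide can be sketched in a few lines. This is a minimal illustration, not McFarling's simulator: the class name, the initial counter value of 1, and the use of the raw low PC bits as the index are assumptions made for the example, and commit-time update is not modeled.

```python
class BimodalPredictor:
    """Table of 2-bit saturating counters indexed by the low PC bits."""

    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        # Assumption: every counter starts at 1 (weakly not-taken).
        self.counters = [1] * (1 << index_bits)

    def predict(self, pc):
        # Counter values 2 and 3 mean "predict taken".
        return self.counters[pc & self.mask] >= 2

    def update(self, pc, taken):
        # Step toward 3 on taken, toward 0 on not-taken, saturating at both ends.
        i = pc & self.mask
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
```

Two branches whose PCs share the same low 10 bits use the same counter, which is the contention that a larger table reduces.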

  3. Results
     • SPEC’89 programs simulated for 10M instructions (modern studies use hard-to-predict programs)
     • A larger predictor reduces contention for counters
     • Prediction rates saturate at 93.5% (at 2K bytes) (Fig. 3)

  4. Local Predictors
     • Two-level predictor: the first level has history, the second level has saturating counters
     • History gets updated immediately
     (Diagram labels: PC, 10-bit index, 1024 entries, 4-bit history table, 16 entries, 2-bit saturating counters.)
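A two-level local predictor might look like the sketch below. It assumes the sizes on the slide mean a 1024-entry first-level table of 4-bit histories feeding a 16-entry table of 2-bit counters. That reading of the diagram, the class name, and the initial values are assumptions for illustration.

```python
class LocalPredictor:
    """Per-branch history (level 1) selects a shared saturating counter (level 2)."""

    def __init__(self, pc_bits=10, history_bits=4):
        self.pc_mask = (1 << pc_bits) - 1
        self.hist_mask = (1 << history_bits) - 1
        self.histories = [0] * (1 << pc_bits)      # level 1: one history per PC index
        self.counters = [1] * (1 << history_bits)  # level 2: one counter per history pattern

    def predict(self, pc):
        history = self.histories[pc & self.pc_mask]
        return self.counters[history] >= 2

    def update(self, pc, taken):
        i = pc & self.pc_mask
        history = self.histories[i]
        c = self.counters[history]
        self.counters[history] = min(3, c + 1) if taken else max(0, c - 1)
        # Shift the newest outcome into the low bit of this branch's history.
        self.histories[i] = ((history << 1) | int(taken)) & self.hist_mask
```

A branch that strictly alternates taken and not-taken defeats a single counter, but here the two history patterns train two different counters, so the predictor settles on the right answer after a short warm-up.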

  5. Results
     • For small predictors, there could be contention at both levels, resulting in inaccurate predictions
     • Will also take longer to warm up after every context switch
     • Does very well for large predictors: saturates at 97.1%

  6. Global Predictors
     • A single history register: neighboring branches have correlated results
     • However, the PC is not used
     (Diagram: a 10-bit global history indexes a table of 1024 2-bit saturating counters.)
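A global predictor of this kind can be sketched as follows. The class name and initial counter value are assumptions for the example. Note that neither method takes a PC, matching the slide.

```python
class GlobalPredictor:
    """One shared history register indexes a table of 2-bit saturating counters."""

    def __init__(self, history_bits=10):
        self.mask = (1 << history_bits) - 1
        self.history = 0
        self.counters = [1] * (1 << history_bits)

    def predict(self):
        return self.counters[self.history] >= 2

    def update(self, taken):
        c = self.counters[self.history]
        self.counters[self.history] = min(3, c + 1) if taken else max(0, c - 1)
        # Every branch outcome, from any PC, shifts into the same register.
        self.history = ((self.history << 1) | int(taken)) & self.mask
```

For a loop branch that is taken three times and then falls through, the 10-bit history takes only four distinct values once warm, and each value maps to a consistent outcome, so the loop exit becomes predictable.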

  7. Do We Need PC?
     • Note that the global history reveals which branch is being examined
     • Hence, it outdoes bimodal predictors when the transistor budget is large (Fig. 7)
     • Local predictor does better: it is more important to identify the PC and local history than the behavior of neighboring branches

  8. Gselect
     • Use a combination of PC and global history
     • Bimodal and global prediction are special cases (Fig. 9)
     (Diagram: n PC bits and m global-history bits, here a 5-bit global history, form an (n+m)-bit index into 1024 2-bit saturating counters.)
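The gselect index is a concatenation, which a short function makes concrete. The function name and the choice to put the PC bits in the high half of the index are assumptions for this sketch.

```python
def gselect_index(pc, history, n=5, m=5):
    """Concatenate the low n PC bits with the low m global-history bits."""
    pc_part = pc & ((1 << n) - 1)
    hist_part = history & ((1 << m) - 1)
    # Assumption: PC bits occupy the high positions, history bits the low ones.
    return (pc_part << m) | hist_part
```

Setting `m=0` leaves only PC bits (a bimodal index), and setting `n=0` leaves only history bits (a global index), which is the sense in which both are special cases.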

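  9. GShare
     • Xor-ing 10 history bits and 10 PC bits has more info than the concatenation of 5 bits of each and more info than each individual component
     (Example bit patterns: 01111110, 00000001, 11100001, 01111111. The third concatenates the low 4 bits of the first two; the fourth is their XOR.)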
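The gshare index can be sketched as a single XOR. The function name is an assumption, and the example uses 8-bit values so that it matches the bit patterns on the slide rather than the 10-bit table.

```python
def gshare_index(pc, history, bits=10):
    """XOR the low `bits` of the PC with the low `bits` of the global history."""
    mask = (1 << bits) - 1
    return (pc ^ history) & mask


# The slide's patterns: 01111110 XOR 00000001 gives 01111111.
example = gshare_index(0b01111110, 0b00000001, bits=8)

# Two histories that differ only in a high bit still map to different entries,
# whereas a 4+4 concatenation would drop that bit.
first = gshare_index(0b11111111, 0b00000000, bits=8)
second = gshare_index(0b11111111, 0b10000000, bits=8)
```

The index is the same width as in gselect, but every index bit depends on both the PC and the history.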

  10. Terminology
     • GAG: Global history indexes into global array of saturating counters
     • PAG: Per-address history indexes into global array of saturating counters
     • GAP: Global history indexes into each PC’s private array of counters (gselect)
     • PAP: Per-address history indexes into each PC’s private array of counters

  11. Trade-Offs
     • Some predictors warm up faster than others
     • Some programs benefit from global history, some from local history
     • Some programs have branches that interfere with each other
     • Note that a 64KB local predictor has fewer saturating counters than a 64KB bimodal predictor, so the former won’t be better for every program

  12. Combining Predictors
     • Use an array of saturating counters to pick the best available predictor for each PC
     (Diagram: the PC indexes 1024 2-bit saturating counters that select between Predictor A and Predictor B.)
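The selection mechanism can be sketched with two small component predictors and a chooser table. The class names, table sizes, and initial values are assumptions for this example. The chooser moves only when exactly one component is correct, which is how McFarling's report describes the update.

```python
def bump(counter, up):
    """Move a 2-bit saturating counter one step toward 3 (up) or 0 (down)."""
    return min(3, counter + 1) if up else max(0, counter - 1)


class Bimodal:
    def __init__(self, bits=10):
        self.mask = (1 << bits) - 1
        self.table = [1] * (1 << bits)

    def predict(self, pc):
        return self.table[pc & self.mask] >= 2

    def update(self, pc, taken):
        i = pc & self.mask
        self.table[i] = bump(self.table[i], taken)


class Gshare:
    def __init__(self, bits=10):
        self.mask = (1 << bits) - 1
        self.table = [1] * (1 << bits)
        self.history = 0

    def _index(self, pc):
        return (pc ^ self.history) & self.mask

    def predict(self, pc):
        return self.table[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        self.table[i] = bump(self.table[i], taken)
        self.history = ((self.history << 1) | int(taken)) & self.mask


class Combined:
    """Per-PC chooser counters select between predictors a and b."""

    def __init__(self, a, b, bits=10):
        self.a, self.b = a, b
        self.mask = (1 << bits) - 1
        self.chooser = [1] * (1 << bits)  # below 2 selects a, 2 or above selects b

    def predict(self, pc):
        use_b = self.chooser[pc & self.mask] >= 2
        return self.b.predict(pc) if use_b else self.a.predict(pc)

    def update(self, pc, taken):
        a_ok = self.a.predict(pc) == taken
        b_ok = self.b.predict(pc) == taken
        if a_ok != b_ok:
            # Only a disagreement in correctness moves the chooser.
            i = pc & self.mask
            self.chooser[i] = bump(self.chooser[i], b_ok)
        self.a.update(pc, taken)
        self.b.update(pc, taken)
```

For a branch that alternates taken and not-taken, the bimodal component keeps mispredicting while gshare learns the pattern, so the chooser for that PC saturates toward gshare.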

  13. Results
     • The combination of local and gshare increases the prediction accuracy to 98.1% (Fig. 16)
     • For smaller transistor budgets, the combination of bimodal and gshare is better (gshare is twice the size to make sure the total is a power of two)
     • A 1KB combined predictor does as well as a 16KB gselect predictor

  14. Future Work
     • Detect conflicts, correlations, and common predictions through profiling/compiler analysis
     • Functions that compress information in history or PC
     • Pipeline predictions: predict two branches ahead
     • Hierarchical predictors: get a quick prediction in a cycle and a more accurate one two cycles later

  15. Next Week’s Paper
     • “Design Trade-Offs for the Alpha EV8 Conditional Branch Predictor”, Seznec et al., ISCA’02

