TAGE-SC-L Branch Predictors

André Seznec, INRIA/IRISA

The TAGE-SC-L branch predictor
Sorry, nothing really new ...
• TAGE, JILP 2006
  • Considered the state-of-the-art global history predictor
• Can be augmented with small adjunct predictors
  • Loop predictor: CBP-2 (2006)
  • Statistical Corrector + Loop Predictor, global history: CBP-3 (2011)
  • Local history: MICRO 2011

Optimized all parameters
• Number, size, width of the tables
• Types of the histories for the statistical components
All that to decrease the number of mispredictions by 3% !!

[Figure: block diagram of TAGE-SC-L. Blocks: (Main) TAGE Predictor, Stat. Cor., Loop Predictor. Signals: PC + global history; prediction + confidence; global, local, skeleton histories.]

TAGE: multiple tables, global history predictor
• The set of history lengths forms a geometric series: {0, 2, 4, 8, 16, 32, 64, 128}
• Captures correlation on very long histories
• Most of the storage goes to short histories !!
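The geometric series of history lengths can be sketched as follows. This is a minimal illustration, not the championship code: the function name and its parameters (`min_len`, `max_len`, `num_tagged`) are hypothetical, and the rounding rule is an assumption. It reproduces the example series from the slide.

```python
from __future__ import annotations


def geometric_history_lengths(min_len: int, max_len: int, num_tagged: int) -> list[int]:
    """Return history lengths: 0 for the tagless base table, then a geometric series.

    Assumption: the tagged tables span min_len..max_len with a constant ratio,
    each length rounded to the nearest integer.
    """
    if num_tagged < 2:
        raise ValueError("need at least two tagged tables to define a ratio")
    ratio = (max_len / min_len) ** (1.0 / (num_tagged - 1))
    tagged = [int(min_len * ratio ** i + 0.5) for i in range(num_tagged)]
    return [0] + tagged


print(geometric_history_lengths(2, 128, 7))  # [0, 2, 4, 8, 16, 32, 64, 128]
```

With a constant ratio, a handful of tables covers a wide range of history lengths, so a long maximum history costs only a few extra tables.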

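TAGE: tagged tables, prediction by the longest-history matching entry
[Figure: a tagless base predictor indexed with pc, plus tagged tables indexed with pc and h[0:L1], h[0:L2], h[0:L3]. Each tagged entry holds ctr, tag, u; a tag comparison (=?) per table feeds the selection of the final prediction.]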

[Figure: lookup example with per-table labels Miss, Hit, Hit; outputs Pred and Altpred.]

Prediction computation
• General case:
  • Longest matching component provides the prediction
• Special case:
  • Many mispredictions on newly allocated entries: weak Ctr
  • On many applications, Altpred is more accurate than Pred
  • Property dynamically monitored through 4-bit counters
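The selection between Pred and Altpred can be sketched as below. Several details are assumptions made for illustration only: the signed encoding of the 3-bit counter (taken when `ctr >= 0`, weak when `ctr` is 0 or -1), a single signed 4-bit monitor counter named `use_alt_on_na`, and the helper names.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Hit:
    ctr: int  # assumed signed 3-bit counter in [-4, 3]; ctr >= 0 predicts taken


def is_weak(ctr: int) -> bool:
    """A counter is weak when it sits next to the taken/not-taken boundary."""
    return ctr in (-1, 0)


def tage_predict(hits: list[Optional[Hit]], base_taken: bool, use_alt_on_na: int) -> bool:
    """hits[i] is the matching entry of tagged table i (shortest history first) or None."""
    matching = [h for h in hits if h is not None]
    if not matching:
        return base_taken                      # no tag match: tagless base predictor
    provider = matching[-1]                    # longest matching history
    pred = provider.ctr >= 0
    altpred = matching[-2].ctr >= 0 if len(matching) >= 2 else base_taken
    if is_weak(provider.ctr) and use_alt_on_na >= 0:
        return altpred                         # weak (likely new) entry: trust Altpred
    return pred


def update_use_alt(counter: int, pred: bool, altpred: bool, taken: bool) -> int:
    """Train the 4-bit monitor; call only when the provider counter was weak."""
    if pred == altpred:
        return counter                         # no information when both agree
    step = 1 if altpred == taken else -1
    return max(-8, min(7, counter + step))
```

For example, `tage_predict([Hit(-3), Hit(0)], True, 3)` returns the Altpred direction (not taken), because the provider counter is weak and the monitor currently favors Altpred.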

A tagged table entry: Tag | U | Ctr
• Ctr: 3-bit prediction counter
• U: 2-bit counter
  • Was the entry recently useful?
• Tag: partial tag
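As a rough storage illustration, the per-entry and per-table cost follows directly from the field widths above. The tag width and table size used in the example are hypothetical values, not figures from the slides.

```python
CTR_BITS = 3  # prediction counter width, from the slide
U_BITS = 2    # useful counter width, from the slide


def entry_bits(tag_bits: int) -> int:
    """Bits per tagged entry: partial tag + useful counter + prediction counter."""
    return tag_bits + U_BITS + CTR_BITS


def table_bits(log2_entries: int, tag_bits: int) -> int:
    """Total bits for one tagged table with 2**log2_entries entries."""
    return (1 << log2_entries) * entry_bits(tag_bits)


# Hypothetical example: 1024 entries with an 11-bit partial tag.
print(table_bits(10, 11))  # 16384
```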

Allocate entries on mispredictions
• Allocate entries in longer history length tables
  • On tables with U unset
  • Set Ctr to Weak and U to 0
• Limited storage budget:
  • Allocate 2 entries for 256Kbits
  • Allocate 1 or 2 for 32Kbits
• Unlimited storage budget:
  • Multiple entries allocated in different tables
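A simplified sketch of the allocation policy above. The `Entry` layout, the signed counter encoding (0 = weak taken, -1 = weak not-taken), and the strictly in-order scan of the longer tables are assumptions; an actual predictor may, for instance, randomize which table it starts from.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Entry:
    tag: int = 0
    ctr: int = 0  # assumed signed 3-bit counter: 0 = weak taken, -1 = weak not-taken
    u: int = 0    # useful counter; 0 means the entry may be replaced


def allocate(tables: list[list[Entry]], indices: list[int], tags: list[int],
             provider: int, taken: bool, max_alloc: int) -> list[int]:
    """On a misprediction, claim up to max_alloc entries in tables longer than the provider.

    tables are ordered by increasing history length; provider is the index of the
    providing table, or -1 when the base predictor provided the prediction.
    Returns the indices of the tables where an entry was allocated.
    """
    allocated: list[int] = []
    for t in range(provider + 1, len(tables)):
        entry = tables[t][indices[t]]
        if entry.u == 0:                       # only steal entries with U unset
            entry.tag = tags[t]
            entry.ctr = 0 if taken else -1     # weak counter in the observed direction
            entry.u = 0
            allocated.append(t)
            if len(allocated) == max_alloc:
                break
    return allocated
```

Setting `max_alloc` to 1 or 2 corresponds to the limited budgets on the slide; a large value mimics the unlimited case, where several tables receive an entry.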

Managing the (U)seful counter
• Increment when the entry avoids a misprediction
  • (Pred = taken) & (Alt ≠ taken), where « taken » is the actual branch outcome
• 256K: Global decrement if « difficult » to allocate
• 32K: Probabilistic decrement when conflict
• Unlimited: don't care
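One way to picture the useful-counter rules is the sketch below. The increment condition follows the slide; the `AllocationMonitor` class, its `tick` counter, and its `threshold` are hypothetical stand-ins for whatever makes allocation count as « difficult » in a real implementation.

```python
from __future__ import annotations

U_MAX = 3  # a 2-bit useful counter saturates at 3


def update_useful(u: int, pred: bool, altpred: bool, taken: bool) -> int:
    """Increment u only when the provider was right and Altpred was wrong."""
    if pred == taken and altpred != taken:
        return min(u + 1, U_MAX)
    return u


class AllocationMonitor:
    """Tracks how hard allocation has been and triggers a global decrement of U."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold  # hypothetical tuning parameter
        self.tick = 0

    def record(self, failures: int, successes: int, u_tables: list[list[int]]) -> bool:
        """Account for one allocation attempt; return True if all U counters were aged."""
        self.tick = max(0, self.tick + failures - successes)
        if self.tick < self.threshold:
            return False
        for table in u_tables:
            for i, u in enumerate(table):
                table[i] = max(0, u - 1)  # global decrement frees entries over time
        self.tick = 0
        return True
```

The global decrement ages every entry at once, so entries that stopped being useful eventually become candidates for replacement.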

Adjunct predictors
• TAGE tracks strong correlation with the global branch history
• Small adjunct predictors to capture some missed correlation:
  • Loop predictor
  • Statistical Corrector

The loop predictor
• Predicts loops with a constant number of iterations:
  • 16/32 entries
  • less than 5 bytes per entry
• Captures loops with long bodies and/or irregular internal branches
S: 1.2 %  M: 1 %  U: 0.4 %
Good tradeoff for the Championship
Implementation: not that great
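The idea of predicting loops with a constant iteration count can be sketched with a single entry. The field names, the confidence threshold `CONF_MAX`, and the convention that the loop branch is taken while iterating and not taken on exit are all assumptions for illustration; tags, ageing, and entry replacement are left out.

```python
from __future__ import annotations

from dataclasses import dataclass

CONF_MAX = 3  # hypothetical: consecutive confirmations needed before predicting


@dataclass
class LoopEntry:
    trip_count: int = 0    # taken outcomes seen before the last exit
    current_iter: int = 0  # taken outcomes seen so far in the current loop instance
    confidence: int = 0


def loop_predict(e: LoopEntry) -> tuple[bool, bool]:
    """Return (valid, taken); the prediction is used only when valid is True."""
    valid = e.confidence == CONF_MAX
    return valid, e.current_iter < e.trip_count


def loop_update(e: LoopEntry, taken: bool) -> None:
    if taken:
        e.current_iter += 1
        if e.current_iter > e.trip_count:
            e.confidence = 0               # loop ran longer than learned
    else:
        if e.current_iter == e.trip_count:
            e.confidence = min(e.confidence + 1, CONF_MAX)
        else:
            e.trip_count = e.current_iter  # learn the new trip count
            e.confidence = 0
        e.current_iter = 0                 # loop exit: start a new instance
```

After a few identical loop instances the entry becomes confident and predicts the exit iteration exactly, which a history-based predictor misses when the loop body is long or contains irregular branches.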

The Statistical Corrector predictor
• Branches with poor correlation with global history:
  • Sometimes better predicted by a single wide PC-indexed counter than by TAGE
• More generally, track cases such that:
  • « In this case (PC, history, prediction), TAGE is likely (>50 %) to mispredict »

Small predictor: very limited budget for the SC predictor
• Just track the statistically PC-biased branches
  • « TAGE predicts this direction on this branch, but in most cases this was wrong »
• The corrector filter:
  • A small partially tagged associative table
1.5 % misp. reduction: much simpler than a loop predictor

Medium predictor
• « Statistically » correlated branches:
  • Not strongly correlated with the global history, but exhibit a bias
  • better predicted by averaging than tags
  • neural  tags
• Branches correlated with local history,
  • but irregular global history pattern (on other branches)
  • TAGE does not learn the pattern

MultiGehl Statistical Correlator Predictor
[Figure: TAGE feeding the Stat. Corr. Labels: H, PC, Local hist., H + LH, PC, Pred, Gehl-like, Prediction + ctr value.]

Why does it work
• The bias table indexed with PC + TAGE output:
  • Correct (most of the time)
    • High counter value
    • Dominates, not many updates
  • Wrong
    • Other counters can be trained
    • Correlation (if it exists) can be captured
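The interplay between the bias table and the other counters can be sketched as a GEHL-like sum. The centering `2*ctr + 1`, the `threshold` parameter, the `bias_index` hash, and the function names are assumptions chosen for illustration, not the championship code.

```python
from __future__ import annotations


def bias_index(pc: int, tage_taken: bool, log_size: int) -> int:
    """Hypothetical index: PC bits concatenated with the TAGE output."""
    return ((pc << 1) | int(tage_taken)) & ((1 << log_size) - 1)


def sc_predict(tage_taken: bool, bias_ctr: int, other_ctrs: list[int], threshold: int) -> bool:
    """Sum centered signed counters; revert TAGE only on a confident disagreement."""
    total = (2 * bias_ctr + 1) + sum(2 * c + 1 for c in other_ctrs)
    sc_taken = total >= 0
    if sc_taken != tage_taken and abs(total) >= threshold:
        return sc_taken        # the corrector overrides TAGE
    return tage_taken          # otherwise keep the TAGE prediction
```

When TAGE is usually right for a given (PC, output) pair, the bias counter saturates in the TAGE direction and dominates the sum; when TAGE is often wrong, the bias counter hovers or flips, and the history-indexed counters decide.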

MultiGehl Statistical Correlator Predictor for the Championship
+ RAS associated history
+ 2 different local histories
+ simple chooser
6.8 % misp. reduction
[Figure: TAGE feeding the Stat. Corr. Labels: H, PC, Local hist., Prediction + ctr value.]

« Realistic » 256 Kbits TAGE-SC-L
• « Only »
  • 12 equal size TAGE tables +
  • (local hist., global hist.) 4-tables SC
  • + loop predictor
• No history tuning
Only 2.8 % extra mispredictions

SC for unlimited predictor
• GEHL based SC predictor:
  • Use any form of history information
    • Very long global
    • Multiple local
    • « Skeleton » global history: ignore some branches
• Recycle old ideas from the MAC-RHSP predictor (2004)

SC for unlimited predictor
• 460 predictor tables + 10 chooser tables
• Globally about 20 % fewer mispredictions than TAGE alone
• If one removes only:
  • The bias: 1.6 % for a single table
  • All global history components: 3.7 %
  • All local history components: 3.9 %
  • The chooser: 3.2 %

Conclusion
• TAGE-SC-L fits (nearly) all storage sizes
  • 32Kbits ≈ 64Kbits CBP1 champion on CBP1 traces
  • 256Kbits ≈ 512Kbits CBP3 champion on CBP4 traces
• Unlimited predictor:
  • poTAGE-SC does better
