
A 256 Kbits L-TAGE branch predictor



Presentation Transcript


  1. A 256 Kbits L-TAGE branch predictor André Seznec IRISA/INRIA/HIPEAC

  2. Directly derived from: “A case for (partially) tagged branch predictors”, A. Seznec and P. Michaud, JILP, Feb. 2006 + Tricks: loop predictor, kernel/user histories

  3. TAGE: TAgged GEometric history length predictors The genesis

  4. Back around 2003 • 2Bc-gskew was state-of-the-art, but was lagging behind neural-inspired predictors on a few benchmarks • Just wanted to get the best of both behaviors while maintaining a reasonable implementation cost: • Use only global history • Medium number of tables • In-time response

  5. The basis: a multiple length global history predictor • (Figure) Tables T0, T1, T2, T3, T4 indexed with global history lengths L(0), L(1), L(2), L(3), L(4); the open question (?) is how to combine their predictions

  6. GEometric History Length predictor • The set of history lengths forms a geometric series, e.g. {0, 2, 4, 8, 16, 32, 64, 128} • Captures correlation on very long histories while most of the storage serves short histories !! • What is important: L(i) - L(i-1) is drastically increasing
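A minimal sketch (in C++, with illustrative parameters rather than the submitted configuration) of how such a geometric series of history lengths can be generated from a chosen minimum length, maximum length and number of tagged components:

```cpp
// Sketch: generate L(1)..L(n) as a geometric series L(i) = L(1) * alpha^(i-1).
// n, lmin and lmax are illustrative; with these values the series is the
// slide's example {2, 4, 8, 16, 32, 64, 128}, and L(0) = 0 is the tagless base.
#include <cmath>
#include <cstdio>

int main() {
    const int    n    = 7;      // number of tagged components (illustrative)
    const double lmin = 2.0;    // L(1)
    const double lmax = 128.0;  // L(n)
    const double alpha = std::pow(lmax / lmin, 1.0 / (n - 1));
    for (int i = 1; i <= n; i++) {
        int L = (int)(lmin * std::pow(alpha, i - 1) + 0.5);
        std::printf("L(%d) = %d\n", i, L);
    }
    return 0;
}
```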

  7. Combining multiple predictions ? • Classical solution: use a meta-predictor: “wasting” storage !?! choosing among 5 or 10 predictions ?? • Neural-inspired predictors (Jimenez and Lin 2001): use an adder tree instead of a meta-predictor • Partial matching: use tagged tables and the longest matching history (Chen et al. 96, Michaud 2005)

  8. CBP-1 (2004): OGEHL • (Figure) Tables T0..T4, indexed with history lengths L(0)..L(4), feed an adder tree ∑ • Final computation through a sum: Prediction = Sign of the sum • 12 components, 3.670 misp/KI
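The adder-tree computation can be sketched as below; the table size, counter width and index hash are placeholders, only the sum-plus-sign rule follows the slide:

```cpp
// Sketch of the O-GEHL final computation: sum the signed counters read from
// every table and predict the sign of the sum.
#include <cstdint>
#include <vector>

constexpr int NUM_TABLES = 5;    // T0..T4 as in the figure (the CBP-1 entry used 12 components)
constexpr int LOG_SIZE   = 10;   // assumed table size: 2^10 counters per table

// Placeholder index function: hash of the PC with the L most recent history bits.
int table_index(uint64_t pc, uint64_t ghist, int histLen) {
    uint64_t h = (histLen >= 64) ? ghist : (ghist & ((1ULL << histLen) - 1));
    return (int)((pc ^ h ^ (h >> LOG_SIZE)) & ((1u << LOG_SIZE) - 1));
}

bool ogehl_predict(const std::vector<std::vector<int8_t>>& tables,
                   const int histLen[NUM_TABLES],
                   uint64_t pc, uint64_t ghist) {
    int sum = NUM_TABLES / 2;    // small bias, as in adder-tree predictors
    for (int i = 0; i < NUM_TABLES; i++)
        sum += tables[i][table_index(pc, ghist, histLen[i])];
    return sum >= 0;             // Prediction = Sign of the sum
}
```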

  9. TAGE: Geometric history length + PPM-like + optimized update policy • (Figure) A tagless base predictor plus tagged tables indexed with hash(pc, h[0:L1]), hash(pc, h[0:L2]), hash(pc, h[0:L3]), ...; each tagged entry holds a partial tag, a prediction counter ctr and a useful counter u, and tag comparisons (=?) select the prediction
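A hedged sketch of how the index and partial tag of one tagged component might be derived by folding the long history h[0:L] and mixing it with the PC; real implementations fold incrementally with circular shift registers, and the exact hash functions of the submission are not shown on the slide:

```cpp
// Sketch: fold a long global history down to the index and tag widths of a
// tagged component, then mix it with the PC. All hashes here are assumptions.
#include <cstdint>
#include <vector>

// Fold the 'histLen' most recent history bits (hist[0] = most recent) into 'width' bits.
uint32_t fold_history(const std::vector<bool>& hist, int histLen, int width) {
    uint32_t folded = 0, chunk = 0;
    for (int i = 0, pos = 0; i < histLen; i++) {
        chunk = (chunk << 1) | (hist[i] ? 1u : 0u);
        if (++pos == width || i == histLen - 1) { folded ^= chunk; chunk = 0; pos = 0; }
    }
    return folded & ((1u << width) - 1);
}

// Index of one tagged component: PC mixed with the folded history h[0:L].
uint32_t tage_index(uint64_t pc, const std::vector<bool>& hist, int histLen, int logSize) {
    return (uint32_t)(pc ^ (pc >> logSize) ^ fold_history(hist, histLen, logSize))
           & ((1u << logSize) - 1);
}

// Partial tag: a different fold of the same history, so index and tag conflicts differ.
uint32_t tage_tag(uint64_t pc, const std::vector<bool>& hist, int histLen, int tagWidth) {
    return (uint32_t)(pc ^ fold_history(hist, histLen, tagWidth)
                         ^ (fold_history(hist, histLen, tagWidth - 1) << 1))
           & ((1u << tagWidth) - 1);
}
```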

  10. (Figure) Prediction selection: each tagged component compares its stored tag against the computed tag (=?); the hitting component with the longest history provides Pred, the next hitting component (or the base predictor) provides Altpred, and missing components are ignored

  11. Prediction computation • General case: the longest matching component provides the prediction • Special case: many mispredictions occur on newly allocated entries (weak Ctr); on many applications, Altpred is more accurate than Pred • This property is dynamically monitored through a single 4-bit counter
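A minimal sketch of this selection rule; the "newly allocated" test (weak Ctr, U = 0) and the single 4-bit counter follow the slide, while the counter threshold and encoding are assumptions:

```cpp
// Sketch of the final prediction selection between Pred and Altpred.
#include <cstdint>

int use_alt_on_na = 8;   // single 4-bit counter (0..15); >= 8 means "Altpred usually better"

// ctr: 3-bit signed counter (-4..3) of the longest matching component,
// u: its 2-bit useful counter, hit: whether any tagged component matched.
bool tage_final_prediction(bool hit, int ctr, int u, bool altpred) {
    if (!hit) return altpred;                        // no tag match: Altpred is the base prediction
    bool pred = (ctr >= 0);
    bool newly_allocated = (ctr == 0 || ctr == -1) && (u == 0);  // weak Ctr, never useful yet
    if (newly_allocated && use_alt_on_na >= 8)
        return altpred;                              // dynamically prefer Altpred
    return pred;
}
```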

  12. TAGE update policy • General principle: Minimize the footprint of the prediction. • Just update the longest history matching component and allocate at most one entry on mispredictions

  13. A tagged table entry (U | Tag | Ctr) • Ctr: 3-bit prediction counter • U: 2-bit useful counter: was the entry recently useful ? • Tag: partial tag
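As an illustration, the entry can be written as C++ bitfields matching the field widths on this slide; the 11-bit tag used here is only an assumption, since the submission varies the tag width per table (see slide 18):

```cpp
// Sketch of one tagged entry; the tag width is illustrative.
#include <cstdint>

struct TaggedEntry {
    int8_t   ctr : 3;   // 3-bit signed prediction counter
    uint8_t  u   : 2;   // 2-bit useful counter ("recently useful?")
    uint16_t tag : 11;  // partial tag (width is per-table in practice, 7 to 15 bits)
};
```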

  14. Updating the U counter • If (Altpred ≠ Pred) then: • Pred = outcome (Pred correct): U = U + 1 • Pred ≠ outcome (Pred wrong): U = U - 1 • Graceful aging: • Periodic shift of all U counters, implemented through the reset of a single bit
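A sketch of the U update and of the graceful aging; the aging period and the alternation between the high and low bit are assumptions about how "reset of a single bit" is realized:

```cpp
// Sketch of the useful-counter update and periodic graceful aging.
#include <cstdint>
#include <vector>

// Only called when Altpred and Pred disagree: U tracks whether the provider was useful.
void update_useful(uint8_t& u, bool pred, bool altpred, bool outcome) {
    if (altpred != pred) {
        if (pred == outcome) { if (u < 3) u++; }   // provider prediction was correct
        else                 { if (u > 0) u--; }   // provider prediction was wrong
    }
}

// Assumed aging: every 256K updates, clear one bit of every U counter,
// alternating between the high and the low bit.
void graceful_aging(std::vector<uint8_t>& u_counters, uint64_t tick) {
    if (tick % (256 * 1024) != 0) return;
    uint8_t mask = ((tick / (256 * 1024)) & 1) ? 0x1 : 0x2;
    for (auto& u : u_counters) u &= (uint8_t)~mask;
}
```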

  15. Allocating a new entry on a misprediction • Find a single “useless” entry (U = 0) on a component with a longer history • Privilege the smallest possible history: to minimize footprint • But not too much: to avoid ping-pong phenomena • Initialize Ctr as weak and U as zero
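A sketch of this allocation rule; the random skip used here to implement "but not too much" only illustrates the idea and is not the submission's exact randomization:

```cpp
// Sketch of allocating at most one tagged entry after a misprediction.
#include <cstdlib>
#include <vector>

struct Entry { int ctr; unsigned u; unsigned tag; };

// tables[i] is the tagged component using history length L(i+1), longest last;
// 'provider' is the component that gave the wrong prediction (-1 = base predictor).
void allocate_on_mispredict(std::vector<std::vector<Entry>>& tables,
                            const std::vector<unsigned>& idx,   // per-table indices
                            const std::vector<unsigned>& tag,   // per-table tags
                            int provider, bool outcome) {
    bool skipped_one = false;
    for (size_t i = provider + 1; i < tables.size(); i++) {
        Entry& e = tables[i][idx[i]];
        if (e.u == 0) {                                   // a "useless" entry
            // Privilege the smallest history, but occasionally skip it
            // to avoid ping-pong between two components (assumed 1-in-4 skip).
            if (!skipped_one && (std::rand() & 3) == 0) { skipped_one = true; continue; }
            e.tag = tag[i];
            e.ctr = outcome ? 0 : -1;                     // weak counter, correct direction
            e.u   = 0;
            return;                                       // allocate at most one entry
        }
    }
}
```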

  16. Improve the global history • Address + conditional branch history: path confusion on short histories • Address + path: direct hashing leads to path confusion • Represent all branches in the branch history • Also use path history (1 bit per branch, limited to 16 bits)
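A sketch of the resulting history update: every branch contributes one bit to the global history and one address bit to a 16-bit path history; which address bit is taken and how non-conditional branches are encoded are assumptions here:

```cpp
// Sketch of the global history and path history update described above.
#include <cstdint>
#include <deque>

std::deque<bool> ghist;          // global branch history, most recent bit in front
uint16_t path_hist = 0;          // path history: 1 bit per branch, limited to 16 bits

void update_histories(uint64_t branch_pc, bool is_conditional, bool taken) {
    // Represent all branches: conditionals push their outcome,
    // other branches push a constant bit so they still appear in the history.
    ghist.push_front(is_conditional ? taken : true);
    if (ghist.size() > 640) ghist.pop_back();     // longest history length used

    // One address bit per branch (assumed bit 2 of the PC).
    path_hist = (uint16_t)((path_hist << 1) | ((branch_pc >> 2) & 1));
}
```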

  17. Design tradeoff for CBP2 (1) • 13 components: • Bring the best accuracy on the distributed traces • 8 components are not very far ! • History lengths: • Min = 4, Max = 640 • Could use any Min in [2, 6] and any Max in [300, 2000]

  18. Design tradeoff for CBP2 (2) • Tag width tradeoff: • (destructive) false match is better tolerated on shorter history • 7 bits on T1 to 15 bits on T12 • Tuning the number of table entries: • Smaller number for very long histories • Smaller number for very short histories

  19. Adding a loop predictor • The loop predictor captures the number of iterations of a loop • When the same iteration count has been observed 4 consecutive times, the loop predictor provides the prediction • Advantages: • Very reliable • Small storage budget: 256 52-bit entries • Complexity ? • Might be difficult to manage speculative iteration numbers on deep pipelines
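A sketch of one loop-predictor entry and of its prediction rule; the field widths are illustrative (the slide only gives the 52-bit total) and the replacement/age management is omitted:

```cpp
// Sketch of a loop predictor entry and its "4 identical trip counts" rule.
#include <cstdint>

struct LoopEntry {
    uint16_t past_iter;     // iteration count observed on previous executions of the loop
    uint16_t current_iter;  // iteration count of the current execution
    uint16_t tag;           // partial tag identifying the loop branch
    uint8_t  confidence;    // consecutive executions seen with the same trip count
};

// Returns true if the loop predictor provides the prediction, written to *pred.
bool loop_predict(const LoopEntry& e, bool* pred) {
    if (e.confidence < 4) return false;            // needs 4 identical trip counts in a row
    // Predict "taken" (keep looping) until the known trip count is reached.
    *pred = (e.current_iter + 1 != e.past_iter);
    return true;
}
```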

  20. Using a kernel history and a user history • Traces mix user and kernel activities: • Kernel activity after exception • Global history pollution • Solution: use two separate global histories • User history is updated only in user mode • Kernel history is updated in both modes
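A minimal sketch of the two-history scheme (64-bit registers stand in for the much longer real histories):

```cpp
// Sketch: separate user and kernel global histories, selected by privilege mode.
#include <cstdint>

uint64_t user_hist = 0, kernel_hist = 0;

void push_history(bool taken, bool in_kernel_mode) {
    kernel_hist = (kernel_hist << 1) | (taken ? 1 : 0);    // updated in both modes
    if (!in_kernel_mode)
        user_hist = (user_hist << 1) | (taken ? 1 : 0);    // updated only in user mode
}

uint64_t history_for_prediction(bool in_kernel_mode) {
    return in_kernel_mode ? kernel_hist : user_hist;
}
```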

  21. L-TAGE submission accuracy (distributed traces) 3.314 misp/KI

  22. Reducing L-TAGE complexity • The included 241.5 Kbits TAGE predictor alone: • 3.368 misp/KI • The loop predictor is beneficial only on gzip: might not be worth the extra complexity

  23. Using fewer tables • 8-component 256 Kbits TAGE predictor: • 3.446 misp/KI

  24. TAGE prediction computation time ? • 3 successive steps: • Index computation • Table read • Partial match + multiplexor • Does not fit on a single cycle: • But can be ahead pipelined !

  25. Ahead pipelining a global history branch predictor (principle) • Initiate branch prediction X+1 cycles in advance to provide the prediction in time • Use information available: • X-block ahead instruction address • X-block ahead history • To ensure accuracy: • Use intermediate path information

  26. Practice: ahead pipelined TAGE • (Figure) The prediction of branch bc is initiated from the ahead block A, using A's address and the history Ha available at A; 4 prediction computations are carried out in parallel and the intermediate path through blocks B and C selects among them
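A sketch of the ahead-pipelining idea with 2 intermediate blocks, hence 4 candidate predictions as on the slide; the way candidates are indexed is an assumption:

```cpp
// Sketch: start the prediction ahead of the branch, read several candidates,
// and let the intermediate-path bits pick one at the last moment.
#include <array>
#include <cstdint>

// Started X = 2 blocks ahead of the branch: reads 2^X candidate predictions
// using only the ahead address and ahead history (placeholder lookup).
std::array<bool, 4> start_ahead_prediction(uint64_t ahead_pc, uint64_t ahead_hist) {
    std::array<bool, 4> candidates{};
    for (unsigned p = 0; p < 4; p++)
        candidates[p] = ((ahead_pc ^ ahead_hist ^ p) & 1) != 0;  // placeholder table read
    return candidates;
}

// Completed when the branch is reached: the 2 intermediate-path bits
// (one per intervening block) select the final prediction.
bool finish_ahead_prediction(const std::array<bool, 4>& candidates,
                             unsigned intermediate_path_bits) {
    return candidates[intermediate_path_bits & 3];
}
```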

  27. 3-branch ahead pipelined 8 component 256 Kbits TAGE 3.552 misp/KI

  28. A final case for the Geometric History Length predictors • delivers state-of-the-art accuracy • uses only global information: • Very long history: 300+ bits !! • can be ahead pipelined • many effective design points • OGEHL or TAGE • Number of tables, history lengths

  29. The End
