
A 256 Kbits L-TAGE branch predictor



Presentation Transcript


  1. A 256 Kbits L-TAGE branch predictor André Seznec IRISA/INRIA/HIPEAC

  2. Directly derived from: “A case for (partially) tagged branch predictors”, A. Seznec and P. Michaud, JILP, Feb. 2006 + Tricks: loop predictor, kernel/user histories

  3. TAGE: TAgged GEometric history length predictors The genesis

  4. Back around 2003 • 2Bc-gskew was state-of-the-art, but was lagging behind neural-inspired predictors on a few benchmarks • Just wanted to get the best of both behaviors while maintaining a reasonable implementation cost: • Use only global history • Medium number of tables • In-time response

  5. The basis: a multiple length global history predictor • (Figure) Tables T0, T1, T2, T3, T4 indexed with global history lengths L(0), L(1), L(2), L(3), L(4); the open question (?) is how to combine their predictions

  6. GEometric History Length predictor • The set of history lengths forms a geometric series, e.g. {0, 2, 4, 8, 16, 32, 64, 128} • Captures correlation on very long histories while most of the storage serves short histories !! • What is important: L(i) - L(i-1) is drastically increasing
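A minimal sketch (in C++, with illustrative parameters rather than the submitted configuration) of how such a geometric series of history lengths can be generated from a chosen minimum length, maximum length and number of tagged components:

```cpp
// Sketch: generate L(1)..L(n) as a geometric series L(i) = L(1) * alpha^(i-1).
// n, lmin and lmax are illustrative; with these values the series is the
// slide's example {2, 4, 8, 16, 32, 64, 128}, and L(0) = 0 is the tagless base.
#include <cmath>
#include <cstdio>

int main() {
    const int    n    = 7;      // number of tagged components (illustrative)
    const double lmin = 2.0;    // L(1)
    const double lmax = 128.0;  // L(n)
    const double alpha = std::pow(lmax / lmin, 1.0 / (n - 1));
    for (int i = 1; i <= n; i++) {
        int L = (int)(lmin * std::pow(alpha, i - 1) + 0.5);
        std::printf("L(%d) = %d\n", i, L);
    }
    return 0;
}
```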

  7. Combining multiple predictions ? • Classical solution: use a meta-predictor: “wasting” storage !?! choosing among 5 or 10 predictions ?? • Neural-inspired predictors (Jimenez and Lin 2001): use an adder tree instead of a meta-predictor • Partial matching: use tagged tables and the longest matching history (Chen et al. 96, Michaud 2005)

  8. CBP-1 (2004): OGEHL • (Figure) Tables T0..T4, indexed with history lengths L(0)..L(4), feed an adder tree ∑ • Final computation through a sum: Prediction = Sign of the sum • 12 components, 3.670 misp/KI
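The adder-tree computation can be sketched as below; the table size, counter width and index hash are placeholders, only the sum-plus-sign rule follows the slide:

```cpp
// Sketch of the O-GEHL final computation: sum the signed counters read from
// every table and predict the sign of the sum.
#include <cstdint>
#include <vector>

constexpr int NUM_TABLES = 5;    // T0..T4 as in the figure (the CBP-1 entry used 12 components)
constexpr int LOG_SIZE   = 10;   // assumed table size: 2^10 counters per table

// Placeholder index function: hash of the PC with the L most recent history bits.
int table_index(uint64_t pc, uint64_t ghist, int histLen) {
    uint64_t h = (histLen >= 64) ? ghist : (ghist & ((1ULL << histLen) - 1));
    return (int)((pc ^ h ^ (h >> LOG_SIZE)) & ((1u << LOG_SIZE) - 1));
}

bool ogehl_predict(const std::vector<std::vector<int8_t>>& tables,
                   const int histLen[NUM_TABLES],
                   uint64_t pc, uint64_t ghist) {
    int sum = NUM_TABLES / 2;    // small bias, as in adder-tree predictors
    for (int i = 0; i < NUM_TABLES; i++)
        sum += tables[i][table_index(pc, ghist, histLen[i])];
    return sum >= 0;             // Prediction = Sign of the sum
}
```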

  9. TAGE: Geometric history length + PPM-like + optimized update policy • (Figure) A tagless base predictor plus tagged tables indexed with hash(pc, h[0:L1]), hash(pc, h[0:L2]), hash(pc, h[0:L3]), ...; each tagged entry holds a partial tag, a prediction counter ctr and a useful counter u, and tag comparisons (=?) select the prediction
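A hedged sketch of how the index and partial tag of one tagged component might be derived by folding the long history h[0:L] and mixing it with the PC; real implementations fold incrementally with circular shift registers, and the exact hash functions of the submission are not shown on the slide:

```cpp
// Sketch: fold a long global history down to the index and tag widths of a
// tagged component, then mix it with the PC. All hashes here are assumptions.
#include <cstdint>
#include <vector>

// Fold the 'histLen' most recent history bits (hist[0] = most recent) into 'width' bits.
uint32_t fold_history(const std::vector<bool>& hist, int histLen, int width) {
    uint32_t folded = 0, chunk = 0;
    for (int i = 0, pos = 0; i < histLen; i++) {
        chunk = (chunk << 1) | (hist[i] ? 1u : 0u);
        if (++pos == width || i == histLen - 1) { folded ^= chunk; chunk = 0; pos = 0; }
    }
    return folded & ((1u << width) - 1);
}

// Index of one tagged component: PC mixed with the folded history h[0:L].
uint32_t tage_index(uint64_t pc, const std::vector<bool>& hist, int histLen, int logSize) {
    return (uint32_t)(pc ^ (pc >> logSize) ^ fold_history(hist, histLen, logSize))
           & ((1u << logSize) - 1);
}

// Partial tag: a different fold of the same history, so index and tag conflicts differ.
uint32_t tage_tag(uint64_t pc, const std::vector<bool>& hist, int histLen, int tagWidth) {
    return (uint32_t)(pc ^ fold_history(hist, histLen, tagWidth)
                         ^ (fold_history(hist, histLen, tagWidth - 1) << 1))
           & ((1u << tagWidth) - 1);
}
```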

  10. (Figure) Prediction selection: each tagged component compares its stored tag against the computed tag (=?); the hitting component with the longest history provides Pred, the next hitting component (or the base predictor) provides Altpred, and missing components are ignored

  11. Prediction computation • General case: the longest matching component provides the prediction • Special case: many mispredictions occur on newly allocated entries (weak Ctr); on many applications, Altpred is more accurate than Pred • This property is dynamically monitored through a single 4-bit counter
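A minimal sketch of this selection rule; the "newly allocated" test (weak Ctr, U = 0) and the single 4-bit counter follow the slide, while the counter threshold and encoding are assumptions:

```cpp
// Sketch of the final prediction selection between Pred and Altpred.
#include <cstdint>

int use_alt_on_na = 8;   // single 4-bit counter (0..15); >= 8 means "Altpred usually better"

// ctr: 3-bit signed counter (-4..3) of the longest matching component,
// u: its 2-bit useful counter, hit: whether any tagged component matched.
bool tage_final_prediction(bool hit, int ctr, int u, bool altpred) {
    if (!hit) return altpred;                        // no tag match: Altpred is the base prediction
    bool pred = (ctr >= 0);
    bool newly_allocated = (ctr == 0 || ctr == -1) && (u == 0);  // weak Ctr, never useful yet
    if (newly_allocated && use_alt_on_na >= 8)
        return altpred;                              // dynamically prefer Altpred
    return pred;
}
```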

  12. TAGE update policy • General principle: Minimize the footprint of the prediction. • Just update the longest history matching component and allocate at most one entry on mispredictions

  13. A tagged table entry (U | Tag | Ctr) • Ctr: 3-bit prediction counter • U: 2-bit useful counter: was the entry recently useful ? • Tag: partial tag
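As an illustration, the entry can be written as C++ bitfields matching the field widths on this slide; the 11-bit tag used here is only an assumption, since the submission varies the tag width per table (see slide 18):

```cpp
// Sketch of one tagged entry; the tag width is illustrative.
#include <cstdint>

struct TaggedEntry {
    int8_t   ctr : 3;   // 3-bit signed prediction counter
    uint8_t  u   : 2;   // 2-bit useful counter ("recently useful?")
    uint16_t tag : 11;  // partial tag (width is per-table in practice, 7 to 15 bits)
};
```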

  14. Updating the U counter • If (Altpred ≠ Pred) then: • Pred = outcome (Pred correct): U = U + 1 • Pred ≠ outcome (Pred wrong): U = U - 1 • Graceful aging: • Periodic shift of all U counters, implemented through the reset of a single bit
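A sketch of the U update and of the graceful aging; the aging period and the alternation between the high and low bit are assumptions about how "reset of a single bit" is realized:

```cpp
// Sketch of the useful-counter update and periodic graceful aging.
#include <cstdint>
#include <vector>

// Only called when Altpred and Pred disagree: U tracks whether the provider was useful.
void update_useful(uint8_t& u, bool pred, bool altpred, bool outcome) {
    if (altpred != pred) {
        if (pred == outcome) { if (u < 3) u++; }   // provider prediction was correct
        else                 { if (u > 0) u--; }   // provider prediction was wrong
    }
}

// Assumed aging: every 256K updates, clear one bit of every U counter,
// alternating between the high and the low bit.
void graceful_aging(std::vector<uint8_t>& u_counters, uint64_t tick) {
    if (tick % (256 * 1024) != 0) return;
    uint8_t mask = ((tick / (256 * 1024)) & 1) ? 0x1 : 0x2;
    for (auto& u : u_counters) u &= (uint8_t)~mask;
}
```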

  15. Allocating a new entry on a misprediction • Find a single “useless” entry (U = 0) on a component with a longer history • Privilege the smallest possible history: to minimize footprint • But not too much: to avoid ping-pong phenomena • Initialize Ctr as weak and U as zero
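A sketch of this allocation rule; the random skip used here to implement "but not too much" only illustrates the idea and is not the submission's exact randomization:

```cpp
// Sketch of allocating at most one tagged entry after a misprediction.
#include <cstdlib>
#include <vector>

struct Entry { int ctr; unsigned u; unsigned tag; };

// tables[i] is the tagged component using history length L(i+1), longest last;
// 'provider' is the component that gave the wrong prediction (-1 = base predictor).
void allocate_on_mispredict(std::vector<std::vector<Entry>>& tables,
                            const std::vector<unsigned>& idx,   // per-table indices
                            const std::vector<unsigned>& tag,   // per-table tags
                            int provider, bool outcome) {
    bool skipped_one = false;
    for (size_t i = provider + 1; i < tables.size(); i++) {
        Entry& e = tables[i][idx[i]];
        if (e.u == 0) {                                   // a "useless" entry
            // Privilege the smallest history, but occasionally skip it
            // to avoid ping-pong between two components (assumed 1-in-4 skip).
            if (!skipped_one && (std::rand() & 3) == 0) { skipped_one = true; continue; }
            e.tag = tag[i];
            e.ctr = outcome ? 0 : -1;                     // weak counter, correct direction
            e.u   = 0;
            return;                                       // allocate at most one entry
        }
    }
}
```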

  16. Improve the global history • Address + conditional branch history: path confusion on short histories • Address + path: direct hashing leads to path confusion • Represent all branches in the branch history • Also use path history (1 bit per branch, limited to 16 bits)
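A sketch of the resulting history update: every branch contributes one bit to the global history and one address bit to a 16-bit path history; which address bit is taken and how non-conditional branches are encoded are assumptions here:

```cpp
// Sketch of the global history and path history update described above.
#include <cstdint>
#include <deque>

std::deque<bool> ghist;          // global branch history, most recent bit in front
uint16_t path_hist = 0;          // path history: 1 bit per branch, limited to 16 bits

void update_histories(uint64_t branch_pc, bool is_conditional, bool taken) {
    // Represent all branches: conditionals push their outcome,
    // other branches push a constant bit so they still appear in the history.
    ghist.push_front(is_conditional ? taken : true);
    if (ghist.size() > 640) ghist.pop_back();     // longest history length used

    // One address bit per branch (assumed bit 2 of the PC).
    path_hist = (uint16_t)((path_hist << 1) | ((branch_pc >> 2) & 1));
}
```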

  17. Design tradeoff for CBP2 (1) • 13 components: • Bring the best accuracy on the distributed traces • 8 components are not very far ! • History lengths: • Min = 4, Max = 640 • Could use any Min in [2, 6] and any Max in [300, 2000]

  18. Design tradeoff for CBP2 (2) • Tag width tradeoff: • (destructive) false match is better tolerated on shorter history • 7 bits on T1 to 15 bits on T12 • Tuning the number of table entries: • Smaller number for very long histories • Smaller number for very short histories

  19. Adding a loop predictor • The loop predictor captures the number of iterations of a loop • When the same iteration count has been observed 4 consecutive times, the loop predictor provides the prediction • Advantages: • Very reliable • Small storage budget: 256 52-bit entries • Complexity ? • Might be difficult to manage speculative iteration numbers on deep pipelines
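A sketch of one loop-predictor entry and of its prediction rule; the field widths are illustrative (the slide only gives the 52-bit total) and the replacement/age management is omitted:

```cpp
// Sketch of a loop predictor entry and its "4 identical trip counts" rule.
#include <cstdint>

struct LoopEntry {
    uint16_t past_iter;     // iteration count observed on previous executions of the loop
    uint16_t current_iter;  // iteration count of the current execution
    uint16_t tag;           // partial tag identifying the loop branch
    uint8_t  confidence;    // consecutive executions seen with the same trip count
};

// Returns true if the loop predictor provides the prediction, written to *pred.
bool loop_predict(const LoopEntry& e, bool* pred) {
    if (e.confidence < 4) return false;            // needs 4 identical trip counts in a row
    // Predict "taken" (keep looping) until the known trip count is reached.
    *pred = (e.current_iter + 1 != e.past_iter);
    return true;
}
```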

  20. Using a kernel history and a user history • Traces mix user and kernel activities: • Kernel activity after exception • Global history pollution • Solution: use two separate global histories • User history is updated only in user mode • Kernel history is updated in both modes
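A minimal sketch of the two-history scheme (64-bit registers stand in for the much longer real histories):

```cpp
// Sketch: separate user and kernel global histories, selected by privilege mode.
#include <cstdint>

uint64_t user_hist = 0, kernel_hist = 0;

void push_history(bool taken, bool in_kernel_mode) {
    kernel_hist = (kernel_hist << 1) | (taken ? 1 : 0);    // updated in both modes
    if (!in_kernel_mode)
        user_hist = (user_hist << 1) | (taken ? 1 : 0);    // updated only in user mode
}

uint64_t history_for_prediction(bool in_kernel_mode) {
    return in_kernel_mode ? kernel_hist : user_hist;
}
```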

  21. L-TAGE submission accuracy (distributed traces) 3.314 misp/KI

  22. Reducing L-TAGE complexity • The included 241.5 Kbits TAGE predictor alone: • 3.368 misp/KI • The loop predictor is beneficial only on gzip: might not be worth the extra complexity

  23. Using fewer tables • 8-component 256 Kbits TAGE predictor: • 3.446 misp/KI

  24. TAGE prediction computation time ? • 3 successive steps: • Index computation • Table read • Partial match + multiplexor • Does not fit on a single cycle: • But can be ahead pipelined !

  25. Ahead pipelining a global history branch predictor (principle) • Initiate branch prediction X+1 cycles in advance to provide the prediction in time • Use information available: • X-block ahead instruction address • X-block ahead history • To ensure accuracy: • Use intermediate path information

  26. Practice: ahead pipelined TAGE • (Figure) The prediction of branch bc is initiated from the ahead block A, using A's address and the history Ha available at A; 4 prediction computations are carried out in parallel and the intermediate path through blocks B and C selects among them
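A sketch of the ahead-pipelining idea with 2 intermediate blocks, hence 4 candidate predictions as on the slide; the way candidates are indexed is an assumption:

```cpp
// Sketch: start the prediction ahead of the branch, read several candidates,
// and let the intermediate-path bits pick one at the last moment.
#include <array>
#include <cstdint>

// Started X = 2 blocks ahead of the branch: reads 2^X candidate predictions
// using only the ahead address and ahead history (placeholder lookup).
std::array<bool, 4> start_ahead_prediction(uint64_t ahead_pc, uint64_t ahead_hist) {
    std::array<bool, 4> candidates{};
    for (unsigned p = 0; p < 4; p++)
        candidates[p] = ((ahead_pc ^ ahead_hist ^ p) & 1) != 0;  // placeholder table read
    return candidates;
}

// Completed when the branch is reached: the 2 intermediate-path bits
// (one per intervening block) select the final prediction.
bool finish_ahead_prediction(const std::array<bool, 4>& candidates,
                             unsigned intermediate_path_bits) {
    return candidates[intermediate_path_bits & 3];
}
```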

  27. 3-branch ahead pipelined 8 component 256 Kbits TAGE 3.552 misp/KI

  28. A final case for the Geometric History Length predictors • delivers state-of-the-art accuracy • uses only global information: • Very long history: 300+ bits !! • can be ahead pipelined • many effective design points • OGEHL or TAGE • Number of tables, history lengths

  29. The End
