1 / 32

Storage Free Confidence Estimator for the TAGE predictor

Storage Free Confidence Estimator for the TAGE predictor. André Seznec IRISA/INRIA. Why confidence estimation for branch predictors. Energy/performance tradeoffs: Guiding fetch gating or fetch throttling: Dynamic speculative structures resizing

Download Presentation

Storage Free Confidence Estimator for the TAGE predictor

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Storage Free Confidence Estimator for the TAGE predictor André Seznec IRISA/INRIA

  2. Why confidence estimation for branch predictors • Energy/performance tradeoffs: • Guiding fetch gating or fetch throttling: • Dynamic speculative structures resizing • Controlling SMT resource allocation through fetch policies • Fetch the “most” useful instructions • Dual Path execution

  3. What is confidence estimation ? • Assert a confidence to a prediction : • Is itlikelythat the predictionis correct ? • Generallydiscriminateonlylow and high confidence predictions: • High confidence: « very likely » to be correct • Low confidence: « not solikely » to be correct

  4. Confidence estimation for branch predictors • 1981, Jim Smith: • weakcounterspredictions are more likely to mispredict • 1996, Jacobsen, Rotenberg, Smith: Gshare-like 4-bit counters • Increment on correct prediction, reset on misprediction • low confidence < threshold ≤ high confidence • 1998 Enhanced JRS Grunwald et al: • Use the prediction in the index • A few otherproposals: • Self confidence for perceptrons .. Most studiesstill use enhanced JRS confidence estimators

  5. Metrics for confidence estimators(Grunwald et al 1998) • SENS Sensitivity: • Fraction of correct pred. classified as highconf. • PVP Predictive Value of a Positive test • Probability of highconf. to be correct • SPEC, Specificity: • Fraction of mispred. classified as lowconf. • PVN, Predictive Value of a Negative test • Probability of lowconf. to bemispredicted Differentqualities for different usages

  6. The current limits of confidence prediction • Discriminatingbetweenhigh and low confidence isunsufficient: • Whatis the misp. rate on high and low confidence ? • Malik et al: • Use probability for eachcounter value on an enhanced JRS • Enhanced JRS and state-of-the art branchpredictors ? • Eachpredictoritsown confidence estimator

  7. This study Cost-effective confidence estimatorfor TAGE • No storageoverhead • Discrimate: • Lowconf. pred. : ≈ 30 % misp. rate or more • Medium conf. pred.: 8-15% misp.rate • High conf. pred. : < 1 % misp rate

  8. TAGE: multiple tables, global history predictor The set of history lengths forms a geometric series Capture correlation on very long histories {0, 2, 4, 8, 16, 32, 64, 128} most of the storage for short history !! What is important:L(i)-L(i-1) is drastically increasing

  9. TAGEGeometric history length + PPM-like + optimized update policy h[0:L1] pc pc pc h[0:L2] pc h[0:L3] tag tag tag ctr ctr ctr u u u 1 1 1 1 1 1 1 =? =? =? 1 hash hash hash hash hash hash 1 prediction Tagless base predictor

  10. Miss Hit Pred =? =? 1 1 1 1 1 1 1 =? 1 Hit 1 Altpred

  11. Prediction computation • General case: • Longest matching component provides the prediction • Special case: • Many mispredictions on newly allocated entries: weak Ctr On many applications, Altpred more accuratethan Pred • Property dynamically monitored through a single 4-bit counter

  12. A tagged table entry U Tag Ctr • Ctr: 3-bit prediction counter • U: 2-bit useful counter • Was the entry recently useful ? • Tag: partial tag

  13. Updating the U counter • If (Altpred ≠ Pred) then • Pred = taken : U= U + 1 • Pred ≠ taken : U = U - 1 • Graceful aging: • Periodic shift of all U counters • implemented through the reset of a single bit

  14. Allocating a new entry on a misprediction • Find a single “useless” entry with a longer history: • Priviledge the smallest possible history • To minimize footprint • But not too much • To avoid ping-pong phenomena • Initialize Ctr as weak and U as zero

  15. Confidence by observation on TAGE • Apart the prediction, the predictordelivers: • The provider component and the value of the predictioncounter • High correlationwith the quality of the predictions • The history of mispredictionscanalsobeobserved • burst of mispredictionsmightindicatepredictorwarming or program phase changing

  16. Experimental framework • 20 traces from the CBP-1 and 20 traces from the CBP-2 • 16Kbits TAGE : 5 tables, max hist 80 bits • 64Kbits TAGE : 8 tables, max hist 130 bits • 256Kbits TAGE : 9 tables, max hist 300 bits • Probability of misprediction as a metric of confidence: • Misprediction Per Kilopredictions (MKP)

  17. Bimodal as the provider component • Providesmany (oftenmost) of the predictions: • Allocation of a tagged table entry happens on a misprediction • Generally bimodal prediction = the bias of the branch • 256Kbits TAGE, bimodal= veryaccurateprediction • Oftenlessthan 1 MKP, alwayssignificantlylowerthan the global misprediction rate • 16Kbits TAGE: • Often bimodal= veryaccurateprediction • On demandingapps: bimodal not betterthanaverage

  18. Discriminating the bimodal predictions • Weakcounters: • Systematically more than250 MKP (generally more than 300 MKP) • Can beclassified as low confidence • « Identify » conflicts due to limitedpredictor size: • Wasthere a mispredictionprovided by the bimodal recently (10 last branches) ? • ≈80-150 MKP for 16Kbits, ≈50-70 MKP for 64Kbits • Can beclassified as medium confidence • The remaining: • High confidence: <10 MKP, generallymuchless

  19. A tagged component as the provider • Discrimate on the values of the prediction counter

  20. Tagged component as provider: a more thorough analysis • Weak, NearlyWeak , NearlySaturated: • For all benchmarks, for the three TAGE configurations in the range of 200 MKP or higher • Saturated: • Slightlylowerthan the global misprediction rate of the applications • Veryhigh confidence for predictable applications (< 10 MKP) • Not thathigh confidence for poorlypredictable applications (> 50 MKP) Problem: Saturatedoftenrepresents more than 50 % of the predictions

  21. Intermediate summary • High confidence class: • (Bimodal saturated, no recentmisprediction by bimodal) • Low confidence class: • Bimodal weak and not saturatedtagged • Medium confidence class: • (Bimodal and recentmisprediction by bimodal) • Taggedsaturated: • Depends on applications, predictor size etc • Very large class ..

  22. Tweaking the predictor to improve confidence

  23. How to improve confidence on tagged counter saturated class • Widening the predictioncounter ? • Not that good: • Slightlydecreasedaccuracy • Only marginal improvement on accuracy on saturated class • Modifying the counter update: • Transition to saturated state with a verylowprobability • P=1/128 in ourexperiments • Marginal accuracyloss ( ≈ 0.02 MPKI)

  24. Towards 3 confidence classes • Tagged Saturated is high confidence • Nearly Saturated is enlarged and is medium confidence

  25. Towards 3 confidence classes • Low confidence: • Weak bimodal + Weaktagged + NearlyWeaktagged • Medium confidence: • Bimodal recentlymispredicted + NearlySaturatedtagged • High confidence: • Bimodal saturated + Saturatedtagged

  26. Prediction and misprediction coverage Mispredictionrate Prediction coverage Misprediction coverage

  27. Behavior examples, 64Kbits Mispredictionrate Prediction coverage Misprediction coverage

  28. Predictions Mispredictions low medium high

  29. Summary • Manystudies on applications of confidence estimations, but a very few on confidence estimators. • Eachpredictorrequires a different confidence estimator • A verycost-effective and efficient confidence estimator for TAGE • Storage free, verylimitedlogic • Discriminatebetween 3 confidence classes: • Medium + lowconf > 90 % of the mispredictions • High conf in the range of 1 % mispredictions or less

  30. The End 

More Related