
Two research studies related to branch prediction and instruction sequencing


Presentation Transcript


  1. Two research studies related to branch prediction and instruction sequencing André Seznec INRIA/IRISA

  2. Storage Free Confidence Estimator for the TAGE predictor

  3. Why confidence estimation for branch predictors • Energy/performance tradeoffs: • Guiding fetch gating or fetch throttling: • Dynamic speculative structures resizing • Controlling SMT resource allocation through fetch policies • Fetch the “most” useful instructions • Dual Path execution

  4. What is confidence estimation? • Assert a confidence to a prediction: • Is it likely that the prediction is correct? • Generally discriminate only low and high confidence predictions: • High confidence: « very likely » to be correct • Low confidence: « not so likely » to be correct

  5. Confidence estimation for branch predictors • 1981, Jim Smith: • weak counters' predictions are more likely to mispredict • 1996, Jacobsen, Rotenberg, Smith: Gshare-like 4-bit counters • Increment on correct prediction, reset on misprediction • low confidence < threshold ≤ high confidence • 1998, Enhanced JRS, Grunwald et al.: • Use the prediction in the index • A few other proposals: • Self confidence for perceptrons… Most studies still use enhanced JRS confidence estimators
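
For concreteness, here is a minimal C++ sketch of the JRS-style estimator described above. The table size, hash and threshold are illustrative choices, not the parameters of the original papers.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

class JRSConfidence {
public:
    explicit JRSConfidence(std::size_t logSize = 12, uint8_t threshold = 15)
        : table_(std::size_t(1) << logSize, 0),
          mask_((std::size_t(1) << logSize) - 1),
          threshold_(threshold) {}

    // Gshare-like index: PC XOR global branch history. The "enhanced JRS"
    // variant of Grunwald et al. would also fold the predicted direction in.
    std::size_t index(uint64_t pc, uint64_t ghist) const {
        return (pc ^ ghist) & mask_;
    }

    // High confidence when the 4-bit resetting counter reaches the threshold.
    bool highConfidence(uint64_t pc, uint64_t ghist) const {
        return table_[index(pc, ghist)] >= threshold_;
    }

    // Increment (saturating at 15) on a correct prediction, reset on a misprediction.
    void update(uint64_t pc, uint64_t ghist, bool correct) {
        uint8_t &ctr = table_[index(pc, ghist)];
        if (correct) { if (ctr < 15) ++ctr; }
        else ctr = 0;
    }

private:
    std::vector<uint8_t> table_;
    std::size_t mask_;
    uint8_t threshold_;
};
```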

  6. Metrics for confidence estimators (Grunwald et al. 1998) • SENS, Sensitivity: • Fraction of correct pred. classified as high conf. • PVP, Predictive Value of a Positive test: • Probability of a high conf. prediction to be correct • SPEC, Specificity: • Fraction of mispred. classified as low conf. • PVN, Predictive Value of a Negative test: • Probability of a low conf. prediction to be mispredicted. Different qualities for different usages
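
These four metrics follow directly from the counts of correct and incorrect predictions in each confidence class. A small sketch; the variable names are ours, only the definitions come from the slide.

```cpp
struct ConfidenceCounts {
    double correct_high;   // correct predictions classified high confidence
    double correct_low;    // correct predictions classified low confidence
    double mispred_high;   // mispredictions classified high confidence
    double mispred_low;    // mispredictions classified low confidence
};

// SENS: fraction of correct predictions classified as high confidence.
double sensitivity(const ConfidenceCounts &c) {
    return c.correct_high / (c.correct_high + c.correct_low);
}
// PVP: probability that a high-confidence prediction is correct.
double pvp(const ConfidenceCounts &c) {
    return c.correct_high / (c.correct_high + c.mispred_high);
}
// SPEC: fraction of mispredictions classified as low confidence.
double specificity(const ConfidenceCounts &c) {
    return c.mispred_low / (c.mispred_low + c.mispred_high);
}
// PVN: probability that a low-confidence prediction is mispredicted.
double pvn(const ConfidenceCounts &c) {
    return c.mispred_low / (c.mispred_low + c.correct_low);
}
```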

  7. The current limits of confidence prediction • Discriminating between high and low confidence is insufficient: • What is the misprediction rate on high and low confidence? • Malik et al.: • Use a probability for each counter value on an enhanced JRS • Enhanced JRS and state-of-the-art branch predictors? • Each predictor needs its own confidence estimator

  8. This study: a cost-effective confidence estimator for TAGE • No storage overhead • Discriminate: • Low conf. pred.: ≈ 30% misp. rate or more • Medium conf. pred.: 8-15% misp. rate • High conf. pred.: < 1% misp. rate

  9. TAGE: multiple tables, global history predictor. The set of history lengths forms a geometric series, e.g. {0, 2, 4, 8, 16, 32, 64, 128}: correlation is captured on very long histories while most of the storage goes to short histories. What is important: L(i) - L(i-1) is drastically increasing.
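
One common way to generate such a geometric series of history lengths is sketched below; the minimum length and growth factor are free parameters here, not the exact values of the evaluated configurations.

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// L(i) = L(1) * alpha^(i-1): each table's history length grows geometrically,
// so L(i) - L(i-1) increases drastically with i.
std::vector<int> geometricLengths(int numTables, double minLen, double alpha) {
    std::vector<int> L(numTables);
    for (int i = 0; i < numTables; ++i)
        L[i] = (int)std::lround(minLen * std::pow(alpha, i));
    return L;
}

int main() {
    // With minLen = 2 and alpha = 2 this reproduces the series on the slide:
    // 2, 4, 8, 16, 32, 64, 128 (plus the tagless table, which uses length 0).
    for (int len : geometricLengths(7, 2.0, 2.0)) std::printf("%d ", len);
    std::printf("\n");
    return 0;
}
```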

  10. TAGE = geometric history length + PPM-like + optimized update policy. [Diagram: a tagless base predictor plus several tagged tables, each indexed by a hash of the pc and a history slice h[0:L1], h[0:L2], h[0:L3]; each tagged entry holds a tag, a counter (ctr) and a useful field (u); tag comparisons (=?) select the final prediction.]

  11. [Diagram: prediction selection among the components; the longest-history matching (hitting) component provides Pred, the next matching component provides Altpred.]

  12. Prediction computation • General case: • The longest matching component provides the prediction • Special case: • Many mispredictions on newly allocated entries (weak Ctr); on many applications, Altpred is more accurate than Pred • This property is dynamically monitored through a single 4-bit counter
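
A hedged sketch of this selection rule; the structure and names such as useAltOnNA are ours, only the rule itself comes from the slide.

```cpp
struct TageLookup {
    bool providerFound;   // some tagged component matched
    int  providerCtr;     // signed prediction counter of the provider
    bool providerIsNew;   // provider entry was newly allocated (weak counter)
    bool altPred;         // prediction of the next matching component (or bimodal)
};

// Single 4-bit signed counter monitoring whether Altpred beats Pred on newly
// allocated entries; it is incremented/decremented at update time.
static int useAltOnNA = 0;

bool tagePrediction(const TageLookup &l, bool bimodalPred) {
    if (!l.providerFound)
        return bimodalPred;              // tagless base predictor provides the prediction
    bool pred = l.providerCtr >= 0;      // longest matching component
    // Special case: newly allocated entry with a weak counter; trust the
    // alternate prediction if the monitoring counter says it is better.
    if (l.providerIsNew && useAltOnNA >= 0)
        return l.altPred;
    return pred;
}
```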

  13. A tagged table entry (Ctr, U, Tag) • Ctr: 3-bit prediction counter • U: 2-bit useful counter (was the entry recently useful?) • Tag: partial tag
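
As a direct illustration, the entry can be written as a small bit-field struct; the tag width is table-dependent and the value here is an arbitrary assumption.

```cpp
#include <cstdint>

struct TageEntry {
    int8_t  ctr : 3;   // 3-bit (signed) prediction counter
    uint8_t u   : 2;   // 2-bit useful counter: was the entry recently useful?
    uint16_t tag;      // partial tag (width varies per table, e.g. 8-12 bits)
};
```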

  14. Confidence by observation on TAGE • Apart from the prediction, the predictor delivers: • The provider component and the value of the prediction counter • High correlation with the quality of the predictions • The history of mispredictions can also be observed: • A burst of mispredictions might indicate predictor warming or a program phase change

  15. Experimental framework • 20 traces from the CBP-1 and 20 traces from the CBP-2 • 16Kbits TAGE: 5 tables, max hist 80 bits • 64Kbits TAGE: 8 tables, max hist 130 bits • 256Kbits TAGE: 9 tables, max hist 300 bits • Probability of misprediction as a metric of confidence: • Mispredictions Per Kilopredictions (MKP)
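
The MKP metric itself is straightforward; a one-liner for reference (naming is ours).

```cpp
#include <cstdint>

// MKP: mispredictions per 1000 predictions of a given confidence class.
double mkp(uint64_t mispredictions, uint64_t predictions) {
    return 1000.0 * (double)mispredictions / (double)predictions;
}
```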

  16. Bimodal as the provider component • Provides many (often most) of the predictions: • Allocation of a tagged table entry happens on a misprediction • Generally, the bimodal prediction = the bias of the branch • 256Kbits TAGE: bimodal = very accurate prediction • Often less than 1 MKP, always significantly lower than the global misprediction rate • 16Kbits TAGE: • Often bimodal = very accurate prediction • On demanding apps: bimodal not better than average

  17. Discriminating the bimodal predictions • Weak counters: • Systematically more than 250 MKP (generally more than 300 MKP) • Can be classified as low confidence • « Identify » conflicts due to limited predictor size: • Was there a misprediction provided by the bimodal recently (10 last branches)? • ≈ 80-150 MKP for 16Kbits, ≈ 50-70 MKP for 64Kbits • Can be classified as medium confidence • The remaining: • High confidence: < 10 MKP, generally much less
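
A possible implementation of the "recent bimodal misprediction" filter is a 10-bit history of bimodal-provided mispredictions; the slide only states the 10-branch window, the rest of this sketch is an assumption.

```cpp
#include <cstdint>

class RecentBimodalMisp {
public:
    // Call once per predicted conditional branch, after resolution.
    void onBranch(bool providedByBimodal, bool mispredicted) {
        history_ <<= 1;
        if (providedByBimodal && mispredicted) history_ |= 1;
        history_ &= (1u << 10) - 1;          // keep only the last 10 branches
    }
    // True if the bimodal provided a misprediction among the last 10 branches.
    bool recentMisprediction() const { return history_ != 0; }
private:
    uint32_t history_ = 0;
};
```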

  18. A tagged component as the provider • Discriminate on the values of the prediction counter

  19. Tagged component as provider: a more thorough analysis • Weak, Nearly Weak, Nearly Saturated: • For all benchmarks, for the three TAGE configurations, in the range of 200 MKP or higher • Saturated: • Slightly lower than the global misprediction rate of the application • Very high confidence for predictable applications (< 10 MKP) • Not that high confidence for poorly predictable applications (> 50 MKP) • Problem: Saturated often represents more than 50% of the predictions

  20. Intermediate summary • High confidence class: • (Bimodal saturated, no recent misprediction by bimodal) • Low confidence class: • Bimodal weak and non-saturated tagged • Medium confidence class: • (Bimodal and recent misprediction by bimodal) • Tagged saturated: • Depends on applications, predictor size, etc. • Very large class…

  21. Tweaking the predictor to improve confidence

  22. How to improve confidence on the tagged saturated counter class • Widening the prediction counter? • Not that good: • Slightly decreased accuracy • Only marginal improvement of accuracy on the saturated class • Modifying the counter update: • Transition to the saturated state with a very low probability • P = 1/128 in our experiments • Marginal accuracy loss (≈ 0.02 MPKI)
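
A sketch of the probabilistic transition into the saturated state. The counter encoding and the PRNG are our choices; real hardware would typically use a small LFSR.

```cpp
#include <cstdint>
#include <cstdlib>

// 3-bit signed counter in [-4, 3]; the saturated states are -4 and 3.
// The counter only enters a saturated state with probability 1/128.
void updateCtrProbabilistic(int8_t &ctr, bool taken) {
    int next = taken ? ctr + 1 : ctr - 1;
    if (next > 3)  next = 3;
    if (next < -4) next = -4;
    bool enteringSaturation = (next == 3 || next == -4) && (ctr != 3 && ctr != -4);
    if (enteringSaturation && (std::rand() & 127) != 0)
        return;              // stay nearly saturated with probability 127/128
    ctr = (int8_t)next;
}
```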

  23. Towards 3 confidence classes • Tagged Saturated is high confidence • Nearly Saturated is enlarged and is medium confidence

  24. Towards 3 confidence classes • Low confidence: • Weak bimodal + Weak tagged + Nearly Weak tagged • Medium confidence: • Bimodal recently mispredicted + Nearly Saturated tagged • High confidence: • Bimodal saturated + Saturated tagged
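
Putting the pieces together, the storage-free classification of slide 24 can be summarized as follows; the enum and the input structure are our naming, the rules are those listed on the slide.

```cpp
enum class Confidence { Low, Medium, High };

struct ProviderInfo {
    bool fromBimodal;          // prediction provided by the tagless bimodal table
    bool ctrWeak;              // provider counter in a weak state
    bool ctrNearlyWeak;        // tagged provider: nearly weak
    bool ctrNearlySaturated;   // tagged provider: nearly saturated (enlarged class)
    bool bimodalRecentlyMisp;  // bimodal misprediction among the last 10 branches
};

Confidence classify(const ProviderInfo &p) {
    if (p.fromBimodal) {
        if (p.ctrWeak)             return Confidence::Low;
        if (p.bimodalRecentlyMisp) return Confidence::Medium;
        return Confidence::High;   // saturated bimodal, no recent misprediction
    }
    // Tagged component as provider.
    if (p.ctrWeak || p.ctrNearlyWeak) return Confidence::Low;
    if (p.ctrNearlySaturated)         return Confidence::Medium;
    return Confidence::High;          // saturated tagged counter
}
```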

  25. Prediction and misprediction coverage [Chart: misprediction rate, prediction coverage and misprediction coverage for each confidence class.]

  26. Behavior examples, 64Kbits [Chart: misprediction rate, prediction coverage and misprediction coverage on example benchmarks.]

  27. [Chart: breakdown of predictions and mispredictions among the low, medium and high confidence classes.]

  28. Summary on confidence estimation • Many studies on applications of confidence estimation, but very few on confidence estimators • Each predictor requires a different confidence estimator • A very cost-effective and efficient confidence estimator for TAGE: • Storage free, very limited logic • Discriminates between 3 confidence classes: • Medium + low conf.: > 90% of the mispredictions • High conf.: in the range of 1% mispredictions or less

  29. SYRANT (with Nathanael Prémillieu): « Moderate cost » control independence exploitation

  30. Why? • Branch pred. accuracy is reaching a plateau: • TAGE, 2006 • ? • Try something else…

  31. Control flow reconvergence [Diagram: a branch (if) splits the instruction flow into a taken path and a not-taken (else) path, which merge again at the reconvergence point.]

  32. Exploiting control flow reconvergence • Misprediction! • Can we save some useful work executed after the reconvergence point?

  33. [Diagram: the wrong-path Control Dependent (CD) instructions are to invalidate; the reconvergence point is to be detected; beyond it, Control Independent Data Independent (CIDI) instructions should be conserved, and Control Independent Data Dependent (CIDD) instructions are also identified.]

  34. Difficulties • Not the same renaming scheme on both paths: • How to conserve results ? • Identification of the reconvergence point: • Check against all previously fetched instructions on the wrong path ? • Identification of CIDI and CIDD instructions ?

  35. SYmmetric Resource Allocation on Not-taken and Taken paths (SYRANT) [Diagram: physical registers (and LSQ/ROB entries) P0-P8 allocated along the taken and not-taken paths; a gap of unused registers is inserted on the shorter path so that, after the reconvergence point, both paths reuse the same physical registers.]

  36. Register validity through a tagging process at the rename stage. At refetch, on a misprediction, increment the tag: X to Y. [Diagram: the instructions of the predicted path and of the corrected path with their tags (X1, X2, Y3, ...) and renamed registers, shown on either side of the reconvergence point.]

  37. Conserve tag and validity if: • same instruction • same operands, including tags [Diagram: predicted-path and corrected-path instructions compared entry by entry; matching entries keep their tag and validity.]

  42. Reconvergence detection • Precise detection would require checking every PC for each instruction • Use approximate detection • Detect the first branch after reconvergence

  43. Approximate detection of the reconvergence point [Diagram: an Active Branch List and a Shadow Branch List; each entry records a branch (B1-B7), a resource count (NbR) and its direction (T/NT).] The wrong path's list is copied into the Shadow Branch List on branch misprediction detection.

  44. [Diagram: the Active Branch List (ABL) of the corrected path next to the Shadow Branch List (SBL) of the wrong path.] Comparing the two lists allows monitoring the resource consumption on both paths.
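
A heavily hedged sketch of how the two lists might be used to detect the approximate reconvergence point and derive the resource gap. The data layout mirrors the columns shown on slides 43-44 (branch, NbR, direction); the matching logic is only our reading of the slides, not code from the paper.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

struct BranchRecord {
    uint64_t pc;     // branch address
    int      nbR;    // resources (registers / LSQ / ROB entries) allocated up to this branch
    bool     taken;  // direction (T / NT)
};

struct ReconvergenceMatch { int activeIdx, shadowIdx, gap; };

// On a misprediction, the wrong-path list is copied into the shadow list and a
// fresh active list is filled along the corrected path.
std::optional<ReconvergenceMatch>
findReconvergence(const std::vector<BranchRecord> &active,
                  const std::vector<BranchRecord> &shadow) {
    // Approximate detection: the first branch fetched on the corrected path
    // that also appears in the shadow (wrong-path) list marks reconvergence.
    for (std::size_t a = 0; a < active.size(); ++a)
        for (std::size_t s = 0; s < shadow.size(); ++s)
            if (active[a].pc == shadow[s].pc) {
                // The difference in resource counts gives the gap to insert so
                // that both paths line up on the same physical resources.
                return ReconvergenceMatch{(int)a, (int)s,
                                          shadow[s].nbR - active[a].nbR};
            }
    return std::nullopt;   // no reconvergence detected within the lists
}
```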

  45. [Diagram: taken and not-taken paths with branches B1, B2 and reconvergence points RP1, RP2; the wrong path (WP) is used to determine the gap, and the corrected path uses the gap.]

  46. Gap size issue • The two paths may be very different: • Waste of resources • Sometimes 100's of instructions • Different filters (see the sketch below): • Only try when the gap size is limited • Only try if the wrong path was the longest • Only try if branch confidence is low (or medium) • Only try if reconvergence point / gap confidence is high
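
A sketch of how these filters could be combined; the slide lists them as separate filters, so any subset might be used, and the thresholds and names below are our assumptions.

```cpp
struct GapDecisionInputs {
    int  gapSize;                       // resource gap between wrong and correct paths
    bool wrongPathWasLongest;
    bool branchConfidenceLowOrMedium;   // e.g. from a confidence estimator such as part 1's
    bool reconvergenceConfidenceHigh;   // confidence on the reconvergence point / gap
};

// Returns true when control-independence exploitation is attempted.
bool tryControlIndependence(const GapDecisionInputs &d, int maxGapSize) {
    return d.gapSize <= maxGapSize
        && d.wrongPathWasLongest
        && d.branchConfidenceLowOrMedium
        && d.reconvergenceConfidenceHigh;
}
```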

  47. Continue execution after branch misprediction resolution • On « normal » superscalar processors: • Kill every instruction after the misprediction • Control independence exploitation: • Let execution continue until resources are claimed back: phantom execution

  48. Preliminary performance evaluation • 8-way superscalar • Deep 20-stage pipeline • Very large instruction window • TAGE predictor • SPEC 2006

  49. Reconvergence is detected in most cases
