
Dynamic History-Length Fitting: A third level of adaptivity for branch prediction

Dynamic History-Length Fitting: A third level of adaptivity for branch prediction. ISCA '98. Toni Juan, Sanji Sanjeevan, Juan J. Navarro, Department of Computer Architecture, Universitat Politècnica de Catalunya. Presented by Danyao Wang, ECE1718, Fall 2008.





Presentation Transcript


  1. Dynamic History-Length Fitting: A third level of adaptivity for branch prediction. ISCA '98. Toni Juan, Sanji Sanjeevan, Juan J. Navarro, Department of Computer Architecture, Universitat Politècnica de Catalunya. Presented by Danyao Wang, ECE1718, Fall 2008

  2. Overview • Branch prediction background • Dynamic branch predictors • Dynamic history-length fitting (DHLF) • Without context switches • With context switches • Results • Conclusion

  3. Why branch prediction? • Superscalar processors with deep pipelines • Intel Core 2 Duo: 14 stages • AMD Athlon 64: 12 stages • Intel Pentium 4: 31 stages • Many cycles pass before a branch is resolved • Waiting wastes time… • Better to do some useful work speculatively… • Branch prediction!

  4. What does it do? Example: sub r1, r2, r3; bne r1, r0, L1; add r4, r5, r6; … ; L1: add r4, r7, r8; sub r9, r4, r2. [Pipeline diagram over time: the branch is fetched; predicted taken, so fetch continues from L1; the instructions at L1 execute speculatively; when the branch resolves, the prediction is validated: correct.]

  5. What happens when mispredicted? Same example: sub r1, r2, r3; bne r1, r0, L1; add r4, r5, r6; … ; L1: add r4, r7, r8; sub r9, r4, r2. [Pipeline diagram over time: the branch is fetched; predicted taken, so fetch continues from L1; the instructions at L1 execute speculatively; when the branch resolves, the prediction is found incorrect and the speculative instructions are squashed.]

  6. How to predict branches? • Statically at compile time • Simple hardware • Not accurate enough… • Dynamically at execution time • Hardware predictors, ordered from more complex to more accurate: • Last-outcome predictor • Saturation counter • Pattern predictor • Tournament predictor

  7. Last-Outcome Branch Predictor • Simplest dynamic branch predictor • Branch prediction table with 2^N 1-bit entries, indexed by the lower N bits of the PC • Intuition: history repeats itself • 1-bit prediction: T or NT • Read at fetch • Written on misprediction
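The table lookup on this slide can be sketched in a few lines of Python (a minimal illustration; the class and parameter names are hypothetical, not from the presentation):

```python
# Hypothetical sketch of a last-outcome predictor: a table of 2^N one-bit
# entries, indexed by the lower N bits of the branch PC.
class LastOutcomePredictor:
    def __init__(self, n_bits=10):
        self.mask = (1 << n_bits) - 1
        self.table = [0] * (1 << n_bits)  # 0 = not-taken, 1 = taken

    def predict(self, pc):
        # Read at fetch: predict whatever this branch (or an aliasing one) did last.
        return self.table[pc & self.mask] == 1

    def update(self, pc, taken):
        # Writing the true outcome every time has the same effect as
        # writing only on a misprediction.
        self.table[pc & self.mask] = 1 if taken else 0
```

Note that different branches whose PCs share the lower N bits alias to the same entry, which is one source of mispredictions.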

  8. Saturation Counter Predictor • Observation: branches are highly bimodal • n-bit saturating counter entries in the branch prediction table • Hysteresis: a strongly biased entry needs two mispredictions to flip • e.g. 2-bit bimodal predictor with states 00 (strong not-taken bias), 01 (weak not-taken), 10 (weak taken), 11 (strong taken); T outcomes move the counter up, N outcomes move it down
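The 00→01→10→11 state machine above can be sketched as follows (a minimal illustration with hypothetical names; the upper half of the counter range predicts taken):

```python
# Sketch of an n-bit saturating counter. For the 2-bit case, states 0-1
# predict not-taken and states 2-3 predict taken; the counter saturates
# at both ends, giving the hysteresis described on the slide.
class SaturatingCounter:
    def __init__(self, bits=2, init=1):
        self.max = (1 << bits) - 1
        self.value = init  # start weakly not-taken (state 01)

    def predict(self):
        return self.value >= (self.max + 1) // 2  # upper half = taken

    def update(self, taken):
        if taken:
            self.value = min(self.value + 1, self.max)
        else:
            self.value = max(self.value - 1, 0)
```

A single not-taken outcome moves a strongly taken entry (11) only to weakly taken (10), so the next prediction is still "taken"; this is what makes the predictor robust to occasional loop exits.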

  9. Pattern Predictors • Nearby branches often correlate • Looks for patterns in branch history • Branch History Register (BHR): m most recent branch outcomes • Two-level predictor: an N-bit index f(lower n bits of PC, m-bit BHR) selects one of 2^N saturating counters in the branch prediction table

  10. Tournament Predictor • No one-size-fits-all predictor • Dynamically choose among different predictors (A, B, C, …) via a chooser, or metapredictor, indexed by the PC
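A two-way chooser can be sketched as below (a minimal, hypothetical illustration: the slide allows any number of component predictors, and the chooser training rule shown here, update only on disagreement, is one common design, not necessarily the one in the talk):

```python
# Sketch of a tournament predictor: a table of 2-bit chooser counters selects
# between two component predictors, each exposing predict(pc) / update(pc, taken).
class TournamentPredictor:
    def __init__(self, pred_a, pred_b, n_bits=10):
        self.a, self.b = pred_a, pred_b
        self.mask = (1 << n_bits) - 1
        self.chooser = [1] * (1 << n_bits)  # 0-1 favour A, 2-3 favour B

    def predict(self, pc):
        use_b = self.chooser[pc & self.mask] >= 2
        return (self.b if use_b else self.a).predict(pc)

    def update(self, pc, taken):
        pa, pb = self.a.predict(pc), self.b.predict(pc)
        i = pc & self.mask
        if pa != pb:  # train the chooser only when the components disagree
            if pb == taken:
                self.chooser[i] = min(self.chooser[i] + 1, 3)
            else:
                self.chooser[i] = max(self.chooser[i] - 1, 0)
        self.a.update(pc, taken)
        self.b.update(pc, taken)
```

This is the "spatial multiplexing" the talk later contrasts with DHLF: all component predictors run in parallel, so the choice costs area rather than time.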

  11. What is the best predictor? [Chart comparing predictor misprediction rates against the optimal; lower is better]

  12. Observations • Predictor performance depends on history length • Optimal history length differs across programs • Predictors with a fixed history length underperform their potential • … why not a dynamic history length?

  13. Dynamic History-Length Fitting (DHLF)

  14. Intuition • Tournament predictor • Picks the best out of many predictors • Spatial multiplexing • Area cost … • DHLF: time multiplexing • Try different history lengths during execution • Adapt the history length to the code • Hope to find the best one

  15. Predetermined 2-Level Predictor Revisited • Index = f(PC, BHR): the lower n bits of the PC and the m-bit BHR form an n-bit index into a table of 2^n 2-bit saturating counters • gshare: f = xor, m < n • The history length m is what DHLF figures out dynamically

  16. DHLF Approach • State: current history length, best length so far, a misprediction counter, a branch counter, and a table of measured misprediction rates per history length (initialized to zero) • Sampling at fixed intervals (step size): • Try a new length and measure its misprediction rate (MR) • Adjust if worse than the best seen before • Move to a random length if the length has not changed for a while • Avoids local minima
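The sampling loop above can be sketched as follows. This is one possible interpretation of the mechanism, not the paper's exact algorithm: the interval length, the "stuck" threshold, and all names are hypothetical, and real hardware would implement this with small counters rather than Python lists:

```python
import random

# Hypothetical sketch of the DHLF control loop: every `step` dynamic branches,
# record the interval's misprediction count for the current history length,
# move toward the best length seen so far, and jump to a random length if the
# choice has been stuck for a while (to escape local minima).
class DHLFController:
    def __init__(self, max_len=16, step=16384, stuck_limit=8):
        self.step = step
        self.stuck_limit = stuck_limit
        self.max_len = max_len
        self.length = 0                        # current history length
        self.best = [0] * (max_len + 1)        # best mispredictions per length
        self.tried = [False] * (max_len + 1)
        self.branches = 0                      # branch counter for this interval
        self.mispred = 0                       # misprediction counter
        self.unchanged = 0                     # intervals without a length change

    def record(self, mispredicted):
        self.branches += 1
        self.mispred += int(mispredicted)
        if self.branches == self.step:
            self._end_interval()

    def _end_interval(self):
        cur = self.length
        if not self.tried[cur] or self.mispred < self.best[cur]:
            self.best[cur] = self.mispred
            self.tried[cur] = True
        # Move to the history length with the lowest misprediction count so far.
        tried = [i for i in range(self.max_len + 1) if self.tried[i]]
        best_len = min(tried, key=lambda i: self.best[i])
        if best_len != cur:
            self.length = best_len
            self.unchanged = 0
        else:
            self.unchanged += 1
            if self.unchanged >= self.stuck_limit:
                self.length = random.randrange(self.max_len + 1)  # random jump
                self.unchanged = 0
        self.branches = self.mispred = 0       # reset interval counters
```

A driving predictor would call `record()` once per conditional branch and read `controller.length` to decide how many BHR bits to feed into the index function.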

  17. DHLF Examples [Charts: history length chosen over time with a 12-bit index and step = 16K, compared against the optimal]

  18. Experimental Methodology • SPECint95 • gshare and dhlf-gshare • Trace-driven simulation • Simulated up to 200M conditional branches • Branch history register & pattern history table immediately updated with the true outcome

  19. DHLF Performance • Area overhead • Index length = 10, step size = 16K: overhead = 7% • Index length = 16, step size = 16K: overhead = 0.02% [Chart: misprediction rates; lower is better]

  20. Optimization Strategies • Step size • Small: learns faster, but must be big enough for meaningful misprediction statistics • Big: learns slower • Change the length incrementally • Test as many lengths as possible • Warm-up period • No misprediction counting for 1 interval after a length change

  21. Context Switches • Branch prediction table trashed periodically • Lower prediction accuracy immediately after a context switch • Context switch frequency affects optimal history length

  22. Impact on Misprediction Rate • Context-switch distance: # branches executed between context switches [Chart: gshare, index = 16 bits; lower is better]

  23. Coping with Context Switches • Upon a context switch • Discard the current misprediction counter • Save the current predictor data: misprediction table and current history length • Approx. 221 bits for a 16-bit index, step = 16K, 13-bit misprediction counters • On return from a context switch • Warm-up: no misprediction counting for 1 interval
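The ~221-bit figure on this slide is consistent with one misprediction-table entry per possible history length, which a quick Python check confirms (the breakdown into 17 entries is my inference from the numbers, not stated on the slide):

```python
# Sanity check of the per-process state size quoted on the slide: a 16-bit
# index permits history lengths 0 through 16, i.e. 17 table entries, each
# holding a 13-bit misprediction counter.
index_bits = 16
counter_bits = 13
entries = index_bits + 1              # history lengths 0..16
table_bits = entries * counter_bits   # 17 * 13 = 221 bits
print(table_bits)
```

The current history length itself needs only a few additional bits (5 bits cover values 0-16), which fits the slide's "approx." qualifier.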

  24. DHLF with Context Switches [Chart: dhlf-gshare with step value = 16K (×) vs. gshare with all possible history lengths; misprediction rate, lower is better. Branch prediction table flushed every 70K instructions to simulate context switches.]

  25. Contributions • Dynamically finds near-optimal history lengths • Performs well for programs with different branch behaviours • Performs well under context switches • Can be applied to any two-level branch predictor • Small area overhead

  26. Backup Slides

  27. DHLF Performance: SPECint95 [Charts: dhlf-gshare, step size = 16K, compared to all possible history lengths (no context switches); lower is better]

  28. DHLF with Context Switches [Charts: dhlf-gshare, step size = 16K, context-switch distance = 70K; lower is better]

  29. dhlf-gskew [Chart: step value = 16K, compared to all history lengths for gskew; lower is better]

  30. dhlf-gskew with Context Switches [Chart: step size = 16K, context-switch distance = 70K; lower is better]

  31. DHLF Structure [Flowchart of the DHLF data structure: an N-entry misprediction table, a pointer to the entry for the current history length, a pointer to the minimum misprediction count achieved, a branch counter, and a misprediction counter. Flow: start from the initial history length → run the next interval of `step` dynamic branches → is the current misprediction count greater than the minimum achieved? If yes, adjust the history length; either way, run the next interval.]

  32. Questions • Is fixed context switch distance realistic? • Does updating the PHT with true branch data immediately affect results? • Previous studies show little impact due to this
