
H-Pattern : A Hybrid Pattern Based Dynamic Branch Predictor with Performance Based Adaptation



Presentation Transcript


  1. H-Pattern: A Hybrid Pattern Based Dynamic Branch Predictor with Performance Based Adaptation. Indian Institute of Technology Madras, Department of Computer Science & Engineering

  2. Approach Conditional branch instructions often follow patterns which periodically repeat. If a branch instruction is found to follow a certain repeating pattern, a predictor must have the ability to accurately predict its outcome for as long as the pattern persists. Predicting ALL patterns with periods of ANY length: Impossible, given a fixed storage budget.

  3. Approach STRATEGY: Restrict ourselves to capturing patterns with a period only up to a certain predetermined length. Objective: Create a predictor that captures patterns with periods of up to n bits. Challenges: • Using minimum space • The patterns followed can change, so the predictor must dynamically relearn

  4. Solution For every branch: store a local history of 2n bits. If a branch instruction follows a pattern of execution with a period p, where p is at most n, then the most recent n bits must be identical to the n bits that occurred p executions earlier: outcome(h_i) = outcome(h_(i+p)), where h_i is the i-th most recent execution. To predict, compare the most recent n bits to successively older History Patterns (n-bit substrings of the local history) and stop at the first match. The bit just after this matching substring (that is, the next more recent bit) is the prediction for the next execution. (The picture on the next slide should clarify.)

  5. Illustration Here, with n = 8, we store a local history of 16 bits. The branch instruction follows the repeating pattern (110), which has a period of 3. The bit string h0 to h7 (Current Pattern) matches exactly the bit string h3 to h10 (Matched Current Pattern). The prediction returned is the bit just after the matched pattern: h2.
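
The matching step from the Solution and Illustration slides can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name `nbpat_predict` and the list-based history (index 0 holding the most recent outcome) are assumptions made for this example.

```python
def nbpat_predict(history, n):
    """Return (match_offset, prediction), or (None, None) if no pattern matches.

    history: list of 2n outcome bits, history[0] being the most recent (h0).
    n: maximum pattern period that can be captured.
    """
    assert len(history) == 2 * n
    current = history[:n]                      # h0 .. h(n-1)
    # Compare against successively older n-bit windows; stop at the first match.
    for i in range(1, n + 1):
        if history[i:i + n] == current:
            return i, history[i - 1]           # bit just after the matched window
    return None, None


# Example from the Illustration slide: pattern (110) repeating, n = 8.
time_order = [int(c) for c in "1101101101101101"]   # oldest outcome first
history = time_order[::-1]                           # h0 = most recent outcome
offset, prediction = nbpat_predict(history, 8)
print(offset, prediction)
```

With this history the first match is at offset 3 (h3 to h10), and the returned bit is h2, which agrees with the next outcome of the (110) pattern.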

  6. H-Pattern: nBPAT + AltPred nBPAT: n-Bit Pattern Predictor AltPred: Any other alternate branch predictor When no pattern is detected (i.e. no pattern match occurs), AltPred is used. When a pattern is detected, the better performing predictor is used.

  7. The nBPAT Predictor Every entry of the predictor is comprised of: • A 2n-bit shift register for local history • A saturating counter to keep track of the better performing predictor (as described in ‘Combining Branch Predictors’ by Scott McFarling) Storage: Various configurations possible – tagged/tagless/direct mapped/associative

  8. The nBPAT Algorithm To Predict: • Match the current pattern (h_0 to h_(n-1)) with successively older history patterns • If the first match is found at h_i, then h_(i-1) is the predicted outcome. If the most significant bit of the saturating selection counter is 1, return h_(i-1). • If there is no match, or if the most significant bit is 0, use AltPred To Update: • If AltPred mispredicted and nBPAT correctly predicted, increment the saturating selection counter. • If AltPred correctly predicted and nBPAT mispredicted, decrement the saturating selection counter. • If nBPAT was not ready, don’t change the saturating counter • Update the local history by inserting the outcome of the branch into the local history shift register
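
The predict and update steps above could look roughly like the following Python sketch of a single table entry. The class name `NBPATEntry`, the 2-bit default counter width, the initial counter value of 0, and the reading of "not ready" as "no pattern match" are assumptions for illustration; the slides do not fix these details.

```python
class NBPATEntry:
    """One nBPAT entry: a 2n-bit local history plus a saturating selection counter."""

    def __init__(self, n, counter_bits=2):
        self.n = n
        self.history = [0] * (2 * n)        # history[0] is the most recent outcome
        self.counter_bits = counter_bits     # assumed width, not given on the slide
        self.counter_max = (1 << counter_bits) - 1
        self.counter = 0                     # assumed start: favour AltPred

    def pattern_prediction(self):
        """Return the pattern-based prediction, or None when no window matches."""
        current = self.history[:self.n]
        for i in range(1, self.n + 1):
            if self.history[i:i + self.n] == current:
                return self.history[i - 1]
        return None

    def predict(self, alt_prediction):
        """Use nBPAT only if a pattern matched and the counter's MSB is 1."""
        pat = self.pattern_prediction()
        msb_set = (self.counter >> (self.counter_bits - 1)) == 1
        if pat is not None and msb_set:
            return pat
        return alt_prediction

    def update(self, outcome, alt_prediction):
        """Adjust the selection counter, then shift the outcome into the history."""
        pat = self.pattern_prediction()
        if pat is not None:                  # nBPAT was "ready"
            if pat == outcome and alt_prediction != outcome:
                self.counter = min(self.counter + 1, self.counter_max)
            elif pat != outcome and alt_prediction == outcome:
                self.counter = max(self.counter - 1, 0)
        self.history = [outcome] + self.history[:-1]
```

One artifact of this sketch: a freshly zeroed history trivially matches itself, so a real design would likely need some warm-up or validity handling that the slides do not describe.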

  9. Combinations of H-Pattern H-Pattern: Various configuration decisions AltPred Component: Several possible options, for instance: • Gshare • TAGE • ISL-TAGE nBPAT Storage Structure: • Tagged/Tagless • Associative/Direct Mapped

  10. H-Pattern with Gshare Configuration: • Tagless, direct-mapped table used for nBPAT, indexed by a few of the least significant bits of the PC • 50% of the storage budget assigned to nBPAT Outcome: Distinct improvement in accuracy observed, as will be shown soon.
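
As a rough illustration of this configuration, the sketch below sizes a tagless, direct-mapped nBPAT table from half of a storage budget and indexes it with low PC bits. The function names, the 2-bit selection counter, and the rounding down to a power of two are assumptions made for this example; the slides give only the 50% split and the indexing idea.

```python
def nbpat_table_entries(budget_bits, n, counter_bits=2, share=0.5):
    """Largest power-of-two entry count that fits in the nBPAT share of the budget."""
    entry_bits = 2 * n + counter_bits            # local history + selection counter
    max_entries = int(budget_bits * share) // entry_bits
    entries = 1
    while entries * 2 <= max_entries:
        entries *= 2
    return entries


def nbpat_index(pc, entries):
    """Index a direct-mapped table using the least significant bits of the PC."""
    return pc & (entries - 1)                    # entries must be a power of two


entries = nbpat_table_entries(budget_bits=32 * 1024 * 8, n=8)
print(entries, nbpat_index(0x400ABC, entries))
```

A real design might drop the lowest PC bits first if instructions are aligned; the slides do not say which bits were used.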

  11. H-Pattern with Gshare

  12. H-Pattern with TAGE/ISL-TAGE Only a minimal portion of storage is allocated to nBPAT. The storage structure must let nBPAT reach maximum accuracy in a very small storage space. The proportion of the storage budget allocated to nBPAT differed between budgets. The improvement in accuracy was smaller than that achieved with Gshare.

  13. H-Pattern with TAGE/ISL-TAGE CONFIGURATION: nBPAT STORAGE Partially tagged, 2-way set-associative. Selection Counter: 4-bits Useful Counter: Included in every entry. Serves as a measure of the effectiveness of an entry in the table. Decremented if: 1. No pattern match found 2. Misprediction by nBPAT & correct prediction by AltPred Incremented if misprediction by AltPred and correct prediction by nBPAT. All useful counters are reset periodically using a global reset counter. This correctly captures the notion of an entry in the table being effective or ineffective, and aids in the entry replacement policy.
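
The useful-counter rules above can be written down as a small Python sketch. The class names, the 3-bit default width, the reset-to-zero behaviour, and the reset period are assumptions for illustration; the slide states only when the counter moves and that a global counter triggers periodic resets.

```python
class UsefulCounter:
    """Saturating per-entry counter that tracks how effective an nBPAT entry is."""

    def __init__(self, bits=3):
        self.max_value = (1 << bits) - 1
        self.value = 0

    def update(self, pattern_matched, nbpat_correct, alt_correct):
        if not pattern_matched or (not nbpat_correct and alt_correct):
            # No match, or nBPAT wrong while AltPred was right: entry is less useful.
            self.value = max(self.value - 1, 0)
        elif nbpat_correct and not alt_correct:
            # nBPAT right while AltPred was wrong: entry is more useful.
            self.value = min(self.value + 1, self.max_value)


class GlobalReset:
    """Global counter that clears all useful counters every `period` ticks (assumed)."""

    def __init__(self, period):
        self.period = period
        self.count = 0

    def tick(self, counters):
        self.count += 1
        if self.count >= self.period:
            for counter in counters:
                counter.value = 0
            self.count = 0
```

An entry whose counter has fallen to zero is then a natural candidate for replacement.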

  14. H-Pattern with TAGE/ISL-TAGE UPDATE ALGORITHM: • If the TAGE predictor MISPREDICTED, there is no tag match in the nBPAT 2-way associative table, and either of the 2 potential entry locations has Useful = 0, then set Tag = [BranchTag] and Useful = [Maximum] for that location. • If the entry ALREADY exists in the nBPAT 2-way associative table, then: • If nBPAT was not ready, OR nBPAT mispredicted and TAGE correctly predicted, decrease Useful. • If nBPAT correctly predicted and TAGE mispredicted, increase Useful. • Update the nBPAT entry as described earlier in the nBPAT algorithm. • Update the TAGE/ISL-TAGE predictor.
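
A minimal Python sketch of the allocation and Useful bookkeeping for one 2-way set follows. The names `Way`, `update_set`, and `USEFUL_MAX` are hypothetical, the 3-bit Useful width is taken from the 4KB configuration, and the per-entry history and selection counter updates are left out.

```python
from dataclasses import dataclass
from typing import List, Optional

USEFUL_MAX = 7  # assumed 3-bit useful counter, as in the 4KB configuration


@dataclass
class Way:
    tag: Optional[int] = None   # partial branch tag; None means never allocated
    useful: int = 0


def update_set(ways: List[Way], branch_tag: int, tage_correct: bool,
               nbpat_ready: bool, nbpat_correct: bool) -> Optional[Way]:
    """Apply the update rules to one 2-way set and return the affected way, if any."""
    entry = next((w for w in ways if w.tag == branch_tag), None)
    if entry is None:
        # Allocate only when TAGE mispredicted and some way has Useful == 0.
        if not tage_correct:
            for way in ways:
                if way.useful == 0:
                    way.tag = branch_tag
                    way.useful = USEFUL_MAX
                    return way
        return None
    if not nbpat_ready or (not nbpat_correct and tage_correct):
        entry.useful = max(entry.useful - 1, 0)
    elif nbpat_correct and not tage_correct:
        entry.useful = min(entry.useful + 1, USEFUL_MAX)
    return entry
```

Starting a new entry at the maximum Useful value gives it time to warm up before it can be evicted, which appears to be the intent of the allocation rule.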

  15. Reference TAGE Configurations The optimized configuration for an 8-table TAGE predictor, as specified in the paper “A case for (partially) Tagged Geometric history length branch prediction”, by André Seznec and Pierre Michaud, was used. • 4KB: History Lengths = 5 to 127 • 32KB: History Lengths = 5 to 450 Whereas for the unlimited case, 18 tagged tables were used. History Lengths = 3 to 2000

  16. H-Pattern with TAGE Configurations • 4KB: Tag length was reduced by 1 in every alternate table starting from T2. 4-BPAT predictor used with 7-bit tagged entries & 3-bit useful counters. • 32KB: Table T6 of TAGE has been halved in size. 8-BPAT predictor used with 8-bit tagged entries & 4-bit useful counter. • Unlimited: 8-BPAT predictor used with 16-bit tagged width.

  17. H-Pattern with TAGE Mispredictions per Kilo Instructions – CBP 2014 Framework

  18. Reference ISL-TAGE Configurations • 4KB: Configuration was the same as the 8-component predictor specified in the paper “A case for (partially) Tagged Geometric history length branch prediction”, by André Seznec and Pierre Michaud, with space freed from the base bimodal predictor by having only 2K prediction entries and 1K hysteresis entries to accommodate the statistical corrector and loop predictor. History lengths = 5 to 126. • 32KB: Configuration (including history lengths) was identical to the one specified in the paper “A 64 Kbytes ISL-TAGE branch predictor”, by André Seznec, with all storage tables halved. • Unlimited: 18 tagged tables were used. History Lengths = 3 to 2000

  19. H-Pattern with ISL-TAGE Configurations • 4KB: From the reference 4KB ISL-TAGE, freed one tag bit from every alternate table starting from T2. 4-BPAT predictor used with 7-bit tagged entries & 3-bit useful counters. • 32KB: From the reference 32KB ISL-TAGE, halved the last shared table and reduced the size of the statistical corrector and loop predictor. 4-BPAT predictor used with 6-bit tagged entries & 3-bit useful counters. • Unlimited: In combination with the reference Unlimited ISL-TAGE predictor, an 8-BPAT predictor was used with 16-bit tagged entries & 4-bit useful counters.

  20. H-Pattern with ISL-TAGE Mispredictions per Kilo Instructions – CBP 2014 Framework

  21. Further Statistics: Success rates

  22. Thank You
