
Merging Path, Global and Local Indexing in Perceptron Branch Prediction

Merging Path, Global and Local Indexing in Perceptron Branch Prediction. David Tarjan. Published in: An Ahead Pipelined Alloyed Perceptron with Single Cycle Access Time. D. Tarjan, K. Skadron and M. Stan. Workshop on Complexity Effective Design (WCED), June 2004.

Presentation Transcript


  1. Merging Path, Global and Local Indexing in Perceptron Branch Prediction. David Tarjan

  2. Published in:
     • An Ahead Pipelined Alloyed Perceptron with Single Cycle Access Time. D. Tarjan, K. Skadron and M. Stan. Workshop on Complexity Effective Design (WCED), June 2004
     • Merging path and gshare indexing in perceptron branch prediction. D. Tarjan and K. Skadron. ACM Transactions on Architecture and Code Optimization, 2(3), Sep. 2005

  3. Why Yet Another Branch Predictor?
     • Single-thread performance growth stalling
     • Pipeline length still slowly increasing
     • Buffer sizes also increasing
     • No more “free” clock scaling
     • Power budget goes to more cores
     -> If we want more single-thread performance, we have to go for efficiency!

  4. Outline
     • What is a perceptron?
     • Ahead-pipelining
     • Precomputing local sums
     • Results for ahead pipelined alloyed perceptron
     • Hashed Indexing
     • Results of a hashed perceptron
     • Conclusion

  5. Main Contributions
     • Reduced latency of perceptron predictors to one cycle
     • Showed how to reduce the number of weights/adders by a factor of N (for N = 6 to 12) for a given history length
     • Reduced mispredictions by up to 27.2% over path-based perceptron

  6. Global Perceptron
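
The slide itself is only a figure. As a reminder of what a global perceptron predictor computes, here is a minimal Python sketch. It is not taken from the talk: the class name, the table size, the modulo row selection and the threshold formula (the commonly used floor(1.93 * h + 14)) are assumptions for illustration, and weight saturation is left out.

```python
class GlobalPerceptron:
    """Illustrative global-history perceptron predictor (not the authors' code)."""

    def __init__(self, history_len=8, num_rows=64):
        self.num_rows = num_rows
        # Assumed training threshold; a common choice in the literature.
        self.theta = int(1.93 * history_len + 14)
        # One row of weights per table entry; weights[row][0] is the bias weight.
        self.weights = [[0] * (history_len + 1) for _ in range(num_rows)]
        # Global history, most recent outcome first: +1 taken, -1 not taken.
        self.history = [-1] * history_len

    def _output(self, pc):
        w = self.weights[pc % self.num_rows]
        return w[0] + sum(wi * xi for wi, xi in zip(w[1:], self.history))

    def predict(self, pc):
        # Predict taken when the dot product is non-negative.
        return self._output(pc) >= 0

    def update(self, pc, taken):
        y = self._output(pc)
        t = 1 if taken else -1
        # Train on a misprediction or when the output is not confident enough.
        if (y >= 0) != taken or abs(y) <= self.theta:
            w = self.weights[pc % self.num_rows]
            w[0] += t
            for i, xi in enumerate(self.history, start=1):
                w[i] += t * xi
        # Shift the new outcome into the global history.
        self.history = [t] + self.history[:-1]
```

Every history bit needs its own weight and its own adder input here, which is the cost that the later slides on hashed indexing try to reduce.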

  7. Path-based Perceptron

  8. Main Problems in Branch Prediction:
     • Accuracy (larger tables, more logic)
     • Latency (smaller tables, less logic)
     • Multiple Branch/Trace/Stream/etc. per cycle
     Accuracy and latency are a tradeoff! This work addresses these two points.

  9. Normal pipelined perceptron

  10. Ahead pipelined perceptron: p_addr(x) = addr(x) + direction(x)

  11. Impact of ahead-pipelining on accuracy

  12. Precomputing a local history perceptron

  13. Impact of adding local history

  14. Hashed Perceptron: Motivation
     • Want longer history for accuracy
     • But that means more adders
     • Also need more bits per weight for very long history
     • With ahead-pipelining we effectively already have two bits of history for each weight…
     • But more ahead-pipelining means fewer address bits to select a weight…
     We have been here before!

  15. We want gshare for perceptrons!

  16. Hashed Perceptron

  17. Benefits?
     • We can reduce the number of tables and adders by a factor of n, where n is the number of history bits per table
     • We can accurately predict linearly inseparable branches (e.g. two branches whose outcomes form an XOR pattern)
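
To make the two benefits above concrete, here is a small Python sketch of gshare-style hashed indexing. It is a simplification under stated assumptions, not the authors' design: the class name, table sizes, threshold and the plain `pc XOR history-segment` hash are all hypothetical, path information is not merged in, and weights do not saturate. Each table covers n history bits and contributes one weight, so a history of length h needs h/n tables and adder inputs instead of h. Because one weight is selected by a whole n-bit pattern, a branch whose outcome is the XOR of two earlier branches becomes learnable, whereas a per-bit perceptron cannot represent XOR with any choice of bias and two weights.

```python
import random


class HashedPerceptron:
    """Illustrative hashed perceptron: one weight per table, chosen by pc XOR history bits."""

    def __init__(self, num_tables=2, bits_per_table=2, rows=256, theta=10):
        self.num_tables = num_tables
        self.n = bits_per_table
        self.rows = rows
        self.theta = theta  # assumed training threshold
        self.tables = [[0] * rows for _ in range(num_tables)]
        self.history = 0  # global history as a bit vector, newest outcome in bit 0
        self.hist_mask = (1 << (num_tables * bits_per_table)) - 1

    def _indices(self, pc):
        # Table k is indexed by the pc XORed with history bits [k*n, (k+1)*n).
        seg_mask = (1 << self.n) - 1
        return [(pc ^ ((self.history >> (k * self.n)) & seg_mask)) % self.rows
                for k in range(self.num_tables)]

    def _output(self, pc):
        return sum(t[i] for t, i in zip(self.tables, self._indices(pc)))

    def predict(self, pc):
        return self._output(pc) >= 0

    def update(self, pc, taken):
        y = self._output(pc)
        # Perceptron-style training: on a misprediction or a low-confidence output.
        if (y >= 0) != taken or abs(y) <= self.theta:
            step = 1 if taken else -1
            for t, i in zip(self.tables, self._indices(pc)):
                t[i] += step
        self.history = ((self.history << 1) | int(taken)) & self.hist_mask


def xor_demo(iterations=20000, seed=1):
    """Branches A and B are random; branch C is taken iff A != B (an XOR pattern).

    Returns the number of mispredictions of C over the last 1000 iterations.
    """
    rng = random.Random(seed)
    bp = HashedPerceptron()
    late_errors = 0
    for it in range(iterations):
        a = rng.random() < 0.5
        b = rng.random() < 0.5
        bp.update(0x10, a)  # hypothetical branch addresses
        bp.update(0x20, b)
        c = a != b
        if it >= iterations - 1000 and bp.predict(0x30) != c:
            late_errors += 1
        bp.update(0x30, c)
    return late_errors
```

Calling `xor_demo()` trains on the three branches and reports how often branch C is still mispredicted near the end of the run. The first table sees the outcomes of A and B together as one 2-bit pattern, so it can hold a separate weight for each of the four combinations.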

  18. Comparison of misprediction rates

  19. Performance Results

  20. Related Work
     • O-GEHL: Optimized GEometric History Length Predictor [Seznec2004]
     • gDAC: global Divide And Conquer [Loh2005]
     • PWLB: Piecewise Linear Branch Predictor [Jimenez2004]
     • TAGE: TAgged GEometric History Branch Predictor [Seznec&Michaud2006]

  21. Conclusion
     • A perceptron predictor can be made to have single-cycle latency
     • Assigning multiple history bits to a single weight helps both accuracy and power
     • More accuracy is only good with low latency

  22. Q & A

  23. Why Yet Another Branch Predictor? (ca. 2003) [Figure: pipeline stage diagrams for the P5 Microarchitecture, P6 Microarchitecture, NetBurst Microarchitecture (Willamette) and NetBurst Microarchitecture (Prescott, ~30 stages ???). Graphics from Prof. Hsien-Hsin Sean Lee's presentation on the Pentium Pro/Pentium 4 microarchitecture.]

  24. Let’s start with the familiar predictors

  25. Just to remind you

  26. CBP Accuracy Comparison
