Merging Path, Global and Local Indexing in Perceptron Branch Prediction

David Tarjan. Published in: An Ahead Pipelined Alloyed Perceptron with Single Cycle Access Time. D. Tarjan, K. Skadron and M. Stan. Workshop on Complexity Effective Design (WCED), June 2004

Published in:

  • An Ahead Pipelined Alloyed Perceptron with Single Cycle Access Time. D. Tarjan, K. Skadron and M. Stan. Workshop on Complexity Effective Design (WCED), June 2004

  • Merging path and gshare indexing in perceptron branch prediction. D. Tarjan and K. Skadron. ACM Transactions on Architecture and Code Optimization, 2(3), Sep. 2005

Why Yet Another Branch Predictor?

  • Single-thread performance growth stalling

  • Pipeline length still slowly increasing

  • Buffer sizes also increasing

  • No more “free” clock scaling

  • Power budget goes to more cores

    -> If we want more single-thread performance, we have to go for efficiency!

Outline

  • What is a perceptron?

  • Ahead-pipelining

  • Precomputing local sums

  • Results for ahead pipelined alloyed perceptron

  • Hashed Indexing

  • Results of a hashed perceptron

  • Conclusion

Main Contributions

  • Reduced latency of perceptron predictors to one cycle

  • Showed how to reduce the number of weights/adders by a factor of N (N = 6-12) for a given history length

  • Reduced mispredictions by up to 27.2% over path-based perceptron

Global Perceptron
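
The figure for this slide did not survive extraction. As a reminder of the general idea, the following is a minimal sketch of a global-history perceptron predictor as it is usually described in the literature. The class name `GlobalPerceptron`, the helper `count_mispredictions`, the table size, the history length and the threshold heuristic are all illustrative assumptions, not values taken from the talk, and weight saturation is left out for brevity.

```python
class GlobalPerceptron:
    """Minimal global-history perceptron predictor (weight saturation omitted)."""

    def __init__(self, num_rows=64, history_len=12):
        self.history_len = history_len
        # Training threshold: an assumed, commonly quoted heuristic.
        self.theta = int(1.93 * history_len + 14)
        # One weight vector per row; element 0 is the bias weight.
        self.weights = [[0] * (history_len + 1) for _ in range(num_rows)]
        # Global history as +1 (taken) / -1 (not taken), most recent first.
        self.history = [-1] * history_len

    def _row(self, pc):
        # Select a weight vector with the low bits of the branch address.
        return self.weights[pc % len(self.weights)]

    def output(self, pc):
        # Dot product of the weights with the history, plus the bias.
        w = self._row(pc)
        return w[0] + sum(wi * xi for wi, xi in zip(w[1:], self.history))

    def predict(self, pc):
        return self.output(pc) >= 0  # True means "predict taken"

    def update(self, pc, taken):
        y = self.output(pc)
        t = 1 if taken else -1
        # Train on a misprediction or when the output is not confident.
        if (y >= 0) != taken or abs(y) <= self.theta:
            w = self._row(pc)
            w[0] += t
            for i, xi in enumerate(self.history):
                w[i + 1] += t * xi
        # Shift the actual outcome into the global history.
        self.history = [t] + self.history[:-1]


def count_mispredictions(predictor, pc, outcomes):
    """Predict then train on each outcome; return the number of misses."""
    misses = 0
    for taken in outcomes:
        misses += predictor.predict(pc) != taken
        predictor.update(pc, taken)
    return misses


predictor = GlobalPerceptron()
pattern = [i % 2 == 0 for i in range(300)]          # taken, not taken, ...
count_mispredictions(predictor, 0x40, pattern[:200])  # warm-up
print(count_mispredictions(predictor, 0x40, pattern[200:]))
```

On this toy alternating pattern the predictor should stop mispredicting once the history has filled and the weights have grown past the threshold. The point of the sketch is the cost structure: one weight and one adder input per history bit, which is what the later slides set out to reduce.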

Path-based Perceptron

Main Problems in Branch Prediction:

  • Accuracy (larger tables, more logic)

  • Latency (smaller tables, less logic)

  • Multiple Branch/Trace/Stream/etc. per cycle

This work addresses the first two points, accuracy and latency, which trade off against each other.

Ahead Pipelined Perceptron

p_addr(x) = addr(x) + direction(x)

Hashed Perceptron: Motivation

  • Want longer history for accuracy

  • But that means more adders

  • Also need more bits per weight for very long history

  • With ahead-pipelining we effectively already have two bits of history for each weight…

  • But more ahead-pipelining means fewer address bits to select weight…

    We have been here before!

Hashed Perceptron

Benefits?

  • We can reduce the number of tables and adders by a factor of n, where n is the number of history bits per table

  • We can accurately predict linearly inseparable branches (e.g. two branches which form an XOR pattern)
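
Both benefits can be illustrated with a small sketch. It assumes a gshare-style scheme in which each table is indexed by the branch address XORed with a segment of n global-history bits, and the prediction is the sign of the sum of the selected weights. The names `HashedPerceptron`, `PlainPerceptron` and `xor_mispredictions`, the table size and the threshold formula are hypothetical choices for illustration, not the authors' exact design.

```python
class HashedPerceptron:
    """One weight per table; table t is indexed by pc XOR history segment t."""

    def __init__(self, history_len, bits_per_table, rows=64):
        assert history_len % bits_per_table == 0
        assert rows >= 1 << bits_per_table
        self.n = bits_per_table
        self.history_len = history_len
        self.num_tables = history_len // bits_per_table  # h/n tables and adders
        self.rows = rows
        self.theta = int(1.93 * self.num_tables + 14)    # assumed threshold
        self.tables = [[0] * rows for _ in range(self.num_tables)]
        self.history = 0  # bit 0 = most recent outcome (1 = taken)

    def _indices(self, pc):
        mask = (1 << self.n) - 1
        return [(pc ^ ((self.history >> (t * self.n)) & mask)) % self.rows
                for t in range(self.num_tables)]

    def predict(self, pc):
        return self._output(pc) >= 0

    def _output(self, pc):
        return sum(tab[i] for tab, i in zip(self.tables, self._indices(pc)))

    def train(self, pc, taken):
        y = self._output(pc)
        if (y >= 0) != taken or abs(y) <= self.theta:
            step = 1 if taken else -1
            for tab, i in zip(self.tables, self._indices(pc)):
                tab[i] += step

    def push(self, taken):
        mask = (1 << self.history_len) - 1
        self.history = ((self.history << 1) | int(taken)) & mask


class PlainPerceptron(HashedPerceptron):
    """Baseline for comparison: one weight per history bit plus a bias."""

    def __init__(self, history_len):
        super().__init__(history_len, 1)
        self.theta = int(1.93 * history_len + 14)
        self.w = [0] * (history_len + 1)

    def _inputs(self):
        return [1 if (self.history >> i) & 1 else -1
                for i in range(self.history_len)]

    def _output(self, pc):
        return self.w[0] + sum(wi * xi for wi, xi in zip(self.w[1:], self._inputs()))

    def train(self, pc, taken):
        y = self._output(pc)
        if (y >= 0) != taken or abs(y) <= self.theta:
            t = 1 if taken else -1
            self.w = [self.w[0] + t] + [wi + t * xi
                                        for wi, xi in zip(self.w[1:], self._inputs())]


def xor_mispredictions(predictor, rounds, pc=0x40):
    """Branches A and B cycle through all four outcome pairs; branch C is
    taken iff A != B. Only C is predicted. Returns C's misprediction count."""
    misses = 0
    for k in range(rounds):
        a, b = bool(k & 2), bool(k & 1)
        predictor.push(a)
        predictor.push(b)
        c = a != b
        misses += predictor.predict(pc) != c
        predictor.train(pc, c)
        predictor.push(c)
    return misses


print(HashedPerceptron(history_len=24, bits_per_table=6).num_tables)
print(xor_mispredictions(HashedPerceptron(2, 2), 400))
print(xor_mispredictions(PlainPerceptron(2), 400))
```

With two history bits mapped onto a single weight, each of the four (A, B) combinations selects its own weight, so the XOR pattern is learned after a handful of initial misses. The plain perceptron sees the same two bits as separate inputs and keeps mispredicting, because no single set of weights separates XOR. The first line printed shows the table count: 24 history bits at 6 bits per table need 4 tables instead of 24 weights.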

Performance Results

Related Work

  • O-GEHL: Optimized GEometric History Length Predictor [Seznec2004]

  • gDAC: global Divide And Conquer [Loh2005]

  • PWLB: Piecewise Linear Branch Predictor [Jimenez2004]

  • TAGE: TAgged GEometric History Branch Predictor [Seznec&Michaud2006]

Conclusion

  • A perceptron predictor can be made to have single-cycle latency

  • Assigning multiple bits to a single weight helps for both accuracy and power

  • More accuracy is only good with low latency

Q & A

Why Yet Another Branch Predictor? (ca. 2003)

[Figure: pipeline stages of successive Intel microarchitectures. P5 Microarchitecture: PREF, DEC, DEC, EXEC, WB. P6 Microarchitecture: IFU1, IFU2, IFU3, DEC1, DEC2, RAT, ROB, DIS, EX, RET1, RET2. NetBurst Microarchitecture (Willamette): TC NextIP, TC Fetch, Drive, Alloc, Rename, Queue, Schedule, Dispatch, Reg File, Exec, Flags, Br Ck, Drive. NetBurst Microarchitecture (Prescott): ~30 stages, ???]

Graphics from Prof. Hsien-Hsin Sean Lee's presentation on the Pentium Pro/Pentium 4 microarchitecture

Just to remind you
