Merging Path, Global and Local Indexing in Perceptron Branch Prediction

David Tarjan. Published in: An Ahead Pipelined Alloyed Perceptron with Single Cycle Access Time. D. Tarjan, K. Skadron and M. Stan. Workshop on Complexity Effective Design (WCED), June 2004


Presentation Transcript



Merging Path, Global and Local Indexing in Perceptron Branch Prediction

David Tarjan


Published in:

  • An Ahead Pipelined Alloyed Perceptron with Single Cycle Access Time. D. Tarjan, K. Skadron and M. Stan. Workshop on Complexity Effective Design (WCED), June 2004

  • Merging path and gshare indexing in perceptron branch prediction. D. Tarjan and K. Skadron. ACM Transactions on Architecture and Code Optimization, 2(3), Sep. 2005


Why Yet Another Branch Predictor?

  • Single-thread performance growth stalling

  • Pipeline length still slowly increasing

  • Buffer sizes also increasing

  • No more “free” clock scaling

  • Power budget goes to more cores

    -> If we want more single-thread performance, we have to go for efficiency!


Outline

  • What is a perceptron?

  • Ahead-pipelining

  • Precomputing local sums

  • Results for ahead pipelined alloyed perceptron

  • Hashed Indexing

  • Results of a hashed perceptron

  • Conclusion


Main Contributions

  • Reduced latency of perceptron predictors to one cycle

  • Showed how to reduce the number of weights/adders by a factor of N (for N between 6 and 12) for a given history length

  • Reduced mispredictions by up to 27.2% over path-based perceptron


Global Perceptron
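
As a reminder of the baseline, a global perceptron predictor can be sketched in a few lines. This is a minimal sketch, not a reproduction of the slide's figure: the table size, the history length, the modulo indexing and the names `GlobalPerceptron`, `HISTORY_LEN` and `NUM_PERCEPTRONS` are assumptions chosen for illustration, and weight saturation is left out. The training threshold uses the commonly cited formula from the original perceptron predictor of Jiménez and Lin.

```python
# Illustrative global-history perceptron predictor.
# Sizes and the modulo index are assumptions, not values from the slides.
HISTORY_LEN = 8
NUM_PERCEPTRONS = 64
THETA = int(1.93 * HISTORY_LEN + 14)  # training threshold (Jimenez and Lin)


class GlobalPerceptron:
    def __init__(self):
        # One weight vector per entry; element 0 is the bias weight.
        self.weights = [[0] * (HISTORY_LEN + 1) for _ in range(NUM_PERCEPTRONS)]
        # Global history as +1 (taken) / -1 (not taken), most recent first.
        self.history = [-1] * HISTORY_LEN

    def output(self, pc):
        w = self.weights[pc % NUM_PERCEPTRONS]
        # Dot product of the weights with the history, plus the bias.
        return w[0] + sum(wi * xi for wi, xi in zip(w[1:], self.history))

    def predict(self, pc):
        return self.output(pc) >= 0

    def update(self, pc, taken):
        y = self.output(pc)
        t = 1 if taken else -1
        # Train on a misprediction or when the output is not confident enough.
        if (y >= 0) != taken or abs(y) <= THETA:
            w = self.weights[pc % NUM_PERCEPTRONS]
            w[0] += t
            for i, xi in enumerate(self.history):
                w[i + 1] += t * xi
        # Shift the new outcome into the global history.
        self.history = [t] + self.history[:-1]
```

Note that every history bit needs its own weight and its own adder input, which is the cost the later slides set out to reduce.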


Path-based Perceptron


Main Problems in Branch Prediction:

  • Accuracy (larger tables, more logic)

  • Latency (smaller tables, less logic)

  • Multiple Branch/Trace/Stream/etc. per cycle

This work addresses the first two points, which trade off against each other.


Normal pipelined perceptron


Ahead pipelined perceptron

p_addr(x) = addr(x) + direction(x)
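
The pseudo-address formula above can be tried out with a small sketch. It reads `+` literally as integer addition and encodes the direction as 1 (taken) or 0 (not taken); both readings are assumptions, since the slide does not say whether addition or concatenation is meant. The function names and the table size are hypothetical.

```python
def pseudo_address(addr: int, taken: bool) -> int:
    """p_addr(x) = addr(x) + direction(x).

    Assumption: '+' is integer addition and direction is 1 for taken,
    0 for not taken.
    """
    return addr + (1 if taken else 0)


def weight_index(addr: int, taken: bool, table_size: int) -> int:
    # Hypothetical table lookup: the pseudo-address selects the weight.
    return pseudo_address(addr, taken) % table_size


# The same branch selects different weights depending on its outcome.
idx_taken = weight_index(0x400100, True, 256)
idx_not_taken = weight_index(0x400100, False, 256)
```

Folding the direction into the index is what lets a weight selected ahead of time still depend on the outcome of the earlier branch.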


Impact of ahead-pipelining on accuracy


Precomputing a local history perceptron


Impact of adding local history


Hashed Perceptron: Motivation

  • Want longer history for accuracy

  • But that means more adders

  • Also need more bits per weight for very long history

  • With ahead-pipelining, we effectively already have two bits of history for each weight…

  • But more ahead-pipelining means fewer address bits to select weight…

    We have been here before!


We want gshare for perceptrons!


Hashed Perceptron


Benefits?

  • We can reduce the number of tables and adders by a factor of n, where n is the number of history bits per table

  • We can accurately predict linearly inseparable branches (two branches which have XOR pattern)
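
Both benefits can be illustrated with a small simulation. The sketch below is an assumption-laden toy, not the predictor evaluated in the talk: the table count, table size, XOR hash, threshold formula (borrowed from the global perceptron with the table count in place of the history length), branch addresses and the names `HashedPerceptron` and `xor_demo` are all hypothetical. Each table is indexed gshare-style by the branch address XORed with its own n-bit slice of the global history, so a history of `NUM_TABLES * BITS_PER_TABLE` bits needs only `NUM_TABLES` weights and adder inputs.

```python
import random

NUM_TABLES = 2       # assumption: number of weight tables (one adder input each)
BITS_PER_TABLE = 2   # n history bits hashed into each table's index
TABLE_SIZE = 64      # assumption
THETA = int(1.93 * NUM_TABLES + 14)  # assumption: threshold borrowed from the global perceptron
W_MIN, W_MAX = -128, 127             # saturating 8-bit weights (assumption)


class HashedPerceptron:
    def __init__(self):
        self.tables = [[0] * TABLE_SIZE for _ in range(NUM_TABLES)]
        self.history = 0  # bit 0 holds the most recent outcome

    def _indices(self, pc):
        mask = (1 << BITS_PER_TABLE) - 1
        # Table t is indexed by pc XOR its own slice of the global history.
        return [(pc ^ ((self.history >> (t * BITS_PER_TABLE)) & mask)) % TABLE_SIZE
                for t in range(NUM_TABLES)]

    def output(self, pc):
        return sum(tab[i] for tab, i in zip(self.tables, self._indices(pc)))

    def predict(self, pc):
        return self.output(pc) >= 0

    def update(self, pc, taken):
        y = self.output(pc)
        t = 1 if taken else -1
        if (y >= 0) != taken or abs(y) <= THETA:
            for tab, i in zip(self.tables, self._indices(pc)):
                tab[i] = max(W_MIN, min(W_MAX, tab[i] + t))
        hist_bits = NUM_TABLES * BITS_PER_TABLE
        self.history = ((self.history << 1) | int(taken)) & ((1 << hist_bits) - 1)


def xor_demo(iterations=5000, seed=0):
    """Two random branches followed by a branch whose outcome is their XOR.

    Returns the prediction accuracy for the XOR branch over the last
    200 iterations.
    """
    rng = random.Random(seed)
    p = HashedPerceptron()
    correct = []
    for _ in range(iterations):
        a, b = rng.random() < 0.5, rng.random() < 0.5
        p.update(0x10, a)
        p.update(0x20, b)
        c = a != b  # XOR of the two preceding outcomes
        correct.append(p.predict(0x30) == c)
        p.update(0x30, c)
    return sum(correct[-200:]) / 200
```

Under these assumptions, calling `xor_demo()` should report that the third branch becomes close to perfectly predictable once the weights settle, because the two most recent history bits select a dedicated weight for each of their four combinations. A perceptron with one weight per history bit cannot represent that XOR relationship.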


Comparison of misprediction rates


Performance Results


Related Work

  • O-GEHL: Optimized GEometric History Length Predictor [Seznec2004]

  • gDAC: global Divide And Conquer [Loh2005]

  • PWLB: Piecewise Linear Branch Predictor [Jimenez2004]

  • TAGE: TAgged GEometric History Branch Predictor [Seznec&Michaud2006]


Conclusion

  • A perceptron predictor can be built with single-cycle latency

  • Assigning multiple bits to a single weight helps for both accuracy and power

  • More accuracy is only good with low latency


Q & A

Why Yet Another Branch Predictor? (ca. 2003)

[Figure: pipeline stages of successive Intel microarchitectures]

  • P5 Microarchitecture: PREF, DEC, DEC, EXEC, WB

  • P6 Microarchitecture: IFU1, IFU2, IFU3, DEC1, DEC2, RAT, ROB, DIS, EX, RET1, RET2

  • NetBurst Microarchitecture (Willamette): TC NextIP, TC Fetch, Drive, Alloc, Rename, Queue, Schedule, Dispatch, Reg File, Exec, Flags, Br Ck, Drive

  • NetBurst Microarchitecture (Prescott): ~30 stages (???)

Graphics from Prof. Hsien-Hsin Sean Lee's presentation on the Pentium Pro/Pentium 4 microarchitecture


Let’s start with the familiar predictors


Just to remind you


CBP Accuracy Comparison
