Perceptron Branch Prediction and Its Recent Developments




Perceptron Branch Prediction and Its Recent Developments

Mostly based on "Dynamic Branch Prediction with Perceptrons"

by Daniel A. Jiménez and Calvin Lin

Presented by Shugen Li



Introduction

  • As technology trends push toward deeper pipelines and faster clock cycles, modern computer architectures increasingly rely on speculation to boost instruction-level parallelism.

  • Machine learning techniques offer the possibility of further improving performance by increasing prediction accuracy.



Introduction (cont.)

  • Figure 1: A conceptual system model for branch prediction.

    Adapted from I. K. Chen, J. T. Coffey, and T. N. Mudge, "Analysis of branch prediction via data compression".



Introduction (cont.)

  • We can improve accuracy by replacing these traditional predictors with neural networks, which provide good predictive capabilities.

  • The perceptron is one of the simplest possible neural networks: it is easy to understand, simple to implement, and has several attractive properties.



Why Perceptrons?

  • The major benefit of perceptrons is that by examining their weights, i.e., the correlations that they learn, it is easy to understand the decisions that they make.

  • For many other neural networks, it is difficult or impossible to determine exactly how the network is making its decision.

  • A perceptron's decision-making process is easy to understand: its decision is the result of a simple mathematical formula.



Perceptron Model

  • The inputs x1 … xn are the bits of the global branch-history shift register.

  • w0 … wn is the weights vector (w0 is the bias weight).

  • y is the output of the perceptron: y > 0 means the branch is predicted taken, otherwise not taken (see the formula and sketch below).
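The output described above is the weighted sum from the Jiménez/Lin paper, with the history bits encoded bipolarly (1 for taken, -1 for not taken):

    y = w0 + (w1·x1 + w2·x2 + … + wn·xn),   with each xi ∈ {-1, 1}

A minimal Python sketch of this computation (the function name and list encoding are illustrative, not from the paper, which describes a hardware circuit):

    def perceptron_output(weights, history):
        """Compute y = w0 + sum(wi * xi).

        weights: list of n+1 integers; weights[0] is the bias weight,
                 whose input is implicitly always 1.
        history: list of n global-history bits encoded as 1 (taken)
                 or -1 (not taken).
        """
        y = weights[0]
        for w, x in zip(weights[1:], history):
            y += w * x
        return y  # y > 0: predict taken; otherwise: predict not taken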



Perceptron Training

  • Let the branch outcome t be -1 if the branch was not taken, or 1 if it was taken, and let θ be the threshold, a parameter to the training algorithm used to decide when enough training has been done (see the sketch below).

    These two slides and their figures are adapted from F. Rosenblatt, Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms.
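A minimal sketch of this training rule as given in the Jiménez/Lin paper (perceptron_output is the helper sketched earlier; saturating the weights to their hardware bit-width is omitted here):

    def train(weights, history, t, theta):
        """One training step after a branch resolves.

        t: actual outcome, 1 (taken) or -1 (not taken).
        theta: training threshold.
        """
        y = perceptron_output(weights, history)
        predicted = 1 if y > 0 else -1
        # Train only on a misprediction, or while |y| has not yet
        # exceeded theta (i.e., training is not yet "enough").
        if predicted != t or abs(y) <= theta:
            weights[0] += t  # bias input x0 is always 1
            for i, x in enumerate(history, start=1):
                weights[i] += t * x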



Perceptron Limitations

  • A perceptron is only capable of learning linearly separable functions.

  • This means a perceptron can learn the logical AND of two inputs, but not the exclusive OR, because no single weighted threshold separates XOR's taken cases from its not-taken cases (see the sketch below).
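A toy demonstration of this limitation (not from the paper; inputs and outcomes use the same 1/-1 encoding, and the 100-epoch budget is arbitrary):

    def learns(table, epochs=100):
        w = [0, 0, 0]  # bias weight plus one weight per input
        for _ in range(epochs):
            for x1, x2, t in table:
                y = w[0] + w[1] * x1 + w[2] * x2
                if (1 if y > 0 else -1) != t:  # train on mispredictions
                    w[0] += t; w[1] += t * x1; w[2] += t * x2
        # Did the final weights classify every row correctly?
        return all((1 if w[0] + w[1] * x1 + w[2] * x2 > 0 else -1) == t
                   for x1, x2, t in table)

    AND = [(-1, -1, -1), (-1, 1, -1), (1, -1, -1), (1, 1, 1)]
    XOR = [(-1, -1, -1), (-1, 1, 1), (1, -1, 1), (1, 1, -1)]
    print(learns(AND))  # True: AND is linearly separable
    print(learns(XOR))  # False: no weight vector can represent XOR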



Predictor Block Diagram



Experimental Results

  • Uses the SPEC2000 integer benchmarks and compares against gshare and bi-mode predictors.

  • Also compares against a hybrid gshare/perceptron predictor.

  • The perceptron predictor's advantage comes from its ability to make use of longer history lengths.

  • It does well when the branch being predicted exhibits linearly separable behavior.



Much longer history lengths than traditional two-level schemes


Performance

Performance



Implementation

  • Computing the perceptron output:

    • A full dot product need not be computed.

    • Instead, simply add the weight when the input bit is 1 and subtract it (add its two's complement) when the input bit is -1.

    • This is similar to the computation performed by multiplication circuits, which must find the sum of partial products that are each a function of an integer and a single bit.

  • Furthermore, only the sign bit of the result is needed to make a prediction, so the other bits of the output can be computed more slowly without delaying the prediction (see the sketch below).
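A minimal sketch of the multiply-free computation described above (a software stand-in; the hardware would sum these terms with an adder tree, and the function name is illustrative):

    def perceptron_output_no_multiply(weights, history):
        """Same result as w0 + sum(wi * xi) for xi in {-1, 1}, but
        using only additions and subtractions, no multiplications."""
        y = weights[0]
        for w, x in zip(weights[1:], history):
            if x == 1:
                y += w  # history bit taken: add the weight
            else:
                y -= w  # history bit not taken: subtract the weight
        return y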



Implementation (cont.)

  • Training



Limitations

  • Delay: prediction latency remains high even with the simplified computation

  • Lower accuracy on branches that are not linearly separable

  • Aliasing and hardware cost



Recent Development (1): Low-Power Perceptrons (Selective Weights), by Kaveh Aasaraai and Amirali Baniasadi

  • Non-Effective (NE): weights whose sign is opposite to the sign of the dot-product value. The summation of the NE weights is referred to as NE-SUM.

  • Semi-Effective (SE): weights having the sign of the dot-product value, but with an absolute value less than NE-SUM.

  • Highly-Effective (HE): weights having the same sign as the dot-product value and an absolute value greater than NE-SUM (see the sketch after this list).
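A sketch of the three categories above. One reading is assumed here: the classification is applied to each weight's signed contribution w·x to the dot product (including the bias term), which is an interpretation, not something stated on the slide:

    def classify_weights(weights, history):
        contribs = [weights[0]] + [w * x for w, x in zip(weights[1:], history)]
        dot = sum(contribs)
        sign = 1 if dot >= 0 else -1
        # NE-SUM: total magnitude of contributions opposing the output sign.
        ne_sum = sum(abs(c) for c in contribs if c * sign < 0)
        labels = []
        for c in contribs:
            if c * sign < 0:
                labels.append("NE")  # opposes the dot-product sign
            elif abs(c) < ne_sum:
                labels.append("SE")  # agrees, but smaller than NE-SUM
            else:
                labels.append("HE")  # agrees and outweighs NE-SUM
        return labels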



Recent Development (2): The Combined Perceptron Branch Predictor, by Matteo Monchiero and Gianluca Palermo

  • The predictor consists of two concurrent perceptron-like neural networks: one uses branch-history information as its inputs, the other uses program-counter bits (see the sketch below).
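A minimal sketch of the two-network structure. Combining the two outputs by summing them before taking the sign is an illustrative assumption; the slide does not say how the outputs are merged:

    def combined_predict(w_hist, history, w_pc, pc_bits):
        """One perceptron-like network over global-history bits and one
        over program-counter bits; all inputs are encoded as 1/-1."""
        y_hist = w_hist[0] + sum(w * x for w, x in zip(w_hist[1:], history))
        y_pc = w_pc[0] + sum(w * x for w, x in zip(w_pc[1:], pc_bits))
        # Assumed combination rule: sum the two outputs, take the sign.
        return (y_hist + y_pc) > 0  # True => predict taken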



Recent Development (3): Path-Based Neural Prediction, by Daniel A. Jiménez

  • In an N-branch path-based neural predictor, the prediction for a branch is initiated N branches ahead, and the predictions for the next N branches are computed in parallel.

  • A row of N counters is read using the current instruction-block address. On blocks featuring a branch, one of the counters read is added to each of the N partial sums (see the sketch after this list).

  • The prediction delay is thus the perceptron-table read delay followed by a single multiply-add delay.

  • Drawbacks: this estimate does not account for the table read delay, nor for the misprediction penalty.
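A minimal sketch of the rolling partial sums described above, simplified to one branch per block; table hashing, training, and checkpoint/recovery on mispredictions are all omitted:

    class PathBasedSketch:
        def __init__(self, n, rows):
            self.n = n
            self.table = [[0] * n for _ in range(rows)]  # rows of N counters
            self.partials = [0] * n  # in-flight sums for the next N branches

        def predict(self, address):
            counters = self.table[address % len(self.table)]
            y = self.partials[0] + counters[0]  # complete the oldest sum
            # Retire the oldest sum, fold one counter from this row into
            # each younger in-flight sum, and start a fresh sum at the tail.
            self.partials = [p + c for p, c in
                             zip(self.partials[1:], counters[1:])] + [0]
            return y > 0  # True => predict taken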



Recent Development (4): Revisiting the Perceptron Predictor, by A. Seznec

  • The accuracy of perceptron predictors is further improved with the following extensions:

    • using pseudo-tags to reduce the impact of aliasing,

    • skewing the perceptron weight tables to improve table utilization,

    • introducing redundant history to handle linearly inseparable data sets.

  • The non-linear redundant history also leads to a more efficient representation of the perceptron weights, Multiply-Add Contributions (MAC).

  • The cost is increased hardware complexity.



Recent Development (5): The O-GEometric History Length (GEHL) Branch Predictor, by A. Seznec

  • The GEHL predictor features M distinct predictor tables Ti.

  • The predictor tables store predictions as signed saturating counters.

  • A single counter C(i) is read from each predictor table Ti (1 ≤ i ≤ M).

  • The prediction is computed as the sign of the sum S of the M counters C(i), as shown in the first equation below.

  • The prediction is taken when S is positive or null and not taken when S is negative.
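The "first equation" referenced above is the sum of the selected counters:

    S = C(1) + C(2) + … + C(M)

A minimal sketch of the prediction step (the per-table index computation, which hashes the branch address with each table's history length, is omitted and assumed precomputed):

    def gehl_predict(tables, indices):
        """tables: M lists of signed saturating counters;
        indices: the M precomputed per-table indices."""
        s = sum(t[i] for t, i in zip(tables, indices))
        return s >= 0  # taken when S is positive or null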



Recent Development (5) (cont.): The O-GEometric History Length Branch Predictor, by A. Seznec

  • The history lengths used in the indexing functions for the tables Ti form a geometric series, given by the second equation below.

  • The counters in each table T(i) are easy to train, in the same way as the weights in the perceptron predictor.

  • Low hardware cost and better latency.
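The "second equation" referenced above is the geometric series of history lengths from Seznec's paper, with a small illustrative computation; the values of α, L(1), and M below are assumptions, not taken from the slide:

    L(i) = α^(i-1) × L(1)   for i ≥ 1, with L(0) = 0

    # Illustrative parameter values (assumed): alpha = 2, L(1) = 2, M = 8.
    ALPHA, L1, M = 2.0, 2, 8
    lengths = [0] + [int(ALPHA ** (i - 1) * L1 + 0.5) for i in range(1, M)]
    print(lengths)  # [0, 2, 4, 8, 16, 32, 64, 128]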



Conclusion

  • Perceptron prediction is attractive because it can use long history lengths without requiring exponential resources.

  • Its weakness is the increased computational complexity and the resulting latency and hardware cost.

  • The idea can be combined with traditional methods to obtain better performance.

  • Several methods are being developed to reduce the latency and handle mispredictions.

  • This technology will become more practical as hardware costs continue to fall.

  • There is still room for further development.



References

  • [1] D. Jiménez and C. Lin, "Dynamic branch prediction with perceptrons", Proc. of the 7th Int. Symp. on High-Performance Computer Architecture (HPCA-7), 2001.

  • [2] D. Jiménez and C. Lin, "Neural methods for dynamic branch prediction", ACM Trans. on Computer Systems, 2002.

  • [3] A. Seznec, "Revisiting the perceptron predictor", Technical Report, IRISA, 2004.

  • [4] A. Seznec, "An optimized 2bcgskew branch predictor", Technical Report, IRISA, Sep. 2003.

  • [5] G. Loh, "The frankenpredictor", The 1st JILP Championship Branch Prediction Competition (CBP-1), 2004.

  • [6] K. Aasaraai and A. Baniasadi, "Low-power perceptrons".

  • [7] A. Seznec, "The O-GEometric History Length branch predictor".

  • [8] M. Monchiero and G. Palermo, "The Combined Perceptron Branch Predictor".

  • [9] F. Rosenblatt, Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms, Spartan, 1962.



Thank You!

Questions?

