Perceptron Branch Prediction and Its Recent Developments

Mostly based on “Dynamic Branch Prediction with Perceptrons”

Daniel A. Jiménez and Calvin Lin

By Shugen Li

- As pipelines get deeper and clock cycles shorter, modern computer architectures increasingly rely on speculation to boost instruction-level parallelism.
- Machine learning techniques offer the possibility of further improving performance by increasing prediction accuracy.

- Figure 1. A conceptual system model for branch prediction.
Adapted from I. K. Chen, J. T. Coffey, and T. N. Mudge, “Analysis of branch prediction via data compression”.

- We can improve accuracy by replacing the traditional predictors with neural networks, which provide good predictive capabilities.
- The perceptron is one of the simplest possible neural networks: easy to understand, simple to implement, and with several attractive properties.

- The major benefit of perceptrons is that their decisions are easy to understand: examining the weights reveals the correlations they have learned.
- For many other neural networks, it is difficult or impossible to determine exactly how the network makes its decision.
- A perceptron's decision-making process is easy to understand because it is the result of a simple mathematical formula.

- The inputs x_i are the bits of the global branch history shift register.
- w_0 ... w_n form the weight vector.
- y is the output of the perceptron; y > 0 means the prediction is taken, otherwise not taken.
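A minimal sketch of the output computation, assuming the usual perceptron form y = w_0 + Σ x_i·w_i with bipolar inputs (+1 for a taken branch, -1 for not taken) and w_0 acting as a bias weight. The function names and the example weights are illustrative and do not come from the slides.

```python
def perceptron_output(weights, history):
    """Return y = w0 + sum(x_i * w_i).

    weights: [w0, w1, ..., wn], where w0 is the bias weight.
    history: [x1, ..., xn], each +1 (taken) or -1 (not taken).
    """
    assert len(weights) == len(history) + 1
    y = weights[0]
    for w, x in zip(weights[1:], history):
        y += w * x
    return y


def predict_taken(weights, history):
    # y > 0 predicts taken, otherwise not taken (the convention on this slide).
    return perceptron_output(weights, history) > 0


weights = [1, 3, -2, 4]   # hypothetical trained weights
history = [1, -1, -1]     # hypothetical global history bits
y = perceptron_output(weights, history)
print(y, predict_taken(weights, history))
```

A large |w_i| means the branch correlates strongly with history bit i, which is what makes the learned weights easy to inspect.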

- Let the branch outcome t be -1 if the branch was not taken, or 1 if it was taken, and let θ be the threshold, a parameter of the training algorithm used to decide when enough training has been done.
These two pages and figures are adapted from F. Rosenblatt. Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms.
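The training step can be sketched as follows. This assumes the standard perceptron update (add t·x_i to each weight when the prediction was wrong or |y| ≤ θ); the slide's own pseudocode is not reproduced in the text, so the function name, the y > 0 sign convention and the unsaturated integer weights are assumptions made for illustration.

```python
def train(weights, history, y, t, theta):
    """Update weights in place once the branch outcome t is known.

    t is +1 if the branch was taken, -1 otherwise; theta is the threshold.
    Training happens on a misprediction or while |y| <= theta.
    """
    predicted = 1 if y > 0 else -1
    if predicted != t or abs(y) <= theta:
        weights[0] += t                       # the bias input is always 1
        for i, x in enumerate(history, start=1):
            weights[i] += t * x               # reinforce bits that agree with t
    return weights


weights = [0, 0, 0]
history = [1, -1]
y = weights[0] + sum(w * x for w, x in zip(weights[1:], history))
train(weights, history, y, t=1, theta=2)
print(weights)
```

Once |y| exceeds θ and the prediction is correct, the weights are left alone, which is how θ decides when enough training has been done.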

- Limitation: a perceptron is only capable of learning linearly separable functions.
- This means a perceptron can learn the logical AND of two inputs, but not the exclusive-OR.
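The AND versus exclusive-OR claim can be checked with a small experiment. This is a sketch only: a two-input perceptron is trained with the classic mistake-driven rule on bipolar inputs, and the helper names and epoch count are arbitrary choices, not taken from the slides.

```python
def train_perceptron(samples, epochs=50):
    """Train a 2-input perceptron; returns [bias, w1, w2]."""
    w = [0, 0, 0]
    for _ in range(epochs):
        for (x1, x2), t in samples:
            y = w[0] + w[1] * x1 + w[2] * x2
            if (1 if y > 0 else -1) != t:     # update only on a mistake
                w[0] += t
                w[1] += t * x1
                w[2] += t * x2
    return w


def accuracy(w, samples):
    hits = 0
    for (x1, x2), t in samples:
        y = w[0] + w[1] * x1 + w[2] * x2
        hits += (1 if y > 0 else -1) == t
    return hits / len(samples)


inputs = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
and_samples = [(x, 1 if x == (1, 1) else -1) for x in inputs]
xor_samples = [(x, 1 if x[0] != x[1] else -1) for x in inputs]

and_acc = accuracy(train_perceptron(and_samples), and_samples)
xor_acc = accuracy(train_perceptron(xor_samples), xor_samples)
print(and_acc, xor_acc)
```

AND is learned perfectly, while no weight vector can classify all four exclusive-OR cases, so its accuracy stays below 1.0 however long training runs.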

- The evaluation uses the SPEC 2000 integer benchmarks and compares the perceptron predictor with gshare and bi-mode.
- It also compares against a hybrid gshare/perceptron predictor.
- The perceptron predictor's main advantage is its ability to make use of longer history lengths.
- It does well when the branch being predicted exhibits linearly separable behavior.

- Computing the perceptron output:
- A full dot product (with multiplications) is not needed.
- Instead, simply add the weight when the input bit is 1 and subtract it (add its two's complement) when the input bit is -1.
- This is similar to the work done by multiplication circuits, which must find the sum of partial products that are each a function of an integer and a single bit.

- Furthermore, only the sign bit of the result is needed to make a prediction, so the other bits of the output can be computed more slowly without holding up the prediction.
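The add-or-subtract shortcut gives exactly the same result as the dot product, as this small sketch illustrates. The function names and example values are hypothetical; real hardware would do this with an adder tree rather than a loop.

```python
def output_by_dot_product(weights, history):
    return weights[0] + sum(w * x for w, x in zip(weights[1:], history))


def output_by_add_sub(weights, history):
    """Same result using only additions and subtractions."""
    y = weights[0]
    for w, x in zip(weights[1:], history):
        if x == 1:
            y += w        # input bit is 1: add the weight
        else:
            y -= w        # input bit is -1: subtract the weight
    return y


weights = [1, 3, -2, 4]   # hypothetical weights
history = [1, -1, -1]     # hypothetical history bits
print(output_by_dot_product(weights, history), output_by_add_sub(weights, history))
```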

- Training

- Delay: the latency is high even with the simplified computation.
- Low accuracy on branches that are not linearly separable.
- Aliasing and hardware cost.

- Non-Effective (NE): weights whose sign is opposite to the sign of the dot product value. We refer to the sum of the NEs as NE-SUM.
- Semi-Effective (SE): weights with the same sign as the dot product value, but with an absolute value less than NE-SUM.
- Highly-Effective (HE): weights with the same sign as the dot product value and an absolute value greater than NE-SUM.
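A rough sketch of this three-way classification follows. It makes several assumptions the slide does not spell out: each "weight" is taken to be its signed contribution x_i·w_i, NE-SUM is compared by absolute value, ties go to SE, zero contributions count as NE, and the bias weight is left out for brevity. The function name is hypothetical.

```python
def classify_contributions(weights, history):
    """Label each x_i * w_i term as 'NE', 'SE' or 'HE'; also return |NE-SUM|."""
    terms = [w * x for w, x in zip(weights, history)]
    y = sum(terms)                                    # dot product value
    same_sign = [c != 0 and (c > 0) == (y > 0) for c in terms]
    ne_sum = abs(sum(c for c, ok in zip(terms, same_sign) if not ok))
    labels = []
    for c, ok in zip(terms, same_sign):
        if not ok:
            labels.append("NE")       # pulls against the final decision
        elif abs(c) > ne_sum:
            labels.append("HE")       # outweighs all the opposition on its own
        else:
            labels.append("SE")       # agrees, but is not decisive alone
    return labels, ne_sum


labels, ne_sum = classify_contributions([5, -2, 1, -1], [1, 1, 1, -1])
print(labels, ne_sum)
```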

- The predictor consists of two concurrent perceptron-like neural networks: one takes branch history information as its inputs, the other takes program counter bits.

- In an N-branch path-based neural predictor, the prediction for a branch is initiated N branches ahead. The predictions for the next N branches are computed in parallel.
- A row of N counters is read using the current instruction block address. On blocks featuring a branch, one of the read counters is added to each of the N partial sums.
- The delay is the perceptron table read delay followed by a single multiply-add delay.
- It does not address the table read delay, nor the misprediction penalty.

- The accuracy of perceptron predictors is further improved with the following extensions:
- using pseudo-tags to reduce the impact of aliasing,
- skewing the perceptron weight tables to improve table utilization,
- introducing redundant history to handle linearly inseparable data sets.
- The nonlinear redundant history also leads to a more efficient representation of perceptron weights, Multiply-Add Contributions (MAC).
- Drawback: increased hardware complexity.

- The GEHL predictor features M distinct predictor tables Ti.
- The predictor tables store predictions as signed saturating counters.
- A single counter C(i) is read from each predictor table Ti (1 ≤ i ≤ M).
- The prediction is computed as the sign of the sum S of the M counters C(i), as given by the first equation.
- The prediction is taken when S is positive or null, and not taken when S is negative.
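A minimal sketch of the GEHL prediction step follows. The slide's first equation is not reproduced in the text, so the M/2 centering term is an assumption taken from the O-GEHL formulation. The function names and the 4-bit counter width are illustrative choices.

```python
def gehl_predict(counters):
    """Return (taken, S) for the M counters read from the predictor tables."""
    m = len(counters)
    s = m / 2 + sum(counters)     # M/2 centering term is an assumption
    return s >= 0, s              # taken when S is positive or null


def update_counter(c, taken, bits=4):
    """Saturating update of one signed counter of the given width."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return min(hi, c + 1) if taken else max(lo, c - 1)


taken, s = gehl_predict([3, -1, -4, 2])   # hypothetical counter values
print(taken, s)
```

Unlike the perceptron, there is no multiplication by history bits at all: the history only selects which counter is read from each table.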

- The history lengths used to compute the indexing functions for the tables Ti form a geometric series, given by the second equation.
- The entries of every table Ti are easy to train, much as in the perceptron predictor.
- Low hardware cost and better latency.
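The geometric series of history lengths can be sketched as below. The slide's second equation is not reproduced in the text; the form used here, L(0) = 0 and L(i) = int(α^(i-1)·L(1) + 0.5) with tables numbered from 0, is an assumption based on the O-GEHL formulation. The parameter values are made up for the example.

```python
def geometric_history_lengths(num_tables, l1, alpha):
    """History length per table; the first table uses no history.

    Assumed form: L(0) = 0, L(i) = int(alpha**(i - 1) * l1 + 0.5) for i >= 1.
    """
    lengths = [0]
    for i in range(1, num_tables):
        lengths.append(int(alpha ** (i - 1) * l1 + 0.5))
    return lengths


lengths = geometric_history_lengths(num_tables=8, l1=2, alpha=2)
print(lengths)
```

With only eight tables the longest history already reaches 128 bits, which is how the predictor captures very long histories without a weight for every history bit.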

- Perceptrons are attractive because they can use long history lengths without requiring exponential resources.
- Their weakness is the increased computational complexity, with the resulting latency and hardware cost.
- As a new idea, they can be combined with traditional methods to obtain better performance.
- Several methods are being developed to reduce the latency and to handle mispredictions.
- This technology will become more practical as hardware costs fall.
- There is still room for further development.

- [1] D. Jiménez and C. Lin, “Dynamic branch prediction with perceptrons”, Proc. of the 7th Int. Symp. on High Perf. Comp. Arch. (HPCA-7), 2001.
- [2] D. Jiménez and C. Lin, “Neural methods for dynamic branch prediction”, ACM Trans. on Computer Systems, 2002.
- [3] A. Seznec, “Revisiting the perceptron predictor”, Technical Report, IRISA, 2004.
- [4] A. Seznec, “An optimized 2bcgskew branch predictor”, Technical Report, IRISA, Sep. 2003.
- [5] G. Loh, “The frankenpredictor”, The 1st JILP Championship Branch Prediction Competition (CBP-1), 2004.
- [6] K. Aasaraai and A. Baniasadi, “Low-power Perceptrons”.
- [7] A. Seznec, “The O-GEometric History Length branch predictor”.
- [8] M. Monchiero and G. Palermo, “The Combined Perceptron Branch Predictor”.
- [9] F. Rosenblatt, Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms, Spartan, 1962.

Thank You!

Questions?