Optimized Hybrid Scaled Neural Analog Predictor

Presentation Transcript


  1. Optimized Hybrid Scaled Neural Analog Predictor. Daniel A. Jiménez, Department of Computer Science, The University of Texas at San Antonio

  2. Branch Prediction with Perceptrons

  3. Branch Prediction with Perceptrons cont.

  4. SNP/SNAP [St. Amant et al. 2008]
  • A version of piecewise linear neural prediction [Jiménez 2005]
  • Based on perceptron prediction
  • SNAP is a mixed digital/analog version of SNP
    • Uses an analog circuit for the costly dot-product operation
    • Enables interesting tricks, e.g., scaling
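
As a point of reference for the dot product that SNAP moves into an analog circuit, here is a minimal software sketch of basic perceptron prediction. The encoding (history bits as ±1, weights[0] as the bias weight) follows the perceptron-predictor literature; the function and variable names are illustrative, not from the slides.

    # Minimal perceptron-prediction sketch (illustrative, not the SNAP design).
    # history[i] is +1 if the ith most recent branch was taken, -1 otherwise.

    def perceptron_output(weights, history):
        # weights[0] is the bias weight; weights[1:] correlate with history.
        # This dot product is the operation SNAP computes in analog.
        return weights[0] + sum(w * h for w, h in zip(weights[1:], history))

    def predict(weights, history):
        return perceptron_output(weights, history) >= 0   # True = predict taken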

  5. Weight Scaling
  • Scaling weights by coefficients
  • Different history positions have different importance!
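
A hedged sketch of what scaling adds: each history position gets its own coefficient, so positions that correlate more strongly with the outcome contribute more to the sum. The indexing convention (C[0] scales the bias, C[i + 1] scales position i) is an assumption consistent with slide 6's definition of C.

    # Scaled perceptron output: position i contributes C[i+1] * w * h,
    # and the bias weight is scaled by C[0] (indexing convention assumed).

    def scaled_output(C, weights, history):
        out = C[0] * weights[0]                     # scaled bias weight
        for i, h in enumerate(history):
            out += C[i + 1] * weights[i + 1] * h    # scaled correlating weight
        return out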

  6. The Algorithm: Parameters and Variables
  • C – array of scaling coefficients
  • h – the global history length
  • H – a global history shift register
  • A – a global array of previous branch addresses
  • W – an n × (GHL + 1) array of small integers
  • θ – a threshold to decide when to train
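
One way to hold this state in software is a plain record, as sketched below; the sizes and initial values are placeholders, not the tuned configuration from the paper.

    # Predictor state from slide 6 (sizes and initial values are assumed).

    N_ROWS = 256    # n: number of rows in the weight table (assumed)
    GHL    = 128    # h: global history length (assumed)

    state = {
        "C":     [1.0] * (GHL + 1),        # scaling coefficients
        "h":     GHL,                      # global history length
        "H":     [1] * GHL,                # history register, +1 taken / -1 not
        "A":     [0] * GHL,                # addresses of previous branches
        "W":     [[0] * (GHL + 1) for _ in range(N_ROWS)],  # small integers
        "theta": 20,                       # training threshold (assumed value)
    }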

  7. The Algorithm: Making a Prediction
  • Weights are selected based on the current branch and the ith most recent branch (see the sketch below)
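
The slides don't spell out the indexing function, so this sketch assumes a simple XOR hash of the current branch PC with A[i], in the spirit of piecewise linear prediction [Jiménez 2005]; column 0 of W is assumed to hold the bias weights.

    def predict_pc(state, pc):
        # The weight for history position i comes from the row selected by
        # the current PC and A[i] (XOR hash assumed), column i + 1; the row
        # selected by the PC alone supplies the bias weight in column 0.
        n = len(state["W"])
        out = state["C"][0] * state["W"][pc % n][0]
        for i in range(state["h"]):
            row = (pc ^ state["A"][i]) % n
            out += state["C"][i + 1] * state["W"][row][i + 1] * state["H"][i]
        return out    # predict taken if out >= 0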

  8. The Algorithm: Training
  • If the prediction is wrong or |output| ≤ θ, then:
    • For the ith correlating weight used to predict this branch:
      • Increment it if the branch outcome = outcome of the ith branch in history
      • Decrement it otherwise
    • Increment the bias weight if the branch is taken
    • Decrement it otherwise
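
The same rule in the sketch's terms, reusing the assumed indexing from predict_pc above; slide 6 says the weights are small integers, so hardware would saturate them, which is omitted here.

    def train(state, pc, out, taken):
        # Train on a misprediction or when |output| <= theta (slide 8).
        if (out >= 0) != taken or abs(out) <= state["theta"]:
            n = len(state["W"])
            state["W"][pc % n][0] += 1 if taken else -1   # bias weight
            for i in range(state["h"]):
                agree = state["H"][i] == (1 if taken else -1)
                row = (pc ^ state["A"][i]) % n
                state["W"][row][i + 1] += 1 if agree else -1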

  9. SNP/SNAP Datapath

  10. Tricks
  • Use alloyed [Skadron 2000] global and per-branch history
    • Separate table of local perceptrons
    • Output from this stage multiplied by an empirically determined coefficient
  • Training coefficients vector(s)
    • Multiple vectors initialized to f(i) = 1 / (A + B × i)
    • Minimum coefficient value determined empirically
    • Indexed by branch PC
    • Each vector trained with perceptron-like learning on-line
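
The f(i) = 1 / (A + B × i) initialization gives recent history positions larger coefficients. A minimal sketch with placeholder constants, since the slides say A, B, and the minimum are determined empirically but don't give their values:

    A_CONST, B_CONST, MIN_C = 1.0, 0.1, 0.05    # placeholder values

    def init_coefficients(ghl):
        # f(i) = 1 / (A + B*i), clamped to an empirically chosen minimum.
        return [max(1.0 / (A_CONST + B_CONST * i), MIN_C)
                for i in range(ghl + 1)]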

  11. Tricks (2)
  • Branch cache
    • Highly associative cache with entries for branch information
    • Each entry contains:
      • A partial tag for this branch PC
      • The bias weight for this branch
      • An “ever taken” bit
      • A “never taken” bit
    • The “ever/never” bits avoid needless use of weight resources
    • The bias weight is protected from destructive interference
    • LRU replacement
    • >99% hit rate
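
A sketch of one entry in this cache; the field names follow the slide, but the types are illustrative (hardware would use a few bits per field, not Python objects):

    from dataclasses import dataclass

    @dataclass
    class BranchCacheEntry:
        partial_tag: int     # partial tag for this branch PC
        bias_weight: int     # bias weight, kept out of the shared weight table
        ever_taken: bool     # set once the branch has ever been taken
        never_taken: bool    # starts True, cleared the first time it is taken

Entries live in a highly associative, LRU-replaced structure; per the slide it hits more than 99% of the time.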

  12. Tricks (3)
  • Hybrid predictor
    • When perceptron output is below some threshold:
      • If a 2-bit counter gshare predictor has high confidence, use it
      • Else use a 1-bit counter PAs predictor
  • Multiple θs indexed by branch PC
    • Each trained adaptively [Seznec 2005]
  • Ragged array
    • Not all rows of the matrix are the same size
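
The selection logic, sketched under the assumption that the component predictions are available as simple values; the slides don't give gshare's confidence test, so gshare_confident is a stand-in:

    def final_prediction(out, theta, gshare_confident, gshare_pred, pas_pred):
        if abs(out) >= theta:     # strong perceptron output: trust it
            return out >= 0
        if gshare_confident:      # weak output: defer to a confident gshare
            return gshare_pred
        return pas_pred           # otherwise the 1-bit PAs predictor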

  13. Benefit of Tricks
  • Graph shows effect of one trick in isolation
  • Training coefficients yields most benefit

  14. References
  • Jiménez & Lin, HPCA 2001 (perceptron predictor)
  • Jiménez & Lin, TOCS 2002 (global/local perceptron)
  • Jiménez, ISCA 2005 (piecewise linear branch predictor)
  • Skadron, Martonosi & Clark, PACT 2000 (alloyed history)
  • Seznec, 2005 (adaptively trained threshold)
  • St. Amant, Jiménez & Burger, MICRO 2008 (SNP/SNAP)
  • McFarling, 1993 (gshare)
  • Yeh & Patt, 1991 (PAs)

  15. The End
