
Perceptron-based Global Confidence Estimation for Value Prediction


Presentation Transcript


  1. Perceptron-based Global Confidence Estimation for Value Prediction Master’s Thesis Michael Black June 26, 2003

  2. Thesis Objectives • To present a viable global confidence estimator using perceptrons • To quantify predictability relationships between instructions • To study the performance of the global confidence estimator when used with common value prediction methods

  3. Presentation Outline • Background: • Data Value Prediction • Confidence Estimation • Predictability Relationships • Perceptrons • Perceptron-based Confidence Estimator • Experimental Results and Conclusions

  4. Value Locality Suppose instruction 1 has been executed several times before:
  I1: 5 (A) = 3 (B) + 2 (C)
  I1: 6 (A) = 4 (B) + 2 (C)
  I1: 7 (A) = 5 (B) + 2 (C)
  Next time, its outcome A will probably be 8.

  5. Data Value Prediction • A data value predictor predicts A from instruction 1’s past outcomes • Instruction 2 speculatively executes using the prediction
  Past execution: I1: ADD 7 (A) = 5 (B) + 2 (C)
  Current execution: I1: ADD A = 6 (B) + 2 (C); the stride predictor (+1) predicts A = 8
  I2: ADD D = 5 (E) + 8 (A) executes speculatively using the predicted A

  6. Types of Value Predictors • Computational: Performs a mathematical operation on past values • Last-Value: 5, 5, 5, 5 → 5 • Stride: 1, 3, 5, 7 → 9 • Context: Learns repeating sequences of numbers: 3, 6, 5, 3, 6, 5, 3 → 6
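To make the prediction rules concrete, here is a minimal Python sketch of the stride rule; a last-value predictor is the degenerate case whose stride never leaves zero, and a context predictor would instead need a table mapping recent value sequences to next values. The class shape is illustrative only, not the thesis hardware:

```python
class StridePredictor:
    """Predicts the last value plus the last observed stride."""
    def __init__(self):
        self.last = None    # most recent outcome of this instruction
        self.stride = 0     # difference between the last two outcomes

    def predict(self):
        return None if self.last is None else self.last + self.stride

    def update(self, value):
        if self.last is not None:
            self.stride = value - self.last
        self.last = value

p = StridePredictor()
for v in (1, 3, 5, 7):
    p.update(v)
print(p.predict())  # 9, matching the slide's stride example
```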

  7. Types of Value History • Local History: Predicts using data from past instances of instructions • Global History: Predicts using data from other instructions Local value prediction is more conventional

  8. Are mispredictions a problem? • If a prediction is incorrect, speculatively executed instructions must be re-executed • This can result in: • Cycle penalties for detecting the misprediction • Cycle penalties for restarting dependent instructions • Incorrect resolution of dependent branch instructions It is better to not predict at all than to mispredict

  9. Confidence Estimator • Decides whether to make a prediction for an instruction • Bases decisions on the accuracy of past predictions • Common confidence estimation method: Saturating Up-Down Counter

  10. Up-Down Counter [State diagram] A saturating counter with three “Don’t Predict” states and one “Predict” state. Each correct prediction moves the counter up one state (saturating at the top); each incorrect prediction moves it down (saturating at the bottom). The threshold sits between the highest “Don’t Predict” state and the “Predict” state, so predictions are made only once the counter saturates.
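In software terms, the state machine behaves like the following sketch; the four-state layout with prediction only in the saturated state follows the diagram, while the starting state and method names are assumptions:

```python
class UpDownCounter:
    """2-bit saturating counter: three Don't Predict states, one Predict state."""
    def __init__(self):
        self.count = 0              # 0..3; lowest state assumed as the start

    def should_predict(self):
        return self.count == 3      # predict only in the saturated state

    def update(self, prediction_correct):
        if prediction_correct:
            self.count = min(self.count + 1, 3)   # Correct: move up, saturate
        else:
            self.count = max(self.count - 1, 0)   # Incorrect: move down, saturate
```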

  11. Local vs. Global • Up-Down counter is local • Only past instances of an instruction affect its counter • Global confidence estimation uses the prediction accuracy (“predictability”) of past dynamic instructions • Problem with global: • Not every past instruction affects the predictability of the current instruction

  12. Example
  I1: A = B + C
  I2: F = G – H
  I3: E = A + A
  • Instruction 3 depends on 1 but not on 2 • Instruction 3’s predictability is related to 1 but not 2 • If instruction 1 is predicted incorrectly, instruction 3 will also be predicted incorrectly

  13. Is global confidence worthwhile? • Fewer mispredictions than local • If an instruction mispredicts, its dependent instructions know not to predict • Less warm-up time than local • Instructions need not be executed several times before accurate confidence decisions can be made

  14. How common are predictability relationships? Simulation study: • How many instructions in a program predict correctly only when a previous instruction predicts correctly? • Which past instructions have the most influence?

  15. Predictability Relationships Over 70% of instructions under Stride and Last-Value prediction, and over 90% under Context prediction, have the same prediction accuracy as some past instruction at least 90% of the time.

  16. Predictability Relationships The most recent 10 instructions have the most influence

  17. Global Confidence Estimation A global confidence estimator must: • Identify for each instruction which past instructions have similar predictability • Use their prediction accuracy to decide whether to predict or not predict

  18. Neural Network • Used to iteratively learn unknown functions from examples • Consists of nodes and links • Each link has a numeric weight • Data is fed to input nodes and propagated to output nodes by the links • Desired output used to adjust (“train”) the weights

  19. Perceptron • Perceptrons only have input and output nodes • They are much easier to implement and train than larger neural networks • Can only learn linearly separable functions

  20. Perceptron Computation • Each bit of input data is fed to an input node • The dot product is calculated between the input data and the weights • The output is “1” if the dot product exceeds a threshold; otherwise “0”
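A one-function sketch of this computation; the threshold default of zero is a placeholder, not a value taken from the thesis:

```python
def perceptron_output(inputs, weights, threshold=0):
    """Output 1 if the dot product of inputs and weights exceeds the threshold."""
    dot = sum(x * w for x, w in zip(inputs, weights))
    return 1 if dot > threshold else 0
```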

  21. Perceptron Training • Weights adjusted so that the perceptron output = the desired output for the given input • Error value (ε) = desired value – perceptron output • ε times each input bit added to each weight
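The training rule on this slide translates line for line into code; the zero threshold is again an assumption carried over from the previous sketch:

```python
def perceptron_train(inputs, weights, desired, threshold=0):
    """Move each weight by (desired - output) times its input bit."""
    dot = sum(x * w for x, w in zip(inputs, weights))
    output = 1 if dot > threshold else 0
    error = desired - output                 # the slide's epsilon
    return [w + error * x for x, w in zip(inputs, weights)]
```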

  22. Weights • Weights determine the effect of each input on the output • Positive weight: Output varies directly with input bit • Negative weight: Output varies inversely with input bit • Large weight: Input has strong effect on output • Zero weight: Input bit has no effect on output

  23. Linear Separability • An input may have a direct influence on the output • An input may instead have an inverse influence on the output • But an input cannot have a direct influence sometimes and an inverse influence at other times
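XOR is the textbook example of such a function: whether the first input pushes the output up or down depends on the second input. A self-contained sketch (the ±1 input encoding is assumed here) shows the training rule failing to converge on it:

```python
def step(x, w):
    """Perceptron output with threshold 0."""
    return 1 if sum(a * b for a, b in zip(x, w)) > 0 else 0

xor_cases = [((1, 1), 0), ((1, -1), 1), ((-1, 1), 1), ((-1, -1), 0)]
w = [0.0, 0.0]
for _ in range(100):                       # repeated perceptron training
    for x, desired in xor_cases:
        error = desired - step(x, w)
        w = [wi + error * xi for xi, wi in zip(x, w)]
print([step(x, w) for x, _ in xor_cases])  # [0, 0, 0, 0]: two cases stay wrong
```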

  24. Perceptron Confidence Estimator • Each input node is a past instruction’s prediction outcome (1 = correct, –1 = incorrect) • The output is the decision to predict (1 = predict, 0 = don’t predict) • Weights capture each past instruction’s influence on the current instruction’s predictability: • Positive weight: the current instruction mispredicts when the past instruction mispredicts • Negative weight: the current instruction mispredicts when the past instruction predicts correctly • Zero weight: the past instruction does not affect the current one
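A behavioral sketch of how these pieces might fit together in software, assuming one perceptron per instruction address and a shared global history of the ten most recent prediction outcomes (the history length echoes the earlier simulation result; the table organization and sizes are placeholders, not the thesis design):

```python
class PerceptronConfidenceEstimator:
    """One perceptron per PC over the global history of prediction outcomes."""
    def __init__(self, history_length=10):
        self.history = [1] * history_length   # +1 = correct, -1 = incorrect
        self.weights = {}                     # PC -> [bias, w1, ..., wk]

    def _weights_for(self, pc):
        return self.weights.setdefault(pc, [0] * (len(self.history) + 1))

    def should_predict(self, pc):
        w = self._weights_for(pc)
        dot = w[0] + sum(x * wi for x, wi in zip(self.history, w[1:]))
        return dot > 0

    def update(self, pc, prediction_correct):
        w = self._weights_for(pc)
        desired = 1 if prediction_correct else 0
        error = desired - (1 if self.should_predict(pc) else 0)
        w[0] += error                          # bias input is always 1
        for i, x in enumerate(self.history):
            w[i + 1] += error * x
        # shift this instruction's outcome into the shared global history
        self.history = self.history[1:] + [1 if prediction_correct else -1]
```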

  25. Perceptron Confidence Estimator Example weights:
  bias: weight = –1
  I1: A = B ∘ C, weight = 1
  I2: D = E + F, weight = 1
  I3: P = Q ∘ R, weight = 0
  I4: G = A + D (current instruction)
  Instruction 4 predicts correctly only when 1 and 2 predict correctly.
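Plugging the slide's weights into the dot product (bias input fixed at 1, threshold 0 assumed) confirms the stated behavior:

```python
bias, w1, w2, w3 = -1, 1, 1, 0                     # weights from the slide
both_correct = bias + w1 * 1 + w2 * 1 + w3 * 1     # = 1  -> predict
i1_incorrect = bias + w1 * -1 + w2 * 1 + w3 * 1    # = -1 -> don't predict
# I3's outcome never matters: its weight is 0
```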

  26. Confidence Estimator Organization

  27. Perceptron Implementation

  28. Weight Value Distribution Simulation Study: • What are typical perceptron weight values? • How does the type of predictor influence the weight distribution? • What minimum range do the weights need to have?

  29. Weight Value Distribution

  30. Simulation Methodology • Measurements simulated using SimpleScalar 2.0a • SPEC2000 benchmarks: bzip2, gcc, gzip, perlbmk, twolf, vortex • Each benchmark is run for 500 million instructions • Value predictors: Stride, Last-Value, Context • Baseline confidence estimator: 2-bit up-down counter

  31. Simulation Metrics • P_CORRECT: number of correct predictions • P_INCORRECT: number of incorrect predictions • N: number of cases where no prediction was made
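The slide gives only the raw counts; the coverage and accuracy percentages quoted on the results slides are conventionally derived from such counts as below (standard definitions, assumed rather than copied from the thesis):

```python
def coverage(p_correct, p_incorrect, n):
    """Fraction of eligible instructions for which a prediction was attempted."""
    return (p_correct + p_incorrect) / (p_correct + p_incorrect + n)

def accuracy(p_correct, p_incorrect):
    """Fraction of attempted predictions that were correct."""
    return p_correct / (p_correct + p_incorrect)
```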

  32. Stride Results Perceptron estimator shows a coverage increase of 8.2% and an accuracy increase of 2.7% over the up-down counter

  33. Last-Value Results Perceptron estimator shows a coverage increase of 10.2% and an accuracy increase of 5.9% over the up-down counter

  34. Context Results Perceptron estimator shows a coverage increase of 6.1% and an accuracy decrease of 2.9% over the up-down counter

  35. Sensitivity to GPH size

  36. Coverage Sensitivity to the Unavailability of Past Instructions

  37. Accuracy Sensitivity to the Unavailability of Past Instructions

  38. Coverage Sensitivity to Weight Range Limitations

  39. Accuracy Sensitivity to Weight Range Limitations

  40. Conclusions • Mispredictions are a problem in data value prediction • Benchmark programs exhibit strong predictability relationships between instructions • Perceptrons enable confidence estimators to exploit these predictability relationships • Perceptron-based confidence estimation tends to show significant improvement over up-down counter confidence estimation
