
Logistic Regression and Perceptron Prediction of Instruction Branches

Logistic Regression and Perceptron Prediction of Instruction Branches. Joshua Ferguson. Overview. Motivation Branch Prediction background Machine Learning background Methodology Results. Motivation.


Presentation Transcript


  1. Logistic Regression and Perceptron Prediction of Instruction Branches Joshua Ferguson

  2. Overview • Motivation • Branch Prediction background • Machine Learning background • Methodology • Results

  3. Motivation • CPUs account for around 30% of server power usage while idle, and that percentage scales up with utilization.* • Instruction branch misprediction causes unnecessary instruction execution on the CPU. • A simple experiment on an Intel M 1.6 GHz CPU found that approximately 8% of branch instructions were mispredicted, even while idle. *Luiz André Barroso and Urs Hölzle, "The Case for Energy-Proportional Computing," IEEE, 2007.

  4. Branch Prediction [Diagram: a workload passing through the L3, L2, and L1 caches and the registers to produce results]

  5. Branch Prediction cont… • If-then statements throw this off. • By default, the CPU executes whichever branch it predicts will be taken. • Common techniques involve a simple buffer of recent memory. • Others use limited pattern matchers. [Figure: example branch history T N T N T T, where T = branch taken and N = branch not taken]
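To make the idea of a small predictor concrete, here is a minimal sketch of a two-bit saturating counter, one widely used textbook scheme. The slide does not say which technique it had in mind, so this is an illustrative assumption; the class name, the helper name, and the starting state are hypothetical choices.

```python
class TwoBitCounter:
    """Two-bit saturating counter: states 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, state=1):
        # Assumption: start in the "weakly not taken" state.
        self.state = state

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def misprediction_rate(outcomes):
    """Fraction of outcomes (True = taken) that a single counter mispredicts."""
    counter = TwoBitCounter()
    misses = 0
    for taken in outcomes:
        if counter.predict() != taken:
            misses += 1
        counter.update(taken)
    return misses / len(outcomes)


# The T/N pattern from the slide, encoded as booleans.
slide_pattern = [c == "T" for c in "TNTNTT"]
slide_rate = misprediction_rate(slide_pattern)
```

An alternating pattern like the one on the slide is close to a worst case for this kind of counter, which is one reason pattern-matching predictors exist.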

  6. Machine Learning • The CPU is trying to learn patterns, so why not use modern machine learning techniques? • Most scale poorly, especially under the resource constraints that CPUs have. • Nonetheless, I wanted to try a few out.

  7. Machine Learning cont… • Logistic Regression • Perceptron
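For reference, the standard textbook forms of the two models are given below. The symbols are not from the slides: here x is assumed to be the feature vector for one instruction, w a weight vector, b a bias, and y = 1 means "branch taken".

```latex
\begin{align}
\text{Logistic regression:}\quad
  & P(y = 1 \mid \mathbf{x}) = \sigma(\mathbf{w}^{\top}\mathbf{x} + b),
  \qquad \sigma(z) = \frac{1}{1 + e^{-z}} \\
\text{Perceptron:}\quad
  & \hat{y} =
  \begin{cases}
    1 & \text{if } \mathbf{w}^{\top}\mathbf{x} + b > 0 \\
    0 & \text{otherwise}
  \end{cases}
\end{align}
```

Both models score an instruction with the same linear function. Logistic regression turns the score into a probability, while the perceptron thresholds it directly.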

  8. Methodology • Generate workload • Trace CPU metrics • Analyze and rank ML algorithms

  9. Methodology cont… Generate Workload • Jakart – Java-based HTTP request suite. Runs scripts of HTTP requests. • Scripts aren't very customizable, and would make patterns painfully obvious

  10. Methodology cont… Generate Workload • SPECPower – Perfect solution • Provides interesting variation in CPU workload

  11. Methodology cont… Trace CPU metrics • Intel – VTune • Only provides graphs and summary data, no trace for research • Performance Profiling for Machine Learning • Abandoned project, only runs on Pentium 4s • AMD – CodeAnalyst • Only provides summary data, no trace

  12. Methodology: Trace CPU metrics • Performance API (PAPI) • University of Tennessee, Knoxville • Library of calls to manufacturer-specific registers that store information like: • # of branch instructions encountered • Branches mispredicted • L1/L2 cache miss/hit/access

  13. Methodology: Trace CPU metrics • Unfortunately, limited to the resolution of the hardware's sleep counter. • Hundreds of branches would pass between each measurement. • Capabilities for any specific CPU can vary. Main.c: pthread_t BRCN; struct thread_args BRCN_args; BRCN_args.metric_type = PAPI_BR_CN; pthread_create(&BRCN, NULL, papi_thread, (void *)&BRCN_args); PAPI_thread.c: PAPI_read_counters();

  14. Methodology: Trace CPU metrics • Journal of Instruction-Level Parallelism hosts public traces with data values and memory addresses. • Traces from Int and FP operations, as well as a WebServer workload

  15. Analysis • Prepare data • Bit-shifted instruction addresses, so only high-level info remains • Unsigned int • Whether each instruction is a branch, call, or return • Booleans • If it branches, the bit-shifted target address • Boolean and unsigned int
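The preparation step above can be sketched as follows. The slides do not give the shift amount, the address width, or the record layout, so `ADDRESS_SHIFT`, the 16-bit mask, and all field names here are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical: the slides do not state how many low bits were dropped.
ADDRESS_SHIFT = 16


@dataclass
class TraceRecord:
    """One instruction from a trace (field names are illustrative)."""
    address: int
    is_branch: bool
    is_call: bool
    is_return: bool
    taken: bool
    target: int


def prepare(rec, shift=ADDRESS_SHIFT):
    """Drop low address bits and keep the instruction-type flags."""
    coarse_address = (rec.address >> shift) & 0xFFFF
    # The target address is only meaningful when the branch was taken.
    coarse_target = ((rec.target >> shift) & 0xFFFF) if rec.taken else 0
    return {
        "address": coarse_address,
        "is_branch": rec.is_branch,
        "is_call": rec.is_call,
        "is_return": rec.is_return,
        "taken": rec.taken,
        "target": coarse_target,
    }


example = prepare(TraceRecord(0x00401A2C, True, False, False, True, 0x00402000))
```

Shifting away the low bits makes nearby instructions share one coarse address, which keeps each value small at the cost of distinguishing individual branches.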

  16. Analysis cont… • Train each algorithm on a subset of the data, then test for error rate on the main data file • Logistic regression must train offline. • Trained on 10,000 samples. Tested on 40,000. • Perceptron can train online • Keeps a running buffer of the past 100 values • Requires a buffer size of (4*Boolean + 2*uint16)*100 • 3.6k bits (36 bits per entry, counting each Boolean as 1 bit; about 450 bytes)
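A minimal sketch of the offline train-then-test procedure for logistic regression follows, using plain stochastic gradient descent on synthetic data. The optimizer, learning rate, epoch count, and toy data are all assumptions, since the slides do not describe the actual implementation. Only the 1:4 train/test ratio mirrors the 10,000/40,000 split.

```python
import math
import random


def sigmoid(z):
    # Clamp z so math.exp cannot overflow on extreme inputs.
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(samples, labels, epochs=200, lr=0.1):
    """Fit weights and a bias by stochastic gradient descent on the log loss."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = p - y  # gradient of the log loss with respect to the score
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def error_rate(weights, bias, samples, labels):
    """Fraction of samples whose thresholded prediction disagrees with the label."""
    wrong = 0
    for x, y in zip(samples, labels):
        score = sum(w * xi for w, xi in zip(weights, x)) + bias
        wrong += int((sigmoid(score) >= 0.5) != bool(y))
    return wrong / len(samples)


def make_synthetic(n, rng):
    """Toy data: 'taken' when the first feature exceeds the second."""
    samples = [[rng.random(), rng.random()] for _ in range(n)]
    labels = [int(x[0] > x[1]) for x in samples]
    return samples, labels


rng = random.Random(0)
train_x, train_y = make_synthetic(200, rng)   # small training subset
test_x, test_y = make_synthetic(800, rng)     # larger held-out set
weights, bias = train_logistic(train_x, train_y)
test_error = error_rate(weights, bias, test_x, test_y)
```

Because the weights are frozen after training, this model cannot adapt when a trace drifts away from the training subset. That is the practical meaning of "must train offline" here.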

  17. Analysis cont… Baselines • Running history buffer • Choose the statistically likely outcome • If 25%, 50%, or 75% of the history took the branch, then predict taken • Previous outcome • If the last branch was taken, then predict taken; otherwise predict not taken.
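The two baselines can be sketched as below. Reading the 25%, 50%, and 75% figures as alternative thresholds on the fraction of taken branches in the buffer is an interpretation of the slide, and the function names and the initial "not taken" guess are assumptions.

```python
from collections import deque


def history_predictor_error(outcomes, history_len, threshold):
    """Predict taken when at least `threshold` of the buffered outcomes were taken."""
    history = deque(maxlen=history_len)  # oldest outcomes fall off automatically
    misses = 0
    for taken in outcomes:
        fraction_taken = sum(history) / len(history) if history else 0.0
        prediction = fraction_taken >= threshold
        misses += int(prediction != taken)
        history.append(taken)
    return misses / len(outcomes)


def previous_outcome_error(outcomes):
    """Predict that each branch repeats the previous outcome."""
    previous = False  # assumed initial guess: not taken
    misses = 0
    for taken in outcomes:
        misses += int(previous != taken)
        previous = taken
    return misses / len(outcomes)


# The T/N pattern shown on the slides, encoded as booleans.
pattern = [c == "T" for c in "TNTNTT"]
last_outcome_rate = previous_outcome_error(pattern)
majority_rate = history_predictor_error(pattern, history_len=4, threshold=0.5)
```

Sweeping `history_len` and `threshold` over a trace gives error-versus-buffer-length curves of the kind the baseline results slide plots.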

  18. Results: Baseline [Plots: error % versus buffer history length for the floating point workload and the integer workload]

  19. Results: Logistic Regression [Plots: error % versus epsilon value for the integer workload trace and the floating point workload trace; a higher epsilon means a more accurate match with the training data]

  20. Results: Perceptron • Flat 33.9% error rate using the inventor's algorithm (Rosenblatt). • A disappointing result, especially for an online algorithm. • No capability to really change how accurately it fits the training data, thus causing the model to lose generality.
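For context, a minimal sketch of Rosenblatt's perceptron rule applied online is shown below: predict first, then adjust the weights only on a mistake. It updates on each sample as it arrives rather than reproducing the 100-entry buffer from the analysis, and the learning rate and toy data are assumptions.

```python
def perceptron_online(samples, labels, lr=1.0):
    """Run the perceptron online; return (error rate, weights, bias)."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    misses = 0
    for x, y in zip(samples, labels):
        activation = sum(w * xi for w, xi in zip(weights, x)) + bias
        prediction = 1 if activation > 0 else 0
        if prediction != y:
            misses += 1
            step = lr * (y - prediction)  # +lr for a missed 1, -lr for a missed 0
            weights = [w + step * xi for w, xi in zip(weights, x)]
            bias += step
    return misses / len(samples), weights, bias


# Toy stream: the label equals the first feature, so the data is linearly separable.
toy_samples = [[1, 0], [0, 1]] * 50
toy_labels = [1, 0] * 50
toy_error, toy_weights, toy_bias = perceptron_online(toy_samples, toy_labels)
```

The update has no regularization or margin parameter to tune, which is consistent with the slide's remark that there is no way to adjust how tightly the model fits the data.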

  21. Final Thoughts • Obtaining solid CPU traces is commonly done in the literature using AIX, an IBM proprietary OS. For research in this area, this OS seems a necessity. • Implementing logistic regression in a low-level enough language to execute effectively is a challenge. • SPECPower can be combined with PAPI to test higher-level workload learners, possibly existing at the OS level and controlling ACPI states, rather than just branch prediction in the register. Thanks!
