WuKong : Automatically Detecting and Localizing Bugs that Manifest at Large System Scales

Presentation Transcript


  1. WuKong: Automatically Detecting and Localizing Bugs that Manifest at Large System Scales. Bowen Zhou, Jonathan Too, Milind Kulkarni, Saurabh Bagchi, Purdue University

  2. Ever Changing Behavior of Software • Software has to adapt to different platforms, inputs, and configurations. • As a side effect, whether a bug manifests may depend on a particular platform, input, or configuration.

  3. Ever Changing Behavior of Software

  4. Software Development Process • Develop a new feature and its unit tests • Test the new feature on a local machine • Not tested in production systems! • Push the feature into production systems • Production systems break • Roll back the feature

  6. Bugs in Production Runs • Properties • Remain unnoticed when the application is tested on a developer's workstation • Break the production system when the application runs on a cluster and/or serves real user requests • Examples • Configuration error • Integer overflow • Focus: Scale-Dependent Bugs

  8. Modeling Program Behavior for Finding Bugs • Known as statistical debugging [Bronevetsky DSN '10] [Mirgorodskiy SC '06] [Chilimbi ICSE '09] [Liblit PLDI '03] • Represents program behavior as a set of features that can be measured at runtime • Builds a model that describes and predicts the features, based on data collected from many runs • Detects abnormal features that deviate from the model's prediction beyond a certain threshold • Limitation: does not account for scale-induced variation in program behavior

  9. Modeling Scale-dependent Behavior [Figure: number of times a loop executes vs. run number, for training runs and production runs] Is there a bug in one of the production runs?

  10. Modeling Scale-dependent Behavior [Figure: number of times a loop executes vs. scale, for training runs and production runs] Accounting for scale makes trends clear and errors at large scales obvious.

  12. Modeling Scale-dependent Behavior • Our Previous Efforts • Vrisha [HPDC '11] • Builds a collective model for all features of a program to detect bugs at any feature • Abhranta [HotDep '12] • Tweaks Vrisha's model to allow per-feature bug detection and localization They have limitations...

  13. Modeling Scale-dependent Behavior • Big gap in scale • e.g. training runs on up to 128 nodes, production runs on 1024 nodes • Noisy features • Too many false positives render the model useless

  14. Reconstructing Scale-dependent Behavior: the WuKong way • Covers a wide range of program features • Predicts the expected value in a large-scale run for each feature separately • Prunes unpredictable features to improve localization quality • Provides a shortlist of suspicious features in its localization roadmap

  15. The Workflow [Figure: in training, runs 1 to N of the application (APP) at various scales are instrumented with Pin to collect each run's scale and features, which are used to build a model; in production, a run's scale and features are collected the same way and compared against the model's prediction (= ?)]

  16. Feature Collection

  18. Features considered by WuKong (the numbered conditionals and loop, 1 to 4)
      void foo(int a) {
          if (a > 0) {              /* 1 */
          } else {
          }
          if (a > 100) {            /* 2 */
              int i = 0;
              while (i < a) {       /* 3 */
                  if (i % 2 == 0) { /* 4 */
                  }
                  ++i;
              }
          }
      }
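As a rough source-level illustration of what these features count, the C sketch below hand-instruments foo() with counters. It is only a stand-in built on assumptions: the feature_count array and the choice to count each taken branch and each loop iteration are hypothetical, not WuKong's actual instrumentation.

```c
#include <assert.h>

/* Hypothetical per-feature counters standing in for what an
 * instrumentation tool might record; index 0 is unused, and
 * indices 1..4 follow the feature numbers on the slide. */
enum { NUM_FEATURES = 5 };
static unsigned long feature_count[NUM_FEATURES];

/* foo() from the slide, with a counter bumped each time a numbered
 * conditional is taken or the loop body runs (an assumed feature
 * definition, chosen for illustration). */
void foo(int a) {
    if (a > 0) {
        feature_count[1]++;             /* feature 1: true branch taken */
    } else {
        /* else branch not counted in this sketch */
    }
    if (a > 100) {
        feature_count[2]++;             /* feature 2: true branch taken */
        int i = 0;
        while (i < a) {
            feature_count[3]++;         /* feature 3: one loop iteration */
            if (i % 2 == 0) {
                feature_count[4]++;     /* feature 4: true branch taken */
            }
            ++i;
        }
    }
    assert(feature_count[4] <= feature_count[3]);  /* sanity check */
}
```

After foo(200), features 3 and 4 read 200 and 100, while a later foo(50) only bumps feature 1. If a grew with the number of processes, those counts would grow with scale, which is the kind of behavior a scale-aware model has to predict.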

  19. Modeling

  20. Predict Feature from Scale • X ~ vector of scale parameters X1...XN • Y ~ number of times a particular feature occurs • The model to predict Y from X: • Compute the prediction error:
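The slide's model and error equations are not reproduced in this transcript. Purely to illustrate the idea of predicting a feature from scale, the sketch below assumes a single scale parameter X and a plain linear least-squares fit Y = b0 + b1 * X, with relative error |predicted - observed| / observed (so 1.0 means 100%, the unit used by the pruning rule later in the talk). The struct and function names are hypothetical, and WuKong's actual model may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed model form for one feature: Y = b0 + b1 * X. */
typedef struct {
    double b0;  /* intercept */
    double b1;  /* slope with respect to the scale parameter */
} scale_model;

/* Ordinary least-squares fit of feature counts y[k] against scale x[k]
 * over n training runs; needs at least two distinct scales. */
scale_model fit_scale_model(const double *x, const double *y, size_t n) {
    assert(n >= 2);
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
    for (size_t k = 0; k < n; k++) {
        sx += x[k];
        sy += y[k];
        sxx += x[k] * x[k];
        sxy += x[k] * y[k];
    }
    double dn = (double)n;
    double denom = dn * sxx - sx * sx;
    assert(denom != 0.0);  /* fails if every training run used the same scale */
    scale_model m;
    m.b1 = (dn * sxy - sx * sy) / denom;
    m.b0 = (sy - m.b1 * sx) / dn;
    return m;
}

/* Expected feature count at scale x. */
double predict(scale_model m, double x) {
    return m.b0 + m.b1 * x;
}

/* Relative prediction error; 1.0 corresponds to a 100% error. */
double prediction_error(scale_model m, double x, double observed) {
    assert(observed != 0.0);
    double diff = predict(m, x) - observed;
    if (diff < 0.0) diff = -diff;
    return diff / observed;
}
```

Fitting training points at scales 8, 16, 32, 64, and 128 that lie on Y = 2X + 10 yields b0 = 10 and b1 = 2, so a run at X = 1024 is expected to show Y = 2058; observing 1029 instead gives an error of 1.0 (100%).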

  22. Bug Localization

  23. Locate Buggy Features • First, determine whether the production run is buggy (detection): the run is flagged if, for some feature i, the error of feature i in the production run exceeds a threshold determined by a constant parameter and the maximum error of feature i over all training runs • If the run is buggy, examine the prediction error of each feature • Rank all features by their prediction error to provide a localization roadmap containing the top N features
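The detection and ranking steps can be sketched in C as follows. The slide names the inputs (the production error of feature i, a constant parameter, and feature i's maximum training error) but not the exact comparison, so the multiplicative threshold eta * max_train_err[i] used here is an assumption; eta, feature_error, run_is_buggy, and localization_roadmap are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Production-run error of one feature, tagged with its index. */
typedef struct {
    size_t feature;
    double error;
} feature_error;

/* Detection (assumed form): flag the run if any feature's production
 * error exceeds eta times its maximum error over all training runs. */
int run_is_buggy(const double *prod_err, const double *max_train_err,
                 size_t num_features, double eta) {
    for (size_t i = 0; i < num_features; i++) {
        if (prod_err[i] > eta * max_train_err[i]) return 1;
    }
    return 0;
}

/* qsort comparator: larger error sorts first. */
static int by_error_desc(const void *a, const void *b) {
    double ea = ((const feature_error *)a)->error;
    double eb = ((const feature_error *)b)->error;
    return (ea < eb) - (ea > eb);
}

/* Localization roadmap: write the top_n features with the largest
 * production error into out, in descending order; returns the count. */
size_t localization_roadmap(const double *prod_err, size_t num_features,
                            size_t top_n, feature_error *out) {
    if (num_features == 0) return 0;
    feature_error *all = malloc(num_features * sizeof *all);
    assert(all != NULL);
    for (size_t i = 0; i < num_features; i++) {
        all[i].feature = i;
        all[i].error = prod_err[i];
    }
    qsort(all, num_features, sizeof *all, by_error_desc);
    size_t n = top_n < num_features ? top_n : num_features;
    for (size_t k = 0; k < n; k++) out[k] = all[k];
    free(all);
    return n;
}
```

With production errors {0.05, 2.5, 0.3, 1.2}, training maxima {0.10, 0.20, 0.25, 0.30}, and eta = 1.5, the run is flagged (feature 1's error of 2.5 exceeds 0.30), and the top-2 roadmap lists features 1 and 3.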

  24. Improve Localization Quality by Feature Pruning

  25. Noisy Feature Pruning • Some features cannot be effectively predicted by the above model because they are • Random • Not scale-determined • Discontinuous • The trade-off • Keeping these features would pollute the diagnosis by pushing real faults down the list • Removing these features could miss some faults if a fault happens to lie in such a feature

  26. Noisy Feature Pruning • How to remove them? For each feature: • Perform cross-validation on the training runs • Remove the feature if it triggers a prediction error greater than 100% in more than (100-x)% of training runs • The parameter x > 0 tolerates outliers in the training runs
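The pruning rule can be written down directly. This sketch assumes the cross-validation errors for one feature have already been computed, with cv_err[k] being the feature's relative error on held-out training run k (a leave-one-out style setup we assume here); should_prune and x_percent are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Prune a feature if its cross-validated prediction error exceeds
 * 100% (1.0) in more than (100 - x)% of the training runs.
 * cv_err[k]: the feature's relative error when run k is held out
 * (assumed cross-validation scheme). */
int should_prune(const double *cv_err, size_t num_runs, double x_percent) {
    assert(num_runs > 0);
    assert(x_percent > 0.0 && x_percent < 100.0);
    size_t bad = 0;
    for (size_t k = 0; k < num_runs; k++) {
        if (cv_err[k] > 1.0) bad++;  /* greater-than-100% error */
    }
    double bad_percent = 100.0 * (double)bad / (double)num_runs;
    return bad_percent > 100.0 - x_percent;
}
```

With x = 5 and 10 training runs, a feature is pruned only if all 10 runs show errors above 100% (100% > 95%); with 9 such runs (90%) it is kept, while raising x to 20 would prune it.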

  27. Evaluation • Fault injection in Sequoia AMG2006 • Up to 1024 processes • Randomly selected conditionals are flipped • Two case studies • Integer overflow in an MPI library • Deadlock in a P2P file sharing application

  29. Fault Injection Study • Fault • Injected at process 0 • A randomly picked feature is flipped • Data • Training (w/o fault): 110 runs, 8-128 processes • Production (w/ fault): 100 runs, 1024 processes

  30. Fault Injection Study • Result • Total: 100 • Non-crashing: 57 • Detected: 53 • Localized: 49 • Successfully localized: 92.5% of detected faults
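The reported rate is consistent with dividing localized faults by detected faults; the same counts also give the detection rate among non-crashing runs (a derived figure, not stated on the slide):

```latex
\frac{\text{localized}}{\text{detected}} = \frac{49}{53} \approx 0.925 = 92.5\%,
\qquad
\frac{\text{detected}}{\text{non-crashing}} = \frac{53}{57} \approx 0.930 = 93.0\%
```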

  35. Case Study: A Deadlock in Transmission's DHT Implementation • Features 53, 66

  36. Conclusion • Debugging scale-dependent program behavior is a difficult and important problem • WuKong incorporates the scale of a run into a predictive model for each individual program feature, enabling accurate bug diagnosis • We demonstrated the effectiveness of WuKong through a large-scale fault injection study and two case studies of real bugs

  37. Q&A • bzhou@purdue.edu

  38. Backup

  39. Runtime Overhead • Geometric mean: 11.4%
