
Introduction to Defect Prediction

Cmpe 589, Spring 2008.




  1. Introduction to Defect Prediction Cmpe 589 Spring 2008

  2. Problem 1 • How can we tell whether the project is on schedule and within budget? • Earned-value charts

  3. Problem 2 • How hard will it be for another organization to maintain this software? • McCabe Complexity
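McCabe's cyclomatic complexity is computed from a module's control-flow graph as V(G) = E - N + 2P (edges, nodes, connected components); for structured code with a single entry and exit it equals the number of binary decisions plus one. The C sketch below only encodes these two textbook formulas; the function names are illustrative, and building the control-flow graph itself is left out.

```c
/* McCabe's cyclomatic complexity from a control-flow graph:
 * V(G) = E - N + 2P, with E edges, N nodes and P connected components
 * (P = 1 for a single function). */
int cyclomatic_complexity(int edges, int nodes, int components)
{
    return edges - nodes + 2 * components;
}

/* Shortcut for structured, single-entry/single-exit code:
 * one plus the number of binary decision points (if, while, for, ...). */
int cyclomatic_from_decisions(int decisions)
{
    return decisions + 1;
}
```

For example, a function whose body is a single if statement without else has a three-node graph (the block ending in the test, the then-branch, the code after it) with three edges, so both cyclomatic_complexity(3, 3, 1) and cyclomatic_from_decisions(1) give 2.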

  4. Problem 3 • How can we tell when the subsystems are ready to be integrated? • Defect density metrics
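Defect density is conventionally reported as defects per thousand lines of code (KLOC). A minimal C sketch follows; the integration check compares a subsystem's density against a threshold, and both that helper and any threshold value are hypothetical, since the slide does not say how a readiness limit is chosen.

```c
/* Defect density in defects per KLOC.
 * Returns a negative value when the size is not positive (undefined density). */
double defect_density(int defects, int loc)
{
    if (loc <= 0)
        return -1.0;
    return defects / (loc / 1000.0);
}

/* Hypothetical readiness rule: a subsystem is ready for integration when its
 * density is defined and does not exceed a project-chosen maximum. */
int ready_for_integration(double density, double max_density)
{
    return density >= 0.0 && density <= max_density;
}
```

For instance, 6 defects in 3,000 lines gives a density of 2.0 defects/KLOC, which would pass a (hypothetical) limit of 2.5.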

  5. Problem Definition • Software development lifecycle: • Requirements • Design • Development • Test (takes ~50% of overall time) • Detect and correct defects before delivering software • Test strategies: • Expert judgment • Manual code reviews • Oracles/predictors as secondary tools

  6. Problem Definition

  7. Testing

  8. Defect Prediction • 2-class classification problem: • Non-defective: if error = 0 • Defective: if error > 0 • Two things are needed: • Raw data: source code • Software metrics -> static code attributes
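The labeling rule on this slide maps each module's recorded error count to one of the two classes. A small C sketch, assuming error counts are available per module as plain integers (the enum and function names are illustrative):

```c
#include <stddef.h>

/* The two classes of the defect prediction problem. */
enum defect_label { NON_DEFECTIVE = 0, DEFECTIVE = 1 };

/* Non-defective if the module's error count is 0, defective if it is > 0. */
enum defect_label label_module(int error_count)
{
    return error_count > 0 ? DEFECTIVE : NON_DEFECTIVE;
}

/* Labels n modules and returns how many of them are defective. */
size_t label_modules(const int *error_counts, enum defect_label *labels, size_t n)
{
    size_t defective = 0;
    for (size_t i = 0; i < n; i++) {
        labels[i] = label_module(error_counts[i]);
        defective += (labels[i] == DEFECTIVE);
    }
    return defective;
}
```

The returned count also gives the dataset's defect rate (defective / n), which matters when deciding how much training data is needed.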

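  9. Static Code Attributes • Sample code, shown with the slide's corrections applied (as printed, it tested "if c < 0" and printed a, contradicting its own comment; the slide marks "c > 0" and "c" as the fixes):

      #include <stdio.h>

      int sum(int a, int b);   /* prototype: sum is called before its definition */

      int main(void)
      {
          //This is a sample code
          //Declare variables
          int a, b, c;
          // Initialize variables
          a = 2;
          b = 5;
          //Find the sum and display c if greater than zero
          c = sum(a, b);
          if (c > 0)               /* slide originally: if c < 0, printing a */
              printf("%d\n", c);
          return 0;
      }

      int sum(int a, int b)
      {
          // Returns the sum of two numbers
          return a + b;
      }

  • LOC: Lines of code • LOCC: Lines of commented code • V: Number of unique operands and operators • CC: Cyclomatic complexity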
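Line-based attributes such as LOC and LOCC can be approximated directly from source text, as in the sample above. The C sketch below assumes simple counting rules (LOC = non-blank lines, LOCC = lines containing a // comment); real metric extractors apply more careful rules, so treat these definitions and the names struct line_counts and count_lines as illustrative.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Simplified line counts. Limitations: block comments and a "//" inside a
 * string literal are not handled correctly. */
struct line_counts {
    int loc;   /* non-blank lines */
    int locc;  /* lines containing a // comment */
};

/* Returns 1 if the line contains only whitespace. */
static int is_blank(const char *s)
{
    for (; *s; s++)
        if (!isspace((unsigned char)*s))
            return 0;
    return 1;
}

struct line_counts count_lines(const char *const *lines, size_t n)
{
    struct line_counts c = {0, 0};
    for (size_t i = 0; i < n; i++) {
        if (is_blank(lines[i]))
            continue;
        c.loc++;
        if (strstr(lines[i], "//") != NULL)
            c.locc++;
    }
    return c;
}
```

Applied to the sum function from the slide (five non-blank lines, one of them a comment), count_lines reports loc = 5 and locc = 1.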

  10. +

  11. Defect Prediction • Machine-learning-based models • Defect density estimation • Regression models: error proneness • First classification, then regression • Defect prediction between versions • Defect prediction for embedded systems

  12. Constructing Predictors • Baseline: Naive Bayes • Why? Best results reported so far (Menzies et al., 2007) • Remove its assumptions and construct different models: • Independent attributes -> multivariate distribution • Attributes of equal importance
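For reference, the standard Naive Bayes decision rule makes both assumptions listed above visible: independence lets the class-conditional likelihood factor into a product over the attributes x_1, ..., x_n, and every factor enters with the same exponent, so all attributes count equally. This is the textbook form; the slide does not show the exact variant used.

```latex
\hat{c} \;=\; \arg\max_{c \,\in\, \{\mathrm{defective},\ \mathrm{non\text{-}defective}\}}
  \; P(c) \prod_{i=1}^{n} P(x_i \mid c)
```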

  13. Weighted Naive Bayes • Naive Bayes • Weighted Naive Bayes

  14. Datasets

  15. Performance Measures • Accuracy: (A+D) / (A+B+C+D) • Pd (hit rate): D / (B+D) • Pf (false alarm rate): C / (A+C)
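The three measures can be computed from the four confusion-matrix cells. Reading the cell names off the formulas, D counts defective modules predicted defective and B defective modules that were missed, while C counts non-defective modules flagged as defective and A non-defective modules correctly passed; the C sketch below assumes exactly that layout.

```c
/* Confusion-matrix cells, named after the formulas on the slide:
 *   a: non-defective, predicted non-defective (true negative)
 *   b: defective,     predicted non-defective (false negative)
 *   c: non-defective, predicted defective     (false positive)
 *   d: defective,     predicted defective     (true positive)
 * Callers should ensure the denominators below are non-zero. */
struct confusion {
    int a, b, c, d;
};

/* Fraction of all modules classified correctly: (A+D)/(A+B+C+D). */
double accuracy(struct confusion m)
{
    return (double)(m.a + m.d) / (m.a + m.b + m.c + m.d);
}

/* Probability of detection (hit rate): D/(B+D). */
double pd(struct confusion m)
{
    return (double)m.d / (m.b + m.d);
}

/* Probability of false alarm: C/(A+C). */
double pf(struct confusion m)
{
    return (double)m.c / (m.a + m.c);
}
```

With an 8% defect rate, a predictor that flags nothing (A = 920, B = 80, C = D = 0 out of 1,000 modules) still reaches 0.92 accuracy but has pd = 0, which is one reason pd and pf are useful alongside accuracy.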

  16. Results: InfoGain & GainRatio

  17. Results: Weight Assignments

  18. Benefiting from defect data in practice • Within-company (WC) vs. cross-company (CC) data • Investigated in the cost estimation literature • No studies in defect prediction! • No conclusions in cost estimation… • Straightforward interpretation of results in defect prediction • Possible reason: well-defined features

  19. How much data do we need? • Consider: • Dataset size: 1000 • Defect rate: 8% • Training instances: 90% • 1000*8%*90% = 72 defective instances • 1000*90% - 72 = 828 non-defective instances

  20. Intelligent data sampling • With a random sample of 100 instances we can learn as well as with thousands. • Can we increase performance with wiser sampling strategies? • Which data? • Practical aspects: industrial case study
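Random sampling of this kind can be implemented with a partial Fisher-Yates shuffle, which draws k distinct instance indices out of n. The C sketch below uses the standard rand() generator for brevity; the function name is hypothetical, and the wiser sampling strategies the slide asks about are not shown.

```c
#include <stddef.h>
#include <stdlib.h>

/* Draws k distinct indices from 0..n-1 with a partial Fisher-Yates shuffle.
 * idx must hold n entries; on success the sample is in idx[0..k-1].
 * Returns 0 on success, -1 if k > n.
 * Note: rand() % m is slightly biased and covers at most RAND_MAX + 1 values,
 * which is acceptable for a sketch with n in the thousands. */
int sample_without_replacement(size_t *idx, size_t n, size_t k)
{
    if (k > n)
        return -1;
    for (size_t i = 0; i < n; i++)
        idx[i] = i;
    for (size_t i = 0; i < k; i++) {
        size_t j = i + (size_t)rand() % (n - i);  /* j in [i, n-1] */
        size_t tmp = idx[i];
        idx[i] = idx[j];
        idx[j] = tmp;
    }
    return 0;
}
```

Calling srand once and then sample_without_replacement(idx, 1000, 100) selects 100 of 1,000 modules as a training set.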

  21. WC vs CC Data? • When to use WC or CC? • How much data do we need to construct a model? ICSOFT’07

  22. ICSOFT’07

  23. Module Structure vs. Defect Rate • Fan-in, fan-out • PageRank algorithm • Call graph information from the code • “Small is beautiful”
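Fan-in and fan-out can be read directly off a call graph, and PageRank can rank functions by how much they are called, directly or indirectly. The C sketch below assumes the call graph is given as an n-by-n adjacency matrix with an edge from caller to callee, and uses the standard PageRank power iteration with damping factor d; the slide does not specify these details, so they, and the function names, are assumptions.

```c
#include <stddef.h>

/* calls[i * n + j] != 0 means function i calls function j. */
void fan_in_out(const int *calls, size_t n, int *fan_in, int *fan_out)
{
    for (size_t i = 0; i < n; i++)
        fan_in[i] = fan_out[i] = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            if (calls[i * n + j]) {
                fan_out[i]++;   /* i calls j */
                fan_in[j]++;    /* j is called by i */
            }
}

/* PageRank by power iteration on the call graph (edges caller -> callee):
 * rank[j] = (1 - d)/n + d * (sum over callers i of rank[i]/fan_out[i]
 *                            + rank of functions that call nothing, spread evenly).
 * tmp is scratch space for n doubles. */
void pagerank(const int *calls, size_t n, double d, int iters,
              double *rank, double *tmp)
{
    for (size_t i = 0; i < n; i++)
        rank[i] = 1.0 / (double)n;
    for (int it = 0; it < iters; it++) {
        double dangling = 0.0;  /* rank held by functions with no outgoing calls */
        for (size_t j = 0; j < n; j++)
            tmp[j] = 0.0;
        for (size_t i = 0; i < n; i++) {
            int out = 0;
            for (size_t j = 0; j < n; j++)
                out += calls[i * n + j] != 0;
            if (out == 0) {
                dangling += rank[i];
                continue;
            }
            for (size_t j = 0; j < n; j++)
                if (calls[i * n + j])
                    tmp[j] += rank[i] / out;
        }
        for (size_t j = 0; j < n; j++)
            rank[j] = (1.0 - d) / (double)n + d * (tmp[j] + dangling / (double)n);
    }
}
```

For the slide 9 sample, treating printf as a node, main calls sum and printf, so main has fan-out 2 and each callee has fan-in 1; PageRank with d = 0.85 ranks both callees above main.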

  24. Performance vs. Granularity
