
State coverage: an empirical analysis based on a user study


Presentation Transcript


1. State coverage: an empirical analysis based on a user study
Dries Vanoverberghe, Emma Eyckmans, and Frank Piessens

2. Software Validation Metrics
• Software defects discovered after product release are expensive
  • NIST 2002: $60 billion annually
  • MS Security Bulletins: around 40 per year, at $100k to $1M each
• Validating software (testing) reduces the number of defects before release
  • But not without a cost
• Making the tradeoff requires estimating the number of remaining defects => software validation metrics

3. Example: Code coverage
• The fraction of statements/basic blocks that are executed by the test suite
• Principle: not executed => no defects discovered
• Hypothesis: not executed => more likely to contain a defect (see the sketch below)
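The following minimal Java sketch (not taken from the presentation; the class and method names are invented for illustration) shows how statement coverage is counted: one test input executes two of the three statements in clamp, and a second input reaches the remaining one. Run with assertions enabled (java -ea CoverageExample).

```java
// Hypothetical illustration of statement coverage (not part of the study).
public class CoverageExample {

    // Three statements: the if-check and the two returns.
    static int clamp(int value, int max) {
        if (value > max) {   // executed by both test inputs below
            return max;      // executed only when value > max
        }
        return value;        // executed only when value <= max
    }

    public static void main(String[] args) {
        // A single test input exercises the if-check and one return:
        // 2 of 3 statements run, i.e. statement coverage is 2/3.
        assert clamp(5, 10) == 5;

        // A second input reaches the remaining return,
        // raising statement coverage to 3/3 = 100%.
        assert clamp(15, 10) == 10;
    }
}
```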

4. Example: Code coverage
• High statement coverage does not imply the absence of defects: different paths may remain untested (see the sketch below)
• Structural coverage metrics, e.g. path coverage, data flow coverage, …
  • Measure the degree of exploration
  • Exploration can be assisted by automatic tools, so these metrics evaluate tools rather than human effort
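A sketch of the "different paths" point, again with invented code: the two tests below execute every statement, yet the defective combination of branches (both discounts at once) is never exercised, so 100% statement coverage still misses the fault.

```java
// Hypothetical example: full statement coverage, but a path-dependent defect remains.
public class PathExample {

    static int fee(boolean member, boolean weekend) {
        int price = 10;
        if (member) {
            price -= 5;   // member discount
        }
        if (weekend) {
            price -= 6;   // weekend discount; combined with the member discount
        }                 // this drives the price below zero
        return price;
    }

    public static void main(String[] args) {
        // These two inputs together execute every statement (100% statement coverage) ...
        assert fee(true, false) == 5;
        assert fee(false, true) == 4;
        // ... but the faulty path is never taken: fee(true, true) returns -1,
        // and no test observes it.
    }
}
```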

5. Problem statement
• Exploration is not sufficient: tests also need to check the requirements
• Evaluate the completeness of the test oracle
  • Impossible to automate: the requirements would have to be guessed
  • Yet this evaluation is critical!
• No good metrics available

6. State coverage
• Evaluates the strength of assertions
• Idea: state updates must be checked by assertions
• Hypothesis: unchecked state update => more likely to contain a defect (see the sketch below)
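To make the idea concrete, here is a minimal invented example (the Appointment class is hypothetical, not the calendar system from the study): the test executes both field assignments, but its assertion reads only one of them, so a defect in the unchecked update would go unnoticed.

```java
// Hypothetical sketch: a state update that no assertion reads
// can hide a defect even though the code is executed.
public class Appointment {
    private String title;
    private int durationMinutes;

    void reschedule(String newTitle, int newDuration) {
        this.title = newTitle;              // state update 1
        this.durationMinutes = newDuration; // state update 2: executed, but never checked
    }

    String getTitle() { return title; }
    int getDurationMinutes() { return durationMinutes; }

    public static void main(String[] args) {
        Appointment a = new Appointment();
        a.reschedule("standup", 15);

        // The assertion reads only the title: state update 1 is covered,
        // state update 2 is not, so a wrong duration would go unnoticed.
        assert a.getTitle().equals("standup");
    }
}
```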

7. State coverage
• Complements code coverage, it is not a replacement
• Metrics also assist developers:
  • Code coverage => reachability of statements?
  • State coverage => invariant established by the reachable statements?

8. State coverage
• Metric based on state updates:
  • Assignments to fields of objects
  • Return values, local variables, … are also possible
• Computation: a runtime monitor tracks state updates during test execution, and
  state coverage = (number of state updates read in assertions) / (total number of state updates)
  (see the worked example below)
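A small worked example of the computation, with illustrative counts only (the numbers are not from the study): a runtime monitor would record how many state updates occurred during the test run and how many of the written values were later read inside assertions.

```java
// Illustrative computation of the state coverage metric from monitored counts.
public class StateCoverageMetric {
    public static void main(String[] args) {
        // Suppose the monitor observed 20 field assignments (state updates)
        // while the test suite ran, and assertions read the values written by 7 of them.
        int updatesReadInAssertions = 7;
        int totalStateUpdates = 20;

        double stateCoverage = (double) updatesReadInAssertions / totalStateUpdates;
        System.out.printf("state coverage = %.2f%n", stateCoverage); // prints 0.35
    }
}
```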

9. Design of experiment
• Existing evaluations:
  • Correlation with mutation adequacy (Koster et al.)
  • A case study by an expert user
• Goal of this study:
  • Directly analyze the correlation with ‘real’ defects
  • Average users instead of experts

10. Hypotheses
• Hypothesis 1: when increasing state coverage (without increasing exploration), the number of discovered defects increases
  • Similar to the existing case study
• Hypothesis 2: state coverage and the number of discovered defects are correlated
  • A much stronger claim

11. Structure of experiment
• Base program: a small calendar management system
  • The result of a software design course
  • Comes with an existing test suite
  • Presence of software defects unknown

12. Structure of experiment
• Phase 1: case study
  • Extend the existing test suite to find defects
  • First increase code coverage, then increase state coverage
  • Served as a dry run of the experiment, on a simplified application with additional injected defects

13. Structure of experiment
• Phase 2: controlled user study
  • Create a new test suite
  • First increase code coverage, then increase state coverage
  • Commit after each detected defect

14. Threats to validity
• Internal validity
  • Two sessions: no differences observed
  • Learning effect: subjects were familiar with the environment before the experiment
• External validity
  • Choice of application
  • Choice of faults
  • Subjects are students

15. Results
• Phase 1: case study
  • No additional defects discovered, so no confirmation for hypothesis 1
  • Potential reasons: mostly structural faults, and the non-structural faults were obvious
• Phase 2: controlled user study
  • No confirmation for hypothesis 1

16. Potential causes
• Frequency of logical faults
  • 3/20 incorrect state updates, only 1/14 discovered!
  • 5/14 are detected by assertions
  • Focusing on these 5 faults: higher state coverage (42% vs. 34%) for classes that detect at least one of them
• How common are logical faults?

17. Potential causes
• Logical faults were too obvious: subjects already discovered them while increasing code coverage
• State coverage is not monotonic
  • Adding new tests may decrease state coverage, since the metric is always relative to the amount of exploration (see the sketch below)
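The non-monotonicity can be seen with a toy calculation (illustrative numbers, not from the study): a new test adds exploration faster than it adds assertions, so the ratio drops even though no assertions were removed.

```java
// Illustrative arithmetic: adding a test can lower state coverage
// because the metric is relative to the amount of exploration.
public class NonMonotonicExample {
    public static void main(String[] args) {
        // Original suite: 6 of 10 observed state updates are read in assertions.
        double before = 6.0 / 10.0;                 // 0.60

        // A new test triggers 5 extra state updates but asserts on only 1 of them.
        double after = (6.0 + 1.0) / (10.0 + 5.0);  // about 0.47

        System.out.printf("before = %.2f, after = %.2f%n", before, after);
    }
}
```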

18. Conclusions
• The experiment fails to confirm the hypothesis
  • How frequent are logical faults?
  • Combine state coverage with code coverage, or compare test suites with similar code coverage?
• But also: the metric is simple and efficient

19. Questions?
