This study analyzes the correlation between state coverage and software defects in order to improve validation techniques. It examines the impact on defect detection rates and the effectiveness of using state coverage metrics alongside traditional code coverage metrics.
State coverage: an empirical analysis based on a user study
Dries Vanoverberghe, Emma Eyckmans, and Frank Piessens
Software Validation Metrics
• Software defects found after product release are expensive
  • NIST 2002: $60 billion annually
  • MS Security bulletins: around 40 per year, at $100k to $1M each
• Validating software (testing)
  • Reduces the number of defects before release
  • But not without a cost
• Make a tradeoff:
  • Estimate the remaining number of defects => software validation metrics
Example: Code coverage
• Fraction of statements/basic blocks that are executed by the test suite (sketch below)
• Principle:
  • Not executed => no defects discovered there
• Hypothesis:
  • Not executed => more likely to contain a defect
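As a concrete illustration, here is a minimal sketch of the ratio behind statement coverage; the function and the line-number sets are invented for illustration and are not part of the study:

```python
def statement_coverage(executed_lines: set, total_lines: set) -> float:
    """Fraction of statements (identified here by line number) hit by the test suite."""
    if not total_lines:
        return 1.0  # nothing to cover
    return len(executed_lines & total_lines) / len(total_lines)

# Example: a 10-statement unit of which the tests execute 7 statements.
covered = statement_coverage(set(range(1, 8)), set(range(1, 11)))
print(f"statement coverage = {covered:.0%}")  # 70%
```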
Example: Code coverage
• High statement coverage
  • Does not imply no defects: different paths may remain unexplored
• Structural coverage metrics:
  • e.g. path coverage, data flow coverage, …
  • Measure the degree of exploration
• Automatic tool assistance
  • Metrics evaluate tools rather than human effort
Problem statement
• Exploration is not sufficient
  • Tests also need to check the requirements
• Evaluate the completeness of the test oracle
  • Impossible to automate: the requirements would have to be guessed
  • Evaluation is critical!
  • No good metrics available
State coverage
• Evaluates the strength of assertions
• Idea:
  • State updates must be checked by assertions (example below)
• Hypothesis:
  • Unchecked state update => more likely to hide a defect
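To illustrate the idea, a toy sketch (the `Calendar` class and tests are invented for illustration, not the system from the study): both tests execute the state update in `add_event`, but only the second reads the updated field in an assertion, so only the second could catch a faulty update:

```python
class Calendar:
    """Toy class: the writes to self.last_event are the state updates."""
    def __init__(self):
        self.last_event = None        # state update (field assignment)

    def add_event(self, name):
        self.last_event = name        # state update (field assignment)


def test_add_event_unchecked():
    cal = Calendar()
    cal.add_event("meeting")          # executed: good for code coverage
    # no assertion reads cal.last_event, so this state update stays unchecked


def test_add_event_checked():
    cal = Calendar()
    cal.add_event("meeting")
    assert cal.last_event == "meeting"   # the state update is read in an assertion


test_add_event_unchecked()
test_add_event_checked()
```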
State coverage
• Complements code coverage
  • Not a replacement
• Metrics also assist developers
  • Code coverage => are the statements reachable?
  • State coverage => are the invariants established by the reachable statements?
State coverage
• Metric:
  • State update
    • Assignment to fields of objects
    • Return values, local variables, … also possible
• Computation (sketch below):
  • Runtime monitor
  • State coverage = (number of state updates read in assertions) / (total number of state updates)
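A minimal sketch of the computation, assuming the runtime monitor yields a log of state updates tagged with whether each one is later read inside an assertion (the log format and names are invented for illustration):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StateUpdate:
    location: str            # e.g. "Calendar.add_event:12"
    read_in_assertion: bool  # did any assertion read the written field?

def state_coverage(updates):
    """State coverage = (state updates read in assertions) / (total state updates)."""
    if not updates:
        return 1.0
    checked = sum(1 for u in updates if u.read_in_assertion)
    return checked / len(updates)

# Hypothetical monitor output for one run of the test suite:
log = [
    StateUpdate("Calendar.add_event:12", True),
    StateUpdate("Calendar.add_event:13", False),
    StateUpdate("Calendar.remove_event:27", False),
]
print(f"state coverage = {state_coverage(log):.0%}")  # 33%
```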
Design of experiment
• Existing evaluation:
  • Correlation with mutation adequacy (Koster et al.)
  • Case study by an expert user
• Goal:
  • Directly analyze the correlation with 'real' defects
  • Average users instead of experts
Hypotheses
• Hypothesis 1:
  • When state coverage increases (without increasing exploration), the number of discovered defects increases
  • Similar to the existing case study
• Hypothesis 2:
  • State coverage and the number of discovered defects are correlated
  • A much stronger claim
Structure of experiment
• Base program:
  • Small calendar management system
  • Result of a software design course
  • Existing test suite
  • Presence of software defects unknown
Structure of experiment
• Phase 1: case study
  • Extend the test suite to find defects
    • First increase code coverage
    • Then increase state coverage
  • Dry run of the experiment
    • Simplified application
    • Injected additional defects
Structure of experiment
• Phase 2: controlled user study
  • Create a new test suite
    • First increase code coverage
    • Then increase state coverage
  • Commit after each detected defect
Threats to validity
• Internal validity
  • Two sessions: no differences observed
  • Learning effect: subjects were familiar with the environment before the experiment
• External validity
  • Choice of application
  • Choice of faults
  • Subjects are students
Results
• Phase 1: case study
  • No additional defects discovered
  • No confirmation for hypothesis 1
  • Potential reasons:
    • Mostly structural faults
    • Non-structural faults were obvious
• Phase 2: controlled user study
  • No confirmation for hypothesis 1
Potential causes
• Frequency of logical faults
  • 3/20 faults are incorrect state updates
  • Only 1/14 discovered!
  • 5/14 are detected by assertions
  • Focusing on these 5 faults: higher state coverage (42% vs. 34%) for classes that detect at least one of them
• How common are logical faults?
Potential causes
• Logical faults too obvious
  • Subjects already discovered them while increasing code coverage
• State coverage is not monotonic
  • Adding new tests may decrease state coverage
  • Always relative to exploration (worked example below)
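A small worked example (numbers invented for illustration): if the existing tests exercise 4 state updates and read 3 of them in assertions, state coverage is 3/4 = 75%; adding a test that exercises 4 further state updates but asserts on only 1 of them lowers the metric to 4/8 = 50%, even though the new test checks more than nothing. The metric is therefore always relative to how much of the state the tests explore.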
Conclusions
• The experiment fails to confirm the hypotheses
  • How frequent are logical faults?
  • Combine state coverage with code coverage?
  • Or compare test suites with similar code coverage
• But state coverage is also:
  • Simple
  • Efficient