

Presentation Transcript


  1. Benchmarking Effectiveness for Object-Oriented Unit Testing – Anthony J H Simons and Christopher D Thomson

  2. Overview • Measuring testing? • The Behavioural Response • Measuring six test cases • Evaluation of JUnit tests • Evaluation of JWalk tests http://www.dcs.shef.ac.uk/~ajhs/jwalk/

  3. Analogy: Metrics and Testing • Things easy to measure (but why?) • metrics: MIT O-O metrics (Chidamber & Kemerer) • testing: decision-, path-, whatever-coverage • testing: count exceptions, reduce test-set size • Properties you really want (but how?) • metrics: Goal, Question, Metric (Basili et al.) • testing: e.g. mutant killing index • testing: effectiveness and efficiency?

  4. Measuring Testing? Most approaches measure testing effort, rather than test effectiveness!

  5. Degrees of Correctness • Suppose an ideal test set • BR : behavioural response (set) • T : tests to be evaluated (bag – duplicates?) • TE = BR ∩ T : effective tests (set) • TR = T − TE : redundant tests (bag) • Define test metrics • Ef(T) = (|TE| − |TR|) / |BR| : effectiveness • Ad(T) = |TE| / |BR| : adequacy
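
A minimal Java sketch (not from the slides) of how these two metrics could be computed, assuming tests and behavioural responses are identified by string labels; the TestMetrics class and the response labels are invented for the illustration.

    import java.util.*;

    // Minimal sketch: computing the test metrics from slide 5.
    // The response labels below are hypothetical, not taken from the paper.
    public class TestMetrics {
        public static void main(String[] args) {
            // BR : the ideal set of distinct behavioural responses
            Set<String> br = new HashSet<>(Arrays.asList(
                    "withdraw/negative", "withdraw/overdraw", "withdraw/ok"));

            // T : the tests to be evaluated (a bag, so duplicates are allowed)
            List<String> t = Arrays.asList(
                    "withdraw/ok", "withdraw/ok", "withdraw/overdraw", "deposit/ok");

            // TE = BR ∩ T : effective tests (set)
            Set<String> te = new HashSet<>(t);
            te.retainAll(br);

            // TR = T − TE : redundant tests (bag); remove one occurrence per effective test
            List<String> tr = new ArrayList<>(t);
            for (String e : te) {
                tr.remove(e);
            }

            double ef = (double) (te.size() - tr.size()) / br.size();  // effectiveness
            double ad = (double) te.size() / br.size();                // adequacy
            System.out.printf("Ef(T) = %.2f, Ad(T) = %.2f%n", ef, ad);
        }
    }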

  6. Ideal Test Set? The ideal test set must verify each distinct response of an object!

  7. What is a Response? • Input response • Account.withdraw(int amount) : 3 partitions • amount < 0 → fail precondition, exception • amount > balance → refuse, no change • amount <= balance → succeed, debit • State response • Stack.pop() : 2 states • isEmpty() → fail precondition, exception • !isEmpty() → succeed
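
For concreteness, here is a hypothetical Account class whose withdraw(int) realises the three input partitions listed above; the slides name the class but do not give its source, so the method bodies below are assumptions.

    // Hypothetical Account class illustrating the three input partitions of
    // withdraw(int); the implementation is assumed, not taken from the paper.
    public class Account {
        private int balance = 0;

        public void deposit(int amount) {
            balance += amount;
        }

        public void withdraw(int amount) {
            if (amount < 0) {
                throw new IllegalArgumentException("negative amount");  // fail precondition
            } else if (amount > balance) {
                return;                                                 // refuse, no change
            } else {
                balance -= amount;                                      // succeed, debit
            }
        }

        public int getBalance() {
            return balance;
        }
    }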

  8. Behavioural Response – 1 • Input response • c.f. exemplars of equivalence partitions • max responses per method, over all states • State response • c.f. state cover, to reach all states • max state-contingent responses, over all methods • Behavioural Response • product of input and state response • checks all argument partitions in all states • c.f. transition cover augmented by exemplars
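
A rough illustration, assuming the Account sketch above: if the object has two reachable states (empty and in credit), and in each state deposit has one input partition while withdraw has three, then checking all argument partitions in all states means verifying 2 × (1 + 3) = 8 distinct responses.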

  9. Behavioural Response – 2 • Parametric form: BR(x, y) • stronger ideal sets, for higher x, y • x = length of sequences from each state • y = number of exemplars for each partition • Redundant states • higher x rules out faults hiding in duplicated states • Boundary values • higher y verifies equivalence partition boundaries • Useful measure • precise quantification of what has been tested • repeatable guarantees of quality after testing

  10. Compare Testing Methods JWalk – “Lazy systematic unit testing method” JUnit – “Expert manual unit testing method”

  11. JUnit – Beck, Gamma • “Automates testing” • manual test authoring (as good as human expertise) • may focus on positive, miss negative test cases • saved tests automatically re-executed on demand • regression style may mask hard interleaved cases • Test harness • bias: test method “testX” for each method “X” • each “testX” contains n assertions = n test cases • same assertions appear redundantly in “testY”, “testZ”
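
A hypothetical JUnit 3-style test for the Account sketch above (not the authors' actual suite), showing the "one testX per method X" bias and how manual authoring can miss the negative partitions.

    import junit.framework.TestCase;

    // Hypothetical JUnit 3-style tests for the Account sketch above; written to
    // show the testX-per-method bias, not taken from the authors' test suite.
    public class AccountTest extends TestCase {

        public void testDeposit() {
            Account a = new Account();
            a.deposit(100);
            assertEquals(100, a.getBalance());
        }

        public void testWithdraw() {
            Account a = new Account();
            a.deposit(100);                      // re-exercises deposit redundantly
            a.withdraw(40);
            assertEquals(60, a.getBalance());    // only the "succeed, debit" partition;
        }                                        // negative-amount and overdraw cases are missed
    }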

  12. JWalk – Simons • Lazy specification • static analysis of compiled code • dynamic analysis of state model • adapts to change, revises the state model • Systematic testing • bounded exhaustive state-based exploration • may not generate exemplars for all input partitions • semi-automatic oracle construction (confirm key values) • learns test equivalence classes (predictive testing) • adapts existing oracles, superclass oracles

  13. Six Test Cases • Stack1 – simple linked stack • Stack2 – bounded array stack • change of implementation • Book1 – simple loanable book • Book2 – also with reservations • extension by inheritance • Account1 – with deposit/withdraw • Account2 – with preconditions • refinement of specification
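
Hypothetical sketches of two of these classes, showing what "extension by inheritance" means here; the slides name the classes but the bodies below are assumed.

    // Hypothetical sketches of two of the six case-study classes; the slides
    // name them but do not give their source, so these bodies are assumptions.
    public class Book1 {                           // simple loanable book
        private boolean onLoan = false;

        public void borrow() { onLoan = true; }
        public void restore() { onLoan = false; }  // "return" is a Java keyword, hence "restore"
        public boolean isOnLoan() { return onLoan; }
    }

    class Book2 extends Book1 {                    // extension by inheritance: reservations
        private boolean reserved = false;

        public void reserve() { reserved = true; }
        public void cancel() { reserved = false; }
        public boolean isReserved() { return reserved; }
    }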

  14. Instructions to Testers Test each response for each class, similar to the transition cover, but with all equivalence partitions for method inputs
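
Following that instruction for the hypothetical Account above, an effective suite contains one test per behavioural response, including the negative partitions; again a sketch, not the testers' actual code.

    import junit.framework.TestCase;

    // Sketch of response-per-test style for the hypothetical Account above:
    // one test per behavioural response of withdraw, covering all three partitions.
    public class AccountResponseTest extends TestCase {

        public void testWithdrawNegativeAmount() {
            Account a = new Account();
            try {
                a.withdraw(-1);                        // amount < 0
                fail("expected a precondition failure");
            } catch (IllegalArgumentException expected) {
                // fail precondition, exception
            }
        }

        public void testWithdrawOverdrawn() {
            Account a = new Account();
            a.deposit(10);
            a.withdraw(20);                            // amount > balance
            assertEquals(10, a.getBalance());          // refused, no change
        }

        public void testWithdrawInCredit() {
            Account a = new Account();
            a.deposit(10);
            a.withdraw(5);                             // amount <= balance
            assertEquals(5, a.getBalance());           // succeeded, debited
        }
    }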

  15. Behavioural Response [chart: the ideal test target]

  16. JUnit – Expert Testing [chart annotations: massive generation; still not effective]

  17. JWalk – Test Generation [chart annotations: missed 5 inputs; no wasted tests]

  18. Comparisons • JUnit: expert manual testing • massive over-generation of tests (w.r.t. goal) • sometimes adequate, but not effective • stronger (t2, t3); duplicated; and missed tests • hopelessly inefficient – also debugging test suites! • JWalk: lazy systematic testing • near-ideal coverage, adequate and effective • a few input partitions missed (simple generation strategy) • very efficient use of the tester’s time – seconds, not minutes • or: two orders (x 1000) more tests, for the same effort

  19. Conclusion • Behavioural Response • seems like a useful benchmark (scalable, flexible) • use with formal, semi-formal, informal design methods • measures effectiveness, rather than effort • Moral for testing • don’t hype up automatic test (re-)execution • need systematic test generation tools • automate the parts that humans get wrong!

  20. Any Questions? Put me to the test! http://www.dcs.shef.ac.uk/~ajhs/jwalk/
