
Some Thoughts on Metrics Test Management Forum Paul Gerrard

Presentation Transcript


  1. Some Thoughts on Metrics Test Management Forum Paul Gerrard Systeme Evolutif Limited 3rd Floor 9 Cavendish Place London W1G 0QD email: paulg@evolutif.co.uk http://www.evolutif.co.uk

  2. I’m a Metrics Sceptic

  3. A great book on metrics • “The Tyranny of Numbers – why counting can’t make us happy” by David Boyle • Nothing to do with software • More to do with government statistics • Written in the same spirit as “How to Lie with Statistics” – another counting classic • I’ve appeared to be fairly negative about metrics in the past • Not true – it’s blind faith in metrics I don’t like!

  4. “What matters, we cannot count, so let’s count what does not matter” • A lovely quote from an economist called Chambers (having a dig at economists) • I’ve changed it to reflect a tester’s mantra: • “Testers have come to feel / What can’t be measured, isn’t real / The truth is always an amount / Count defects, only defects count.”

  5. Some problems with metrics • Too often, numbers, and counting, being incontrovertible, are regarded as ‘absolute truth’ • Numbers are incontrovertible, but the object being counted may be subjective • The person who collects the raw data for counting isn’t usually independent (and has an ‘agenda’) • There is a huge amount of filtering going on: • by individuals • but also, the processes we use, by definition, are selective.

  6. When I were a lad (development team leader)… • I collected all sorts of metrics to do with code • To help manage my team and their activities, I counted lines of code, code delivered, module size, module rates of change, fault location and type, fan-in, fan-out, and other statically detectable measures as well as costs allocated to specific tasks, lifted from a simple time recording system (that I wrote specifically to see where time went).

  7. When I were a lad (development team leader)…2 • I used them to justify: • buying tools, changing my team’s behaviour and attitude to standards and other development practices as well as justifying my team’s existence through their productivity • (Other teams didn’t have metrics, so we were, by definition, infinitely more productive) • Metrics are extremely useful as a political tool • Metrics (statistics) are probably the most useful tool of politics. Ask any politician! • I knew I was collecting ‘good stuff’ but some of it was to be taken more seriously than others.

  8. Counting Defects is Misleading

  9. My biggest objection… • Counting defects is misleading, by definition • A gruesome analogy (body count): • to measure progress in a military campaign • not a good measure of how successful your campaign has been • A body count gives you the following information: • opponent’s forces have been diminished by a certain number • But what was it to start with? • How are the enemy recruiting more participants? • No one knows • The count represents the number of participants who are no longer in the campaign. So by definition, they don’t count anymore.

  10. Body count • The body count could be used to measure our efficiency of killing • But, is killing efficiency a good way to measure progress in a campaign intended to capture territory, enemy assets, hearts and minds? • Hardly - dead people are a consequence, a tragic side issue, not the objective itself.

  11. Defect/bug count • Defect count gives us the following information: • the number of defects has been diminished by a certain number • but what was it to start with? • How are the developers (the enemy? Hehe) injecting more defects? • No one knows – they certainly don’t • Predictive metrics are unreliable because of software languages, people, knock-on effects, coupling etc. etc.

  12. Defect/bug count 2 • The defect count is a count of defects removed from the system • By definition… they don’t count anymore • Bugs left in don’t count because they are trivial • The count can be used to measure test “efficiency” • But is “defect detection efficiency” a good way to measure progress in a project intended to deliver functionality, business benefits, cost savings? • NO! Defects are a consequence, a tragic side issue, and an inevitability, not the end itself • Need I go on?

  13. Counting defects sends the wrong message to testers and management • If the only thing we count and take seriously is defects, we are telling testers that the only thing that counts is defects • All they ever do is look for defects • All management think testers do is find defects • But what do managers want?

  14. Managers want… • To know the status of deliverables, what works, what doesn’t • They want… and want it NOW…: • demonstration that software works • confidence that software is usable • Defects are an inevitable consequence of development and testing, but not the prime objective • They are a tactical challenge • Defects are a practitioner issue all the time • But not a serious management issue unless defects block release and a decision is required to unblock the project • Most of the time, test metrics are irrelevant.

  15. Purpose of Testing, Purpose of Early Testing… are Flawed

  16. Myers has a lot to answer for • Myers advanced testing by a few years when he defined the purpose of testing in 1978 • But that flawed definition has held us back since 1983! • The defects we count aren’t representative • Typically, system and acceptance test defects are counted • We recommend that all defects are counted • But that’s hardly possible • Even if we tried we couldn’t count them all • The vast majority are corrected as they are created • Finding bugs is a tactical objective, not strategic.

  17. Most defects corrected before they have an impact • When we write a document, code or a test plan, we correct the vast majority of our mistakes instantly • never find their way into the review or test • vast majority of defects not found by “testing” at all • Testing only detects the most obscure faults • But we use the metrics based on the obscure defects to generalise and steer our testing activities • Surely, this isn’t sensible? • Only if we consider defects in all their various manifestations, can we promote general theories of how testing can be improved.

  18. Our approach to testing undermines the data we collect • Textbooks promote the economic view: • finding defects early is better than finding them later • The logic is flawless • But this argument only holds if the “absence of defects” is our key objective • But surely, defects are symptoms, not the underlying problem • Absence of defects is a sign of good work, it’s not the deliverable • How can “absence” of anything be a meaningful deliverable though?

  19. Economic argument for early testing is flawed • It is based on fixing symptoms, not the underlying problem, or improving the end deliverable • The argument for using reviews and inspections has traditionally been ‘defect prevention’ • But this is nonsense • Inspections and reviews find defects like any other test activity • The economic argument is based on rework prevention, not defect prevention • Early defects are simply more expensive to correct if left in products.

  20. Testing is a reactive activity, not proactive • Testing cannot prevent defects – it is reactive, never proactive • This is why we still have to convince management that testing is important • Testing actually corrupts the defect data we collect • If we structure our testing to detect defects before system and acceptance testing, design defects found in system test are A BAD THING • bad because of the way we approach dev and test • bad because we need to re-document, redesign, re-test at unit, integration levels and so on • Self-fulfilling prophecy • Late testing makes the other guys look bad.

  21. Compare that with RAD, DSDM or Agile methods • Little testing is done by developers • some might do test-first programming or good unit testing • but most don’t • Because documentation is shallow and developer testing is light, there is an instant response to system/user testing incidents • so the cost of finding ‘serious defects’ in system/acceptance testing is remarkably low.

  22. Economic argument of early testing is smashed • The whole basis for a structured testing discipline is undermined • Traditional metrics don’t support the Agile approach, so we say they are undisciplined, unprofessional and incompetent • (It’s hard to sell ISEB courses to these guys!) • Surely we are measuring the wrong things? • The data we collect is corrupted by the processes we follow!

  23. Where to Now with Test Metrics?

  24. Where now with test metrics? • Move away from defects as the principal object of measurement • Move towards ‘information assets’ as the tester’s deliverable • Defects are a part of that information asset • Defect analysis: • A development task, not a tester’s • Only programmers know how to analyse the cause of defects and see trends • Defect analyses help developers improve, not testers (a white box metric)

  25. Where now with test metrics? 2 • Testing metrics should be aligned with business metrics (more black box) • Business results/objectives/goals • Intermediate deliverables/goals • Risk • Looking forward to software use, not back into software construction • Need to present metrics in more accessible, graphical ways.

  26. A New Way to Classify Incidents?

  27. Suppose you were asked to carry fruit • How many Apples could you carry? • I can carry 100 • How many Oranges? • I can carry 80 • How many Watermelons? • I can carry 7 • Assuming you have carrier bags, could you carry 40 apples, 25 oranges and 4 watermelons? • How would you work it out?

  28. Can I carry the fruit? • If my carrying capacity is C • Weight of an apple is C/100 • Weight of an orange is C/80 • Weight of a watermelon is C/7 • So total weight of the load is: 40C/100 + 25C/80 + 4C/7 = 1.28C • No, I obviously can’t carry that load

  29. Acceptable load • I don’t know what C is precisely, but that doesn’t matter • If the load factor is greater than one, I can’t carry the load • Let’s ignore C, then, and just worry about the acceptable load factor L • L must be less than one

  30. Acceptable load • If L is > 1 let’s try and reduce it • Removing 1 watermelon makes L=1.14 • Removing 2 watermelons makes L=0.998 • I can now carry the reduced load (just) • I have a measure of the load (L) and a threshold of acceptability (less than one) • I know that removing the heavy items will have the biggest improvement.
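
The arithmetic behind those figures, written out with the numbers quoted on the slides: L = 40/100 + 25/80 + 4/7 ≈ 1.284; removing one watermelon gives L = 1.284 − 1/7 ≈ 1.141, and removing two gives L = 1.284 − 2/7 ≈ 0.998, just under the threshold of one.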

  31. Suppose you were asked to accept a system? • How many low severity bugs could you afford? • I can accept 100 • How many medium? • I can accept 80 • How many high? • I can accept 7 • Could you accept 40 low, 25 medium and 4 high? • Could you work it out?

  32. Can I afford (accept) the system with bugs? • If my “bug budget” is B • Cost of a LOW is B/100 • Cost of a MEDIUM is B/80 • Cost of a HIGH is B/7 • So total cost of the bugs is: 40B/100 + 25B/80 + 4B/7 = 1.28B • No, I obviously can’t accept those bugs

  33. Acceptable bug cost • I don’t know what B is, but that doesn’t matter • If the bug cost factor is greater than one, I can’t accept the system • Let’s ignore B, then, and just worry about the bug COST factor C • C must be less than one

  34. Calculating cost of bugs • If C is > 1 let’s try and reduce it • Removing 1 HIGH makes C=1.14 • Removing 2 HIGHs makes C=0.998 • I can now accept the improved system (just) • I have a measure of the cost (C) and a threshold of acceptability (less than one) • I know that removing the HIGH severity bugs will have the biggest improvement.
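
A minimal sketch of the same calculation in code (the budgets and counts are the figures from these slides; the function and variable names are illustrative, not from the talk):

```python
# Bug cost factor: the system is acceptable when the total is below one.
BUG_BUDGET = {"low": 100, "medium": 80, "high": 7}   # tolerable count per severity

def cost_factor(open_bugs):
    """Total bug cost factor C for a set of open bug counts."""
    return sum(count / BUG_BUDGET[severity] for severity, count in open_bugs.items())

open_bugs = {"low": 40, "medium": 25, "high": 4}
print(round(cost_factor(open_bugs), 3))   # 1.284 -> cannot accept the system

# What-if: fix the two HIGH severity bugs first, since they cost the most
open_bugs["high"] -= 2
print(round(cost_factor(open_bugs), 3))   # 0.998 -> just acceptable
```

Rerunning the what-if with different sets of fixes is the change-strategy modelling the next slide refers to.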

  35. A useful metric for developers • Now, developers have a numeric score to drive their rework efforts • They can model different change strategies and predict an outcome • They can normalise the cost of correction with the reduction in bug cost.

  36. A useful metric for testers • Bugs get a score that is finer grained than three- or five-level severities • No need to worry about borderline cases as the user can adjust the acceptability factor for bugs • Testers should focus on high COST bugs • But not to the exclusion of lower cost bugs.

  37. Proposal • Why not assign THREE classifications: • Priority • Severity • Bug Cost* • And plot the cost of open bugs over time as well as the number of bugs?
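
A sketch of how the last bullet might be tracked, assuming each incident record carries a severity and opened/closed dates (the record layout, field names and dates below are hypothetical):

```python
from datetime import date

BUG_BUDGET = {"low": 100, "medium": 80, "high": 7}

# Hypothetical incident records
bugs = [
    {"severity": "high",   "opened": date(2024, 1, 3),  "closed": None},
    {"severity": "medium", "opened": date(2024, 1, 5),  "closed": date(2024, 1, 20)},
    {"severity": "low",    "opened": date(2024, 1, 10), "closed": None},
]

def open_bug_cost(bugs, on_day):
    """Bug cost factor contributed by bugs still open on a given day."""
    return sum(
        1 / BUG_BUDGET[b["severity"]]
        for b in bugs
        if b["opened"] <= on_day and (b["closed"] is None or b["closed"] > on_day)
    )

# One point per reporting day: plot this series alongside the raw open-bug count.
for day in (date(2024, 1, 4), date(2024, 1, 15), date(2024, 1, 25)):
    print(day, round(open_bug_cost(bugs, day), 3))
```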

  38. Some Thoughts on Metrics Test Management Forum Paul Gerrard Systeme Evolutif Limited 3rd Floor 9 Cavendish Place London W1G 0QD email: paulg@evolutif.co.uk http://www.evolutif.co.uk
