Test vs. inspection Part 2

Tor Stålhane

Testing and inspection: a short data analysis

Test and inspections: some terms
First we need to understand two important terms: defect types and triggers. After this we will look at inspection data and test data from three development activities each, organized according to defect type and trigger. We need the defect categories to compare test and inspection: which activity is best at finding which kind of defect?

Defect categories
This presentation uses eight defect categories:
• Wrong or missing assignment
• Wrong or missing data validation
• Error in algorithm: no design change is necessary
• Wrong timing or sequencing
• Interface problems
• Functional error: design change is needed
• Build, package or merge problem
• Documentation problem

Triggers
A trigger is the activity or condition that causes a defect to surface. We will use different triggers for test and inspection. In addition, white box and black box tests use different triggers. We will get back to triggers and to black box and white box testing later in the course.

Inspection triggers
• Design conformance
• Understanding details
• Operation and semantics
• Side effects
• Concurrency
• Backward compatibility: earlier versions of this system
• Lateral compatibility: other, similar systems
• Rare situations
• Document consistency and completeness
• Language dependencies

Test triggers: black box
• Test coverage
• Sequencing: two code chunks in sequence
• Interaction: two code chunks in parallel
• Data variation: variations over a simple test case
• Side effects: unanticipated effects of a simple test case

Test triggers: white box
• Simple path coverage
• Combinational path coverage: the same path covered several times, but with different inputs
• Side effects: unanticipated effects of simple path coverage

Testing and inspection – the V model

Inspection data
We will look at inspection data from three development activities:
• High level design: architectural design
• Low level design: design of subsystems, components (modules) and data models
• Implementation: realization, writing code
These activities make up the left-hand side of the V model.

Test data
We will look at test data from three development activities:
• Unit testing: testing a small unit such as a method or a class
• Function verification testing: functional testing of a component, a subsystem or a system
• System verification testing: testing the total system, including hardware and users
These activities make up the right-hand side of the V model.

What did we find?
For each of the development activities, the following tables show:
• the development activity
• the three most common defect types
• the three most efficient triggers
First for inspection and then for testing.

Inspection – defect types

Inspection – triggers

Testing – triggers and defects

Some observations – 1
• Pareto's rule applies in most cases, both for defect types and for triggers: a few categories account for most of the defects.
• Taken together, documentation and functional defects are the most commonly found defect types in inspection:
  - HLD: 69.81%
  - LLD: 41.44%
  - Code: 33.34%
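Pareto's rule can be checked mechanically: sort the categories by count and see how few of them cover most of the defects. A minimal sketch, assuming a plain mapping from category name to defect count and an 80% threshold; the function name `pareto_head` and the counts in the example are hypothetical, not data from the study.

```python
def pareto_head(counts, threshold=0.8):
    """Return the most frequent categories, in descending order of count,
    that together account for at least `threshold` of all defects."""
    total = sum(counts.values())
    if total == 0:
        return []
    head, cumulative = [], 0
    for category, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        head.append(category)
        cumulative += n
        if cumulative / total >= threshold:
            break
    return head


# Hypothetical defect counts per category (illustration only, not study data).
counts = {
    "Documentation": 40,
    "Function": 25,
    "Interface": 16,
    "Algorithm": 7,
    "Assignment": 5,
    "Data validation": 4,
    "Timing/sequencing": 2,
    "Build/package/merge": 1,
}
print(pareto_head(counts))  # ['Documentation', 'Function', 'Interface']
```

With these made-up counts, three of the eight categories cover 81% of the defects.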

Some observations – 2
• The only defect type that is among the top three for both testing and inspection is “Interface”:
  - Inspection, HLD: 14.12%
  - Testing: 39.13%
• The only trigger that is among the top three for both testing and inspection is “Side effects”:
  - Inspection, LLD: 29.73%
  - Testing: 11.07%

Summary
Testing and inspection are different activities. By and large, they
• need different triggers
• use different mind sets
• find different types of defects
Thus, we need both activities in order to get a high-quality product.

Inspection as a social process

Inspection as a social process
Inspection is a people-intensive process. Thus, we cannot consider only technical details; we also need to consider how people
• interact
• cooperate

Data sources
We will base our discussion on data from two series of experiments:
• UNSW: three experiments with 200 students. The focus was on process gain versus process loss.
• NTNU: two experiments
  - NTNU 1 with 20 students: group size and the use of checklists
  - NTNU 2 with 40 students: detection probabilities for different defect types

The UNSW data
• The programs inspected were 150 lines long with 19 seeded defects and 350 lines long with 38 seeded defects.
• Each student inspected the code individually and turned in an inspection report.
• The students were randomly assigned to one of 40 groups, three persons per group.
• Each group inspected the code together and turned in a group inspection report.

Gain and loss - 1
To discuss process gain and process loss, we need two terms:
• Nominal group (NG): a group of persons who will later participate in a real group, but who currently work alone.
• Real group (RG): a group of people in direct communication, working together.
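A minimal sketch of how the two terms can be turned into numbers, assuming (as is common in this kind of study) that a nominal group's score is the number of distinct defects found by at least one member working alone, and that a real group's score is the number of distinct defects in its group report. The function names and example data are hypothetical.

```python
def nominal_group_score(individual_reports):
    """Number of distinct defects found by at least one member working alone."""
    found = set()
    for report in individual_reports:
        found |= set(report)
    return len(found)


def real_group_score(group_report):
    """Number of distinct defects in the report the group produced together."""
    return len(set(group_report))


def ng_minus_rg(individual_reports, group_report):
    """Positive: process loss. Negative: process gain. Zero: stable."""
    return nominal_group_score(individual_reports) - real_group_score(group_report)


# Hypothetical three-person group; defects are identified by number.
members = [{1, 2, 3}, {2, 4}, {5}]
meeting = {1, 2, 6}
print(ng_minus_rg(members, meeting))  # 2, i.e. a process loss of two defects
```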

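Gain and loss - 2
The next diagram shows the distribution of the difference NG – RG. A positive difference is a process loss (the real group reports fewer defects than its members found alone); a negative difference is a process gain. Note that
• process loss can be as large as 7 defects
• process gain can be as large as 5 defects
Thus, there are large opportunities and large dangers.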

Gain and loss - 3

Gain and loss - 4
If we pool the data from all experiments, we find that the probability of
• process loss is 53%
• process gain is 30%
In the remaining 17% of the cases, the group meeting makes no difference. Thus, if we must choose, it is better to drop the group part of the inspection process.
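The pooled probabilities are the shares of groups whose NG – RG difference is positive, negative or zero. A minimal sketch, assuming one difference per group; the function names and the differences in the example are hypothetical, not the UNSW data.

```python
from collections import Counter


def classify(difference):
    """Map an NG - RG difference to the outcome it represents."""
    if difference > 0:
        return "loss"
    if difference < 0:
        return "gain"
    return "stable"


def outcome_shares(differences):
    """Share of groups with process loss, process gain and no change."""
    if not differences:
        raise ValueError("need at least one group")
    counts = Counter(classify(d) for d in differences)
    n = len(differences)
    return {outcome: counts[outcome] / n for outcome in ("loss", "gain", "stable")}


# Hypothetical NG - RG differences for ten groups (illustration only).
differences = [3, -2, 0, 1, 7, -5, 2, 0, -1, 4]
print(outcome_shares(differences))  # {'loss': 0.5, 'gain': 0.3, 'stable': 0.2}
```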

Reporting probability - 1

Reporting probability - 2
There is a 10% probability that the group reports a defect even if nobody found it during preparation. There is an 80% to 95% probability that the group reports a defect that everybody in the nominal group found during preparation.
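Reporting probabilities of this kind can be estimated by counting, for each seeded defect, how many members found it during preparation and whether it appeared in the group report. A minimal sketch, assuming each group is given as a list of individual reports plus one group report; the names and example data are hypothetical.

```python
from collections import defaultdict


def reporting_probability(groups, seeded_defects):
    """Estimate P(defect in group report | k members found it alone), pooled over groups.

    groups: iterable of (individual_reports, group_report) pairs, where each
    report is a collection of defect ids.
    """
    reported = defaultdict(int)
    total = defaultdict(int)
    for individual_reports, group_report in groups:
        members = [set(report) for report in individual_reports]
        group = set(group_report)
        for defect in seeded_defects:
            k = sum(defect in member for member in members)  # members who found it
            total[k] += 1
            if defect in group:
                reported[k] += 1
    return {k: reported[k] / total[k] for k in sorted(total)}


# One hypothetical three-person group and four seeded defects.
groups = [([{1, 2}, {1}, {1, 3}], {1, 2, 4})]
print(reporting_probability(groups, seeded_defects=[1, 2, 3, 4]))
# {0: 1.0, 1: 0.5, 3: 1.0}
```

Pooled over many real groups, the k = 0 entry corresponds to the 10% figure and the k = 3 entry (everybody found it) to the 80% to 95% figure.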

Reporting probability - 3
The table and diagram open up two possible interpretations:
• There is a voting process, possibly a silent one. The majority decides what the group reports and what it does not.
• Group pressure controls the defect reporting process. If nobody else has found a defect, it is hard for a single person to get it included in the final report.

A closer look - 1
The next diagram shows that when we have
• process loss, we find few new defects during the meeting but remove many
• process gain, we find many new defects during the meeting but remove just a few
• process stability, we find and remove roughly the same number of defects during the meeting
New defects are found during the meeting itself; removed defects were found by someone during preparation but left out of the group report.
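The three categories can be expressed as set operations on a group's reports. A minimal sketch, assuming "new" means reported by the group but found by no member during preparation, "retained" means found during preparation and reported, and "removed" means found during preparation but not reported; the names and example data are hypothetical.

```python
def meeting_breakdown(individual_reports, group_report):
    """Split defects into new, retained and removed by the group meeting."""
    nominal = set().union(*(set(report) for report in individual_reports))
    group = set(group_report)
    return {
        "new": group - nominal,        # found only during the meeting
        "retained": group & nominal,   # found in preparation and reported
        "removed": nominal - group,    # found in preparation, not reported
    }


members = [{1, 2, 3}, {2, 4}, {5}]
meeting = {1, 2, 6}
parts = meeting_breakdown(members, meeting)
print({name: sorted(ids) for name, ids in parts.items()})
# {'new': [6], 'retained': [1, 2], 'removed': [3, 4, 5]}

# NG - RG equals removed minus new, so many removals and few new defects
# mean process loss.
print(len(parts["removed"]) - len(parts["new"]))  # 2
```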

New, retained and removed defects

A closer look - 2
It seems that groups can be split according to the following characteristics:
• Process gain
  - All individual contributions are accepted.
  - The group finds many new defects.
• Process loss
  - Minority contributions are ignored.
  - The group finds few new defects.

A closer look - 3
A group with process loss is doubly negative. It rejects minority opinions and thus most of the defects found by just a few of the participants during
• individual preparation
• the group meeting
The participants can be good at finding defects; the problem is the group process.

The NTNU-1 data
We had 20 students in the experiment. The program to inspect was 130 lines long, and we seeded 13 defects in it.
• We used groups of two, three and five students.
• Half of the groups used a tailored checklist.
• Each group inspected the code and turned in an inspection report.

Group size and check lists - 1
We studied two effects:
• The size of the inspection team: small groups (2 persons) versus large groups (5 persons)
• The use of checklists versus no checklists
In addition, we considered the combined effect, that is, the factor interaction.

DoE-table

Group size and check lists - 2
Simple arithmetic gives us the following results:
• Group size effect (small vs. large): 4
• Checklist effect (use vs. no use): 0
• Interaction (large groups with checklists vs. small groups without): -2
The standard deviation is 1.7. Using two standard deviations (3.4) as the limit, which corresponds to roughly a 5% significance level (95% confidence), rules out every effect except group size.
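The effects in a 2×2 design can be computed from the four cell means with the standard factorial contrasts. A minimal sketch, assuming factor A is group size (-1 small, +1 large), factor B is checklist use (-1 none, +1 used), and an effect counts as significant when its absolute value exceeds two standard deviations. The cell means are hypothetical, and the slide's own computation may differ in detail; the last line applies the two-standard-deviation rule to the effects reported on the slide.

```python
def factorial_effects(cell_means):
    """Main effects and interaction for a 2x2 design.

    cell_means maps (a, b), with a and b in {-1, +1}, to the mean response.
    """
    def avg(keep):
        values = [y for (a, b), y in cell_means.items() if keep(a, b)]
        return sum(values) / len(values)

    return {
        "A": avg(lambda a, b: a == 1) - avg(lambda a, b: a == -1),
        "B": avg(lambda a, b: b == 1) - avg(lambda a, b: b == -1),
        "AB": avg(lambda a, b: a * b == 1) - avg(lambda a, b: a * b == -1),
    }


def significant(effects, sd, k=2.0):
    """Names of effects whose absolute size exceeds k standard deviations."""
    return [name for name, value in effects.items() if abs(value) > k * sd]


# Hypothetical mean number of defects found in each cell.
cells = {(-1, -1): 5.0, (1, -1): 9.0, (-1, 1): 6.0, (1, 1): 8.0}
effects = factorial_effects(cells)
print(effects)  # {'A': 3.0, 'B': 0.0, 'AB': -1.0}

# The effects reported on the slide, tested against 2 x 1.7 = 3.4.
print(significant({"group size": 4, "checklist": 0, "interaction": -2}, sd=1.7))
# ['group size']
```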

The NTNU-2 data
We had 40 students in the experiment. The program to inspect was 130 lines long, and we seeded 12 defects in it.
• We had 20 PhD students and 20 third year software engineering students.
• Each student inspected the code individually and turned in an inspection report.

Defect types
Each of the 12 seeded defects was of one of the following types:
• Wrong code, e.g. a wrong parameter
• Extra code, e.g. an unused variable
• Missing code, e.g. no exception handling
There were four defects of each type.

How often is each defect found

Who finds what, and why
First and foremost, we need to clarify what we mean by high and low experience:
• High experience: PhD students
• Low experience: third and fourth year students in software engineering
In our case, high experience turned out to mean less recent hands-on development experience.

Hands-on experience
The plot shows us that:
• People with recent hands-on experience are better at finding missing code.
• People with more engineering education are better at finding extra (unnecessary) code.
• Experience does not matter when finding wrong code statements.
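Detection probabilities per experience group and defect type are the share of (inspector, defect) pairs in which the inspector found the defect. A minimal sketch, assuming one set of found defect ids per inspector and known types for the seeded defects; the names, labels and data are hypothetical.

```python
from collections import defaultdict


def detection_rates(reports, defect_types, experience):
    """Share of (inspector, defect) pairs detected, per (experience, defect type).

    reports:      inspector -> set of defect ids the inspector reported
    defect_types: seeded defect id -> "wrong", "extra" or "missing"
    experience:   inspector -> experience label, e.g. "high" or "low"
    """
    found = defaultdict(int)
    possible = defaultdict(int)
    for inspector, hits in reports.items():
        level = experience[inspector]
        for defect, kind in defect_types.items():
            possible[(level, kind)] += 1
            if defect in hits:
                found[(level, kind)] += 1
    return {key: found[key] / possible[key] for key in possible}


defect_types = {1: "wrong", 2: "extra", 3: "missing", 4: "missing"}
experience = {"phd1": "high", "stud1": "low"}
reports = {"phd1": {1, 2}, "stud1": {1, 3, 4}}
rates = detection_rates(reports, defect_types, experience)
print(rates[("high", "missing")], rates[("low", "missing")])  # 0.0 1.0
```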
