SMU CSE 8314 / NTU SE 762-N Software Measurement and Quality Engineering

Module 22: Principles of Measurement, Part 1

Contents • Introduction • Some Principles of Measurement Theory • Issues with Measuring Software

Measurement in the Larger Picture • Measurements are used to monitor • Monitoring is used to manage risk [Figure: diagram with the labels Plan, Execute, Measure, Engineer Quality, and Risk Management]

Introduction

Measurements are Powerful • Measurements can tell you a lot • But it is easy to misuse them • Proper use of measurement requires an understanding of some basic rules and principles

Why Measure? • Every measurement should have a purpose • You want to get information [Figure: diagram with the labels Data, Analysis, and Information]

But for Every Analysis there are Two Possible Results Information - tells you something right • We are (or are not) on schedule • Our risks are (or are not) under control Misinformation - tells you something wrong • We are (or are not) on schedule • Our risks are (or are not) under control You need to make sure you know what the data are really telling you

Another Issue with Measurement Whenever you measure an organization, there will be changes in the organization. • It takes resources to measure, which causes things to be slightly less efficient • People change when they are measured so that they will look more favorable If you’re not careful, you may not measure the true situation

Key Issues • Define how to interpret measurements, to form a basis for consistent analysis • Choose consistent display or graphing techniques, so people know how to interpret the data We will address these throughout the next several modules, at several levels of detail

Some Principles of Measurement Theory (how to interpret measurements correctly)

Importance of Measurement “When you cannot measure it, when you cannot express it in numbers, your knowledge is of a meager and unsatisfactory kind: it may be the beginning of knowledge, but you have scarcely, in your thoughts, advanced to the stage of science.” Lord Kelvin, 1883

Why Study Measurement Theory • The theory of measurement offers us principles that we should be careful not to violate • We will consider several of them here. • First we will consider some observations about software engineering and software measurement by (somewhat) neutral observers

“Software Engineering is Still an Aspiration ... ... because Computer Science is not yet a science” Ruth Ravenel, U. of Colorado, Dept of Electrical and Computer Engineering, 1995

“Lemmingineering” “The process of engineering systems by blindly following techniques the masses are following, without regard to the appropriateness of those techniques.” Alan Davis, IEEE Software, 9/93.

Failure to Use Measurement Theory “Despite the fact that the basis for software metrics lies in measurement theory, it has been largely ignored by both practitioners and researchers. The result is that much work in software metrics is theoretically flawed.” Norman Fenton, IEEE Transactions on Software Engineering, 3/94

Metrology • Metrology is the study of measurement • Recently, software measurement experts have turned to metrology to bring order to software metrics • One of their first conclusions was that the term “metrics” is not well defined • As a result, there is a general movement away from use of the term “metrics” and toward standard metrology terminology • Several ISO standards define the terminology

ISO Standards Related to Measurement • ISO International Vocabulary of Basic and General Terms in Metrology (1993): defines basic terminology for measurement • ISO/IEC 15939, Software Measurement Process (2002): defines the activities and tasks necessary to have a successful measurement program • ISO/IEC 12207, Software Life Cycle Processes (1995): defines terms and concepts associated with software development processes

Some Basic Terms • Measure (singular noun): a variable to which a value is assigned as a result of measurement • Data (plural noun): a collection of values assigned to measures • Example: • Lines of code – a measure (of size) • 2345, 5432, 18432 – specific data indicating the size values of three different programs

More Basic Terms • Attribute: a property or characteristic of an entity that can be distinguished quantitatively or qualitatively • Example: size • Base Measure: a measure of an attribute • Example: lines of code • Derived Measure: a measure defined in terms of other measures • Example: lines of code per month
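The distinction between a base measure and a derived measure can be sketched in a few lines of Python. This is only an illustration: the class name `BaseMeasure`, the function `derive_rate`, and the 5-month development time are assumptions made for the example, not part of any ISO standard. The size value 2345 is taken from the earlier data example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BaseMeasure:
    """A value assigned directly to one attribute of one entity."""
    entity: str
    attribute: str
    unit: str
    value: float


def derive_rate(size: BaseMeasure, duration: BaseMeasure) -> float:
    """Derived measure: one base measure divided by another."""
    if duration.value <= 0:
        raise ValueError("duration must be positive")
    return size.value / duration.value


# Base measures: each quantifies a single attribute.
size = BaseMeasure("program A", "size", "lines of code", 2345)
time_spent = BaseMeasure("program A", "development time", "months", 5)  # hypothetical

# Derived measure: lines of code per month.
loc_per_month = derive_rate(size, time_spent)
print(f"{loc_per_month:.1f} lines of code per month")
```

Keeping the entity, attribute, and unit next to each value makes it harder to combine numbers that describe different things.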

Measurement is ... ... the process by which numbers and symbols are assigned to attributes of real world entities so as to describe them according to defined rules • The assignment of numbers must preserve intuitive and empirical observations about the attributes and entities

Preservation of Attributes: Example • “House A is bigger than House B” is a meaningful statement only if the number assignment of “size” preserves our intuitive notion of houses and their sizes. [Figure: House A and House B]

But Intuitions Vary • Is “size” defined by area? • or by number of rooms? • or by the cost to construct? • We must define a model that reflects a specific viewpoint before we measure. • The model must specify an entity to be measured and an attribute of that entity. • I.e., what do you want to measure and what do you want to know about it?

Four Issues 1) The Properties of Numbers 2) Are Means Meaningful? 3) The Problem of Small Sample Size 4) Are the Variables Independent?

1) Properties of Numbers
The properties of the number system may not necessarily apply to the attribute being measured. Consider temperature:
Scale        Yesterday   Today
Centigrade   0           18
Fahrenheit   32          64
Is it twice as hot today as it was yesterday?

Twice as ... • “Twice as” is a meaningful concept for real numbers • It is not a meaningful concept for temperature in Centigrade or Fahrenheit, where zero is an arbitrary point on the scale • (See discussion of scales – next three slides) The error we make is assuming that properties of the number system apply to the thing being measured
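The temperature question can be checked numerically. The sketch below uses the slide's values (0 and 18 degrees Centigrade) and adds the Kelvin scale, which is not on the slide, as an assumed point of comparison because it has a true zero. The exact Fahrenheit value for 18 degrees Centigrade is 64.4, which the slide rounds to 64.

```python
def c_to_f(celsius: float) -> float:
    """Convert Centigrade to Fahrenheit."""
    return celsius * 9 / 5 + 32


def c_to_k(celsius: float) -> float:
    """Convert Centigrade to Kelvin, a scale with a true zero."""
    return celsius + 273.15


yesterday_c, today_c = 0.0, 18.0  # values from the slide

# Differences agree across scales once the unit size is accounted for.
diff_c = today_c - yesterday_c
diff_f = c_to_f(today_c) - c_to_f(yesterday_c)

# Ratios depend on where each scale happens to put its zero.
ratio_f = c_to_f(today_c) / c_to_f(yesterday_c)
ratio_k = c_to_k(today_c) / c_to_k(yesterday_c)
try:
    ratio_c = today_c / yesterday_c
except ZeroDivisionError:
    ratio_c = None  # 18 / 0 is undefined

print(f"Fahrenheit ratio: {ratio_f:.2f}")
print(f"Kelvin ratio:     {ratio_k:.2f}")
print(f"Centigrade ratio: {ratio_c}")
```

Three scales give three different answers to "how many times as hot", while the difference between the two days is the same physical quantity on all of them.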

Some Types of Scales (1 of 3) • Nominal: a scale that places entities in categories, but without any ordering • Example: defects are design related, coding related, or requirements related • Ordinal: there is a ranking or ordering • Example: defect severity is minor, significant, or major

Some Types of Scales (2 of 3) • Interval: there is a fixed distance between consecutive members of a sequence, but multiplication is not meaningful • Example: the “McCabe Complexity” of a program. Can be any positive integer • But if one program has complexity 3 and another has complexity 9, it does not mean the second one is 3 times as complex • Or consider the temperature example 2 slides back • If it is 10 degrees one day and 30 the next, it is not “3 times as hot”

Some Types of Scales (3 of 3) • Ratio: there is a fixed distance between consecutive members of a sequence and a true zero point, so multiplication is meaningful • Example: size of a software program • A program with 900 lines of code is 3 times as big as one that is 300 lines of code • Absolute: all mathematical operations are meaningful

Example - Assigning a Scale to Test Failures {Blue, Green, Yellow, Red} • This is only an ordinal scale • It provides a ranking but nothing else • It makes no sense to add, subtract, multiply or divide the values. • The difference between “red” and “yellow” is not comparable to the difference between “yellow” and “green”

But Suppose We Replace It with a Numeric Scale • 4 = Blue, 3 = Green, 2 = Yellow, 1 = Red • We are tempted to make statements like these: “The average test error improved from 2.2 to 3.1” “The average test error improved by 41%” • Neither statement is meaningful, because the numbers are only labels for ranks on an ordinal scale
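One way to see the problem with averaging ordinal codes is to recode the same ranking with different numbers. In the sketch below, `coding_a` is the slide's coding, `coding_b` is a hypothetical alternative that keeps the same order, and the two test runs are invented data. The comparison of means flips between codings, while the median category does not depend on the coding at all.

```python
from statistics import mean, median_low

ORDER = ["Red", "Yellow", "Green", "Blue"]  # worst to best, as on the slide

coding_a = {"Red": 1, "Yellow": 2, "Green": 3, "Blue": 4}   # the slide's coding
coding_b = {"Red": 1, "Yellow": 2, "Green": 3, "Blue": 10}  # hypothetical, same order

# Hypothetical results of two test runs.
run_x = ["Green", "Green", "Green", "Green"]
run_y = ["Yellow", "Yellow", "Yellow", "Blue"]


def mean_score(results, coding):
    """Average of the numeric codes (not meaningful on an ordinal scale)."""
    return mean(coding[r] for r in results)


def median_category(results):
    """Median by rank position only; uses order, never distances."""
    ranks = [ORDER.index(r) for r in results]
    return ORDER[median_low(ranks)]


x_beats_y_a = mean_score(run_x, coding_a) > mean_score(run_y, coding_a)
x_beats_y_b = mean_score(run_x, coding_b) > mean_score(run_y, coding_b)

print("X better than Y by mean, coding A:", x_beats_y_a)
print("X better than Y by mean, coding B:", x_beats_y_b)
print("Median category of X:", median_category(run_x))
print("Median category of Y:", median_category(run_y))
```

A conclusion that changes when the labels are renumbered is a property of the numbering, not of the test results.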

Other Examples My code is 10% smaller than yours • But what about the languages, the comments, the clarity, the performance, etc.? The average response from our customers is “good” [on a scale of very poor, poor, good, very good] • but the scale is not an interval or ratio scale, so what does “average” mean? • Does “half very good and half poor” mean “good”?
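The customer survey question can be made concrete with a small sketch. The ten responses and the 1 to 4 coding are assumptions made for illustration. Averaging the codes reports “good” even though no customer gave that answer; the frequency count shows what the customers actually said.

```python
from collections import Counter
from statistics import mean

SCALE = ["very poor", "poor", "good", "very good"]

# Hypothetical survey: half "very good", half "poor".
responses = ["very good"] * 5 + ["poor"] * 5

# Naive approach: code the categories 1..4 and average the codes.
codes = [SCALE.index(r) + 1 for r in responses]
naive_label = SCALE[round(mean(codes)) - 1]

# Safer summary for an ordinal scale: report the distribution itself.
distribution = Counter(responses)

print("Naive average response:", naive_label)
for category in SCALE:
    print(f"{category:>10}: {distribution[category]}")
```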

An Old Favorite • This year you will get a 10% pay cut • But next year you will get a 20% pay raise • So it will be like giving you 5% raises for two years I don’t know about that
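The pay example is easy to check by working it through. The sketch below assumes a starting salary of 100 for convenience; percentages compound by multiplication, so they cannot simply be added and averaged.

```python
salary = 100.0  # assumed starting salary, chosen for easy reading

# Offer as stated: 10% cut this year, then 20% raise next year.
year1_cut = salary * (1 - 0.10)
year2_after_raise = year1_cut * (1 + 0.20)

# The claimed equivalent: a 5% raise in each of the two years.
year1_raise = salary * 1.05
year2_two_raises = year1_raise * 1.05

# The constant annual raise that really ends at the same salary.
equivalent_annual_raise = (0.90 * 1.20) ** 0.5 - 1

print(f"Cut then raise: ends at {year2_after_raise:.2f}, "
      f"earns {year1_cut + year2_after_raise:.2f} over two years")
print(f"Two 5% raises:  ends at {year2_two_raises:.2f}, "
      f"earns {year1_raise + year2_two_raises:.2f} over two years")
print(f"Equivalent annual raise: {equivalent_annual_raise:.2%}")
```

The two offers differ both in the final salary and, by a wider margin, in the total earned over the two years.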

2) Are Means Meaningful? The mean or average is a statistical concept that may have no meaning in a real situation Consider some well known examples: “The average family has 2.4 children” “The average worker is 63% male and 37% female” “The average car has 3.4 customer complaints in the first 3 months of ownership”

Meaningless Means • If my family is average, then I have 2.4 children • I will buy 2.4 sets of clothing for the children • I have made my 3 complaints but haven’t made the .4 yet • That average employee must be an interesting medical specimen

Another Meaningless Mean • Half of the people think we should turn left • And half of the people think we should turn right • So we will average and go straight ahead

Meaningful Means • Clearly, averages have some statistical validity and can be useful in some situations, such as: • Determining how big to make schools • Evaluating child health care costs • Comparing cars for reliability • Evaluating diversity in the workplace • But clearly they also have no validity in other situations

3) The Problem of Small Sample Size • Suppose you have a large population and you want to determine its properties by selecting a “typical” sample population of size “n”. • What conclusions can you draw from this sample? • How reliable are those conclusions?

Small n vs. Large n • Many statistical properties only apply to “large n”, where individual quirks can be smoothed. • A sample size of n < 17 is generally considered “small” • For many cases, n must be much larger than this WHY? Because individual items have undue impact on the results when n is small.
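The undue impact of individual items can be illustrated with a short sketch. The values are hypothetical: every item measures 20 except one unusual item that measures 200. The function name `shift_from_one_outlier` is invented for this example.

```python
from statistics import mean


def shift_from_one_outlier(n: int, typical: float = 20.0,
                           outlier: float = 200.0) -> float:
    """How far one unusual item moves the mean of a sample of size n."""
    baseline = [typical] * n
    with_outlier = [typical] * (n - 1) + [outlier]
    return mean(with_outlier) - mean(baseline)


for n in (5, 10, 100, 1000):
    print(f"n = {n:>4}: one outlier shifts the mean by "
          f"{shift_from_one_outlier(n):.2f}")
```

The shift equals the outlier's deviation divided by n, so the same quirk that dominates a sample of 10 almost disappears in a sample of 1000.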

Example “40% of the students have blonde hair” • Suppose your population size is 1000 • If your sample size is 100, and 40 of them are blonde, this is a reasonable conclusion • If your sample size is 5 and 2 have blonde hair, this is a much less reliable conclusion
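The difference in reliability between the two samples can be quantified with a confidence interval for a proportion. The sketch below uses the Wilson score interval at roughly 95% confidence (z = 1.96). That choice of method is an assumption made for illustration, and the calculation ignores the finite population of 1000, so treat the numbers as approximate.

```python
from math import sqrt


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


lo_100, hi_100 = wilson_interval(40, 100)  # 40 blondes in a sample of 100
lo_5, hi_5 = wilson_interval(2, 5)         # 2 blondes in a sample of 5

print(f"n = 100: between {lo_100:.0%} and {hi_100:.0%}")
print(f"n =   5: between {lo_5:.0%} and {hi_5:.0%}")
```

Both samples show 40% blonde, but the sample of 5 is consistent with a far wider range of true proportions than the sample of 100.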

Misuse of Statistics for Small Sample Sizes “We measured 10 programs and concluded that our typical program has 23.7 defects per 1000 lines of code” • Statistically speaking, can you draw a meaningful conclusion from only 10 programs? • What percent is this of the total population? 100%? 10%? 1%?

4) Are the Variables Independent? • Many standard statistical manipulations assume independent variables • But many software engineering situations have variables that influence each other and thus are dependent

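Example: Comparing A and B
Factor               Rating A   Rating B
Clarity of Code      2.5        2.2
Complexity of Code   3.4        3.0
Size of Code         2.0        2.6
Total                7.9        7.8
A is “better”, but only if the three factors are independent. Clarity, complexity, and size influence each other, so adding the ratings may count the same effect more than once.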
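Before totaling factors like these, it is worth checking whether they move together. The sketch below computes a Pearson correlation coefficient by hand for two of the factors. The six modules and their size and complexity values are hypothetical data invented for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical measurements for six modules.
size_kloc = [1.2, 3.4, 0.8, 5.1, 2.2, 4.0]
complexity = [4, 11, 3, 17, 7, 12]

r = pearson(size_kloc, complexity)
print(f"Correlation between size and complexity: {r:.2f}")
```

A coefficient close to 1 suggests that size and complexity largely describe the same underlying property in this data, so a total that adds both would weight that property twice.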

Dangers of Ignoring Measurement Theory • We attach undue credibility to numbers that may be meaningless or at least much less meaningful than we think they are • We delude ourselves into thinking we have a sound basis for decisions • We may reach wrong conclusions because we misunderstand what the numbers tell us

Issues with Measuring Software

Issues with Software • Software is not bound by the laws of physics or hardware constraints • Be careful not to rely on hardware measurement where the theory assumes limits to physical behavior • E.g., if you increase input by .0001%, output changes by 10000000000%. Rarely possible in hardware, but very easy in software.
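The sensitivity described above takes only one conditional to produce. In the sketch below, the function `response_size`, its threshold of 100.0, and its two output values are all hypothetical; the point is only that a branch makes the output discontinuous in the input.

```python
def response_size(reading: float) -> int:
    """Hypothetical software rule with a hard threshold at 100.0."""
    if reading > 100.0:
        return 1_000_000  # e.g. trigger a full data dump
    return 1              # e.g. emit a single status record


before = response_size(100.0)
after = response_size(100.0 * (1 + 0.0001 / 100))  # input raised by 0.0001%

change_percent = (after - before) / before * 100
print(f"Output before: {before}")
print(f"Output after:  {after}")
print(f"Output change: {change_percent:,.0f}%")
```

Measurement techniques that assume small input changes produce small output changes do not hold for behavior like this.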

Issues with Software (continued) • Many software products are NOT code • specifications • tests • user guides • etc. • If you only measure the code, you will probably not really understand your software or its development process

Summary • Learn the principles of measurement theory • Understand what attribute you are measuring before you start to measure • Don’t assume the properties of the number system apply to the attribute being measured • Beware of misuse of means • Beware of small “n” • Beware of dependent variables

References • Department of Defense, Joint Logistics Commanders Joint Group on Systems Engineering, Practical Software Measurement, a Guide to Objective Program Insight (version 2.1), Naval Undersea Warfare Center, c/o John McGarry, mcgarry@ada.npt.navy.mil. • Fenton, Norman E. Software Metrics: A Rigorous Approach, Chapman & Hall, London SE1 8HN, 1991. ISBN 0-442-31355-1.

END OF MODULE 22
