Software Measurement

Software Measurement UCLA Computer Science Department CS130 Winter, 2002

Reference • Material in this lecture is taken from chapters 1-3 of Software Metrics: A Rigorous and Practical Approach (2nd ed.), Norman E. Fenton and Shari Lawrence Pfleeger, 1997, PWS Publishing Company, Boston, MA, ISBN 0534954251

Overview • Measurement – what is it and why do we do it? • Measurement basics • A goal-based software measurement framework

Measurement – What Is It and Why Do We Do It? • Measurement in Everyday Life • Measurement in Software Engineering • The Scope of Software Metrics

Measurement in Everyday Life • Measurement governs many aspects of everyday life: • Economic indicators determine prices, pay raises • Medical system measurements enable diagnosis of specific illnesses • Measurements in atmospheric systems are the basis of weather prediction

Measurement in Everyday Life • How do we use measurement in our lives? • In a shop, price is a measure of the value of an item, and we calculate the bill to make sure we get the correct change. • Height and size measurements ensure clothing will fit correctly. • When traveling, we calculate distance, choose a route, measure speed, and predict when we’ll arrive • Measurement helps us to: • Understand our world • Interact with our surroundings • Improve our lives

Measurement in Everyday Life • What is Measurement? • Common thread in previous examples – some aspect of a thing is assigned a descriptor that allows us to compare it with other things. • More formally – the process by which • Numbers or symbols are assigned to attributes of entities in the real world in such a way as to describe them. • According to clearly defined rules.

Measurement in Everyday Life • Definition of measurement process is far from clear cut. • To understand measurement, must ask questions that are difficult to answer: • In a room with blue walls, is “blue” a measure of the color of the room? • A person’s height is a commonly understood attribute that can be easily measured. What about other attributes of people, such as intelligence? • Some measurements (e.g., intelligence, wine quality) may have wide error margins – is this a reason to reject them? • How do we decide which error margins are acceptable and which are not? • When is a measurement scale acceptable for the purpose to which it is put (e.g., is it appropriate to measure a person’s height in kilometers)? • What types of manipulations can we apply to the results of measurement? • Material in next section (Measurement Basics) will allow us to answer these questions.

Measurement in Everyday Life • Making Things Measurable • “What is not measurable, make measurable” (Galileo Galilei) • One aim of science is to find ways of measuring attributes of things we’re interested in. • Measurement makes concepts more visible, therefore more understandable and controllable. • Attributes previously thought to be unmeasurable now form the basis for decisions affecting our lives (e.g., air quality, inflation index). • Measuring the unmeasurable improves understanding of particular entities and attributes • The act of proposing a particular measure can open discussion that will lead to greater understanding • Making a new measurement may require modifying the environment or practices (e.g., using a new tool, adding a step in a process)

Measurement in Everyday Life • Measurement in Software Engineering • In many instances, measurement is considered a luxury. For many projects: • Measurable targets are not set (e.g., products are supposed to be user-friendly, reliable, and maintainable, but we don’t quantify what that means). • The component costs of projects are not quantified or understood. • Product quality is not quantified. • Too much reliance on anecdotal evidence (e.g., try our product and you’ll improve your productivity by 50%!). Most of the time, there’s no measurable basis for the claims.

Measurement in Everyday Life • Measurement in Software Engineering (cont’d) • When measurements are made, they tend to be: • Incomplete • Inconsistent • Infrequent • Most of the time, we’re not told anything about: • How experiments were designed • What was measured and how • Realistic error margins • Without this information, can’t decide whether to apply results to a development effort, and can’t do an objective study to repeat the measurements. • Lack of measurement in SW engineering is compounded by lack of a rigorous approach.

Measurement in Everyday Life • Software Measurement Objectives • Assessing status • Projects • Products for a specific project or projects • Processes • Resources • Identifying trends • Need to be able to differentiate between a healthy project and one that’s in trouble • Determine corrective action • Measurements should indicate the appropriate corrective action, if any is required.

Measurement in Everyday Life • Types of information required to understand, control, and improve projects: • Managers • What does the process cost? • How productive is the staff? • How good is the code? • Will the customer/user be satisfied? • How can we improve? • Engineers • Are the requirements testable? • Have all the faults been found? • Have the product or process goals been met? • What will happen in the future?

Measurement in Everyday Life • The Scope of Software Metrics • Cost and effort estimation • Productivity measures and models • Data collection • Quality models and measures • Reliability models • Performance evaluation and models • Structural and complexity metrics • Capability-maturity assessment • Management by metrics • Evaluation of methods and tools

Measurement in Everyday Life • The Scope of Software Metrics – some details • Cost and effort estimation • Motivation – accurately predict costs early in the development life cycle. • Numerous empirical cost models have been developed • COCOMO, COCOMO 2 • Putnam’s model (see Pressman Ch 3) • ...

Measurement in Everyday Life • The Scope of Software Metrics – some details • Productivity models and measures • Estimate staff productivity to determine how much specified changes will cost • Naive measure – size divided by effort. Doesn’t take into account things like defects, functionality, reliability. • More comprehensive models have been developed – next slide illustrates a possible model.

Measurement in Everyday Life • The Scope of Software Metrics – some details • Possible productivity model • [Figure: tree diagram of a productivity model. Node labels recovered from the slide: Productivity, Value, Cost, Personnel, Resources, Complexity, Quality, Quantity, Time, Money, HW, SW, Environmental constraints, Problem difficulty, Reliability, Defects, Size, Functionality.]

Measurement in Everyday Life • The Scope of Software Metrics – some details • Software quality model • [Figure: software quality model. Column headings recovered from the slide: Use, Factor, Criteria, Metrics. Labels: Product Operation, Product Revision, Usability, Reliability, Efficiency, Reusability, Maintainability, Portability, Testability, Communicativeness, Accuracy, Consistency, Device Efficiency, Accessibility, Completeness, Structuredness, Conciseness, Device Independence, Legibility, Self-descriptiveness, Traceability.]

Measurement Basics • Overview • The representational theory of measurement • Measurement and models • Measurement scales and scale types • Meaningfulness in measurement

Measurement Basics • Overview • Understanding of software attributes is not as deep as understanding of non-software attributes (e.g., length, weight, temperature) • Questions that are relatively easy to answer for non-software entities are difficult for software: • How much must we know about an attribute before it’s reasonable to consider measuring it (e.g., program complexity)? • How do we know if we’ve really measured the attribute we want to measure? Does a count of the number of defects found in a system measure its quality, or does it measure something else? • Using measurement, what meaningful statements can we make about an attribute and the entities that possess it (e.g., can we talk about doubling a design’s quality)? • What meaningful operations can we perform on measures (e.g., can we compute the average productivity of a group of developers, or the average quality of a set of modules)? • Answering these questions requires developing a theory of measurement

Measurement Basics • The representational theory of measurement • Developed as a classical discipline from the physical sciences • Provides rules for: • Making consistent measurements • Interpreting data resulting from measurement • Representational theory of measurement formalizes intuition about the way the world works.

Measurement Basics • Empirical relations • Data obtained as measures should represent attributes of observed entities • Manipulating data should preserve observed relationships • Example – “Taller than” • Binary relation defined on the set of pairs of people. Either • A is taller than B, or • B is taller than A • Empirical relations are not restricted to binary relations – can be unary (e.g., A is tall), ternary (A sitting on B’s shoulders is taller than C), etc.

Measurement Basics • Empirical relations (cont’d) • Empirical relations are mappings from the empirical, real world to a formal mathematical world. • Height – maps a set of people to the set of real numbers • Greater functionality (from survey results) • x has greater functionality than y if (x,y) > 60%. Relation is (C,A), (C,B), (C,D), (A,B), (A,D). • Surveys can help gain preliminary understanding of relationships.

Measurement Basics • Empirical relations (cont’d) • Definitions • Measurement – a mapping from the empirical world to the formal, relational world. • Measure – number or symbol assigned to an entity by the mapping in order to characterize an attribute.

Measurement Basics • Rules of Mapping • Measures must specify domain and range as well as the rule for performing the mapping • Domain – real world is domain of mapping that defines the measurement • Range – the mathematical world into which real-world attributes are mapped • Examples • Measuring height: • Is height measured in inches, centimeters, feet? • Are people measured sitting or standing? • Are shoes allowed to be worn during the measurement? • Measuring lines of code • Are lines of code reused without change counted? • Are non-executable lines counted? • Declarations • Compiler Directives • Comments • Blank lines
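The lines-of-code questions above can be made concrete with a small sketch. It assumes Python-style source in which a comment line starts with `#`; the function name `count_loc`, its flags, and the sample text are illustrative and not from the lecture. The point is that the same program yields different counts under different, explicitly stated, mapping rules.

```python
def count_loc(source: str, count_blank: bool = False, count_comments: bool = False) -> int:
    """Count lines of code under an explicitly stated counting rule.

    Assumption: a comment line is one whose first non-blank character is '#'.
    Lines that mix code and a trailing comment are counted as code.
    """
    total = 0
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            # Blank line: counted only if the rule says so.
            if count_blank:
                total += 1
        elif stripped.startswith("#"):
            # Comment-only line: counted only if the rule says so.
            if count_comments:
                total += 1
        else:
            total += 1
    return total


# Hypothetical five-line program: 1 comment line, 1 blank line, 3 code lines.
SAMPLE = """# compute a sum
total = 0

for i in range(3):
    total += i  # accumulate
"""

if __name__ == "__main__":
    print(count_loc(SAMPLE))                                        # 3
    print(count_loc(SAMPLE, count_blank=True))                      # 4
    print(count_loc(SAMPLE, count_blank=True, count_comments=True)) # 5
```

Each flag combination is a different measure of "length"; none is wrong, but results are only comparable when the rule is reported with the number.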

Measurement Basics • The representation condition • Behavior of measures in number system needs to be the same as corresponding elements in the real world. • Formally, a measurement mapping M must map entities into numbers and empirical relations into numerical relations in such a way that: • Empirical relations preserve numerical relations • Empirical relations are preserved by numerical relations

Measurement Basics • The representation condition – example • Taller than: • A is taller than B iff M(A) > M(B), where M is a mapping from the empirical world to the real numbers. • Whenever Joe is taller than Frank, then M(Joe) must be a bigger number than M(Frank) • Jane can be mapped to a bigger number than John only if Jane is taller than John.

Measurement Basics • The representation condition – example 2 • Software failures criticality • Three types of failures examined: • Delayed response • Incorrect output • Data loss • At this point, we have a relation system consisting of 3 unary relations • R1 for delayed response • R2 for incorrect output • R3 for data loss • With this information, we can’t yet judge the relative criticality of these types of failures.

Measurement Basics • The representation condition – example 2 (cont’d) • We can find a representation in the set of real numbers by choosing three distinct numbers: • M(delayed response) = 6 • M(incorrect output) = 4 • M(data loss) = 50 • Further investigation of criticality reveals that data loss is more critical than incorrect output, which in turn is more critical than a delayed response. • To develop a real-number representation for this enriched relation, we must be more careful in assigning numbers. • Using “>” to mean “more critical than”, data-loss failures must be mapped to a higher number than incorrect-output failures, which in turn must be mapped to a higher number than delayed responses.
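A minimal sketch of checking the representation condition for the enriched "more critical than" relation. The helper name `satisfies_representation` and the repaired value of 1 for delayed response are assumptions made for illustration; the values 6, 4, and 50 are the ones given on the slide.

```python
from itertools import permutations


def satisfies_representation(measure, more_critical_than):
    """Check: x is more critical than y  <=>  measure[x] > measure[y],
    for every ordered pair of distinct failure types."""
    return all(
        ((x, y) in more_critical_than) == (measure[x] > measure[y])
        for x, y in permutations(measure, 2)
    )


# Empirical relation from the slide, with the transitive pair made explicit.
MORE_CRITICAL = {
    ("data loss", "incorrect output"),
    ("incorrect output", "delayed response"),
    ("data loss", "delayed response"),
}

# Mapping chosen before the ordering was known (three distinct numbers).
first_attempt = {"delayed response": 6, "incorrect output": 4, "data loss": 50}

# Hypothetical repaired mapping that respects the ordering.
repaired = {"delayed response": 1, "incorrect output": 4, "data loss": 50}

if __name__ == "__main__":
    print(satisfies_representation(first_attempt, MORE_CRITICAL))  # False
    print(satisfies_representation(repaired, MORE_CRITICAL))       # True
```

The first mapping was adequate while the relation system held only three unary relations; it fails once the ordering is added because incorrect output (4) is mapped below delayed response (6).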

Measurement Basics • The representation condition (cont’d) • There may be many different measures for a given attribute (e.g., in., cm., furlongs). • Any measure satisfying the representation condition is a valid measure • The richer the empirical relation system, the fewer the valid measures • Relational systems are rich if they have a large number of relations that can be defined. • As the number of empirical relations increases, so does the number of conditions a measurement mapping must satisfy in its representation condition.

Measurement Basics • Measurement and models • Model – an abstraction of reality allowing us to: • Strip away unnecessary detail • View an entity or concept from a particular perspective • Representation condition requires every measure to be associated with a model of how the measure maps real world entities and attributes to elements of a numerical system. These models are essential in: • Understanding how measure is derived • Interpreting behavior of numerical elements when we return to the real world.

Measurement Basics • Defining Attributes • Always a temptation to focus too much on the formal, mathematical system, rather than on the empirical system. • Before we set out to measure something (e.g., program complexity), we need to: • Identify a set of characteristics of the thing we’re trying to measure • Define a model that associates the characteristics • We can then define measures for each characteristic, and use the representation condition to help understand the relationships.

Measurement Basics • Direct and Indirect Measurement • Direct measure – relates an attribute to a number or symbol without reference to any other object or attribute (e.g., height). • Indirect measure • Used when an attribute must be measured by combining several of its aspects (e.g., density) • Requires a model of how measures are related to each other

Measurement Basics • Direct and Indirect Measures for Software – examples • Direct • Length of source code (lines of code) • Duration of testing process • Number of defects discovered during test • Time a developer spends on a project • Indirect • Programmer productivity (LOC/workmonths of effort) • Module defect density (number of defects/module size) • Defect detection efficiency (# defects detected/total defects) • Requirements stability (initial # requirements/total # requirements) • Test effectiveness ratio (number of items covered/total number of items) • System spoilage (effort spent fixing faults/total project effort)
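The indirect measures listed above are all ratios of direct measures, which a short sketch can show. All project figures below are hypothetical, and the helper `ratio` is an illustrative name; expressing size in KLOC for defect density is an assumption, since the slide only says "module size".

```python
def ratio(numerator: float, denominator: float) -> float:
    """Combine two direct measures into an indirect one; reject a zero or negative base."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return numerator / denominator


# Hypothetical direct measures for one project (not from the lecture).
loc = 12_000                 # lines of code
effort_workmonths = 24       # workmonths of effort
defects_in_test = 90         # defects discovered during test
total_defects = 120          # defects found in test plus after release

productivity = ratio(loc, effort_workmonths)                # LOC per workmonth
defect_density = ratio(defects_in_test, loc / 1000)         # defects per KLOC (assumed unit)
detection_efficiency = ratio(defects_in_test, total_defects)

if __name__ == "__main__":
    print(productivity)          # 500.0
    print(defect_density)        # 7.5
    print(detection_efficiency)  # 0.75
```

Each result inherits the counting rules of its inputs, so an indirect measure is only as well defined as the direct measures and the model that combine to produce it.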

Measurement Basics • Measurement for prediction • So far we’ve talked about measuring some entity that already exists • Useful for assessing the current situation or understanding what has happened in the past • In many cases, we want to predict an attribute of an entity that doesn’t yet exist (e.g., project cost, reliability of fielded system). • Requires a model relating measurements that can be taken now to the attributes to be predicted • Empirical cost models • Software reliability models • A model is not sufficient by itself to perform the required prediction. Need a prediction system including: • A model relating the measurements to the desired attribute • A procedure for determining model parameters • Procedures for interpreting model results
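A rough sketch of the three parts of a prediction system. It assumes a power-law model, effort = a * size^b, similar in form to empirical cost models such as COCOMO, and it assumes least squares on log-transformed data as the parameter procedure. The "past project" data are synthetic, generated from known parameters (a = 2.5, b = 1.1) only so the fit can be checked; they are not real measurements.

```python
import math


def fit_power_law(sizes, efforts):
    """Parameter procedure: least-squares fit of log(effort) = log(a) + b * log(size)."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(e) for e in efforts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


def predict_effort(a, b, size):
    """Model: effort = a * size ** b."""
    return a * size ** b


# Synthetic measurements of past projects (size in KLOC, effort in workmonths).
past_sizes = [10, 20, 40, 80]
past_efforts = [2.5 * s ** 1.1 for s in past_sizes]

a, b = fit_power_law(past_sizes, past_efforts)

if __name__ == "__main__":
    # Interpretation step: report the estimate for a new 30 KLOC project.
    print(f"a = {a:.3f}, b = {b:.3f}")
    print(f"predicted effort for 30 KLOC: {predict_effort(a, b, 30):.1f} workmonths")
```

With real project data the fit would not be exact, and the interpretation step would also need to state how much error to expect around the estimate.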

Measurement Basics • Measurement for prediction • Accurate predictive measurement is always based on measurement in the assessment sense • Everyone wants to predict key determinants of success (e.g., effort to build a new system, operational reliability), but... • There are no magic models. They all depend on: • High-quality measurements of past projects • High-quality measurements of current project

Measurement Basics • Measurement scales and scale types • A measurement scale is our mapping, M, together with the empirical and numerical relation systems. • If the relation systems (domain and range) are obvious from context, sometimes M alone is referred to as the scale. • Three important questions concerning representations and scales: • How do we determine when one numerical relation system is preferable to another? • How do we know if a particular empirical relation system has a representation in a given numerical relation system? • What do we do when we have several different possible representations (and hence many scales) in the same numerical relation system?

Measurement Basics • Measurement scales and scale types (cont’d) • Three questions: • How do we determine when one numerical relation system is preferable to another? • Answer: We can map the scale to a symbolic relational system. In practice, this can be unwieldy (symbolic vs. numerical manipulation). We try to use real numbers whenever possible. • How do we know if a particular empirical relation system has a representation in a given numerical relation system? • Answer: This is known as the representation problem, one of the basic problems of measurement theory. This is a solved problem for various types of relation systems characterized by specific axioms. Discussion is beyond the scope of this course, but solutions can be found in texts on measurement theory. • What do we do when we have several different possible representations (and hence many scales) in the same numerical relation system? • Answer: This is the uniqueness problem. Following slides address this question.

Measurement Basics • Measurement scale types • Nominal • Ordinal • Interval • Ratio • Absolute • One relational system is richer than another if all relationships in the second system are contained in the first. • Scale types above are listed in order of increasing richness.

Measurement Basics • Measurement scale types (cont’d) • Why is this important? • If we have a satisfactory measure for an attribute with respect to an empirical relation system, we want to know what other measures exist that are acceptable. • Mapping from one acceptable measure to another is called an admissible transformation. • Example – when considering length, admissible transformations are of the form M’=aM, where a>0. Transformations of the form M’=b+aM (with b≠0) or M’=aM^b (with b≠1) are not admissible. • The more restrictive the class of admissible transformations, the more sophisticated the measurement scale.

Measurement Basics • Nominal scale • Most primitive form of measurement – define classes or categories, and place each entity in a particular class or category • Two major characteristics • Empirical relation system consists only of different classes – no notion of ordering • Any distinct number or symbolic representation is an acceptable measure – no notion of magnitude associated with numbers or symbols. • Any two mappings, M and M’, will be related to each other in that M’ can be obtained from M by a one-to-one mapping • Example – software faults can belong to one of the following classes, according to where they were first introduced during development: • Specification • Design • Code

Measurement Basics • Measurement types and scale • Ordinal scale • Augments nominal scale with ordering information. • Three major characteristics • Empirical relation system consists of classes that are ordered with respect to the attribute • Any mapping preserving the ordering (i.e., a monotonic function) is acceptable • Numbers represent ranking only, so arithmetic operations have no meaning • Set of admissible transformations is set of all monotonic mappings • Example – software “complexity” – two valid measures

Measurement Basics • Measurement type and scale • Interval scale • Captures information about size of intervals that separate classes. • Three characteristics • Preserves order • Preserves differences, but not ratios • Addition and subtraction are acceptable, but not multiplication and division • Class of admissible transformations is the set of affine transformations: M’=aM+b, where a>0. • Example – software complexity – suppose the difference in complexity between a trivial and a simple system is the same as that between a simple and a moderate system. Where this equal step applies to each class, we have an attribute measurable on an interval scale.
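A small numeric sketch of why an interval scale preserves differences but not ratios. The three scale values and the transformation constants (a = 2, b = 10) are arbitrary choices for illustration; any affine map M’=aM+b with a>0 would behave the same way.

```python
def affine(value, a=2.0, b=10.0):
    """An admissible interval-scale transformation M' = a*M + b with a > 0."""
    return a * value + b


# Hypothetical interval-scale values for three complexity classes.
trivial, simple, moderate = 1.0, 2.0, 3.0
t2, s2, m2 = affine(trivial), affine(simple), affine(moderate)   # 12.0, 14.0, 16.0

# Ratios of differences survive the change of scale...
diff_ratio_before = (moderate - simple) / (simple - trivial)
diff_ratio_after = (m2 - s2) / (s2 - t2)

# ...but ratios of the values themselves do not.
value_ratio_before = moderate / trivial
value_ratio_after = m2 / t2

if __name__ == "__main__":
    print(diff_ratio_before, diff_ratio_after)     # 1.0 1.0
    print(value_ratio_before, value_ratio_after)   # 3.0 versus about 1.33
```

So "the step from simple to moderate equals the step from trivial to simple" holds on either scale, while "moderate is three times as complex as trivial" holds on one and not the other.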

Measurement Basics • Measurement type and scale • Ratio scale • Most useful scale, common in physical sciences – captures information about ratios • 4 characteristics • Preserves ordering, size of intervals between entities, and ratios between entities • There is a zero element, representing total lack of the attribute • Measurement mapping must start at 0 and increase at equal intervals (units) • All arithmetic can be meaningfully applied to classes in the range of the mapping. • Acceptable transformations are ratio transformations – M’=aM, where a is a positive scalar. • Example – program length can be measured by lines of code, number of characters, etc. Number of characters may be obtained by multiplying the number of lines by the average number of characters per line.

Measurement Basics • Measurement type and scale • Absolute scale • Most restrictive in terms of admissible transformations • For any two measures, M and M’, there’s only one admissible transformation (identity transformation), since there’s only one way to make the measurement. • 4 characteristics • Measurement is made simply by counting the number of elements in the entity set. • Attribute always takes the form of “number of occurrences of x in the entity” • Only one possible measurement mapping, namely the actual count • All arithmetic analysis of the resulting count is meaningful. • Example – lines of code in a module is an absolute scale measure.

Measurement Basics • Measurement type and scale - summary

Measurement Basics • Meaningfulness in measurement • After making measurements, key question is “can we deduce meaningful statements about entities being measured?” • Harder to answer than it first appears – consider these statements: • The number of errors discovered during the integration testing of a program X was at least 100 • The cost of fixing each error in program X is at least 100 • A semantic error takes twice as long to fix as a syntactic error • A semantic error is twice as complex as a syntactic error

Measurement Basics • Meaningfulness in measurement (cont’d) • First statement seems to make sense • Second statement doesn’t make sense – number of errors may be specified without reference to a particular scale, but cost to fix them must be • Statement 3 seems sensible – the ratio of time taken is the same, whether time is measured in seconds, hours, or fortnights • Statement 4 does not appear to be meaningful and requires clarification: • If complexity means time to understand the error, then it makes sense • Other definitions of complexity may not admit measurement on a ratio scale (e.g., examples in previous slides), in which case statement 4 is meaningless.

Measurement Basics • Meaningfulness in measurement • Definition: a statement involving measurement is meaningful if its truth value is invariant of transformations of allowable scales.
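The definition can be illustrated with a sketch that tests whether a statement keeps its truth value across a few order-preserving transformations of an ordinal scale. The data, the helper `is_invariant`, and the chosen transformations are all hypothetical. Checking a finite list can show that a statement is not meaningful; it cannot prove invariance over every admissible transformation.

```python
from statistics import mean, median


def is_invariant(statement, groups, transformations):
    """True if `statement` has the same truth value under every listed transformation."""
    baseline = statement(*groups)
    return all(
        statement(*[[t(v) for v in group] for group in groups]) == baseline
        for t in transformations
    )


# Hypothetical ordinal ratings (e.g., complexity classes) for two sets of modules.
group_a = [1, 1, 5]
group_b = [2, 2, 2]


def identity(v):
    return v


def relabel(v):
    # Order-preserving relabelling of the classes: 1 < 2 < 5 becomes 1 < 2 < 3.
    return {1: 1, 2: 2, 5: 3}[v]


def cube(v):
    # Strictly increasing, so it also preserves the ordering.
    return v ** 3


MONOTONIC = [identity, relabel, cube]


def mean_a_exceeds_b(a, b):
    return mean(a) > mean(b)


def median_a_exceeds_b(a, b):
    return median(a) > median(b)


if __name__ == "__main__":
    print(is_invariant(mean_a_exceeds_b, (group_a, group_b), MONOTONIC))    # False
    print(is_invariant(median_a_exceeds_b, (group_a, group_b), MONOTONIC))  # True
```

The comparison of means flips under `relabel` (7/3 > 2 becomes 5/3 < 2), so it is not meaningful on an ordinal scale; the comparison of medians gives the same answer under all three transformations tried here.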
