
CS 540 – Quantitative Software Engineering



  1. CS 540 – Quantitative Software Engineering Lecture 7: Estimation • Estimate size, then • Estimate effort, schedule and cost from size • Bound estimates

  2. Project Metrics: Why Estimate? • Cost and schedule estimation • Measure progress • Calibrate models for future estimation • Manage Project Scope • Make Bid/No Bid decisions • Make Buy/Build decisions

  3. QSE Lambda Protocol • Prospectus • Measurable Operational Value • Prototyping or Modeling • sQFD • Schedule, Staffing, Quality Estimates • ICED-T • Trade-off Analysis

  4. Specification for Development Plan • Project • Feature List • Development Process • Size Estimates • Staff Estimates • Schedule Estimates • Organization • Gantt Chart

  5. Approaches to Cost Estimation • By expert • By analogies • Decomposition • Parkinson’s Law: work expands to fill the time available • Price to win / customer willingness-to-pay • Lines of Code • Function Points • Mathematical Models: Function Points & COCOMO

  6. Heuristics to do Better Estimates • Decompose Work Breakdown Structure to lowest possible level and type of software. • Review assumptions with all stakeholders • Do your homework - past organizational experience • Retain contact with developers • Update estimates and track new projections (and warn) • Use multiple methods • Reuse makes it easier (and more difficult) • Use ‘current estimate’ scheme

  7. Heuristics to Cope with Estimates • Add and train developers early • Use gurus for tough tasks • Provide manufacturing and admin support • Sharpen tools • Eliminate unrelated work and red tape (50% issue) • Devote full time end user to project • Increase level of exec sponsorship to break new ground (new tools, techniques, training) • Set a schedule goal date but commit only after detailed design • Use broad estimation ranges rather than single point estimates

  8. Popular Methods for Effort Estimation • Parametric Estimation • Wideband Delphi • COCOMO • SLIM (Software Lifecycle Management) • SEER-SEM • Function Point Analysis • PROBE (Proxy-Based Estimating, from the SEI Personal Software Process) • Planning Game (XP): Explore-Commit • Program Evaluation and Review Technique (PERT)

  9. SEER-SEM: System Evaluation and Estimation of Resources • Sizing. How large is the software project being estimated (Lines of Code, Function Points, Use Cases, etc.)? • Technology. What is the possible productivity of the developers (capabilities, tools, practices, etc.)? • Effort and Schedule Calculation. What amount of effort and time are required to complete the project? • Constrained Effort/Schedule Calculation. How does the expected project outcome change when schedule and staffing constraints are applied? • Activity and Labor Allocation. How should activities and labor be allocated into the estimate? • Cost Calculation. Given expected effort, duration, and the labor allocation, how much will the project cost? • Defect Calculation. Given product type, project duration, and other information, what is the expected, objective quality of the delivered software? • Maintenance Effort Calculation. How much effort will be required to adequately maintain and upgrade a fielded software system? • Progress. How is the project progressing, where will it end up, and how should it be replanned? • Validity. Is this development achievable based on the technology involved?

  10. Wideband Delphi • Convene a group of experts • Coordinator provides each expert with the spec • Experts make private estimates in interval format: a most likely value with an upper and lower bound • Coordinator prepares a summary report indicating group and individual estimates • Experts discuss and defend their estimates • Group iterates until consensus is reached (one way to combine the interval estimates is sketched below)
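The slide leaves open how the coordinator summarizes the experts' interval estimates. As one illustrative possibility (an assumption, not something prescribed in the lecture), a PERT-style beta mean can condense each expert's (low, most likely, high) triple and show how far apart the experts are:

```python
# Illustrative only: one way a Wideband Delphi coordinator might summarize a round.
# Each expert submits (low, most_likely, high) staff-month estimates; a PERT-style
# beta mean weights the most likely value 4x (this weighting is an assumption).

def summarize_round(estimates):
    expert_means = [(low + 4 * likely + high) / 6 for (low, likely, high) in estimates]
    group_mean = sum(expert_means) / len(expert_means)
    spread = max(expert_means) - min(expert_means)
    return group_mean, spread

round_1 = [(8, 12, 20), (10, 14, 18), (6, 9, 15)]   # three hypothetical experts
mean, spread = summarize_round(round_1)
print(f"group estimate ≈ {mean:.1f} staff-months, spread {spread:.1f}")
# The coordinator reports this back; experts defend and revise until the spread converges.
```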

  11. Minimum Time: PERT and GANTT

  12. Boehm: “A project cannot be done in less than 75% of the theoretical time.” T_theoretical = 2.5 × (staff-months)^(1/3). [Chart: staff-months vs. time, showing the T_theoretical curve, the 75% × T_theoretical boundary below which designs are impossible, and the region of linear staff-month increase beyond it.] But how can I estimate staff-months?

  13. Sizing Software Projects • Effort = (productivity)^-1 × (size)^c, where productivity ≡ KLOC per staff-month and size ≡ KLOC. [Chart: staff-months vs. lines of code or function points.]

  14. Understanding the equations. Consider a transaction project of 38,000 lines of code; what is the shortest time it will take to develop? Module development is about 400 SLOC/staff-month. Effort = (productivity)^-1 × (size)^c = (1/0.400 KSLOC/SM) × (38 KSLOC)^1.02 = 2.5 × (38)^1.02 ≈ 100 SM. Min time = 0.75 × T_theoretical = (0.75)(2.5)(SM)^(1/3) ≈ 1.875 × (100)^(1/3) ≈ 1.875 × 4.63 ≈ 9 months. (This calculation is sketched in code below.)
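A minimal Python sketch of the calculation on this slide, using only the values stated there: 400 SLOC/staff-month productivity, exponent c = 1.02 for a transaction project, and Boehm's 75%-of-theoretical minimum time.

```python
# Sketch of slide 14's arithmetic; constants come from the slide, not a calibrated model.

def effort_staff_months(ksloc, productivity_ksloc_per_sm=0.400, c=1.02):
    # Effort = (productivity)^-1 * (size)^c
    return (1.0 / productivity_ksloc_per_sm) * ksloc ** c

def minimum_time_months(staff_months):
    # T_theoretical = 2.5 * (staff-months)^(1/3); minimum schedule ≈ 75% of theoretical
    return 0.75 * 2.5 * staff_months ** (1.0 / 3.0)

effort = effort_staff_months(38)                 # 38 KSLOC transaction project
print(round(effort))                             # -> 102, which the slide rounds to ≈ 100 SM
print(round(minimum_time_months(effort), 1))     # -> 8.8, which the slide rounds to ≈ 9 months
```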

  15. Productivity = f(size). [Chart: productivity (function points per staff-month) vs. function points, comparing Bell Laboratories data with Capers Jones data.]

  16. Lines of Code • LOC ≡ Line of Code • KLOC ≡ Thousands of LOC • KSLOC ≡ Thousands of Source LOC • NCSLOC ≡ New or Changed Source LOC

  17. Bernstein’s rule of thumb Productivity per staff-month: • 50 NCSLOC for OS code (or a real-time system) • 250-500 NCSLOC for intermediary applications (high risk, on-line) • 500-1000 NCSLOC for normal applications (low risk, on-line) • 10,000-20,000 NCSLOC for reused code. Reuse note: sometimes code that does not provide the exact functionality needed can still be reused by reformatting its input/output; this costs some performance but dramatically shortens development time. (A rough estimator built on these bands is sketched below.)
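A rough sketch of turning these rules of thumb into a first-cut effort number. The bands come from this slide; collapsing each band to its midpoint and the category names are my own simplifications.

```python
# Midpoints of Bernstein's productivity bands (NCSLOC per staff-month); the
# midpoint choice and the category labels are assumptions for illustration.
PRODUCTIVITY_NCSLOC_PER_SM = {
    "os_or_real_time": 50,
    "intermediary_high_risk_online": 375,   # midpoint of 250-500
    "normal_low_risk_online": 750,          # midpoint of 500-1000
    "reused_code": 15_000,                  # midpoint of 10,000-20,000
}

def staff_months(ncsloc, category):
    return ncsloc / PRODUCTIVITY_NCSLOC_PER_SM[category]

# e.g., 38,000 NCSLOC of normal on-line application code:
print(round(staff_months(38_000, "normal_low_risk_online")))  # -> 51 staff-months
```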

  18. Productivity: Measured in 2000

  19. Heuristics for requirements engineering • Move some of the desired functionality into version 2 • Deliver product in stages 0.2, 0.4… • Eliminate features • Simplify Features • Reduce Gold Plating • Relax the specific feature specifications

  20. Function Point (FP) Analysis • Useful during the requirements phase • Substantial data supports the methodology • Software skills and project characteristics are accounted for in the Adjusted Function Points • FP is technology- and process-dependent, so technology changes require recalibration of project models • Convert Unadjusted FPs (UFP) to LOC for a specific language (technology), then use a model such as COCOMO

  21. Function Point Calculations • Unadjusted Function Points: UFP = 4I + 5O + 4E + 10L + 7F, where I ≡ count of input types that are user inputs and change data structures; O ≡ count of output types; E ≡ count of inquiry types, i.e., inputs controlling execution [think menu selections]; L ≡ count of logical internal files, internal data used by the system [think index files; groups of logically related data entirely within the application boundary and maintained by external inputs]; F ≡ count of interface files, data output to or shared with another application. Note that the constants in the nominal equation can be calibrated to a specific software product line. (A direct transcription in code follows below.)
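A direct transcription of the UFP formula above in Python; the example counts are made up purely to show the call.

```python
# UFP = 4I + 5O + 4E + 10L + 7F, using the nominal weights from the slide
# (they can be recalibrated for a specific product line).

def unadjusted_function_points(inputs, outputs, inquiries, logical_files, interfaces):
    return (4 * inputs            # I: user inputs that change data structures
            + 5 * outputs         # O: output types
            + 4 * inquiries       # E: inquiry types / inputs controlling execution
            + 10 * logical_files  # L: logical internal files
            + 7 * interfaces)     # F: interfaces shared with another application

# Hypothetical counts for a small application:
print(unadjusted_function_points(inputs=5, outputs=4, inquiries=3,
                                 logical_files=2, interfaces=2))  # -> 86
```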

  22. External Inputs – One updates two files External Inputs (EI) - when data crosses the boundary from outside to inside.  This data may come from a data input screen or another application.

  23. External Interface Table. File Type References (FTRs) are the sum of Internal Logical Files referenced or updated and External Interface Files referenced. For example, an EI that references or updates 2 File Types Referenced (FTRs) and has 7 data elements would be assigned a ranking of average and an associated rating of 4 (see the sketch below).
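The ranking in this example follows the standard IFPUG complexity matrix for External Inputs. A sketch of that matrix is below; treat the exact DET/FTR thresholds as an assumption if your counting manual differs, but it reproduces the slide's example (2 FTRs and 7 data elements give "average" and a rating of 4).

```python
# IFPUG-style complexity matrix for External Inputs (EI):
# rows are File Types Referenced (FTRs), columns are data element types (DETs).

def ei_complexity(ftrs, dets):
    if ftrs <= 1:
        level = "low" if dets <= 15 else "average"
    elif ftrs == 2:
        level = "low" if dets <= 4 else ("average" if dets <= 15 else "high")
    else:  # 3 or more FTRs
        level = "average" if dets <= 4 else "high"
    rating = {"low": 3, "average": 4, "high": 6}[level]  # EI weights
    return level, rating

print(ei_complexity(ftrs=2, dets=7))  # -> ('average', 4), matching the slide's example
```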

  24. External Output from 2 Internal Files External Outputs (EO) – when data passes across the boundary from inside to outside.  

  25. External Inquiry drawing from 2 ILFs External Inquiry (EQ) - an elementary process with both input and output components that results in data retrieval from one or more internal logical files and external interface files. The input process does not update an Internal Logical File, and there is no derived data.

  26. EO and EQ Table mapped to Values

  27. Adjusted Function Points. Start from the Unadjusted Function Points (UFP) and account for physical system characteristics: each of the 14 General System Characteristics (GSC) is rated by the system user on a 0-5 “degree of influence” scale (3 is average): Data Communications, Distributed Data/Processing, Performance Objectives, Heavily Used Configuration, Transaction Rate, On-Line Data Entry, End-User Efficiency, On-Line Update, Complex Processing, Reusability, Conversion/Installation Ease, Operational Ease, Multiple Site Use, Facilitate Change. Adjusted Function Points (AFP): AFP = UFP × (0.65 + 0.01 × TDI), where TDI (the Total Degree of Influence) is the sum of the 14 GSC ratings and (0.65 + 0.01 × TDI) is the Value Adjustment Factor (VAF). (A code sketch follows below.)
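A minimal sketch of the adjustment step, assuming a UFP of 78 (the value used later on slide 40) and all 14 characteristics rated at the average of 3:

```python
# AFP = UFP * (0.65 + 0.01 * TDI), where TDI is the sum of the 14 GSC ratings (0-5 each).

GSC_NAMES = [
    "Data Communications", "Distributed Data/Processing", "Performance Objectives",
    "Heavily Used Configuration", "Transaction Rate", "On-Line Data Entry",
    "End-User Efficiency", "On-Line Update", "Complex Processing", "Reusability",
    "Conversion/Installation Ease", "Operational Ease", "Multiple Site Use",
    "Facilitate Change",
]

def adjusted_function_points(ufp, gsc_ratings):
    assert len(gsc_ratings) == len(GSC_NAMES) and all(0 <= r <= 5 for r in gsc_ratings)
    tdi = sum(gsc_ratings)            # Total Degree of Influence, 0-70
    vaf = 0.65 + 0.01 * tdi           # Value Adjustment Factor, 0.65-1.35
    return ufp * vaf

print(round(adjusted_function_points(78, [3] * 14), 2))  # -> 83.46 (VAF = 1.07)
```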

  28. Complexity Table

  29. Complexity Factors 1. Problem Domain ___ 2. Architecture Complexity ___ 3. Logic Design -Data ___ 4. Logic Design- Code ___ Total ___ Complexity = Total/4 = _________

  30. Problem Domain: Measure of Complexity (1 is simple and 5 is complex) 1. All algorithms and calculations are simple. 2. Most algorithms and calculations are simple. 3. Most algorithms and calculations are moderately complex. 4. Some algorithms and calculations are difficult. 5. Many algorithms and calculations are difficult. Score ____

  31. Architecture Complexity: Measure of Complexity (1 is simple and 5 is complex) 1. Code ported from one known environment to another. Application does not change more than 5%. 2. Architecture follows an existing pattern. Process design is straightforward. No complex hardware/software interfaces. 3. Architecture created from scratch. Process design is straightforward. No complex hardware/software interfaces. 4. Architecture created from scratch. Process design is complex. Complex hardware/software interfaces exist but they are well defined and unchanging. 5. Architecture created from scratch. Process design is complex. Complex hardware/software interfaces are ill defined and changing. Score ____

  32. Logic Design – Data Score ____

  33. Logic Design – Code Score ____

  34. Computing Function Points See http://www.engin.umd.umich.edu/CIS/course.des/cis525/js/f00/artan/functionpoints.htm

  35. Adjusted Function Points • Now account for 14 characteristics on a 6-point scale (0-5) • Total Degree of Influence (DI) is the sum of the scores • DI is converted to a technical complexity factor (TCF): TCF = 0.65 + 0.01 × DI • The Adjusted Function Point count is FP = UFP × TCF • For any language there is a direct mapping from Function Points to LOC. Beware: function point counting is hard and needs special skills.

  36. Function Points Qualifiers • Based on counting data structures • Focus is on-line database systems • Less accurate for Web applications • Even less accurate for games, finite-state-machine and algorithmic software • Not useful for extended machine software and compilers. An alternative to NCSLOC because estimates can be based on requirements and design data.

  37. SLOC Defined: • A single statement, not two separated by a semicolon • Line feed • All written statements (OA&M) • Comments are not counted • Count all instances of calls, subroutines, … There are no industry standards, and SLOC can be fudged.

  38. Initial Conversion http://www.qsm.com/FPGearing.html

  39. [Table: SLOC-per-function-point gearing factors by language, with average, median, low, and high values reported by consultants; see the QSM link above.]

  40. SLOC • Function Points = UFP × TCF = 78 × 0.96 = 74.88 ≈ 75 function points • 78 UFP × 53 SLOC/UFP (C++) = 4,134 SLOC ≈ 4.2 KSLOC. (Reference for SLOC per function point: http://www.qsm.com/FPGearing.html; the conversion is sketched in code below.)
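A small sketch of the conversion chain on this slide, using the 53 SLOC-per-function-point gearing factor quoted for C++ (from the QSM table linked above).

```python
# Convert unadjusted function points to KSLOC with a language gearing factor.

def ufp_to_ksloc(ufp, sloc_per_fp=53):   # 53 SLOC/FP is the C++ figure from the slide
    sloc = ufp * sloc_per_fp
    return sloc / 1000.0

print(ufp_to_ksloc(78))  # -> 4.134, which the slide rounds to ≈ 4.2 KSLOC
```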

  41. Understanding the equations. For 4,200 lines of code, what is the shortest time it will take to develop? Module development is about 400 SLOC/staff-month. From COCOMO (Barry Boehm): Effort = 2.4 × (size)^c

  42. What is ‘2.4’? Effort = 2.4 × (size)^c = (1/0.416) × (size)^c, i.e., Effort = (productivity)^-1 × (size)^c. With productivity = 400 SLOC/SM (0.400 KSLOC/SM) from the statement of the problem: Effort = (1/0.400 KSLOC/SM) × (4.2 KSLOC)^1.16 = 2.5 × (4.2)^1.16 ≈ 13 SM

  43. Minimum Time. Theoretical time = 2.5 × (staff-months)^(1/3). Min time = 0.75 × theoretical time = (0.75)(2.5)(SM)^(1/3) ≈ 1.875 × (13)^(1/3) ≈ 1.875 × 2.4 ≈ 4.5 months

  44. Function Point pros and cons. Pros: • Language independent • Understandable by the client • Simple modeling • Hard to fudge • Visible feature creep. Cons: • Labor intensive • Requires extensive training • Inexperience results in inconsistent results • Weighted toward file manipulation and transactions • Systematic error introduced by a single rater, so multiple raters are advised.

  45. Easy? • “When performance does not meet the estimate, there are two possible causes: poor performance or poor estimates. In the software world, we have ample evidence that our estimates stink, but virtually no evidence that people in general don’t work hard enough or intelligently enough.” -- Tom DeMarco

  46. Capers Jones Expansion Table

  47. Bernstein’s Trends in Software Expansion. [Chart: expansion factor (log scale, 1 to 1000) vs. technology change, growing roughly an order of magnitude every twenty years: Machine Instructions (1960), Macro Assembler (1965), High Level Language (1970), Database Manager (1975), On-line (1980), Prototyping (1985), Subsecond Time Sharing (1990), Object Oriented Programming (1995), Large Scale Reuse (2000), with regression testing, small-scale reuse, and 4GLs also marked; plotted expansion factors include 15, 30, 37.5, 47, 75, 81, 113, 142, 475, and 638.]

  48. Sizing Software Projects. Effort = (productivity)^-1 × (size)^c. [Chart: staff-months vs. lines of code or function points.]

  49. Regression Models • Effort: Watson-Felix: Effort = 5.2 × KLOC^0.91; COCOMO: Effort = 2.4 × KLOC^1.05; Halstead: Effort = 0.7 × KLOC^1.50 • Schedule: Watson-Felix: Time = 2.5 × E^0.35; COCOMO: Time = 2.5 × E^0.38; Putnam: Time = 2.4 × E^0.33. (A comparison in code follows below.)
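A quick comparison of these regression models; the 10 KLOC project size is an arbitrary choice to show how far apart the models can land.

```python
# Effort models take size in KLOC and return staff-months (E);
# schedule models take E and return calendar months.

effort_models = {
    "Watson-Felix": lambda kloc: 5.2 * kloc ** 0.91,
    "COCOMO":       lambda kloc: 2.4 * kloc ** 1.05,
    "Halstead":     lambda kloc: 0.7 * kloc ** 1.50,
}
time_models = {
    "Watson-Felix": lambda e: 2.5 * e ** 0.35,
    "COCOMO":       lambda e: 2.5 * e ** 0.38,
    "Putnam":       lambda e: 2.4 * e ** 0.33,
}

kloc = 10
for name, model in effort_models.items():
    print(f"{name:12s} effort ≈ {model(kloc):5.1f} staff-months")

effort = effort_models["COCOMO"](kloc)   # ≈ 26.9 staff-months
for name, model in time_models.items():
    print(f"{name:12s} schedule ≈ {model(effort):4.1f} months for {effort:.1f} SM")
```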
