
Domain-Driven Software Cost Estimation

Wilson Rosa (Air Force Cost Analysis Agency), Barry Boehm (USC), Brad Clark (USC), Thomas Tan (USC), Ray Madachy (Naval Postgraduate School). 27th International Forum on COCOMO® and Systems/Software Cost Modeling, October 16, 2012.


Presentation Transcript


  1. Domain-Driven Software Cost Estimation
Wilson Rosa (Air Force Cost Analysis Agency), Barry Boehm (USC), Brad Clark (USC), Thomas Tan (USC), Ray Madachy (Naval Postgraduate School)
27th International Forum on COCOMO® and Systems/Software Cost Modeling, October 16, 2012
This material is based upon work supported, in whole or in part, by the U.S. Department of Defense through the Systems Engineering Research Center (SERC) under Contract H98230-08-D-0171. The SERC is a federally funded University Affiliated Research Center (UARC) managed by Stevens Institute of Technology consisting of a collaborative network of over 20 universities. More information is available at www.SERCuarc.org

  2. Research Objectives
• Make collected data useful to oversight and management entities
• Provide guidance on how to condition data to address challenges
• Segment data into different Application Domains and Operating Environments
• Analyze data for simple Cost Estimating Relationships (CER) and Schedule-Cost Estimating Relationships (SCER) within each domain
• Develop rules-of-thumb for missing data
Model forms derived from the data records for each domain, after data preparation and analysis: Cost (Effort) = a * Size^b and Schedule = a * Size^b * Staff^c.
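The two model forms above can be evaluated directly once coefficients have been fit for a domain. A minimal sketch in Python, with purely hypothetical coefficient values (a, b, c below are placeholders, not values from this study):

```python
# Evaluate the two domain model forms with hypothetical coefficients.
# a, b, c are illustrative placeholders; the study fits them per domain.

def cer_effort(size_kesloc, a=3.0, b=1.1):
    """Cost (Effort) = a * Size^b, effort in person-months."""
    return a * size_kesloc ** b

def ser_schedule(size_kesloc, staff, a=4.0, b=0.4, c=-0.2):
    """Schedule = a * Size^b * Staff^c, schedule in months."""
    return a * size_kesloc ** b * staff ** c

print(round(cer_effort(50), 1))        # effort for a 50-KESLOC component
print(round(ser_schedule(50, 10), 1))  # schedule with an average staff of 10
```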

  3. Stakeholder Community: the project has evolved into a Joint Government Software Study. Research is collaborative across heterogeneous stakeholder communities (funding sources and data sources) who have helped us refine our data definition framework and taxonomy and have provided data and funding.

  4. Topics
• Data Preparation Workflow
• Data Segmentation
• Analysis Workflow
• Software Productivity Benchmarks
• Cost Estimating Relationships
• Schedule Estimating Relationships
• Conclusion
• Future Work

  5. Data Preparation

  6. Current Dataset: multiple sources and multiple data formats (SRDR, SEER, COCOMO). SRDR (377 records) + Other (143 records) = 522 total records.

  7. The Need for Data Preparation. Issues found in the dataset:
• Inadequate information on modified code (size provided)
• Inadequate information on size change or growth
• Size measured inconsistently
• Inadequate information on average staffing or peak staffing
• Inadequate information on personnel experience
• Inaccurate effort data in multi-build components
• Missing effort data
• Replicated duration (start and end dates) across components
• Inadequate information on schedule compression
• Missing schedule data
• No quality data

  8. Data Preparation Workflow: start with SRDR submissions, inspect each data point, and determine data quality levels; correct missing or questionable data (where there is no resolution, exclude the record from analysis); then normalize the data and segment it.
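A minimal sketch of this workflow as a filtering pipeline, assuming a simplified record structure (the field names kesloc, effort_pm, oe, and pt are illustrative, not the SRDR schema):

```python
# Sketch of the preparation workflow: inspect, screen, normalize, segment.
# Field names are illustrative only; real SRDR records carry many more fields.

def prepare(records):
    kept, excluded = [], []
    for rec in records:
        # Inspect each data point: treat missing size or effort as unresolved.
        if not rec.get("kesloc") or not rec.get("effort_pm"):
            excluded.append(rec)               # no resolution: exclude from analysis
            continue
        # Normalize: derive productivity in ESLOC per person-month.
        rec = dict(rec, productivity=rec["kesloc"] * 1000 / rec["effort_pm"])
        kept.append(rec)
    # Segment the surviving records by operating environment and productivity type.
    segments = {}
    for rec in kept:
        segments.setdefault((rec.get("oe"), rec.get("pt")), []).append(rec)
    return segments, excluded
```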

  9. Segment Data by Operating Environments (OE)

  10. Segment Data by Productivity Type (PT)
• Different productivities have been observed for different software application types.
• The SRDR dataset was segmented into 14 productivity types to increase the accuracy of estimating cost and schedule:
Sensor Control and Signal Processing (SCP), Vehicle Control (VC), Real Time Embedded (RTE), Vehicle Payload (VP), Mission Processing (MP), System Software (SS), Telecommunications (TEL), Process Control (PC), Scientific Systems (SCI), Planning Systems (PLN), Training (TRN), Test Software (TST), Software Tools (TUL), Intelligence & Information Systems (IIS)

  11. Example: Finding Productivity Type
Finding the Productivity Type (PT) using the Aircraft MIL-STD-881 WBS: the highest-level element represents the environment. In the MAV environment there are the Avionics subsystem, the Fire-Control sub-subsystem, and the sensor, navigation, air data, display, bombing computer, and safety domains. Each domain has an associated productivity type.
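In code, this lookup is just a table from WBS domain to productivity type. The sketch below uses hypothetical assignments (the slide does not give the actual domain-to-PT mapping, so the values are illustrative guesses only):

```python
# Hypothetical WBS-domain -> productivity-type lookup for the aircraft example.
# The PT assignments are illustrative guesses, not the study's actual mapping.
WBS_DOMAIN_TO_PT = {
    "sensor": "SCP",           # Sensor Control and Signal Processing
    "navigation": "RTE",       # Real Time Embedded
    "air data": "RTE",
    "display": "MP",           # Mission Processing
    "bombing computer": "MP",
    "safety": "VC",            # Vehicle Control
}

def productivity_type(wbs_domain):
    return WBS_DOMAIN_TO_PT.get(wbs_domain.lower(), "unmapped")

print(productivity_type("Sensor"))   # -> SCP
```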

  12. Operating Environment & Productivity Type: when the dataset is segmented by Productivity Type and Operating Environment, the impact accounted for by many COCOMO II model drivers is considered.

  13. Data Analysis

  14. Analysis Workflow: starting from prepared, normalized & segmented data, derive the CER model form, derive the final CER and its reference data subset, and derive the SCER; then publish productivity benchmarks by productivity type & size group, publish the CER results, and publish the SCER.
CER: Cost Estimating Relationship; PR: Productivity Ratio; SER: Schedule Estimating Relationship; SCER: Schedule Compression / Expansion Relationship

  15. Software Productivity Benchmarks (Productivity-based CER)
Software productivity refers to the ability of an organization to generate outputs using the resources that it currently has as inputs. Inputs typically include facilities, people, experience, processes, equipment, and tools. Outputs generated include software applications and the documentation used to describe them. The metric used to express software productivity is thousands of equivalent source lines of code (ESLOC) per person-month (PM) of effort. While many other measures exist, ESLOC/PM will be used because most of the data collected by the Department of Defense (DoD) on past projects is captured using these two measures. While controversy exists over whether or not ESLOC/PM is a good measure, consistent use of this metric (see Metric Definitions) provides for meaningful comparisons of productivity.
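As a tiny worked example of the metric (the size and effort values are made up):

```python
# Productivity for one hypothetical record, expressed two equivalent ways.
esloc = 45_000       # equivalent source lines of code
effort_pm = 300      # person-months of effort

print(esloc / effort_pm)          # 150.0 ESLOC per person-month
print(esloc / 1000 / effort_pm)   # 0.15 KESLOC per person-month
```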

  16. Software Productivity Benchmarks
• Benchmarks by PT, across all operating environments**
• ** The following operating environments were included in the analysis: Ground Surface Vehicles, Sea Systems, Aircraft, Missile / Ordnance (M/O), Spacecraft
Preliminary Results – More Records to be added

  17. Software Productivity Benchmarks
• Benchmarks by PT, Ground System Manned only
CV: Cost Variance; ESLOC: Equivalent SLOC; KESLOC: Equivalent SLOC in Thousands; MAD: Mean Absolute Deviation; MAX: Maximum; MIN: Minimum; PM: Effort in Person-Months; PT: Productivity Type; OE: Operating Environment
Preliminary Results – More Records to be added
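A short sketch of how per-cell benchmark statistics such as the mean, MAD, and min/max can be computed from a set of productivity values (the values below are made up, not from the study):

```python
import statistics

# Benchmark statistics for one hypothetical PT/OE cell (ESLOC per person-month).
productivities = [92, 110, 130, 150, 175, 210]

mean = statistics.mean(productivities)
mad = statistics.mean(abs(p - mean) for p in productivities)  # mean absolute deviation
print(min(productivities), max(productivities), round(mean, 1), round(mad, 1))
```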

  18. Cost Estimating Relationships
Preliminary Results – More Records to be added

  19. CER Model Forms
Candidate model forms:
• Effort = a * Size (production cost: cost per unit)
• Effort = a * Size + b
• Effort = a * Size^b + c (b is a scaling factor)
• Effort = a * ln(Size) + b
• Effort = a * Size^b * Duration^c
• Effort = a * Size^b * c1..n (percentage adjustment factors)
Log-log transform: ln(Effort) = b0 + (b1 * ln(Size)) + (b2 * ln(c1)) + (b3 * ln(c2)) + …
Anti-log transform: Effort = e^b0 * Size^b1 * c1^b2 * c2^b3 * …
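The log-log and anti-log transforms can be sketched with an ordinary least-squares fit. The data points below are hypothetical, used only to illustrate recovering a and b for the Effort = a * Size^b form:

```python
import numpy as np

# Fit Effort = a * Size^b by linear regression in log-log space.
# ln(Effort) = b0 + b1 * ln(Size); anti-log gives a = e^b0 and b = b1.
size_kesloc = np.array([12.0, 25.0, 40.0, 80.0, 150.0])      # hypothetical sizes
effort_pm   = np.array([60.0, 140.0, 230.0, 520.0, 1100.0])  # hypothetical efforts

b1, b0 = np.polyfit(np.log(size_kesloc), np.log(effort_pm), 1)
a, b = np.exp(b0), b1
print(f"Effort ~ {a:.2f} * KESLOC^{b:.2f}")
```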

  20. Software CERs by Productivity Type (PT)
• CERs by PT, across all operating environments**
• ** The following operating environments were included in the analysis: Ground Surface Vehicles, Sea Systems, Aircraft, Missile / Ordnance (M/O), Spacecraft
Preliminary Results – More Records to be added

  21. Software CERs for Aerial Vehicle Manned (AVM)
• CERs by Productivity Type, AVM only
CERs: Cost Estimating Relationships; ESLOC: Equivalent SLOC; KESLOC: Equivalent SLOC in Thousands; MAD: Mean Absolute Deviation; MAX: Maximum; MIN: Minimum; PM: Effort in Person-Months; PRED: Prediction (Level); PT: Productivity Type; OE: Operating Environment
Preliminary Results – More Records to be added

  22. Software CERs for Ground System Manned (GSM)
• CERs by Productivity Type
CERs: Cost Estimating Relationships; ESLOC: Equivalent SLOC; KESLOC: Equivalent SLOC in Thousands; MAD: Mean Absolute Deviation; MAX: Maximum; MIN: Minimum; PM: Effort in Person-Months; PT: Productivity Type; OE: Operating Environment
Preliminary Results – More Records to be added

  23. Software CERs for Space Vehicle Unmanned (SVU)
• CERs by Productivity Type (PT), SVU only
CERs: Cost Estimating Relationships; ESLOC: Equivalent SLOC; KESLOC: Equivalent SLOC in Thousands; MAD: Mean Absolute Deviation; MAX: Maximum; MIN: Minimum; PM: Effort in Person-Months; PRED: Prediction (Level); PT: Productivity Type; OE: Operating Environment
Preliminary Results – More Records to be added

  24. Schedule Estimating Relationships
Preliminary Results – More Records to be added

  25. Schedule Estimation Relationships (SERs)
• SERs by Productivity Type (PT), across operating environments**
• ** The following operating environments were included in the analysis: Ground Surface Vehicles, Sea Systems, Aircraft, Missile / Ordnance (M/O), Spacecraft
Preliminary Results – More Records to be added
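The SER form Schedule = a * Size^b * Staff^c from the research objectives can be fit the same way as the CERs, with two regressors in log space. A minimal sketch on made-up data:

```python
import numpy as np

# Fit Schedule = a * Size^b * Staff^c by least squares in log space.
size_kesloc = np.array([12.0, 25.0, 40.0, 80.0, 150.0])   # hypothetical sizes
staff       = np.array([ 4.0,  7.0, 10.0, 18.0,  30.0])   # average staff
months      = np.array([14.0, 18.0, 21.0, 26.0,  32.0])   # duration

X = np.column_stack([np.ones_like(size_kesloc), np.log(size_kesloc), np.log(staff)])
coef, *_ = np.linalg.lstsq(X, np.log(months), rcond=None)
a, b, c = np.exp(coef[0]), coef[1], coef[2]
print(f"Months ~ {a:.2f} * KESLOC^{b:.2f} * Staff^{c:.2f}")
```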

  26. Size – People – Schedule Tradeoff
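One way to read the tradeoff is to rearrange the SER form for staffing: for a fixed size and a target schedule, Staff = (Schedule / (a * Size^b))^(1/c). The coefficients below are hypothetical placeholders, chosen only so that a negative c shows schedule compression requiring more people:

```python
# Implied average staffing for a target schedule, assuming the SER form
# Schedule = a * Size^b * Staff^c with hypothetical coefficients.
def implied_staff(size_kesloc, target_months, a=4.0, b=0.4, c=-0.2):
    return (target_months / (a * size_kesloc ** b)) ** (1.0 / c)

for months in (18, 15, 12):
    print(months, "months ->", round(implied_staff(50, months), 1), "average staff")
```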

  27. COCOMO 81 vs. New Schedule Equations
• Model Comparisons
• ** The following operating environments were included in the analysis: Ground Surface Vehicles, Sea Systems, Aircraft, Missile / Ordnance (M/O), Spacecraft
Preliminary Results – More Records to be added

  28. COCOMO 81 vs. New Schedule Equations
• Model Comparisons using PRED (30%)
• ** The following operating environments were included in the analysis: Ground Surface Vehicles, Sea Systems, Aircraft, Missile / Ordnance (M/O), Spacecraft
Preliminary Results – More Records to be added
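PRED(30%) measures the fraction of estimates that fall within 30% of the actual values. A small sketch on made-up actual/estimated effort pairs (the relative-error average shown here is not necessarily the same MAD definition used in the tables):

```python
# PRED(level): share of estimates within `level` relative error of the actuals,
# plus the mean absolute relative error, computed on hypothetical data.
def pred(actuals, estimates, level=0.30):
    errors = [abs(est - act) / act for act, est in zip(actuals, estimates)]
    within = sum(err <= level for err in errors)
    return within / len(errors), sum(errors) / len(errors)

actual    = [60, 140, 230, 520, 1100]    # person-months (made up)
estimated = [70, 120, 250, 480, 1600]
pred30, mean_rel_err = pred(actual, estimated)
print(f"PRED(30) = {pred30:.0%}, mean relative error = {mean_rel_err:.0%}")
```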

  29. Conclusions

  30. Conclusion
• Developing CERs and benchmarks by grouping appears to account for some of the variability in estimating relationships.
• Grouping software applications by Operating Environment and Productivity Type appears to have promise, but needs refinement.
• The analyses shown in this presentation are preliminary; more data is available for analysis, but it requires preparation first.

  31. Future Work
• Productivity benchmarks need to be segregated by size groups
• More data is available to fill in missing cells in the OE-PT table
• Workshop recommendations will be implemented:
  • New data grouping strategy
  • A data repository that provides drill-down to source data: it presents the data to the analyst, and if there is a question, it is possible to navigate to the source document, e.g. data collection form, project notes, EVM data, Gantt charts, etc.
• Final results will be published online at http://csse.usc.edu/afcaawiki
