Exploiting the Essential Assumptions of Analogy-based Effort Estimation Syed Shah A Zaman (6705130), Shakil Mahmud (7015384) Submitted to Professor Shervin Shirmohammadi in partial fulfillment of the requirements for the course ELG 5100
Roadmap • Effort estimation and its importance • Different methods of effort estimation • Analogy based effort estimation • TEAK and its steps • Conclusion & Future Work
Effort estimation “Software development effort estimation is the process of predicting the most realistic amount of effort required to develop or maintain software” • Importance of effort estimation: • Tracking velocity • Iteration scope • Prioritizing • Release planning
Effort estimation methods • Three subcategories: • Human Centric Techniques (e.g. Expert judgment) • Algorithmic models (e.g. COCOMO) • Machine learning (e.g. Analogy based estimation or ABE)
Analogy Based Effort Estimation “Projects that are similar with respect to project features will also be similar with respect to project effort” • Five basic steps: • Select the historical project dataset • Choose the project features for similarity measurement • Measure the similarities • Identify the most similar projects • Adapt the efforts of the most similar projects to generate the effort estimate
ABE0 or “Baseline” ABE: Basic features • A table holds all the training projects • Each row is one project • Each column is an independent or dependent variable (feature) of each project (e.g. duration, effort); the choice of variables is flexible • Input: a test project • Output: an effort estimate for that project • A scaling measure normalizes features so that test and training projects are compared on the same scale • Feature weighting reflects the relative influence of each feature
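The baseline ABE0 pipeline above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the dictionary layout, the min-max scaling choice, and k=3 are all assumptions made here for clarity.

```python
# ABE0 sketch (hypothetical data layout): scale features, find the k
# nearest training projects, and average their efforts.
import math

def scale(projects):
    """Min-max scale each feature column to [0, 1] so no feature dominates."""
    n_feats = len(projects[0]["features"])
    lo = [min(p["features"][i] for p in projects) for i in range(n_feats)]
    hi = [max(p["features"][i] for p in projects) for i in range(n_feats)]
    for p in projects:
        p["features"] = [
            (v - lo[i]) / (hi[i] - lo[i]) if hi[i] > lo[i] else 0.0
            for i, v in enumerate(p["features"])
        ]
    return projects

def abe0_estimate(train, test_features, k=3):
    """Mean effort of the k training projects closest to the test project."""
    dist = lambda f: math.sqrt(sum((a - b) ** 2 for a, b in zip(f, test_features)))
    nearest = sorted(train, key=lambda p: dist(p["features"]))[:k]
    return sum(p["effort"] for p in nearest) / k
```

Averaging the k nearest efforts is the simplest adaptation step; the number k and the adaptation function are exactly the design choices TEAK later revisits.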
Measuring Similarity “Measuring the closeness between two data objects in n-dimensional feature space” • Objective: rank the cases in the dataset by similarity and use the k nearest cases • Most common method: the Euclidean distance metric • Example: two points X(x1, x2, …, xn) and Y(y1, y2, …, yn) • Unweighted Euclidean distance: d(X, Y) = sqrt( Σ (xi − yi)² ) • Weighted Euclidean distance: d(X, Y) = sqrt( Σ wi (xi − yi)² ), where wi is the weight of feature i
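Both distance variants reduce to one function, since the unweighted case is just all weights equal to 1. A short sketch:

```python
import math

def euclidean(x, y, w=None):
    """Weighted Euclidean distance between feature vectors x and y.
    w=None gives the unweighted case (all weights 1)."""
    if w is None:
        w = [1.0] * len(x)
    return math.sqrt(sum(wi * (xi - yi) ** 2 for wi, xi, yi in zip(w, x, y)))
```

Setting a feature's weight to 0 removes it from the similarity measurement entirely, which is how feature weighting and feature selection connect.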
TEAK: Test Estimation Assumption Knowledge • It uses an “easy path” heuristic that finds the situations that confuse estimation and removes them
Select prediction system • There are many prediction systems • The authors chose ABE because: • It is widely studied • It works even if the domain data are sparse • Unlike other predictors, it makes no assumptions about data distributions or an underlying model • When the local data do not support standard algorithmic/parametric models like COCOMO, ABE can still be applied
Identify essential assumption(s) • Assumption one: locality implies homogeneity • If two projects are closer in feature space, they should also be more similar in effort • To avoid confusing the estimation, do not choose projects from regions with higher effort variance • Variance is defined as: σ² = (1/n) Σ (effortᵢ − mean effort)²
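The variance definition above is the standard population variance applied to the effort values of a neighborhood, as in this small sketch:

```python
def effort_variance(efforts):
    """Population variance sigma^2 of the effort values in a neighborhood."""
    mean = sum(efforts) / len(efforts)
    return sum((e - mean) ** 2 for e in efforts) / len(efforts)
```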
Identify assumption violations • We need a way to compare the variance between small-k and large-k estimates, so clustering is needed: a GAC (Greedy Agglomerative Clustering) tree is formed • Using the GAC tree, finding the k nearest neighbors of a test project can be implemented with the following procedure, called TRAVERSE: • 1. Place the test project at the root of the tree. • 2. Move the test project to the nearest child (where “nearest” is defined by Euclidean distance). • 3. Go to step 2. • Any point in the tree where moving to a child increases the effort variance is identified as a violation
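The GAC construction and the TRAVERSE walk can be sketched as below. This is a simplified illustration under stated assumptions, not the authors' code: clusters are merged by centroid distance, and `Node`, `gac`, and `traverse` are names invented here.

```python
import math

class Node:
    """A GAC tree node holding the (features, effort) pairs beneath it."""
    def __init__(self, projects, left=None, right=None):
        self.projects = projects
        self.left, self.right = left, right
        feats = [f for f, _ in projects]
        self.centroid = [sum(col) / len(col) for col in zip(*feats)]

    def variance(self):
        efforts = [e for _, e in self.projects]
        mean = sum(efforts) / len(efforts)
        return sum((e - mean) ** 2 for e in efforts) / len(efforts)

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def gac(projects):
    """Greedily merge the two closest clusters until one root remains."""
    nodes = [Node([p]) for p in projects]
    while len(nodes) > 1:
        i, j = min(
            ((i, j) for i in range(len(nodes)) for j in range(i + 1, len(nodes))),
            key=lambda ij: dist(nodes[ij[0]].centroid, nodes[ij[1]].centroid),
        )
        a, b = nodes[i], nodes[j]
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)]
        nodes.append(Node(a.projects + b.projects, a, b))
    return nodes[0]

def traverse(node, test_features):
    """Walk toward the nearest child; flag nodes where variance increases."""
    violations = []
    while node.left and node.right:
        child = min((node.left, node.right),
                    key=lambda c: dist(c.centroid, test_features))
        if child.variance() > node.variance():
            violations.append(child)   # locality assumption violated here
        node = child
    return node, violations
```

Each internal node summarizes the effort variance of everything below it, so one downward walk from the root is enough to spot every point where "getting closer" fails to mean "getting more homogeneous".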
Remove violations • After identifying a violating node, check whether it satisfies the selected pruning policy, among: • variance more than α times the parent variance; • variance more than β·max(σ²); • variance more than Rγ·max(σ²), where R is a random number, 0 < R < 1 • If so, prune that subtree
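The three pruning policies can be combined into one check, sketched below. The threshold values α, β, γ are illustrative placeholders (the paper tunes its own), and the third policy's "Rγ" is read here as R raised to the power γ, which is an assumption about the slide's notation.

```python
import random

def violates_policy(node_var, parent_var, max_var,
                    alpha=2.0, beta=0.5, gamma=0.3, r=None):
    """True if a subtree's variance triggers any of the three pruning
    policies. alpha/beta/gamma are illustrative defaults, not the
    paper's tuned values; "R gamma" is interpreted as r ** gamma."""
    if r is None:
        r = random.random()  # random R with 0 < R < 1
    return (node_var > alpha * parent_var
            or node_var > beta * max_var
            or node_var > (r ** gamma) * max_var)
```

A subtree that passes the check is cut from the GAC tree, so later TRAVERSE walks can no longer descend into that high-variance region.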
[Figure: example GAC tree — the TEST project is placed at the root ABCDEF, whose children are the subtrees ABCD and EF; ABCD splits into CD and AB, and the leaves are the individual projects C, D, A, B, E, F]
Conclusion • Higher variance “confuses” estimation • TEAK does not consider variance alone: TRAVERSE2 moves the test project away from regions of higher variance and toward regions with similar features • Augmenting nearest-neighbor algorithms with variance avoidance outperforms plain nearest neighbor • In ABE, a brute-force search is exhaustive; TEAK's subtree pruning reduces CPU consumption
Future Work • Use the easy path heuristic to learn feature weights • Explore alternatives to GAC • Improve the pruning policy by examining more datasets
References • Menzies et al., “Exploiting the Essential Assumptions of Analogy-based Effort Estimation,” IEEE Transactions on Software Engineering, vol. 38, no. 2, March-April 2012 • J. Wen et al., “Improve Analogy Based Software Estimation using Principle Components Analysis and Co-relation Weighting,” Proc. 16th Asia Pacific Software Engineering Conference, 2009, pp. 179-186 • D. Baker, “A Hybrid Approach to Expert and Model-Based Effort Estimation,” master's thesis, LCSEE, West Virginia Univ., http://bit.ly/hWDEfU, 2007 • Shepperd et al., “Effort Estimation using Analogy,” Proc. 18th Intl. Conf. on Software Engineering, 1996, pp. 170-178 • D. Beeferman and A. Berger, “Agglomerative Clustering of a Search Engine Query Log,” Proc. 6th ACM SIGKDD Intl. Conf. Knowledge Discovery and Data Mining, 2000, pp. 407-416 • Li et al., “A Study of Project Selection and Feature Weighting for Analogy Based Software Cost Estimation,” J. Systems and Software, vol. 82, pp. 241-252, 2009
Thank You • Questions?