
Exploiting the Essential Assumptions of Analogy-based Effort Estimation

Presentation Transcript


  1. Exploiting the Essential Assumptions of Analogy-based Effort Estimation Syed Shah A Zaman (6705130), Shakil Mahmud (7015384). Submitted to Professor Shervin Shirmohammadi in partial fulfillment of the requirements for the course ELG 5100

  2. Roadmap • Effort estimation and its importance • Different methods of effort estimation • Analogy based effort estimation • TEAK and its steps • Conclusion & Future Work

  3. Effort estimation “Software development effort estimation is the process of predicting the most realistic amount of effort required to develop or maintain software.” • Importance of effort estimation: • Tracking velocity • Iteration scope • Prioritizing • Release planning

  4. Effort estimation methods • Three subcategories: • Human-centric techniques (e.g. expert judgment) • Algorithmic models (e.g. COCOMO) • Machine learning (e.g. analogy-based estimation, or ABE)

  5. Analogy-Based Effort Estimation “Projects that are similar with respect to project features will also be similar with respect to project effort.” • Five basic steps: • Select the historical project dataset • Choose the project features for similarity measurement • Measure the similarities • Identify the most similar projects • Adapt the efforts of those similar projects to generate the effort estimate

  6. ABE0, the “baseline” ABE: basic features • A table contains all the training projects • Each row: one project • Each column: an independent or dependent variable (feature) of that project (e.g. duration, effort); the choice of variables is flexible • Input: a test project • Output: an effort estimate for that project • A scaling measure keeps each feature on the same scale across the test and training projects • Feature weighting reflects the relative influence of each feature (a minimal sketch of the table, scaling, and weighting follows)
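To make slide 6 concrete, here is a minimal Python sketch of the ABE0 training table together with min-max scaling and feature weights; the example features (KLOC, team size), the effort values, and the helper names are assumptions for illustration, not data or code from the paper.

```python
# Minimal sketch of the ABE0 training table: each row is one historical
# project, with its known effort kept in a parallel array.
import numpy as np

# Hypothetical example data: feature columns = [KLOC, team size].
train_features = np.array([[10.0, 3.0],
                           [25.0, 5.0],
                           [40.0, 8.0]])
train_effort = np.array([120.0, 300.0, 520.0])   # e.g. person-hours

def min_max_scale(features):
    """Rescale every feature column to [0, 1] so no single feature dominates."""
    lo, hi = features.min(axis=0), features.max(axis=0)
    return (features - lo) / (hi - lo), lo, hi

scaled_train, lo, hi = min_max_scale(train_features)

# Feature weights reflect the assumed influence of each feature on effort
# (uniform here; a real study would tune or learn them).
weights = np.ones(train_features.shape[1])
```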

  7. Measuring Similarity “Measuring the closeness between two data objects in n-dimensional feature space.” • Objective: rank the cases in the dataset by similarity and use the k nearest cases. • Most common method: the Euclidean distance metric • Example: two points X = (x1, x2, ..., xn) and Y = (y1, y2, ..., yn) • Unweighted Euclidean distance: d(X, Y) = sqrt( Σ (xi − yi)² ) • Weighted Euclidean distance: d(X, Y) = sqrt( Σ wi (xi − yi)² ), where wi is the weight of feature i (see the sketch below)
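The following self-contained sketch illustrates the weighted Euclidean distance and the final ABE0 step of averaging the efforts of the k nearest projects; the scaled example data, the uniform weights, and k = 2 are assumptions chosen only for illustration.

```python
# Sketch: weighted Euclidean distance plus "average the k nearest efforts".
import numpy as np

train = np.array([[0.0, 0.0],           # feature vectors, already min-max scaled
                  [0.5, 0.4],
                  [1.0, 1.0]])
effort = np.array([120.0, 300.0, 520.0])
weights = np.array([1.0, 1.0])          # uniform feature weights

def weighted_euclidean(x, y, w):
    """d(X, Y) = sqrt( sum_i w_i * (x_i - y_i)^2 )"""
    return np.sqrt(np.sum(w * (x - y) ** 2))

def abe0_estimate(test, k=2):
    """Rank the training projects by distance and average the k nearest efforts."""
    dists = np.array([weighted_euclidean(test, row, weights) for row in train])
    nearest = np.argsort(dists)[:k]
    return effort[nearest].mean()

print(abe0_estimate(np.array([0.45, 0.5])))   # effort estimate for a new project
```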

  8. TEAK: Test Essential Assumption Knowledge • TEAK uses an “easy path” heuristic: it finds the situations that confuse estimation and removes them

  9. Select prediction system • There are many prediction systems • The authors chose ABE because: • It is widely studied • It works even when the domain data are sparse • Unlike other predictors, it makes no assumptions about data distributions or an underlying model • When the local data do not support standard algorithmic/parametric models such as COCOMO, ABE can still be applied

  10. Identify essential assumption(s) • Assumption one: locality implies homogeneity • If two projects are closer in feature space, they should also be more similar in effort • To avoid confusing the estimation, do not draw analogies from projects in regions of higher variance • Variance is defined as σ² = (1/n) Σ (ei − ē)², where the ei are the effort values in a region and ē is their mean (a small sketch of this check follows)
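As a small illustration of the homogeneity check behind this assumption, the sketch below computes the effort variance of a candidate neighborhood; the example effort values are hypothetical.

```python
# Sketch: an analogy neighborhood is only trusted when the effort values
# it contains have low variance (locality implies homogeneity).
import numpy as np

def effort_variance(efforts):
    """sigma^2 = (1/n) * sum_i (e_i - mean(e))^2"""
    e = np.asarray(efforts, dtype=float)
    return float(np.mean((e - e.mean()) ** 2))

homogeneous = [300.0, 310.0, 295.0]      # similar efforts -> low variance
mixed = [300.0, 40.0, 1200.0]            # dissimilar efforts -> high variance
print(effort_variance(homogeneous), effort_variance(mixed))
```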

  11. Identify assumption violations • We need a way to compare the variance of estimates built from small and large values of k, so the projects are clustered into a GAC (Greedy Agglomerative Clustering) tree • Using the GAC tree, finding the k-nearest neighbors of a test project is implemented by the following procedure, called TRAVERSE: • 1. Place the test project at the root of the tree • 2. Move the test project to the nearest child (where “nearest” is measured by Euclidean distance) • 3. Repeat step 2 until a leaf is reached • Any step where moving to a child increases the effort variance is identified as a violation (a sketch of this traversal follows)
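A sketch of the TRAVERSE idea over a GAC tree follows; the Node structure, field names, and the way variances are stored are assumptions made for illustration, not the paper's implementation.

```python
# Sketch: descend a GAC tree from root to leaf, always moving to the nearest
# child, and flag steps where the child's effort variance exceeds the parent's.
from dataclasses import dataclass, field
import numpy as np

@dataclass
class Node:
    centroid: np.ndarray              # mean feature vector of the projects below
    variance: float                   # variance of the effort values below
    children: list = field(default_factory=list)

def traverse(test, root):
    """Return the leaf reached by the test project and the violating nodes."""
    node, violations = root, []
    while node.children:
        child = min(node.children,
                    key=lambda c: np.linalg.norm(test - c.centroid))  # Euclidean
        if child.variance > node.variance:
            violations.append(child)  # locality-implies-homogeneity violated here
        node = child
    return node, violations
```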

  12. Remove violations • After identifying a violating node, check whether it satisfies the selected pruning policy, for example: • its variance is more than α times the parent variance; • more than β·max(σ²); • more than Rγ·max(σ²), where R is a random number with 0 < R < 1 • If so, prune that subtree (a sketch of such a check follows)
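The sketch below shows how such a pruning check might be written; the parameter values and the exact reading of the R·γ policy are assumptions for illustration only.

```python
# Sketch: decide whether a violating node's subtree should be pruned.
import random

def should_prune(node_var, parent_var, max_var, policy="alpha",
                 alpha=1.5, beta=0.9, gamma=0.5):
    """Return True if the node's effort variance exceeds the chosen threshold."""
    if policy == "alpha":        # more than alpha times the parent variance
        return node_var > alpha * parent_var
    if policy == "beta":         # more than beta * max(sigma^2)
        return node_var > beta * max_var
    if policy == "random":       # one reading of the R*gamma*max(sigma^2) policy
        return node_var > random.random() * gamma * max_var
    return False

# Example: prune if the child's variance is more than 1.5x its parent's.
print(should_prune(node_var=900.0, parent_var=400.0, max_var=1000.0))
```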

  13. Execute the modified system

  14. Example GAC tree (figure): the TEST project descends from the root ABCDEF through internal nodes ABCD and EF, then AB and CD, down to the leaf projects A, B, C, D, E, and F

  15. TEAK performance comparison

  16. Conclusion • Higher variance “confuses” estimation • TEAK does not consider variance alone: TRAVERSE2 moves away from regions of higher variance and toward regions with similar features • Augmenting nearest-neighbor algorithms with variance avoidance does better than plain nearest neighbor • A brute-force search in ABE is exhaustive; TEAK's subtree pruning reduces CPU consumption

  17. Future Work • Use the easy-path heuristic for feature weighting • Explore alternatives to GAC • Improve the pruning policy by examining more datasets

  18. References • E. Kocaguneli, T. Menzies, A. Bener, and J. Keung, “Exploiting the Essential Assumptions of Analogy-Based Effort Estimation,” IEEE Transactions on Software Engineering, vol. 38, no. 2, Mar.-Apr. 2012 • J. Wen et al., “Improve Analogy-Based Software Effort Estimation Using Principal Components Analysis and Correlation Weighting,” Proc. 16th Asia-Pacific Software Engineering Conference, 2009, pp. 179-186 • D. Baker, “A Hybrid Approach to Expert and Model-Based Effort Estimation,” master's thesis, LCSEE, West Virginia Univ., http://bit.ly/hWDEfU, 2007 • M. Shepperd et al., “Effort Estimation Using Analogy,” Proc. 18th Int'l Conf. Software Engineering, 1996, pp. 170-178 • D. Beeferman and A. Berger, “Agglomerative Clustering of a Search Engine Query Log,” Proc. 6th ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, 2000, pp. 407-416 • Y.F. Li et al., “A Study of Project Selection and Feature Weighting for Analogy-Based Software Cost Estimation,” J. Systems and Software, vol. 82, pp. 241-252, 2009

  19. Thank You • Questions?
