When to Use Data from Other Projects for Effort Estimation


Introduction • Background • Experimental Method • Results • Discussion

Introduction

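Introduction • Use case: when a project's own (within-project) data is insufficient for effort estimation, data imported from other projects is used instead. • Data source: the PROMISE repository [predictive models in software engineering] • Similar practice: software defect detection • Result: provided a relevancy filter is applied to the data before estimating, estimates built from cross-project data are about as accurate as estimates built from within-project data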

Background

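Background • cross project/data • within project/data • Analogy-based estimation (ABE0): • Produces an accurate effort estimate for a new project by comparing it with similar past projects. Similarity computation: [formula image not preserved]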

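Background • ABE0: • Build a training set from past projects • The training set contains independent and dependent variables: the independent variables are the features that define a project, and the dependent variable is the effort • Decide how many similar projects (analogies) to use when estimating a new instance, i.e. the k-value • For each new test instance, find its k most similar projects in the training set • A similarity measure is used when selecting the similar projects • Before similarity is computed, each independent variable is assigned a weight • Use the k nearest analogies to estimate the effort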
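The ABE0 steps above can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' implementation: the weighted Euclidean distance, the equal default weights, the median as the adaptation step, and the toy data are all assumptions, since the slide's similarity formula did not survive.

```python
import math
from statistics import median

def weighted_euclidean(a, b, weights):
    """Weighted Euclidean distance between two feature vectors (assumed measure)."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))

def abe0_estimate(train, test_features, k=3, weights=None):
    """Estimate effort for one test project from its k nearest analogies.

    train: list of (features, effort) pairs built from past projects.
    """
    if weights is None:
        weights = [1.0] * len(test_features)  # assumption: every feature weighs the same
    ranked = sorted(
        train,
        key=lambda project: weighted_euclidean(project[0], test_features, weights),
    )
    analogies = ranked[:k]  # the k most similar past projects
    return median(effort for _, effort in analogies)

# Hypothetical toy data: (size, team experience) -> effort
past_projects = [
    ((10.0, 2.0), 24.0),
    ((12.0, 3.0), 30.0),
    ((50.0, 1.0), 200.0),
    ((11.0, 2.5), 26.0),
]
estimate = abe0_estimate(past_projects, (11.0, 2.0), k=3)
print(estimate)
```

In practice the features would usually be normalized to a common range first, so that no single feature dominates the distance.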

Background • independent variables: the features that define projects • dependent variable: the recorded effort value

Background • ABE0+Relevancy Filtering: • Step 1 removes the training instances implicated in poor decisions; • Step 2 selects those instances nearest the test instance.

Background • ABE0+Relevancy filtering: [Figure: tree of training instances with nodes ABDEFGH; AB, DE, FGH; A, B, D, E, FG, H; F, G] • The variance of the effort values in each sub-tree (the performance variance) is recorded and normalized to a 0-1 interval. • Step 1 prunes all sub-trees with a variance greater than 10% of the maximum variance seen in any tree.
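The pruning step can be illustrated with a short Python sketch. It assumes min-max normalization of the variances (the slide only says "normalized to a 0-1 interval") and uses hypothetical sub-tree labels and effort values.

```python
from statistics import pvariance

def prune_by_variance(subtrees, threshold=0.10):
    """Return the labels of sub-trees whose normalized variance is <= threshold.

    subtrees: dict mapping a sub-tree label to the effort values of its instances.
    """
    variances = {name: pvariance(efforts) for name, efforts in subtrees.items()}
    lowest, highest = min(variances.values()), max(variances.values())
    span = highest - lowest
    # Assumption: min-max normalization, so the largest variance maps to 1.0
    normalized = {
        name: (value - lowest) / span if span > 0 else 0.0
        for name, value in variances.items()
    }
    return [name for name, value in normalized.items() if value <= threshold]

# Hypothetical effort values per sub-tree
subtrees = {
    "AB": [10.0, 11.0],
    "DE": [20.0, 60.0],
    "FGH": [30.0, 31.0, 33.0],
}
kept = prune_by_variance(subtrees)
print(kept)
```

Here "DE" has by far the largest effort variance, so it is the only sub-tree removed.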

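[Figure: sub-tree with nodes FGH; FG, H; F, G] Variance is used as the decision criterion: IF the variance of the current tree > the variance of its sub-tree, continue moving down; ELSE use the instances of the current tree as the relevant instances and estimate from this sub-tree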
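The descent rule can be sketched as a loop over a tree. Several details here are assumptions that the slide does not state: the test instance moves toward the child whose centroid is nearest, the comparison is made against that child only, leaves hold more than one project, and the final estimate is the median effort. The `Node` class and the toy tree are hypothetical.

```python
from dataclasses import dataclass, field
from statistics import median, pvariance

@dataclass
class Node:
    name: str
    projects: list  # (features, effort) pairs under this sub-tree
    children: list = field(default_factory=list)

    def variance(self):
        efforts = [effort for _, effort in self.projects]
        return pvariance(efforts) if len(efforts) > 1 else 0.0

    def centroid(self):
        count = len(self.projects)
        dims = len(self.projects[0][0])
        return [sum(f[d] for f, _ in self.projects) / count for d in range(dims)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def select_relevant(node, test_features):
    """Move down while the current variance exceeds the nearest child's variance."""
    while node.children:
        nearest = min(node.children, key=lambda c: sq_dist(c.centroid(), test_features))
        if node.variance() > nearest.variance():
            node = nearest  # IF branch: keep moving down
        else:
            break           # ELSE branch: stop and use the current sub-tree
    return node

# Hypothetical tree shaped like FGH -> (FG -> F, G), H
f = Node("F", [((1.0,), 10.0), ((1.2,), 40.0)])
g = Node("G", [((2.0,), 8.0), ((2.2,), 9.0)])
h = Node("H", [((9.0,), 90.0), ((9.5,), 95.0)])
fg = Node("FG", f.projects + g.projects, [f, g])
fgh = Node("FGH", fg.projects + h.projects, [fg, h])

near_f = select_relevant(fgh, (1.1,))
near_g = select_relevant(fgh, (2.1,))
estimate_f = median(effort for _, effort in near_f.projects)
estimate_g = median(effort for _, effort in near_g.projects)
print(near_f.name, estimate_f, near_g.name, estimate_g)
```

A test project near F stops at FG, because F's efforts vary more than FG's, while a test project near G descends all the way to G.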

Experimental Method

Experimental Method • Data sources: Nasa93, Cocomo81, Desharnais

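Experimental Method • Mann-Whitney U test: it assumes that the two samples come from two populations that are identical except for their means, and it tests whether those means differ significantly.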
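For illustration, the U statistic and an approximate two-sided p-value can be computed in plain Python. This sketch uses the normal approximation without tie or continuity correction, which is rough for small samples; a statistics library would normally be used instead. The sample values are hypothetical error scores, not results from the study.

```python
import math

def mann_whitney_u(xs, ys):
    """Return (U, approximate two-sided p-value) for two independent samples."""
    n1, n2 = len(xs), len(ys)
    # Count pairs where x beats y; a tie counts as half a win
    u1 = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in xs for y in ys)
    u2 = n1 * n2 - u1
    u = min(u1, u2)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the standard normal
    return u, p

# Hypothetical error values for two treatments
within_errors = [0.1, 0.2, 0.3, 0.4, 0.5]
cross_errors = [0.6, 0.7, 0.8, 0.9, 1.0]
u, p = mann_whitney_u(within_errors, cross_errors)
print(u, p)
```

Because every value in the first sample is smaller than every value in the second, U is 0 and the approximate p-value falls below 0.05.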

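Experimental Method • For within experiments: leave-one-out method. One of the n instances of Xi is chosen as the test set and the other n-1 instances form the training set. Relevancy filtering is applied to X1, X2 and X3 separately, and the median effort in the training set is used as the effort estimate for the test instance. • For cross experiments: one of X1, X2 and X3 is chosen as the test set and the remaining two form the training set (the cross dataset). Relevancy filtering is applied to the training set, and the estimates for the test set are recorded.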
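The two experimental set-ups can be sketched as follows. The estimator here is a stand-in (the median effort of the 3 nearest training projects) rather than the relevancy filter used in the study, and the datasets are hypothetical toy values.

```python
from statistics import median

def knn_median(train, features, k=3):
    """Stand-in estimator: median effort of the k nearest training projects."""
    ranked = sorted(
        train,
        key=lambda project: sum((a - b) ** 2 for a, b in zip(project[0], features)),
    )
    return median(effort for _, effort in ranked[:k])

def within_experiment(dataset, estimator=knn_median):
    """Leave-one-out: each project is the test instance once, the rest train."""
    results = []
    for i, (features, actual) in enumerate(dataset):
        train = dataset[:i] + dataset[i + 1:]
        results.append((actual, estimator(train, features)))
    return results

def cross_experiment(datasets, test_name, estimator=knn_median):
    """One dataset is the test set; all other datasets form the training set."""
    train = [p for name, data in datasets.items() if name != test_name for p in data]
    return [(actual, estimator(train, features)) for features, actual in datasets[test_name]]

# Hypothetical toy datasets: (feature,) -> effort
datasets = {
    "X1": [((1.0,), 10.0), ((2.0,), 20.0), ((3.0,), 30.0), ((4.0,), 40.0)],
    "X2": [((1.5,), 15.0), ((2.5,), 25.0)],
    "X3": [((3.5,), 35.0), ((4.5,), 45.0)],
}
within_results = within_experiment(datasets["X1"])
cross_results = cross_experiment(datasets, "X1")
print(within_results)
print(cross_results)
```

Each result is an (actual, estimated) pair, so the same error measure can be applied to both set-ups afterwards.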

Experimental Method [Figure: legend distinguishes test set and training set. For within experiments: a single dataset Xi. For cross experiments: datasets X1, X2, X3.]

Experimental Method • Without Relevancy Filtering • Linear regression model • - Within experiment • - Cross experiment • With Relevancy Filtering • Predictions repeated 20 times • - Within experiment • - Cross experiment


Results

Results • Without Relevancy Filtering: in the absence of relevancy filtering, the within datasets yield significantly lower MRE values in the majority of cases
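MRE (magnitude of relative error) is conventionally defined as |actual - predicted| / actual, so lower values mean better estimates. A small sketch with hypothetical numbers:

```python
def mre(actual, predicted):
    """Magnitude of relative error for a single estimate."""
    return abs(actual - predicted) / actual

# Hypothetical (actual, predicted) effort pairs
pairs = [(100.0, 80.0), (50.0, 60.0), (200.0, 200.0)]
errors = [mre(actual, predicted) for actual, predicted in pairs]
print(errors)
```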

Results • With Relevancy Filtering: the figure shows that the two approaches (within and cross) perform comparably in at least 75% of the experiments

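Results • With Relevancy Filtering: the figure shows that the two approaches (within and cross) perform comparably in 2/3 of the experiments; however, with Coc81o as the test set, the within approach outperforms the cross approach 13 times (reason unknown)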

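Results • With Relevancy Filtering: the figure shows that the two approaches (within and cross) perform comparably in 2/3 of the experiments; however, with DesL3 as the test set, the within approach outperforms the cross approach 16 times (reason unknown)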

Discussion

Discussion • The figure shows that the number of analogies selected each time is small: 3 on average

Discussion • It would also lead to • more accurate filtering techniques; • a better understanding of the structure of software projects, including where to find data most relevant to some current project.

THE END Thank You~
