
Analysis & Evaluation of Data


Presentation Transcript


  1. Analysis & Evaluation of Data
  The collected data should be:
  • Reliable: no or very little error is committed in the gathering and tabulation of the data
  • Accurate: the data maintains the desired degree of precision
  • Valid: the data is applicable to the issue and attribute of interest

  2. Sample Considerations
  We have collected "error data" on Requirements Inspection, Design Inspection, and Unit Testing and want to analyze it for the "quality" attribute.
  • Potential reliability problem: did we collect and count the data correctly in all three cases?
  • Potential accuracy problem: did we use the same level of precision (e.g., the same severity breakdown) in each case?
  • Potential validity problem: is the number of "defects" a valid quality attribute? Do these data reflect a measure of the extent of the defects committed (extent = number, severity, complexity of fix, etc.)?

  3. Some Common Analysis Methods of Data
  • Distribution of data
  • Centrality and dispersion
  • Moving averages
  • Data correlation
  • Normalization of data

  4. 1. Distribution of Data
  • We often look at a scatter diagram of the raw data and pick out the "outliers".
  • We count the frequency of occurrences to get a distribution, which gives a view of the "shape" and the "range" of the data. For example:
    • severity 1: 7 defects
    • severity 2: 24 defects
    • severity 3: 26 defects
    • severity 4: 88 defects
    • severity 5: 92 defects
  • The range is from 7 defects to 92 defects.
  • The shape is not that important in this case; the skew is towards the less severe defects.
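As a minimal sketch, the tally above can be reproduced in Python; the per-defect severity list below is reconstructed from the slide's counts:

```python
from collections import Counter

# Reconstructed raw data: one severity code per recorded defect,
# matching the counts on the slide.
severities = [1] * 7 + [2] * 24 + [3] * 26 + [4] * 88 + [5] * 92

counts = Counter(severities)
for severity in sorted(counts):
    print(f"severity {severity}: {counts[severity]} defects")

frequencies = counts.values()
print(f"range: {min(frequencies)} to {max(frequencies)} defects")
```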

  5. Common Distributions of Data
  There are some "recognizable" distributions:
  • Normal
  • Linear
  • Logarithmic
  • Exponential
  • Negative exponential
  [Figure: sketch of each distribution's characteristic shape]

  6. 2. Centrality and Dispersion
  • Use centrality to compare two sets of data distributions:
    • mean
    • median
  [Figure: example distributions annotated with their mean and median values]
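A quick sketch of why the two measures can differ, using the standard library's statistics module on hypothetical fix-time data with one outlier:

```python
import statistics

# Hypothetical fix times (hours); one outlier skews the distribution.
fix_times = [1, 2, 2, 3, 3, 4, 40]
print(f"mean   = {statistics.mean(fix_times):.1f}")    # 7.9: pulled up by the outlier
print(f"median = {statistics.median(fix_times):.1f}")  # 3.0: robust to the outlier
```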

  7. Variance & Standard Deviation
  • A measure of dispersion from the central value.
  • Suppose we measured the number of defects (xi) from n similar-sized functional areas:
    • the mean or central value is calculated as Xmean = ∑(xi) / n
    • the variance = [ ∑( (xi - Xmean)**2 ) ] / n
    • Std Dev = SQRT(variance)
  • For a normal distribution, 1 std dev around the mean captures about 68% of the sample.
  • Given a new function of similar size, we can measure the number of defects found and compare it against the mean and the 1 std dev band of the earlier group.
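The formulas translate directly into a short Python sketch; the defect counts below are hypothetical:

```python
import math

def mean_variance_std(xs):
    """Population mean, variance, and std dev, as defined on the slide."""
    n = len(xs)
    x_mean = sum(xs) / n
    variance = sum((x - x_mean) ** 2 for x in xs) / n
    return x_mean, variance, math.sqrt(variance)

# Hypothetical defect counts from n similar-sized functional areas.
defects = [4, 6, 5, 7, 3, 6, 5, 8, 4, 5]
mean, var, std = mean_variance_std(defects)
print(f"mean = {mean:.2f}, variance = {var:.2f}, std dev = {std:.2f}")

# A new similar-sized function is compared against mean +/- 1 std dev,
# which captures about 68% of a normal sample.
new_count = 9
print("within 1 std dev" if abs(new_count - mean) <= std else "outside 1 std dev")
```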

  8. Control Chart
  [Figure: control chart of observations around the center line Mean = 5.3, with upper and lower limits at 1 std dev above and below the mean]
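A sketch of the check a control chart encodes: flag any observation outside mean +/- 1 std dev. The mean of 5.3 comes from the slide; the std dev and sample values are assumed for illustration:

```python
mean, std = 5.3, 1.4  # mean from the slide; std dev assumed for illustration
samples = [4.1, 5.0, 7.9, 5.6, 2.9, 5.2]  # hypothetical observations

for i, x in enumerate(samples, start=1):
    status = "OUT of control band" if abs(x - mean) > std else "ok"
    print(f"sample {i}: {x:4.1f}  {status}")
```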

  9. 3. Moving Average - a "Smoothing" Technique
  [Figure: a time series plotted with its moving average; jumps in the raw data are smoothed out, while one special jump remains visible]
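A minimal sketch, assuming a simple (unweighted) moving average over a fixed window; the weekly defect counts are hypothetical:

```python
def moving_average(xs, window=3):
    """Average each consecutive window; jumps are smoothed across windows."""
    return [sum(xs[i:i + window]) / window
            for i in range(len(xs) - window + 1)]

# Hypothetical weekly defect counts with a jump after week 4.
weekly = [4, 5, 4, 6, 12, 11, 12, 13]
print(moving_average(weekly))
# [4.33, 5.0, 7.33, 9.67, 11.67, 12.0] (rounded): the jump is spread out
```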

  10. 4. Correlation
  • Correlation only addresses whether there is a "relationship".
  • It does not address "cause and effect".
  • Example:
    • the size of a module may correlate to the number of defects,
    • but the size of the module may or may not be the cause.
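One common way to quantify such a relationship is the Pearson correlation coefficient; the slide does not name a specific statistic, so this is an illustrative choice. The data reuses the size/defect pairs from the regression example below:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: strength of linear relationship, -1 to 1.
    It says nothing about cause and effect."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Module sizes (loc) vs. defect counts, from the regression example below.
sizes = [150, 230, 500, 730, 1000]
defects = [2, 3, 4, 7, 9]
print(f"r = {pearson_r(sizes, defects):.3f}")  # ~0.986: strong relationship
```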

  11. Linear Relationship
  • A linear equation has the form Y = a + bX, where:
    • 'b' is the slope, and
    • 'a' is the Y intercept.
  [Figure: scatter plot of points clustered along a rising straight line, Y vs. X]

  12. Least Squares Linear Regression
  • A method of estimating the linear relationship between the Y variable and the X variable in the form Y = a + bX, by minimizing the (squared vertical) distance of the Y coordinates from the fitted line.
  • We can estimate the parameters a and b as follows:
    • b = [ ∑(XY) - (1/n)(∑X)(∑Y) ] / [ ∑(X**2) - (1/n)(∑X)**2 ]
    • this b estimate gives the same value as the one shown in the book
    • a = Yave - b * Xave
    • where X is each X observation and Xave is the average of the X's (likewise Yave for the Y's)

  13. Least Squares Linear Regression - Example
  • (size, defects): (150, 2); (230, 3); (500, 4); (730, 7); (1000, 9)
  • Xs: 150, 230, 500, 730, 1000; ∑X = 2610
  • X**2: 22,500; 52,900; 250,000; 532,900; 1,000,000; ∑(X**2) = 1,858,300
  • Ys: 2, 3, 4, 7, 9; ∑Y = 25
  • XY: 300, 690, 2000, 5110, 9000; ∑(XY) = 17,100
  • b = [17,100 - (1/5)(2610)(25)] / [1,858,300 - (1/5)(2610)**2] = 4050 / 495,880 ≈ .0081
  • a = 25/5 - (.0081)(2610/5) = 5 - 4.23 ≈ .77
  • The least squares regression line is: Y = .77 + .0081 X
  • Check: plug in X = 150: (.0081)(150) + .77 = 1.22 + .77 = 1.99, close to the observed 2!
  • The fitted line is more accurate for interpolation than for extrapolation.
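The same computation as a Python sketch, implementing the b and a formulas from the previous slide; it carries more precision than the slide's truncated .0081:

```python
def least_squares(xs, ys):
    """Estimate a and b in Y = a + bX using the slide's formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    a = sum_y / n - b * (sum_x / n)
    return a, b

# (size, defects) data from the example above.
sizes = [150, 230, 500, 730, 1000]
defects = [2, 3, 4, 7, 9]
a, b = least_squares(sizes, defects)
print(f"Y = {a:.2f} + {b:.4f} X")                    # Y = 0.74 + 0.0082 X
print(f"prediction at size 150: {a + b * 150:.2f}")  # 1.96, close to 2
```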

  14. 5. Normalization
  • Pure data gives only a 1-dimensional comparison:
    • program A: 52 person days to complete
    • program B: 33 person days to complete
    • program C: 64 person days to complete
    • 64 > 52 > 33; what else can we say? (We suspect the programs differ in size.)
  • Normalization gives an equalizing factor in terms of another attribute:
    • program A: 52 person days for 5000 loc, or 96.1 loc / person day
    • program B: 33 person days for 3000 loc, or 90.9 loc / person day
    • program C: 64 person days for 6000 loc, or 93.7 loc / person day
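A minimal sketch of this normalization using the slide's figures (Python's rounding yields 96.2 and 93.8 where the slide truncates to 96.1 and 93.7):

```python
# (person days, loc) per program, from the slide.
programs = {"A": (52, 5000), "B": (33, 3000), "C": (64, 6000)}

for name, (person_days, loc) in programs.items():
    # Normalizing effort by size makes the programs comparable.
    print(f"program {name}: {loc / person_days:.1f} loc / person day")
```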
