CORRELATION ANALYSIS FOR USCM8 CERS

2. Outline
- Objectives
- Background
- Tecolote's position
- Future study items noted in Mr. Covert's paper (see Reference 1)
- How to apply the correlation formula
- Ground Rules for Developing USCM7 CERs
- Using CER Data Points to Compute Pearson's r
- Multiplicative Error Model (MUPE) and Error Forms
- Pearson's Correlation Coefficient: definition and example, property
- Revisited High Correlation Items from Reference 1
- USCM8 Sample Correlation Coefficients
- Conclusions

3. Objectives
- Derive correlations between the USCM CER uncertainties using an analytic method
Note: Correlation matters in cost risk analysis because correlation affects the total uncertainty.

4. Tecolote's Position
- Cost correlation is not the same as "CER noise correlation"
- With CERs as cost estimating methodologies, most of the correlations are captured through the functional relationships specified in the WBS
- Do any correlations exist for the remaining noise terms?
Note: "Cost correlation" is not the same as "noise correlation" when CERs are considered. Strong correlations between cost elements in a database should not be mistaken as evidence that the residuals or percentage errors of estimating methodologies derived from that same database are correlated.

5. Future Study Items Noted in Reference 1
- High correlation coefficients between USCM7 CER uncertainties in "Correlation Coefficients for Spacecraft Subsystems from the USCM7 Database"
Note: These correlation values seemed extraordinarily high, especially those approaching one, such as 0.98 and 0.97. We wondered whether there was any good engineering reason to believe that the remaining noise for the apogee kick motor (AKM) T1 CER was almost perfectly correlated with the noise for the attitude determination and control system (ADCS) nonrecurring CER. Similarly, could we conclude that the remaining uncertainties for program-level and communication nonrecurring costs were highly correlated?

6. How should we apply the correlation formula to the data points?
- Reference 1 used 26 satellites from the entire USCM7 database to compute correlation coefficients for USCM7 CERs
  - "Outliers" not eliminated
  - Population not homogeneous
- We should not use the entire database to compute correlation coefficients
  - Data point selection
  - Error form consideration

7. Ground Rules for Developing USCM7 CERs
- ATSF deleted due to incomplete cost data
- Programs with no costs identified were not used
  - AE, CRRES, P78-1, P78-2, P72-2, OSO, S3, DMSP 5-D1, DMSP 5-D2, and DMSP 5-D3 did not have a communication payload
  - DSCS, DMSP, DSP, AE, OSO, and SMS did not have an AKM
  - GPS 9-11 and CRRES AKMs were GFE
- Follow-on production programs (DSCS 4-7, DSCS 8-14, DMSP 5-D2, DSP 5-12, DSP 18-22, FLTSATCOM 6-8, GPS 9-11, and GPS 13-40) not used in the nonrecurring CERs
- DSCS A (a development program) not used in the T1 CER
- Data points displaying program peculiarity were not used in subsystem CER development
Note: The Combined Release and Radiation Effects Satellite (CRRES) was deleted from the TT&C nonrecurring cost CER because its costs did not represent a full design effort.

8. Ground Rules for Developing USCM7 CERs (2)
- P78-1, P78-2, P72-2, and S3 were identified as Space Test Programs (STPs)
  - Smaller physical size, maximum reuse of existing HW
  - Shorter design life (6 to 18 months)
  - Not a full-up design effort for nonrecurring
  - Not a full-up manufacturing effort for recurring
- AE, OSO, and CRRES were considered experimental satellites
- Developed a separate CER for estimating STPs and experimental programs, if appropriate
  - Using the primary equation to predict STPs would be incorrect
Note: We have tried to use dummy variables to include STPs and experimental programs in the primary equations, where suitable.

9. Using CER Data Points to Compute Pearson's r
- Even worse: calculating the corresponding correlation coefficient when the primary equation is used to predict STPs
- If a satellite doesn't have a particular subsystem, do not include it in computing the correlation coefficient for the corresponding subsystem-level CER
  - Its percentage error would be 100% in magnitude under any CER
- Do not use data points with program peculiarity to compute Pearson's r if they are excluded from the CER
  - Refit the CER with previously excluded outliers if necessary
- A homogeneous data set is essential
Note: Using the primary equation to predict STPs gives inaccurate and misleading results if STPs were not included in fitting it.
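One way to read the data-point rules above is that two CERs' errors should be paired only on programs that appear in both CER data sets. A minimal Python sketch, assuming each CER's percentage errors are held in a dictionary keyed by program name; the function name, the dictionary layout, and the program labels are hypothetical, not USCM data.

```python
def pair_common_points(errors_a, errors_b):
    """Pair two CERs' errors only on data points used by both CERs.

    errors_a, errors_b: dict mapping program name -> percentage error for
    the data points actually used to fit that CER. A satellite without the
    subsystem, or a point excluded for program peculiarity, is simply absent
    from the dict, so it is never paired.
    """
    common = sorted(set(errors_a) & set(errors_b))
    xs = [errors_a[p] for p in common]
    ys = [errors_b[p] for p in common]
    return common, xs, ys


# Hypothetical inputs: "P4" lacks subsystem B, so it drops out of the pairing.
cer_a = {"P1": 0.10, "P2": -0.05, "P3": 0.02, "P4": -0.07}
cer_b = {"P1": 0.08, "P2": -0.04, "P3": 0.01}
programs, xs, ys = pair_common_points(cer_a, cer_b)
```

Keying by program rather than by position keeps a missing subsystem from silently shifting the pairs out of alignment.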

10. Multiplicative Error Model (MUPE)
- Definition for cost variation: $Y = f(X)\,\varepsilon$, where $E(\varepsilon) = 1$ and $V(\varepsilon) = \sigma^2$
- Error in cost is proportional to cost

11. Candidate Error Forms
- MUPE models use percentage errors: $(y_i - \hat{y}_i)/\hat{y}_i$
  - Note: Residuals are weighted by the reciprocal of the predicted value
- Additive models use residuals: $y_i - \hat{y}_i$
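To see why the error form matters under a multiplicative model, the Python sketch below simulates costs as $Y = f(W)\,\varepsilon$ with $E(\varepsilon) = 1$ and compares the two error forms. The CER $f(w) = 2w^{0.8}$, the normal noise with standard deviation 0.2, and the weight range are all hypothetical choices for illustration, and the CER values are the true ones (no fitting step).

```python
import random
import statistics


def percentage_errors(actuals, predicted):
    """MUPE error form: (y_i - yhat_i) / yhat_i, i.e., residuals weighted by 1 / yhat_i."""
    return [(y - yh) / yh for y, yh in zip(actuals, predicted)]


def residuals(actuals, predicted):
    """Additive error form: y_i - yhat_i."""
    return [y - yh for y, yh in zip(actuals, predicted)]


# Hypothetical weight-based CER f(w) = 2 * w**0.8 with multiplicative noise
# eps ~ Normal(mean=1, sd=0.2), so E(eps) = 1 and V(eps) = 0.04.
random.seed(1)
weights = [random.uniform(10.0, 1000.0) for _ in range(5000)]
cer_values = [2.0 * w ** 0.8 for w in weights]
actual = [fx * random.gauss(1.0, 0.2) for fx in cer_values]

pct = percentage_errors(actual, cer_values)
res = residuals(actual, cer_values)

# Compare the spread of each error form on the cheaper and costlier halves.
order = sorted(range(len(cer_values)), key=lambda i: cer_values[i])
low, high = order[:2500], order[2500:]
res_sd_low = statistics.stdev([res[i] for i in low])
res_sd_high = statistics.stdev([res[i] for i in high])
pct_sd_low = statistics.stdev([pct[i] for i in low])
pct_sd_high = statistics.stdev([pct[i] for i in high])
pct_mean = statistics.fmean(pct)
```

Under this model the residual spread roughly doubles from the cheaper half to the costlier half, while the percentage-error spread stays near 0.2 in both, which is the heteroscedasticity a multiplicative CER is meant to absorb.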

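12. Pearson's Correlation Coefficient
- Pearson's correlation coefficient measures the linear association between the paired values $\{x_i\}$ and $\{y_i\}$:
  $r_{xy} = \dfrac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}$
  - $\{x_i\}$ and $\{y_i\}$ are the paired percentage errors for multiplicative models
  - $\{x_i\}$ and $\{y_i\}$ are the paired residuals for additive models
  - The means $\bar{x}$ and $\bar{y}$ should both be zero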
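A direct Python transcription of the standard sample formula for Pearson's r, together with a check that both error means are close to zero. The function names and the default tolerance of 0.05 are illustrative assumptions, not a USCM rule.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation of paired errors {x_i} and {y_i}."""
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need two paired sequences with at least 3 points")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def error_means_near_zero(xs, ys, tol=0.05):
    """True if both error means are within tol of zero (tol is an arbitrary illustrative choice)."""
    return abs(sum(xs) / len(xs)) <= tol and abs(sum(ys) / len(ys)) <= tol
```

For example, `pearson_r([1, 2, 3], [3, 2, 1])` returns -1.0, and `error_means_near_zero([0.2, 0.3, 0.1], [0.0, 0.1, -0.1])` returns False because the first mean is 0.2.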

13. Reference 1: Deriving Correlation Coefficients
- We usually don't know the true value $\rho_{xy}$, so we approximate it by the sample correlation $r_{xy}$
- Example calculation using randomly generated numbers
Note: Neither the mean of the $x_i$'s nor the mean of the $y_i$'s is zero. This is a warning flag indicating a mismatch between the CERs and their error terms.

14. Pearson's r Preserved through Linear Transformation
- Given the following (f and g are USCM7 weight-based CERs; $\varepsilon$ and $\eta$ are error terms):
  $T = X + Y$, $X = f(W)\,\varepsilon$, $Y = g(W)\,\eta$
- At a given weight $W = w$, the correlation between X and Y is the same as the correlation between $\varepsilon$ and $\eta$, i.e., $\rho_{XY} = \rho_{\varepsilon\eta}$
- Total cost variance at a given weight $w$ is given by
  $V(T \mid w) = f(w)^2\sigma_\varepsilon^2 + g(w)^2\sigma_\eta^2 + 2\rho_{\varepsilon\eta}\,f(w)\,g(w)\,\sigma_\varepsilon\sigma_\eta$
- We should consider the correlations between percentage errors instead of residuals
Note: Suppose a total project cost T is composed of two elements, X and Y, which are hypothesized by the USCM7 weight-based CERs f and g, respectively: $T = X + Y$, $X = f(W)\,\varepsilon$, and $Y = g(W)\,\eta$.

15. Pearson's r Preserved through Linear Transformation (2)
- General total cost variance:
  $V(T) = \sum_{k}\sum_{m} \rho_{km}\,\sigma_k\,\sigma_m\,f_k\,f_m$, with $\rho_{kk} = 1$
- Where:
  - $\sigma_k$, $\sigma_m$, and $\rho_{km}$ are the standard deviations of the noise terms for WBS elements k and m, respectively, and the correlation between them
  - $f_k$ and $f_m$ are the CER-estimated values for WBS elements k and m, respectively
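The double sum can be evaluated directly. A minimal Python sketch, assuming the CER estimates, the noise standard deviations, and the correlation matrix are supplied as plain lists; the function name and the two-element numbers in the usage line are hypothetical.

```python
def total_cost_variance(f, sigma, rho):
    """V(T) = sum_k sum_m rho[k][m] * sigma[k] * sigma[m] * f[k] * f[m].

    f: CER point estimates f_k for each WBS element.
    sigma: noise standard deviations sigma_k (for a MUPE CER, the
        standard deviation of its percentage error).
    rho: correlation matrix of the noise terms, with rho[k][k] == 1.
    """
    n = len(f)
    if len(sigma) != n or len(rho) != n or any(len(row) != n for row in rho):
        raise ValueError("dimension mismatch")
    return sum(
        rho[k][m] * sigma[k] * sigma[m] * f[k] * f[m]
        for k in range(n)
        for m in range(n)
    )


# Two hypothetical elements: noise correlation 0.5 vs. independence.
v_corr = total_cost_variance([1.0, 2.0], [0.1, 0.2], [[1.0, 0.5], [0.5, 1.0]])
v_indep = total_cost_variance([1.0, 2.0], [0.1, 0.2], [[1.0, 0.0], [0.0, 1.0]])
```

Here v_corr is 0.21 and v_indep is 0.17. The difference, 0.04, is the cross term $2\rho_{12}\sigma_1\sigma_2 f_1 f_2$, which is why overstated noise correlations inflate the total-cost variance.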

16. Revisited High Correlation Items in Previous Study
- High correlation coefficients listed in Reference 1 were not found with the revised approach
Note: The correlation of 0.8 between the EPS NR and COMM NR CER noise terms is not significant because the sample size is only 6. The corresponding 95% CI for r runs from -0.033 to 0.977, which includes zero. The CI is computed on the Fisher-transformed scale, $z' = 0.5\ln[(1+r)/(1-r)]$, as $z' \pm z_{\alpha/2}\,\sigma_{z'}$ with $\sigma_{z'} = 1/\sqrt{n-3}$, and its endpoints are then transformed back to r.
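The confidence interval quoted above can be reproduced with Fisher's z-transform and the standard normal quantile. A minimal Python sketch; the function name is hypothetical, and the interval is the usual large-sample approximation.

```python
import math
from statistics import NormalDist


def fisher_z_ci(r, n, confidence=0.95):
    """Approximate CI for a correlation coefficient via Fisher's z-transform.

    z' = 0.5 * ln((1 + r) / (1 - r)), sigma_z' = 1 / sqrt(n - 3);
    the interval z' +/- z_(alpha/2) * sigma_z' is mapped back with tanh.
    """
    if n <= 3:
        raise ValueError("need n > 3")
    if not -1.0 < r < 1.0:
        raise ValueError("r must be strictly between -1 and 1")
    z_prime = 0.5 * math.log((1.0 + r) / (1.0 - r))
    se = 1.0 / math.sqrt(n - 3)
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return math.tanh(z_prime - z_crit * se), math.tanh(z_prime + z_crit * se)


# EPS NR vs. COMM NR example from the note: r = 0.8 with n = 6.
lo, hi = fisher_z_ci(0.8, 6)
```

Rounded to three decimals this gives (-0.033, 0.977). Because the interval contains zero, the 0.8 is not significant at the 5% level.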

17. USCM8 Sample Correlation Coefficients
- Range: (-0.925, 0.913); mean = 0.04; median = 0.02; skew = -0.02
- 1st quartile = -0.32; 3rd quartile = 0.44; sd = 0.44
- 73% of the correlation coefficients are from -0.5 to 0.5
- Three sample correlations with absolute values > 0.85: 0.90, 0.91, -0.93
Note: The sample correlation coefficients range from -0.925 to 0.913, with a mean of 0.04, a median of 0.02, and a standard deviation of 0.44. Only three sample correlations have absolute values greater than 0.85: 0.90, 0.91, and -0.93 (shown in red on the backup chart). The sample correlation of 0.90 is significant, but the other two are not, due to the sample size. The shape of the histogram is very different from the one in Reference 1. See the graph on the next page for comparison.
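Summary statistics like these can be computed for any list of sample correlation coefficients with the Python standard library. A minimal sketch: the function name and the input list are made up, skewness is omitted because several estimator conventions exist, and `statistics.quantiles` uses its default "exclusive" method, which may not match the quartile convention behind the numbers above.

```python
import statistics


def summarize_correlations(coeffs, band=0.5, threshold=0.85):
    """Range, center, spread, and tail counts for a list of sample r values."""
    q1, _, q3 = statistics.quantiles(coeffs, n=4)
    return {
        "min": min(coeffs),
        "max": max(coeffs),
        "mean": statistics.fmean(coeffs),
        "median": statistics.median(coeffs),
        "q1": q1,
        "q3": q3,
        "sd": statistics.stdev(coeffs),
        "share_within_band": sum(-band <= r <= band for r in coeffs) / len(coeffs),
        "above_threshold": [r for r in coeffs if abs(r) > threshold],
    }


# Made-up coefficients, for illustration only.
summary = summarize_correlations([-0.9, -0.3, 0.0, 0.1, 0.2, 0.4, 0.95])
```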

18. Reference 1: USCM7 Correlation Coefficients
Note: This graph is from Mr. Covert's correlation analysis paper.

19. Correlations between Structure/Thermal and SEPM Nonrecurring CERs
- For non-communication satellites: 0.90
- For communication satellites: -0.54
- For all satellites: 0.73
Note: This result indicates that, for non-communication satellites, the noise of the SEPM nonrecurring CER might be correlated with the noise of the combined structure and thermal nonrecurring CER. The data points in this category are STPs and experimental programs. Another interesting point is that, for communication satellites, the SEPM noise term is moderately and negatively correlated (-0.54) with the combined structure and thermal nonrecurring CER noise term. Yet the overall sample correlation coefficient is 0.73 when communication and non-communication satellites are combined.

20. Conclusions
- Sample correlation coefficient is sensitive to the computing method
- Use CER data points to compute Pearson's r to avoid heteroscedasticity
- In cost risk analysis, consider the correlations between percentage errors (not residuals) for multiplicative CERs, and between residuals (not percentage errors) for additive CERs
- Means of the errors should be zero when computing Pearson's r
- With the revised approach, the high correlations from the previous study for USCM7 CERs are not found
- We have found no discernible sample correlations for the USCM8 subsystem-level CERs using the revised method:
  - Mean = 0.04, median = 0.02, skew = -0.02
  - 73% of them are between -0.5 and 0.5
  - Three sample correlations with absolute values greater than 0.85: 0.90, 0.91, and -0.93 (0.90 is significant, but not the other two)
- Cost correlation is not the same as "CER noise correlation"
- Use this analytic method as a cross-check
Suggestion: Use this analytic method as a cross-check to see (1) whether CERs are developed properly and (2) whether we need to check with the program office about the development process for certain cost elements.

21. References
[1] Covert, Raymond P., "Correlation Coefficients for Spacecraft Subsystems from the USCM7 Database," Third Joint Annual ISPA/SCEA International Conference, Vienna, VA, 12-15 June 2001.
[2] Garvey, Paul R., "Do Not Use Rank Correlation in Cost Risk Analysis," 32nd Annual DoD Cost Analysis Symposium, Williamsburg, VA, 2-5 February 1999.
[3] Nguyen, P., et al., "Unmanned Spacecraft Cost Model, Seventh Edition," U.S. Air Force Space and Missile Systems Center (SMC/FMC), Los Angeles AFB, CA, August 1994.
[4] Nguyen, P., et al., "Unmanned Spacecraft Cost Model, Eighth Edition," U.S. Air Force Space and Missile Systems Center (SMC/FMC), Los Angeles AFB, CA, October 2001.
[5] Tecolote Research, Inc., "RI$K in ACE User's Manual," GM 075, August 1999.

23. USCM8 Sample Correlation Coefficients
Note: The table above contains correlation coefficients for the uncertainties between USCM8 subsystem-level MUPE CERs. PGM_T1C denotes the SEPM T1 CER for communication satellites, while PGM_T1NC is the SEPM T1 CER for non-communication satellites. The noise correlation between the spacecraft nonrecurring cost and the SEPM nonrecurring cost for non-communication satellites is -0.23, which is not displayed in the table. Only three sample correlations have absolute values greater than 0.85: 0.90, 0.91, and -0.93 (shown in red in the table above). The sample correlation of 0.90 is significant, but the other two are not, due to the sample size.
