
Lab 11

Lab 11. Multiple Regression Residuals and Influence. Topics: influence analysis, standardized B weights, residuals analysis. Multiple regression syntax: Proc Reg; Model dv = iv1 iv2 / stb R influence; Plot dv*iv1; Plot dv*iv2; Plot dv*p.; Plot p.*r.; Run; followed by a worked example.

starbuck

Presentation Transcript


  1. Lab 11 Multiple Regression Residuals and Influence

  2. Multiple Regression Syntax

         Proc Reg;
           Model dv = iv1 iv2 / stb R influence;
           Plot dv*iv1;
           Plot dv*iv2;
           Plot dv*p.;
           Plot p.*r.;
         Run;

     Options on the Model statement:
     • stb: standardized B weights
     • R: residuals analysis
     • influence: influence analysis

  3. Example: A record company wants to know whether (1) airplay, (2) attractiveness of the band, and (3) advertising budget each account for significant variance in record sales.

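  4. Example Program

         /* Read the four variables from the lab data file */
         data d2;
           infile 'C:\WINDOWS\Desktop\lab11.txt';
           input adverts sales airplay attract;
         run;

         /* Regress sales on the three predictors, with diagnostics */
         proc reg data=d2;
           model sales = adverts airplay attract / stb R influence;
           plot sales*adverts;
           plot sales*airplay;
           plot sales*attract;
           plot sales*p.;   /* sales against predicted values */
           plot p.*r.;      /* predicted values against residuals */
         run;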

  5. Graph of Sales and Adverts

  6. Graph Sales and Airplay

  7. Graph of Sales and Attractiveness

  8. Graph of Sales and predicted values

  9. Graph of predicted values and residuals

  10. Example Output

          Model: MODEL1
          Dependent Variable: sales

          Analysis of Variance

          Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
          Model               3           1722754         574251     261.64    <.0001
          Error             396            869150     2194.82212
          Corrected Total   399           2591904

          Root MSE           46.84893    R-Square    0.6647
          Dependent Mean    193.20000    Adj R-Sq    0.6621
          Coeff Var          24.24893

          Parameter Estimates

          Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|    Standardized Estimate
          Intercept     1             -26.61294          12.20619      -2.18      0.0298                        0
          adverts       1               0.08488           0.00487      17.43      <.0001                  0.51085
          airplay       1               3.36742           0.19542      17.23      <.0001                  0.51199
          attract       1              11.08634           1.71509       6.46      <.0001                  0.19168
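As a quick check on how the summary statistics in this output follow from the Analysis of Variance table, the arithmetic can be worked by hand. The formulas are the standard ones; the symbols $SS$, $MS$, and $\bar{y}$ are notation introduced here, not taken from the slides.

$$
\begin{aligned}
R^2 &= \frac{SS_{\text{Model}}}{SS_{\text{Total}}} = \frac{1722754}{2591904} \approx 0.6647 \\
F &= \frac{MS_{\text{Model}}}{MS_{\text{Error}}} = \frac{574251}{2194.82212} \approx 261.64 \\
\text{Root MSE} &= \sqrt{MS_{\text{Error}}} = \sqrt{2194.82212} \approx 46.849 \\
\text{Adj } R^2 &= 1 - \frac{SS_{\text{Error}}/396}{SS_{\text{Total}}/399} = 1 - \frac{2194.82212}{6496} \approx 0.6621 \\
\text{Coeff Var} &= 100 \cdot \frac{\text{Root MSE}}{\bar{y}} = 100 \cdot \frac{46.84893}{193.2} \approx 24.25
\end{aligned}
$$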

  11. Output Residual Analysis (cont)

          Model: MODEL1
          Dependent Variable: sales

          Output Statistics

          Obs    Dep Var sales    Predicted Value    Std Error Mean Predict    Residual     Std Error Residual    Student Residual
            1         330.0000           229.9206                    7.1963    100.0794                 46.293               2.162
            2         330.0000           229.9206                    7.1963    100.0794                 46.293               2.162
            3         120.0000           228.9494                    2.9642   -108.9494                 46.755              -2.330
            4         120.0000           228.9494                    2.9642   -108.9494                 46.755              -2.330
            5         360.0000           291.5573                    4.7662     68.4427                 46.606               1.469
            6         360.0000           291.5573                    4.7662     68.4427                 46.606               1.469
            7         270.0000           262.9757                    3.7127      7.0243                 46.702               0.150
            8         270.0000           262.9757                    3.7127      7.0243                 46.702               0.150
            9         220.0000           225.7525                    5.3483     -5.7525                 46.543              -0.124
           10         220.0000           225.7525                    5.3483     -5.7525                 46.543              -0.124
           11         170.0000           141.0950                    3.9487     28.9050                 46.682               0.619
           12         170.0000           141.0950                    3.9487     28.9050                 46.682               0.619
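To see how the last three columns of this listing relate, it may help to work two rows by hand. The symbols $e_i$ (residual) and $r_i$ (Student Residual) are notation introduced here; the relationship itself is the standard definition of an internally studentized residual.

$$
\begin{aligned}
e_1 &= y_1 - \hat{y}_1 = 330.0000 - 229.9206 = 100.0794 \\
r_1 &= \frac{e_1}{\text{SE}(e_1)} = \frac{100.0794}{46.293} \approx 2.162 \\
r_3 &= \frac{e_3}{\text{SE}(e_3)} = \frac{-108.9494}{46.755} \approx -2.330
\end{aligned}
$$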

  12. What to look for in your outlier results
      • Studentized residuals (these take into account that values of X further from the mean have larger standard errors): identify values greater than 2 in absolute value.
      • Residuals: look at values greater than 2 x Root MSE in absolute value; here 2 x 47 = 94.
      • Once you identify the outliers, investigate their influence on your results.
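The two outlier checks above can be applied to every observation rather than read off the printed listing. The following is a minimal sketch, assuming the data set d2 from the example program already exists. The data set names diag and outliers and the variable names resid, stud, obsnum, and rootmse are illustrative choices, not part of the original lab; the Root MSE value is copied from the example output.

```sas
/* Save the raw and studentized residuals for every observation. */
proc reg data=d2;
  model sales = adverts airplay attract;
  output out=diag r=resid student=stud;   /* diag, resid, stud: hypothetical names */
run;
quit;

/* Keep only the cases that exceed either cutoff. */
data outliers;
  set diag;
  obsnum  = _n_;          /* row number, matching Obs in the listing */
  rootmse = 46.84893;     /* Root MSE copied from the example output */
  if abs(stud) > 2 or abs(resid) > 2 * rootmse;
run;

proc print data=outliers;
  var obsnum sales resid stud;
run;
```

Reading the cutoffs from saved diagnostics avoids scanning 400 rows by eye, at the cost of hard-coding the Root MSE, which would need updating if the model changes.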

  13. Influence Output

          Model: MODEL1
          Dependent Variable: sales

          Output Statistics

          Obs    -2-1 0 1 2       Cook's D    RStudent    Hat Diag H    Cov Ratio     DFFITS
            1    |      |****  |     0.028      2.1720        0.0236       0.9866     0.3376
            2    |      |****  |     0.028      2.1720        0.0236       0.9866     0.3376
            3    |  ****|      |     0.005     -2.3434        0.0040       0.9597    -0.1486
            4    |  ****|      |     0.005     -2.3434        0.0040       0.9597    -0.1486
            5    |      |**    |     0.006      1.4707        0.0104       0.9987     0.1504
            6    |      |**    |     0.006      1.4707        0.0104       0.9987     0.1504
            7    |      |      |     0.000      0.1502        0.0063       1.0163     0.0119
            8    |      |      |     0.000      0.1502        0.0063       1.0163     0.0119
            9    |      |      |     0.000     -0.1234        0.0130       1.0233    -0.0142
           10    |      |      |     0.000     -0.1234        0.0130       1.0233    -0.0142
           11    |      |*     |     0.001      0.6187        0.0071       1.0135     0.0523
           12    |      |*     |     0.001      0.6187        0.0071       1.0135     0.0523
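The influence columns are tied to the residual columns through the Hat Diag value. As an illustration for observation 1, using the standard textbook formulas (the symbols $h_i$, $r_i$, $t_i$, and $p = k + 1 = 4$ are notation introduced here, not taken from the slides):

$$
\begin{aligned}
\text{SE}(e_1) &= \text{Root MSE}\,\sqrt{1 - h_1} = 46.84893\,\sqrt{1 - 0.0236} \approx 46.293 \\
D_1 &= \frac{r_1^2}{p}\cdot\frac{h_1}{1 - h_1} = \frac{2.162^2}{4}\cdot\frac{0.0236}{0.9764} \approx 0.028 \\
\text{DFFITS}_1 &= t_1\,\sqrt{\frac{h_1}{1 - h_1}} = 2.1720\,\sqrt{\frac{0.0236}{0.9764}} \approx 0.338
\end{aligned}
$$

Here $r_1$ is the Student Residual and $t_1$ is RStudent. The DFFITS value agrees with the listed 0.3376 up to rounding of the printed Hat Diag value.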

  14. What to look for in your influence results
      • Leverage: an index of the importance of an observation for the regression equation. It is a function solely of X.
      • Leverage is denoted by Hat Diag (H). Look for values greater than 2(k+1)/N, where k = number of independent variables and N = number of observations. Recall that (k+1)/N is the average leverage.
      • In this example: 2(3+1)/400 = .02
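The leverage cutoff can be computed from the data instead of by hand. This is a minimal sketch, assuming the data set d2 from the example program exists and has no missing values; the names lev, highlev, hat, cutoff, and obsnum are hypothetical.

```sas
/* Save the Hat Diag (H) value for every observation. */
proc reg data=d2;
  model sales = adverts airplay attract;
  output out=lev h=hat;         /* lev, hat: hypothetical names */
run;
quit;

/* Flag cases whose leverage exceeds 2(k+1)/N. */
data highlev;
  set lev nobs=n;               /* n = number of rows (400 in the example) */
  k      = 3;                   /* number of independent variables */
  cutoff = 2 * (k + 1) / n;     /* 2(3+1)/400 = .02 */
  obsnum = _n_;                 /* row number, matching Obs in the listing */
  if hat > cutoff;
run;

proc print data=highlev;
  var obsnum hat cutoff;
run;
```

By this rule, observations 1 and 2 in the listing (H = 0.0236) would be flagged, while observation 9 (H = 0.0130) would not.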

  15. Influence (cont)
      • Cook's D: a measure of the overall influence of a single case on the model. Look for values greater than .2.
      • DFBETA and standardized DFBETA (DFBETAS): the change in the regression coefficients when that case is deleted. You can evaluate the influence on both X and Y. Look for values that are large relative to the other values, or look for values greater than:
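One way to act on the "large relative to the other values" advice is to save the influence statistics and sort them. The sketch below covers Cook's D only and assumes the data set d2 from the example program exists; the names infl, infl2, cooksd, dffit, rstud, obsnum, and flag_cook are hypothetical, and the .2 cutoff is the one given on the slide.

```sas
/* Save Cook's D, DFFITS and RStudent for every observation. */
proc reg data=d2;
  model sales = adverts airplay attract;
  output out=infl cookd=cooksd dffits=dffit rstudent=rstud;
run;
quit;

/* Record the row number and flag cases above the slide's cutoff. */
data infl2;
  set infl;
  obsnum    = _n_;
  flag_cook = (cooksd > 0.2);   /* 1 if above .2, otherwise 0 */
run;

/* List the ten most influential cases first. */
proc sort data=infl2;
  by descending cooksd;
run;

proc print data=infl2(obs=10);
  var obsnum cooksd dffit rstud flag_cook;
run;
```

In the twelve rows shown in the influence output, the largest Cook's D is 0.028, so none of those cases would be flagged at .2.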

  16. Influence Output (cont)

          Model: MODEL1
          Dependent Variable: sales

          Output Statistics

                 -------------------DFBETAS-------------------
          Obs    Intercept     adverts     airplay     attract
            1      -0.2177     -0.1672      0.1088      0.2438
            2      -0.2177     -0.1672      0.1088      0.2438
            3       0.0089     -0.0889      0.0066     -0.0131
            4       0.0089     -0.0889      0.0066     -0.0131
            5      -0.0267      0.1228      0.0327     -0.0038
            6      -0.0267      0.1228      0.0327     -0.0038
            7      -0.0018      0.0086      0.0024      0.0001
            8      -0.0018      0.0086      0.0024      0.0001
            9      -0.0060      0.0008     -0.0100      0.0095
           10      -0.0060      0.0008     -0.0100      0.0095
           11       0.0465      0.0016     -0.0147     -0.0362
           12       0.0465      0.0016     -0.0147     -0.0362

  17. Delete outliers with studentized residuals greater than 2 in absolute value

          data d2;
            infile 'C:\WINDOWS\Desktop\lab11.txt';
            input adverts sales airplay attract;
            /* Observations 1-4 have |Student Residual| > 2 */
            if _n_ = 1 then delete;
            if _n_ = 2 then delete;
            if _n_ = 3 then delete;
            if _n_ = 4 then delete;
          run;

          proc reg data=d2;
            model sales = adverts airplay attract / stb R influence;
          run;
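Deleting by row number works when the outliers are known in advance. An alternative is to delete by the criterion itself, sketched below under the assumption that the full data set d2 (before any deletion) is available. The names diag, d3, and stud are hypothetical.

```sas
/* Fit the model once and save the studentized residuals. */
proc reg data=d2;
  model sales = adverts airplay attract;
  output out=diag student=stud;   /* diag, stud: hypothetical names */
run;
quit;

/* Drop every case with |studentized residual| > 2. */
data d3;
  set diag;
  if abs(stud) > 2 then delete;
  drop stud;
run;

/* Refit on the reduced data set. */
proc reg data=d3;
  model sales = adverts airplay attract / stb R influence;
run;
quit;
```

Because this removes every case beyond the cutoff, not only the first four rows, it may drop more observations than the program above, and its output need not match the results reported for the four-outlier deletion.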

  18. Output with 4 outliers deleted

          Model: MODEL1
          Dependent Variable: sales

          Analysis of Variance

          Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
          Model               3           1719469         573156     272.58    <.0001
          Error             392            824249     2102.67719
          Corrected Total   395           2543718

          Root MSE           45.85496    R-Square    0.6760
          Dependent Mean    192.87879    Adj R-Sq    0.6735
          Coeff Var          23.77398

          Parameter Estimates

          Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|    Standardized Estimate
          Intercept     1             -21.41404          12.06981      -1.77      0.0768                        0
          adverts       1               0.08741           0.00480      18.20      <.0001                  0.52814
          airplay       1               3.32150           0.19177      17.32      <.0001                  0.50771
          attract       1              10.27947           1.70029       6.05      <.0001                  0.17695

  19. In-class example
      The dataset "Data11" contains the following 10 variables: id, na, typeAas, typeAii, errorC, learn, errorS, errorCmm, think, mngmt.
      Use PROC REG to regress errorC (error competence; higher values mean more competent) on learn (learning from errors; higher values mean you learn from errors more quickly), think (thinking about errors; higher values mean more time spent thinking about your errors), and mngmt (management's orientation toward errors; higher values mean more consequences when errors are made).
      • Which variables are significant predictors of error competence?
      • How many outliers (studentized residuals greater than 2) did you identify?
      • Delete the outliers. Does the significance of the beta weights or the R-square change?
