
Multiple Linear Regression


creola

Presentation Transcript


  1. Multiple Linear Regression Regression Diagnostics

  2. Find Scores That • Contribute to violation of assumptions. • Are suspect because they are far removed from the centroid (multidimensional mean) • Have undue influence on the solution.

  3. Outliers Among the Predictors • Leverage, h_i, or Hat Diagonal • The larger this statistic, the greater the distance between the data point and the centroid in the space of the predictors. • Investigate cases with h_i greater than 2p/N. • p is the number of parameters in the model, including the intercept.
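
The leverage screen can be sketched in a few lines of NumPy. This is a minimal illustration, not the software used for the slides: the data are made up, `hat_diagonal` is a hypothetical helper name, and the cutoff is taken as 2p/N with p counting the intercept, which matches the 2(3)/11 figure used in the worked example.

```python
import numpy as np

def hat_diagonal(X):
    """Leverage h_i for each row of design matrix X (intercept column included)."""
    xtx_inv = np.linalg.inv(X.T @ X)
    # h_i = x_i' (X'X)^-1 x_i, i.e. the diagonal of H = X (X'X)^-1 X'
    return np.einsum("ij,jk,ik->i", X, xtx_inv, X)

# Made-up illustrative data, NOT the data set from the slides.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 20.0, 8.0, 9.0, 10.0, 11.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 7.0, 9.0, 8.0, 11.0, 10.0])
X = np.column_stack([np.ones_like(x1), x1, x2])

h = hat_diagonal(X)
n, p = X.shape            # p counts the intercept column
cutoff = 2 * p / n        # rule-of-thumb cutoff, assumed to be 2p/N

for case, h_i in enumerate(h, start=1):
    flag = "  <- investigate" if h_i > cutoff else ""
    print(f"case {case:2d}: h = {h_i:.3f}{flag}")
```

A handy sanity check is that the leverages always sum to p, so their average is p/N and the cutoff amounts to "twice the average leverage".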

  4. Distance from the Regression Surface • Standardized Residual (aka Studentized Residual) • Difference between actual Y and predicted Y, divided by an appropriate standard error. • Rstudent (aka Studentized Deleted Residual): the same, except that for each case the regression surface is the one obtained when that individual case is removed. • Investigate if the absolute value is greater than 2.
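
Both kinds of residual can be computed from a single fit, without actually refitting the model N times. The sketch below uses the standard closed-form expressions; the function name and the data (with a deliberately planted outlier in case 11) are assumptions made for illustration, not values from the slides.

```python
import numpy as np

def studentized_residuals(X, y):
    """Return (standardized, rstudent) residuals for an OLS fit of y on X."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    h = np.einsum("ij,jk,ik->i", X, np.linalg.inv(X.T @ X), X)
    sse = e @ e
    # Standardized residual: error variance estimated from all N cases.
    standardized = e / np.sqrt(sse / (n - p) * (1 - h))
    # Rstudent: error variance re-estimated with case i left out.
    s2_deleted = (sse - e**2 / (1 - h)) / (n - p - 1)
    rstudent = e / np.sqrt(s2_deleted * (1 - h))
    return standardized, rstudent

# Made-up illustrative data, NOT the data set from the slides.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 5.5])
x2 = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0, 4.0])
noise = np.array([0.3, -0.4, 0.2, -0.1, 0.5, -0.3, 0.1, -0.2, 0.4, -0.5, 0.0])
y = 1.0 + 2.0 * x1 + 0.5 * x2 + noise
y[10] += 8.0              # plant an outlier in case 11
X = np.column_stack([np.ones_like(x1), x1, x2])

standardized, rstudent = studentized_residuals(X, y)
for case, (r, t) in enumerate(zip(standardized, rstudent), start=1):
    flag = "  <- investigate" if abs(t) > 2 else ""
    print(f"case {case:2d}: standardized = {r:6.2f}, rstudent = {t:6.2f}{flag}")
```

Whenever a standardized residual exceeds 1 in absolute value, the corresponding Rstudent is larger still, because the outlying case no longer inflates the error variance it is judged against.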

  5. Influence on the Solution • Cook’s D: how much the regression surface would change if this case were removed. • Investigate cases with D > 1. • Dfbetas: how much one parameter (slope or intercept) would change if this case were removed. • Investigate cases with absolute values > 2.
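
Cook's D and the Dfbetas also have closed forms based on the residuals and leverages, so no case needs to be physically deleted. The following is a hedged sketch with made-up data and a hypothetical function name; statistical packages report the same quantities directly.

```python
import numpy as np

def influence_measures(X, y):
    """Return (cooks_d, dfbetas) for an OLS fit of y on X (intercept included)."""
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    e = y - X @ beta
    h = np.einsum("ij,jk,ik->i", X, xtx_inv, X)
    sse = e @ e
    s2 = sse / (n - p)
    # Cook's D: scaled shift of all fitted values when case i is dropped.
    cooks_d = e**2 * h / (p * s2 * (1 - h) ** 2)
    # Row i holds b - b_(i), the change in every coefficient without case i.
    delta_beta = (X @ xtx_inv) * (e / (1 - h))[:, None]
    # Dfbetas: that change in units of the coefficient's standard error,
    # using the error variance estimated with case i left out.
    s_deleted = np.sqrt((sse - e**2 / (1 - h)) / (n - p - 1))
    dfbetas = delta_beta / (s_deleted[:, None] * np.sqrt(np.diag(xtx_inv))[None, :])
    return cooks_d, dfbetas

# Made-up illustrative data, NOT the data set from the slides.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 5.5])
x2 = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0, 4.0])
noise = np.array([0.3, -0.4, 0.2, -0.1, 0.5, -0.3, 0.1, -0.2, 0.4, -0.5, 0.0])
y = 1.0 + 2.0 * x1 + 0.5 * x2 + noise
y[10] += 8.0              # plant an outlier in case 11
X = np.column_stack([np.ones_like(x1), x1, x2])

cooks_d, dfbetas = influence_measures(X, y)
for case, (d, row) in enumerate(zip(cooks_d, dfbetas), start=1):
    print(f"case {case:2d}: D = {d:.3f}, dfbetas = {np.round(row, 2)}")
```

Each row of `dfbetas` has one entry per parameter, in the column order of `X` (intercept first).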

  6. Simple Example • Y = sperm count • X1 = % time recently spent with mate • X2 = time since last ejaculation

  7. Leverage • Investigate cases with values greater than 2(3)/11 = 0.55 (p = 3 parameters: two slopes plus the intercept; N = 11 cases). • Case 7 is close to this cutoff. • It is a univariate outlier on the time together variable (X1). • Further investigation indicates the case is valid, so we retain it.

  8. Residuals • Case 11 has large residuals and should be investigated. • Notice that Rstudent is much larger than the standardized residual. • This indicates that removing this case has a large effect on the solution.

  9. Influence • Case 11 has a high value of Cook’s D. • It has a high Dfbeta for the time since last ejaculation predictor, even after I transformed that variable to reduce skewness. • Upon investigation, it was found that this subject did not follow the instructions for gathering the data. His scores were deleted.

  10. Plots of Residuals • These can also be useful, but it takes some practice to get good at detecting problems from such plots. • Plot the residuals versus predicted Y.
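
To produce the points for a residual-versus-predicted plot, something like the following sketch would do. The data are made up so that the error spread grows with X, `fitted_and_residuals` is a hypothetical helper, and the plotting call is left as a comment on the assumption that matplotlib is available.

```python
import numpy as np

def fitted_and_residuals(X, y):
    """Return predicted Y and residuals from an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    return fitted, y - fitted

# Made-up illustrative data, NOT from the slides: error spread grows with x.
x = np.arange(1.0, 13.0)
signs = np.array([1.0, -1.0] * 6)
y = 3.0 + 1.5 * x + signs * 0.2 * x
X = np.column_stack([np.ones_like(x), x])

fitted, resid = fitted_and_residuals(X, y)

# Crude numeric stand-in for eyeballing the plot: compare the typical
# residual size for low versus high predicted values.
order = np.argsort(fitted)
half = len(order) // 2
spread_low = np.mean(np.abs(resid[order[:half]]))
spread_high = np.mean(np.abs(resid[order[half:]]))
print(f"mean |residual|, low predicted Y:  {spread_low:.2f}")
print(f"mean |residual|, high predicted Y: {spread_high:.2f}")

# To draw the plot itself (assuming matplotlib is installed):
#   import matplotlib.pyplot as plt
#   plt.scatter(fitted, resid)
#   plt.axhline(0.0)
#   plt.xlabel("Predicted Y")
#   plt.ylabel("Residual")
#   plt.show()
```

A fan shape, with residuals spreading out as predicted Y increases, is the usual visual sign of non-constant variance; a curved band suggests a missing nonlinear term.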

  11. Heteroscedasticity

  12. Try Squaring One Predictor
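
As a rough illustration of what squaring a predictor does, the sketch below fits the same made-up, curved data with and without a squared term and compares the residual sums of squares. The data and names are assumptions for the example, not the analysis shown on the slide.

```python
import numpy as np

def sse(X, y):
    """Residual sum of squares from an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

# Made-up illustrative data with built-in curvature, NOT from the slides.
x = np.arange(0.0, 11.0)
wiggle = 0.3 * np.array([1, -1, 0, 1, -1, 0, 1, -1, 0, 1, -1])
y = 2.0 + 0.5 * x + 0.3 * x**2 + wiggle

X_linear = np.column_stack([np.ones_like(x), x])
X_squared = np.column_stack([np.ones_like(x), x, x**2])  # adds x squared

sse_linear = sse(X_linear, y)
sse_squared = sse(X_squared, y)
print(f"SSE, linear term only:  {sse_linear:.2f}")
print(f"SSE, with squared term: {sse_squared:.2f}")
```

Adding a term can never increase the residual sum of squares, so the comparison is only informative when the drop is large and the curved pattern in the residual plot disappears.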

  13. Residuals not Normal and Variance not Constant
