1 / 30

Anareg week 12

Anareg week 12. Remedial Measures II Review regression diagnostics Remedial measures Validation. Regression Diagnostics Recommendations. Check normality of the residuals with a normal quantile plot

Download Presentation

Anareg week 12

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Anareg week 12 Remedial Measures II Review regression diagnostics Remedial measures Validation

  2. Regression DiagnosticsRecommendations • Check normality of the residuals with a normal quantile plot • Plot the residuals versus Y(hat), versus each of the X’s and (where appropriate) versus time • Examine the partial regression plots • If there appears to be a pattern, generate the graphics version with a smooth

  3. Regression DiagnosticsRecommendations (2) • Examine • the studentized deleted residuals (RSTUDENT in the output) • The hat matrix diagonals • Dffits, Cook’s D, and the DFBETAS • Check observations that are extreme on these measures relative to the other observations

  4. Regression DiagnosticsRecommendations (3) • TOL = (1 – R2k) =1 / VIF • Examine the tolerance for each X • If there are variables with low tolerance, you need to do some model building • Recode variables • Variable selection

  5. Remedial measures Weighted least squares Ridge regression Robust regression Nonparametric regression Bootstrapping

  6. Weighted least squares Least squares problem is to minimize the the sum of wi times the squared residual for case i Computations are easy, use the weight statement in proc reg The problem is to determine the weight

  7. Weighted least squares (2) • bw = (X’WX)-1(X’WY) • where W is a diagonal matrix with the weights • Usually we require that the sum of the weights is n, but some software will automatically take care of this

  8. Determination of weights Find a relationship between the absolute residual and another variable and use this as a model for the standard deviation Similarly for the squared residual and the variance Use grouped data or appriximately grouped data to estimate the variance

  9. Determination of weights (2) With a model for the standard deviation or the variance, we can approximate the optimal weights Optimal weights are proportional to the inverse of the variance

  10. NKNW Example NKNW p 421 Y is diastolic blood pressure X is age n = 54 healthy adult women aged 20 to 60 years old

  11. Procedures • Get the data and check it • Plot the relationship • Run the regression • Diagnostic residuals

  12. Diastolic bpvs age

  13. Regression output Run the regression, we get the output as follows: Source DF Value Pr > F Model 1 35.79 <.0001 Error 52 Total 53

  14. Regression output (2) Root MSE 8.14 R-Square 0.40 Adj R-Sq 0.39 Dep Mean 79.1 Coef Var 10.2

  15. Regression output (3) Par St Var DF Est Err t P Int 1 56.1 3.9 14.06 <.0001 age 1 0.58 .09 5.98 <.0001

  16. Diagnostic the residuals • Use the output data set to get the absolute and squared residuals. • And do the plots with a smooth; • Predict the standard deviation (absolute value of the residual)

  17. Residuals vs age

  18. Absolute value of the residuals vs age

  19. Squared residuals vs age

  20. Predict the standard deviation (absolute value of the residual) and compute the weight

  21. Output Regress diastolic on age with weight. And the output is : Source DF F P Model 1 56.64 <.0001 Error 52 Total 53

  22. Output (2) Root MSE 1.21302 R-Square 0.5214 Adj R-Sq 0.5122 Dependent Mean 73.55134 Coeff Var 1.64921

  23. Output (3) Par St Var Est Err t P Int 55.5 2.5 22.04 <.0001 age 0.59 0.07 7.53 <.0001

  24. Ridge regression If (X’X) is difficult to invert (near singular) then approximate by inverting (X’X+cI) plus additional terms in a series; c is a small positive constant Interesting but has not turned out to be a useful method in practice Ridge = c is an option for model statement

  25. Ridge regression (2) • Ridge estimator For OLS, the normal eq are given by : (X’X)b = X’Y Using standardized of each variables, we have the transformation regression model: The least squares normal eq are given by rXXb = rYX The ridge standardized regression estimators are obtained by introducing a biasing constant c > 0 into the least squares normal eq. (rXX + CI)bR = rYX

  26. Ridge regression (3) Solutions of the normal eq yields the ridge standardized regresiopn coeff bR = (rXX + cI)-1rYX Whre the constant c reflects the amounrt of bias in the estimators

  27. Robust regression • Basic idea is to have a procedure that is not sensitive to outliers • Alternatives to least squares, minimize • sum of absolute values of residuals • Median of the squares of residuals • Do weighted regression with weights based on residuals, and iterate

  28. Bootstrap Very important theoretical development that will have a major impact on applied statistics Based on simulation Sample with replacement from the data or residuals and get the distribution of the quantity of interest CI based on quantiles of the sampling distribution

  29. Model validation • Three approaches to checking the validity of the model • Collect new data, does it fit the model • Compare with theory, other data, simulation • Use some of the data for the basic analysis and some for validity check

  30. Last slide Read NKNW Chapter 11

More Related