1 / 44

Model selection and model building

Model selection and model building. Model selection. Selection of predictor variables. Statement of problem. A common problem is that there is a large set of candidate predictor variables.

Download Presentation

Model selection and model building

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Model selectionand model building

  2. Model selection Selection of predictor variables

  3. Statement of problem • A common problem is that there is a large set of candidate predictor variables. • Goal is to choose a small subset from the larger set so that the resulting regression model is simple, yet have good predictive ability.

  4. Example: Cement data • Response y: heat evolved in calories during hardening of cement on a per gram basis • Predictor x1: % of tricalcium aluminate • Predictor x2: % of tricalcium silicate • Predictor x3: % of tetracalcium alumino ferrite • Predictor x4: % of dicalcium silicate

  5. Example: Cement data

  6. Two basic methods of selecting predictors • Stepwise regression: Enter and remove variables, in a stepwise manner, until no justifiable reason to enter or remove more. • Best subsets regression: Select the subset of variables that do the best at meeting some well-defined objective criterion.

  7. Stepwise regression: the idea • Start with no predictors in the model. • At each step, enter or remove a variable based on partial F-tests. • Stop when no more variables can be justifiably entered or removed.

  8. Stepwise regression: the steps • Specify an Alpha-to-Enter (0.15) and an Alpha-to-Remove (0.15). • Start with no predictors in the model. • Put the predictor with the smallest P-value based on the partial F statistic (a t-statistic) in the model. If P-value > 0.15, then stop. None of the predictors have good predictive ability. Otherwise …

  9. Stepwise regression: the steps • Add the predictor with the smallest P-value (below 0.15) based on the partial F-statistic (a t-statistic) in the model. If none of the predictors yield P-values < 0.15, stop. • If P-value > 0.15 for any of the partial F statistics, then remove violating predictor. • Continue above two steps, until no more predictors can be entered or removed.

  10. Example: Cement data Predictor Coef SE Coef T P Constant 81.479 4.927 16.54 0.000 x1 1.8687 0.5264 3.55 0.005 Predictor Coef SE Coef T P Constant 57.424 8.491 6.76 0.000 x2 0.7891 0.1684 4.69 0.001 Predictor Coef SE Coef T P Constant 110.203 7.948 13.87 0.000 x3 -1.2558 0.5984 -2.10 0.060 Predictor Coef SE Coef T P Constant 117.568 5.262 22.34 0.000 x4 -0.7382 0.1546 -4.77 0.001

  11. Predictor Coef SE Coef T P Constant 103.097 2.124 48.54 0.000 x4 -0.61395 0.04864 -12.62 0.000 x1 1.4400 0.1384 10.40 0.000 Predictor Coef SE Coef T P Constant 94.16 56.63 1.66 0.127 x4 -0.4569 0.6960 -0.66 0.526 x2 0.3109 0.7486 0.42 0.687 Predictor Coef SE Coef T P Constant 131.282 3.275 40.09 0.000 x4 -0.72460 0.07233 -10.02 0.000 x3 -1.1999 0.1890 -6.35 0.000

  12. Predictor Coef SE Coef T P Constant 71.65 14.14 5.07 0.001 x4 -0.2365 0.1733 -1.37 0.205 x1 1.4519 0.1170 12.41 0.000 x2 0.4161 0.1856 2.24 0.052 Predictor Coef SE Coef T P Constant 111.684 4.562 24.48 0.000 x4 -0.64280 0.04454 -14.43 0.000 x1 1.0519 0.2237 4.70 0.001 x3 -0.4100 0.1992 -2.06 0.070

  13. Predictor Coef SE Coef T P Constant 52.577 2.286 23.00 0.000 x1 1.4683 0.1213 12.10 0.000 x2 0.66225 0.04585 14.44 0.000

  14. Predictor Coef SE Coef T P Constant 71.65 14.14 5.07 0.001 x1 1.4519 0.1170 12.41 0.000 x2 0.4161 0.1856 2.24 0.052 x4 -0.2365 0.1733 -1.37 0.205 Predictor Coef SE Coef T P Constant 48.194 3.913 12.32 0.000 x1 1.6959 0.2046 8.29 0.000 x2 0.65691 0.04423 14.85 0.000 x3 0.2500 0.1847 1.35 0.209

  15. Predictor Coef SE Coef T P Constant 52.577 2.286 23.00 0.000 x1 1.4683 0.1213 12.10 0.000 x2 0.66225 0.04585 14.44 0.000

  16. Stepwise Regression: y versus x1, x2, x3, x4 Alpha-to-Enter: 0.15 Alpha-to-Remove: 0.15 Response is y on 4 predictors, with N = 13 Step 1 2 3 4 Constant 117.57 103.10 71.65 52.58 x4 -0.738 -0.614 -0.237 T-Value -4.77 -12.62 -1.37 P-Value 0.001 0.000 0.205 x1 1.44 1.45 1.47 T-Value 10.40 12.41 12.10 P-Value 0.000 0.000 0.000 x2 0.416 0.662 T-Value 2.24 14.44 P-Value 0.052 0.000 S 8.96 2.73 2.31 2.41 R-Sq 67.45 97.25 98.23 97.87 R-Sq(adj) 64.50 96.70 97.64 97.44 C-p 138.7 5.5 3.0 2.7

  17. Drawbacks of stepwise regression • The final model is not guaranteed to be optimal in any specified sense. • The procedure yields a single final model, although in practice there are often several almost equally good models.

  18. Best subsets regression • If there are p-1 possible predictors, then there are 2p-1possible regression models containing the predictors. • For example, 10 predictors yields 210 = 1024 possible regression models. • A best subsets algorithm determines the best subsets of each size, so that choice of the final model can be made by researcher.

  19. What is used to judge “best”? • R-square • Adjusted R-square • MSE (or S = square root of MSE) • Mallow’s Cp

  20. R-squared Use the R-squared values to find the point where adding more predictors is not worthwhile because it leads to a very small increase in R-squared.

  21. Adjusted R-squared or MSE Adjusted R-squared increases only if MSE decreases, so adjusted R-squared and MSE provide equivalent information. Find a few subsets for which MSE is smallest (or adjusted R-squared is largest) or so close to the smallest (largest) that adding more predictors is not worthwhile.

  22. Mallow’s Cp statistic: is an estimator of total standardized mean square error of prediction: which equals: Mallow’s Cp criterion

  23. Using the Cp criterion • Subsets with small Cpvalues have a small total (standardized) mean square error of prediction. • When the Cp value is also near p, the bias of the regression model is small.

  24. Using the Cp criterion (cont’d) • So, identify subsets of predictors for which: • the Cp value is smallest, and • the Cp value is near p (if possible) • Note though that for the full model, Cp= p. So, the fullest model is always judged to be a good candidate model.

  25. Best Subsets Regression: y versus x1, x2, x3, x4 Response is y x x x x Vars R-Sq R-Sq(adj) C-p S 1 2 3 4 1 67.5 64.5 138.7 8.9639 X 1 66.6 63.6 142.5 9.0771 X 2 97.9 97.4 2.7 2.4063 X X 2 97.2 96.7 5.5 2.7343 X X 3 98.2 97.6 3.0 2.3087 X X X 3 98.2 97.6 3.0 2.3121 X X X 4 98.2 97.4 5.0 2.4460 X X X X

  26. Example: Modeling PIQ

  27. Stepwise Regression: PIQ versus MRI, Height, Weight Alpha-to-Enter: 0.15 Alpha-to-Remove: 0.15 Response is PIQ on 3 predictors, with N = 38 Step 1 2 Constant 4.652 111.276 MRI 1.18 2.06 T-Value 2.45 3.77 P-Value 0.019 0.001 Height -2.73 T-Value -2.75 P-Value 0.009 S 21.2 19.5 R-Sq 14.27 29.49 R-Sq(adj) 11.89 25.46 C-p 7.3 2.0

  28. Best Subsets Regression: PIQ versus MRI, Height, Weight Response is PIQ H W e e i i M g g R h h Vars R-Sq R-Sq(adj) C-p S I t t 1 14.3 11.9 7.3 21.212 X 1 0.9 0.0 13.8 22.810 X 2 29.5 25.5 2.0 19.510 X X 2 19.3 14.6 6.9 20.878 X X 3 29.5 23.3 4.0 19.794 X X X

  29. The regression equation is PIQ = 111 + 2.06 MRI - 2.73 Height Predictor Coef SE Coef T P Constant 111.28 55.87 1.99 0.054 MRI 2.0606 0.5466 3.77 0.001 Height -2.7299 0.9932 -2.75 0.009 S = 19.51 R-Sq = 29.5% R-Sq(adj) = 25.5% Analysis of Variance Source DF SS MS F P Regression 2 5572.7 2786.4 7.32 0.002 Error 35 13321.8 380.6 Total 37 18894.6 Source DF Seq SS MRI 1 2697.1 Height 1 2875.6

  30. Example: Modeling BP

  31. Stepwise Regression: BP versus Age, Weight, BSA, Duration, Pulse, Stress Alpha-to-Enter: 0.15 Alpha-to-Remove: 0.15 Response is BP on 6 predictors, with N = 20 Step 1 2 3 Constant 2.205 -16.579 -13.667 Weight 1.201 1.033 0.906 T-Value 12.92 33.15 18.49 P-Value 0.000 0.000 0.000 Age 0.708 0.702 T-Value 13.23 15.96 P-Value 0.000 0.000 BSA 4.6 T-Value 3.04 P-Value 0.008 S 1.74 0.533 0.437 R-Sq 90.26 99.14 99.45 R-Sq(adj) 89.72 99.04 99.35 C-p 312.8 15.1 6.4

  32. Best Subsets Regression: BP versus Age, Weight, ... Response is BP D u W r S e a P t i t u r A g B i l e g h S o s s Vars R-Sq R-Sq(adj) C-p S e t A n e s 1 90.3 89.7 312.8 1.7405 X 1 75.0 73.6 829.1 2.7903 X 2 99.1 99.0 15.1 0.53269 X X 2 92.0 91.0 256.6 1.6246 X X 3 99.5 99.4 6.4 0.43705 X X X 3 99.2 99.1 14.1 0.52012 X X X 4 99.5 99.4 6.4 0.42591 X X X X 4 99.5 99.4 7.1 0.43500 X X X X 5 99.6 99.4 7.0 0.42142 X X X X X 5 99.5 99.4 7.7 0.43078 X X X X X 6 99.6 99.4 7.0 0.40723 X X X X X X

  33. The regression equation is BP = - 13.7 + 0.702 Age + 0.906 Weight + 4.63 BSA Predictor Coef SE Coef T P Constant -13.667 2.647 -5.16 0.000 Age 0.70162 0.04396 15.96 0.000 Weight 0.90582 0.04899 18.49 0.000 BSA 4.627 1.521 3.04 0.008 S = 0.4370 R-Sq = 99.5% R-Sq(adj) = 99.4% Analysis of Variance Source DF SS MS F P Regression 3 556.94 185.65 971.93 0.000 Error 16 3.06 0.19 Total 19 560.00 Source DF Seq SS Age 1 243.27 Weight 1 311.91 BSA 1 1.77

  34. Stepwise regression in Minitab • Stat >> Regression >> Stepwise … • Specify response and all possible predictors. • If desired, specify predictors that must be included in every model. • Select OK. Results appear in session window.

  35. Best subsets regression • Stat >> Regression >> Best subsets … • Specify response and all possible predictors. • If desired, specify predictors that must be included in every model. • Select OK. Results appear in session window.

  36. Model building strategy

  37. The first step • Decide on the type of model needed • Predictive: model used to predict the response variable from a chosen set of predictors. • Theoretical: model based on theoretical relationship between response and predictors. • Control: model used to control a response variable by manipulating predictor variables.

  38. The first step (cont’d) • Decide on the type of model needed • Inferential: model used to explore strength of relationships between response and predictors. • Data summary: model used merely as a way to summarize a large set of data by a single equation.

  39. The second step • Decide which predictor variables and response variable on which to collect the data. • Collect the data.

  40. The third step • Explore the data • Check for outliers, gross data errors, missing values on a univariate basis. • Study bivariate relationships to reveal other outliers, to suggest possible transformations, to identify possible multicollinearities.

  41. The fourth step • Randomly divide the data into a training set and a test set: • The training set, with at least 15-20 error d.f., is used to fit the model. • The test set is used for cross-validation of the fitted model.

  42. The fifth step • Using the training set, fit several candidate models: • Use best subsets regression. • Use stepwise regression (only gives one model unless specifies different alpha-to-remove and alpha-to-enter values).

  43. The sixth step • Select and evaluate a few “good” models: • Select based on adjusted R2, Mallow’s Cp, number and nature of predictors. • Evaluate selected models for violation of model assumptions. • If none of the models provide a satisfactory fit, try something else, such as more data, different predictors, a different class of model …

  44. The final step • Select the final model: • Compare competing models by cross-validating them against the test data. • The model with a larger cross-validation R2 is a better predictive model. • Consider residual plots, outliers, parsimony, relevance, and ease of measurement of predictors.

More Related