This overview addresses model selection in quantitative methods when a dataset has several explanatory variables. It highlights the challenge of choosing an appropriate model from the very large number of possible combinations once interactions and squared terms are allowed. Fitting many models produces many p-values, which makes the results ambiguous to interpret, and this in turn makes economy of variables important. Strategies such as incorporating outside information and stepwise regression are discussed, with guidance on when each is suitable.
Quantitative Methods Model Selection II: datasets with several explanatory variables
Model Selection II: several explanatory variables The problem of model choice
With 5 x-variables, there are 2^5 = 32 possible models, not including interactions.
If we include two-way interactions without squared terms, there are
1×1 + 5×1 + 10×2 + 10×8 + 5×64 + 1×1024 = 1450 models.
If we do allow squared terms, there are
1×1 + 5×2 + 10×8 + 10×64 + 5×1024 + 1×32768 = 38619 models.
With multiple models, there are many p-values and possible “right-leg/left-leg” and “poets’ dates” effects.
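The counts above can be reproduced with a short sketch. It assumes the sums are built by choosing k main effects and then switching each permitted higher-order term on or off, where a two-way interaction or squared term is permitted only if its main effects are in the model. That reading is an inference from the terms in the sums, and the function name `count_models` is made up for illustration.

```python
from math import comb

def count_models(n_vars, interactions=False, squares=False):
    """Count candidate models, assuming higher-order terms need their main effects."""
    total = 0
    for k in range(n_vars + 1):        # k = number of main effects in the model
        optional = 0                   # higher-order terms that become available
        if interactions:
            optional += comb(k, 2)     # one two-way interaction per pair of main effects
        if squares:
            optional += k              # one squared term per main effect
        # choose which k main effects, then include or omit each optional term
        total += comb(n_vars, k) * 2 ** optional
    return total

print(count_models(5))                                   # main effects only
print(count_models(5, interactions=True))                # plus two-way interactions
print(count_models(5, interactions=True, squares=True))  # plus squared terms
```

The three printed values should match the 32, 1450 and 38619 quoted on the slide.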
• Economy of variables
• Multiplicity of p-values
• Marginality
Model Selection II: several explanatory variables Economy of variables
All variables increase R² when added.
F<1 - adding the variable decreased adjusted R²
F>1 - adding the variable increased adjusted R²
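The link between the F-ratio of an added variable and adjusted R² can be checked numerically. The sketch below uses made-up sums of squares (n = 20, total SS = 100, error SS = 40 with two predictors); none of these numbers come from the slides, and the function names are hypothetical.

```python
def adjusted_r2(sse, sst, n, n_predictors):
    """Adjusted R-squared: 1 - (residual mean square) / (total mean square)."""
    return 1 - (sse / (n - n_predictors - 1)) / (sst / (n - 1))

def f_for_added_variable(sse_before, sse_after, n, n_predictors_after):
    """F-ratio (1 numerator DF) for the one variable just added."""
    resid_ms = sse_after / (n - n_predictors_after - 1)
    return (sse_before - sse_after) / resid_ms

# Hypothetical values, chosen only to show one case on each side of F = 1.
n, sst, sse_two = 20, 100.0, 40.0
results = []
for sse_three in (39.0, 30.0):          # error SS after adding a third predictor
    f_stat = f_for_added_variable(sse_two, sse_three, n, 3)
    r2_change = (sse_two - sse_three) / sst
    adj_change = adjusted_r2(sse_three, sst, n, 3) - adjusted_r2(sse_two, sst, n, 2)
    results.append((f_stat, r2_change, adj_change))
    print(f"F = {f_stat:.2f}  change in R2 = {r2_change:+.3f}  "
          f"change in adjusted R2 = {adj_change:+.3f}")
```

In both cases R² goes up, but adjusted R² rises only when F exceeds 1, because adjusted R² improves exactly when the residual mean square falls.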
[Figure: predictions for datapoint 39]
Model Selection II: several explanatory variables Multiplicity of p-values
Focus, don’t fish
- reduce number of X-variables
- use outside information to decide on inclusion
- use outside information to decide on exclusion
Stringency
- reduce nominal p-value
Combine model terms
- for once, reverse the usual splitting
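To see why stringency matters, the following sketch computes the chance of at least one false positive when several tests are each run at the same nominal level. It assumes the tests are independent, which p-values from overlapping models are not, so treat it as a rough guide only. The Bonferroni-style threshold shown is one common way to reduce the nominal p-value; the slides do not name a specific correction.

```python
def familywise_error_rate(alpha, n_tests):
    """P(at least one false positive) for n independent tests at level alpha."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_threshold(alpha, n_tests):
    """Per-test level that keeps the familywise rate at or below alpha."""
    return alpha / n_tests

for m in (1, 5, 32):
    print(f"{m:2d} tests: familywise rate = {familywise_error_rate(0.05, m):.3f}, "
          f"per-test threshold = {bonferroni_threshold(0.05, m):.5f}")
```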
DF   SeqSS
 1   366.9
 1    42.7
 1    14.7
 3   424.3   (combined)

MS = 424.3/3 = 141.4
F = 141.4/108.9 = 1.30 on 3 and 30 DF
Single p-value from Minitab using CDF: p = 0.293

CDF 1.30 K1;
  F 3 30.
LET K2=1-K1
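The same combined test can be sketched outside Minitab. The code below pools the three sequential sums of squares from the slide and evaluates the upper tail of the F distribution with the standard library only, using the identity P(F > f) = I_x(d2/2, d1/2) with x = d2/(d2 + d1·f) and Simpson's rule for the integral. The function name `f_upper_tail` is made up, and the numerical integration assumes f > 0 and a denominator DF of at least 2. Where SciPy is available, `scipy.stats.f.sf(1.30, 3, 30)` should give the same tail probability.

```python
import math

def f_upper_tail(f_value, df_num, df_den, steps=2000):
    """P(F > f_value) for an F(df_num, df_den) variable, by Simpson's rule."""
    a, b = df_den / 2.0, df_num / 2.0
    x = df_den / (df_den + df_num * f_value)
    beta = math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

    def density(t):
        # Beta(a, b) density; its integral from 0 to x is the upper tail of F
        return t ** (a - 1) * (1 - t) ** (b - 1) / beta

    h = x / steps                      # steps must be even for Simpson's rule
    total = density(0.0) + density(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    return total * h / 3

seq_ss = [366.9, 42.7, 14.7]           # sequential SS of the three terms (from the slide)
error_ms = 108.9                       # error mean square on 30 DF (from the slide)
combined_ms = sum(seq_ss) / len(seq_ss)
f_stat = combined_ms / error_ms
p_value = f_upper_tail(f_stat, len(seq_ss), 30)
print(f"F = {f_stat:.2f} on 3 and 30 DF, p = {p_value:.3f}")
```

Pooling gives one test, and therefore one p-value, in place of three.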
Model Selection II: several explanatory variables Stepwise regression
General Linear Model: LRGWHAL versus

Source   DF   Seq SS   Adj SS   Adj MS        F      P
VIS       1   61.166   61.166   61.166   193.35  0.000
Error   230   72.759   72.759    0.316
Total   231  133.925

Term          Coef   SE Coef       T      P
Constant  -4.52464   0.06116  -73.98  0.000
VIS       0.125222  0.009005   13.91  0.000
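The figures in this output are linked by simple arithmetic, which the sketch below reproduces from the printed sums of squares and coefficients. The R² and adjusted R² values are derived here and do not appear in the output itself.

```python
# Values copied from the Minitab output for LRGWHAL with the single predictor VIS.
ss_vis, ss_error, ss_total = 61.166, 72.759, 133.925
df_vis, df_error, df_total = 1, 230, 231
coef_vis, se_vis = 0.125222, 0.009005

ms_error = ss_error / df_error                    # error mean square
f_stat = (ss_vis / df_vis) / ms_error             # F-ratio for VIS
t_vis = coef_vis / se_vis                         # t-ratio for the VIS coefficient
r2 = ss_vis / ss_total                            # proportion of variation explained
adj_r2 = 1 - ms_error / (ss_total / df_total)     # adjusted for degrees of freedom

print(f"error MS = {ms_error:.3f}, F = {f_stat:.2f}, t = {t_vis:.2f}")
print(f"R2 = {r2:.3f}, adjusted R2 = {adj_r2:.3f}")
```

With a single predictor, the squared t-ratio agrees with the F-ratio up to rounding.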
[Figure panels: Forward ≠ Backward; Forward = Backward]
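As an illustration of the forward direction, here is a minimal sketch of forward selection in plain Python. It is not Minitab's procedure: the F-to-enter threshold of 4.0, the function names, and the toy data are all assumptions made for the example, and the least-squares solver assumes the predictors are not collinear and the fit is not perfect.

```python
def fit_sse(columns, y):
    """Error sum of squares from least squares of y on an intercept plus columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    for c in range(k):                                   # Gauss-Jordan elimination
        pivot = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(k):
            if r != c:
                factor = A[r][c] / A[c][c]
                A[r] = [a - factor * p for a, p in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    return sum((y[r] - sum(b * x for b, x in zip(beta, X[r]))) ** 2 for r in range(n))

def forward_select(predictors, y, f_to_enter=4.0):
    """Add the variable with the largest F-ratio until none reaches f_to_enter."""
    chosen = []
    current_sse = fit_sse([], y)                         # intercept-only model
    while True:
        best = None
        for name in predictors:
            if name in chosen:
                continue
            trial = chosen + [name]
            sse = fit_sse([predictors[v] for v in trial], y)
            df_resid = len(y) - len(trial) - 1
            f_stat = (current_sse - sse) / (sse / df_resid)
            if best is None or f_stat > best[1]:
                best = (name, f_stat, sse)
        if best is None or best[1] < f_to_enter:
            return chosen
        chosen.append(best[0])
        current_sse = best[2]

# Toy data (hypothetical): y depends on x1 and x2 but not on x3.
x1 = [-1.0, -1.0, -1.0, -1.0, 1.0, 1.0, 1.0, 1.0]
x2 = [-1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0, 1.0]
x3 = [-1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0]
y = [3 * a + b + 0.5 * a * b * c for a, b, c in zip(x1, x2, x3)]
selected = forward_select({"x1": x1, "x2": x2, "x3": x3}, y)
print(selected)
```

In this toy example x2 has an F-ratio below 1 when tested alone but a large one once x1 is in the model, which shows how the path taken depends on what has already been included.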
Model Selection II: several explanatory variables Last words…
• Economy of variables: prediction, adjusted R²
• Multiplicity: outside information, focussing, stringency, combining model terms
• Stepwise regressions are not usually suitable, but they are useful for initial sifting of a large number of potential predictors in a preliminary study
Random Effects
Read Chapter 12