Multiple Logistic Regression
RSQUARE, LACKFIT, SELECTION, and interactions
Introduction
Just as with linear regression, logistic regression allows you to look at the effect of multiple predictors on an outcome.
Consider the following example: 15- and 16-year-old adolescents were asked if they have ever had sexual intercourse. The outcome of interest is intercourse. The predictors are race (white and black) and gender (male and female).
Example from Agresti, A. Categorical Data Analysis, 2nd ed. 2002.
The data set intercourse is created with the variables “white” (1 if white, 0 if black), “male” (1 if male, 0 if female), and “intercourse” (1 if yes, 0 if no). We want to model the odds of having had intercourse, using race and gender as predictors.
Enter the code on the next slide into SAS.
First look at the effect of race and gender with no interaction. The SAS code is similar to that of simple logistic regression; one more independent variable has been added to the model statement.
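The R² value is 0.9907. Taken at face value, this means that 99.07% of the variability in our outcome (intercourse) is explained by including gender and race in our model. Keep in mind that the R² SAS reports for logistic regression is a generalized R², so it is only loosely comparable to the R² from linear regression.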
Notice that the race and gender terms are both statistically significant (p < 0.0001 and p = 0.0040, respectively).
The logistic regression model is: log(odds) = β0 + β1(white) + β2(male)
log(odds) = -0.4555 - 1.3135(white) + 0.6478(male)
The odds of having intercourse are 73.1% lower for whites than for blacks: the odds ratio is exp(-1.3135) = 0.269, and 1 - 0.269 = 0.731.
The odds of having intercourse for males are 1.911 times the odds for females (exp(0.6478) = 1.911).
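The odds ratios quoted here come from exponentiating the fitted coefficients. A minimal Python sketch of that arithmetic, assuming only the coefficient values printed above (the variable names are illustrative and are not part of the SAS output):

```python
import math

# Coefficients copied from the fitted main-effects model in the text.
b0, b_white, b_male = -0.4555, -1.3135, 0.6478

# Exponentiating a coefficient gives the odds ratio for a one-unit change,
# holding the other predictor fixed.
or_white = math.exp(b_white)            # white vs. black
or_male = math.exp(b_male)              # male vs. female
pct_lower_white = (1 - or_white) * 100  # percent reduction in odds for whites

print(f"OR (white vs. black): {or_white:.3f}")
print(f"OR (male vs. female): {or_male:.3f}")
print(f"Odds lower for whites by {pct_lower_white:.1f}%")
```

The same calculation works for any coefficient in a logistic model: exp(β) is the multiplicative change in the odds.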
To compare black males with white females, take the difference in their log odds:
log(odds) for black males = β0 + β1(0) + β2(1) = β0 + β2
log(odds) for white females = β0 + β1(1) + β2(0) = β0 + β1
log(OR) = (β0 + β2) - (β0 + β1) = β2 - β1
log(OR) = 0.6478 - (-1.3135) = 1.9613
OR = exp(1.9613) = 7.11
The odds of having intercourse for black males are 7.11 times the odds for white females.
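The black male versus white female comparison can be checked by evaluating the fitted equation for each group and subtracting. This is a sketch under the assumption that the main-effects coefficients above are used as printed; the helper name log_odds is hypothetical.

```python
import math

# Coefficients copied from the fitted main-effects model in the text.
b0, b_white, b_male = -0.4555, -1.3135, 0.6478

def log_odds(white: int, male: int) -> float:
    """Fitted log odds for a group coded by the 0/1 indicators."""
    return b0 + b_white * white + b_male * male

# Black males: white=0, male=1. White females: white=1, male=0.
log_or = log_odds(0, 1) - log_odds(1, 0)   # intercept cancels: b_male - b_white
odds_ratio = math.exp(log_or)

print(f"log(OR) = {log_or:.4f}")
print(f"OR      = {odds_ratio:.2f}")
```

Because the intercept appears in both groups, it cancels in the difference, which is why only β2 - β1 remains.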
The Hosmer and Lemeshow Goodness-of-Fit Test tests the hypotheses:
Ho: the model is a good fit, vs.
Ha: the model is NOT a good fit
With this test, we want to FAIL to reject the null hypothesis, because that means our model is a good fit (this is different from most of the hypothesis testing you have seen).
Look for a p-value > 0.10 in the H-L GOF test. This indicates the model is a good fit.
In this case, the p-value is 0.2419, so we do NOT reject the null hypothesis, and we conclude the model is a good fit.
We have added a third term to the model: the interaction between race and gender (“white*male”). We did not need to create this variable in the data set.
The new R² value is 0.9908, which is barely higher than the R² from the model with only the main effects (0.9907). Adding the interaction did not help explain more of the variability in the outcome.
log(odds) = β0 + β1(white) + β2(male) + β3(white*male)
log(odds) = -0.4925 - 1.2534(white) + 0.7243(male) - 0.1151(white*male)
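With an interaction in the model, the effect of gender is no longer a single odds ratio: it differs by race. The sketch below, which assumes only the interaction-model coefficients printed above (the function and variable names are illustrative), computes the male versus female odds ratio separately for blacks and for whites.

```python
import math

# Coefficients copied from the fitted interaction model in the text.
b0, b_white, b_male, b_inter = -0.4925, -1.2534, 0.7243, -0.1151

def log_odds(white: int, male: int) -> float:
    """Fitted log odds; the interaction term is the product white*male."""
    return b0 + b_white * white + b_male * male + b_inter * white * male

# Male vs. female odds ratio within each race.
or_male_black = math.exp(log_odds(0, 1) - log_odds(0, 0))  # exp(b_male)
or_male_white = math.exp(log_odds(1, 1) - log_odds(1, 0))  # exp(b_male + b_inter)

print(f"OR male vs. female, blacks: {or_male_black:.2f}")
print(f"OR male vs. female, whites: {or_male_white:.2f}")
```

The two odds ratios are close to each other, which is consistent with the interaction adding very little to the model.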
When a model has several candidate predictors and interactions, SAS can systematically select the significant ones using forward selection, backward selection, or stepwise selection.
In forward selection, SAS starts with no predictors in the model. It adds the candidate predictor with the smallest p-value, then repeats the search among the remaining predictors, each time adding the one with the smallest p-value. It stops when none of the remaining predictors has a p-value less than 0.05.
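The forward selection loop just described can be sketched in a few lines of Python. This is an illustration of the logic only, not SAS's implementation: the function forward_select, the p_value callback, the predictor names x1 to x3, and the p-values in the toy table are all made-up placeholders, and a real callback would refit the model each time.

```python
from typing import Callable, List, Sequence

def forward_select(
    candidates: Sequence[str],
    p_value: Callable[[str, List[str]], float],
    alpha: float = 0.05,
) -> List[str]:
    """Add the candidate with the smallest p-value until none is below alpha."""
    selected: List[str] = []
    remaining = list(candidates)
    while remaining:
        # p-value each remaining candidate would have if added to the current model
        scored = [(p_value(c, selected), c) for c in remaining]
        best_p, best = min(scored)
        if best_p >= alpha:
            break                      # nothing left that meets the entry threshold
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical p-values for illustration only; they are NOT results from the data.
ILLUSTRATIVE_P = {"x1": 0.001, "x2": 0.02, "x3": 0.40}

def toy_p_value(candidate: str, current_model: List[str]) -> float:
    # A real version would refit the model with `candidate` added to `current_model`.
    return ILLUSTRATIVE_P[candidate]

chosen = forward_select(["x1", "x2", "x3"], toy_p_value)
print(chosen)
```

In this toy run the loop adds x1, then x2, and stops because the remaining candidate does not meet the 0.05 threshold.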
In backward selection, SAS starts with all of the predictors in the model and eliminates the non-significant predictors one at a time, refitting the model after each elimination. It stops once all the predictors remaining in the model are statistically significant.
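Backward selection runs the same idea in reverse. The Python sketch below shows the loop under stated assumptions: backward_eliminate, the refit callback, the predictor names, and the p-values are hypothetical placeholders chosen for illustration, and the 0.05 threshold is assumed to match the forward selection description.

```python
from typing import Callable, Dict, List, Sequence

def backward_eliminate(
    terms: Sequence[str],
    refit: Callable[[List[str]], Dict[str, float]],
    alpha: float = 0.05,
) -> List[str]:
    """Drop the least significant term, refit, and repeat until all are below alpha."""
    model = list(terms)
    while model:
        pvals = refit(model)                        # p-value per term in the current model
        worst = max(model, key=lambda t: pvals[t])  # least significant term
        if pvals[worst] < alpha:
            break                                   # every remaining term is significant
        model.remove(worst)
    return model

# Hypothetical p-values for illustration only; they are NOT results from the data.
ILLUSTRATIVE_P = {"x1": 0.001, "x2": 0.02, "x3": 0.40}

def toy_refit(model: List[str]) -> Dict[str, float]:
    # A real version would refit the logistic model and return updated p-values.
    return {term: ILLUSTRATIVE_P[term] for term in model}

kept = backward_eliminate(["x1", "x2", "x3"], toy_refit)
print(kept)
```

Here the loop removes x3 on the first pass and stops on the second, because both remaining terms are below the threshold.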
We will let SAS select a model for us out of the three predictors: white, male, white*male. Type the following code into SAS: