Session 10

Outline
• Binary Logistic Regression
• Why?
• Theoretical and practical difficulties in using regular regression (designed for continuous dependent variables) with a binary dependent variable
• How?
• Minitab procedure
• Interpreting results
• Some diagnostics
• Making predictions
• Comparison with regular regression model
Applied Regression -- Prof. Juran

Logistic Regression
In our previous discussions of regression analysis, we have implicitly assumed that the dependent variable is continuous. We have learned how to operationalize binary independent variables (using dummy variables), but we have not discussed any way to handle a categorical or binary dependent variable with regression analysis. (One non-regression method is discriminant analysis.) Several tools are available, but here we focus on logistic regression.

The basic idea: instead of predicting the exact value of the (binary) dependent variable, we model the probability that the dependent variable takes on the value 1:
$$\pi = P(Y = 1 \mid X)$$
In words, $\pi$ is the probability that the dependent variable is 1, given a particular vector of values X for the independent variables.

Example: Rick Beck Consumer Credit

Why not an ordinary multiple regression model?

Here we have Since  is an estimated probability, it shouldn’t go outside of the range from zero to one. But our regression equation is unbounded, and in this data set sometimes  takes on illogical estimated values. Applied Regression -- Prof. Juran

We address this problem with a logistic response function:
$$\pi = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}$$

This sort of relationship keeps $\pi$ between 0 and 1, as required. (Note: the cumulative normal distribution has a similar S shape and is the basis for the probit model.) What we need is a transformation of either X or $\pi$ that makes the relationship linear, so that we can use linear regression to build a model.
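One such transformation is the logit (log odds) of $\pi$. A short derivation, assuming the single-predictor logistic response function $\pi = e^{\beta_0 + \beta_1 X}/(1 + e^{\beta_0 + \beta_1 X})$; with several predictors, $\beta_1 X$ is replaced by $\beta_1 X_1 + \dots + \beta_k X_k$:

```latex
1 - \pi = \frac{1}{1 + e^{\beta_0 + \beta_1 X}}
\quad\Longrightarrow\quad
\frac{\pi}{1 - \pi} = e^{\beta_0 + \beta_1 X}
\quad\Longrightarrow\quad
\ln\!\left(\frac{\pi}{1 - \pi}\right) = \beta_0 + \beta_1 X
```

The left-hand side is the log of P(event)/P(no event), which is why the fitted coefficients are read as changes in log odds.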

Minitab Results
Response Information: Here we get the number of observations that fall into each of the two response categories. The response value designated as the “reference event” is the first entry under Value and is labeled as the event. In this case, the reference event is “being in default”.

Logistic Regression Table: This shows the estimated coefficients (parameter estimates), the standard errors of the coefficients, z-values, p-values, the odds ratios, and a 95% confidence interval for each odds ratio. A rule of thumb: if the confidence interval for the odds ratio lies entirely below 1, the variable decreases the odds of default (e.g. Children). Similarly, if the confidence interval lies entirely above 1, the variable increases the odds of default (e.g. Single).
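A sketch of how an odds ratio and its interval relate to a coefficient, assuming a Wald-type interval exp(coef ± 1.96·SE), a common construction. The standard error used in the usage lines is a hypothetical placeholder, since the table's SE values are not quoted here; `odds_ratio_ci` and `direction` are illustrative helper names.

```python
import math

def odds_ratio_ci(coef, se, z=1.959963984540054):
    """Odds ratio exp(coef) and a Wald confidence interval exp(coef -/+ z*se)."""
    return math.exp(coef), math.exp(coef - z * se), math.exp(coef + z * se)

def direction(ci_low, ci_high):
    """Rule of thumb: where does the odds-ratio interval sit relative to 1?"""
    if ci_high < 1:
        return "decreases odds"
    if ci_low > 1:
        return "increases odds"
    return "inconclusive (interval contains 1)"

# Single's coefficient is 0.9699; the SE of 0.2 is HYPOTHETICAL.
or_single, lo, hi = odds_ratio_ci(0.9699, 0.2)
print(round(or_single, 3), round(lo, 3), round(hi, 3), direction(lo, hi))
```

Because exp is increasing, an interval entirely above 1 corresponds to a coefficient interval entirely above 0, so the rule of thumb agrees with the coefficient's z-test at the same level.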

From the output, all five independent variables have p-values less than 0.05, so at the 0.05 significance level there is sufficient evidence that each coefficient differs from zero. The coefficient of 0.9699 for Single is the estimated change in the log odds, ln[P(default)/P(not default)], when the subject is single rather than not single, with the other independent variables held constant. The coefficient of –0.019388 for Debt is the estimated change in the log odds of default for a $1000 increase in Debt, with the other independent variables held constant.
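Because a coefficient is a change in log odds, exponentiating it gives the factor by which the odds of default are multiplied. A minimal sketch using the two coefficients quoted above; the 10% baseline probability in the last line is a hypothetical value chosen only to show how an odds change translates into a probability change, and the helper names are illustrative.

```python
import math

COEF_SINGLE = 0.9699    # Single, from the output above
COEF_DEBT = -0.019388   # Debt, per $1000, from the output above

def odds_multiplier(coef, units=1.0):
    """Factor by which the odds of default change when a predictor rises by `units`."""
    return math.exp(coef * units)

def shift_probability(p_base, coef, units=1.0):
    """Probability after adding coef*units to the log odds of a baseline probability."""
    log_odds = math.log(p_base / (1 - p_base)) + coef * units
    return 1 / (1 + math.exp(-log_odds))

print(odds_multiplier(COEF_SINGLE))           # single vs. not single
print(odds_multiplier(COEF_DEBT, 10))         # +$10,000 of Debt = 10 units of $1000
print(shift_probability(0.10, COEF_SINGLE))   # HYPOTHETICAL 10% baseline
```

The odds multiplier is the same at every baseline, but the resulting change in probability depends on where the baseline sits.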

Assumptions in Logit Regression

Minitab displays the final log-likelihood from the maximum likelihood iterations along with the statistic G. This statistic tests the null hypothesis that all the coefficients associated with predictors equal zero, against the alternative that not all of them equal zero. In this example, G = 283.811 with a p-value reported as 0.000 (that is, below 0.0005), so there is sufficient evidence that at least one of the coefficients differs from zero. The G test is analogous to the F test in regular regression: it can be used to determine whether any of the coefficients are significantly different from zero, or whether a full model is significantly better than a reduced model.
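A sketch of the G test, assuming the usual likelihood-ratio form G = 2[LL(full) − LL(reduced)], compared with a chi-square distribution whose degrees of freedom equal the number of coefficients being tested (five here). The chi-square tail uses closed-form series for integer degrees of freedom so that only the standard library is needed; `g_statistic` and `chi2_sf` are illustrative names.

```python
import math

def g_statistic(loglik_full, loglik_reduced):
    """G = 2 * (log-likelihood of full model - log-likelihood of reduced model)."""
    return 2.0 * (loglik_full - loglik_reduced)

def chi2_sf(x, df):
    """Upper-tail probability P(ChiSq_df > x) for a positive integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term = math.exp(-half)
        total = term
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return total
    # Odd df: erfc(sqrt(x/2)) plus a finite series in odd powers of sqrt(x)
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * k + 1)
    return total

# Reported G with df = 5 (five predictors tested jointly)
p_value = chi2_sf(283.811, 5)
print(p_value < 0.0005)  # True, consistent with the reported p-value of 0.000
```

For a full-versus-reduced comparison, pass both log-likelihoods to `g_statistic` and use the number of dropped coefficients as `df`.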

The Table of Observed and Expected Frequencies shows how well the model fits the data by comparing observed and expected frequencies. Here the observed and expected frequencies are similar, which is evidence that the model fits the data well and supports the conclusions of the Goodness of Fit Tests.
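A sketch of how such a table can be built, assuming observations are sorted by estimated probability and split into roughly equal-sized groups, as in the Hosmer-Lemeshow test; Minitab's exact grouping rules may differ. Within each group, the expected number of events is the sum of the estimated probabilities. The helper names and the tiny data set in the test are hypothetical.

```python
def observed_expected_table(y, p, groups=10):
    """Group cases by estimated probability; return (size, observed events, expected events)."""
    pairs = sorted(zip(p, y))  # sort by estimated probability
    n = len(pairs)
    table = []
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        observed = sum(yi for _, yi in chunk)
        expected = sum(pi for pi, _ in chunk)  # sum of estimated probabilities
        table.append((len(chunk), observed, expected))
    return table

def hosmer_lemeshow(table):
    """Pearson-type statistic summed over events and non-events in each group."""
    stat = 0.0
    for size, observed, expected in table:
        stat += (observed - expected) ** 2 / expected
        # Non-events: (size - observed) - (size - expected) has the same squared gap.
        stat += (observed - expected) ** 2 / (size - expected)
    return stat
```

Small gaps between observed and expected counts in every group give a small statistic, which is the numerical counterpart of "the frequencies are similar".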

This table is calculated by pairing observations that have different response values. Here there are 153 individuals in default and 847 not in default, giving 153 × 847 = 129,591 pairs with different response values. Based on the model, a pair is concordant if the individual in default has the higher estimated probability of default, discordant if the opposite is true, and tied if the two probabilities are equal. Here, 90.2% of pairs are concordant, 9.7% are discordant, and 0.2% are tied. Somers' D, Goodman-Kruskal Gamma, and Kendall's Tau summarize this table. In principle they can range from −1 to 1, but for a model with any predictive value they typically lie between 0 and 1; larger values indicate better predictive ability.
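A brute-force sketch of the pair counting and the three summaries, assuming the usual definitions: Somers' D = (C − D)/(pairs with different responses), Gamma = (C − D)/(C + D), and Tau-a = (C − D)/(N(N − 1)/2), where C and D count concordant and discordant pairs and N is the number of observations. The function name and test data are hypothetical; comparing every event with every non-event is O(n²), fine for 129,591 pairs.

```python
def concordance(y, p):
    """Count concordant/discordant/tied event-nonevent pairs and summary measures."""
    events = [pi for pi, yi in zip(p, y) if yi == 1]
    nonevents = [pi for pi, yi in zip(p, y) if yi == 0]
    conc = disc = ties = 0
    for pe in events:
        for pn in nonevents:
            if pe > pn:
                conc += 1   # event has the higher estimated probability
            elif pe < pn:
                disc += 1
            else:
                ties += 1
    pairs = conc + disc + ties          # equals len(events) * len(nonevents)
    n = len(y)
    somers_d = (conc - disc) / pairs
    gamma = (conc - disc) / (conc + disc)
    tau_a = (conc - disc) / (n * (n - 1) / 2)
    return conc, disc, ties, somers_d, gamma, tau_a
```

Tau-a divides by all pairs, including those with the same response, so it is always smaller in magnitude than Somers' D.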

Making Predictions
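Making a prediction means plugging an individual's values into the fitted logistic response function. A minimal sketch; the intercept and coefficients in the usage lines are hypothetical placeholders (the full set of fitted coefficients is not quoted here), and the 0.5 cutoff for classifying someone as a likely defaulter is a common convention rather than part of the source.

```python
import math

def predict_probability(intercept, coefs, x):
    """Estimated P(Y = 1): 1 / (1 + exp(-(intercept + sum of coef * x)))."""
    eta = intercept + sum(b * xi for b, xi in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-eta))

def classify(prob, cutoff=0.5):
    """Label as the event (1) when the estimated probability reaches the cutoff."""
    return 1 if prob >= cutoff else 0

# HYPOTHETICAL two-predictor model, for illustration only
p_hat = predict_probability(-2.0, [1.0, 0.5], [1.0, 3.0])
print(p_hat, classify(p_hat))
```

Unlike the ordinary regression prediction, `p_hat` always lies strictly between 0 and 1.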

Summary
• Binary Logistic Regression
• Why?
• Theoretical and practical difficulties in using regular regression (designed for continuous dependent variables) with a binary dependent variable
• How?
• Minitab procedure
• Interpreting results
• Some diagnostics
• Making predictions
• Comparison with regular regression model

For Sessions 11 and 12
• Student presentations