
Chapter 8 Logistic Regression



  1. Chapter 8 Logistic Regression

  2. In the usual linear regression model the response variable is assumed to be continuous. • In logistic regression the outcome variable is binary or dichotomous: a discrete variable taking on two possible values.

  3. A real world example • In January 1986, the space shuttle Challenger exploded shortly after launch. An investigation was launched into the cause of the crash and attention focused on the rubber O-ring seals in the rocket boosters. • For each mission, we know the number of O-rings out of six showing some damage and the launch temperature. The data is in file orings.txt.

  4. The simple linear regression model can predict probabilities greater than one or less than zero

  5. 8.1.1 The logistic function and Odds
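The logistic function and its inverse, the log-odds (logit), can be sketched in pure Python; the function names here are illustrative, not from the text:

```python
import math

def logistic(eta):
    """Logistic (inverse-logit) function: maps a log-odds value eta
    to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def log_odds(p):
    """Logit function: maps a probability p to its log-odds log(p / (1 - p))."""
    return math.log(p / (1.0 - p))
```

Because the logit is the inverse of the logistic function, a one-unit increase in a predictor changes the log-odds by the slope coefficient, while the probability itself stays bounded in (0, 1).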

  6. Confidence interval based on Wald statistic

  7. 8.1.2 The likelihood and log-likelihood function (the logistic model can be estimated by maximizing the log-likelihood)
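Under independence, the binomial log-likelihood for the simple logistic model (dropping the constant binomial coefficients) is the sum of yi·ηi − mi·log(1 + exp(ηi)), where ηi = β0 + β1·xi. A minimal sketch, with illustrative names:

```python
import math

def log_likelihood(beta0, beta1, x, y, m):
    """Binomial log-likelihood for a simple logistic model,
    up to an additive constant that does not depend on the parameters.
    x: predictor values, y: success counts, m: numbers of trials."""
    ll = 0.0
    for xi, yi, mi in zip(x, y, m):
        eta = beta0 + beta1 * xi          # linear predictor (log-odds)
        ll += yi * eta - mi * math.log(1.0 + math.exp(eta))
    return ll
```

Maximizing this function over (β0, β1), for example by Newton's method as `glm` does in R, gives the maximum likelihood estimates.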

  8. The deviance is twice the logarithm of the ratio of the likelihood LS of the saturated model to the likelihood LM of the fitted model: D = 2 log(LS / LM) = 2 (log LS − log LM).

  9. The deviance associated with a given logistic regression model (M) is based on comparing the maximized log-likelihood under (M) with the maximized log-likelihood under (S), the so-called saturated model that has a parameter for each observation.
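With binomial data the saturated model fits each observation exactly, p̂i = yi/mi, so the deviance reduces to 2·Σ [ yi log(yi/(mi pi)) + (mi − yi) log((mi − yi)/(mi(1 − pi))) ]. A minimal sketch:

```python
import math

def deviance(y, m, p_hat):
    """Residual deviance: 2 * (saturated log-lik - fitted log-lik)
    for binomial data. y: successes, m: trials, p_hat: fitted probabilities."""
    d = 0.0
    for yi, mi, pi in zip(y, m, p_hat):
        # The saturated model uses p_i = y_i / m_i; by convention 0*log(0) = 0,
        # so terms with zero counts are skipped.
        if yi > 0:
            d += 2.0 * yi * math.log(yi / (mi * pi))
        if mi - yi > 0:
            d += 2.0 * (mi - yi) * math.log((mi - yi) / (mi * (1.0 - pi)))
    return d
```

A perfect fit gives deviance zero; any departure of the fitted probabilities from the observed proportions makes it positive.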

  10. Provided that Yi is truly binomial and that the mi are relatively large (rule of thumb: mi ≥ 5), the deviance is approximately χ2 distributed with n − p − 1 degrees of freedom if the model is correct, where n = the number of binomial samples and p = the number of predictors in the model. • H0: logistic regression model (8.1) is appropriate • HA: the logistic model is inappropriate, so a saturated model is needed. In R output, the deviance is called the Residual deviance.

  11. 8.1.4 Using Differences in Deviance Values to Compare Models • In the R output, the residual deviance is the deviance for the current model, while the null deviance is the deviance for a model with no predictors, just an intercept term. • H0: logistic regression model (8.1) with β1 = 0 is appropriate (all groups share the same probability of success, independent of the predictors) • HA: logistic regression model (8.1) with β1 ≠ 0 is needed

  12. The difference in deviance can be used to compare nested models.
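A sketch of the nested-model comparison: the drop in deviance is referred to a chi-square distribution with degrees of freedom equal to the difference in the number of parameters. The 5% critical value for 1 degree of freedom (3.841) is hard-coded here purely for illustration:

```python
def deviance_test(dev_reduced, dev_full, crit=3.841):
    """Compare nested logistic models via the drop in deviance.
    dev_reduced: deviance of the smaller (e.g. null) model,
    dev_full: deviance of the larger model.
    crit: chi-square critical value for the df difference
    (default 3.841 = 5% value for 1 df).
    Returns the test statistic G and whether H0 is rejected."""
    g = dev_reduced - dev_full
    return g, g > crit
```

This mirrors `anova(fit0, fit1, test = "Chisq")` in R, which reports the same difference in deviance with a p-value instead of a fixed critical value.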

  13. 8.1.6 Residuals for Logistic Regression

  14. Pearson residual

  15. Pearson residuals do not account for the variance of the fitted probability. This issue is overcome by standardized Pearson residuals, which are defined to be
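Both residual types can be sketched as follows, assuming the leverage (hat value) h of each case is available from the fit:

```python
import math

def pearson_residual(y, m, p_hat):
    """Pearson residual for binomial data:
    (y - m*p) / sqrt(m * p * (1 - p))."""
    return (y - m * p_hat) / math.sqrt(m * p_hat * (1.0 - p_hat))

def std_pearson_residual(y, m, p_hat, h):
    """Standardized Pearson residual: the Pearson residual divided by
    sqrt(1 - h), where h is the leverage (hat value) of the case."""
    return pearson_residual(y, m, p_hat) / math.sqrt(1.0 - h)
```

In R, `residuals(fit, type = "pearson")` and `hatvalues(fit)` provide the raw residuals and leverages used here.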

  16. standardized deviance residuals

  17. standardized deviance residuals

  18. 8.2 Binary Logistic Regression

  19. 8.2.1 and 8.2.2 Deviance and Residuals for the case of Binary Data • The deviance does not provide an assessment of the goodness of fit of the model. However, it can be used to compare nested models, see page 291. • Residual plots are problematic when the data are binary.

  20. Goodness of fit test • When the number of covariate patterns G is almost as large as n (in particular, when G equals n), the deviance cannot be assumed to have a chi-square distribution (Collett, 1991). • The HL statistic is widely used regardless of whether or not the number of covariate patterns (G) is close to the number of observations. Nevertheless, this statistic requires that the model consider at least three covariate patterns, rarely results in significance when G is less than 6, and works best when G is close to n (the latter occurs when some of the predictors are continuous).

  21. Hosmer–Lemeshow (HL) statistic • G: number of covariate patterns

  22. HL statistic formula
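The HL computation can be sketched from grouped data, assuming each of the G groups supplies an observed success count O_g, a group size n_g, and a mean fitted probability p̄_g (names illustrative):

```python
def hosmer_lemeshow(observed, n, p_bar):
    """Hosmer-Lemeshow statistic over G groups:
    sum of (O_g - n_g * pbar_g)^2 / (n_g * pbar_g * (1 - pbar_g)).
    observed: observed successes per group, n: group sizes,
    p_bar: mean fitted probability per group."""
    hl = 0.0
    for o_g, n_g, p_g in zip(observed, n, p_bar):
        expected = n_g * p_g
        hl += (o_g - expected) ** 2 / (expected * (1.0 - p_g))
    return hl
```

In practice the groups are formed by sorting the cases by fitted probability into (typically ten) roughly equal-sized bins, and the statistic is compared against a chi-square distribution with G − 2 degrees of freedom.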

  23. Hosmer-Lemeshow goodness of fit test

  24. Hosmer-Lemeshow goodness of fit test

  25. Some simple results for variable selection • When the predictor variable X is normally distributed with a different variance for the two values of Y, the log odds are a quadratic function of x. • When the predictor variable X is normally distributed with the same variance for the two values of Y, the log odds are a linear function of x. • If the variance–covariance matrix of the predictors differs across the two groups, then the log odds are a function of the Xi, Xi^2, and XiXj terms. • When conducting a binary regression with a skewed predictor, it is often easiest to assess the need for x and log(x) by including them both in the model. • Alternatively, if the skewed predictor can be transformed to have a normal distribution conditional on Y, then just the transformed version of X should be included in the logistic regression model.

  26. 8.2.4 Marginal Model plots for Binary Data

  27. Given that residual plots are difficult to interpret for binary data, we shall examine marginal model plots instead. Marginal refers to the conditional expectation E(Y | x), read as the conditional expectation of Y given x. A plot of Y against x, with the loess fit for the fitted values against x and the loess fit for Y against x both marked on it, is called a marginal model plot for Y and x.

  28. The product term xixj is needed as a predictor if the covariance of xi and xj differs across the two values of y (i.e., if the regression of xi on xj, or vice versa, has a different slope for the two values of y). • A quadratic term in xi is needed as a predictor if the variance of xi differs across the two values of y. Use a box plot to check this.

  29. Use Analysis of Deviance to choose between different models • Page 290 • Page 292

  30. Interpretation of summary results for the fitted model • Page 293

  31. Assessing Discriminatory Performance of a Binary Logistic Model: ROC Curves • Describe and illustrate methods for assessing the extent that a fitted binary logistic model can be used to distinguish the observed cases (Y = 1) from the observed noncases (Y = 0).

  32. The proportion of true positives (rightly classified Y = 1) among all cases (Y = 1) is called the sensitivity (Se), and the proportion of true negatives among all noncases is called the specificity (Sp). Ideally, perfect discrimination would occur if both sensitivity and specificity are equal to 1. Se = nTP/n1 = 70/100 = 0.7; Sp = nTN/n0 = 80/100 = 0.8
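The slide's counts (70 true positives among 100 cases, 80 true negatives among 100 noncases) illustrate the definitions; a sketch that computes Se and Sp from 0/1 outcomes and classifications at a cut-point (names illustrative):

```python
def classify(p_hat, cutoff=0.5):
    """Classify each predicted probability at the given cut-point."""
    return [1 if p >= cutoff else 0 for p in p_hat]

def sens_spec(y, y_pred):
    """Sensitivity = true positives / all cases (Y = 1);
    specificity = true negatives / all noncases (Y = 0)."""
    tp = sum(1 for a, b in zip(y, y_pred) if a == 1 and b == 1)
    tn = sum(1 for a, b in zip(y, y_pred) if a == 0 and b == 0)
    n1 = sum(y)
    n0 = len(y) - n1
    return tp / n1, tn / n0
```

Raising the cut-point trades sensitivity for specificity, which is exactly the trade-off the ROC curve in the next slide summarizes.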

  33. ROC stands for receiver operating characteristic, a technique originally developed in the context of electronic signal detection. When applied to a logistic model, an ROC curve is a plot of sensitivity (Se) vs. 1 − specificity (1 − Sp) derived from several cut-points for the predicted value. • The larger the area under the curve, the better the discrimination.
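An ROC curve and its area can be sketched by sweeping the cut-point down through the sorted predicted values and applying the trapezoidal rule (a minimal sketch that ignores tie-handling refinements; names illustrative):

```python
def roc_points(y, scores):
    """ROC curve as a list of (1 - Sp, Se) points, one per case processed
    in decreasing order of predicted score, starting from (0, 0)."""
    pairs = sorted(zip(scores, y), reverse=True)
    n1 = sum(y)                 # number of cases (Y = 1)
    n0 = len(y) - n1            # number of noncases (Y = 0)
    tp = fp = 0
    pts = [(0.0, 0.0)]
    for _, yi in pairs:
        if yi == 1:
            tp += 1             # one more case correctly ranked above the cut
        else:
            fp += 1             # one more noncase ranked above the cut
        pts.append((fp / n0, tp / n1))
    return pts

def auc(pts):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

Perfect separation of cases and noncases gives an AUC of 1; a model that ranks at random gives about 0.5.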

  34. HW assignment • Exercises 4, 5, and 6.
