Logistic Regression


# Logistic Regression - PowerPoint PPT Presentation

Logistic Regression. November 2, 2004 Curtis A. Parvin, Ph.D. Associate Professor and Director of Informatics and Statistics Division of Laboratory Medicine Phone: 454-8699 email: parvin@wustl.edu. Regression.


### Logistic Regression

November 2, 2004

Curtis A. Parvin, Ph.D.

Associate Professor and Director of Informatics and Statistics

Division of Laboratory Medicine

Phone: 454-8699 email: parvin@wustl.edu

Regression
• Relate one or more independent (predictor) variables to a dependent (outcome) variable
• Ordinary linear regression
• Continuous outcome variable
• Determine the relationship between a continuous outcome variable and the predictor variable(s)
• Logistic regression
• Binary outcome variable
• Determine the relationship between the probability of the outcome occurring and the predictor variable(s)
Example: Relationship between gestational age at birth and whether an infant is breast feeding at time of hospital discharge
Probability, Odds, and the Logit Transform
• Probabilities range between zero and one
• Odds = P/(1-P)
• Odds range between zero and infinity
• Logit = ln(P/(1-P))
• The logit transform ranges between negative infinity and infinity
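The three quantities above convert into one another directly. The following is a minimal sketch of those conversions; the function names are illustrative and do not come from the presentation.

```python
import math

def odds_from_prob(p):
    """Odds = P / (1 - P); defined for 0 <= P < 1."""
    return p / (1.0 - p)

def logit(p):
    """Logit = ln(P / (1 - P)); defined for 0 < P < 1."""
    return math.log(odds_from_prob(p))

def prob_from_logit(z):
    """Inverse of the logit transform: P = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

# A probability below, at, and above one half
for p in (0.1, 0.5, 0.8):
    print(p, odds_from_prob(p), logit(p))
```

A probability of 0.5 gives odds of 1 and a logit of 0; probabilities below 0.5 give negative logits and probabilities above 0.5 give positive ones.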
Logistic Regression
• Model the logarithm of the odds of an outcome as a linear combination of predictor variables
• Logit = ln(P/(1-P)) = b0+b1X1+b2X2+. . .
• Estimate the coefficients b0, b1, b2 based on a random sample of subjects’ data
• Determine which of the predictors are “good”
• Assess model fit
• Use the model to predict future cases
Odds and Odds Ratios
• Odds is the probability of an event occurring divided by the probability of the event not occurring
• An odds ratio is the ratio of the odds for two different groups
• An odds ratio = 1 implies equal risk in the two groups
• Example: the calculated odds ratio for breast feeding at hospital discharge for GA=32 compared to GA=28 is 4.0/0.5 = 8.0
Logistic Regression Coefficients and Odds Ratios
• If ln(P/(1-P)) = b0+b1X1+b2X2+. . ., then b1, b2, … are slope coefficients reflecting rates of change
• ln(odds(X1+1)) – ln(odds(X1)) = b1
• ln(odds(X1+1)/odds(X1)) = b1
• odds(X1+1)/odds(X1) = exp(b1)
• exp(b1) represents the odds ratio associated with a 1 unit increase in X1
• exp(k*b1) = odds ratio for a k unit increase in X1
• Breast feeding example: the odds of breast feeding at hospital discharge increase by a factor of exp(.577) = 1.78 for each additional week of GA
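As a quick numerical check of the relationship between a coefficient and its odds ratio, the sketch below exponentiates the 0.577 per-week coefficient quoted in the breast feeding example. The helper name `odds_ratio` and the choice of a four-week increase are illustrative assumptions.

```python
import math

def odds_ratio(b, k=1.0):
    """Odds ratio for a k-unit increase in a predictor with coefficient b."""
    return math.exp(k * b)

b1 = 0.577  # per-week GA coefficient quoted in the breast feeding example

print(round(odds_ratio(b1), 2))     # one additional week of GA
print(round(odds_ratio(b1, 4), 2))  # four additional weeks of GA
```

Because the model is linear on the log-odds scale, the odds ratio for k units is the one-unit odds ratio raised to the power k.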
One Binary Outcome and One Binary Predictor
• Case-Control Study
                        Disease
                     Cases   Controls
  Risk Factor  Yes     a        b
               No      c        d
• Odds Ratio (OR) = (a/c)/(b/d) = (a/b)/(c/d) = ad/bc
Example: CHD and Age (Dichotomized at 55 Years)

2X2 Table calculation: OR = (21/22)/(6/51) = 8.11

Logistic Regression: ln(odds of CHD) = -0.841 + 2.094 * Age, where Age is the binary indicator for the older age group

OR = exp(2.094) = 8.11
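The agreement between the 2X2 table and the logistic model can be reproduced by hand. The sketch below assumes the cell layout implied by the slide's calculation (21/22)/(6/51), that is a = 21, b = 6, c = 22, d = 51 in the notation of the case-control table. Assigning the older age group to the "risk factor present" row is an assumption drawn from that calculation, not something the slide states.

```python
import math

# Assumed cell counts, read off the calculation (21/22)/(6/51)
a, b = 21, 6    # risk factor present: cases, controls
c, d = 22, 51   # risk factor absent:  cases, controls

odds_ratio = (a * d) / (b * c)

# With one binary predictor, the fitted slope is ln(OR) and the
# intercept is the log odds of disease in the reference (Age = 0) group.
slope = math.log(odds_ratio)
intercept = math.log(c / d)

print(round(odds_ratio, 2), round(slope, 3), round(intercept, 3))
```

Under these assumed counts the slope and intercept agree with the 2.094 and -0.841 shown in the logistic regression equation.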

Multiple Predictor Variables
• The independent variables (predictors, risk factors) can be categorical or continuous
• Example: TDx-FLM II and gestational age as predictors of risk for respiratory distress syndrome (RDS)
• TDx-FLM II measures mg surfactant/g of albumin in amniotic fluid
Logistic Regression Parameter Estimates

------------------------------------------------------------------------------
         rds |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      tdxflm |  -.1136873   .0159786    -7.11   0.000    -.1450048   -.0823699
          ga |  -.2912549   .1129665    -2.58   0.010    -.5126652   -.0698446
       _cons |    12.8149   3.879407     3.30   0.001     5.211399    20.41839
------------------------------------------------------------------------------

ln(P(RDS)/(1-P(RDS))) = 12.81 - 0.114*TDxFLM - 0.291*GA

Odds Ratio for a 1 mg/g increase in TDxFLM: exp(-0.114) = 0.89

Odds Ratio for a 1 week increase in GA: exp(-0.291) = 0.75

------------------------------------------------------------------------------
         rds | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      tdxflm |    .892537   .0142615    -7.11   0.000     .8650182    .9209313
          ga |   .7473252   .0844227    -2.58   0.010     .5988973    .9325387
------------------------------------------------------------------------------
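The odds ratio table follows from the coefficient table by exponentiating each coefficient and each confidence limit. The sketch below checks this using the values printed above; the dictionary layout and the helper name `to_odds_ratios` are illustrative.

```python
import math

# (coefficient, lower 95% limit, upper 95% limit) from the coefficient table
estimates = {
    "tdxflm": (-0.1136873, -0.1450048, -0.0823699),
    "ga": (-0.2912549, -0.5126652, -0.0698446),
}

def to_odds_ratios(coef, lo, hi):
    """Exponentiate a coefficient and its confidence limits."""
    return tuple(math.exp(v) for v in (coef, lo, hi))

odds_ratios = {name: to_odds_ratios(*vals) for name, vals in estimates.items()}

for name, (or_, lo, hi) in odds_ratios.items():
    print(f"{name}: OR={or_:.4f}  95% CI=({lo:.4f}, {hi:.4f})")
```

Both intervals lie entirely below 1, matching the negative coefficients: higher TDxFLM and higher GA are each associated with lower odds of RDS.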

Using the Logistic Model to Predict Risk of RDS
• We can use the logistic model equation to:
• Identify variables that are significant predictors
• Calculate the absolute risk (probability) of RDS (may give biased estimates)
• Calculate the relative risk (odds ratio) of RDS
• Develop a classifier for diagnosing RDS
Logistic Regression Predicted Probabilities and Classification with 0.20 cutoff

TDxFLM   GA   RDS   Logistic P   Classify   Result
  75     30    0     .0115517       0         TN
   7     31    1     .9521286       1         TP
  14.8   31    1     .8912354       1         TP
  18.3   31    1     .8462539       1         TP
  27     31    1     .6718219       1         TP
  22     31    0     .7832782       1         FP
  29     31    0     .6198854       1         FP
 135     31    0     .0000095       0         TN
   4     32    1     .9543484       1         TP
  15     32    1     .8568574       1         TP
  16.5   32    1     .8346432       1         TP
  25     32    1     .6575863       1         TP
  44.2   32    1     .1779585       0         FN
  35.5   32    0     .3679177       1         FP
  41     32    0     .2374989       1         FP
  48     32    0     .1232235       0         TN
  49     32    0     .1114575       0         TN
  55.8   32    0     .0547323       0         TN
  59     32    0     .0386864       0         TN
  59     32    0     .0386864       0         TN
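The predicted probabilities and classifications in this listing can be reproduced from the fitted coefficients. The sketch below applies the inverse logit to the linear predictor, classifies at the 0.20 cutoff, and tallies the four outcome types. It covers only the rows shown here, so the sensitivity and specificity it computes are illustrative and are not results reported in the presentation. The function and variable names are assumptions.

```python
import math

# Coefficients from the parameter estimates table
B0, B_TDX, B_GA = 12.8149, -0.1136873, -0.2912549

def predict_rds(tdxflm, ga):
    """Predicted probability of RDS from the fitted logistic model."""
    z = B0 + B_TDX * tdxflm + B_GA * ga
    return 1.0 / (1.0 + math.exp(-z))

# (TDxFLM, GA, observed RDS) for the rows in the listing
rows = [
    (75, 30, 0), (7, 31, 1), (14.8, 31, 1), (18.3, 31, 1), (27, 31, 1),
    (22, 31, 0), (29, 31, 0), (135, 31, 0), (4, 32, 1), (15, 32, 1),
    (16.5, 32, 1), (25, 32, 1), (44.2, 32, 1), (35.5, 32, 0), (41, 32, 0),
    (48, 32, 0), (49, 32, 0), (55.8, 32, 0), (59, 32, 0), (59, 32, 0),
]

CUTOFF = 0.20
counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
for tdxflm, ga, rds in rows:
    predicted = 1 if predict_rds(tdxflm, ga) >= CUTOFF else 0
    # "T"/"F": prediction agrees with outcome or not; "P"/"N": predicted class
    label = ("T" if predicted == rds else "F") + ("P" if predicted else "N")
    counts[label] += 1

sensitivity = counts["TP"] / (counts["TP"] + counts["FN"])
specificity = counts["TN"] / (counts["TN"] + counts["FP"])
print(counts, round(sensitivity, 3), round(specificity, 3))
```

Raising the cutoff would convert some false positives into true negatives at the cost of more false negatives, which is the usual trade-off when turning a predicted probability into a classifier.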

Other Prediction Methods
• Artificial Neural Networks
• Tu JV. Advantages and disadvantages of using artificial neural networks versus logistic regression for predicting medical outcomes. J Clin Epidemiol 1996;49:1225-31.
• Linear or Quadratic Discriminant Analysis
• Classification and Regression Trees (CART)
• Multivariate Adaptive Regression Splines (MARS)
Other Flavors of Logistic Regression
• Ordinal Logistic Regression
• More than two ordered groups
• Multinomial Logistic Regression
• (Polychotomous, Polytomous, Discrete Choice)
• More than two unordered groups
• Conditional Logistic Regression
• Matched pairs data (1:1 or 1:M matching)
References
• Hosmer DW, Lemeshow S. Applied logistic regression, 2nd ed., New York, NY: John Wiley & Sons, 2000.
• Kleinbaum DG. Logistic regression: a self-learning text. New York, NY: Springer-Verlag, 1994.
• Bagley SC, White H, Golomb BA. Logistic regression in the medical literature: standards for use and reporting, with particular attention to one medical domain. J Clin Epidemiol 2001;54:979-85.
• (http://www.sciencedirect.com/science/publications/journal)
• Ostir GV, Uchida T. Logistic regression: a nontechnical review. Am J Phys Med Rehabil 2000;79:565-72.
• (pdf file available online through Ovid gateway)
• http://www.ioa.pdx.edu/newsom/pa551/lectur21.htm