Logistic Regression

November 2, 2004. Curtis A. Parvin, Ph.D., Associate Professor and Director of Informatics and Statistics, Division of Laboratory Medicine. Phone: 454-8699. Email: parvin@wustl.edu



  1. Logistic Regression • November 2, 2004 • Curtis A. Parvin, Ph.D., Associate Professor and Director of Informatics and Statistics, Division of Laboratory Medicine • Phone: 454-8699 • Email: parvin@wustl.edu

  2. Regression • Relate one or more independent (predictor) variables to a dependent (outcome) variable • Ordinary linear regression • Continuous outcome variable • Determine the relationship between a continuous outcome variable and the predictor variable(s) • Logistic regression • Binary outcome variable • Determine the relationship between the probability of the outcome occurring and the predictor variable(s)

  3. Example: Relationship between gestational age at birth and whether an infant is breast feeding at time of hospital discharge

  4. Ordinary Linear Regression

  5. Logistic Regression

  6. Probability, Odds, and the Logit Transform • Probabilities range between zero and one • Odds = P/(1-P) • Odds range between zero and infinity • Logit = ln(P/(1-P)) • The logit transform ranges between negative infinity and infinity
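
The three transforms on this slide can be sketched in a few lines of Python. The function names (`odds`, `logit`, `inverse_logit`) are illustrative choices, not part of the original presentation.

```python
import math


def odds(p):
    """Odds of an event with probability p (requires 0 <= p < 1)."""
    return p / (1.0 - p)


def logit(p):
    """Natural log of the odds (requires 0 < p < 1)."""
    return math.log(odds(p))


def inverse_logit(x):
    """Map a logit value back to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


# Probabilities near 0 or 1 give logits far from zero; p = 0.5 gives logit 0.
for p in (0.1, 0.5, 0.8):
    print(f"P={p:.1f}  odds={odds(p):.3f}  logit={logit(p):+.3f}")
```

Because `inverse_logit` undoes `logit`, any value on the logit scale can be converted back to a probability, which is what makes a linear model on the logit scale usable for prediction.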

  7. Odds and Logit for Breast Feeding Example

  8. Logistic Regression • Model the logarithm of the odds of an outcome as a linear combination of predictor variables • Logit = ln(P/(1-P)) = b0+b1X1+b2X2+. . . • Estimate the coefficients b0, b1, b2, … based on a random sample of subjects’ data • Determine which of the predictors are “good” • Assess model fit • Use the model to predict future cases

  9. Odds and Odds Ratios • Odds is the probability of an event occurring divided by the probability of the event not occurring • An odds ratio is the ratio of the odds for two different groups • An odds ratio = 1 implies equal risk in the two groups • Example: the calculated odds ratio for breast feeding at hospital discharge for GA=32 compared to GA=28 is 4.0/0.5 = 8.0

  10. Logistic Regression Coefficients and Odds Ratios • If ln(P/(1-P)) = b0+b1X1+b2X2+. . ., then b1, b2, … are slope coefficients reflecting rates of change • ln(odds(X1+1)) – ln(odds(X1)) = b1 • ln(odds(X1+1)/odds(X1)) = b1 • odds(X1+1)/odds(X1) = exp(b1) • exp(b1) represents the odds ratio associated with a 1 unit increase in X1 • exp(k*b1) = odds ratio for a k unit increase in X1 • Breast feeding example: the odds of breast feeding at hospital discharge increase by a factor of exp(.577) = 1.78 for each additional week of GA
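
As a small worked example of the exp(k*b1) rule, the sketch below uses the slope of 0.577 per week of gestational age quoted on this slide. The helper name `odds_ratio` is an illustrative assumption.

```python
import math


def odds_ratio(coef, k=1.0):
    """Odds ratio for a k-unit increase in a predictor whose slope is `coef`."""
    return math.exp(k * coef)


b1 = 0.577  # slope per week of gestational age, as quoted on the slide

one_week = odds_ratio(b1)
four_weeks = odds_ratio(b1, k=4)

print(f"1-week odds ratio: {one_week:.2f}")   # 1.78
print(f"4-week odds ratio: {four_weeks:.2f}")
```

Odds ratios multiply rather than add: the four-week ratio equals the one-week ratio raised to the fourth power.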

  11. Logistic Regression Odds Ratios

  12. One Binary Outcome and One Binary Predictor • Case-Control Study

                          Disease
                       Cases   Controls
      Risk      Yes      a        b
      Factor    No       c        d

      Odds Ratio (OR) = (a/c)/(b/d) = (a/b)/(c/d) = ad/bc

  13. Example: CHD and Age (Dichotomized at 55 Years) • 2X2 Table calculation: OR = (21/22)/(6/51) = 8.11 • Logistic Regression: ln(odds of CHD) = -0.841 + 2.094 * Age • OR = exp(2.094) = 8.11
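
The agreement between the two calculations can be checked directly. With a single binary predictor, the fitted intercept equals the log-odds in the unexposed group and the fitted slope equals the log of the 2X2 odds ratio. The cell assignments below are inferred from the calculation (21/22)/(6/51) and the a, b, c, d layout of the preceding slide, so treat them as an assumption.

```python
import math

# Counts inferred from OR = (21/22)/(6/51), with a, b, c, d as in the 2X2 layout.
a, b = 21, 6    # risk factor present (age 55 or older): cases, controls
c, d = 22, 51   # risk factor absent (under 55): cases, controls

odds_ratio = (a / c) / (b / d)    # algebraically the same as (a*d)/(b*c)
intercept = math.log(c / d)       # log-odds of CHD when Age = 0
slope = math.log(odds_ratio)      # change in log-odds when Age goes from 0 to 1

print(f"OR        = {odds_ratio:.2f}")
print(f"intercept = {intercept:.3f}")
print(f"slope     = {slope:.3f}")
```

The intercept of about -0.841 is a log-odds, not a log odds ratio; only the slope exponentiates to the odds ratio.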

  14. Multiple Predictor Variables • The independent variables (predictors, risk factors) can be categorical or continuous • Example: TDx-FLM II and gestational age as predictors of risk for respiratory distress syndrome (RDS) • TDx-FLM II measures mg surfactant/g of albumin in amniotic fluid

  15. The Data (some of it)

  16. Logistic Regression Parameter Estimates

      ------------------------------------------------------------------------------
               rds |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
            tdxflm |  -.1136873   .0159786    -7.11   0.000    -.1450048   -.0823699
                ga |  -.2912549   .1129665    -2.58   0.010    -.5126652   -.0698446
             _cons |    12.8149   3.879407     3.30   0.001     5.211399    20.41839
      ------------------------------------------------------------------------------

      ln(P(RDS)/(1-P(RDS))) = 12.81 - 0.114*TDxFLM - 0.291*GA
      Odds Ratio for a 1 mg/g increase in TDxFLM: exp(-0.114) = 0.89
      Odds Ratio for a 1 week increase in GA: exp(-0.291) = 0.75

      ------------------------------------------------------------------------------
               rds | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
            tdxflm |    .892537   .0142615    -7.11   0.000     .8650182    .9209313
                ga |   .7473252   .0844227    -2.58   0.010     .5988973    .9325387
      ------------------------------------------------------------------------------
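
The odds ratio table follows from the coefficient table by exponentiation. The sketch below reproduces the odds ratios and their 95% limits from the reported coefficients and standard errors. It assumes the interval is the usual normal-approximation interval on the coefficient scale (coefficient plus or minus 1.96 standard errors), which matches the printed limits. The function and variable names are illustrative.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval

# (coefficient, standard error) copied from the parameter estimates above
estimates = {
    "tdxflm": (-0.1136873, 0.0159786),
    "ga": (-0.2912549, 0.1129665),
}


def odds_ratio_with_ci(coef, se, z=Z_95):
    """Return (odds ratio, lower limit, upper limit) by exponentiating
    the coefficient and the ends of its confidence interval."""
    return (
        math.exp(coef),
        math.exp(coef - z * se),
        math.exp(coef + z * se),
    )


for name, (coef, se) in estimates.items():
    or_, lower, upper = odds_ratio_with_ci(coef, se)
    print(f"{name:>7}: OR={or_:.4f}  95% CI=({lower:.4f}, {upper:.4f})")
```

Both intervals exclude 1, which agrees with the small P values in the table: higher TDx-FLM II results and greater gestational age are each associated with lower odds of RDS.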

  17. Using the Logistic Model to Predict Risk of RDS • We can use the logistic model equation to: • Identify variables that are significant predictors • Calculate the absolute risk (probability) of RDS (may give biased estimates) • Calculate the relative risk (odds ratio) of RDS • Develop a classifier for diagnosing RDS

  18. Logistic Regression Parameter Estimates (same Stata output and fitted equation as slide 16)

  19. Absolute Risk of RDS based on TDx-FLM II and gestational age (for RDS prevalence of 8.5%)

  20. Odds ratios for RDS relative to a TDx-FLM II ratio of 70 mg/g at 37 weeks gestational age

  21. Logistic Regression Predicted Probabilities and Classification with 0.20 cutoff

      TDxFLM   GA   RDS   Logistic P   Classify
        75     30    0     .0115517     0  TN
         7     31    1     .9521286     1  TP
        14.8   31    1     .8912354     1  TP
        18.3   31    1     .8462539     1  TP
        27     31    1     .6718219     1  TP
        22     31    0     .7832782     1  FP
        29     31    0     .6198854     1  FP
       135     31    0     .0000095     0  TN
         4     32    1     .9543484     1  TP
        15     32    1     .8568574     1  TP
        16.5   32    1     .8346432     1  TP
        25     32    1     .6575863     1  TP
        44.2   32    1     .1779585     0  FN
        35.5   32    0     .3679177     1  FP
        41     32    0     .2374989     1  FP
        48     32    0     .1232235     0  TN
        49     32    0     .1114575     0  TN
        55.8   32    0     .0547323     0  TN
        59     32    0     .0386864     0  TN
        59     32    0     .0386864     0  TN
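
The "Logistic P" and "Classify" columns can be reproduced from the fitted coefficients reported on slide 16. The sketch below is a minimal illustration, not the original analysis code. It assumes a case is classified positive when its predicted probability is at or above the 0.20 cutoff (the slide does not say how ties are handled), and the function names are hypothetical.

```python
import math

# Coefficients from the parameter estimates table (slide 16)
B0, B_TDXFLM, B_GA = 12.8149, -0.1136873, -0.2912549
CUTOFF = 0.20


def predicted_probability(tdxflm, ga):
    """Predicted probability of RDS from the fitted logistic model."""
    logit = B0 + B_TDXFLM * tdxflm + B_GA * ga
    return 1.0 / (1.0 + math.exp(-logit))


def classify(p, rds, cutoff=CUTOFF):
    """Return (predicted class, label) given probability p and true status rds.
    Assumption: p >= cutoff counts as a positive prediction."""
    predicted = 1 if p >= cutoff else 0
    labels = {(1, 1): "TP", (1, 0): "FP", (0, 1): "FN", (0, 0): "TN"}
    return predicted, labels[(predicted, rds)]


# A few (TDxFLM, GA, RDS) rows taken from the table above
rows = [(75, 30, 0), (7, 31, 1), (44.2, 32, 1), (41, 32, 0)]
for tdxflm, ga, rds in rows:
    p = predicted_probability(tdxflm, ga)
    predicted, label = classify(p, rds)
    print(f"{tdxflm:>6} {ga:>3} {rds}  P={p:.7f}  {predicted} {label}")
```

Lowering the cutoff trades false negatives for false positives, so a cutoff of 0.20 rather than 0.50 favors sensitivity over specificity.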

  22. Other Prediction Methods • Artificial Neural Networks • Tu JV. Advantages and disadvantages of using artificial neural networks versus logistic regression for predicting medical outcomes. J Clin Epidemiol 1996;49:1225-31. • Linear or Quadratic Discriminant Analysis • Classification and Regression Trees (CART) • Multivariate Adaptive Regression Splines (MARS)

  23. Other Flavors of Logistic Regression • Ordinal Logistic Regression • More than two ordered groups • Multinomial Logistic Regression • (Polychotomous, Polytomous, Discrete Choice) • More than two unordered groups • Conditional Logistic Regression • Matched pairs data (1:1 or 1:M matching)

  24. Software Packages that perform Logistic Regression • STATA • SAS • SPSS • Others

  25. References • Hosmer DW, Lemeshow S. Applied logistic regression, 2nd ed. New York, NY: John Wiley & Sons, 2000. • Kleinbaum DG. Logistic regression: a self-learning text. New York, NY: Springer-Verlag, 1994. • Bagley SC, White H, Golomb BA. Logistic regression in the medical literature: standards for use and reporting, with particular attention to one medical domain. J Clin Epidemiol 2001;54:979-85. • (http://www.sciencedirect.com/science/publications/journal) • Ostir GV, Uchida T. Logistic regression: a nontechnical review. Am J Phys Med Rehabil 2000;79:565-72. • (pdf file available online through Ovid gateway) • http://www.ioa.pdx.edu/newsom/pa551/lectur21.htm • http://personal.ecu.edu/whiteheadj/data/logit/