
Logistic Regression and Odds Ratios

Logistic Regression and Odds Ratios. Psych 818 - DeShon. Dichotomous Response. Used when the outcome or DV is a dichotomous random variable. Can only take one of two possible values (1,0): Pass/Fail, Disease/No Disease, Agree/Disagree, True/False, Present/Absent.


Presentation Transcript


  1. Logistic Regression and Odds Ratios Psych 818 - DeShon

  2. Dichotomous Response • Used when the outcome or DV is a dichotomous, random variable • Can only take one of two possible values (1,0) • Pass/Fail • Disease/No Disease • Agree/Disagree • True/False • Present/Absent • This data structure causes problems for OLS regression

  3. Dichotomous Response • Properties of dichotomous response variables (Y) • POSITIVE RESPONSE (Success = 1) → p • NEGATIVE RESPONSE (Failure = 0) → q = (1-p) • Mean of Y = observed proportion of successes • Var(Y) = p*q • Oops! Variance depends on the mean

  4. Dichotomous Response • Let’s generate some (0,1) data • Y <- rbinom(n=1000, size=1, prob=.3) • mean(Y) = 0.295 • μ = .3 • var(Y) = 0.208 • σ² = (.3 * .7) = .21 • hist(Y)
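The dependence of the variance on the mean can be checked directly. The sketch below is illustrative only: the seed and the grid of p values are arbitrary assumptions, so the simulated numbers will not match the 0.295 and 0.208 reported on the slide.

```r
# Variance of a 0/1 variable is p * (1 - p), so it changes with the mean.
p <- c(0.1, 0.3, 0.5, 0.7, 0.9)
variance <- p * (1 - p)
print(data.frame(p = p, variance = variance))  # largest at p = .5

# Simulated check (arbitrary seed, assumed for reproducibility only)
set.seed(818)
Y <- rbinom(n = 1000, size = 1, prob = 0.3)
p_hat <- mean(Y)

# var() divides by n - 1, so it equals p_hat * (1 - p_hat) * n / (n - 1)
print(c(mean = p_hat, var = var(Y), p_times_q = p_hat * (1 - p_hat)))
```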

  5. Describing Dichotomous Data • Proportion of successes (p) • Odds • Odds of an event is the probability it occurs divided by the probability it does not occur • p/(1-p) • if p=.53; odds=.53/.47 = 1.13

  6. Modeling Y (Categorical X) • Odds Ratio • Used to compare two proportions across groups • odds for males = .53/(1-.53) = 1.13 • odds for females = .62/(1-.62) = 1.63 • Odds-ratio = 1.63/1.13 = 1.44 • The odds of getting a 1 are 1.44 times as large for a female as for a male • Or… 1.13/1.63 = 0.69 • The odds of getting a 1 for a male are .69 times the odds for a female • OR > 1: increased odds for group 1 relative to 2 • OR = 1: no difference in odds for group 1 relative to 2 • OR < 1: lower odds for group 1 relative to 2
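The odds and odds-ratio arithmetic can be reproduced in a few lines of R. The helper name `odds` is an assumption introduced here, not something from the slides. Note that working from unrounded odds gives a ratio near 1.45; the slide's 1.44 comes from dividing the odds after rounding them to two decimals.

```r
# Hypothetical helper: convert a probability to odds
odds <- function(p) p / (1 - p)

p_male <- 0.53
p_female <- 0.62

odds_male <- odds(p_male)      # about 1.13
odds_female <- odds(p_female)  # about 1.63

# Odds ratio in both directions; the two values are reciprocals
or_female_vs_male <- odds_female / odds_male  # about 1.45 unrounded
or_male_vs_female <- odds_male / odds_female  # about 0.69

print(round(c(odds_male, odds_female, or_female_vs_male, or_male_vs_female), 2))
```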

  7. Modeling Y (Categorical X) • Odds-ratio for a 2 x 2 table • Odds(Hi) • 11/4 • Odds(Lo) • 2/5 • O.R. = (11/4)/(2/5)=8.25 • Odds of HD are 8.25 times larger for high cholesterol

  8. Odds-Ratio • Ranges from 0 to infinity • 0 ← 1 → ∞ • Tends to be skewed • Often transform to log-odds to get symmetry • The log-OR comparing females to males = log(1.44) = 0.36 • The log-OR comparing males to females = log(0.69) = -0.36
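A short check of the symmetry claim: on the raw scale an odds ratio and its reciprocal sit at unequal distances from 1, but their logs differ only in sign. The value 1.44 is taken from the slides; everything else is plain arithmetic.

```r
or <- 1.44          # females vs. males, from the slides
or_inverse <- 1 / or  # males vs. females, about 0.69

# Distances from the "no difference" value of 1 are unequal on the raw scale
print(c(above = or - 1, below = 1 - or_inverse))

# On the log scale the two comparisons are mirror images around 0
log_or <- log(or)
log_or_inverse <- log(or_inverse)
print(round(c(log_or, log_or_inverse), 2))
```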

  9. Modeling Y (Continuous X) • We need to form a general prediction model • Standard OLS regression won’t work • The errors of a dichotomous variable cannot be normally distributed with constant variance • Also, the estimated parameters don’t make much sense • Let’s look at a scatterplot of dichotomous data…
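A quick simulation makes the OLS problem concrete. The data here are simulated (the seed, sample size, and a slope of 2 on the logit scale are arbitrary assumptions), not the data behind the scatterplots in the slides.

```r
# Simulated 0/1 outcome whose success probability rises with x
set.seed(1)
x <- seq(-4, 4, length.out = 200)
y <- rbinom(length(x), size = 1, prob = plogis(2 * x))  # plogis = logistic function

# Ordinary least squares fit to the 0/1 data
ols <- lm(y ~ x)
fitted_range <- range(fitted(ols))

# The straight line predicts "probabilities" below 0 and above 1
print(fitted_range)
```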

  10. Dichotomous Scatterplot • What smooth function can we use to model something that looks like this?

  11. Dichotomous Scatterplot • OLS regression? Smooth but…

  12. Dichotomous Scatterplot • Could break X into groups to form a more continuous scale for Y • proportion or percentage scale

  13. Dichotomous Scatterplot • Now, plot the categorized data • Notice the “S” shape? = sigmoid • Notice that we just shifted to a continuous scale?
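The grouping idea from the last two slides can be sketched as follows. The data are simulated under assumed values (seed, n = 500, intercept -5, slope 1), and the ten equal-width bins are an arbitrary choice.

```r
# Simulated data: x on a 0-10 scale, y is 0/1
set.seed(2)
x <- runif(500, min = 0, max = 10)
y <- rbinom(500, size = 1, prob = plogis(-5 + x))

# Break x into ten groups and compute the proportion of 1s in each group
bins <- cut(x, breaks = 0:10, include.lowest = TRUE)
prop <- tapply(y, bins, mean)

# Proportions start near 0, rise fastest in the middle, and level off near 1
print(round(prop, 2))
```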

  14. Dichotomous Scatterplot • We can fit a smooth function by modeling the probability of success (“1”) directly • Model the probability of a ‘1’ rather than the (0,1) data directly

  15. Another Example

  16. Another Example (cont)

  17. Logistic Equation • E(y|x) = π(x) = probability that a person with a given x-score will have a score of ‘1’ on Y • π(x) = e^u / (1 + e^u), where u = α + βx • Could just expand u to include more predictors for a multiple logistic regression

  18. Logistic Regression • α - shifts the distribution (changes the value of x where π = .5) • β - reflects the steepness of the transition (slope)
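The roles of the two parameters can be seen by evaluating the curve for a few settings. This sketch assumes the usual parameterization u = α + βx, under which π = .5 at x = -α/β; the function name `logistic` and all parameter values are made up for illustration.

```r
# Logistic function under the assumed form u = alpha + beta * x
logistic <- function(x, alpha, beta) {
  u <- alpha + beta * x
  exp(u) / (1 + exp(u))
}

x <- seq(0, 10, by = 2)

base    <- logistic(x, alpha = -5,  beta = 1)  # pi = .5 at x = 5
shifted <- logistic(x, alpha = -3,  beta = 1)  # pi = .5 at x = 3 (curve moves left)
steeper <- logistic(x, alpha = -10, beta = 2)  # pi = .5 at x = 5, sharper transition

print(round(rbind(base, shifted, steeper), 3))
```

R's built-in `plogis()` computes the same function, so `plogis(alpha + beta * x)` can replace the hand-written helper.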

  19. Features of Logistic Regression • Change in probability is not constant (linear) with constant changes in X • probability of a success (Y = 1) given the predictor variable (X) is a non-linear function • Can rewrite the logistic equation as an odds: π(x) / (1 - π(x)) = e^(α + βx)
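A small numeric illustration of the non-constant change, using made-up parameter values (intercept -5, slope 1) under the assumed form α + βx: equal one-unit steps in X produce unequal steps in probability, while the odds follow the exponential form exactly.

```r
alpha <- -5  # illustrative value, not from the slides
beta <- 1    # illustrative value, not from the slides

x <- 0:10
p <- plogis(alpha + beta * x)

# Change in probability for each one-unit step in x: small, large, small
steps <- diff(p)
print(round(steps, 3))

# The odds p / (1 - p) equal exp(alpha + beta * x)
odds <- p / (1 - p)
print(isTRUE(all.equal(odds, exp(alpha + beta * x))))
```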

  20. Logit Transform • Can linearize the logistic equation by using the “logit” transformation • apply the natural log to both sides of the equation • Yields the logit or log-odds: ln[π(x) / (1 - π(x))] = α + βx

  21. Logit Transformation • The logit transformation puts the interpretation of the regression estimates back on familiar footing • α = expected value of the logit (log-odds) when X = 0 • β = ‘logit difference’ = the amount the logit (log-odds) changes with a one-unit change in X

  22. Logit • Logit • the natural log of the odds • often called a log odds • logit scale is continuous, linear, and functions much like a z-score scale. • p = 0.50, then logit = 0 • p = 0.70, then logit = 0.85 • p = 0.30, then logit = -0.85
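The three logit values can be verified with R's built-in `qlogis()`, which is the logit function, or by taking the log of the odds by hand. Unrounded, the logit of .70 is about 0.847.

```r
p <- c(0.50, 0.70, 0.30)

# Two equivalent ways to compute the logit (log-odds)
logit_builtin <- qlogis(p)
logit_by_hand <- log(p / (1 - p))

print(round(logit_builtin, 3))

# plogis() is the inverse: it maps logits back to probabilities
print(plogis(logit_builtin))
```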

  23. Odds-Ratios and Logistic Regression • The slope may also be interpreted as the log odds-ratio associated with a unit increase in x • exp(β) = odds-ratio • Compare the log odds (logit) of a person with a score of x to a person with a score of x+1
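The comparison of x with x+1 can be carried out numerically. The intercept and slope below are hypothetical values chosen for illustration, and `odds_at` is a helper name introduced here; the point is that the odds ratio for a one-unit increase is the same at every starting value of x.

```r
alpha <- -2   # hypothetical intercept
beta <- 0.5   # hypothetical slope

# Odds of a 1 at a given x under the logistic model
odds_at <- function(x) {
  p <- plogis(alpha + beta * x)
  p / (1 - p)
}

x <- c(0, 3, 10)

# Odds ratio for x + 1 versus x, at three different starting points
or_unit <- odds_at(x + 1) / odds_at(x)
print(or_unit)
print(exp(beta))  # same value as each entry above
```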

  24. There and back again… • If the data are consistent with a logistic function, then the relationship between X and the logit is linear • The logit scale is somewhat difficult to understand • Could interpret as odds but people seem to prefer probability as the natural scale, so…

  25. There and back again… • Logit → Odds → Probability
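The trip from logit to odds to probability uses two standard identities: odds = exp(logit) and p = odds / (1 + odds). A minimal sketch, using three example logit values:

```r
logit <- c(-0.85, 0, 0.85)  # example values on the logit scale

# Step 1: logit -> odds
odds <- exp(logit)

# Step 2: odds -> probability
prob <- odds / (1 + odds)

print(round(cbind(logit, odds, prob), 3))

# And back again: probability -> odds -> logit
logit_back <- log(prob / (1 - prob))
```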

  26. Estimation • Don’t meet OLS assumptions so some variant of MLE is used • Let’s develop the likelihood • Assuming observations are independent…

  27. Estimation • Likelihood • recall..

  28. Estimation • Upon substitution…
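For reference, a sketch of the estimation idea, assuming the standard likelihood for independent 0/1 observations, L = Π π(x_i)^(y_i) (1 - π(x_i))^(1 - y_i), with logit π(x) = α + βx. The data are simulated with made-up parameter values, and the function names are introduced here. Maximizing the log-likelihood with a general-purpose optimizer should land on essentially the same estimates as `glm()`.

```r
# Simulated data (seed, n, and true parameters are illustrative assumptions)
set.seed(3)
x <- runif(300, min = 0, max = 10)
y <- rbinom(300, size = 1, prob = plogis(-5 + 1 * x))

# Negative log-likelihood; log.p = TRUE returns log(pi) in a numerically stable way,
# and plogis(-u, log.p = TRUE) is log(1 - pi)
neg_loglik <- function(par) {
  u <- par[1] + par[2] * x
  -sum(y * plogis(u, log.p = TRUE) + (1 - y) * plogis(-u, log.p = TRUE))
}

# Gradient of the negative log-likelihood with respect to (alpha, beta)
neg_grad <- function(par) {
  p <- plogis(par[1] + par[2] * x)
  -c(sum(y - p), sum((y - p) * x))
}

fit_optim <- optim(c(0, 0), neg_loglik, gr = neg_grad, method = "BFGS")
fit_glm <- glm(y ~ x, family = binomial)

print(rbind(optim = fit_optim$par, glm = unname(coef(fit_glm))))
```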

  29. Example • Heart Disease & Age • 100 participants • DV = presence of heart disease • IV = Age

  30. Heart Disease Example

  31. Heart Disease Example • library(MASS) • glm(formula = y ~ x, family = binomial, data = mydata)

      Coefficients:
                  Estimate Std. Error z value Pr(>|z|)
      (Intercept) -5.30945    1.13365  -4.683 2.82e-06 ***
      age          0.11092    0.02406   4.610 4.02e-06 ***

      Null deviance: 136.66 on 99 degrees of freedom
      Residual deviance: 107.35 on 98 degrees of freedom
      AIC: 111.35
      Number of Fisher Scoring iterations: 4

  32. Heart Disease Example • Logistic regression • Odds-Ratio • exp(.111)=1.117
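The reported estimates can be turned into odds ratios and predicted probabilities by hand. The coefficients below are copied from the output on the previous slide; the ages 30, 50, and 70 and the ten-year comparison are illustrative choices, not values from the slides.

```r
b0 <- -5.30945  # intercept from the reported glm output
b1 <- 0.11092   # age coefficient from the reported glm output

# Odds ratio for a one-year and a ten-year increase in age
or_1yr <- exp(b1)        # about 1.117
or_10yr <- exp(10 * b1)  # about 3.03

# Predicted probability of heart disease at a few illustrative ages
age <- c(30, 50, 70)
p_hat <- plogis(b0 + b1 * age)

# Age at which the predicted probability is .5 (logit = 0)
age_at_half <- -b0 / b1  # about 47.9

print(round(c(or_1yr = or_1yr, or_10yr = or_10yr, age_at_half = age_at_half), 3))
print(round(setNames(p_hat, age), 3))
```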

  33. Heart Disease Example • In terms of logits…
