This comprehensive guide covers Logistic Regression, Odds Ratios, and dichotomous response variables in Psych 818. Learn about modeling categorical and continuous variables, interpreting odds ratios, and utilizing the logit transformation in logistic regression analysis.
Logistic Regression and Odds Ratios Psych 818 - DeShon
Dichotomous Response • Used when the outcome or DV is a dichotomous, random variable • Can only take one of two possible values (1,0) • Pass/Fail • Disease/No Disease • Agree/Disagree • True/False • Present/Absent • This data structure causes problems for OLS regression
Dichotomous Response • Properties of dichotomous response variables (Y) • POSITIVE RESPONSE (Success = 1): probability p • NEGATIVE RESPONSE (Failure = 0): probability q = (1-p) • Mean: E(Y) = p, estimated by the observed proportion of successes • Var(Y) = p*q • Oops! The variance depends on the mean, so it cannot be constant across values of p
Dichotomous Response • Let's generate some (0,1) data • Y <- rbinom(n=1000,size=1,prob=.3) • mean(Y) = 0.295 • $\mu = p = .3$ • var(Y) = 0.208 • $\sigma^2 = pq = (.3 \times .7) = .21$ • hist(Y)
Describing Dichotomous Data • Proportion of successes (p) • Odds • Odds of an event is the probability it occurs divided by the probability it does not occur • p/(1-p) • if p=.53; odds=.53/.47 = 1.13
Modeling Y (Categorical X) • Odds Ratio • Used to compare two proportions across groups • odds for males = .53/(1-.53) = 1.13 • odds for females = .62/(1-.62) = 1.63 • Odds-ratio = 1.63/1.13 = 1.44 • The odds of a 1 are 1.44 times as large for a female as for a male • Or… 1.13/1.63 = 0.69 • The odds of a 1 for a male are .69 times the odds for a female • OR > 1: increased odds for group 1 relative to 2 • OR = 1: no difference in odds for group 1 relative to 2 • OR < 1: lower odds for group 1 relative to 2
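The odds and odds-ratio arithmetic on this slide can be checked with a few lines of R. This is a minimal sketch: the two proportions (.53 for males, .62 for females) come from the slide, while the helper name `odds` and the variable names are hypothetical. Working from unrounded odds gives a ratio slightly above the 1.44 obtained from the rounded odds.

```r
# Proportions of "1" responses taken from the slide
p_male   <- 0.53
p_female <- 0.62

# Hypothetical helper: odds = probability of the event / probability of no event
odds <- function(p) p / (1 - p)

odds_male   <- odds(p_male)
odds_female <- odds(p_female)

# Odds ratios in both directions (one is the reciprocal of the other)
or_f_vs_m <- odds_female / odds_male
or_m_vs_f <- odds_male / odds_female

# Log odds-ratios are symmetric around zero
print(round(c(odds_male = odds_male, odds_female = odds_female), 2))
print(round(c(or_f_vs_m = or_f_vs_m, or_m_vs_f = or_m_vs_f), 2))
print(round(c(log_or_f_vs_m = log(or_f_vs_m), log_or_m_vs_f = log(or_m_vs_f)), 2))
```

Because the two odds ratios are reciprocals, their logs differ only in sign, which is the symmetry that motivates the log-odds scale.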
Modeling Y (Categorical X) • Odds-ratio for a 2 x 2 table • Odds(Hi) = 11/4 = 2.75 • Odds(Lo) = 2/5 = 0.40 • O.R. = (11/4)/(2/5) = 6.875 • Odds of HD are about 6.9 times as large for high cholesterol
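Odds-Ratio • Ranges from 0 to infinity, with 1 (no difference) as the neutral point: 0 ← 1 → ∞ • Tends to be skewed: ratios below 1 are squeezed between 0 and 1, while ratios above 1 are unbounded • Often transform to log-odds to get symmetry • The log-OR comparing females to males = log(1.44) = 0.36 • The log-OR comparing males to females = log(1/1.44) = -0.36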
Modeling Y (Continuous X) • We need to form a general prediction model • Standard OLS regression won’t work • The errors of a dichotomous variable cannot be normally distributed with constant variance • Also, the estimated parameters don’t make much sense (for example, predicted values can fall below 0 or above 1) • Let’s look at a scatterplot of dichotomous data…
Dichotomous Scatterplot • What smooth function can we use to model something that looks like this?
Dichotomous Scatterplot • OLS regression? Smooth but…
Dichotomous Scatterplot • Could break X into groups to form a more continuous scale for Y • proportion or percentage scale
Dichotomous Scatterplot • Now, plot the categorized data • Notice the “S” shape? It is a sigmoid • Notice that we just shifted to a continuous scale?
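The grouping idea can be sketched in R with simulated data. Everything here is an assumption made for illustration: the predictor range (20 to 70), the "true" curve used to generate the 0/1 outcomes, the seed, and the ten equal-width groups are not taken from the slides.

```r
set.seed(818)                                  # arbitrary seed for reproducibility
n <- 1000
x <- runif(n, min = 20, max = 70)              # hypothetical continuous predictor
p_true <- plogis(-5 + 0.11 * x)                # assumed generating curve (simulation only)
y <- rbinom(n, size = 1, prob = p_true)        # dichotomous (0,1) outcome

# Break X into 10 equal-width groups
x_group <- cut(x, breaks = seq(20, 70, by = 5), include.lowest = TRUE)

# Proportion of 1s within each group: a continuous (0 to 1) scale for Y
prop_by_group <- tapply(y, x_group, mean)
mid_points <- seq(22.5, 67.5, by = 5)          # midpoint of each group

print(round(data.frame(mid_points, prop_by_group), 2))
```

Plotting the result, for example with `plot(mid_points, prop_by_group, type = "b")`, should show proportions that rise slowly, then quickly, then level off, which is the sigmoid pattern described on the slide.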
Dichotomous Scatterplot • We can fit a smooth function by modeling the probability of success (a ‘1’) rather than the (0,1) data directly
Logistic Equation • $E(y|x) = \pi(x) = \dfrac{e^{u}}{1+e^{u}}$, where $u = \beta_0 + \beta_1 x$ • $\pi(x)$ = probability that a person with a given x-score will have a score of ‘1’ on Y • Could just expand u to include more predictors for a multiple logistic regression
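Logistic Regression • $\beta_0$ (the intercept in $u = \beta_0 + \beta_1 x$) shifts the distribution; together with $\beta_1$ it sets the value of x where $\pi(x) = .5$, namely $x = -\beta_0/\beta_1$ • $\beta_1$ reflects the steepness of the transition (slope)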
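The roles of the two parameters can be seen by evaluating the logistic function for a few settings. This is a sketch only: the function name `logistic`, the argument names `b0` and `b1` (standing in for the intercept and slope), and the chosen parameter values are all assumptions for illustration.

```r
# Logistic function with intercept b0 and slope b1 (names assumed for this sketch)
logistic <- function(x, b0, b1) exp(b0 + b1 * x) / (1 + exp(b0 + b1 * x))

x <- c(-4, -2, 0, 2, 4)
curves <- data.frame(
  x        = x,
  baseline = logistic(x, b0 = 0, b1 = 1),   # crosses .5 at x = 0
  shifted  = logistic(x, b0 = 2, b1 = 1),   # same slope, crosses .5 at x = -2
  steeper  = logistic(x, b0 = 0, b1 = 3)    # same midpoint, sharper transition
)
print(round(curves, 3))

# Value of x where the probability equals .5
midpoint <- function(b0, b1) -b0 / b1
print(midpoint(b0 = 2, b1 = 1))
```

Base R's `plogis()` computes the same function, so `plogis(b0 + b1 * x)` is an equivalent and numerically safer way to write it; `curve(plogis(x), -6, 6)` draws the curve.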
Features of Logistic Regression • Change in probability is not constant (linear) with constant changes in X • The probability of a success (Y = 1) given the predictor variable (X) is a non-linear function • Can rewrite the logistic equation as an odds: $\dfrac{\pi(x)}{1-\pi(x)} = e^{\beta_0 + \beta_1 x}$
Logit Transform • Can linearize the logistic equation by using the “logit” transformation • Apply the natural log to both sides of the odds equation • Yields the logit or log-odds: $\ln\!\left(\dfrac{\pi(x)}{1-\pi(x)}\right) = \beta_0 + \beta_1 x$
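The linearization can be written out step by step. This is a sketch in the usual notation, with $\pi(x)$ for the probability of a 1 and $\beta_0$, $\beta_1$ for the intercept and slope; those symbol names are assumptions rather than notation confirmed by the slides.

```latex
\begin{align*}
\pi(x) &= \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}} \\
1 - \pi(x) &= \frac{1}{1 + e^{\beta_0 + \beta_1 x}} \\
\frac{\pi(x)}{1 - \pi(x)} &= e^{\beta_0 + \beta_1 x} \\
\ln\!\left(\frac{\pi(x)}{1 - \pi(x)}\right) &= \beta_0 + \beta_1 x
\end{align*}
```

The third line is the odds form of the model, and the fourth is linear in x, which is what allows regression-style interpretation of the coefficients.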
Logit Transformation • The logit transformation puts the interpretation of the regression estimates back on familiar footing • $\beta_0$ = expected value of the logit (log-odds) when X = 0 • $\beta_1$ = ‘logit difference’ = the amount the logit (log-odds) changes with a one-unit change in X
Logit • Logit • the natural log of the odds • often called the log-odds • logit scale is continuous, linear, and functions much like a z-score scale • p = 0.50, then logit = 0 • p = 0.70, then logit = 0.85 • p = 0.30, then logit = -0.85
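The logit values for these three probabilities can be computed directly. A minimal sketch; the variable names are arbitrary.

```r
p <- c(0.30, 0.50, 0.70)

# Logit = natural log of the odds
logit <- log(p / (1 - p))
print(round(logit, 2))

# Base R's qlogis() is the same transformation
print(round(qlogis(p), 2))
```

The values for p = .30 and p = .70 are equal in size and opposite in sign, which illustrates the symmetry of the logit scale around p = .50.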
Odds-Ratios and Logistic Regression • The slope may also be interpreted as the log odds-ratio associated with a unit increase in x • $\exp(\beta_1)$ = odds-ratio, where $\beta_1$ is the slope • Compare the log odds (logit) of a person with a score of x to a person with a score of x+1
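The comparison of a person at x with a person at x+1 can be worked through as follows. This is a sketch assuming a linear logit with intercept $\beta_0$ and slope $\beta_1$; the symbol names are assumptions.

```latex
\begin{align*}
\operatorname{logit}(x) &= \beta_0 + \beta_1 x \\
\operatorname{logit}(x+1) &= \beta_0 + \beta_1 (x+1) \\
\operatorname{logit}(x+1) - \operatorname{logit}(x) &= \beta_1 \\
\frac{\operatorname{odds}(x+1)}{\operatorname{odds}(x)} &= e^{\beta_1}
\end{align*}
```

The last line follows because a difference of logs is the log of a ratio, so exponentiating the slope gives the odds ratio for a one-unit increase in x, whatever the starting value of x.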
There and back again… • If the data are consistent with a logistic function, then the relationship between the predictor and the logit is linear • The logit scale is somewhat difficult to understand • Could interpret as odds, but people seem to prefer probability as the natural scale, so…
There and back again… • Logit → Odds → Probability • odds = exp(logit) • probability = odds / (1 + odds)
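The round trip between the three scales can be sketched in R. The logit values below are arbitrary illustrations, not values from the slides.

```r
logit <- c(-2, 0, 0.85, 2)          # illustrative logit values

# There: logit -> odds -> probability
odds <- exp(logit)
prob <- odds / (1 + odds)

# And back again: probability -> odds -> logit
odds_back  <- prob / (1 - prob)
logit_back <- log(odds_back)

print(round(data.frame(logit, odds, prob, logit_back), 3))
```

In base R, `plogis(logit)` performs the forward conversion in one step and `qlogis(prob)` performs the reverse.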
Estimation • The data don’t meet OLS assumptions, so some variant of MLE is used • Let’s develop the likelihood • Assuming observations are independent…
Estimation • Likelihood • recall..
Estimation • Upon substitution…
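The three estimation steps can be written out as below. This is a sketch of the standard Bernoulli likelihood under the independence assumption, using assumed notation: $\pi(x_i)$ for the probability of a 1 for observation i, and $\beta_0$, $\beta_1$ for the intercept and slope. It may differ in form from the equations originally shown on the slides.

```latex
\begin{align*}
% Independent observations: the likelihood is a product of Bernoulli terms
L(\beta_0, \beta_1) &= \prod_{i=1}^{n} \pi(x_i)^{y_i}\,\bigl[1 - \pi(x_i)\bigr]^{1 - y_i} \\
% Recall the logistic equation
\pi(x_i) &= \frac{e^{\beta_0 + \beta_1 x_i}}{1 + e^{\beta_0 + \beta_1 x_i}} \\
% Upon substitution
L(\beta_0, \beta_1) &= \prod_{i=1}^{n} \frac{e^{y_i(\beta_0 + \beta_1 x_i)}}{1 + e^{\beta_0 + \beta_1 x_i}} \\
\ln L(\beta_0, \beta_1) &= \sum_{i=1}^{n} \Bigl[\, y_i(\beta_0 + \beta_1 x_i) - \ln\bigl(1 + e^{\beta_0 + \beta_1 x_i}\bigr) \Bigr]
\end{align*}
```

The log-likelihood has no closed-form maximum, so it is maximized iteratively; the Fisher scoring iterations reported by `glm` are one such procedure.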
Example • Heart Disease & Age • 100 participants • DV = presence of heart disease • IV = Age
Heart Disease Example • library(MASS) • glm(formula = y ~ x, family = binomial, data = mydata)
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -5.30945    1.13365  -4.683 2.82e-06 ***
age          0.11092    0.02406   4.610 4.02e-06 ***
Null deviance: 136.66 on 99 degrees of freedom
Residual deviance: 107.35 on 98 degrees of freedom
AIC: 111.35
Number of Fisher Scoring iterations: 4
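Since the heart disease data themselves are not included here, the fitting step can only be sketched with simulated data. Everything about the data below is an assumption: the seed, the age range, and the generating coefficients (chosen to be close to the reported estimates). The estimates it prints will therefore not match the output above.

```r
set.seed(1)                                    # arbitrary seed
n   <- 100
age <- round(runif(n, min = 20, max = 70))     # hypothetical ages
y   <- rbinom(n, size = 1, prob = plogis(-5.3 + 0.11 * age))  # simulated 0/1 outcome
sim <- data.frame(age = age, y = y)

# Logistic regression: binomial family with the default logit link
fit <- glm(y ~ age, family = binomial, data = sim)

print(summary(fit)$coefficients)               # estimates, SEs, z values, p values
print(exp(coef(fit)["age"]))                   # odds ratio per one-year increase in age
print(exp(confint.default(fit)))               # Wald 95% intervals on the odds scale
```

`glm` is part of base R's stats package, so no extra library needs to be loaded for this model.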
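Heart Disease Example • Logistic regression: $\hat{\pi}(\text{age}) = \dfrac{e^{-5.31 + 0.111\,\text{age}}}{1 + e^{-5.31 + 0.111\,\text{age}}}$ • Odds-Ratio • exp(.111) = 1.117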
Heart Disease Example • In terms of logits…
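In terms of logits, the fitted model from the reported coefficients reads logit = -5.309 + 0.111 × age. The sketch below uses those two estimates to move between logits, odds, and probabilities; the ages 30, 50, and 70 are illustrative choices, not values taken from the slides.

```r
# Coefficient estimates as reported in the glm output
b0 <- -5.30945    # intercept
b1 <-  0.11092    # slope for age

age <- c(30, 50, 70)              # illustrative ages (assumed)

logit <- b0 + b1 * age            # linear on the logit scale
odds  <- exp(logit)               # odds of heart disease
prob  <- odds / (1 + odds)        # predicted probability

print(round(data.frame(age, logit, odds, prob), 3))

# Odds ratio for a one-year and a ten-year increase in age
print(round(c(per_year = exp(b1), per_decade = exp(10 * b1)), 3))
```

Each additional year multiplies the odds by the same factor, exp(b1), but the change in probability depends on where along the curve a person starts.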