
Logistic Regression III: Advanced topics


silas

Presentation Transcript


  1. Logistic Regression III: Advanced topics

  2. Conditional Logistic Regression for Matched Data

  3. Recall: Matching • Matching can control for extraneous sources of variability and increase the power of a statistical test. • Match M controls to each case based on potential confounders, such as age and gender. • If the data are matched, you must account for the matching in the statistical analysis!!

  4. Recall: Agresti example, diabetes and MI. Match each MI case to a control based on age and gender. Ask about history of diabetes to find out whether diabetes increases the risk of MI.

  5. Matched pairs, MI cases (rows) by MI controls (columns):

                               Controls: Diabetes   Controls: No diabetes   Total
     Cases: Diabetes                    9                    37               46
     Cases: No diabetes                16                    82               98
     Total                             25                   119              144

     odds(“favors” case/discordant pair) = 37/16 ≈ 2.31

  6. Conditional Logistic Regression

  7. The Conditional Likelihood: each discordant stratum (rather than each individual) gets 1 term in the likelihood. For each stratum, we add to the likelihood the CONDITIONAL probability that the case got disease and the control did not, given that we have a case-control pair. The numerator is the probability (as a function of exposures) that the case gets disease and the control does not. The denominator is the probability that the case (with all her exposures) gets disease and the control does not OR that the control (with all her exposures) gets disease and the case does not. Note: the marginal probability of disease may differ in each age-gender stratum, but we assume that the (multiplicative) increase in disease risk due to exposure is constant across strata.
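
The numerator and denominator described in words on this slide can be written out as a single fraction. This is a sketch in assumed notation (the slide's own formula did not survive in the transcript): D means "gets disease", and x_case and x_control are the exposures of the two members of a pair.

$$
P(\text{case diseased, control not} \mid \text{exactly one of the pair diseased})
= \frac{P(D \mid x_{\text{case}})\,[1 - P(D \mid x_{\text{control}})]}
       {P(D \mid x_{\text{case}})\,[1 - P(D \mid x_{\text{control}})] + P(D \mid x_{\text{control}})\,[1 - P(D \mid x_{\text{case}})]}
$$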

  8. Recall probability terms:

  9. [Figure: four 2×2 tables with 0/1 entries, one per case-control pair. Columns: Case (MI), Control. Rows: Diabetes, No diabetes.]

  10. Within each age-gender stratum, the case and the control share the same baseline odds of disease, but these baseline odds may differ across strata. The conditional likelihood =
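
The formula for this slide is missing from the transcript. The following is a sketch of the standard form, in assumed notation: α_k is the baseline log odds in stratum k, β is the log odds ratio for exposure, and x is the exposure. If the logit of P(D | x) in stratum k is α_k + βx, then both products in the conditional probability share the denominator (1 + e^{α_k + βx_case})(1 + e^{α_k + βx_control}), which cancels, and so does e^{α_k}:

$$
\frac{e^{\alpha_k + \beta x_{\text{case}}}}{e^{\alpha_k + \beta x_{\text{case}}} + e^{\alpha_k + \beta x_{\text{control}}}}
= \frac{e^{\beta x_{\text{case}}}}{e^{\beta x_{\text{case}}} + e^{\beta x_{\text{control}}}},
\qquad
L(\beta) = \prod_{k} \frac{e^{\beta x_{\text{case},k}}}{e^{\beta x_{\text{case},k}} + e^{\beta x_{\text{control},k}}}
$$

Because α_k drops out, the baseline odds never have to be estimated. Concordant pairs (x_case = x_control) each contribute a constant factor of 1/2 and carry no information about β.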

  11. Conditional Logistic Regression

  12. Example: MI and diabetes
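
The worked content of this slide is missing from the transcript. The sketch below shows one way the example could be worked, assuming the discordant counts from the matched-pairs table (37 pairs where only the case has diabetes, 16 where only the control does) and the standard conditional likelihood for 1:1 matching. The function names are illustrative, not from the slides.

```python
import math

# Discordant pairs from the diabetes/MI matched-pairs table
CASE_ONLY_EXPOSED = 37     # case has diabetes, control does not
CONTROL_ONLY_EXPOSED = 16  # control has diabetes, case does not
N_DISCORDANT = CASE_ONLY_EXPOSED + CONTROL_ONLY_EXPOSED


def log_likelihood(beta):
    """Conditional log-likelihood, dropping the constant from concordant pairs."""
    return CASE_ONLY_EXPOSED * beta - N_DISCORDANT * math.log(1.0 + math.exp(beta))


def fit_beta(tol=1e-12, max_iter=50):
    """Maximize the conditional log-likelihood by Newton-Raphson."""
    beta = 0.0
    for _ in range(max_iter):
        p = 1.0 / (1.0 + math.exp(-beta))          # e^beta / (1 + e^beta)
        score = CASE_ONLY_EXPOSED - N_DISCORDANT * p  # first derivative
        information = N_DISCORDANT * p * (1.0 - p)    # minus second derivative
        step = score / information
        beta += step
        if abs(step) < tol:
            break
    # Standard error from the information evaluated at the final estimate
    p = 1.0 / (1.0 + math.exp(-beta))
    se = math.sqrt(1.0 / (N_DISCORDANT * p * (1.0 - p)))
    return beta, se


beta_hat, se = fit_beta()
odds_ratio = math.exp(beta_hat)
print(f"beta = {beta_hat:.4f}, SE = {se:.4f}, OR = {odds_ratio:.4f}")
```

With a single binary exposure the maximum has a closed form, so the numerical answer should agree with ln(37/16), and the odds ratio with the ratio of discordant pairs, 37/16.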

  13. Conditional Logistic Regression

  14. In SAS…
      proc logistic data = YourData;
        model MI (event = "Yes") = diabetes;
        strata PairID;
      run;

  15. Example: Prenatal ultrasound examinations and risk of childhood leukemia: case-control study. BMJ 2000;320:282-283 • Could there be an association between exposure to ultrasound in utero and an increased risk of childhood malignancies? • Previous studies have found no association, but they have had poor statistical power to detect an association. • Swedish researchers performed a nationwide population-based case-control study using prospectively assembled data on prenatal exposure to ultrasound.

  16. Example: Prenatal ultrasound examinations and risk of childhood leukemia: case-control study. BMJ 2000;320:282-283 • 535 cases: all children born and diagnosed as having myeloid leukemia between 1973 and 1989 in Swedish registers of birth, cancer, and causes of death. • 535 matched controls: 1 control was randomly selected for each case from the Swedish Birth Registry, matched by sex and year and month of birth.

  17. Matched pairs, leukemia cases (rows) by myeloid leukemia controls (columns):

                                Controls: Ultrasound   Controls: No ultrasound   Total
      Cases: Ultrasound                 115                      85                200
      Cases: No ultrasound              100                     235                335
      Total                             215                     320                535

      But this type of analysis is limited to a single dichotomous exposure…
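
A minimal sketch of the single-exposure analysis this slide refers to. It takes the discordant counts to be 85 (only the case exposed) and 100 (only the control exposed), which is the reading of the slide's numbers that makes the margins add up. The function name, the Wald-type confidence interval, and the uncorrected McNemar statistic are illustrative choices, not taken from the slides.

```python
import math


def matched_pair_summary(case_only, control_only, z=1.96):
    """Odds ratio, approximate 95% CI, and McNemar chi-square from discordant pairs."""
    odds_ratio = case_only / control_only
    # Standard error of the log odds ratio for 1:1 matched pairs
    se_log_or = math.sqrt(1.0 / case_only + 1.0 / control_only)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    # McNemar statistic without continuity correction (1 degree of freedom)
    mcnemar_chi2 = (case_only - control_only) ** 2 / (case_only + control_only)
    return odds_ratio, (lower, upper), mcnemar_chi2


or_, ci, chi2 = matched_pair_summary(case_only=85, control_only=100)
print(f"OR = {or_:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f}), McNemar chi2 = {chi2:.2f}")
```

Only the discordant pairs enter the calculation. The 115 + 235 concordant pairs do not affect the odds ratio.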

  18. Used conditional logistic regression to look at dose-response with number of ultrasounds: • Results: • Reference OR = 1.0; no ultrasounds • OR = 0.91 for 1-2 ultrasounds • OR = 0.64 for >=3 ultrasounds • Conclusion: no evidence of a positive association between prenatal ultrasound and childhood leukemia; there is even evidence of an inverse association (which could be explained by the reasons for frequent ultrasound)

  19. Extension: 1:M matching • Each term in the likelihood represents a stratum of 1+M individuals • More complicated likelihood expression! • Just as easy to implement in SAS as we’ll see Wednesday…

  20. Ordinal Logistic Regression

  21. Ordinal Logistic Regression What if your outcome variable has more than two levels? For ordinal outcomes, use ordinal logistic regression: *Relies on the cumulative logit *Models the predicted probability of multiple outcomes *Also known as the “proportional odds model”

  22. Ordinal Variable Example: Likert Scale 1 = strongly disagree 2 = disagree 3 = neutral 4 = agree 5 = strongly agree Cumulative outcomes: *strongly agree vs. the rest *agree or strongly agree vs. neutral or any disagreement *neutral or any agreement vs. any disagreement *the rest vs. strongly disagree Ordinal logistic regression gives you a way to model these cumulative outcomes all at once!
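
The cumulative outcomes listed on this slide are the K-1 ways of cutting K ordered levels into "higher" versus "lower". The sketch below enumerates them for the Likert scale; the function name is hypothetical.

```python
LEVELS = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]


def cumulative_splits(ordered_levels):
    """Return the K-1 (higher, lower) splits of K ordered levels, top cut first."""
    splits = []
    for cut in range(len(ordered_levels) - 1, 0, -1):
        higher = ordered_levels[cut:]   # levels at or above the cut
        lower = ordered_levels[:cut]    # levels below the cut
        splits.append((higher, lower))
    return splits


for higher, lower in cumulative_splits(LEVELS):
    print(" or ".join(higher), "vs.", " or ".join(lower))
```

Each split defines one binary outcome. The ordinal model fits all of them together rather than one at a time.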

  23. Ordinal Variable Example: Continuous variable measured crudely 1 = breastfed >=6 months 2 = breastfed 4-5 months 3 = breastfed 2-3 months 4 = breastfed <2 months The outcome variable, breastfeeding, was only measured at limited time points, so it may not be best modeled as a continuous variable in linear regression. Use ordinal logistic!

  24. Another example, 3 levels, from my data on runners: 1 = eumenorrhea (normal menses) (66.6%) 2 = oligomenorrhea (mild irregularity) (24.6%) 3 = amenorrhea (severe irregularity) (8.6%) [Slide annotations: Most “severe” outcome; More inclusive definition of a “positive” outcome]

  25. Cumulative logit, 3 groups (2 potential “positive” outcomes). In words: the log odds of having amenorrhea (versus everything else), and the log odds of having any irregularity (versus normal).
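
The two cumulative logits described in words here can be written as follows. The notation is assumed, since the slide's formulas are not in the transcript: Y is the outcome coded 1 = eumenorrhea, 2 = oligomenorrhea, 3 = amenorrhea.

$$
\operatorname{logit} P(Y = 3) = \ln \frac{P(\text{amenorrhea})}{P(\text{oligomenorrhea}) + P(\text{eumenorrhea})}
$$

$$
\operatorname{logit} P(Y \ge 2) = \ln \frac{P(\text{amenorrhea}) + P(\text{oligomenorrhea})}{P(\text{eumenorrhea})}
$$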

  26. Corresponding logistic model (no predictors) • The intercept-only model, no predictors (two intercepts!): • Log odds (amenorrhea) = α(amen) • Log odds (any irregularity) = α(amen or oligo)

  27. Fitted model: • Logit of amenorrhea: • 8.6% of my sample has amenorrhea • Odds = 8.6/91.4 = .094 • Ln(.094) ≈ -2.36 • Logit of any irregularity: • 33.2% has any irregularity (24.6% + 8.6%), about one third • Odds = 33.2/66.8 ≈ 1/2 • Ln(33.2/66.8) ≈ -.70 • Fitted models are: Log odds (amenorrhea) = -2.36; Log odds (any irregularity) = -0.70
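
The two intercepts can be checked directly from the sample percentages. This is a small sketch using the rounded percentages quoted on the slides, so the last digits may differ slightly from output based on the raw counts.

```python
import math


def log_odds(p):
    """Log odds (logit) of a probability p."""
    return math.log(p / (1.0 - p))


# Sample proportions quoted on the slides
p_oligo = 0.246  # oligomenorrhea
p_amen = 0.086   # amenorrhea

alpha_amen = log_odds(p_amen)           # amenorrhea vs. everything else
alpha_any = log_odds(p_amen + p_oligo)  # any irregularity vs. normal

print(f"log odds (amenorrhea)       = {alpha_amen:.2f}")
print(f"log odds (any irregularity) = {alpha_any:.2f}")
```

In an intercept-only model, each fitted intercept is just the log odds of the corresponding cumulative outcome in the sample.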

  28. Logistic model with predictors: • Log odds (amenorrhea) = α(amen) + β1*X1 + β2*X2 • Log odds (any irregularity) = α(amen or oligo) + β1*X1 + β2*X2 • Note, different intercepts but shared betas (shared slopes)!

  29. Odds ratio interpretation (a):

  30. Odds ratio interpretation (b):

  31. Odds ratio interpretation: • Interpretation of the betas: • e^β = adjusted odds ratio • For every 1-unit increase in X, e^β is the multiplicative increase in the odds of any menstrual irregularity compared with none, and it is also the multiplicative increase in the odds of amenorrhea compared with the other two categories (adjusted for any other predictors in the model). • Note: proportional odds assumption! The odds ratios are the same across different levels of the outcome.
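
A short derivation of why one β serves both outcomes. The notation is assumed: α_j is the intercept for the cumulative outcome Y ≥ j, with j = 2 for any irregularity and j = 3 for amenorrhea. Raising X by one unit while holding other predictors fixed gives

$$
\operatorname{logit} P(Y \ge j \mid X = x + 1) - \operatorname{logit} P(Y \ge j \mid X = x)
= [\alpha_j + \beta (x + 1)] - [\alpha_j + \beta x] = \beta, \qquad j = 2, 3
$$

so the odds ratio is e^β at either cut point. The intercept cancels, which is exactly the proportional odds assumption.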

  32. Example predictor, EDI-A: • Score on the anorexia subscale of the eating disorder inventory (EDI-A)

  33. Cumulative logit plot (4 bins). These lines should be linear and parallel (equal slopes, one beta!). The slopes represent the increase in the log odds of either outcome for every 1-unit increase in EDI-A score. The intercept for any irregularity is the log odds of any irregularity where EDI-A=0. The intercept for amenorrhea is the log odds of amenorrhea where EDI-A=0.

  34. Fitted model with EDI-A:

      Analysis of Maximum Likelihood Estimates

      Parameter     DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
      Intercept 1    1    -3.2630           0.3823           72.8648       <.0001
      Intercept 2    1    -1.3888           0.2478           31.4220       <.0001
      EDIA           1     0.1211           0.0265           20.9065       <.0001

      Log odds (amen) = -3.2630 + 0.1211*EDI-A
      Log odds (any irregularity) = -1.3888 + 0.1211*EDI-A

  35. Fitted Model: Predicted logit at every level of EDI-A

  36. Compare actual data and fitted model:

  37. Fitted model with EDI-A:

      Odds Ratio Estimates

      Effect   Point Estimate   95% Wald Confidence Limits
      EDIA              1.129        1.072        1.189

      For every 1-unit increase in EDI-A score, there’s a 13% increase in the odds of being amenorrheic versus the other two categories and a 13% increase in the odds of being amenorrheic or oligomenorrheic versus normal.
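
The odds ratio and its Wald limits follow from the EDIA estimate and standard error in the fitted-model output (0.1211 and 0.0265). The sketch below assumes the usual 1.96 normal quantile for a 95% interval.

```python
import math

estimate = 0.1211   # EDIA coefficient (log odds ratio per 1-unit increase)
std_error = 0.0265  # its standard error
z = 1.96            # normal quantile for a 95% Wald interval (assumed)

odds_ratio = math.exp(estimate)
lower = math.exp(estimate - z * std_error)
upper = math.exp(estimate + z * std_error)

print(f"OR = {odds_ratio:.3f}, 95% Wald CI = ({lower:.3f}, {upper:.3f})")
```

The interval is computed on the log odds scale and then exponentiated, which is why it is not symmetric around 1.129.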

  38. Predictions: Log odds (amenorrhea) = -3.2630 + 0.1211*EDI-A; Log odds (any irregularity) = -1.3888 + 0.1211*EDI-A. The model predicts that a woman with an EDI-A score of 15 would have:

  39. Predictions for EDI-A = 15: • Any irregularity: predicted logit = .4281, predicted probability = 60.5% • Amenorrhea: predicted logit = -1.446, predicted probability = 19% [Plot annotation: 50% probability line]
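
These predictions can be reproduced from the fitted coefficients, and the two cumulative probabilities can then be split into a probability for each of the three categories. This is a sketch using the rounded coefficients from the output, so the logit for any irregularity comes out near 0.4277 rather than the slide's .4281, which presumably used unrounded estimates. The function names are illustrative.

```python
import math

INTERCEPT_AMEN = -3.2630  # Intercept 1: amenorrhea vs. the other two categories
INTERCEPT_ANY = -1.3888   # Intercept 2: any irregularity vs. normal
SLOPE = 0.1211            # shared EDI-A slope


def inv_logit(x):
    """Convert a log odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def predict(edia):
    """Predicted probability of each menstrual category at a given EDI-A score."""
    p_amen = inv_logit(INTERCEPT_AMEN + SLOPE * edia)  # P(amenorrhea)
    p_any = inv_logit(INTERCEPT_ANY + SLOPE * edia)    # P(any irregularity)
    return {
        "amenorrhea": p_amen,
        "oligomenorrhea": p_any - p_amen,  # difference of cumulative probabilities
        "eumenorrhea": 1.0 - p_any,
    }


probs = predict(15)
for category, p in probs.items():
    print(f"{category}: {p:.3f}")
```

Because the slopes are shared and the amenorrhea intercept is lower, the cumulative probabilities never cross, so the middle category always gets a non-negative probability.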

  40. Advantages & disadvantages • Ordinal logistic is better than running separate logistic models for different outcomes (e.g., one model for amenorrhea, one model for any irregularity) because of the improvement in statistical power! • Ordinal logistic prevents you from having to arbitrarily turn an ordinal variable into a binary variable! • But does require that you meet the proportional odds assumption…
