Lecture 8: Selection Bias, Matching, & Control Selection Matthew Fox Advanced Epidemiology
Selection bias or confounding? • Comparison of mortality among office workers and longshoremen from MI • Comparison is biased because those who self-select into longshoremen are fitter which leads to less MI • What is the bias?
In a case control study, can we match cases to controls based on exposure?
Misclassification Summary I #1 Non-differential and independent misclassification of dichotomous exposure or disease (usually) creates an expectation that estimates of effect are biased towards the null. #2 Non-differential and independent misclassification of a covariate creates an expectation that the relative risk due to confounding is biased towards the null, yielding residual confounding.
Misclassification Summary II #3 Errors due to misclassification can be corrected algebraically #4 Differential misclassification yields an unpredictable bias of the estimates of effect (still correctable). #5 There are important exceptions to the mantra that “non-differential misclassification biases towards the null.”
This Session • Selection bias • Definition & control • Matching • Cohort vs. Case-control studies • When to adjust, when not to adjust • Control selection • Adjustment • Is it possible?
Selection bias — definition • Distortions of the estimate of effect arising from procedures to select subjects and from factors that influence participation • Common element is that the exposure-disease relation is different among participants than among those theoretically eligible • Observed estimate of effect reflects a mixture of forces affecting participation and forces affecting disease occurrence
Separate from Confounding • Cohort studies don’t have selection bias at entry even if subjects self-select • Selection into cohort can create confounding, but this can be undone by adjustment • Or becomes an issue of generalizability • Cohort studies/RCTs can have selection bias at end through differential loss to follow-up (LTFU) • Some can be undone if we know enough about the selection mechanism
Selection bias — Fallacy • Formerly frequently viewed as disease-dependent selection forces • Exposure-dependent selection forces were thought to be confounders or part of the population definition. • Sometimes selection factors can be controlled as if they were confounders • For example, matched factors in case-control studies and two-stage studies. • However, not all selection factors related to exposure can be so treated
Selection bias • OR = [50/4000] / [40/8000] = 2.5
Selection forces don’t create bias if they are not related to both exposure and disease
Selection bias • OR = [50/4000] / [100/8000] = 1
Selection bias • OR = [50/4000] / [50/4000] = 1
Selection bias • OR = [50/10000] / [40/8000] = 1
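The ratios on the preceding slides can be checked with a few lines of exact arithmetic. This is only a sketch that reproduces the numbers as written; the helper name `ratio` and the labels in `SLIDES` are illustrative and not from the lecture.

```python
from fractions import Fraction


def ratio(a, n1, b, n0):
    """Return (a / n1) / (b / n0) as an exact fraction."""
    return Fraction(a, n1) / Fraction(b, n0)


# Counts copied from the four slides; the labels are hypothetical.
SLIDES = {
    "slide A": (50, 4000, 40, 8000),
    "slide B": (50, 4000, 100, 8000),
    "slide C": (50, 4000, 50, 4000),
    "slide D": (50, 10000, 40, 8000),
}

RESULTS = {name: ratio(*counts) for name, counts in SLIDES.items()}

if __name__ == "__main__":
    for name, value in RESULTS.items():
        print(f"{name}: ratio = {float(value):.2f}")
```

Only the first set of counts gives a ratio away from 1; the other three give exactly 1.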
Selection Bias Occurs When Selection is Related to Both the Exposure and the Outcome Sounds like confounding, but this time E and D affect Selection
Selection Bias in a Case Control Study: • Case-control study of the relationship between estrogens and myocardial infarction • Cases are those hospitalized for MI • Controls are those hospitalized for hip fracture • Could this cause selection bias?
Selection Bias in a Case Control Study: • E= estrogens D = myocardial infarction • F= hip fracture C = selection into study Selection bias occurs because we condition on a common effect of both E and D
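A minimal numeric sketch of this structure, under assumptions that are not in the lecture: estrogens (E) have no effect on MI (D), so the true odds ratio is 1; E halves the risk of hip fracture (F); and F is independent of D given E. All probabilities below are hypothetical. The exposure odds ratio is computed exactly, once with hip-fracture controls and once with controls drawn from everyone without MI.

```python
from fractions import Fraction
from itertools import product

# Hypothetical probabilities (assumptions for illustration only).
P_E = Fraction(1, 2)                                  # Pr(E = 1)
P_D = Fraction(1, 10)                                 # Pr(D = 1), independent of E
P_F_GIVEN_E = {1: Fraction(1, 20), 0: Fraction(1, 10)}  # Pr(F = 1 | E)


def joint(e, d, f):
    """Joint probability of (E, D, F) under the assumed structure."""
    pe = P_E if e else 1 - P_E
    pd = P_D if d else 1 - P_D
    pf = P_F_GIVEN_E[e] if f else 1 - P_F_GIVEN_E[e]
    return pe * pd * pf


def exposure_odds(keep):
    """Odds of exposure among people for whom keep(d, f) is true."""
    pairs = [(d, f) for d, f in product((0, 1), repeat=2) if keep(d, f)]
    exposed = sum(joint(1, d, f) for d, f in pairs)
    unexposed = sum(joint(0, d, f) for d, f in pairs)
    return exposed / unexposed


cases = exposure_odds(lambda d, f: d == 1)
hip_controls = exposure_odds(lambda d, f: d == 0 and f == 1)
pop_controls = exposure_odds(lambda d, f: d == 0)

or_hospital = cases / hip_controls     # controls selected through F
or_population = cases / pop_controls   # controls selected independently of E

if __name__ == "__main__":
    print("OR with hip-fracture controls:", or_hospital)
    print("OR with population controls:", or_population)
```

Under these assumptions the hip-fracture controls give an odds ratio of 2 even though E has no effect on D, while population controls give 1. The direction and size of the bias depend entirely on the assumed E-F relation.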
Selection Bias in a Cohort Study: • Cohort study of relationship between HAART and progression to AIDS • LTFU occurs more among those with low CD4 • LTFU occurs more among those with AIDS • But now selection out occurs before AIDS • Could this cause selection bias?
Selection Bias in a Cohort Study: Differential LTFU • E = ART, D = AIDS, L = vector of symptoms • U = True immunosuppression (unmeasured) • C = Drop out (LTFU) Selection bias occurs because we condition on C, a common effect of both E and U, where U is a common cause of C and D
Selection Bias in a Cohort Study: Differential LTFU • E = ART, D = AIDS, L = vector of symptoms • U = True immunosuppression (unmeasured) • C= Drop out
Selection Bias vs. Confounding • Bias is a systematic difference between the truth and the observed • $\Pr[Y^{a=1}=1] - \Pr[Y^{a=0}=1] \neq \Pr[Y=1 \mid a=1] - \Pr[Y=1 \mid a=0]$ • Separate from random error which is not structural • Using DAGs we can see the common structures • Confounding = common causes (directly or through other mechanisms) • Selection bias = conditioning on common effects
To see the difference • Comparison of mortality among office workers and longshoremen from MI • Comparison is biased because those who self-select into longshoremen are fitter which leads to less MI • What is the DAG? (Nodes: Occupation, MI, Fitness)
Adjustment for loss to follow-up through weighting • Because selection bias means we are only looking at those included in the study, we can’t adjust through stratification • We don’t have the data on those not included • Can use weighting, because this does not require us to have outcome data on those missing • Inverse probability of censoring weighting • Assumes we have enough data to predict the dropout
Further stratify IPC weights for predictors of censoring • As shown, the weighting assumes those lost are the same as those retained • Not likely to be true • Calculate weights within levels of predictors of censoring • Valid if we can produce conditional exchangeability between those lost and those not lost • Weights can be multiplied by inverse probability of treatment (IPT) weights to simultaneously adjust for confounding
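The weighting idea can be sketched with a small hypothetical cohort. All counts below are invented for illustration: treatment A has no effect (the full-cohort risk is 0.22 in both arms by construction), a symptom indicator L predicts both the outcome and dropout, and dropout among those with L = 1 is more common in the treated arm. Within each (A, L) stratum dropout is assumed to be random, which is the conditional exchangeability assumption named on the slide.

```python
from fractions import Fraction

# Hypothetical cohort: (A, L) -> (n enrolled, n retained, events among retained)
COHORT = {
    (1, 0): (600, 540, 54),
    (1, 1): (400, 200, 80),
    (0, 0): (600, 540, 54),
    (0, 1): (400, 320, 128),
}


def complete_case_risk(a):
    """Risk among those retained in arm a, ignoring dropout."""
    rows = [v for (arm, _), v in COHORT.items() if arm == a]
    events = sum(ev for _, _, ev in rows)
    kept = sum(k for _, k, _ in rows)
    return Fraction(events, kept)


def ipcw_risk(a, stratify_on_l):
    """Risk in arm a, weighting retained subjects by 1 / Pr(retained).

    With stratify_on_l=False the retention probability depends on A only;
    with stratify_on_l=True it is estimated within levels of L as well.
    """
    rows = [v for (arm, _), v in COHORT.items() if arm == a]
    n_arm = sum(n for n, _, _ in rows)
    kept_arm = sum(k for _, k, _ in rows)
    weighted_events = Fraction(0)
    weighted_n = Fraction(0)
    for n, kept, events in rows:
        p_kept = Fraction(kept, n) if stratify_on_l else Fraction(kept_arm, n_arm)
        weight = 1 / p_kept
        weighted_events += weight * events
        weighted_n += weight * kept
    return weighted_events / weighted_n


if __name__ == "__main__":
    for arm in (1, 0):
        print(
            f"A={arm}: complete-case {float(complete_case_risk(arm)):.3f}, "
            f"IPCW by A {float(ipcw_risk(arm, False)):.3f}, "
            f"IPCW by A and L {float(ipcw_risk(arm, True)):.3f}"
        )
```

Weights that depend on A alone reproduce the complete-case risks, because everyone in an arm gets the same weight. Weights computed within levels of L recover the full-cohort risk of 0.22 in both arms, but only because dropout was built to be random within strata.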