EPI 5240: Introduction to Epidemiology Case-control Studies November 30, 2009

Dr. N. Birkett, Department of Epidemiology & Community Medicine, University of Ottawa

Case-control studies • More correctly called case-referent studies. • Compare a group of cases to a referent group that reflects the exposure experience of the underlying population that gave rise to the cases. • Cases may be: • Prevalent • Incident • Controls may be: • Prevalent • Incident • Density sampled

Case-control (1) • The key feature is that subjects in the ‘case’ group are selected after they have developed the outcome of interest. • Interviews are done after the fact • This limits the potential for some measures (e.g. biomarkers, psychological state) • and makes the study subject to biases • A comparison group (control group or reference group) is needed • Choosing a suitable group is a major challenge.

Case-control (1A) • Can be done prospectively or retrospectively • Prospective: start recruiting cases on the day the study starts, as new cases of the outcome are diagnosed • Slows down the study but gives better data • Retrospective: choose a date before the start of the study and recruit cases newly diagnosed since that date • Faster, but limits interview options • Deaths (some cases will have died before they can be interviewed) • Do not select prevalent cases (those alive on a target date) • Strong potential for bias.

Case-control studies (2) • Some ‘names’ or labels • Case-control • Case-cohort • Case-base • Case-only • Case-crossover

Case-control studies (3) • Situations where case-control designs are used • Exposure data are difficult or expensive to obtain • Nested case-control • Case-cohort • Disease is rare • Disease has long induction and latent periods • Little is known about the disease • Underlying population is dynamic

Advantages of Case-Control Studies • Relatively quick and cheap (not always; depends on the design used) • Appropriate for studying rare outcomes. • Require fewer subjects than a cohort study (assuming you can find enough cases) • Allow study of multiple potential exposure factors in the same study

Disadvantages of Case-Control Studies • Cannot determine incidence directly (except in special circumstances). • Not appropriate for studying rare exposures. • Higher risk of biases in exposure estimation, etc. • Selection of appropriate comparison group can be hard. • They have a bad reputation • Complex design and methodological features

Case-control (4) • So far, we’ve discussed the traditional view of case-control studies • Select a group of people with the outcome (cases) • Select a group of people to whom to compare the cases (controls) • Compare exposure profiles in the two groups. • No clear rule to logically link the two groups • Originally, people thought this didn’t matter • Sometimes thought of as a backwards cohort study • TROHOC • Logic is ‘backwards’ • From effect to cause

Case-control (5) • The ‘Modern view’ • An alternative way of doing a cohort study, which • Studies only a sample of the cohort members who do not get the outcome • Provides a logical link between the two study groups • Cases and referents • Is more efficient

Case-control (6) • Suppose we wanted to do a cohort study to find out if DDE exposure increases the risk of breast cancer in women. • Recruit 100,000 women without breast cancer and follow them for 20 years. • Collect a blood sample at baseline to determine DDE exposure level. • Analyze the blood samples of all 100,000 women to generate this table of results (count data)

Case-control (7)
               BC+       BC-      Total
DDE High       500    14,500     15,000
DDE Low      1,500    83,500     85,000
Total        2,000    98,000    100,000
CIR = (500/15,000) / (1,500/85,000) = 1.88
OR = 1.92 (1.73 - 2.13)
COST: $500/DDE sample.
TOTAL cost = $500 * 100,000 = $50,000,000
BC+ cost = $500 * 2,000 = $1,000,000
BC- cost = $500 * 98,000 = $49,000,000
CAN WE DO THIS CHEAPER?
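As a check on the arithmetic, the measures on this slide can be recomputed from the four cell counts. This is a minimal Python sketch. The function name `two_by_two_measures` is hypothetical. The confidence interval assumes the Woolf (log-based) method: the slide does not name a method, but this one reproduces its interval. The exact CIR is 170/90 ≈ 1.889, which the slides truncate to 1.88.

```python
import math

def two_by_two_measures(a, b, c, d, z=1.96):
    """Measures for a cohort 2x2 table laid out as on the slide (hypothetical helper).

    a = exposed cases,   b = exposed non-cases
    c = unexposed cases, d = unexposed non-cases
    """
    # Cumulative incidence ratio: risk in exposed / risk in unexposed
    cir = (a / (a + b)) / (c / (c + d))
    # Odds ratio (cross-product ratio)
    odds_ratio = (a * d) / (b * c)
    # Woolf 95% CI: symmetric on the log-OR scale (assumed method)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return cir, odds_ratio, (lower, upper)

cir, or_, ci = two_by_two_measures(a=500, b=14_500, c=1_500, d=83_500)
print(f"CIR = {cir:.3f}, OR = {or_:.2f} ({ci[0]:.2f} - {ci[1]:.2f})")
```

This prints `CIR = 1.889, OR = 1.92 (1.73 - 2.13)`. The OR is close to the CIR here only because breast cancer is rare in the cohort (2%).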

Case-control (8) • 98% of the cost goes to studying women who didn’t get breast cancer. Do we really need 98,000 of them? • Suppose we could reduce the number of BC-negative women to 4,000. • Then the cost would be only $3,000,000 (rather than $50,000,000). • We can’t do this at the start of the cohort because we don’t know which women will develop breast cancer. • BUT, if we wait until the end to do our lab studies, we will know which women developed breast cancer. • Select a random sample of 4,000 BC-negative women • Keep all 2,000 BC-positive women. • Now we only need to do lab tests on 6,000 women.
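The cost comparison is simple arithmetic. The short sketch below assumes a flat $500 per DDE assay, as on the previous slide. The helper `assay_cost` is hypothetical.

```python
COST_PER_ASSAY = 500  # dollars per DDE assay, as on the slide

def assay_cost(n_cases, n_noncases_tested):
    """Lab cost when every case plus n_noncases_tested non-cases are assayed."""
    return COST_PER_ASSAY * (n_cases + n_noncases_tested)

full_cohort = assay_cost(2_000, 98_000)  # assay all 100,000 women
nested = assay_cost(2_000, 4_000)        # all cases plus 4,000 sampled non-cases
share_on_noncases = COST_PER_ASSAY * 98_000 / full_cohort

print(full_cohort, nested, f"{share_on_noncases:.0%}")
```

This prints `50000000 3000000 98%`.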

Case-control (9) Select 4,000 of the 98,000 BC-negative women and generate this table:
               BC+       BC-     Total
DDE High       500       592     1,092
DDE Low      1,500     3,408     4,908
Total        2,000     4,000     6,000
Sampled study: OR = 1.92 (1.68 - 2.19); cost = $3,000,000
Full study: OR = 1.92 (1.73 - 2.13); cost = $50,000,000
The CIR cannot be computed from this study: treating the sampled table as a cohort gives (500/1,092) / (1,500/4,908) = 1.50, not 1.88.
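The sampled table can be reproduced if we assume the controls are a simple random sample of the non-cases and use the expected (rounded) counts instead of one random draw. A real sample would scatter around these numbers. This is a sketch. `odds_ratio_woolf` is a hypothetical helper, and the Woolf (log-based) interval is an assumption.

```python
import math

def odds_ratio_woolf(a, b, c, d, z=1.96):
    """OR and Woolf 95% CI; a, b = exposed cases/controls; c, d = unexposed cases/controls."""
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return or_, math.exp(math.log(or_) - z * se), math.exp(math.log(or_) + z * se)

# Non-cases by DDE level in the full cohort, and the control sampling fraction
noncases_high, noncases_low = 14_500, 83_500
fraction = 4_000 / 98_000

# Expected control counts under simple random sampling of the non-cases
controls_high = round(noncases_high * fraction)  # 591.8 -> 592
controls_low = 4_000 - controls_high             # 3,408

or_, lo, hi = odds_ratio_woolf(500, controls_high, 1_500, controls_low)

# Treating the sampled table as if it were a cohort gives a meaningless "CIR"
naive_cir = (500 / (500 + controls_high)) / (1_500 / (1_500 + controls_low))
print(f"OR = {or_:.2f} ({lo:.2f} - {hi:.2f}); naive CIR = {naive_cir:.2f}")
```

The OR is unchanged because sampling non-cases at the same rate in both exposure groups leaves the exposure odds among controls the same. The interval is wider because there are fewer controls.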

Case-control (10) • This is a NESTED CASE-CONTROL study • The basis for the modern framing of the case-control study. • The comparison group is selected from people who belong to the cohort • They were candidates to become ‘cases’ had they developed the outcome. • Controls were selected from people who remained disease-free throughout follow-up • Prevalent controls • Not the best choice, but close to the traditional case-control approach. • How else could we get a comparison group? • Select a random sample of the entire cohort! • Yes, some people might be included in the study twice.

Case-control (11) Select 4,000 of the 100,000 cohort members (cases included) to generate this table:
               BC+    Referents    Total
DDE High       500          600    1,100
DDE Low      1,500        3,400    4,900
Total        2,000        4,000    6,000
Sampled study: OR = 1.88; cost = $3,000,000
Full study: CIR = 1.88; cost = $50,000,000
CASE-COHORT study design
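The case-cohort table can be rebuilt the same way. The sketch assumes the 4,000 referents are a simple random sample of the whole cohort (so 4% of each exposure group) and uses expected counts. The dictionary names are illustrative.

```python
# Full cohort: cases and cohort size by DDE level
cases = {"high": 500, "low": 1_500}
cohort = {"high": 15_000, "low": 85_000}

# Referents: expected counts from a 4,000-person sample of the WHOLE cohort,
# so future cases are eligible to be referents too
fraction = 4_000 / 100_000
referents = {level: round(n * fraction) for level, n in cohort.items()}

# Case-cohort "OR": exposure odds among cases / exposure odds among referents
case_cohort_or = (cases["high"] / cases["low"]) / (referents["high"] / referents["low"])

# Cumulative incidence ratio from the full cohort, for comparison
cir = (cases["high"] / cohort["high"]) / (cases["low"] / cohort["low"])
print(referents, f"{case_cohort_or:.3f}", f"{cir:.3f}")
```

Both ratios come out as 1.889 (shown as 1.88 on the slides). The referents mirror the cohort's exposure distribution, so the sampling fraction cancels.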

Case-control (12) • CASE-COHORT study (or case-base) • Select the reference group from all people in the cohort • If someone selected for the reference group later gets the outcome, they appear in the study twice: • Once as a case • Once as a referent • The OR from a case-cohort study estimates the CIR from the underlying cohort study; when the referent sample exactly mirrors the cohort’s exposure distribution, as in the table above, the two are algebraically identical
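The identity follows from the expected referent counts. The notation here is introduced for illustration and is not from the slides: a and b are the exposed and unexposed cases, N1 and N0 are the numbers of exposed and unexposed cohort members at the start, and f is the referent sampling fraction. A simple random sample of the whole cohort is then expected to contain f·N1 exposed and f·N0 unexposed referents:

```latex
\mathrm{OR}_{\text{case-cohort}}
  = \frac{a / (f N_1)}{b / (f N_0)}
  = \frac{a / N_1}{b / N_0}
  = \mathrm{CIR}
```

With a = 500, b = 1,500, N1 = 15,000, N0 = 85,000 and f = 0.04, both sides equal 1.889. In any single random sample the referent counts vary around f·N1 and f·N0, so the OR estimates the CIR rather than reproducing it exactly.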

Case-control (13) • A third method of selecting the referent group • During the 20 years of follow-up, every time a case occurs, select a referent group member • Candidates for this selection are all people in the cohort who are outcome-free and still under follow-up at that time • The ‘RISK SET’ • This is density sampling. • The OR from this design estimates the rate ratio from the underlying cohort study.
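Density sampling can be illustrated with a small simulation. Everything in this sketch is hypothetical: a closed cohort of 20,000 with 30% exposed, constant hazards of 0.02/year (exposed) and 0.01/year (unexposed), 20 years of follow-up, and one control per case drawn at random from the risk set. It computes the crude OR, which should land close to the cohort's incidence rate ratio (true value 2) rather than its CIR (about 1.82 here). Risk-set sampled data are often analysed with matched (conditional) methods instead; the crude OR keeps the sketch short.

```python
import random

def simulate_density_sampling(n=20_000, p_exposed=0.3, rate_exposed=0.02,
                              rate_unexposed=0.01, follow_up=20.0, seed=1):
    """Hypothetical closed cohort with constant hazards; one risk-set control per case."""
    rng = random.Random(seed)
    people = []  # (exit_time, exposed, is_case)
    for _ in range(n):
        exposed = rng.random() < p_exposed
        rate = rate_exposed if exposed else rate_unexposed
        event_time = rng.expovariate(rate)  # exponential time to outcome
        if event_time < follow_up:
            people.append((event_time, exposed, True))
        else:
            people.append((follow_up, exposed, False))  # outcome-free at end
    people.sort(key=lambda person: person[0])  # order by exit time

    cases = {True: 0, False: 0}
    controls = {True: 0, False: 0}
    for i, (_, exposed, is_case) in enumerate(people):
        if is_case:
            cases[exposed] += 1
            # Risk set = everyone who exits later, i.e. still outcome-free and
            # under follow-up; it includes people who become cases afterwards.
            j = rng.randrange(i + 1, n)
            controls[people[j][1]] += 1

    density_or = (cases[True] * controls[False]) / (cases[False] * controls[True])

    # Full-cohort incidence rate ratio (events per person-year) for comparison
    person_time = {True: 0.0, False: 0.0}
    for exit_time, exposed, _ in people:
        person_time[exposed] += exit_time
    rate_ratio = (cases[True] / person_time[True]) / (cases[False] / person_time[False])
    return density_or, rate_ratio

density_or, rate_ratio = simulate_density_sampling()
print(f"density-sampled OR = {density_or:.2f}, cohort rate ratio = {rate_ratio:.2f}")
```

Only about 9,000 of the 20,000 subjects (the cases and their sampled controls) would need an exposure measurement, yet the OR tracks the full-cohort rate ratio.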


Case-control (14) • Modern case-referent study • Linked to an underlying cohort study, which • May exist (primary base) • May be conceptual (secondary base) • The comparison group is selected to represent the exposure experience of the underlying cohort. • This provides a basis for deciding whether the referent group is any good • Controls can be selected from people free of the outcome at the: • Start • End • Throughout the study

Key Design Points • Selecting the cases • Selecting the controls • Determining exposure status • Sample size and power.

Study Base (1) • The set of persons or person-time in which diseased subjects become cases. • The members of the source population • Primary base • Investigator defines the population experience of interest (e.g. the DDE cohort example) • Closed vs. dynamic • Secondary base • Defined implicitly as the population that gave rise to the cases • e.g. all people who would have been diagnosed at the Ottawa hospital had they developed the disease under study

Study Base (2) • Cases should be exclusively people in the base • All or a random sample • Controls (referents) estimate the exposure experience of the base • Primary base • Main challenge is complete case ascertainment • Secondary base • Main challenge is definition of study base and control selection

Selecting the Cases • Incident vs. prevalent cases • Incident cases are preferred • It can be hard to establish the ‘point of onset’. • Chronic disease • Sub-clinical phase • Latency periods


Defining a Case (2) • Existing entity • Severity (mild vs. severe) • Disease heterogeneity • Criteria to establish diagnosis (e.g. rheumatoid arthritis) • Incubation period • Subjective vs. objective criteria

Defining a Case (3) • New disease • No clear guidelines • Depends on clinical insights and the formation of homogeneous groups • The initial AIDS/HIV case definition was limited to homosexual men • An efficient design for finding cases • But it limited the etiological focus to lifestyle issues rather than infection

Identifying a Case (1) • The goal is to identify all cases meeting the criteria. Ideally, this is population based (primary base); it could be hospital- or clinic-based (secondary base). • All true cases should have an equal probability of being chosen. • The text states that complete ascertainment from the base is not needed • True, but only if you can define the base population well enough to: • Select a random sample of cases from the base • Selecting a convenience sample is usually not acceptable, especially when the proportion of cases selected is low.

Identifying a Case (1A) • Selection Biases • Berkson's bias • Neyman fallacy (prevalence-incidence bias) • Detection bias

Identifying a Case (2) • Sources for Cases • Death Certificates • Registries • Hospital/clinic lists • Pathology records • Advertising

Selection of Controls (1) ‘Without controls, there can be no case-control studies but with the wrong controls, there can only be regrettable case-control studies.’ Oleckno

Selection of Controls (2) UNDERLYING REQUIREMENT • The control group should represent the exposure experience of the subjects (cohort) which gave rise to the case group. • Very hard to achieve this goal when using a secondary base approach.

Selection of Controls (3) General Control Selection Methods • Survivor sampling • Only subjects who are disease-free at the end of follow-up are eligible. • Base sampling • All subjects in the cohort at the start are eligible • Risk set sampling • Controls are selected throughout follow-up/recruitment from those who are disease-free and under follow-up at that time • A subject can be both a case and a control
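The three eligibility rules can be sketched as filters over a toy cohort. Everything here is hypothetical: the `Subject` record, the function names, and the four subjects. The sketch assumes a closed cohort where everyone enters at time 0, so subjects differ only in whether and when they leave follow-up.

```python
from dataclasses import dataclass

@dataclass
class Subject:
    ident: str
    exit_time: float  # time of outcome, or time follow-up ended without it
    case: bool        # True if follow-up ended with the outcome

def survivor_sampling_pool(cohort, end):
    """Survivor sampling: outcome-free AND still under follow-up at the end."""
    return [s.ident for s in cohort if not s.case and s.exit_time >= end]

def base_sampling_pool(cohort):
    """Base sampling: everyone in the cohort at the start, future cases included."""
    return [s.ident for s in cohort]

def risk_set_pool(cohort, t):
    """Risk-set sampling: outcome-free and under follow-up just after time t
    (the index case, whose exit_time == t, is excluded)."""
    return [s.ident for s in cohort if s.exit_time > t]

cohort = [
    Subject("A", 5.0, True),    # case at year 5
    Subject("B", 20.0, False),  # outcome-free through year 20
    Subject("C", 12.0, True),   # case at year 12
    Subject("D", 8.0, False),   # lost to follow-up at year 8
]
print(survivor_sampling_pool(cohort, end=20.0))  # ['B']
print(base_sampling_pool(cohort))                # ['A', 'B', 'C', 'D']
print(risk_set_pool(cohort, 5.0))                # risk set when A becomes a case
print(risk_set_pool(cohort, 12.0))               # risk set when C becomes a case
```

When A becomes a case at year 5, the risk set is B, C and D. C is eligible as a control even though C becomes a case later. By year 12, D has left follow-up, so only B remains.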

Selection of Controls (4) • Wacholder et al. list four key principles of control selection: • The study base principle • Deconfounding principle • Comparable accuracy principle • Efficiency

Selection of Controls (5) Study Base Principle • Primary base: a pre-defined group (population experience) that is to be studied. Cases are derived only from people in that 'experience' • The major challenge is complete case ascertainment. This can be infeasible for mild outcomes (e.g. male infertility) • Cases can be ascertained through clinics, etc. if these capture all cases in the ‘cohort’ • Easier to select a valid control group • Commonly a population-based study

Selection of Controls (6) Study Base Principle • Secondary base: defined implicitly as the 'group of people who would have become study cases if they had acquired the outcome during the course of the study'. • Hard to define in a way that avoids selection bias • Referral filters • Usually a hospital- or clinic-based study • Cases can come from a wide geographic area without complete coverage

Selection of Controls (7) Selection of Controls from Study Base • Usually a simple random sample, but can be more complex (e.g. a two-stage sample) • Controls need to be representative of the base population, not of the general population • Exclusions applied to both cases and controls are fine. Those applied only to controls (or only to cases) produce bias. • BAD: • Exclude controls with dementia (can’t get exposure info) • Keep cases with dementia (since you can get exposure info from the hospital chart). • External controls can be OK.

Selection of Controls (8) Deconfounding Principle • Measured confounders can be controlled in the analysis. • Controls can be selected to control unmeasured confounders (e.g. neighbourhood controls or sibling controls). • This can affect study efficiency.

Selection of Controls (9) Comparable Accuracy Principle • The aim is for any misclassification to be non-differential • Try to collect information from cases and controls in the same manner • Using clinic charts for cases and personal interviews for controls would be a problem. • Dead cases: • Don't select dead controls • Use proxy respondents for the cases, but using proxies for controls doesn’t work • Unavailable cases: • Use proxies.

Sources of Controls (1) • Population controls • Main method used in studies with a primary base. • Roster-based selection (very limited options in 2009) • Census • Property taxation rolls • Medical insurance files • Driver’s licence files • Random digit dialling • Main method used at present because of privacy restrictions • Neighbourhood controls

Sources of Controls (2) • Population Controls (cont) • Advantages • Same study base as cases. • Easier to include exclusion criteria • Permits extrapolation to base to produce estimates of risk. • Disadvantages • Problematic if case ascertainment is incomplete. • Inconvenient • Recall bias • Motivation

Sources of Controls (3) • Hospital/registry controls • Commonly used with a secondary base. • Apply all eligibility rules to both the cases and the controls • The condition used to define the control group MUST NOT be related to the exposure • Don’t use COPD controls in a study of smoking and lung cancer • Often choose more than one condition

Sources of Controls (4) • Hospital/registry controls (cont) • Advantages • Useful if a large number of potential cases don’t get recruited (e.g. due to distance from the study centre). • Comparable quality of information. • Convenience/cost • Disadvantages • Controls often have a different catchment area from cases. • Berkson’s bias

Sources of Controls (5) • Medical practice controls • May be a good match for secondary base referral patterns; BUT • The exposure profile may differ from the true base due to selection effects of interventions by health care providers (HCPs).
