Epidemiologic Study Design: Cohort and Case-Control Studies

Erin Richman, ScD richmane@urology.ucsf.edu 01/13/2012

Highlights from Last Week • Sufficient-component cause model • Useful for examining multiple risk factors (‘web of causation’), but • Omits discussion of origins of causes, focuses on proximal causes, and ignores induction period • Does not address indirect effects • Addresses causes of disease in individuals, not in populations • Does not consider factors that control distribution of risk factors • Ignores dynamic non-linear relations • Counterfactual model • Can be applied at individual or population level • Specifies what would happen to individuals or populations under alternative patterns of exposure • Forces researchers to think about operational definitions of cases and controls, sampling schemes, and other important design questions • One of the two conditions in the definitions of the effect measures must be contrary to fact – exposures or treatment vs. a reference condition

Today’s Objectives • Cohort Studies • Closed vs. Open Cohorts • Classifying Person-Time • Case-control Studies • Selection of cases and controls • Measures of association • Control sampling schemes • Sources of controls

Overview • Randomized controlled trial = Investigator assigns exposure and follows participants over time to ascertain the outcome • Cohort study = Analogous to RCT, but investigator does not assign exposure • Case-control study = Analogous to cohort study, but with more efficient sampling

Cohort Studies

Cohort Studies • Investigator identifies a group of individuals who are free of the disease, but at risk for the disease • Classifies individuals into 2+ groups based on exposure • Follows over time to determine who develops disease

Closed Cohorts • Fixed population • Membership-defining event • Cohort of persons born in 2012 (calendar time) • Cohort formed at entry into UCSF medical school (an event) • Can directly calculate: • Risk ratio • Incidence rate ratio • Odds ratio
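The three measures listed for a closed cohort can be computed from the same follow-up data. A minimal Python sketch, assuming made-up counts; the function name and all numbers are hypothetical and only illustrate the arithmetic:

```python
def closed_cohort_measures(a, n1, t1, b, n0, t0):
    """Risk ratio, incidence rate ratio, and odds ratio for a closed cohort.

    a, b   : cases among exposed / unexposed
    n1, n0 : persons at baseline among exposed / unexposed
    t1, t0 : person-time contributed by exposed / unexposed
    """
    risk_ratio = (a / n1) / (b / n0)              # ratio of incidence proportions
    rate_ratio = (a / t1) / (b / t0)              # ratio of incidence rates
    odds_ratio = (a / (n1 - a)) / (b / (n0 - b))  # ratio of incidence odds
    return {"risk_ratio": risk_ratio, "rate_ratio": rate_ratio, "odds_ratio": odds_ratio}


# Hypothetical cohort: 1000 exposed and 1000 unexposed persons followed for about 5 years.
m = closed_cohort_measures(a=30, n1=1000, t1=4900.0, b=10, n0=1000, t0=4980.0)
print(m)
```

With a rare outcome such as this one, the three ratios are close to one another; they diverge as the incidence proportion grows.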

Closed Cohorts cont. • Limitations of incidence proportion (cumulative incidence) analyses: • Loss to follow-up • Competing risks • For certain outcomes (e.g. death), incidence proportion tends toward 1 over time • Exposures may change over time • Solution: Calculate Incidence Rates

Open Cohorts • No fixed roster – person-time accrued from a potentially changing roster of individuals • Residents of California • Members of Kaiser Permanente • Individuals can contribute varying amounts of person-time • Individuals can leave and re-enter the cohort • Individuals can contribute person-time to multiple exposure categories over time

Open Cohorts cont. • Rates can be directly measured • Incidence proportion cannot be directly measured • Can calculate IP from IR assuming: • The IR is constant within each follow-up period (j) • No competing risks or loss to follow-up related to disease risk • Number of events in each time interval is small relative to number of people at risk in that time interval • Useful for communicating results • Incidence Proportion = $1 - \exp\left(-\sum_j IR_j \, \Delta t_j\right)$, where $IR_j$ is the incidence rate in period j and $\Delta t_j$ is the length of period j
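Under the assumptions listed above, the conversion from interval-specific rates to an incidence proportion is commonly done with the exponential formula, IP = 1 − exp(−Σ_j IR_j × Δt_j). A minimal Python sketch; the function name and the example rates are hypothetical:

```python
import math


def incidence_proportion(rates, durations):
    """Exponential formula: IP = 1 - exp(-sum_j IR_j * dt_j).

    rates     : incidence rate in each follow-up period j (events per person-year)
    durations : length of each period j (years)
    Assumes the rate is constant within each period and no competing risks.
    """
    cumulative = sum(r * d for r, d in zip(rates, durations))
    return 1.0 - math.exp(-cumulative)


# Hypothetical rates over three consecutive 5-year periods.
ip = incidence_proportion([0.01, 0.02, 0.03], [5, 5, 5])
print(round(ip, 4))
```

The result is always somewhat smaller than the simple sum of rate × time, because people who have already had the event no longer contribute to risk.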

Classifying Person-Time • Each unit of person-time contributed by an individual has its own exposure classification • Must consider the etiologically relevant exposure • Exposure may change over time [Figure: timeline running from Exposure to Disease Initiation (induction period) and from Disease Initiation to Disease Detection (latent period)]

Classifying Person-Time cont. • Time at which exposure occurs ≠ time at risk of exposure effects • Radiation from an atomic bomb and risk of cancer • Only the time at risk for exposure effects should be counted in the denominator of the incidence rate for that level of exposure • If the induction time is not known, can estimate empirically by calculating the incidence rates for differing categories of time since exposure

Classifying Person-Time cont. • How do you classify person-time contributed by exposed subjects before the minimum induction time has elapsed or after the maximum induction time has passed? • Example: • Exposure = Rotavirus vaccine • Outcome = Intussusception • Assume induction period ranges from 1-7 days [Figure: timeline from Exposure to Disease Initiation, labeled as the induction period]
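The rotavirus example can be sketched in Python. The sketch only tallies person-days inside and outside the assumed 1-7 day induction window; how to handle the time outside the window (for instance, treating it as unexposed or omitting it) remains an analytic choice that the code does not settle. The function name, the day-counting convention, and the follow-up lengths are assumptions made for illustration:

```python
def classify_person_days(followup_days, vaccination_day, min_induction=1, max_induction=7):
    """Split one subject's follow-up into days inside / outside the risk window.

    Follow-up covers whole days 0 .. followup_days - 1 (an assumed convention).
    The risk window covers days vaccination_day + min_induction through
    vaccination_day + max_induction, inclusive.
    """
    first = max(vaccination_day + min_induction, 0)
    last = min(vaccination_day + max_induction, followup_days - 1)
    in_window = max(0, last - first + 1)
    return {"risk_window": in_window, "outside_window": followup_days - in_window}


# Vaccinated on day 10 of a 60-day follow-up: days 11-17 fall in the risk window.
full = classify_person_days(followup_days=60, vaccination_day=10)
# Follow-up ends after day 14, so only days 11-14 fall in the risk window.
censored = classify_person_days(followup_days=15, vaccination_day=10)
print(full, censored)
```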

Classifying Exposure • Exposure may change over time • Ideally, measure exposure constantly and classify each unit of person-time • A given individual can contribute person-time to one or more exposure categories in the same study! • More often, assume one measure of exposure history is the only aspect of exposure associated with current disease risk • Current, average, cumulative, etc. • Lag exposure to account for induction time between exposure and disease initiation
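Lagging a cumulative exposure measure can be sketched as follows. This is one possible convention, assumed here for illustration: exposure received within the most recent `lag_years` is ignored because, under the induction-time hypothesis, it is too recent to affect current risk. The function name and the dose values are hypothetical:

```python
def lagged_cumulative_exposure(yearly_doses, current_year, lag_years):
    """Cumulative exposure up to and including year (current_year - lag_years).

    yearly_doses[i] is the dose received in study year i (0-based).
    """
    last_counted = current_year - lag_years  # most recent year that still counts
    return sum(yearly_doses[: max(last_counted + 1, 0)])


doses = [1, 1, 1, 1, 1, 1]  # e.g., pack-years accrued in study years 0..5
print(lagged_cumulative_exposure(doses, current_year=5, lag_years=0))  # no lag
print(lagged_cumulative_exposure(doses, current_year=5, lag_years=2))  # 2-year lag
```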

BASELINE QUESTIONNAIRE Have you smoked 20 packs of cigarettes or more in your lifetime? What specific brand and type? If quit, how long ago? At each age from <15 to 60+, what was the average number of cigarettes you smoked per day? FOLLOW-UP QUESTIONNAIRES Do you currently smoke cigarettes? If so, how many per day?

Timing of Outcome Events • Events and the person-time being accumulated at the moment of the event are assigned to the same category of exposure • Must clearly define the outcome event, including a protocol for determining the timing of the event • Goal is to detect events at onset; however, this may not be possible • Date of diagnosis for cancer • Hospitalization for end-stage renal disease

Immortal Person-Time • Entry criteria in a cohort may depend on survival • Study of pre-term birth and mortality in adulthood. Individuals had to survive at least 1 year. • First year of life = immortal person-time • During this time, no one is at risk to become an event in the study • Exclude immortal person-time from the denominator of incidence rates
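The effect of excluding immortal person-time on the denominator can be sketched in Python. The counts below are invented for illustration and do not come from the pre-term birth study; the function name is hypothetical:

```python
def rate_excluding_immortal_time(events, total_person_years, n_subjects, immortal_years_each):
    """Incidence rate with immortal person-time removed from the denominator."""
    at_risk_person_years = total_person_years - n_subjects * immortal_years_each
    return events / at_risk_person_years


# Hypothetical cohort: 1000 subjects, 30000 person-years counted from birth,
# 50 deaths, and 1 year of guaranteed survival per subject by design.
naive_rate = 50 / 30000
corrected_rate = rate_excluding_immortal_time(50, 30000, 1000, 1.0)
print(naive_rate, corrected_rate)
```

Leaving the immortal year in the denominator understates the rate, since no events could have occurred during that time.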

Case-Control Studies

Case-Control Studies • Imagine a population in which a cohort study could be conducted • Identify cases as you would in the cohort study • Sample from the study base (person-time) to determine exposure distribution in the population that gave rise to the cases = Controls • Controls must be sampled independent of exposure! • More efficient version of cohort study • Sampling creates new opportunities for bias

Selection of Cases and Controls • Cases = numerators of the rates you would have measured in the corresponding cohort study • Controls = relative size of the exposed and unexposed person-time in the study base ≈ person-time denominators you would have measured in the corresponding cohort study

Selection of Cases and Controls • Cases are the same people that would be cases in the underlying cohort study • Can randomly sample cases if sampling is independent of exposure • Controls are a random, or conditionally random within strata, sample of study base • Exposure distribution in the controls is the same as in the population that gave rise to the cases, conditional on matching factors

Selection of Controls • Controls should be selected from the same population – the source population – that gave rise to the cases. • Controls should be selected independently of exposure, within strata of factors that will be used for stratification in the analysis. • Persons are eligible to be selected as a control as long as they are at risk for disease → a person can be both a control and a case in the same study!

Measures of Association • Can also estimate the risk ratio or incidence odds ratio from case-control studies • The measure of association estimated by the OR depends on the control sampling scheme

Measures of Association • Controls must be sampled independent of exposure: the control sampling fractions for the exposed (f1) and unexposed (f0) must be equal, f1 = f0 • Generally, control sampling rate is not known, so cannot calculate incidence rates in exposed and unexposed • Generally, rare disease assumption is NOT needed • As with cohort studies, the incidence odds ratio and rate ratio are only good approximations of the risk ratio if the incidence proportion is less than 0.1
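Why equal sampling fractions matter can be shown with expected counts. In the Python sketch below, controls are drawn from exposed and unexposed person-time at the same rate, so the unknown sampling rate cancels and the case-control odds ratio reproduces the rate ratio of the underlying cohort. The function name and all numbers are hypothetical:

```python
def expected_case_control_or(cases_exp, cases_unexp, pt_exp, pt_unexp, control_rate):
    """Expected odds ratio when controls are sampled from person-time.

    control_rate is the same for exposed and unexposed person-time (f1 = f0),
    so it cancels out of the odds ratio.
    """
    controls_exp = control_rate * pt_exp
    controls_unexp = control_rate * pt_unexp
    return (cases_exp * controls_unexp) / (cases_unexp * controls_exp)


# Hypothetical study base: 90 cases in 10000 exposed person-years,
# 30 cases in 20000 unexposed person-years.
rate_ratio = (90 / 10000) / (30 / 20000)
or_low = expected_case_control_or(90, 30, 10000, 20000, control_rate=0.01)
or_high = expected_case_control_or(90, 30, 10000, 20000, control_rate=0.05)
print(rate_ratio, or_low, or_high)
```

Changing `control_rate` changes the number of controls but not the odds ratio, which is why the sampling rate does not need to be known to estimate the ratio measure.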

Control Sampling Schemes [Table of control sampling schemes not recovered from the slide] * Only need rare disease assumption when estimating the risk ratio from the odds ratio.

Full Cohort Analysis [Figure: follow-up diagram with X markers along a Time axis]

Closed Cohort Analysis [Figure: follow-up diagram with X markers along a Time axis]

Density Sampling • Sample controls at a steady rate per unit time over period in which cases are sampled • Probability of being selected as a control is proportional to amount of time person spends at risk of disease in source population • Individual may be selected as a control while they are at risk for disease, and subsequently become a case • Incidence density sampling or “risk set sampling” is a form of density sampling in which you match cases and controls on time
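Risk set sampling can be sketched in a few lines of Python. For each case, controls are drawn at random from everyone still under follow-up at the case's event time, so a person who later becomes a case remains eligible as a control beforehand. The data layout, the function name, and the handling of ties (subjects exiting exactly at the case time are excluded) are assumptions made for illustration:

```python
import random


def risk_set_sample(subjects, controls_per_case=1, seed=0):
    """Draw time-matched controls for each case from its risk set.

    subjects: list of (subject_id, exit_time, is_case); everyone enters at time 0
    and exit_time is the event time for cases or the censoring time otherwise.
    """
    rng = random.Random(seed)
    matched_sets = []
    for case_id, case_time, is_case in subjects:
        if not is_case:
            continue
        # Risk set: everyone else still under follow-up at the case's event time.
        risk_set = [sid for sid, exit_time, _ in subjects
                    if sid != case_id and exit_time > case_time]
        k = min(controls_per_case, len(risk_set))
        matched_sets.append((case_id, case_time, rng.sample(risk_set, k)))
    return matched_sets


subjects = [("A", 2.0, True), ("B", 5.0, False), ("C", 4.0, True),
            ("D", 1.0, False), ("E", 6.0, False)]
sets = risk_set_sample(subjects, controls_per_case=1, seed=1)
print(sets)
```

In this toy data, subject C is in the risk set for case A (event at time 2.0) even though C becomes a case at time 4.0, while D, who left follow-up at time 1.0, is never eligible.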

Sources of Controls • Primary study base • Base population known • Study of Kaiser Permanente members • Study conducted within existing cohort, “Nested case-control” • Secondary study base • Identify cases first, base population not known • Hospital-based case-control study • Must identify person-time contributed by persons who would have become a case in your study had they developed the disease

Neighborhood Controls • Sample residences • May individually match cases to one or more controls residing in the same neighborhood • If neighborhood is associated with exposure, must control for matching in the analysis • Neighbors may not be the source population of the cases • Cases at a VA hospital

Friend/Family Controls • Being named as a friend control may be related to exposure • Reclusive people are less likely to be named • Investigator dependent on cases for identifying controls • Friend groups often overlap, so persons with more friends are more likely to be selected as a control

Random Digit Dialing • Case eligibility should include residence in a house with a telephone • Probability of calling a number ≠ probability of contacting an eligible control • Households vary in the number of people, amount of time a person is at home, and the number of operating phones • Method requires a great deal of time and labor

Random Digit Dialing cont. • Answering machines, voicemail, and caller ID reduce response rates • Cell phones reduce validity of assuming source population can be randomly sampled using this method • Recent CDC survey showed 2% increase in binge drinking compared to 2009 data – more cell phone numbers included, and average age of respondents decreased • May not be able to distinguish business and residential numbers - difficult to estimate proportion of non-responders

Hospital-Based Controls • If not randomly selecting controls, must be cautious that control selection is independent of exposure • May not represent exposure distribution in source population if exposure is associated with hospitalization, other diseases, or both • Example: Hospital-based study examining smoking and pancreatic cancer where controls are selected from persons admitted to the hospital for other conditions.

Hospital-Based Controls cont. • Limit diagnoses for controls to conditions with no association with the exposure • May exclude most potential controls • Exclusion criteria apply only to the cause of the current hospitalization • Reasonable to exclude categories of potential controls on the suspicion that a given category might be related to exposure • Imprudent to use only a single diagnostic category as a source of controls

Hospital-Based Controls cont. • Bias will occur if the exposure directly affects risk of being hospitalized, even if exposure is unrelated to the study disease or control diseases • Berkson’s Bias
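Berkson's bias can be illustrated with expected counts. The Python sketch below assumes a simple model in which a person is admitted either because of their disease or, independently, because of the exposure itself; the exposure has no effect on the study disease or the control disease. The model, the function name, and every probability are assumptions chosen only to show the mechanism:

```python
def hospital_odds_ratio(n_exp, n_unexp, admit_case, admit_control, admit_exposure):
    """Expected odds ratio among hospitalized cases and hospitalized controls.

    n_exp, n_unexp : exposed / unexposed persons in each disease group
                     (same exposure distribution in both groups, so true OR = 1)
    admit_case     : probability of admission due to the study disease
    admit_control  : probability of admission due to the control disease
    admit_exposure : probability that exposure alone leads to admission
    """
    def p_admit(p_disease, exposed):
        p_exp = admit_exposure if exposed else 0.0
        return 1 - (1 - p_disease) * (1 - p_exp)  # admitted for either reason

    a = n_exp * p_admit(admit_case, True)         # hospitalized exposed cases
    b = n_unexp * p_admit(admit_case, False)      # hospitalized unexposed cases
    c = n_exp * p_admit(admit_control, True)      # hospitalized exposed controls
    d = n_unexp * p_admit(admit_control, False)   # hospitalized unexposed controls
    return (a * d) / (b * c)


unbiased = hospital_odds_ratio(200, 800, 0.9, 0.2, admit_exposure=0.0)
biased = hospital_odds_ratio(200, 800, 0.9, 0.2, admit_exposure=0.3)
print(unbiased, biased)
```

When exposure does not affect admission, the hospital odds ratio equals the true value of 1; once exposure itself leads to admission, exposed persons are over-represented among the controls and the odds ratio moves away from 1.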

Deceased Controls • Not members of the source population for the cases • If exposure is associated with mortality, dead controls will misrepresent exposure distribution in source population • Even if cases are dead, generally better to choose living controls • Do not need a proxy interview for living controls of dead cases

Comparability of Information • Comparability of information is often used to guide control selection and data collection • BUT • Non-differential exposure measurement error does not guarantee that bias will be toward the null • Efforts to ensure equal accuracy of exposure data tend to produce equal accuracy of data on other variables • Overall bias due to non-differential error in confounders and effect modifiers can be larger than error produced by unequal accuracy of exposure data from cases and controls

Exposure Classification • Same principles as discussed for cohort studies • Cases’ exposure should be classified as of the time of diagnosis or disease onset, accounting for induction time hypotheses • Controls should be classified according to their exposure status at the time of selection, accounting for induction time hypotheses

Timing of Exposure Classification • Selection time does not necessarily refer to the time at which a control is first identified • For hospital-based controls, selection time may be date of diagnosis for the disease that resulted in the current hospitalization • Date of interview is often used if there is not an event analogous to the cases’ date of diagnosis • Interviewers should be blinded to case-control status whenever possible

Conclusion • Causal models discussed last week inform decisions made in the design phase of a study • Every decision made by the investigator comes with its own set of assumptions • Cohort study = Analogous to RCT, but investigator does not assign exposure • Case-control study = Analogous to cohort study, but with more efficient sampling
