Observational Studies of Disease

Observational Studies of Disease • Descriptive (incidence, prevalence) • Analytic (associate characteristics of population with risk of disease) • Population experience can be studied with group level or individual level data • Studies using group level data are called ecological studies

Studies Using Group Data (1) • Disadvantages of group data/ecological studies • Most group data measures are made on individuals but not under investigator’s control • Do not know if persons with a given characteristic are those at higher risk of disease: the “ecological fallacy” • Confounding is a problem for all observational studies but is greatest with group data: lack of investigator control in measuring confounding variables
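
The ecological fallacy can be illustrated with a small sketch. The counts below are hypothetical and chosen only for illustration: across three made-up areas, exposure prevalence and disease rate rise together, yet within every area exposed and unexposed persons have the same risk.

```python
# Hypothetical counts for three areas (illustrative numbers, not real data):
# area -> (exposed_n, exposed_cases, unexposed_n, unexposed_cases)
AREAS = {
    "A": (200, 20, 800, 80),
    "B": (500, 100, 500, 100),
    "C": (800, 240, 200, 60),
}


def area_summary(exposed_n, exposed_cases, unexposed_n, unexposed_cases):
    """Return (exposure prevalence, disease rate, within-area risk ratio)."""
    total = exposed_n + unexposed_n
    prevalence = exposed_n / total
    rate = (exposed_cases + unexposed_cases) / total
    risk_ratio = (exposed_cases / exposed_n) / (unexposed_cases / unexposed_n)
    return prevalence, rate, risk_ratio


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


summaries = {name: area_summary(*counts) for name, counts in AREAS.items()}
prevalences = [s[0] for s in summaries.values()]
rates = [s[1] for s in summaries.values()]

# Group level: areas with more exposure have higher disease rates.
group_correlation = pearson(prevalences, rates)
# Individual level: within each area, exposed and unexposed share the same risk.
within_area_risk_ratios = [s[2] for s in summaries.values()]

print(f"group-level correlation: {group_correlation:.3f}")
print("within-area risk ratios:", within_area_risk_ratios)
```

In this constructed example the group-level association comes entirely from some area-level factor, so inferring that exposed individuals are at higher risk would be wrong.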

Studies Using Group Data (2) • Advantages of group data/ecological studies • Inexpensive: secondary data already collected (vital statistics, disease registries, HMOs, etc.) • Rapid test of hypothesis • Idea that ecological studies are hypothesis-generating doesn’t reflect their usual purpose • If hypothesized risk factor is associated with disease, it may well be seen in group level data • Can overcome “threshold” problem: exposure is so universal that effect is difficult to detect in one setting

Studies Using Group Data (3) • Advantages of group data/ecological studies • Some disease transmission dynamics can only be studied at group level (eg, herd immunity and infectious disease transmission) • Allows global measures of group characteristics (eg, type of health care system) • Allows tests of area-level interventions (eg, closing of a public hospital)

Studies Using Group Data (4) • Not all ecological studies are created equal • Typical study relates a disease rate across different geographic areas with an aggregate measure of a characteristic of the individuals in the areas (eg, average alcohol consumption) or with a global measure of a characteristic of the area (eg, climate). No attempt is made to control for confounding. • Quality of secondary data varies widely • Length of time secondary data has been collected varies (relevant to looking at an association in different time periods)

Strategies used to strengthen ecological studies (1) • There are several strategies that can strengthen inferences from ecological studies • Multiple kinds of comparisons to strengthen inference of association; eg, across geographic areas and over different time periods • Example: Valerie Beral’s study showing inverse association between average family size and ovarian cancer mortality using comparisons among different birth cohorts, different countries, and different social and ethnic groups (Lancet, 1978)

Strategies used to strengthen ecological studies (2) • “Small Area” analysis: Used in health services research to investigate variation within small geographic areas • Reduce confounding by comparing small areas from a larger area thought to be fairly homogeneous on potential confounders (SES, disease prevalence) • Example: Wennberg’s study of variation in rates of surgical procedures in 6 areas of Vermont with similar disease prevalence (Medical Care, 1987)

Strategies used to strengthen ecological studies (3) • Mixed studies that collect data on individuals but use secondary group data for rare outcomes • Doesn’t avoid ecological fallacy but reduces confounding by key measures at individual level • Using group data may make study feasible that would be otherwise prohibitively expensive • Example: Bindman’s study of health care access and rates of preventable hospitalizations in California medical service areas (JAMA, 1995)

Studies Using Individual Data • Unifying concept: Characterize morbidity and mortality in a defined population during a defined period of time • Defined population = Study Base = morbidity and mortality experience of a cohort of individuals over time • Cohort studies, case-control studies, and cross-sectional studies are best understood within the framework of a common study base

Study Base • Establish by • Assembling an explicit cohort from: • Sample of larger population of interest • Sample of persons with and without an exposure of interest • Identifying cases of a specific disease and defining population that gave rise to the cases • Defines the cohort within which the cases occurred • Study designs differ in how they sample disease experience of the study base

Cohort Study • Easiest design to understand because it explicitly defines the study base as a cohort • Measures individual characteristics before disease occurrence fulfilling the temporal order required for cause and effect (but is not the only study design that can do this). • Provides conceptual basis for understanding sampling strategies of case-control, case-cohort, and cross-sectional designs

Cohort Study (figure) • Timeline of follow-up from beginning to end of study • Some subjects die or are lost to follow-up; the remaining subjects are followed until end of study • X = dead; L = lost; D = disease

Types of Cohort Studies • Fixed (closed) versus dynamic (open) cohort • Fixed: All subjects identified at baseline in study • Dynamic (open): Additional subjects taken during follow-up; subjects enter at different times • Fixed versus dynamic exposure measurement • Fixed: Groups of individuals with and without exposure of interest do not change during follow-up. Sometimes assembled and followed as two separate cohorts of exposed and unexposed. • Dynamic: Exposure may vary during follow-up (eg, individuals stop or start a behavior or exposure is defined by an accumulation of years of exposure)

Measuring exposures that can vary over time in a cohort study • Simple cohort study with exposure status fixed at baseline calculates risk by number of cases of disease among exposed and unexposed subjects • More complex cohort studies allow individuals to change from exposed to unexposed and therefore have to calculate disease occurrence on basis of both number of persons and length of time exposed (called person-time) • Diseases with long incubation periods, such as cancer, require lag time to be taken into account in relating exposure to disease occurrence
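
A person-time calculation can be sketched as follows. The follow-up records are hypothetical, the record layout (`intervals`, `event`) is an assumption made for this illustration, and the sketch ignores lag time for simplicity: each case is attributed to the exposure status at disease onset.

```python
from collections import defaultdict

# Hypothetical follow-up records (illustrative only). Each interval is
# (start_year, end_year, exposed); "event" means disease onset at the end
# of the last interval, otherwise the subject was censored there.
SUBJECTS = [
    {"intervals": [(0, 4, False), (4, 10, True)], "event": True},
    {"intervals": [(0, 10, False)], "event": False},
    {"intervals": [(0, 3, True), (3, 8, False)], "event": True},
    {"intervals": [(0, 6, True)], "event": False},
]


def person_time_rates(subjects):
    """Return {exposed: (cases, person_years, rate)} for exposed True/False."""
    person_years = defaultdict(float)
    cases = defaultdict(int)
    for subject in subjects:
        for start, end, exposed in subject["intervals"]:
            person_years[exposed] += end - start
        if subject["event"]:
            # Attribute the case to the exposure status at disease onset.
            exposed_at_onset = subject["intervals"][-1][2]
            cases[exposed_at_onset] += 1
    return {
        exposed: (
            cases[exposed],
            person_years[exposed],
            cases[exposed] / person_years[exposed],
        )
        for exposed in (True, False)
    }


rates = person_time_rates(SUBJECTS)
rate_ratio = rates[True][2] / rates[False][2]
print(rates)
print(f"rate ratio (exposed / unexposed): {rate_ratio:.3f}")
```

A subject who changes exposure contributes person-years to both denominators, which a simple count of exposed and unexposed persons cannot represent.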

Threats to Validity of a Cohort Study • Ascertainment of disease outcome • Length of follow-up • Time between ascertainment of status (visits, follow-up interviews, medical record checks, etc.) • Subtlety of disease onset (case definition) • Secondary data sources for outcomes (eg, registries) • Subjects lost during follow-up • Key issue is whether losses are related to exposure and disease outcome • If disease incidence is important outcome, losses may bias results even if not related to exposures

Threats to Validity of a Cohort Study (2) • Long follow-up time biggest threat to validity of cohort studies • Difficult to retain cohort and ascertain all outcomes • Bias from loss to follow-up is analogous to bias in case-control study based on prevalent cases • Large size and expense of cohort may require compromise in measurements • Can be complicated to measure dynamic exposures and allow for incubation periods

Common paradigm of study design presents time the study is undertaken as key to design but neglects time measurements taken and concept of study base (figure: Past, Present, Future timeline) • Cross-sectional: Classify exposure and disease at one time • Cohort: Classify by exposure; classify by disease • Case-control: Classify by disease; classify by exposure

Timing of Study and Measurements • Prospective versus retrospective study terminology not always clear about when measurements were made • Exposure and disease measurements may be concurrent, non-concurrent, or both with respect to the experience of the study base • Study may be carried out concurrently, or non-concurrently, or both with respect to the experience of the study base

Timing of Study and Measurements • Some authors designate case-control studies as “retrospective studies”: inaccurate since cohort studies can also be retrospective • Distinction between when measurement of exposure was made and study recorded it • Key issue for causality is measuring exposure before disease--that is not design dependent

Timing of measurement of exposures and disease with respect to timing of study (figure; each schematic shows diet, exercise, and CHD) • Schematic A: Study begins and makes measurements • Schematic B: Study begins and records measurements made previously in medical record • Schematic C: Study begins, asks subjects to recall information in the past

Timing of Study and Measurements (2) • Schematic A is a prospective or concurrent cohort study • Schematic B could be either retrospective cohort or case-control using medical records • Schematic C is most often a case-control study but could be a retrospective cohort that uses recall to ascertain exposure • Mixed designs are also possible with some measures concurrent with study and some measures non-concurrent

Cross-sectional Study • In context of a cohort, a cross-sectional sample is equivalent to sampling those with prevalent disease and those without at one point in time in the follow-up • A comparison of exposure in those with and without disease is equivalent to a case-control study using prevalent cases and concurrent controls

Cross-sectional Study in Context of a Cohort (figure) • Cross-sectional sample of cohort (population) at one point in time: Equivalent to sampling prevalent cases and concurrent controls • Possible source of bias: Missing potential subjects • D = disease = case; C = control (no disease)

Cross-sectional Study in a Dynamic Population (figure) • Cross-sectional sample of a dynamic population differs from sampling in fixed cohort setting • Persons enter as well as leave the population • Disease sampling is still of prevalent cases • D = disease = case

Cross-sectional Study (2) • Using cross-sectional study to identify a cohort can provide a representative sample (prevalent cases of disease usually excluded) • Repeated cross-sectional studies of a population can provide important information on trends a cohort might miss • Although major weakness is problem of temporal order (cause and effect), timing of exposure and disease can sometimes be determined retrospectively

Case-Control Study Designs (1) • Best conceptualized as occurring within a cohort study • Variations on the case-control design come from how the cases and the controls are sampled • From the point of view of design, case-control studies can be just as valid as cohort studies • Threats to validity come from greater difficulty in defining and sampling the study base and in measuring exposure prior to disease

Case-Control Studies: Case-based sampling • In context of a cohort, case-based sampling identifies all cases of disease during the follow-up period and samples individuals who are disease free at the time of study (end of follow-up in the cohort context) • Unbiased sample of cases but possibly biased sample of controls • Requires rare disease assumption for odds ratio to estimate relative risk
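
The rare disease assumption can be checked with a short worked example. The counts are hypothetical; both scenarios have a true risk ratio of 2, and the sketch compares it with the odds ratio computed from the same cohort table.

```python
def risk_ratio(exposed_cases, exposed_n, unexposed_cases, unexposed_n):
    """Risk in the exposed divided by risk in the unexposed."""
    return (exposed_cases / exposed_n) / (unexposed_cases / unexposed_n)


def odds_ratio(exposed_cases, exposed_n, unexposed_cases, unexposed_n):
    """Odds of disease in the exposed divided by odds in the unexposed."""
    exposed_odds = exposed_cases / (exposed_n - exposed_cases)
    unexposed_odds = unexposed_cases / (unexposed_n - unexposed_cases)
    return exposed_odds / unexposed_odds


# Hypothetical cohorts: (exposed_cases, exposed_n, unexposed_cases, unexposed_n)
rare_disease = (20, 1000, 10, 1000)       # risks of 2% and 1%
common_disease = (400, 1000, 200, 1000)   # risks of 40% and 20%

for label, table in (("rare", rare_disease), ("common", common_disease)):
    print(label, round(risk_ratio(*table), 3), round(odds_ratio(*table), 3))
```

With the rare outcome the odds ratio (about 2.02) is close to the risk ratio of 2; with the common outcome it is about 2.67, so the odds ratio overstates the relative risk.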

Case-Control Study with Case-Based Sampling (figure) • Sampling within a cohort study: Ascertaining all cases and sampling controls from subjects disease free at end of follow-up • Possible bias: Potential controls not in study at end of follow-up • D = disease = case; C = control (no disease)

Case-Control Studies: Case-Based Sampling (2) • Case-based sampling is most common case-control design outside setting of explicit cohort • The study base that gave rise to the cases is often not defined • Examples of study bases: • Cases from population disease registry; study base is the population covered by the registry • Cases from HMO; study base is plan members • Hospital cases; study base is persons who would have been admitted to hospital with the disease

Nested Case-control Studies • Nested case-control studies occur within a defined cohort and sample controls from the “risk set” of persons at risk in the cohort at the occurrence of each case (called incidence-density sampling) • Controls may become cases at some point later in follow-up (true of any study design if everyone is not followed until death)

Nested Case-Control (Incidence Density Sampling) (figure) • Sampling within a cohort: Including all cases and sampling controls from subjects disease free at the time each case is diagnosed • Cases = 10 D’s; controls = 10 C’s; formed from 9 risk sets (Risk Set 1, Risk Set 2, etc., through Risk Set 9)

Nested Case-control Studies (2) • In example, 10 cases occur at nine points in time (2 cases occur at same time) giving rise to 9 risk sets • One control for each case is selected in each risk set, so 2 controls selected in risk set with 2 cases • One of the controls, selected in the second risk set, becomes a case at the fourth risk set
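
Risk-set sampling of this kind can be sketched in a few lines. The cohort below is hypothetical (it is not the 10-case example from the slide), and the function name, data layout, and tie handling are assumptions made for illustration rather than a standard implementation.

```python
import random

# Hypothetical cohort (illustrative only): id -> (exit_time, became_case).
# Everyone enters at time 0; exit is disease onset, loss, or end of study.
COHORT = {
    "s1": (2.0, True),
    "s2": (4.0, False),
    "s3": (6.0, True),
    "s4": (10.0, False),
    "s5": (10.0, False),
}


def sample_risk_sets(cohort, controls_per_case=1, seed=0):
    """Return a list of (case_id, case_time, control_ids), ordered by case time."""
    rng = random.Random(seed)
    case_times = sorted(
        (exit_time, subject_id)
        for subject_id, (exit_time, became_case) in cohort.items()
        if became_case
    )
    matched_sets = []
    for case_time, case_id in case_times:
        # Risk set: subjects still under follow-up and disease free after
        # case_time. Future cases are eligible as controls. Simplification:
        # anyone exiting at exactly case_time is excluded.
        risk_set = [
            subject_id
            for subject_id, (exit_time, _) in cohort.items()
            if exit_time > case_time
        ]
        k = min(controls_per_case, len(risk_set))
        matched_sets.append((case_id, case_time, rng.sample(risk_set, k)))
    return matched_sets


for case_id, case_time, controls in sample_risk_sets(COHORT):
    print(case_id, case_time, controls)
```

Here subject `s3` is eligible as a control for the case at time 2.0 and later becomes a case at time 6.0, mirroring the control in the slide example who becomes a case in a later risk set.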

Nested Case-control Studies (3) • “Nested” used by some authors to mean any case-control study conducted within a cohort study; used here to mean incidence-density sampling design • Outside of prior cohort study, incidence-density sampling of the study base giving rise to the cases produces same “nested” design • Example: Identify cases as they occur from cancer disease registry for S.F. county and obtain controls from random sample of county at time each case occurs

Nested Case-control Studies (4) • Avoids potential biases of prevalent controls or prevalent cases • Incidence density sampling gives unbiased estimate of ratio of disease rates in exposed and unexposed subjects • Controls for secular (calendar time) trends since cases and controls are matched on calendar time

Case-Cohort Studies • Alternative design to nested case-control study: in context of a cohort, selects all cases and takes random sample of the cohort at baseline for controls • Like the nested design, some persons selected as controls may become cases • Like the nested design, can be extended outside setting of cohort study to a study base

Case-Cohort Study (figure) • Sampling within a cohort: Including all cases and sampling controls from all subjects at baseline of cohort • D = disease = case; C = control (no disease)

Case-Cohort Studies (2) • Taking random sample of cohort at baseline gives estimate of prevalence of exposure in the cohort and allows calculation of attributable risk • Controls are not linked to timing of disease occurrence so not matched to cases on calendar time • A single baseline control group can be used for more than one disease outcome
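
As a sketch of the first point, the exposure prevalence estimated from the baseline sub-cohort can be combined with a relative risk estimate. This example assumes that "attributable risk" is read as the population attributable fraction and uses Levin's formula, PAF = p(RR - 1) / (1 + p(RR - 1)); the slides do not specify which measure is meant, and the sub-cohort and relative risk below are hypothetical.

```python
def exposure_prevalence(subcohort_exposed_flags):
    """Proportion exposed in a random baseline sub-cohort (booleans)."""
    flags = list(subcohort_exposed_flags)
    return sum(flags) / len(flags)


def population_attributable_fraction(prevalence, relative_risk):
    """Levin's formula: p(RR - 1) / (1 + p(RR - 1))."""
    excess = prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)


# Hypothetical sub-cohort of 100 persons, 30 of them exposed at baseline.
subcohort = [True] * 30 + [False] * 70
p = exposure_prevalence(subcohort)

# Hypothetical relative risk estimate of 2.0 from the case-cohort analysis.
paf = population_attributable_fraction(p, relative_risk=2.0)
print(f"exposure prevalence = {p:.2f}, attributable fraction = {paf:.3f}")
```

A nested case-control sample does not give this prevalence directly, because its controls are matched to cases on time rather than drawn at random from the baseline cohort.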

Case-Cohort Studies (3) • No necessity to screen out “silent” cases of disease from the control group • Same sub-cohort can be used for future period of extended cohort follow-up • Gives unbiased estimate of relative risk

Choosing a Study Design • What has already been done? • If no research, a rapid and inexpensive ecological study may be useful • If several case-control studies have already been done, what would yours contribute? • Is it worth repeating a cohort study that has been done in one population in a different population (eg, in women rather than in men)?

Choosing a Study Design (2) • Cohort study decisions • Need to represent a larger population? • Not necessarily relevant to biological question of relative disease risk in exposed and unexposed • May be important to generalizing findings • Larger cohort versus longer follow-up • If disease rate is constant, the same number of outcome events can be obtained with more subjects rather than more follow-up • Shorter follow-up limits potential usefulness of cohort to examine other research questions • Shorter follow-up desirable if rapid answer to research question is a high priority
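
The trade-off between cohort size and follow-up length can be shown with simple arithmetic. This sketch assumes a constant disease rate and ignores losses to follow-up, competing deaths, and depletion of the at-risk group; the rate and cohort sizes are hypothetical.

```python
import math


def expected_events(rate_per_person_year, n_subjects, years_of_follow_up):
    """Approximate expected number of events under a constant rate."""
    return rate_per_person_year * n_subjects * years_of_follow_up


rate = 0.005  # hypothetical: 5 cases per 1,000 person-years

larger_cohort = expected_events(rate, n_subjects=2000, years_of_follow_up=5)
longer_follow_up = expected_events(rate, n_subjects=1000, years_of_follow_up=10)

print(larger_cohort, longer_follow_up)
print(math.isclose(larger_cohort, longer_follow_up))
```

Both designs accumulate 10,000 person-years and therefore about the same number of events, so the choice rests on the other considerations listed above.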

Choosing a Study Design: Case-cohort versus nested case-control • Nested case-control somewhat more statistically efficient in cohorts with long follow-up and substantial censoring • Analysis is more familiar and available for nested case-control • Power of nested case-control requires only estimate of number of cases and controls; case-cohort requires information on whole cohort and dropout rates

Choosing a Study Design: Case-cohort versus nested case-control (2) • Case-cohort can use same controls for multiple disease outcomes • Case-cohort allows direct modeling of disease incidence in exposed and unexposed • Case-cohort allows multiple time scales (age, calendar time); nested case-control only one • Nested case-control allows more efficient collection of time dependent exposures

Choosing a Study Design: Case-cohort versus nested case-control (3) • Case-cohort can use same controls for a future period of additional cohort follow-up • Case-cohort can use controls for other purposes (such as monitoring compliance) • Controls can be selected more rapidly in case-cohort; nested case-control may require control selection at end of study for late cases
