EPI 5240: Introduction to Epidemiology

Case-control Studies

November 30, 2009

Dr. N. Birkett,

Department of Epidemiology & Community Medicine,

University of Ottawa

Case-control studies
  • More correctly called: case-referent studies.
  • Compare a group of cases to a referent group which reflects the exposure experience of the underlying population which gave rise to the cases.
  • Cases
    • Prevalent
    • Incident
  • Controls
    • Prevalent
    • Incident
    • Density sampled
Case-control (1)
  • Key feature is that subjects in the ‘case’ group are selected after they have developed the outcome of interest.
    • Interviews are done after the fact
      • Limit the potential for some measures (e.g. biomarkers, psychological state)
      • Subject to biases
  • Need a comparison group (control group or reference group)
    • Choosing a suitable group is a major challenge.
Case-control (1A)
  • Can be done prospectively or retrospectively
    • Prospective: Start recruiting cases on day study starts as new cases of the outcome are diagnosed
      • Slows down the study but yields better data
    • Retrospective: choose a date prior to the start of the study and recruit newly diagnosed cases
      • Faster but limits interview options
      • Some cases will have died before they can be interviewed
  • Do not select prevalent cases (those alive on a target date)
    • Strong potential for bias.
Case-control studies (2)
  • Some ‘names’ or labels
    • Case-control
    • Case-cohort
    • Case-base
    • Case-only
    • Case-crossover
Case-control studies (3)
  • Situations where case-control designs are used
    • Exposure data are difficult or expensive to obtain
      • Nested case-control
      • Case-cohort
    • Disease is rare
    • Disease has long induction and latent period
    • Little is known about the disease
    • Underlying population is dynamic
Advantages of Case-Control Studies
  • Relatively quick and cheap (not always; depends on the design used)
  • Appropriate for studying rare outcomes.
  • Require a smaller number of subjects than cohort study (assuming you can find enough cases)
  • Allow study of multiple potential exposure factors in the same study
Disadvantages of Case-Control Studies
  • Cannot determine incidence directly (except in special circumstances).
  • Not appropriate for studying rare exposures.
  • Higher risk of biases in exposure estimation, etc.
  • Selection of appropriate comparison group can be hard.
  • They have a bad reputation
  • Complex design and methodological features
Case-control (4)
  • So far, we’ve discussed the traditional view of case-control studies
    • Select a group of people with the outcome (cases)
    • Select a group of people to whom to compare the cases (controls)
    • Compare exposure profiles in the two groups.
    • No clear rule to logically link the two groups
      • Originally, people thought this didn’t matter
    • Sometimes thought of as a backwards cohort study
      • TROHOC
    • Logic is ‘backwards’
      • From effect → cause
Case-control (5)
  • The ‘Modern view’
    • An alternate view of doing a cohort study which
      • Studies only a sample of the members of the cohort who do not get the outcome
      • Provides a logical link between the two study groups
        • Cases and referents
      • Is more efficient
Case-control (6)
  • Suppose we wanted to do a cohort study to find out if DDE exposure increases the risk of breast cancer in women.
  • Recruit 100,000 women without breast cancer and follow them for 20 years.
  • Collect a blood sample at baseline to determine DDE exposure level.
  • Analyze the blood samples of all 100,000 women to generate this table of results (count data)
Case-control (7)

DDE level    BC+      BC-       Total
High           500    14,500     15,000
Low          1,500    83,500     85,000
Total        2,000    98,000    100,000

CIR = (500 / 15,000) / (1,500 / 85,000) = 1.88

OR = 1.92 (1.73 - 2.13)

COST: $500/DDE sample.

TOTAL Cost = $500 * 100,000 = $50,000,000

BC+ Cost = $500 * 2,000 = $1,000,000

BC- Cost = $500 * 98,000 = $49,000,000
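
The figures on this slide can be reproduced in a few lines of Python. This is a minimal sketch: the helper name `odds_ratio_ci` is hypothetical, and the Woolf (log-based) interval is an assumption about the method, chosen because it reproduces the limits shown.

```python
import math

# Cohort counts from the slide: breast cancer cases and non-cases by DDE level.
high_bc, high_no_bc = 500, 14_500
low_bc, low_no_bc = 1_500, 83_500


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-based) 95% confidence interval.

    a, b = exposed cases, exposed non-cases
    c, d = unexposed cases, unexposed non-cases
    """
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper


# Cumulative incidence ratio: risk in high-DDE women / risk in low-DDE women.
cir = (high_bc / (high_bc + high_no_bc)) / (low_bc / (low_bc + low_no_bc))

or_, lo, hi = odds_ratio_ci(high_bc, high_no_bc, low_bc, low_no_bc)

# Lab cost if every blood sample in the cohort is analysed.
total_cost = 500 * (high_bc + high_no_bc + low_bc + low_no_bc)

print(f"CIR = {cir:.3f}")
print(f"OR = {or_:.2f} ({lo:.2f} - {hi:.2f})")
print(f"Total cost = ${total_cost:,}")
```

The CIR evaluates to about 1.889, which the slide reports as 1.88.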


Case-control (8)
  • 98% of cost is going to study women who didn’t get breast cancer. Do we really need 98,000 of them?
  • Suppose we could reduce the number of BC negative women to 4,000.
    • Then cost would be only $3,000,000 (rather than $50,000,000).
  • Can’t do this at the start of the cohort because we don’t know which women will develop breast cancer.
  • BUT, if we wait until the end to do our lab studies, we will know which women developed breast cancer.
  • Select a random sample of 4,000 BC negative women
    • Keep all 2000 BC positive women.
  • Now, only need to do lab tests of 6,000 women.
Case-control (9)

Select 4,000 of the 98,000 BC negative women and generate this table

DDE level    BC+      BC- (sampled)    Total
High           500      592            (1,092)
Low          1,500    3,408            (4,908)
Total        2,000    4,000             6,000

Sampled Study

OR = 1.92 (1.68 - 2.19)

Cost = $3,000,000

Full study

OR = 1.92 (1.73 - 2.13)

Cost = $50,000,000

Can’t compute the CIR from this study: the ratio of row proportions, (500 / 1,092) / (1,500 / 4,908) = 1.50, is not the true CIR (1.50 ≠ 1.88)
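
A short sketch of the sampling step. It assumes the 4,000 sampled controls split by DDE level exactly in proportion to the 98,000 BC negative women; a real random sample would vary around these counts.

```python
# Full-cohort counts from the earlier slide.
cases_high, cases_low = 500, 1_500
noncases_high, noncases_low = 14_500, 83_500

n_controls = 4_000
sampling_fraction = n_controls / (noncases_high + noncases_low)

# Expected control counts if the sample mirrors the BC negative women.
controls_high = round(noncases_high * sampling_fraction)
controls_low = n_controls - controls_high

# The odds ratio survives sampling because the sampling fraction cancels.
or_sampled = (cases_high * controls_low) / (cases_low * controls_high)
or_full = (cases_high * noncases_low) / (cases_low * noncases_high)

# Treating the sampled table as if it were a cohort gives a misleading "risk ratio".
pseudo_cir = (cases_high / (cases_high + controls_high)) / (
    cases_low / (cases_low + controls_low)
)

# Only the cases and the sampled controls need a $500 lab test.
lab_cost = 500 * (cases_high + cases_low + n_controls)

print(f"controls: high={controls_high}, low={controls_low}")
print(f"OR sampled = {or_sampled:.2f}, OR full = {or_full:.2f}")
print(f"pseudo-CIR = {pseudo_cir:.2f}")
print(f"lab cost = ${lab_cost:,}")
```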

Case-control (10)
  • This is a NESTED CASE-CONTROL study
  • The basis for the modern framing of the case-control study.
  • Comparison group is selected from the people who belong to the cohort
    • They were candidates to be ‘cases’, had they developed the outcome.
  • Controls were selected from people who remained disease-free throughout follow-up
    • Prevalent controls
    • Not the best but close to the traditional case-control approach.
  • How else could we get a comparison group?
    • Select a random sample of the entire cohort!
    • Yes, some people might be included twice in study.
Case-control (11)

Select 4,000 of the 100,000 cohort members to generate this table

DDE level    BC+      Cohort sample    Total
High           500      600            (1,100)
Low          1,500    3,400            (4,900)
Total        2,000    4,000             6,000

Sampled Study

OR = 1.88

Cost = $3,000,000

Full study

CIR = 1.88

Cost = $50,000,000

CASE-COHORT Study Design
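
The case-cohort numbers can be checked the same way. This sketch assumes the 4,000 referents split by DDE level exactly in proportion to the full cohort (the expected result of a simple random sample).

```python
# Full-cohort counts: cases and cohort totals by DDE level.
cases_high, cases_low = 500, 1_500
total_high, total_low = 15_000, 85_000

n_referents = 4_000
fraction = n_referents / (total_high + total_low)

# Referents are drawn from the whole cohort, so some of them may also be cases.
ref_high = round(total_high * fraction)
ref_low = round(total_low * fraction)

# Odds ratio from the case-cohort table versus the CIR from the full cohort.
or_case_cohort = (cases_high * ref_low) / (cases_low * ref_high)
cir_full = (cases_high / total_high) / (cases_low / total_low)

print(f"referents: high={ref_high}, low={ref_low}")
print(f"case-cohort OR = {or_case_cohort:.4f}")
print(f"full-cohort CIR = {cir_full:.4f}")
```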

Case-control (12)
  • CASE-COHORT study (or case-base)
    • Select the reference group from all people in the cohort
    • If someone is selected for the reference group and then gets the outcome, they remain in the study twice
      • Once as case
      • Once as referent
    • The OR from a case-cohort study is algebraically identical to the CIR from the underlying cohort study
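
Why the identity holds can be sketched in one line. The notation is an assumption, not taken from the slides: $A_1$ and $A_0$ are the numbers of exposed and unexposed cases, $N_1$ and $N_0$ the numbers of exposed and unexposed cohort members, and $f$ the fraction of the cohort sampled as referents, so the expected referent counts are $fN_1$ and $fN_0$.

```latex
\mathrm{OR}_{\text{case-cohort}}
  = \frac{A_1 \cdot f N_0}{A_0 \cdot f N_1}
  = \frac{A_1 / N_1}{A_0 / N_0}
  = \mathrm{CIR}
```

The sampling fraction cancels, so the result does not depend on how many referents are drawn. It holds exactly only when the referent sample has the expected composition; a real random sample reproduces it on average.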
Case-control (13)
  • Third method of selecting referent group
    • During the 20 years of follow-up, every time a case occurs, select a referent group member
      • Candidates for this selection are all people who are in the cohort and are outcome-free and still under follow-up
        • The ‘RISK SET’
    • This is density sampling.
    • The OR from this design is identical to the Rate Ratio from the underlying cohort study.



Case-control (14)
  • Modern case-referent study
    • Linked to an underlying cohort study
      • May exist (primary base)
      • May be conceptual (secondary base)
    • Comparison group is selected to represent the exposure experience of the underlying cohort.
      • Provides a basis to decide if the referent group is any good
      • Can select controls from people free of the outcome at the
        • Start
        • End
        • Throughout the study
Key Design Points
  • Selecting the cases
  • Selecting the controls
  • Determining exposure status
  • Sample size and power.
Study Base (1)
  • The set of persons or person-time in which disease subjects become cases.
    • The members of the source population
  • Primary base
    • Investigator defines the population experience of interest (e.g. the 1st example)
      • Closed vs. dynamic
  • Secondary base
    • Defined implicitly as the population which gave rise to the cases
      • All people who would have been diagnosed at the Ottawa hospital if they had got the disease under study
Study Base (2)
  • Cases should be exclusively people in the base
    • All or a random sample
  • Controls (referents) estimate the exposure experience of the base
  • Primary base
    • Main challenge is complete case ascertainment
  • Secondary base
    • Main challenge is definition of study base and control selection
Selecting the Cases
  • Incident vs. prevalent cases
    • Incident cases are preferred
    • Can be hard to establish ‘point of onset’.
      • Chronic disease
      • Sub-clinical phase
      • Latency periods
Defining a Case (1)
  • Existing entity
    • Severity (mild vs. severe)
    • Disease heterogeneity
    • Criteria to establish diagnosis (e.g. Rheumatoid Arthritis)
Defining a Case (2)
  • Existing entity
    • Severity (mild vs. severe)
    • Disease heterogeneity
    • Criteria to establish diagnosis (e.g. Rheumatoid Arthritis)
    • Incubation period
    • Subjective vs. objective criteria
Defining a Case (3)
  • New disease
    • No clear guidelines
    • Depends on clinical insights and formation of homogeneous groups
    • AIDS/HIV initial case definition limited to homosexual men
      • efficient design to find cases
      • limited etiological focus to lifestyle issues vs. infection
Identifying a Case (1)
  • Goal is to identify all cases meeting criteria. Ideally, population based (Primary base). Could be hospital/clinic/etc based (Secondary base)
  • All true cases should have equal probability of being chosen.
  • Text states that complete ascertainment from base is not needed
    • True, but only if you can define the base population so you can:
      • Select a random sample of cases from the base
    • Selecting a convenience sample is not OK in most cases, especially when the proportion of selected cases is low.
Identifying a Case (1A)
  • Selection Biases
    • Berkson's bias
    • Neyman fallacy (prevalence-incidence bias)
    • Detection bias
Identifying a Case (2)
  • Sources for Cases
    • Death Certificates
    • Registries
    • Hospital/clinic lists
    • Pathology records
    • Advertising
Selection of Controls (1)

‘Without controls, there can be no case-control studies but with the wrong controls, there can only be regrettable case-control studies.’


Selection of Controls (2)


  • The control group should represent the exposure experience of the subjects (cohort) which gave rise to the case group.
  • Very hard to achieve this goal when using a secondary base approach.
Selection of Controls (3)

General Control Selection Methods

  • Survivor Sampling
    • Only subjects who are disease free at the end of the cohort are eligible.
  • Base sampling
    • All subjects at the start of the cohort are eligible
  • Risk set sampling
    • Controls are selected throughout follow-up/recruitment from those who are disease-free and under follow-up
  • A subject can be both a case and a control
Selection of Controls (4)
  • Wacholder et al. list four key principles of control selection:
    • The study base principle
    • Deconfounding principle
    • Comparable accuracy principle
    • Efficiency
Selection of Controls (5)

Study Base Principle

  • Primary base: pre-defined group (population experience) which is to be studied. Cases are derived only from people in the 'experience'
    • major challenge is complete case ascertainment. Can be infeasible for mild outcomes (e.g. male infertility)
      • Can ascertain cases through clinics, etc. if they capture all cases in ‘cohort’
    • Easier to select a valid control group
    • Commonly a population-based study
Selection of Controls (6)

Study Base Principle

  • Secondary base. Defined implicitly as the 'group of people who would have become study cases if they had acquired the outcome during the course of the study'.
    • Hard to define to avoid selection bias problems.
    • referral filters
    • Usually a hospital or clinic based study
      • Cases can come from a wide geographic area without complete coverage
Selection of Controls (7)

Selection of Controls from Study Base

  • Usually use simple random sample but can be more complex (e.g. 2 stage sample)
  • Controls need to be representative of the base population not of the general population
  • Exclusions applied to both cases and controls are fine. Those applied only to controls (or only to cases) produce bias.
    • BAD:
      • exclude controls with dementia (can’t get exposure info)
      • Keep cases with dementia (since you can get exposure info from the hospital chart).
  • External controls can be OK.
Selection of Controls (8)

Deconfounding Principle

  • Measured confounders can be controlled in the analysis.
  • Can select controls to control unmeasured confounders (e.g. neighbourhood controls or sibling controls).
  • Can impact on study efficiency.
Selection of Controls (9)

Comparable Accuracy Principle

  • Aim is to produce non-differential misclassification
  • Try and collect information from cases and controls in the same manner
    • Using clinic charts for cases and personal interview for controls would be a problem.
    • Dead cases.
      • Don't select dead controls
      • Use proxies but using proxies for controls doesn’t work
    • Unavailable cases.
      • Use proxies.
Sources of Controls (1)
  • Population Controls
    • Main method used in study with primary base.
    • Roster based selection (very limited options in 2009)
      • Census
      • Property taxation rolls
      • Medical insurance files
      • Driver’s licence files
    • Random Digit Dialling
      • Main method used at present due to privacy restrictions
    • Neighbourhood controls
Sources of Controls (2)
  • Population Controls (cont)
    • Advantages
      • Same study base as cases.
      • Easier to include exclusion criteria
      • Permits extrapolation to base to produce estimates of risk.
    • Disadvantages
      • Problematic if case ascertainment is incomplete.
      • Inconvenient
      • Recall bias
      • Motivation
Sources of Controls (3)
  • Hospital/registry Controls
    • Commonly used with secondary base.
    • Apply all eligibility rules to both the cases and controls
    • Condition used to define control group MUST not be related to exposure
      • Don’t use COPD controls in a study of smoking and lung cancer
    • Often choose more than one condition
Sources of Controls (4)
  • Hospital/registry Controls (cont)
    • Advantages
      • Useful if a large number of potential cases don’t get recruited (e.g. due to distance from study).
      • Comparable quality of information.
      • Convenience/cost
    • Disadvantages
      • Controls often have different catchment area from cases.
      • Berkson’s bias
Sources of Controls (5)
  • Medical Practice Controls
    • May be a good match for secondary base referral patterns; BUT
    • Exposure profile may differ from true base due to selection effects of interventions by HCPs.
Sources of Controls (6)
  • Friend Controls
    • Ask for more than one friend and choose study participant at random
    • Serious potential biases.
    • differential prob. of selection
    • selection bias (friendly control bias)
    • overmatching
    • Enemy controls 
Sources of Controls (7)
  • Relative Controls
    • Useful for gene-environment interactions
    • Violates study-base principle.
    • Can over-control for environmental factors
Determining Exposure Status (1)
  • Determine the etiologically relevant time period.
  • Can we measure exposure at that time?
    • Is the information potentially available?
      • Psychological state 5 years ago
      • Life-style factors
      • Occupational exposure
    • Potential impact of outcome on exposure level
      • Confounding by indication
Determining Exposure Status (2)
  • Sources of Exposure Information
    • Questionnaires
      • Face-to-face
      • Telephone
      • Self-administered
      • Internet
    • Pre-existing records
      • Medical
      • Regulatory
    • Biomarkers
Determining Exposure Status (3)
  • Interviewer bias
    • Use the same method in cases and controls.
    • blinding
  • Recall Bias
    • Hard to prevent/control
    • Blind subjects to objectives
    • Include ‘lie detection’ questions
    • Use validated questionnaires
    • Use alternative sources of exposure information (e.g. records, biomarkers)
Other Control Issues
  • # of control groups
    • Use one good group unless strong reason not to do so
  • Equal opportunity for exposure is NOT required.
    • Case-control study of lung cancer and smoking should not exclude control subjects with religious beliefs which preclude smoking.
Sample Size and Power
  • Standard method of estimation is based on Chi-square test of proportions.
    • Adjustment methods are available for unequal group sizes.
    • Convert target OR into proportions.
      • Want to be able to detect OR = 2.
      • Assume exposure prevalence in controls is 10% (estimated from general population level).
        • For OR=2, prevalence in case group would be 18%
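
The conversion from a target OR to exposure proportions, and one standard two-proportion sample size approximation, can be sketched as follows. The function names are hypothetical, and the two-sided alpha of 0.05 and power of 80% are assumptions that the slide does not state.

```python
from math import ceil, sqrt
from statistics import NormalDist


def case_exposure_prevalence(p0, odds_ratio):
    """Exposure prevalence among cases implied by control prevalence p0 and the target OR."""
    return odds_ratio * p0 / (1 + p0 * (odds_ratio - 1))


def n_per_group(p0, p1, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided comparison of two proportions (equal groups)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p0 + p1) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p0 * (1 - p0) + p1 * (1 - p1))
    ) ** 2
    return ceil(numerator / (p1 - p0) ** 2)


p0 = 0.10  # exposure prevalence in controls (from the slide)
p1 = case_exposure_prevalence(p0, odds_ratio=2)

print(f"exposure prevalence in cases = {p1:.3f}")
print(f"approximate n per group = {n_per_group(p0, p1)}")
```

The first function reproduces the 18% on the slide: 2 × 0.10 / (1 + 0.10 × (2 − 1)) = 0.20 / 1.10 ≈ 0.18.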
Sample Size and Power
  • May need large sample size, especially if studying interactions
    • 2,000 per group minimum for GxE studies.
  • Multiple controls per case
    • Useful mainly when controls are more available than cases or are cheaper to study
    • 4 controls per case is the largest useful ratio.
    • Not required to have more controls than cases.
    • Use when methods require it.
  • Case-control studies are powerful but require careful design.
  • Strong utility if nested within cohort study.
    • Expands potential of cohort study
  • Key issue is control group selection
    • Must reflect the exposure experience of the cohort giving rise to the cases.
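
The point about 4 controls per case being the largest useful ratio can be illustrated with a commonly quoted approximation: with r controls per case, statistical efficiency relative to an unlimited number of controls is roughly r / (r + 1). This formula is not on the slides and is an assumption here; it applies when the OR is near 1.

```python
def relative_efficiency(controls_per_case):
    """Approximate efficiency relative to unlimited controls: r / (r + 1)."""
    r = controls_per_case
    return r / (r + 1)


# Each extra control per case buys less than the one before it.
for r in range(1, 7):
    print(f"{r} control(s) per case: efficiency = {relative_efficiency(r):.3f}")
```

Going from 1 to 4 controls per case raises the approximate efficiency from 0.5 to 0.8, while going from 4 to 5 adds only about 0.03.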
EPI 5240: Introduction to Epidemiology

MATCHING

November 30, 2009

Dr. N. Birkett,

Department of Epidemiology & Community Medicine,

University of Ottawa

Confounding (5)

How do we deal with confounding?


You need to ‘break’ one of the links between the confounder and the exposure or outcome

‘Treatment’ (analysis)

Stratified analysis (like my simple example)

Standardization (we’ll discuss this later)

Regression modeling methods (covered in a different course)


Confounding (6)



Randomization: one of the big advantages of an RCT


Restriction: limits the subjects to one level of confounder (e.g. study effect of alcohol on mouth cancer ONLY in non-smokers)


Matching: ensures that the distribution of the exposure is the same for all levels of confounder


Confounding (10)


Matching: the process of making a study group and a comparison group comparable with respect to some extraneous factor.

Breaks the confounder/exposure link

Most often used in case-control studies.

Usually can’t match on more than 3-4 factors in one study

Minimum # of matching groups: 2x2x2x2 = 16

Let’s talk more about matching


Matching (1)

Example study (case-control)

Identify 200 cases of mouth cancer from a local hospital.

As each new case is found, do a preliminary interview to determine their smoking status.

Identify a non-case who has the same smoking status as the case

If there are 150 cases who smoke, there will also be 150 controls who smoke.

Matching (1a)

                 Outcome status
Smoking          Case    Control
+ve              150     150
-ve               50      50

OR = (150 * 50) / (50 * 150) = 1.0

Implies no smoking/outcome link and no confounding

Except, this isn’t quite the requirement to eliminate confounding (since it is in the whole sample and not in the unexposed).


Matching (2)

Two main types of matching

Individual (pair)

Matches subjects as individuals


Right/left eye

Frequency

Ensures that the distribution of the matching variable in cases and controls is similar but does not match individual people.

Matching (3)

Matching by itself does not fully eliminate confounding in a case-control study!

You must use analytic methods as well

Matched OR

Stratified analyses

Logistic regression models

In a cohort study, you don’t have to use these methods although they can help.

But, matching in cohort studies is uncommon

Matching (4)


Advantages of matching:

Strengthens statistical analysis, especially when the number of cases is small.

Increases study credibility for ‘naive’ readers.

Useful when confounder is a complex, nominal variable (e.g. occupation).

Standard statistical methods can be problematic, especially if many levels have very few subjects.

Matching (5)


Disadvantages of matching:

You cannot study the relationship of the matched variable to the outcome.

Can be costly and time consuming to find matches, especially if you have many matching factors.

Often, some important predictors cannot be matched since you have no information on their level in potential controls before doing interviews/lab tests



If matching factor is not a confounder, can reduce precision and power.

Matching (6)

Individual matching

My personal view: In many apparent cases of individual matching, that isn’t what is going on.

Most useful when there is a strong ‘natural’ pairing.


Body parts

Analysis uses McNemar method to estimate OR (and to do a chi-square test).

Unit of analysis is the pair.

Matching (7)

With pair matching, there are four possible ‘outcomes’:

Case       Control    # of pairs
exp        exp        a
exp        not exp    b
not exp    exp        c
not exp    not exp    d

Usually present this info as a 2x2 table

Matching (8)

626 pairs of subjects

201 pairs where both case and control were exposed

80 pairs where only case was exposed

43 pairs where only control was exposed

302 pairs where neither case nor control was exposed

                    Control member
Case member         +ve     -ve
+ve                 201      80
-ve                  43     302



Matching (9)

If exposure causes disease, there should be more pairs with only the case exposed than pairs with only the control exposed.

McNemar OR = 80/43 = 1.86

Ignoring matching would give OR=1.28

                    Control member
Case member         +ve     -ve
+ve                 201      80
-ve                  43     302
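
Both odds ratios on this slide can be recomputed from the pair counts. A minimal sketch; the variable names are illustrative, and the chi-square statistic is the uncorrected McNemar form, which is an assumption about which variant is intended.

```python
# Pair counts from the slide.
both_exposed = 201   # a: case and control both exposed
case_only = 80       # b: only the case exposed
control_only = 43    # c: only the control exposed
neither = 302        # d: neither exposed

# Matched (McNemar) odds ratio uses only the discordant pairs.
mcnemar_or = case_only / control_only

# McNemar chi-square statistic (1 df, no continuity correction).
mcnemar_chi2 = (case_only - control_only) ** 2 / (case_only + control_only)

# Breaking the pairs gives an ordinary 2x2 table of individuals.
cases_exposed = both_exposed + case_only
cases_unexposed = control_only + neither
controls_exposed = both_exposed + control_only
controls_unexposed = case_only + neither
unmatched_or = (cases_exposed * controls_unexposed) / (
    cases_unexposed * controls_exposed
)

print(f"McNemar OR = {mcnemar_or:.2f}")
print(f"McNemar chi-square = {mcnemar_chi2:.2f}")
print(f"Unmatched OR = {unmatched_or:.2f}")
```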



Matching (10)

McNemar OR = b/c

‘a’ and ‘d’ pairs contribute no information on OR (wasteful of interviews).

Make sure table is set up correctly!!

More sophisticated analysis uses conditional logistic regression modeling (another course).

                    Control member
Case member         +ve     -ve
+ve                 a       b
-ve                 c       d




Matching (11)

McNemar OR is the same as the Mantel-Haenszel OR!

Let each matched pair be a stratum.

Apply MH method to these tables.

Let’s look at each of the four possible tables in turn


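Matching (12)

The MH OR formula is:

$$\mathrm{OR}_{MH} = \frac{\sum_i a_i d_i / n_i}{\sum_i b_i c_i / n_i}$$

where, in stratum $i$, $a_i$ = exposed cases, $b_i$ = exposed controls, $c_i$ = unexposed cases, $d_i$ = unexposed controls and $n_i$ = total subjects.

Here, each stratified table has only 2 subjects so the ‘ni’s’ cancel out to leave:

$$\mathrm{OR}_{MH} = \frac{\sum_i a_i d_i}{\sum_i b_i c_i}$$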


Matching (13)

Type 1 table (both exposed):

          Case   Cont   Total
Exp +     1      1      2
Exp -     0      0      0
Total     1      1      2

The contribution to the numerator of the MH estimate is: 1 * 0/2 = 0

The contribution to the denominator of the MH estimate is: 0 * 1/2 = 0

Number of these tables: a


Matching (14)

Type 2 table (case exposed, control not):

          Case   Cont   Total
Exp +     1      0      1
Exp -     0      1      1
Total     1      1      2

The contribution to the numerator of the MH estimate is: 1 * 1/2 = 0.5

The contribution to the denominator of the MH estimate is: 0 * 0/2 = 0

Number of these tables: b


Matching (15)

Type 3 table (case not exposed, control is):

          Case   Cont   Total
Exp +     0      1      1
Exp -     1      0      1
Total     1      1      2

The contribution to the numerator of the MH estimate is: 0 * 0/2 = 0

The contribution to the denominator of the MH estimate is: 1 * 1/2 = 0.5

Number of these tables: c


Matching (16)

Type 4 table (neither exposed):

          Case   Cont   Total
Exp +     0      0      0
Exp -     1      1      2
Total     1      1      2

The contribution to the numerator of the MH estimate is: 0 * 1/2 = 0

The contribution to the denominator of the MH estimate is: 1 * 0/2 = 0

Number of these tables: d


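Matching (17)

So, what is the MH OR from this method:

$$\mathrm{OR}_{MH} = \frac{(a \times 0) + (b \times 0.5) + (c \times 0) + (d \times 0)}{(a \times 0) + (b \times 0) + (c \times 0.5) + (d \times 0)} = \frac{b}{c}$$

  • Approach can be extended to triplets, etc.
  • Leads to conditional logistic regression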
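
The equivalence can be checked numerically by treating every matched pair as its own stratum. A minimal sketch, using the pair counts from the Matching (8) example; the function name `mantel_haenszel_or` and the tuple layout are illustrative choices.

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel OR over strata given as (a, b, c, d) tuples.

    a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    """
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# The four possible one-pair tables.
BOTH_EXPOSED = (1, 1, 0, 0)
CASE_ONLY = (1, 0, 0, 1)
CONTROL_ONLY = (0, 1, 1, 0)
NEITHER = (0, 0, 1, 1)

# One stratum per matched pair: 201, 80, 43 and 302 pairs of each type.
strata = (
    [BOTH_EXPOSED] * 201
    + [CASE_ONLY] * 80
    + [CONTROL_ONLY] * 43
    + [NEITHER] * 302
)

mh_or = mantel_haenszel_or(strata)
print(f"MH OR = {mh_or:.4f}, b/c = {80 / 43:.4f}")
```

Only the discordant pairs add anything to either sum, which is why the concordant pairs carry no information about the OR.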


Matching (18)

Frequency matching

Most commonly used method

Many ways to implement this. Here’s one:

Case-control study of prostate cancer.

Cases will include all new cases in Ottawa in one year.

Based on cancer registry data, we know what the age distribution of cases will be.

Controls selected at random from the population.

We use the projected distribution of age in the cases to describe how many controls we need in each age group.


Matching (19)

400 cases & 400 controls

5% of cases are under age 60

I want 5% of my controls to be under 60

400 * 0.05 = 20

Similar for other age groups
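
The same allocation for every age group can be sketched as below. Only the under-60 share (5%) and the 75-84 share (about 30%) come from the slides; the other age bands and their shares are hypothetical placeholders.

```python
n_controls = 400

# Projected share of cases in each age group.
# "<60" and "75-84" are from the slides; the remaining values are hypothetical.
case_age_distribution = {
    "<60": 0.05,
    "60-64": 0.10,
    "65-74": 0.40,
    "75-84": 0.30,
    "85+": 0.15,
}

# Controls to recruit per age group so that controls mirror the cases.
controls_needed = {
    age_group: round(n_controls * share)
    for age_group, share in case_age_distribution.items()
}

for age_group, n in controls_needed.items():
    print(f"{age_group:>6}: {n} controls")
```

With other shares, rounding can make the group totals drift from the target by a control or two, so the sum is worth checking before recruitment starts.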


Matching (20)

Frequency matching (cont)

Do you distribute the control recruitment throughout the case recruitment period?

Having too many matching groups is a problem

How do I find the matching controls?

Only 4% of the population is age 75-84 but about 30% of my cases are in this group. How do I efficiently over-sample this age group?

Lack of control selection lists in Canada

Mandates use of Random Digit Dialing (RDD) methods.


Matching (21)

Frequency matching (cont)

Analysis must stratify by matching groups or strata

If matched on combination of variables (e.g. age/sex), analysis must include the combinations

Stratify by matching variables and do MH test

Do a logistic regression with the matching factors included as covariates.