Bias in Studies of the Human Genome

Thomas A. Pearson, MD, PhD

University of Rochester

School of Medicine

Visiting Scientist, NHGRI



Lecture 6: Bias in Studies of the Human Genome

1. Consider the causes of heterogeneity of results in gene association studies.

2. Review the types and sources of bias relevant to human genomic research.

3. Provide examples from genome-wide association studies to illustrate biases or potential for bias.

4. Identify strategies in study design, data collection, statistical analysis, and interpretation which could prevent or minimize bias in human genome research.



Larson, G. The Complete Far Side. 2003.



PLoS Med. 2005 Aug;2(8):e124.


WSJ. 2004 Sep 14.


Only 6/600 Gene-Disease Associations Significant in >75% of Studies

Hirschhorn J et al, Genet Med 2002; 4:45-61.



Possible Explanations of Heterogeneity of Results in Genetic Association Studies

  • Biologic mechanisms

    • Genetic heterogeneity

    • Gene-gene interactions

    • Gene-environment interactions

  • Spurious mechanisms

    • Inadequacies of genomic markers

    • Type 1 error

    • Limited sample sizes and power

    • Cohort, age, period (secular) effects

    • Bias



Definition of Bias in Human Research

  • Sackett (1975): “Any process at any stage of inference which tends to produce results or conclusions that differ systematically from the truth.”

  • Gordis (2004): “Any systematic error in the design, conduct, or analysis of a study that results in a mistaken estimate of an exposure’s effect on risk or disease.”



Effects of Bias on GWAS Results

  • False negatives

  • False positives

  • Inaccurate effect sizes

    • Underestimates

    • Overestimates



Larson, G. The Complete Far Side. 2003.



Types of Bias in Genome Association Studies

  • Selection of cases and controls

  • Information on genotype or phenotype

  • Analysis and presentation of results

  • Interpretation of results



20 Types of Biases Potentially Encountered in GWAS

  • Common to all human observational studies (N=12)

  • Unique or common in GWAS (N=8)

    • Supercase or supercontrol biases

    • Latent case bias

    • Population stratification

    • Hardy-Weinberg disequilibrium

    • Genotyping quality bias

    • Transmission disequilibrium bias

    • Winner’s Curse


Systematic Review of GWAS: NHGRI Catalog of GWAS in Print*

  • 109 studies published from March 2005 to March 2008

  • Genotyping platforms with density > 100,000 SNPs

  • Each study reviewed for:

    • Study design

    • Description of case and comparison groups

    • Collection of genotype and other risk factors

    • Presentation of study results

    • Interpretation of study results

      *http://www.genome.gov/gwastudies/


Characteristics of 109 GWAS

  • Phenotypes

    • Discrete outcomes or traits: 91 in 83 studies

    • Quantitative traits: 40 in 26 studies

  • Design of discovery study: N (%)

    • Case-control: 77 (70.6%)

    • Trio: 4 (3.7%)

    • Nested case-control: 4 (3.7%)

    • Cross-sectional/Cohort: 24 (22.0%)


Four Key Requirements for a Bias-Free Case-Control Study

Selection Bias

  • Cases are representative of all those who develop the disease being studied.

  • Controls are representative of all those at risk of developing the disease and eligible to become cases and be included in the study.

  • Ancestral geographical origins and predominant environmental exposures of cases do not differ dramatically from controls.

Information Bias

  • Collection of risk factor and exposure information is the same for cases and controls.


Selection Biases in GWAS: Criteria for Classification

  • Misclassification bias: Absence of description or use of adequate means to define cases and/or controls.

  • Nonresponse bias: Absence of description of rates of recruitment and participation in cases and/or controls.

  • Prevalence-incidence bias: Use of prevalent cases of a disease that has sizable short-term case-fatality or remission rates.



Larson, G. The Complete Far Side. 2003.


Characteristics of 109 GWAS: Selection of Study Subjects

  • Methods of selection/recruitment frequently (30%) described in a supplement or other publication.

  • Few baseline descriptors of cases/controls

    • Tables comparing cases vs. controls: 36.0%

      • Statistical comparison of cases/controls: 3.5%

    • Participation rates (cases or controls): 9.0%

      • Comparison of participants/nonparticipants: 2.0%

  • Most cases (67%) were prevalent cases derived from clinical sources, rather than population-based or incident cases.


GWAS of Type II Diabetes in Mexican-Americans*

  • Case-control study design

    • 281 cases with diabetes defined by current diagnosis/treatment (Dx/Rx), fasting blood glucose, or 2-hour glucose tolerance test (GTT)

    • 280 persons from a random population sample whose T2DM status is unknown

  • 112,541 SNPs assayed in each person

  • 4 genes identified

  • Possible misclassification: a substantial prevalence (7-14%) of T2DM is likely among controls.

    *Hayes MG et al. Diabetes, 9/10/07.
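To make the misclassification concern concrete, the sketch below computes the allelic odds ratio when a fraction of the "controls" are actually cases. The allele frequencies are hypothetical, only the 7-14% contamination range comes from the slide, and it assumes that diabetic persons hidden among the controls carry the risk allele at the same frequency as the recruited cases.

```python
# Minimal sketch: attenuation of an allelic odds ratio when some controls are
# actually cases. All allele frequencies are hypothetical.

def allelic_or(freq_cases: float, freq_controls: float) -> float:
    """Allelic odds ratio from risk-allele frequencies in cases and controls."""
    return (freq_cases / (1 - freq_cases)) / (freq_controls / (1 - freq_controls))


def contaminated_freq(freq_cases: float, freq_true_controls: float,
                      contamination: float) -> float:
    """Risk-allele frequency in a control group in which a fraction
    `contamination` are unrecognized cases. Assumes those hidden cases carry
    the allele at the same frequency as the recruited cases."""
    return (1 - contamination) * freq_true_controls + contamination * freq_cases


if __name__ == "__main__":
    f_cases, f_controls = 0.30, 0.25          # hypothetical frequencies
    for m in (0.0, 0.07, 0.14):               # contamination range from the slide
        f_obs = contaminated_freq(f_cases, f_controls, m)
        print(f"{m:.0%} of controls are cases: OR = {allelic_or(f_cases, f_obs):.2f}")
```

With these inputs the odds ratio shrinks from about 1.29 to about 1.24: this kind of misclassification biases estimates toward the null and costs power rather than producing false positives.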


Selection Biases in GWAS: Criteria for Classification

  • Supercase bias: Use of additional criteria in case selection that increases the chance of a genetic etiology.

  • Supercontrol bias: Use of additional criteria in control selection that decreases the chance of a genetic etiology.

  • Latent case bias: Inclusion as controls of persons who could never develop the disease even if a gene carrier.


A Case-Control GWAS of Prostate Cancer*

  • Discovery Study

    • 1854 cases with symptomatic prostate cancer and diagnosis at <60 years or positive family history

    • 1894 controls with age > 50 years and PSA < 0.5 ng/ml

    • Genotyping of 541,129 SNPs

    • 11 new SNPs associated (P < 10^-6)

  • Replication Study

    • 3268 cases/3354 controls

    • Genotyping of 11 SNPs

    • 7 SNPs independently associated (P < 10^-7)

      *Eeles RA et al. Nat Genet 2/10/08


Prostate Cancer: 7 Novel SNPs in Discovery and Replication Studies

SNP | Discovery OR (95% CI) | Replication OR (95% CI)
rs2660753 | 1.52 (1.30-1.77) | 1.18 (1.06-1.31)
rs9364554 | 1.28 (1.16-1.41) | 1.17 (1.08-1.26)
rs6465657 | 1.30 (1.19-1.43) | 1.12 (1.05-1.20)
rs7920517 | 1.39 (1.27-1.53) | 1.22 (1.14-1.31)
rs10993994 | 1.62 (1.47-1.78) | 1.25 (1.17-1.34)
rs7931342 | 0.79 (0.72-0.86) | 0.84 (0.79-0.90)
rs7931342 | 1.39 (1.23-1.57) | 1.03 (0.94-1.14)

Eeles RA et al: Nat Genet 2/10/08


Latent Cases in a GWAS of Prostate Cancer*

Study | Cases | Male controls | Female controls
Discovery: Iceland | 1890 | 9312 | 12060
Replication: Netherlands | 998 | 1004 | 1017
Replication: Spain | 548 | 742 | 874
Replication: Sweden | 2893 | 1781 | -
Replication: US-Baltimore | 1545 | 576 | -
Replication: US-Chicago | 665 | 368 | 184
Replication: US-Nashville | 526 | 613 | -
Replication: US-Rochester | 1140 | 503 | -

*Gudmundsson J et al. Nat Genet 2008; 40:281-3



Selection Biases in GWAS: Criteria for Classification

  • Membership bias: Membership in a group may imply a degree of health which differs systematically from that of the general population.

  • Population Stratification: Genetic differences between cases and controls unrelated to disease but due to sampling from populations of different ancestries.

  • Phenotypic variation bias: The use of different definitions of cases or controls between discovery study and subsequent replications.


Wellcome Trust Case Control Consortium (WTCCC)*

Genotyping: 500,000 SNPs (Affymetrix)

Cases: 2000 persons from each of 7 diseases (bipolar disorder, coronary artery disease, Crohn disease, rheumatoid arthritis, T1DM, T2DM, hypertension)

Controls: 3000 persons without disease

  • 1500 from the 1958 British Birth Cohort

  • 1500 UK blood donors

*WTCCC, Nature 2007; 447:661-678.


Population Stratification*

Each population has a unique genetic and social history; ancestral patterns of migration, mating, expansions/bottlenecks, and stochastic variation all yield differences in allele frequencies between populations.

Population stratification: cases and controls have different allele frequencies that reflect differences in their populations of origin and are unrelated to the outcome. A spurious association of this kind requires both:

1) differences in disease prevalence between the source populations

2) differences in allele frequencies between the source populations

*Cardon LR, Palmer LJ, Lancet 2003



Downloaded from: StudentConsult (on 11 May 2008 06:40 PM)

© 2005 Elsevier


Population Stratification and Allelic Association

Full-heritage American Indian population

  • Gm3;5,13,14 prevalence: 1%

  • NIDDM prevalence: 40%

Caucasian population

  • Gm3;5,13,14 prevalence: 66%

  • NIDDM prevalence: 15%

OR = 0.27 (95% CI 0.18-0.40)

Cardon LR and Palmer LJ, Lancet 2003; 361:598-604, after Knowler et al 1988.
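The arithmetic behind such a spurious association can be reproduced with expected counts. The sketch below takes the carrier and NIDDM prevalences from the slide, but assumes (hypothetically) 1000 people in each population and no association between Gm3;5,13,14 and NIDDM within either population. It is not meant to reproduce the published OR of 0.27, which came from the actual study sample.

```python
# Sketch: population stratification producing a spurious marker-disease
# association. Prevalences are from the slide; population sizes and the
# absence of any within-population association are hypothetical assumptions.

def expected_table(n: float, carrier_prev: float, disease_prev: float):
    """Expected 2x2 counts (a, b, c, d) for one population in which disease
    is independent of carrier status:
    a = diseased carriers,     b = healthy carriers,
    c = diseased non-carriers, d = healthy non-carriers."""
    carriers = n * carrier_prev
    non_carriers = n - carriers
    return (carriers * disease_prev, carriers * (1 - disease_prev),
            non_carriers * disease_prev, non_carriers * (1 - disease_prev))


def odds_ratio(a: float, b: float, c: float, d: float) -> float:
    """Cross-product odds ratio of a 2x2 table."""
    return (a * d) / (b * c)


# Hypothetical sizes: 1000 people in each population.
american_indian = expected_table(1000, carrier_prev=0.01, disease_prev=0.40)
caucasian = expected_table(1000, carrier_prev=0.66, disease_prev=0.15)

# Pooling the two populations, as an unstratified case-control sample would.
pooled = tuple(x + y for x, y in zip(american_indian, caucasian))

print("OR within American Indian population:", round(odds_ratio(*american_indian), 3))
print("OR within Caucasian population:", round(odds_ratio(*caucasian), 3))
print("OR in pooled sample:", round(odds_ratio(*pooled), 3))
```

Within each population the odds ratio is 1, yet the pooled sample gives about 0.36: the marker looks "protective" only because carriers come mostly from the population with lower NIDDM prevalence. Making either the prevalences or the carrier frequencies equal across the two populations returns the pooled OR to 1, matching the two requirements listed earlier.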


Unlinked Genetic Markers in Population Stratification

  • Population stratification (or any non-random mating) allows marker-allele frequencies to vary among population segments.

  • A disease that is more prevalent in one subpopulation will be associated with any allele at high frequency in that subpopulation.

  • If population stratification exists, it can often be detected by analysis of unlinked marker loci. [Pritchard JK, Rosenberg NA; AJHG 1999; 65:220-228]


Adjusting for Population Stratification in a GWAS of T2DM*

  • Case-control study of 661 cases of T2DM and 614 controls from France.

  • 392,935 SNPs genotyped

  • SNP 200 kb from lactase gene on 2q21:

    • Strong association with T2DM

    • Strong north-south prevalence gradient in France

  • Used 20,323 SNPs not related to T2DM as measure of population stratification.

  • After adjustment for stratification, most of the association was removed.

    *Sladek R et al. Nature 2007; 445: 881-885.
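The slide does not say exactly how Sladek et al. adjusted for stratification. One widely used approach that likewise relies on markers assumed unrelated to disease is genomic control (Devlin and Roeder), sketched below as an illustration rather than as the authors' method: the median 1-df chi-square statistic across null SNPs, divided by its expected null median (about 0.455), estimates an inflation factor λ by which each test statistic is then deflated. All input statistics are hypothetical.

```python
# Sketch of genomic control: estimate inflation from null SNPs, then deflate.
import math
import statistics
from typing import Sequence

# Median of the chi-square distribution with 1 df, i.e. z(0.75) squared (~0.455).
CHI2_1DF_MEDIAN = statistics.NormalDist().inv_cdf(0.75) ** 2


def inflation_factor(null_chi2: Sequence[float]) -> float:
    """Genomic-control lambda: observed median of 1-df chi-square statistics
    at null (unlinked) SNPs divided by the median expected with no stratification."""
    return statistics.median(null_chi2) / CHI2_1DF_MEDIAN


def chi2_1df_pvalue(x: float) -> float:
    """Upper-tail p-value of a 1-df chi-square statistic: P(Z^2 > x)."""
    return math.erfc(math.sqrt(x / 2))


def gc_adjust(chi2: float, lam: float) -> float:
    """Deflate a statistic by lambda; by convention lambda < 1 is not applied."""
    return chi2 / max(lam, 1.0)


if __name__ == "__main__":
    # Toy input: a real study would use thousands of null SNPs, not five.
    null_stats = [0.2, 0.5, 0.6, 0.9, 1.1]
    lam = inflation_factor(null_stats)
    hit = 30.0  # hypothetical statistic at a candidate SNP
    print(f"lambda = {lam:.2f}")
    print(f"p before: {chi2_1df_pvalue(hit):.1e}, "
          f"after: {chi2_1df_pvalue(gc_adjust(hit, lam)):.1e}")
```

Genomic control applies one uniform correction; methods that model ancestry per individual can remove stratification that is concentrated at particular loci, such as a strongly geographically patterned SNP.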


Phenotypic Variation Bias: Are the Cases Homogeneous?

  • GWAS of Atrial Fibrillation*

    • Sample 1: hospital diagnosis of AF “confirmed by 12-lead ECG”.

    • Sample 2: patients with ischemic stroke or TIA, diagnosis of AF “based on 12-lead ECG.”

    • Sample 3: patients hospitalized with acute stroke “diagnosed with AF.”

    • Sample 4: patients with lone AF or AF plus hypertension referred to an arrhythmia service, “AF documented by ECG.”

      *Gudbjartsson et al, Nature 2007; 448: 353-357


Information Bias: Systematic differences in data collection between cases and controls

  • Genotyping quality bias: Lack of a genotyping protocol specifying quality-control criteria for excluding SNPs, or failure to report call rates.

    • Testing for Hardy-Weinberg disequilibrium

    • Transmission disequilibrium testing: a differential rate of genotyping error distorts allele frequencies in cases vs. controls
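Hardy-Weinberg testing, listed above as one genotyping-quality check, compares observed genotype counts with those expected from the allele frequency. A minimal sketch of the usual 1-df Pearson chi-square version follows. The genotype counts are hypothetical, and the p-value cutoff used to exclude a SNP is a study-specific choice not given on the slide. In case-control data the test is commonly run in controls, since departure in cases can reflect true association.

```python
import math
from typing import Tuple


def hwe_test(n_aa: int, n_ab: int, n_bb: int) -> Tuple[float, float]:
    """Pearson chi-square test (1 df) for departure from Hardy-Weinberg
    proportions at a biallelic SNP, from genotype counts AA, AB, BB.
    Returns (chi-square, p-value)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)      # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip empty expected cells (e.g. a monomorphic SNP) to avoid dividing by zero.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return chi2, math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    # Hypothetical heterozygote deficit, as can occur when an assay miscalls
    # heterozygotes as homozygotes.
    print(hwe_test(300, 400, 300))
```

For these counts the allele frequency is 0.5, the expected counts are 250/500/250, and the statistic is 40, far beyond any usual exclusion threshold.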


Is DNA Collected and Handled Identically in Cases and Controls?

  • T1DM gene association study examining 6322 SNPs: cases from the GRID Study, controls from the 1958 British Birth Cohort Study.

  • Samples from lymphoblastoid cell lines extracted using the same protocol in two different laboratories.

  • Case and control DNAs randomly ordered, with teams masked to case/control status.

  • Some extreme associations could not be replicated by a second genotyping method.

    Clayton DG et al., Nat Genet 2005; 37: 1243-46.
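Differential genotyping error between case and control DNA often shows up first as a difference in call rates. A minimal per-SNP check, assumed here rather than taken from the slide, is a 2 × 2 chi-square test of failed versus successful calls in cases vs. controls; the counts in the example are hypothetical.

```python
import math
from typing import Tuple


def call_rate_difference(missing_cases: int, called_cases: int,
                         missing_controls: int, called_controls: int
                         ) -> Tuple[float, float, float, float]:
    """Compare genotype call rates between cases and controls at one SNP.
    Returns (case call rate, control call rate, chi-square, p-value), using a
    Pearson 2x2 chi-square test with 1 df and no continuity correction."""
    a, b = missing_cases, called_cases
    c, d = missing_controls, called_controls
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    # denom is 0 when, e.g., no calls failed at all: no evidence of a difference.
    chi2 = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return b / (a + b), d / (c + d), chi2, p_value


if __name__ == "__main__":
    # Hypothetical SNP: 5% failed calls in cases, 1% in controls.
    print(call_rate_difference(50, 950, 10, 990))
```

A SNP flagged this way is a candidate for exclusion or for regenotyping on a second platform, the step that exposed the non-replicating associations in the example above.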


Biases in the Analysis and Presentation of Data

  • Environmental exposure information bias: Lack of collection or presentation of data on known environmental causes of the disease, or of comparisons of these exposures between cases and controls.

  • Confounding control bias: Lack of statistical adjustment or stratified analysis in the presence of potential confounding.



Characteristics of 109 GWAS: Confounding

  • Few comparisons of environmental exposures known to predispose to disease between cases and controls.

    • Table comparing cases and controls: 36%

    • Statistical comparison of cases/controls: 3.5%

    • Statistical adjustment for differences: 16%

    • Stratified analysis by confounder group: 16%


Distribution of Three Known Risk Factors for Neovascular AMD in a GWAS

DeWan A et al, Science 2006; 314:989-992.


Confounding

  • Confounder: “A factor that distorts the apparent magnitude of the effect of a study factor on risk. Such a factor is a determinant of the outcome of interest and is unequally distributed among the exposed and the unexposed” (Last, 1983).

    • Associated with exposure

    • Independent cause or predictor of disease

    • Not an intermediate step in causal pathway

[Diagram: a confounder C is associated with exposure E and independently causes disease D; in the chain E → C → D, C is an intermediate step and therefore not a confounder.]

Aschengrau and Seage, Essentials of Epidemiology in Public Health, 2003.


FTO Variants, Type 2 Diabetes, and Obesity*

Diabetes association

Cohort | OR (95% CI) | P
WTCCC phase 1 | 1.27 (1.16-1.37) | 2 × 10^-8
WTCCC phase 2 | 1.22 (1.12-1.32) | 5 × 10^-7
DGI | 1.03 (0.91-1.71) | 0.25

*Frayling, 2007 and Zeggini, 2007


FTO Variants, Type 2 Diabetes, and Obesity*

BMI (kg/m²) by FTO genotype

Group | TT | AT | AA
WTCCC cases | 30.2 | 30.5 | 32.0
WTCCC controls | 26.3 | 26.3 | 27.1

*Frayling 2007 and Zeggini 2007


FTO Variants, Type 2 Diabetes, and Obesity*

Diabetes association

Cohort | OR (95% CI) | P
WTCCC phase 1 | 1.27 (1.16-1.37) | 2 × 10^-8
WTCCC phase 2 | 1.22 (1.12-1.32) | 5 × 10^-7
DGI | 1.03 (0.91-1.71) | 0.25

Diabetes association, adjusted for BMI

Cohort | OR (95% CI) | P
WTCCC phase 2 | 1.03 (0.96-1.10) | 0.44

Frayling TM, et al. Science 2007; 316: 889-894.

Zeggini E, et al. Science 2007; 316: 1336-1341.



Dealing with Confounders

  • In design

    • Randomize

    • Restrict: confine study subjects to those within specified category of confounder

    • Match: select cases and controls so confounders equally distributed

  • In analysis

    • Standardize: for age, gender, time

    • Stratify: separate sample into subsamples according to specified criteria

    • Multivariate analysis: adjust for many confounders

      Aschengrau and Seage, Essentials of Epidemiology in Public Health, 2003
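Stratification followed by a pooled estimate is the analytic counterpart of matching. A common choice, named here as an illustration rather than something the slide specifies, is the Mantel-Haenszel summary odds ratio. The counts below are hypothetical: a confounder with two levels, more common among the exposed and associated with disease, with no exposure-disease association inside either level.

```python
from typing import Sequence, Tuple

# (a, b, c, d) = (exposed cases, exposed controls, unexposed cases, unexposed controls)
Table = Tuple[float, float, float, float]


def crude_or(strata: Sequence[Table]) -> float:
    """Odds ratio after collapsing all strata into one 2x2 table."""
    a, b, c, d = (sum(cells) for cells in zip(*strata))
    return (a * d) / (b * c)


def mantel_haenszel_or(strata: Sequence[Table]) -> float:
    """Mantel-Haenszel odds ratio: sum(a*d/n) / sum(b*c/n) over strata."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


if __name__ == "__main__":
    strata = [
        (10, 90, 30, 270),   # confounder absent: OR = 1 within stratum
        (60, 60, 40, 40),    # confounder present: OR = 1 within stratum
    ]
    print(f"crude OR = {crude_or(strata):.2f}")                      # distorted
    print(f"Mantel-Haenszel OR = {mantel_haenszel_or(strata):.2f}")  # adjusted
```

Here the crude odds ratio is about 2.07 while the Mantel-Haenszel estimate is 1.00. If the stratifying variable were instead an intermediate step on the causal pathway, the same adjustment would remove part of a real effect, which is why the third criterion in the definition of a confounder matters.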


Biases in the Analysis and Presentation of Data (Cont.)

  • Alpha error control bias: Lack of correction of the alpha level accepted as significant for the number of tests performed.

  • Data dredging bias: Lack of replication studies testing hypotheses identified in a discovery study.

  • The winner’s curse: Overestimation of effect sizes in a discovery GWAS, because the associations that reach significance are those whose estimates fall at the extreme of their range; replication studies then fail to reproduce the odds ratios, lacking adequate power to detect the true, smaller odds ratio.
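The scale of the multiple-testing problem follows from simple arithmetic. The sketch below uses the SNP count from the prostate cancer discovery study (541,129) and the Bonferroni correction, one common way to control the family-wise error rate; the 5 × 10^-8 level corresponds to the widely used convention of roughly one million independent tests genome-wide.

```python
def expected_false_positives(alpha: float, n_tests: int) -> float:
    """Expected number of tests reaching p < alpha when every null hypothesis is true."""
    return alpha * n_tests


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error rate <= alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


if __name__ == "__main__":
    n_snps = 541_129  # SNPs genotyped in the prostate cancer discovery study
    print(f"{expected_false_positives(0.05, n_snps):,.0f}")  # uncorrected 'hits' under the null
    print(f"{bonferroni_threshold(0.05, n_snps):.1e}")
    print(f"{bonferroni_threshold(0.05, 1_000_000):.0e}")    # conventional genome-wide level
```

Testing every SNP at 0.05 would yield roughly 27,000 "significant" SNPs even if none were associated; Bonferroni correction lowers the per-SNP threshold to about 9 × 10^-8. Because SNPs in linkage disequilibrium are correlated, Bonferroni is somewhat conservative.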


Prostate Cancer: 7 Novel SNPs in Discovery and Replication Studies

SNP | Discovery OR (95% CI) | Replication OR (95% CI)
rs2660753 | 1.52 (1.30-1.77) | 1.18 (1.06-1.31)
rs9364554 | 1.28 (1.16-1.41) | 1.17 (1.08-1.26)
rs6465657 | 1.30 (1.19-1.43) | 1.12 (1.05-1.20)
rs7920517 | 1.39 (1.27-1.53) | 1.22 (1.14-1.31)
rs10993994 | 1.62 (1.47-1.78) | 1.25 (1.17-1.34)
rs7931342 | 0.79 (0.72-0.86) | 0.84 (0.79-0.90)
rs7931342 | 1.39 (1.23-1.57) | 1.03 (0.94-1.14)

Eeles RA et al: Nat Genet 2/10/08
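The winner's curse can be reproduced by simulation. In the sketch below every hypothetical discovery study estimates the same true OR of 1.15 with a standard error of 0.05 on the log-OR scale (both assumed values), and only estimates passing a discovery threshold of 10^-6, echoing the slide, are kept.

```python
import math
import random
from statistics import NormalDist, fmean


def winners_curse(true_or: float, se: float, alpha: float,
                  n_studies: int = 100_000, seed: int = 1):
    """Simulate discovery-study estimates of log(OR) ~ Normal(log(true_or), se),
    keep only those significant at two-sided level alpha in the risk direction,
    and return (geometric-mean OR among the 'winners', number of winners)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    beta = math.log(true_or)
    estimates = (rng.gauss(beta, se) for _ in range(n_studies))
    winners = [b for b in estimates if b / se > z_crit]
    if not winners:
        return float("nan"), 0
    return math.exp(fmean(winners)), len(winners)


if __name__ == "__main__":
    # Hypothetical: true OR 1.15, SE of log(OR) 0.05, discovery threshold 1e-6.
    or_hat, n = winners_curse(true_or=1.15, se=0.05, alpha=1e-6)
    print(f"{n} of 100000 simulated studies pass; mean discovery OR = {or_hat:.2f}")
```

The studies that clear the threshold report an average OR near 1.3 even though the true value is 1.15, a pattern resembling the drop from discovery to replication odds ratios in the table. A replication sample sized from the inflated discovery estimate would therefore be underpowered.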



Larson, G. The Complete Far Side. 2003.



Interpretation Biases in Genomic Research*

  • Confirmation bias: evaluating evidence that supports one’s preconceptions differently from evidence that challenges these convictions.

  • Rescue bias: discounting data by finding selective faults in the experiments

  • Mechanism bias: being less skeptical when underlying science furnishes credibility for the data.

    *Kaptchuk TJ. BMJ 2003; 326: 1453-5.



Information to be Included in Initial Report

  • Study information:

    • Source of cases and controls

    • Methods used for defining disease or trait

    • Participation rates and flow chart of selection

    • Standard “Table 1,” including rates of missing data

    • Success rate of DNA acquisition, comparability

  • Genotyping and quality control procedures

  • Results

    • Analysis methods in sufficient detail to understand and reproduce what was done

    • Simple single-locus and multi-marker (haplotype) association analyses

    • Significance of any known 'positive controls'

      Chanock, Manolio et al, Nature 2007; 447: 655-660



Controlling Bias in Genomic Research: Design

  • Define population to be studied

  • Maximize representativeness

  • Use standard, reproducible methods for assignment of case/control status

  • Use incident cases

  • Select controls eligible to become cases

  • Estimate and maximize participation rates

  • Apply standard genotyping QC methods

  • Replicate positive findings on different genotyping platform



Controlling Bias in Genomic Research: Analysis

  • Describe sources and methods of ascertaining cases and controls

  • Compare participants and non-participants

  • Compare cases and controls

  • Stratify and adjust for important confounders (including population stratification)

  • Stratify and test for important interactions

  • Report results of genotyping QC

  • Report results of prior known associations



Larson, G. The Complete Far Side. 2003.

