
HapMap: application in the design and interpretation of association studies

Mark J. Daly, PhD, on behalf of The International HapMap Consortium


Goals of this segment

  • Briefly summarize HapMap design and current status

  • Discuss the application of HapMap to all aspects of association study design, analysis and interpretation


HapMap Project

A freely available public resource to increase the power and efficiency of genetic association studies of medical traits

High-density SNP genotyping across the genome provides information about

  • SNP validation, frequency, assay conditions

  • correlation structure of alleles in the genome

All data are freely available on the web for use in study design and analysis as researchers see fit


HapMap Samples

  • 90 Yoruba individuals (30 parent-parent-offspring trios) from Ibadan, Nigeria (YRI)

  • 90 individuals (30 trios) of European descent from Utah (CEU)

  • 45 Han Chinese individuals from Beijing (CHB)

  • 45 Japanese individuals from Tokyo (JPT)


HapMap progress

  • PHASE I – completed, described in the Nature paper

    • 1,000,000 SNPs successfully typed in all 270 HapMap samples

    • ENCODE variation reference resource available

  • PHASE II – data generation complete, data released this past Monday

    • >3,500,000 SNPs typed in total !!!


ENCODE-HAPMAP variation project

  • Ten “typical” 500 kb regions

  • 48 samples sequenced

  • All discovered SNPs (and any others in dbSNP) typed in all 270 HapMap samples

  • Current data set: 1 SNP every 279 bp

A much more complete variation resource against which the genome-wide map can be evaluated


Completeness of dbSNP

The vast majority of common SNPs are either contained in dbSNP or highly correlated with a SNP in dbSNP


Recombination hotspots are widespread and account for LD structure

[Figure: example region at 7q21]


Utility of LD in association study

  • “If I’m a causal variant, what is relevant to my detection in association studies is how well correlated I am with one of the SNPs or haplotypes examined in the study.”


Coverage of Phase II HapMap (estimated from ENCODE data)

Panel      % r² > 0.8   max r²
YRI        81%          0.90
CEU        94%          0.97
CHB+JPT    94%          0.97

% r² > 0.8: percentage of deeply ascertained common variants highly correlated with a HapMap SNP

max r²: average maximum correlation between a deeply ascertained variant and a neighboring HapMap SNP

The vast majority of common variation (MAF > 0.05) is captured by Phase II HapMap.

From Table 6 – “A Haplotype Map of the Human Genome”, Nature
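The two metrics in this table can be illustrated with a small sketch. The Python below assumes phased haplotypes coded as 0/1 alleles and uses the standard r² = D² / (p_A(1 - p_A) p_B(1 - p_B)), where D = p_AB - p_A p_B. The function names and toy data are hypothetical; a real evaluation would compare each deeply ascertained variant only with nearby HapMap SNPs and restrict to common variants.

```python
def r_squared(hap_a, hap_b):
    """Squared allelic correlation between two biallelic loci.

    hap_a, hap_b: equal-length lists of 0/1 alleles, one entry per phased
    chromosome (haplotype).
    """
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return 0.0 if denom == 0 else d * d / denom


def coverage(deep_variants, hapmap_snps, threshold=0.8):
    """Return (fraction of variants with max r^2 > threshold, mean max r^2).

    deep_variants: haplotype vectors for the deeply ascertained variants.
    hapmap_snps: haplotype vectors for the (neighboring) HapMap SNPs.
    """
    max_r2 = [max(r_squared(v, s) for s in hapmap_snps) for v in deep_variants]
    frac = sum(1 for r in max_r2 if r > threshold) / len(max_r2)
    return frac, sum(max_r2) / len(max_r2)


# Hypothetical toy data: 4 haplotypes, 2 HapMap SNPs, 3 deeply ascertained variants.
hapmap_snps = [[1, 1, 0, 0], [1, 0, 1, 0]]
deep_variants = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 0, 0]]
frac, mean_max = coverage(deep_variants, hapmap_snps)
print(round(frac, 3), round(mean_max, 3))  # 0.667 0.778
```

Two of the three toy variants are perfectly correlated with a HapMap SNP (r² = 1); the third reaches only r² = 1/3, which pulls both summaries below 1.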


Applying the HapMap

  • Study design - tagging

  • Study coverage evaluation

  • Study analysis - improving association testing

  • Study interpretation

    • Comparison of multiple studies

    • Connection to genes/genomic features

    • Integration with expression and other functional data

  • Other uses of HapMap data

    • Admixture, LOH, selection


Tagging from HapMap

  • Since HapMap describes the majority of common variation in the genome, choosing non-redundant sets of SNPs from HapMap offers considerable gains in efficiency without loss of power in association studies


Pairwise tagging

[Figure: six SNPs (1: A/T, 2: G/A, 3: G/C, 4: T/C, 5: G/C, 6: A/C) on a set of common haplotypes; three pairs of SNPs are marked as being in high r²]

Tags: SNP 1, SNP 3, SNP 6 (3 in total)

Test for association: SNP 1, SNP 3, SNP 6

After Carlson et al. (2004) AJHG 74:106
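Pairwise tag selection is often done greedily, in the spirit of Carlson et al. (2004): repeatedly pick the SNP that captures (r² ≥ threshold) the most SNPs not yet captured, until every SNP is captured. The sketch below assumes a precomputed pairwise r² table; it is an illustration, not the Haploview or Tagger implementation, and the threshold and toy r² values are hypothetical.

```python
def greedy_pairwise_tags(snps, r2, threshold=0.8):
    """Greedily pick tag SNPs until every SNP has r^2 >= threshold with a tag.

    snps: list of SNP identifiers (order is used to break ties).
    r2: dict mapping frozenset({a, b}) -> r^2 for pairs of distinct SNPs;
        pairs not listed are treated as r^2 = 0.
    """
    def captures(tag):
        # A SNP always captures itself, plus any SNP in high LD with it.
        return {s for s in snps
                if s == tag or r2.get(frozenset((tag, s)), 0.0) >= threshold}

    untagged = set(snps)
    tags = []
    while untagged:
        # max() returns the first SNP with the largest count, so ties
        # are broken by input order.
        best = max(snps, key=lambda s: len(captures(s) & untagged))
        tags.append(best)
        untagged -= captures(best)
    return tags


# Hypothetical toy r^2 values: three pairs of SNPs in high LD.
r2 = {frozenset((1, 2)): 0.9, frozenset((3, 5)): 0.9, frozenset((4, 6)): 0.9}
print(greedy_pairwise_tags([1, 2, 3, 4, 5, 6], r2))  # [1, 3, 4]
```

With three high-r² pairs the sketch needs three tags. SNPs 4 and 6 capture each other, so either can serve as the tag; here the tie is broken by input order.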


Pairwise Tagging Efficiency

Tag SNPs were picked with Haploview to capture the common SNPs in HapMap release 16c.1, in bins of 7,000 SNPs.

Tagging Phase I HapMap offers 2-5x gains in efficiency


Use of haplotypes can improve genotyping efficiency

[Figure: the same six SNPs (1: A/T, 2: G/A, 3: G/C, 4: T/C, 5: G/C, 6: A/C) on a set of common haplotypes]

Haplotype tagging:

Tags: SNP 1, SNP 3 (2 in total)

Test for association:

  • SNP 1 captures SNPs 1+2

  • SNP 3 captures SNPs 3+5

  • “AG” haplotype captures SNPs 4+6

Pairwise tagging:

Tags: SNP 1, SNP 3, SNP 6 (3 in total)

Test for association: SNP 1, SNP 3, SNP 6

Tags in a multi-marker test should be chosen conditional on the significance of LD, in order to avoid overfitting.
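Why can the “AG” haplotype capture SNPs that neither tag captures alone? The toy sketch below (hypothetical haplotypes, not the slide's data) codes a haplotype as a 0/1 indicator and compares its r² with a SNP whose rare allele sits only on the AG background. It does not implement the safeguard above, that multi-marker tests be conditional on significant LD.

```python
def r_squared(x, y):
    """Squared correlation between two 0/1 indicator vectors over haplotypes."""
    n = len(x)
    px, py = sum(x) / n, sum(y) / n
    pxy = sum(a * b for a, b in zip(x, y)) / n
    denom = px * (1 - px) * py * (1 - py)
    return 0.0 if denom == 0 else (pxy - px * py) ** 2 / denom


def allele_indicator(haplotypes, index, allele):
    """1 where the haplotype carries `allele` at position `index`, else 0."""
    return [1 if h[index] == allele else 0 for h in haplotypes]


def haplotype_indicator(haplotypes, alleles):
    """1 where the haplotype carries every (index, allele) pair, else 0."""
    return [1 if all(h[i] == a for i, a in alleles) else 0 for h in haplotypes]


# Hypothetical haplotypes over three positions:
# SNP 1 (A/T), SNP 3 (G/C), SNP 4 (T/C).
haplotypes = ["AGT", "ACC", "TGC", "TCC"]
snp4 = allele_indicator(haplotypes, 2, "T")
snp1_a = allele_indicator(haplotypes, 0, "A")
snp3_g = allele_indicator(haplotypes, 1, "G")
ag = haplotype_indicator(haplotypes, [(0, "A"), (1, "G")])

print(r_squared(snp1_a, snp4))  # ~0.333: SNP 1 alone is a poor proxy
print(r_squared(snp3_g, snp4))  # ~0.333: SNP 3 alone is a poor proxy
print(r_squared(ag, snp4))      # 1.0: the AG haplotype is a perfect proxy
```

In this toy case two genotyped tags plus one haplotype test stand in for a third tag SNP, which is the efficiency gain the slide describes.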


Efficiency and power

[Figure: relative power (%) versus average marker density (per kb), comparing tag SNPs with random SNPs]

~300,000 tag SNPs needed to cover common variation in the whole genome in CEU

P.I.W. de Bakker et al. (2005) Nat Genet Advance Online Publication 23 Oct 2005


How to pick tag SNPs?

  • What is the genetic hypothesis? Which variants do you want to test for a role in disease?

    • functional annotation (coding SNPs)

    • allele frequency (HapMap ascertainment)

    • previously implicated associations

  • Go to http://www.hapmap.org for DCC-supported interactive tagging

  • Export HapMap data into tools such as Tagger and Haploview (www.broad.mit.edu/mpg)


Will tag SNPs picked from HapMap apply to other population samples?

[Figure: tag SNPs picked in CEU (Utah residents with European ancestry, CEPH) compared across samples of Whites from Los Angeles, CA and from Botnia, Finland]

Population differences add very little inefficiency

Platform presentation: Paul de Bakker (#223: Sat 9.30)


Applying the HapMap

  • Study design - tagging

  • Study coverage evaluation

  • Study analysis - improving association testing

  • Study interpretation

    • Comparison of multiple studies

    • Connection to genes/genomic features

    • Integration with expression and other functional data

  • Other uses of HapMap data

    • Admixture, LOH, selection


Genome-wide association coverage

  • If a genome-wide product is typed in the HapMap samples, the HapMap SNPs not included on the product provide an evaluation of the product's coverage:

    • ENCODE (deep ascertainment)

    • Phase II (dense, genome-wide)
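This evaluation amounts to treating the HapMap SNPs that are not on the product as a held-out test set. A minimal sketch, assuming a precomputed HapMap r² lookup keyed by SNP pairs and the commonly used r² ≥ 0.8 cutoff. The SNP names and r² values are hypothetical and loosely mirror the six-SNP example used in this talk.

```python
def product_coverage(hapmap_snps, product_snps, r2, threshold=0.8):
    """Fraction of held-out HapMap SNPs captured by at least one product SNP.

    hapmap_snps: all SNP IDs genotyped in the HapMap samples.
    product_snps: SNP IDs present on the genome-wide product.
    r2: dict mapping frozenset({a, b}) -> r^2 estimated in the HapMap samples;
        pairs not listed are treated as r^2 = 0.
    """
    on_product = set(product_snps)
    held_out = [s for s in hapmap_snps if s not in on_product]
    if not held_out:
        raise ValueError("every HapMap SNP is on the product; nothing to evaluate")
    captured = 0
    for snp in held_out:
        # Best proxy for this SNP among the SNPs on the product.
        best = max((r2.get(frozenset((snp, p)), 0.0) for p in on_product),
                   default=0.0)
        if best >= threshold:
            captured += 1
    return captured / len(held_out)


hapmap = ["snp1", "snp2", "snp3", "snp4", "snp5", "snp6"]
product = ["snp1", "snp3"]
r2 = {
    frozenset(("snp1", "snp2")): 0.95,
    frozenset(("snp3", "snp5")): 0.90,
    frozenset(("snp3", "snp4")): 0.40,
}
print(product_coverage(hapmap, product, r2))  # 0.5: snp2 and snp5 are captured
```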


Association tests with fixed markers

[Figure: the six-SNP example; SNPs 1 and 3 are on the whole-genome product, and two pairs of SNPs are marked as being in high r²]

Tests of association: SNP 1, SNP 3 (SNPs on the whole-genome product; ~1-5% of common variation directly assayed)

SNPs actually tested: SNP 1, SNP 3, SNP 2, SNP 5


Genome-wide products can capture most common variation

Example: 500K data generated by Affymetrix and recently submitted to the HapMap DCC


More on this topic

  • Platform presentations tomorrow morning, 8 AM sharp:

    • Peer

    • Jorgenson

    • Lazarus

  • As well as several detailed posters!


Applying the HapMap

  • Study design - tagging

  • Study coverage evaluation

  • Study analysis - improving association testing

  • Study interpretation

    • Comparison of multiple studies

    • Connection to genes/genomic features

    • Integration with expression and other functional data

  • Other uses of HapMap data

    • Admixture, LOH, selection


Can incorporating tests of haplotypes of SNPs on the genome-wide product improve this coverage?


Improving association power using data from HapMap

[Figure: the six-SNP example; SNPs 1 and 3 are on the genome-wide product]

Tests of association: SNP 1, SNP 3, “AG” haplotype

SNPs actually tested: SNP 1, SNP 3, SNP 2, SNP 5, SNP 4, SNP 6


Haplotypes increase coverage


Applying the HapMap

  • Study design - tagging

  • Study coverage evaluation

  • Study analysis - improving association testing

  • Study interpretation

    • Connection to genes/genomic features

    • Comparison of multiple association studies

    • Integration with expression and other functional data

  • Other uses of HapMap data

    • Admixture, LOH, selection


Integration with genomic features

  • A positive association to a HapMap SNP enables detailed interpretation:

    • How many other SNPs are in LD with this SNP?

    • What genes are in LD with this SNP?

    • What coding variants and putative functional variants are in LD with this SNP?

Potential to improve power by adjusting the Bayesian prior of each association test based on this information
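One way to read “adjusting the Bayesian prior”: combine each test's Bayes factor with a prior probability of association that is raised for SNPs in LD with genes or putative functional variants. The sketch below uses the standard relation posterior odds = prior odds × Bayes factor; the specific prior values and the tenfold up-weighting are hypothetical choices, not values from this talk.

```python
def posterior_probability(bayes_factor, prior):
    """Posterior probability of association given a Bayes factor and a prior.

    Uses posterior odds = prior odds * Bayes factor.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * bayes_factor
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical priors: a genome-wide baseline, and a tenfold higher prior
# for a SNP in LD with a coding variant.
BASE_PRIOR = 1e-5
FUNCTIONAL_PRIOR = 1e-4

bf = 1e4  # the same strength of evidence for both SNPs
print(round(posterior_probability(bf, BASE_PRIOR), 3))        # 0.091
print(round(posterior_probability(bf, FUNCTIONAL_PRIOR), 3))  # 0.5
```

With identical evidence, the SNP with the higher prior ends up with a much higher posterior probability, which is how annotation from HapMap could shift the ranking of association results.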


Example: Complement Factor H - AMD

  • Original SNP hit in Affy 100K experiment – rs380390

  • Extent and structure of LD from HapMap aids in the fine-mapping phase of the project

Klein et al., Science 2005


[Figures, two further slides: Complement Factor H region, labelled rs380390]


Meta-analysis of association studies

  • When different marker sets are used to study association (candidate gene or genome-wide), results can be readily integrated if all markers have been typed in the HapMap samples


Example: DTNBP1 and schizophrenia

  • Multiple studies have described modest association with schizophrenia

  • Most studies have examined small, non-overlapping sets of SNPs

  • HapMap data can be used to determine whether these association finding

Derek Morris, Mousumi Mutsuddi (WCPG meeting)


Extensive LD across DTNBP1

[Figure: Phase II HapMap LD across DTNBP1: 186 SNPs spanning 180 kb]


Phylogeny of DTNBP1 tag SNPs

[Figure: phylogeny of haplotypes over DTNBP1 tag SNPs 2, 3, 4, 5, 7 and 10, rooted at the ancestral haplotype]

Haplotypes: AGGCCA (ancestral), AAGCCT, AGGCCT, AGATTA, GGATCA

Mutations marked on branches: 2 (AG), 3 (GA), 4 (GA), 5 (CT), 7 (CT), 10 (AT)

Haplotype frequencies shown: 6%, 33%, 42%, 8%, 11%


Associated alleles reported

[Figure: alleles reported as associated by each study, marked on the DTNBP1 tag-SNP haplotypes (tag SNPs 2, 3, 4, 5, 7, 10; haplotypes AGGCCA, AAGCCT, AGGCCT, AGATTA, GGATCA)]

Studies: Straub 2002; Schwab 2003; Van den Bogaert 2003; Van den Oord 2003; Funke 2004; Kirov 2004; Williams 2004; Bray 2005


Inconsistent findings

  • No consistently associated SNP/haplotype pattern across studies

  • All studies (European-derived populations) had allele/haplotype frequencies compatible with the HapMap CEU sample

  • HapMap can successfully relate associations from diverse marker sets


Other Applications – Structural Variation

  • Three papers coming out in the next month describe the use of HapMap data to identify large, common deletion polymorphisms

  • LD around these polymorphisms permits their assessment with tag SNPs/haplotypes in genome-wide association studies


Other Applications – Admixture Scanning

  • HapMap data provide a rich source of highly differentiated SNPs for the design of admixture panels

  • Fine mapping of admixture signals can be focused on the full set of highly differentiated alleles in any region of the genome
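A first screen for admixture markers can rank SNPs by how different their allele frequencies are between two HapMap panels. A minimal sketch, assuming dictionaries of reference-allele frequencies per panel; the absolute frequency difference δ = |p1 - p2| is one common criterion (others, such as F_ST, are also used), and the 0.5 cutoff and frequencies below are hypothetical.

```python
def differentiated_snps(freqs_pop1, freqs_pop2, min_delta=0.5):
    """Return SNP IDs with |p1 - p2| >= min_delta, most differentiated first.

    freqs_pop1, freqs_pop2: dicts mapping SNP ID -> frequency of the same
    reference allele in each population. Only SNPs present in both are used.
    """
    shared = freqs_pop1.keys() & freqs_pop2.keys()
    deltas = {s: abs(freqs_pop1[s] - freqs_pop2[s]) for s in shared}
    keep = [s for s, d in deltas.items() if d >= min_delta]
    # Sort by decreasing delta, then by ID so the output is deterministic.
    return sorted(keep, key=lambda s: (-deltas[s], s))


# Hypothetical allele frequencies in two panels.
pop1 = {"snpA": 0.9, "snpB": 0.5, "snpC": 0.1}
pop2 = {"snpA": 0.1, "snpB": 0.45, "snpC": 0.7, "snpD": 0.3}
print(differentiated_snps(pop1, pop2))  # ['snpA', 'snpC']
```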


Other Applications – LOH

  • HapMap identifies

    • Regions of extended LD that may manifest themselves as unusually long stretches of homozygosity in individual samples

    • The catalog of large deletion variants in the HapMap will help distinguish LOH that is potentially de novo and causal from LOH that is simply commonly segregating in the population

LOH analysis that takes HapMap patterns into account is under development
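A basic building block for such an analysis is finding long runs of homozygous genotypes in a sample; those runs could then be compared against HapMap regions of extended LD and known common deletions. A minimal sketch, assuming genotypes coded as allele counts (0, 1, 2) in chromosomal order with no missing calls. The minimum run length is a hypothetical parameter, and real analyses usually measure runs in physical or genetic distance rather than SNP counts.

```python
def homozygous_runs(genotypes, min_snps=3):
    """Return (start, end) index pairs, end exclusive, of runs of at least
    min_snps consecutive homozygous genotypes.

    genotypes: allele counts (0, 1 or 2) in chromosomal order;
    1 means heterozygous, 0 and 2 mean homozygous.
    """
    runs = []
    start = None
    for i, g in enumerate(genotypes):
        if g != 1:
            if start is None:
                start = i  # a homozygous run begins here
        else:
            # A heterozygous call ends any current run.
            if start is not None and i - start >= min_snps:
                runs.append((start, i))
            start = None
    # Close a run that extends to the last SNP.
    if start is not None and len(genotypes) - start >= min_snps:
        runs.append((start, len(genotypes)))
    return runs


print(homozygous_runs([0, 2, 2, 1, 0, 0, 0, 0, 1, 2]))  # [(0, 3), (4, 8)]
```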


Early results encouraging

  • At this meeting

    • Arking and colleagues describe identification of a variant altering the QT interval

    • Herbert and colleagues describe a novel gene for obesity

    • Wijmenga and colleagues describe a novel gene for celiac disease

