SNP Resources:  Finding SNPs
Download
1 / 53

2.Rieder-Pt-I-MarchPGA - PowerPoint PPT Presentation


  • 332 Views
  • Uploaded on

SNP Resources: Finding SNPs Discovery and Databases Mark J. Rieder, PhD SeattleSNPs Workshop March 20-21, 2006 SNP Resources: SNP discovery and cataloging SNP discovery/genotyping: Genome-wide approaches The current state of SNP resources Comprehensive SNP discovery

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about '2.Rieder-Pt-I-MarchPGA' - jacob


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Slide1 l.jpg

SNP Resources: Finding SNPs

Discovery and Databases

Mark J. Rieder, PhD

SeattleSNPs Workshop

March 20-21, 2006


Slide2 l.jpg

SNP Resources: SNP discovery and cataloging

  • SNP discovery/genotyping: Genome-wide approaches

  • The current state of SNP resources

  • Comprehensive SNP discovery

    Seattle SNPs - Program for Genomic Application

    SNP Databases - “How to” Manual for finding SNPs

    In class - Tutorial


Slide3 l.jpg

Genetic Markers: Overview

  • RFLPs (SNPs circa 1980) - sequence variants that lead to a

    change in restriction site detected by Southern Blot Analysis

    2. Microsatellites (di-, tri-, tetranucleotide repeats)

    • 1/50,000 bp

    • Linkage Studies - 300-600 markers (~1 Mbp)

    • Multi-allelic/High heterozygosity/informative

    • Complex genotyping assays

      3. Single Nucleotide Polymorphisms (SNPs)

    • Most frequent genetic variant (base substitutions)

    • 1/1000 bp (comparing two randomly selected chromosomes)

    • Biallelic/less informative

    • Simplified genotyping platforms (+/- calling)


Slide4 l.jpg

Development of a genome-wide SNP map: How many SNPs?

Nickerson and Kruglyak, Nature Genetics, 2001

~ 10 million common SNPs (> 1- 5% MAF) - 1/300 bp

How has SNP discovery progressed toward this goal?


Slide5 l.jpg

Finding SNPs: Marker Discovery and Methods

SNP discovery has proceeded in two distinct phases:

1 - SNP Identification

Define the alleles

Map this to a unique place in the genome

2 - SNP Characterization

Determination of the genotype in many individuals

Population frequency of SNPs


Slide6 l.jpg

Finding SNPs: Marker Discovery and Methods

SNP Discovery has proceeded in two distinct phases:

1 - SNP Discovery**

Human Genome Project - overlaps of BAC

clones and ESTs

The SNP Consortium - Reduced

Representation Sequencing

The HapMap

2 - SNP Discovery/Characterization**

The HapMap


Slide7 l.jpg

Genomic

mRNA

BAC

Library

RRS

Library

cDNA

Library

BAC

Overlap

Shotgun

Overlap

EST

Overlap

Finding SNPs: Sequence-based SNP Mining

How do you find SNPs if you don’t have a reference sequence

RT errors?

DNA

SEQUENCING

Sequencing

Quality

Sequence Overlap - SNP Discovery

GTTACGCCAATACAGGATCCAGGAGATTACC

GTTACGCCAATACAGCATCCAGGAGATTACC


Slide8 l.jpg

Finding SNPs: Sequence-based SNP Mining

BAC = Bacterial Artificial Chromosome

Primary vector for DNA cloning in the HGP

DNA from multiple individuals

Clone large fragments into BACs

(unknown sequence)

Fragment DNA

Sequence and Reassemble

(known sequence)

Assembly with other overlapping BACs

GTTACGCCAATACAGGATCCAGGAGATTACC

GTTACGCCAATACAGCATCCAGGAGATTACC


Slide9 l.jpg

Finding SNPs: Marker Discovery and Methods

$ 45 Million - 2 years (1999, 2001 - 2003)

Goals: Identify 300,000 SNPs and map 150,000 (April 1999)

Determine allele frequency of SNPs

If you don’t have a reference genome - how do you find SNPs?


Slide10 l.jpg

Finding SNPs: Sequence-based SNP Mining

RRS = Reduced Representation Sequencing

Genomic DNA (multiple individuals)

RE to generate

fragments

Clone DNA fragments

into plasmid vectors

Altshuler, et al. Nature (2000)

Sequence and align and cluster

GTTACGCCAATACAGGATCCAGGAGATTACC

GTTACGCCAATACAGCATCCAGGAGATTACC

From overlap identify mismatches = SNPs


Slide11 l.jpg

TSC and HGP: High Resolution SNP Map

Feb. 2001 - Human Genome Project and TSC


Slide12 l.jpg

Development of a genome-wide SNP map: How many SNPs?

Nickerson and Kruglyak, Nature Genetics, 2001

~ 10 million common SNPs (> 1 - 5% MAF) - 1/300 bp

Feb 2001 - 1.42 million (1/1900 bp)


Slide13 l.jpg

SNP Discovery: dbSNP database

dbSNP

-NCBI SNP database


Slide14 l.jpg

Unique mapping

to a genome location

(reference SNP = rs#)

SNPs submitted

By research communtiy

(submitted SNPs = ss#)

(by 2hit-2allele)

SNP data submitted to dbSNP: Clustering

dbSNP processing of SNPs


Slide15 l.jpg

Finding SNPs: Marker Discovery and Methods

SNP Discovery has proceeded in two distinct phases:

1 - SNP Discovery**

Human Genome Project - overlaps of BAC

clones and ESTs

The SNP Consortium - Reduced

Representation Sequencing

The HapMap

2 - SNP Discovery/Characterization**

The HapMap


Slide16 l.jpg

HapMap Project Proposed: Map more SNPs and genotype

  • Increase SNP density over the first 6 - 12 months

  • Ultimately produce a fine scale genetic map (HapMap) which

    • would serve as a common resource for all biomedical reseseachers

  • Genotype 600,000 SNPs genome-wide

  • Four populations: CEPH (Europe), Yoruban (Africa), Japanese/

    • Chinese (Asian)


Slide17 l.jpg

Nov 2003 - 5.7 million (2 million validated) - 1/1500 bp

Feb 2004 - 7.2 million (3.3 million validated) - 1/900 bp

Genomic DNA

(multiple individuals)

Random Shotgun Sequencing

Sequence and align

(reference sequence)

Draft Human Genome

GTTACGCCAATACAGGATCCAGGAGATTACC

HapMap SNP Discovery: Prior to Genotyping

Initiation of project planning (July 2001):

2.8 million SNPs (1.4 million validated) - 1/1900 bp

Generate more SNPs:

Other Sources of SNPs:

Perlegen (Affymetrix chips) SNP data (chr22)

Sequence chromatograms from Celera project

TACGCCTATA

TCAAGGAGAT


Slide18 l.jpg

HapMap SNP Discovery

HapMap Discovery Increased SNP Density and Validated SNPs

10 million

rs SNPs

5 million

validated

rs SNPs


Slide19 l.jpg

Development of a genome-wide SNP map: How many SNPs?

Nickerson and Kruglyak, Nature Genetics, 2001

~ 10 million common SNPs (> 1- 5% MAF) - 1/300 bp

Feb 2001 - 1.42 million (1/1900 bp)

Nov 2003 - 2.0 million (1/1500 bp)

Feb 2004- 3.3 million (1/900 bp)

Mar 2005 - 5.0 million (validated - 1/600 bp)

When will we have them all?


Slide20 l.jpg

mRNA

cDNA

Library

BAC

Library

EST

Overlap

BAC

Overlap

Finding SNPs: Sequence-based SNP Mining

Genomic

RRS

Library

Random

Shotgun

DNA

SEQUENCING

Shotgun

Overlap

Align to

Reference

RANDOM Sequence Overlap - SNP Discovery

GTTACGCCAATACAGGATCCAGGAGATTACC

GTTACGCCAATACAGCATCCAGGAGATTACC


Slide21 l.jpg

1.0

8

Fraction of SNPs Discovered

0.5

2

0.0

0.0

0.1

0.2

0.3

0.4

0.5

Minor Allele Frequency (MAF)

SNP discovery is dependent on your sample population size

{

GTTACGCCAATACAGGATCCAGGAGATTACC

GTTACGCCAATACAGCATCCAGGAGATTACC

2 chromosomes

8


Slide22 l.jpg

SNP Characterization/Genotyping

Nickerson and Kruglyak, Nature Genetics, 2001

~ 10 million common SNPs (>1- 5% MAF) - 1/300 bp

Mar 2005 - 5.0 million (validated/mapped - 1/600 bp)

5.0/10.0 = 50% of all common SNPs (validated)!


Slide23 l.jpg

HapMap Project Proposed: Map more SNPs and genotype

  • Genotype 600,000 SNPs genome-wide

  • Four populations:

    • CEPH (CEU) (Europe - n = 90, trios)

    • Yoruban (YRI) (Africa - n = 90, trios)

    • Japanese (JPT) (Asian - n = 45)

    • Chinese (HCB) (Asian - n =45)


Slide24 l.jpg

Finding SNPs: Genotype Data Adds Value to SNPs

HapMap Genotyping

  • Confirms SNP as “real” and “informative”

  • Minor Allele Frequency (MAF) - common or rare

  • MAF in different populations

  • Detection of SNP x SNP correlations

    (Linkage Disequilibrium)

  • Determine haplotypes



Slide26 l.jpg

Perlegen Large-scale Genotyping Capacity

1.58 millions SNPs genotyped

71 individuals from 3 American populations

European, African and Asian ancestry


Hapmap completion l.jpg
HapMap Completion

Nature - Oct 27 (2005)

HapMap + Perlegen


Slide29 l.jpg

dbSNP: Increasing numbers of SNPs now have genotype data

HapMap

Phase II

Perlegen

Perlegen

Data


Slide30 l.jpg

Current State of dbSNP

Many SNPs left to validate and characterize.


Slide31 l.jpg

15,367 dbSNP

16,248 New SNPs

50% of SNPs in dbSNP

5 Mb/31,500 SNPs =

1/160 bp

Increasing SNP Density: HapMap ENCODE Project

ENCODE = ENCyclopedia Of DNA Elements

Catalog all functional elements in 1% of the genome (30 Mb)

10 Regions x 500 kb/region (Pilot Project)

David Altschuler (Broad), Richard Gibbs (Baylor)

16 CEU, 16 YRI, 8 HCB, 8 JPT

Comprehensive PCR based resequencing across these regions


Slide32 l.jpg

Development of a genome-wide SNP map: How many SNPs?

Nickerson and Kruglyak, Nature Genetics, 2001

~ 10 million common SNPs (>1- 5% MAF) - 1/300 bp

Mar 2005 - 5.0 million (validated - 1/600 bp)

~4.0 million validated SNPs with genotypes!

(HapMap confirmed, allele frequency/population,

SNPxSNP correlations (LD), haplotypes)


Slide33 l.jpg

1.0

96

48

24

16

8

8

Fraction of SNPs Discovered

0.5

2

0.0

0.0

0.1

0.2

0.3

0.4

0.5

Minor Allele Frequency (MAF)

SNP discovery is dependent on your sample population size

{

GTTACGCCAATACAGGATCCAGGAGATTACC

GTTACGCCAATACAGCATCCAGGAGATTACC

2 chromosomes


Slide34 l.jpg

Goal: Comprehensively identify all common sequence variation in candidate genes

Initial biological focus: Inflammation, Coagulation, Complement

Approach: Direct resequencing of genes

Sample:p1 = 24 African-Americans, 23 CEPH parents, 1 chimp

p2 = 24 HapMap-YRI, 23 HapMap-CEU, 1 gorilla

Status: 271 genes (21 kb ave)


Slide35 l.jpg

Targeted SNP Discovery

Directed analysis: cSNPs

5’

3’

Val-Val

Arg-Cys

PCR amplicons

Complete analysis: cSNP and Haplotype Structure Analysis

5’

3’

Arg-Cys

Val-Val

PCR amplicons

  • Generate SNP data from complete genomic resequencing

    • (i.e. 5’ regulatory, exon, intron, 3’ regulatory sequence)


Slide36 l.jpg

Comprehensive SNP Discovery: Resequencing

  • Overlapping PCR Amplicons across entire gene

  • Make no assumptions about sequence function

  • Sequence diversity and genetic structure for each gene is different

  • Proper association studies can only be designed in this context

  • Complete resequencing facilitates population genetics methods


Slide37 l.jpg

Sequence-based SNP Identification

Sequence

Amplify DNA

Phred

Phrap

Base-calling

Contig assembly

Sequence each end

5’

3’

Quality determination

Final quality determination

of the fragment.

PolyPhred

Polymorphism detection

Sequencing Capacity:

ABI 3730 – 96 capillary

~2000 individual chromatograms/day

~1,00,000 bp/day

~20 kbp sequence scanned in 48 individuals/wk

Consed

Sequence viewing

Polymorphism tagging

Analysis

Polymorphism reporting

Individual genotyping

Phylogenetic analysis


Slide39 l.jpg

Homozygous

C/C

Heterozygous

C/T

Homozygous

T/T



Slide41 l.jpg

Aston-Martin of SNP Detection

- PolyPhred 5.0

* Matthew Stephens

Peggy Dyer-Robertson

Jim Sloan

C/C

C/T

C/T

T/T


Slide42 l.jpg

Comparison PolyPhred v4.29 versus v5.0

PolyPhred v4.29

PolyPhred v5.0


Slide43 l.jpg

PolyPhred 5 Scores - Provide Quantitative Assessment of

SNP Genotype

PolyPhred 5.0

Perlegen

Double-Coverage - Automation = 93% of all SNPs, 100% of high-frequency SNPs, with no false positive SNPs identified, and 99.9% genotyping accuracy.


Slide44 l.jpg

Comparison of PolyPhred v5.0 with others

Mutation Surveyor

PolyPhred v4.29

novoSNP

PolyPhred v5.0



Slide46 l.jpg

60%

SNP Discovery: dbSNP database

NIEHS SNPs

dbSNP (Perlegen/HapMap)

{

{

15%

25%

Minor Allele Freq. (MAF)

Rarer and population specific SNPs are found by resequencing


Slide47 l.jpg

Development of a genome-wide SNP map: How many SNPs?

Nickerson and Kruglyak, Nature Genetics, 2001

~ 10 million common SNPs (>1- 5% MAF) - 1/300 bp

HapMap ENCODE = 1/160 bp (n = 48, 4 pops)

SeattleSNPs = 1/180 bp (n = 48, 2 pops)

NIEHS SNPs = 1/180 bp (n = 95, 4 pops)

Comprehensive resequencing can identify the

vast majority of SNPs in a region


Slide48 l.jpg

Study Designs Utilizing Resequencing Approaches

Detection of SNPs in an unstudied population (or not genotyped)

Population specific rare SNPs

Detection of SNPs directly in study samples (no discovery panel)

Sequencing for Association

Detection of SNPs in samples representing the tails of the

phenotypic distribution

How much do rare, possibly more penetrant SNPs, explain of

extreme phenotypes


Seattlesnps characterization l.jpg

p1 = 47 individuals - 23 CEPH and 24 African-American

(Non-HapMap, mostly)

Selection of informative (high frequency, coding, etc) SNPs to be genotyped in defined populations (~4500 SNPs)

SeattleSNPs Characterization

HapMap Populations

European (CEU,n=60)

African (YRI, n =60)

Asian (HCB, n = 45 and JPT, n = 45)

Non-HapMap Populations

Hispanic (n = 60)

African-American (n = 62)


Illumina seattlesnps genotyping l.jpg
Illumina SeattleSNPs Genotyping

  • Each well samples 1536 SNPs in one individual

  • For each HapMap sample 3 x 1536 ( 4600 genotyped SNPs)

  • 1,764,472 genotypes generated (total ~384 samples)


Slide51 l.jpg

0.341

0.274

0 . 282

0.931

0.415

0 . 610

0.626

0.533

0.841

0 . 410

0 . 761

0 . 952

0 . 765

0 . 420

0 . 589

African

American

Japanese

Chinese

CEPH

Hispanic

Yoruban

CEPH

Japanese

Chinese

African

American

Hispanic

Population Allele Frequency Correlations

Illumina NIEHS SNP Genotyping


Slide52 l.jpg

SeattleSNPs Genotype Data

P1 (~180 genes)

SNPs characterized in six

different major populations.


Slide53 l.jpg

Summary: The Current State of SNP Resources

  • SNPs have been rapidly adopted as the genetic marker of choice.

  • Approximately 10 million common SNPs exist in the human genome (1/300 bp).

  • Random SNP discovery processes generate many SNPs (TSC and HapMap).

  • Random approaches to SNPs discovery have reached limits of discovery

    and validation (1/600 bp; 50% SNP validation)

  • Most validated SNPs (5 million) will be genotyped by the HapMap (3 pops)

  • Resequencing approaches continue to catalog important variants (rarer)