slide1 l.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
SNP Resources: Finding SNPs Discovery and Databases PowerPoint Presentation
Download Presentation
SNP Resources: Finding SNPs Discovery and Databases

Loading in 2 Seconds...

play fullscreen
1 / 53

SNP Resources: Finding SNPs Discovery and Databases - PowerPoint PPT Presentation


  • 352 Views
  • Uploaded on

SNP Resources: Finding SNPs Discovery and Databases Mark J. Rieder, PhD SeattleSNPs Workshop March 20-21, 2006 SNP Resources: SNP discovery and cataloging SNP discovery/genotyping: Genome-wide approaches The current state of SNP resources Comprehensive SNP discovery

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'SNP Resources: Finding SNPs Discovery and Databases' - jacob


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
slide1

SNP Resources: Finding SNPs

Discovery and Databases

Mark J. Rieder, PhD

SeattleSNPs Workshop

March 20-21, 2006

slide2

SNP Resources: SNP discovery and cataloging

  • SNP discovery/genotyping: Genome-wide approaches
  • The current state of SNP resources
  • Comprehensive SNP discovery

Seattle SNPs - Program for Genomic Application

SNP Databases - “How to” Manual for finding SNPs

In class - Tutorial

slide3

Genetic Markers: Overview

  • RFLPs (SNPs circa 1980) - sequence variants that lead to a

change in restriction site detected by Southern Blot Analysis

2. Microsatellites (di-, tri-, tetranucleotide repeats)

    • 1/50,000 bp
    • Linkage Studies - 300-600 markers (~1 Mbp)
    • Multi-allelic/High heterozygosity/informative
    • Complex genotyping assays

3. Single Nucleotide Polymorphisms (SNPs)

    • Most frequent genetic variant (base substitutions)
    • 1/1000 bp (comparing two randomly selected chromosomes)
    • Biallelic/less informative
    • Simplified genotyping platforms (+/- calling)
slide4

Development of a genome-wide SNP map: How many SNPs?

Nickerson and Kruglyak, Nature Genetics, 2001

~ 10 million common SNPs (> 1- 5% MAF) - 1/300 bp

How has SNP discovery progressed toward this goal?

slide5

Finding SNPs: Marker Discovery and Methods

SNP discovery has proceeded in two distinct phases:

1 - SNP Identification

Define the alleles

Map this to a unique place in the genome

2 - SNP Characterization

Determination of the genotype in many individuals

Population frequency of SNPs

slide6

Finding SNPs: Marker Discovery and Methods

SNP Discovery has proceeded in two distinct phases:

1 - SNP Discovery**

Human Genome Project - overlaps of BAC

clones and ESTs

The SNP Consortium - Reduced

Representation Sequencing

The HapMap

2 - SNP Discovery/Characterization**

The HapMap

slide7

Genomic

mRNA

BAC

Library

RRS

Library

cDNA

Library

BAC

Overlap

Shotgun

Overlap

EST

Overlap

Finding SNPs: Sequence-based SNP Mining

How do you find SNPs if you don’t have a reference sequence

RT errors?

DNA

SEQUENCING

Sequencing

Quality

Sequence Overlap - SNP Discovery

GTTACGCCAATACAGGATCCAGGAGATTACC

GTTACGCCAATACAGCATCCAGGAGATTACC

slide8

Finding SNPs: Sequence-based SNP Mining

BAC = Bacterial Artificial Chromosome

Primary vector for DNA cloning in the HGP

DNA from multiple individuals

Clone large fragments into BACs

(unknown sequence)

Fragment DNA

Sequence and Reassemble

(known sequence)

Assembly with other overlapping BACs

GTTACGCCAATACAGGATCCAGGAGATTACC

GTTACGCCAATACAGCATCCAGGAGATTACC

slide9

Finding SNPs: Marker Discovery and Methods

$ 45 Million - 2 years (1999, 2001 - 2003)

Goals: Identify 300,000 SNPs and map 150,000 (April 1999)

Determine allele frequency of SNPs

If you don’t have a reference genome - how do you find SNPs?

slide10

Finding SNPs: Sequence-based SNP Mining

RRS = Reduced Representation Sequencing

Genomic DNA (multiple individuals)

RE to generate

fragments

Clone DNA fragments

into plasmid vectors

Altshuler, et al. Nature (2000)

Sequence and align and cluster

GTTACGCCAATACAGGATCCAGGAGATTACC

GTTACGCCAATACAGCATCCAGGAGATTACC

From overlap identify mismatches = SNPs

slide11

TSC and HGP: High Resolution SNP Map

Feb. 2001 - Human Genome Project and TSC

slide12

Development of a genome-wide SNP map: How many SNPs?

Nickerson and Kruglyak, Nature Genetics, 2001

~ 10 million common SNPs (> 1 - 5% MAF) - 1/300 bp

Feb 2001 - 1.42 million (1/1900 bp)

slide13

SNP Discovery: dbSNP database

dbSNP

-NCBI SNP database

slide14

Unique mapping

to a genome location

(reference SNP = rs#)

SNPs submitted

By research communtiy

(submitted SNPs = ss#)

(by 2hit-2allele)

SNP data submitted to dbSNP: Clustering

dbSNP processing of SNPs

slide15

Finding SNPs: Marker Discovery and Methods

SNP Discovery has proceeded in two distinct phases:

1 - SNP Discovery**

Human Genome Project - overlaps of BAC

clones and ESTs

The SNP Consortium - Reduced

Representation Sequencing

The HapMap

2 - SNP Discovery/Characterization**

The HapMap

slide16

HapMap Project Proposed: Map more SNPs and genotype

  • Increase SNP density over the first 6 - 12 months
  • Ultimately produce a fine scale genetic map (HapMap) which
    • would serve as a common resource for all biomedical reseseachers
  • Genotype 600,000 SNPs genome-wide
  • Four populations: CEPH (Europe), Yoruban (Africa), Japanese/
    • Chinese (Asian)
slide17

Nov 2003 - 5.7 million (2 million validated) - 1/1500 bp

Feb 2004 - 7.2 million (3.3 million validated) - 1/900 bp

Genomic DNA

(multiple individuals)

Random Shotgun Sequencing

Sequence and align

(reference sequence)

Draft Human Genome

GTTACGCCAATACAGGATCCAGGAGATTACC

HapMap SNP Discovery: Prior to Genotyping

Initiation of project planning (July 2001):

2.8 million SNPs (1.4 million validated) - 1/1900 bp

Generate more SNPs:

Other Sources of SNPs:

Perlegen (Affymetrix chips) SNP data (chr22)

Sequence chromatograms from Celera project

TACGCCTATA

TCAAGGAGAT

slide18

HapMap SNP Discovery

HapMap Discovery Increased SNP Density and Validated SNPs

10 million

rs SNPs

5 million

validated

rs SNPs

slide19

Development of a genome-wide SNP map: How many SNPs?

Nickerson and Kruglyak, Nature Genetics, 2001

~ 10 million common SNPs (> 1- 5% MAF) - 1/300 bp

Feb 2001 - 1.42 million (1/1900 bp)

Nov 2003 - 2.0 million (1/1500 bp)

Feb 2004- 3.3 million (1/900 bp)

Mar 2005 - 5.0 million (validated - 1/600 bp)

When will we have them all?

slide20

mRNA

cDNA

Library

BAC

Library

EST

Overlap

BAC

Overlap

Finding SNPs: Sequence-based SNP Mining

Genomic

RRS

Library

Random

Shotgun

DNA

SEQUENCING

Shotgun

Overlap

Align to

Reference

RANDOM Sequence Overlap - SNP Discovery

GTTACGCCAATACAGGATCCAGGAGATTACC

GTTACGCCAATACAGCATCCAGGAGATTACC

slide21

1.0

8

Fraction of SNPs Discovered

0.5

2

0.0

0.0

0.1

0.2

0.3

0.4

0.5

Minor Allele Frequency (MAF)

SNP discovery is dependent on your sample population size

{

GTTACGCCAATACAGGATCCAGGAGATTACC

GTTACGCCAATACAGCATCCAGGAGATTACC

2 chromosomes

8

slide22

SNP Characterization/Genotyping

Nickerson and Kruglyak, Nature Genetics, 2001

~ 10 million common SNPs (>1- 5% MAF) - 1/300 bp

Mar 2005 - 5.0 million (validated/mapped - 1/600 bp)

5.0/10.0 = 50% of all common SNPs (validated)!

slide23

HapMap Project Proposed: Map more SNPs and genotype

  • Genotype 600,000 SNPs genome-wide
  • Four populations:
    • CEPH (CEU) (Europe - n = 90, trios)
    • Yoruban (YRI) (Africa - n = 90, trios)
    • Japanese (JPT) (Asian - n = 45)
    • Chinese (HCB) (Asian - n =45)
slide24

Finding SNPs: Genotype Data Adds Value to SNPs

HapMap Genotyping

  • Confirms SNP as “real” and “informative”
  • Minor Allele Frequency (MAF) - common or rare
  • MAF in different populations
  • Detection of SNP x SNP correlations

(Linkage Disequilibrium)

  • Determine haplotypes
slide26

Perlegen Large-scale Genotyping Capacity

1.58 millions SNPs genotyped

71 individuals from 3 American populations

European, African and Asian ancestry

hapmap completion
HapMap Completion

Nature - Oct 27 (2005)

HapMap + Perlegen

slide30

Current State of dbSNP

Many SNPs left to validate and characterize.

slide31

15,367 dbSNP

16,248 New SNPs

50% of SNPs in dbSNP

5 Mb/31,500 SNPs =

1/160 bp

Increasing SNP Density: HapMap ENCODE Project

ENCODE = ENCyclopedia Of DNA Elements

Catalog all functional elements in 1% of the genome (30 Mb)

10 Regions x 500 kb/region (Pilot Project)

David Altschuler (Broad), Richard Gibbs (Baylor)

16 CEU, 16 YRI, 8 HCB, 8 JPT

Comprehensive PCR based resequencing across these regions

slide32

Development of a genome-wide SNP map: How many SNPs?

Nickerson and Kruglyak, Nature Genetics, 2001

~ 10 million common SNPs (>1- 5% MAF) - 1/300 bp

Mar 2005 - 5.0 million (validated - 1/600 bp)

~4.0 million validated SNPs with genotypes!

(HapMap confirmed, allele frequency/population,

SNPxSNP correlations (LD), haplotypes)

slide33

1.0

96

48

24

16

8

8

Fraction of SNPs Discovered

0.5

2

0.0

0.0

0.1

0.2

0.3

0.4

0.5

Minor Allele Frequency (MAF)

SNP discovery is dependent on your sample population size

{

GTTACGCCAATACAGGATCCAGGAGATTACC

GTTACGCCAATACAGCATCCAGGAGATTACC

2 chromosomes

slide34

Goal: Comprehensively identify all common sequence variation in candidate genes

Initial biological focus: Inflammation, Coagulation, Complement

Approach: Direct resequencing of genes

Sample:p1 = 24 African-Americans, 23 CEPH parents, 1 chimp

p2 = 24 HapMap-YRI, 23 HapMap-CEU, 1 gorilla

Status: 271 genes (21 kb ave)

slide35

Targeted SNP Discovery

Directed analysis: cSNPs

5’

3’

Val-Val

Arg-Cys

PCR amplicons

Complete analysis: cSNP and Haplotype Structure Analysis

5’

3’

Arg-Cys

Val-Val

PCR amplicons

  • Generate SNP data from complete genomic resequencing
    • (i.e. 5’ regulatory, exon, intron, 3’ regulatory sequence)
slide36

Comprehensive SNP Discovery: Resequencing

  • Overlapping PCR Amplicons across entire gene
  • Make no assumptions about sequence function
  • Sequence diversity and genetic structure for each gene is different
  • Proper association studies can only be designed in this context
  • Complete resequencing facilitates population genetics methods
slide37

Sequence-based SNP Identification

Sequence

Amplify DNA

Phred

Phrap

Base-calling

Contig assembly

Sequence each end

5’

3’

Quality determination

Final quality determination

of the fragment.

PolyPhred

Polymorphism detection

Sequencing Capacity:

ABI 3730 – 96 capillary

~2000 individual chromatograms/day

~1,00,000 bp/day

~20 kbp sequence scanned in 48 individuals/wk

Consed

Sequence viewing

Polymorphism tagging

Analysis

Polymorphism reporting

Individual genotyping

Phylogenetic analysis

slide39

Homozygous

C/C

Heterozygous

C/T

Homozygous

T/T

slide41

Aston-Martin of SNP Detection

- PolyPhred 5.0

* Matthew Stephens

Peggy Dyer-Robertson

Jim Sloan

C/C

C/T

C/T

T/T

slide42

Comparison PolyPhred v4.29 versus v5.0

PolyPhred v4.29

PolyPhred v5.0

slide43

PolyPhred 5 Scores - Provide Quantitative Assessment of

SNP Genotype

PolyPhred 5.0

Perlegen

Double-Coverage - Automation = 93% of all SNPs, 100% of high-frequency SNPs, with no false positive SNPs identified, and 99.9% genotyping accuracy.

slide44

Comparison of PolyPhred v5.0 with others

Mutation Surveyor

PolyPhred v4.29

novoSNP

PolyPhred v5.0

slide46

60%

SNP Discovery: dbSNP database

NIEHS SNPs

dbSNP (Perlegen/HapMap)

{

{

15%

25%

Minor Allele Freq. (MAF)

Rarer and population specific SNPs are found by resequencing

slide47

Development of a genome-wide SNP map: How many SNPs?

Nickerson and Kruglyak, Nature Genetics, 2001

~ 10 million common SNPs (>1- 5% MAF) - 1/300 bp

HapMap ENCODE = 1/160 bp (n = 48, 4 pops)

SeattleSNPs = 1/180 bp (n = 48, 2 pops)

NIEHS SNPs = 1/180 bp (n = 95, 4 pops)

Comprehensive resequencing can identify the

vast majority of SNPs in a region

slide48

Study Designs Utilizing Resequencing Approaches

Detection of SNPs in an unstudied population (or not genotyped)

Population specific rare SNPs

Detection of SNPs directly in study samples (no discovery panel)

Sequencing for Association

Detection of SNPs in samples representing the tails of the

phenotypic distribution

How much do rare, possibly more penetrant SNPs, explain of

extreme phenotypes

seattlesnps characterization

p1 = 47 individuals - 23 CEPH and 24 African-American

(Non-HapMap, mostly)

Selection of informative (high frequency, coding, etc) SNPs to be genotyped in defined populations (~4500 SNPs)

SeattleSNPs Characterization

HapMap Populations

European (CEU,n=60)

African (YRI, n =60)

Asian (HCB, n = 45 and JPT, n = 45)

Non-HapMap Populations

Hispanic (n = 60)

African-American (n = 62)

illumina seattlesnps genotyping
Illumina SeattleSNPs Genotyping
  • Each well samples 1536 SNPs in one individual
  • For each HapMap sample 3 x 1536 ( 4600 genotyped SNPs)
  • 1,764,472 genotypes generated (total ~384 samples)
slide51

0.341

0.274

0 . 282

0.931

0.415

0 . 610

0.626

0.533

0.841

0 . 410

0 . 761

0 . 952

0 . 765

0 . 420

0 . 589

African

American

Japanese

Chinese

CEPH

Hispanic

Yoruban

CEPH

Japanese

Chinese

African

American

Hispanic

Population Allele Frequency Correlations

Illumina NIEHS SNP Genotyping

slide52

SeattleSNPs Genotype Data

P1 (~180 genes)

SNPs characterized in six

different major populations.

slide53

Summary: The Current State of SNP Resources

  • SNPs have been rapidly adopted as the genetic marker of choice.
  • Approximately 10 million common SNPs exist in the human genome (1/300 bp).
  • Random SNP discovery processes generate many SNPs (TSC and HapMap).
  • Random approaches to SNPs discovery have reached limits of discovery

and validation (1/600 bp; 50% SNP validation)

  • Most validated SNPs (5 million) will be genotyped by the HapMap (3 pops)
  • Resequencing approaches continue to catalog important variants (rarer)