Mapping and Sequencing Genomes - PowerPoint PPT Presentation


Mapping and Sequencing Genomes


Sanger Sequencing



Sanger Sequencing-Critical Innovations

  • 1) Lee Hood (1986)

  • Radiolabelled ddNTPs → fluorescently labelled ddNTPs

  • Eliminate radioactivity, multiplex reactions

  • 2) Molly Craxton (1991)

  • Thermostable polymerases, cycle sequencing

  • Significantly reduced amount of template required, 96-well format

  • 3) Michael Hunkapiller, Lee Hood, Applied Biosystems (circa 1991)

  • Slab gels → capillary gel electrophoresis

  • Much easier/better automation, lane tracking, high-speed runs, better resolution; 3 x 96 wells/24 hours → 12 x 96 wells/24 hours


BAC-by-BAC Sequencing


Whole Genome Shotgun Sequencing


Combined Approach

[Diagram labels: shotgun clone; BAC library/BAC end sequencing; shotgun sequence; physical mapping; BAC shotgun sequencing]


Calculating Sequence Coverage

Given a clone/BAC/Genome of a given size, how do I figure out how many sequencing reads to run?

What coverage?

How many gaps?

How large are gaps?


Lander-Waterman Model

Lander ES, Waterman MS (1988) "Genomic mapping by fingerprinting random clones: a mathematical analysis." Genomics 2(3): 231-239

  • Poisson Estimate

  • Number of reads

  • Average length of a read
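The Poisson estimate can be sketched in a few lines. This is a simplified version that assumes N reads of average length L land uniformly at random on a target of size G, and it ignores the minimum overlap needed to detect a join, so it is not the full published model. Under those assumptions the coverage is c = N·L/G, the chance that a given base is never sequenced is about e^(-c), and the expected number of gaps is about N·e^(-c). The function names and the 150 kb example are illustrative and do not come from the slides.

```python
import math

def coverage(num_reads, read_len, target_size):
    """Average fold coverage c = N * L / G."""
    return num_reads * read_len / target_size

def reads_for_coverage(target_cov, read_len, target_size):
    """Number of reads needed to reach a target fold coverage."""
    return math.ceil(target_cov * target_size / read_len)

def fraction_uncovered(cov):
    """Poisson probability that a given base is hit by zero reads."""
    return math.exp(-cov)

def expected_gaps(num_reads, cov):
    """Approximate expected number of gaps, N * e^(-c) (no minimum overlap)."""
    return num_reads * math.exp(-cov)

# Hypothetical example: a 150 kb BAC sequenced with 500 bp reads to 8x.
n = reads_for_coverage(8, 500, 150_000)
c = coverage(n, 500, 150_000)
print(f"reads needed: {n}")
print(f"coverage: {c:.1f}x")
print(f"fraction of bases uncovered: {fraction_uncovered(c):.6f}")
print(f"expected gaps: {expected_gaps(n, c):.2f}")
```

Raising the coverage shrinks the expected number of gaps exponentially, which is why a modest increase in reads closes most of them.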


Poisson Distribution Digression


Poisson is a good estimate for…

Helpful to think of Poisson as having to do with rates

  • Number of cars that pass through a toll plaza in a given hour

  • Number of misprints per page in a book

  • Number of alpha particles emitted in one second from a radioactive substance

  • Number of trout in a cubic meter of pond water

Poisson is specified by a single parameter, λ (the expected number of events per interval)


Poisson Distribution

If for a given sequence alignment I observe, on average, 3 mis-matches every 50 bp, what is the chance of observing a 50bp window with 5 mis-matches?
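One way to work this out, assuming mismatches in a 50 bp window follow a Poisson distribution with mean 3, is to evaluate the Poisson probability P(X = k) = e^(-λ)·λ^k / k! at k = 5. The helper name `poisson_pmf` is illustrative.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k events when the expected count is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# 3 mismatches per 50 bp on average; chance of exactly 5 in one window.
p = poisson_pmf(5, 3.0)
print(f"P(X = 5 | lambda = 3) = {p:.4f}")  # prints 0.1008
```

So under this model roughly one window in ten would show exactly 5 mismatches.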



What is the chance of observing at least one mismatch in a 50 bp window?
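Assuming the same Poisson model with an average of 3 mismatches per 50 bp window, "at least one" is the complement of "none", so P(X ≥ 1) = 1 - P(X = 0) = 1 - e^(-λ). The function name below is illustrative.

```python
import math

def prob_at_least_one(lam):
    """P(X >= 1) for a Poisson count with mean lam, via 1 - P(X = 0)."""
    return 1.0 - math.exp(-lam)

p = prob_at_least_one(3.0)
print(f"P(X >= 1 | lambda = 3) = {p:.4f}")  # prints 0.9502
```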



Back to Sequencing


Lander–Waterman Assumptions

  • 1. Sequencing reads will be randomly distributed in the genome

  • 2. The ability to detect an overlap between two truly overlapping reads does not vary from clone to clone



In practice…

Lander-Waterman almost always underestimates the sequencing required: real projects leave more gaps than the model predicts, because of

-cloning biases in shotgun libraries


-GC/AT rich regions

-other low complexity regions


Mapping/Ordering BACs

What is a marker?

A way to uniquely locate a position in a genome

What is mapping?

Statistical association between markers, ordering markers in linear sequence.

How do we map?

“Shatter” genome and observe how often two markers travel together on the same piece of DNA

What does it mean for two markers to be linked?


What does it mean to order BACs?

Create a minimal tiling path.
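A minimal tiling path can be illustrated with a greedy interval cover. This is only a sketch: it assumes each BAC has already been placed on the map as a (name, start, end) interval, and it counts two clones as joined when one starts at or before the point where the other ends. Real pipelines demand a minimum overlap and work from fingerprint or end-sequence evidence. The clone names and coordinates are hypothetical.

```python
def minimal_tiling_path(clones):
    """Pick the fewest clones that cover the mapped span from end to end.

    clones: list of (name, start, end) tuples in map coordinates.
    Raises ValueError if the clones leave a gap.
    """
    if not clones:
        return []
    ordered = sorted(clones, key=lambda c: c[1])
    target_end = max(c[2] for c in ordered)
    covered = ordered[0][1]  # position covered so far
    path = []
    i = 0
    while covered < target_end:
        best = None
        # Among clones starting inside the covered region,
        # keep the one that reaches furthest to the right.
        while i < len(ordered) and ordered[i][1] <= covered:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered:
            raise ValueError(f"gap in clone coverage at position {covered}")
        path.append(best[0])
        covered = best[2]
    return path

# Hypothetical BAC placements (kb).
bacs = [("A", 0, 100), ("B", 50, 180), ("C", 90, 200),
        ("D", 150, 300), ("E", 190, 320)]
print(minimal_tiling_path(bacs))  # ['A', 'C', 'E']
```

Choosing the furthest-reaching clone at each step is what keeps the path minimal: any clone skipped ends no further right than the one chosen.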

BAC fingerprint gel

[Figure: BAC fingerprint gel. 96 samples, 25 marker lanes (marker every fifth lane); 1% agarose; 8 hours, 140 volts at 14°C; size labels 29,950 bp and 560 bp. Marra et al., Genome Res., 7, 1072-1084 (1997)]


Whole genome map assembly

Hybridize markers or identify them in BAC end sequence (e-PCR).


Edit contigs and align to map.


Human chr 2 physical map contig


Verification of location by FISH


  • BACs were directly labeled with fluorescein conjugated d-UTP by random priming. Labeled DNA was competitively hybridized to cytogenetically normal male human metaphase chromosomes.


When is a genome finished?

1) Finishing is hard!

2) Quality values:

Phred score = -10 * log10(P(error)), so phred20 means a 1% chance that the base call is wrong


How much continuous phred20 sequence?

3) Gaps? 1 contig/chromosome (probably not)
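The quality-value questions above can be made concrete with a short sketch. It converts between error probability and Phred score using the formula on this slide, and measures the longest uninterrupted run of bases at or above a threshold such as phred20. The function names and the example quality list are illustrative, not taken from any base-calling tool.

```python
import math

def phred_from_error(p_error):
    """Phred score Q = -10 * log10(P(error))."""
    return -10.0 * math.log10(p_error)

def error_from_phred(q):
    """Invert the Phred formula: P(error) = 10^(-Q/10)."""
    return 10.0 ** (-q / 10.0)

def longest_run_at_or_above(quals, threshold=20):
    """Length of the longest continuous stretch with quality >= threshold."""
    best = run = 0
    for q in quals:
        run = run + 1 if q >= threshold else 0
        best = max(best, run)
    return best

# Hypothetical per-base quality values for a short read.
quals = [30, 25, 10, 20, 22, 40, 5]
print(f"Q for 1% error: {phred_from_error(0.01):.1f}")
print(f"P(error) at Q30: {error_from_phred(30):.4f}")
print(f"longest phred20 run: {longest_run_at_or_above(quals)} bases")
```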


EST Projects

  • EST=Expressed Sequence Tag

  • Short, single pass reads from bits of mRNA

  • In practice random reads from cDNA libraries

  • polyA primed/random primed

  • Sometimes libraries are tissue specific



  • Downs:

  • Libraries are highly biased

  • Can be hard to know when two ESTs are derived from the same gene

  • (generally) high error rates


  • Ups:

  • Represent the part of the genome (most) people care about

  • Does not require a sequenced genome

  • Find genes

  • Find SNPs

  • Find splice isoforms


What is the future of sequencing?

  • Resequencing

    • One human done4 billion to go

    • Locating polymorphisms for complex diseases

  • More species, more individuals

    • Comparative genomics

    • The resolution required (ORFs, transcription factors, individual base pairs) determines how many genomes are needed


Really high-throughput sequencing

  • High-throughput=Cheap

    • $50 million per mammalian genome (now)

  • Reduce volumes=reduce reagent cost

  • Eliminate/parallelize cloning and DNA preparation

  • Multiplex!


The $1000 Genome

“…we remain very far away from being able to afford to use comprehensive genomic sequence information in individual health care.”


Key ingredients so far…

  • Sequencing by synthesis

  • Elimination/parallelization of clone production

  • Miniaturization


Pyrosequencing I


Pyrosequencing II

  • Emulsion PCR (Dressman et al (2003) PNAS 100:8817-22) to generate templates. Eliminates library construction and DNA prep

  • Drop one bead/well into a “picotiter” plate (1.6 million wells)

  • Image reactions in a flow cell with CCD camera


Fluorescent in situ sequencing (FISSEQ)

  • Emulsion PCR to create bead library (as in pyrosequencing)

  • Polymerize beads in polyacrylamide on the surface of a microscope slide (2 billion beads/slide)

  • Perform sequencing by synthesis reactions using fluorescently labeled dNTPs


Single Molecule Sequencing

[Figure labels: microscope slide; single DNA; super-cooled CCD. Helicos Biosciences Corp.]
