CS 5263 Bioinformatics

Next-generation sequencing technology

Outline
  • First generation sequencing
  • Next generation sequencing (current)
    • AKA:
      • Second generation sequencing
      • Massively parallel sequencing
      • Ultra high-throughput sequencing
  • Future generation sequencing
  • Analysis challenges
Sanger sequencing (1st generation)
  • DNA is fragmented
  • Cloned to a plasmid vector
  • Cyclic sequencing reaction
  • Separation by electrophoresis
  • Readout with fluorescent tags

Jay Shendure & Hanlee Ji, Nature Biotechnology 26, 1135 - 1145 (2008)

Cyclic-array methods (next-generation)
  • DNA is fragmented
  • Adaptors ligated to fragments
  • Several possible protocols yield array of PCR colonies.
  • Enzymatic extension with fluorescently tagged nucleotides.
  • Cyclic readout by imaging the array.

Jay Shendure & Hanlee Ji, Nature Biotechnology 26, 1135 - 1145 (2008)

Available next-generation sequencing platforms

Illumina/Solexa

ABI SOLiD

Roche 454

Polonator

HeliScope

Emulsion PCR

  • Fragments, with adaptors, are PCR amplified within a water drop in oil.
  • One primer is attached to the surface of a bead.
  • Used by 454, Polonator and SOLiD.

Rothberg and Leamon Nat Biotechnol. 2008

Shendure and Ji Nat Biotechnol. 2008

454 Sequencing

  • Stats:
    • read lengths 200-300 bp
    • accuracy problems with homopolymers
    • 400,000 reads per run
    • cost $60 per megabase

Rothberg and Leamon Nat Biotechnol. 2008
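
The homopolymer problem listed in the stats above can be made concrete. In 454 pyrosequencing, the length of a run of identical bases has to be inferred from signal intensity, so long runs are where miscalls tend to cluster. The following is a minimal sketch that simply flags such runs in a read. The function name `homopolymer_runs` and the `min_len` threshold are illustrative assumptions, not part of any 454 software.

```python
from itertools import groupby


def homopolymer_runs(seq, min_len=3):
    """Return (start, base, length) for each run of identical bases
    that is at least min_len long. min_len is an arbitrary cutoff
    chosen for illustration."""
    runs = []
    pos = 0
    for base, group in groupby(seq):
        length = sum(1 for _ in group)
        if length >= min_len:
            runs.append((pos, base, length))
        pos += length
    return runs


if __name__ == "__main__":
    read = "ACGTTTTTACCCG"
    for start, base, length in homopolymer_runs(read):
        print(f"run of {length} x {base} starting at position {start}")
```

Reads with many flagged runs are candidates for closer inspection or for lower confidence in the run-length call.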

Bridge PCR
  • DNA fragments are flanked with adaptors.
  • A flat surface is coated with two types of primers, corresponding to the adaptors.
  • Amplification proceeds in cycles, with one end of each bridge tethered to the surface.
  • Used by Illumina/Solexa.
First Round

All 4 labeled nucleotides

Primers

Polymerase

1. Take image of first cycle

2. Remove fluorophore

3. Remove block on 3’ terminus

Stats:

  • read lengths up to 36 bp
  • error rates 1-1.5%
  • several million “spots” per lane (8 lanes)
  • cost $2 per megabase

http://seq.molbiol.ru/

Conventional sequencing

Sanger sequencing can read up to 1,000 bp, with per-base 'raw' accuracies as high as 99.999%. In the context of high-throughput shotgun genomic sequencing, it costs on the order of $0.50 per kilobase.

Jay Shendure & Hanlee Ji, Nature Biotechnology 26, 1135 - 1145 (2008)
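
To compare this figure with the per-megabase costs quoted for the next-generation platforms, it helps to put everything in the same unit. The sketch below takes 1 megabase as 1,000 kilobases. It also assumes that the untitled stats slide quoting $2 per megabase refers to Illumina/Solexa, which the surrounding slides suggest but do not state. The names `COST_PER_MB` and `fold_cheaper` are illustrative.

```python
# Costs quoted on the slides, converted to US dollars per megabase.
SANGER_COST_PER_KB = 0.50  # "on the order of $0.50 per kilobase"

COST_PER_MB = {
    "Sanger": SANGER_COST_PER_KB * 1000,  # 1 Mb = 1,000 kb
    "Roche 454": 60.0,
    "Illumina/Solexa": 2.0,  # assumed attribution of the $2/Mb slide
}


def fold_cheaper(platform, baseline="Sanger"):
    """How many times cheaper per megabase a platform is than the baseline."""
    return COST_PER_MB[baseline] / COST_PER_MB[platform]


if __name__ == "__main__":
    for name, cost in COST_PER_MB.items():
        print(f"{name}: ${cost:.2f} per Mb, "
              f"{fold_cheaper(name):.1f}x cheaper than Sanger")
```

These are order-of-magnitude figures from 2008, so the ratios matter more than the exact values.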

Sequence qualities
  • In most cases, the quality is poorest toward the ends, with a region of high quality in the middle
  • Uses of sequence qualities
    • ‘Trimming’ of reads
      • Removal of low quality ends
    • Consensus calling in sequence assembly
    • Confidence metric for variant discovery
  • In general, newer approaches produce larger amounts of sequences that are shorter and of lower per-base quality
    • Next-generation sequencing has error rate around 1% or higher
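
The 'trimming' use listed above can be sketched in a few lines. This is a deliberately simple version that strips bases from each end while their quality is below a threshold. Real trimmers typically use sliding windows or running sums instead. The function name `trim_low_quality_ends` and the default threshold of 20 are assumptions made for illustration.

```python
def trim_low_quality_ends(seq, quals, min_q=20):
    """Remove low-quality bases from both ends of a read.

    seq   -- the read as a string of bases
    quals -- one integer quality score per base
    min_q -- illustrative threshold; bases below it are trimmed from the ends
    """
    if len(seq) != len(quals):
        raise ValueError("seq and quals must have the same length")
    start, end = 0, len(seq)
    # Walk inward from the 5' end past low-quality bases.
    while start < end and quals[start] < min_q:
        start += 1
    # Walk inward from the 3' end past low-quality bases.
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end], quals[start:end]


if __name__ == "__main__":
    read = "ACGTACGTAC"
    quality = [5, 12, 30, 35, 40, 38, 33, 25, 10, 4]
    print(trim_low_quality_ends(read, quality))
```

Low-quality bases in the interior of the read are left in place, since only the ends are being trimmed.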
Phred Quality Score
  • q = -10 × log10(p)
  • p = error probability for the base
  • if p = 0.01 (1% chance of error), then q = 20
  • if p = 0.00001 (99.999% accuracy), then q = 50
  • Phred quality values are rounded to the nearest integer
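
The two worked values above follow from the standard Phred definition, q = -10 × log10(p). A minimal sketch of the conversion in both directions is shown below. The function names are illustrative.

```python
import math


def phred_from_error(p):
    """Convert an error probability p into a Phred score,
    rounded to the nearest integer."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in the interval (0, 1]")
    return round(-10 * math.log10(p))


def error_from_phred(q):
    """Convert a Phred score q back into an error probability."""
    return 10 ** (-q / 10)


if __name__ == "__main__":
    for p in (0.01, 0.00001):
        print(f"p = {p} gives q = {phred_from_error(p)}")
    print(f"q = 30 gives p = {error_from_phred(30)}")
```

Because the scale is logarithmic, every 10 points of quality corresponds to a tenfold drop in error probability.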
Main Illumina noise factors

Schematic representation of main Illumina noise factors.

(a–d) A DNA cluster comprises identical DNA templates (colored boxes) that are attached to the flow cell. Nascent strands (black boxes) and DNA polymerase (black ovals) are depicted.

(a) In the ideal situation, after several cycles the signal (green arrows) is strong, coherent and corresponds to the interrogated position.

(b) Phasing noise introduces lagging (blue arrows) and leading (red arrow) nascent strands, which transmit a mixture of signals.

(c) Fading is attributed to loss of material that reduces the signal intensity.

(d) Changes in the fluorophore cross-talk cause misinterpretation of the received signal (blue arrows). For simplicity, the noise factors are presented separately from each other.

Erlich et al. Nature Methods 5: 679-682 (2008)
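
The effect of phasing noise on a cluster can be illustrated with a toy model. This is a sketch under strong assumptions and not the model used by Erlich et al. In each cycle, every strand independently lags by one position with probability `p_lag`, leads by one position with probability `p_lead`, and otherwise stays in step. The rates of 0.5% are hypothetical values chosen only for illustration.

```python
from collections import defaultdict


def phase_distribution(n_cycles, p_lag=0.005, p_lead=0.005):
    """Fraction of strands at each offset from the ideal position
    after n_cycles (offset 0 means in phase). Toy model with
    hypothetical per-cycle lag and lead probabilities."""
    dist = {0: 1.0}
    for _ in range(n_cycles):
        nxt = defaultdict(float)
        for offset, frac in dist.items():
            nxt[offset - 1] += frac * p_lag    # strand fails to extend
            nxt[offset + 1] += frac * p_lead   # strand extends twice
            nxt[offset] += frac * (1.0 - p_lag - p_lead)
        dist = dict(nxt)
    return dist


def in_phase_fraction(n_cycles, **rates):
    """Fraction of strands still reporting the interrogated position."""
    return phase_distribution(n_cycles, **rates).get(0, 0.0)


if __name__ == "__main__":
    for cycle in (1, 10, 36, 100):
        print(f"cycle {cycle}: {in_phase_fraction(cycle):.3f} in phase")
```

Under this model the in-phase fraction shrinks with every cycle, which is one way to see why signal coherence limits usable read length.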

Comparison of existing methods

Jay Shendure & Hanlee Ji, Nature Biotechnology 26, 1135 - 1145 (2008)

Read length and pairing

TCGTACCGATATGCTG

ACTTAAGGCTGACTAGC

  • Short reads are problematic because a short sequence often does not map uniquely to the genome.
  • Solution #1: Get longer reads.
  • Solution #2: Get paired reads.
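
Both solutions can be illustrated on a toy reference that contains a repeated segment. The sketch below uses exact string matching only. It ignores mismatches, and it ignores the fact that real mates come from opposite strands. The reference sequence, the reads and the insert-size window are all made up for illustration.

```python
def find_all(reference, read):
    """Return every start position where read matches reference exactly."""
    hits = []
    start = reference.find(read)
    while start != -1:
        hits.append(start)
        start = reference.find(read, start + 1)
    return hits


def paired_hits(reference, read1, read2, min_dist, max_dist):
    """Return (pos1, pos2) placements where read2 starts between
    min_dist and max_dist bases downstream of the start of read1."""
    return [
        (p1, p2)
        for p1 in find_all(reference, read1)
        for p2 in find_all(reference, read2)
        if min_dist <= p2 - p1 <= max_dist
    ]


if __name__ == "__main__":
    # Toy reference with the repeat TACGTA at three positions.
    reference = "GGTACGTACCATTACGTAGGCTTACGTAAACC"

    print(find_all(reference, "TACGTA"))      # short read, ambiguous
    print(find_all(reference, "TACGTAGGCT"))  # longer read
    print(paired_hits(reference, "TACGTA", "AACC", 4, 8))  # paired reads
```

The short read falls entirely inside the repeat and so matches several positions. Extending it into unique flanking sequence, or requiring a mate at a plausible distance, narrows the placement.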
Third generation
  • Single-molecule sequencing
    • no DNA amplification is involved
    • Helicos HeliScope
    • Pacific Biosciences SMRT
  • Longer reads
    • Roche/454 > 400 bp
    • Illumina/Solexa > 100 bp
    • Pacific Biosciences > 1000 bp and single molecule
Applications of next-generation sequencing

Jay Shendure & Hanlee Ji, Nature Biotechnology 26, 1135 - 1145 (2008)

Analysis tasks
  • Base calling
  • Mapping to a reference genome
  • De novo or assisted genome assembly
References
  • Next-generation DNA sequencing, Jay Shendure and Hanlee Ji, Nat. Biotechnol. (2008) 26:1135–1145
  • Next-Generation DNA Sequencing Methods, Elaine R. Mardis, Annu. Rev. Genomics Hum. Genet. (2008) 9:387–402