The Past, Present, and Future of DNA Sequencing
Craig A. Praul, Co-Director, Genomics Core Facility, Huck Institutes of the Life Sciences, Penn State University
A very short history of DNA sequencing
"I started from the conviction that, if different DNA species exhibited different biological activities, there should also exist chemically demonstrable differences between deoxyribonucleic acids."
Erwin Chargaff
Milestones
• First isolation of DNA: 1869 (Friedrich Miescher)
• Composition of nucleic acids; tetranucleotide theory: 1909 – 1940 (Phoebus Levene)
• G=C and A=T; however, the G/C and A/T content of different organisms varies: 1950 (Erwin Chargaff)
• G/C content measured by annealing: 1968 (Mandel and Marmur)
• Maxam-Gilbert and Sanger sequencing: 1977
• Next-generation sequencing: 2005
Genomes Sequenced
• Virus – 3222 (Bacteriophage phi X 174, 5386 nt – 1977)
• Bacteria – 2289 (Haemophilus influenzae, 1.8 x 10^6 nt – 1995)
• Eukarya – 168 (S. cerevisiae, 1.2 x 10^7 nt – 1996; H. sapiens, 3 x 10^9 nt – 2001)
• Archaea – 152 (Methanococcus jannaschii, 1.7 x 10^6 nt – 1996)
Next-Generation Sequencing Liu et al. Journal of Biomedicine and Biotechnology Volume 2012 (2012), Article ID 251364, 11 pages doi:10.1155/2012/251364
Changes in instrument capacity* ER Mardis. Nature 470, 198-203 (2011) doi:10.1038/nature09796
Sequencing Cost Source - NHGRI : http://www.genome.gov/sequencingcosts/
Central Dogma of Molecular Biology
James Watson version (1965): DNA → RNA → Protein
So once we have the genomic DNA sequence of a species, we have all of the information there is? Really?
Illumina HiSeq and MiSeq
• Massively parallel
  • HiSeq: 150 or 180 million reads per lane
  • MiSeq: 15 million reads per run
• Intermediate read length
  • HiSeq: 100 nt or 150 nt
  • MiSeq: 250 nt
• High total output per run
  • HiSeq: 90 GB or 288 GB
  • MiSeq: 8 GB
Sequencing Types
• Single read
• Paired-end read
• Mate-pair read
Library Types
• Many different library preps: DNA, mate-pair, mRNA, miRNA, ChIP
• Fragmentation
  • DNA: 300 – 500 nt
  • RNA: 150 – 200 nt
• Attachment of appropriate adapters
  • Complex: flow cell binding, forward and reverse sequencing, barcodes (BC)
  • Custom: avoid if possible
• Removal of dimers/small inserts
• Amplification (or not)
Applications
• De novo sequencing (genomes, transcriptomes)
• Resequencing (genomes, exomes, custom sequence capture)
• RNA-seq (mRNA, miRNA, degradome)
• ChIP-seq
• Methyl-seq
• RIP-seq
• Amplicon
de Novo Experimental Design
• Estimate of genome size
• Coverage (30 x – 100 x)
• Sequencing type (paired-end or mate-pair)
• Example: 100 MB genome, 2 x 100 nt paired-end reads
  • (100 MB) x (30 x coverage) = 3 GB
  • 3 GB / (200 nt for each pair of paired-end reads) = 15 million read pairs
• Replicates
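The arithmetic in the de novo example above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `read_pairs_needed` and its parameters are hypothetical, and MB/GB are read here as megabases/gigabases.

```python
def read_pairs_needed(genome_size_nt: int, coverage: int, read_length_nt: int) -> int:
    """Estimate the number of paired-end read pairs for a target coverage.

    Hypothetical helper for illustration; assumes every read pair
    contributes 2 * read_length_nt usable bases.
    """
    total_bases = genome_size_nt * coverage   # e.g. 100 MB x 30 = 3 GB
    bases_per_pair = 2 * read_length_nt       # e.g. 2 x 100 nt = 200 nt
    # Round up so the requested coverage is always reached.
    return (total_bases + bases_per_pair - 1) // bases_per_pair


# The slide's example: 100 MB genome, 30 x coverage, 2 x 100 nt reads.
pairs = read_pairs_needed(100_000_000, 30, 100)
print(f"{pairs:,} read pairs")
```

Raising the coverage to the upper end of the 30 x – 100 x range scales the result linearly, which is a quick way to see how the choice of coverage drives the number of lanes or runs required.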
RNA-seq Experimental Design
• Estimate of transcriptome size (1-5% of genome?)
• Coverage (30 x?)
• mRNA or rRNA-depleted RNA
• Relative abundance of the transcripts you are interested in
• Sequencing type (single read or paired-end)
• Simple transcriptome vs. complex transcriptome
• Splice variants
• Example: 3 GB genome, 100 nt single reads
  • (3 GB genome) x (5% transcriptome) = 150 MB transcriptome
  • (150 MB transcriptome) x (30 x coverage) = 4.5 GB total sequence
  • 4.5 GB / (100 nt for each read) = 45 million reads
• Replicates: Yes!!!!
  • Biological, not technical
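The RNA-seq example above follows the same pattern, with one extra step to estimate the transcriptome size from the genome size. The sketch below is illustrative only: `single_reads_needed` and its parameters are hypothetical names, the transcribed fraction is passed as a whole-number percentage to keep the arithmetic exact, and MB/GB are read as megabases/gigabases.

```python
def single_reads_needed(genome_size_nt: int, transcribed_percent: int,
                        coverage: int, read_length_nt: int) -> int:
    """Estimate single reads needed to cover a transcriptome.

    Hypothetical helper for illustration; assumes the transcriptome is a
    fixed percentage of the genome and that all transcripts are equally
    abundant, which real samples are not.
    """
    transcriptome_nt = genome_size_nt * transcribed_percent // 100  # 3 GB x 5% = 150 MB
    total_bases = transcriptome_nt * coverage                       # 150 MB x 30 = 4.5 GB
    # Round up so the requested coverage is always reached.
    return (total_bases + read_length_nt - 1) // read_length_nt


# The slide's example: 3 GB genome, 5% transcribed, 30 x coverage, 100 nt reads.
reads = single_reads_needed(3_000_000_000, 5, 30, 100)
print(f"{reads:,} reads per sample")
```

Because transcript abundance is uneven, this figure is best treated as a rough starting point: rare transcripts of interest may need considerably more reads, and the total is multiplied by the number of biological replicates.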
ChIP-Seq http://www.nature.com/nmeth/journal/v4/n8/images/nmeth0807-613-F1.gif
RIP-seq Source : http://openi.nlm.nih.gov/imgs/rescaled512/3269675_ijms-13-00097f6.png
Methyl-seq 20 different types of base modifications in DNA are known and there are perhaps 200 modifications of RNA
Experimental Space: Next-Gen Platform
• PacBio: 0.075 x 10^6 reads/sample, 1000 – 3000 nt
  • Whole transcript
• Roche 454 FLX+: 0.5 – 1 x 10^6 reads/sample, 800 – 1000 nt
  • Small – medium genome de novo sequencing
  • Long amplicon
  • Transcriptome
• PGM: 1 – 2 x 10^6 reads per sample, 400 nt
  • Small genome de novo
  • Medium amplicon
• MiSeq: 1 – 2 x 10^6 reads per sample, 50 – 250 nt
  • Small genome de novo
  • Small amplicon
• HiSeq: 10 – 100 x 10^6 reads per sample, 50 – 150 nt
  • Counting applications: RNA-seq, ChIP-seq, RIP-seq, Methyl-seq
  • Large genome de novo and resequencing
Experimental Space: The Relevancy of “Classic” Techniques
Differential Gene Expression
• Northern blotting (1977): 1 probe – 20 samples
• Dot blots (1987): 100s of probes – 1 sample
• RT-PCR (1992): 100s of probes – 10 – 100 samples
• Microarrays (1995): 100,000s of probes – 1 sample
• Next-gen sequencing (2005): 10 – 100 x 10^6 reads – 1 sample
The Future
• More reads
• Longer reads
• Faster sequencing
• Cheaper sequencing
• New applications