Genome Assembly

Presentation Transcript


  1. Genome Assembly. Bonnie Hurwitz, graduate student, TMPL

  2. Genome assembly

  3. Genome assembly

  4. Shotgun sequencing (WGS): genomic DNA is sheared, a clone library is built (insert sizes of 1-2, 3-4, 30-40, 100 kb), the clones are end sequenced (f / r), and the reads are assembled by alignment identity. Example overlapping reads: …ACGGCTGCGTTACATCGATCAT and ACATCGATCATTTACGATACCATTG…
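The two example reads on slide 4 overlap end to start, which is the signal an assembler uses to join them. The following is a minimal sketch of that idea, assuming exact matches only. The function names and the `min_overlap` parameter are hypothetical, and real assemblers tolerate sequencing errors and use far more efficient indexing.

```python
def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Return the length of the longest suffix of `left` that equals a prefix of `right`."""
    max_len = min(len(left), len(right))
    # Try the longest candidate first so the first hit is the best one.
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_reads(left: str, right: str, min_overlap: int = 3) -> str:
    """Join two reads on their exact suffix/prefix overlap."""
    size = overlap_length(left, right, min_overlap)
    if size == 0:
        raise ValueError("reads do not overlap by at least min_overlap bases")
    return left + right[size:]


# The two reads shown on the slide (leading/trailing ellipses dropped).
read_a = "ACGGCTGCGTTACATCGATCAT"
read_b = "ACATCGATCATTTACGATACCATTG"

shared = overlap_length(read_a, read_b)
contig = merge_reads(read_a, read_b)
print(shared, contig)
```

The shared stretch here is `ACATCGATCAT`, so the merged sequence is shorter than the two reads laid end to end.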

  5. Genome scaffolding. [Figure: contigs A through H, with a break splitting contig E into E’ and E’’, ordered by mate pair linkage (links numbered 1 to 8) into a “composite” genome scaffold.]

  6. Sanger sequencing costs. [Figure: sequence production (billions of bases/month) and cost (cents per base) by year, 1989 to 2008. Labeled costs: 0.57¢, 0.46¢, 0.35¢, 0.19¢, 0.10¢; ~$1/read.]

  7. 454 Pyrosequencing - the generations. [Figure: cost / bp labels of 0.01¢, 0.03¢ and 0.003¢; Sanger is currently 0.1¢.]

  8. When is a genome “finished”? (by Poisson calculations)

     Fold coverage   Percent of genome sequenced
     0.25x           22%
     0.50x           39%
     0.75x           53%
     1x              63%
     2x              88%
     3x              95%
     4x              98%
     5x              99.4%
     6x              99.75%
     7x              99.91%
     8x              99.97%
     9x              99.99%
     10x             99.995%

     Coverage: the average number of reads representing a given nucleotide in the reconstructed sequence. It can be calculated from the length of the original genome (G), the number of reads (N), and the average read length (L) as NL / G.
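The coverage formula NL / G and the Poisson estimate behind the table on slide 8 can be sketched as follows. This assumes the standard Poisson model, in which the expected fraction of the genome covered at fold coverage c is 1 - e^(-c). The function names and the example genome size and read counts are hypothetical and are not taken from the slides.

```python
import math


def fold_coverage(num_reads: int, avg_read_length: float, genome_length: int) -> float:
    """Coverage = N * L / G, as defined on the slide."""
    return num_reads * avg_read_length / genome_length


def expected_fraction_sequenced(coverage: float) -> float:
    """Poisson estimate: probability that a given base is hit by at least one read."""
    return 1.0 - math.exp(-coverage)


# Hypothetical example: 500 reads of 800 bp from a 50,000 bp phage genome.
c = fold_coverage(num_reads=500, avg_read_length=800, genome_length=50_000)
print(f"{c:.1f}x coverage, ~{100 * expected_fraction_sequenced(c):.2f}% of bases expected")

# Recompute the table rows from the formula.
for fold in (0.25, 0.5, 0.75, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10):
    print(f"{fold:>5}x  {100 * expected_fraction_sequenced(fold):.3f}%")
```

Under this model most rows agree with the slide after rounding. The 2x row comes out near 86.5% rather than the 88% shown, so the slide's figures may have been rounded differently or taken from another source.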

  9. Tablet: Assembly Viewer. [Screenshot callouts: Current location, Sequence, Overlap, Contig info, Consensus, Sequence reads.]

  10. Our goal today
      • Assemble a phage genome
      • Assemble a phage genome with different levels of coverage
      • Compute basic statistics on each genome assembly
      • View the assemblies
      • Compare the best assembly to the finished genome
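The slides do not say which "basic statistics" are meant. As one possible reading, the sketch below computes contig count, total length, longest contig, and N50 from a list of contig lengths. The function name `assembly_stats` and the example lengths are hypothetical.

```python
from typing import Dict, Iterable


def assembly_stats(contig_lengths: Iterable[int]) -> Dict[str, int]:
    """Summarize an assembly from its contig lengths (in bases)."""
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)

    # N50: length of the contig at which the running sum of lengths,
    # taken from longest to shortest, first reaches half the total.
    n50 = 0
    running = 0
    for length in lengths:
        running += length
        if running * 2 >= total:
            n50 = length
            break

    return {
        "num_contigs": len(lengths),
        "total_length": total,
        "longest": lengths[0] if lengths else 0,
        "n50": n50,
    }


# Hypothetical contig lengths from one assembly run.
stats = assembly_stats([100, 200, 300, 400, 500])
print(stats)
```

Running this on the assemblies built at each coverage level gives a quick way to compare them: fewer, longer contigs and a higher N50 generally indicate a more contiguous assembly, although not necessarily a more accurate one.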
