
FuzzyPath Assemblies - from Bacterial to Mammalian Genomes and Zebrafish Finishing

Zemin Ning, The Wellcome Trust Sanger Institute


Presentation Transcript


  1. FuzzyPath Assemblies - from Bacterial to Mammalian Genomes and Zebrafish Finishing. Zemin Ning, The Wellcome Trust Sanger Institute

  2. Genome/Chromosome Assembly Strategy
     - Forward-reverse paired reads: 30-70 bp each, known distance ~500 bp
     - Solexa reads assembler: extends the paired reads into long reads of 1-2 Kb
     - Capillary reads assembler: Phrap/Phusion

  3. Kmer Extension & Repeat Junctions (diagram labels: A1, A2, B1, B2; A = A1 + A2; B = B1 + B2)
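The diagram on this slide did not survive extraction, so the following is only a generic illustration of k-mer extension stopping at a repeat junction, not FuzzyPath's actual code. It assumes a simple rule: extend a seed one base at a time while exactly one next k-mer exists in the read set, and stop when two or more exist (a junction) or none exists (a dead end). The function names, the `max_steps` limit and the toy reads are hypothetical, and reverse complements are ignored for brevity.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in the reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def extend_right(seed, kmers, k, max_steps=1000):
    """Extend `seed` to the right while the next base is unambiguous.

    Returns (contig, status), where status is one of
    "junction", "dead_end", "cycle" or "max_steps".
    """
    if k < 2 or len(seed) < k:
        raise ValueError("need k >= 2 and a seed of at least k bases")
    contig = seed
    seen = {seed[-k:]}  # k-mers already used, to avoid looping forever
    for _ in range(max_steps):
        suffix = contig[-(k - 1):]
        candidates = [b for b in "ACGT" if suffix + b in kmers]
        if len(candidates) != 1:
            # Two or more successors: a branch such as a repeat junction.
            return contig, ("junction" if candidates else "dead_end")
        next_kmer = suffix + candidates[0]
        if next_kmer in seen:
            return contig, "cycle"
        seen.add(next_kmer)
        contig += candidates[0]
    return contig, "max_steps"


# Toy reads that share a prefix and then diverge.
kmers = count_kmers(["ATGCGTAC", "ATGCGTTT"], k=4)
print(extend_right("ATGC", kmers, k=4))
```

In this toy case the extension stops after `ATGCGT`, because both `CGTA` and `CGTT` are present; resolving which branch to follow is where the information listed later in the talk (base quality, read pairs, fuzzy kmers, longer reads) would come in.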

  4. Handling of Single Base Variations (diagram labels: A, B1, B2; B1 = B2; S = A + B1)

  5. Fuzzy Kmers: Number of Mismatches between Two Kmers
       Kmer_1         ACGTAACTAACAGTT   00 01 10 11 00 00 01 11 00 00 01 00 10 11 11
       Kmer_2         ACGTAACTCACAGTT   00 01 10 11 00 00 01 11 01 00 01 00 10 11 11
       Kmer_1^Kmer_2  ACGTAACT ACAGTT   00 00 00 00 00 00 00 00 01 00 00 00 00 00 00
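The slide's bit patterns correspond to a 2-bit encoding (A=00, C=01, G=10, T=11) in which the XOR of two k-mers is non-zero only at mismatching bases. A minimal sketch of that idea follows; the function names and the `max_mismatches` threshold are assumptions made for illustration, since the slide does not say how many mismatches FuzzyPath tolerates.

```python
# 2-bit base codes matching the bit patterns shown on the slide.
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}


def encode(kmer):
    """Pack a k-mer into an integer, two bits per base, first base highest."""
    value = 0
    for base in kmer:
        value = (value << 2) | CODE[base]
    return value


def mismatches(kmer_a, kmer_b):
    """Count mismatching bases between two equal-length k-mers via XOR."""
    if len(kmer_a) != len(kmer_b):
        raise ValueError("k-mers must have the same length")
    diff = encode(kmer_a) ^ encode(kmer_b)
    count = 0
    while diff:
        if diff & 0b11:  # any set bit in this 2-bit slot means a mismatch
            count += 1
        diff >>= 2
    return count


def is_fuzzy_match(kmer_a, kmer_b, max_mismatches=1):
    """Hypothetical threshold test: treat near-identical k-mers as equal."""
    return mismatches(kmer_a, kmer_b) <= max_mismatches


print(mismatches("ACGTAACTAACAGTT", "ACGTAACTCACAGTT"))
```

For the two k-mers on the slide, the only non-zero 2-bit slot in the XOR is at the ninth base (A versus C), so the count is one.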

  6. Kmer Extension & Repeat Junctions
     Pileup of other reads (454, Sanger, etc.) at a repeat junction (diagram labels: A1, A2, Consensus)
     Means to handle repeats:
     - Base quality
     - Read pair
     - Fuzzy kmers
     - Closely related reference
     - 454 or Sanger reads

  7. Pileup of Solexa and 454 Reads

  8. S. suis P1/7 Solexa/454 Assembly
     Solexa reads:
       Number of reads:                  3,084,185
       Finished genome size:             2,007,491 bp
       Read length:                      39 and 36 bp
       Estimated read coverage:          ~55X
       Number of 454 reads:              100,000
       Read coverage of 454:             10X
     Assembly features (contig stats):
       Total number of contigs:          73
       Total bases of contigs:           1,999,817 bp
       N50 contig size:                  62,508
       Largest contig:                   162,190
       Averaged contig size:             27,394
       Contig coverage over the genome:  ~99%
       Contig extension errors:          2
       Mis-assembly errors:              3
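The contig statistics quoted on these slides (N50, averaged contig size) can be computed as sketched below. This assumes the usual definition of N50 (the contig length at which the running total of lengths, sorted longest first, reaches half of all assembled bases); the talk does not state which definition was used. The contig lengths in the example are made up, while the averaged-size check uses the S. suis totals from the slide.

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L hold at least half the bases."""
    if not contig_lengths:
        raise ValueError("no contigs given")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def averaged_contig_size(total_bases, n_contigs):
    """Total assembled bases divided by the number of contigs (rounded down)."""
    return total_bases // n_contigs


# Made-up contig lengths, for illustration only.
toy_lengths = [80, 70, 50, 40, 30, 20]
print(n50(toy_lengths))

# S. suis totals from the slide: 1,999,817 bp in 73 contigs.
print(averaged_contig_size(1_999_817, 73))
```

With the slide's totals the averaged size comes out at 27,394, matching the reported value.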

  9. Salmonella Senftenberg Solexa Assembly from Paired-End Reads
     Solexa reads:
       Number of reads:            6,000,000
       Finished genome size:       ~4.8 Mbp
       Read length:                2x37 bp
       Estimated read coverage:    ~92.5X
       Insert size:                170/50-300 bp
     Assembly features (contig stats):
                                   Solexa      454
       Total number of contigs:    75          390
       Total bases of contigs:     4.80 Mbp    4.77 Mbp
       N50 contig size:            139,353     25,702
       Largest contig:             395,600     62,040
       Averaged contig size:       63,969      12,224
       Contig coverage on genome:  ~99.8%      99.4%
       Contig extension errors:    0
       Mis-assembly errors:        0           4
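The estimated read coverage on these slides is consistent with the standard formula: sequenced bases divided by genome size. The sketch below assumes that "Number of reads" counts read pairs when the read length is given as 2x37 bp, since that interpretation reproduces the slide's ~92.5X; the function and parameter names are hypothetical.

```python
def estimated_coverage(n_pairs, read_length, genome_size, reads_per_pair=2):
    """Average depth: total sequenced bases divided by genome size."""
    return n_pairs * reads_per_pair * read_length / genome_size


# Salmonella figures from the slide: 6,000,000 pairs of 2x37 bp, ~4.8 Mbp genome.
print(estimated_coverage(6_000_000, 37, 4_800_000))
```

The same calculation with the E. coli 042 figures (7,055,348 pairs of 2x36 bp on a 5.35 Mbp genome) gives roughly 95X, in line with the value reported for that assembly.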

  10. Extremely GC Biased Genomes (figure; GC labels shown: 68.0%, 50.5%, 50.8%, 19.0%)

  11. Malaria 3D7 Assemblies
      Solexa reads:
        Read length:                2x36 bp         2x76 bp
        Number of reads:            14.0 million    9.77 million
        Finished genome size:       23 Mbp          23 Mbp
        Estimated read coverage:    43X             64X
        Insert size:                170 bp          170 bp
      Assembly features:
        Total number of contigs:    26,926          22,839
        Total bases of contigs:     19.2 Mbp        21.1 Mbp
        N50 contig size:            1,456           1,621
        Largest contig:             9,106           9,825
        Averaged contig size:       706             923
        Contig coverage on genome:  ~83.5%          91.7%
        Contig extension errors:    ?               ?
        Mis-assembly errors:        ?               ?

  12. E. coli strain 042 Assembly
      Solexa reads:
        Number of reads:                  7,055,348
        Finished genome size:             5.35 Mbp
        Read length:                      2x36 bp
        Estimated read coverage:          ~95X
        Insert size:                      170/50-300 bp
      Assembly features (contig stats):
        Total number of contigs:          168
        Total bases of contigs:           5.19 Mbp
        N50 contig size:                  85,886
        Largest contig:                   337,768
        Averaged contig size:             30,886
        Contig coverage over the genome:  ~99%
        Contig extension errors:          1
        Mis-assembly errors:              2

  13. Mouse Chromosome 17 Assembly
      Solexa reads:
        Number of reads:                  86.5 million
        Finished genome size:             95.2 Mbp
        Read length:                      2x36 bp
        Estimated read coverage:          ~65X
        Insert size:                      120/50-200 bp
      Assembly features (contig stats):
        Total number of contigs:          55,802
        Total bases of contigs:           75.8 Mbp
        N50 contig size:                  2,322
        Largest contig:                   17,859
        Averaged contig size:             1,358
        Contig coverage over the genome:  ~80%
        Contig extension errors:          ?
        Mis-assembly errors:              ?

  14. Pooled Clones: Zfish 9, Pig 3

  15. Mapping of Solexa Reads On the Reference

  16. Flow diagram (labels in transcript order): Insert; extended long reads of 1-2 Kb; ~300 bp; 30-70 bp; 30-70 bp; Solexa assembly; Genome/Chromosome Assembly; WGS Reads 5X; Fishing WGS Reads; FuzzyPath; Combined Reads; Phusion or Phrap; Phusion

  17. Acknowledgements: Yong Gu, James Bonfield, Helen Beasley, Siobhan Whitehead, Daniel Turner, Michael Quail, Tony Cox, Richard Durbin
