
Assembling the Glanville fritillary genome

CSC Conference 2.6.2010, Next generation sequencing data analysis: Assembling the Glanville fritillary genome. Panu Somervuo, University of Helsinki, MRG group & DNA sequencing and genomics lab.


Presentation Transcript


  1. CSC Conference 2.6.2010: Next generation sequencing data analysis. Assembling the Glanville fritillary genome. Panu Somervuo, University of Helsinki, MRG group & DNA sequencing and genomics lab

  2. Next generation sequencing • Roche 454 • Illumina Solexa • ABI SOLiD

  3. Assembly pipeline • 454: 10M single reads, 400 bp → Newbler: 320 Mbp, 220K contigs, N50 1700 nt • Illumina Solexa: 52M 2*101 paired-end (insert size 600 bp) and 102M 2*76 paired-end (insert size 600 bp) → error correction, SOAPdenovo scaffolds → 2M 2*75 mate pairs (span 1500 nt, at every 25 bp), used as additional Newbler input • SOLiD: 420M 2*50 mate pairs (insert size 1 kbp) → filtering: 96M → 27M uniquely mapping → SOLiD scaffolds: 40K • EST: 26K
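The 2M 2*75 mate pairs in the pipeline were derived from the SOAPdenovo scaffolds rather than sequenced. Below is a minimal Python sketch of that kind of sampling. It assumes a 1500 nt span, 75 nt reads, a 25 bp step and inward-facing (forward/reverse) pairs. The function names and the pair orientation are illustrative assumptions, not the pipeline that was actually used.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T/N, input case is ignored)."""
    return "".join(COMPLEMENT[b] for b in reversed(seq.upper()))


def scaffold_to_matepairs(scaffold: str, span: int = 1500,
                          read_len: int = 75, step: int = 25):
    """Yield (read1, read2) from windows of `span` bases started every `step` bases.

    read1 = first `read_len` bases of the window (forward strand);
    read2 = reverse complement of the last `read_len` bases, so the two
    reads face each other (an assumed orientation).
    """
    for start in range(0, len(scaffold) - span + 1, step):
        window = scaffold[start:start + span]
        yield window[:read_len], revcomp(window[-read_len:])


if __name__ == "__main__":
    toy_scaffold = "ACGTTGCA" * 200          # 1600 nt toy sequence
    pairs = list(scaffold_to_matepairs(toy_scaffold))
    print(len(pairs), "pairs")               # windows start at 0, 25, 50, 75, 100
```

With these defaults a scaffold of length L yields floor((L − 1500)/25) + 1 pairs. Scaffolds shorter than 1500 nt contribute none.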

  4. Assembly validation 1: contigs vs nr (columns: contig, number of BLASTX hits, top 5 hits)
     contig00008  216  Bombyx mori (domestic silkworm), Bombyx mori (domestic silkworm), Aedes aegypti (Stegomyia aegypti), Nasonia vitripennis (jewel wasp), Nasonia vitripennis (jewel wasp)
     contig00077    2  Acyrthosiphon pisum (pea aphid), Acyrthosiphon pisum (pea aphid)
     contig00084   63  Apis mellifera (honey bee), Forficula auricularia (European earwig), Forficula auricularia (European earwig), Forficula auricularia (European earwig), Forficula auricularia (European earwig)
     contig00094    2  Tribolium castaneum (red flour beetle), Apis mellifera (honey bee)
     contig00198  203  Tribolium castaneum (red flour beetle), Tribolium castaneum (red flour beetle), Nasonia vitripennis (jewel wasp), Pediculus humanus corporis (human body louse), Apis mellifera (honey bee)
     contig00208   68  Acyrthosiphon pisum (pea aphid), Acyrthosiphon pisum (pea aphid), Acyrthosiphon pisum (pea aphid), Tribolium castaneum (red flour beetle), Strongylocentrotus purpuratus
     contig00216  163  Pediculus humanus corporis (human body louse), Culex quinquefasciatus (southern house mosquito), Aedes aegypti (Stegomyia aegypti), Culex quinquefasciatus (southern house mosquito), Tribolium castaneum (red flour beetle)
     contig00229   39  Tribolium castaneum (red flour beetle), Culex quinquefasciatus (southern house mosquito), Pediculus humanus corporis (human body louse), Apis mellifera (honey bee), Drosophila pseudoobscura pseudoobscura
     contig00251   76  Acyrthosiphon pisum (pea aphid), Pediculus humanus corporis (human body louse), Nematostella vectensis (starlet sea anemone), Strongylocentrotus purpuratus, Strongylocentrotus purpuratus
     contig00278   90  Aedes aegypti (Stegomyia aegypti), Anopheles gambiae str. PEST, Nasonia vitripennis (jewel wasp), Drosophila willistoni, Drosophila virilis
     contig00279   43  Bombyx mori (domestic silkworm), Culex quinquefasciatus (southern house mosquito), Culex quinquefasciatus (southern house mosquito), Anopheles gambiae str. PEST, Tribolium castaneum (red flour beetle)
     contig00302  250  Acyrthosiphon pisum (pea aphid), Salmo salar (Atlantic salmon), Branchiostoma floridae (Florida lancelet), Ciona intestinalis, Ciona intestinalis
     contig00310   26  Tribolium castaneum (red flour beetle), Acyrthosiphon pisum (pea aphid), Nasonia vitripennis (jewel wasp), Aedes aegypti (Stegomyia aegypti), Aedes aegypti (Stegomyia aegypti)
     contig00321  218  Acyrthosiphon pisum (pea aphid), Aedes aegypti (Stegomyia aegypti), Aedes aegypti (Stegomyia aegypti), Tribolium castaneum (red flour beetle), Culex quinquefasciatus (southern house mosquito)
     contig00471   91  Drosophila virilis, Drosophila mojavensis, Drosophila ananassae, Drosophila yakuba, Drosophila grimshawi
     contig00507    3  Ostrinia nubilalis (European corn borer), Ostrinia nubilalis (European corn borer), Ostrinia nubilalis (European corn borer)
     contig00525  250  Bombyx mori (domestic silkworm), Nasonia vitripennis (jewel wasp), Aedes aegypti (Stegomyia aegypti), Apis mellifera (honey bee), Apis mellifera (honey bee)
     contig00533    8  Ostrinia nubilalis (European corn borer), Ostrinia nubilalis (European corn borer), Ostrinia nubilalis (European corn borer), Bombyx mori (domestic silkworm), Strongylocentrotus purpuratus
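A table like the one above can be produced by grouping ranked BLASTX hits per contig. The sketch below assumes a hypothetical two-column, tab-separated input (contig ID, species of the hit) that is already sorted best hit first within each contig. Such input could, for example, be extracted from BLAST+ tabular output that includes a taxonomy column. The input format and function names are assumptions.

```python
def parse_hits(lines):
    """Yield (contig_id, species) from tab-separated lines; blank lines are skipped."""
    for line in lines:
        line = line.rstrip("\n")
        if line.strip():
            contig, species = line.split("\t")[:2]
            yield contig, species


def summarize_hits(rows, top_n=5):
    """Return {contig: (hit_count, [species of the first top_n hits])}.

    Rows must already be ranked best-first within each contig; dicts keep
    insertion order, so contigs appear in the order they are first seen.
    """
    counts, tops = {}, {}
    for contig, species in rows:
        counts[contig] = counts.get(contig, 0) + 1
        top = tops.setdefault(contig, [])
        if len(top) < top_n:
            top.append(species)
    return {c: (counts[c], tops[c]) for c in counts}


if __name__ == "__main__":
    example = [
        "contig00507\tOstrinia nubilalis\n",
        "contig00507\tOstrinia nubilalis\n",
        "contig00507\tOstrinia nubilalis\n",
        "contig00077\tAcyrthosiphon pisum\n",
        "contig00077\tAcyrthosiphon pisum\n",
    ]
    for contig, (n, top) in summarize_hits(parse_hits(example)).items():
        print(contig, n, ", ".join(top))
```

Contigs whose top hits come from distant taxa (for example the fish and tunicate hits for contig00302) stand out when the table is read this way.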

  5. Assembly validation 2: Genomic contigs vs EST contigs [figure not preserved; values shown: 52, 13]

  6. Alignment of rev_contig310 and contig402106

     rev_contig310   1 --TTCAGAGAAACAAGTGAATTGAAATTTGATTATTTAtTTTCGTTTCAG 48
                         |||||||||||||||.|||||||||||||||||||||||||||||.||
     contig402106    1 TTTTCAGAGAAACAAGTAAATTGAAATTTGATTATTTATTTtCGTTTTAG 50

     rev_contig310  49 TATGAAGCAGCAGCGAGAGGTGCAGAAGCACTTGGAAACAGATATGGTAC 98
                       |||||||||||.||||||||||||||||||||||||||.|||||||||||
     contig402106   51 TATGAAGCAGCCGCGAGAGGTGCAGAAGCACTTGGAAAAAGATATGGTAC 100

     rev_contig310  99 AAAtTATAGAGTAGGAGtTGCCGCAGATATTCtTTGTAAGtTGTTTTTTT 148
                       ||||||||||||||||||||||||||||||||||||||||||||||||||
     contig402106  101 AAATTATAGAGTAGGAGTTGCCGCAGATATTCTTTGTAAGTTGTTTTTTT 150

     rev_contig310 149 AATCAGTTTAGCtTGCAGCtTTAAGACTATTATTATATATTTTTTTATCG 198
                       ||||.|||||.||||||||||||||||||||||||||| |||||||||||
     contig402106  151 AATCGGTTTATCTTGCAGCTTTAAGACTATTATTATAT-TTTTTTtATCG 199

     rev_contig310 199 TTGTACAGTAAGAAGCTACATAAtTTTTcCTACCGcCTA--TT-----gg 241
                       |||||||||||||||||||||||||||||||||||||||  ||     .|
     contig402106  200 TTGTACAGTAAGAAGCTACATAATTTTTCCTACCGCCTATTTTGGGGGAG 249

     rev_contig310 242 GGGGGGGGATTGTTGAATCAGTTAAGAATTAAAAGATGATGCTAtTTCAG 291
                       ||||||||||||||.|||||||.||||||| |||||||||||||||||||
     contig402106  250 GGGGGGGgATTGTTAAATCAGTCAAGAATT-AAAGATGATGCTATTTCAG 298

     rev_contig310 292 aATACtTaAACttTTTTTAAGAC--GAC---------T-A-TAA-GTTTA 327
                       ||.||||.|||||||||||||||  |||         | | ||. |||||
     contig402106  299 AAAACTTCAACTTTTTTtAAGACTAGACTATTTTTAATAATTAGTGTTTA 348

     rev_contig310 328 AATAACACTAATTATTaAAAACTTGGTCTATCTTGGTCTTGGtTTTAGGt 377
                       |||||||||||||||||||||||||.||||||||.||||||||.|.||||
     contig402106  349 AATAACACTAATTATTAAAAACTTGATCTATCTTCGTCTTGGTCTAAGGT 398

     rev_contig310 378 TTTTCCTCTAGTTAATATTACTGTTACAACTACATAAAAACAATAAAATA 427
                       ||.|||||||||||||.|||||||||||||||||||||||||||||..||
     contig402106  399 TTGTCCTCTAGTTAATCTTACTGTTACAACTACATAAAAACAaTAAGGTA 448

     rev_contig310 428 CTGTATCTTTGCAGATCCTATGAGCGGAACCACTTTTGACTGGGCGAAGA 477
                       |||||||||||.||||||||||||||||||||||||||||||||||||||
     contig402106  449 CTGTATCTTTGTAGATCCTATGAGCGGAACCACTTTtGACTGGGCGAAGA 498

     rev_contig310 478 ATACAACAAATGTCCCATTTTCTTACCTGATTGAATTAAGAGACTTGGGG 527
                       ||.|||||||||||||||||||||||||||||||||||||||||||||||
     contig402106  499 ATGCAACAAATGTCCcATTTtCTTACCTGATTGAATTAAGAGACTtGGGg 548

     rev_contig310 528 CAATACGGTTTCTTGTTACCAGCAGAACAGATTATTCCAACTAATTTAGA 577
                       |||||||||||||||||||||||||||||||||||.||||||||||||||
     contig402106  549 CAaTACGGTTtCTTGTTACcAGCAGAACAGATTATACCAACTAATTtAGA 598

     rev_contig310 578 AATAATGGATGCACTCCTGGAGATGGATAATACCGCAAGAACACTAgGG 626
                       ||||||||||||||||||||||||||||||.|||||||||||||||||.
     contig402106  599 AATAaTGGATGCACTCcTGGAGATGGATAACACCGCAAGAACACTAGGA 647
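In the alignment above, "|" marks identical bases, with upper- and lower-case letters counting as identical (e.g. t over T). A "." marks a mismatch and a blank marks a gap. The sketch below rebuilds such a midline and computes a simple identity value from two gapped rows. It defines identity as identical columns over all columns, gaps included. That is one common choice among several. The function names are hypothetical.

```python
def match_line(top: str, bottom: str) -> str:
    """Midline for two gapped rows of equal length.

    '|' identical bases (case-insensitive), '.' mismatch, ' ' gap in either row.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    symbols = []
    for a, b in zip(top.upper(), bottom.upper()):
        if a == "-" or b == "-":
            symbols.append(" ")
        elif a == b:
            symbols.append("|")
        else:
            symbols.append(".")
    return "".join(symbols)


def percent_identity(top: str, bottom: str) -> float:
    """Identical columns divided by all alignment columns (gap columns included)."""
    mid = match_line(top, bottom)
    return 100.0 * mid.count("|") / len(mid)


if __name__ == "__main__":
    top = "TTGTACAGTAAGAAGCTACATAAtTTTTcCTACCGcCTA--TT-----gg"
    bottom = "TTGTACAGTAAGAAGCTACATAATTTTTCCTACCGCCTATTTTGGGGGAG"
    print(top)
    print(match_line(top, bottom))
    print(bottom)
    print(percent_identity(top, bottom))   # 84.0
```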

  7. What now? Still more sequencing needed... • target enrichment: 55K 120 nt probes • 5’ SAGE • longer mate pairs → longer contigs & scaffolds → annotation

  8. Challenges • no elegant solution for combining SOLiD colorspace reads with reads from other platforms in de novo assembly • read quality: filtering vs error correction • difficulties generating long mate pairs • how to finish the assembly project: validation. Goal: to get contigs/scaffolds useful for gene prediction
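Part of the difficulty with SOLiD data is that each read is reported as a first base plus a string of colors. Each color encodes a pair of adjacent bases. With the 2-bit codes A=0, C=1, G=2, T=3, the color of a dinucleotide equals the XOR of the two codes. The sketch below converts between base space and colorspace for ACGT-only reads with no missing calls. The decoding step shows why a single color error corrupts every base after it, which is one reason naive conversion before mixing with base-space reads is risky. The function names are hypothetical.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def to_colorspace(seq: str) -> str:
    """Encode a non-empty ACGT read as its first base followed by di-base colors 0-3."""
    seq = seq.upper()
    colors = "".join(str(CODE[a] ^ CODE[b]) for a, b in zip(seq, seq[1:]))
    return seq[0] + colors


def from_colorspace(cs: str) -> str:
    """Decode by walking the colors from the known first base."""
    base = cs[0].upper()
    out = [base]
    for color in cs[1:]:
        base = BASES[CODE[base] ^ int(color)]
        out.append(base)
    return "".join(out)


if __name__ == "__main__":
    print(to_colorspace("ACGT"))      # A131
    print(from_colorspace("A131"))    # ACGT
    # One wrong color (3 -> 2) changes every downstream base:
    print(from_colorspace("A121"))    # ACTG
```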

  9. What is the best assembler? • SOAPdenovo, Velvet, Newbler, CLC bio, Celera • compared by number of contigs, contig lengths, accuracy
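Slide 9 compares assemblers by number of contigs, contig lengths and accuracy, and slide 3 reports an N50. The sketch below computes the length-based metrics from a list of contig lengths. Accuracy cannot be derived from lengths alone and needs external validation such as the BLASTX and EST comparisons in slides 4 to 6. The function name is hypothetical.

```python
def assembly_stats(lengths):
    """Contig count, total length, longest contig and N50 for a list of contig lengths.

    N50 is the length L such that contigs of length >= L together cover at
    least half of the total assembled length.
    """
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    n50, running = 0, 0
    for length in ordered:
        running += length
        if 2 * running >= total:
            n50 = length
            break
    return {
        "contigs": len(ordered),
        "total": total,
        "longest": ordered[0] if ordered else 0,
        "n50": n50,
    }


if __name__ == "__main__":
    print(assembly_stats([100, 200, 300, 400, 500]))
    # {'contigs': 5, 'total': 1500, 'longest': 500, 'n50': 400}
```

Here 500 + 400 = 900 nt is the first running sum that reaches half of the 1500 nt total, so N50 = 400.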

  10. Assembling Solexa data • 52M 2*101 paired-end (insert size 600 bp) • 102M 2*76 paired-end (insert size 600 bp) • error correction (SOAPdenovo). [Plots: sum of contig lengths vs contig size; number of contigs vs contig size]
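Only the axis labels of the two plots survive. Suppose each curve gives, for a minimum contig size, how many contigs reach that size and their summed length. That is a common way to compare assemblies, but it is an assumption here. Under that reading, the plotted values could be tabulated as follows. The function name is hypothetical.

```python
def size_curves(lengths, min_sizes):
    """For each minimum size, return (size, number of contigs >= size, their summed length)."""
    rows = []
    for size in min_sizes:
        kept = [n for n in lengths if n >= size]
        rows.append((size, len(kept), sum(kept)))
    return rows


if __name__ == "__main__":
    for size, count, total in size_curves([100, 200, 300, 400, 500], [0, 250, 500]):
        print(size, count, total)
```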

  11. Assembling 454 data, 10M single reads 400 bp • Newbler: all 454 data + 2M 1500 nt mate pairs from SOAPdenovo scaffolds • CLC bio: all 454 data + all Solexa data. [Plots: number of contigs vs contig size; sum of contig lengths vs contig size]

  12. De novo assembler history: Part I • read errors • repetitive elements
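Both problems show up in a k-mer spectrum. k-mers created by sequencing errors tend to occur only once or a few times, while k-mers from repetitive elements occur far more often than the average coverage would predict. The sketch below counts k-mers and builds the multiplicity histogram. It counts the forward strand only, a simplification; real tools usually merge a k-mer with its reverse complement. The function names are hypothetical.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer on the forward strand of each read (no reverse complements)."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def spectrum(counts):
    """Histogram: multiplicity -> number of distinct k-mers seen that many times."""
    return Counter(counts.values())


if __name__ == "__main__":
    reads = ["ACGTACGT", "ACGTACGT", "ACGAACGT"]   # third read: toy substitution
    counts = kmer_counts(reads, 4)
    print(sorted(spectrum(counts).items()))      # [(1, 4), (2, 3), (5, 1)]
```

In this toy data, the four k-mers seen only once all overlap the substituted base in the third read.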

  13. De novo assembler history: Part II • de Bruijn graph
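In a de Bruijn graph, reads are cut into k-mers. Each (k−1)-mer becomes a node and each k-mer an edge from its prefix to its suffix. Contigs are read off paths without branches, and repeats appear as nodes where paths merge or split. The toy sketch below merges duplicate edges and ignores reverse complements and error removal. It is not the implementation of any assembler named on slide 9, and the function names are hypothetical.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Nodes are (k-1)-mers; every k-mer adds an edge prefix -> suffix (duplicates merged)."""
    graph = defaultdict(set)
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unambiguous(graph, start):
    """Extend `start` while the node has one successor and that successor one predecessor."""
    indegree = defaultdict(int)
    for successors in graph.values():
        for node in successors:
            indegree[node] += 1
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if indegree[nxt] != 1 or nxt in seen:   # merge point (e.g. a repeat) or a cycle
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


if __name__ == "__main__":
    graph = de_bruijn(["ACGTTG", "CGTTGCA"], k=4)
    print(walk_unambiguous(graph, "ACG"))   # ACGTTGCA
```

The two overlapping reads are joined into one contig because every node on the path has a single way in and a single way out. A shared (k−1)-mer reached from two different nodes would stop the walk, which is how repeats fragment de Bruijn assemblies.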
