Pamela Ferretti Laboratory of Computational Metagenomics Centre for Integrative Biology

Microbial Genome Assembly. Pamela Ferretti, Laboratory of Computational Metagenomics, Centre for Integrative Biology, University of Trento, Italy. Outline-summary: 1. QUICK INTRODUCTION. 2. GENOME ASSEMBLY. 3. ASSEMBLY STRATEGIES. 4. CASE STUDY. DNA packaging.

Presentation Transcript


  1. Microbial Genome Assembly • Pamela Ferretti • Laboratory of Computational Metagenomics, Centre for Integrative Biology, University of Trento, Italy

  2. Outline-summary 1. QUICK INTRODUCTION 2. GENOME ASSEMBLY 3. ASSEMBLY STRATEGIES 4. CASE STUDY

  3. DNA packaging

  4. DNA packaging

  5. Outline-summary 1. QUICK INTRODUCTION 2. GENOME ASSEMBLY 3. ASSEMBLY STRATEGIES 4. CASE STUDY

  6. Next Generation Sequencing ACGTAGGCTAGCGTTAGCGA ........ CTGCAT C TCTTATTGTGACC TAGGCTAGCTTAG GCAATGCAGTAAC TCCAGCTAGGTTC

  7. Genome Assembly • OVERLAPPING SEQUENCE ALIGNMENT • GENOME SEQUENCING → PRELIMINARY ANALYSIS → ASSEMBLY → ADVANCED BIOINFORMATIC ANALYSIS

  8. On the feasibility of sequence assembly • Sequencing the human genome with shotgun sequencing + assembly is the only feasible strategy (Weber, James L., and Eugene W. Myers. "Human whole-genome shotgun sequencing." Genome Research 7.5 (1997): 401-409) • Computational assembly of shotgun sequencing data is simply unfeasible, and a bad idea anyway (Green, Philip. "Against a whole-genome shotgun." Genome Research 7.5 (1997): 410-417) • They were both right! (…well, Weber and Myers were a bit more right from the practical viewpoint…)

  9. Outline-summary 1. QUICK INTRODUCTION 2. GENOME ASSEMBLY 3. ASSEMBLY STRATEGIES 4. CASE STUDY

  10. Genome assembly strategies • Greedy approach → SSAKE • De Bruijn graph (DBG) → Velvet, SOAPdenovo • Overlap Layout Consensus (OLC) → MIRA • Mixed approaches → MaSuRCA
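
The greedy approach listed first can be illustrated with a toy example. The sketch below is not SSAKE's actual algorithm: it assumes error-free reads on a single strand, and the function names and the `min_len` threshold are hypothetical choices made for illustration. It repeatedly merges the two sequences with the longest suffix/prefix overlap until no overlap remains.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b` (0 if below min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge the best-overlapping pair again and again; return the remaining contigs."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep input order
    while True:
        best = (0, -1, -1)  # (overlap length, index of left read, index of right read)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:  # nothing left to merge
            return contigs
        merged = contigs[i] + contigs[j][n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]


print(greedy_assemble(["ACGTAGG", "TAGGCTA", "GCTAGCG"]))
```

Because each merge is chosen locally, a greedy assembler can join reads wrongly across repeats, which is one reason the graph-based strategies on the following slides exist.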

  11. Genome assembly strategies • DE BRUIJN GRAPH APPROACH (DBG) • Nodes = overlapping subsequences of reads, all of the same length ((k-1)-mers) • Edges = k-mers (distinct subsequences of length k within reads) • Velvet, SOAPdenovo2 • EULERIAN PATH (visits every edge exactly once)
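
A minimal sketch of the de Bruijn idea, assuming error-free reads, a single strand, and a graph that actually contains an Eulerian path. It is not how Velvet or SOAPdenovo2 are implemented; the function names are hypothetical. Nodes are (k-1)-mers, each distinct k-mer contributes one edge, and Hierholzer's algorithm walks every edge once.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer node to its successor nodes, one edge per distinct k-mer."""
    kmers = {r[i:i + k] for r in reads for i in range(len(r) - k + 1)}
    graph = defaultdict(list)
    for kmer in sorted(kmers):
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm; assumes an Eulerian path exists in the graph."""
    out_deg = {node: len(succs) for node, succs in graph.items()}
    in_deg = defaultdict(int)
    for succs in graph.values():
        for s in succs:
            in_deg[s] += 1
    nodes = set(out_deg) | set(in_deg)
    # Start at the node with one more outgoing than incoming edge, if there is one.
    start = next(
        (n for n in sorted(nodes) if out_deg.get(n, 0) - in_deg.get(n, 0) == 1),
        min(nodes),
    )
    remaining = {node: list(succs) for node, succs in graph.items()}
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if remaining.get(node):
            stack.append(remaining[node].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: node is final in the path
    return path[::-1]


def spell(path):
    """Turn a node path back into a sequence."""
    return path[0] + "".join(node[-1] for node in path[1:])


reads = ["ACGTAGGC", "TAGGCTAG", "GCTAGCG"]
print(spell(eulerian_path(de_bruijn_graph(reads, k=4))))
```

An Eulerian path can be found in time linear in the number of edges, which is what makes this formulation attractive for large numbers of short reads.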

  12. Genome assembly strategies • OVERLAP LAYOUT CONSENSUS (OLC) • Nodes = reads • Edges = overlaps between reads • MIRA • OVERLAP → LAYOUT → CONSENSUS • HAMILTONIAN PATH (visits every node exactly once)
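
The three OLC steps can be sketched on a toy input. This is an illustration under strong assumptions (error-free reads, single strand, a handful of reads), not MIRA's implementation; all names are hypothetical. The layout step searches for a Hamiltonian path by brute force over all read orderings, which only works for tiny inputs, and the "consensus" step simply concatenates non-overlapping parts, whereas real assemblers build a multiple alignment and vote per column.

```python
from itertools import permutations


def suffix_prefix_overlap(a, b, min_len):
    """Longest suffix of `a` that is a prefix of `b`, or 0 if shorter than min_len."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def overlap_graph(reads, min_len=3):
    """OVERLAP: edges (i, j) -> overlap length between read i and read j."""
    edges = {}
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i != j:
                n = suffix_prefix_overlap(a, b, min_len)
                if n:
                    edges[(i, j)] = n
    return edges


def layout(reads, edges):
    """LAYOUT: best-scoring Hamiltonian path, found by exhaustive search."""
    best, best_score = None, -1
    for order in permutations(range(len(reads))):
        steps = list(zip(order, order[1:]))
        if all(step in edges for step in steps):
            score = sum(edges[step] for step in steps)
            if score > best_score:
                best, best_score = order, score
    return best


def consensus(reads, order, edges):
    """CONSENSUS (simplified): append the non-overlapping tail of each read."""
    seq = reads[order[0]]
    for a, b in zip(order, order[1:]):
        seq += reads[b][edges[(a, b)]:]
    return seq


reads = ["GCTAGCG", "ACGTAGG", "TAGGCTA"]
edges = overlap_graph(reads)
order = layout(reads, edges)
print(order, consensus(reads, order, edges))
```

The exhaustive search grows factorially with the number of reads, which reflects why the Hamiltonian-path formulation is computationally hard and why practical OLC assemblers rely on heuristics.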

  13. Genome assembly strategies

  14. Genome assembly strategies

  15. Genome assembly strategies • Greedy approach → SSAKE • De Bruijn graph (DBG) → Velvet, SOAPdenovo • Overlap Layout Consensus (OLC) → MIRA • Mixed approaches → MaSuRCA

  16. Genome Assemblers • Metrics compared: average coverage, number of contigs, number of contigs > 1 kb, N50 contig size, fraction of reads assembled, total consensus (in nt), number of scaffolds, N50 scaffold size • Ion Torrent PGM → MIRA 3.9 • Illumina → MaSuRCA • MIRA 3.9 also produced good-quality results, but it has a longer execution time and becomes unstable with large amounts of short reads
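
Several of the metrics above can be computed directly from a list of contig lengths. The sketch below uses the common definition of N50 (the length of the contig at which the running total, taken from longest to shortest, first reaches half of the assembly size); the function names and the example lengths are made up for illustration and are not results from this study.

```python
def n50(lengths):
    """Length L such that contigs of length >= L cover at least half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def assembly_stats(contig_lengths):
    """Summarise an assembly from its contig lengths (in nt)."""
    return {
        "contigs": len(contig_lengths),
        "contigs_over_1kb": sum(1 for length in contig_lengths if length > 1000),
        "total_nt": sum(contig_lengths),
        "n50": n50(contig_lengths),
    }


# Hypothetical contig lengths, for illustration only.
print(assembly_stats([5000, 3000, 1500, 800, 200]))
```

The same `n50` function applies to scaffold lengths to obtain the N50 scaffold size.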

  17. Outline-summary 1. QUICK INTRODUCTION 2. GENOME ASSEMBLY 3. ASSEMBLY STRATEGIES 4. CASE STUDY

  18. Mycobacteria Assembly: Case Study • Responsible for many animal and human diseases • M. tuberculosis and M. leprae (TM) • M. fortuitum (NTM) outbreak (nail salon, 2002) • M. chelonae (NTM) outbreak (face lifts, 2004) • Illumina HiSeq sequencing (NGS Facility – CIBIO/UNITN) • Twenty mycobacterial strains • From 20 different Mycobacteria species • → MaSuRCA • Novel mycobacteria detection clinical tests

  19. Raw data quality assessment and pre-processing • Fastq-mcf tool • poor-quality ends of reads • Ns, duplicates and sequencing adapters • reads that are too short • Reduction up to 73%
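
The kinds of filtering listed above can be sketched in a few lines. This is not fastq-mcf's algorithm or interface: the function names, the quality threshold `min_q`, the minimum length `min_len`, and the `max_n` limit are all hypothetical, reads are given as (sequence, per-base quality scores) pairs rather than parsed from FASTQ, and adapter removal is left out for brevity.

```python
def trim_low_quality_ends(seq, quals, min_q=20):
    """Clip bases from both ends while their quality score is below min_q."""
    start, end = 0, len(seq)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end], quals[start:end]


def preprocess(reads, min_q=20, min_len=5, max_n=0):
    """Trim ends, then drop reads that are too short, contain Ns, or are duplicates."""
    seen = set()
    kept = []
    for seq, quals in reads:
        seq, quals = trim_low_quality_ends(seq, quals, min_q)
        if len(seq) < min_len or seq.count("N") > max_n or seq in seen:
            continue
        seen.add(seq)
        kept.append((seq, quals))
    return kept


raw = [
    ("ACGTACGTAC", [2, 30, 30, 30, 30, 30, 30, 30, 30, 2]),  # poor-quality ends
    ("ACNTACGT", [30] * 8),                                   # contains an N
    ("CGTACGTA", [30] * 8),                                   # duplicate after trimming
    ("ACG", [30] * 3),                                        # too short
]
print(preprocess(raw))
```

Comparing `len(raw)` with the number of reads kept gives the kind of reduction figure quoted on the slide.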

  20. Assembly parameters setting • K-mers: strings of a particular length k, which are shorter than entire reads • Best empirical k-mer length: 91 bases • High coverage
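
To make the definition concrete, the sketch below extracts and counts k-mers. A read of length L contains L - k + 1 k-mers, so the choice of k trades specificity against the number of k-mers each read contributes. The 100-base read length used in the example is an assumption for illustration; the slides do not state the read length.

```python
from collections import Counter


def kmers(read, k):
    """All substrings of length k in a read, in order (empty if the read is shorter than k)."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def kmer_counts(reads, k):
    """How often each k-mer occurs across all reads; high counts reflect high coverage."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


print(kmers("ACGTAG", 4))             # three 4-mers from a 6-base read
print(len(kmers("A" * 100, 91)))      # a hypothetical 100-base read yields 10 k-mers at k = 91
print(kmer_counts(["ACGTAG", "CGTAGG"], 4).most_common(2))
```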

  21. MaSuRCA results for Mycobacteria • Genome size too high • Abnormal GC content

  22. GC content-based quality analysis • Examples of environmental contamination • Staphylococcus epidermidis
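
A GC-based check of this kind can be sketched as follows. Mycobacterial genomes are GC-rich while Staphylococcus epidermidis is GC-poor, so contigs with unexpectedly low GC stand out. The expected GC fraction, the tolerance, the function names, and the toy contigs below are illustrative assumptions, not values or data from the study.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of a sequence."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def flag_contigs(contigs, expected_gc=0.65, tolerance=0.10):
    """Names of contigs whose GC fraction deviates from expected_gc by more than tolerance."""
    return [
        name
        for name, seq in contigs.items()
        if abs(gc_content(seq) - expected_gc) > tolerance
    ]


# Toy contigs, for illustration only.
contigs = {
    "contig_1": "GGCCGGCATA",  # GC = 0.7, close to the assumed expectation
    "contig_2": "ATATTAGCAT",  # GC = 0.2, flagged as a possible contaminant
}
print(flag_contigs(contigs))
```

Flagged contigs are only candidates: confirming the source of contamination would need a follow-up step such as a sequence similarity search.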

  23. Thanks http://gcat.davidson.edu/phast/#methods
