Microbial Genome Assembly and Finishing
Alla Lapidus, Ph.D.
Microbial genomics, DOE Joint Genome Institute, Walnut Creek, CA

A typical Microbial project
Sequencing -> Auto-assembly -> Gap closure (FINISHING) -> Annotation -> Public release

Evolution of Microbial Drafts
Sanger only
- 4x of 3kb plasmids + 4x of 8kb plasmids + 1x of fosmids
- ~$50k for 5 Mb genome draft
454 + Solexa (current)
- 20x coverage 454 standard + 4x coverage 454 paired end (PE) + 50x coverage Illumina shotgun (quality improvement; gaps)
- ~$10k per 5 Mb genome
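
The coverage figures above translate into read counts through the usual relation coverage = (number of reads x read length) / genome size. The sketch below is only an illustration of that arithmetic: the function name `reads_needed` and the mean read lengths in `plan` are assumptions, not values from the slides.

```python
import math


def reads_needed(genome_size_bp, coverage, mean_read_length_bp):
    """Smallest number of reads N such that N * L / G >= coverage."""
    return math.ceil(coverage * genome_size_bp / mean_read_length_bp)


GENOME_SIZE = 5_000_000  # 5 Mb genome, as in the cost estimates above

# library -> (target coverage from the slide, ASSUMED mean read length in bp)
plan = {
    "454 standard": (20, 250),
    "454 paired end": (4, 100),
    "Illumina shotgun": (50, 36),
}

for library, (coverage, read_length) in plan.items():
    print(library, reads_needed(GENOME_SIZE, coverage, read_length))
```

Changing the assumed read lengths changes the read counts proportionally; the coverage targets themselves stay as stated on the slide.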

Assembly (assembler)
- Sanger reads only (phrap, PGA, Arch, etc.)
- PE reads

Use of Illumina data
- Automatically suggest corrections for manual curation
- Automatically suggest and implement corrections
[Figure: Illumina base calls at one position (x1 – G, x2 – T, x3 – A; A)]
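
One way to picture "automatically suggest corrections" is a per-position vote over the Illumina bases aligned to a consensus position. The sketch below is a simplified illustration, not the tool used at JGI: the function name `suggest_correction` and the thresholds `min_depth` and `min_fraction` are hypothetical.

```python
from collections import Counter


def suggest_correction(consensus_base, illumina_bases, min_depth=3, min_fraction=0.8):
    """Return a suggested replacement base, or None if no change is suggested.

    consensus_base: base currently called in the draft consensus.
    illumina_bases: bases from Illumina reads aligned to that position.
    min_depth, min_fraction: hypothetical thresholds for a confident call.
    """
    counts = Counter(b.upper() for b in illumina_bases if b.upper() in "ACGT")
    depth = sum(counts.values())
    if depth < min_depth:
        return None  # too few reads to say anything
    base, support = counts.most_common(1)[0]
    if base != consensus_base.upper() and support / depth >= min_fraction:
        return base
    return None


# Nine of ten reads disagree with the consensus "G", so "A" is suggested.
print(suggest_correction("G", "AAAAAAAAAT"))
```

A suggestion returned by such a function could either be queued for manual curation or applied directly, which mirrors the two modes listed above.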

Errors corrected with Solexa (Polisher)
Frame shift detected in this area (454 contig)
CCTCTTTGATGGAAATGATA**TCTTCGAGCATCGCCTC**GGGTTTTCCATACAGAGAACCTTTGATGATGAACCGGTTGAAGATCTGCGGGTCAAA <- Solexa Consensus
[Figure labels: Sanger reads; PCR - sequence]

Draft assembly - what we get
Assembly: set of contigs
New technologies: no clones to walk off even if you can scaffold contigs

What are gaps (Sanger)?
- Genome areas not covered by random shotgun
~ AT-rich DNA clones poorly in bacteria (cloning bias; promoter-like structures) => uncaptured gaps
~ GC-rich DNA is difficult to PCR and to sequence and often requires the use of special chemistry => captured gaps
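
Since both kinds of gap are tied to base composition, a quick GC scan of the flanking contig sequence can hint at which problem to expect. This is a minimal sketch under stated assumptions: the window size, step, and the 30% / 70% cut-offs are illustrative choices, not values from the slides.

```python
def gc_fraction(seq):
    """Fraction of G and C among the unambiguous bases of seq."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def flag_extreme_windows(seq, window=100, step=50, low=0.30, high=0.70):
    """List (start, label, gc_fraction) for windows with extreme composition."""
    flagged = []
    for start in range(0, max(len(seq) - window, 0) + 1, step):
        frac = gc_fraction(seq[start:start + window])
        if frac <= low:
            flagged.append((start, "AT-rich", frac))
        elif frac >= high:
            flagged.append((start, "GC-rich", frac))
    return flagged


example = "AT" * 50 + "GC" * 50
print(flag_extreme_windows(example, window=100, step=100))
```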

Thermotoga lettingae TMO (JGI ID 4002278)
Draft assembly
- 55 total contigs; 41 contigs >2kb
- 38% GC - biased Sanger libraries
Draft assembly + 454
- 2 total contigs; 1 contig >2kb
- 454 – no cloning
<166bp> - average length of gaps
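
The numbers on this slide (total contigs, contigs >2kb, average gap length) are simple summaries of an assembly. A minimal sketch of how they could be computed from lists of contig and gap lengths follows; the function names and the example lengths are made up for illustration.

```python
def summarize_contigs(contig_lengths, min_length=2000):
    """Count all contigs and those longer than min_length (2 kb by default)."""
    return {
        "total_contigs": len(contig_lengths),
        "large_contigs": sum(1 for length in contig_lengths if length > min_length),
    }


def mean_gap_length(gap_lengths):
    """Average gap length in bp; 0.0 when there are no gaps."""
    if not gap_lengths:
        return 0.0
    return sum(gap_lengths) / len(gap_lengths)


# Made-up lengths, only to show the calls.
print(summarize_contigs([500, 1500, 2500, 10000]))
print(mean_gap_length([100, 200, 198]))
```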

High GC stops (Sanger and Hybrid)
The presence of small hairpins (inverted repeat sequences) in the DNA that re-anneal either during sequencing or electrophoresis results in failed sequencing reactions or unreadable electrophoresis results. (This can be alleviated by adding modifiers to the reaction, sequencing smaller clones and running gels at higher temperatures in the presence of stronger denaturants.)
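
A hairpin of this kind is a short stem followed, after a few loop bases, by the reverse complement of that stem. The sketch below scans a sequence for exact inverted repeats; the stem length and loop range are assumed values chosen for illustration, and real hairpins may tolerate mismatches that this simple search ignores.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_hairpins(seq, stem=6, min_loop=3, max_loop=8):
    """Return (start, stem_length, loop_length) for exact inverted repeats."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * stem - min_loop + 1):
        target = reverse_complement(seq[i:i + stem])
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop  # start of the candidate second stem arm
            if j + stem > len(seq):
                break
            if seq[j:j + stem] == target:
                hits.append((i, stem, loop))
    return hits


# GCGCGG ... CCGCGC can pair with each other around a 4-base loop.
print(find_hairpins("TTGCGCGGAAAACCGCGCTT"))
```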
Xylanimonas cellulosilytica DSM 15894 (3.8 Mb; 72.1% GC)
PGA assembly - 9x of 8kb + 454
PGA assembly - 9x of 8kb

What is Finishing?
The process of taking a rough draft assembly composed of shotgun sequencing reads, identifying and resolving misassemblies, sequence gaps and regions of low quality to produce a highly accurate finished DNA sequence.
Final error rate should be less than 1 per 50 Kb.
No gaps, no misassembled areas, no characters other than ACGT
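
The two criteria above can be expressed as a small check. An error rate of 1 per 50 kb is 2 x 10^-5, which corresponds to a Phred-scaled quality of roughly Q47 (-10 x log10 of the error rate). The sketch below is illustrative only: the function names are hypothetical, the number of residual errors is assumed to be estimated elsewhere, and misassemblies are not modelled.

```python
import math

MAX_ERROR_RATE = 1 / 50_000  # less than 1 error per 50 kb


def phred_equivalent(error_rate):
    """Phred-scaled quality corresponding to an error rate."""
    return -10 * math.log10(error_rate)


def non_acgt_positions(seq):
    """Positions holding any character other than A, C, G or T."""
    return [i for i, base in enumerate(seq.upper()) if base not in "ACGT"]


def meets_finishing_standard(seq, estimated_errors):
    """True if the estimated error rate and the alphabet both pass."""
    if not seq:
        return False
    error_rate = estimated_errors / len(seq)
    return error_rate < MAX_ERROR_RATE and not non_acgt_positions(seq)


print(round(phred_equivalent(MAX_ERROR_RATE), 1))
print(non_acgt_positions("ACGTNACGT"))
```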
Currently: >250 individual microbes

Metagenomic assembly
- Different organisms have different coverage. Non-uniform sequence coverage results in significant under- and over-representation of certain community members.
- Low coverage for the majority of organisms in highly complex communities leads to poor (if any) assemblies.
- Chimeric contigs are produced by co-assembly of sequencing reads originating from different species.
- Genome rearrangements and the presence of mobile genetic elements (phages, transposons) in closely related organisms further complicate assembly.
- No assemblers have been developed for metagenomic data sets.
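
The coverage problem listed above can be made concrete with a little arithmetic: the expected coverage of each community member is the total sequenced bases multiplied by that member's share of the DNA, divided by its genome size. The community below is entirely made up, and the function name `expected_coverage` is an assumption for illustration.

```python
def expected_coverage(total_bases, dna_fractions, genome_sizes):
    """Expected fold coverage per organism.

    dna_fractions: share of the sequenced DNA from each organism (sums to 1).
    genome_sizes: genome size in bp for each organism.
    """
    return {
        name: total_bases * dna_fractions[name] / genome_sizes[name]
        for name in dna_fractions
    }


# Hypothetical community sequenced to 100 Mb in total.
fractions = {"dominant": 0.80, "minor": 0.15, "rare": 0.05}
sizes = {"dominant": 4_000_000, "minor": 5_000_000, "rare": 5_000_000}

for name, cov in expected_coverage(100_000_000, fractions, sizes).items():
    print(f"{name}: {cov:.1f}x")
```

In this made-up case the dominant member reaches about 20x while the rare one stays near 1x, which is too low to assemble.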
The whole-genome shotgun sequencing approach has been used for a number of microbial community projects; however, useful quality control and assembly of these data require reassessing methods developed to handle relatively uniform sequences derived from isolate microbes.

Recommendations for metagenomic assembly (Sanger)
To avoid these problems:
- make sure you use high quality sequence
- choose a proper assembler
None of the existing assemblers is designed for metagenomic data, but assemblers like PGA work better with paired-read information and produce better assemblies. We are not using phrap for metagenomic projects.
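
The first recommendation, using only high quality sequence, is commonly approached by trimming low-quality read ends before assembly. The sketch below shows one simple form of that idea; the Q20 threshold and the function name `trim_low_quality_ends` are assumptions, not a description of the pipeline behind these slides.

```python
def trim_low_quality_ends(bases, quals, min_q=20):
    """Trim bases from both ends while their Phred quality is below min_q.

    bases: read sequence; quals: per-base Phred scores of the same length.
    Returns the trimmed (bases, quals) pair.
    """
    start = 0
    end = len(bases)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return bases[start:end], quals[start:end]


read = "ACGTACGT"
quality = [5, 10, 30, 40, 40, 30, 10, 5]
print(trim_low_quality_ends(read, quality))
```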

Finishing approach for metagenomes
Example: Candidatus Accumulibacter phosphatis (CAP)
Binning: which DNA fragment is derived from which phylotype? (BLAST; GC%; read depth)
[Flowchart labels: Complete genome of Candidatus Accumulibacter phosphatis; Non-CAP reads]
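
Two of the binning signals named for this approach, GC% and read depth, lend themselves to a simple range test per contig. The sketch below is only an illustration: the bin boundaries are hypothetical placeholders, not measured values for CAP, and the BLAST evidence is left out.

```python
def gc_percent(seq):
    """GC content of seq as a percentage of its length."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def assign_bin(contig_seq, read_depth, bins):
    """Return the first bin whose GC% and depth ranges contain the contig.

    bins: name -> (gc_min, gc_max, depth_min, depth_max)
    """
    gc = gc_percent(contig_seq)
    for name, (gc_lo, gc_hi, depth_lo, depth_hi) in bins.items():
        if gc_lo <= gc <= gc_hi and depth_lo <= read_depth <= depth_hi:
            return name
    return "unbinned"


# HYPOTHETICAL ranges, for illustration only.
bins = {"CAP": (60.0, 68.0, 8.0, 100.0)}

print(assign_bin("GC" * 32 + "AT" * 18, 12.0, bins))
print(assign_bin("AT" * 50, 12.0, bins))
```

Contigs that fall outside every range stay "unbinned", which corresponds to setting non-target reads aside before finishing the target genome.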

- Candidatus Korarchaeum cryptofilum OPF8 - the first of this apparently ancient hyperthermophilic phyletic group to be sequenced
- Desulforudis audaxviator - isolated from old water in fissures of a South African gold mine at a depth of 3000 meters. Finished with Sanger and 454
- Candidatus Accumulibacter phosphatis Type IIA (CAP) - from EBPR sludge
- Candidatus Endomicrobium trichonymphae - an intracellular symbiont of a flagellate protist, itself part of the hindgut community of a termite host. It is of interest in the pursuit of the efficient breakdown of cellulose and lignin necessary in the hoped-for conversion of bulk plant materials to CO2-neutral fuel