
Microbial Genome Assembly and Finishing

Alla Lapidus, Ph.D.
Microbial genomics
DOE Joint Genome Institute, Walnut Creek, CA


A typical microbial project

  • Sequencing
  • Auto-assembly
  • Gap closure (FINISHING)
  • Annotation
  • Public release


Evolution of Microbial Drafts

Sanger only

  • 4x of 3 kb plasmids + 4x of 8 kb plasmids + 1x of fosmids

  • ~$50k for a 5 Mb genome draft

Hybrid Sanger/pyrosequence/Illumina

  • 4x of 8 kb Sanger + 15x coverage 454 shotgun + 20x Illumina (quality improvement)

  • ~$35k for a 5 Mb genome draft

454 + Solexa (current)

  • 20x coverage 454 standard + 4x coverage 454 paired end (PE) + 50x coverage Illumina shotgun (quality improvement; gaps)

  • ~$10k per 5 Mb genome
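The coverage and cost figures above can be compared with a little arithmetic. The following is a minimal sketch: the fold-coverage values and prices are taken from the slide, while the function names and the idea of summing the coverage components into one total are assumptions made for illustration.

```python
GENOME_SIZE_BP = 5_000_000  # the 5 Mb genome quoted on the slide

# Fold-coverage components and draft cost in USD, as quoted on the slide.
STRATEGIES = {
    "Sanger only": ([4, 4, 1], 50_000),
    "Hybrid Sanger/454/Illumina": ([4, 15, 20], 35_000),
    "454 + Solexa": ([20, 4, 50], 10_000),
}


def total_bases(coverages, genome_size_bp=GENOME_SIZE_BP):
    """Raw bases to sequence: summed fold coverage times genome size."""
    return sum(coverages) * genome_size_bp


def cost_per_mb(cost_usd, genome_size_bp=GENOME_SIZE_BP):
    """Draft cost per megabase of genome."""
    return cost_usd / (genome_size_bp / 1_000_000)


for name, (covs, cost) in STRATEGIES.items():
    raw_mb = total_bases(covs) / 1e6
    print(f"{name}: {sum(covs)}x total, {raw_mb:.0f} Mb raw, ${cost_per_mb(cost):,.0f} per Mb")
```

Summing coverage across technologies hides the fact that the read types are not interchangeable (the Illumina reads are used for quality improvement, not for building contigs), so the totals are only a rough guide to sequencing volume.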


Assembly (assembler)

  • Sanger reads only (phrap, PGA, Arachne, etc.)

  • Hybrid Sanger/pyrosequence/Solexa (no special assemblers; use PGA and Arachne)

  • 454/Solexa (Newbler, PCAP?) – 454 reads only

[Figure: read and insert layout for each approach. Labels: 40 kb; 454 contig, 454 shreds, 3 kb, 8 kb; shotgun reads, PE reads]
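The hybrid approach feeds "454 shreds" to a Sanger assembler, but the slide gives no parameters for how they are produced. A minimal sketch, assuming shreds are fixed-length, overlapping fragments cut from a 454 contig; the function name, the 1000 bp length and the 500 bp overlap are all hypothetical choices for illustration.

```python
def shred(contig, shred_len=1000, overlap=500):
    """Cut a contig into overlapping fragments ("shreds").

    Returns a list of (start_offset, fragment) tuples. shred_len and
    overlap are illustrative defaults, not values from the slides.
    """
    if overlap >= shred_len:
        raise ValueError("overlap must be smaller than shred_len")
    if not contig:
        return []
    step = shred_len - overlap
    shreds = []
    start = 0
    while True:
        shreds.append((start, contig[start:start + shred_len]))
        if start + shred_len >= len(contig):
            break  # the last fragment reached the end of the contig
        start += step
    return shreds


contig = "ACGT" * 650  # a toy 2600 bp contig
for start, fragment in shred(contig):
    print(start, len(fragment))
```

The last fragment may be shorter than the others; a real pipeline would also need to assign quality values to the shreds before assembly.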


Use of Illumina data

  • Align Solexa reads

  • Identify errors

  • Automatically suggest corrections for manual curation

  • Automatically suggest and implement corrections

[Figure labels: Polisher; List Disc: x1 – G, x2 – T, x3 – A, etc.]
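The steps above (align reads, identify errors, suggest or apply corrections) can be sketched as a simple pileup vote. This is not the JGI Polisher itself, whose internals the slides do not describe. It assumes reads are already aligned without gaps and given as (offset, sequence) pairs, and the depth and agreement thresholds are hypothetical.

```python
from collections import Counter


def find_discrepancies(consensus, aligned_reads, min_depth=3, min_fraction=0.8):
    """List positions where the aligned reads disagree with the consensus.

    aligned_reads: list of (offset, read_sequence), ungapped alignments.
    Returns (position, consensus_base, suggested_base) tuples.
    """
    columns = [Counter() for _ in consensus]
    for offset, read in aligned_reads:
        for i, base in enumerate(read):
            pos = offset + i
            if 0 <= pos < len(consensus) and base in "ACGT":
                columns[pos][base] += 1

    discrepancies = []
    for pos, counts in enumerate(columns):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to make a call
        base, n = counts.most_common(1)[0]
        if base != consensus[pos] and n / depth >= min_fraction:
            discrepancies.append((pos, consensus[pos], base))
    return discrepancies


def apply_corrections(consensus, discrepancies):
    """Apply the suggested substitutions to the consensus."""
    seq = list(consensus)
    for pos, _old, new in discrepancies:
        seq[pos] = new
    return "".join(seq)


consensus = "ACGTAACGT"
reads = [(0, "ACGTGAC"), (1, "CGTGACG"), (2, "GTGACGT")]
found = find_discrepancies(consensus, reads)
print(found)
print(apply_corrections(consensus, found))
```

Returning the discrepancy list separately from applying it mirrors the two modes on the slide: suggestions for manual curation, or automatic correction. The sketch handles substitutions only; detecting insertions and deletions would require gapped alignments.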


Errors corrected with Solexa (Polisher)

[Figure labels: Finished consensus; 454 contig; Sanger reads]

Frame shift detected in this area (454 contig)

CCTCTTTGATGGAAATGATA**TCTTCGAGCATCGCCTC**GGGTTTTCCATACAGAGAACCTTTGATGATGAACCGGTTGAAGATCTGCGGGTCAAA <- Solexa Consensus

CCTCTTTGATGGAAATAATA**TATTCGAGCATC

TTAGTGGAAATGATA**TCTTCGAGCATCGCCTC

CGAGCNTCGCCTC**GGGCTTTCCCT

CGAGCATCGCCTC**GGGTTCTCCATACACAGA

GCATCGCCTC**GGGTTTTCAATACAGAGAACCT

CAGCGCCTC**GGGTTTTCCATACAGAGAACCTT

ATCGCCTC**GGGTTTTCCAGACAGAGAACCTTT

GGTTC**GGGTTTTCCATACAGAGAACCTTTGAT

GTTTTCCATACAGAGAACATTTGATGATGAAC

GTTGTCCATACAGAGAACTTTTGATGATGAAC

TATANCATACAGAGAACCTTTGATGATGAACC

ATTTCCAGACAGAGAACCNTTGATGATGAACC

CAAACAGAGAACCTTTGAGGATGAACCGGTTG

ACAGGGAACCTTAGATGATGAACCGGTTGAAG

ACAGAGAACCTTAGATGATGAACCGGTTGAAG

ACCGTTGATGATGAACCGGTTGAAGATCTGCG

GATGGTGAACGGGTTGAAGATCTGCGGGTCAA

GGTTTGAAGATCTGCGGGTCAAACCAGTCCTC

GGTGGAAGATCTGCGGGTAAAACCAGTCCTCT

GGT.GNAGAGCTGCGGGTCAAACCAGTCCTCTG

TGAAGATCTGCGGTTCAAACCAGTCCTCTCCC

GATCGGCGTGTCAAACCAGTCCTCTGCCTCGT

TCTGCGGGTCAAACCAGTACTCTGCCTCGTTC


Draft assembly - what we get

  • Assembly: set of contigs (in the figure: contigs 10, 16, 21)

  • Ordered sets of contigs (scaffolds) (in the figure: 10, 21, 16)

  • New technologies: no clones to walk off even if you can scaffold contigs

[Figure labels: Clone walk (Sanger lib); PCR - sequence; PCR product; pri1; pri2]
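To illustrate the difference between an unordered set of contigs and a scaffold, here is a minimal sketch that orders contigs from pairwise links. It assumes each link is an (upstream, downstream) pair inferred from paired reads, ignores contig orientation and gap sizes, and uses the contig numbers from the figure only as toy data.

```python
def order_contigs(links):
    """Chain (upstream, downstream) contig links into one linear order.

    Raises ValueError if the links conflict or do not form a single chain.
    """
    successor, predecessor = {}, {}
    for up, down in links:
        if up in successor or down in predecessor:
            raise ValueError(f"conflicting link: {up} -> {down}")
        successor[up] = down
        predecessor[down] = up

    # The scaffold starts at the only contig that has no upstream neighbour.
    starts = [c for c in successor if c not in predecessor]
    if len(starts) != 1:
        raise ValueError("links do not form a single linear scaffold")

    order = [starts[0]]
    while order[-1] in successor:
        order.append(successor[order[-1]])

    if len(order) != len(set(successor) | set(predecessor)):
        raise ValueError("some contigs are not connected to the scaffold")
    return order


print(order_contigs([(21, 16), (10, 21)]))
```

Real scaffolders weigh many links per contig pair and must resolve orientation; this sketch only shows why linking information turns a set into an ordered list.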


Why do we have gaps

What are gaps (Sanger)?

  - Genome areas not covered by random shotgun

  • Sequencing coverage may not span all regions of the genome, thus producing gaps in the assembly.

  • Assembly of the shotgun reads may produce misassembled regions due to repetitive sequences.

  • A biased base content (this can result in failure to be cloned, poor stability in the chosen host-vector system, or inability of the polymerase to reliably copy the sequence):

    ~ AT-rich DNA clones poorly in bacteria (cloning bias; promoter-like structures) => uncaptured gaps

    ~ GC-rich DNA is difficult to PCR and to sequence and often requires the use of special chemistry => captured gaps
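For the first cause, areas missed by random shotgun, the idealized Lander-Waterman model gives a quick estimate. With reads placed uniformly at random at coverage c, the probability that a base is uncovered is e^(-c), and the expected number of contigs is about N e^(-c) for N reads. The sketch below assumes that model, ignores the minimum overlap needed to join reads, and uses a hypothetical 700 bp read length; it says nothing about gaps caused by cloning or sequencing bias.

```python
import math


def coverage(n_reads, read_len, genome_size):
    """Fold coverage c = N * L / G."""
    return n_reads * read_len / genome_size


def uncovered_fraction(c):
    """Probability that a given base is covered by no read."""
    return math.exp(-c)


def expected_contigs(n_reads, c):
    """Expected number of contigs (and so, roughly, of gaps)."""
    return n_reads * math.exp(-c)


GENOME = 5_000_000  # 5 Mb genome, as in the draft examples
READ_LEN = 700      # assumed Sanger read length, not from the slides

for c in (1, 4, 8, 10):
    n_reads = round(c * GENOME / READ_LEN)
    print(c, f"{uncovered_fraction(c) * GENOME:.0f} bp uncovered,",
          f"{expected_contigs(n_reads, c):.1f} contigs expected")
```

Real drafts usually show more gaps than this estimate because, as the slide notes, base composition biases cloning and sequencing.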


454 (pyrosequence) and low GC genome

Thermotoga lettingae TMO (JGI ID 4002278)

Draft assembly:

- 55 total contigs; 41 contigs >2 kb

- 38% GC - biased Sanger libraries

Draft assembly + 454:

- 2 total contigs; 1 contig >2 kb

- 454 - no cloning

Average length of gaps: 166 bp


High GC stops (Sanger and Hybrid)

Small hairpins (inverted repeat sequences) in the DNA can re-anneal either during sequencing or during electrophoresis, resulting in failed sequencing reactions or unreadable electrophoresis results. (This can be mitigated by adding modifiers to the reaction, sequencing smaller clones, and running gels at higher temperatures in the presence of stronger denaturants.)
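Candidate hairpins can be located by searching for inverted repeats, that is, a short stem followed after a small loop by its reverse complement. A minimal sketch, assuming perfect stems only; the function names and the stem and loop sizes are hypothetical, and no thermodynamic stability is computed.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_hairpins(seq, stem_len=6, min_loop=3, max_loop=10):
    """Return (stem_start, loop_len) for each perfect inverted repeat."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * stem_len - min_loop + 1):
        target = reverse_complement(seq[i:i + stem_len])
        for loop in range(min_loop, max_loop + 1):
            j = i + stem_len + loop  # start of the second stem arm
            if j + stem_len > len(seq):
                break
            if seq[j:j + stem_len] == target:
                hits.append((i, loop))
    return hits


# Toy sequence: stem GGCACC, 4-base loop, then the reverse complement GGTGCC.
toy = "AA" + "GGCACC" + "TTTT" + "GGTGCC" + "AA"
print(find_hairpins(toy))
```

Flagging such regions before finishing helps decide where modified chemistry or smaller clones are likely to be needed.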


454 and High GC project

Xylanimonas cellulosilytica DSM 15894 (3.8 Mb; 72.1% GC)

[Figure panels: PGA assembly - 9x of 8 kb; PGA assembly - 9x of 8 kb + 454]


Genome closure issues

  • Resolve repeats and mis-assemblies

    • Repeats within or in vicinity of other repeats

    • Large repetitive regions

    • Complex repetitive regions (tandems)

  • Fill in gaps

    • DNA regions lethal to E. coli (Sanger libraries)

    • Hairpins, GC-rich regions, hard stops or other 2° (secondary) structure or physical cause of premature termination

    • Hard to PCR (new technologies)

  • Other issues

    • Homopolymeric tracts and other polymorphisms (SNPs, VNTRs, indels)


What is Finishing?

The process of taking a rough draft assembly composed of shotgun sequencing reads and identifying and resolving misassemblies, sequence gaps and regions of low quality to produce a highly accurate finished DNA sequence.

Final quality:

  • Final error rate should be less than 1 per 50 kb.

  • No gaps, no misassembled areas, no characters other than ACGT
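Two of these criteria are easy to check mechanically. The sketch below assumes per-base Phred quality values are available for the consensus, so the expected number of errors is the sum of 10^(-Q/10); the function names are hypothetical, and misassemblies cannot be detected this way.

```python
def expected_errors(quals):
    """Expected number of wrong bases from Phred scores: sum of 10^(-Q/10)."""
    return sum(10 ** (-q / 10) for q in quals)


def check_finished(sequence, quals, max_errors_per_50kb=1.0):
    """Check the 'ACGT only' and 'less than 1 error per 50 kb' criteria."""
    if not sequence or len(sequence) != len(quals):
        raise ValueError("sequence and quals must be non-empty and equal in length")
    non_acgt = sorted(set(sequence.upper()) - set("ACGT"))
    errors_per_50kb = expected_errors(quals) / len(sequence) * 50_000
    return {
        "non_acgt_characters": non_acgt,
        "errors_per_50kb": errors_per_50kb,
        "passes": not non_acgt and errors_per_50kb < max_errors_per_50kb,
    }


# Toy 100 kb consensus with uniform Q50 bases.
report = check_finished("ACGT" * 25_000, [50] * 100_000)
print(report["passes"], round(report["errors_per_50kb"], 3))
```

Any N or other ambiguity code fails the check outright, which matches the requirement that a finished sequence contain only A, C, G and T.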


JGI Microbial Finishing

Currently: >250 individual microbes


Metagenomic assembly

The whole-genome shotgun sequencing approach has been used for a number of microbial community projects; however, useful quality control and assembly of these data require reassessing methods developed to handle relatively uniform sequences derived from microbial isolates.

  • The size of a metagenomic sequencing project is typically very large.

  • Different organisms have different coverage. Non-uniform sequence coverage results in significant under- and over-representation of certain community members.

  • Low coverage for the majority of organisms in highly complex communities leads to poor (if any) assemblies.

  • Chimeric contigs are produced by co-assembly of sequencing reads originating from different species.

  • Genome rearrangements and the presence of mobile genetic elements (phages, transposons) in closely related organisms further complicate assembly.

  • No assemblers have been developed for metagenomic data sets.


QC: Annotation of poor quality sequence

To avoid this:

  • make sure you use high-quality sequence

  • choose a proper assembler


Recommendations for metagenomic assembly (Sanger)

  • Use a trimmer (Lucy etc.) to treat reads PRIOR to assembly.

  • None of the existing assemblers is designed for metagenomic data, but assemblers like PGA work better with paired-read information and produce better assemblies. We are not using phrap for metagenomic projects.
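To show what trimming before assembly means in practice, here is a deliberately simple end-trimming sketch. It is not Lucy's algorithm, which is more elaborate. It assumes per-base Phred qualities and removes bases from both ends until a base at or above a hypothetical threshold is reached.

```python
def quality_trim(seq, quals, min_q=20):
    """Strip low-quality bases from both ends of a read.

    Returns the trimmed (sequence, qualities) pair. min_q is an
    illustrative threshold, not a value from the slides.
    """
    if len(seq) != len(quals):
        raise ValueError("seq and quals must have the same length")
    start, end = 0, len(seq)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end], quals[start:end]


read = "AAAAA" + "C" * 20 + "GGGGG"
quals = [5] * 5 + [40] * 20 + [5] * 5
trimmed, trimmed_quals = quality_trim(read, quals)
print(len(trimmed), trimmed)
```

Trimming matters more for metagenomes than for isolates because low coverage leaves fewer overlapping reads to outvote a bad base.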


Finishing approach for metagenomes

Example: Candidatus Accumulibacter phosphatis (CAP)

  • Binning: which DNA fragment is derived from which phylotype? (BLAST; GC%; read depth)

  • Lucy/PGA

[Figure labels: CAP reads; ~45%; +; Non-CAP reads; Complete genome of Candidatus Accumulibacter phosphatis]
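Two of the binning signals named above, GC% and read depth, can be combined in a simple filter. The sketch below assumes each contig comes with a mean read depth; the function names, the thresholds and the toy contigs are hypothetical and are not values from the CAP project. BLAST-based assignment is not shown.

```python
def gc_percent(seq):
    """GC content as a percentage of the A/C/G/T bases in seq."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def bin_contigs(contigs, gc_range, depth_range):
    """Split contigs into (target_bin, other) by GC% and read depth.

    contigs: dict mapping contig name -> (sequence, mean_read_depth).
    gc_range, depth_range: inclusive (low, high) bounds for the target bin.
    """
    target, other = [], []
    for name, (seq, depth) in contigs.items():
        in_gc = gc_range[0] <= gc_percent(seq) <= gc_range[1]
        in_depth = depth_range[0] <= depth <= depth_range[1]
        (target if in_gc and in_depth else other).append(name)
    return target, other


toy = {
    "c1": ("GGCCGGCCAT", 30.0),  # high GC, high depth
    "c2": ("ATATATATGC", 30.0),  # low GC
    "c3": ("GGCCGGCCAT", 3.0),   # high GC, low depth
}
print(bin_contigs(toy, gc_range=(60, 90), depth_range=(20, 40)))
```

Requiring both signals to agree is a conservative choice: a contig from another organism may share the GC content of the target but rarely also its read depth.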


Metagenomic finishing: projects

Completed projects:

  • Candidatus Korarchaeum cryptofilum OPF8: the first of this apparently ancient hyperthermophilic phyletic group to be sequenced.

  • Desulforudis audaxviator: isolated from old water in fissures of a South African gold mine at a depth of 3000 meters. Finished with Sanger and 454.

  • Candidatus Accumulibacter phosphatis Type IIA (CAP): from an EBPR sludge community, US.

In progress:

  • Candidatus Endomicrobium trichonymphae: an intracellular symbiont of a flagellate protist, itself part of the hindgut community of a termite host. It is of interest in the pursuit of the efficient breakdown of cellulose and lignin necessary in the hoped-for conversion of bulk plant materials to CO2-neutral fuel.


The end