Metagenome definitions: a refresher course




Natalia Ivanova

MGM Workshop

September 12, 2012

Metagenome definitions

A metagenome is the collective genome of a microbial community, AKA microbiome (native, enriched, sorted, etc.).

A metagenomic library (or libraries) is constructed from the isolated DNA (native, enriched, etc.).

A metagenomic library can be single-end (AKA standard) or paired-end.

Metagenome definitions

A single-end (standard) metagenomic library produces contigs upon assembly (i.e., longer sequences built from the overlaps between reads).

Any Ns in contigs correspond to low-quality bases.

A paired-end metagenomic library produces scaffolds upon assembly (non-contiguous joining of contigs based on read-pair information).

Ns in scaffolds correspond either to low-quality bases or to gaps of unknown size.
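To make the contig/scaffold distinction concrete, here is a minimal Python sketch that splits a scaffold back into contigs at runs of Ns. A single N can be a low-quality base while a gap is usually a longer run, so the sketch assumes a hypothetical threshold `min_gap`: runs of at least that many Ns are treated as scaffold gaps, and shorter runs stay inside the contig. The default of 10 and the function names are illustrative assumptions. They do not come from the workshop or from any particular assembler.

```python
import re
from typing import List


def split_scaffold(scaffold: str, min_gap: int = 10) -> List[str]:
    """Split a scaffold into contigs at runs of at least `min_gap` Ns.

    Hypothetical convention: shorter N runs are treated as low-quality
    bases and kept inside the contig.
    """
    if min_gap < 1:
        raise ValueError("min_gap must be at least 1")
    gap = re.compile("[Nn]{%d,}" % min_gap)
    # Drop empty pieces produced by gaps at either end of the scaffold.
    return [piece for piece in gap.split(scaffold) if piece]


def n_run_lengths(seq: str) -> List[int]:
    """Length of every run of Ns in the sequence."""
    return [len(match.group()) for match in re.finditer("[Nn]+", seq)]


if __name__ == "__main__":
    scaffold = "ACGTNACGT" + "N" * 25 + "GGCCTTAA"
    print(n_run_lengths(scaffold))   # [1, 25]
    print(split_scaffold(scaffold))  # ['ACGTNACGT', 'GGCCTTAA']
```

The single N stays inside the first contig as a presumed low-quality base. The 25-N run is treated as a gap of unknown size.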





Amplified and Unamplified Libraries

Amplified Library

Unamplified Library

Fragmentation (1 µg)

Fragmentation (1 µg)

Double SPRI

End repair / Phosphorylation

End repair / Phosphorylation

SPRI Clean

Double SPRI

A-tailing with Klenow exo-

A-tailing with Klenow exo-

SPRI Clean

DNA Chip

Heat Inactivation

DNA Chip

Adaptor Ligation

Adaptor Ligation

SPRI Clean

PCR 10-cycle Amplification

SPRI Clean

DNA Chip

SPRI Clean

DNA Chip

qPCR Quantification

qPCR Quantification

Metagenome definitions (contd):

overlap = alignment of reads at x% identity

Unless the community has very low complexity (i.e., is dominated by one or a few clonal populations), assembly at 100% nucleotide identity will be very fragmented.

What to do with k-mer based assemblies?

Use multiple k-mer settings, then combine the assemblies with an overlap-layout-consensus (OLC) assembler such as minimus2, using a minimum identity of 95%. There is a tradeoff between overlap length and % identity.
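A minimal sketch can illustrate the definition "overlap = alignment of reads at x% identity" and the tradeoff between overlap length and % identity. It assumes an ungapped suffix-prefix comparison, whereas OLC assemblers such as minimus2 compute real, gapped alignments. The function name `best_overlap` and its `min_len`/`min_identity` parameters are hypothetical. The value 0.95 mirrors the 95% figure above.

```python
from typing import Optional, Tuple


def best_overlap(a: str, b: str, min_len: int = 20,
                 min_identity: float = 0.95) -> Optional[Tuple[int, float]]:
    """Longest suffix of read `a` that matches a prefix of read `b`.

    Ungapped comparison: identity = matching positions / overlap length.
    Returns (overlap_length, identity), or None if no overlap of at least
    `min_len` bases reaches `min_identity`.
    """
    min_len = max(min_len, 1)
    # Try the longest candidate first, so the first hit is the longest overlap.
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        suffix, prefix = a[-length:], b[:length]
        matches = sum(x == y for x, y in zip(suffix, prefix))
        identity = matches / length
        if identity >= min_identity:
            return length, identity
    return None


if __name__ == "__main__":
    core = "ACGTAACCGGTTAGCTAGCA"          # 20 bp region shared by both reads
    variant = core[:10] + "A" + core[11:]  # same region with one substitution
    read_a = "TTTT" + core
    read_b = variant + "GGGG"
    print(best_overlap(read_a, read_b, min_len=10, min_identity=0.95))  # (20, 0.95)
    print(best_overlap(read_a, read_b, min_len=10, min_identity=1.0))   # None
```

In the usage example, the two reads share a 20 bp region that differs by one substitution. At 95% identity the full 20 bp overlap is accepted. At 100% identity no overlap of at least 10 bp exists, which is the kind of fragmentation described for assembly at 100% identity.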

Assembly Pipeline v.0.9

[Figure: a snapshot of the older (454-Illumina) metagenome assembly pipeline]

Trimming does not appear to be ideal for this process

CPU-time intensive; no known metagenomic k-mer prediction algorithm

Picking the best k-mer: a manual process
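Picking the best k-mer was a manual step. One common aid is to compare the assemblies produced at different k values using a contiguity statistic such as N50. This is an assumption here, because the slide does not say which criterion was used. The helper names are hypothetical, and the contig lengths in the usage example are invented placeholders, not workshop data.

```python
from typing import Dict, List


def n50(lengths: List[int]) -> int:
    """Largest length L such that contigs of length >= L contain at least
    half of all assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # only reached for an empty list


def pick_k_by_n50(assemblies: Dict[int, List[int]]) -> int:
    """Return the k whose assembly has the highest N50 (ties go to the smaller k)."""
    return max(sorted(assemblies), key=lambda k: n50(assemblies[k]))


if __name__ == "__main__":
    # Hypothetical contig lengths (bp) from assemblies at three k values.
    runs = {
        31: [500, 400, 300, 200, 100],
        51: [900, 300, 200, 100],
        71: [700, 600, 100],
    }
    for k, lengths in sorted(runs.items()):
        print(k, n50(lengths))
    print("best k:", pick_k_by_n50(runs))  # best k: 51
```

N50 measures contiguity only, so a high value can also reflect misassemblies. Treat it as one input to the manual choice rather than a replacement for it.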


Metagenome definitions (contd):

Assembly of sequences at less than 100% identity => population contigs and scaffolds representing a consensus sequence of a species population

[Figure: isolate contig vs. species population contigs]
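A population contig stands for a consensus over reads from closely related but non-identical genomes. The sketch below is a deliberately simplified column-wise majority vote. It assumes the reads are already aligned to common coordinates and padded with '-' wherever a read does not cover a position. Real assemblers derive the consensus from their own alignments. The function name is hypothetical, and the alphabetical tie-breaking rule is an arbitrary choice made for reproducibility.

```python
from collections import Counter
from typing import List


def majority_consensus(aligned_reads: List[str]) -> str:
    """Column-wise majority-vote consensus of pre-aligned, equal-length reads.

    '-' marks a position the read does not cover; a column covered by no read
    becomes 'N'. Ties are broken alphabetically (arbitrary but reproducible).
    """
    if not aligned_reads:
        return ""
    width = len(aligned_reads[0])
    if any(len(read) != width for read in aligned_reads):
        raise ValueError("all reads must be padded to the same aligned length")
    consensus = []
    for column in zip(*aligned_reads):
        counts = Counter(base for base in column if base != "-")
        if not counts:
            consensus.append("N")
            continue
        top = max(counts.values())
        consensus.append(min(base for base, n in counts.items() if n == top))
    return "".join(consensus)


if __name__ == "__main__":
    # Hypothetical aligned reads with a few strain-level differences.
    reads = [
        "ACGTTGCA--",
        "ACGATGCAAT",
        "-CGTTGCAAT",
        "ACGTTCCA-T",
    ]
    print(majority_consensus(reads))  # ACGTTGCAAT
```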

2 more important definitions

1. Sequence coverage (AKA read depth)

How many times each base has been sequenced => needs to be taken into account when calculating protein family abundance

Per-contig average coverage

Per-base coverage => per-gene coverage

2. Bins

Scaffolds, contigs and unassembled reads can be binned into sets of sequences (bins) that likely originated from the same species population or from a population of a broader taxonomic lineage
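The coverage definitions (per-base, per-gene and per-contig average coverage), and their use in protein family abundance, can be sketched in Python. The sketch assumes read alignments are already available as 0-based, half-open (start, end) intervals on a contig, and that genes are described the same way. The tuple format, function names and family labels are hypothetical. Summing mean gene coverage per family, rather than counting genes, is one plausible way to account for read depth. It is not necessarily the method used in IMG/M.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def per_base_coverage(contig_length: int, alignments: Iterable[Tuple[int, int]]) -> List[int]:
    """Read depth at every base, from 0-based, half-open alignment intervals."""
    # Difference array: +1 where a read starts, -1 where it ends.
    delta = [0] * (contig_length + 1)
    for start, end in alignments:
        start, end = max(start, 0), min(end, contig_length)
        if start < end:  # skip empty or out-of-range intervals
            delta[start] += 1
            delta[end] -= 1
    depth, running = [], 0
    for i in range(contig_length):
        running += delta[i]
        depth.append(running)
    return depth


def mean_coverage(depth: List[int], start: int = 0, end: Optional[int] = None) -> float:
    """Average depth over [start, end); the default is the whole contig."""
    window = depth[start:end]
    return sum(window) / len(window) if window else 0.0


def family_abundance(genes, depth_by_contig: Dict[str, List[int]]) -> Dict[str, float]:
    """Sum of per-gene mean coverage for each protein family.

    genes: iterable of (contig_id, start, end, family) tuples (hypothetical format).
    """
    totals: Dict[str, float] = defaultdict(float)
    for contig_id, start, end, family in genes:
        totals[family] += mean_coverage(depth_by_contig[contig_id], start, end)
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical 10 bp contig covered by three reads.
    depth = per_base_coverage(10, [(0, 5), (2, 8), (4, 10)])
    print(depth)                        # [1, 1, 2, 2, 3, 2, 2, 2, 1, 1]
    print(mean_coverage(depth))         # 1.7  (per-contig average)
    print(mean_coverage(depth, 2, 6))   # 2.25 (per-gene, gene at [2, 6))
    genes = [("c1", 2, 6, "family_A"), ("c1", 6, 10, "family_B"), ("c1", 0, 4, "family_A")]
    print(family_abundance(genes, {"c1": depth}))  # {'family_A': 3.75, 'family_B': 1.5}
```

The difference array keeps the per-base step linear in contig length plus the number of reads, instead of incrementing every covered base for every read.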
What IMG does and doesn’t do

Scaffolds and contigs are generated by assembly (not provided in IMG/M)

Sequence coverage can be computed by the assembler from the alignments it generates (preferable), or added later by aligning reads to contigs (the latter can be provided in IMG/M)

Bins are generated by binning software (not provided in IMG/M)

Scaffolds, contigs and unassembled reads are annotated with non-coding RNAs, repeats (CRISPRs) and protein-coding genes (CDSs), and the CDSs are assigned to protein families (COGs, Pfams, TIGRFAMs, KEGG Orthology, EC numbers, internal clusters); this annotation is provided in IMG/M
What’s the difference between IMG and MG-RAST, IMG and CAMERA?

We prefer to assemble the data

longer sequences -> better quality of gene prediction and functional annotation

longer sequences -> chromosomal context and binning -> population-level analysis

But we don’t provide assembly services except for metagenomes sequenced at the JGI

we may be able to help with assembly of 454 data

we’re not equipped to assemble massive amounts of Illumina data

Contact person: Ed Kirton,

IMG does not provide tools for analysis of 16S data from the metagenome itself

we do assembly -> assembled 16S sequences are generally not very reliable

BLASTn of reads matching conserved regions is misleading

we do pyrotags or i-tags for every metagenome sequenced at the JGI