The Integrated Microbial Genome (IMG) systems

Nikos Kyrpides



  • Reddy

  • Bahador

  • Iain

  • Denis

  • Amrita

  • Billis

  • Peter

  • Marcel

  • OMICS GROUP

  • STANDARDS GROUP

  • ANNOTATION GROUP

  • Natalia

  • Dino

  • Kostas

  • Ioanna

Biological Data Management

  • Victor Markowitz

  • Yuri Grechkin

  • Ken Chu

  • Ernest Szeto

  • Krishna Palaniappan

  • Amy Chen

  • Biju Jacob


Science-driven data generation and analysis

ANALYSIS

  • User

  • Facility




Data analysis

Comparative Analysis

Data Integration



What is the Matrix?

Data management system for comparative analysis of biological data

[Diagram: IMG at the center, surrounded by the data types it manages]

  • Genomes

  • Genes

  • Functions

  • Clusters

  • Metadata

  • SNPs

  • Proteomics

  • Regulons

  • Transcriptomes


IMG’s Mission

Become the HOME of Microbial Genomes and Metagenomes

  • support comparative genome analysis

  • support community functional annotation

  • provide a user-friendly interface


Integrated Microbial Genomes (IMG) [It’s easier to analyze 1000 genomes than a single one]

Bacteria: 2780

Archaea: 107

Eukarya: 121

Plasmids: 1186

Viruses: 2697

http://img.jgi.doe.gov/

  • What is IMG:

  • IMG is a data management system for comparative analysis and annotation of all publicly available genomes from three domains of life in a uniquely integrated context.

  • Mission:

  • To become the Home of Microbial Genome and Metagenome Analysis

  • Background:

  • Launched in March 2005

  • 3 releases/year, 20 releases so far

  • >5,000 unique visitors per month

  • >350 citations

  • Current Status:

  • 6891 Genomes

  • 11.6 Million Genes

  • USERS CAN

  • Search data

  • Browse data

  • Compare data

  • Export data


Why more data are needed: faster and more accurate function prediction

Fructokinase family

Ribokinase family

2-dehydro-3-deoxyglucokinase family


Metagenomic Analysis

[Figure: Soil, Sargasso Sea, Termite Hindgut, Human Gut and Acid Mine Drainage communities arranged by species complexity (axis: 1, 10, 100, 1000, 1000s, 10000); labels: Binning, Reference Genomes, ?]

The road to success in Metagenomics is through Microbial Genomics

Source: Susannah Tringe, JGI


Availability of Reference Genomes

[Figure: availability of reference genomes for Soil, Human gut, Termite Gut, Marine and Acid Mine Drainage communities (axis: 100%, 60%, 50%, 40%, 20%, 1%); labels: Reference Genomes, ?]


Data Model Abstraction Example: IMG Operations

Genes, Genomes, Functions/Pathways

  • Gene occurrence profile across genomes

  • Gene occurrence profiles across pathways

  • Pathways shared by genomes

Example: genes present in G1 and absent from G2, G3, G4 and G5

        G1  G2  G3  G4  G5
    g1  +   +   +   +   +
    g2  +   +   -   +   +
    g3  +   -   -   -   -
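The presence/absence query on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not IMG's implementation: the `+`/`-` strings are copied from the slide's toy matrix, while the helper names `occurrence` and `genes_present_absent` are hypothetical.

```python
# Toy presence/absence matrix copied from the slide: rows are genes,
# columns are genomes G1..G5 ('+' = present, '-' = absent).
GENOMES = ["G1", "G2", "G3", "G4", "G5"]
PROFILES = {
    "g1": "+++++",
    "g2": "++-++",
    "g3": "+----",
}


def occurrence(profile):
    """Turn a '+'/'-' string into a genome -> present (bool) mapping."""
    return {genome: mark == "+" for genome, mark in zip(GENOMES, profile)}


def genes_present_absent(profiles, present_in, absent_from):
    """Genes found in every `present_in` genome and in no `absent_from` genome.

    Hypothetical helper name; it mirrors the query shown on the slide.
    """
    hits = []
    for gene, profile in profiles.items():
        occ = occurrence(profile)
        in_all = all(occ[g] for g in present_in)
        in_none = not any(occ[g] for g in absent_from)
        if in_all and in_none:
            hits.append(gene)
    return hits


# The slide's query: present in G1, absent from G2, G3, G4 and G5.
result = genes_present_absent(PROFILES, ["G1"], ["G2", "G3", "G4", "G5"])
print(result)  # ['g3']
```

Only g3 satisfies the query, because g1 and g2 also occur in at least one of the excluded genomes.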


IMG Data Integration

Genes

  • RNAs, Proteins

  • Sequence Clusters

  • Positional clusters

  • Regulatory clusters

  • Fusions

  • Operons

  • Expression

Genomes

  • Groupings

    • Phylogenetic

    • Phenotypic

    • Ecotypic

    • Disease

    • Geographical

    • Isolation

Functions

  • COG

  • GO

  • Pfam

  • TIGRfam

  • InterPro

  • KEGG

  • BioCyc

  • SEED

  • Protein product

  • MyIMG

  • IMG Terms

  • IMG Pathways

  • IMG Networks

11.6M

6891

1.1M


IMG Toolkit

  • Gene Synteny

  • Functional Categories

  • Projects Map

  • Function Profile

  • Abundance Profiles

  • Chromosome Map

  • Genome Clustering

  • IMG Pathway Profile

  • Metadata Search

  • Compare Annotations

  • Phylogenetic Profile

  • VISTA

  • KEGG Maps

  • Phylogenetic Distribution

  • Chromosomal Map

  • Recruitment Plot

  • Fragment Recruitment

  • Artemis

WRITE PAPER



  • USERS CAN

  • Search data

  • Browse data

  • Compare data

  • Export data

UNIQUE VISITS

~ 5,000 / month

  • USERS CAN

  • Submit data

  • Annotate data


Informatics Steps & Services: support of a new user community

INTEGRATION & COMPARATIVE ANALYSIS

2012

ASSEMBLY

2005

IMG

2008

IMG-ER



Data Challenges & Opportunities

  • Metadata

  • Gene calling

  • Annotation

  • Quantity

  • Quality

Data

Analysis

Integration

  • Number of Genes

    • All vs all Blast

  • Number of Datasets

    • How do we navigate through a sea of data



Challenges we face

  • DATA SIZE

  • DATA QUALITY

  • DATA STANDARDS



Challenges we face

  • 1. DATA SIZE

  • Number of Genes

  • Number of Datasets

    • How do we compare data

    • How do we find data

    • How do we navigate through data


ii. Method development for data reduction & comparison: Computation of Similarities

Use clusters

2. Computation of similarities

Reference genomes

Metagenome

Metagenome

Metagenome

Clusters

  • Common/unique genes

  • Rapid identification of best hit(s)

  • ….
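One way to read the cluster idea on this slide: once every gene is assigned to a precomputed cluster, finding common and unique genes between a reference genome and a metagenome reduces to set operations on cluster IDs, with no new pairwise alignment. The sketch below assumes that reading. The gene IDs, cluster IDs, and the `compare_by_cluster` helper are all hypothetical and are not part of IMG.

```python
# Hypothetical gene -> cluster assignments. In practice these would come from
# a precomputed sequence clustering, which the slide does not specify.
reference_genome = {"ref_0001": "C1", "ref_0002": "C2", "ref_0003": "C3"}
metagenome = {"mg_0001": "C2", "mg_0002": "C3", "mg_0003": "C3", "mg_0004": "C9"}


def compare_by_cluster(a, b):
    """Compare two datasets through their cluster IDs (hypothetical helper)."""
    clusters_a = set(a.values())
    clusters_b = set(b.values())
    return {
        "common": sorted(clusters_a & clusters_b),
        "unique_a": sorted(clusters_a - clusters_b),
        "unique_b": sorted(clusters_b - clusters_a),
    }


summary = compare_by_cluster(reference_genome, metagenome)
print(summary)
# {'common': ['C2', 'C3'], 'unique_a': ['C1'], 'unique_b': ['C9']}
```

The cost of this comparison grows with the number of genes rather than the number of gene pairs, which is the appeal of clustering as a data-reduction step.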


SCALING: Computation of Similarities



Strain / species diversity



10

Prochlorococcus marinus Pangenome

17

Listeria monocytogenes Pangenome

Staphylococcus aureus Pangenome

15

Pangenomes

  • We need better ways to

    • represent and browse through thousands of genomes

    • represent an organism


Metagenome Analysis with Pangenomes

Best Blast Hit

Reference Genome

Pangenome
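As a rough illustration of why a pangenome can serve as a better recruitment target than a single reference genome, the sketch below builds a pangenome as the union of gene clusters across strains and splits it into core and accessory parts. The strain names, cluster IDs, and the `build_pangenome` helper are hypothetical; the slides do not say how IMG constructs pangenomes.

```python
from collections import Counter

# Hypothetical strains of one species, each described by the set of gene
# clusters it carries.
strains = {
    "strain_A": {"C1", "C2", "C3"},
    "strain_B": {"C1", "C2", "C4"},
    "strain_C": {"C1", "C3", "C5"},
}


def build_pangenome(strain_clusters):
    """Union of all clusters, split into core (in every strain) and accessory."""
    counts = Counter(c for clusters in strain_clusters.values() for c in clusters)
    n_strains = len(strain_clusters)
    pan = set(counts)
    core = {c for c, seen in counts.items() if seen == n_strains}
    return {"pan": pan, "core": core, "accessory": pan - core}


pangenome = build_pangenome(strains)

# A metagenome gene assigned to cluster C5 finds no counterpart in strain_A
# alone, but it does in the pangenome.
query_cluster = "C5"
hit_single_reference = query_cluster in strains["strain_A"]
hit_pangenome = query_cluster in pangenome["pan"]
print(sorted(pangenome["core"]), hit_single_reference, hit_pangenome)
# ['C1'] False True
```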



Challenges we face

  • 2. DATA QUALITY

  • Did we generate enough data to support biological conclusions?

  • Did we introduce any biases during sequencing?

  • Is the quality of assembly comparable between different datasets?

  • Is the quality of predicted genes comparable between different datasets?

  • Is the quality of functional annotation comparable between different datasets?



Microbial Genomes

Gene Prediction Quality Assurance

GenePRIMP

http://geneprimp.jgi-psf.org

Gene Prediction Improvement Pipeline

GenePRIMP is a pipeline that consists of a series of computational units that identify erroneous gene calls and missed genes and correct a subset of the identified defective features.
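To make "erroneous gene calls and missed genes" more concrete, here is a toy checker that flags three kinds of anomaly from gene coordinates alone: very short genes, heavily overlapping neighbours, and long unannotated gaps that might hide a missed gene. It is not GenePRIMP's algorithm. The `Gene` class, the `find_anomalies` function, and all thresholds are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def find_anomalies(genes, min_length=90, max_overlap=30, max_gap=300):
    """Flag short genes, large overlaps, and long intergenic gaps.

    All thresholds (in bp) are illustrative assumptions.
    Only adjacent genes (sorted by start) are compared.
    """
    ordered = sorted(genes, key=lambda g: g.start)
    report = []
    for gene in ordered:
        if gene.end - gene.start + 1 < min_length:
            report.append(("short_gene", gene.name))
    for left, right in zip(ordered, ordered[1:]):
        overlap = left.end - right.start + 1
        gap = right.start - left.end - 1
        if overlap > max_overlap:
            report.append(("overlap", f"{left.name}/{right.name}"))
        if gap > max_gap:
            report.append(("possible_missed_gene", f"{left.name}..{right.name}"))
    return report


genes = [
    Gene("a", 1, 900),
    Gene("b", 850, 1700),   # overlaps "a" by 51 bp
    Gene("c", 1750, 1800),  # only 51 bp long
    Gene("d", 2500, 3400),  # 699 bp gap after "c"
]
report = find_anomalies(genes)
print(report)
# [('short_gene', 'c'), ('overlap', 'a/b'), ('possible_missed_gene', 'c..d')]
```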

APPLICATIONS

  • Identify gene prediction anomalies

  • Benchmark the quality of gene prediction algorithms

  • Benchmark the quality of combination / coverage of sequencing platforms

  • Improve the sequence quality

Pati A. et al. (2010) Nature Methods

Amrita

Natalia



Challenges we face

  • 3. DATA STANDARDS

    • Assembly

    • Gene Finding

    • Functional Annotation

    • Metadata



Project Catalog & Metadata

Genomes OnLine Database

I. Pagani

D. Liolios


COMPUTATIONS

M5: Pilot Project with ANL

innovation through collaboration

Building a roadmap for a scalable and sustainable computing MetaInfrastructure for the metagenomics community

  • Develop standards to share and process data more effectively

  • Run data-intensive workflows once (reduce wasted cycles)

  • Develop a single QC data processing pipeline

  • Develop a single data submission entry

  • Develop a single data processing pipeline

  • Develop a common project catalog


Standards in Genomic Sciences: http://standardsingenomics.org



Ongoing Developments

New Data & Tools for Visualization & Analysis of

  • Integration of Expression data

  • Integration of Regulatory Data

  • Resequencing data (strain variation)

  • Pangenomes

Data Processing

  • Short Read annotation

  • Bypass the all vs all Blast bottleneck

