
IMG terms and pathways

Krishna Palaniappan, Amy Chen, Frank Korzeniewski, Yuri Grechkin, Ernest Szeto, Victor Markowitz, Natalia Ivanova, Iain Anderson, Thanos Lykidis, Nikos Kyrpides. MGM Workshop, May 16, 2012.


  1. IMG terms and pathways. Krishna Palaniappan, Amy Chen, Frank Korzeniewski, Yuri Grechkin, Ernest Szeto, Victor Markowitz, Natalia Ivanova, Iain Anderson, Thanos Lykidis, Nikos Kyrpides. MGM Workshop, May 16, 2012.

  2. Why so many? What’s the difference? Which one should I use?
     New: SEED subsystems, Transport DB, Phenotypes

  3. Where it all comes from
     Experimental data: gene A in a genome X
     • catalyzes a reaction
     • interacts with another protein(s)
     • gene knock-out causes a certain phenotype
     • …
     This information is recorded in a structured way:
     • ontologies (e.g. Gene Ontology)
     • pathway collections (metabolic and protein-protein interaction)
     • other (reasoning rules, like TIGR Genome Properties)

  4. Modeling the data properly – why nobody does that
     • Genes are connected to phenotypes via a multi-step process with many parameters.
     • We have only very vague ideas about the steps and parameters for the majority of genes and phenotypes.
     • If we design a relational database for gene/phenotype connections, most tables will be empty.
     Diagram labels: phenotype, gene, pathway, transcript, reaction, protein, enzyme, compounds, evidence.

  5. What it looks like in real life – KEGG vs MetaCyc
     KEGG: http://www.genome.jp/kegg/
     MetaCyc: http://metacyc.org/

  6. Ammonia oxidation pathway in KEGG
     Plus 4 more entries: for 1.14.99.39, for each subunit

  7. The same pathway/reaction in MetaCyc
     Similar problems to KEGG:
     • multifunctional enzymes
     • multisubunit enzymes
     • differences in reaction recording

  8. Even the MetaCyc record is still incomplete
     • Which subunit has which cofactor?
     • Type of Cu2+ cluster, type of Fe2+ cluster?
     • One of the subunits is a cytochrome c, yet the enzyme is cytosolic?
     • Does it require any help with maturation of metal clusters?
     • Pseudomonas sp. PB16 was shown to have only 1 enzyme from the pathway, hydroxylamine reductase. Does it have the entire pathway?

  9. Even bigger mess: bioinformatics inference
     Experimental data: gene A in a genome X
     • catalyzes a reaction
     • interacts with another protein(s)
     • gene knock-out causes a certain phenotype
     • …
     What about gene B in genome Y, which is similar to gene A?

  10. “True or false?” game
      • If the GenBank record says nothing about the annotation protocol for gene B, the annotation must be correct.
      • If the GenBank record says the gene was manually annotated, the annotation must be correct.
      • If the GenBank record says gene B was manually annotated, and it has a bi-directional best BLAST hit to gene A with an e-value of 1.0e-5, the annotation must be correct.
      • …

  11. Weaknesses
      • Orthology detection fails on many families that deviate from vertical transmission.
      • BLAST is agnostic of which amino acids are more important for protein function.
      • Using a consensus sequence (either as a PSSM or an HMM) with family-specific bit score cutoffs would be much better, but cannot be used in the current implementation of KEGG.
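The bi-directional best BLAST hit test from the “True or false?” slide can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure KEGG or IMG actually runs: the hit tuples, the gene names and the function names `best_hits` and `bidirectional_best_hits` are hypothetical, and the 1.0e-5 cutoff is simply the value quoted on the slide.

```python
from typing import Dict, List, Tuple

# Hypothetical hit record: (query gene, subject gene, e-value).
Hit = Tuple[str, str, float]


def best_hits(hits: List[Hit], max_evalue: float) -> Dict[str, str]:
    """Map each query gene to its lowest e-value subject at or below the cutoff."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, evalue in hits:
        if evalue > max_evalue:
            continue  # hit is too weak to be considered at all
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(
    x_vs_y: List[Hit], y_vs_x: List[Hit], max_evalue: float = 1.0e-5
) -> List[Tuple[str, str]]:
    """Return (gene in X, gene in Y) pairs that are each other's best hit."""
    forward = best_hits(x_vs_y, max_evalue)
    backward = best_hits(y_vs_x, max_evalue)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)


# Made-up hits: geneA reaches geneB only at the cutoff, geneD matches it strongly.
x_vs_y = [("geneA", "geneB", 1.0e-5), ("geneA", "geneC", 1.0e-3), ("geneD", "geneB", 1.0e-40)]
y_vs_x = [("geneB", "geneD", 1.0e-40), ("geneB", "geneA", 1.0e-5)]
pairs = bidirectional_best_hits(x_vs_y, y_vs_x)
```

In this made-up data only geneD and geneB pair up. The sketch compares whole-sequence scores only, so it knows nothing about which residues matter for function, which is the weakness listed on the slide.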

  12. Pathway collections: KEGG, MetaCyc and others
      Which particular set of interactions is a pathway (i.e. how do we define pathway boundaries within the network)?

  13. Ideal solution: pathway NR
      • All pathway collections share a common skeleton of reactions, which consist of reactants (compounds).
      • All reactions share the common base of proteins annotated as catalysts.
      • Can we merge the information from different collections, using the best features of all of them?
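One way to picture a non-redundant merge over a shared reaction skeleton is to key every reaction by its compounds and pool the catalyst annotations from each collection. The sketch below is an assumption-laden illustration, not the IMG implementation: the `Reaction` record, the collection names and the reaction IDs are hypothetical, compound names are assumed to be already normalized to shared identifiers, and reaction direction is ignored because collections may record it differently.

```python
from typing import Dict, FrozenSet, Iterable, NamedTuple, Set, Tuple


class Reaction(NamedTuple):
    """Hypothetical reaction record exported from one pathway collection."""
    source: str
    reaction_id: str
    substrates: FrozenSet[str]
    products: FrozenSet[str]
    catalysts: FrozenSet[str]


Key = Tuple[Tuple[str, ...], Tuple[str, ...]]


def skeleton_key(reaction: Reaction) -> Key:
    """Direction-insensitive key built only from the compounds involved."""
    sides = sorted([tuple(sorted(reaction.substrates)), tuple(sorted(reaction.products))])
    return (sides[0], sides[1])


def merge_collections(reactions: Iterable[Reaction]) -> Dict[Key, Dict[str, Set[str]]]:
    """Group reactions that share a skeleton and pool their IDs and catalysts."""
    merged: Dict[Key, Dict[str, Set[str]]] = {}
    for reaction in reactions:
        entry = merged.setdefault(skeleton_key(reaction), {"ids": set(), "catalysts": set()})
        entry["ids"].add(f"{reaction.source}:{reaction.reaction_id}")
        entry["catalysts"].update(reaction.catalysts)
    return merged


# Placeholder records: the same A <-> B reaction written in opposite directions.
records = [
    Reaction("collection_1", "R1", frozenset({"A"}), frozenset({"B"}), frozenset({"EC x.x.x.x"})),
    Reaction("collection_2", "RXN-1", frozenset({"B"}), frozenset({"A"}), frozenset({"subunit A", "subunit B"})),
    Reaction("collection_1", "R2", frozenset({"B"}), frozenset({"C"}), frozenset()),
]
merged = merge_collections(records)
```

Here the two A/B records collapse into one entry that carries both the EC number and the subunit-level catalysts, while the B/C reaction stays separate.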

  14. IMG terms: 3 types
      IMG terms of 3 types:
      1. gene product
      2. multi-subunit protein complex
      3. modified protein
      Diagram labels:
      • compounds A, B, C, D
      • reactions R1; R2, spontaneous; R3, spontaneous; R4, chaperone
      • Enzyme (EC x.x.x.x): Not an IMG term!
      • Enzyme (EC x.x.x.x) monomeric, needs cofactor C
      • Enzyme (EC x.x.x.x) monomeric precursor
      • Enzyme (EC x.x.x.x) heterotrimeric, needs cofactor D
      • Enzyme (EC x.x.x.x) heterotrimeric, subunit A
      • Enzyme (EC x.x.x.x) heterotrimeric, subunit A precursor
      • Enzyme (EC x.x.x.x) heterotrimeric, subunit B
      • Enzyme (EC x.x.x.x) heterotrimeric, subunit C
      • annotations: IMG term of the type “Protein complex”; IMG term of the type “Modified protein”; IMG term of the type “Gene product” (twice)
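The three IMG term types can be pictured as a small tree in which complexes and modified proteins point back to the gene products they are built from. This is only a sketch under assumptions: the class and field names (`TermType`, `ImgTerm`, `parts`) are hypothetical and do not come from the IMG schema, and the example wiring of subunits is one possible reading of the slide diagram.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class TermType(Enum):
    GENE_PRODUCT = "gene product"
    PROTEIN_COMPLEX = "multi-subunit protein complex"
    MODIFIED_PROTEIN = "modified protein"


@dataclass
class ImgTerm:
    """Hypothetical term record; `parts` holds subunits or the precursor."""
    name: str
    term_type: TermType
    parts: List["ImgTerm"] = field(default_factory=list)


def gene_products(term: ImgTerm) -> List[str]:
    """Collect the names of all gene-product terms underneath a term."""
    if term.term_type is TermType.GENE_PRODUCT:
        return [term.name]
    names: List[str] = []
    for part in term.parts:
        names.extend(gene_products(part))
    return names


# Assumed reading of the diagram: subunit A matures from a precursor,
# then joins subunits B and C in a heterotrimeric complex.
precursor_a = ImgTerm("subunit A precursor", TermType.GENE_PRODUCT)
subunit_a = ImgTerm("subunit A", TermType.MODIFIED_PROTEIN, [precursor_a])
subunit_b = ImgTerm("subunit B", TermType.GENE_PRODUCT)
subunit_c = ImgTerm("subunit C", TermType.GENE_PRODUCT)
enzyme = ImgTerm("heterotrimeric enzyme", TermType.PROTEIN_COMPLEX, [subunit_a, subunit_b, subunit_c])
products = gene_products(enzyme)
```

Walking the tree from the complex down to gene products is what lets a reaction be linked to genes without the generic EC-level enzyme itself having to be a term.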

  15. Protein-protein interaction pathways: same model

  16. You’ve been warned!
