
Towards the virtual organism



Presentation Transcript


  1. Towards the virtual organism • PART I: Databases and tools for biochemical pathways • PART II: Relating expression data and pathways • PART III: Guided Tour: elucidate organelle-related pathways

  2. Pathway diagram WIT database

  3. Major contributions of Pathways databases • "Without context and purpose, information is mere data." - Clement Mok • Information Resource - Literature compilation • Gene Ontology • Sequence and Genome Annotation • Relationship between pathways (function) and chromosomal position • Analysis of Gene Expression Arrays • Understanding Cellular Dynamics • Disease Process Modeling

  4. As when a highly connected node in the internet breaks down, the disruption of p53 has severe consequences. Jeong et al. 2001 Nature

  5. Towards the virtual organism Introduce biochemical pathways resources • What Is There (WIT/PUMA/EMP/ERGO) • Kyoto Encyclopedia of Genes and Genomes (KEGG) • Signalling Databases • Pathways Database (PathDB) Focus on • Accessibility • Database contents and models • Query features • Gene/Protein/Pathway analysis • Visualization Why do all these projects seem to do the same thing?

  6. Why do all these projects seem to do the same thing? • Data model is a view of the world • Different database management systems • Tools particular to data model and database management systems • Different content • Analogous to model system approach to biology • E. coli, yeast, C. elegans, Drosophila, Mouse, etc. are all used to provide understanding of human biology • No one system does everything, but concepts and data can often be shared • "He may have stole that song from me, but I steal from everybody." - Woody Guthrie

  7. WIT/PUMA/EMP System • Argonne National Lab and Integrated Genomics Inc, USA • http://wit.mcs.anl.gov/WIT2/ • Ross Overbeek, Evgeni Selkov, Natalia Maltsev • Team: 7 • WIT is freely downloadable (ftp://ftp.mcs.anl.gov/pub/Genomics/WIT2/)

  8. WIT/PUMA/EMP System Focus on: sequence analysis, annotation of genomes with respect to metabolism • Annotation/Literature database • Blast, PSI-Blast • ClustalW • COG • ProtScale • Transmembrane helices/topology • Prodom • ProSite • Operons (Pairs of close bidirectional best hits)
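
The operon criterion on this slide (pairs of close bidirectional best hits) can be illustrated with a small sketch. This is not WIT's implementation: the input format, the gene names, the scores, the positions and the `max_gap` threshold are all assumptions made for illustration.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring subject gene.

    `scores` maps (query, subject) pairs to a similarity score, for
    example a BLAST bit score (hypothetical input format).
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return sorted pairs (a, b) where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


def close_bbh_pairs(bbh, pos_a, pos_b, max_gap=300):
    """Return pairs of BBHs whose genes lie within max_gap in both genomes.

    `max_gap` is an arbitrary illustrative threshold, not a value from the slides.
    """
    result = []
    for i, (a1, b1) in enumerate(bbh):
        for a2, b2 in bbh[i + 1:]:
            near_in_a = abs(pos_a[a1] - pos_a[a2]) <= max_gap
            near_in_b = abs(pos_b[b1] - pos_b[b2]) <= max_gap
            if near_in_a and near_in_b:
                result.append(((a1, b1), (a2, b2)))
    return result


# Toy data: made-up scores and chromosomal start positions.
a_vs_b = {("a1", "b1"): 250.0, ("a1", "b2"): 90.0,
          ("a2", "b2"): 180.0, ("a3", "b2"): 200.0}
b_vs_a = {("b1", "a1"): 245.0, ("b2", "a3"): 198.0, ("b2", "a2"): 175.0}
pos_a = {"a1": 1000, "a2": 5000, "a3": 1200}
pos_b = {"b1": 400, "b2": 650}

pairs = bidirectional_best_hits(a_vs_b, b_vs_a)
print(pairs)
print(close_bbh_pairs(pairs, pos_a, pos_b))
```

In the toy data, `a2` hits `b2` best, but `b2` prefers `a3`, so only the reciprocal pairs survive the first step.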

  9. Ways to go: from genes to pathways Starting from - • Gene/protein sequence • Gene/protein name • Organism/Genome (‘Metabolic reconstruction’) To Pathways of - • Metabolism • DNA • Regulation of metabolism

  10. From Blast results to genes

  11. From genes to pathways

  12. WIT Pathway diagrams: Tabular format

  13. WIT Pathway diagrams: Picture • Links to further information

  14. WIT Detail pages: Enzyme Name, Reaction EC, Description 4788 3304 Specific Activity 6502 Preparative Protocol 6306 Substrates, Coenzymes, Inhibitors, Modification, Kinetics, Genomes …. 6914 39 9500

  15. Kyoto Encyclopedia of Genes and Genomes (KEGG) • Institute for Chemical Research, Kyoto University • http://www.genome.ad.jp/kegg/ • Minoru Kanehisa • System development: 9 • Data entry and curation: 18 • Academic users may freely download the package • ftp://kegg.genome.ad.jp/mirror/

  16. KEGG: Data content and statistics • 3705 EC numbers • 11132 Enzyme names • 3794 Substrates • 5284 Metabolic reactions • 113 Pathways • mostly metabolic • 36 Organisms

  17. KEGG: Query capabilities Focus on: display gene-centric data in the context of predefined pathways • Reconstruct pathway maps using blast • Search and color genes, enzymes and compounds in pathway diagrams and ortholog tables • Sequence: blast and fasta • Genome Maps • Generate reaction paths between compounds
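
The last query capability, generating reaction paths between compounds, can be pictured as a search over a reaction graph. The sketch below is a breadth-first search over a toy table, not KEGG's algorithm. The `reactions` format is an assumption, and cofactors such as ATP are deliberately left out so they do not create shortcuts.

```python
from collections import deque


def reaction_path(reactions, start, goal):
    """Find a shortest chain of reactions leading from start to goal.

    `reactions` maps a reaction name to (substrates, products), both sets.
    Returns a list of reaction names, or None if goal cannot be reached.
    """
    if start == goal:
        return []
    came_from = {start: None}  # compound -> (previous compound, reaction)
    queue = deque([start])
    while queue:
        compound = queue.popleft()
        for name, (substrates, products) in reactions.items():
            if compound not in substrates:
                continue
            for product in products:
                if product in came_from:
                    continue
                came_from[product] = (compound, name)
                if product == goal:
                    # Walk back from the goal to the start.
                    path = []
                    node = goal
                    while came_from[node] is not None:
                        previous, step = came_from[node]
                        path.append(step)
                        node = previous
                    return path[::-1]
                queue.append(product)
    return None


# Toy reaction table: first steps of glycolysis plus one side branch.
reactions = {
    "hexokinase": ({"glucose"}, {"glucose-6-phosphate"}),
    "glucose-6-phosphate isomerase": ({"glucose-6-phosphate"},
                                      {"fructose-6-phosphate"}),
    "6-phosphofructokinase": ({"fructose-6-phosphate"},
                              {"fructose-1,6-bisphosphate"}),
    "glucose-6-phosphate dehydrogenase": ({"glucose-6-phosphate"},
                                          {"6-phosphogluconolactone"}),
}

path = reaction_path(reactions, "glucose", "fructose-1,6-bisphosphate")
print(path)
```

Reactions are treated as directed here, so searching from the product back to glucose returns `None`. A real reaction database would also need to handle reversibility.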

  18. 'State of the Art' • static Network • manually compiled • manually drawn • textbook knowledge • KEGG picture of the glycolysis genes present in E. coli

  19. Representation of Networks • static Network: manually compiled, manually drawn, textbook knowledge • versus • dynamic Network: features complete knowledge, restriction of content is up to the user, experimental data can be reflected in net structure, include user-owned data

  20. Pathway related projects
      Metabolic Pathways • KEGG Metabolic Pathways • EMP - Enzymes and Metabolic Pathways • WIT - Metabolic Reconstruction • UM-BBD - Microbial Biocatalysis/Biodegradation • EcoCyc - E. coli Genes and Metabolism • SoyBase - Soybean Metabolism • Metalgen - Genes and Metabolism • Boehringer Mannheim - Biochemical Pathways • IUBMB-Nicholson Minimaps • PathDB - Plant Metabolic Pathways
      Regulatory Pathways • KEGG Regulatory Pathways • SPAD - Signal Transduction • CSNDB - Cell Signaling Networks • Yeast Pathways in MIPS • Interactive Fly - Drosophila Genes • GIF_DB - Drosophila Gene Interactions • FlyNets - Drosophila Molecular Interactions • GeNet - Gene Networks Database • HOX-Pro - Homeobox Genes Database • Wnt Signaling Pathway • TRANSPATH - Gene Regulatory Pathways • GenMAPP - Mostly mouse pathways
      Protein-Protein Interactions • BRITE Database for Biomolecular Relations • DIP - Database of Interacting Proteins • BIND - Biomolecular Interaction Network Database

  21. Enzymes, Compounds • LIGAND - Chemical Database for Enzyme Reactions • ENZYME - Enzymes • BRENDA - Comprehensive Enzyme Information System • Worthington Enzyme Manual • Klotho - Biochemical Compounds • ChemFinder - Searching Chemicals • ChemIDplus at NLM • PROMISE - Prosthetic Groups and Metal Ions • GlycoSuiteDB - Glycan Structure Database • CarbBank - Complex Carbohydrate Structure Database • WebElements - Periodic Table
      Transcription Factors • TRANSFAC - Transcription Factor Database • RegulonDB - E. coli Transcriptional Regulation • DBTBS - B. subtilis Transcription Factors • DPInteract - DNA binding proteins
      Nomenclature - General • IUBMB - Nomenclature • IUPAC - Nomenclature • SWISS-PROT - Documents • GO - Gene Ontology (FlyBase/SGD/MGD/TAIR/WormBase)

  22. Simulation of biochemical reactions and cellular processes • BioKin - Enzyme kinetic software • BioQuest - Metabolic Simulation • BioSpice - still in progress • Bioxml.org - a site collecting together a number of biologically-oriented open-source projects • DBsolve - Software for metabolic, enzymatic and receptor-ligand binding simulation • DMSS - Scalable, Discrete Event Metabolic Simulation System • E-Cell - A simulation platform for the modelling of cells at a molecular level • Electronic Arc - experimental visual simulator • Elementary Modes - has a Java simulation • Gepasi - A software package for modelling systems of biochemical reactions • Jarnac - A language for describing and manipulating cellular system models • StochSim - A general-purpose stochastic simulator of biological reaction networks • Systems Biology Workbench - An XML based integration system • Virtual Cell - A general computational framework for modeling cell biological processes

  23. Signal transduction browser (Transpath)

  24. Signal transduction browser (Transpath)

  25. Signal transduction browser (Transpath)

  26. PathDB • National Center for Genome Resources • http://www.ncgr.org/software/pathdb/ • Jeff Blanchard • Software Development: 5 • Literature Curation: 4 • The software is freely available (Client) • The database server can be installed at the site of cooperation partners

  27. PathDB data model • Compounds • Macromolecules: lipids, polysaccharides • Information molecules: DNA, RNA • States: development, disease, genotype, phenotype, environment • metabolic reactions • protein modifications and interactions • Regulation: transcriptional, translational, posttranslational • Transport • biological hierarchies, ontologies • incomplete and conflicting knowledge

  28. PathDB data model (diagram) • Attributes: Location, BiolProcess, Genotype, Phenotype, Environment • Building Blocks: Subunit, Protein, Compound, DNA, RNA • Construction of Entities • Transition of Entities: Mediator, Biochemical Entity, Substrate, Step, Product

  29. Platform for Network Analysis Focus on: building custom networks, compare to large scale experiments • Relational database for metabolic reactions, regulation and states (disease, genotype, phenotype) • QueryTool • Query the database, e.g. to collect a set of reactions • transform between types: proteins, compounds, steps • restrict to attributes: organism, location, states • PathwayViewer • Visualize the results of the search

  30. Query window showing “Proteins involved in Biological process DNA repair”

  31. Transform to ‘Phenotype’ • Select ‘Caffeine Sensitivity’ and get all Proteins • Do Intersection and get all Steps
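
The query sequence on slides 30 and 31 (proteins for a biological process, proteins for a phenotype, intersection, transform to steps) amounts to set operations over annotation tables. The sketch below is not the PathDB QueryTool API. The table layout and the `P…`/`S…` identifiers are placeholders invented for illustration.

```python
# Hypothetical annotation tables; every identifier is a placeholder.
process_to_proteins = {
    "DNA repair": {"P1", "P2", "P3"},
    "DNA synthesis": {"P3", "P4"},
}
phenotype_to_proteins = {
    "Caffeine Sensitivity": {"P2", "P3", "P5"},
}
step_to_proteins = {
    "S1": {"P1"},
    "S2": {"P2", "P6"},
    "S3": {"P3"},
    "S4": {"P5"},
}


def transform_to_steps(proteins, step_table):
    """Return the steps that involve at least one of the given proteins."""
    return sorted(step for step, members in step_table.items()
                  if members & proteins)


# 1. Proteins involved in the biological process "DNA repair".
repair = process_to_proteins["DNA repair"]
# 2. Proteins linked to the phenotype "Caffeine Sensitivity".
sensitive = phenotype_to_proteins["Caffeine Sensitivity"]
# 3. Intersection of both result sets.
both = repair & sensitive
# 4. Transform the protein set into the steps it takes part in.
steps = transform_to_steps(both, step_to_proteins)

print(sorted(both), steps)
```

Restricting by organism or location would be a further intersection against another table of the same shape.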

  32. PathwayViewer • Inspect and manipulate pathways or routes between metabolites. • Alternate topological representations of a pathway: primary and secondary metabolites • Manipulate layout on screen • Control how much data is displayed • Automatically lays out pathways • hierarchical or circular algorithm • Visualization of gene expression and metabolic profiling data

  33. Visualize Steps involved in DNA synthesis and Caffeine sensitivity

  34. Exploring the network neighborhood: build pathways on the fly
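
Building pathways on the fly from a network neighborhood can be sketched as a depth-limited breadth-first expansion around a seed node. This only illustrates the idea and is not the PathwayViewer's code. The edge list and node names are invented.

```python
from collections import deque


def neighborhood(edges, seed, max_depth):
    """Return {node: distance} for all nodes within max_depth of seed.

    `edges` is a list of undirected (node, node) pairs.
    """
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    depth = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # do not expand beyond the requested radius
        for neighbor in adjacency.get(node, ()):
            if neighbor not in depth:
                depth[neighbor] = depth[node] + 1
                queue.append(neighbor)
    return depth


# Toy network: A - B - C - D, plus A - E.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "E")]
print(neighborhood(edges, "A", 2))
```

Raising `max_depth` step by step mimics expanding the displayed pathway one shell at a time.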

  35. What data sources are out there? • Knowledge: Metabolism, Regulation, Ontologies, Sequences, Annotation, Protein-Protein, Protein-SmallMol • Large-Scale Experiments: Gene expression, Protein expression, Metabolic profiling • Resources shown: CSNdb, KEGG, Medline, aMAZE, BRENDA, BIND, BRITE, WIT, PathDB, DIP, GO, UMLS/MESH, MBO, EcoCyc, MIPS, SW, GenBank, EMBL

  36. Ontology: Bind genes to hierarchies • Translation/Mapping between: Cellular Location, Anatomy, Biological Process, Molecular Function • GO (Gene Ontology, 2000)
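
Binding genes to a hierarchy usually means that a gene annotated with a specific term is also counted under every more general term above it. The sketch below shows that propagation on a toy hierarchy. The term names are illustrative labels, not actual GO identifiers, and the gene names are placeholders.

```python
def ancestors(term, parents):
    """Return all terms reachable from `term` by following parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def genes_by_term(annotations, parents):
    """Map each term to the genes annotated with it or with any descendant."""
    result = {}
    for gene, terms in annotations.items():
        for term in terms:
            for covered in {term} | ancestors(term, parents):
                result.setdefault(covered, set()).add(gene)
    return result


# Toy hierarchy (child -> list of parents) and direct annotations.
parents = {
    "DNA repair": ["DNA metabolism"],
    "DNA replication": ["DNA metabolism"],
    "DNA metabolism": ["biological process"],
}
annotations = {
    "geneA": ["DNA repair"],
    "geneB": ["DNA replication"],
}

bound = genes_by_term(annotations, parents)
print(sorted(bound["DNA metabolism"]))
```

Because parents are stored as lists, a term with several parents (a directed acyclic graph rather than a tree) is handled the same way.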

  37. Browsing the ontology

  38. Hierarchy of Complexity
      Entities or States • macro: disease states, development states, phenotype • micro: organelles, cell types, tissues • molecular: protein, RNA, DNA, compounds
      Processes • macro: disease, development, environment • micro: mitosis, apoptosis, transcription • molecular: metabolic reactions, protein-protein interactions, conformation change

  39. PathDB Complete Wiring Diagram • Reference experimental support • Processes/Entities and experimental support • Knowledge: Metabolism, Regulation, Ontologies, Protein-Protein, Annotation, Sequences, Protein-SmallMol • Large-Scale Experiments: Gene expression, Protein expression, Metabolic profiling

  40. Questions • What is the difference between a normal and a cancer cell? • What is the effect of a knockout mutation on the cellular network? • What "classical" pathways are up or down regulated in my gene expression data? • How well does my set of gene expression arrays support my model of cellular processes? • How does a drug perturb a cellular network as judged through gene expression data? • What experiment promises to distinguish between contradictory hypotheses?

  41. PART II Relating gene expression and pathways

  42. Analysis of Expression Data • Clustering of time courses (Iyer et al., Science, 1999) • "Scatter plot" comparing two experiments (Roberts et al., Cell, 2000)

  43. Using pathways to contextualize gene expression arrays Miki et al. PNAS, 2001

  44. Expression Pattern Clustering J-Express B. Dysvik / I. Jonassen, U.Bergen, Norway

  45. Mapping of J-Express Cluster onto Pathways (the cluster represents genes of different contexts)
      • sce00051 Fructose and mannose metabolism: EC 3.1.3.46 Fructose-2,6-bisphosphate 2-phosphatase; Fructose-2,6-bisphosphatase
      • sce00190 Oxidative phosphorylation: EC 1.9.3.1 Cytochrome-c oxidase; Cytochrome oxidase; Cytochrome a3; Cytochrome aa3. EC 3.6.1.34 H+-transporting ATP synthase; H+-transporting ATPase; Mitochondrial ATPase; Coupling factors (F0-F1 and C0-F1); Chloroplast ATPase; Bacterial Ca2+/Mg2+ ATPase. EC 3.6.1.38 Ca2+-transporting ATPase; Calcium pump
      • sce00251 Glutamate metabolism: EC 2.6.1.19 4-Aminobutyrate transaminase; beta-Alanine--oxoglutarate transaminase
      • sce00252 Alanine and aspartate metabolism: EC 2.6.1.19 4-Aminobutyrate transaminase; beta-Alanine--oxoglutarate transaminase
      • sce00410 beta-Alanine metabolism: EC 2.6.1.19 4-Aminobutyrate transaminase; beta-Alanine--oxoglutarate transaminase
      • sce00640 Propanoate metabolism: EC 2.6.1.19 4-Aminobutyrate transaminase; beta-Alanine--oxoglutarate transaminase
      • sce00650 Butanoate metabolism: EC 2.6.1.19 4-Aminobutyrate transaminase; beta-Alanine--oxoglutarate transaminase
      • sce03110 ATP Synthase: EC 3.6.1.34 H+-transporting ATP synthase; H+-transporting ATPase; Mitochondrial ATPase; Coupling factors (F0-F1 and C0-F1); Chloroplast ATPase; Bacterial Ca2+/Mg2+ ATPase
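
The mapping on this slide can be reproduced as a lookup of a cluster's EC numbers against a pathway table. The pathway table below is copied from the slide. The two-enzyme `cluster_ecs` set is a hypothetical subset chosen for illustration, and the function names are not taken from J-Express or KEGG.

```python
# Pathway -> EC numbers, as listed on the slide.
pathway_to_ecs = {
    "sce00051": {"3.1.3.46"},
    "sce00190": {"1.9.3.1", "3.6.1.34", "3.6.1.38"},
    "sce00251": {"2.6.1.19"},
    "sce00252": {"2.6.1.19"},
    "sce00410": {"2.6.1.19"},
    "sce00640": {"2.6.1.19"},
    "sce00650": {"2.6.1.19"},
    "sce03110": {"3.6.1.34"},
}


def map_cluster_to_pathways(cluster_ecs, pathway_table):
    """Return {pathway: sorted EC numbers shared with the cluster}."""
    hits = {}
    for pathway, ecs in pathway_table.items():
        shared = sorted(cluster_ecs & ecs)
        if shared:
            hits[pathway] = shared
    return hits


def pathways_per_ec(hits):
    """Count in how many pathways each EC number of the cluster appears."""
    counts = {}
    for ecs in hits.values():
        for ec in ecs:
            counts[ec] = counts.get(ec, 0) + 1
    return counts


# Hypothetical cluster containing two of the enzymes from the slide.
cluster_ecs = {"2.6.1.19", "3.6.1.34"}
hits = map_cluster_to_pathways(cluster_ecs, pathway_to_ecs)
counts = pathways_per_ec(hits)
print(sorted(hits))
print(counts)
```

The counts make the slide's point concrete: a single enzyme such as EC 2.6.1.19 falls into five pathways, so one cluster can span several metabolic contexts.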

  46. Clustering and Incremental Pathway Construction • A pathway (10 genes) from five clusters with 57 EC-annotated genes • Genes mapped to reactions • Dynamically build networks from reaction DB and clustered genes • Fellenberg & Mewes, 99 • 24 (out of 54) gene clusters (6153 ORFs, 694 EC-annotated) • Pathway represents 10 genes out of 500

  47. Principal Component Analysis (PCA) • Eigen Analysis • solve for eigenvalues and eigenvectors of a square symmetric matrix • pure sums of squares and cross products (SSCP) • scaled sums of squares and cross products (Covariance) • standardized sums of squares and cross products (Correlation)
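
The eigen analysis named on this slide can be sketched for the covariance variant in plain Python. The sketch finds only the first principal component, using power iteration as a simple stand-in for a full eigen solver. The data values are made up, and the two variables are perfectly correlated so the result is easy to check by hand.

```python
import math


def covariance_matrix(rows):
    """Covariance matrix: centered cross products scaled by (n - 1)."""
    n = len(rows)
    dims = len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(dims)]
    centered = [[row[j] - means[j] for j in range(dims)] for row in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1)
             for j in range(dims)] for i in range(dims)]


def leading_eigenpair(matrix, iterations=200):
    """Largest eigenvalue and its unit eigenvector by power iteration.

    Assumes a symmetric matrix and a start vector that is not orthogonal
    to the leading eigenvector.
    """
    dims = len(matrix)
    vector = [1.0 / math.sqrt(dims)] * dims
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vector[j] for j in range(dims))
                   for i in range(dims)]
        norm = math.sqrt(sum(x * x for x in product))
        vector = [x / norm for x in product]
    # Rayleigh quotient v^T M v for the unit vector v.
    eigenvalue = sum(vector[i] * sum(matrix[i][j] * vector[j]
                                     for j in range(dims))
                     for i in range(dims))
    return eigenvalue, vector


# Made-up measurements: the second variable is exactly twice the first.
rows = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
cov = covariance_matrix(rows)
eigenvalue, component = leading_eigenpair(cov)
print(eigenvalue, component)
```

For this data the first component points along (1, 2) and carries all of the variance, 25/3. Dividing each centered column by its standard deviation before forming the cross products would give the correlation variant instead.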

  48. Principal components and visualization • J-Express • B. Dysvik / I. Jonassen, U. Bergen, Norway

  49. Data driven vs hypothesis driven approach: the data driven approach to Genome and Expression Analysis • Basic Assumptions (Pathways Cluster) • Expression time courses for pathways do not necessarily cluster together • Clustered genes do not necessarily form pathways • Expression Data and Pathways • Erroneous and noisy expression data • Many genes, measurements • Many spurious hits/clusters of expression patterns • Incomplete data (measurements, kinetic parameters) • Cost of regulation: partially regulated pathways

  50. Outline of a Hypothesis Driven Approach GPE-Score(Pathway) Biological Knowledge
