1 / 52

SysMO-DB: Sharing and Exchanging Data and Models in Systems Biology

Katy Wolstencroft University of Manchester. SysMO-DB: Sharing and Exchanging Data and Models in Systems Biology. SysMO-DB. DB. A data access, model handling and data integration platform for Systems Biology: To support and manage the diversity of Data, Models and experimental protocols

hume
Download Presentation

SysMO-DB: Sharing and Exchanging Data and Models in Systems Biology

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Katy Wolstencroft University of Manchester SysMO-DB: Sharing and Exchanging Data and Models in Systems Biology

  2. SysMO-DB DB A data access, model handling and data integration platform for Systems Biology: • To support and manage the diversity of • Data, Models and experimental protocols • Local data management systems • That promotes shared understanding • Using a common platform and common technologies

  3. Systems Biology Challenges • Interdisciplinary work • Heterogeneous data and models • Modellers and experimentalists have different skills, training, experience • Modellers and experimentalists have different vocabularies and jargon • Working together

  4. Systems Biology of Microorganisms http://www.sysmo.net • Pan European collaboration • Eleven individual projects, 91 institutes • Different research outcomes • A cross-section of microorganisms, incl. bacteria, archaea and yeast • Record and describe the dynamic molecular processes occurring in microorganisms in a comprehensive way • Present these processes in the form of computerized mathematical models • Pool research capacities and know-how • Already running since April 2007 • Runs for 3-5 years • This year, 2 new projects join and 6 leave

  5. The Problem No one concept of experimentation or modelling No planned, shared infrastructure for pooling 

  6. Types of data • Multiple omics • genomics, transcriptomics • proteomics, metabolomics • fluxomics, reactomics • Images • Molecular biology • Reaction Kinetics • Models • Metabolic, gene network, kinetic • Relationships between data sets/experiments • Procedures, experiments, data, results and models • Analysis of data

  7. Started in June 2008 Web-based solution to facilitate: exchange of data, models and processes (intra- and inter- consortia) search for data, models and processes across the initiative maximisation of the "shelf life" and utility of the data, models and processes generated dissemination of results DB SysMO-DB

  8. SysMO-DB Team Hits, Germany Stuart Owen Wolfgang Müller Carole Goble Isabel Rojas Olga Krebs SABIO-RK Katy Wolstencroft University of Manchester, UK Finn Bacall Taverna myExperiment Jacky Snoep JWS Online University of Stellenbosch, South Africa University of Manchester, UK

  9. SysMO-DB PALS team Power Contributors. 21 Postdocs and PhD students Design and technical collaboration team Intense collaboration UK and Continental PALS Chapters Audits and Sharing. Methods, data, models, standards, software, schemas, spreadsheets, SOPs….. 20 questions Deployment into Projects

  10. Principles… A series of small victories Realistic Don‘t reinvent Sustainable and extensible Migrate to standards Provide instant gratification Incremental development Fitting in with normal lab practices

  11. The Lowest Hanging Fruit SysMO SEEK – a catalogue of assets SysMO Yellow Pages The people and their expertise The institutions and their facilities Data – experimental data sets Data – analysed results Data – external reference data sets Models Processes – laboratory protocols and bioinformatics analyses Publications The catalogue references assets held elsewhere

  12. SEEK screenshot?

  13. Harvesters JERM SysMOLab Wiki COSMIC Alfresco MOSES Wiki BaCell-SysMO Alfresco ANOTHER A DATA STORE

  14. Why not a central Warehouse? • Protective of models • in progress vs published models. • Access and Version management • Curator-Rival conflict • Reluctant to share data • Even within their own projects • Legacy spreadsheets dominate • Curation practices vary • Centralised archive take-up • Point to Point Exchange • People don’t mind sharing methods • People want to advertise publications Nature461, 145 (10 Sept09)

  15. Just Enough Sharing Access Permissions Reusing myExperiment

  16. SysMO-DB Architecture Models Assets and Yellow Pages Catalogues SysMO-SEEK web interface SysMO DB JERM Data Processes

  17. Making use of the Assets • Understanding the content of the data • Linking assets together • Linking assets to experimental context • Running comparisons between data files • Running model simulations • Running data analysis pipelines

  18. What is the JERM? JERM • JERM “Just Enough Results Model” • Minimum information to exchange data • What type of data is it • Microarray, growth curve, enzyme activity… • What was measured • Gene expression, OD, metabolite concentration…. • What do the values in the datasets mean • Units, time series, repeats…. • Which experiment does it relate to? • How does it relate to models? • How was the data created • SOPs and protocols

  19. CIMRCore Information for Metabolomics Reporting MIABEMinimal Information About a Bioactive Entity MIACAMinimal Information About a Cellular Assay MIAMEMinimum Information About a Microarray Experiment MIAME/EnvMIAME / Environmental transcriptomic experiment MIAME/NutrMIAME / Nutrigenomics MIAME/PlantMIAME / Plant transcriptomics MIAME/ToxMIAME / Toxicogenomics MIAPAMinimum Information About a Phylogenetic Analysis MIAPARMinimum Information About a Protein Affinity Reagent MIAPEMinimum Information About a Proteomics Experiment MIAREMinimum Information About a RNAi Experiment MIASEMinimum Information About a Simulation Experiment MIENSMinimum Information about an ENvironmental Sequence MIFlowCytMinimum Information for a FlowCytometry Experiment MIGenMinimum Information about a Genotyping Experiment MIGSMinimum Information about a Genome Sequence MIMIxMinimum Information about a Molecular Interaction Experiment MIMPPMinimal Information for Mouse Phenotyping Procedures MINIMinimum Information about a Neuroscience Investigation MINIMESSMinimal Metagenome Sequence Analysis Standard MINSEQEMinimum Information about a high-throughput SeQuencing Experiment MIPFEMinimal Information for Protein Functional Evaluation MIQASMinimal Information for QTLs and Association Studies MIqPCRMinimum Information about a quantitative Polymerase Chain Reaction experiment MIRIAMMinimal Information Required In the Annotation of biochemical Models MISFISHIEMinimum Information Specification For In Situ Hybridization and Immunohistochemistry Experiments STRENDAStandards for Reporting Enzymology Data TBCTox Biology Checklist BioPAX : Biological Pathways Exchangehttp://www.biopax.org/ FuGE Functional Genomics Experiment MGED: Microarray Experimental Conditions Minimum Information Models http://www.mibbi.org/index.php/MIBBI_portal

  20. The Idea For each data type….. Transcriptomics Proteomics Metabolomics Single Cell Data ISA-TAB 1 Define a JERM….. • Top down analysis of standards • Bottom up analysis of practice 2 Generate and apply…. • JERM template • JERM extractor for data host • Subset registered in SEEK • Access / export through JERM interface / template 3

  21. SEEK + JERM JERM Experimental Data Metadata People Investigation Homogenised terminology and values in the datasets themselves Study Projects Assay Models Experimental conditions SOPs Factors studied Workflows Based on ISA-TAB

  22. For publishing JERM data needs to be related to SOPs, experimental context (ISA) and other data JERM must be “MIBBI” compliant for exporting to public repositories e.g. Microarray data needs to be MIAME compliant

  23. ISA-TAB Relating data and its experimental context Investigation, Study, Assay TAB = tabular A format suitable for spreadsheets http://isatab.sourceforge.net/

  24. ISA Provides.... A common framework for relating different types of data e.g. microarrays and proteomics Facilitates submission to international public repositories of genomics, transcriptomics and proteomics studies

  25. Identifying Biological Objects • What do you have in your data? • Proteins/enzymes, genes/expression levels, metabolites • Where/how do these objects interact? • Pathways, flux, experimental conditions • What models describe these interactions Possible when using common frameworks, naming schemes and controlled vocabularies

  26. BioPortal Integration for Searching • Repository for submitting and sharing Biological ontologies http://bioportal.bioontology.org/ • Search for concepts across all or selected ontologies • BioPortal provides a number of Restful Webservices • Search • Concept lookup • Visualisation • Integrated within SEEK as a plugin

  27. Tools to help manage data:Annotation standards by stealth Controlled vocabulary plug in BioPortal

  28. Following Standards • We recommend formats but we do not enforce them • Protocols and SOPs – Nature Protocols • Data – JERM models and community minimum information models • Models – SBML and related standards • Publications – PubMed and DOI • If you follow the prescribed formats, you get more out, but if you don’t, you can still participate • lowering the adoption barrier

  29. Off the shelf • Except for the JERM, we have only used community resources, vocabularies and services You can get a long way by implementing community practices and providing ways to integrate them

  30. SysMO-DB and Models

  31. Nicolas Le Novere, Data Integration in the Life Sciences, Manchester, 2009

  32. Models: Incentives for using Standards Models can be shared in SysMO-SEEK in any format SBML is the recommended format We also recommend MIRIAM compliance and SBO annotation If you use SBML, you can use JWS Online to run simulations in SEEK

  33. Screenshot of JWS Online • JWS Online Plugin • online simulator, runs in your browser • upload models in SBML format • Web Service enabled • SBGN schemas, with annotations and external links

  34. http://www.semanticsbml.org/aym Falko Krause, Humboldt-University, Berlin

  35. Models Resources Models can be published in public repositories JWS-Online, BioModels Models can be annotated SBML, MIRIAM, SBO No public resources currently for sharing models with associated data, or for loading new data into models

  36. Linking Data to Models Relating data and models Where did the data come from for developing the model? Where did the data come from for validating the model? What were the results of model simulations?

  37. Current Functionality in SEEK • Show all data used for construction together with the model, such that process can be repeated • Uploaded models loaded with this data by default • Manually alter parameters and run simulations

  38. Next Steps: Model Validation • Test/compare model with experimental data for complete system • Find data in SEEK • Upload data from elsewhere • Automatically load into model • Run simulations and compare with original results • JERM for models • Mapping tools – allows you to identify columns/rows in spreadsheets containing the right information

  39. ISA for Models • Modelling and experimental work intersect • Investigations, Study, Assay.....or modelling analysis..... • Modelling analysis types • Metabolic models, gene networks • Modelling type • ODE, algebraic • Studies – combinations of experimental assays, modelling analyses, and informatics analyses

  40. SysMO-DB the e-Laboratory An e-Laboratory is an information system for bringing together people, data and analytical methods at the point of investigation or decision-making

  41. Current Status Finding things so that we can compare them Understanding who has what Understanding what can be compared with what – the experimental context

  42. Where we are going… A dynamic resource for analysis as well as browsing Automatic comparison of data from inside files Understanding where and how data and models are linked Running simulations with new experimental data Running analyses and workflows over the data and models

  43. Workflows from myExperiment • Data preparation, annotation and analysis • Systems Biology workflow Pack on myExperiment Microarray analysis and text mining Created by Afsaneh Maleki-Dizaji from SUMO, University of Sheffield Based on previous work by Paul Fisher, University of Manchester http://www.myexperiment.org/workflows/187

  44. SEEK as a data analysis and meta analysis service • SBML model construction and population • Calibration workflow • Data requirements • Parameterised SBML model • Experimental data • Metabolite concentrations from key results database • Calibration by COPASI web service Peter Li

  45. Data analysis and meta analysis SEEK Analysis Service with pre-cooked analysis tools. • Calibration workflow • Data requirements • Parameterised SBML model • Experimental data • Metabolite concentrations from key results database • Calibration by COPASI web service Load model: Load data: GO Peter Li

  46. New Directions

  47. Opening SysMO Out • Using SysMO as a dissemination space for the SysMO consortium • Supplementary material in publications • Data citation • Packaging software so that others can use it • Easy to install a SEEK for yourself • Packaging and exchanging JERM Templates • Helping with standardisation • Promotion and example work with SBRML and data and models linkage

  48. SysMO-DB Approach in Other projects • SysMO2 – new projects and legacy • EraSysBio+ • Lungsys and SBCancer • Virtual Liver

  49. New Considerations • Eukaryotic organisms • Interactions between host and pathogen • Human disease • multicellular interactions, tissues, organs • multiscale modelling

  50. Outstanding Issues Keeping data at project sites has responsibilities • Reliability - Sites available continuously and promptly • Support - Must be proof against virus attacks, etc. • Archiving - Beyond the lifetime of the project.

More Related