
SysMO-DB: Towards “just enough” data exchange for the SysMO Consortium

SysMO-DB: Towards “just enough” data exchange for the SysMO Consortium. Katy Wolstencroft, University of Manchester, UK. Systems Biology of Microorganisms (http://www.sysmo.net): a pan-European collaboration of eleven individual projects and 91 institutes, with different research outcomes.




Presentation Transcript


  1. SysMO-DB: Towards “just enough” data exchange for the SysMO Consortium • Katy Wolstencroft, University of Manchester, UK

  2. Systems Biology of Microorganisms (http://www.sysmo.net) • Pan-European collaboration • Eleven individual projects, 91 institutes • Different research outcomes • A cross-section of microorganisms, incl. bacteria, archaea and yeast • Record and describe the dynamic molecular processes occurring in microorganisms in a comprehensive way • Present these processes in the form of computerized mathematical models • Pool research capacities and know-how • Running since April 2007 • Runs for 3-5 years

  3. The Problem • No single concept of experimentation or modelling • No planned, shared infrastructure for pooling

  4. SysMO-DB • Started July 2008; 3 years; 3 staff + 3 investigators; 3 teams over 3 sites • Sensitively retrofit a data access, model handling and data integration platform • Support and manage the diversity of data, models and competencies • Web-based solution: • exchange of data, models and processes (intra- and inter-consortia) • search for data, models and processes across the initiative • dissemination of results

  5. • Own solutions: own data solutions and collaboration environments (Wikis, e-Groupware, PHProjekt, BaseCamp, PLONE, Alfresco, bespoke commercial … files and spreadsheets) • Suspicion: suspicion and caution over sharing; interesting interplay between modellers, experimentalists and bioinformaticians • Data issues: many do not follow the standards that exist or know who is doing what • Resource issues: no extra resources for the consortia; 91 institutes, 11 consortia, some overlapping

  6. Types of data • Multiple omics • genomics, transcriptomics • proteomics, metabolomics • Images • Reaction Kinetics • Models • Relationships between data sets/experiments • Procedures, experiments, data, results and models • Analysis of data • The same across many Systems Biology projects

  7. Principles… • A series of small victories • Realistic • Don’t reinvent • Sustainable and extensible • Migrate to standards • Provide instant gratification • Address doubt and anxiety • Incremental development

  8. The Lowest Hanging Fruit: A Catalogue of SysMO Assets • SysMO Yellow Pages • The people and their expertise • The institutions and their facilities • Data – experimental data sets • Data – analysed results • Data – external reference data sets • Models • Processes – laboratory protocols and bioinformatics analyses • The catalogue references assets held elsewhere

  9. Technical Approach [Architecture diagram; labels: SysMO-SEEK web interface, Assets and Yellow Pages Catalogues, SysMO DB, JERM, Data, Models, Processes]

  10. Social Approach • PALS • 21 postdocs and PhD students • Experimentalists, modellers and bioinformaticians • Our design and technical collaboration team • Very intense face-to-face and virtual collaboration • UK and Continental PALS Chapters • Audits and Sharing • Methods, data, models, standards, software, schemas, spreadsheets, SOPs…

  11. Communication via PALs [Diagram of the exchange between the DB team, the PALS and the Projects; arrow labels: Show what is there; Suggest what is possible; Ask for requirements; Double check; Transmit; Disseminate; Give requirements; Tell priorities; Rate outcomes; Suggest improvements; Collect answers]

  12. Discovery: SysMO-SEEK • Single, web-based access point • Access control & versioning management • Yellow pages (“who is who”) • People, Expertise, Equipment • Assets catalogue (“who has what”) • SOPs, Spreadsheets, pre-published models • Metadata about Data held by projects • Access to other repositories • Models (JWS Online) • Workflows (myExperiment) • Public web services (BioCatalogue) • Call out to external resources • e.g. PubMed • Does not hold data and results • Holds metadata on results and links to results • A component for SysMO groups to incorporate in their own environments and applications
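The idea that the catalogue holds metadata and links rather than the data itself can be sketched in a few lines of Python. This is only an illustration, not the SysMO-SEEK implementation: the `CatalogueEntry` class, its field names, the `search` helper and the example.org locations are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CatalogueEntry:
    """Metadata about an asset; the asset itself stays where the project keeps it."""
    title: str
    kind: str       # e.g. "SOP", "spreadsheet", "model" (illustrative values)
    owner: str
    location: str   # link to the place where the asset is actually held
    keywords: List[str] = field(default_factory=list)


def search(entries, term):
    """Return entries whose title or keywords contain the term, ignoring case."""
    needle = term.lower()
    return [
        e for e in entries
        if needle in e.title.lower()
        or any(needle in k.lower() for k in e.keywords)
    ]


# Hypothetical catalogue content; the URLs are placeholders.
catalogue = [
    CatalogueEntry("Glucose pulse growth curves", "spreadsheet", "Lab A",
                   "https://example.org/lab-a/growth.xls", ["growth curve", "OD"]),
    CatalogueEntry("RNA extraction SOP", "SOP", "Lab B",
                   "https://example.org/lab-b/rna-sop.pdf", ["transcriptomics"]),
]

hits = search(catalogue, "growth")
for entry in hits:
    print(entry.title, "->", entry.location)
```

A search only ever touches the metadata; following `location` is what takes the user to the project's own store.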

  13. Sharing Policies • Default private until you say otherwise • Project defaults • Private • Share with the group • Share with project • Share with SysMO
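One way to read these sharing levels is as a small access check, sketched below in Python. It assumes that an asset without an explicit setting falls back to its project's default, and that the fallback is private when no project default exists. The `Sharing`, `User` and `Asset` names and the rule that "group" means same project and same group are assumptions for illustration, not the SEEK data model.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class Sharing(Enum):
    PRIVATE = "private"
    GROUP = "group"
    PROJECT = "project"
    SYSMO = "sysmo"


@dataclass
class User:
    name: str
    group: str
    project: str


@dataclass
class Asset:
    title: str
    owner: User
    sharing: Optional[Sharing] = None  # None means "use the project default"


def effective_sharing(asset: Asset, project_defaults: Dict[str, Sharing]) -> Sharing:
    """Explicit setting wins; otherwise the project default; otherwise private."""
    if asset.sharing is not None:
        return asset.sharing
    return project_defaults.get(asset.owner.project, Sharing.PRIVATE)


def can_view(user: User, asset: Asset, project_defaults: Dict[str, Sharing]) -> bool:
    level = effective_sharing(asset, project_defaults)
    owner = asset.owner
    if user == owner or level is Sharing.SYSMO:
        return True
    if level is Sharing.PROJECT:
        return user.project == owner.project
    if level is Sharing.GROUP:
        return user.project == owner.project and user.group == owner.group
    return False  # PRIVATE: owner only


# Hypothetical users and assets.
alice = User("alice", group="lab-1", project="P1")
bob = User("bob", group="lab-2", project="P1")
carol = User("carol", group="lab-9", project="P2")
defaults = {"P1": Sharing.PROJECT}

draft = Asset("draft model", owner=alice, sharing=Sharing.PRIVATE)
dataset = Asset("growth data", owner=alice)  # falls back to the P1 default
```

With these values `bob` can see `dataset` through the project default but not `draft`, and `carol`, who is in another project, can see neither.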

  14. “Just Enough” Exchange of SysMO Assets

  15. Experimental Processes • Protocols and SOPs • Nature Protocols format recommendation • You can upload Protocols in any format, but if you use this one, we will index it and make searching easier • Encouraging standardisation • Protocol fields: Title, Authors, Keywords, Abstract, Materials, Reagents, Reagent Set Up, Equipment, Time Taken, Procedure, Troubleshooting, Critical Steps, Anticipated Results, References
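The "any format is accepted, the recommended one gets indexed" policy could look roughly like the Python sketch below. It assumes a plain-text protocol in which each recommended field name sits on its own line as a heading, and it reads the first field on the slide as "Title". The `index_protocol` and `missing_fields` helpers are hypothetical and are not part of SEEK.

```python
# Field names taken from the slide; treating the first one as "Title" is an assumption.
PROTOCOL_FIELDS = [
    "Title", "Authors", "Keywords", "Abstract", "Materials", "Reagents",
    "Reagent Set Up", "Equipment", "Time Taken", "Procedure",
    "Troubleshooting", "Critical Steps", "Anticipated Results", "References",
]


def index_protocol(text):
    """Split a plain-text protocol into sections keyed by the recommended field names.

    A line that equals a field name (optionally followed by ':') starts a section.
    Text before the first recognised heading is ignored.
    """
    lookup = {name.lower(): name for name in PROTOCOL_FIELDS}
    sections = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        key = stripped.rstrip(":").lower()
        if key in lookup:
            current = lookup[key]
            sections[current] = []
        elif current is not None and stripped:
            sections[current].append(stripped)
    return {name: " ".join(lines) for name, lines in sections.items()}


def missing_fields(sections):
    """Recommended fields that the uploaded protocol does not contain."""
    return [name for name in PROTOCOL_FIELDS if name not in sections]


# Hypothetical upload.
sample = "\n".join([
    "Title",
    "Yeast RNA extraction",
    "Keywords:",
    "RNA, yeast",
    "Procedure",
    "1. Harvest cells.",
    "2. Add buffer.",
])

indexed = index_protocol(sample)
print(indexed)
print("Not yet provided:", missing_fields(indexed))
```

A protocol with no recognised headings simply yields an empty index, so it can still be uploaded and catalogued, just without field-level search.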

  16. Bioinformatics Processes: Workflows • Data preparation, annotation and analysis pipelines • SBML model construction and population • Linking together Data sets, Web Services, R scripts, BioMART, Java libraries, Grid Services, (MATLAB in beta) • Workflows as a mechanism for linking inside SEEK • Workflow Management System: free and open source

  17. Libraries of SysMO workflows

  18. Models • SBML is the recommended format • Not all models are SBML • JWS Online allows storing and simulation of SBML models • But all models need to be shared • JWS Online doesn’t have version and access control • Models can be shared in SEEK instead of directly in JWS Online • Can still connect to JWS Online and run simulations

  19. Models • JWS Online: a database of curated models and a model simulator • Web service enabled to run from workflows • Used and accessed through SEEK… • Special instance of JWS Online for SysMO • Store, validate and run models from SysMO-SEEK and publish later • Access to other model resources • BioModels, COPASI and semanticSBML

  20. Data Comparison and Exchange [Figure labels: Microarray, Metadata, Metabolomics, Proteomics, Single Cell Data] • Public data sources • model organism databases (e.g. SGD) • BRENDA … • Data produced by SysMO • SABIO-RK, iChiP, MeMo … • Local databases & files • Excel spreadsheets • The most common format for experimental data • Variable descriptions of data • Little adoption of community controlled vocabulary terms

  21. JERM • JERM: “Just Enough Results Model” • Minimum information to exchange data • What type of data is it • Microarray, growth curve, enzyme activity… • What was measured • Gene expression, OD, metabolite concentration… • What do the values in the datasets mean • Units, time series, repeats… • Which experiment does it relate to • How was the data created • SOPs and protocols • Harvesting standards, current practice and consortium schemas and spreadsheets • Inspired by MCISB Key Results initiative and SBRML [Paton]
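The five JERM questions map naturally onto a small record type. The Python sketch below is one possible reading, not the JERM schema: the `JermRecord` class, its attribute names and the `missing_minimum` check are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class JermRecord:
    """One field per question on the slide; names are illustrative only."""
    data_type: str = ""    # what type of data is it (microarray, growth curve, ...)
    measured: str = ""     # what was measured (gene expression, OD, ...)
    value_meaning: str = ""  # what the values mean (units, time series, repeats)
    experiment: str = ""   # which experiment it relates to
    created_by: str = ""   # how the data was created (SOP or protocol reference)
    extra: Dict[str, str] = field(default_factory=dict)  # anything beyond the minimum


REQUIRED = ["data_type", "measured", "value_meaning", "experiment", "created_by"]


def missing_minimum(record: JermRecord) -> List[str]:
    """Names of the minimum-information fields that are still empty."""
    return [name for name in REQUIRED if not getattr(record, name).strip()]


# Hypothetical record with one question left unanswered.
record = JermRecord(
    data_type="growth curve",
    measured="OD",
    value_meaning="OD600, time series in minutes, 3 repeats",
    experiment="glucose pulse 1",
)
print(missing_minimum(record))
```

Keeping everything outside the five questions in `extra` reflects the "just enough" intent: a record is exchangeable once the minimum is filled in, and richer metadata stays optional.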

  22. The Idea • Step 1: For each data type… • Transcriptomics, Proteomics, Metabolomics, Single Cell Data • Step 2: Define a JERM… (ISA-TAB) • Top-down analysis of standards • Bottom-up analysis of practice • Step 3: Generate and apply… • JERM template • JERM extractor for data host • Subset registered in SEEK • Access / export through JERM interface / template

  23. JERM Adaptors [Diagram: JERM adaptors in front of project data stores; labels: SysMOLab (Wiki), COSMIC (Alfresco), MOSES (Wiki), BaCell-SysMO (Alfresco), another project (a data store)]

  24. Just Enough Results Model Tools [Diagram labels: Access Control; JERM Web Service Access Interface; JERM Extractor and Access Wrapper Layer; JERM Template; Source Access and Harvester; Source Extractor; sources: BRENDA, SABIO-RK, Metadata, myDB, mySpreadSheet] • JERM Source Extractor Generator • New spreadsheets adopt JERM templates • Legacy spreadsheet JERM mapper • Databases have JERM mapper • Spreadsheet Ontology Annotator • Restrict the values that a range of fields can have
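A legacy spreadsheet mapper combined with value restriction could be sketched as follows in Python, using CSV text as a stand-in for a spreadsheet. The column mapping, the target term names and the allowed strain values are invented for illustration and do not come from a JERM template.

```python
import csv
import io

# Hypothetical mapping from legacy column headers (lower-cased) to shared terms.
COLUMN_MAP = {
    "time (min)": "time_min",
    "od600": "optical_density",
    "strain": "strain",
}

# Hypothetical controlled vocabulary: fields listed here only accept these values.
ALLOWED_VALUES = {
    "strain": {"wild type", "mutant"},
}


def map_rows(csv_text):
    """Rename known columns and report values outside the controlled vocabulary.

    Returns (mapped_rows, problems); each problem is (row_number, term, value).
    Unmapped columns are dropped rather than guessed at.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    mapped, problems = [], []
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        out = {}
        for header, value in row.items():
            if header is None or value is None:
                continue  # ragged row: skip cells without a header or value
            term = COLUMN_MAP.get(header.strip().lower())
            if term is None:
                continue
            value = value.strip()
            allowed = ALLOWED_VALUES.get(term)
            if allowed is not None and value.lower() not in allowed:
                problems.append((row_number, term, value))
            out[term] = value
        mapped.append(out)
    return mapped, problems


SAMPLE = "Time (min),OD600,Strain,Notes\n0,0.05,wild type,\n30,0.11,WT,check\n"

rows, issues = map_rows(SAMPLE)
print(rows)
print(issues)
```

Reporting out-of-vocabulary values instead of rejecting the row keeps the original data intact while still flagging what needs curation.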

  25. SEEK + JERM [Diagram labels: JERM; Experimental Data; Metadata; People; Projects; Investigation; Study; Assay; Models; Experimental conditions; SOPs; Factors studied; Workflows] • Homogenised terminology and values in the datasets themselves • Based on ISA-TAB

  26. Incremental Annotation Metadata can be added to assets at any time • Extracted from JERM templates • Added by the data owner through SEEK • Added by another SysMO consortium member with editing permission
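The three annotation routes can be illustrated with a small permission-checked update function in Python. This is a sketch under assumptions: the `AssetRecord` class and the `annotate` and `apply_template` helpers are hypothetical, and attributing template-extracted metadata to the asset owner is a choice made here for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class AssetRecord:
    title: str
    owner: str
    editors: Set[str] = field(default_factory=set)  # members granted editing permission
    metadata: Dict[str, str] = field(default_factory=dict)
    history: List[Tuple[str, str, str]] = field(default_factory=list)  # (who, key, value)


def annotate(asset: AssetRecord, who: str, key: str, value: str) -> None:
    """Add or update one metadata item if `who` is the owner or a permitted editor."""
    if who != asset.owner and who not in asset.editors:
        raise PermissionError(f"{who} may not edit {asset.title!r}")
    asset.metadata[key] = value
    asset.history.append((who, key, value))


def apply_template(asset: AssetRecord, extracted: Dict[str, str]) -> None:
    """Record metadata extracted from a template, attributed to the owner (assumption)."""
    for key, value in extracted.items():
        annotate(asset, asset.owner, key, value)


# Hypothetical asset annotated in three separate steps over time.
asset = AssetRecord("growth data", owner="alice", editors={"bob"})
apply_template(asset, {"data_type": "growth curve"})   # extracted from a template
annotate(asset, "alice", "units", "OD600")             # added by the data owner
annotate(asset, "bob", "organism", "S. cerevisiae")    # added by a permitted member
print(asset.metadata)
```

Because each call adds to `metadata` without requiring the record to be complete, an asset can be registered first and described more fully later.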

  27. In Practice for Spreadsheets [Diagram labels: Native spreadsheet, JERM Template, JERMed spreadsheet]

  28. Now [Diagram labels: browse, search, Register, Extract, Matched to the JERM, Adding metadata, Whole record]

  29. Near future [Diagram labels: JERM, browse, search, Register, Extract, Matched to the JERM, Adding metadata here, Whole record, Filtered record, Enriched record]

  30. Future [Diagram labels: Collections of Records, browse, search, Register, Extract, Matched to the JERM, Adding metadata here, Meta-analysis]

  31. What we have done [Architecture diagram; labels: SysMO-SEEK web interface, Yellow Pages, Assets Catalogue, Search, SysMO DB, JERM, SBML, Nature Protocols, Workflow Management System, Workflow Repository, SOP Repository, Models Repository, Spreadsheet Repository, JWS Online, Consortium Data, Public data, Models, Processes (SOPs and Workflows)]

  32. Outstanding Issues • Keeping data at project sites brings responsibilities • Reliability: sites must be available continuously and promptly • Support: must be proof against virus attacks, etc. • Archiving: beyond the lifetime of the project • What happens when a project is no longer part of the SysMO consortium?

  33. Lessons • Find a solution that fits in with current practices • Start simple, show benefits, add more • Engage with the people actually doing the work • PhD students, Post-docs • Let the scientists retain control over their data and who can see it • Don’t reinvent. Use available vocabularies, minimal model standards • Help prevent people duplicating work by linking the people as well as the resources

  34. Acknowledgements • SysMO-DB Team • SysMO-PALS • myGrid, EML and JWS Online teams • OMII-UK, Uni Southampton • EMBL-EBI, MCISB
