
SysMo-DB: Towards “just enough” data exchange for the SysMO Consortium

Carole Goble, Uni of Manchester, UK; Jacky Snoep, Uni of Manchester, UK / Stellenbosch, South Africa; Isabel Rojas, EML Research gGmbH, Germany. Pan-European collaboration: Systems Biology of Microorganisms.

Presentation Transcript


  1. SysMo-DB: Towards “just enough” data exchange for the SysMO Consortium • Carole Goble, Uni of Manchester, UK • Jacky Snoep, Uni of Manchester, UK / Stellenbosch, South Africa • Isabel Rojas, EML Research gGmbH, Germany

  2. Pan-European collaboration • Systems Biology of Microorganisms (SysMO) • The transition from growing to non-growing Bacillus subtilis cells • Energy and Saccharomyces cerevisiae • Biology of Clostridium acetobutylicum • Gene interaction networks and models of cation homeostasis in Saccharomyces cerevisiae • http://www.sysmo.net

  3. Eleven individual projects, 91 institutes • Different research outcomes • A cross-section of microorganisms, incl. bacteria, archaea and yeast • Record and describe the dynamic molecular processes occurring in microorganisms comprehensively • Present these processes as computerized mathematical models • Pool research capacities and know-how • Running since April 2007 • Runs for 3-5 years • http://www.sysmo.net • Projects: BaCell-SysMO, COSMIC, SUMO, KOSMOBAC, SysMO-LAB, PSYSMO, Valla, MOSES, TRANSLUCENT, STREAM, SulfoSYS

  4. The Problem • No single concept of experimentation or modelling • No planned, shared infrastructure for pooling

  5. Own solutions: own data solutions and collaboration environments (wikis, e-Groupware, PHProjekt, BaseCamp, PLONE, Alfresco, bespoke commercial …), files and spreadsheets • Suspicion: suspicion and caution over sharing; an interesting interplay between modellers, experimentalists and bioinformaticians • Data issues: many groups do not have data, do not follow the standards that exist, or do not know who is doing what; much of the data cannot be compared (different organisms, different strains) • Resource issues: no extra resources for the consortiums; 91 institutes, 11 consortiums, some overlapping

  6. SysMO-DB: started July 2008; 3 years; 3+3 people; 3 teams over 3 sites • Sensitively retrofit a data access, model handling and data integration platform • Support and manage the diversity of data, models and competencies • Web-based solution for: exchange of data, models and processes (intra- and inter-consortia); search for data, models and processes across the initiative; dissemination of results

  7. Principles • A series of small victories: low-hanging fruit and early wins • Realistic: ease real pressure points and concerns • Don't reinvent (1): borrow, link up and spread around what the consortiums already have • Don't reinvent (2): use what is already available in the open community and off the shelf • Sustainable: flexible, extensible and open • Migrate to standards: encourage standards adoption

  8. [Diagram: “minimum exchange” between modellers, experimentalists and bioinformaticians]

  9. Social Approach • Questionnaires • Projects ranked Bronze, Silver, Gold and Platinum • PALS: 18 postdocs and PhD students, covering all three kinds of people; our design and technical collaboration team; very intense face-to-face and virtual collaboration; UK and Continental PALS chapters • Audits and sharing: methods, data, models, standards, software, schemas, spreadsheets, SOPs…

  10. Technical Approach [Diagram: the SysMO-SEEK web interface and SysMO-DB, with assets and yellow pages catalogues, connecting models (JWS Online), experimental data (consortium datasets, spreadsheets, public datasets) and processes (SOPs, workflows)]

  11. Discovery: SysMO-SEEK • Single, web-based access point • Single sign-on, access control and version management • Single search point over the yellow pages and assets catalogue: people, expertise, SOPs, equipment; metadata about data (spreadsheets and databases); models (JWS Online), workflows (myExperiment), public web services (BioCatalogue) • Calls out to external resources (e.g. PubMed) • Does not hold results; holds metadata on results and links to them (pilot: COSMIC consortium) • A component for SysMO groups to incorporate in their own environments and applications

  12. SysMO-SEEK (20 questions) • Is any group generating kinetic data? • Is this data available? • Who is working with which organism? • What methods are being used to determine enzyme activity? • Under which experimental conditions are my partners measuring glucose concentration?
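Because SysMO-SEEK holds metadata and links rather than the results themselves, a question like “Who is working with which organism?” can be answered from catalogue records alone. The sketch below illustrates that idea under stated assumptions: the `CatalogueEntry` record and the `search` function are hypothetical, and the real SEEK data model is not described in the talk.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CatalogueEntry:
    """Hypothetical yellow-pages record: who works on what, plus a link to the data.

    The catalogue stores metadata only; the results stay at the owning site.
    """
    person: str
    project: str
    organisms: set[str] = field(default_factory=set)
    methods: set[str] = field(default_factory=set)
    link: str = ""


def search(entries: list[CatalogueEntry],
           organism: Optional[str] = None,
           method: Optional[str] = None) -> list[CatalogueEntry]:
    """Return the entries matching every criterion given (case-insensitive)."""
    def has(values: set[str], wanted: Optional[str]) -> bool:
        return wanted is None or wanted.lower() in {v.lower() for v in values}

    return [e for e in entries if has(e.organisms, organism) and has(e.methods, method)]


# Illustrative entries only; names, projects and links are made up.
catalogue = [
    CatalogueEntry("Researcher A", "project-1", {"Bacillus subtilis"},
                   {"growth curve"}, "https://example.org/a"),
    CatalogueEntry("Researcher B", "project-2", {"Saccharomyces cerevisiae"},
                   {"enzyme activity assay"}, "https://example.org/b"),
]

# "Who is working with which organism?"
for hit in search(catalogue, organism="saccharomyces cerevisiae"):
    print(hit.person, hit.project, hit.link)
```

Returning links instead of data mirrors the “does not hold results” design: the owning group keeps control of the dataset itself.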

  13. Models: publish, manage, run and validate SBML models • JWS Online: database of curated models and a model simulator; web-service enabled to run from workflows; separate password-protected websites for each project • Through SEEK: a special instance of JWS Online for SysMO; validate and run models from SysMO-SEEK and publish later; access control as for other assets; access to other resources (BioModels, COPASI) • Semantic SBML from the TRANSLUCENT project • SBML and MIRIAM education

  14. Experimental Processes • Protocols and SOPs • SOP assets deposited or linked to • SOP gathering • Nature Protocols format recommended • High-level classification for indexing and tagging • Got a few, need more.

  15. Experimental Processes: Nature Protocols template • Template sections: Protocol Title, Authors, Keywords, Abstract, Materials, Reagents, Reagent Set Up, Equipment, Time Taken, Procedure, Troubleshooting, Critical Steps, Anticipated Results, References
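When SOPs are gathered in the recommended Nature Protocols format, a simple completeness check can tell a depositor which sections are still empty. The sketch below takes the section headings from the slide; representing a draft SOP as a heading-to-text dictionary, and both helper functions, are assumptions for illustration.

```python
# Section headings as listed on the slide (Nature Protocols-style template).
SOP_SECTIONS = (
    "Protocol Title", "Authors", "Keywords", "Abstract", "Materials",
    "Reagents", "Reagent Set Up", "Equipment", "Time Taken", "Procedure",
    "Troubleshooting", "Critical Steps", "Anticipated Results", "References",
)


def missing_sections(sop: dict[str, str]) -> list[str]:
    """Return template sections that are absent or blank, in template order."""
    return [name for name in SOP_SECTIONS if not sop.get(name, "").strip()]


def unknown_sections(sop: dict[str, str]) -> list[str]:
    """Return headings in the draft that the template does not define."""
    return sorted(set(sop) - set(SOP_SECTIONS))


# Hypothetical draft SOP with only a few sections filled in.
draft = {"Protocol Title": "Glucose assay", "Authors": "Lab X", "Procedure": "Step 1: ..."}
print(missing_sections(draft))
```

A check like this supports “got a few, need more” by making partial deposits acceptable while showing what is still missing.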

  16. Experimental Processes Deposition

  17. Bioinformatics Processes: Workflows • Automated, repeatable and shareable specifications for linking and running multiple computational tasks • Transparent provenance log of execution and results • Chaining together distributed analysis tools and data sources: annotation pipelines, data analysis pipelines, text mining, data integration, simulation sweeps • SBML model construction and population • Data sets and tools accessible to a workflow engine: Web Services, R scripts, BioMart, Java libraries, Grid Services (MATLAB in beta) • Workflow management system: free and open source
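The two workflow properties named above, chaining tasks and keeping a provenance log, can be illustrated in a few lines. This is a toy sketch, not how Taverna works internally: each step is a plain Python function, and the log records the step name, a UTC start time and short fingerprints of its input and output (the fingerprinting scheme is an assumption chosen for illustration).

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Any, Callable


def _digest(value: Any) -> str:
    """Short, stable fingerprint of a JSON-serialisable value."""
    blob = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()[:12]


def run_workflow(steps: list[tuple[str, Callable[[Any], Any]]], data: Any):
    """Run steps in order, piping each output into the next step.

    Returns the final result and a provenance log with one entry per step.
    """
    log = []
    for name, fn in steps:
        started = datetime.now(timezone.utc).isoformat()
        result = fn(data)
        log.append({
            "step": name,
            "started": started,
            "input": _digest(data),
            "output": _digest(result),
        })
        data = result
    return data, log


# Toy pipeline over made-up OD readings: blank-correct, drop negatives, average.
steps = [
    ("blank_correct", lambda ods: [round(od - 0.05, 3) for od in ods]),
    ("drop_negative", lambda ods: [od for od in ods if od >= 0]),
    ("mean", lambda ods: sum(ods) / len(ods)),
]
result, provenance = run_workflow(steps, [0.04, 0.25, 0.45])
```

Because each step's output fingerprint equals the next step's input fingerprint, the log can be checked afterwards to confirm that a result really came from the recorded chain.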

  18. Manipulation of SBML models in workflows • libSBML: data integration, and construction and annotation of SBML models
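The slide names libSBML for building SBML models inside workflows. To keep the example runnable without libSBML installed, the sketch below swaps in Python's standard `xml.etree.ElementTree` to write a minimal SBML Level 2 Version 4 document and populate its species from measured concentrations. It does not validate the model or add MIRIAM annotations, both of which libSBML supports; the function name and example ids are hypothetical.

```python
import xml.etree.ElementTree as ET

SBML_NS = "http://www.sbml.org/sbml/level2/version4"


def build_sbml(model_id: str, compartment: str,
               concentrations: dict[str, float]) -> str:
    """Build a minimal SBML Level 2 Version 4 model populated from measured data.

    `concentrations` maps species id -> initial concentration.
    """
    # Declare the SBML namespace as the default namespace on the root element.
    sbml = ET.Element("sbml", {"xmlns": SBML_NS, "level": "2", "version": "4"})
    model = ET.SubElement(sbml, "model", {"id": model_id})
    compartments = ET.SubElement(model, "listOfCompartments")
    ET.SubElement(compartments, "compartment", {"id": compartment, "size": "1"})
    species_list = ET.SubElement(model, "listOfSpecies")
    for species_id, value in concentrations.items():
        ET.SubElement(species_list, "species", {
            "id": species_id,
            "compartment": compartment,
            "initialConcentration": repr(float(value)),
        })
    return ET.tostring(sbml, encoding="unicode")


print(build_sbml("demo_model", "cell", {"glucose": 5, "atp": 2.5}))
```

In a workflow, the concentrations would come from an upstream data step, which is the “construction and population” pattern listed on the previous slide.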

  19. Already in use by individual groups for research • Ramp up when more data resources become workflow-accessible • Libraries of SysMO workflows

  20. Experimental Data Comparison and Exchange (myDB) • Public data sources: model organism databases (e.g. SGD), BRENDA … • Data produced by SysMO: SABIO-RK, iChiP, MeMo … • Local databases and files: remain at the sites and stay under the groups' control • Excel spreadsheets: the most common experimental data format; a SEEK repository asset [Diagram labels: myDB, BRENDA, SABIO-RK, metadata, my spreadsheet]

  21. Just Enough Results Model (JERM) • Minimum metadata for SysMO exchange: what an experiment is • Extract metadata from datasets for the assets catalogue (exchange) • Ontologies and controlled vocabularies for annotation • Expose data results through a JERM interface (access) • Access controlled by consortiums, groups and individuals • Harvesting standards, current practice, and consortium schemas and spreadsheets • Inspired by the MCISB Key Results initiative and SBRML [Paton] [Diagram: SysMO-SEEK access control over a JERM web service access interface and a JERM extractor and access wrapper around myDB, BRENDA, SABIO-RK and spreadsheet metadata]
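Access “controlled by consortiums, groups and individuals” can be read as a per-asset policy with grants at each of those levels. The sketch below is one possible interpretation, not SEEK's actual model: `User`, `SharingPolicy` and `can_view` are hypothetical, and access is granted if any level matches (public, owner, named individual, shared group, or consortium).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class User:
    """Hypothetical user: one consortium, any number of groups."""
    name: str
    consortium: str
    groups: frozenset[str] = frozenset()


@dataclass
class SharingPolicy:
    """Hypothetical per-asset policy with grants at three levels plus a public flag."""
    owner: str
    consortia: set[str] = field(default_factory=set)
    groups: set[str] = field(default_factory=set)
    individuals: set[str] = field(default_factory=set)
    public: bool = False


def can_view(user: User, policy: SharingPolicy) -> bool:
    """Grant access if any level matches: public, owner, individual, group or consortium."""
    return (
        policy.public
        or user.name in {policy.owner} | policy.individuals
        or bool(user.groups & policy.groups)
        or user.consortium in policy.consortia
    )


# Illustrative users; names and memberships are made up.
alice = User("alice", "COSMIC", frozenset({"lab-a"}))
bob = User("bob", "MOSES", frozenset({"lab-b"}))
print(can_view(alice, SharingPolicy(owner="carol", groups={"lab-a"})))
```

An OR over levels keeps the rule easy to explain to depositors; a real system might also need explicit denials, which this sketch does not model.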

  22. JERM First Cut • General: what type of data is it (microarray, growth curve, enzyme activity…)? What was measured (gene expression, OD, metabolite concentration…)? What do the values in the dataset mean (units, time series, repeats…)? • Data-type specific: each data type has a different “minimal model”; Phase 1: microarray and metabolomics; careful mapping to the MIBBI standards (e.g. MIAME) • Experiment binding: each individual result set is bound to an experiment/investigation for exchange across different types of data
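The three layers of the first-cut JERM (general questions, a data-type-specific minimal model, and experiment binding) can be pictured as a single record plus a check for whether it is “just enough” to exchange. The field names below are hypothetical; only the Phase 1 data types come from the slide, and the per-type fields are left as an open dictionary rather than guessing at the MIBBI mappings.

```python
from dataclasses import dataclass, field
from typing import Optional

# Phase 1 data types named on the slide.
PHASE1_TYPES = {"microarray", "metabolomics"}


@dataclass
class JermRecord:
    """Hypothetical 'just enough' description of one result set."""
    data_type: str        # what type of data is it? e.g. "microarray"
    measured: str         # what was measured? e.g. "gene expression"
    units: str            # what do the values mean? e.g. "mM"
    time_series: bool
    replicates: int
    experiment_id: Optional[str] = None  # binding to an experiment/investigation
    type_specific: dict[str, str] = field(default_factory=dict)  # per-type minimal model


def problems(rec: JermRecord) -> list[str]:
    """List what stops this record from being exchanged; empty means 'just enough'."""
    issues = [f"missing {name}" for name in ("data_type", "measured", "units")
              if not getattr(rec, name).strip()]
    if rec.replicates < 1:
        issues.append("replicates must be >= 1")
    if not rec.experiment_id:
        issues.append("not bound to an experiment")
    if rec.data_type.strip().lower() not in PHASE1_TYPES:
        issues.append(f"no phase-1 minimal model for {rec.data_type!r}")
    return issues
```

For example, a growth curve record with no units, no replicates and no experiment would report all four problems, while a bound microarray record with units and replicates reports none.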

  23. Local spreadsheet repository • User's local file store • Controlled deposit in the spreadsheet repository • Corresponding JERM schema • SysMO-SEEK assets catalogue • Controlled vocabulary plug-in for tagging • XML metadata: file metadata and information about what is measured • Source and sink for workflows
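The deposit step above turns a spreadsheet into XML metadata that the assets catalogue can index, tagging columns with controlled vocabulary terms. A minimal sketch, assuming the spreadsheet has been exported to CSV and that the vocabulary is a simple lowercase-column-name to term mapping; the element names (`asset`, `measured`) are invented here and are not the JERM schema.

```python
import csv
import io
import xml.etree.ElementTree as ET


def extract_metadata(filename: str, csv_text: str, vocabulary: dict[str, str]) -> str:
    """Turn a spreadsheet's CSV export into XML metadata for the assets catalogue.

    Only the header and a row count are kept; the data itself stays with the group.
    Column names found in `vocabulary` are tagged with a controlled term.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError(f"{filename}: empty file")
    header, data = rows[0], rows[1:]
    asset = ET.Element("asset", {"file": filename, "rows": str(len(data))})
    for column in header:
        attrs = {"column": column}
        term = vocabulary.get(column.strip().lower())
        if term is not None:
            attrs["term"] = term
        ET.SubElement(asset, "measured", attrs)
    return ET.tostring(asset, encoding="unicode")


# Hypothetical vocabulary and a made-up growth-curve sheet.
vocab = {"od600": "optical density", "glucose (mm)": "glucose concentration"}
sheet = "time (h),OD600,Glucose (mM)\n0,0.05,20\n1,0.09,18\n"
print(extract_metadata("growth.csv", sheet, vocab))
```

Untagged columns (here the time column) are still listed, so a curator can see which headers the vocabulary does not yet cover.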

  24. JERM Exchange Pilot, Spring 2009 • BaCell-SysMO, COSMIC, SysMO-LAB, MOSES • “20 questions”

  25. Discovery, Access, Annotation & Collaboration [Diagram: SysMO-SEEK (access control, results cache, yellow pages, assets) integrating Taverna workflows, BioCatalogue, myExperiment, JWS Online models, SABIO-RK, SysMO data and external resources through web service access interfaces, a JERM extractor and wrapper, and per-resource metadata]

  26. Related initiatives and sources • OpenWetWare • Cold Spring Harbor Protocols • MIBBI • National Center for Biomedical Ontology • OBO Foundry • WikiPathways • Pathway Commons • StrainInfo • ONDEX • PubMed

  27. Training and Know-how • SysMO-DB: training on databases, models, workflow systems and web services, and best practice for annotating resources with metadata; kick-starting toolkits, workflows and SOP templates; summer schools • SysMO consortium (esp. PALS): social networking for shared content, know-how and best practice; contribution; best-of-breed solutions already in place

  28. Summary • SysMO-DB is an exercise in: • Sensitively retrofitting a data access, model handling and data integration platform. • Supporting the diversity of data, models and competencies • Social mediation and manipulation • Towards Just Enough™ exchange

  29. Acknowledgements • SysMO-DB Team • SysMO-PALS • myGrid, EML and JWS Online teams • OMII-UK, Uni Southampton • EBI, MCISB

  30. Links • myExperiment: http://www.myexperiment.org • Taverna: http://www.mygrid.org.uk • JWS Online: http://jjj.biochem.sun.ac.za/ • SABIO-RK: http://sabio.villa-bosch.de/
