
SysMO-DB: A Community-Based Approach to Data Sharing



Presentation Transcript


  1. SysMO-DB: A Community-Based Approach to Data Sharing • Dr Katy Wolstencroft, University of Manchester

  2. SysMO-DB • A data access, model handling and data integration platform for Systems Biology • A web-based resource that promotes shared understanding, using a common platform and common technologies • Started July 2008

  3. SysMO-DB Dev Team • University of Manchester, UK: Carole Goble, Katy Wolstencroft, Stuart Owen, Finn Bacall, Sergejs Aleksejevs • Heidelberg Institute for Theoretical Studies, Germany: Wolfgang Müller, Olga Krebs • University of Stellenbosch, South Africa, and University of Manchester, UK: Jacky Snoep, Franco B du Preez

  4. SysMO: Systems Biology of Microorganisms (http://www.sysmo.net) • Pan-European collaboration • Eleven individual projects, 89 institutes • Different research outcomes • A cross-section of microorganisms, incl. bacteria, archaea and yeast • Record and describe the dynamic molecular processes occurring in microorganisms in a comprehensive way • Present these processes in the form of computerized mathematical models • Pool research capacities and know-how • Running since April 2007 • Projects run for 3-5 years • This year, 2 new projects join and 6 leave

  5. Challenges • Heterogeneous data and models • Distributed groups of researchers • Modellers and experimentalists have different skills, training and experience • Scientists want to remain in control • Social and technical challenges

  6. Social Challenge: Focus Group • Show what is there • Suggest what is possible • Ask for requirements • Double check • Transmit • Disseminate • Give requirements • Tell priorities • Rate outcomes • Suggest improvements • Collect answers • Participants: DB team, Focus Group, Projects

  7. Focus Group: SysMO-DB PALs • 21 postdocs and PhD students • Modellers, experimentalists and bioinformaticians • Design and technical collaboration team • Intense collaboration • UK and Continental PALs Chapters • Audits and sharing: methods, data, models, standards, software, schemas, spreadsheets, SOPs… • 20 questions • Deployment into projects

  8. Technical Challenge • Rapid and incremental development • Just enough and just in time, not just in case • No reinvention • Driven by the PALs • Sustainable and extensible • Migrate to standards • Fitting in with normal lab practices

  9. What do we share • Nature Protocols template: Protocol Title, Authors, Keywords, Abstract, Materials, Reagents, Reagent Set Up, Equipment, Time Taken, Procedure, Troubleshooting, Critical Steps, Anticipated Results, References • All SysMO Assets: Methods + Data + Results

  10. What do we share • Protocols for Models template: Protocol Title, Authors, Keywords, Description, Assumptions, Equations, Numerical Methods/Algorithms, Computational Tools, Parameter Estimation Techniques, Limitations, References • All SysMO Assets: Methods + Models + Data + Results

  11. A Tree View of Assets • Investigation, Studies, Assay (ISA) • SOPs • ISA infrastructure provides a directory structure for experiments: http://isatab.sourceforge.net/ • Construction • Validation
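The Investigation, Studies, Assay hierarchy on this slide can be pictured as a small nested data structure. The following is only an illustrative sketch: the class names, the choice to attach SOPs to assays, and the example titles are assumptions made here, not the SysMO-SEEK or ISA-TAB data model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    title: str
    # Assumption: SOPs hang off assays; the slide only shows them in the tree.
    sops: List[str] = field(default_factory=list)


@dataclass
class Study:
    title: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    title: str
    studies: List[Study] = field(default_factory=list)


def render_tree(inv: Investigation) -> List[str]:
    """Return the hierarchy as indented lines, one asset per line."""
    lines = [f"Investigation: {inv.title}"]
    for study in inv.studies:
        lines.append(f"  Study: {study.title}")
        for assay in study.assays:
            lines.append(f"    Assay: {assay.title}")
            for sop in assay.sops:
                lines.append(f"      SOP: {sop}")
    return lines


# Hypothetical content, used only to show the shape of the tree.
example = Investigation(
    "Example investigation",
    [
        Study("Study A", [Assay("Example assay", ["Example SOP"])]),
        Study("Study B"),
    ],
)

if __name__ == "__main__":
    print("\n".join(render_tree(example)))
```

Keeping each level as its own type mirrors the directory-like layout the slide describes: an asset is always found by walking down from its investigation.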

  12. Expertise, tools Coordinates, data

  13. How do we share: the “Just Enough Results Model” (JERM) • What type of data is it? Microarray, growth curve, enzyme activity… • What was measured? Gene expression, OD, metabolite concentration… • What do the values in the datasets mean? Units, time series, repeats… • Based on minimum information models (e.g. MIAME, MIAPE, MIRIAM) and biological ontologies (e.g. Gene Ontology, MGED, SBO) • BioPortal web service used in SysMO-SEEK for concept lookup and visualisation
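The three JERM questions (what type of data, what was measured, what the values mean) can be read as a minimal metadata record. The sketch below is hypothetical: the field names, the choice of which fields are required, and the example values are assumptions for illustration and do not come from the actual JERM templates.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class JermRecord:
    data_type: str        # what type of data is it (e.g. "growth curve")
    measured: str         # what was measured (e.g. "OD")
    units: str            # what the values mean: units
    is_time_series: bool  # what the values mean: time series or not
    repeats: int          # what the values mean: number of repeats


# Assumption: these three text fields are the "just enough" minimum.
REQUIRED_TEXT_FIELDS = ("data_type", "measured", "units")


def missing_fields(record: JermRecord) -> List[str]:
    """List required text fields that were left blank."""
    return [
        name
        for name in REQUIRED_TEXT_FIELDS
        if not getattr(record, name).strip()
    ]


if __name__ == "__main__":
    record = JermRecord("microarray", "gene expression", "", False, 3)
    print(missing_fields(record))
```

A check like this reports gaps rather than rejecting the dataset, which is in keeping with asking for just enough description.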

  14. How do we share • Share JERM templates developed by SysMO-DB, PALs and consortium • Spreadsheet templates • Database Schemas • Encourage uptake throughout SysMO • transcriptomics • metabolomics • proteomics etc….

  15. Tools to help manage data: annotation standards by stealth • Controlled vocabulary plug-in • BioPortal

  16. JERM Model • SysMO JERM: a ‘MIBBI’ for the SysMO-SEEK • What do we need to help you find stuff? Title, person, filename, class • What is experiment specific? • What is experiment specific, but helps us map between them? Common biological elements: chemicals, genes, proteins, organisms, strains

  17. Identifying Biological Objects • What do you have in your data? Proteins/enzymes, genes/expression levels, metabolites • Where/how do these objects interact? Pathways, flux, experimental conditions • What models describe these interactions? • Possible when using common frameworks, naming schemes and controlled vocabularies

  18. Following Standards • We recommend formats but we do not enforce them • Protocols and SOPs: Nature Protocols • Data: JERM models and community minimum information models • Models: SBML and related standards • Publications: PubMed and DOI • If you follow the prescribed formats, you get more out, but if you don’t, you can still participate • Lowering the adoption barrier
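The "recommend but do not enforce" policy can be sketched as an upload check that never rejects a model and only notes whether it is in the recommended format. This is an assumption-laden illustration, not how SysMO-SEEK is implemented: `looks_like_sbml` and `register_model` are hypothetical names, and the only fact relied on is that an SBML document is XML whose root element is `sbml`.

```python
import xml.etree.ElementTree as ET


def looks_like_sbml(text: str) -> bool:
    """True if the text parses as XML with an <sbml> root element."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return False
    # Strip an XML namespace prefix such as "{http://...}sbml" if present.
    return root.tag.rsplit("}", 1)[-1] == "sbml"


def register_model(text: str) -> dict:
    """Hypothetical registration step: always accept, flag the format."""
    return {"accepted": True, "recommended_format": looks_like_sbml(text)}


if __name__ == "__main__":
    sbml = (
        '<sbml xmlns="http://www.sbml.org/sbml/level2/version4" '
        'level="2" version="4"><model id="m"/></sbml>'
    )
    print(register_model(sbml))
    print(register_model("dx/dt = -k * x"))
```

The flag is what would let a platform offer more to contributors who follow the format, while those who do not can still participate.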

  19. Just Enough Sharing Access Permissions ...we don’t talk about security

  20. Just Enough Sharing • JERM, SOP • Fetch on request from project stores: SysMOLab Wiki, COSMIC Alfresco, MOSES Wiki, another • Direct upload to a data store

  21. When do People Share • SysMO aims: sharing sooner • Suspicion and fear of scooping • Reputation

  22. Incentives for sharing • Safe haven for data • Credit and attribution • Help with exporting to public repositories (e.g. One-click export to ArrayExpress, PRIDE etc) • A repository for “supplementary materials” in publications • Linking publications and data • Access other resources through a SEEK gateway

  23. SEEK as a Gateway • JWS Online Plugin • online simulator, runs in SysMO-SEEK • upload models in SBML format • SBGN schemas, with annotations and external links

  24. Incentives for sharing • Credit and attribution • SEEK records who owns what. If data, models, or protocols are reused, scientists get recognition • Accountability • SEEK records who owns what. If you take credit for others’ work, they will see • Data citation: formal credit for data published in SEEK

  25. Data Citation • Persistent identifiers and URLs for the data • Linking people to the data • Safe haven for the data • Guarantees of sustainability • Data MUST be uploaded and archived • If cited, it must be public

  26. SEEK as a Safe Haven • HITS can archive SysMO data for 10 years • All SysMO software is open source and available • Distinction between sustaining the service and the software

  27. Governance and Policy • What is required by SysMO members? • When should they share during their projects? • How long after the project can they keep data private to finish publications? • If their data is stored locally, what is the archive process? • Policy from DMG and funding agencies and NOT SysMO-DB

  28. Governance and Policy • Proposals under discussion: • All data registered in SEEK should be uploaded and archived at the end of a SysMO project • All data from finished projects should be shared • How long after the end? 1 day, 6 months, 1 year? • Scientists can invoke “creator’s privilege” on SysMO assets produced near the end of the project • Extra time to write up and publish before release to the general public, respecting publication cycles
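To make the embargo options concrete, a release date can be computed from a project end date, an embargo length and an optional creator's privilege extension. Everything here is hypothetical: the slide lists these only as proposals under discussion, the function names and the dates are invented for illustration, and the "1 day" option would need a days-based variant.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping to the month's last day."""
    index = d.month - 1 + months
    year = d.year + index // 12
    month = index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def release_date(project_end: date, embargo_months: int,
                 creators_privilege_months: int = 0) -> date:
    """Date on which an asset would become public under a given policy."""
    return add_months(project_end, embargo_months + creators_privilege_months)


if __name__ == "__main__":
    end = date(2010, 3, 31)  # hypothetical project end date
    print(release_date(end, 6))      # six-month embargo
    print(release_date(end, 12))     # one-year embargo
    print(release_date(end, 6, 3))   # six months plus a privilege extension
```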

  29. SysMO So Far… • People ARE sharing • Over 300 assets in SEEK • SOPs: 102, Models: 17, DataFiles: 95, Investigations: 13, Studies: 26, Assays: 53 • PALs – a network of young SysBio researchers • Training and education in data and metadata management spreading through the consortium • Modellers and experimentalists communicating
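As a quick arithmetic check, the per-type counts on this slide can be summed to confirm the "over 300 assets" figure. The dictionary keys simply repeat the slide's labels.

```python
# Counts as listed on the slide.
asset_counts = {
    "SOPs": 102,
    "Models": 17,
    "DataFiles": 95,
    "Investigations": 13,
    "Studies": 26,
    "Assays": 53,
}

total_assets = sum(asset_counts.values())

if __name__ == "__main__":
    print(total_assets)
```

The total is 306, consistent with the slide's headline figure.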

  30. SysMO Methods Spreading • Virtual Liver • Mueller, via HITS • Lungsys • SBCancer • EraSysBio+ • Eukaryotic organisms • Interactions between host and pathogen • Human disease • Multi scale modelling

  31. Why it works for us • A solution that fits in with current practices • Start simple, show benefits, add more • Engage with the people actually doing the work • PhD students, Post-docs • Build to the PALs requirements • Respect publication cycles • Respect cultural differences • Scientists stay in control

  32. Acknowledgements • SysMO-DB Team • SysMO-PALs • myGrid, HITS and JWS Online • EMBL-EBI, MCISB • http://www.sysmo-db.org
