SysMO-DB: Just Enough Exchange for Systems Biology Data and Models

SysMO-DB: Just Enough Exchange for Systems Biology Data and Models. Carole Goble, Katy Wolstencroft, Stuart Owen, Sergejs Aleksejevs - University of Manchester Wolfgang Müller , O. Krebs, Isabel Rojas – EML Research gGmbH (=not for profit) Jacky Snoep - University of Stellenbosch.

Presentation Transcript


  1. SysMO-DB: Just Enough Exchange for Systems Biology Data and Models. Carole Goble, Katy Wolstencroft, Stuart Owen, Sergejs Aleksejevs (University of Manchester); Wolfgang Müller, O. Krebs, Isabel Rojas (EML Research gGmbH, not for profit); Jacky Snoep (University of Stellenbosch). MS eScience Workshop, Pittsburgh, PA.

  2. SysMO = SYStems biology of Micro Organisms. 11 projects, 91 partners, 9 countries, started 2007. [Map of partner countries; labels: (4) (1) (9) (2) (22) (29) (2)]

  3. SysMO-DB: started July 2008, 3 years, 3 staff + 3 investigators, 3 teams over 3 sites. • Sensitively retrofit a data access, model handling and data integration platform • Support and manage the diversity of data, models and competencies • Web-based solution: exchange of data, models and processes (intra- and inter-consortia); search for data, models and processes across the initiative; dissemination of results.

  4. SysMO-DB Team. University of Manchester, UK: Carole Goble, Katy Wolstencroft, Stuart Owen, Sergejs Aleksejevs. EML Research gGmbH, Germany: Wolfgang Müller, Isabel Rojas, Olga Krebs. University of Stellenbosch, South Africa, and University of Manchester, UK: Jacky Snoep.

  5. Connect projects, connect to outside. Public: outside data and tools. Inter-project: SysMO-DB. Project: project-specific solutions; internally used tools & data. Personal: my disk (data, models, workflows).

  6. Own solutions: own data solutions and collaboration environments (wikis, e-Groupware, PHProject, BaseCamp, PLONE, Alfresco, bespoke commercial …), files and spreadsheets. Suspicion: suspicion and caution over sharing; interesting interplay between modellers, experimentalists and bioinformaticians. Data issues: • Many do not have data, do not follow the standards that exist, or do not know who is doing what • Much of the data cannot be compared • Different organisms, different strains. Resource issues: no extra resources for the consortia; 91 institutes, 11 consortia, some overlapping.

  7. Principles… • Go for a series of small victories • Realistic • Don't reinvent • Migrate to standards • Sustainable and extensible • Provide instant gratification • Address doubt and anxiety • Build it

  8. Three types of people: experimentalists, modellers, bioinformaticians. [Diagram: the three groups linked by "Exchange" arrows]

  9. "Natural" collaboration within SysMO (short, simplified, black and white): • Collaboration during project design • Varying methods of collaboration during the project: binomes (one modeller, one experimentalist); groups collaborating with groups (occasional or formalized exchange of information) • Varying success • Need for a watering hole/meeting point: an application where experimentalists, bioinformaticians and modellers meet • Trying to make experimentalists, modellers and bioinformaticians peacefully share resources. (Image: "Hot Watering Hole Action" by betty x1138, NYC, USA, http://flickr.com/photos/98334721@N00/25901056, taken 2005-07-14)

  10. Some numbers & some consequences • 1 software engineer, 1 bioinformatician, 1 bio-database specialist • 11 projects, 91 partners • 20 programmer days/year/project • 2.5 programmer days/year/partner • "Just in case" approach impossible • Focus on real needs • "Just in time", "just enough" • The right 20% • Help people help themselves • Communication! 80-20 rule: 80% of the features won't be used anyway. [Figure label: Useful features]

  11. Social Approach • Questionnaires • PALs (Project Area Liaisons) • 21 postdocs and PhD students • Bio/bioinf/modeller • Our design and technical collaboration team • Very intense face-to-face and virtual collaboration • UK and Continental PALs chapters • Audits and sharing • Methods, data, models, standards, software, schemas, spreadsheets, SOPs…

  12. Communication via PALs [diagram: DB team, PALs, projects]. Show what is there; suggest what is possible; ask for requirements; double check; transmit; disseminate; give requirements; tell priorities; rate outcomes; suggest improvements; collect answers.

  13. Outcome of first PALs meeting: • Need to find the guy who does xyz: Yellow pages • Need to store Standard Operating Procedures • Almost all our data is Excel

  14. What's there: SysMO-SEEK screenshots

  15. Yellow pages [screenshot labels: ISA tabs, Yellow pages tabs, Bookmarks, Tag clouds]

  16. Standard Operating Procedures

  17. JWS connection for modellers

  18. View Study

  19. New Assay (ISA)

  20. Rights and sharing

  21. Rights and sharing: create group

  22. So much for the webapp: • Rights + sharing • Connection to modellers' tools • Yellow pages • SOPs

  23. Almost there: improved Excel support (Matthew Horridge)

  24. Towards Just-Enough Exchange Incremental steps from beta to beta

  25. Towards Just-Enough Exchange: largely a story about how to handle Excel sheets for the users' benefit

  26. SysMO Just Enough Exchange [diagram labels: SysMO-LAB, BaCell-SysMO, COSMIC, MOSES; spreadsheets, wikis, Alfresco, BASE, SABIO-RK; public resources]

  27. Need for tradeoff • Huge number of systems • Huge number of standards (MIBBI, OBO…) • Some of them big standards. Too much for a few people to cope with, but: • Comparison needs standardisation • Search needs standardisation • Need to move incrementally to just-enough standard implementation

  28. Path = goal: the journey is part of the reward • Let people use what they use anyway • If changes are necessary, be as unintrusive as possible • Be aware of legacy data • Nudge people towards best practices • Give instantly useful added value to as many users as possible: simple search, simple exchange, simple tool use

  29. A roadmap • Provide convincing Web 2.0 functionality for use and as appetizer: Yellow pages; SOPs • Upload service: hand-triggered upload of link/file; hand-added metadata • Harvesting + change detection service: automatic download; hand-added metadata • Support for Excel templates: promote internal standards by use + tooling; mappers + parsers; classifiers • Use other data types where appropriate: SBML, Matlab, Mathematica…
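
The "harvesting + change detection" step on this slide could be sketched roughly as follows. This is an illustrative sketch only, not the SysMO-DB implementation: the function names, the use of SHA-256 digests, and the idea of walking a local directory are all assumptions made for the example.

```python
import hashlib
from pathlib import Path


def fingerprint(path: Path) -> str:
    """Return a SHA-256 digest of the file contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def detect_changes(repository: Path, known: dict[str, str]) -> dict[str, list[str]]:
    """Compare files under `repository` with digests from the last harvest.

    `known` maps relative file names to digests (hypothetical bookkeeping)
    and is updated in place, so the next run only reports newer changes.
    """
    report: dict[str, list[str]] = {"new": [], "changed": [], "unchanged": []}
    for path in sorted(repository.rglob("*")):
        if not path.is_file():
            continue
        name = path.relative_to(repository).as_posix()
        digest = fingerprint(path)
        if name not in known:
            report["new"].append(name)
        elif known[name] != digest:
            report["changed"].append(name)
        else:
            report["unchanged"].append(name)
        known[name] = digest
    return report
```

Files reported as new or changed would then be the ones queued for hand-added metadata, in line with the roadmap above.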

  30. Stability hierarchy, in order of increasing stability: • Single group: template for a group of experiments • Single SysMO project: project-level template • Whole SysMO: template best practice. Use mappers where needed; parsers/annotators enter data into the more stable JERM data model.

  31. JERM Extraction Architecture [diagram components, starting from the project repositories: harvester, data handlers, template recognizers, classifier/dispatcher, extractors, mappers, parsers; metadata and data pass between the stages]
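
The component names on this slide (template recognizer, classifier/dispatcher, extractor, mapper) suggest a pipeline along the following lines. This is a minimal sketch under stated assumptions, not the actual JERM code: the `Template` class, the sheet representation as a list of rows, and all field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

Row = list[str]
Sheet = list[Row]  # assumption: the first row is the header


@dataclass
class Template:
    name: str
    required_headers: set[str]   # used by the template recognizer
    field_map: dict[str, str]    # used by the mapper: header -> common name

    def recognizes(self, sheet: Sheet) -> bool:
        """Template recognizer: all required headers must be present."""
        return bool(sheet) and self.required_headers <= set(sheet[0])


def extract(sheet: Sheet) -> list[dict[str, str]]:
    """Extractor: turn rows into records keyed by the sheet's own headers."""
    header, *rows = sheet
    return [dict(zip(header, row)) for row in rows]


def map_record(record: dict[str, str], field_map: dict[str, str]) -> dict[str, str]:
    """Mapper: rename known headers to common names; keep the rest as-is."""
    return {field_map.get(key, key): value for key, value in record.items()}


def dispatch(sheet: Sheet, templates: list[Template]) -> Optional[dict]:
    """Classifier/dispatcher: the first recognizing template wins."""
    for template in templates:
        if template.recognizes(sheet):
            records = [map_record(r, template.field_map) for r in extract(sheet)]
            return {"template": template.name, "records": records}
    return None  # unrecognized sheets would need hand-added metadata
```

A sheet with headers `Strain` and `OD600` would, for example, be matched by a hypothetical `growth_curve` template and come out with whatever common field names that template's `field_map` assigns.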

  32. Oops: some projects not prolonged. Need all project data in the system fast, so…

  33. JERM Extraction Architecture [same diagram components as slide 31, with additional "Data" labels at each stage]

  34. Lessons we're learning: some interesting bits along the way

  35. Subsetting: don't overwhelm • Standards need to be comprehensive • Goal: "Minimum information"… (MIBBI) • Tends to be a superset of what is needed for a project • Examples of non-applicable attributes: tissue of a single cell; gender • Useful to use adapted subset templates • Experimental design selection list
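
As a small illustration of the subset-template idea, a comprehensive checklist can be filtered down to the attributes that apply to a given project. The checklist below is invented for the example and is not taken from MIBBI or from SysMO-DB.

```python
# Hypothetical "comprehensive" checklist; real minimum-information lists are longer.
FULL_CHECKLIST = ["organism", "strain", "tissue", "gender", "growth medium", "temperature"]


def subset_template(checklist: list[str], not_applicable: list[str]) -> list[str]:
    """Return the checklist without attributes that do not apply, keeping order."""
    skip = set(not_applicable)
    return [attribute for attribute in checklist if attribute not in skip]


# For single-cell organisms, tissue and gender make no sense (the slide's examples).
microbial_template = subset_template(FULL_CHECKLIST, ["tissue", "gender"])
```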

  36. From biofolksonomy to ontology • Observation: fast-growing set of standards; standards are a moving target • Incremental approach: keyword annotation; controlled selection lists; home-brewed taxonomies; use of/contribution to standard ontologies • Provide migration tools. [Figure labels: Tags + suggestions; Home-brewed taxonomy]
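
The "tags + suggestions" step between free keywords and controlled selection lists might look like the sketch below. It assumes simple string similarity from Python's standard `difflib`; the vocabulary is made up, and the slides do not say how suggestions are actually computed.

```python
import difflib

# Hypothetical controlled selection list.
CONTROLLED_TERMS = ["glucose", "glycolysis", "growth rate", "heat shock", "metabolomics"]


def suggest_terms(tag: str, vocabulary: list[str] = CONTROLLED_TERMS,
                  limit: int = 3, cutoff: float = 0.6) -> list[str]:
    """Suggest controlled terms for a free-text tag.

    An exact match (ignoring case and surrounding spaces) is returned alone;
    otherwise the closest terms above the similarity cutoff are returned.
    """
    normalized = tag.strip().lower()
    if normalized in vocabulary:
        return [normalized]
    return difflib.get_close_matches(normalized, vocabulary, n=limit, cutoff=cutoff)
```

A misspelled tag such as `glycolisis` would be nudged towards `glycolysis`, while a tag with no close match returns an empty list and could stay a free keyword.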

  37. A word on software • Template tooling: Excel; Java • SysMO-SEEK (open source under Apache license): Ruby on Rails; convention over configuration; libraries & plugins, Rails-specific (e.g. acts_as_authenticated); SOLR & Lucene introduce Java/Ruby • Database: MySQL, also tested with SQLite (exclude DB dependencies)

  38. Summary • SysMO-DB as a virtual meeting point for different flavours of systems biologists • SysMO-DB's mantra: just enough, just in time • Flexible JERM extraction architecture • Just enough metadata (incremental) • A lot done, still a lot to do

  39. Challenges ahead… • Social: PALs work great and are motivated; now need more, more, more data, data, data • Technical: publishing into public repositories; search + exploration (the test for data quality); hierarchical faceted search; distributed search via Taverna workflows; more workflows via SysMO-SEEK; improve modelling support

  40. Bonus track: what if… …the average data quality is below par? • "Nagging functionality" • Remind people of potentially faulty metadata • Give suggestions on what to improve and how • Give the possibility to create automatic mappings
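
A "nagging" check of this kind could be sketched as a function that inspects a metadata record and returns reminders with suggestions. The field names, rules, and message wording below are assumptions for illustration, not the behaviour of SysMO-SEEK.

```python
def nag(metadata: dict[str, str], required: list[str],
        allowed: dict[str, set[str]]) -> list[str]:
    """Return reminders about missing or suspicious metadata values."""
    reminders = []
    # Remind about required fields that are absent or blank.
    for field in required:
        if not metadata.get(field, "").strip():
            reminders.append(f"'{field}' is missing: please fill it in.")
    # Flag values outside a controlled list and suggest the valid options.
    for field, values in allowed.items():
        value = metadata.get(field, "").strip()
        if value and value not in values:
            options = ", ".join(sorted(values))
            reminders.append(
                f"'{field}' has unexpected value '{value}': try one of {options}."
            )
    return reminders
```

Returning messages rather than rejecting the upload keeps the check unintrusive: the data still gets in, and the owner is nudged to improve it later.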

  41. Thanks. EML people: Isabel, Olga. UMAN people: Carole, Katy, Finn, Stuart, Sergejs. Jacky at Stellenbosch. BBSRC, BMBF, KTF. …and Microsoft for sponsoring this workshop

  42. www.sysmo-db.org End + questions

  43. END
