SysMO-DB: Just Enough Exchange for Systems Biology Data and Models. Carole Goble, Katy Wolstencroft, Stuart Owen, Sergejs Aleksejevs - University of Manchester Wolfgang Müller , O. Krebs, Isabel Rojas – EML Research gGmbH (=not for profit) Jacky Snoep - University of Stellenbosch.
SysMO-DB: Just Enough Exchange for Systems Biology Data and Models Carole Goble, Katy Wolstencroft, Stuart Owen, Sergejs Aleksejevs - University of Manchester Wolfgang Müller, O. Krebs, Isabel Rojas – EML Research gGmbH (=not for profit) Jacky Snoep - University of Stellenbosch MS eScience Workshop, Pittsburgh, PA
SysMO = SYStems biology of Micro Organisms. 11 projects, 91 partners, 9 countries, started 2007.
SysMO-DB: Started July 2008, 3 years, 3 staff + 3 investigators, 3 teams over 3 sites. Sensitively retrofit a data access, model handling and data integration platform. Support and manage the diversity of data, models and competencies. Web-based solution: exchange of data, models and processes (intra- and inter-consortia); search for data, models and processes across the initiative; dissemination of results.
SysMO-DB Team. University of Manchester, UK: Carole Goble, Katy Wolstencroft, Stuart Owen, Sergejs Aleksejevs. EML Research gGmbH, Germany: Wolfgang Müller, Isabel Rojas, Olga Krebs. University of Stellenbosch, South Africa: Jacky Snoep.
Connect projects, connect to outside. Public: outside data and tools. Inter-project: SysMO-DB. Project: project-specific solutions, internally used tools & data. Personal: my disk (data, models, workflows).
Own solutions: own data solutions and collaboration environments (wikis, e-Groupware, PHProject, BaseCamp, PLONE, Alfresco, bespoke commercial …), files and spreadsheets. Suspicion: suspicion and caution over sharing; interesting interplay between modellers, experimentalists and bioinformaticians. Data issues: • Many do not have data, do not follow the standards that exist, or do not know who is doing what. • Much of the data cannot be compared. • Different organisms, different strains. Resource issues: no extra resources for the consortiums; 91 institutes, 11 consortiums, some overlapping.
Principles… • Go for a series of small victories • Realistic • Don't reinvent • Migrate to standards • Sustainable and extensible • Provide instant gratification • Address doubt and anxiety • Build it
Three types of people: experimentalists, modellers and bioinformaticians, with exchange between each pair.
"Natural" collaboration within SysMO. Short, simplified, black and white: Collaboration during project design. Varying methods of collaboration during project: binomes (one modeller, one experimentalist); groups collaborating with groups (occasional/formalized exchange of information). Varying success. Need for a watering hole/meeting point: an application where experimentalists, bioinformaticians and modellers meet. Trying to make experimentalists, modellers, bioinformaticians peacefully share resources. (Photo: "Hot Watering Hole Action" by betty x1138, NYC, USA, taken 2005-07-14, http://flickr.com/photos/98334721@N00/25901056)
Some numbers & some consequences • 1 Software Engineer, 1 Bioinformatician, 1 Bio-database specialist • 11 projects, 91 partners • 20 programmer days/year/project • 2.5 programmer days/year/partner • "just in case" approach impossible • Focus on real needs • "just in time", "just enough" • The right 20% • Help people help themselves • Communication! 80-20 rule: 80% of the features won't be used anyway. Useful features
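The per-project and per-partner figures follow from dividing one software engineer's year across the initiative. A minimal worked version of that arithmetic, assuming roughly 220 working days per year (this number is an assumption, not stated on the slide):

```python
# Assumption (not from the slides): one full-time developer works ~220 days/year.
WORKING_DAYS_PER_YEAR = 220

# Figures taken from the slide.
SOFTWARE_ENGINEERS = 1
PROJECTS = 11
PARTNERS = 91

programmer_days = SOFTWARE_ENGINEERS * WORKING_DAYS_PER_YEAR
days_per_project = programmer_days / PROJECTS
days_per_partner = programmer_days / PARTNERS

print(f"{days_per_project:.1f} programmer days/year/project")
print(f"{days_per_partner:.1f} programmer days/year/partner")
```

Under this assumption the result is 20 days per project and a little under 2.5 days per partner, which is why a "just in case" feature set is out of reach.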
Social Approach • Questionnaires • PALs (Project Area Liaison) • 21 Postdocs and PhD students • Bio/bioinf/modeller • Our design and technical collaboration team • Very intense face to face and virtual collaboration • UK and Continental PALS Chapters • Audits and Sharing • Methods, data, models, standards, software, schemas, spreadsheets, SOPs…..
Communication via PALs. Show what is there. Suggest what is possible. Ask for requirements. Double check. Transmit. Disseminate. Give requirements. Tell priorities. Rate outcomes. Suggest improvements. Collect answers. DB team, PALs, Projects
Outcome of first PALs meeting: Need to find the guy who does xyz: Yellow pages. Need to store Standard Operating Procedures. Almost all our data is Excel.
What's there: SysMO-SEEK screenshots
Yellow pages ISA tabs Yellow pages tabs Bookmarks Tag clouds
So much for the webapp: Rights + Sharing, Connection to modellers' tools, Yellow pages, SOPs
Almost there: Improved Excel support (Matthew Horridge)
Towards Just-Enough Exchange Incremental steps from beta to beta
Towards Just-Enough Exchange: Largely a story about how to handle Excel sheets for the users' benefit
SysMO Just Enough Exchange SysMO-LAB BaCell-SysMO Spread sheets Wiki Spread sheets Alfresco SABIO-RK COSMIC Spread sheets MOSES Spread sheets Wiki Alfresco BASE SABIO-RK Public Resources
Need for tradeoff • Huge number of systems • Huge number of standards (MIBBI, OBO…) • Some of them big standards. Too much for a few people to cope with, but: • Comparison needs standardisation • Search needs standardisation • Need to move incrementally to just-enough standard implementation
Path = goal: The journey is part of the reward • Let people use what they use anyway • If changes are necessary, be as unintrusive as possible • Be aware of legacy data • Nudge people towards best practices • Give instantly useful added value to as many users as possible: Simple search, simple exchange, simple tool use
A roadmap • Provide convincing Web 2.0 functionality for use and as appetizer • Yellow pages • SOPs • Upload service: • Hand-triggered upload of link/file • Hand-added metadata • Harvesting+change detection service • Automatic download • Hand-added metadata • Support for Excel templates • Promote internal standards by use + tooling • Mappers + parsers • Classifiers • Use other data types where appropriate • SBML, Matlab, Mathematica…
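The "harvesting + change detection" step on the roadmap can be illustrated with content hashing. This is only a sketch of one possible approach, not the SysMO-DB implementation (which the slides do not describe); the function names and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def detect_changes(folder: Path, known: dict) -> dict:
    """Compare spreadsheets in `folder` against digests from the last harvest.

    `known` maps file name -> digest and is updated in place, so the next
    call only reports what changed since this one.
    """
    report = {"new": [], "changed": [], "unchanged": []}
    for path in sorted(folder.glob("*.xls*")):  # matches .xls and .xlsx
        digest = file_digest(path)
        previous = known.get(path.name)
        if previous is None:
            report["new"].append(path.name)
        elif previous != digest:
            report["changed"].append(path.name)
        else:
            report["unchanged"].append(path.name)
        known[path.name] = digest
    return report
```

A harvester built this way would only need to ask for hand-added metadata on files reported as new or changed.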
Stability hierarchy (increasing stability). Single group: template for a group of experiments; use mappers where needed. Single SysMO project: project-level template; parsers/annotators enter into that. Whole SysMO: template best practice; more stable JERM data model.
JERM Extraction Architecture [diagram: project repositories feed harvesters and data handlers; template recognizers and a classifier/dispatcher route each file to extractors, mappers and parsers, which produce metadata and data]
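The stages named in the architecture (template recognizer, classifier/dispatcher, extractor, mapper) can be sketched as a small pipeline. Everything below is hypothetical: the template names, column headers and field names are invented for illustration, a spreadsheet is represented as a plain list of rows, and the real JERM components are not described in enough detail on the slides to reproduce.

```python
from typing import Optional

# Hypothetical template signatures: the header row each template is expected to have.
TEMPLATES = {
    "growth_curve_v1": ["Time (h)", "OD600"],
    "metabolite_v1": ["Compound", "Conc (mM)"],
}

# Hypothetical mapping from template-specific column names to common field names.
MAPPINGS = {
    "growth_curve_v1": {"Time (h)": "time_h", "OD600": "optical_density"},
    "metabolite_v1": {"Compound": "compound", "Conc (mM)": "concentration_mM"},
}


def recognize_template(sheet: list) -> Optional[str]:
    """Template recognizer: match the header row against known templates."""
    header = [cell.strip() for cell in sheet[0]] if sheet else []
    for name, expected in TEMPLATES.items():
        if header == expected:
            return name
    return None


def extract(sheet: list) -> list:
    """Extractor: turn data rows into records keyed by the original headers."""
    header = [cell.strip() for cell in sheet[0]]
    return [dict(zip(header, row)) for row in sheet[1:]]


def map_record(record: dict, mapping: dict) -> dict:
    """Mapper: rename template-specific keys to the common field names."""
    return {mapping[key]: value for key, value in record.items()}


def dispatch(sheet: list) -> dict:
    """Classifier/dispatcher: route a sheet to the extractor and mapper for its template."""
    template = recognize_template(sheet)
    if template is None:
        # Unrecognized sheets yield no records; they would need hand-added metadata.
        return {"template": None, "records": []}
    records = [map_record(r, MAPPINGS[template]) for r in extract(sheet)]
    return {"template": template, "records": records}
```

For example, `dispatch([["Time (h)", "OD600"], ["0", "0.05"]])` recognizes the hypothetical growth-curve template and returns one record keyed by the common field names.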
Oops: Some projects not prolonged. Need all project data in the system fast, so…
JERM Extraction Architecture [diagram: the same stages (project repositories, harvesters, data handlers, template recognizers, classifier/dispatcher, extractors, mappers, parsers), now with an additional data path alongside metadata and data at each level]
Lessons we're learning: Some interesting bits along the way
Subsetting: Don't overwhelm. Standards need to be comprehensive. Goal: "Minimum information"… (MIBBI). Tends to be a superset of what is needed for a project. Examples of non-applicable attributes: tissue of a single cell, gender. Useful to use adapted subset templates. Experimental design selection list
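Subsetting can be pictured as filtering a comprehensive checklist down to the attributes that apply to a given kind of sample. The checklist below is invented for illustration and is not an actual MIBBI checklist; only "tissue" and "gender" come from the slide's examples of non-applicable attributes.

```python
# Hypothetical attribute checklist: each attribute lists the sample types it applies to.
FULL_CHECKLIST = {
    "organism": {"single_cell", "tissue"},
    "strain": {"single_cell"},
    "growth_medium": {"single_cell"},
    "tissue": {"tissue"},
    "gender": {"tissue"},
}


def subset_template(checklist: dict, sample_type: str) -> list:
    """Return only the attributes that are applicable to `sample_type`."""
    return [attr for attr, applies_to in checklist.items() if sample_type in applies_to]


# A micro-organism project never sees the tissue and gender fields.
print(subset_template(FULL_CHECKLIST, "single_cell"))
```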
From biofolksonomy to ontology. Observation: fast-growing set of standards; standards are a moving target. Incremental approach: keyword annotation; controlled selection lists; home-brewed taxonomies; use of/contribution to standard ontologies; provide migration tools. Tags + suggestions. Home-brewed taxonomy
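The step from free keyword annotation to controlled selection lists ("tags + suggestions") could be supported by suggesting close matches from a controlled vocabulary when a user types a tag. A minimal sketch using fuzzy string matching from Python's standard library; the vocabulary is made up, and the slides do not say how SysMO-SEEK generates its suggestions.

```python
from difflib import get_close_matches

# Hypothetical controlled vocabulary; a real one would come from a taxonomy or ontology.
CONTROLLED_TERMS = ["glucose", "glycolysis", "growth rate", "microarray", "proteomics"]


def suggest_terms(tag: str, vocabulary=CONTROLLED_TERMS, limit: int = 3) -> list:
    """Suggest controlled terms that closely resemble a free-text tag."""
    return get_close_matches(tag.strip().lower(), vocabulary, n=limit, cutoff=0.6)


# A misspelled free-text tag is nudged towards the controlled term.
print(suggest_terms("Glycolisis"))
```

Tags with no close match return an empty list and simply stay as free keywords, which fits the incremental approach.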
A word on software • Template tooling • Excel • Java • SysMO-SEEK (open source under Apache license) • Ruby on Rails • Convention over configuration • Libraries & plugins • Rails specific (e.g. acts_as_authenticated) • SOLR & Lucene introduce Java/Ruby • Database: MySQL, also tested with SQLite (exclude DB dependencies)
Summary • SysMO-DB as a virtual meeting point for different flavours of systems biologists • SysMO-DB's mantra: Just enough, just in time • Flexible JERM extraction architecture • Just enough metadata (incremental) • A lot done, still a lot to do
Challenges ahead… • Social • PALs work great and are motivated • Now need more, more, more data, data, data • Technical • Publishing into public repositories • Search + exploration: The test for data quality • Hierarchical Faceted Search • Distributed search via Taverna workflows • More workflows via SysMO-SEEK • Improve modelling support
Bonus track: what if… …the average data quality is below par? • "Nagging functionality" • Remind people of potentially faulty metadata • Give suggestions on what to improve and how • Give the possibility to create automatic mappings
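A "nagging" check could be as simple as scanning an asset's metadata and returning reminders with a concrete suggestion each. The required fields and the length threshold below are assumptions chosen for the example, not rules taken from SysMO-SEEK.

```python
# Hypothetical list of fields every uploaded asset should describe.
REQUIRED_FIELDS = ["title", "organism", "contact"]
MIN_TITLE_LENGTH = 10  # assumed threshold for a "descriptive" title


def nag(metadata: dict) -> list:
    """Return reminders about missing or potentially faulty metadata."""
    reminders = []
    for field in REQUIRED_FIELDS:
        value = str(metadata.get(field) or "").strip()
        if not value:
            reminders.append(f"'{field}' is missing: please add it.")
    title = str(metadata.get("title") or "").strip()
    if title and len(title) < MIN_TITLE_LENGTH:
        reminders.append("'title' is very short: consider a more descriptive one.")
    return reminders


for reminder in nag({"title": "exp1"}):
    print(reminder)
```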
Thanks EML People: Isabel Olga UMAN People: Carole Katy Finn Stuart Sergejs Jacky at Stellenbosch BBSRC BMBF KTF …and Microsoft for sponsoring this workshop
www.sysmo-db.org End + questions