Glimpses of future research practice: a musical study David De Roure
Overview • Generation 1 – Early adopters • Generation 2 – Embedding • Generation 3 – Radical sharing • Music case study
e-Science • e-Science was defined by John Taylor (Director General of the UK Research Councils) as global collaboration in key areas of science and the next generation of infrastructure that will enable it • e-Science was the name of the destination • It became the name of the journey • When we arrive, the destination is just called science
e-Research “e-research extends e-Science and cyberinfrastructure to other disciplines, including the humanities and social sciences.” http://mitpress.mit.edu/catalog/item/default.asp?tid=12185&ttype=2
Generation 1 2000 – 2005
...the imminent flood of scientific data expected from the next generation of experiments, simulations, sensors and satellites Tony Hey and Anne Trefethen Source: CERN, CERN-EX-0712023, http://cdsweb.cern.ch/record/1203203
E. Science laboris • Workflows are the new rock and roll • Machinery for coordinating the execution of (scientific) services and linking together (scientific) resources • The era of Service Oriented Applications • Repetitive and mundane boring stuff made easier Carole Goble
Kepler Triana BPEL Trident Meandre Taverna Galaxy
co-design co-evolution co-shaping co- co-creation co-construction co-constitution co-realisation
humility: the quality of being modest, reverential, even politely submissive, and never being arrogant, contemptuous, rude
My Chemistry Experiment Box of Chemists
empower: to equip or supply with an ability; enable • service: the performance of duties or the duties performed as or by a waiter or servant
1st Generation Summary Current practices of early adopters of tools. Characterised by researchers using tools within their particular problem area, with some re-use of tools, data and methods within the discipline. Traditional publishing is supplemented by publication of some digital artefacts like workflows and links to data. Science is accelerated and practice is beginning to shift to emphasise in silico work.
Generation 2 2005 – 2010
Reuse, Recycling, Repurposing • Paul writes workflows for identifying biological pathways implicated in resistance to Trypanosomiasis in cattle • Paul meets Jo. Jo is investigating Whipworm in mouse. • Jo reuses one of Paul’s workflows without change. • Jo identifies the biological pathways involved in sex dependence in the mouse model, believed to be involved in the ability of mice to expel the parasite. • Previously a manual two-year study by Jo had failed to do this.
“A biologist would rather share their toothbrush than their gene name” Mike Ashburner and others Professor in Dept of Genetics, University of Cambridge, UK
Not Facebook for scientists! Facebook for scientists! mySpace for scientists!
The experiment that is Web 2 Social Scientists Social Network Developers Open Repositories Researchers
A probe into researcher behaviour • Open source (BSD) Ruby on Rails app • REST and SPARQL interfaces, Linked Data compliant • Inspiration for: BioCatalogue, MethodBox and SysmoDB • “Facebook for Scientists” ...but different to Facebook! • A repository of research methods • A community social network of people and things • A Social Virtual Research Environment myExperiment currently has 3849 members, 234 groups, 1315 workflows, 349 files and 133 packs
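A SPARQL interface of the kind mentioned on this slide is normally queried over HTTP by passing the query text in a `query` parameter. The sketch below only builds such a request URL and does not contact any server. The endpoint address is a placeholder, and the use of `dcterms:title` to describe workflows is an assumption for illustration, not a statement about the actual myExperiment vocabulary.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Placeholder endpoint (assumption): substitute the real SPARQL endpoint.
ENDPOINT = "http://example.org/sparql"

# Illustrative query (assumption): lists ten resources that carry a
# Dublin Core title. The real repository schema may differ.
QUERY = """
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?workflow ?title WHERE {
  ?workflow dcterms:title ?title .
} LIMIT 10
""".strip()


def build_query_url(endpoint: str, query: str) -> str:
    """Return a GET URL carrying the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


def extract_query(url: str) -> str:
    """Recover the SPARQL text from a URL built by build_query_url."""
    return parse_qs(urlparse(url).query)["query"][0]


if __name__ == "__main__":
    print(build_query_url(ENDPOINT, QUERY))
```

Keeping URL construction separate from any network call makes the request easy to inspect before it is sent to a live service.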
Paul’s Research Object Paul’s Pack Workflow 16 QTL Results produces Included in Published in Included in Feeds into Logs produces Included in Included in Metadata Slides Paper produces Published in Common pathways Workflow 13 Results
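One way to read this diagram is as a small graph of labelled relationships between the items in the pack. The sketch below stores such relationships as (subject, predicate, object) triples in plain Python. The predicate names come from the slide, but which item connects to which is an assumption made for illustration, since the diagram layout is not recoverable from the text.

```python
# Illustrative triples only (assumption): the slide names these items and
# relationship labels, but the specific edges below are guesses.
TRIPLES = [
    ("Workflow 16", "produces", "QTL"),
    ("Workflow 16", "produces", "Logs"),
    ("QTL", "feeds into", "Workflow 13"),
    ("Workflow 13", "produces", "Results"),
    ("Results", "published in", "Paper"),
    ("Workflow 16", "included in", "Paul's Pack"),
    ("Workflow 13", "included in", "Paul's Pack"),
    ("Paper", "included in", "Paul's Pack"),
    ("Slides", "included in", "Paul's Pack"),
]


def pack_contents(triples, pack):
    """Return the sorted names of everything marked as included in a pack."""
    return sorted(s for s, p, o in triples if p == "included in" and o == pack)


def produced_by(triples, source):
    """Return the sorted names of everything a given item produces."""
    return sorted(o for s, p, o in triples if s == source and p == "produces")


if __name__ == "__main__":
    print(pack_contents(TRIPLES, "Paul's Pack"))
    print(produced_by(TRIPLES, "Workflow 16"))
```

Because every relationship is an explicit triple, the same data could later be serialised as RDF without changing its shape.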
The Six Rs of Research Object Behaviours Research Objects enable data-intensive research to be: • Replayable – go back and see what happened • Repeatable – run the experiment again • Reproducible – independent expt to reproduce • Reusable – use as part of new experiments • Repurposeable – reuse the pieces in new expt • Reliable – robust under automation • Referenceable – citable and traceable http://blog.openwetware.org/deroure/?p=56
2nd Generation Summary Projects delivering now. Some institutional embedding. Key characteristic is re-use - of the increasing pool of tools, data and methods across areas/disciplines. Contain some freestanding, recombinant, reproducible research objects. New scientific practices are established and opportunities arise for completely new scientific investigations. Some expert curation.
Generation 3 2010 – 2015
4th Paradigm The Fourth Paradigm: Data-Intensive Scientific Discovery Presenting the first broad look at the rapidly emerging field of data-intensive science http://research.microsoft.com/en-us/collaboration/fourthparadigm/
Taverna A Bioinformatics Experiment Scott Marshall Marco Roos “…to discover proteins that interact with transmembrane proteins, particularly those that can be related to neuro-degenerative diseases in which amyloids play a significant role” • Taverna provenance exposed as RDF • myExperiment RDF document for a protein discovery workflow • Mocked-up BioCatalogue document using myExperiment RDF data as example • Provisional RDF documents obtained from the ConceptWiki (conceptwiki.org) development server • An RDF document for an example protein, obtained from the RDF interface of the UniProt web site
http://www.methodbox.org/ • MethodBox
3rd Generation Summary The solutions we'll be delivering in 5 years. Characterised by global reuse of tools, data and methods across any discipline, and surfacing the right levels of complexity for the researcher. Routine use. Key characteristic is radical sharing. Research is significantly data driven - plundering the backlog of data, results and methods. Increasing automation and decision-support for the researcher - the VRE becomes assistive. Curation is autonomic and social.
Find a service & relax • Intellectual ramps Easy and low risk to start Progress to advanced skills For researchers No obligation Go as far as you want Malcolm Atkinson
Datascopes telescopes for the naked mind Malcolm Atkinson NRAO/AUI/NSF From Signal to Understanding
Music and Linked Data 2010 – 2011 and beyond
It’s about enabling the join Ben Fields, 6th October 2010
SALAMI: Structural Analysis of Large Amounts of Music Information David De Roure J. Stephen Downie Ichiro Fujinaga
Digital Music Collections 23,000 hours of recorded music Music Information Retrieval Community Community Software Crowdsourced ground truth Supercomputer 250,000 hours NCSA Supercomputer time Linked Data Repositories