
Digital Curation or Digital Data? The impact of Services and Federation

Phil Lord, Newcastle University. Take home messages: curation is important for the CARMEN project and neuroinformatics; to enable repeatability and rerunability, curation of services and curation of data are equally important.


Presentation Transcript


  1. Digital Curation or Digital Data? The impact of Services and Federation. Phil Lord, Newcastle University

  2. Take Home Messages • Curation is important for the CARMEN project and neuroinformatics • To enable repeatability and rerunability, curation of services and curation of data are equally important • To enable federation and autonomy, data release, license and other policies need to be represented so that they can be computed over.

  3. Research Challenge Worldwide, more than 100,000 neuroscientists (~5,000 in the UK) are generating vast amounts of data. Principal experimental data formats: • molecular (genomic/proteomic) • neurophysiological (time-series electrical measures of activity) • anatomical (spatial) • behavioural. Neuroinformatics concerns how these data are handled and integrated, including the application of computational modelling. Understanding the brain may be the greatest informatics challenge of the 21st century.

  4. Need for Cooperation The OECD Neuroinformatics Working Group identified the need to work cooperatively in order to achieve major advances. Cooperation will permit: • development of common processes • best value from data, including long-term curation • ‘mega-analysis’ of large data sets • integration of data sets across different scales and different approaches • interdisciplinary research

  5. CARMEN – Focus on Neural Activity • raw voltage signal data collected by patch-clamp and single- and multi-electrode array recording • novel optical recording, particularly of the activity dynamics of large networks • resolving the ‘neural code’ from the timing of action potential activity (figure labels: neurone 1, neurone 2, neurone 3)

  6. CARMEN is a new e-Science Pilot Project in Neuroinformatics (UK research council funded). • To create a grid-enabled, real-time ‘virtual laboratory’ environment for neurophysiological data • To develop an extensible ‘toolkit’ for data extraction, analysis and modelling • To provide a repository for archiving, sharing, integration and discovery of data • To achieve wide community and commercial engagement in developing and using CARMEN • CARMEN is a 4-year project: if it is to last longer, it must become financially self-sufficient. • See http://www.carmen.org.uk

  7. CARMEN Active Information Repository Node

  8. Dynamic Service Deployment - Dynasoar (diagram labels: Client (C), Web Server, WSP, Service Repository (SR), Compute Machines node 1 … node n hosting services such as s2 and s5, CAIRN; arrows labelled req, res, and "2: service fetch & deploy")

  9. Distribution and Federation Initially, we plan to have two CAIRNs

  10. Distribution and Federation

  11. What about digital curation? Courtesy of Wikipedia

  12. CARMEN’s perspective • We wish to store data, store its provenance, store its usage. • We need release policies, we need retention policies, we need to understand ownership.

  13. What do we get from this? • Replicability: one scientist should be able to repeat another’s experiment, under equivalent conditions, at a different time. • Rerunability: a scientist should be able to apply an equivalent technique under new circumstances. • The addition of services into this mix complicates the issue. (Figure labels: Replicability, Rerunability, Old Data, New Data)
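The distinction on slide 13 can be sketched in code. The following is a minimal illustration, not part of CARMEN: the `Run` record, its field names, and the category strings are all hypothetical, and it assumes that "equivalent technique" can be approximated by "same service name". It shows why services complicate matters: the same data analysed with a changed service version is no longer a clean replication.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """Hypothetical record of one analysis: which data, which service."""
    data_id: str
    service_name: str
    service_version: str


def classify(original: Run, new: Run) -> str:
    """Classify a new run relative to an original one."""
    same_data = original.data_id == new.data_id
    # Assumption: the same service name stands for an "equivalent technique".
    same_technique = original.service_name == new.service_name
    same_version = original.service_version == new.service_version

    if same_data and same_technique and same_version:
        return "replication"
    if same_data and same_technique:
        # Old data, but the service itself has changed since the original run.
        return "replication with changed service"
    if same_technique:
        # Equivalent technique applied under new circumstances (new data).
        return "rerun"
    return "different experiment"
```

Under these assumptions, a run on new data with the same service name is classified as a rerun whether or not the service version changed; deciding whether the changed service is still "equivalent" is exactly the curation question the slide raises.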

  14. Has the state of the world advanced since the earlier work? Has the world changed in a comparable way? Has the service changed in a comparable way? Is the specification of what happened actually right? (Diagram labels: Eager Neuroscientist, Rerunability, Neuroscientist comparing to existing work, Tool Builder, New Data, New Services, Replicability, Error-Prone Neuroscientist, Old Services, Old Data)

  15. So, what is the problem? • I would like to rerun this experiment and release the results. Can I? • Is the new data available? • Is the new data public? • Does the license allow derived results? • Who owns the derived results? • data license • software license

  16. So, what’s the problem? • Can I compare how new data would have changed the results? • Is that data available? (New and Old) • Is that data public? (New and Old) etc… • Is it embargoed – will it become public later? • Do the licenses allow derived results? • Who owns the derived results? • The licenses may conflict
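The questions on slides 15 and 16 can be read as a checklist evaluated over every input data set. The sketch below is one way to express that checklist; the `Dataset` fields and the function name are hypothetical, and it assumes that an embargo simply makes a non-public data set public once its date has passed. Ownership and license conflicts are not modelled.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class Dataset:
    """Hypothetical description of one input data set."""
    name: str
    available: bool
    public: bool
    allows_derivatives: bool
    embargo_until: Optional[date] = None  # assumed: public from this date on


def can_release_derived(inputs: List[Dataset], today: date) -> Tuple[bool, List[str]]:
    """Return (ok, reasons); reasons lists every check that failed."""
    reasons: List[str] = []
    for d in inputs:
        if not d.available:
            reasons.append(f"{d.name}: not available")
        embargo_over = d.embargo_until is not None and d.embargo_until <= today
        if not (d.public or embargo_over):
            if d.embargo_until is not None:
                reasons.append(f"{d.name}: embargoed until {d.embargo_until.isoformat()}")
            else:
                reasons.append(f"{d.name}: not public")
        if not d.allows_derivatives:
            reasons.append(f"{d.name}: license does not allow derived results")
    return (not reasons, reasons)
```

Returning the list of failed checks, rather than a bare yes or no, is a design choice that lets the system tell a scientist why a result cannot yet be released.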

  17. CARMEN Active Information Repository Node

  18. Whose release policy?

  19. Policy Issues • One of the main purposes of the CAIRN is to hide the distribution. • What if the CAIRNs have different release policies? What if they have different licenses? • We cannot inflict these differences on the user. • Therefore, we must be able to compute over policies • We must be able to represent justifications back to the users
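To make "compute over policies" concrete, here is a minimal sketch of combining release policies from several CAIRNs into one answer, together with a justification that can be shown to the user. Everything in it is an assumption for illustration: the three release levels, the node names, and the rule that the most restrictive policy wins are not taken from CARMEN.

```python
from typing import Dict, List, Tuple

# Hypothetical release levels, ordered from least to most restrictive.
LEVELS = ["public", "registered", "project-only"]


def effective_policy(node_policies: Dict[str, str]) -> Tuple[str, List[str]]:
    """Combine per-node release policies into one effective policy.

    Assumed rule: the most restrictive policy among the contributing nodes
    applies. The second return value names the nodes responsible for it.
    """
    if not node_policies:
        raise ValueError("at least one node policy is required")
    strictest = max(node_policies.values(), key=LEVELS.index)
    justification = [
        f"{node} requires '{level}'"
        for node, level in sorted(node_policies.items())
        if level == strictest
    ]
    return strictest, justification
```

For example, `effective_policy({"cairn-a": "public", "cairn-b": "registered"})` yields the level `"registered"` with the justification that `cairn-b` requires it, so the user sees a single policy and the reason for it without needing to know how the data are distributed.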

  20. An Example: Licensing • Computationally amenable licenses are available • Take, for example, Creative Commons
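Creative Commons licenses are built from a small set of elements (BY, NC, ND, SA), which is what makes them amenable to computation. The sketch below is a simplified illustration of working out the license of a result derived from several inputs. It assumes license codes written as `BY-NC-SA`, ignores license versions, jurisdictions and CC0, and the function names are hypothetical.

```python
from typing import FrozenSet, Iterable, Optional


def elements(code: str) -> FrozenSet[str]:
    """Split a license code such as 'BY-NC-SA' into its elements."""
    return frozenset(code.upper().split("-"))


def derived_license(input_codes: Iterable[str]) -> Optional[str]:
    """Return a license code for a work derived from all inputs, or None.

    Simplified rules:
    - ND (NoDerivatives) on any input forbids derived works;
    - otherwise the result carries the union of all elements;
    - an SA (ShareAlike) input requires the result to use exactly its own
      license, so the union must equal that input's elements.
    """
    licenses = [elements(code) for code in input_codes]
    if not licenses:
        raise ValueError("at least one input license is required")
    if any("ND" in lic for lic in licenses):
        return None
    combined = frozenset().union(*licenses)
    for lic in licenses:
        if "SA" in lic and lic != combined:
            return None  # the licenses conflict
    return "-".join(e for e in ("BY", "NC", "SA") if e in combined)
```

With these rules, combining `BY` and `BY-NC` data gives `BY-NC`, while combining `BY-SA` and `BY-NC` data gives `None`, a case of the license conflict raised on slide 16.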

  21. Take Home Messages • Curation is important for the CARMEN project and neuroinformatics • To enable repeatability and rerunability, curation of services and curation of data are equally important • To enable federation and autonomy, data release, license and other policies need to be represented so that they can be computed over.

  22. Acknowledgements Professor Colin Ingram, Professor Jim Austin, Professor Leslie Smith, Professor Paul Watson, Dr. Stuart Baker, Professor Roman Borisyuk, Dr. Stephen Eglen, Professor Jianfeng Feng, Dr. Kevin Gurney, Dr. Tom Jackson, Dr. Marcus Kaiser, Dr. Phillip Lord, Dr. Paul Overton, Dr. Stefano Panzeri, Dr. Rodrigo Quian Quiroga, Dr. Simon Schultz, Dr. Evelyne Sernagor, Dr. V. Anne Smith, Dr. Tom Smulders, Professor Miles Whittington, Christoph Echtermeyer, Martyn Fletcher, Frank Gibson, Mark Jessop, Dr. Bojian Liang, Juan Martinez-Gomez, Dr. Chris Mountford, Agah Ogungboye, Georgios Pitsilis, Dr. Daniel Swan (logos: The University of Sheffield, University of St Andrews)
