
Simon J. Coles EPSRC National Crystallography Service School of Chemistry

Presentation Transcript


  1. Enabling the reusability of scientific data: Experiences with designing an open access infrastructure for sharing datasets. Simon J. Coles, EPSRC National Crystallography Service, School of Chemistry, University of Southampton. Usability WS, NeSC, Jan 06

  2. Data & the Publication Problem: 2,000,000; 25,000,000; 450,000

  3. A Different Approach to Data Publication?
     - Intellect & Interpretation
     - Underlying data

  4. Requirements
     - Capture of all digital data and information generated during the course of an experiment
     - Data validation
     - Adding value
     - Archival system for data with attached bibliographic and chemical metadata
     - Automatic report generation
     - Schema and protocols for publication and dissemination of a dataset

  5. Open Access Crystal Structure Archive: ecrystals.chem.soton.ac.uk

  6. Access to the Underlying Data

  7. Publicising Content

  8. Harvesting, Linking and Aggregating

  9. Usability: Quality & Uniformity of Data
     - Different laboratories, practices & instruments present a heterogeneous body of data
     - Publish according to IUCr-ratified schema
     - To support publication according to this schema, a toolbox add-on to the archive has been developed
     - The toolbox requires only 2 mandatory files and can perform file format conversions and generate value-added files

  10. Usability: Ease of Deposition & Metadata Quality
      - Minimal number of manual metadata entries; many can be hardwired into the system
      - Deposition guidelines initially prepared by students to provide impartial feedback
      - Full documentation and in-line help/examples
      - Restricted lists, e.g. keywords
      - Data deposited automatically by toolbox
      - Automated generation of metadata for report and OAI interface
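
The keyword list on this slide can be pictured as a controlled-vocabulary check applied at deposition time. The following is a minimal sketch only: the vocabulary and the `check_keywords` helper are hypothetical and are not taken from the archive software.

```python
# Hypothetical controlled vocabulary; the archive's real keyword list
# is not given in the slides.
ALLOWED_KEYWORDS = {"organic", "inorganic", "organometallic"}


def check_keywords(entered):
    """Split depositor keywords into accepted ones and ones outside the list."""
    normalised = [k.strip().lower() for k in entered]
    accepted = [k for k in normalised if k in ALLOWED_KEYWORDS]
    rejected = [k for k in normalised if k not in ALLOWED_KEYWORDS]
    return accepted, rejected


accepted, rejected = check_keywords(["Organic", " inorganic ", "misc"])
print(accepted, rejected)
```

Rejecting free-text keywords at entry, rather than cleaning them later, is one way to keep the metadata uniform across depositors.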

  11. Usability: Data Validation
      - Peer review removed from self-deposit publication
      - Simple checks for consistency made by the toolbox
      - Checks for crystallographic integrity made through a web service (IUCr, ‘CHECKCIF’)
      - Introduction of data ‘editor’ for the archive; a deposition must be signed off by a recognised professional before going live
      - Quality indicators automatically taken from dataset and presented in HTML jump-off page
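
The validation steps on this slide suggest a simple gate before a dataset goes live. The sketch below is illustrative only: the `Deposition` class, its fields, and the rule that outstanding integrity alerts block publication are all assumptions, and no call to the real checking web service is made.

```python
from dataclasses import dataclass, field


@dataclass
class Deposition:
    """Hypothetical record of one deposited dataset moving through validation."""
    files: dict                                            # filename -> size in bytes
    integrity_alerts: list = field(default_factory=list)   # alerts from an external checker
    signed_off_by: str = ""                                # data editor; empty until sign-off

    def consistency_problems(self):
        """Simple local check: every deposited file must be non-empty."""
        return [name for name, size in self.files.items() if size == 0]

    def can_go_live(self):
        """Live only if local checks pass, no alerts remain, and an editor signed off."""
        return (not self.consistency_problems()
                and not self.integrity_alerts
                and bool(self.signed_off_by))


dep = Deposition(files={"structure.cif": 2048})
print(dep.can_go_live())      # no sign-off yet
dep.signed_off_by = "A. Editor"
print(dep.can_go_live())      # passes all three conditions
```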

  12. Usability: Identifiers
      - URL of deposited dataset provides an identifier
      - Persistent only if the institutional support model is accepted / adopted
      - Signed up to an agency to register metadata relating to datasets with a DOI
      - Pay registry to ensure that DOI always resolves to associated dataset (10 cents to register, 1 cent per annum to maintain)
      - InChI chemical identifier: a unique text descriptor for a molecule
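
The DOI fees quoted on this slide make the running cost easy to estimate. A small sketch, assuming the maintenance fee is charged once per dataset per year from the first year onward (the slide does not say when it starts); the function name is hypothetical.

```python
REGISTER_CENTS = 10   # one-off registration fee per DOI, from the slide
MAINTAIN_CENTS = 1    # maintenance fee per DOI per year, from the slide


def doi_cost_cents(datasets, years):
    """Total registry cost in cents for `datasets` DOIs maintained for `years` years."""
    return datasets * (REGISTER_CENTS + MAINTAIN_CENTS * years)


# 1000 datasets kept resolvable for 10 years: 1000 * (10 + 1 * 10) cents
print(doi_cost_cents(1000, 10))
```

Under these assumptions the registration fee dominates for the first ten years, after which cumulative maintenance overtakes it.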

  13. Usability: Dissemination & Aggregation
      - OAI metadata schema; ratified by IUCr & chemical community
      - OAI covers bibliographic terms; must introduce chemical terms
      - Both library and subject-specific aggregators satisfied
      - Chemical linking: InChI, chemical classifications and restricted keywords list
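
To make the harvesting side concrete, here is a sketch of how an aggregator might form a request, assuming the OAI interface mentioned in the slides is OAI-PMH. The base URL and the `list_records_url` helper are hypothetical; `ListRecords`, `metadataPrefix`, `from` and the `oai_dc` prefix are standard OAI-PMH terms. The code only builds the URL and makes no network call.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; the archive's real OAI base URL is not given in the slides.
BASE_URL = "https://archive.example.org/oai"


def list_records_url(metadata_prefix="oai_dc", from_date=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # selective harvesting by datestamp
    return BASE_URL + "?" + urlencode(params)


print(list_records_url(from_date="2006-01-01"))
```

A subject-specific aggregator would request a richer metadata prefix carrying the chemical terms, while a library aggregator could stay with `oai_dc`.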

  14. Usability: Endorsement
      - Feedback during development from technical publishing arm of IUCr
      - Designed for automatic incorporation into CSD (global database operated by CCDC)
      - Accepted by Executive Committee of IUCr
      - Reuse of data achieved in collaboration with Leverhulme Centre for Molecular Informatics

  15. Usability: Community Uptake
      - Southampton archive about to publish routinely via the archive
      - Five crystallography laboratories in UK agreed to adopt philosophy, install and populate archives
      - CCDC will harvest required data from all archives
      - IUCr will harvest and curate all data
      - Develop aggregator services in collaboration with IUCr

  16. Usability: The Next Challenges
      - Full acceptance by chemical community
      - Validation worries
      - Curation worries
      - The requirement for as many peer-reviewed publications as possible (despite quality)
      - Full acceptance by wider chemistry publishing community
      - Loss of control over underlying data
      - Faith in Open Archives replacing experimental descriptions in articles
      - Development of fully functional aggregator services
