Building a Data Repository to Meet an Institution’s Needs
University of Bath – JISC – Research360
blogs.bath.ac.uk/research360
Catherine Pink, Data Scientist, UKOLN
Open Repositories 2012
Existing Research Infrastructure
• Applied science and engineering research focus
• ‘Small Science’ in collaboration with industry
• Publications Repository – ‘Opus’
  • ePrints, >28,000 journal papers & theses
• Research Information Management (CRIS)
  • Pure
  • Links finance, publications, HR and postgraduate student databases
• File storage for current research data
  • Unstructured, short-term storage
  • Data not accessible by third parties
Why build an Institutional Data Repository?
• Demand for access to publicly funded research
  • “Science as an open enterprise” – Royal Society 2012
  • “Innovation and Research Strategy for Growth” – UK government 2011
• UK funding councils’ data policies
  • RCUK Common Principles on Data Policy
  • EPSRC expectations for data (compliance by May 2015)
• Ability to respond to UK HEI assessment exercises
  • Research Excellence Framework (REF) 2014 and 2020
  • Linking funding with research outputs and impact
Enable researchers to comply with data policies: funder, publisher and institutional
• Requires an archive of the data:
  • Publication in data journals
  • Deposit in a disciplinary data repository
  • Publication on researchers’ own websites
  • ‘Bridge the gap’ where no other data repository is suitable
• At Bath: Evaluate a range of data repository options
  • Modify the existing ePrints or Pure systems?
  • Use an external solution, e.g. Dataflow?
  • Build a bespoke new system?
  • Plan for long-term needs or short-term solutions?
Manage the University’s research data assets
• Link research inputs to research outputs
• Link publications to supporting data
• At Bath: Embed the data repository in existing infrastructure
  • Integrate with the CRIS and publications repository
  • As the CRIS develops, will the data repository be superseded?
• Maintain a register of research data held elsewhere
• Maintain a register of non-digital research data
• At Bath: Enable deposit of metadata stubs
  • Can we capture metadata from external data repositories? (Metadata crosswalk)
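The metadata crosswalk question can be made concrete. A minimal Python sketch, assuming the external repository exposes records as unqualified Dublin Core elements (each mapped to a list of values); the internal field names such as `creators`, `summary` and `held_elsewhere` are hypothetical stand-ins for whatever core schema Bath adopts:

```python
from typing import Any, Dict, List

# Hypothetical crosswalk: Dublin Core element -> internal metadata field.
DC_TO_INTERNAL: Dict[str, str] = {
    "title": "title",
    "creator": "creators",
    "date": "publication_date",
    "identifier": "external_identifier",
    "publisher": "publisher",
    "description": "summary",
    "subject": "keywords",
}

# Internal fields that may hold several values.
MULTI_VALUED = {"creators", "keywords"}


def crosswalk(record: Dict[str, List[str]]) -> Dict[str, Any]:
    """Map a Dublin Core record (element -> list of values) to a metadata stub.

    Anything the crosswalk cannot place is kept under 'unmapped'.
    """
    stub: Dict[str, Any] = {"unmapped": {}}
    for element, values in record.items():
        target = DC_TO_INTERNAL.get(element)
        if target is None:
            # No mapping for this element: keep it for curator review.
            stub["unmapped"][element] = values
        elif target in MULTI_VALUED:
            stub.setdefault(target, []).extend(values)
        else:
            # Single-valued field: take the first value, keep any extras aside.
            if values:
                stub[target] = values[0]
            if len(values) > 1:
                stub["unmapped"][element] = values[1:]
    # The stub describes data held in an external repository.
    stub["held_elsewhere"] = True
    return stub
```

Keeping unmapped elements, rather than discarding them, lets a curator review what the crosswalk could not place.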
Enable data to be discoverable, intelligible and reusable
• At Bath: Developing a web interface
  • Inward-facing, for data deposit
  • Outward-facing, for data searching
• At Bath: Developing a core set of mandatory metadata
  • What schema to use/adapt?
  • Harvest metadata from the CRIS, enabling researchers to focus on descriptive metadata (title, summary, keywords)
  • How to capture sufficient detail to enable re-use? Metadata? Accompanying file? Data publication?
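The idea of harvesting administrative metadata from the CRIS while researchers supply only the descriptive fields can be sketched as follows. The mandatory field list, the field names and the rule that CRIS values are used for every non-descriptive field are all assumptions for illustration; the slide leaves the schema open:

```python
from typing import Dict, List, Set

# Hypothetical core set of mandatory fields.
MANDATORY: Set[str] = {"title", "summary", "keywords", "creators", "project", "funder"}

# Fields the researcher supplies; everything else is harvested from the CRIS.
DESCRIPTIVE: Set[str] = {"title", "summary", "keywords"}


def build_record(cris: Dict[str, object], researcher: Dict[str, object]) -> Dict[str, object]:
    """Combine CRIS-harvested and researcher-supplied metadata.

    Non-descriptive fields come only from the CRIS; descriptive fields
    come only from the researcher.
    """
    record = {k: v for k, v in cris.items() if k not in DESCRIPTIVE}
    record.update({k: v for k, v in researcher.items() if k in DESCRIPTIVE})
    return record


def missing_fields(record: Dict[str, object]) -> List[str]:
    """Return mandatory fields that are absent or empty, sorted for stable output."""
    return sorted(f for f in MANDATORY if not record.get(f))
```

A deposit form could call `missing_fields` before accepting a submission and prompt the researcher only for the descriptive gaps.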
Ensure data can have impact
• Data must be persistent and citable
• Enable researchers to gain recognition for data publication and re-use
• At Bath: The repository will generate persistent URLs for archived data
• At Bath: A recommended data citation will be produced for each dataset
• At Bath: Link with DataCite to produce Digital Object Identifiers (DOIs)
  • What format should the institutional component of the DOI take?
  • If/when to mint DOIs for embargoed data?
  • How to ensure that multiple DOIs are not issued for data deposited elsewhere?
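A recommended citation and identifier could be generated along these lines. This sketch loosely follows the DataCite-recommended citation pattern Creator (PublicationYear): Title. Publisher. Identifier, and resolves DOIs through https://doi.org/. The prefix `10.99999` and the `bath.<year>.<serial>` suffix scheme are hypothetical placeholders for the undecided institutional component, and no call to the DataCite minting service is made:

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical placeholder prefix; a real prefix is assigned via DataCite.
DOI_PREFIX = "10.99999"


@dataclass
class Dataset:
    creators: List[str]
    year: int
    title: str
    publisher: str
    serial: int                          # hypothetical local sequence number
    existing_doi: Optional[str] = None   # DOI already issued elsewhere, if any


def doi_for(ds: Dataset) -> str:
    """Reuse a DOI issued elsewhere; otherwise build a local one.

    The suffix 'bath.<year>.<serial>' is an illustrative scheme only.
    """
    if ds.existing_doi:
        return ds.existing_doi
    return f"{DOI_PREFIX}/bath.{ds.year}.{ds.serial:05d}"


def recommended_citation(ds: Dataset) -> str:
    """Format: Creator (PublicationYear): Title. Publisher. Identifier."""
    creators = "; ".join(ds.creators)
    return f"{creators} ({ds.year}): {ds.title}. {ds.publisher}. https://doi.org/{doi_for(ds)}"
```

Reusing an `existing_doi` recorded at deposit is one simple way to avoid issuing a second DOI for data already registered elsewhere.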
Retain data for mandated periods
• Should all data be retained forever?
• UK funders vary in their requirements for the duration of data archiving
• At Bath: Use the repository to facilitate compliance
• At Bath: Developing data disposal guidelines
  • Need to capture publication date and retention periods in metadata
  • Need to log the date of third-party access to data
  • When retention periods expire, is deletion automatic or flagged for review?
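Capturing publication and third-party access dates makes the retention check mechanical. A minimal sketch, assuming the retention clock restarts at the most recent logged third-party access and that expired data is flagged for review rather than deleted automatically; both are assumptions, since funder rules differ and the slide leaves the question open:

```python
from datetime import date
from typing import List


def add_years(d: date, years: int) -> date:
    """Add whole years, mapping 29 February to 28 February in non-leap years."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def retention_expiry(published: date, retention_years: int,
                     third_party_access: List[date]) -> date:
    """Expiry date, assuming the clock restarts at the latest third-party access."""
    start = max([published, *third_party_access])
    return add_years(start, retention_years)


def disposal_action(published: date, retention_years: int,
                    third_party_access: List[date], today: date) -> str:
    """Return 'retain' before expiry, otherwise flag for review (no automatic deletion)."""
    if today < retention_expiry(published, retention_years, third_party_access):
        return "retain"
    return "flag_for_review"
```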
Protect the interests of subjects of research
• Data on living individuals is covered by the UK Data Protection Act (1998)
• Names and personal details must be removed before data can be published
• At Bath: Data deposit will include a check box to confirm that published data has been anonymised
• At Bath: Publish metadata while restricting access to the underlying data
  • How to capture/publish conditions for access to data?
• At Bath: Investigating how to archive consent forms with the data they accompany
  • Can these be digitised?
  • How to secure access to them?
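The deposit check box and the need to record access conditions could be enforced as a validation step. A sketch with hypothetical field names; note that a ticked box records the depositor’s confirmation and does not itself verify that the data is anonymised:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Deposit:
    contains_human_subjects: bool
    anonymisation_confirmed: bool   # the deposit-form check box
    data_public: bool               # whether the underlying data (not just metadata) is published
    access_conditions: str = ""     # conditions for requesting restricted data


def deposit_problems(dep: Deposit) -> List[str]:
    """Return reasons the deposit cannot be accepted as submitted (empty list means OK)."""
    problems: List[str] = []
    if dep.data_public and dep.contains_human_subjects and not dep.anonymisation_confirmed:
        problems.append("confirm anonymisation before publishing data about research subjects")
    if not dep.data_public and not dep.access_conditions.strip():
        problems.append("state the conditions for access to restricted data")
    return problems
```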
Protect the interests of our research partners
• The ability to publish collaborative research data is determined by project-specific contracts
• Some data may require embargo periods to enable commercialisation of research
• It may be necessary to prevent publication of metadata if its release would damage the potential for commercialisation
• At Bath: Investigating the specific requirements of our industrial partners
  • How to select, during data deposit, whether data and/or metadata can be published?
  • Can we manage access to restricted data so that only key researchers and their external collaborators can (re)use it?
  • Can we automate production of licences for re-use if data ownership is set out in specific contracts?
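Embargoes, suppressed metadata and named collaborators can be combined into a single access decision. A minimal sketch, assuming user identifiers are plain strings and that hiding the metadata also hides the data from anyone not on the authorised list; the `AccessPolicy` fields are hypothetical:

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional, Set


@dataclass
class AccessPolicy:
    # False if releasing even the metadata could damage commercialisation
    metadata_public: bool = True
    # True if only authorised users may ever (re)use the data
    data_restricted: bool = False
    # Data closed to the public until this date (None means no embargo)
    embargo_until: Optional[date] = None
    # Key researchers and their external collaborators (hypothetical identifiers)
    authorised_users: Set[str] = field(default_factory=set)


def is_authorised(policy: AccessPolicy, user: Optional[str]) -> bool:
    return user is not None and user in policy.authorised_users


def can_see_metadata(policy: AccessPolicy, user: Optional[str]) -> bool:
    return policy.metadata_public or is_authorised(policy, user)


def can_download_data(policy: AccessPolicy, user: Optional[str], today: date) -> bool:
    """Authorised users always have access; others only to open, unembargoed data."""
    if is_authorised(policy, user):
        return True
    if policy.data_restricted or not policy.metadata_public:
        return False
    return policy.embargo_until is None or today >= policy.embargo_until
```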
What can’t the institutional data repository do?
• No ability to query individual data points
• No ability to align multiple datasets
• No peer review of data quality
  • Should we issue disclaimers with published data?
• Difficulty handling ‘big data’
Find out more:
www.ukoln.ac.uk/projects/research360
blogs.bath.ac.uk/research360