Looking to the longer term: some perspectives on data curation and preservation

Dr Liz Lyon, DCC Associate Director Outreach; Director, UKOLN, University of Bath, UK. This work is licensed under a Creative Commons Licence Attribution-ShareAlike 2.0.


Presentation Transcript


  1. Looking to the longer term: some perspectives on data curation and preservation • Dr Liz Lyon, DCC Associate Director Outreach; Director, UKOLN, University of Bath, UK • This work is licensed under a Creative Commons Licence Attribution-ShareAlike 2.0

  2. About UKOLN • “a centre of expertise in digital information management” • Funding: Joint Information Systems Committee (JISC) + Museums, Libraries & Archives Council (MLA) • Portfolio of R&D projects: Delos, DRIVER, Grand Challenge • 29+ staff based at the University of Bath • Inform the library, information, education and cultural heritage communities • Policy and advocacy at national level; build innovative Web-based systems & services; R&D; e-journal Ariadne; workshops and conferences • http://www.ukoln.ac.uk/ • Acknowledgement: Alex Ball, Grand Challenge Project

  3. UK Digital Curation Centre • Digital Curation Centre • Funded by JISC & EPSRC • Development activities • Research agenda • Delivering services • Outreach Programme • http://www.dcc.ac.uk/

  4. Overview • Data curation and digital preservation issues • Draw on research and scholarship perspectives • Data / information flows and the “business process” • UK Digital Curation Centre activities “maintaining and adding value to a trusted body of digital information for current and future use”

  5. Reference datasets as infrastructure? Data-centric 2020 vision

  6. (Very simple) Product Research Cycle & Data Curation • (New) knowledge extraction: data mining, modelling, analysis, synthesis • Formulate ideas / hypothesis, test, experiment, observe, design: data creation, collection & capture • Data management, storage & validation: description, deposit, self-archiving, preservation, certification • Adding value: data linking, annotation, visualisation, simulation • Scholarly communications & business transactions: data disclosure, publication, citation, discovery, re-use • [Diagram: the stages form a cycle linked by data processing steps; e-Infrastructure, Open ?? access and Collaboration are also labelled on the diagram]

  7. RepoMMan: Repository Metadata and Management (Hull) using WS-BPEL • Are your engineering workflows identified and described? Workflow e-Scientist desktop? Slide: Carole Goble

  8. Research outputs in institutional repositories: engineering

  9. “JISC Vision”: a global landscape of federated repositories • e-Framework and Information Environment context • Define common + domain-specific + repository “services” • Interoperability based on open standards, software tools • Multi-disciplinary, cross-sectoral • National, institutional • Different platforms • Many format types: data, eprints, images, geospatial • [Diagram: several repositories with heterogeneous metadata formats, content formats, identifiers and packaging standards feed a fusion layer (‘repository federator’), which presents homogeneous metadata formats, content formats, identifiers and packaging standards to several portals] • From Andy Powell: http://www.ukoln.ac.uk/distributed-systems/jisc-ie/arch/presentations/jiie-jcs-2005/

  10. Pilot Engineering Repository Xsearch PerX http://www.engineering.ac.uk/

  11. Interoperability??? STEP ISO10303

  12. Repositories and OAIS Reference Model • “an archive consisting of an organisation of people and systems that has accepted the responsibility to preserve information and make it available for a Designated Community ... an identified group of potential consumers who should be able to understand a particular set of information”

  13. Assuring permanence: digital preservation • Trusted DR Audit Checklist for Certification Draft Research Libraries Group-NARA Taskforce 2005 Defined criteria: • Organisation • Functions, processes & procedures • Designated community & usability • Technologies & technical infrastructure • Revised Checklist based on feedback and pilot audits (KB, BADC) • Self-certification: DINI-Zertifikat: requirements & recommendations: • Server policy / Guidelines • Author support • Legal issues • Authenticity and integrity • Cataloguing • Access statistics • Long-term sustainability • Has your repository / PLM been audited?

  14. Interdisciplinary discovery • Validation, publication & discovery of data models & schema • Harmonisation and normalisation of metadata and semantics • Packaging standards: METS, MPEG-21 DIDL • Formal high-level and domain ontologies • ePrints DC Application Profile http://www.ukoln.ac.uk/repositories/digirep/index/Eprints_Application_Profile • eBank Application Profile crystallography data http://www.ukoln.ac.uk/projects/ebank-uk/schemas/ • What data models and metadata schema are in place?

  15. Persistent identifiers for data citation • How will they be used? We need use cases: depositor, author, service provider, researcher, publisher? • Schemes: DOI, Handle, ARK, PURL • Global identification: express as http URIs • Data citation (human and machine-actionable) • Publication & citation of scientific primary data project National Library for Science & Technology (TIB), University of Hanover, Germany. STD-DOI Project DOI registry for datasets http://www.std-doi.de • Is there a data citation policy? • What persistent identifiers have been assigned to your data?
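The "express as http URIs" point on slide 15 can be illustrated with a minimal Python sketch. It is not part of the original presentation. The resolver hosts (`doi.org`, `hdl.handle.net`, `n2t.net`), the `doi:`/`hdl:`/`ark:` prefixes and the function name `to_http_uri` are assumptions chosen for illustration. The Handle and ARK values in the usage note are made-up placeholders, while the DOI is the one cited on slide 16.

```python
from urllib.parse import quote

# Assumed public resolver prefixes; a given project may use its own resolver.
RESOLVERS = {
    "doi": "https://doi.org/",
    "hdl": "https://hdl.handle.net/",
    "ark": "https://n2t.net/ark:",  # ARKs keep their "ark:" label in the URL
}


def to_http_uri(identifier: str) -> str:
    """Express a prefixed persistent identifier as an http(s) URI.

    Hypothetical helper: accepts 'doi:...', 'hdl:...' or 'ark:...' strings.
    Identifiers that are already http(s) URIs (e.g. PURLs) pass through.
    """
    ident = identifier.strip()
    if ident.lower().startswith(("http://", "https://")):
        return ident
    scheme, sep, value = ident.partition(":")
    scheme = scheme.lower()
    if not sep or not value or scheme not in RESOLVERS:
        raise ValueError(f"unsupported identifier: {identifier!r}")
    # Percent-encode unusual characters but keep the '/' separators.
    return RESOLVERS[scheme] + quote(value, safe="/")
```

For example, `to_http_uri("doi:10.1039/b502828k")` gives a URI under `https://doi.org/`, and a placeholder ARK such as `ark:/12345/example` maps to a URI under `https://n2t.net/`. A single http form lets both a human reader and a machine client dereference a data citation in the same way.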

  16. Discovering data: eBank Project • Domain identifier: International Chemical Identifier (InChI) code • Google molecule using InChI • Domain identifiers for engineering? • Slide from Simon Coles • Reference: Coles, S.J., Day, N.E., Murray-Rust, P., Rzepa, H.S., Zhang, Y., Org. Biomol. Chem., 2005, (10), 1832-1834. DOI: 10.1039/b502828k

  17. Format migration challenges? CAD Program Compatibility Chart http://www.okino.com/conv/filefrmt_cad.htm

  18. Registry development

  19. Development: Representation Information Registry Repository • “DCC Approach to Digital Curation” based on OAIS • Representation Information Registry Repository • Prototype demonstrator: based on 2 key concepts to facilitate sharing of the curation effort • Curation Persistent Identifier (CPID) • Descriptive “label” (structural, semantic, other metadata) • Development of (M2M) tools and interfaces for creating, using and re-using representation information • http://dev.dcc.ac.uk Wiki and email list • EU CASPAR Integrated Project http://www.casparpreserves.info/pages/1/index.htm • Task Force on the Permanent Access to the Records of Science http://tfpa.kb.nl/

  20. Registry API • Allows applications to talk to many different registry implementations, e.g. GDFR, PRONOM, UDDI • GUI access and via Web browser http://registry.dcc.ac.uk
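The idea of one API in front of many registry implementations can be sketched as a small adapter pattern in Python. This is an illustrative assumption only, not the DCC Registry API. Every class, method and record below (`FormatRegistry`, `InMemoryRegistry`, `RegistryClient`, `lookup`, the sample entries) is hypothetical. The in-memory registries stand in for real back ends such as those named on the slide.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, Optional


class FormatRegistry(ABC):
    """Hypothetical common interface that each registry adapter implements."""

    name: str

    @abstractmethod
    def lookup(self, format_id: str) -> Optional[Dict[str, str]]:
        """Return a record for format_id, or None if this registry lacks it."""


class InMemoryRegistry(FormatRegistry):
    """Stand-in back end; a real adapter would call a remote registry."""

    def __init__(self, name: str, records: Dict[str, Dict[str, str]]) -> None:
        self.name = name
        self._records = records

    def lookup(self, format_id: str) -> Optional[Dict[str, str]]:
        return self._records.get(format_id)


class RegistryClient:
    """Single entry point that queries each configured registry in order."""

    def __init__(self, registries: Iterable[FormatRegistry]) -> None:
        self._registries = list(registries)

    def lookup(self, format_id: str) -> Optional[Dict[str, str]]:
        for registry in self._registries:
            record = registry.lookup(format_id)
            if record is not None:
                # Tag the record with the registry that answered.
                return {"registry": registry.name, **record}
        return None


# Made-up sample data for illustration only.
client = RegistryClient([
    InMemoryRegistry("registry-a", {"fmt-1": {"label": "Example format one"}}),
    InMemoryRegistry("registry-b", {"fmt-2": {"label": "Example format two"}}),
])
```

An application then calls `client.lookup("fmt-2")` without knowing which back end holds the record. Supporting a further registry means writing one more adapter, and the calling code stays unchanged.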

  21. Adding value through annotation • Research at the University of Edinburgh • Scientific databases: Annotation scoping report • New annotation model + prototype MONDRIAN • Intuitive visual interface iMONDRIAN • Annotate sets of values • Support for querying annotations

  22. Knowledge extraction • Mining (data, text, structures) • Modelling (economic, climate, mathematical, biological…) • Analysis (statistical, lexical, gene…) • NaCTeM http://www.nactem.ac.uk/ • Emerging tools: TerMine, GENIA, Cafetiere • OTMI: Open Text Mining Interface (Nature, 23 March 2006)

  23. Supporting the community: Services • HELPDESK@dcc.ac.uk • legal - technical guidance • Curation Manual 45 chapters planned • Metadata (umbrella) • Open Source • Archival metadata • Preservation metadata • Selection & appraisal • Curating emails • Briefing Papers • Curating emails • Digital repositories • Geospatial data • Data protection • eScience data • Case studies

  24. DCC Case Study published: Wide Field Astronomy Unit

  25. Supporting the community: Outreach & Services • Workshops: • Geospatial data, NeSC, 27 October • OAIS 5 year Review, October • Audit & Certification Forum, October • Records Management, Liverpool, 30 Nov • Curation & Preservation Training, Dec • 2007 Preservation of journals (tbc) • 2007 Legal environment (tbc) • 2007 Preparing for audit (tbc) • Information Days: British Library, Liverpool, UCL • 2nd International DCC Conference, 21-22 November, Glasgow • Keynotes: Hans F. Hoffmann (CERN), Clifford Lynch (CNI)

  26. DCC Phase 2: 2007-2010 • Working more closely with data centres, e-Science Programmes and Research Councils • SCARP Project: disciplinary approach • JISC Digital Repository Programme collaboration • RepInfo Registry service migration • Define self-assessment procedures and tools • Collaborate with CASPAR, DPE and PLANETS (EU-funded Digital Preservation Projects) • Workshop Programme, International Conference 2007

  27. Thank you. Questions? • e.lyon@ukoln.ac.uk • Join the DCC Associates Network at www.dcc.ac.uk
