Beyond text: New roles for libraries DataCite –


Presentation Transcript


  1. Beyond text: New roles for libraries DataCite – Persistent links to scientific data using the DOI system Jan Brase, DataCite

  2. Science Paradigms • Thousand years ago: science was empirical, describing natural phenomena • Last few hundred years: a theoretical branch using models and generalizations • Last few decades: a computational branch simulating complex phenomena • Today: data exploration (eScience) unifies theory, experiment, and simulation (Jim Gray, eScience Group, Microsoft Research)

  3. Consequences for Libraries • Scientific information is more than a journal article or a book • Libraries should open their catalogues to any kind of information • The catalogue of the future is NOT ONLY a window to the library's holdings, but • A portal in a net of trusted providers of scientific content

  4. We do not have it, BUT we know where you can find it, and here is the link to it!

  5. Including non-classical publications • Scientific Films • 3D Objects • Software • Simulation • Research Data • Grey Literature

  6. Why is this a role for libraries? • Libraries have a history of bringing scientific information to the public • Libraries have a tendency to be persistent • A project will be forgotten in 40 years, the library will very likely still exist then • Libraries are very trustworthy organisations

  7. DataCite

  8. What if any kind of scientific content would be citable? • High visibility of the content • Easy re-use and verification • Scientific reputation for the collection and documentation of content (Citation Index) • Encouraging the Brussels declaration on STM publishing • Avoiding duplications • Motivation for new research

  9. DOI names for citations • URLs are not persistent (e.g. Wren JD: URL decay in MEDLINE - a 4-year follow-up study. Bioinformatics. 2008, Jun 1;24(11):1381-5) • Digital Object Identifiers (DOI names) offer a solution • Most widely used identifier for scientific articles • Researchers, authors, publishers know how to use them • Put datasets on the same playing field as articles • Dataset: Yancheva et al (2007). Analyses on sediment of Lake Maar. PANGAEA. doi:10.1594/PANGAEA.587840
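A citation such as the dataset reference above only becomes a persistent link once the DOI name is handed to a resolver. The following is a minimal Python sketch of that step, assuming the DOI proxy at http://dx.doi.org that the talk names in its content-service slide; the function name `doi_to_url` is hypothetical and only for illustration.

```python
from urllib.parse import quote

# Proxy host taken from the talk; the helper name below is illustrative only.
DOI_PROXY = "http://dx.doi.org/"


def doi_to_url(doi_name: str) -> str:
    """Turn a DOI name such as 'doi:10.1594/PANGAEA.587840' into a proxy URL."""
    name = doi_name.strip()
    # Citations often carry a 'doi:' prefix that is not part of the name itself.
    if name.lower().startswith("doi:"):
        name = name[4:]
    # Every DOI name starts with '10.' and has a prefix/suffix separated by '/'.
    if not name.startswith("10.") or "/" not in name:
        raise ValueError(f"not a DOI name: {doi_name!r}")
    # Keep '/' literal and percent-encode anything unsafe in a URL path.
    return DOI_PROXY + quote(name, safe="/")


print(doi_to_url("doi:10.1594/PANGAEA.587840"))
```

The point of the indirection is that the citation stores only the DOI name; the location behind it can change without the citation breaking.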

  10. DataCite • Global consortium carried by local institutions • Focused on improving the scholarly infrastructure around datasets and other non-textual information • Focused on working with data centres and organisations that hold content • Providing standards, workflows and best practices • Initially, but not exclusively, based on the DOI system • Founded December 1st 2009 in London

  11. DataCite structure [Diagram; labels: International DOI Foundation, Member, Managing Agent (TIB), DataCite, Carries, Associate Stakeholder, Works with, Member Institution (2 boxes), Data Centre (6 boxes)]

  12. What if any kind of scientific content would be citable? • High visibility of the content • Easy re-use and verification • Scientific reputation for the collection and documentation of content (Citation Index) • Encouraging the Brussels declaration on STM publishing • Avoiding duplications • Motivation for new research

  13. DOI names for citations • URLs are not persistent (e.g. Wren JD: URL decay in MEDLINE - a 4-year follow-up study. Bioinformatics. 2008, Jun 1;24(11):1381-5) • Digital Object Identifiers (DOI names) offer a solution • Most widely used identifier for scientific articles • Researchers, authors, publishers know how to use them • Put datasets on the same playing field as articles • Dataset: Yancheva et al (2007). Analyses on sediment of Lake Maar. PANGAEA. doi:10.1594/PANGAEA.587840

  14. How to achieve this? • Science is global • It needs global standards • Global workflows • Cooperation of global players • Science is carried out locally • By local scientists • Being part of local infrastructures • Having local funders

  15. DataCite • Global consortium carried by local institutions • Focused on improving the scholarly infrastructure around datasets and other non-textual information • Focused on working with data centres and organisations that hold content • Providing standards, workflows and best practices • Initially, but not exclusively, based on the DOI system • Founded December 1st 2009 in London

  16. DataCite members • Technische Informationsbibliothek (TIB) • Canada Institute for Scientific and Technical Information (CISTI) • California Digital Library, USA • Purdue University, USA • Office of Scientific and Technical Information (OSTI), USA • Library of the TU Delft, The Netherlands • Technical Information Center of Denmark • The British Library • ZB Med, Germany • ZBW, Germany • Gesis, Germany • Library of the ETH Zürich • L’Institut de l’Information Scientifique et Technique (INIST), France • Swedish National Data Service (SND) • Australian National Data Service (ANDS) • Conferenza dei Rettori delle Università Italiane (CRUI) • National Research Council of Thailand (NRCT) • Hungarian Academy of Sciences. Affiliated members: • Digital Curation Center (UK) • Microsoft Research • Interuniversity Consortium for Political and Social Research (ICPSR) • Korea Institute of Science and Technology Information (KISTI) • Beijing Genomics Institute (BGI) • Institute of Electrical and Electronics Engineers (IEEE) • Harvard University Library • World Data System (WDS) • GWDG

  17. What type of data are we talking about? • Earthquake events => doi:10.1594/GFZ.GEOFON.gfz2009kciu • Climate models => doi:10.1594/WDCC/dphase_mpeps • Sea bed photos => doi:10.1594/PANGAEA.757741 • Distributed samples => doi:10.1594/PANGAEA.51749 • Medical case studies => doi:10.1594/eaacinet2007/CR/5-270407 • Computational model => doi:10.4225/02/4E9F69C011BC8 • Audio record => doi:10.1594/PANGAEA.339110 • Grey Literature => doi:10.2314/GBV:489185967 • Videos => doi:10.3207/2959859860 • Anything that is the foundation of further research is research data • Data is evidence

  18. DataCite in 2013 • Over 2,00,000 DOI names registered so far. • 262 data centers. • 5,600,000 resolutions in 2013 so far. • DataCite Metadata schema published (in cooperation with all members) http://schema.datacite.org • DataCite MetadataStore • http://search.datacite.org

  19. DataCite search • Search term: * • Search term: uploaded:[NOW-7DAY TO NOW] • Search term: relatedIdentifier:* • Search term: relatedIdentifier:issupplementto\:10.1029* • Search term: relatedIdentifier:*\:10.1055*
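The search terms above follow a `field:value` pattern in which a colon inside the value is escaped with a backslash (the syntax looks Lucene-style, though the slide does not say so). As a rough sketch, the terms can be assembled and made URL-safe as below; the helper name `field_query` is hypothetical, and no particular search endpoint or parameter name is confirmed by the talk, so `q` is only an assumption.

```python
from urllib.parse import parse_qs, urlencode


def field_query(field: str, value: str) -> str:
    """Build 'field:value', escaping ':' inside the value as on the slide."""
    return f"{field}:{value.replace(':', chr(92) + ':')}"  # chr(92) is a backslash


# Datasets that are supplements to articles with DOI prefix 10.1029
supplements = field_query("relatedIdentifier", "issupplementto:10.1029*")
# Datasets related in any way to DOI names with prefix 10.1055
related = field_query("relatedIdentifier", "*:10.1055*")

print(supplements)
print(related)

# Percent-encode the term before placing it in a URL.
# The parameter name 'q' is an assumption, not taken from the talk.
encoded = urlencode({"q": supplements})
print(encoded)
# Decoding the parameter gives back the original search term.
assert parse_qs(encoded)["q"][0] == supplements
```

Escaping matters because an unescaped colon would be read as a second field separator rather than as part of the relation-and-DOI value.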

  20. OAI and Statistics • OAI Harvester • http://oai.datacite.org • DataCite statistics (resolution and registration) • http://stats.datacite.org
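Assuming the harvesting interface speaks standard OAI-PMH, requests are plain HTTP GETs with a `verb` parameter. The sketch below only builds such request URLs; `Identify`, `ListRecords` and the `oai_dc` metadata prefix come from the OAI-PMH protocol itself, while the `/oai` path on the host and the helper name `oai_request` are assumptions for illustration.

```python
from urllib.parse import urlencode

# Host from the slide; the '/oai' path is an assumption, not confirmed by the talk.
BASE_URL = "http://oai.datacite.org/oai"


def oai_request(base_url: str, verb: str, **arguments: str) -> str:
    """Build an OAI-PMH request URL: the verb first, then its arguments."""
    return f"{base_url}?{urlencode({'verb': verb, **arguments})}"


# Ask the repository to describe itself.
print(oai_request(BASE_URL, "Identify"))
# Harvest records as unqualified Dublin Core, the format every OAI-PMH
# repository is required to support.
print(oai_request(BASE_URL, "ListRecords", metadataPrefix="oai_dc"))
```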

  21. DataCite Content Service • Service for displaying DataCite metadata • Different formats (BibTeX, RIS, RDF, etc.) • Content negotiation (through MIME type) • Access through DOI proxy (http://dx.doi.org) • First implemented by CNRI and CrossRef • Documentation: http://www.crosscite.org/cn/

  22. Content negotiation • Optimized for m2m (machine-to-machine) communication using the Accept header of the HTTP protocol • curl -L -H "Accept: MIME_TYPE" http://dx.doi.org/DOI • Try out a shortcut in any web browser: http://data.datacite.org/MIME_TYPE/DOI • http://data.crossref.org/DOI
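The curl call above can be mirrored in a few lines of Python. This is a minimal sketch using only the standard library; the function names are hypothetical, the DOI and MIME type are the ones used in the talk's later examples, and the hosts are those given on the slides (they may not answer the same way today). The network call is left commented out so the block runs offline.

```python
import urllib.request


def negotiated_request(doi: str, mime_type: str,
                       proxy: str = "http://dx.doi.org/") -> urllib.request.Request:
    """Same request as: curl -L -H "Accept: MIME_TYPE" http://dx.doi.org/DOI"""
    return urllib.request.Request(proxy + doi, headers={"Accept": mime_type})


def shortcut_url(doi: str, mime_type: str,
                 base: str = "http://data.datacite.org/") -> str:
    """Browser-friendly form from the slide: the MIME type moves into the path."""
    return f"{base}{mime_type}/{doi}"


req = negotiated_request("10.5524/100005", "application/rdf+xml")
print(req.full_url, "|", req.get_header("Accept"))
print(shortcut_url("10.5524/100005", "application/rdf+xml"))

# To actually fetch the metadata (needs network access; urlopen follows
# redirects, which is what the -L flag does for curl):
# with urllib.request.urlopen(req) as response:
#     print(response.read().decode("utf-8"))
```

The same DOI therefore serves two audiences: a browser is sent to the landing page, while a client that asks for a metadata format receives the metadata instead.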

  23. Resolving to the citation • http://data.datacite.org/application/x-datacite+text/10.5524/100005 returns: Li, J; Zhang, G; Lambert, D; Wang, J (2011): Genomic data from Emperor penguin. GigaScience. http://dx.doi.org/10.5524/100005

  24. Resolving to the RDF metadata http://data.datacite.org/application/rdf+xml/10.5524/100005 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:owl="http://www.w3.org/2002/07/owl#" xmlns:j.0="http://purl.org/dc/terms/" > <rdf:Description rdf:about="http://dx.doi.org/10.5524/100005"> <j.0:identifier>10.5524/100005</j.0:identifier> <j.0:creator>Li, J</j.0:creator> <j.0:creator>Zhang, G</j.0:creator> <j.0:creator>Wang, J</j.0:creator> <owl:sameAs>doi:10.5524/100005</owl:sameAs> <owl:sameAs>info:doi/10.5524/100005</owl:sameAs> <j.0:publisher>GigaScience</j.0:publisher> <j.0:creator>Lambert, D</j.0:creator> <j.0:date>2011</j.0:date> <j.0:title>Genomic data from the Emperor penguin (Aptenodytes forsteri)</j.0:title> </rdf:Description></rdf:RDF>
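To show what a machine can do with this response, here is a short sketch that parses the RDF/XML from the slide with Python's standard library and pulls out the title, publisher and creators. The XML is copied from the slide and only re-indented; the local prefix `dct` is my own choice, since the parser matches on namespace URIs rather than on the `j.0` prefix used in the response.

```python
import xml.etree.ElementTree as ET

# RDF/XML as shown on the slide, re-indented for readability.
RDF_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:j.0="http://purl.org/dc/terms/">
  <rdf:Description rdf:about="http://dx.doi.org/10.5524/100005">
    <j.0:identifier>10.5524/100005</j.0:identifier>
    <j.0:creator>Li, J</j.0:creator>
    <j.0:creator>Zhang, G</j.0:creator>
    <j.0:creator>Wang, J</j.0:creator>
    <owl:sameAs>doi:10.5524/100005</owl:sameAs>
    <owl:sameAs>info:doi/10.5524/100005</owl:sameAs>
    <j.0:publisher>GigaScience</j.0:publisher>
    <j.0:creator>Lambert, D</j.0:creator>
    <j.0:date>2011</j.0:date>
    <j.0:title>Genomic data from the Emperor penguin (Aptenodytes forsteri)</j.0:title>
  </rdf:Description>
</rdf:RDF>"""

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Prefixes here are local to this script; only the URIs have to match.
NS = {"rdf": RDF_NS, "dct": "http://purl.org/dc/terms/"}

root = ET.fromstring(RDF_XML)
description = root.find("rdf:Description", NS)

subject = description.get(f"{{{RDF_NS}}}about")  # the resource being described
title = description.findtext("dct:title", namespaces=NS)
publisher = description.findtext("dct:publisher", namespaces=NS)
creators = [c.text for c in description.findall("dct:creator", NS)]  # document order

print(subject)
print(title)
print(publisher)
print(creators)
```

For anything beyond a quick look, a dedicated RDF library would be the more robust choice, since RDF/XML allows several serializations of the same statements.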

  25. Example of use • This allows persistent identification of RDF statements! • Implemented for all of the more than 45 million CrossRef and DataCite DOI names • Example of use: DOI Citation Formatter, http://www.crosscite.org/citeproc/

  26. 2012: STM, CrossRef and DataCite Joint Statement 1. To improve the availability and findability of research data, the signers encourage authors of research papers to deposit researcher-validated data in trustworthy and reliable data archives. 2. The signers encourage data archives to enable bi-directional linking between datasets and publications by using established and community-endorsed unique persistent identifiers such as database accession codes and DOIs. 3. The signers encourage publishers and data archives to make visible, or increase the visibility of, these links from publications to datasets and vice versa.

  27. Example The dataset: Storz, D et al. (2009): Planktic foraminiferal flux and faunal composition of sediment trap L1_K276 in the northeastern Atlantic. http://dx.doi.org/10.1594/PANGAEA.724325 Is supplement to the article: Storz, David; Schulz, Hartmut; Waniek, Joanna J; Schulz-Bull, Detlef; Kucera, Michal (2009): Seasonal and interannual variability of the planktic foraminiferal flux in the vicinity of the Azores Current. Deep-Sea Research Part I-Oceanographic Research Papers, 56(1), 107-124, http://dx.doi.org/10.1016/j.dsr.2008.08.009

  28. Next steps • ODIN project with ORCID • http://datacite.labs.orcid-eu.org/ • MoU with Thomson Reuters to cooperate on the Data Citation Index • DataCite plugin for the next DSpace release (early 2014)

  29. Let us get back to libraries

  30. The wave • Growth of information • User requirements, e.g. Science 2.0, collaborative networks, social media • Diversity of media types and formats

  31. A threat? • Information overload is only a problem for manual curation. • Google is not complaining about the data deluge; they’re constantly trying to get more data. • The more data you throw at it, the better the filter gets. • Developing and maintaining these tools is a classical task for libraries! • Don’t turn off the taps, build boats.

  32. It is not only a challenge … • … it is an opportunity • Libraries should ride the wave …

  33. Thank you!
