
ORNL DAAC Experience With Digital Object Identifiers (DOIs)

Presentation Transcript


  1. ORNL DAAC Experience With Digital Object Identifiers (DOIs) Bruce Wilson, ORNL DAAC Manager, for NASA Data Center Managers telecon, 22 Feb 2010

  2. Acknowledgements and Sources • Bob Cook, ORNL DAAC Scientist • DataONE Core CI team, particularly Matt Jones (UCSB) and Dave Vieglais (U Kansas) • ESIP Product & Stewardship, particularly Ruth Duerr (NSIDC) and Bob Downs (SEDAC) • Note: ORNL’s CDIAC has started assigning DOIs for all of their finalized data sets.

  3. ORNL DAAC Citation Policy • http://daac.ornl.gov/citation_policy.html • http://daac.ornl.gov/citation_style.html • Citation is in the name of the investigators • Example (with DOI): • Turner, D.P., W.D. Ritts, and M. Gregory. 2006. BigFoot NPP Surfaces for North and South American Sites, 200-2004. Data set. Available from Oak Ridge National Laboratory Distributed Active Archive Center, Oak Ridge, Tennessee, U.S.A. [http://daac.ornl.gov]. doi:10.3334/ORNLDAAC/750
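
  [Editorial note, not from the slides] The citation style above lends itself to being assembled from dataset metadata. A minimal Python sketch follows; the function and field names are hypothetical, and the example values are abbreviated from the slide's example citation.

    def format_citation(authors, year, title, doi):
        """Assemble an ORNL DAAC-style data set citation string (illustrative only)."""
        return (
            f"{authors}. {year}. {title}. Data set. Available from "
            "Oak Ridge National Laboratory Distributed Active Archive Center, "
            "Oak Ridge, Tennessee, U.S.A. [http://daac.ornl.gov]. "
            f"doi:{doi}"
        )

    # Values abbreviated from the example citation on this slide
    print(format_citation(
        "Turner, D.P., W.D. Ritts, and M. Gregory",
        2006,
        "BigFoot NPP Surfaces for North and South American Sites",
        "10.3334/ORNLDAAC/750",
    ))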

  4. What Problem Are We Addressing? • ORNL DAAC has used data citations for many years • Track use of data in literature (impact) • Provide credit to investigators • Create incentives for publishing and sharing data • Some journal editors rejected URL citations • Regarded as transient (very valid concern) • Some scientists didn’t see data as “publication” • We want data sets listed on CVs • Strong way to measure impact of data set for tenure

  5. What Is a DOI? • Technically, it’s a particular Handle implementation • Limited number of registrars • Each publisher gets a prefix (e.g. 10.3334) • Publisher assigns an identifier after the prefix • Publisher registers the DOI with a URL and metadata • Endpoint URL can be updated as systems evolve • Registration can include back-links (documents cited) • Enables citation chain • Can help establish dependence of data sets (future use) • DOI resolves at use time to current endpoint URL • http://dx.doi.org/10.3334/ORNLDAAC/945 • doi:10.3334/ORNLDAAC/945
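
  [Editorial note, not from the slides] A minimal sketch of the resolution step described above: ask the dx.doi.org proxy to resolve the example DOI from this slide and report the endpoint URL it currently lands on. The helper name is invented for this example.

    import urllib.request

    def resolve_doi(doi):
        """Follow the DOI proxy's redirects and return the current endpoint URL."""
        with urllib.request.urlopen("http://dx.doi.org/" + doi) as response:
            return response.geturl()

    # Example DOI from this slide; prints whatever URL it resolves to today
    print(resolve_doi("10.3334/ORNLDAAC/945"))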

  6. ORNL Experience • Working with CrossRef as a registrar • $500/year membership fee, ~$250 to register 900 DOIs • Our DOIs resolve to a web page about the dataset • Very positive reaction from investigators • Makes usage metrics somewhat easier • Haven’t implemented backlinking yet, but should • It’s a social contract that we don’t change the data • Updated dataset ==> new DOI (if “significant”) • Minor updates (spelling corrections, clarifications) OK • Adding a new data format file is harder to decide

  7. Different types of update operations • Correct reference or spelling in documentation • No change in DOI, but still should show provenance • Augment documentation for clarity • No change in DOI, but still should show provenance • Add copy of data in new format • Probably no change in DOI, but still should show provenance • Correct error in data • New DOI; show provenance • Append new data • New DOI; show provenance
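
  [Editorial note, not from the slides] The update rules on this slide amount to a small decision table. The sketch below restates that table in Python purely as an illustration; the category names and function are hypothetical, not part of any ORNL DAAC system.

    # (needs_new_doi, record_provenance) for each kind of update, per the slide
    UPDATE_RULES = {
        "correct_documentation_reference": (False, True),
        "augment_documentation":           (False, True),
        "add_copy_in_new_format":          (False, True),  # "probably" no new DOI
        "correct_data_error":              (True,  True),
        "append_new_data":                 (True,  True),
    }

    def doi_action(update_type):
        """Return (needs_new_doi, record_provenance) for a proposed update."""
        return UPDATE_RULES[update_type]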

  8. DOIs work well for some things • Finalized datasets (ones that don’t change) • Datasets that change occasionally • Global Fire emission dataset updated annually • Documents (best practices, product documentation) • Could work for Remote Sensing at the product level

  9. DOIs less appropriate for other things • Cost (primarily) prohibits assigning for granules • Unique IDs needed, but may be data-center-internal • DOIs are a publishing standard, adapting for data ID • Dynamically generated and streaming data • One DOI per MODIS product probably makes sense • Desirable to be able to reproduce data, but hard • MODIS subsetter (particularly considering reprocessed granules) • Would have to have a separate identifier for each request • Other processing tools, like OGC web services • Possibly use data citations with workflow provenance • Partition data citation from data reproducibility
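
  [Editorial note, not from the slides] One way to read the subsetter point above: keep a single product-level DOI for citation and give each dynamically generated request a data-center-internal identifier, recorded alongside the parameters needed to reproduce it. The sketch below illustrates that pattern; the ID scheme and field names are invented for this example, and the product DOI is the example DOI from slide 5.

    import uuid
    from datetime import datetime, timezone

    PRODUCT_DOI = "10.3334/ORNLDAAC/945"   # one citable DOI per product, not per granule

    def new_request_id():
        """Internal (non-DOI) identifier for a single subsetter request."""
        return "SUBSET-" + uuid.uuid4().hex

    # Record enough to reproduce the request alongside the citable product DOI
    request_record = {
        "product_doi": PRODUCT_DOI,
        "request_id": new_request_id(),
        "requested_at": datetime.now(timezone.utc).isoformat(),
        "parameters": {},  # subsetting parameters would go here
    }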

  10. Good citations help assess dependence • Synthesis is increasingly important science • Are all of the data used in the study independent? • Example: Luyssaert et al. Net Primary Productivity (NPP) • Data at ORNL DAAC (doi:10.3334/ORNLDAAC/949) • Article at doi:10.1111/j.1365-2486.2007.01439.x • Drawn from many sources (very well documented) • ftp://ftp.daac.ornl.gov/data/global_vegetation/forest_carbon_flux/comp/appendix_a_database_sources.pdf • Future work using the Luyssaert dataset can’t compare it to any of the underlying data • Also an issue in cal-val for remote sensing • What data was used for this RS product?

  11. Data Identifiers are evolving • DataCite.org (German Library + others, including CDL) • Particularly focused on research data • Life Science Identifiers (LSID) • Heavily used in oceans community • Some concerns about URN versus URI • See http://en.wikipedia.org/wiki/LSID • Globally Unique Identifiers (GUIDs) • Need some type of resolution mechanism • Big challenge to support something “forever”

  12. Impact Metrics • “Cited” means formal citation in reference list • “Referred” means the data was acknowledged somewhere in the body of the paper
