
DataCite, DataONE, Dryad and UC3

William Michener (DataONE and University of New Mexico); John Kunze and Patricia Cruse (University of California Curation Center (UC3), California Digital Library and DataONE); Ryan Scherle (Dryad (National Evolutionary Synthesis Center) and DataONE).


Presentation Transcript


  1. DataCite, DataONE, Dryad and UC3
     William Michener, DataONE and University of New Mexico
     John Kunze and Patricia Cruse, University of California Curation Center (UC3), California Digital Library and DataONE
     Ryan Scherle, Dryad (National Evolutionary Synthesis Center) and DataONE

  2. A Choice
     Roberto Rizzato
     If the scientific record is at risk:
     • Results can’t be reproduced
     • Science fails, global catastrophe ensues
     The choice: Better data publishing, sharing, and archiving OR Planetary destruction?

  3. A Vision for Change: DataONE
     Providing universal access to data about life on earth and the environment that sustains it
     1. Build on existing cyberinfrastructure
        • engaging the scientist in the data curation process
        • supporting the full data life cycle
        • encouraging data stewardship and sharing
        • promoting best practices
        • engaging citizens
        • developing domain-agnostic solutions
     2. Create new cyberinfrastructure
     3. Support new communities of practice

  4. DataONE Cyberinfrastructure
     Flexible, scalable, sustainable network
     • Member Nodes
       • diverse institutions
       • serve local community
       • provide resources for managing their data
     • Coordinating Nodes
       • retain complete metadata catalog
       • subset of all data
       • perform basic indexing
       • provide network-wide services
       • ensure data availability (preservation)
       • provide replication services

  5. DataONE Wish List for Data Citation
     • Precise identification of a dataset
       • At level of version, file, table, cell, etc., or groups thereof
       • So that readers can find and understand the data
     • Credit to data producers and data publishers
       • Vital incentive for data sharing and archiving
     • A link from the traditional literature to the data
       • Gives intellectual legitimacy to creation of data sets
     • Research metrics for datasets
       • Sponsors want publication and retention numbers
     • Coordinated citation support for local data producers, regional archives, and global end-users

  6. Identifier Requirements
     To accommodate a diverse set of member nodes that hold a wide variety of content, the DataONE system must adhere to the following principles:
     • Agnosticism – DataONE supports all identifier schemes where the ID can be represented as a Unicode string.
     • Opacity – DataONE does not attach any meaning or resolution protocol based on the identifier.
     • Authority – The identifier first assigned by a member node is authoritative. Other identifiers may be assigned by other nodes for internal use.

  7. Identifier Requirements
     To participate in the DataONE network, a node must be able to meet the following requirements:
     • Uniqueness – Identifiers must be unique across the space of DataONE.
     • Granularity – Every item must be assigned an identifier (metadata as well as data).
     • Immutability – The object referenced by an identifier cannot change. If an object is modified, it must receive a new identifier.
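The uniqueness and immutability requirements on slides 6 and 7 can be illustrated with a minimal Python sketch. The class `IdentifierRegistry` and its `register` method are hypothetical names invented for this example; they are not part of any DataONE API. The sketch treats identifiers as opaque Unicode strings and assumes a SHA-256 checksum is an acceptable stand-in for "the object has not changed".

```python
import hashlib


class IdentifierRegistry:
    """Hypothetical in-memory registry; not DataONE's actual implementation."""

    def __init__(self) -> None:
        # Maps each opaque identifier string to the checksum of its object.
        self._checksums: dict[str, str] = {}

    def register(self, identifier: str, content: bytes) -> None:
        # Agnosticism/opacity: any non-empty Unicode string is accepted,
        # and the registry never parses or interprets it.
        if not isinstance(identifier, str) or not identifier:
            raise ValueError("identifier must be a non-empty string")
        digest = hashlib.sha256(content).hexdigest()
        existing = self._checksums.get(identifier)
        if existing is None:
            self._checksums[identifier] = digest
        elif existing != digest:
            # Uniqueness + immutability: a modified object needs a new identifier.
            raise ValueError(f"{identifier!r} already refers to different content")

    def __contains__(self, identifier: str) -> bool:
        return identifier in self._checksums


registry = IdentifierRegistry()
registry.register("doi:10.5061/dryad.20", b"original data")
registry.register("doi:10.5061/dryad.20.2", b"revised data")  # new id for new content
```

Re-registering the same identifier with identical bytes is treated as a no-op here; that is a design choice of the sketch, not something the slides specify.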

  8. Think Big, Start Small
     CDL leading 2 projects involving DataONE:
     • EZID for simple identifier management
       • Creates ids, stores metadata and resolver target URLs
       • Supports DataCite DOIs and lower-cost ids (ARKs, URLs)
       • First customer is DataONE member, Dryad
     • Excel “add-in” project with MS Research
       • Extend Excel to support data sharing, archiving, and access
       • E.g., ability to export to data archive in a standard format with column headings drawn from a shared vocabulary

  9. DataONE/DataCite Example
     [Workflow diagram; only its labels survive in the transcript.]
     Components:
     • DOI resolver and TIB registration
     • EZID resolver and registration service
     • DataCite Member (e.g., CDL)
     • DataONE Coordinating Node metadata catalog (e.g., UNM or UCSB)
     • DataONE Member Node data archive (e.g., Dryad)
     • Research scientist
     • (opt) CDL-hosted EZID id minting service
     Arrow labels:
     • get unique id string (appears twice)
     • data + metadata
     • 2. metadata + URL + id
     • 3. citation + URL + id
     • 4. save full citation
     • 5. URL plus id
     • 6. full citation
     • 7. full citation

  10. A Repository of Data Underlying Journal Articles

  11. The Goal
      Store all data underlying publications in evolutionary biology, ecology, and related disciplines, at the time of publication.
      [Figure labels: ccaattggctgttcttcgattctggcgagt; GenBank; TreeBASE; Dryad]

  12. Identifiers and Versioning
      Each “data package” receives a DOI, which refers to the most recent version of the file.
      doi:10.5061/dryad.20
      When repository content is modified, a version indicator will be appended to the original DOI.
      doi:10.5061/dryad.20.2
      To specify a particular file within the data package, a slash is used.
      doi:10.5061/dryad.20.2/3
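The package/version/file convention on slide 12 can be sketched as a small parser. The function `parse_dryad_doi` and the `DryadDoi` tuple are hypothetical names for illustration, and the regular expression is an assumption inferred only from the three example DOIs on the slide, not a published Dryad grammar.

```python
import re
from typing import NamedTuple, Optional


class DryadDoi(NamedTuple):
    prefix: str             # e.g. "10.5061"
    package: str            # e.g. "20"
    version: Optional[int]  # None means "most recent version"
    file: Optional[str]     # part after the slash, if any


# Assumed shape: doi:<prefix>/dryad.<package>[.<version>][/<file>]
_PATTERN = re.compile(
    r"^doi:(?P<prefix>10\.\d+)/dryad\.(?P<package>[0-9a-z]+)"
    r"(?:\.(?P<version>\d+))?"
    r"(?:/(?P<file>.+))?$"
)


def parse_dryad_doi(doi: str) -> DryadDoi:
    match = _PATTERN.match(doi)
    if match is None:
        raise ValueError(f"not a recognised Dryad DOI: {doi!r}")
    version = match.group("version")
    return DryadDoi(
        prefix=match.group("prefix"),
        package=match.group("package"),
        version=int(version) if version is not None else None,
        file=match.group("file"),
    )


for example in ("doi:10.5061/dryad.20",
                "doi:10.5061/dryad.20.2",
                "doi:10.5061/dryad.20.2/3"):
    print(parse_dryad_doi(example))
```

The file component is kept as a raw string because later examples in the deck (such as `3.1`) suggest it may carry its own suffix, whose meaning the slides do not spell out.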

  13. Identifiers and Versioning
      Metadata and particular formats of the files are not given “true” DOIs. They are reachable by appending a parameter to the DOI.
      doi:10.5061/dryad.20.2/3.1?urlappend=%3fformat=dc
      doi:10.5061/dryad.20.2/3.1?urlappend=%3fformat=xls
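In the examples on slide 13, `%3f` is the percent-encoded form of `?`, so the `urlappend` value decodes to `?format=dc` or `?format=xls`. The sketch below separates the base DOI from the requested format using only Python's standard library; the helper name `split_format_request` is hypothetical, and the sketch does not claim to reproduce how the DOI resolver itself processes `urlappend`.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs


def split_format_request(identifier: str) -> Tuple[str, Optional[str]]:
    """Return (base DOI, requested format or None)."""
    base, sep, query = identifier.partition("?")
    if not sep:
        return base, None
    # parse_qs percent-decodes values, so "%3fformat=dc" becomes "?format=dc".
    outer = parse_qs(query)
    appended = outer.get("urlappend", [""])[0]
    # The decoded value is itself a small query string.
    inner = parse_qs(appended.lstrip("?"))
    return base, inner.get("format", [None])[0]


print(split_format_request("doi:10.5061/dryad.20.2/3.1?urlappend=%3fformat=dc"))
print(split_format_request("doi:10.5061/dryad.20.2/3.1?urlappend=%3fformat=xls"))
```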

  14. Citation
      • When using data from Dryad, please cite the original article.
        • Sidlauskas, B. 2007. Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes. Evolution 61: 299–316.
      • Additionally, please cite the Dryad data package. The citation should include the following elements:
        • Author(s)
        • The date on which the data was deposited
        • The name of the data file, if applicable
        • The title of the data package, which in Dryad is always "Data from: [Article name]"
        • The name "Dryad Digital Repository"
        • The data identifier
      • For example:
        • Sidlauskas, B. 2007. Data from: Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes. Dryad Digital Repository. doi:10.5061/dryad.20
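The citation elements listed on slide 14 can be assembled mechanically. The following sketch uses a hypothetical helper, `format_dryad_citation`, that reproduces the ordering of the slide's worked example. The slide does not show where a data file name would go, so placing it after the package title is an assumption of this sketch.

```python
from typing import Optional


def format_dryad_citation(
    authors: str,
    deposit_year: int,
    article_title: str,
    identifier: str,
    file_name: Optional[str] = None,
) -> str:
    """Build a data-package citation in the order used by the slide's example."""
    parts = [
        f"{authors} {deposit_year}.",
        # In Dryad the package title is always "Data from: [Article name]".
        f"Data from: {article_title.rstrip('.')}.",
    ]
    if file_name:
        # Assumption: file name follows the title; the slide gives no example.
        parts.append(f"{file_name}.")
    parts.append("Dryad Digital Repository.")
    parts.append(identifier)
    return " ".join(parts)


print(format_dryad_citation(
    authors="Sidlauskas, B.",
    deposit_year=2007,
    article_title=(
        "Testing for unequal rates of morphological diversification in the "
        "absence of a detailed phylogeny: a case study from characiform fishes"
    ),
    identifier="doi:10.5061/dryad.20",
))
```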

  15. Challenges/Questions
      • Dealing with dynamic streaming data?
      • How do versions enter into the identifier scheme?
      • Resolving to human- or machine-interpretable description of object?
      • Need for a registry of name spaces?
      • Can metadata standards support multiple globally unique identifiers?
