
DataONE: Preserving Data and Enabling Data-Intensive Biological and Environmental Research



Bob Cook

Environmental Sciences Division

Oak Ridge National Laboratory

February 6, 2013

NACP All-Investigator Meeting




The DataONE Vision and Approach:

Providing universal access to data about life on earth and the environment that sustains it, as well as the tools needed by researchers.

1. Building community

2. Developing sustainable data discovery and interoperability solutions

3. Supporting researcher tools and services


The long tail of orphan data

[Figure: volume plotted against rank frequency of datatype (B. Heidorn)]

Specialized repositories (50%)

Characteristics

  • Big Science

  • Large Volume

  • Automated sensors

  • Well described

  • Well curated

  • Easily Discovered

Orphan data (50%)

Characteristics

  • Small Science

  • Small Volume

  • Poorly described

  • Rarely Indexed

  • Invisible to scientists

  • Rarely Used

  • Dark Data

  • High spatial resolution

  • Process based

  • Theory Development

  • Model Development

  • Benchmarking


Data & Metadata (EML)

https://dataone.org

http://dataup.cdlib.org/


Model-Data Fusion: Harnessing Observations

  • Sponsor Requirements for Data Management

  • Credit for data through citation, DOI, and Data Citation Index

  • Training in Data Management

  • Improved tools for data preparation – DataUp

  • Developing a metadata editor


Model-Data Fusion: Data System Characteristics (1)

  • Dedicated financial support for data management is essential

  • Close coordination between the data group(s) and the producers (experimentalists) and users (modelers) of the data products

  • Based on a data management plan and a data policy

  • Integrated system that delivers a suite of diverse products

  • Establish standards (file, workflow, network) and promote interoperability

  • Processes to assure and document data quality to allow proper interpretation and use


Model-Data Fusion: Data System Characteristics (2)

  • Facilitate rapid exchange of data, products, and information, including rapid exchange of large-volume data

  • Promote the use of best practices to prepare and document data to share and archive

  • Make efficient use of existing data management infrastructure and resources

  • Ensure that finalized data and associated documentation are transferred to an appropriate archive

  • Make numerical models (source code) and descriptions of the models available, along with model parameters and example input and output data (Thornton et al. 2005)


Interoperability

[Diagram: Coordinating Nodes hold an internal metadata index populated by metadata extraction from Member Nodes (KNB, LTER, ORNL DAAC, CDL, USGS CSAS, DRYAD, and future nodes), whose metadata use EML, FGDC, ISO, and METS.]

  • Virtual Portals

  • Numerous search capabilities

  • Metadata has link to data, which reside at Member Nodes
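
The arrangement on this slide, in which a Coordinating Node indexes metadata and each record links back to data held at a Member Node, might be sketched roughly as follows. This is an illustrative toy in Python, not DataONE's actual API: the class names, fields, identifier, title, and URL are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MetadataRecord:
    """One metadata record harvested from a Member Node (hypothetical shape)."""
    identifier: str
    title: str
    metadata_format: str   # e.g. "EML", "FGDC", "ISO", "METS"
    member_node: str       # node that holds the data
    data_url: str          # link back to the data at the Member Node


@dataclass
class CoordinatingNodeIndex:
    """Toy metadata index: it stores descriptions, never the data itself."""
    records: dict[str, MetadataRecord] = field(default_factory=dict)

    def register(self, record: MetadataRecord) -> None:
        self.records[record.identifier] = record

    def search(self, term: str) -> list[MetadataRecord]:
        """Case-insensitive substring match on record titles."""
        needle = term.lower()
        return [r for r in self.records.values() if needle in r.title.lower()]

    def resolve(self, identifier: str) -> str:
        """Return the Member Node URL where the data reside."""
        return self.records[identifier].data_url


index = CoordinatingNodeIndex()
index.register(MetadataRecord(
    identifier="example-001",                                  # made-up identifier
    title="Soil respiration measurements",                     # made-up title
    metadata_format="FGDC",                                    # illustrative choice
    member_node="ORNL DAAC",
    data_url="https://member-node.example/data/example-001",   # placeholder URL
))

hits = index.search("soil")
print([h.identifier for h in hits])
print(index.resolve("example-001"))
```

The design point the sketch tries to capture is that searching happens against the central index, while the data stay at the Member Node and are reached only through the link stored in the metadata.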


The long tail of orphan data

“Most of the bytes are at the high end, but most of the datasets are at the low end” – Jim Gray

[Figure: volume plotted against rank frequency of datatype, with specialized repositories (e.g. Remote Sensing, NEON) at the high-volume end and orphan data in the long tail (B. Heidorn)]


“Data intensive science” and the “80:20 rule”

[Figure, adapted from CENR-OSTP: tiers ordered by decreasing spatial coverage and increasing process knowledge]

  • Remote sensing

  • Volunteer & education networks

  • Extensive science sites

  • Intensive science sites and experiments

