Explore how Earth science data production is linked to provenance, societal implications, and legal proof of custody. Discover the complexity of organizing time series data and the challenges of metadata standards in data management.
Provenance, Production, and Planning
Bruce R. Barkstrom
NOAA’s National Climatic Data Center
Asheville, NC
Basic Facts
• Much of Earth science (and some space science) data results from discrete production
  • Files
  • Jobs
• Files and jobs are
  • Denumerable
  • Indexable as time series
• The connection between files and jobs is a graph
• The societal importance of climate data will require legal-strength proof of chain of custody and production
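The file-and-job graph described above can be sketched as a small bipartite structure in which jobs consume files and produce files. This is only an illustrative sketch: the class name `ProvenanceGraph`, its methods, and the identifier strings are hypothetical and do not come from the slides.

```python
class ProvenanceGraph:
    """Bipartite directed graph: files feed jobs, jobs produce files."""

    def __init__(self):
        self.inputs = {}    # job id -> list of input file ids
        self.producer = {}  # file id -> id of the job that produced it

    def add_job(self, job, inputs, outputs):
        """Record one production job with its input and output files."""
        self.inputs[job] = list(inputs)
        for file_id in outputs:
            self.producer[file_id] = job

    def ancestors(self, file_id):
        """Return every job and file upstream of file_id."""
        seen = set()
        stack = [file_id]
        while stack:
            node = stack.pop()
            job = self.producer.get(node)
            if job is None or job in seen:
                continue  # source file with no recorded producer, or already visited
            seen.add(job)
            for upstream in self.inputs[job]:
                if upstream not in seen:
                    seen.add(upstream)
                    stack.append(upstream)
        return seen


# Hypothetical identifiers: product level plus the date in the time series.
g = ProvenanceGraph()
g.add_job("job:L1:2001-01-01", ["raw:2001-01-01"], ["L1:2001-01-01"])
# The L2 job also reads the previous day's L2 file to guide this step.
g.add_job("job:L2:2001-01-01",
          ["L1:2001-01-01", "L2:2000-12-31"],
          ["L2:2001-01-01"])
print(sorted(g.ancestors("L2:2001-01-01")))
```

Walking `ancestors` from a single file reaches every upstream job and file, which suggests why a full provenance record grows quickly once each day's job reads earlier output.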
Key Consequences
• Jobs may use previously produced data to guide the next step in production
• Provenance graphs may include millions of objects
• Cannot expect provenance to fit within files
• Current metadata standards are (provably) incomplete
• New versions of data products may be produced by 4 kinds of changes:
  • Input data, source code, coefficients, connectivity + hardware/infrastructure code
• Time series organization of files gives a reasonable basis for a hierarchical permanent registration schema for files and file contents
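One way to picture the four kinds of change is to record each as a field of a production configuration and compare two configurations field by field. The sketch below assumes that representation; `ProductionConfig`, `changed_kinds`, and the sample values are hypothetical, and hardware/infrastructure code is left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProductionConfig:
    """Hypothetical record of what determines a data product version."""
    input_data: frozenset    # ids of the input files
    source_code: str         # e.g. a source revision label
    coefficients: str        # e.g. a calibration table label
    connectivity: frozenset  # (upstream, downstream) pairs between steps


CHANGE_KINDS = ("input_data", "source_code", "coefficients", "connectivity")


def changed_kinds(old, new):
    """List which of the four kinds of change separate two configurations."""
    return [kind for kind in CHANGE_KINDS
            if getattr(old, kind) != getattr(new, kind)]


v1 = ProductionConfig(
    input_data=frozenset({"L1:2001-01-01"}),
    source_code="rev-1",
    coefficients="cal-A",
    connectivity=frozenset({("L1", "L2")}),
)
# Same inputs, code, and connectivity, but new calibration coefficients.
v2 = ProductionConfig(
    input_data=v1.input_data,
    source_code=v1.source_code,
    coefficients="cal-B",
    connectivity=v1.connectivity,
)
print(changed_kinds(v1, v2))
```

An empty result would mean the two configurations describe the same version under this simplified model.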