Data Lifecycle Workshop



Presentation Transcript


  1. Data Lifecycle Workshop Overview - R. Duerr

  2. Agenda
     • Intro to topics
     • Data Stewardship - Al Fleig
     • HDF Maps - C. Lynnes
     • HDF archive format - R. Duerr
     • Object identifiers paper - R. Duerr
     • Discussion and Work plan development
     7th Joint ESDSWG meeting, October 22, Philadelphia, PA. Data Lifecycle Workshop sponsored by the Technology Infusion Working Group.

  3. HDF Archive Format

  4. Project Goals
     • Prototype development of Archive Information Packages for HDF data:
         • For entire data sets
         • For individual “granules”
     • Test usability of digital library standards with geospatial data

  5. Original NOAA SDS Program Plan
     [Flow diagram. Labels, in extraction order: CDM/NetCDF4; NOAA:NMMR; FGDC; NOAA:CLASS; ECS to FGDC; HDF5-AIP; NSIDC/ECS Metadata; NetCDF4 / HDF5 Data; METS; ECS to METS; NSIDC/ECS; HDF4-data; H4toH5; NetCDF4/HDF5-data]

  6. Current Program Plan
     [Flow diagram. Labels, in extraction order: ISO-19115; CDM/NetCDF4; ECS to METS (Data Set); HDF5-AIP; NSIDC/ECS Metadata; ECS to METS (Granule); NetCDF4 / HDF5 Data; METS; NSIDC/ECS; HDF4-data; H4toH5; NetCDF4/HDF5-data]

  7. HDF5 File Level Archive Information Packages
     HDF5 AIP Components: Data file (HDF5), Metadata file (METS)
     Primary Schema              Extension Schema
     <mets>
     |---<dmdSec>----------------<ISO 19115>
     |---<amdSec>--------------|--<techMD>
     |                         |--<rightsMD>    PREMIS
     |                         |--<sourceMD>
     |---<fileGrp>
     |---<structMap>
     http://www.hdfgroup.uiuc.edu/papers/papers/AIP/HDF5_AIP_White_Paper.pdf
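The METS layout on this slide can be sketched in code. The following is a minimal illustration, not the project's actual AIP generator: it uses Python's standard `xml.etree.ElementTree` to build an empty METS skeleton with the sections the slide names. The function name `build_aip_skeleton`, the element IDs, and the choice of `MDTYPE` values are assumptions made for this example. The slide draws `<fileGrp>` directly under `<mets>`, whereas the METS schema nests it inside `<fileSec>`, so the sketch follows the schema. The output has not been validated against the METS schema.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def q(tag):
    """Return the namespace-qualified name of a METS element."""
    return f"{{{METS_NS}}}{tag}"


def build_aip_skeleton(data_file_href):
    """Build an empty METS skeleton (hypothetical helper for this sketch)."""
    mets = ET.Element(q("mets"))

    # Descriptive metadata: the slide pairs <dmdSec> with ISO 19115.
    dmd = ET.SubElement(mets, q("dmdSec"), ID="dmd1")
    ET.SubElement(dmd, q("mdWrap"), MDTYPE="OTHER", OTHERMDTYPE="ISO 19115")

    # Administrative metadata: the slide pairs these subsections with PREMIS.
    amd = ET.SubElement(mets, q("amdSec"), ID="amd1")
    for i, name in enumerate(("techMD", "rightsMD", "sourceMD"), start=1):
        section = ET.SubElement(amd, q(name), ID=f"amd1-{i}")
        ET.SubElement(section, q("mdWrap"), MDTYPE="PREMIS")

    # File inventory: one entry pointing at the HDF5 data file.
    file_sec = ET.SubElement(mets, q("fileSec"))
    grp = ET.SubElement(file_sec, q("fileGrp"))
    f = ET.SubElement(grp, q("file"), ID="file1")
    ET.SubElement(
        f, q("FLocat"), {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": data_file_href}
    )

    # Structural map: a single division referencing the data file.
    smap = ET.SubElement(mets, q("structMap"))
    div = ET.SubElement(smap, q("div"))
    ET.SubElement(div, q("fptr"), FILEID="file1")
    return mets
```

Calling `ET.tostring(build_aip_skeleton("granule.h5"), encoding="unicode")` serializes the skeleton; the ISO 19115 and PREMIS payloads would go inside the empty `mdWrap` elements.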

  8. Data Set Level Archive Information Package
     Metadata file (METS)
     Primary Schema              Extension Schema
     <mets>
     |---<dmdSec>----------------<ISO 19115>
     |---<amdSec>--------------|--<techMD>
     |                         |--<rightsMD>    PREMIS
     |                         |--<sourceMD>
     |---<fileGrp>
     |---<structMap>
     [Diagram labels attached to the package, each repeated five times: HDF-AIP; Contextual Information]

  9. Contextual Information:
     • Instrument/sensor characteristics including pre-flight or pre-operational performance measurements (e.g., spectral response, noise characteristics, etc.)
     • Instrument/sensor calibration data and method
     • Processing algorithms and their scientific basis, including complete description of any sampling or mapping algorithm used in creation of the product (e.g., contained in peer-reviewed papers, in some cases supplemented by thematic information introducing the data set or derived product)
     • Complete information on any ancillary data or other data sets used in generation or calibration of the data set or derived product
     Presented by R. Duerr at the Summer Institute on Data Curation, June 2-5, 2008, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign

  10. Contextual Information (continued):
      • Processing history including versions of processing source code corresponding to versions of the data set or derived product held in the archive
      • Quality assessment information
      • Validation record, including identification of validation data sets
      • Data structure and format, with definition of all parameters and fields
      • In the case of Earth-based data, station location and any changes in location, instrumentation, controlling agency, surrounding land use and other factors which could influence the long-term record
      • A bibliography of pertinent Technical Notes and articles, including refereed publications reporting on research using the data set
      • Information received back from users of the data set or product

  11. Backup Materials - PREMIS & METS

  12. Metadata Standards - PREMIS
      • Provide a core preservation metadata set with broad applicability across the digital preservation community
      • Developed by an OCLC and RLG sponsored international working group
      • Representatives from libraries, museums, archives, government, and the private sector
      • Based on the OAIS reference model

  13. Metadata Standards - PREMIS
      • Maintained by the Library of Congress
      • Editorial board with international membership
      • User community consulted on changes through the PREMIS Implementers Group
      • Version 1 was released in June 2005
      • Version 2 was just released

  14. PREMIS - Entity-Relationship Diagram
      • Intellectual Entities: “a coherent set of content that is reasonably described as a unit”. For example, a web site, data set or collection of data sets
      • Objects: “a discrete unit of information in digital form”. For example, a data file
      • Events: “an action that involves at least one object or agent known to the preservation repository”, e.g., created, archived, migrated
      • Agents: “a person, organization, or software program associated with preservation events in the life of an object”, e.g., Dr. Spock donated it
      • Rights: “assertions of one or more rights or permissions pertaining to an object or an agent”, e.g., copyright notice, legal statute, deposit agreement

  15. PREMIS - Types of Objects
      • Representation - “the set of files needed for a complete and reasonable rendition of an Intellectual Entity”
      • File
      • Bitstream - “contiguous or non-contiguous data within a file that has meaningful common properties for preservation purposes”
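The PREMIS entities and object types described above can be illustrated with a small data model. This is a rough sketch only: the class and field names (`PremisObject`, `Agent`, `Event`, `ObjectCategory`) are invented for this example and are not the element names of the PREMIS schema. It encodes one rule taken from the quoted definition of an event, namely that an event involves at least one object or agent.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ObjectCategory(Enum):
    """The three object types listed on the slide."""
    REPRESENTATION = "representation"
    FILE = "file"
    BITSTREAM = "bitstream"


@dataclass
class PremisObject:
    identifier: str            # hypothetical field name
    category: ObjectCategory


@dataclass
class Agent:
    identifier: str            # hypothetical field name
    name: str


@dataclass
class Event:
    event_type: str            # e.g. "created", "archived", "migrated"
    objects: List[PremisObject] = field(default_factory=list)
    agents: List[Agent] = field(default_factory=list)

    def __post_init__(self):
        # The quoted definition requires at least one object or agent.
        if not self.objects and not self.agents:
            raise ValueError("an event must involve at least one object or agent")
```

For instance, `Event("archived", objects=[PremisObject("granule-001", ObjectCategory.FILE)])` records that a single data file was archived, while `Event("created")` with no object or agent is rejected.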

  16. Metadata Standards - METS
      • Metadata Encoding and Transmission Standard
      • An initiative of the Digital Library Federation
      • Based on the Making of America II project

  17. METS - What’s Its Purpose?
      • Provides the means to convey the metadata necessary for
          • management of digital objects within a repository
          • exchange of objects between repositories (or between repositories and their users)
      • Designed to facilitate
          • shared development of information management tools/services
          • interoperable exchange of digital materials

  18. METS - What’s Its Status?
      • Version 1.6 was released in Sept. 2007
      • Maintained by the Library of Congress
      • International Editorial Board
      • NISO registration as of 2006

  19. Backup Materials - MODIS Contextual Info

  20. Instrument/sensor characteristics

  21. Instrument/sensor calibration data and method

  22. Processing Algorithms & Scientific Basis

  23. Ancillary Data

  24. Processing History including Source Code

  25. Quality Assessment Information

  26. Validation Information

  27. Other Factors that can Influence the Record

  28. Bibliography

  29. Information from users
      • Data Errors found
      • Quality updates
      • Things that need further explanation
      • Metadata updates/additions?
      • Community contributed metadata????

  30. Backup Materials - HDF AIP Challenges

  31. Challenges to do the conversion
      • Retrieve geo-location information from HDF-EOS2 data
      • Conform to NetCDF4 data model in the existing H4toH5 conversion tool
      • ……
      10/16/2008 HDF and HDF-EOS Workshop XII

  32. Challenges: Handle EOS - Grid
      • Grid lacks geolocation fields
      • Use predefined projections
          • Geographic
          • Sinusoidal
          • Polar stereographic
          • …
      • New converter creates geolocation fields
          • HDF-EOS2 API GDij2ll()
      [Diagram. Geographic: Data[4][12], Lon[12]. Sinusoidal: Data[4][8], Lon[4][8]]
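To illustrate why a geographic grid needs only a one-dimensional longitude field (Lon[12] for Data[4][12] on this slide), here is a minimal sketch that derives cell-center coordinates from the grid corners. It does not call the HDF-EOS2 `GDij2ll()` routine named on the slide; the function `geographic_grid_coords`, its argument convention, and the cell-center registration are assumptions made for this example. In a projection such as sinusoidal, longitude also varies with row, which is why the slide shows a two-dimensional Lon[4][8] there.

```python
def geographic_grid_coords(n_rows, n_cols, upper_left, lower_right):
    """Return (lats, lons) of cell centers for a regular lat/lon grid.

    upper_left and lower_right are (lon, lat) corner pairs in degrees.
    This argument convention is hypothetical, chosen for the sketch.
    """
    ul_lon, ul_lat = upper_left
    lr_lon, lr_lat = lower_right
    d_lon = (lr_lon - ul_lon) / n_cols
    d_lat = (lr_lat - ul_lat) / n_rows  # negative when rows run north to south
    # In a geographic projection longitude depends only on the column and
    # latitude only on the row, so two 1-D arrays describe the whole grid.
    lons = [ul_lon + (j + 0.5) * d_lon for j in range(n_cols)]
    lats = [ul_lat + (i + 0.5) * d_lat for i in range(n_rows)]
    return lats, lons


# A global 4 x 12 grid, matching the Data[4][12] / Lon[12] shapes on the slide.
lats, lons = geographic_grid_coords(4, 12, (-180.0, 90.0), (180.0, -90.0))
```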

  33. Challenges: Handle EOS - Swath
      • The size of geolocation fields can be different from data fields
      • New converter has to handle geolocation fields correctly

  34. Challenges in conforming to NetCDF4
      • Follow CF conventions
      • Create two variables: NewLongitude and NewLatitude
      • Add to the data field an attribute coordinates=“NewLongitude NewLatitude”
      • Keep the original Latitude and Longitude
      [Diagram. Longitude field has two columns; Data field has three columns; New longitude has three columns]
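The resizing step on this slide can be sketched as follows. The slide does not say how the new longitude values are computed, so the linear interpolation used here is an assumption, as is the helper name `resize_columns`. The sketch also assumes the geolocation columns span the same extent as the data columns and ignores longitude wrap-around at the date line. The `coordinates` attribute value is a space-separated list of variable names, as the CF conventions require.

```python
def resize_columns(row, n_out):
    """Linearly interpolate one row of values to n_out evenly spaced columns.

    Hypothetical helper: interpolation is an assumption of this sketch,
    not a method stated on the slide.
    """
    n_in = len(row)
    if n_in < 2 or n_out < 2:
        raise ValueError("need at least two input and two output columns")
    out = []
    for k in range(n_out):
        pos = k * (n_in - 1) / (n_out - 1)  # fractional index into the input row
        lo = min(int(pos), n_in - 2)        # left neighbour, clamped at the end
        frac = pos - lo
        out.append(row[lo] * (1 - frac) + row[lo + 1] * frac)
    return out


longitude = [[10.0, 20.0], [10.0, 20.0]]  # original field: two columns
data = [[1, 2, 3], [4, 5, 6]]             # data field: three columns

# New longitude has as many columns as the data field; the original is kept.
new_longitude = [resize_columns(r, len(data[0])) for r in longitude]

# CF-style attribute attached to the data field.
data_attrs = {"coordinates": "NewLongitude NewLatitude"}
```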

  35. Object Identifiers Paper
