
Jerry Pan, Christopher Lenhardt, Biva Shrestha, Yaxing Wei, Giri Palanisamy, Robert Cook

Geoscience Data Repository in Digital Object Model and Open-Source Frameworks: Provenance Applications (ESDORA Project). NASA ORNL DAAC, Environmental Science Division, Oak Ridge National Laboratory.


Presentation Transcript


  1. Geoscience Data Repository in Digital Object Model and Open-Source Frameworks: Provenance Applications (ESDORA Project) Jerry Pan, Christopher Lenhardt, Biva Shrestha, Yaxing Wei, Giri Palanisamy, Robert Cook NASA ORNL DAAC Environmental Science Division Oak Ridge National Laboratory

  2. Agenda • Geoscience Data Curation • System Components & Digital Object Model • Capabilities & OAIS Mapping • Provenance Applications • Conclusion Remarks

  3. Digital Data Curation Maintaining and adding value to a trusted body of digital information for current and future use throughout its lifecycle

  4. Important Aspects of Data Curation • Auditing: What changed and when (contextual environment, status). • Lineage and provenance: The derivation history of the data is formally recorded and is understandable by both machines and humans, now and in the future. • Versioning: Earlier versions of a data stream are kept in the data system, so that we can revert to an earlier version if needed. • Identifier: Data is identifiable and citable using a standardized scheme, e.g., the Digital Object Identifier (DOI) system. • Integrity: The integrity of data files is verifiable at any time in their lifecycle. • Interoperable/accessible for the long term: Easily accessible by users and software.

  5. The Challenge • A tremendous amount of data is being generated in the Geosciences, and digital curation needs to be in place for its preservation and reuse. Yet there is no generic, interoperable system to manage, preserve, and deliver relevant metadata and data processing lineage information along with the actual content.

  6. ESDORA: A Complete Data System Built on Fedora Digital Object Model • User Interface: Drupal & Islandora • Search & Discovery: Apache Solr & Fedora Semantic Store • Archive Management: Fedora Repository • http://esdora2.ornl.gov/

  7. Fedora Digital Object Model (diagram labels: Content; Digital Object; ID; Object Info; Semantics; XML Encoding; Audit; Metadata 1; Metadata 2; Content 1; Content 2; …) (Payette, S. and C. Lagoze, 1998)
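
The digital object idea on this slide can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the class names `DigitalObject` and `Datastream`, the `put` method, and the identifier `demo:1` are hypothetical and are not the Fedora API. It shows metadata and content living in one object, with each write creating a new version and an audit entry.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Datastream:
    """One named component of a digital object (metadata or content)."""
    ds_id: str
    versions: list = field(default_factory=list)  # oldest first

    def add_version(self, payload: bytes) -> int:
        self.versions.append(payload)
        return len(self.versions) - 1  # index of the version just added

    def latest(self) -> bytes:
        return self.versions[-1]


@dataclass
class DigitalObject:
    """Hypothetical stand-in for a repository digital object (not Fedora's API)."""
    pid: str
    datastreams: dict = field(default_factory=dict)
    audit: list = field(default_factory=list)  # (UTC timestamp, action) pairs

    def put(self, ds_id: str, payload: bytes) -> int:
        """Store a new version of a datastream and record it in the audit trail."""
        ds = self.datastreams.setdefault(ds_id, Datastream(ds_id))
        version = ds.add_version(payload)
        stamp = datetime.now(timezone.utc).isoformat()
        self.audit.append((stamp, f"put {ds_id} v{version}"))
        return version


obj = DigitalObject(pid="demo:1")  # hypothetical identifier
obj.put("FGDC", b"<metadata>draft</metadata>")
obj.put("FGDC", b"<metadata>final</metadata>")
obj.put("DATA", b"\x00\x01")
```

Because earlier payloads stay in `versions`, reverting amounts to reading an older index, and the audit list records what changed and when.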

  8. ESDORA Capabilities: • Metadata and data managed together in one logical unit • Integrity checks, versions, and audit trails • Machine-readable semantics for provenance knowledge • XML encoding for long-term storage, access, and recovery • Search, discovery, metadata publishing • Multiple standards (FGDC, ISO, EML, etc.) accommodated (we use FGDC)

  9. OAIS Reference Architecture

  10. Information Unit • Logical information units (packages) for ingestion, management, dissemination • OAIS – SIP: Submission Information Package • OAIS – AIP: Archival Information Package • OAIS – DIP: Dissemination Information Package

  11. ESDORA SIP • Data set (folder) containing two subfolders: Metadata (folder with structured and non-structured metadata files) and Data (folder with actual data files) • Digital object components: ID, Object Info, Semantics, Audit, Policy, FGDC Metadata, Free Text Metadata, Data Content, …
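
The SIP folder layout described on this slide could be checked before ingestion with something like the following. It is a minimal sketch: the exact folder names `Metadata` and `Data`, the function `check_sip`, and the sample file names are assumptions for illustration, not the project's actual ingest rules.

```python
import tempfile
from pathlib import Path


def check_sip(root: Path) -> list:
    """Return a list of problems found in a SIP folder; an empty list means none."""
    problems = []
    for sub in ("Metadata", "Data"):  # assumed subfolder names
        folder = root / sub
        if not folder.is_dir():
            problems.append(f"missing folder: {sub}")
        elif not any(p.is_file() for p in folder.iterdir()):
            problems.append(f"empty folder: {sub}")
    return problems


with tempfile.TemporaryDirectory() as tmp:
    sip = Path(tmp) / "my_dataset"  # hypothetical data set folder
    (sip / "Metadata").mkdir(parents=True)
    (sip / "Data").mkdir()
    (sip / "Metadata" / "record_fgdc.xml").write_text("<metadata/>")
    incomplete = check_sip(sip)  # Data folder exists but holds no files yet
    (sip / "Data" / "granule_001.nc").write_bytes(b"\x00")
    complete = check_sip(sip)
```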

  12. ESDORA AIP (data/metadata coexist) • Digital object components: ID, Object Info, Semantics, Audit, Policy, FGDC Metadata, Free Text Metadata, Data Content, …

  13. Inline Metadata Editor

  14. ESDORA DIP • REST Web Services • Data Objects • Collection Objects • Datastreams • Metadata in OAI-PMH • Indexing & Search • http://esdora2.ornl.gov/oaiprovider/?verb=ListRecords&metadataPrefix=fgdc
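
The OAI-PMH link on this slide follows the standard request pattern of a base URL plus `verb` and `metadataPrefix` parameters. The sketch below rebuilds that URL and pulls record identifiers out of a response. The helper names and the `SAMPLE` response are made up for illustration, and nothing here contacts the server.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 protocol.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base: str, metadata_prefix: str) -> str:
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base}?{query}"


def record_identifiers(xml_text: str) -> list:
    """Extract the header identifier of every record in a ListRecords response."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.findall(".//oai:header/oai:identifier", OAI_NS)]


url = list_records_url("http://esdora2.ornl.gov/oaiprovider/", "fgdc")

# Invented sample response, trimmed to the elements the parser reads.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example:1</identifier></header></record>
    <record><header><identifier>oai:example:2</identifier></header></record>
  </ListRecords>
</OAI-PMH>"""
ids = record_identifiers(SAMPLE)
```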

  15. Solr-Enabled Indexing & Search • Simple Keyword Search • Faceted Search • Spatial/Temporal Search • Result linked to data objects 2011 NASA ESDSWG Meeting, Newport News, VA
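
A keyword search with facets and a temporal filter maps onto Solr's standard query parameters (`q`, `facet`, `facet.field`, `fq`). The following is a sketch only: the base URL, the field names `keyword` and `begin_date`, and the helper `solr_search_url` are assumptions, since the slides do not show the ESDORA index schema.

```python
from urllib.parse import urlencode, parse_qs, urlsplit


def solr_search_url(base, keyword, facet_fields=(), filters=()):
    """Build a Solr select URL with optional facet fields and filter queries."""
    params = [("q", keyword), ("wt", "json")]
    if facet_fields:
        params.append(("facet", "true"))
        params.extend(("facet.field", name) for name in facet_fields)
    params.extend(("fq", expr) for expr in filters)
    return f"{base}?{urlencode(params)}"


url = solr_search_url(
    "http://localhost:8983/solr/select",  # hypothetical Solr endpoint
    "land cover",
    facet_fields=["keyword"],  # hypothetical facet field
    filters=["begin_date:[2000-01-01T00:00:00Z TO *]"],  # hypothetical date field
)
decoded = parse_qs(urlsplit(url).query)
```

Each `fq` narrows the result set without changing the main query in `q`, which suits faceted drill-down.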

  16. Provenance in ESDORA

  17. Where should provenance be stored? • In software applications: BAD • In accompanying files: BAD • In structured metadata records: BAD if not linked to data • Semantically a part of the content system: GOOD (diagram labels: Internal metadata sources (often file system); Application; Structured metadata stores (database or indexing engine); users; External metadata sources on the Web)

  18. ESDORA: Metadata & semantic relations are stored in the same digital object as the data content. The application uses semantic queries for knowledge stored in objects. (diagram labels: DOI; FGDC; ISO; Read me; Guide docs; Datastream 1; Datastream x; Semantics; Application)

  19. Synthetic Land Cover Data Chain (SYNMAP) (Modeling and Synthesis Thematic Data Center, MAST-DC) • To provide the standardized land cover map for the Multi-scale Synthesis and Terrestrial Model Intercomparison Project, the Original SYNMAP is assembled from four independent products. It is in turn reprocessed (common resolution, extent, CF-compliant NetCDF) to produce the Analyzed SYNMAP and Potential SYNMAP at global and North American scales. (diagram labels: AVHRR_CFTC; MODIS_GLC; GLCC; GLC2000; Original_SYNMAP; Analyzed_SYNMAP; Potential_SYNMAP)

  20. Provenance: Data derivation history • Object: Analyzed_SYNMAP • Semantics (RDF): This object is “DerivedFrom” Original_SYNMAP • Processing info … • Data derivation history information is recorded and stored in the Fedora RDF semantic store. The semantic store is indexed and can be queried using SPARQL and iTQL.
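
Derivation statements like the one on this slide form a graph that can be walked to recover full lineage. The sketch below uses plain Python tuples in place of the Fedora semantic store. It assumes that the four source products of the SYNMAP chain are recorded with the same "DerivedFrom" relation, which the slides do not confirm, and the URIs in the SPARQL string are hypothetical.

```python
DERIVED_FROM = "DerivedFrom"

# (subject, predicate, object) statements; the four source-product links are assumed.
TRIPLES = [
    ("Analyzed_SYNMAP", DERIVED_FROM, "Original_SYNMAP"),
    ("Potential_SYNMAP", DERIVED_FROM, "Original_SYNMAP"),
    ("Original_SYNMAP", DERIVED_FROM, "AVHRR_CFTC"),
    ("Original_SYNMAP", DERIVED_FROM, "MODIS_GLC"),
    ("Original_SYNMAP", DERIVED_FROM, "GLCC"),
    ("Original_SYNMAP", DERIVED_FROM, "GLC2000"),
]

# Shape of a one-hop SPARQL query; both URIs are placeholders, not real identifiers.
ONE_HOP_SPARQL = """
SELECT ?source WHERE {
  <info:fedora/demo:Analyzed_SYNMAP> <urn:example:DerivedFrom> ?source
}
"""


def lineage(obj, triples):
    """Return every ancestor of obj, nearest first, using a breadth-first walk."""
    seen, queue, order = {obj}, [obj], []
    while queue:
        current = queue.pop(0)
        for subject, predicate, target in triples:
            if subject == current and predicate == DERIVED_FROM and target not in seen:
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order


ancestors = lineage("Analyzed_SYNMAP", TRIPLES)
```

A single query returns only the direct parent, so the walk repeats the lookup for each newly found ancestor until none remain.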

  21. Provenance: Granule checksums
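
Granule checksums support the integrity goal stated earlier: a digest is recorded at ingest and recomputed later to detect change. This is a minimal sketch using Python's standard `hashlib`. The choice of SHA-256 and the helper names are assumptions, as the slides do not say which algorithm ESDORA uses.

```python
import hashlib


def checksum(data: bytes, algorithm: str = "sha256") -> str:
    """Return the hex digest of data using the named hashlib algorithm."""
    return hashlib.new(algorithm, data).hexdigest()


def verify(data: bytes, expected: str, algorithm: str = "sha256") -> bool:
    """Recompute the digest and compare it with the recorded value."""
    return checksum(data, algorithm) == expected


granule = b"example granule bytes"  # stand-in for a data file's contents
recorded = checksum(granule)        # stored alongside the granule at ingest
intact = verify(granule, recorded)
tampered = verify(granule + b"!", recorded)
```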

  22. Provenance: Auditing trail and versioning history

  23. Conclusion Remarks • The digital object model abstraction reduces the complexity of data curation. • Object semantics and XML encoding can be used to preserve provenance knowledge as well as descriptive metadata. • The integrated system addresses many metadata and provenance issues and can be used as an archive system for Geoscience data content.

  24. http://esdora2.ornl.gov/ • Acknowledgement: This work is funded by NASA ACCESS Grant # 09-ACCESS09-8 • The team would like to thank Stephen Berrick for progress reviews and guidance • Contact: Jerry Pan, pany@ornl.gov
