
PAWN: A Novel Ingestion Workflow Technology for Digital Preservation



  1. PAWN: A Novel Ingestion Workflow Technology for Digital Preservation Mike Smorul, Joseph JaJa, Yang Wang, and Fritz McCall

  2. Overall Principles • Consistent with the Open Archival Information System (OAIS) model • Distributed, secure ingestion • Use of web/grid technologies – platform independent • Minimal client-side requirements • Ease of integration with archival storage or data grid systems.

  3. Producer

  4. Producer • Provides data to an Archive based on a prior agreement. • Consists of a management/metadata server and an ingestion client. • Provides initial arrangement, context, and metadata.

  5. Archive – receiving

  6. Archive – receiving • Receives data from a Producer • Validates bitstreams and metadata, and sends acknowledgement to Producer. • Arranges into collections and specifies preservation policy. • Publishes bitstreams into a digital archive.

  7. Archive – Long term preservation • Implemented using grid technologies. • Use the existing prototype NARA/UMD/SDSC site. • Automated replication and integrity checking. • Enforces access control and preservation policy

  8. Ingestion Workflow • Negotiate Submission Agreement. • Workflow Initialization and Submission Information Packet (SIP) creation. • Transfer of SIPs to archive. • Validation of SIP transfer • Organization of data into collections and transfer into persistent archive.
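The five steps above run strictly in sequence, which can be sketched as a small state tracker. This is only an illustration: the names `Step` and `IngestionWorkflow` are hypothetical and do not come from PAWN itself.

```python
from enum import Enum


class Step(Enum):
    """The five workflow steps, in the order listed on the slide."""
    NEGOTIATE_AGREEMENT = 1
    CREATE_SIP = 2
    TRANSFER_SIP = 3
    VALIDATE_TRANSFER = 4
    PUBLISH_TO_ARCHIVE = 5


class IngestionWorkflow:
    """Hypothetical tracker that only lets steps complete in order."""

    def __init__(self):
        self.completed = []

    @property
    def next_step(self):
        # Enum iteration follows definition order.
        steps = list(Step)
        done = len(self.completed)
        return steps[done] if done < len(steps) else None

    def complete(self, step):
        if step is not self.next_step:
            raise ValueError(f"expected {self.next_step}, got {step}")
        self.completed.append(step)
```

A SIP cannot be validated before it has been transferred, so `complete(Step.VALIDATE_TRANSFER)` raises `ValueError` until the three earlier steps are recorded.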

  9. Submission Agreement • Based on data appraisal and record schedule, including format and metadata. • Create a machine-actionable set of rules describing items. • Final Submission Agreement is composed of: • METS document for application defaults • METS Constraint document to limit METS form to submission parameters

  10. METS Overview • Provides a framework for linking the structural organization of objects with metadata. • Using XML namespaces, metadata from various XML schemas can be attached to objects • e.g., Dublin Core, FGDC, etc. • Extensible for more complex metadata • http://www.loc.gov/standards/mets/
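As a rough illustration of how METS links structure to metadata, the sketch below builds a minimal document with Python's standard `xml.etree.ElementTree`. It embeds a Dublin Core title via its own namespace and points a structural `div` at both the metadata section and a file. The helper name `build_mets` and the IDs are assumptions for illustration; this is not the sample document from the presentation, and the output is not checked against the METS schema.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
DC_NS = "http://purl.org/dc/elements/1.1/"
XLINK_NS = "http://www.w3.org/1999/xlink"

ET.register_namespace("mets", METS_NS)
ET.register_namespace("dc", DC_NS)
ET.register_namespace("xlink", XLINK_NS)


def build_mets(title, file_id, href):
    """Build a minimal METS tree (hypothetical helper, illustrative IDs)."""
    mets = ET.Element(f"{{{METS_NS}}}mets")

    # Descriptive metadata: Dublin Core wrapped inside METS.
    dmd = ET.SubElement(mets, f"{{{METS_NS}}}dmdSec", ID="DMD1")
    wrap = ET.SubElement(dmd, f"{{{METS_NS}}}mdWrap", MDTYPE="DC")
    xml_data = ET.SubElement(wrap, f"{{{METS_NS}}}xmlData")
    ET.SubElement(xml_data, f"{{{DC_NS}}}title").text = title

    # File inventory: one file with a location.
    file_sec = ET.SubElement(mets, f"{{{METS_NS}}}fileSec")
    grp = ET.SubElement(file_sec, f"{{{METS_NS}}}fileGrp")
    f = ET.SubElement(grp, f"{{{METS_NS}}}file", ID=file_id)
    ET.SubElement(
        f,
        f"{{{METS_NS}}}FLocat",
        {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": href},
    )

    # Structural map: the div ties the metadata (DMDID) to the file (fptr).
    smap = ET.SubElement(mets, f"{{{METS_NS}}}structMap", TYPE="logical")
    div = ET.SubElement(
        smap, f"{{{METS_NS}}}div", ID="DIV1", DMDID="DMD1", LABEL=title
    )
    ET.SubElement(div, f"{{{METS_NS}}}fptr", FILEID=file_id)
    return mets


if __name__ == "__main__":
    tree = build_mets("Reports for 1997", "FILE1", "file:///data/report.doc")
    print(ET.tostring(tree, encoding="unicode"))
```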

  11. Sample METS Document

  12. Why METS Constraints? • METS doesn’t provide a way to create machine-interpretable rules describing a collection • e.g., allow only JPEG files in certain structural areas • METS profiles allow for developer-interpretable rules, not machine-interpretable ones

  13. METS Constraints • Allows structural, metadata, and file constraints. • Structural Constraints: • Restrict child divs and restrict pointers to divs, files, and other METS documents • File Constraints: • Restrict files by MIME type or validation tests • Metadata Constraints: • Restrict allowed metadata schemas.

  14. METS Constraints Example
      <techMD ID="WORD97">
        <mdWrap LABEL="MS Word 97">
          <arc:valgrp required="yes">
            <arc:valtest class="wordextension" required="no"/>
            <arc:valtest class="wordparser" required="yes"/>
          </arc:valgrp>
        </mdWrap>
      </techMD>
      ...
      ...
      <structMap TYPE="logical">
        <div ID="DIV1" LABEL="Toxic Chemical Release Inventory System">
          <div ID="DIV2" ORDER="1" LABEL="Reports for 1997" DMDID="tree97">
          </div>
          <div ID="DIV3" ORDER="2" LABEL="Meeting Notes for 1997" DMDID="tree98">
          </div>
          ...
        </div>
        ...
      </structMap>
      ...
      <divrule ID="DIV1" FILEALLOW="no" DIVALLOW="NO"></divrule>
      <divrule ID="DIV2" FILEALLOW="yes" DIVALLOW="yes"></divrule>
      <divrule ID="DIV3" FILEALLOW="yes" DIVALLOW="NO">
        <filegrp>
          <file ID="GIF98"/>
          <file ID="WORD97"/>
        </filegrp>
      </divrule>
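To make the `divrule` elements in the example above concrete, here is a minimal sketch of how a receiver might evaluate them once they are loaded into memory. The class `DivRule` and the functions `may_add_file` and `may_add_child_div` are hypothetical names. The sketch also makes two assumptions the slides do not state: a rule with no `filegrp` accepts any file type, and a div with no rule rejects everything.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DivRule:
    """In-memory form of one <divrule> element (hypothetical model)."""
    file_allow: bool
    div_allow: bool
    # Assumption: an empty set means "any file type is acceptable".
    allowed_file_ids: frozenset = frozenset()


# The three rules from the slide, transcribed by hand.
RULES = {
    "DIV1": DivRule(file_allow=False, div_allow=False),
    "DIV2": DivRule(file_allow=True, div_allow=True),
    "DIV3": DivRule(
        file_allow=True,
        div_allow=False,
        allowed_file_ids=frozenset({"GIF98", "WORD97"}),
    ),
}


def may_add_file(rules, div_id, file_type_id):
    """Return True if a file of the given type may be placed in the div."""
    rule = rules.get(div_id)
    if rule is None or not rule.file_allow:
        # Assumption: divs without a rule reject everything.
        return False
    if not rule.allowed_file_ids:
        return True
    return file_type_id in rule.allowed_file_ids


def may_add_child_div(rules, div_id):
    """Return True if the div may contain further child divs."""
    rule = rules.get(div_id)
    return rule is not None and rule.div_allow
```

Under these assumptions, a `WORD97` file is accepted in `DIV3` but no file is accepted in `DIV1`, and only `DIV2` may receive new child divs.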

  15. Ingestion Workflow • Negotiate Submission Agreement. • Workflow Initialization and Submission Information Packet creation. • Transfer of SIPs to archive. • Validation of SIP transfer • Organization of data into collections and transfer into persistent archive.

  16. Initialize Ingestion workflow • Instantiate Producer management server to track registered objects • Establish a working trust relationship with the Archive • Issue clients.

  17. Create SIP • Each client registers objects stored locally with the producer management server • Register file types, validation tests, etc. • Client follows rules in Submission Agreement • Producer-wide agents can arrange registered objects to give a broader context

  18. SIP Example • METS handles all areas of a SIP except Physical Object and Descriptive Information • Descriptive Information can be embedded into METS as a third-party XML schema

  19. Client Interface

  20. Ingestion Workflow • Negotiate Submission Agreement. • Workflow Initialization and Submission Information Packet creation. • Transfer of SIPs to archive. • Validation of SIP transfer • Organization of data into collections and transfer into persistent archive.

  21. Transfer SIP to archive • Retrieve previously registered SIP from producer management server • Authenticate to archive • Update provenance information in METS document with file structure of SIP • Transfer METS document describing SIP and container for SIP physical objects • Archive acknowledges transfer completion to producer management server

  22. Ingestion Workflow • Negotiate Submission Agreement. • Workflow Initialization and Submission Information Packet creation. • Transfer of SIP to archive. • Validation of SIP transfer • Organization of data into collections and transfer into persistent archive.

  23. Validation of SIP transfer • Check incoming SIP against constraints documents. • Ensure object integrity by verifying checksums/cryptographic digests • Validate bitstreams against tests described in METS document • Update METS document with validation results and movement of objects on receiving server
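The integrity check in the list above can be sketched with Python's standard `hashlib`. The manifest layout (relative path mapped to an algorithm name and expected hex digest) and the function names `file_digest` and `verify_sip` are assumptions for illustration; the slides do not say how PAWN records digests or which algorithm it uses.

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in chunks so large objects need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_sip(base_dir, manifest):
    """Compare received files against expected digests.

    manifest maps relative path -> (algorithm, expected hex digest);
    this layout is hypothetical. Returns a list of (path, reason)
    tuples, which is empty when every object passes.
    """
    failures = []
    for rel_path, (algorithm, expected) in manifest.items():
        path = Path(base_dir) / rel_path
        if not path.is_file():
            failures.append((rel_path, "missing"))
        elif file_digest(path, algorithm) != expected.lower():
            failures.append((rel_path, "digest mismatch"))
    return failures
```

An empty result means every listed object arrived intact; any entries could then feed the validation results that are written back into the METS document.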

  24. Ingestion Workflow • Negotiate Submission Agreement. • Workflow Initialization and Submission Information Packet creation. • Transfer of SIP to archive. • Validation of SIP transfer • Organization of data into collections and transfer into persistent archive.

  25. Final transfer to archive • Transfer objects to digital archive • Update provenance information in METS document with handle to object in archive • Transfer METS document into archive • Return accept/reject messages to producer metadata server

  26. Component Overview

  27. Producer Components • Database to track registered objects • Certificate Authority management • Web service for archive security check • Management server supplies web service interfaces to ingestion clients and management operations. • Clients are designed to be standalone, with security certificates issued by producer

  28. Archive Components • Receiving servers validate connecting clients and validate SIPs • Validation Services are simple webservice calls. • Abstract I/O layer into digital archive. • All components are scalable using standard load balancing techniques.

  29. Recap • Implemented using web technologies • Architecture independent • OAIS compliant • XML based metadata • METS based SIPs • Add-on constraints describing Submission Agreement

  30. Questions?? • For more information • http://www.umiacs.umd.edu/research/adapt
