Preservation Environments Research Group
  • Organizers: Reagan Moore (moore@sdsc.edu), Richard Marciano (marciano@sdsc.edu)
  • Goals:
    • Analyze capabilities required by a preservation environment
      • Define rule-based preservation environment - iRODS
      • RLG/NARA assessment criteria for a Trusted Digital Repository
        • CASPAR - representation information
        • SHAMAN - migration micro-services
    • Demonstrate creation of a preservation environment based on data grid technology
      • Demonstrate creation of preservation rules controlling a preservation environment
  • Participants:
    • CASPAR - Cultural, Artistic and Scientific knowledge for Preservation, Access and Retrieval
    • SHAMAN - Sustaining Heritage Access through Multivalent ArchiviNg
    • NCRIS - National Collaborative Research Infrastructure Strategy
    • PLANETS - Preservation and Long-term Access through Networked Services
    • MIT - DSpace digital library
    • NARA Transcontinental Persistent Archive Prototype
    • U Md - Producer Archive Workflow Network
    • UK Digital Curation Centre
    • Taiwan National Archives

OGF-22

Intellectual Property Policy
  • I acknowledge that participation in OGF22 is subject to the OGF Intellectual Property Policy.
  • Intellectual Property Notices Note Well: All statements related to the activities of the OGF and addressed to the OGF are subject to all provisions of Section 17 of GFD-C.1 (.pdf), which grants to the OGF and its participants certain licenses and rights in such statements. Such statements include verbal statements in OGF meetings, as well as written and electronic communications made at any time or place, which are addressed to:
  • the OGF plenary session,
  • any OGF working group or portion thereof,
  • the GFSG, or any member thereof on behalf of the GFSG,
  • the GFAC, or any member thereof on behalf of the GFAC,
  • any OGF mailing list, including any working group or research group list, or any other list functioning under OGF auspices,
  • the GFD Editor or the GWD process
  • Statements made outside of an OGF meeting, mailing list or other function, that are clearly not intended to be input to an OGF activity, group or function, are not subject to these provisions.
  • Excerpt from Section 17 of GFD-C.1: Where the GFSG knows of rights, or claimed rights, the OGF secretariat shall attempt to obtain from the claimant of such rights, a written assurance that upon approval by the GFSG of the relevant OGF document(s), any party will be able to obtain the right to implement, use and distribute the technology or works when implementing, using or distributing technology based upon the specific specification(s) under openly specified, reasonable, non-discriminatory terms. The working group or research group proposing the use of the technology with respect to which the proprietary rights are claimed may assist the OGF secretariat in this effort. The results of this procedure shall not affect advancement of document, except that the GFSG may defer approval where a delay may facilitate the obtaining of such assurances. The results will, however, be recorded by the OGF Secretariat, and made available. The GFSG may also direct that a summary of the results be included in any GFD published containing the specification.
  • OGF Intellectual Property Policies are adapted from the IETF Intellectual Property Policies that support the Internet Standards Process.

Data Management Applications
  • Data grids
    • Share data - organize distributed data as a collection
  • Digital libraries
    • Publish data - support browsing and discovery
  • Persistent archives
    • Preserve data - manage technology evolution
  • Real-time sensor systems
    • Federate sensor data - integrate across sensor streams
  • Workflow systems
    • Analyze data - integrate client- & server-side workflows
  • Coalescence of requirements into generic infrastructure

Generic Infrastructure
  • Data grids organize distributed data into shared collections
    • Persistent name spaces for files, users, storage
    • Collection attributes
      • Provenance, descriptive, system metadata
  • Data grids manage heterogeneous storage systems
    • Standard operations across file systems, tape archives, object ring buffers
    • Enable management of technology evolution
      • At the point in time when new technology is available, both the old and new systems can be integrated

Preservation Requirements
  • Authenticity
    • Maintain information about provenance of data
    • Assertions made about the file at the time of ingestion
  • Integrity
    • Maintain information about the management of the data
    • Assertions made by the archivist
      • Access controls, audit trails, checksums, replication, synchronization, federation
  • Infrastructure independence
    • Management of properties of records independently of choice of storage system
  • Scalability
    • Management of large collections (billions of records, petabytes of data, thousands of attributes)

National Archives and Records Administration Transcontinental Persistent Archive Prototype

[Figure: map of the federated sites (NARA I, NARA II, Rocket Center, U Md, U NC, Georgia Tech, SDSC), each labeled with its own MCAT metadata catalog]

Federation of Seven Independent Data Grids

Extensible environment: the federation can add further research and education sites. Each data grid uses different vendor products.

Extremely Successful
  • Storage Resource Broker (SRB) manages 2 PBs of data in internationally shared collections
  • Data collections for NSF, NARA, NASA, DOE, DOD, NIH, LC, NHPRC, IMLS; APAC, UK e-Science, IN2P3, KEK, …
    • Astronomy: Data grid
    • Bio-informatics: Digital library
    • Earth Sciences: Data grid
    • Ecology: Collection
    • Education: Persistent archive
    • Engineering: Digital library
    • Environmental science: Data grid
    • High energy physics: Data grid
    • Humanities: Data grid
    • Medical community: Digital library
    • Oceanography: Real-time sensor data, persistent archive
    • Seismology: Digital library, real-time sensor data
  • Goal has been generic infrastructure for distributed data

Data Grid Evolution
  • Data grids
    • Management of preservation environment properties
      • Data and trust virtualization
    • Infrastructure independence
      • SRB - Storage Resource Broker
  • Rule-based data grids
    • Automation of management policies
      • Management virtualization
    • Open source software
      • iRODS - integrated Rule-Oriented Data System
      • http://irods.sdsc.edu

Using a Data Grid - Details

[Figure: data request flow between two iRODS servers. Labels: DB, Metadata Catalog, Rule Base, and two iRODS Servers, each with a Rule Engine]

  • User asks for data
  • Data request goes to iRODS Server
  • Server looks up information in catalog
  • Catalog tells which iRODS server has data
  • 1st server asks 2nd for data
  • The 2nd iRODS server applies rules

Requirements Driving Evolution
  • Observe that as the size of the shared collections grows, the administrative tasks can become onerous.
    • Data grids provide mechanisms to manage recovery from all errors that occur in the distributed environment
  • Need to minimize labor support through automation of administrative functions
    • File ingestion tasks
    • Verification of desired collection properties
    • Integrity checks and replica management
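
As an illustration of the kind of administrative task that could be automated, the sketch below verifies replicas against a stored checksum and repairs corrupted copies from a good one. The function names and the choice of SHA-256 are assumptions made for the example; the slides only state that checksums and replicas are managed.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Checksum used for the example (SHA-256 is an assumption, not from the slides)."""
    return hashlib.sha256(data).hexdigest()


def verify_replicas(expected: str, replicas: dict) -> dict:
    """Return {location: True/False} according to whether each replica matches."""
    return {loc: checksum(data) == expected for loc, data in replicas.items()}


def repair_replicas(expected: str, replicas: dict) -> list:
    """Overwrite corrupted replicas from a valid copy; return the locations repaired."""
    status = verify_replicas(expected, replicas)
    good = [loc for loc, ok in status.items() if ok]
    if not good:
        raise RuntimeError("no valid replica left to repair from")
    source = replicas[good[0]]
    repaired = []
    for loc, ok in status.items():
        if not ok:
            replicas[loc] = source
            repaired.append(loc)
    return repaired
```

Run periodically over a collection, a check like this replaces a manual audit with a task that only needs attention when no valid replica remains.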

Requirements Driving Evolution
  • Observe that each preservation environment has unique management policies
    • User administration
    • File retention & deletion
    • Time-dependent access controls
    • Data distribution and replication
    • File update (versions, backups)
    • Descriptive metadata
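
One way to picture "unique management policies" is as a small per-environment policy record that generic code evaluates. The sketch below is hypothetical: the `Policy` fields and the three helper functions are invented for illustration and cover only retention, time-dependent access, and replication from the list above.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy record; each preservation environment supplies its own values."""
    retention_days: int   # a file may not be deleted before this many days after ingestion
    embargo_days: int     # public read access opens this many days after ingestion
    min_replicas: int     # required number of copies


def may_delete(policy: Policy, ingested: date, today: date) -> bool:
    return today >= ingested + timedelta(days=policy.retention_days)


def public_may_read(policy: Policy, ingested: date, today: date) -> bool:
    return today >= ingested + timedelta(days=policy.embargo_days)


def needs_replication(policy: Policy, replica_count: int) -> bool:
    return replica_count < policy.min_replicas
```

Two environments can then share the same checking code while differing only in the `Policy` values they declare.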

Requirements Driving Evolution
  • Socialization of collections
    • The archivists have specific properties that they assert the collection will possess
      • Completeness
      • Authoritative sources
      • Authenticity
    • The creators of the records have their own criteria for the properties they expect
  • Socialization is the mapping from creator assertions to archivist expectations
    • Extract records from the environment in which they were created and migrate into the preservation environment
    • Extract records from the preservation environment and deliver to users of the archive
    • Maintain assertions about the records during both extraction processes

Data Management

iRODS - integrated Rule-Oriented Data System

Rules
  • Rule classes
    • System enforced rules
    • Administrator controlled rules
    • User defined rules
  • Rule execution
    • Atomic rules - executed on each operation invoked by a client
    • Deferred rules - executed at a future time
    • Periodic rules - executed to validate assessment criteria and enforce desired properties (integrity)
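
The three execution modes can be illustrated with a toy scheduler driven by a logical clock. The `RuleScheduler` class and its method names are hypothetical and say nothing about how iRODS queues rules internally; the sketch also assumes every period is positive.

```python
import heapq
import itertools


class RuleScheduler:
    """Toy scheduler: atomic rules run immediately, deferred and periodic rules are queued."""

    def __init__(self):
        self._queue = []               # entries: (due_time, sequence, period, rule)
        self._seq = itertools.count()  # unique tie-breaker so rules are never compared

    def run_atomic(self, rule, context):
        """Atomic rule: executed as part of the client operation itself."""
        return rule(context)

    def defer(self, rule, due_time):
        """Deferred rule: executed once, at a future time."""
        heapq.heappush(self._queue, (due_time, next(self._seq), None, rule))

    def every(self, rule, period, start=0):
        """Periodic rule: executed at `start` and then every `period` time units."""
        heapq.heappush(self._queue, (start, next(self._seq), period, rule))

    def advance_to(self, now, context):
        """Run every queued rule that is due at or before `now`, in due-time order."""
        while self._queue and self._queue[0][0] <= now:
            due, _, period, rule = heapq.heappop(self._queue)
            rule(context)
            if period is not None:
                heapq.heappush(self._queue, (due + period, next(self._seq), period, rule))
```

A periodic integrity check would be registered once with `every(...)` and then re-queued automatically each time it runs.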

iRODS Rule Syntax
  • Event | Condition | Action-set | Recovery-set
    • Event - triggered by operation or queued rule
    • Condition - composed of tests on any attributes in the persistent state information
    • Action-set - composed from both micro-services and rules
    • Recovery-set - used to ensure transaction semantics and consistent state information

  • Executed by a rule engine installed at each storage location - server side workflows
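
A minimal sketch of the Event | Condition | Action-set | Recovery-set structure follows. It is not the iRODS rule engine: the `Rule` class and `fire` function are hypothetical, and the rollback order (undo completed actions in reverse, assuming a failed action leaves no partial state) is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    event: str                                # name of the triggering operation
    condition: Callable[[dict], bool]         # test on the persistent state
    actions: List[Callable[[dict], None]]     # action-set, run in order
    recoveries: List[Callable[[dict], None]]  # recoveries[i] undoes actions[i]


def fire(rules, event, state):
    """Run the first rule whose event matches and whose condition holds.

    If an action raises, the recovery steps for the actions that already
    completed are run in reverse order, then the exception is re-raised.
    Returns True if a rule ran to completion, False if none matched.
    """
    for rule in rules:
        if rule.event != event or not rule.condition(state):
            continue
        done = 0
        try:
            for action in rule.actions:
                action(state)
                done += 1
        except Exception:
            for recovery in reversed(rule.recoveries[:done]):
                recovery(state)
            raise
        return True
    return False
```

The recovery-set is what gives the rule transaction-like behavior: after a failure, the state looks as if the rule had never started.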

Micro-Services
  • Challenge is that storage systems do not provide desired processes
    • Have “minimal” set of standard operations that are performed at the storage system
    • Have actions required by clients such as replication, metadata extraction, format migration
    • Create standard micro-services that aggregate storage operations into modules that can be used to implement desired processes.
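
The idea of aggregating a minimal set of storage operations into reusable modules can be sketched as follows. The `StorageDriver` protocol, the `MemoryStore` stand-in, and the two functions are hypothetical names chosen for illustration; they are not iRODS micro-service names.

```python
from typing import Protocol


class StorageDriver(Protocol):
    """The 'minimal' operations assumed to be available on every storage system."""

    def read(self, path: str) -> bytes: ...

    def write(self, path: str, data: bytes) -> None: ...


class MemoryStore:
    """In-memory stand-in for a file system, tape archive, or other back end."""

    def __init__(self):
        self.files = {}

    def read(self, path):
        return self.files[path]

    def write(self, path, data):
        self.files[path] = data


def replicate(src: StorageDriver, dst: StorageDriver, path: str) -> None:
    """Micro-service sketch: copy one file between two storage systems."""
    dst.write(path, src.read(path))


def extract_metadata(store: StorageDriver, path: str) -> dict:
    """Micro-service sketch: derive simple system metadata from the stored bytes."""
    data = store.read(path)
    return {"path": path, "size": len(data)}
```

Because both functions rely only on `read` and `write`, they work unchanged on any back end that supplies those two operations.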

Data Virtualization

[Figure: layered stack. Labels: Access Interface, Standard Micro-services, Data Grid, Standard Operations, Storage Protocol, Storage System]

Map from the actions requested by the access method to a standard set of micro-services. The standard micro-services are mapped to the operations supported by the storage system.
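
The two mappings can be pictured as lookup tables. In this hypothetical sketch, the access-method names (`web`, `shell`), their action names, and all function names are invented for illustration.

```python
# Standard micro-services, written once against the standard operations.
def fetch(store, path):
    return store["read"](path)


def describe(store, path):
    return {"path": path, "size": len(store["read"](path))}


# First mapping: each access method translates its own actions into micro-services.
ACCESS_MAPS = {
    "web": {"GET": [fetch], "HEAD": [describe]},
    "shell": {"cat": [fetch], "stat": [describe]},
}


# Second mapping: each storage system implements the standard operations
# in terms of its own protocol (here, a plain dictionary lookup).
def make_store(files):
    return {"read": lambda path: files[path]}


def handle(access_method, action, store, path):
    """Resolve an access-method action to micro-services and run them in order."""
    result = None
    for micro_service in ACCESS_MAPS[access_method][action]:
        result = micro_service(store, path)
    return result
```

Adding a new access method or a new storage system then means adding one table entry or one `make_store` variant, without touching the micro-services in between.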

integrated Rule-Oriented Data System

[Figure: iRODS architecture diagram. Labels: Client Interface, Admin Interface, Rule Invoker, Service Manager, Rule Engine, Rule Modifier Module, Config Modifier Module, Metadata Modifier Module, Consistency Check Modules, Rule Base, Current State, Confs, Resources, Metadata-based Services, Resource-based Services, Metadata Persistent Repository, Micro Service Modules]

Distributed Management System

[Figure: distributed management system components. Labels: Data Transport, Metadata Catalog, Rule Engine, Persistent State Information, Virtualization, Policy Management, Execution Engine, Execution Control, Server-Side Workflow, Messaging System, Scheduling]

Digital Preservation
  • Preservation community is defining the rules needed to assert trustworthiness of a digital repository
    • RLG/NARA - Trustworthy Repositories Audit & Certification: Criteria and Checklist.
    • http://wiki.digitalrepositoryauditandcertification.org/pub/Main/ReferenceInputDocuments/trac.pdf

  • Defined 105 rules that are being implemented in iRODS

RLG/NARA Assessment
  • Example TRAC assessment criteria

Classes of Assessment Criteria
  • Collection properties
    • List properties of associated name spaces
    • Verify properties
    • Compare properties with assertions
  • Collection operations
    • Transform file formats
    • Migrate data
    • Generate audit trails
  • Structured information
    • Parse audit trails to generate compliance reports
    • Apply templates to extract information
    • Apply templates to format state information
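
As a sketch of the "structured information" class, the function below parses an audit trail and reports which required actions are missing for each record. The CSV layout (`record`, `action`, `timestamp` columns) and the action names are assumptions for the example; the slides do not specify an audit trail format.

```python
import csv
import io


def compliance_report(audit_csv: str, required_actions: set) -> dict:
    """Return {record: sorted list of required actions absent from its audit trail}."""
    seen = {}
    for row in csv.DictReader(io.StringIO(audit_csv)):
        seen.setdefault(row["record"], set()).add(row["action"])
    return {
        record: sorted(required_actions - actions)
        for record, actions in seen.items()
    }
```

A record with an empty list satisfies the criterion; any other record names exactly the steps an assessor would ask about.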

Which Comes First?
  • Specification of required provenance metadata
    • PREMIS - defines metadata that should be maintained about events associated with record
    • Definition of the procedures left to each preservation environment
  • Specification of required management policies
    • Define explicitly the management procedures
    • Derive the required state information needed to track outcomes
    • Implies provenance metadata is defined by management policies
    • Observe this leads to multiple classes of preservation metadata associated with each preserved name space

Persistent State Information
  • User name space
    • Identity of archivists
    • Qualifications of archivists
  • Record (file) name space
    • Provenance metadata
    • Transformative migrations
    • Chain of custody (storage locations)
    • Integrity
    • Representation information (OAIS)
  • Storage resource name space
    • Archival properties
    • Error rates

Persistent State Information
  • Representation information for preservation environment
  • Rule name space
    • Management policies that control operations within preservation environment
    • Versions of rules
    • Verification criteria
  • Micro-service name space
    • Management procedures that quantify operations on records
    • Versions of micro-services
    • Verification criteria
  • Persistent State name space
    • State information created by each version of a micro-service

Preservation Requirements
  • What are your required preservation management policies?
  • What are your required preservation processes?
  • What are your required preservation assessment criteria?
  • What preservation systems are you using, and how can the preservation systems interoperate?
  • Can a set of records be migrated from your preservation environment into another system while maintaining authenticity, integrity, and chain of custody?

Theory of Digital Preservation
  • Given the set of preservation policies
  • Given the set of preservation procedures
  • Given the set of persistent state information
  • Does the system have demonstrable closure and consistency properties?
    • Is the required persistent state information generated that is needed to make assertions about trustworthiness, authenticity, integrity?
    • Can assertions be made about the set of preservation procedures that have been applied to the records (no missing steps)?
    • Do the applied preservation procedures enforce all preservation policies?
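
The closure and consistency questions can be read as set checks over policies, procedures, and state. The sketch below is one possible interpretation, not a definition from the slides; the data layout and all names are hypothetical.

```python
def check_closure(policies: dict, procedures: dict, recorded_state: set) -> dict:
    """Check that policies, procedures, and persistent state fit together.

    policies:       {policy name: set of procedure names that enforce it}
    procedures:     {procedure name: set of state attributes it must record}
    recorded_state: attributes actually present in the persistent state
    """
    # A policy is unenforced if it names no procedure, or names one that is not defined.
    unenforced = sorted(
        name for name, procs in policies.items()
        if not procs or not procs.issubset(procedures)
    )
    # State that a procedure should have produced but that was never recorded.
    missing_state = sorted(
        {attr for attrs in procedures.values() for attr in attrs} - recorded_state
    )
    return {
        "unenforced_policies": unenforced,
        "missing_state": missing_state,
        "closed": not unenforced and not missing_state,
    }
```

Under this reading, the system is "closed" only when every policy is backed by a defined procedure and every procedure has left the state needed to assert that it ran.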

iRODS Application
  • NSF - SDCI grant “Adaptive Middleware for Community Shared Collections”
    • iRODS development, SRB maintenance
  • NARA - Transcontinental Persistent Archive Prototype
    • Trusted repository assessment criteria
  • NSF - Ocean Research Interactive Observatory Network (ORION)
    • Real-time sensor data stream management
  • NSF - Temporal Dynamics of Learning Center data grid
    • Management of Institutional Review Board (IRB) approval

iRODS Development Status
  • Current release is version 1.0
    • January 23, 2008
    • http://irods.sdsc.edu
  • International collaborations
    • SHAMAN - University of Liverpool
      • Sustaining Heritage Access through Multivalent ArchiviNg
    • CASPAR
      • Representation information, TRAC assessment criteria
    • UK e-Science data grid
    • IN2P3 (Lyon, France) data grid migration
    • DSpace policy management integration
    • Fedora user middleware integration
    • LStore distributed metadata catalog integration

Planned Development
  • In progress:
    • GSI support
    • Audit trails - mechanisms to record and track iRODS persistent state changes
    • Structured information interface based on mounted collection driver (tar file)
    • GUI Browser (AJAX)
    • Driver for HPSS
    • Porting to additional versions of Unix/Linux (Ubuntu completed)
  • Planned
    • Time-limited sessions via a one-way hash authentication
    • Python Client library
    • Driver for SAM-QFS
    • Porting to Windows
    • Support for MySQL as the metadata catalog
    • MCAT to ICAT migration tools
    • Extensible Metadata including Databases Access Interface
    • Zones/Federation
    • Cheshire / Multivalent Browser micro-service

For More Information

Reagan W. Moore

San Diego Supercomputer Center

moore@sdsc.edu

http://www.sdsc.edu/srb/

http://irods.sdsc.edu/
