
OGF21 Preservation Environments Research Group

Presentation Transcript


  1. OGF21 Preservation Environments Research Group • Organizers: Richard Marciano (marciano@sdsc.edu) Reagan Moore (moore@sdsc.edu) • Goals: • Analyze capabilities required by a preservation environment • Define rule-based preservation environment - iRODS • NARA Electronic Records Archive capability requirements • RLG/NARA assessment criteria for a Trusted Digital Repository • Demonstrate creation of a preservation environment based on data grid technology • Demonstrate creation of preservation rules controlling a preservation environment • Analyze capabilities that can be based on grid technology • iRODS rule-oriented data system • Participants: • CASPAR - Cultural, Artistic and Scientific knowledge for Preservation, Access and Retrieval • SHAMAN - Sustaining Heritage Access through Multivalent Archiving • NCRIS - National Collaborative Research Infrastructure Strategy • PLANETS - Preservation and Long-term Access through Networked Services • MIT - DSpace digital library • SDSC - NARA Transcontinental Persistent Archive Prototype • U Md - Producer Archive Workflow Network • UK Digital Curation Centre OGF-21 Software Forum

  2. Intellectual Property Policy • I acknowledge that participation in OGF21 is subject to the OGF Intellectual Property Policy. • Intellectual Property Notices Note Well: All statements related to the activities of the OGF and addressed to the OGF are subject to all provisions of Section 17 of GFD-C.1 (.pdf), which grants to the OGF and its participants certain licenses and rights in such statements. Such statements include verbal statements in OGF meetings, as well as written and electronic communications made at any time or place, which are addressed to: the OGF plenary session, • any OGF working group or portion thereof, • the GFSG, or any member thereof on behalf of the GFSG, • the GFAC, or any member thereof on behalf of the GFAC, • any OGF mailing list, including any working group or research group list, or any other list functioning under OGF auspices, • the GFD Editor or the GWD process • Statements made outside of a OGF meeting, mailing list or other function, that are clearly not intended to be input to an OGF activity, group or function, are not subject to these provisions. • Excerpt from Section 17 of GFD-C.1 Where the GFSG knows of rights, or claimed rights, the OGF secretariat shall attempt to obtain from the claimant of such rights, a written assurance that upon approval by the GFSG of the relevant OGF document(s), any party will be able to obtain the right to implement, use and distribute the technology or works when implementing, using or distributing technology based upon the specific specification(s) under openly specified, reasonable, non-discriminatory terms. The working group or research group proposing the use of the technology with respect to which the proprietary rights are claimed may assist the OGF secretariat in this effort. The results of this procedure shall not affect advancement of document, except that the GFSG may defer approval where a delay may facilitate the obtaining of such assurances. 
The results will, however, be recorded by the OGF Secretariat, and made available. The GFSG may also direct that a summary of the results be included in any GFD published containing the specification. OGF Intellectual Property Policies are adapted from the IETF Intellectual Property Policies that support the Internet Standards Process.

  3. Preservation Requirements • Authenticity • Maintain information about provenance of data • Assertions made about the file at the time of ingestion • Integrity • Maintain information about the management of the data • Assertions made by the archivist • Access controls, audit trails, checksums, replication, synchronization, federation • Infrastructure independence • Management of properties of records independently of choice of storage system • Scalability • Management of large collections (billions of records, petabytes of data, thousands of attributes)
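As a hedged illustration of the integrity requirement above, the following Python sketch compares a checksum recorded at ingestion with the checksums of stored replicas. The function names, the resource names, and the choice of SHA-256 are assumptions made for this example; the slides do not state which checksum algorithm is used.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Return the SHA-256 hex digest of a byte string (algorithm is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def find_corrupt_replicas(recorded: str, replicas: dict) -> list:
    """Return the names of replicas whose checksum differs from the recorded one."""
    return sorted(name for name, data in replicas.items() if checksum(data) != recorded)


# Checksum asserted at ingestion time.
original = b"record contents at ingestion"
recorded = checksum(original)

# Hypothetical storage resources; resourceB holds a damaged copy.
replicas = {
    "resourceA": original,
    "resourceB": b"record contents at ingestiom",
}
bad = find_corrupt_replicas(recorded, replicas)
print(bad)
```

A periodic integrity check would run a comparison like this over every record and trigger re-replication from a good copy when a mismatch is found.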

  4. Data Grid Evolution • Data grids • Infrastructure independence • Data sharing through data and trust virtualization • SRB - Storage Resource Broker • Rule-based data grids • Automation of management policies • Management virtualization • Open source software • iRODS - integrated Rule-Oriented Data System

  5. Data Management Applications • Data grids • Share data - organize distributed data as a collection • Digital libraries • Publish data - support browsing and discovery • Persistent archives • Preserve data - manage technology evolution • Real-time sensor systems • Federate sensor data - integrate across sensor streams • Workflow systems • Analyze data - integrate client- & server-side workflows

  6. Generic Infrastructure • Data grids organize distributed data into shared collections • Persistent name spaces for files, users, storage • Collection attributes • Provenance, descriptive, system metadata • Data grids manage heterogeneous storage systems • Standard operations across file systems, tape archives, object ring buffers • Enable technology evolution • At the point in time when new technology is available, both the old and new systems can be integrated

  7. Using a Data Grid – in Abstract • User asks for data from the data grid • The data is found and returned • Where & how details are hidden • [Diagram labels: Ask for data, Data Grid, Data delivered]

  8. Using a Data Grid - Details • User asks for data • Data request goes to iRODS Server • Server looks up information in catalog • Catalog tells which iRODS server has data • 1st server asks 2nd for data • The 2nd iRODS server applies rules • [Diagram labels: iRODS Server (two instances), Metadata Catalog, DB]
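The request path on this slide can be sketched in a few lines of Python. This is a toy model under stated assumptions, not iRODS code: the `Catalog` and `Server` classes, their methods, and the logical path are all hypothetical, and "applies rules" is reduced to a placeholder step.

```python
class Catalog:
    """Maps each logical file name to the server that holds the data."""

    def __init__(self):
        self._locations = {}

    def register(self, logical_name, server_name):
        self._locations[logical_name] = server_name

    def locate(self, logical_name):
        return self._locations[logical_name]


class Server:
    """A toy server that either serves data locally or asks a peer for it."""

    def __init__(self, name, catalog, peers):
        self.name = name
        self.catalog = catalog
        self.peers = peers  # shared dict: server name -> Server
        self.storage = {}
        peers[name] = self

    def store(self, logical_name, data):
        self.storage[logical_name] = data
        self.catalog.register(logical_name, self.name)

    def serve_local(self, logical_name):
        # Placeholder for the step where the holding server applies its rules.
        return self.storage[logical_name]

    def get(self, logical_name):
        # Look up the holder in the catalog, then ask that server for the data.
        owner = self.catalog.locate(logical_name)
        return self.peers[owner].serve_local(logical_name)


catalog = Catalog()
peers = {}
first = Server("server1", catalog, peers)
second = Server("server2", catalog, peers)
second.store("/zone/home/user/file.dat", b"payload")

# The user contacts the first server, which never held the file itself.
data = first.get("/zone/home/user/file.dat")
```

The caller only names the logical path; which server stores the bytes stays hidden behind the catalog lookup.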

  9. Extremely Successful • Storage Resource Broker (SRB) manages 2 PB of data in internationally shared collections • Data collections for NSF, NARA, NASA, DOE, DOD, NIH, LC, NHPRC, IMLS; APAC, UK e-Science, IN2P3, KEK, … • Astronomy: data grid • Bio-informatics: digital library • Earth Sciences: data grid • Ecology: collection • Education: persistent archive • Engineering: digital library • Environmental science: data grid • High energy physics: data grid • Humanities: data grid • Medical community: digital library • Oceanography: real-time sensor data, persistent archive • Seismology: digital library, real-time sensor data • Goal has been generic infrastructure for distributed data

  10. OGF-21 Software Forum

  11. Requirements Driving Evolution • Observe that as the size of the shared collections grows, the administrative tasks can become onerous. • Data grids provide mechanisms to manage recovery from all errors that occur in the distributed environment • Need to minimize labor support through automation of administrative functions • File ingestion tasks • Verification of desired collection properties • Integrity checks and replica management

  12. Requirements Driving Evolution • Observe that each community has unique management policies • User administration • File retention & deletion • Time-dependent access controls • Data distribution and replication • File update (versions, backups) • Descriptive metadata

  13. Requirements Driving Evolution • Socialization of collections • The creators of the collection have specific properties that they assert the collection will possess • Completeness • Authoritative sources • Authenticity • The users of the collection have their own criteria for the properties they expect • Socialization is the mapping from creator assertions to user expectations

  14. Data Grid Mechanisms • Essential components needed for synergism implemented in SRB • Infrastructure independence • Data and trust virtualization • Components needed for specific management policies and processes implemented in iRODS • Map policies to rules that control all processes • Map processes to standard micro-services

  15. Data Management • iRODS - integrated Rule-Oriented Data System

  16. Rules • Rule classes • System enforced rules • Administrator controlled rules • User defined rules • Rule execution • Atomic rules - executed on each operation invoked by a client • Deferred rules - executed at a future time • Periodic rules - executed to validate assessment criteria and enforce desired properties (integrity)

  17. iRODS Rule Syntax • Event | Condition | Action-set | Recovery-set • Event - triggered by operation or queued rule • Condition - composed of tests on any attributes in the persistent state information • Action-set - composed from both micro-services and rules • Recovery-set - used to ensure transaction semantics and consistent state information • Executed by a rule engine installed at each storage location - server side workflows
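The four-part rule syntax above can be sketched as a small parser. This is an illustrative Python sketch, not iRODS code: the `Rule` class and `parse_rule` function are hypothetical names, and the sketch assumes that `|` never appears inside a condition or action and that `##` separates the entries of the action and recovery sets, as in the rule base listed later in this deck.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    event: str
    condition: str
    actions: list
    recoveries: list


def parse_rule(line: str) -> Rule:
    """Split 'event|condition|action-set|recovery-set' into its four parts."""
    parts = line.strip().split("|")
    if len(parts) != 4:
        raise ValueError(f"expected 4 parts separated by '|', got {len(parts)}")
    event, condition, action_set, recovery_set = parts
    return Rule(
        event=event.strip(),
        condition=condition.strip(),
        actions=[a.strip() for a in action_set.split("##")],
        recoveries=[r.strip() for r in recovery_set.split("##")],
    )


# Rule text copied from the administration rules in this deck.
rule = parse_rule(
    "acCreateUser||msiCreateUser##acCreateDefaultCollections##msiCommit"
    "|msiRollback##msiRollback##nop"
)
print(rule.event, rule.actions, rule.recoveries)
```

An empty condition, as in this rule, parses to an empty string and can be read as "always applies".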

  18. Micro-Services • Challenge is that storage systems do not provide desired processes • Have “minimal” set of standard operations that are performed at the storage system • Have actions required by clients such as replication, metadata extraction • Create standard micro-services that aggregate storage operations into modules that can be used to implement desired processes.

  19. Data Virtualization • Map from the actions requested by the access method to a standard set of micro-services • The standard micro-services are mapped to the operations supported by the storage system • [Diagram labels: Access Interface, Standard Micro-services, Data Grid, Standard Operations, Storage Protocol, Storage System]
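A minimal sketch of this layering, assuming two made-up storage back ends: each driver exposes the same standard operations (`read`, `write`), and a micro-service such as replication is written only in terms of those operations. All class and function names here are hypothetical and chosen for illustration.

```python
class InMemoryDisk:
    """Toy file-system driver: keeps files in a dictionary."""

    def __init__(self):
        self._files = {}

    def read(self, name):
        return self._files[name]

    def write(self, name, data):
        self._files[name] = data


class InMemoryTape:
    """Toy tape driver: appends records sequentially, newest copy wins."""

    def __init__(self):
        self._records = []

    def read(self, name):
        for stored_name, data in reversed(self._records):
            if stored_name == name:
                return data
        raise KeyError(name)

    def write(self, name, data):
        self._records.append((name, data))


def replicate(name, source, target):
    """Micro-service built only from the standard read and write operations."""
    target.write(name, source.read(name))


disk = InMemoryDisk()
tape = InMemoryTape()
disk.write("survey.dat", b"observations")
replicate("survey.dat", disk, tape)
copy_on_tape = tape.read("survey.dat")
```

Because `replicate` never touches driver internals, adding a new storage system only requires a new driver with the same two operations.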

  20. integrated Rule-Oriented Data System • [Architecture diagram; extracted labels: Client Interface, Admin Interface, Rule Invoker, Service Manager, Rule Engine, Rule Modifier Module, Config Modifier Module, Metadata Modifier Module, Consistency Check Module (three instances), Rule Base, Current State, Confs, Resources, Metadata-based Services, Resource-based Services, Metadata Persistent Repository, Micro Service Modules]

  21. Distributed Management System • [Diagram; extracted labels: Data Transport, Metadata Catalog, Rule Engine, Persistent State information, Virtualization, Policy Management, Execution Engine, Execution Control, Server Side Workflow, Messaging System, Scheduling]

  22. Micro-service Classes • Test • System • Workflow control • Client • iCAT catalog • User level invoked by “irule” • Image manipulation

  23. Digital Preservation • Preservation community is defining the rules needed to assert trustworthiness of a digital repository • RLG/NARA - Trustworthy Repositories Audit & Certification: Criteria and Checklist. http://wiki.digitalrepositoryauditandcertification.org/pub/Main/ReferenceInputDocuments/trac.pdf • Defined 105 rules that are being implemented in iRODS

  24. RLG/NARA Assessment • Example TRAC assessment criteria

  25. Classes of Assessment Criteria • Collection properties • List properties of associated name spaces • Verify properties • Compare properties with assertions • Collection operations • Transform file formats • Migrate data • Generate audit trails • Structured information • Parse audit trails to generate compliance reports • Apply templates to extract information • Apply templates to format state information

  26. iRODS Development • NSF - SDCI grant “Adaptive Middleware for Community Shared Collections” • iRODS development, SRB maintenance • NARA - Transcontinental Persistent Archive Prototype • Trusted repository assessment criteria • NSF - Ocean Research Interactive Observatory Network (ORION) • Real-time sensor data stream management • NSF - Temporal Dynamics of Learning Center data grid • Management of Institutional Review Board approval

  27. iRODS Development Status • Current release is version 0.9.2 • June 2007 • Production release will be version 1.0 • Fall quarter 2007 • International collaborations • SHAMAN - University of Liverpool • Sustaining Heritage Access through Multivalent ArchiviNg • UK e-Science data grid • IN2P3 in Lyon, France • DSpace policy management

  28. Planned Development • GSI support • Time-limited sessions via a one-way hash authentication • Python Client library • GUI Browser (AJAX in development) • Driver for HPSS (in development) • Driver for SAM-QFS • Porting to additional versions of Unix/Linux • Porting to Windows • Support for MySQL as the metadata catalog • API support packages based on existing mounted collection driver • MCAT to ICAT migration tools • Extensible Metadata including Databases Access Interface • Zones/Federation • Auditing - mechanisms to record and track iRODS persistent state changes

  29. Preservation Requirements • What are your required preservation management policies? • What are your required preservation processes? • What are your required preservation assessment criteria? • What preservation systems are you using, and how can the preservation systems interoperate? • Can a set of records be migrated from your preservation environment into another system while maintaining authenticity, integrity, and chain of custody?

  30. For More Information • Reagan W. Moore, San Diego Supercomputer Center • moore@sdsc.edu • http://www.sdsc.edu/srb/ • http://irods.sdsc.edu/

  31. iRODS Rules
     Rules for administration
     acCreateUser||msiCreateUser##acCreateDefaultCollections##msiCommit|msiRollback##msiRollback##nop
     acVacuum(*arg1)||delayExec(msiVacuum,*arg1)|nop
     acCreateDefaultCollections||acCreateUserZoneCollections|nop
     acCreateUserZoneCollections||acCreateCollByAdmin(/$rodsZoneProxy/home,$otherUserName)##acCreateCollByAdmin(/$rodsZoneProxy/trash/home,$otherUserName)|nop##nop
     acCreateCollByAdmin(*parColl,*childColl)||msiCreateCollByAdmin(*parColl,*childColl)|nop
     acDeleteUser||acDeleteDefaultCollections##msiDeleteUser##msiCommit|msiRollback##msiRollback##nop
     acDeleteDefaultCollections||acDeleteUserZoneCollections|nop
     acDeleteUserZoneCollections||acDeleteCollByAdmin(/$rodsZoneProxy/home,$otherUserName)##acDeleteCollByAdmin(/$rodsZoneProxy/trash/home,$otherUserName)|nop##nop
     acDeleteCollByAdmin(*parColl,*childColl)||msiDeleteCollByAdmin(*parColl,*childColl)|nop
     Rule for pre-processing on storage use
     acSetRescSchemeForCreate||msiSetDefaultResc(demoResc,noForce)##msiSetRescSortScheme(random)##msiSetRescSortScheme(byRescType)|nop##nop##nop
     Rule for pre-processing on data reads
     acPreprocForDataObjOpen||msiSortDataObj(random)|nop
     Rule for post processing data writes
     acPostProcForPut||nop|nop
     acPostProcForCopy||nop|nop
     Rule for setting number of threads for parallel I/O
     acSetNumThreads||msiSetNumThreads(default,default,default)|nop
     Rule for data deletion policy setting
     acDataDeletePolicy||nop|nop

  32. iRODS Demonstration • Demonstrate generic put command • ilsresc • ils -l nvo • iput -R demoResc ../src/icd.c nvo • ils -l nvo • Revise put command to automatically create a replica • cp core.irb.1 ../../../server/config/reConfigs/core.irb • ils -l nvo • iput -R demoResc ../src/ipwd.c nvo • ils -l nvo • Illustrate execution of a user-defined rule • icd • iput carl.ged foo1 • irule -vF ruleInp3

  33. iRODS Demonstration
     # iRODS Rule Base - core.irb
     # Each rule consists of four parts separated by |
     # The four parts are: name, conditions, function calls, and recovery.
     # The calls and recoveries can be multiple ones, separated by ##.
     # For each rule, the number of recovery calls should match the calls;
     # for example, if the 2nd call fails, the 2nd recovery call is made.
     #
     acPreprocForDataObjOpen||msiSortDataObj(random)|nop
     acSetRescSchemeForCreate||msiSetDefaultResc(demo2Resc,noForce)##msiSetRescSortScheme(random)##msiSetRescSortScheme(byRescType)|nop##nop##nop
     acDataDeletePolicy||nop|nop
     acPostProcForPut||nop|nop
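The pairing of calls and recoveries described in the rule-base comments can be sketched as follows. This is a simplified Python model, not the iRODS rule engine: `run_rule` and the stand-in services are hypothetical, and the sketch runs only the recovery at the position of the failed call, which is all the comments state. Whether earlier recoveries also run is not specified here.

```python
def run_rule(actions, recoveries, services):
    """Run actions in order; on the first failure run the recovery at the same position."""
    if len(actions) != len(recoveries):
        raise ValueError("each action needs a matching recovery entry")
    for position, action in enumerate(actions):
        try:
            services[action]()
        except Exception:
            services[recoveries[position]]()
            return False
    return True


log = []


def succeed(name):
    """Build a stand-in service that records its name and succeeds."""
    return lambda: log.append(name)


def fail(name):
    """Build a stand-in service that records its name and then raises."""
    def call():
        log.append(name)
        raise RuntimeError(f"{name} failed")
    return call


# Hypothetical behaviour: the second call of acCreateUser fails.
services = {
    "msiCreateUser": succeed("msiCreateUser"),
    "acCreateDefaultCollections": fail("acCreateDefaultCollections"),
    "msiCommit": succeed("msiCommit"),
    "msiRollback": succeed("msiRollback"),
    "nop": succeed("nop"),
}
actions = ["msiCreateUser", "acCreateDefaultCollections", "msiCommit"]
recoveries = ["msiRollback", "msiRollback", "nop"]
completed = run_rule(actions, recoveries, services)
print(completed, log)
```

In this run the second call fails, so the second recovery entry (`msiRollback`) is invoked and `msiCommit` is never reached.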

  34. iRODS Demonstration
     # iRODS Rule Base - core.irb
     # Each rule consists of four parts separated by |
     # The four parts are: name, conditions, function calls, and recovery.
     # The calls and recoveries can be multiple ones, separated by ##.
     # For each rule, the number of recovery calls should match the calls;
     # for example, if the 2nd call fails, the 2nd recovery call is made.
     #
     acPreprocForDataObjOpen||msiSortDataObj(random)|nop
     acSetRescSchemeForCreate||msiSetDefaultResc(demo2Resc,noForce)##msiSetRescSortScheme(random)##msiSetRescSortScheme(byRescType)|nop##nop##nop
     acDataDeletePolicy||nop|nop
     acPostProcForPut||nop|nop

  35. iRODS Demonstration
     # iRODS Rule Base
     # Each rule consists of four parts separated by |
     # The four parts are: name, conditions, function calls, and recovery.
     # The calls and recoveries can be multiple ones, separated by ##.
     # For each rule, the number of recovery calls should match the calls;
     # for example, if the 2nd call fails, the 2nd recovery call is made.
     #
     acPreprocForDataObjOpen||msiSortDataObj(random)|nop
     acSetRescSchemeForCreate||msiSetDefaultResc(demo2Resc,noForce)##msiSetRescSortScheme(random)##msiSetRescSortScheme(byRescType)|nop##nop##nop
     acDataDeletePolicy||nop|nop
     acPostProcForPut|$objPath like /tempZone/home/rods/nvo/*|msiSysReplDataObj(nvoReplResc)|nop
     acPostProcForPut||nop|nop
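The two `acPostProcForPut` rules above differ only in their condition. The following Python sketch shows one way such a choice could work, under two assumptions that this slide does not confirm: rules are tried in the order listed, and `like` behaves as a shell-style wildcard match. The helper names are hypothetical and Python's `fnmatch` stands in for the real matcher.

```python
from fnmatch import fnmatchcase

# (event, condition, actions, recoveries), in the order listed on the slide.
RULES = [
    ("acPostProcForPut", "$objPath like /tempZone/home/rods/nvo/*",
     ["msiSysReplDataObj(nvoReplResc)"], ["nop"]),
    ("acPostProcForPut", "", ["nop"], ["nop"]),
]


def condition_holds(condition, session):
    """Evaluate an empty condition or a single '<variable> like <pattern>' test."""
    if not condition:
        return True
    variable, operator, pattern = condition.split(None, 2)
    if operator != "like":
        raise ValueError(f"unsupported operator in this sketch: {operator}")
    return fnmatchcase(session[variable], pattern)


def select_rule(event, session):
    """Return (actions, recoveries) of the first rule that matches, else None."""
    for name, condition, actions, recoveries in RULES:
        if name == event and condition_holds(condition, session):
            return actions, recoveries
    return None


in_nvo = select_rule("acPostProcForPut", {"$objPath": "/tempZone/home/rods/nvo/ipwd.c"})
elsewhere = select_rule("acPostProcForPut", {"$objPath": "/tempZone/home/rods/foo1"})
print(in_nvo[0], elsewhere[0])
```

Under these assumptions, a file put into the `nvo` collection selects the replication action, while any other path falls through to the unconditional `nop` rule.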

  36. iRODS Demonstration
     # This is an example of an input for the irule command.
     # The first input line is the rule body.
     # The second input line is the input parameter in the format of label=value.
     # Multiple inputs can be specified using the '%' character as the separator.
     # The third input line is the output description. For multiple outputs use '%'.
     myTestRule||msiDataObjOpen(*A,*S_FD)##msiDataObjCreate(*B,null,*D1_FD)##msiDataObjRead(*S_FD,100,*R1_BUF)##msiDataObjWrite(*D1_FD,*R1_BUF,*W1_LEN)##msiDataObjClose(*D1_FD,*junk2)##msiDataObjCreate(*C,null,*D2_FD)##msiDataObjRead(*S_FD,50000,*R2_BUF)##msiDataObjWrite(*D2_FD,*R2_BUF,*W2_LEN)##msiDataObjClose(*D2_FD,*junk3)##msiDataObjClose(*S_FD,*junk4)
     *A=/tempZone/home/rods/foo1%*B=/tempZone/home/rods/foo2%*C=/tempZone/home/rods/foo3
     *R1_BUF%*W2_LEN%*A
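The three-line input format described in the comments above can be read with a short parser. This Python sketch is an illustration only: `parse_irule_input` is a hypothetical helper, the rule body is a shortened version of the one on the slide, and the sketch assumes that lines starting with `#` are comments and that `%` does not occur inside a value.

```python
def parse_irule_input(text):
    """Return (rule_body, inputs, outputs) from a three-line irule input."""
    lines = [
        line.strip()
        for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]
    if len(lines) != 3:
        raise ValueError(f"expected 3 non-comment lines, got {len(lines)}")
    body, input_line, output_line = lines
    # Inputs are label=value pairs separated by '%'.
    inputs = dict(item.split("=", 1) for item in input_line.split("%"))
    # Outputs are variable names separated by '%'.
    outputs = output_line.split("%")
    return body, inputs, outputs


# Shortened example; the full rule body on the slide has more calls.
example = """\
# rule body, inputs, outputs
myTestRule||msiDataObjOpen(*A,*S_FD)##msiDataObjClose(*S_FD,*junk4)
*A=/tempZone/home/rods/foo1%*B=/tempZone/home/rods/foo2
*R1_BUF%*W2_LEN%*A
"""
body, inputs, outputs = parse_irule_input(example)
print(inputs["*A"], outputs)
```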

  37. iRODS Demonstration • Add and query metadata • imeta add -d foo1 speed 100 "mph" • imeta add -d foo1 length 200 "ft" • imeta add -d foo2 speed 300 "mph" • imeta add -d foo3 length 400 "ft" • imeta ls -d foo1 • imeta qu -d speed = 100 • imeta qu -d speed ">=" 100 • imeta qu -d length ">=" 100 • Copy Metadata • imeta ls -d foo1 • imeta ls -d foo3 • imeta cp -d -d foo1 foo3 • imeta ls -d foo3
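The add, query, and copy steps above operate on attribute-value-unit triples attached to data objects. The Python sketch below models that idea in memory, using the same sample values. It is not the `imeta` implementation: the `MetadataStore` class is hypothetical, and it compares values numerically, which is an assumption and may differ from how the catalog compares them.

```python
import operator

COMPARATORS = {
    "=": operator.eq,
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
}


class MetadataStore:
    """In-memory list of (object, attribute, value, unit) entries."""

    def __init__(self):
        self._avus = []

    def add(self, obj, attribute, value, unit=""):
        self._avus.append((obj, attribute, value, unit))

    def ls(self, obj):
        """List the (attribute, value, unit) triples attached to one object."""
        return [(a, v, u) for o, a, v, u in self._avus if o == obj]

    def query(self, attribute, op, value):
        """Return the sorted names of objects whose attribute satisfies the test."""
        compare = COMPARATORS[op]
        return sorted({o for o, a, v, _ in self._avus
                       if a == attribute and compare(v, value)})

    def copy(self, source, target):
        """Attach every triple of the source object to the target as well."""
        for attribute, value, unit in self.ls(source):
            self.add(target, attribute, value, unit)


store = MetadataStore()
store.add("foo1", "speed", 100, "mph")
store.add("foo1", "length", 200, "ft")
store.add("foo2", "speed", 300, "mph")
store.add("foo3", "length", 400, "ft")

fast = store.query("speed", ">=", 100)
store.copy("foo1", "foo3")
print(fast, store.ls("foo3"))
```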

  38. iRODS Demonstration • Copy metadata attributes of a file to a collection • imeta ls -C /tempZone/home/rods • imeta cp -d -C foo1 /tempZone/home/rods • imeta ls -C /tempZone/home/rods

  39. More Information • moore@sdsc.edu • SRB: http://www.sdsc.edu/srb • iRODS: http://irods.sdsc.edu/
