Robust Technologies for Automated Ingestion and Long-Term Preservation of Digital Information
Principal Investigator: Joseph JaJa
Lead Programmers: Mike Smorul and Mike McGann
Graduate Students: Sang Song and Muluwork Geremew
Institute for Advanced Computer Studies
University of Maryland, College Park
Research Objectives
• Development of tools and technologies for:
  • Automated Distributed Ingestion – flexible platform for Producer-Archive Interactions.
  • Management of Preservation Processes – Monitoring, Integrity Auditing, and Preservation Services.
• Evaluation and demonstration of tools on widely different collections.
Recent Major Accomplishments
• ACE (Auditing Control Environment): a policy-driven software environment to continually verify the integrity of an archive’s holdings.
• FOCUS (FOrmat CUration Service): a scalable and secure registry for persistent information and services, applied to formats.
• Substantial enhancements to PAWN, the Producer-Archive Workflow Network software platform.
ACE – Overview
[Diagram labels: Client, ACE-AM, ACE-IMS, 3rd Party Auditor; exchanged items: obj, Hash(obj), Integrity Token]
Basic Ideas
• Integrity auditing service that can interoperate with any archiving architecture.
• Active (periodic) and user-triggered auditing.
• Time-stamped certificates that enable verification of an object's integrity throughout its lifetime, giving an auditable record of every transformation.
• Cost-effective, scalable, and based on rigorous techniques.
FOCUS: FOrmat CUration Service
• Maintains persistent information on digital formats, services, and applications to access and manipulate them.
• Accessible either:
  • Directly through LDAP
  • Or indirectly through SOAP (Web Services)
[Diagram labels: Web Service Agent, Format Registry, SOAP, LDAP]
Answer to Question #1
• Biggest surprise: none, but a number of small surprises, such as:
  • OAIS may be too general to provide a useful framework?
  • Significant differences between the push and pull models for automated ingestion.
  • Not at all clear which communities will be able to handle or afford wide-area distributed infrastructure.
Answer to Question #2
• What have you done that you never thought you would?
• Confuse my graduate students! Trying to explain:
  • Authenticity of an archive’s holdings (the object is what it claims to be!).
  • Ensuring access to data after hundreds of years without having any idea how the technology will evolve over the next ten or twenty years!
Answer to Question #3
• How has the area of your project changed? A lot and not much:
  • Hardware (processor and storage) is changing very quickly, as expected.
  • Web technologies are more mature and more widely used, as expected.
  • Grid technologies did not progress as much as had been expected!
  • Very little work has been done on preservation services.
Conclusion
• Three major pieces of software: ACE, FOCUS, and PAWN.
  • Interoperable with any archiving architecture
  • Scalable, secure, and platform independent
• Continued development of preservation services.