
Preservation DataStores - Storage Assist for Preservation Environments

Presenter: Simona Cohen, Haifa Research Lab. Team: Simona Cohen, Michael Factor, Kalman Meth, Dalit Naor, Leeat Ramati, Petra Reshef, Julian Satran, Yaron Wolfsthal.





Presentation Transcript


  1. Preservation DataStores - Storage Assist for Preservation Environments. Presenter: Simona Cohen, Haifa Research Lab. Team: Simona Cohen, Michael Factor, Kalman Meth, Dalit Naor, Leeat Ramati, Petra Reshef, Julian Satran, Yaron Wolfsthal

  2. What is Preservation?
     • Challenge: preserve large amounts of heterogeneous data for long periods of time (tens, if not hundreds, of years)
     • We want to preserve the information, not only the bits
     • Preservation of information implies continuing understandability and usability
     • Preservation of information is hard and requires vigilance against:
         • Changes in technologies
         • Changes in users (Designated Community)
     • The amount of digital preservation data can grow fast and become very large in the future
         • NARA projected that it would have 10,000 TB of data to be preserved forever in 2010, 230,000 TB in 2020, and 350,000 TB in 2022
     • Along with that comes a large amount of metadata that must be added to the raw data in order to interpret it
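The claim that preservation data "can grow fast" becomes concrete if NARA's projections are converted into an implied compound annual growth rate. This is a minimal Python sketch. `cagr` is our own helper name, and the only inputs are the three projections quoted on the slide:

```python
# Compound annual growth rate implied by NARA's projected volumes (TB).

def cagr(start_tb: float, end_tb: float, years: int) -> float:
    """Constant yearly growth rate that turns start_tb into end_tb over `years`."""
    return (end_tb / start_tb) ** (1 / years) - 1

projections = {2010: 10_000, 2020: 230_000, 2022: 350_000}  # TB, from the slide

years = sorted(projections)
for a, b in zip(years, years[1:]):
    rate = cagr(projections[a], projections[b], b - a)
    print(f"{a}-{b}: {rate:.1%} per year")
```

Run as is, it reports roughly 36.8% per year for 2010-2020 and 23.4% per year for 2020-2022. The metadata mentioned on the slide comes on top of these raw-data volumes.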

  3. Is Preservation Needed?
     • Healthcare
         • OSHA requires employers to keep medical and other records of employees who are exposed to toxic substances and harmful agents. Employers must maintain these records for 30 years
         • Medical records should be preserved for the life of the individual and beyond
         • X-rays are often stored for periods of 75 years
         • The retention requirement for the [medical] records of minors varies from 20 to 43 years of age
     • Finance
         • Rule 17a-4 requires broker-dealers to retain account record information for six years. The six-year period begins either when the account is closed or when the information is replaced or updated
         • Life insurance policies have to be kept for the life of the policy plus 6-10 years
     • Aerospace
         • Aircraft design records have to be retained for the lifetime of each aircraft (30+ years)
     • Petroleum
         • Oil-field data is used over the life of the field (50+ years)
     • Pharma
         • Pharma needs off-line electronic data storage for 50 to 100 years or longer
     • Scientific and Cultural
         • Satellite data is kept forever
         • We would like to keep library and art data forever
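Several of these rules reduce to date arithmetic: a trigger event plus a fixed number of years. This is a simplified Python sketch built on a hypothetical policy table (`RETENTION_YEARS`) taken from the examples above. Real regulations carry extra conditions. Rules that run for the life of a person or an aircraft, for instance, cannot be expressed as a fixed offset:

```python
from datetime import date

# Hypothetical, simplified policy table: years to retain after the trigger event.
RETENTION_YEARS = {
    "osha_records": 30,      # OSHA employee records, per the slide
    "sec_17a4_account": 6,   # Rule 17a-4 account record information
    "xray": 75,              # common practice quoted on the slide, not a rule
}

def add_years(d: date, years: int) -> date:
    """Shift a date by whole years; Feb 29 maps to Feb 28 in non-leap years."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)

def retention_end(policy: str, trigger: date) -> date:
    """Earliest date a record kept under `policy` may be disposed of."""
    return add_years(trigger, RETENTION_YEARS[policy])
```

For a Rule 17a-4 account record, the trigger is the account closure or the replacement or update of the information. `retention_end("sec_17a4_account", date(2020, 3, 1))` gives 1 March 2026.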

  4. Preservation Approaches
     • Museum approach
         • Content and rendering devices are preserved in their original state and kept operational
         • Does not allow re-interpretation of the data, and requires maintaining a lot of software and hardware
         • Best example: the ability to print documents
     • Emulation approach
         • Keep the content in its original form
         • Emulate the original rendering device on up-to-date software and computers
         • UVC (Universal Virtual Computer) approach, pioneered by Raymond Lorie of IBM Almaden
             • Reduces the problem to that of preserving the UVC platform
     • Migration approach
         • Migrate the content, preserving the key characteristics that ensure its identity and integrity
         • May introduce noise
     • Descriptive approach
         • Preserve a description that enables the content to be reproduced (e.g. artistic data, scores)
         • Does not preserve the content or its rendering device
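The migration approach depends on checking that the key characteristics survive each migration, because migration may introduce noise. This is a minimal Python sketch. It assumes a toy CSV-to-JSON migration and a hypothetical choice of significant properties: column names, row count, and cell values. A real archive would define these per format:

```python
import csv
import io
import json

def characteristics(rows):
    """Hypothetical significant properties: column names, row count, cell values.
    Assumes well-formed rows (each row has exactly the header's columns)."""
    return {
        "columns": sorted(rows[0]) if rows else [],
        "row_count": len(rows),
        "cells": sorted(tuple(sorted(row.items())) for row in rows),
    }

def migrate_csv_to_json(csv_text):
    """Migrate CSV text to JSON text; reject the result if a key characteristic changed."""
    source_rows = list(csv.DictReader(io.StringIO(csv_text)))
    json_text = json.dumps(source_rows, sort_keys=True)
    migrated_rows = json.loads(json_text)
    if characteristics(migrated_rows) != characteristics(source_rows):
        raise ValueError("migration introduced noise: key characteristics differ")
    return json_text
```

In this sketch the check runs as part of the migration itself, so a lossy transformation is rejected before the old representation is discarded.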

  5. What is Preservation DataStores?
     • Storage assist for preservation environments
     • Supports the Open Archival Information System (OAIS) Reference Model
         • ISO archiving standard ISO 14721:2002
     • The storage component of CASPAR (http://www.casparpreserves.eu)
     • Generic: agnostic to the type of application, the type of stored data, and the physical layer (disk, tape, …)
     • Scalable
     • Offloads functionality to the storage layer to:
         • Decrease the probability of data loss
         • Simplify the applications
         • Provide improved performance and robustness
     • Based on Object Storage
     • Supports the various preservation approaches
     • Originated and developed in IBM Haifa Research Lab, Israel
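Being agnostic to the physical layer suggests separating an object interface (data plus attributes) from pluggable media backends. This is a minimal Python sketch of that split. `Backend`, `MemoryBackend`, and `ObjectStore` are hypothetical names, not the Preservation DataStores API, and the in-memory backend stands in for a real disk or tape driver:

```python
from abc import ABC, abstractmethod

class Backend(ABC):
    """Physical layer (disk, tape, ...): stores opaque bytes under a key."""

    @abstractmethod
    def write(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def read(self, key: str) -> bytes: ...

class MemoryBackend(Backend):
    """Stand-in for a real disk or tape driver."""

    def __init__(self):
        self._blobs = {}

    def write(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def read(self, key: str) -> bytes:
        return self._blobs[key]

class ObjectStore:
    """Object layer: data plus attributes, independent of the backend in use."""

    def __init__(self, backend: Backend):
        self._backend = backend
        self._attrs = {}

    def put(self, oid: str, data: bytes, **attrs) -> None:
        self._backend.write(oid, data)
        self._attrs[oid] = dict(attrs)

    def get(self, oid: str):
        return self._backend.read(oid), dict(self._attrs[oid])
```

Applications only see `ObjectStore`. Swapping `MemoryBackend` for a tape or disk implementation of `Backend` leaves them unchanged.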

  6. The CASPAR Framework Preservation DataStores

  7. OAIS Functional Model Preservation DataStores

  8. Bit Preservation vs. Logical Preservation
     • Bit preservation: the ability to restore the bits in the presence of storage media degradation, storage media obsolescence, and environmental catastrophes such as fire and flooding
         • Products exist and are well tested: copy services, refreshment, error-correcting code modules
     • Logical preservation: preserving the understandability and usability of the data in the future
         • Current computer hardware and software technologies may no longer exist, and the users of the data may not yet be born
         • The technology is still in the research phase
     • Preservation DataStores concentrates on supporting logical preservation
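The bit-preservation side (copy services plus integrity checking) is well understood, and a small version of it fits in a few lines. This Python sketch uses replication with SHA-256 digests rather than the error-correcting codes the slide also mentions. `scrub` is a hypothetical helper that detects corrupted replicas and repairs them from a verified copy:

```python
import hashlib

def sha256(data: bytes) -> str:
    """Hex digest used as the reference fixity value."""
    return hashlib.sha256(data).hexdigest()

def scrub(replicas, expected: str) -> int:
    """Verify each replica (a bytearray) against the expected digest and overwrite
    corrupted ones with a verified copy. Returns how many replicas were repaired."""
    good = next((bytes(r) for r in replicas if sha256(bytes(r)) == expected), None)
    if good is None:
        raise RuntimeError("no intact replica left: the bits are lost")
    repaired = 0
    for replica in replicas:
        if sha256(bytes(replica)) != expected:
            replica[:] = good  # in-place repair from the verified copy
            repaired += 1
    return repaired
```

Periodic scrubbing of this kind keeps the bits intact. It does nothing for logical preservation, which is the part the slide says is still a research problem.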

  9. Preservation on Tapes vs. Preservation on Disks
     • Individual disk drives provide:
         • random access
         • sub-second performance for 50 megabytes
         • lower reliability: they tend to deteriorate after approximately 3 years
     • Tapes provide:
         • serial access
         • transfer that is 10 times slower than that of disks
         • higher reliability: their expected lifetime is 3-10 times longer than that of disks
         • 25 times lower power consumption than disks
         • lower cooling costs
     • Tapes are much more cost-effective than disks
     • Preservation DataStores supports both disks and tapes: disks are used as a cache, and tapes are the final location of the data
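The arrangement of disk as cache and tape as the final location can be sketched as a two-tier store. This Python sketch assumes write-through to tape and least-recently-used eviction from a disk tier of fixed capacity. The slide states neither policy, and `TapeBackedStore` is a hypothetical name:

```python
from collections import OrderedDict

class TapeBackedStore:
    """Disk tier is a bounded LRU cache; tape holds every object (write-through)."""

    def __init__(self, disk_capacity: int):
        self.disk_capacity = disk_capacity  # max number of objects kept on disk
        self.disk = OrderedDict()           # oid -> bytes, least recent first
        self.tape = {}                      # oid -> bytes, the final copy
        self.tape_reads = 0                 # counts slow serial-access recalls

    def put(self, oid, data):
        self.tape[oid] = data               # tape is the ultimate place of the data
        self._cache(oid, data)

    def get(self, oid):
        if oid in self.disk:                # fast random-access hit
            self.disk.move_to_end(oid)
            return self.disk[oid]
        self.tape_reads += 1                # miss: recall from tape
        data = self.tape[oid]
        self._cache(oid, data)
        return data

    def _cache(self, oid, data):
        self.disk[oid] = data
        self.disk.move_to_end(oid)
        while len(self.disk) > self.disk_capacity:
            self.disk.popitem(last=False)   # evict the least recently used object
```

With a capacity of 2, writing `a`, `b`, `c` leaves `b` and `c` on disk. A later `get("a")` costs one tape recall and evicts `b`.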

  10. Preservation DataStores Functionality
     • Support migration
         • Load and execute transformations
     • Self-describing export format
     • Strong encapsulation of metadata with the data
         • Complex interrelated objects, context information, provenance information, formats, representation information
         • The association of raw data with metadata is integral; otherwise, the association needs to be preserved as well (a recursive problem)
     • Graceful loss of data
         • Minimize the effect of media loss or corruption
         • Self-describing, self-contained media format
     • Enable the following functions in the presence of long-lived data and multiple migrations:
         • Provenance, chain of custody
         • Fixity: authenticity and integrity
         • Future new interpretations and applications for the data
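Several of these functions can be illustrated together with one self-contained package: a self-describing export, strong encapsulation of metadata with data, provenance, and fixity. This is a minimal Python sketch. It assumes a ZIP container with a JSON manifest and a SHA-256 fixity value; the slide does not specify the actual export format. `make_package` and `open_package` are hypothetical names:

```python
import hashlib
import io
import json
import zipfile

def make_package(data: bytes, metadata: dict, provenance: list) -> bytes:
    """Bundle payload, metadata and provenance into one ZIP archive.
    manifest.json describes the package and records a SHA-256 fixity value."""
    manifest = {
        "payload": "content.bin",
        "sha256": hashlib.sha256(data).hexdigest(),
        "metadata": metadata,
        "provenance": provenance,
    }
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("content.bin", data)
        zf.writestr("manifest.json", json.dumps(manifest, indent=2, sort_keys=True))
    return buf.getvalue()

def open_package(blob: bytes):
    """Read a package back, checking fixity before trusting its contents."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        manifest = json.loads(zf.read("manifest.json"))
        data = zf.read(manifest["payload"])
    if hashlib.sha256(data).hexdigest() != manifest["sha256"]:
        raise ValueError("fixity check failed")
    return data, manifest["metadata"], manifest["provenance"]
```

Because the manifest travels inside the same archive as the payload, the link between raw data and metadata does not have to be preserved separately. This sidesteps the recursive problem noted on the slide.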

  11. Preservation DataStores Architecture [diagram not reproduced; labeled components: Preservation Web Services, OAIS-based Preservation Package, Preservation WSDL, Preservation application, XAM API (*), XAM Package, Data Consumer, XAM to OSD Package, WAS CE web service, Higher Level API + Object Store, Security, Admin, backend, Preservation DataStore]
     (*) The API includes application hints to denote OAIS "hot" attributes, such as the attribute that links to the representation information object.
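The footnote says the API lets an application hint which OAIS attributes are "hot", such as the link to the representation information object. One plausible use of such hints, sketched here under our own assumptions, is for the store to index the hinted attributes so they can be looked up without scanning every object. `HintedStore` and its methods are hypothetical:

```python
class HintedStore:
    """Object store that indexes attributes the application hints are 'hot'.
    Assumes each object id is written once and attribute values are hashable."""

    def __init__(self, hot_attributes):
        self.hot = set(hot_attributes)
        self.objects = {}  # object id -> attribute dict
        self.index = {}    # (attribute name, value) -> set of object ids

    def put(self, oid, attrs):
        self.objects[oid] = dict(attrs)
        for name in self.hot:
            if name in attrs:
                self.index.setdefault((name, attrs[name]), set()).add(oid)

    def find(self, name, value):
        """Indexed lookup for hot attributes, full scan for all others."""
        if name in self.hot:
            return set(self.index.get((name, value), ()))
        return {oid for oid, a in self.objects.items() if a.get(name) == value}
```

Following a representation-information link is a frequent preservation operation, which is why such an attribute is a natural candidate for a hint.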

  12. 100 Years Archive Task Force
     • A task force in the Storage Networking Industry Association (SNIA)
     • Aims to define best practices and storage standards for long-term digital information retention
     • It is currently conducting a survey to collect business and IT requirements for long-term data retention
         • 63 questions
         • Over 200 responses to date
     • Join at http://www.snia-dmf.org/100year

  13. Survey Partial Results
     • 36.8% want to preserve data for over 100 years
     • Compliance requirements are the main external factors driving the requirements for long-term digital archives
     • The applications that generate data needing preservation are databases, email, and custom business apps
     • 61% do nothing to assure logical preservation
     • 63% do nothing to deal with legal discovery
     • 83% would like to have interoperable long-term storage systems

  14. Summary
     • There is a need for a new storage system that is preservation-aware and based on OAIS. It should offload functionality to the storage layer to:
         • Decrease the probability of data loss
         • Simplify the applications
         • Provide improved performance and robustness
     • Preservation DataStores is such an OAIS-based, preservation-aware storage system
     • Preservation DataStores will be developed and evaluated within the CASPAR EU project
     • Preservation DataStores originated and is being developed in IBM Haifa Research Lab, Israel

  15. Thank You!
