Digital Preservation: Introduction and Case Study

Presentation Transcript

  1. Digital Preservation: Introduction and Case Study • Sarah Shreeves, Tim Donohue • April 21, 2008

  2. Outline • What is digital preservation? • What is digital preservation management? • What standards and tools are available? • Case study of IDEALS

  3. What is digital preservation? • Digitization? • Using “archival” CDs or DVDs? • Collecting electronic records? • Building an institutional repository? • Running back-ups?

  4. What is digital preservation? (short definition) Digital preservation combines policies, strategies and actions that ensure access to digital content over time. • Preservation and Reformatting Section, ALCTS, ALA http://www.ala.org/ala/alcts/newslinks/digipres/index.cfm

  5. What is digital preservation? (medium definition) Digital preservation combines policies, strategies and actions to ensure access to reformatted and born digital content regardless of the challenges of media failure and technological change. The goal of digital preservation is the accurate rendering of authenticated content over time. - Preservation and Reformatting Section, ALCTS, ALA http://www.ala.org/ala/alcts/newslinks/digipres/index.cfm

  6. The goal of digital preservation is the accurate rendering of authenticated content over time.

  7. Accurate Rendering – 3 Components (1) • Viability • Making sure that the bitstream (the 1’s and 0’s that make up a digital file) is unchanged and readable from the storage media • “bit level preservation” • Running basic checks to ensure bitstream is the same as what was received (via checksums or other methods), virus checks, refreshing storage media, back ups, etc. • Focus is not on renderability or functionality of the content but the viability of the file itself – the carrier of the content
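The checksum check mentioned on slide 7 can be sketched roughly as follows. This is a minimal illustration, not the IDEALS implementation: the function names `sha256_of` and `is_viable` are hypothetical, and SHA-256 is an assumed choice of algorithm (the slides only say "checksums or other methods").

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_viable(path: Path, checksum_at_ingest: str) -> bool:
    """True if the current bitstream matches the checksum stored at ingest.

    Hypothetical helper: where the stored checksum lives (database,
    manifest file, metadata record) is left to the repository.
    """
    return sha256_of(path) == checksum_at_ingest
```

A repository would typically record the checksum when a file is received and re-run a check like this on a schedule; a mismatch signals that the bitstream has changed, but says nothing about renderability or understandability.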

  8. Accurate Rendering – 3 Components (2) • Renderability • Making sure that the bitstream can be opened by a computer and can be read by a human • Ensuring that a MS Word document can be opened by a software program and can be understood by a human • Focus is on the content and the file itself BUT! • Does not assume that all functionality, style, and appearance of the content and file is there.

  9. Accurate Rendering – 3 Components (3) • Understandability • Ensuring that the digital content can be understood by a human • Ensuring that functionality of content and file is intact • Ensuring that a MS Excel spreadsheet with macros embedded can be understood and USED by a human • Focus is on the content and the file itself

  10. • Viability is the ‘easiest’ to do and is the base level for preservation programs • Renderability is often the compromise for some types of (proprietary, usually) files • Understandability requires a more in-depth understanding of both content and carrier and the context in which these were created • Most digital preservation programs are doing some combination of these three. • Preserve to what level?

  11. The Open Archival Information System (OAIS) Reference Model http://public.ccsds.org/publications/archive/650x0b1.pdf

  12. Key concepts from OAIS • Not meant to just apply to digital content • Came out of NASA • Preservation does not happen in isolation • Understanding between the producers of content and those preserving it (designated communities) • Understanding between the users of content and those preserving it (designated communities) • Submission and Access packages are not necessarily the same as the Archive package • Administration and preservation planning are key

  13. Attributes of a Trusted Digital Repository • Came out of RLG / OCLC / NARA • Focused on how libraries, archives, and other cultural organizations should position themselves as ‘trusted digital repositories’ • Key ‘white paper’ that spawned the TRAC guidelines http://www.oclc.org/programs/ourwork/past/trustedrep/repositories.pdf

  14. TRAC: Trustworthy Repositories Audit and Certification: Criteria and Checklist (TRAC) v. 1.0 • Currently hosted by the Center for Research Libraries: http://www.crl.edu/PDF/trac.pdf • Does not require OAIS compliance, but draws very heavily from the OAIS Reference Model • Looks at the organizational commitment, resources, policies, procedures, and technical infrastructure • Has a minimum set of requirements

  15. TRAC • 1 - Say what you are going to do around digital preservation • 2 - Do what you say • 3 - Prove that you are doing what you say

  16. So how do we start a digital preservation management program? (1) • Organizational Framework - • The policies, procedures, practices, people—the elements that any programmatic area needs to thrive, but specialized to address digital preservation requirements. It addresses this key development question: • What are the requirements and parameters for the organization's digital preservation program? http://www.icpsr.umich.edu/dpm/dpm-eng/eng_index.html

  17. So how do we start a digital preservation management program? (2) • Technological Infrastructure - • Consists of the requisite equipment, software, hardware, a secure environment, and skills to establish and maintain the digital preservation program. It anticipates and responds wisely to changing technology. It addresses this key development question: • How will the organization meet defined digital preservation requirements?

  18. So how do we start a digital preservation management program? (3) • Resource Framework - • Addresses the requisite startup, ongoing, and contingency funding to enable and sustain the digital preservation program. It addresses this key development question: • What resources will it take to develop and maintain the organization’s digital preservation program?

  19. The Digital Preservation Platform From the Cornell Digital Preservation Tutorial

  20. The Developmental Stages • Acknowledge – understanding that digital preservation management is a local concern • Act – Initiating digital preservation management projects • Consolidate – Segueing from projects to programs • Institutionalize – Incorporating the larger environment and rationalizing programs • Externalize – Embracing inter-institutional cooperation

  21. Where does the University Library Fit?

  22. Getting Started with Digital Preservation • Start with a discrete, manageable collection of content • Start with materials that you have a mandate to preserve - whether by tradition or by project scope • Start with the understanding that it will be an ongoing, evolving process • Doesn’t have to be done in-house: can outsource

  23. Digital Preservation: Standards, Best Practices & Utilities

  24. Emulation v. Migration • Two primary strategies for DP • Emulation • Keep files as-is • Build an emulator which “behaves like” original hardware, operating system, software necessary to read the file • Migration • Transform files to modern formats, so they can be read on modern hardware/software

  25. PREMIS Overview • Preservation Metadata: Implementation Strategies • Is: • Common data model for organizing/thinking about preservation metadata • Guidance for local implementations • Is not: • Out-of-box solution • All metadata necessary (only “core” preservation) • http://www.loc.gov/standards/premis/

  26. PREMIS Data Model (diagram) • Intellectual Entities • Objects • Events • Agents • Rights Statements

  27. PREMIS Data Model • Object • Digital file(s) being preserved • “Represent” the intellectual entity • e.g. hamlet.doc • Intellectual Entity • Set of content considered an “intellectual unit” • Has 1+ digital representation(s) • e.g. “Hamlet” by Shakespeare (a play)

  28. PREMIS Data Model • Events • Actions on an Object • Help document provenance • e.g. ingest event, migration event • Agents • Person, Org or software performing an event • e.g. Tim Donohue, U of Illinois Library, IDEALS • Rights Statement • Agent X grants Permission Y to the repository on Object Z

  29. PREMIS – Simple Example: Tracking Provenance of a Migration • hamlet.doc (original) is source of hamlet.pdf (migrated) • through event: Doc2PDF Migration Event • performed by: IDEALS System (Agent) • “represents” links the files to the intellectual entity
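The migration example on slide 29 can be sketched as a small data model. This is only an illustration of the Object / Event / Agent relationships described in the slides; it is not the PREMIS schema, and the class names, field names, and the `provenance` helper are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Agent:
    name: str  # person, organization, or software performing an event


@dataclass
class PreservedObject:
    filename: str
    represents: str  # the intellectual entity this file represents


@dataclass
class Event:
    event_type: str
    agent: Agent
    source: PreservedObject
    outcome: Optional[PreservedObject] = None  # e.g. the migrated file


def provenance(obj: PreservedObject, events: List[Event]) -> List[str]:
    """Walk events backwards from obj to the original it derives from."""
    steps = []
    current = obj
    while True:
        # Find the event (if any) whose outcome is the current object.
        producing = next((e for e in events if e.outcome is current), None)
        if producing is None:
            break
        steps.append(
            f"{current.filename} <- {producing.event_type} "
            f"by {producing.agent.name} <- {producing.source.filename}"
        )
        current = producing.source
    return steps


# The example from the slide: hamlet.doc migrated to hamlet.pdf by IDEALS.
ideals = Agent("IDEALS System")
original = PreservedObject("hamlet.doc", represents="Hamlet")
migrated = PreservedObject("hamlet.pdf", represents="Hamlet")
events = [Event("Doc2PDF migration", ideals, source=original, outcome=migrated)]
print(provenance(migrated, events))
```

Recording the event rather than only the resulting file is what lets a later reader answer "where did hamlet.pdf come from, and who or what produced it?"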

  30. Preservation Tools & Utilities • Format Registries • Technical info about specific file formats • Esp. info necessary to view a file of that format • www.formatregistry.org • http://www.nationalarchives.gov.uk/PRONOM/

  31. Preservation Tools & Utilities • Format Identification / Validation Tools • DROID http://droid.sourceforge.net/ • Identifies file formats based on PRONOM registry (and based on internal “signature” of file) • JHOVE http://hul.harvard.edu/jhove/ • Identifies file formats (based on internal “signature”) • Validates known file formats
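Identification by internal "signature" means looking at the first bytes of a file rather than trusting its extension. The sketch below shows the idea with a handful of well-known signatures; it is a simplified illustration, not how DROID or JHOVE are implemented, and the `SIGNATURES` table and `identify` function are hypothetical names.

```python
from typing import Optional

# A few well-known leading byte sequences ("magic numbers").
# Real registries such as PRONOM hold far more detailed signatures.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"PK\x03\x04", "ZIP container (e.g. OpenDocument, OOXML)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (e.g. legacy MS Office)"),
]


def identify(header: bytes) -> Optional[str]:
    """Return a format name if the header starts with a known signature."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return None  # unknown format: candidate for bit-level support only


print(identify(b"%PDF-1.4\n..."))
print(identify(b"plain text with no signature"))
```

Note that a matching signature only identifies a format; checking that the file is well-formed is the separate validation step.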

  32. Preservation Tools & Utilities • Migration / Emulation Tools • Xena http://xena.sourceforge.net/ • Detects file formats • Converts/migrates to “open” formats • Dioscuri http://dioscuri.sourceforge.net/ • Computer hardware emulator (16-bit CPU) • Runs Windows 3.0, 16-bit Linux, MS DOS (any old programs which run on those)

  33. What is IDEALS? Institutional Repository for the scholarship and research in digital form of the faculty, students, and staff as well as material that reflects the intellectual environment of the University of Illinois at Urbana-Champaign. Joint project of CITES and the University Library and supported by the Office of the Provost. http://ideals.uiuc.edu/

  34. What is an institutional repository? A university-based institutional repository is a set of services that a university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members. It is most essentially an organizational commitment to the stewardship of these digital materials, including long-term preservation where appropriate, as well as organization and access or distribution. Clifford Lynch, Executive Director Coalition for Networked Information

  35. What type of materials? Also audio and video

  36. In the beginning: Promises, promises • Can we really commit to preserving everything? • What does it really mean to preserve this stuff? • What kind of staff expertise do we need? • What kind of resources do we need? • What kind of technical infrastructure do we need?

  37. Getting our act together • Got our Preservation Librarian involved • Training and self education • Cornell’s Digital Preservation Management Workshop and Online Tutorial http://www.icpsr.umich.edu/dpm/dpm-eng/eng_index.html • Understanding Open Archival Information System conceptual model • Trustworthy Repositories Audit Checklist

  38. Takeaways: • You do need to be explicit about what you will do and what you won’t do. • You don’t have to preserve everything if you say you aren’t. • Digital preservation management is not about the technology.

  39. Establish pilot policy http://www.ideals.uiuc.edu/about/IDEALSPreservationSupport.html • Repository Support • Allowable Downtime • Content Preservation • Allowable Data Loss • Data Back-Up • Format & Data Integrity • Disaster Preparations Policy is realistic and feasible for where we were.

  40. Getting our act together, cont. Backup tapes stored next to the server! Not Really Our Server Room! Photo by Sylvar. Used under a Creative Commons 2.0 Attribution license. http://www.flickr.com/photos/sylvar/

  41. Looking forward to production: Digital Preservation White Paper http://hdl.handle.net/2142/135 • Laid out for the Library and CITES administration what supporting a digital preservation management program would mean: • Commitment on the part of both organizations • Resources in terms of funding and staff are specifically allocated • Processes, policies, and the institutional commitment are documented and as transparent as possible. • The technical infrastructure is developed using community standards. • Commitment of resources for planning and community standards building.

  42. IDEALS Preservation Policy: Organizational Framework and Commitment https://services.ideals.uiuc.edu/wiki/bin/view/IDEALS/IDEALSDigitalPreservationPolicy • Mandate • Agreement that we are making with our user community • Role of the University Library in preserving access to material • Objectives • Persistent access • Trusted service for our user community • Scope • Research and scholarship • Who’s responsible? • CITES and the Library

  43. What resources do we need? • Funding • Currently from the Office of the Provost • Designated staff • Built into our job descriptions • Technology infrastructure • Move from Library to CITES • Better environment • Better security • Distributes support for the tech infrastructure

  44. Risks and Challenges • Technological Change • Sustainability • Partnership between the University Library and CITES

  45. Moving towards actionable policies and procedures

  46. Putting the Plan into Practice • Policy should lead Technology (not vice-versa) • “Support” Policies will change • Reassessment necessary • Document decisions… • and reasons! • “Best Practices” – no reason to go it alone

  47. What will IDEALS “support”? • What have others done? • Michigan’s Deep Blue – Preservation & Format Policy • Florida Digital Archive – Policies & Format “Action Plans” • Library of Congress – Sustainability of Formats • Australian Partnership for Sustainable Repositories (APSR) • What Support Policies are we missing? • Digital Preservation Support Policy • Format Support “Matrix” • Format Recommendations

  48. Digital Preservation Support • Format-based Categories of Support • High Confidence: Full Support (including migration) • Medium Confidence: No migration promised • Low Confidence: “Bit-level” support only • (size ≠ weight)

  49. Format Support Matrix • Compilation of “known” formats • Concentration on textual formats • Proprietary (Microsoft Office) vs. Open (OpenOffice.org, HTML) • Limited Adoption (OpenOffice.org) vs. Widely Adopted (Microsoft Office, HTML) • Limited Support (Microsoft Office) vs. Widely Supported (Adobe PDF, HTML) • Embedded Content / DRM (MS PowerPoint w/ Audio or Video) vs. Nothing Embedded (MS PowerPoint) • Lossy Compression (JPEG) vs. No/Lossless Compression (TIFF, JPEG 2000)
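One way to picture how the five matrix criteria might feed the High / Medium / Low confidence categories from slide 48 is a simple tally. The slides do not say how IDEALS actually combines the criteria, so the scoring rule and thresholds below are purely assumptions for illustration, as are the names `FormatProfile` and `support_level`.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class FormatProfile:
    """One boolean per matrix criterion; True is the favourable side."""
    open_format: bool       # open rather than proprietary
    widely_adopted: bool
    widely_supported: bool
    nothing_embedded: bool  # no embedded content / DRM
    lossless: bool          # no compression or lossless compression


def support_level(profile: FormatProfile) -> str:
    """Map a profile to a confidence category.

    ASSUMPTION: equal weighting and these thresholds are invented for
    this sketch; a real policy would document its own reasoning.
    """
    favourable = sum(astuple(profile))
    if favourable == 5:
        return "High Confidence"    # full support, including migration
    if favourable >= 3:
        return "Medium Confidence"  # no migration promised
    return "Low Confidence"         # bit-level support only


print(support_level(FormatProfile(True, True, True, True, True)))
```

Whatever rule is chosen, writing it down explicitly fits the earlier advice to document decisions and the reasons behind them.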