IAC Digital Preservation Committee

Yale University Library, 10 April 2007

Presentation Transcript

  1. IAC Digital Preservation Committee
      Yale University Library, 10 April 2007

  2. Outline
      • Charge & members
      • Accomplishments
      • Policy
      • Best practices
      • What’s next

  3. The DPC is an Integrated Access Council committee charged to:
      • Develop a digital preservation program by evaluating, compiling, documenting and articulating policies, procedures, best practices and systems in order to establish a digital preservation infrastructure at Yale University Library.
      • Work from a base of clearly articulated policies, then focus on preservation program planning and, finally, make recommendations for program implementation through digital preservation projects, initiatives, and system development.

  4. Members
      • Rebekah Irwin, BRBL
      • David Gewirtz, ILTS/AM&T
      • Kevin Glick, MSS/A
      • Audrey Novak, ILTS (Co-Chair)
      • Bobbie Pilette, Preservation (Co-Chair)
      • E.C. Schroeder, BRBL
      Former members
      • Ann Green, ILTS/ITS (Co-Chair)
      • Nicole Bouche, Beinecke Library
      • Gretchen Gano, Social Science Library

  5. Accomplishments
      • Published a Digital Preservation Policy that establishes a mission statement and promulgates preservation policies, including institutional standards governing the quality, type and source of digital assets to be archived in the repository (revised February 2007).
      • Published best practices addressing: Local Practice for Implementing PREMIS; Preservation Strategies; Persistent Identifiers; Fixity (checksums, message digests and digital signatures); Format Registries; Encoding & Transmission of Structured Metadata; and Care and Handling of Originals.
      • Modeled an organizational structure for the ongoing coordination and management of digital preservation. This structure recognizes that responsibility for creating and administering digital preservation services at Yale is shared by three services: Metadata, Repository and Preservation.

  6. Digital Preservation Best Practices
      Digital preservation does not yet have established, vetted standards. The issues and problems associated with preserving digital resources are numerous, complex and dynamic. The DPC best practices are an effort to parse the larger digital preservation problem space into discrete issues and to identify the processes, activities and methodologies that are emerging as standards.
      This work is by no means finished. More work is required to establish best practices for the many related topics and to keep these recommendations current with the latest thinking and research in the field. Note, too, that although informed by research, most of these best practices are untested in production preservation archives.

  7. Best Practice – Care & Handling of Physical Collections
      A “white paper” advising Library staff on how to protect originals during digital conversion, available on the web site for easy access. Sections include:
      • Assessment of Physical Collections
      • Criteria for Selecting Proper Scanning Equipment
      • Preparing the Scanning Surface
      • Specifications for Scanning
      • Handling Procedures for Library Materials

  8. Care & Handling of Physical Collections, continued
      • Assessment of Physical Collections
        – Important to include the Preservation Department; contact Tara Kennedy, Field Service Librarian
        – List of questions to ask before scanning an object
      • Criteria for Selecting Proper Scanning Equipment
        – Describes available equipment and appropriate use
        – Indicates which materials can be scanned safely on each type of equipment
      • Preparing the Scanning Surface
        – How to clean the scanning surface (flatbed)

  9. Care & Handling of Physical Collections, continued
      • Specifications for Scanning
        – Illumination levels and types
        – Proper supports for bound materials
        – Environmental considerations (dust, temperature, relative humidity)
      • Handling Procedures for Library Materials
        – Mostly “common sense” reminders, but also specific suggestions, e.g., for oversized materials
        – Covers paper-based materials, multimedia (sound, film, historical, optical) and objects

  10. Best Practice - Fixity
      • Fixity, in preservation terms, means that a digital object has not been changed between two points in time or events.
      • Fixity checks such as checksums, message digests and digital signatures are used to verify a digital object’s fixity.
      • The information created by these fixity checks provides evidence of the integrity and authenticity of digital objects and is essential to establishing trust.

  11. Fixity, continued
      • All fixity checks are used in the same basic way. A value is generated and saved at the outset. Then, in response to an event (e.g., ingest) or periodically over time, the value is recomputed and compared with the original to confirm that the object (file or bitstream) has not changed.
      • Not all fixity checks are equally strong.
      • Checksums are the simplest and least reliable method. They are typically used for error detection, to find accidental problems in transmission and storage. Simple checksums can miss changes such as the reordering of bytes or multiple changes that cancel one another out.
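The generate, save, recompute and compare cycle described above can be sketched with Python's standard `hashlib` module. The `naive_checksum` function is a deliberately simplistic additive checksum, chosen only to illustrate the weakness noted on this slide; real checksums such as CRC-32 are stronger, though still designed for accidental rather than deliberate changes. All function names here are illustrative, not taken from any Yale system.

```python
import hashlib


def naive_checksum(data: bytes) -> int:
    """Toy additive checksum: the sum of all byte values, modulo 2**16."""
    return sum(data) % 65536


def md5_hex(data: bytes) -> str:
    """Hex-encoded MD5 message digest of the data."""
    return hashlib.md5(data).hexdigest()


def fixity_ok(data: bytes, saved_value: str) -> bool:
    """Recompute the digest and compare it with the value saved earlier."""
    return md5_hex(data) == saved_value


# 1. Generate and save a value when the object is first received.
original = b"ABCD"
saved = md5_hex(original)

# 2. Later (e.g., at ingest or during a periodic audit), recompute and compare.
reordered = b"DCBA"  # the same bytes in a different order
print("unchanged copy passes:", fixity_ok(original, saved))
print("reordered copy passes:", fixity_ok(reordered, saved))

# The toy checksum cannot tell the two apart, because the byte sums are equal.
print("naive checksums equal:", naive_checksum(original) == naive_checksum(reordered))
```

Run as-is, the first and third lines print True and the second prints False: the digest catches the reordering that the toy checksum misses.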

  12. Fixity, continued
      • Message digests are more secure. They are computed by applying a more complex (cryptographic hash) algorithm to a file of any length to produce a short, fixed-length character string that is, for practical purposes, unique to that content. Change one pixel or one note in the file and the message digest will be completely different (e.g., 93326bff6636655dcd6abff18ed2de997).
      • Digital signatures combine message digests with public-key cryptography. The message digest is created and then encrypted with the signer’s private key; anyone holding the matching public key can verify it.
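The two properties claimed for message digests, a fixed-length output regardless of input size and a very different value after a tiny change, are easy to observe with Python's standard `hashlib`. The byte strings below are arbitrary stand-ins for a file's contents. This sketch covers digests only; digital signatures additionally require a public-key library and are not shown.

```python
import hashlib


def hex_digest(data: bytes, algorithm: str = "md5") -> str:
    """Return the hex-encoded digest of data using a hashlib algorithm name."""
    return hashlib.new(algorithm, data).hexdigest()


short_input = b"one note"
long_input = b"x" * 1_000_000
changed_input = b"one notf"  # differs from short_input in a single byte

for label, data in [("short", short_input), ("long", long_input), ("changed", changed_input)]:
    digest = hex_digest(data)
    # MD5 always yields 128 bits, i.e. 32 hex characters, whatever the input size.
    print(f"{label:8} {len(digest)} {digest}")

# Count how many of the 32 hex positions differ after the one-byte change.
a, b = hex_digest(short_input), hex_digest(changed_input)
print(sum(x != y for x, y in zip(a, b)), "of 32 hex characters differ")
```

Passing `"sha1"` as the algorithm gives a 40-character digest with the same behavior.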

  13. Fixity, continued
      Current best practice for digital preservation repositories:
      • Create message digests using two algorithms, MD5 and SHA-1.
      • Both are implemented in JHOVE, the widely used format identification, validation and characterization application (used, e.g., in the Rescue Repository before and after ingest).
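Creating both digests need not mean reading a file twice. The sketch below, which assumes Python 3.9 or later and uses only the standard library, feeds each chunk of a file to an MD5 and a SHA-1 hasher in a single pass and later re-verifies both values. It is a standalone illustration of the practice, not how JHOVE computes its digests; the file name `master.tif` and its contents are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def dual_digests(path: Path, chunk_size: int = 1 << 20) -> dict[str, str]:
    """Compute MD5 and SHA-1 digests of a file in one streaming pass."""
    md5, sha1 = hashlib.md5(), hashlib.sha1()
    with path.open("rb") as fh:
        # Fixed-size chunks keep memory use flat even for very large masters.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return {"md5": md5.hexdigest(), "sha1": sha1.hexdigest()}


def verify(path: Path, recorded: dict[str, str]) -> bool:
    """True only if both recomputed digests match the recorded values."""
    return dual_digests(path) == recorded


with tempfile.TemporaryDirectory() as tmp:
    master = Path(tmp) / "master.tif"  # hypothetical file name
    master.write_bytes(b"stand-in image data")
    before_ingest = dual_digests(master)  # recorded before ingest
    print(before_ingest)
    print("after ingest, unchanged:", verify(master, before_ingest))
    master.write_bytes(b"stand-in image dato")  # simulate silent corruption
    print("after corruption:", verify(master, before_ingest))
```

Requiring both digests to match means a change must slip past two independent algorithms to go undetected.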

  14. Best Practice – Format Registries and Tools
      What is a format?
      • A technical specification describing a standard encoding or representation of digital content stored in a file.
      • A file name extension such as “.jpg” conventionally indicates that the encoded content is a digital image.
      • Programs use file encoding standards to read the encoded information and present the file’s content on a user’s monitor or another output device.

  15. Format Registries
      What is a format registry?
      • A database that stores information about the technical specifications of electronic file formats.
      • Format registries record changes to file formats over time so that files remain readable despite the technological obsolescence of a format standard.
      How does a format registry work?
      • Global Digital Format Registry
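To make "a database that stores information about formats" concrete, here is a toy in-memory registry in Python. It records format versions and which version supersedes which, the kind of change-over-time information the slide describes. The field names and the `local:` identifiers are hypothetical; real registries such as PRONOM hold far richer records under their own identifier schemes.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FormatRecord:
    """One registry entry for a specific version of a file format."""
    identifier: str                      # hypothetical local ID, not a real registry key
    name: str
    version: str
    signature: bytes                     # leading bytes that mark this format version
    superseded_by: Optional[str] = None  # identifier of a newer version, if recorded


@dataclass
class FormatRegistry:
    records: dict[str, FormatRecord] = field(default_factory=dict)

    def add(self, record: FormatRecord) -> None:
        self.records[record.identifier] = record

    def latest(self, identifier: str) -> FormatRecord:
        """Follow the superseded_by chain to the newest recorded version."""
        record = self.records[identifier]
        while record.superseded_by is not None:
            record = self.records[record.superseded_by]
        return record


registry = FormatRegistry()
registry.add(FormatRecord("local:gif-87a", "GIF", "87a", b"GIF87a", superseded_by="local:gif-89a"))
registry.add(FormatRecord("local:gif-89a", "GIF", "89a", b"GIF89a"))

newest = registry.latest("local:gif-87a")
print(newest.name, newest.version)
```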

  16. File Format Tools
      File format identification and validation tools answer two questions:
      • How can we tell a file’s type?
      • If we know its type, how can we be sure the file conforms to its format specification, so that we know it is still usable?
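The first question, identifying a file's type, is commonly answered by matching the file's leading bytes ("magic numbers") against known signatures rather than trusting its extension. This is similar in spirit to DROID's signature matching and JHOVE's "short" mode, but the Python sketch below is an independent, much-reduced illustration covering only a handful of signatures. It does not attempt the second question, full conformance validation, which requires parsing the whole file against its specification.

```python
import os
import tempfile

# (leading bytes, format name): a small subset of well-known signatures.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"II*\x00", "TIFF"),  # little-endian byte order
    (b"MM\x00*", "TIFF"),  # big-endian byte order
]


def identify(header: bytes) -> str:
    """Return a format name for the given leading bytes, or 'unknown'."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    # WAV is a RIFF container whose form type "WAVE" sits at byte offset 8.
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "WAV"
    return "unknown"


def identify_file(path: str) -> str:
    """Read only the first 16 bytes of the file and identify them."""
    with open(path, "rb") as fh:
        return identify(fh.read(16))


# A file whose ".jpg" extension is misleading: its content is actually PDF.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "photo.jpg")
    with open(path, "wb") as fh:
        fh.write(b"%PDF-1.4\n...")
    print(identify_file(path))
```

The example prints `PDF`, showing why identification tools look at content rather than the file name.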

  17. File Format Tools
      • JHOVE: a widely used file type identification, validation and characterization tool developed by Harvard University Library and JSTOR.
        – Handles many format types (e.g., AIFF, ASCII, BYTESTREAM, GIF, HTML, JPEG, JPEG2000, PDF, TIFF, UTF8, WAV, XML).
        – Is configurable in many respects, including options to select full validation or a “short” mode in which only the header’s signature is analyzed, to include or exclude message digests in the output, and to choose among output formats, including plain text and XML.
      • Because JHOVE performs both file type identification and validation, it is currently Yale University Library’s format-related tool of choice.

  18. File Format Tools
      Other tools:
      • DROID (Digital Record Object Identification): a file type identification tool developed by the Digital Preservation Department of the National Archives of the United Kingdom to perform automated batch file format identification using the PRONOM registry.
      • National Library of New Zealand Preservation Metadata Extract Tool: a Java tool that extracts metadata from file headers. It uses “adapters” to extract metadata from file types including MS Word, WordPerfect, OpenOffice, MS Works, MS Excel, MS PowerPoint, TIFF, JPEG, WAV, MP3, HTML, PDF, GIF and BMP. The extracted data is output in a standard XML format.

  19. Best Practice – Persistent Identifiers
      • A persistent identifier (PI) is a unique name (identifier) associated with an internet resource that provides a link to the content and persists across changes of server location, ownership and other state conditions.
      • A location (e.g., a given URL) is not a persistent identifier, because it fails if the content moves to another location. The principal problem addressed by PIs is broken links to internet resources, i.e., “HTTP 404 Error – Document not found.”
      • Persistent identification is not possible without an associated service; it is the service that supports persistence. The identifier takes you to the service, and the service resolves to the object.
      • Optimally, a PI should be created and assigned when the digital object is created.
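The point that persistence comes from a service, not from the identifier string, can be sketched as a toy resolver in Python. When content moves, only the service's record changes, so an identifier handed out earlier keeps working. The `Resolver` class, the `example:` identifier and the URLs are all hypothetical; a production service such as a handle server adds persistent storage, authentication and network protocols.

```python
class Resolver:
    """Toy resolution service mapping persistent identifiers to current locations."""

    def __init__(self) -> None:
        self._locations: dict[str, str] = {}

    def register(self, pid: str, url: str) -> None:
        # Assign the identifier once, ideally when the object is created.
        if pid in self._locations:
            raise ValueError(f"{pid} is already assigned")
        self._locations[pid] = url

    def move(self, pid: str, new_url: str) -> None:
        # The content moved: update the service's record, never the identifier.
        if pid not in self._locations:
            raise KeyError(pid)
        self._locations[pid] = new_url

    def resolve(self, pid: str) -> str:
        return self._locations[pid]


resolver = Resolver()
resolver.register("example:map-0001", "http://server-a.example.org/maps/0001.tif")
resolver.move("example:map-0001", "http://server-b.example.org/archive/0001.tif")
print(resolver.resolve("example:map-0001"))
```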

  20. Best Practice – Persistent Identifiers
      Several technologies are available to create persistent identifiers, such as:
      • CNRI Handle System: a generic system for assigning names to objects and resolving them. Its key component is the Global Handle Registry, which manages the namespace of all handle prefixes.
      • DOI (Digital Object Identifier): an application of the CNRI Handle System that associates intellectual property with structured metadata. A typical use of a DOI is to give a scientific paper or article a unique identifying number that can be resolved through the DOI resolver or the CNRI global handle resolver.
      • PURL: a Persistent Uniform Resource Locator is a URL that points to an intermediate (and more persistent) location which, when retrieved, returns a standard HTTP redirect to the current location of the resource.
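The PURL mechanism (a stable URL that answers with an HTTP redirect to wherever the resource currently lives) can be sketched as a minimal WSGI application using only Python's standard library. The path `/purl/report-2007` and its target URL are hypothetical, and 302 Found is one reasonable redirect status for this sketch, not necessarily what any particular PURL server returns.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical redirect table: persistent path -> current location.
REDIRECTS = {
    "/purl/report-2007": "https://storage-b.example.org/docs/report-2007.pdf",
}


def purl_app(environ, start_response):
    """WSGI app that answers known persistent paths with an HTTP 302 redirect."""
    target = REDIRECTS.get(environ.get("PATH_INFO", ""))
    if target is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Unknown persistent identifier\n"]
    start_response("302 Found", [("Location", target)])
    return [b""]


def simulate_request(path: str):
    """Call the app directly, without a network server, and capture the response."""
    environ = {}
    setup_testing_defaults(environ)  # fill in the standard WSGI keys
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    purl_app(environ, start_response)
    return captured["status"], captured["headers"].get("Location")


print(simulate_request("/purl/report-2007"))
print(simulate_request("/purl/unknown"))
```

When the resource moves, only the `REDIRECTS` entry changes; the published URL stays the same. To serve the app over HTTP, pass `purl_app` to `wsgiref.simple_server.make_server(host, port, app)` and call `serve_forever()`.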

  21. Persistent Identifiers - Handle Server
      • The implementation of a CNRI handle server at YUL is tightly coupled to the implementation of the VITAL/Fedora Digital Repository Service.
      • Digital objects within the Digital Repository Service will have handles such as http://moonpie:8085/fedora/get/hdl:10079.2F-2103288706 (opaque) or http://hdl.rutgers.edu/1782.1/SPCOLSMAPS.Map.b1849 (semantic).
      • A handle server, like a web server, requires ongoing system administration, e.g., when resources are moved.
      • Research continues on assigning handles to resources in other YUL repositories, such as the Rescue Repository and Image Commons (DL/Insight).
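Handles have a two-part syntax: a naming-authority prefix and a local suffix separated by the first slash. The handle portion of the semantic example on this slide, `1782.1/SPCOLSMAPS.Map.b1849`, follows this pattern. The Python sketch below splits a handle and builds a URL for the public Handle System HTTP proxy at hdl.handle.net. Whether YUL's handles would be resolved through that proxy or only through a local server is an assumption, not something the slide states.

```python
from urllib.parse import quote

HANDLE_PROXY = "https://hdl.handle.net/"  # public Handle System HTTP proxy


def split_handle(handle: str) -> tuple[str, str]:
    """Split a handle into (prefix, suffix) at the first slash.

    The prefix names the naming authority (registered with the Global Handle
    Registry); the suffix is the local name and may itself contain slashes.
    """
    prefix, sep, suffix = handle.partition("/")
    if not (prefix and sep and suffix):
        raise ValueError(f"not a prefix/suffix handle: {handle!r}")
    return prefix, suffix


def proxy_url(handle: str, proxy: str = HANDLE_PROXY) -> str:
    """Build an HTTP URL that asks the proxy to resolve the handle."""
    prefix, suffix = split_handle(handle)
    # Percent-encode characters that are unsafe in a URL path.
    return proxy + quote(prefix) + "/" + quote(suffix, safe="/")


handle = "1782.1/SPCOLSMAPS.Map.b1849"
print(split_handle(handle))
print(proxy_url(handle))
```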

  22. Best Practice - Maintenance Strategies
      A1. Clear allocation of responsibilities
      A2. Provision of the appropriate technical infrastructure
      A3. Establishment & implementation of a plan for system maintenance, support and replacement
      A4. Establishment & implementation of a plan for regular transfer of records to new storage media
      A5. Adherence to appropriate storage and handling conditions for storage media
      A6. Ensuring redundancy and regular backup
      A7. Establishment of system security
      A8. Disaster planning
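Strategies A4 (regular transfer to new storage media) and A6 (redundancy) both depend on confirming that each copy is identical to its source. The following minimal Python sketch assumes plain files and directories stand in for the old and new media. It copies a file and compares SHA-1 digests before accepting the copy. The function and directory names are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha1_of(path: Path) -> str:
    """Streaming SHA-1 digest of a file."""
    h = hashlib.sha1()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def transfer_with_verification(src: Path, dest_dir: Path) -> Path:
    """Copy src into dest_dir and keep the copy only if its digest matches."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    expected = sha1_of(src)
    shutil.copy2(src, dest)  # also tries to preserve timestamps and permissions
    if sha1_of(dest) != expected:
        dest.unlink()  # discard the bad copy rather than keep a silent error
        raise OSError(f"fixity mismatch copying {src} to {dest}")
    return dest


with tempfile.TemporaryDirectory() as tmp:
    old_media = Path(tmp) / "old_media"  # hypothetical source volume
    new_media = Path(tmp) / "new_media"  # hypothetical target volume
    old_media.mkdir()
    record = old_media / "record-0001.xml"
    record.write_text("<record>example</record>", encoding="utf-8")
    copy = transfer_with_verification(record, new_media)
    print(copy.name, sha1_of(copy) == sha1_of(record))
```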

  23. Best Practice - Preservation Strategies
      B1. Use of standards
      B2. Data extraction and structuring
      B3. Encapsulation
      B4. Restricting the range of formats to be managed
      B5. Technology preservation
      B6. Reliance on backward compatibility
      B7. Migration
      B8. Software re-engineering
      B9. Viewers and migration at the point of access
      B10. Emulation
      B11. Non-digital approaches
      B12. Data restoration

  24. Best Practice - PREMIS
      PREservation Metadata: Implementation Strategies
      Yale Working Group
      • Matthew Beacom, Metadata Librarian, Catalog and Metadata Services (Co-chair)
      • Rebekah Irwin, Catalog Librarian for Digital Projects, Beinecke Library (Co-chair)
      • Youn Noh, Digital Resources Catalog Librarian, Catalog and Metadata Services
      • George Ouellette, Senior Programmer Analyst, Library ILTS
      • David Walls, Preservation Librarian, Library Preservation Dept
      Yale Advisory Group
      • Reed Beaman, Associate Director for Biodiversity Informatics, Peabody Museum
      • Lee Faulkner, Media Director, Digital Media Center for the Arts
      • David Gewirtz, Project Manager, Library Projects, ITS
      • Kevin Glick, Electronic Records Archivist, Manuscripts and Archives
      • Edward Kairiss, Director, Instructional Computing, Instructional Technology, ITS
      • Daniel Lee, E-Publishing/Internet Marketing Manager, Yale University Press
      • Thomas Raich, Associate Director, Information Technology, Art Gallery

  25. Best Practice - PREMIS
      Outcome: develop PREMIS profiles that match specific digital collection and administrative needs.
      • Base profile (up to 6 elements): this profile would support digital preservation of a wide range of digital assets.
      • Full profile (over 200 elements): this profile would provide guidance to administrators of digital information assets acting as trusted custodians of material deemed to be of long-term value.

  26. Best Practices - Summary
      • Most of these best practices are the outcome of current research projects.
      • Few have been tested in production preservation repositories.
      • At Yale, the Rescue Repository is becoming a local testbed.
      • Fixity: MD5 and SHA-1 message digests
      • JHOVE file format identification and validation
      • Maintenance strategies
      • PREMIS base profile element set
      • VITAL/Fedora Digital Repository Service implementation
      • Persistent identifiers through the CNRI Handle System

  27. What’s Next
      Goals:
      • Create a Transition Team to continue the work of the DPC and, most importantly, to create within six months the roadmap for implementing the permanent management model for an ongoing digital preservation program.
      • The recommended structure consists of a core team of 2 FTE, made up of staff with expertise in metadata, repository and preservation services. It is modeled as a virtual Digital Curation Center (DCC). The DCC will put into practice the identified best practices and the Digital Repository Service (DRS) Preservation Archive.
      • The Transition Team will prepare a business plan for the Digital Curation Center. The business plan will identify the DCC’s vision, mission, goals and first-year deliverables; staffing models; budget; and timeline for creation.

  28. IAC Digital Preservation Committee
      Website: http://www.library.yale.edu/iac/dpc.html
