
Where are we with Digital Preservation?




  1. Where are we with Digital Preservation? Andrew Waugh Public Record Office Victoria

  2. Where are we? • It is not the end. It may not even be the beginning of the end. But it is undoubtedly the end of the beginning • Winston Churchill • This talk will cover • Consensus views on digital preservation • Open questions and future challenges

  3. What this presentation will cover • Understanding (building systems) • Storage (preserving the bit strings) • Access (preserving the meaning) • Metadata (preserving the context & authenticity) • Transfer (overcoming system senescence)

  4. Understanding • Communication requires shared terminology and concepts • Open Archival Information System (OAIS) reference model (ISO 14721:2003) • http://public.ccsds.org/publications/archive/650x0b1.pdf • High level terminology very widely used, but few use the detail in the model • Does not cover preservation • Pre-web, and the detail does not reflect actual implementations • Currently under review

  5. Trusted digital repositories • How can you be sure if an organisation (& its system) is up to holding your digital objects? • Trustworthy Repositories Audit and Certification • CRL/NARA (2007) • http://www.crl.edu/content.asp?l1=13&l2=58&l3=162&l4=91 • Administrative focus rather than technical • high level (cannot be tested) • Based on OAIS, basis for audit checklists

  6. Audit checklists • Provide tests to see if a repository can be trusted • DRAMBORA: DCC/DPE (2007) • Risk based, self certification • http://www.repositoryaudit.eu/

  7. Public domain digital repositories • Public domain digital repository code • DSpace (http://www.dspace.org/) • Fedora (http://www.fedora-commons.org/) • Both came out of the academic community and primarily support institutional repositories

  8. Storage – preserving the bit string • Fundamental task of digital preservation is ensuring that the bits that make up the digital objects are preserved • “Solved” problem – large scale data repositories have existed for decades and there is lots of operational experience • Archival twist: actively monitor health of stored objects using hashes
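The "archival twist" on slide 8 can be illustrated with a short fixity check. This is a minimal sketch, not the method of any repository named in the talk: the choice of SHA-256, the in-memory manifest, and the function names `sha256_of`, `build_manifest` and `audit` are all assumptions made for illustration.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a digest for every file under root (done once, at ingest)."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def audit(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths whose content is missing or has changed."""
    damaged = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            damaged.append(rel_path)
    return damaged
```

In practice the manifest would be stored separately from the objects it describes and `audit` would run on a schedule, so that silent corruption is noticed while a good copy still exists.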

  9. Storage - future challenges • Reducing storage cost (and chance for error) • Swedish National Archives estimated in 2005 between 4 and 8 Euro per digitised page mostly in system and support costs • http://www.tape-online.net/docs/Palm_Black_Hole.pdf • Reducing risks • Administrator risk vs packaged risk • Ideal storage system • Packaged (i.e. built in administration such as the Centera) • Open so that you can trust it and replace components • CLOCKSS • Uses redundant copies at participating institutions to ensure preservation (LOCKSS) • http://www.clockss.org/clockss/Home
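The idea of using redundant copies held at several institutions can be sketched as a majority vote over content hashes. This is an illustration of the general principle only, not the actual LOCKSS or CLOCKSS protocol; the site names, the strict-majority rule, and the functions `majority_copy` and `repair` are hypothetical.

```python
from __future__ import annotations

import hashlib
from collections import Counter
from typing import Optional


def majority_copy(copies: dict[str, bytes]) -> Optional[bytes]:
    """Return the content held by a strict majority of sites, or None."""
    if not copies:
        return None
    digests = {site: hashlib.sha256(data).hexdigest() for site, data in copies.items()}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes * 2 <= len(copies):
        return None  # no strict majority, so no copy can be trusted over another
    return next(data for site, data in copies.items() if digests[site] == winner)


def repair(copies: dict[str, bytes]) -> dict[str, bytes]:
    """Replace disagreeing copies with the majority copy, if one exists."""
    good = majority_copy(copies)
    if good is None:
        return dict(copies)  # leave everything untouched for human review
    return {site: good for site in copies}
```

For example, with three sites where one copy has a flipped byte, `repair` returns three identical good copies; with two sites that disagree, it changes nothing.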

  10. Access – preserving the meaning • What do you do when you no longer have an application to open the data files? • Current approach is either • Do nothing now with eventual migration • Normalisation upon accession • Future approach might be emulation

  11. Migration • Save what you capture now and convert to new formats as required • Web harvesting (studies show web sites mostly use safe formats – HTML, XML, JPEG, GIF, etc) • Formats (and software) proving surprisingly resilient

  12. Normalisation • Convert upon accession to small number of long term preservation formats • E.g. PDF/A (PROV), ODF (NAA) • Immediate cost upon accession, but expected lower long term management cost • Criteria for good LTPF (Library of Congress) • http://www.digitalpreservation.gov/formats/intro/intro.shtml

  13. Challenges • What is it? Tools to determine file formats • Pronom – repository of format descriptions and DROID (format classifier) http://www.nationalarchives.gov.uk/pronom/ • JHOVE (Harvard) classifier and simple validation http://hul.harvard.edu/jhove/ • How accurate is the conversion? • Is it a valid file according to the standard?
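The "What is it?" question is usually answered by matching the leading bytes of a file against known signatures. The sketch below shows that idea with a handful of widely documented magic numbers; it is not the DROID or JHOVE API, and the `SIGNATURES` table and `identify` function are hypothetical. PRONOM holds far richer signature descriptions than this.

```python
from __future__ import annotations

# (leading bytes, format name) pairs; a tiny illustrative subset only.
SIGNATURES: list[tuple[bytes, str]] = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"\xff\xd8\xff", "JPEG"),
]


def identify(data: bytes) -> str:
    """Return a format name based on the leading bytes, or 'unknown'."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"
```

A signature match only says what a file claims to be. Whether it is a valid file according to the standard is the separate validation step the slide lists as a challenge.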

  14. Metadata is better data • Metadata is information about the bit string • What it is (semantic) • What it is (technical) • How it relates to other digital objects • What is its history? • How is it to be managed? • Unfortunately, lots and lots of large metadata standards
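The five questions on slide 14 map naturally onto a small record structure. The sketch below is illustrative only: the field names are hypothetical and do not follow PREMIS, ISO 23081 or any other standard mentioned in the talk.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field


@dataclass
class Event:
    """One entry in an object's history."""
    when: str    # ISO 8601 date
    action: str  # e.g. "ingest", "migration"
    agent: str   # who or what performed the action


@dataclass
class ObjectMetadata:
    identifier: str
    title: str                                            # what it is (semantic)
    file_format: str                                      # what it is (technical)
    sha256: str                                           # what it is (technical)
    related_to: list[str] = field(default_factory=list)   # relationships to other objects
    history: list[Event] = field(default_factory=list)    # what is its history?
    retention: str = "permanent"                          # how is it to be managed?

    def record_event(self, when: str, action: str, agent: str) -> None:
        """Append an event to the object's history."""
        self.history.append(Event(when, action, agent))

    def to_json(self) -> str:
        """Serialise the record so it can be stored alongside the bit string."""
        return json.dumps(asdict(self), indent=2)
```

Even this toy record has seven fields, which hints at why the slide's last bullet complains about large standards: every additional element has to be captured by someone or something.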

  15. Metadata standards • For an excellent summary of metadata standards see the Metadata chapter in the DCC Digital Curation Manual • http://www.dcc.ac.uk/resource/curation-manual/chapters/metadata/metadata.pdf

  16. Digital preservation metadata • Data Dictionary for Preservation Metadata (PREMIS) • little descriptive information and nothing format specific • http://www.loc.gov/standards/premis/ • ISO 23081 (Metadata for records) • National Archives of Australia Recordkeeping Metadata Standard • http://www.naa.gov.au/Images/rkms_pt1_2_tcm2-1036.pdf

  17. Future challenges • Too many competing standards • Which do I implement? • Too many elements • Increases cost of standard development and software implementation • Few elements ever used • Too expensive and too hard to capture metadata

  18. Transfer – overcoming system senescence • Digital objects have a much longer life than the systems that hold them • Move objects to digital repositories where they can be properly managed • Move them from one digital repository to its replacement • Storage is so cheap that holders may be tempted to keep digital objects (until it is too late)

  19. Future challenges • Current systems are not designed around the assumption that digital objects must be relocated • AIHT, Conceptual Issues from Practical Tests, Clay Shirky, D-Lib Magazine, Vol 11 No 12, December 2005, http://www.dlib.org/dlib/december05/shirky/12shirky.html • ADRI-UN/CEFACT work on a standard to transfer custody of digital records

  20. More information • If I have whetted your appetite... • PADI Annotated bibliography of digital preservation (http://www.nla.gov.au/padi/) • D-Lib Magazine (http://www.dlib.org/)

  21. Final thoughts • We know about compasses, and we have some charts, but there are a lot of rocks out there… We are a long way from satellite navigation • What about small/medium archives… personal archives? • Are photographs better digital or as negatives? • http://www.wilhelm-research.com/
