Tackling concrete digital preservation challenges with SPRUCE Paul Wheatley SPRUCE Project Manager


Presentation Transcript


  1. Tackling concrete digital preservation challenges with SPRUCE Paul Wheatley SPRUCE Project Manager University of Leeds Twitter: @prwheatley http://openplanetsfoundation.org/blogs/paul

  2. Summary • Some digital preservation challenges and solutions • Not exhaustive • Illustrate with some real examples • Summarise with some practical steps for digital preservation • Taking a community approach to digital preservation • SPRUCE Project • How to get involved • Where to get help

  3. Keeping the bits 001011001011101010101010101010111001010101010010010010110101010101111010100100010111101010101010110101110010101001010100101010100101010101001010111010111

  4. Digital data is fragile Courtesy of State and University Library, Denmark

  5. Digital preservation storage: keeping the bits • Media decay • Media becomes partially or completely unreadable • Media obsolescence • Without the respective hardware to read the (hand held) media, it becomes inaccessible • Practical issues • Inserting lots of discs into a drive is costly Images courtesy of The British Library

  6. Bit storage recommendations • Don’t fall for media longevity claims from vendors! They are missing the point! • Accept that media decays, media formats will change, and any media will become inaccessible in the medium term • Rather than putting your data in a dark archive and trusting it will survive for a long period... • Manage it closely, refresh to new media frequently, choose media that is easy to manage • Choose media that is easy to access (server storage, cloud, external hard drives) • Make at least 3 copies of all data, keep copies in different geographical locations • Frequently check the condition of your data

  7. Verifiable Manifests (Checksums!) • Allow you to easily check the condition of your digital stuff: Are any of your digital files damaged? Are any of your digital files missing? • Single most useful digital preservation activity • Generate manifests as early as possible • Frequently re-check them over time • Mend content when necessary • Library of Congress (LoC) BagIt specification and Bagger tool
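
A minimal Python sketch of the manifest idea on this slide, assuming SHA-256 checksums and a manifest held as a plain dictionary of relative path to checksum. The function names (`file_checksum`, `build_manifest`, `verify_manifest`) are illustrative only and do not come from the talk or from the BagIt tools, which record similar information in their own manifest files.

```python
import hashlib
from pathlib import Path


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's relative path (POSIX style) to its checksum."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): file_checksum(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_manifest(root, manifest):
    """Compare the files under root with a previously stored manifest.

    Returns two sorted lists: files that are missing, and files whose
    checksum no longer matches (that is, damaged or altered files).
    """
    current = build_manifest(root)
    missing = sorted(name for name in manifest if name not in current)
    damaged = sorted(
        name for name in manifest
        if name in current and current[name] != manifest[name]
    )
    return missing, damaged
```

In use, `build_manifest` would be run once when the collection arrives and its result stored away from the collection itself; `verify_manifest` can then be re-run on a schedule to answer the two questions above (missing? damaged?).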

  8. Dependence on software 1010101010111101000101010100010010100101110100101010101001001001010101001000001010101010111110100010101010001001010010111010010101010100100100101010100100000101010101011111010001010101000100101001011101001010101010010010010101010010000010101010101101010101111010001010101000100101001011101001010101010010010010101010010000010101010101111101000101010100010010100101110100101010101001001001010101001000001010101010111110100010101010001001010010111010010101010100100… SOI APP0 JFIF 1.2 APP13 IPTC APP2 ICC DQT SOF0 200x392 DRI DHT SOS ECS0 RST0 ECS1 RST1 ECS2…

  9. When it goes wrong…

  10. Migration, Emulation and all that... • Migrate content from an obsolete format to a more modern usable format • Emulate the original computing environment and run the obsolete software originally used • Words of caution: • Is software obsolescence really a critical risk for our digital data? • The debate continues... International Council of Archives Congress 2012: • Michael Carden, National Archives of Australia: NAA migrates all content • Oliver Morley, UK National Archives: “digital formats have standardized” • Blogged by Inge Angevaare: http://www.ncdd.nl/blog/?p=2786 • The hard part is the quality assurance of the results. Was anything lost or damaged in the process?

  11. Stuff happens! • Whenever a digital collection is moved, processed, curated or altered in any way.... things can go wrong! • Network dropouts at critical times • Disks get full, subsequent data copied there is lost • Software bugs lead to unexpected results • Human error leads to all sorts of issues • Stuff happens a lot more at scale!

  12. Digitisation post processing corruption Images courtesy of The British Library

  13. TIFF to JPEG2000 migration corruption Images courtesy of The British Library

  14. Technology can be imperfect! • Format specification ambiguity and corresponding tool bugs • JPEG2000s can be missing vital source resolution • For more on JPEG2000 format and tool risks see: http://wiki.opf-labs.org/display/TR/JP2 Images courtesy of The British Library

  15. Assume nothing, validate everything • Only process or alter digital content when it is absolutely necessary • Double check everything • Make no assumptions
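
One way to "double check everything" when content is moved is to checksum a file before the copy and again at the destination, rather than trusting the copy tool's own success report. The sketch below assumes SHA-256 and a single-file copy; `sha256_of` and `copy_and_verify` are hypothetical helper names, not part of any tool mentioned in the talk.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source, destination):
    """Copy a file, then re-read the destination and compare checksums.

    Raises OSError if the copy does not match the source.
    """
    source, destination = Path(source), Path(destination)
    expected = sha256_of(source)
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, destination)
    actual = sha256_of(destination)
    if actual != expected:
        raise OSError(f"Checksum mismatch copying {source} to {destination}")
    return expected
```

Note that re-reading immediately after a copy may be served from the operating system's cache rather than from the target media, so a later re-check against a stored manifest is still worthwhile.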

  16. First steps in practical digital preservation • Prompt check in – have you got what you thought you would receive? • Check expected files are present, open a random selection to verify expected quality • Request replacements from supplier promptly • Create a verifiable manifest • Create a top-down manifest file that lists each digital object in your collection as a relative filename and a checksum • Library of Congress BagIt specification and tools will also do a good job here • Make at least 3 copies. Protect the bits • Keep a copy on easily accessible media • Back up to tape or more disk. Keep copies in different geographical locations to guard against catastrophic disaster. Cloud storage is also an option. • Frequently inspect the condition of your data • Revisit the collection, recalculate your manifests and verify content has not been lost • Do a test recovery of your backups to ensure they are working effectively! • Record the existence of each of your collections in a digital items register • Record: what it is, who is the responsible owner, where it is, who owns it, and who can access it. • Assume nothing, validate everything! • Double check any processes in the lifecycle that move or alter your digital content • Built-in checks can be flawed; a second opinion is much more trustworthy
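
The digital items register described above could be as simple as a CSV file with one row per collection. The sketch below is one possible shape, assuming the five pieces of information listed on the slide; the field names and the `RegisterEntry` and `write_register` names are invented for illustration.

```python
import csv
from dataclasses import asdict, dataclass, fields


@dataclass
class RegisterEntry:
    what: str               # what the collection is
    responsible_owner: str  # who looks after it day to day
    location: str           # where it is stored
    owner: str              # who owns it
    access: str             # who can access it


def write_register(entries, handle):
    """Write register entries as CSV rows to an open text file handle."""
    names = [field.name for field in fields(RegisterEntry)]
    writer = csv.DictWriter(handle, fieldnames=names)
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))
```

A spreadsheet would serve the same purpose; the point is only that every collection has a row, so nothing is held without someone knowing it exists.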

  17. SPRUCE Project Sustainable Preservation Using Community Engagement • JISC funded • 2 years in length (until Nov 2013) • £250k funding http://wiki.opf-labs.org/display/SPR

  18. Some observations • Lack of focus on the real needs of digital preservation practitioners • Insufficient collaboration + coordination • Duplication of effort

  19. The SPRUCE Mashup: Identify and Solve concrete problems • 3 day workshop for ~30 people • Practitioners bring along digital collections • We identify preservation challenges • Pair up practitioners with technical experts • Apply existing open source tools to solve the problems • In doing so, we exchange knowledge about digital preservation • Develop a supportive community Glasgow Mashup April 2012

  20. What questions do practitioners want answered? • What is this digital collection? • What risks are associated with this digital collection? • Separate collection content from temporary/other files. • Identify and weed duplicate or similar files. • Is the metadata consistent with the content? • Are all the pages present in each issue? • Are all digitised pages in focus? • Are any files damaged? • Are the files compliant with a particular profile? • See the results here: http://bit.ly/spruce-results
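
As a small illustration of the "identify and weed duplicate files" question, exact duplicates can be found by grouping files on their checksum. This is a sketch under stated assumptions, not the approach used at the mashups: it uses SHA-256, reads each file fully into memory (fine for modest files only), catches byte-identical copies but not merely similar files, and `find_duplicates` is a hypothetical name.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicates(root):
    """Group files under root whose contents are byte-for-byte identical.

    Returns a list of groups; each group is a list of relative paths
    (POSIX style) that share the same SHA-256 checksum.
    """
    root = Path(root)
    groups = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            groups[digest].append(path.relative_to(root).as_posix())
    return [names for names in groups.values() if len(names) > 1]
```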

  21. Make it sustainable • Work with practitioners to develop a business case for their work • Make small funding awards to further develop and embed the work begun in the mashups York Mashup September 2011

  22. Online collaboration • Sharing requirements • Sharing experiences: what tools worked well, what approaches should be avoided • Building on existing tools, rather than re-inventing the wheel • Libraries + Information Science question and answer site: • http://libraries.stackexchange.com/ • More recommended collaborative activities: • http://bit.ly/spruce-collaborate

  23. Thanks for listening! Any questions? Paul Wheatley SPRUCE Project Manager University of Leeds Twitter: @prwheatley Email: p.r.wheatley@leeds.ac.uk http://openplanetsfoundation.org/blogs/paul
