Technology Support for ESSSS

Technology Support for ESSSS. Marshall Breeding Director for Innovative Technology and Research Vanderbilt University Library Founder and Publisher, Library Technology Guides http://www.librarytechnology.org/ http://twitter.com/mbreeding. Progress, Issues, and Challenges.


Presentation Transcript


  1. Technology Support for ESSSS Marshall Breeding Director for Innovative Technology and Research Vanderbilt University Library Founder and Publisher, Library Technology Guides http://www.librarytechnology.org/ http://twitter.com/mbreeding Progress, Issues, and Challenges ESSSS Digital Archive Workshop February 4, 2012

  2. Turning Pages on Paper to Digital Images • Digitizing in the field involves many compromises compared with what can be done in more controlled settings • Access to archives may be of limited duration • Arbitrary and political • Materials deteriorating rapidly • Practices related to physical preservation tend to be minimal • Must be light, fast, and inexpensive

  3. Achieve the best results possible • Maximize quality and consistency • Handheld digital cameras • Rapid advancement in capabilities • Early images done at lower resolutions than what is possible today • Fixed camera stands • Consistency in orientation and framing • Organization of images (folders / image names)
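Consistent folder and image names are easiest to keep when they are generated rather than typed. The sketch below assumes a hypothetical convention of one folder per volume and zero-padded page numbers, so that an alphabetical file listing is also page order; the layout and the function name `page_image_path` are illustrations, not the project's actual scheme.

```python
from pathlib import PurePosixPath


def page_image_path(collection: str, volume: int, page: int, ext: str = "tif") -> PurePosixPath:
    """Build a predictable folder/file name for one page image.

    Hypothetical convention: <collection>/vol<NNNN>/<collection>_vol<NNNN>_p<NNNN>.<ext>
    Zero-padding keeps files in page order when sorted alphabetically.
    """
    if volume < 1 or page < 1:
        raise ValueError("volume and page numbers start at 1")
    vol = f"vol{volume:04d}"
    name = f"{collection}_{vol}_p{page:04d}.{ext.lower()}"
    return PurePosixPath(collection) / vol / name
```

For example, `page_image_path("demo", 12, 3)` yields `demo/vol0012/demo_vol0012_p0003.tif`. Padding to four digits assumes no volume exceeds 9,999 pages.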

  4. Image Standards • TIFF: Currently regarded as the best format for archival master images • RAW: Native, proprietary format of a given camera • JPEG: Compressed images for display on the Web • Data lost during compression cannot be recovered • VU system creates multiple sizes of JPEG images • JPEG 2000 • Supports lossless compression • Not well supported on the Web
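Producing several JPEG sizes from one TIFF master starts with working out each derivative's pixel dimensions. The slide does not say which sizes the VU system produces, so the three longest-edge limits below (150, 600, and 1200 pixels) are hypothetical; the actual resizing and JPEG encoding would be done by an image library such as Pillow.

```python
def derivative_sizes(width, height, max_edges=(150, 600, 1200)):
    """Compute (width, height) for each JPEG derivative of a master image.

    Each entry in `max_edges` (hypothetical values) caps the longer side;
    the aspect ratio is preserved and small masters are never upscaled.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    sizes = {}
    for edge in max_edges:
        scale = min(edge / width, edge / height, 1.0)  # 1.0 prevents upscaling
        sizes[edge] = (max(1, round(width * scale)), max(1, round(height * scale)))
    return sizes
```

A 4000 × 3000 master would give a 1200 × 900 derivative at the largest size, while a master smaller than a limit is kept at its original dimensions.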

  5. Bringing Images to the Web • Take advantage of infrastructure developed by the Vanderbilt University Library to manage images • Digital Library framework: • Presentation and functionality created in a Perl-based interface • Data and metadata stored in MySQL relational tables • ODBC connectivity between presentation layer and MySQL • Microsoft Windows Server/IIS for Web server • Images reside on digital storage provided by the Vanderbilt University Library
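The framework keeps volumes and their page images in relational tables. The actual system uses Perl and MySQL over ODBC; the sketch below substitutes Python's built-in `sqlite3` module so it runs without a database server, and the table and column names (`volume`, `page`, `seq`, `image_path`) are hypothetical. It shows the one-to-many volume/page relationship that lets an interface list a volume's pages in order.

```python
import sqlite3

# Hypothetical schema: one row per volume, one row per page image.
SCHEMA = """
CREATE TABLE volume (
    volume_id  INTEGER PRIMARY KEY,
    title      TEXT NOT NULL,
    coverage   TEXT
);
CREATE TABLE page (
    page_id    INTEGER PRIMARY KEY,
    volume_id  INTEGER NOT NULL REFERENCES volume(volume_id),
    seq        INTEGER NOT NULL,      -- page order within the volume
    image_path TEXT NOT NULL,
    UNIQUE (volume_id, seq)
);
"""


def open_catalog(path=":memory:"):
    """Open (or create) the catalog database and install the schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def pages_for_volume(conn, volume_id):
    """Return (seq, image_path) rows for one volume, in page order."""
    rows = conn.execute(
        "SELECT seq, image_path FROM page WHERE volume_id = ? ORDER BY seq",
        (volume_id,),
    )
    return rows.fetchall()
```

Keeping page order in its own `seq` column, rather than relying on file names, lets the interface browse pages correctly even if images are renamed or re-scanned.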

  6. Digital Preservation • Disaster Recovery • Ability to restore files in the case of any hardware, software, or human error • Digital Preservation • Commitment and processes in place to preserve digital information for the very long term • Multiple replications • Migration of data into future formats as current standards become obsolete
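Multiple replicas only help if they are known to be intact. A common way to check this is a fixity check: compute a cryptographic checksum of the master file and compare it against each copy. The sketch below uses SHA-256 from Python's standard library; the function names are hypothetical and the slide does not say how the library verifies its replicas.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_replicas(master, replicas):
    """Return the replicas that are missing or no longer match the master copy."""
    expected = sha256_of(master)
    return [r for r in replicas if not Path(r).exists() or sha256_of(r) != expected]
```

In practice the expected checksums would be recorded when each image is ingested, so a damaged master can also be detected and restored from a good replica.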

  7. Building Structure through Metadata • Metadata structure based on Dublin Core • Volume-level descriptive metadata • Courtney Campbell designed the metadata structure and is analyzing volumes to populate metadata for each volume • EXIF data extracted from images into the individual records for each page • Page-level structure • Supports ability to select volumes and browse page images
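Volume-level descriptive metadata based on Dublin Core can be serialized as simple XML using the standard `dc` namespace. The sketch below accepts only the 15 elements of the Dublin Core Metadata Element Set and allows repeated elements (for example, several subjects); the `record` wrapper element and the sample values are hypothetical, and the project's actual metadata structure may add fields of its own.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15-element Dublin Core Metadata Element Set.
DC_NS = "http://purl.org/dc/elements/1.1/"
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description", "format",
    "identifier", "language", "publisher", "relation", "rights", "source",
    "subject", "title", "type",
}
ET.register_namespace("dc", DC_NS)  # serialize as dc:title, not ns0:title


def dublin_core_record(fields):
    """Serialize volume-level metadata as simple Dublin Core XML.

    `fields` maps element names to a string or a list of strings;
    Dublin Core elements are optional and repeatable.
    """
    root = ET.Element("record")  # hypothetical wrapper element
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        if isinstance(values, str):
            values = [values]
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")
```

A call such as `dublin_core_record({"title": "Sample volume", "language": "es"})` returns a `record` element containing `dc:title` and `dc:language` children.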

  8. Demonstration • Image management environment • Interface • Metadata • Page Images

  9. Turning Pages into Data • The contents of the page images contain valuable data • Page images can be read by humans but do not support essential features: search, computer analysis, etc. • Full value of these collections can be realized through transcription

  10. Challenges in transcription • Page characteristics • Handwritten by many different hands • Many names and numbers • Spanish language • Varying contrast • Many defects: water damage, insects, etc.

  11. Human transcription • Scholars who work with pages of interest can create transcriptions manually • Optical character recognition? • Highly accurate for typescript • Not effective for handwritten manuscripts

  12. Crowdsourcing • Find ways to have large numbers of people create transcript snippets • Google uses crowdsourcing to improve transcripts for the Google Books project
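Crowdsourced transcription usually gives each snippet to several people, so that their answers can later be compared. The sketch below hands out snippets round-robin so that each one goes to `copies` different volunteers; the function name, the default of three copies, and the identifiers are hypothetical, not a description of any existing ESSSS workflow.

```python
from itertools import cycle


def assign_snippets(snippet_ids, volunteers, copies=3):
    """Give each snippet to `copies` different volunteers, round-robin.

    Any `copies` consecutive draws from a cycle of at least `copies`
    volunteers are distinct, so no one sees the same snippet twice.
    """
    if copies > len(volunteers):
        raise ValueError("need at least as many volunteers as copies per snippet")
    assignments = {v: [] for v in volunteers}
    pool = cycle(volunteers)
    for snippet in snippet_ids:
        for _ in range(copies):
            assignments[next(pool)].append(snippet)
    return assignments
```

Round-robin also spreads the workload evenly; a real system would additionally need to reassign snippets that a volunteer never completes.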

  13. Google reCAPTCHA • "Digitizing books one word at a time" • Each transaction transcribes one or two words • Each word is transcribed many times • Results compared to determine the correct version
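Comparing many transcriptions of the same word can be as simple as a majority vote. The sketch below is a plain majority-vote rule, not reCAPTCHA's actual (unpublished here) method: it normalizes whitespace, counts identical readings, and accepts the most common one only if more than half the readers agree. The threshold and the function name are hypothetical.

```python
from collections import Counter


def consensus(transcriptions, min_agreement=0.5):
    """Return the majority reading of one word, or None if there is no clear winner.

    Whitespace is normalized; letter case and accents are kept, since they
    can matter for names. A reading must exceed `min_agreement` of the votes.
    """
    normalized = [" ".join(t.split()) for t in transcriptions]
    normalized = [t for t in normalized if t]  # ignore blank answers
    if not normalized:
        return None
    [(reading, votes)] = Counter(normalized).most_common(1)
    return reading if votes / len(normalized) > min_agreement else None
```

Because the threshold is strict, an even split (for example, "Jose" against "José") returns `None`, which marks the word for more readers or for review by a scholar.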

  14. Google reCAPTCHA

  15. Crowdsourcing to Transcribe ESSSS • Scholars contribute any transcriptions they create as they work with a given set of pages • Students assigned to create transcriptions • Language, history, LIS • Collaboration with an organization that has reCAPTCHA-like infrastructure
