
Methods for Knowledge Management & Digital Preservation

Methods for Knowledge Management & Digital Preservation. The Theory and Practice of Digital History. Carl A. Young, M.A. in waiting 1 December 2009. Project Overview. Challenge.


Presentation Transcript


  1. Methods for Knowledge Management & Digital Preservation The Theory and Practice of Digital History Carl A. Young, M.A. in waiting 1 December 2009

  2. Project Overview
     Challenge: Resource- and skill-constrained historians and archivists require efficient methods for capturing, analyzing, and sharing original artifacts.
     Methodology:
     • Multi-phase project
     • Develop a low-cost process for digitally archiving documents
     • Store them in a standards-based data storage platform
     • Set the conditions to scale with future phases
     • Create a collaborative, accessible, online digital repository that fully leverages the optionality of the digital domain
     Major Phases:
     • Phase I: Prototyping
     • Phase II: Capture
     • Phase III: Web Access
     • Phase IV: Initial Expansion
     • Phase V: Infinite Expansion

  3. Phase I: Prototype Completed in November 2009, this phase established a usable, affordable methodology for project development by prototyping the capture and conversion of an original artifact for testing and exploration purposes.

  4. Phase I: Prototype (cont.) Demonstration
     • Original (digital camera): .JPG file format, 2 MB
     • Treatment w/Photoshop: .TIFF, 29 MB
     • Adobe conversion: .pdf, 278 KB
     Time elapsed: Photo: <1 min; Treatment: ~3 min; Conversion: <1 min

  5. Phase I: Prototype (cont.) Process Flowchart Legend

  6. Phase II: Capture Completed in November 2009, this phase performed and documented a low-budget document capture, artifact preservation, and conversion to a distributable format. A historic text is extracted from the original document, archived, and presented to the user in both the original capture (.jpg or .tiff) and distributable (.pdf and .xml) formats, with an evaluation of optical character recognition (OCR) and transcription requirements.

  7. Phase II: Capture (cont.) Image Treatment
     Filter
     • Blur
       • Smart Blur: Radius - 100, Threshold - 100, Quality - High, Mode - Normal
       • Surface Blur: Radius - 100, Threshold - 25
       • Surface Blur (if needed): Radius - 100, Threshold - 25
       • Lens Blur: Shape - Octagon, Radius - 5, Blade Curve - 50, Rotation - 300, Brightness - 10, Threshold - 75, Noise - 3, Distro - Uniform
     Select
     • Select
       • Color Range
         • Modify Shadows
         • No Invert
       • Modify
         • Expand 2
     Cut
     File
     • New *: Width - 1600, Height - 2500, Resolution - 300, CM - RGB 16bit
     • * Recommend saving as a preset.
     Paste
     Flatten
     Clean up as needed
     Save As .TIFF
     Select Area
     Image
     • Adjustments
       • Curves (“Digitization”): Channel - RGB, Output - 203, Input - 160
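The Curves adjustment listed on this slide (channel RGB, input 160, output 203) can be illustrated with a small lookup table. This is only a sketch under stated assumptions: it uses 8-bit levels, assumes the curve endpoints stay at (0, 0) and (255, 255), and interpolates linearly between points, whereas Photoshop fits a smooth curve. The names `curve_lut` and `apply_curve` are hypothetical.

```python
def curve_lut(points):
    """Build a 256-entry lookup table by linear interpolation between control points.

    Assumption: piecewise-linear interpolation stands in for the smooth
    curve an image editor would fit through the same points.
    """
    pts = sorted(points)
    lut = []
    for level in range(256):
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= level <= x1:
                lut.append(round(y0 + (y1 - y0) * (level - x0) / (x1 - x0)))
                break
    return lut


# Assumed fixed endpoints plus the slide's single control point (input 160 -> output 203).
LUT = curve_lut([(0, 0), (160, 203), (255, 255)])


def apply_curve(pixels, lut=LUT):
    """Map each 8-bit channel value through the lookup table."""
    return [lut[p] for p in pixels]


if __name__ == "__main__":
    print(apply_curve([0, 160, 255]))
```

Because the control point sits above the diagonal, every midtone is brightened, which pushes aged paper toward white while the darkest ink stays dark.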

  8. Phase II: Capture (cont.) OCR and Transcription Demo (panels: OCR, Transcription) Time elapsed: OCR: <1 min; Transcription: ~5 min

  9. Transcription OCR

  10. Phase II: Capture (cont.) TEI Demo Time elapsed: Preliminary Data: ~45 min; Page: ~5 min. Look at UVA’s TEI How To.
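As a rough illustration of what the TEI step produces, the sketch below builds a minimal TEI document with Python's standard library. It is not the project's actual markup: the helper `build_tei`, the placeholder page text, and the wording of the publication and source statements are all assumptions. One plausible reading of the timings is that the header corresponds to the one-time "preliminary data" and each page break plus paragraph to the per-page work.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace


def q(tag):
    """Qualify a tag name with the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"


def build_tei(title, pages):
    """Return a minimal TEI tree: one header, then a page break and paragraph per page."""
    root = ET.Element(q("TEI"))

    # One-time header (hypothetical minimal content).
    header = ET.SubElement(root, q("teiHeader"))
    file_desc = ET.SubElement(header, q("fileDesc"))
    title_stmt = ET.SubElement(file_desc, q("titleStmt"))
    ET.SubElement(title_stmt, q("title")).text = title
    pub = ET.SubElement(file_desc, q("publicationStmt"))
    ET.SubElement(pub, q("p")).text = "Unpublished working transcription."
    src = ET.SubElement(file_desc, q("sourceDesc"))
    ET.SubElement(src, q("p")).text = "Transcribed from digital photographs of the original."

    # Per-page content.
    body = ET.SubElement(ET.SubElement(root, q("text")), q("body"))
    for number, text in pages:
        ET.SubElement(body, q("pb"), {"n": str(number)})
        ET.SubElement(body, q("p")).text = text
    return root


tei = build_tei(
    "Militiaman's Guide",
    [(1, "First page text (placeholder)."), (2, "Second page text (placeholder).")],
)
xml_string = ET.tostring(tei, encoding="unicode")

if __name__ == "__main__":
    print(xml_string)
```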

  11. Phase II: Capture (cont.) Methodology Flow Chart Legend

  12. Phase II: Capture (cont.) Labor Estimates
      Militiaman’s Guide: 155 pages total, type text, fair condition; 40 hours (optimal) / 5 GB
      Per Page Estimates
      • Photography: ~30 sec, 2.5 MB at 5 megapixels
      • .tiff Conversion: ~3 min, 23 MB
      • .pdf Conversion: ~1 min, 300 KB
      • OCR: ~45 sec
      • Error Correction/Transcription: ~5 min
      • TEI: ~5 min (~45 min overhead)
      Case Estimates
      • Photography: ~1:15, ~400 MB
      • .tiff Conversion: ~7:45, 3.5 GB
      • .pdf Conversion: ~2:30, 50 MB
      • OCR: ~2 hours
      • Error Correction/Transcription: ~13 hrs
      • TEI: ~14 hrs
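The case estimates follow from multiplying the per-page figures by the 155-page count. The sketch below reproduces that arithmetic; the dictionary keys and function names are hypothetical, and it assumes decimal units (1 GB = 1000 MB) and that the 45-minute TEI overhead is incurred once per document.

```python
PAGES = 155  # page count of the Militiaman's Guide, from the slide

# Per-page time in seconds, taken from the slide's per-page estimates.
SECONDS_PER_PAGE = {
    "photography": 30,
    "tiff_conversion": 180,
    "pdf_conversion": 60,
    "ocr": 45,
    "transcription": 300,
    "tei": 300,
}
TEI_OVERHEAD_SECONDS = 45 * 60  # assumed to be a one-time cost per document

# Per-page storage in megabytes (300 KB expressed as 0.3 MB).
MB_PER_PAGE = {"photo": 2.5, "tiff": 23, "pdf": 0.3}


def case_hours(pages):
    """Return hours per step for a document with the given page count."""
    hours = {step: pages * secs / 3600 for step, secs in SECONDS_PER_PAGE.items()}
    hours["tei"] += TEI_OVERHEAD_SECONDS / 3600
    return hours


def case_storage_gb(pages):
    """Return total storage in GB across photo, TIFF, and PDF files."""
    return sum(MB_PER_PAGE.values()) * pages / 1000


hours = case_hours(PAGES)
total_hours = sum(hours.values())

if __name__ == "__main__":
    for step, value in hours.items():
        print(f"{step}: {value:.2f} h")
    print(f"total: {total_hours:.1f} h, storage: {case_storage_gb(PAGES):.1f} GB")
```

The total comes to roughly 40 hours, in line with the slide's headline figure. The storage sum comes to about 4 GB, so the slide's 5 GB presumably includes some headroom.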

  13. Equipment Baseline
      • Consumer-grade HP 5-megapixel digital camera ($125)
      • Slightly above consumer-grade PC ($1100)
        • 4 GB RAM
        • 1 GB VRAM
        • 500 GB SATA HD
        • Dual screens
      • Consumer software ($600)
        • Adobe Creative Suite 3

  14. Lessons Learned
      • Use a tripod/mount
      • Use consistent lighting
      • Safely flatten pages as much as possible
      • Use a mounting frame
      • Use the highest resolution available
      • OCR is NOT reliable
      • Need an efficient method for TEI

  15. Phase III: Web-Access This phase is the subject of this grant funding request. A team of professional developers will construct a suitable multi-media database for storage and access of original artifact captures, distributable .pdf versions, and XML-based data and metadata derived from the original. The team will also develop a working prototype web site to access the data. Fundamental to this phase will be data archiving and disaster recovery for the data. Successful conclusion of this phase will yield a working version 1.0 available for release and continued development.
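To make the storage requirement concrete, here is a minimal sketch of how captures, .pdf versions, and XML files might be indexed together, with a checksum per file to support disaster-recovery checks. It is purely illustrative: SQLite, the table and column names, and the functions `open_repository`, `add_file`, and `verify` are assumptions, not the design the development team would necessarily choose.

```python
import hashlib
import sqlite3

# Hypothetical schema: one row per artifact, one row per stored file.
SCHEMA = """
CREATE TABLE artifact (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL
);
CREATE TABLE stored_file (
    id INTEGER PRIMARY KEY,
    artifact_id INTEGER NOT NULL REFERENCES artifact(id),
    page INTEGER,
    kind TEXT NOT NULL CHECK (kind IN ('capture', 'pdf', 'xml')),
    path TEXT NOT NULL UNIQUE,
    sha256 TEXT NOT NULL
);
"""


def open_repository(path=":memory:"):
    """Open (or create) the index database and install the schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_file(conn, artifact_id, page, kind, path, data):
    """Record a file and the SHA-256 digest of its bytes."""
    digest = hashlib.sha256(data).hexdigest()
    conn.execute(
        "INSERT INTO stored_file (artifact_id, page, kind, path, sha256) "
        "VALUES (?, ?, ?, ?, ?)",
        (artifact_id, page, kind, path, digest),
    )
    return digest


def verify(conn, path, data):
    """Return True if the bytes still match the digest recorded for path."""
    row = conn.execute(
        "SELECT sha256 FROM stored_file WHERE path = ?", (path,)
    ).fetchone()
    return row is not None and row[0] == hashlib.sha256(data).hexdigest()


conn = open_repository()
artifact_id = conn.execute(
    "INSERT INTO artifact (title) VALUES (?)", ("Militiaman's Guide",)
).lastrowid
add_file(conn, artifact_id, 1, "capture", "page-001.tiff", b"placeholder image bytes")
```

After restoring from a backup, calling `verify` on each file would show whether any stored bytes changed, which is one simple way to test a recovery procedure.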

  16. Phase III: Web-Access (cont.) Flow Chart

  17. Phase III: Web-Access (cont.) Database Development Prototype Evaluation Prototype Web Development Work Breakdown Structure Alpha Test & Mod Beta Test & Mod RC1 Documentation Disaster Recovery Testing Test & Mod v1.0 Estimated Cost: $52,000

  18. Phase III: Web-Access (cont.) Project Gantt Chart

  19. Phase IV: Initial Expansion Beyond the scope of this grant request, this phase seeks to develop partnerships and data shares across multiple institutions with similar projects in development or production. The level of participation directly influences the scale of this phase. It is anticipated that the minimal costs will be shared across participating institutions.

  20. Phase IV: Initial Expansion (cont.) Publish Methodology Find Partners Large Scale Capture Leverage v1.0 Work Breakdown Structure Update Code and Processes Conduct Lifecycle Management Review Documentation Disaster Recovery Testing Estimated Cost: $8,000

  21. Phase V: Infinite Expansion Optionally, and depending on the success of the earlier phases, this phase will greatly expand collaborative efforts by potentially making this capability available to amateur and resource-constrained archivists and historians. It will provide a standards-based methodology and data capture technique, along with a collaborative platform for sharing the data once stored. This aspect of the final phase will be limited only by technology maintenance and scalability costs.

  22. Phase V: Infinite Expansion (cont.) Publish Updated Methodology Publish Membership Schema Open Data Models Work Breakdown Structure Leverage Current Version Release New Version(s) Conduct Lifecycle Management Review Documentation Disaster Recovery Testing Estimated Cost: $82,000

  23. Summary
      Project Summary
      • 5-Phase Approach
      • “How-To”
      • Digitization
      • TEI
      • Manage the project
      • Sets the stage
      • Broad/ambitious goals and plan
      • Manageable pieces
      • Flexible optionality
      Grant Request / Funding Summary
      • Phase III support:
      • $51,733.33
      • Prototype Validation
      • Database Development
      • Web Development
      • Hosting
      • Disaster Recovery
      • Phase IV and V templates
      • Future expansion as desired
      • Flexible Planning

  24. Questions

  25. CONCLUSION

  26. Dead Guy Quote Man had always assumed that he was more intelligent than dolphins because he had achieved so much... the wheel, New York, wars, and so on, whilst all the dolphins had ever done was muck about in the water having a good time. But conversely the dolphins believed themselves to be more intelligent than man for precisely the same reasons. - Douglas Adams

  27. Backup

  28. Phase I: Prototype (cont.) Image Capture Image Preservation Image Manipulation Work Breakdown Structure Database Development TEI Process Development Data Development Documentation Disaster Recovery Testing Static Web-Page Prototyping Estimated Cost: $5,000

  29. Phase I: Prototype (cont.) Gantt Chart

  30. Phase II: Capture (cont.) Image Capture TEI Work Breakdown Structure Prototype Database Input Documentation Disaster Recovery Testing Estimated Cost: $2,000

  31. Phase II: Capture (cont.) Gantt Chart
