JHOVE2: A Next-Generation Architecture for Format-Aware Preservation Processing

Presented at the Digital Library Federation Fall Forum, Philadelphia, November 5-7, 2007, by Stephen Abrams (Harvard University), Evan Owens (Portico), and Tom Cramer (Stanford University).

Presentation Transcript


  1. JHOVE2: A Next-Generation Architecture for Format-Aware Preservation Processing. Digital Library Federation Fall Forum, Philadelphia, November 5-7, 2007. Stephen Abrams (Harvard University), Evan Owens (Portico), Tom Cramer (Stanford University)

  2. JHOVE2 project
     • Two-year NDIIPP-funded collaborative project to develop a “next generation” architecture for format-aware preservation processing
       • Harvard University: Stephen Abrams, Gary McGath, Robin Wendler
       • Portico: Evan Owens, John Meyer, Sheila Morrissey
       • Stanford University: Tom Cramer, Richard Anderson, Hannah Frost, Rachel Gollub, Nancy Hoebelheinrich, Keith Johnson
     • Open source
       • Educational Community License (ECL)
       • SourceForge

  3. JHOVE2 project goals
     • Refactor the existing architecture
       • Rectify known inefficiencies and idiosyncrasies
       • Simplify the process of integration
       • Encourage third-party extensions
     • Provide enhancements
       • Separate identification from validation
       • Standardized error handling
       • Standardized handling of validation profiles
       • Standardized reporting using METS, with XSL transform
       • More sophisticated data model
       • Arbitrary processing modules

  4. JHOVE2 project goals
     • Develop modules
       • Signature-based identification using DROID
       • Validation and characterization
       • Symbolic display of selected binary formats
       • API-level editing capability
       • Policy-based assessment
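
Signature-based identification can be illustrated with a minimal sketch. This is not DROID and does not use its API or signature files; the table layout and the `identify` function are hypothetical, and only the byte patterns (the well-known magic numbers for GIF, PDF, TIFF, and JPEG) are taken from the formats themselves.

```python
from typing import Optional

# Hypothetical signature table: (offset, magic bytes, format name).
# The byte patterns are the published magic numbers for these formats;
# the table structure is illustrative only and is not DROID's.
SIGNATURES = [
    (0, b"GIF87a", "GIF"),
    (0, b"GIF89a", "GIF"),
    (0, b"%PDF-", "PDF"),
    (0, b"II*\x00", "TIFF"),       # little-endian TIFF header
    (0, b"MM\x00*", "TIFF"),       # big-endian TIFF header
    (0, b"\xff\xd8\xff", "JPEG"),
]


def identify(data: bytes) -> Optional[str]:
    """Return the first format whose signature matches, or None."""
    for offset, magic, name in SIGNATURES:
        if data[offset:offset + len(magic)] == magic:
            return name
    return None
```

Identification of this kind only inspects a few bytes and says nothing about whether the rest of the stream conforms to the format, which is one reason to keep it separate from validation.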

  5. Data model
     • Implicit assumption in JHOVE
       • 1 object = 1 file = 1 format
     • But what about…
       • TIFF with embedded ICC profile and XMP metadata
         • 1 object = 1 file = 3 formats
       • JPEG 2000 JPX fragmentation
         • 1 object = n files = 1 format
       • ESRI Shapefile
         • 1 object = 3 files = 3 formats
     • JHOVE2 will support processing of complex aggregate objects and nested formatted bit streams
       • 1 object = n files = m formats
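
One way to picture the "1 object = n files = m formats" model is a small tree of objects, files, and nested byte streams. The sketch below is only an illustration; the class names `DigitalObject`, `FileUnit`, and `ByteStream` are assumptions and are not taken from the JHOVE2 design.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ByteStream:
    """A formatted bit stream that may contain nested formatted streams."""
    format_name: str
    nested: List["ByteStream"] = field(default_factory=list)

    def formats(self) -> Set[str]:
        # Collect this stream's format plus those of all nested streams.
        found = {self.format_name}
        for child in self.nested:
            found |= child.formats()
        return found


@dataclass
class FileUnit:
    """One file on disk, carrying one top-level byte stream."""
    name: str
    stream: ByteStream


@dataclass
class DigitalObject:
    """A digital object made of one or more files (hypothetical model)."""
    files: List[FileUnit]

    def formats(self) -> Set[str]:
        found: Set[str] = set()
        for f in self.files:
            found |= f.stream.formats()
        return found


# 1 object = 1 file = 3 formats
tiff = DigitalObject([
    FileUnit("scan.tif", ByteStream("TIFF", [ByteStream("ICC"), ByteStream("XMP")])),
])

# 1 object = 3 files = 3 formats (the three mandatory Shapefile components)
shapefile = DigitalObject([
    FileUnit("roads.shp", ByteStream("SHP")),
    FileUnit("roads.shx", ByteStream("SHX")),
    FileUnit("roads.dbf", ByteStream("dBASE")),
])
```

Here `len(tiff.files)` is 1 while `len(tiff.formats())` is 3, and the Shapefile object reports three files and three formats, matching the cases listed on the slide.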

  6. Common “backplane”
     • Outer loop is an iteration over digital objects
     • Inner loop of processes applied against each object, passing a common memory structure

         while (has-another-object) {
           while (has-another-process) {
             process (object, state);
           }
         }
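
The nested loop on this slide can be sketched as runnable code. This is a minimal illustration, not the JHOVE2 engine: `run_backplane` and the two example processes are hypothetical, and the choice to create a fresh state per object (rather than one state shared across all objects) is an assumption the slide does not settle.

```python
from typing import Callable, Dict, Iterable, List

State = Dict[str, object]
Process = Callable[[str, State], None]


def run_backplane(objects: Iterable[str], processes: List[Process]) -> Dict[str, State]:
    """Apply every process to every object, threading a shared state dict."""
    results: Dict[str, State] = {}
    for obj in objects:                 # outer loop: digital objects
        state: State = {}               # assumption: fresh state per object
        for process in processes:       # inner loop: processes
            process(obj, state)
        results[obj] = state
    return results


def identify(obj: str, state: State) -> None:
    # Placeholder identification by file extension, for illustration only.
    state["format"] = obj.rsplit(".", 1)[-1].upper()


def validate(obj: str, state: State) -> None:
    # Later processes read what earlier ones wrote into the state.
    state["checked"] = state["format"] in {"TIFF", "PDF"}


results = run_backplane(["a.tiff", "b.xyz"], [identify, validate])
```

Because each process receives the same `state`, a validation step can rely on the format recorded by an earlier identification step without the two being coupled directly.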

  7. Validation
     • There is a useful distinction between well-formedness, validity, renderability, and usability
       • Well-formedness and validity are “bright line” determinations relative to a specification
       • Renderability is a “bright line” determination relative to a specific rendering tool
       • Usability is a “fuzzy” determination relative to local policies and heuristics
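
The difference between "bright line" and "fuzzy" determinations can be made concrete in a result record. The sketch below is an assumption about how such results might be represented, not part of JHOVE2; the `Assessment` class, the 0 to 1 usability score, and the tool name are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Assessment:
    """Hypothetical record of the four determinations for one object."""
    well_formed: bool                   # bright line, relative to a specification
    valid: bool                         # bright line, relative to a specification
    renderable_by: Dict[str, bool] = field(default_factory=dict)  # one answer per tool
    usability: float = 0.0              # fuzzy score in [0, 1], set by local heuristics

    def is_usable(self, threshold: float) -> bool:
        # The threshold is a local policy choice, so the same object can be
        # usable at one institution and not at another.
        return self.usability >= threshold


result = Assessment(
    well_formed=True,
    valid=False,
    renderable_by={"example-viewer": True},   # hypothetical tool name
    usability=0.7,
)
```

Note that the record allows an object to be well-formed but invalid, and renderable by one tool without any claim about others.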

  8. Policy-based assessment
     • Evaluate objects based on prior characterization and locally-defined policy rules and heuristics, for example:
       • Risk of technological obsolescence
       • Risk of transformative loss
     • Codify assessment methodologies and best practice recommendations
     • Develop a formal language in which to express policy rules
     • Implement a rules engine
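
A rules engine over prior characterization can be sketched in a few lines. Everything here is hypothetical: the `Rule` class, the `assess` function, the property names, and the two example policies are invented for illustration and do not reflect the formal policy language the project proposes to develop.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Characterization = Dict[str, object]


@dataclass
class Rule:
    """A locally defined policy rule (hypothetical structure)."""
    name: str
    applies: Callable[[Characterization], bool]
    risk: str


def assess(props: Characterization, rules: List[Rule]) -> List[str]:
    """Return a finding for every rule that fires on the characterization."""
    return [f"{rule.risk}: {rule.name}" for rule in rules if rule.applies(props)]


# Example local policies, invented for this sketch.
RULES = [
    Rule(
        name="format not on local preferred list",
        applies=lambda p: p.get("format") not in {"TIFF", "PDF"},
        risk="obsolescence",
    ),
    Rule(
        name="lossy compression in use",
        applies=lambda p: p.get("compression") == "lossy",
        risk="transformative loss",
    ),
]

findings = assess({"format": "GIF", "compression": "lossless"}, RULES)
```

Keeping the rules as data separate from the engine is what lets each institution substitute its own policies without changing the assessment code.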

  9. Format support
     • Audio: AIFF, WAVE
     • Color: ICC
     • Document: PDF
     • GIS: Shapefile
     • Image: GIF, JPEG, JPEG 2000, TIFF
     • Text: ASCII, HTML, SGML, UTF-8, XML

  10. Schedule
     • 6 months of community outreach, requirements gathering, and design
     • 6 months implementation of core APIs and the engine
     • 1 year implementation of modules
     • Continual prototyping and refactoring

  11. Questions (for you)?
     • Do you care about the open source license (ECL)?
     • Do you care about the distribution platform (SourceForge)?
     • Do you have functional requirements or use cases?
       • How do you use JHOVE today?
       • What needs doesn’t it meet?
     • What types of policy assessments do you perform?
       • How do you quantify risk?
       • What is your underlying assessment model?
     • Are you aware of existing expression languages and engines for rules-based assessment?

  12. Questions (for you)?
     • What can we do to facilitate integration into existing (or planned) systems and workflows?
     • What can we do to facilitate third-party development and extension?
       • What help would you need to implement your own modules?
     • Would you be interested in a co-development arrangement with the JHOVE2 project?
     • Do you have interesting test files that you are willing to share?

  13. Questions (from you)?
