Metadata Tools for JISC Digitisation Projects of still images and text • Ed Fay • BOPCRIS, Hartley Library, University of Southampton
Overview: BOPCRIS today • Move to working natively with standards • Interoperability • Preservation • Design project procedures from the ground up with metadata in mind • File-naming and directory structuring • Metadata capture processes • Production workflow that automates where possible • Minimize the possibility of human error / subjectivity • “Final package” for each digital object, recording preservation information on the “digital shelf” and aiming for maximum interoperability between systems, all in one place
Overview: technical details • File-naming / directory structure • Incorporating project-specific “unique ids” • Final package (digital object) • Internally consistent “tarball” [*.TAR] • Relative path-naming conventions • METS wrapper • Extension formats for metadata: descriptive (MODS); technical (MIX); process (PREMIS) • Production workflow • Automated production of final package • Metadata recording • Dynamic input by scanner operators
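The slides name the ingredients of the file-naming scheme (a project-specific unique id, relative paths inside the package) but not the pattern itself. A minimal Python sketch, assuming a hypothetical `<id>_<page>.<ext>` pattern and the four subdirectories listed for the final package; the `ROLES` table and `relative_path` function are illustrative names, not BOPCRIS code:

```python
from pathlib import PurePosixPath

# Hypothetical role -> (subdirectory, extension) table, following the
# ./master, ./derived, ./txt and ./idx layout of the final package.
ROLES = {
    "master": ("master", "tif"),
    "derived": ("derived", "jpg"),
    "txt": ("txt", "txt"),
    "idx": ("idx", "idx"),
}


def relative_path(object_id: str, page: int, role: str) -> str:
    """Return the package-relative path for one page in one file role.

    The "<id>_<page>" pattern is an illustrative assumption, not the
    convention the project actually used.
    """
    if page < 1:
        raise ValueError("page numbers start at 1")
    subdir, ext = ROLES[role]  # KeyError for an unknown role
    # Zero-padding keeps lexical order equal to page order up to 9999 pages.
    name = f"{object_id}_{page:04d}.{ext}"
    # PurePosixPath always joins with "/", whatever OS runs the workflow.
    return str(PurePosixPath(subdir, name))
```

For example, `relative_path("ID123", 3, "master")` returns `"master/ID123_0003.tif"`. Deriving every name from the id and page number, rather than letting operators type names, is one way to meet the slide's goal of minimising human error.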
History • Eighteenth Century Parliamentary Papers • Project under Phase 1 of the JISC Digitisation Programme • Proprietary system and data formats (Agora) • Manual input of metadata • Descriptive and structural • Advantages and disadvantages
History: Advantages • Proprietary system with advanced functionality: • OCR workflow • Web presentation • Highly customizable • Metadata fields specified and modified at will
History: Disadvantages • Non-standard metadata fields • No mapping to standard formats • Difficulties with interoperability and metadata harvesting • Translation • Between systems, or between “use” and “archive” formats • Introduces the possibility of versioning issues • No scope for preservation metadata • Separation between workflow / presentation system and preservation strategy • Resulted in a disparate collection of scripts and tools to manage data
Present: Metadata Standards • Bibliographic database export • File-system level • Directory structure • File-naming conventions • Scanning level • TIFF headers • Additional descriptive metadata • METS profile • Tailored to project needs • Extension formats (MODS, MIX, PREMIS) • Checksums (MD5)
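The MD5 checksum can be produced and re-checked with nothing beyond Python's `hashlib`. This sketch assumes the checksum file holds a single `md5sum`-style line (`<digest>  <filename>`); the layout actually used on the project is not stated, and the function names are hypothetical:

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksum_file(package: Path) -> Path:
    """Write <stem>.CHECKSUM beside the package (e.g. ID.TAR -> ID.CHECKSUM).

    The "<digest>  <filename>" line mirrors md5sum output (an assumption).
    """
    checksum_path = package.with_suffix(".CHECKSUM")
    checksum_path.write_text(f"{md5_of_file(package)}  {package.name}\n", encoding="utf-8")
    return checksum_path


def verify_checksum_file(checksum_path: Path) -> bool:
    """Recompute the digest of the named file and compare it to the record."""
    expected, name = checksum_path.read_text(encoding="utf-8").split(maxsplit=1)
    return md5_of_file(checksum_path.parent / name.strip()) == expected
```

Reading in chunks keeps memory use flat even for multi-gigabyte packages of TIFF masters.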
Present: Metadata Origins
• Precursors
  • File-naming
  • Directory structure
  • Bibliographic metadata (MARC21 / MODS / etc.)
• Generated
  • Scanned images: TIFF headers → MIX (Z39.87)
  • Other metadata: process → PREMIS; additional descriptive → custom dmdSec
  • OCR (Agora / ABBYY)
• All of the above are combined in METS
• File formats
  • TIFF master / derived JPEG
  • Flat text (TXT) and word-co-ordinated OCR (IDX)
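Much of the MIX (Z39.87) technical metadata comes straight from the TIFF header. As a standard-library-only sketch, the function below reads a handful of baseline tags from the first image file directory (IFD) of a TIFF. The tag selection is an assumption, the mapping from these TIFF tag names into MIX elements is left out, it reads the whole file into memory for simplicity, and a production tool would more likely rely on an established imaging library:

```python
import struct

# Baseline TIFF tags worth carrying into technical metadata (tag number -> name).
TAGS = {
    256: "ImageWidth",
    257: "ImageLength",
    258: "BitsPerSample",
    259: "Compression",
    262: "PhotometricInterpretation",
    282: "XResolution",
    283: "YResolution",
    296: "ResolutionUnit",
}
# TIFF field type -> (struct format per value, size in bytes): SHORT, LONG, RATIONAL.
TYPES = {3: ("H", 2), 4: ("I", 4), 5: ("II", 8)}


def read_tiff_tags(data: bytes) -> dict:
    """Return selected tags from the first IFD of a TIFF held in memory."""
    order = {b"II": "<", b"MM": ">"}.get(data[:2])  # little- or big-endian
    if order is None or struct.unpack(order + "H", data[2:4])[0] != 42:
        raise ValueError("not a TIFF file")
    (ifd,) = struct.unpack(order + "I", data[4:8])
    (count,) = struct.unpack(order + "H", data[ifd:ifd + 2])
    tags = {}
    for i in range(count):
        entry = ifd + 2 + 12 * i  # each IFD entry is 12 bytes
        tag, ftype, n = struct.unpack(order + "HHI", data[entry:entry + 8])
        if tag not in TAGS or ftype not in TYPES:
            continue
        code, size = TYPES[ftype]
        # Values of 4 bytes or fewer sit inside the entry; longer ones are
        # stored elsewhere and the entry holds their offset.
        if n * size <= 4:
            start = entry + 8
        else:
            (start,) = struct.unpack(order + "I", data[entry + 8:entry + 12])
        values = struct.unpack(order + code * n, data[start:start + n * size])
        if ftype == 5:  # RATIONAL: numerator/denominator pairs
            values = tuple(values[k] / values[k + 1] for k in range(0, len(values), 2))
        tags[TAGS[tag]] = values[0] if n == 1 else values
    return tags
```

Typical use would be `read_tiff_tags(Path(master_file).read_bytes())`, with `master_file` standing for any path produced at scanning time.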
Present: Digital Object (“final package”)
(1) ID.TAR
  • METS XML: ./ID.XML
    • dmdSec: MODS XML
    • amdSec: MIX, PREMIS XML
    • fileSec: ./master (TIFF); ./derived (JPEG); ./txt (plain text); ./idx (word-co-ordinated OCR)
    • structMap: physical; logical
  • Master images (TIFF): ./master/
  • Derived images (JPEG): ./derived/
  • OCR text (TXT): ./txt/
  • Word-co-ordinated OCR (IDX): ./idx/
(2) ID.CHECKSUM (MD5)
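Keeping the package internally consistent mainly means storing every member under a path relative to the object's root, so the relative hrefs in the METS file still resolve after extraction. A sketch with Python's `tarfile`, assuming the METS file is named `<id>.xml` and sits beside the four content directories; the lower-case extensions and the `build_package` name are assumptions:

```python
import tarfile
from pathlib import Path

# Content subdirectories of the final package, as listed on the slide.
CONTENT_DIRS = ("master", "derived", "txt", "idx")


def build_package(object_dir: Path, object_id: str, out_dir: Path) -> Path:
    """Pack <object_id>.xml plus the content directories into <object_id>.tar.

    Member names are relative to object_dir, so the METS file's relative
    hrefs (e.g. "master/...") resolve again once the tarball is unpacked.
    """
    mets = object_dir / f"{object_id}.xml"
    if not mets.is_file():
        raise FileNotFoundError(mets)
    tar_path = out_dir / f"{object_id}.tar"
    with tarfile.open(tar_path, "w") as tar:
        tar.add(mets, arcname=mets.name)
        for sub in CONTENT_DIRS:
            folder = object_dir / sub
            if not folder.is_dir():
                raise FileNotFoundError(folder)
            # Sorted order keeps the archive listing reproducible between runs.
            for item in sorted(folder.rglob("*")):
                if item.is_file():
                    tar.add(item, arcname=item.relative_to(object_dir).as_posix())
    return tar_path
```

Failing loudly on a missing directory, rather than writing a partial tarball, keeps an incomplete object from reaching the “digital shelf”.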
Future • One tool for the entire process, from scanned images to METS • Tool would: • Extract technical metadata • Include descriptive metadata • Build flat-structure METS • Tool would require: • File-naming and directory-structuring conventions • Image file sources
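The METS-building step of such a tool could be sketched with `xml.etree.ElementTree`. This sketch reads “flat-structure” as a single level of page divisions under one physical structMap, and it leaves out the dmdSec and amdSec (MODS, MIX, PREMIS) that a full tool would add; the `ID` scheme and the function names are hypothetical:

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def m(tag: str) -> str:
    """Qualify a tag name with the METS namespace (Clark notation)."""
    return f"{{{METS_NS}}}{tag}"


def build_mets(object_id: str, groups: dict[str, list[str]]) -> bytes:
    """Build a skeletal METS document: one fileSec plus a physical structMap.

    `groups` maps a USE value (e.g. "master") to relative file paths in page
    order; every group must list the same, non-zero number of pages.
    """
    page_counts = {len(paths) for paths in groups.values()}
    if len(page_counts) != 1 or 0 in page_counts:
        raise ValueError("each file group needs one file per page")
    root = ET.Element(m("mets"), {"OBJID": object_id})
    file_sec = ET.SubElement(root, m("fileSec"))  # schema order: fileSec before structMap
    struct_map = ET.SubElement(root, m("structMap"), {"TYPE": "physical"})
    object_div = ET.SubElement(struct_map, m("div"))
    page_divs = [
        ET.SubElement(object_div, m("div"), {"TYPE": "page", "ORDER": str(n)})
        for n in range(1, page_counts.pop() + 1)
    ]
    for use, paths in groups.items():
        grp = ET.SubElement(file_sec, m("fileGrp"), {"USE": use})
        for n, path in enumerate(paths, start=1):
            file_id = f"{use.upper()}_{n:04d}"  # hypothetical ID scheme
            file_el = ET.SubElement(grp, m("file"), {"ID": file_id})
            ET.SubElement(file_el, m("FLocat"),
                          {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": path})
            # Each page div points at its file in every group via an fptr.
            ET.SubElement(page_divs[n - 1], m("fptr"), {"FILEID": file_id})
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)
```

Because the input is only a mapping of USE values to relative paths, the same function would serve any project that follows the file-naming and directory conventions, which is the abstraction the next slide argues for.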
Future: Advantages • Abstraction = standardisation • All digitisation projects will produce metadata in similar formats → interoperability • Certain technical base-standards will be present → preservation • Any centrally developed preservation or presentation system would be able to ingest output from any project • Saves the wasted effort of developing similar solutions many times, when one solution can be developed once and adapted
Future: Questions… • Usefulness of such a tool? • Relevance to your project? • Problems / obstacles? • How much flexibility is necessary? • Manual input / editing? • Main points: • Abstraction, functionality, flexibility
Further information • Ed Fay, Software Developer • BOPCRIS, Hartley Library • University of Southampton • ef1@soton.ac.uk • 023 8059 3575