Working with metadata in digital archives

Erpanet: Metadata in Digital Preservation
Marburg, 3-5 September 2003

Bill Roberts
bill.roberts@tessella.com
Tessella Support Services plc
3 Vineyard Chambers
Abingdon OX14 3PX
United Kingdom
www.tessella.com

Metadata functions
• Edit
• Import
• Search
• Collect
• View
• Store
• Export

Collect metadata (1)
• Some metadata must be entered manually: assist the user and prevent mistakes
• Avoid duplication by using record hierarchies
• Automation in the user environment (business process, workflow etc.)
• Automatic analysis of file properties
• Processing history (virus-checking results etc.)
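Much of the technical metadata listed above can be gathered without asking the user anything. The sketch below is an illustration only, not the system described in the talk: it collects size, modification time and a SHA-256 checksum for a file, and keeps a processing-history list to which later steps (for example a virus scan) append their results. The function names and the dictionary layout are hypothetical.

```python
import hashlib
import os
from datetime import datetime, timezone


def collect_file_properties(path: str) -> dict:
    """Gather technical metadata that needs no user input (hypothetical layout)."""
    sha256 = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 64 KiB chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            sha256.update(chunk)
    stat = os.stat(path)
    return {
        "filename": os.path.basename(path),
        "size_bytes": stat.st_size,
        "modified": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc).isoformat(),
        "sha256": sha256.hexdigest(),
        "history": [],  # processing events are appended here by later steps
    }


def record_event(metadata: dict, event: str, outcome: str) -> None:
    """Append one processing-history entry, e.g. the result of a virus check."""
    metadata["history"].append({
        "event": event,
        "outcome": outcome,
        "at": datetime.now(timezone.utc).isoformat(),
    })
```

A caller would run `collect_file_properties` once per incoming file and then `record_event(md, "virus_check", "clean")` after an (assumed) external scanner reports, so the processing history travels with the rest of the metadata.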

Collect metadata (2)
• UK National Archives Digital Archive: Stellent "OutsideIn"
  – analyses each file to determine its type
  – could also form part of an approach to extracting metadata from content
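Commercial tools such as OutsideIn use their own, far more thorough identification logic, which is not shown here. As a minimal sketch of the general idea, many formats can be recognised from a "magic number" at the start of the file. The signature table below covers only a few well-known formats and is illustrative; the function names are hypothetical.

```python
# A few well-known leading-byte signatures; real identification tools
# use much larger registries and also check internal structure.
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"PK\x03\x04", "ZIP container (e.g. OOXML, ODF)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (e.g. legacy MS Office)"),
]


def identify_format(header: bytes) -> str:
    """Return a format name for the first bytes of a file, or 'unknown'."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def identify_file(path: str) -> str:
    """Read just enough of the file to match the longest signature."""
    with open(path, "rb") as f:
        return identify_format(f.read(16))
```

Note that a ZIP signature only says "container": distinguishing, say, a word-processing document from a plain archive needs a further look inside, which is one reason dedicated tools are preferred.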

Collect metadata (3)
• Pfizer Central Electronic Archive
• Small metadata set
• Automatic collection of metadata
  – software agents on user servers
• Possible to do more:
  – improve ease of use
  – improve accuracy
• Pfizer aims to simplify provenance metadata

Import metadata (1)
• Transfer format: XML
• Link metadata to files during transfer
• Virus checking, file format analysis etc.
• Maintain loose coupling between system components through agreed interfaces
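To make the import step concrete, here is a minimal sketch assuming a hypothetical XML transfer manifest in which each `<record>` lists its `<file>` elements with a relative path and a SHA-256 checksum. The importer links each file to its record and verifies the checksum on arrival. The element and attribute names are invented for illustration; they are not the UK Digital Archive's actual transfer format.

```python
import hashlib
import xml.etree.ElementTree as ET
from pathlib import Path


def import_transfer(manifest_xml: str, base_dir: Path) -> list[dict]:
    """Link files to records using a hypothetical manifest and check fixity.

    Assumed manifest shape:
      <transfer><record id="..."><file path="..." sha256="..."/></record></transfer>
    """
    root = ET.fromstring(manifest_xml)
    linked = []
    for rec in root.iter("record"):
        for file_el in rec.findall("file"):
            path = base_dir / file_el.get("path")
            # Whole-file read keeps the sketch short; stream large files in practice.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            linked.append({
                "record_id": rec.get("id"),
                "file": str(path),
                "checksum_ok": digest == file_el.get("sha256"),
            })
    return linked
```

Because the importer depends only on the agreed manifest shape, the system that produces transfers and the archive that consumes them can be replaced independently, which is the loose coupling the slide asks for.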

Import metadata (2)
• Efficiency matters for large transfers
• XML can be expensive to process:
  – speed
  – memory: an in-memory DOM tree can be 20 times larger than the XML file
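One common way around the DOM memory cost is streaming (event-based) parsing. The sketch below uses Python's `xml.etree.ElementTree.iterparse`, which yields elements as they are completed; clearing the root after each record discards what has already been processed, so memory stays roughly proportional to one record rather than the whole transfer. It assumes, for simplicity, that `<record>` elements are direct children of the root and carry a `<title>` child; those names are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def stream_records(source):
    """Yield (id, title) for each <record> without keeping the whole tree.

    `source` is a filename or binary file object. Assumes records are
    direct children of the root element (hypothetical layout).
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == "record":
            yield elem.get("id"), elem.findtext("title")
            root.clear()  # drop processed records so they can be garbage-collected


if __name__ == "__main__":
    data = b'<transfer><record id="1"><title>A</title></record></transfer>'
    for rec_id, title in stream_records(io.BytesIO(data)):
        print(rec_id, title)
```

The trade-off is that streaming code sees one record at a time, so checks that need the whole document (for example cross-references between records) have to be done in a separate pass or by keeping a small index.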

Storage: requirements
• Don't lose it!
• Maintain links between metadata, records and files
• Find what you are looking for
• Retrieve it

Storage approaches
• Encapsulation vs. ease of access
• Volume of data
• Speed of searching vs. speed of import/export
• Typically metadata in a database and files on a file server

The National Archives (UK) Digital Archive approach
• Relational database for metadata, file server for computer files
• Metadata stored as XML documents in the database
• A few key elements stored in tables and indexed (unique identifier, PROCAT reference)
• Links between records, files, accessions and metadata managed in the database
• Subset of metadata identified as searchable: values extracted into a text-based index
• File contents not currently searchable
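The hybrid layout on this slide can be sketched with SQLite: the full metadata is kept as an XML text column, a couple of key elements get their own indexed columns, and only a chosen subset of elements is tokenised into a simple word index. This is a minimal sketch, not TNA's schema; the table names, the `SEARCHABLE` set and the word-splitting rule are all assumptions, and the `<GSMElement name="...">` form follows the next slide.

```python
import re
import sqlite3
import xml.etree.ElementTree as ET

SEARCHABLE = {"Title", "Description"}  # hypothetical searchable subset


def create_schema(conn: sqlite3.Connection) -> None:
    conn.executescript("""
        CREATE TABLE record (
            id TEXT PRIMARY KEY,            -- unique identifier (indexed by the key)
            procat_ref TEXT,                -- catalogue reference, indexed below
            metadata_xml TEXT NOT NULL      -- full metadata kept as an XML document
        );
        CREATE INDEX idx_record_procat ON record(procat_ref);
        CREATE TABLE search_index (
            term TEXT NOT NULL,
            record_id TEXT NOT NULL REFERENCES record(id)
        );
        CREATE INDEX idx_search_term ON search_index(term);
    """)


def store_record(conn, record_id: str, procat_ref: str, metadata_xml: str) -> None:
    conn.execute("INSERT INTO record VALUES (?, ?, ?)",
                 (record_id, procat_ref, metadata_xml))
    # Extract only the searchable elements' words into the text index.
    terms = set()
    for el in ET.fromstring(metadata_xml).iter("GSMElement"):
        if el.get("name") in SEARCHABLE and el.text:
            terms.update(re.findall(r"\w+", el.text.lower()))
    conn.executemany("INSERT INTO search_index VALUES (?, ?)",
                     [(t, record_id) for t in sorted(terms)])
    conn.commit()


def search(conn, word: str) -> list[str]:
    rows = conn.execute(
        "SELECT DISTINCT record_id FROM search_index WHERE term = ? ORDER BY record_id",
        (word.lower(),))
    return [r[0] for r in rows]
```

Keeping the XML intact means import and export need no reassembly, while the small set of extracted columns and terms is what makes lookup fast; adding a new searchable element only requires re-indexing, not a schema change.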

UK Digital Archive (2)
• Record and file metadata kept separately
• Flexible relationship between records and computer files
• Unlimited depth of record hierarchy (records can contain sub-records)
• Metadata is imported/exported as XML, so it is easier and quicker to store it as XML
• Designed for easy extension of the metadata (extracting metadata into database tables would make extension harder)
• <GSMElement name="Title"> rather than <Title>
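The last bullet is the key to easy extension: with a generic element whose name is an attribute, a new metadata element is just a new attribute value, so neither the XML schema nor the reading code has to change. A minimal sketch of reading and writing that form follows; only the `GSMElement`/`name` convention comes from the slide, while the `<metadata>` wrapper and the function names are assumptions.

```python
import xml.etree.ElementTree as ET


def read_generic(xml_text: str) -> dict[str, list[str]]:
    """Collect <GSMElement name="..."> values into name -> list of values."""
    values: dict[str, list[str]] = {}
    for el in ET.fromstring(xml_text).iter("GSMElement"):
        values.setdefault(el.get("name"), []).append(el.text or "")
    return values


def write_generic(values: dict[str, list[str]]) -> str:
    """Serialise name -> values back into the generic element form."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, items in values.items():
        for text in items:
            ET.SubElement(root, "GSMElement", name=name).text = text
    return ET.tostring(root, encoding="unicode")
```

With element-specific tags such as `<Title>`, adding a `Retention` element would mean a schema revision and new parsing code; here it is read exactly like any other element. The cost is that schema validation can no longer check which element names are allowed, so that check moves into application code.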

Alternatives
• VERS approach: metadata and content files encapsulated together within an XML file
• +ve: record is self-contained
• +ve: well suited to digital signatures covering both metadata and content
• -ve: more denormalisation required for access
• -ve: adding to or editing metadata is complex
• -ve: a file needed by more than one record must be duplicated
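To show what encapsulation means in practice, the sketch below packs metadata and a base64-encoded content file into one XML document, together with a SHA-256 digest over both parts. It is loosely modelled on the idea described on the slide and does not follow the real VERS schema; all element names are hypothetical, and the plain digest stands in for the digital signature a real scheme would apply.

```python
import base64
import hashlib
import xml.etree.ElementTree as ET


def encapsulate(metadata: dict[str, str], filename: str, content: bytes) -> str:
    """Build a self-contained record (hypothetical element names)."""
    root = ET.Element("encapsulatedRecord")
    md = ET.SubElement(root, "metadata")
    for name, value in metadata.items():
        ET.SubElement(md, "element", name=name).text = value
    obj = ET.SubElement(root, "content", filename=filename, encoding="base64")
    obj.text = base64.b64encode(content).decode("ascii")
    # Digest covers metadata and content together; a real scheme would sign it.
    covered = ET.tostring(md) + ET.tostring(obj)
    ET.SubElement(root, "digest", algorithm="SHA-256").text = hashlib.sha256(covered).hexdigest()
    return ET.tostring(root, encoding="unicode")


def extract(xml_text: str) -> tuple[dict[str, str], bytes, bool]:
    """Return metadata, decoded content, and whether the digest still matches."""
    root = ET.fromstring(xml_text)
    md, obj = root.find("metadata"), root.find("content")
    recomputed = hashlib.sha256(ET.tostring(md) + ET.tostring(obj)).hexdigest()
    metadata = {el.get("name"): el.text for el in md}
    return metadata, base64.b64decode(obj.text), recomputed == root.findtext("digest")
```

The example also makes the slide's drawbacks visible: editing one metadata value means rewriting the whole document and recomputing the digest, and a file shared by two records must be embedded twice. Re-serialising parsed XML to recompute a digest is fragile; production schemes rely on XML canonicalisation for that step.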

Interoperability
• Not much practical experience so far
• XML helps, but not much!
• Schemas are likely to be similar but not identical
• Different implementations of the same schema
• Short term: ad hoc mapping between schemas for specific systems
• Longer term: various initiatives, but standardisation and semantics-based approaches are difficult
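A short-term ad hoc mapping can be as simple as a lookup table from one system's element names to another's. In this sketch both schemas' element names are hypothetical. Elements with no counterpart are returned separately rather than dropped, because silent loss is the main risk when schemas are "similar but not identical".

```python
# Hypothetical mapping from archive A's element names to archive B's.
A_TO_B = {
    "Title": "title",
    "Creator": "author",
    "DateCreated": "created",
}


def map_record(record: dict[str, str], mapping: dict[str, str]) -> tuple[dict, dict]:
    """Translate element names; return (mapped, unmapped) for manual review."""
    mapped, unmapped = {}, {}
    for name, value in record.items():
        target = mapping.get(name)
        if target is None:
            unmapped[name] = value  # no equivalent in the target schema
        else:
            mapped[target] = value
    return mapped, unmapped
```

A pure name mapping handles only the easy cases; differences in meaning (for example whether a date records creation or accession) still need human judgement, which is why the slide calls semantics-based approaches difficult.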

Extending or changing the schema
• The schema may (will!) change in future
• No "one size fits all" approach
• TNA plans extensions to the core metadata by file type and by function
• Version control
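One way to apply version control to metadata is to tag each document with the schema version it was written against and keep a chain of small migration functions that upgrade old metadata step by step. The two schema changes below are invented purely to illustrate the pattern, and the `schema_version` and `Extensions` keys are hypothetical.

```python
CURRENT_VERSION = 3


def _v1_to_v2(md: dict) -> dict:
    # Hypothetical change: v2 renames "Author" to "Creator".
    md = dict(md)
    if "Author" in md:
        md["Creator"] = md.pop("Author")
    md["schema_version"] = 2
    return md


def _v2_to_v3(md: dict) -> dict:
    # Hypothetical change: v3 adds a slot for file-type or function-specific extensions.
    md = dict(md)
    md.setdefault("Extensions", {})
    md["schema_version"] = 3
    return md


MIGRATIONS = {1: _v1_to_v2, 2: _v2_to_v3}


def upgrade(md: dict) -> dict:
    """Apply migrations in order until the metadata reaches CURRENT_VERSION."""
    version = md.get("schema_version", 1)
    while version < CURRENT_VERSION:
        md = MIGRATIONS[version](md)
        version = md["schema_version"]
    return md
```

Each migration copies its input, so the stored original is left untouched; an archive might keep both versions for authenticity and use only the upgraded copy for access.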

Preservation metadata
• Maintain the ability to understand and authentically reproduce content files
• PRONOM system: separate database for file formats/accessibility
• KB preservation layer model approach
• Technology watch

Authentication/Integrity
• Digital signatures: has something changed? (simpler hashing algorithms can also answer this)
• Digital signatures: who signed it?
• Control access
• Audit logs
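The "has something changed?" question can be answered with a stored hash alone, and an audit log can be made tamper-evident by chaining each entry to the hash of the previous one. The sketch below combines the two; the class and function names are hypothetical. It does not answer "who signed it?", which needs public-key signatures and is not covered by Python's standard library.

```python
import hashlib
import json
from datetime import datetime, timezone


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class AuditLog:
    """Append-only log; each entry stores the hash of the previous entry."""

    GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

    def __init__(self):
        self.entries = []

    def append(self, user: str, action: str, target: str, outcome: str) -> None:
        prev = self.entries[-1]["hash"] if self.entries else self.GENESIS
        body = {"at": datetime.now(timezone.utc).isoformat(), "user": user,
                "action": action, "target": target, "outcome": outcome, "prev": prev}
        # Hash the entry (without its own hash) in a stable key order.
        body["hash"] = sha256_hex(json.dumps(body, sort_keys=True).encode())
        self.entries.append(body)

    def is_intact(self) -> bool:
        """Recompute every hash; any edited or reordered entry breaks the chain."""
        prev = self.GENESIS
        for e in self.entries:
            body = {k: v for k, v in e.items() if k != "hash"}
            if e["prev"] != prev or sha256_hex(json.dumps(body, sort_keys=True).encode()) != e["hash"]:
                return False
            prev = e["hash"]
        return True


def check_fixity(data: bytes, expected: str, log: AuditLog, user: str, target: str) -> bool:
    """Compare a file's current hash with the stored one and log the result."""
    ok = sha256_hex(data) == expected
    log.append(user, "fixity_check", target, "pass" if ok else "fail")
    return ok
```

Chaining only shows that the log was altered after the fact; someone able to rewrite the whole log could recompute every hash, so in practice the latest hash would be stored or signed somewhere separate.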

Conclusions
• Digital preservation is still a young discipline, so the "best" approach is not always clear
• Do something! Learn from experience
• Design for flexibility/replaceability: records must outlive any implementation
