Working with metadata in digital archives

Working with metadata in digital archives. Erpanet Metadata in Digital Preservation, Marburg, 3-5 September 2003. Bill Roberts, bill.roberts@tessella.com. Tessella Support Services plc, 3 Vineyard Chambers, Abingdon OX14 3PX, United Kingdom. www.tessella.com


Presentation Transcript


  1. Working with metadata in digital archives
     Erpanet Metadata in Digital Preservation, Marburg, 3-5 September 2003
     Bill Roberts, bill.roberts@tessella.com
     Tessella Support Services plc, 3 Vineyard Chambers, Abingdon OX14 3PX, United Kingdom
     www.tessella.com

  2. Metadata functions: Edit, Import, Search, Collect, View, Store, Export

  3. Collect metadata (1)
     • Some must be manual – assist user, prevent mistakes
     • Avoid duplication – record hierarchies
     • Automation in user environment (business process, workflow etc.)
     • Automatic analysis of file properties
     • Processing history (virus checking results etc.)
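
As an illustration of "automatic analysis of file properties", here is a minimal Python sketch that gathers metadata directly from a file with no user input. The function name and the dictionary keys are assumptions made for this example; they are not taken from any of the systems described in the talk.

```python
import hashlib
import os
from datetime import datetime, timezone


def collect_file_properties(path):
    """Gather metadata that can be read from the file itself (hypothetical helper)."""
    stat = os.stat(path)
    sha256 = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            sha256.update(chunk)
    return {
        "file_name": os.path.basename(path),
        "size_bytes": stat.st_size,
        "modified_utc": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc).isoformat(),
        "sha256": sha256.hexdigest(),
    }
```

Values collected this way never need to be typed by a user, which is one way to reduce both effort and mistakes.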

  4. Collect metadata (2)
     • UK National Archives Digital Archive – Stellent “OutsideIn”
     • Analyses file to determine type
     • Could also form part of approach to extract metadata from content
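
To show what "analyses file to determine type" can mean at its simplest, the sketch below checks the leading bytes of a file against a few well-known signatures. This is a generic technique and not the Stellent product's API; the `SIGNATURES` table and `identify_format` name are assumptions for illustration, and a real tool would recognise far more formats.

```python
# A few well-known leading byte signatures ("magic numbers").
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"PK\x03\x04", "application/zip"),
]


def identify_format(data):
    """Return a media type guessed from the first bytes of the content."""
    for magic, media_type in SIGNATURES:
        if data.startswith(magic):
            return media_type
    # Unknown signature: fall back to the generic binary type.
    return "application/octet-stream"
```

Identifying the type from content rather than from the file extension avoids trusting a name that a user may have changed.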

  5. Collect metadata (3)
     • Pfizer Central Electronic Archive
     • Small metadata set
     • Automatic collection of metadata
     • Software agents on user servers
     • Possible to do more
     • Improve ease of use
     • Improve accuracy
     • Pfizer aiming to simplify provenance metadata

  6. Import metadata (1)
     • Transfer format – XML
     • Link metadata to files during transfer
     • Virus checking, file format analysis etc.
     • Maintain loose coupling between components of system – agreed interfaces
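
One way to link metadata to files during transfer is a small XML manifest that lists each record with its files. The sketch below uses Python's standard `xml.etree.ElementTree`; the element and attribute names (`Transfer`, `Record`, `File`, `sha256`) are hypothetical and do not come from any schema mentioned in the talk.

```python
import xml.etree.ElementTree as ET


def build_transfer_manifest(records):
    """Serialise records and their file references into one XML document."""
    root = ET.Element("Transfer")
    for record in records:
        rec_el = ET.SubElement(root, "Record", id=record["id"])
        ET.SubElement(rec_el, "Title").text = record["title"]
        for file_info in record["files"]:
            # Each File element ties the record's metadata to one transferred file.
            ET.SubElement(rec_el, "File", path=file_info["path"], sha256=file_info["sha256"])
    return ET.tostring(root, encoding="unicode")
```

Because the manifest is the agreed interface, the sending and receiving systems can change internally without affecting each other.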

  7. Import metadata (2)
     • Efficiency – large transfers
     • XML can be expensive to process
     • Speed
     • Memory – DOM can be 20 times larger than XML file
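
A common way to avoid building a full DOM for a large transfer is to parse the XML as a stream and discard each record once it has been handled. The sketch below uses `xml.etree.ElementTree.iterparse` from the Python standard library; the `Record` and `Title` element names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def iter_records(source):
    """Yield (record id, title) pairs one at a time from a transfer file.

    `source` may be a file name or a binary file object.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "Record":
            yield elem.get("id"), elem.findtext("Title")
            # Drop the record's children and text once it has been processed,
            # so finished records do not accumulate in memory.
            elem.clear()
```

The trade-off is that streaming code cannot look backwards or forwards in the document, so it suits imports that handle one record at a time.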

  8. Storage - requirements
     • Don’t lose it!
     • Maintain links between metadata, records and files
     • Find what you are looking for
     • Retrieve

  9. Storage approaches
     • Encapsulation vs. ease of access
     • Volume of data
     • Speed of searching vs. speed of import/export
     • Typically metadata in database and files on file server

  10. The National Archives (UK) Digital Archive approach
      • Relational database for metadata, file server for computer files
      • Metadata stored as XML documents in database
      • A few key elements stored in tables and indexed (unique identifier, PROCAT reference)
      • Links between records, files, accessions, metadata managed in database
      • Subset of metadata identified as searchable – values extracted into text based index
      • File contents not currently searchable
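
The general pattern described here (whole metadata document kept as XML, a few key elements in indexed columns, searchable values copied into a separate index) can be sketched with Python's built-in `sqlite3`. The table names, column names and the `SEARCHABLE` list are assumptions for this example and do not reflect the actual Digital Archive schema.

```python
import sqlite3
import xml.etree.ElementTree as ET

SEARCHABLE = ("Title", "Creator")  # hypothetical subset of searchable elements


def open_store():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE record (
            identifier TEXT PRIMARY KEY,   -- key element in its own indexed column
            catalogue_ref TEXT,
            metadata_xml TEXT NOT NULL     -- full metadata document stored as XML
        );
        CREATE INDEX idx_record_catalogue_ref ON record (catalogue_ref);
        CREATE TABLE search_index (
            identifier TEXT REFERENCES record (identifier),
            element TEXT,
            value TEXT
        );
    """)
    return conn


def store_record(conn, identifier, catalogue_ref, metadata_xml):
    conn.execute("INSERT INTO record VALUES (?, ?, ?)",
                 (identifier, catalogue_ref, metadata_xml))
    root = ET.fromstring(metadata_xml)
    for name in SEARCHABLE:
        for el in root.iter(name):
            # Copy only the searchable values out of the XML document.
            conn.execute("INSERT INTO search_index VALUES (?, ?, ?)",
                         (identifier, name, el.text or ""))


def search(conn, element, term):
    rows = conn.execute(
        "SELECT identifier FROM search_index WHERE element = ? AND value LIKE ?",
        (element, f"%{term}%"),
    )
    return [row[0] for row in rows]
```

Adding a new metadata element then needs no table change unless that element must also be searchable.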

  11. UK Digital Archive (2)
      • Record and file metadata kept separately
      • Flexible relationship between records and computer files
      • Unlimited depth of record hierarchy (records can contain sub-records)
      • Metadata imported/exported as XML so easier/quicker to store as XML
      • Designed for ease of extension to metadata (disadvantage of extracting metadata into database tables)
      • <GSMElement name="Title"> rather than <Title>
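
The point of `<GSMElement name="Title">` over `<Title>` is that reading code can stay the same when new element names are introduced. A minimal Python sketch, assuming a hypothetical `Metadata` wrapper element around the generic elements:

```python
import xml.etree.ElementTree as ET


def read_generic_elements(xml_text):
    """Collect every GSMElement into a dict keyed by its `name` attribute."""
    root = ET.fromstring(xml_text)
    values = {}
    for el in root.iter("GSMElement"):
        # Repeated names are kept as a list, so multi-valued elements survive.
        values.setdefault(el.get("name"), []).append((el.text or "").strip())
    return values
```

A document that later gains, say, a `name="Language"` element is read by this function without any change, whereas element-per-field markup would need new parsing code or a revised schema.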

  12. Alternatives
      • VERS approach: metadata and content files encapsulated together within XML file
      • +ve: record is self-contained
      • +ve: well-suited to use of digital signatures on both metadata and content
      • -ve: more denormalisation required for access
      • -ve: complexity of adding to or editing metadata
      • -ve: if file is needed for more than one record, must be duplicated
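
To make the encapsulation trade-off concrete, the sketch below embeds a file's bytes, Base64-encoded, next to its metadata in a single XML document. It illustrates encapsulation in general only: the element names are hypothetical and this is not the VERS object schema.

```python
import base64
import xml.etree.ElementTree as ET


def encapsulate(title, file_name, content):
    """Wrap metadata and file content together in one self-contained XML document."""
    root = ET.Element("EncapsulatedRecord")
    ET.SubElement(root, "Title").text = title
    content_el = ET.SubElement(root, "Content", name=file_name, encoding="base64")
    # Binary content must be text-encoded before it can live inside XML.
    content_el.text = base64.b64encode(content).decode("ascii")
    return ET.tostring(root, encoding="unicode")


def extract_content(xml_text):
    """Recover the original bytes; access always requires parsing and decoding."""
    root = ET.fromstring(xml_text)
    return base64.b64decode(root.findtext("Content"))
```

The record travels as one unit and can be signed as a whole, but every read pays the decoding cost, and a file shared by two records is stored twice.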

  13. Interoperability
      • Not much experience in practice so far
      • XML helps - but not much!
      • Likely to be similar but not identical schemas
      • Different implementations of same schema
      • Short term: ad hoc mapping between schemas for specific systems
      • Longer term: various initiatives, but standardisation and semantics-based approaches are difficult
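
An ad hoc mapping between two similar schemas often amounts to a lookup table from one set of field names to the other. The Python sketch below assumes hypothetical field names on both sides; it also reports anything it could not map, since similar-but-not-identical schemas rarely line up completely.

```python
# Hypothetical mapping from a source system's field names to a target system's.
FIELD_MAP = {
    "Title": "title",
    "Creator": "author",
    "DateCreated": "created",
}


def map_record(source, field_map=FIELD_MAP):
    """Return (mapped, unmapped) dictionaries for one metadata record."""
    mapped, unmapped = {}, {}
    for key, value in source.items():
        if key in field_map:
            mapped[field_map[key]] = value
        else:
            # Keep what cannot be mapped so nothing is silently lost.
            unmapped[key] = value
    return mapped, unmapped
```

A table like this handles renamed fields only; differences in meaning or granularity between schemas still need case-by-case decisions.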

  14. Extending or changing the schema
      • Schema may (will!) change in future
      • No “one size fits all” approach
      • TNA plans for extensions to core metadata according to file type and according to function
      • Version control

  15. Preservation metadata
      • Maintain ability to understand and authentically reproduce content files
      • PRONOM system – separate database for file formats/accessibility
      • KB preservation layer model approach
      • Technology watch

  16. Authentication/Integrity
      • Digital signatures – has something changed? (also simpler hashing algorithms)
      • Digital signatures – who signed it?
      • Control access
      • Audit logs
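
The "has something changed?" question can be answered with a plain hash recorded at ingest and recomputed later. The Python sketch below uses SHA-256 from the standard library; the function names are assumptions for illustration, and the choice of SHA-256 is this example's, not the talk's.

```python
import hashlib
import hmac


def fixity_digest(content):
    """Compute the digest to record alongside the file's metadata at ingest."""
    return hashlib.sha256(content).hexdigest()


def is_unchanged(content, recorded_digest):
    """Recompute the digest and compare it with the recorded value."""
    return hmac.compare_digest(fixity_digest(content), recorded_digest)
```

A bare hash detects change but says nothing about who produced the content; answering "who signed it?" needs a digital signature made with a private key.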

  17. Conclusions
      • Digital preservation is still a young discipline, so “best” approach not always clear
      • Do something! Learn from experience
      • Design for flexibility/replaceability – records must outlive any implementation
