
Classifying Historical Documents

This article discusses the challenges of classifying historical documents and presents a solution using faceted classification. It explores the use of metadata, semantic networks, and ergonomic user interfaces to improve the classification process. The ARCHON classification system is also introduced.

marcusb




Presentation Transcript


  1. Classifying Historical Documents
     Maria Theodoridou, Martin Doerr
     Institute of Computer Science, Foundation for Research and Technology - Hellas, Heraklion, Crete

  2. The classification problem
     • Automatic transcription not possible
       • inaccurate OCR software
       • interpretation dependent
     • Manual keyword assignment
       • time-consuming process
       • keywords not necessarily unique
       • inconsistent between users
       • not obvious for users in retrieval
     • Complete classification only on parts of the database
       • by different aspects
       • at different times
       • by different people

  3. Archival standards
     • Dublin Core: Metadata Elements
     • EAD: Encoded Archival Description, Document Type Definition
     • ISAD(G): General International Standard Archival Description

  4. Task Analysis
     • Archivist maintains the inventory
       • organizes fonds and subfonds (manageable units and provenance)
       • assigns identification numbers to ensure integrity
       • documents provenance and chronology of collective units
     • Handling of the material is hazardous to health and to the material
       • replace access by electronic surrogate
       • preserve electronic copies for preservation of contents
     • Researchers are granted access to study parts
       • focused studies, resulting in publications
       • primary information partially overlaps between studies

  5. Idea of Operation
     • Scanned images replace access to originals
     • Researchers should leave core documentation on partial contents
     • Ergonomic classification user interface (minutes per document)
     • Thesauri assist classification

  6. Classification structure
     • Classification by semantic net of metadata
     • Analysis of entities of the archive material
     • Classification of documents by:
       • (1) date and type of administrative act
       • (2) described activities
     • Syntactic structure to describe multiple and nested activities
     • Notion of identity of persons, places, objects
     • Coherent classification on instance and concept level

  7. Historical Archives: Modelling collections (diagram)
     • Entities: ArchivalDescription, ArchivalType, Fonds, Subfonds, FilmArchive, CurrentFonds, HistoricalFonds, Current Subfonds, Historical Subfonds
     • Relations: belongs_to, subfonds_of, copy_of, copy_of_part, part_of
     • Relation kinds: derived_from, structural, corresponding
     • Link notation: classification, generalisation, attribute

  8. Historical Archives: Modelling collections and objects (diagram)
     • Entities: ArchivalDescription, ArchivalType, Conceptual ArchivalType, Physical ArchivalType, Fonds, Subfonds, Item, UnitOfDescription, FilmArchive, CurrentFonds, HistoricalFonds, Current Subfonds, Historical Subfonds
     • Relations: belongs_to (s), subfonds_of (s), copy_of (d), copy_of_part (d), part_of (c), originates_from (c), kept_in (c)
     • Relation kinds: derived_from (d), structural (s), corresponding (c)
     • Link notation: classification, generalisation, attribute

  9. Historical Archives: Modelling objects vs. contents (diagram)
     • Entities: ArchivalDescription, ArchivalType, Conceptual ArchivalType, Physical ArchivalType, Item, UnitOfDescription, Series, ItemUnit, Document, Picture, Book, BookPage, Photograph, Shot, Sheet, SheetPage, File, Microfilm
     • Relations: contains (s), contains_first (s), contains_second (s), corresponds_to (c), copy_of (d)
     • Relation kinds: derived_from (d), structural (s), corresponding (c)
     • Link notation: classification, generalisation, attribute

  10. Historical Archives: Modelling processes (diagram)
     • Entities: Occurence, DescriptionType, EventType, ActionType, ArchivalDescription, ArchivalType, PhysicalArchivalType, ConceptualArchivalType, ElectronicDocumentType, ElectronicProcessingType, Fonds, UnitOfDescription, Item, ItemUnit, ElectronicDocument, ElectronicProcessing, Editing, Scanning, Transcription, Translation, SheetPage, ScannedPage, Document, Picture
     • Relations: history, result, product, corresponds_to, produced_from
     • Relation kinds: derived_from, structural, corresponding
     • Link notation: classification, generalisation, attribute

  11. Historical Archives: The Facets
     • Four levels:
       • The act of documentation
       • The act of administration
       • The targeted social activity
       • Other related activities and items
     • Questions that need to be answered:
       • Who? Persons and organizations
       • Where? Places
       • When? Time
       • What? Objects
       • How? Activities and actions

  12. Historical Archives: Faceted classification by concepts (diagram)
     • Layers shown: Facet; Polyhierarchies; Instances (metadata); Manuscripts’ Digital Library

  13. Historical Archives: Faceted classification by concepts - An example (diagram)
     • Facet: Persons and Organisations; Places
     • Polyhierarchies: Individuals live in Houses
     • Instances (metadata): Martin; house nr.415 (is Martin’s)
     • Manuscripts’ Digital Library
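The layering in slides 12 and 13 (facets, polyhierarchies of concepts, and instances classified under them) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Facet` class, its method names, and the broader concept "Buildings" are assumptions made for illustration; only "Persons and Organisations", "Places", "Individuals", "Houses", "Martin" and "house nr.415" come from the slide.

```python
from collections import defaultdict


class Facet:
    """A facet: a polyhierarchy of concepts plus the instances classified under them."""

    def __init__(self, name):
        self.name = name
        self.parents = defaultdict(set)    # concept -> broader concepts (possibly several)
        self.instances = defaultdict(set)  # concept -> instances classified directly under it

    def add_concept(self, concept, *broader):
        """Register a concept with zero or more broader concepts."""
        self.parents[concept].update(broader)

    def classify(self, instance, concept):
        """Attach an instance (metadata level) to a concept."""
        self.instances[concept].add(instance)

    def narrower(self, concept):
        """Return the concept together with every concept below it."""
        found = {concept}
        changed = True
        while changed:
            changed = False
            for child, broader in self.parents.items():
                if child not in found and broader & found:
                    found.add(child)
                    changed = True
        return found

    def instances_of(self, concept):
        """Return instances classified under the concept or any narrower concept."""
        result = set()
        for c in self.narrower(concept):
            result |= self.instances.get(c, set())
        return result


persons = Facet("Persons and Organisations")
persons.add_concept("Individuals")
persons.classify("Martin", "Individuals")

places = Facet("Places")
places.add_concept("Buildings")            # hypothetical broader concept, not on the slide
places.add_concept("Houses", "Buildings")
places.classify("house nr.415", "Houses")

# Concept level: Individuals live in Houses. Instance level: the house is Martin's.
lives_in = {"Martin": "house nr.415"}
```

Querying a broad concept such as `places.instances_of("Buildings")` then also returns instances classified under "Houses", which is the practical benefit of classifying by concept rather than by free keyword.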

  14. Historical Archives: The ARCHON classification
     • Item
       • has type: Document Type
       • has publication date: Date
       • has creation date: Date
       • has description: Activity
         • has activity type: Activity Type
         • has actor type: Actor Type
         • has object type: Object Type
         • has place type: Place Type
         • happened at: Date
         • has actor: Actor
           • has type: Actor Type
         • has place: Place
           • has type: Place Type
         • has object: Object
           • has type: Object Type
         • has related activity: Activity

  15. Historical Archives: The ARCHON classification
     • Where:
       • Activity Type = marriage, selling, condemnation, tax regulation, statistics, ...
       • Actor Type = Pasha, judge, farmer, ..., but also: Witness
       • Place Type = city, village, monastery, prefecture, ...
       • Object Type = house, payment, privilege, ...
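The ARCHON structure in slides 14 and 15 reads naturally as a set of nested records. The following Python sketch is one possible rendering under stated assumptions, not the ARCHON schema itself: the class and field names are invented here, dates are kept as plain strings, and the sample document (its type, date and names) is hypothetical. The type values ("selling", "tax regulation", "farmer", "Witness", "house", "village") are taken from slide 15.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Actor:
    name: str
    actor_type: str      # e.g. "Pasha", "judge", "farmer", "Witness"


@dataclass
class Place:
    name: str
    place_type: str      # e.g. "city", "village", "monastery", "prefecture"


@dataclass
class Obj:
    name: str
    object_type: str     # e.g. "house", "payment", "privilege"


@dataclass
class Activity:
    activity_type: str                                  # e.g. "marriage", "selling"
    happened_at: Optional[str] = None                   # assumption: dates kept as text
    # Concept-level classification, usable even when no instance is identified:
    actor_types: List[str] = field(default_factory=list)
    object_types: List[str] = field(default_factory=list)
    place_types: List[str] = field(default_factory=list)
    # Instance-level classification:
    actors: List[Actor] = field(default_factory=list)
    places: List[Place] = field(default_factory=list)
    objects: List[Obj] = field(default_factory=list)
    related: List["Activity"] = field(default_factory=list)  # nested activities


@dataclass
class Item:
    document_type: str
    description: Activity
    publication_date: Optional[str] = None
    creation_date: Optional[str] = None


def all_activities(activity: Activity) -> Iterator[Activity]:
    """Yield an activity followed by all related activities, depth first."""
    yield activity
    for rel in activity.related:
        yield from all_activities(rel)


# Hypothetical example document: a sale of a house with a related tax regulation.
tax = Activity(activity_type="tax regulation", object_types=["payment"])
sale = Activity(
    activity_type="selling",
    actors=[Actor("Martin", "farmer"), Actor("unnamed witness", "Witness")],
    places=[Place("example village", "village")],
    objects=[Obj("house nr.415", "house")],
    related=[tax],
)
item = Item(document_type="court record", description=sale)
```

Walking `all_activities(item.description)` visits the sale and then the related tax regulation, which shows how the nested structure keeps several activities of one document apart instead of flattening them into one keyword list.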

  16. ARCHON facets and hierarchies (diagram, labels translated from Greek)
     • Node kinds: Facet type, ARXONFacet, ARXONHierarchy
     • Labels, in the order they appear: Description, Document, Activity, Actor, Object, Place, Time, Administrative Acts, Organisation, Movable, Buildings, Muslim Month, Content, Person, Judicial Cases, Immovable, Natural Place, Christian Month, Role in the case, Other, Non-material, Administrative Place, Presence in the case, Issuer/Recipient
     • Link notation: classification, generalization, attribute

  17. Classifying Historical Documents: Conclusions
     • Faceted classification by concepts
       • has high precision
       • maintains identity of concepts and not keywords
       • creates a base of domain knowledge
       • preserves the syntactic structure of the expression used for the classification
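The point that classification "maintains identity of concepts and not keywords" can be shown with a small Python sketch. It assumes a thesaurus that maps several labels to one concept identifier; the labels, the identifiers, the `ConceptIndex` class and the document ids are all hypothetical and serve only as illustration.

```python
from collections import defaultdict

# Hypothetical thesaurus: several labels (keywords) resolve to one concept identifier.
LABEL_TO_CONCEPT = {
    "selling": "activity:selling",
    "sale": "activity:selling",
    "monastery": "place:monastery",
    "cloister": "place:monastery",
}


class ConceptIndex:
    """Index documents by concept identifier instead of by the keyword typed."""

    def __init__(self, thesaurus):
        self.thesaurus = thesaurus
        self.docs_by_concept = defaultdict(set)

    def classify(self, doc_id, label):
        """Resolve the label to its concept and index the document under it."""
        concept = self.thesaurus[label.lower()]
        self.docs_by_concept[concept].add(doc_id)
        return concept

    def search(self, label):
        """Return documents classified under the concept that the label names."""
        concept = self.thesaurus.get(label.lower())
        if concept is None:
            return set()
        return set(self.docs_by_concept.get(concept, set()))


index = ConceptIndex(LABEL_TO_CONCEPT)
index.classify("doc-1", "sale")      # one user chooses "sale"
index.classify("doc-2", "selling")   # another user chooses "selling"
```

Because both labels resolve to the same concept, `index.search("selling")` and `index.search("sale")` return the same two documents, regardless of which word each classifier used.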
