Integration of complementary archaeological sources
Martin Doerr, Maria Theodoridou: ICS-FORTH, Heraklion, Crete, Greece
Kurt Schaller: Magistrat der Stadt Wien, Geschäftsgruppe Kultur und Wissenschaft – Stadtarchäologie, Wien, Austria
Outline
• Problem statement – Working context
• Objective
• Approach
• Technical description
• Results
• Conclusion, future work
Project VBI ERAT LVPA: The Internet Tracks of the Roman She-Wolf
• Traditional corpora:
  • very high quality, difficult to maintain, difficult to search, not correlated with complementary resources
• New database projects:
  • varying quality, overlapping contents, continuously updated, easy to search, not correlated with each other
• Altogether:
  • a conglomerate of highly interrelated archaeological sources
  • of overwhelming detail and volume
• Ubi-erat-lupa: a European “Culture 2000” project
  • an aggregation of complementary scientific databases and corpora describing finds with inscriptions and iconography of the Roman era
  • to create a body of unique archaeological knowledge in digital form
VBI ERAT LVPA: Objective
• creation of a global index over a set of semi-autonomous sources, giving global access to the unified knowledge
• integration of complementary information under a common ontology/schema and identification of common elements in different sources
• development of an integration algorithm that converges to the best state of knowledge under continuous updates
• creation of a research tool for formulating queries on archaeological content, to detect contextual relationships that cannot be derived from interpreting the sources in isolation
Approach
• Develop a semantic network based on the CIDOC CRM model to integrate the complementary archaeological sources.
• Data relevant to global querying over all contents are extracted, transformed and stored in an RDF repository that is incrementally updated over time.
• Integration in two phases:
  • the source schema is intellectually interpreted in terms of the CIDOC CRM model
  • “non-canonical” data are reported to the respective source
  • mistakes in the sources are removed, improving source quality
  • the actual data are automatically transformed and stored in the RDF repository
  • an a posteriori data-cleaning process removes as many duplicates as can be (semi-)automatically detected
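The two phases can be illustrated with a minimal Python sketch. It assumes the intellectual phase yields a declarative field-to-property table (`LUPA_FIELD_MAP`), a small list of canonical types (`CANONICAL_TYPES`) and a flat record layout (`id`, `type`, `title`, `find_place`); all of these names are hypothetical, and triples are plain tuples rather than a real RDF store.

```python
# Hypothetical result of the intellectual phase: source field -> CRM property.
LUPA_FIELD_MAP = {
    "type": "P2F.has_type",
    "title": "P102F.has_title",
}
# Hypothetical extract of a type thesaurus; other values count as non-canonical.
CANONICAL_TYPES = {"Stele", "Altar"}


def transform(source, records, field_map=LUPA_FIELD_MAP):
    """Automatic phase: turn source records into (subject, property, object) triples.

    Returns (triples, report); the report lists non-canonical values that
    are sent back to the source instead of being loaded.
    """
    triples, report = [], []
    prefix = source.upper()
    for rec in records:
        obj = f"PO:{prefix}.{rec['id']}"
        triples.append((obj, "P1F.is_identified_by", f"OID:{prefix}.{rec['id']}"))
        for field, prop in field_map.items():
            value = rec.get(field)
            if value is None:
                continue
            if field == "type" and value not in CANONICAL_TYPES:
                report.append((obj, field, value))  # reported, not loaded
                continue
            triples.append((obj, prop, value))
        place = rec.get("find_place")
        if place:
            # The find place is modeled through a discovery event (P12, P7).
            event = f"DiscoveryOf:{obj}"
            triples.append((obj, "P12B.was_present_at", event))
            triples.append((event, "P7F.took_place_at", place))
    return triples, report


records = [
    {"id": 5, "type": "Stele", "title": "Stele des C. Iulius", "find_place": "Petronell"},
    {"id": 9999, "type": "stele?"},  # hypothetical record with a non-canonical type
]
triples, report = transform("lupa", records)
```

Keeping the mapping as data rather than code means the automatic phase can simply be rerun whenever a source is updated, while the report goes back to the source maintainers.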
The CIDOC CRM: Top-level Entities relevant for Integration
[Figure: entities E22 Man-Made Object, E34 Inscription, E5 Event, E53 Place, E39 Actor, E55 Types, E41 Appellations and E31 Document, linked by the relations “refer to / refine”, “participate in”, “is documented in”, “location at”, “refer to / identify” and “affect or / refer to”.]
The CIDOC CRM – VBI-ERAT-LVPA Repository
[Figure: the sources (stone databases, name databases and epigraphic corpora: arachne, lupa, CSIR, OPEL, CIL, AE) are indexed into derived knowledge data (RDF) about CRM entities (objects, events, places); the CIDOC CRM ontology is expanded with thesauri extending the CRM entities and with background knowledge / authorities.]
Complementary archaeological sources
• Stone databases
  • Lupa: 7,000 archaeological records, City of Vienna, Austria
  • Arachne: 40,000 archaeological records, Antike Plastik, Cologne
• Name databases
  • ONOMASTICON PROVINCIARVM EVROPAE LATINARVM (OPEL): information about the number and distribution of Roman names in the European provinces of the empire, City of Vienna, Austria
• Epigraphic corpora
  • CIL: Corpus Inscriptionum Latinarum
  • AE: L'Année Epigraphique
  • Inscriptions Clauss/Slaby: University of Frankfurt
• Thesauri / dictionaries
  • TGN: Getty Thesaurus of Geographic Names
  • Alexandria DL Gazetteer: 5,000,000 current place names (web service)
  • Barrington Atlas of the Greek and Roman World, Map-by-Map Directory: provides information about every place or feature in the Atlas
Mapping stone databases to CIDOC CRM
[Figure: the Lupa record PO:LUPA.5 (E22 Man-Made Object) is identified by (P1F) OID:LUPA.5 (E42 Object Identifier), has type (P2F) Stele (E55 Type) and has title (P102F) “Stele des C. Iulius” (E35 Title). It is documented in (P70B) CIT:Vorbeck, Militarinschr (1980) Nr. 182 and CIT:CIL III 13483 (E31 Document), which form part of (P106B) Vorbeck and CIL III. Through the discovery event DiscoveryOf:PO:LUPA.5 (E5 Event, P12B was present at) and its current location, it is linked to the places Petronell, Petronell (Carnuntum) and Pannonia (E53 Place) via P7F took place at, P55F has current location and P89F falls within.]
Mapping stone databases to CIDOC CRM (inscription)
[Figure: PO:LUPA.5 (E22 Man-Made Object) shows visual item (P65F) INSC:PO:LUPA.5 (E34 Inscription). The citations CIT:Vorbeck, Militarinschr (1980) Nr. 182 and CIT:CIL III, 13483 (E41 Appellation) relate it to the documents Vorbeck and CIL III (E31 Document) via P1F is identified by, P70B is documented in and P106B forms part of. The inscription carries three literals:
• P150F shows characters: CIULIUSCFCORNETHESSALMILLEGXVAPOLLIANNXXXISTIPXIIHSECCLVIUSETBASSUSLHP
• P151F has transcription: C(aius) Iulius C(ai) f(ilius) Corne(lia) Thessal(onica) mil(es) leg(ionis) XV Apolli(naris) ann(orum) XXXI stip(endiorum) XII h(ic) s(itus) e(st) C(aius) Cl(u)vius et Bassus l(ibertus) h(eredes) p(osuerunt)
• P152F has clear text: Caius Iulius Cai filius Cornelia Thessalonica miles legionis XV Apollinaris annorum XXXI stipendiorum XII hic situs est Caius Cluvius et Bassus libertus heredes posuerunt]
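The three literals are not independent: in this example both the clear text and the character string can be derived from the transcription. A minimal Python sketch, assuming the transcription only uses parentheses for expanded abbreviations and `/` for line breaks (full Leiden conventions, such as square brackets for restorations, are not handled); the function names are hypothetical.

```python
import re


def clear_text(transcription):
    """Keep the expansions; drop the parentheses and line-break slashes."""
    text = transcription.replace("/", " ").replace("(", "").replace(")", "")
    return " ".join(text.split())


def shown_characters(transcription):
    """Drop the expansions, then remove spaces and slashes and upper-case."""
    text = re.sub(r"\([^)]*\)", "", transcription)
    return re.sub(r"[\s/]", "", text).upper()


LUPA_5 = ("C(aius) Iulius C(ai) f(ilius) Corne(lia) Thessal(onica) mil(es) "
          "leg(ionis) XV Apolli(naris) ann(orum) XXXI stip(endiorum) XII "
          "h(ic) s(itus) e(st) C(aius) Cl(u)vius et Bassus l(ibertus) "
          "h(eredes) p(osuerunt)")

print(clear_text(LUPA_5))
print(shown_characters(LUPA_5))
```

Applied to `LUPA_5`, the two functions reproduce the character string and, up to spacing, the clear text shown for INSC:PO:LUPA.5; the same `shown_characters` rule also yields the CIL character string from the slash-separated CIL transcription.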
Mapping epigraphic corpora to CIDOC CRM
[Figure: the CIL inscription INSC:CIL III, 13483 (E34 Inscription) carries P150F shows characters “CIULIUSCFCORNETHESSALOMILLEGXVAPOLLIANNXXXISTIPXIIHSECCLUVIUSETBASSUSLHP” and P151F has transcription “C(aius) Iulius/C(ai) f(ilius) Corne(lia)/Thessalo(nica)/mil(es) leg(ionis) XV/Apolli(naris) ann(orum)/XXXI stip(endiorum) XII/h(ic) s(itus) e(st)/C(aius) Cluvius/et Bassus/l(ibertus) h(eredes) p(osuerunt)”. The citations CIT:CIL III, 13483, CIT:IG-10-02-01, 01033 and CIT:AE 1896, 00024 (E41 Appellation) relate it to the documents DOC:CIL III, DOC:IG-10-02-01 and DOC:AE 1896 (E31 Document) via P1F is identified by, P70B is documented in and P106B forms part of.]
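The character strings from the two sources are close but not identical (THESSAL vs. THESSALO, CLVIUS vs. CLUVIUS), so linking the LUPA inscription to its CIL counterpart needs an approximate comparison. The sketch below uses Python's standard `difflib.SequenceMatcher` as one possible measure; the source does not say which matching method the project used, and any acceptance threshold would be an assumption.

```python
from difflib import SequenceMatcher

# Character strings as given for INSC:PO:LUPA.5 and INSC:CIL III, 13483.
LUPA_CHARS = "CIULIUSCFCORNETHESSALMILLEGXVAPOLLIANNXXXISTIPXIIHSECCLVIUSETBASSUSLHP"
CIL_CHARS = "CIULIUSCFCORNETHESSALOMILLEGXVAPOLLIANNXXXISTIPXIIHSECCLUVIUSETBASSUSLHP"


def similarity(a, b):
    """Ratio in [0, 1]; 1.0 means identical strings."""
    return SequenceMatcher(None, a, b).ratio()


def differences(a, b):
    """List the non-equal edit operations as (tag, text in a, text in b)."""
    matcher = SequenceMatcher(None, a, b)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


print(round(similarity(LUPA_CHARS, CIL_CHARS), 3))
print(differences(LUPA_CHARS, CIL_CHARS))
```

Listing the differences, not just the score, lets an editor see that the two readings disagree only in single letters, which is the kind of evidence a semi-automatic identification step would present for confirmation.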
Mapping OPEL to CIDOC CRM
[Figure: the OPEL name forms APPEL: BASSVS*, APPEL: Bassus, APPEL: Bassa, APPEL: Bassu[s] and APPEL: [B]assa (E41 Appellation) have type (P2F) cognomina (E55 Type) and are linked by P139F has alternative form. They are referred to by (P67B) the inscription INSC:CIL III 13483 (E34 Inscription), which is documented in (P70B) CIT: CIL III 13483, part of (P106B) CIL III (E31 Document), and is shown by (P65B) the object PO:CIL III 13483 (E24 Physical Man-Made Stuff). The object was present at (P12B) the event DiscoveryOf:PO:CIL III 13483 (E5 Event), which took place at (P7F) Pannonia (E53 Place).]
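The OPEL forms differ mainly in editorial marks and spelling (restoration brackets, the asterisk on the headword, V written for U). A minimal Python sketch of grouping such forms under a common key and emitting P139 has-alternative-form links; the key (drop `[`, `]` and `*`, upper-case, treat U and V as one letter) is a hypothetical normalization, not OPEL's documented rule, and the meaning of the asterisk is not stated in the source.

```python
import re
from collections import defaultdict


def name_key(form):
    """Hypothetical matching key: strip [ ] and *, upper-case, fold U into V."""
    bare = re.sub(r"[\[\]*]", "", form)
    return bare.upper().replace("U", "V")


def group_forms(forms):
    """Group name forms by key, keeping first-seen order inside each group."""
    groups = defaultdict(list)
    for form in forms:
        groups[name_key(form)].append(form)
    return dict(groups)


def alternative_form_triples(groups):
    """Link every further form of a group to the group's first form (P139)."""
    return [
        (forms[0], "P139F.has_alternative_form", other)
        for forms in groups.values()
        for other in forms[1:]
    ]


groups = group_forms(["BASSVS*", "Bassus", "Bassu[s]", "Bassa", "[B]assa"])
links = alternative_form_triples(groups)
```

This key keeps Bassus and Bassa in separate groups; any link between the two would need an additional, explicit rule.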
Integration Into One Resource
[Figure: the graphs from the stone databases, epigraphic corpora, name databases and thesauri/dictionaries combined into one network. PO:LUPA.5 (E22 Man-Made Object) and PO:CIL III 13483 (E24 Physical Man-Made Stuff), their inscriptions INSC:PO:LUPA.5 and INSC:CIL III 13483, the appellations APPEL: Bassus and APPEL: BASSVS*, the citations CIT:CILIII, 13483 and CIT: AE 1896, 00024, the discovery events DiscoveryOf:PO:LUPA.5 and DiscoveryOf:PO:CIL III 13483, and the places Petronell (Carnuntum) and Pannonia appear in a single graph, linked by was present at, is shown by, shows visual item, has alternative form, is referred to by, is documented in, is identified by, forms part of, took place at and falls within.]
Identity Problem
Two approaches:
a) avoid taking two different items for the same one => use local IDs, where uniqueness is guaranteed
b) try to find global names with a high chance of matching
• The Lupa solution is a):
  • We give a serial number to any new object we insert, or
  • we use the serial number of the source database.
  • Example: P.O: arachne.45305
  • or: P.O: lupa.4501
• We keep the local IDs in the global index as valid names and continuously remove detected duplicates.
• Cost-benefit optimization of over- and under-identification!
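Approach a) can be sketched as follows: IDs are scoped by source, so uniqueness follows from each source's own serial numbers, and a detected duplicate is merged by recording an alias instead of deleting its ID, so every ID ever issued stays a valid name. This is a minimal Python illustration under those assumptions; the class and method names are hypothetical, and the `PO:SOURCE.serial` spelling follows the identifiers in the mapping figures.

```python
class IdentityRegistry:
    """Hypothetical sketch: source-scoped local IDs plus an alias table."""

    def __init__(self):
        self._alias = {}  # retired ID -> ID it was merged into

    @staticmethod
    def local_id(source, serial):
        # Unique as long as each source never reuses its own serial numbers.
        return f"PO:{source.upper()}.{serial}"

    def resolve(self, any_id):
        """Follow merge links so that retired IDs remain valid names."""
        while any_id in self._alias:
            any_id = self._alias[any_id]
        return any_id

    def merge(self, keep, drop):
        """Record that `drop` denotes the same object as `keep`."""
        keep, drop = self.resolve(keep), self.resolve(drop)
        if keep != drop:  # resolving first rules out alias cycles
            self._alias[drop] = keep


registry = IdentityRegistry()
lupa = registry.local_id("lupa", 2849)
arachne = registry.local_id("arachne", 80581)
registry.merge(lupa, arachne)
```

One reading of the cost-benefit remark: under-identification (two IDs for one object) is cheap to repair with such an alias, whereas over-identification (one ID for two objects) would require splitting merged data apart again.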
Reactive Data Cleaning: Initial Data
[Figure: two E22 Man-Made Object records, PO:LUPA.2849 and PO:ARACHNE.80581, each identified by its own E42 Object Identifier (OID:LUPA.2849, OID:ARACHNE.80581) and each with its own E35 Title (“Stele des Nertus Lingauster”, “Grabstele des Nertus”); one record has type Stele (E55 Type), and both show the visual item INSC:CIL III 10514 (E34 Inscription).]
Reactive Data Cleaning: Result
[Figure: a single E22 Man-Made Object PO:LUPA.2849, identified by both OID:LUPA.2849 and OID:ARACHNE.80581, with type Stele, both titles (“Stele des Nertus Lingauster”, “Grabstele des Nertus”) and the visual item INSC:CIL III 10514.]
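The step from initial data to result can be sketched in Python. The merge rule used here (two objects that show the same inscription are the same object) is an assumption suggested by the figure, not a rule stated in the source; the survivor of each merge is the earliest-seen object, and the sample triples, including which title belongs to which record, are modeled loosely on the figure.

```python
SHOWS = "P65F.shows_visual_item"


def merge_by_shared_inscription(triples):
    """Merge objects that show the same inscription (hypothetical rule).

    `triples` is a list of (subject, property, object) tuples. Each group of
    merged objects is renamed to its earliest-seen member, and duplicate
    triples are dropped while keeping first-seen order.
    """
    parent, order, first_holder = {}, {}, {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            if order[rb] < order[ra]:
                ra, rb = rb, ra
            parent[rb] = ra  # the earlier-seen root survives

    for s, p, o in triples:
        if p != SHOWS:
            continue
        if s not in parent:
            parent[s], order[s] = s, len(order)
        if o in first_holder:
            union(first_holder[o], s)
        else:
            first_holder[o] = s

    merged, seen = [], set()
    for s, p, o in triples:
        t = (find(s) if s in parent else s, p, find(o) if o in parent else o)
        if t not in seen:
            seen.add(t)
            merged.append(t)
    return merged


TRIPLES = [
    ("PO:LUPA.2849", "P1F.is_identified_by", "OID:LUPA.2849"),
    ("PO:LUPA.2849", "P2F.has_type", "Stele"),
    ("PO:LUPA.2849", "P102F.has_title", "Stele des Nertus Lingauster"),
    ("PO:LUPA.2849", SHOWS, "INSC:CIL III 10514"),
    ("PO:ARACHNE.80581", "P1F.is_identified_by", "OID:ARACHNE.80581"),
    ("PO:ARACHNE.80581", "P102F.has_title", "Grabstele des Nertus"),
    ("PO:ARACHNE.80581", SHOWS, "INSC:CIL III 10514"),
]
result = merge_by_shared_inscription(TRIPLES)
```

Because the merged object keeps both identifiers, a lookup by either source ID still reaches it, as in the result shown above.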
VBI ERAT LVPA: Results
• A method and architecture for integrating diverse archaeological corpora on Roman stone monuments under the CIDOC CRM model
• We developed an efficient method for place name recognition
• We are developing a research tool for formulating queries and drawing conclusions on archaeological data
  • detection of contextual relationships that cannot be derived from interpreting the sources in isolation
• A method for identifying epigraphic references and finds
• A test bed for the CIDOC CRM model, which proved its adequacy
• The first large-scale integration project of multiple complementary resources as a global index to the original sources
Future work
• integrate more data sources
• support a mechanism to visualize a source
• support an automatic mapping process so that archaeologists will be able to maintain the system by themselves