
Thesaurus Mapping

Martin Doerr. Centre for Cultural Informatics and Documentation Systems, Institute of Computer Science, Foundation for Research and Technology - Hellas. Bath, UK, January 11, 2000.


Presentation Transcript


  1. Thesaurus Mapping. Martin Doerr, Centre for Cultural Informatics and Documentation Systems, Institute of Computer Science, Foundation for Research and Technology - Hellas. Bath, UK, January 11, 2000

  2. Thesaurus Mapping: The Problem
     • Logical aspects
       • Semantics of involved entities
       • Notions of translation
       • Objectives and logics of mapping
     • Production of mappings
       • Human
       • Language engineering, cluster analysis
     • Architecture
       • Mapping management
       • Mapping service
       • Integration in IT environment

  3. Thesaurus Mapping: Why do we need mapping?
     • Thesauri for information retrieval depend on:
       • Viewpoint (e.g. functional, morphological, social, special database fields)
       • Language or social group (experts, lay people, etc.)
       • Size and distribution of target material (effective partitioning)
     • Therefore:
       • Concepts differ
       • Use of concepts differs
       • Semantic embedding differs
       • Even if we agree on the same world
     • Research topic: formalisation of views and context

  4. Thesaurus Mapping: Semantics of entities
     • Concepts are defined by agreement, e.g. orange (colour)
     • Concepts identify sets of real-world objects
     • Concepts are identified by:
       • scope notes, literature references, examples, images
     • Concepts should not be changed:
       • they should be created or abandoned
       • they should be understood, accepted or rejected
     • A descriptor is a concept identifier

  5. Thesaurus Mapping: Semantics of entities
     • Links should express opinions and differences:
       • about set relations between concepts (subsumption, disjointness, etc.)
       • about derived concepts
       • about term usage
       • opinions may be human or computational!
     • Terms (noun phrases) should be used:
       • by social groups to refer to (multiple) concepts
       • without direct linguistic meaning
       • one term is selected as concept identifier

  6. Thesaurus Mapping: Semantics of entities
     • Concept - concept relations:
       • set semantics: BT, between thesauri/version - for query expansion, users
       • associative: RTs, BTP, etc. - for user guidance
     • Concept - term relations:
       • authoritative: preferred, used for - for cataloguers, users
       • statistical, possible synonyms - for information retrieval
     • Term - term relations:
       • dictionary entries - limited precision, within LE tools

  7. Thesaurus Mapping: What is a Multilingual Thesaurus?
     • A translated thesaurus: for comprehension
       • Established concepts and terms from one user group
       • Optimally interpreted in words of one or more other languages
       • Translations are not established terms
     • Mapped thesauri (ISO 5964): for transition
       • Independent thesauri, each one from another user group
       • Established concepts and terms
       • Links declare “overlap” between concepts
     • Interlingua: for communication and knowledge sharing
       • Compromise to share concepts between many user groups
       • Optimally interpreted in words of another language

  8. Thesaurus Mapping: Functionality of Mapping
     • Transparent query transformation (Z39.50!)
       • Replace a Boolean term combination from thesaurus A with the optimal term combination from thesaurus B to retrieve equivalent results
       • Guaranteed transition needed (possibly to higher concepts)
       • Need controlled loss of precision or recall (research!)
       • Combinatorial explosion: need cascading Thes A => Thes B => Thes C
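
The query transformation and cascading described on this slide can be sketched roughly as follows. This is a minimal illustration, not the method of any particular system: it assumes a Boolean query is held as nested tuples and that each mapping table sends one source descriptor to a Boolean expression over the target thesaurus. The names `translate`, `cascade`, `A_TO_B`, `B_TO_C` and all terms are hypothetical.

```python
from typing import Dict, Tuple, Union

# A query is a descriptor (str) or a tuple (operator, operand, ...),
# where operator is "AND", "OR" or "NOT". This representation is an assumption.
Query = Union[str, Tuple]


def translate(query: Query, mapping: Dict[str, Query]) -> Query:
    """Replace every descriptor in `query` by its mapped target expression."""
    if isinstance(query, str):
        if query not in mapping:
            # A real service would presumably fall back to a broader concept
            # here so that the transition is guaranteed; this sketch just fails.
            raise KeyError(f"no mapping for descriptor {query!r}")
        return mapping[query]
    operator, *operands = query
    return (operator, *(translate(operand, mapping) for operand in operands))


def cascade(query: Query, *mappings: Dict[str, Query]) -> Query:
    """Apply mapping tables in sequence, e.g. A => B, then B => C."""
    for mapping in mappings:
        query = translate(query, mapping)
    return query


# Hypothetical mapping tables (illustrative terms only).
A_TO_B = {"mill": ("OR", "watermill", "windmill"), "factory": "works"}
B_TO_C = {"watermill": "moulin a eau", "windmill": "moulin a vent", "works": "usine"}

query_in_a = ("AND", "mill", ("NOT", "factory"))
query_in_c = cascade(query_in_a, A_TO_B, B_TO_C)
print(query_in_c)
```

Cascading needs only one table per adjacent pair of thesauri rather than one per pair overall, which is the point of the slide's remark on combinatorial explosion; the price is that any loss of precision or recall accumulates along the chain.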

  9. Thesaurus Mapping: Logics of Mapping
     • Interthesaurus relations (ISO 5964), from a descriptor of Thes. A to a descriptor of Thes. B:
       • partial equivalence
         • Better: broader equivalence, narrower equivalence
       • exact equivalence
       • inexact equivalence (“+/-”)
         • good for FTR only
       • single to multiple equivalence
         • Better: exact equivalence to a BOOLEAN combination of target terms: “AND” (intersection), “OR” (union), “NOT” (complement)
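
One possible way to represent such links and to evaluate a Boolean target combination is sketched below. The `Equivalence` enum, the `MappingLink` record, the tuple encoding of expressions and the sample index are all assumptions made for illustration; they are not taken from ISO 5964 or from the presentation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Set, Tuple, Union

# Target expression: a descriptor (str) or (operator, operand, ...). Assumed encoding.
Expr = Union[str, Tuple]


class Equivalence(Enum):
    EXACT = "exact"
    BROADER = "broader"    # assumed reading: target covers more than the source
    NARROWER = "narrower"  # assumed reading: target covers less than the source
    INEXACT = "inexact"    # the "+/-" case


@dataclass(frozen=True)
class MappingLink:
    source: str        # descriptor in thesaurus A
    kind: Equivalence
    target: Expr       # Boolean combination of descriptors in thesaurus B


def evaluate(expr: Expr, index: Dict[str, Set[int]], universe: Set[int]) -> Set[int]:
    """Return the record ids matched by a Boolean expression over target terms."""
    if isinstance(expr, str):
        return set(index.get(expr, set()))
    operator, *operands = expr
    results = [evaluate(operand, index, universe) for operand in operands]
    if operator == "AND":
        return set.intersection(*results)  # intersection
    if operator == "OR":
        return set.union(*results)         # union
    if operator == "NOT":
        (only,) = results
        return universe - only             # complement
    raise ValueError(f"unknown operator {operator!r}")


# Hypothetical target index: descriptor -> ids of records indexed with it.
index = {"church": {1, 2, 3}, "ruin": {2, 3, 4}}
universe = {1, 2, 3, 4, 5}

link = MappingLink("ruined church", Equivalence.EXACT, ("AND", "church", "ruin"))
hits = evaluate(link.target, index, universe)
print(sorted(hits))
```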

  10. Thesaurus Mapping: Translation and Mapping
      [Diagram. Labels: English Heritage Thesaurus; Merimee Thesaurus; interthesaurus relations (“+/-”, AND); linguistic translation (x2); Interlingua; English Vocabulary; French Vocabulary]

  11. Thesaurus Mapping: Boolean OR-Combinations
      • Combines instances of B and C
      • Uses properties of either B or C
      • Is BT of B and C, and NT of their common broader terms
      [Diagram. Labels: A; exact equivalence; Boolean compound "B OR C"; BT; B; C]

  12. Thesaurus Mapping: Boolean AND-Combinations
      • Uses instances of both B and C
      • Combines properties of B and C
      • Is NT of B and C, and BT of their common narrower terms
      [Diagram. Labels: A; exact equivalence; Boolean compound "B AND C"; BT; B; C]
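
The position of OR- and AND-compounds in the hierarchy follows from set semantics, which can be checked on a toy example. The sketch below assumes each concept is represented by its extent (a set of object ids) and that "X is BT of Y" means the extent of X contains the extent of Y; the ids and variable names are invented.

```python
# Hypothetical extents: sets of ids of the real-world objects each concept covers.
B = {1, 2, 3}
C = {3, 4}
common_broader = {1, 2, 3, 4, 5}   # a term that is BT of both B and C
common_narrower = {3}              # a term that is NT of both B and C


def is_bt_of(broader: set, narrower: set) -> bool:
    """Under set semantics, a broader term's extent contains the narrower one's."""
    return broader >= narrower


b_or_c = B | C    # OR-compound: instances of either B or C
b_and_c = B & C   # AND-compound: instances of both B and C

# OR-compound: BT of B and C, NT of their common broader terms.
or_ok = (is_bt_of(b_or_c, B) and is_bt_of(b_or_c, C)
         and is_bt_of(common_broader, b_or_c))

# AND-compound: NT of B and C, BT of their common narrower terms.
and_ok = (is_bt_of(B, b_and_c) and is_bt_of(C, b_and_c)
          and is_bt_of(b_and_c, common_narrower))

print(or_ok, and_ok)
```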

  13. Thesaurus Mapping: Approximation by Inclusion
      [Diagram. Labels: A; broader equivalence; BT; B; narrower equivalences; C]
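
A rough illustration of what approximation by inclusion costs in retrieval terms, assuming (as one reading of the diagram) that a source concept A without an exact equivalent is replaced either by a broader target concept or by the union of narrower target concepts. The extents and the `precision_recall` helper are hypothetical.

```python
from typing import Set, Tuple


def precision_recall(retrieved: Set[int], relevant: Set[int]) -> Tuple[float, float]:
    """Standard precision and recall of a retrieved set against the relevant set."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 1.0
    recall = hits / len(relevant) if relevant else 1.0
    return precision, recall


# Hypothetical extents (record ids).
a = {1, 2, 3, 4}                      # what the user's concept A really covers
broader = {1, 2, 3, 4, 5, 6, 7, 8}    # target concept with broader equivalence
narrower = [{1}, {2, 3}]              # target concepts with narrower equivalence

via_broader = precision_recall(broader, a)
via_narrower = precision_recall(set().union(*narrower), a)

print(via_broader)   # full recall, reduced precision
print(via_narrower)  # full precision, reduced recall
```

Substituting the broader concept keeps every relevant record at the price of noise; substituting the narrower ones keeps the result clean but misses records. Which loss is acceptable depends on the application.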

  14. Thesaurus Mapping: Avoid redundant linking!
      [Diagram. Labels: A; B; BT; broader equivalence; exact equivalence; narrower equivalences]
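
One plausible reading of this slide is that a broader-equivalence link is redundant when it can be derived from an exact equivalence plus the BT hierarchy of the target thesaurus. The sketch below prunes such links under that assumption; the link encoding as `(source, kind, target)` triples, the function names and the sample terms are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

Link = Tuple[str, str, str]  # (source descriptor, kind, target descriptor)


def ancestors(term: str, bt: Dict[str, List[str]]) -> Set[str]:
    """All terms reachable from `term` by following BT links upwards."""
    seen: Set[str] = set()
    stack = list(bt.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(bt.get(current, []))
    return seen


def prune_redundant(links: List[Link], bt: Dict[str, List[str]]) -> List[Link]:
    """Drop broader links already implied by an exact link and the target's BTs."""
    exact: Dict[str, Set[str]] = {}
    for source, kind, target in links:
        if kind == "exact":
            exact.setdefault(source, set()).add(target)
    kept = []
    for source, kind, target in links:
        implied = kind == "broader" and any(
            target in ancestors(equivalent, bt) for equivalent in exact.get(source, ())
        )
        if not implied:
            kept.append((source, kind, target))
    return kept


# Hypothetical target hierarchy (term -> its broader terms) and links.
bt = {"b_exact": ["b_parent"], "b_parent": ["b_top"]}
links = [
    ("a", "exact", "b_exact"),
    ("a", "broader", "b_top"),      # derivable via b_exact -> b_parent -> b_top
    ("a2", "broader", "b_parent"),  # not derivable: a2 has no exact link
]
pruned = prune_redundant(links, bt)
print(pruned)
```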

  15. Thesaurus Mapping: Problems of Mapping
      • Consistency and reasoning (Description Logics!)
      • Optimal substitution of combined query terms
      • Protocol to propagate recall/precision control
      • Inverse reading of one-to-many links
      • Postcoordination: unclear semantics! E.g. “grinding & factories”; solution by DL?

  16. Thesaurus Mapping: Production of Mappings
      • Human assessment needs (see Term-IT):
        • CSCW, workflow, decentralised management tools
        • Excellent comparative presentation of thesaurus contents
      • Language engineering (see Term-IT):
        • Termhood recognition, automatic translation by parallel texts, filtering by occurrence in target indexing language
        • Excellent for preprocessing!
      • Analysis of use:
        • Cluster analysis with doubly indexed entries
        • Libraries: problem to identify the same “work”!
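
The idea of exploiting doubly indexed entries can be illustrated with simple co-occurrence scoring. This is a deliberately simplified stand-in for a full cluster analysis: it assumes each record carries one set of descriptors from each thesaurus and ranks descriptor pairs by Jaccard overlap. The function name, threshold and sample records are hypothetical.

```python
from collections import Counter
from itertools import product
from typing import Iterable, List, Set, Tuple


def candidate_links(
    records: Iterable[Tuple[Set[str], Set[str]]], min_score: float = 0.5
) -> List[Tuple[str, str, float]]:
    """Propose (term_a, term_b, score) pairs from records indexed in both thesauri."""
    count_a: Counter = Counter()
    count_b: Counter = Counter()
    joint: Counter = Counter()
    for terms_a, terms_b in records:
        count_a.update(terms_a)
        count_b.update(terms_b)
        joint.update(product(terms_a, terms_b))
    scored = []
    for (term_a, term_b), both in joint.items():
        # Jaccard overlap: records with both terms / records with either term.
        score = both / (count_a[term_a] + count_b[term_b] - both)
        if score >= min_score:
            scored.append((term_a, term_b, score))
    # Best candidates first; ties broken alphabetically for a stable order.
    return sorted(scored, key=lambda item: (-item[2], item[0], item[1]))


# Hypothetical doubly indexed records: (descriptors from A, descriptors from B).
records = [
    ({"mill"}, {"moulin"}),
    ({"mill"}, {"moulin"}),
    ({"mill", "factory"}, {"usine"}),
    ({"factory"}, {"usine"}),
]
candidates = candidate_links(records)
for term_a, term_b, score in candidates:
    print(f"{term_a} -> {term_b}: {score:.2f}")
```

Candidates produced this way would still need human assessment; as the slide notes for language engineering, automatic output of this kind is best treated as preprocessing.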

  17. SIS - Thesaurus Management System: Co-operative linking
      [Diagram. Labels: Group 1; Group 2; Version 0 (x2); Version 1 (x2); Version 2; New Workspace (x2); links of group 1; links of group 2; BT; obsolete term]

  18. Thesaurus Mapping: Users Environment
      [Diagram. Labels: Distributed Retrieval; User’s Authorities; Target Authorities; CMS Collections; foreign language?; old version; specialized; Agreed-on Term; Local Term]

  19. Thesaurus Mapping: Three-level Architecture
      [Diagram. Labels: End User; Search Aid Tool; Cascaded mapping service; National Authority Providers; Local TMS (x2); CMS Maintainer (x2); CMS (x2); concept proposal (x2); Update term use (x2); Thesaurus initialization (x2)]

  20. Thesaurus Mapping: Architectural Considerations
      • We propose to distinguish:
        • Collection Management Systems with local term management
        • National authority providers
        • Mapping service
      • Mapping service:
        • Co-operative mapping production environment and system, for few languages (3?), domain-specific?
        • Large-scale mapping tables detached from the production system, accessible as a replicated Web resource
      • Integration:
        • Access engines connect to mapping resources on demand
        • Provision of suitable metadata for CMS capabilities

  21. Thesaurus Mapping: Conclusions
      • Thesaurus mapping is feasible and the best means to coherently access multiple CMS with controlled vocabulary
      • Thesaurus mapping is a major investment in human resources and IT environment
      • Targeted research can much improve the currently feasible:
        • quality of mapping
        • quality of service
        • production cost
