
Conceptual foundations for semantic mapping and semantic search

Explore the use of canonical expressions and the KOS (Knowledge Organization Systems) Concept Hub to achieve interoperability and improve semantic-based search across multiple collections in multiple languages. Examples and implementation approaches are discussed.



Presentation Transcript


  1. Conceptual foundations for semantic mapping and semantic search. Dagobert Soergel, Department of Library and Information Studies, University at Buffalo. Cologne Conference on Interoperability and Semantics in Knowledge Organization, Cologne University of Applied Sciences, Institute of Information Management (IIM), July 19, 2010

  2. Mapping through a Hub. Dewey: 387 Water, air, space transportation; 386 Inland waterway & ferry transportation; 387.5 Ocean transportation; 386.8 Inland waterway tr. > Ports; 387.1 Ports. Hub: Water transport; Inland water transport; Ocean transport; Traffic station ⊓ Water transport; Traffic station ⊓ Inland water tr.; Traffic station ⊓ Ocean transport. LCSH: Shipping; Inland water transport; Merchant marine; Harbors. German: Hafen

  3. Outline • Objective: Interoperability Plus • KOS concept hub: canonical expressions • Examples: Knowledge base and applications • Implementation: Canonical expressions local, hub global. Knowledge-based, computer-assisted creation of canonical expressions to represent concepts. Crowdsourcing • Cross-language mapping and shades of meaning • Conclusion

  4. Objective: Improve semantic-based search across multiple collections in multiple languages. • Interoperability between any two participating KOS (Knowledge Organization Systems) • Support for search, esp. facet-based search • for any collection indexed by a participating KOS • for search based on free-text or free-form social tagging • Assistance in cataloging (metadata creation) by catalogers or users (social tagging) • Long-range goal: Web service where a KOS can be uploaded and mappings to specified target KOS are returned

  5. KOS Concept Hub • Interoperability is achieved by representing concepts from all participating KOS through canonical expressions, such as a description logic formula using atomic concepts and relationships • The backbone of the proposed system is an extensible faceted core classification of atomic concepts together with a set of relationships • Mapping from KOS to KOS is achieved by reasoning over these canonical expressions
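
The idea of mapping by reasoning over canonical expressions can be illustrated with a minimal Python sketch. It is not the hub's actual reasoner: it assumes canonical expressions are plain conjunctions of atomic concepts, and the `ATOM_BROADER` hierarchy, the expressions assigned to the LCSH headings, and the function names are hypothetical choices made for illustration.

```python
# Hypothetical fragment of the hub's core classification:
# each atomic concept points to its broader atomic concept.
ATOM_BROADER = {
    "Inland water transport": "Water transport",
    "Ocean transport": "Water transport",
}

# Assumed canonical expressions (conjunctions of atomic concepts) for a few LCSH headings.
LCSH = {
    "Shipping": {"Water transport"},
    "Inland water transport": {"Inland water transport"},
    "Harbors": {"Traffic station", "Water transport"},
}


def closure(expr):
    """Return expr plus every broader atomic concept, so subsumption becomes a subset test."""
    atoms = set()
    for atom in expr:
        while atom is not None:
            atoms.add(atom)
            atom = ATOM_BROADER.get(atom)
    return frozenset(atoms)


def map_concept(expr, target):
    """Map a canonical expression into a target KOS: exact match, else most specific broader terms."""
    source = closure(expr)
    closed = {term: closure(e) for term, e in target.items()}
    exact = sorted(t for t, c in closed.items() if c == source)
    if exact:
        return "exact", exact
    broader = [t for t, c in closed.items() if c < source]
    # Drop any candidate that is itself strictly broader than another candidate.
    specific = sorted(t for t in broader if not any(closed[t] < closed[u] for u in broader))
    return "broader", specific


# Assumed expression for Dewey 386.8 (Inland waterway tr. > Ports).
dewey_386_8 = {"Traffic station", "Inland water transport"}
print(map_concept(dewey_386_8, LCSH))
```

Under these assumptions the Dewey class has no exact LCSH counterpart, so the sketch returns the two most specific broader headings (Harbors and Inland water transport), which together could be read as a combination mapping.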

  6. Mapping through a Hub. Dewey: 387 Water, air, space transportation; 386 Inland waterway & ferry transportation; 387.5 Ocean transportation; 386.8 Inland waterway tr. > Ports; 387.1 Ports. Hub: Water transport; Inland water transport; Ocean transport; Traffic station ⊓ Water transport; Traffic station ⊓ Inland water tr.; Traffic station ⊓ Ocean transport. LCSH: Shipping; Inland water transport; Merchant marine; Harbors. German: Hafen

  7. Mapping through a Hub. Dewey: 387 Water, air, space transportation; 386 Inland waterway & ferry transportation; 387.5 Ocean transportation; 386.8 Inland waterway tr. > Ports; 387.1 Ports. Hub: Traffic station; Vehicle parking; Terminal facilities; Water transport; Inland water transport; Ocean transport; Traffic station ⊓ Water transport. By type of water transport: Traffic station ⊓ Inland water tr.; Traffic station ⊓ Ocean transport. By component of traffic station: Vehicle parking ⊓ Water transport; Terminal facilities ⊓ Water transport. LCSH/AAT: Shipping; water transport; Inland water transport; Merchant marine; Harbors; ports; harbors

  8. Examples from the Library of Congress Classification and the Library of Congress Subject

  9. Examples from the Library of Congress Classification and the LC Subject Headings

  10. Core Classification: faceted classification

  11. LC subject headings with combinations of atomic concepts

  12. Mapping through a Hub. LCC: TL681.S6 Airplanes. Soundproofing; VM367.S6 Submarines. Soundproofing. Hub: L17 Vehicles ⊓ L33 Air transport ⊓ R37 Soundproofing; L17 Vehicles ⊓ L37 Water transport ⊓ R37 Soundproofing; L17 Vehicles ⊓ L37 Water transport ⊓ R37 Soundproofing ⊓ T73 Military ⊓ Underwater. LCSH: Aeroplanes-Soundproofing; Ships-Soundproofing

  13. Mapping user queries. User query: Free text; Combination of elemental concepts through facets (guided query formulation); Controlled term(s) from a KOS, possibly found through browsing a KOS. Hub: Canonical form of query (DL formula). Final query: (Enriched) free text query; Query in terms of a KOS

  14. Query: L17 Vehicles AND R37 Soundproofing
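
A facet-based query such as this one can be answered across several KOS at once if every class or heading carries its canonical expression. The following Python sketch assumes that setup; the `INDEX` structure and the `facet_search` function are hypothetical, the assignment of hub expressions to the LCC and LCSH entries is one reading of slide 12, and the plain "Aeroplanes" entry is invented to show a non-match.

```python
# Canonical expressions keyed by (KOS, class or heading).
# Atomic concept codes: L17 Vehicles, L33 Air transport, L37 Water transport,
# R37 Soundproofing, T73 Military.
INDEX = {
    ("LCC", "TL681.S6 Airplanes. Soundproofing"): {"L17", "L33", "R37"},
    ("LCC", "VM367.S6 Submarines. Soundproofing"): {"L17", "L37", "R37", "T73", "Underwater"},
    ("LCSH", "Aeroplanes-Soundproofing"): {"L17", "L33", "R37"},
    ("LCSH", "Ships-Soundproofing"): {"L17", "L37", "R37"},
    ("LCSH", "Aeroplanes"): {"L17", "L33"},  # hypothetical entry without Soundproofing
}


def facet_search(query_atoms):
    """Return every (KOS, entry) whose canonical expression contains all query concepts."""
    wanted = set(query_atoms)
    return sorted(key for key, expr in INDEX.items() if wanted <= expr)


# Query from the slide: L17 Vehicles AND R37 Soundproofing
for kos, entry in facet_search({"L17", "R37"}):
    print(kos, "|", entry)
```

Because the test is a simple subset check, adding L37 Water transport to the query narrows the result to the ship and submarine entries without any KOS-specific logic.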

  15. Examples from NALT, LCSH, DDC, and SWD • NALT National Agricultural Library Thesaurus • LCSH Library of Congress Subject Headings • DDC Dewey Decimal Classification • SWD Schlagwortnormdatei

  16. Mapping through a Hub. LCSH: Air - Pollution; Laws and regulations; Air – Pollution – Laws and regulations. Hub: [isa] Condition [isConditionOf] Air [causedBy] Pollutant [property] Undesirable; [isa] Legal rule; [isa] Legal rule [appliedTo] {[isa] Condition [isConditionOf] Air [causedBy] Pollutant [property] Undesirable}. NALT: Air pollution; Laws and regulations; Air pollution AND Laws and regulations

  17. Mapping through a Hub. DDC: 363.739 2 Air pollution; 340 Law; 344.046 342 Air pollution [Law]; 363.739 26 Air pollution rights. Hub: [isa] Condition [isConditionOf] Air [causedBy] Pollutant [property] Undesirable; [isa] Legal rule; [isa] Legal rule [appliedTo] {[isa] Condition [isConditionOf] Air [causedBy] Pollutant [property] Undesirable}; [isa] International treaty [appliedTo] {[isa] Condition [isConditionOf] Air [causedBy] Pollutant [property] Undesirable}; [isa] Rights [appliedTo] {[isa] Condition [isConditionOf] Air [causedBy] Pollutant [property] Undesirable}. SWD: Luftverschmutzung; Gesetz; ???; Übereinkommen über weiträumige grenzüberschreitende Luftverschmutzung; Umweltzertifikat

  18. Soil moisture vs. Soil water. LCSH term: Soil moisture = [isa] Water [containedIn] Soil. NALT term: Soil water = [isa] Water [containedIn] Soil. Mapping LCSH ▬► NALT: Soil moisture ▬► Soil water

  19. Greenhouse gardening. LCSH term: Greenhouse gardening = [isa] Gardening [inEnvironment] Greenhouse [inEnvironment] Home. NALT terms: Home gardening = [isa] Gardening [inEnvironment] Home; Greenhouse = [isa] Greenhouse. Mapping LCSH ▬► NALT: Greenhouse gardening ▬► Home gardening AND Greenhouse

  20. Salad greens. LCSH term: Salad greens = [isa] Green leafy vegetable [usedFor] Salad. NALT term: Green leafy vegetables = [isa] Green leafy vegetable. Mapping LCSH ▬► NALT: Salad greens ▬► BT Green leafy vegetables
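
The three preceding examples show three mapping outcomes: an equivalent term, an AND combination of terms, and a broader term (BT). The Python sketch below reproduces them from the canonical expressions given on the slides. It is a simplification, not the author's algorithm: expressions are modeled as sets of (role, value) pairs, and when no identical expression exists the comparison looks only at the concept values and ignores the roles. The names `atoms` and `map_term` are hypothetical.

```python
LCSH = {
    "Soil moisture": {("isa", "Water"), ("containedIn", "Soil")},
    "Greenhouse gardening": {("isa", "Gardening"), ("inEnvironment", "Greenhouse"),
                             ("inEnvironment", "Home")},
    "Salad greens": {("isa", "Green leafy vegetable"), ("usedFor", "Salad")},
}
NALT = {
    "Soil water": {("isa", "Water"), ("containedIn", "Soil")},
    "Home gardening": {("isa", "Gardening"), ("inEnvironment", "Home")},
    "Greenhouse": {("isa", "Greenhouse")},
    "Green leafy vegetables": {("isa", "Green leafy vegetable")},
}


def atoms(expr):
    """Concept values of an expression, ignoring roles (a deliberate simplification)."""
    return {value for _role, value in expr}


def map_term(src_expr, target):
    """Return ('=', terms), ('AND', terms), ('BT', terms) or ('none', [])."""
    equal = sorted(name for name, expr in target.items() if expr == src_expr)
    if equal:
        return "=", equal
    src = atoms(src_expr)
    # Target terms whose concepts all occur in the source expression.
    parts = sorted(name for name, expr in target.items() if atoms(expr) <= src)
    covered = set().union(*(atoms(target[name]) for name in parts))
    if parts and covered == src:
        return "AND", parts   # the parts jointly express the whole source concept
    if parts:
        return "BT", parts    # only part of the meaning is expressible: broader term(s)
    return "none", []


for heading, expr in LCSH.items():
    print(heading, "▬►", map_term(expr, NALT))
```

With the slide data, this yields Soil water as an equivalent, Greenhouse AND Home gardening as a combination, and Green leafy vegetables as a broader term.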

  21. Emerging diseases. LCSH term: Emerging infectious diseases = [isa] Disease [hasProperty] Infectious [hasProperty] Emerging. NALT term: Emerging diseases = [isa] Disease [hasProperty] Infectious ??? [hasProperty] Emerging. Mapping LCSH ▬► NALT ???: Emerging infectious diseases ▬► Emerging diseases; Emerging infectious diseases ▬► BT Emerging diseases

  22. Mapping through a Hub. DDC: 331.4 Women workers. Hub: [isa] Worker [hasGender] Female; [isa] Worker [hasGender] Female [hasStatus] Employee; [isa] Worker [hasGender] Female [hasStatus] Employee [hasPayStatus] HourlyPay; [isa] Worker [hasGender] Female [hasStatus] Employee [hasPayStatus] HourlyPay [hasQualification] Unskilled; [isa] Worker [hasGender] Female [hasStatus] Employee [hasPayStatus] HourlyPay [hasQualification] Skilled; [isa] Worker [hasGender] Female [hasStatus] Employee [hasPayStatus] Salaried; [isa] Work BeingDone [executedBy] {Worker [hasGender] Female}. SWD: Arbeitnehmerin; Arbeiterin; Ungelernte Arbeiterin; Hilfsarbeiterin; Facharbeiterin; Angestellte; Frauenarbeit

  23. Knowledge base for query formulation. Physician = [isa] Worker [profLevel] Doctoral [domain] Medicine; Oncologist = [isa] Worker [profLevel] Doctoral [domain] Oncology; Ophthalmologist = [isa] Worker [profLevel] Doctoral [domain] Ophthalmology; Physician ST Doctor; Ophthalmologist ST Eye doctor; Medicine BT Health care; [isa] Worker [profLevel] Doctoral BT Professional; Income ST Earnings; Income NT Compensation; Compensation ET Pay; Compensation NT Wages; Fee schedule [usedBy] {Insurance company [domain] Health care} <influences> Compensation [receivedBy] Physician

  24. Mapping user queries. User query: Doctor's pay. Hub: Compensation [receivedBy] Physician. Final query, (enriched) free text query: [(Physician OR Doctor OR Oncologist OR Ophthalmologist OR (Professional AND (Medicine OR "Health care" OR Oncology OR Ophthalmology))) AND (Pay OR Earnings OR Compensation OR Wages OR Income)] OR [("fee schedule" OR fee) AND ("health insurance" OR "Blue Cross" OR Medicare OR Medicaid)]
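
The step from the hub expression to an enriched free-text query can be sketched as a small expansion over the knowledge base. The Python example below is a minimal illustration under stated assumptions: it follows only synonym (ST/ET) and narrower-term (NT) links, it treats Oncologist and Ophthalmologist as narrower terms of Physician (an assumption derived from their domains), and it omits the upward expansion to Income, Professional and the fee-schedule branch shown in the slide's full query. All names are hypothetical.

```python
# Fragment of the knowledge base from slide 23, split by relationship type.
SYNONYMS = {
    "Physician": ["Doctor"],
    "Ophthalmologist": ["Eye doctor"],
    "Compensation": ["Pay"],
    "Income": ["Earnings"],
}
NARROWER = {
    "Physician": ["Oncologist", "Ophthalmologist"],  # assumed from the [domain] values
    "Income": ["Compensation"],
    "Compensation": ["Wages"],
}


def expand(term):
    """Collect a term, its synonyms and all narrower terms, breadth-first, without duplicates."""
    seen, queue = [], [term]
    while queue:
        current = queue.pop(0)
        if current in seen:
            continue
        seen.append(current)
        queue.extend(SYNONYMS.get(current, []))
        queue.extend(NARROWER.get(current, []))
    return seen


def quote(term):
    """Quote multi-word terms so they are searched as phrases."""
    return f'"{term}"' if " " in term else term


def boolean_query(concepts):
    """OR together the expansion of each concept, then AND the groups."""
    groups = ["(" + " OR ".join(quote(t) for t in expand(c)) + ")" for c in concepts]
    return " AND ".join(groups)


# Hub form of "Doctor's pay": Compensation [receivedBy] Physician
print(boolean_query(["Physician", "Compensation"]))
```

The printed query is `(Physician OR Doctor OR Oncologist OR Ophthalmologist OR "Eye doctor") AND (Compensation OR Pay OR Wages)`, a subset of the richer query on the slide.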

  25. Examples from the realm of AAT Taiwan • AAT Art and Architecture Thesaurus (Getty) • AAT Taiwan TELDAP, Institute for Information Science, Academia Sinica • TGM Thesaurus of Graphic Materials, Library of Congress • E-HowNet A Lexical Knowledge Base for Semantic Composition, Academia Sinica

  26. Mapping through a Hub. TGM: temples; synagogues; churches; mosques; Buddhist temples; Taoist temples. Hub: Facility ⊓ Worship; Facility ⊓ Worship ⊓ Judaism; Facility ⊓ Worship ⊓ Christianity; Facility ⊓ Worship ⊓ Islam; Facility ⊓ Worship ⊓ Buddhism; Facility ⊓ Worship ⊓ Taoism. AAT/Chinese: temples (buildings); synagogues (buildings); churches (buildings); mosques (buildings); 禪寺; 道觀

  27. Mapping to Chinese • Use E-HowNet formal semantic expressions • Use terms that already exist in E-HowNet • Add terms using computer-assisted derivation of semantic expressions as described later for English

  28. E-HowNet ontology 廣義知識知識本體 • Building |建築物; Facilities |設施. Chinese word: 廟; English: Temple; Conceptual expression: {facilities |設施: domain = {religion |宗教}}. Chinese word: 禪寺; English: Buddhist temple; Conceptual expression: {facilities |設施: domain = {Buddhist |佛教}}. Chinese word: 道觀; English: Taoist temple / Taoist quan; Conceptual expression: {facilities |設施: domain = {Taoism |道教}}

  29. Implementation: How to get to the promised land

  30. Examples of deriving canonical expressions • Creating canonical expressions is key • Start out with some examples

  31. Underlying faceted classification

  32. Method: Assigning atomic concepts 1

  33. Method: Assigning atomic concepts 2

  34. Method: Assigning atomic concepts 3

  35. Method: Assigning atomic concepts 4

  36. Method: Assigning atomic concepts 5

  37. Distributed implementation • Key principle: Canonical expressions can be created locally; the hub places each concept in a global structure • The person or algorithm producing canonical expressions needs to know only the core classification. They need not know the structure of the often large KOS to be mapped

  38. Distributed implementation • Ideally, use one central faceted classification of core concepts, but multiple mapped core classifications could be used • The central core classification is extensible and should be continuously updated by many contributors • The central core classification must be able to express shades of meaning and, in the long run, usage information

  39. Distributed implementation • A KOS could assign canonical expressions to its concepts − let's call this a semantically enhanced KOS or SEKOS • It is now a simple matter to map from any SEKOS to any other (somewhat dependent on the core classifications used)

  40. Efficient creation of canonical expressions • Apply existing knowledge: Large knowledge base ▬► less effort for processing a new KOS • Use knowledge of KOS structure for hierarchical inheritance • Use linguistic analysis of terms and captions • Eliminate redundant atomic concepts • Check or produce mapping results from assignment of concepts to the same records • Get human editors’ input and verification where needed through a user-friendly interface. Crowdsourcing, one term at a time • KOS “owners” may verify and edit data pertaining to their KOS
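
Two of the steps listed above, hierarchical inheritance and elimination of redundant atomic concepts, lend themselves to a short Python sketch. It is only one possible reading of those bullets: the Dewey fragment, the atomic concepts assigned to each class (for example, "Water transport" as the output of a hypothetical linguistic analysis of a caption), and all function names are assumptions.

```python
# Hypothetical core-classification fragment: atomic concept -> broader atomic concept.
ATOM_BROADER = {
    "Inland water transport": "Water transport",
    "Ocean transport": "Water transport",
}

# KOS structure: class notation -> parent notation (None at the top of this fragment).
KOS_PARENT = {"386": None, "386.8": "386"}

# Atomic concepts assigned to each class on its own, before inheritance (assumed values).
OWN_ATOMS = {
    "386": {"Inland water transport"},
    "386.8": {"Traffic station", "Water transport"},
}


def atom_ancestors(atom):
    """All atomic concepts broader than the given one."""
    result = set()
    atom = ATOM_BROADER.get(atom)
    while atom is not None:
        result.add(atom)
        atom = ATOM_BROADER.get(atom)
    return result


def inherited_expression(notation):
    """Union of a class's own atomic concepts and those of all its ancestors in the KOS."""
    atoms = set()
    while notation is not None:
        atoms |= OWN_ATOMS.get(notation, set())
        notation = KOS_PARENT.get(notation)
    return atoms


def eliminate_redundant(atoms):
    """Drop every atomic concept already implied by a narrower one in the same expression."""
    implied = set()
    for atom in atoms:
        implied |= atom_ancestors(atom)
    return atoms - implied


raw = inherited_expression("386.8")
print(sorted(raw))
print(sorted(eliminate_redundant(raw)))
```

Here class 386.8 inherits "Inland water transport" from 386, which makes its own "Water transport" redundant; the pruned expression is Traffic station ⊓ Inland water transport.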

  41. Knowledge base. Requires an ever larger classification and lexical knowledge base containing many kinds of data: 1. A faceted classification of atomic concepts, seeded from sources with well-developed facets, such as UDC; the Alcohol and Other Drug (AOD) Thesaurus; the Harvard Business Thesaurus; the Art and Architecture Thesaurus; various systems called ontologies

  42. Knowledge base 2. Requires an ever larger classification and lexical knowledge base containing many kinds of data: 2. Linguistic knowledge bases such as WordNet, E-HowNet (Chinese), FrameNet, and mono-, bi-, and multi-lingual dictionaries and thesauri 3. Many KOS (Knowledge Organization Systems), such as LCC, UDC, DDC, DMOZ directory, LCSH, Schlagwortnormdatei, MeSH and UMLS, AGROVOC, Gene Ontology 4. These will over time be fused into one large multilingual knowledge base with many terminological and translation relationships and relationships linking terms to concepts, with an increasing number of concepts semantically represented by a canonical expression. One database: Intellectual, not physical. Could be in Linked Data

  43. Take-home message It is time to unify many disparate mapping efforts on a sound semantic footing

  44. Dagobert Soergel dsoergel @ buffalo.edu www.dsoergel.com


  46. Air pollution laws. LCSH term: Air – Pollution – Laws and regulations = [isa] Legal rule [appliedTo] {[isa] Condition [isConditionOf] Air [causedBy] Pollutant [property] Undesirable}. NALT terms: Air pollution = [isa] Condition [isConditionOf] Air [causedBy] Pollutant [property] Undesirable; Laws and regulations = [isa] Legal rule. Mapping LCSH ▬► NALT: Air – Pollution – Laws and regulations ▬► Air pollution AND Laws and regulations. Interpretation for indexing and searching in both directions


  48. Means Create a comprehensive knowledge base relating many classification schemes and subject heading lists used in libraries and in other contexts (LCC, DDC, DMOZ directory, LCSH, European schemes). Use combinations of atomic concepts taken from a well-structured underlying faceted classification to represent the meaning of classes and subject headings. • This project will achieve the following • Interoperability between any two participating Knowledge Organization Systems (KOS) (to the extent the two schemes allow) • Facet-based search • for any collection indexed by a participating KOS • for free-text search • Assistance in cataloging (metadata creation) by catalogers or users (social tagging) • Long-range goal: Web service where a KOS can be uploaded and mappings to specified target KOS are returned
