Conceptual foundations for semantic mapping and semantic search

Conceptual foundations for semantic mapping and semantic search. Dagobert Soergel Department of Library and Information Studies, University at Buffalo. Cologne Conference on Interoperability and Semantics in Knowledge Organization Cologne University of Applied Sciences

Dagobert Soergel

Department of Library and Information Studies, University at Buffalo

Cologne Conference on Interoperability and Semantics in Knowledge Organization

Cologne University of Applied Sciences, Institute of Information Management (IIM)

July 19, 2010

Mapping through a Hub

Dewey
  • 387 Water, air, space transportation
  • 386 Inland waterway & ferry transportation
  • 387.5 Ocean transportation
  • 386.8 Inland waterway tr. > Ports
  • 387.1 Ports

Hub
  • Water transport
  • Inland water transport
  • Ocean transport
  • Traffic station ⊓ Water transport
  • Traffic station ⊓ Inland water tr.
  • Traffic station ⊓ Ocean transport

LCSH
  • Shipping
  • Inland water transport
  • Merchant marine
  • Harbors

German
  • Hafen

Outline
  • Objective: Interoperability Plus
  • KOS concept hub: canonical expressions
  • Examples: Knowledge base and applications
  • Implementation
    • Canonical expressions local, hub global
    • Knowledge-based, computer-assisted creation of canonical expressions to represent concepts
    • Crowdsourcing
  • Cross-language mapping and shades of meaning
  • Conclusion

Objective

Improve semantic-based search across multiple collections in multiple languages.

  • Interoperability between any two participating KOS (Knowledge Organization Systems)
  • Support for search, esp. facet-based search
    • for any collection indexed by a participating KOS
    • for search based on free-text or free-form social tagging
  • Assistance in cataloging (metadata creation) by catalogers or users (social tagging)
  • Long-range goal: Web service where a KOS can be uploaded and mappings to specified target KOS are returned

KOS Concept Hub
  • Interoperability is achieved by representing concepts from all participating KOS through canonical expressions, such as a description logic formula using atomic concepts and relationships
  • The backbone of the proposed system is an extensible faceted core classification of atomic concepts together with a set of relationships
  • Mapping from KOS to KOS is achieved by reasoning over these canonical expressions
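
Mapping by reasoning over canonical expressions can be illustrated with a minimal Python sketch. It assumes the simplest case, in which a canonical expression is a conjunction (⊓) of atomic concepts and is modeled as a set. The hierarchy in `BROADER`, the expressions assigned to the Dewey and LCSH terms, and the names `closure`, `subsumes`, and `map_term` are assumptions made for illustration, not data or functions from the hub.

```python
# Hypothetical fragment of the core classification: atomic concept -> broader atomic concept.
BROADER = {
    "Inland water transport": "Water transport",
    "Ocean transport": "Water transport",
}

def closure(expression):
    """Return the atoms of a conjunction plus every broader atom they imply."""
    atoms = set(expression)
    for atom in expression:
        while atom in BROADER:
            atom = BROADER[atom]
            atoms.add(atom)
    return frozenset(atoms)

def subsumes(general, specific):
    """True if `general` is equal to or broader than `specific`."""
    return closure(general) <= closure(specific)

def map_term(term, source, target):
    """Return ('exact', terms) or ('broader', most specific broader terms)."""
    expression = source[term]
    exact = [t for t, e in target.items() if closure(e) == closure(expression)]
    if exact:
        return "exact", exact
    broader = [t for t, e in target.items() if subsumes(e, expression)]
    # Drop a broader term if another candidate is strictly more specific.
    most_specific = [
        t for t in broader
        if not any(closure(target[t]) < closure(target[u]) for u in broader)
    ]
    return "broader", most_specific

# Assumed canonical expressions (illustrative only).
DEWEY = {
    "387.1 Ports": {"Traffic station", "Water transport"},
    "386.8 Inland waterway tr. > Ports": {"Traffic station", "Inland water transport"},
}
LCSH = {
    "Shipping": {"Water transport"},
    "Harbors": {"Traffic station", "Water transport"},
}

print(map_term("387.1 Ports", DEWEY, LCSH))
print(map_term("386.8 Inland waterway tr. > Ports", DEWEY, LCSH))
```

Under these assumptions the first call finds an exact match and the second falls back to the most specific broader heading. A real hub would need a description logic reasoner to handle relationships, not just conjunctions of atoms.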

Mapping through a Hub

Dewey
  • 387 Water, air, space transportation
  • 386 Inland waterway & ferry transportation
  • 387.5 Ocean transportation
  • 386.8 Inland waterway tr. > Ports
  • 387.1 Ports

Hub
  • Traffic station
  • Vehicle parking
  • Terminal facilities
  • Water transport
  • Inland water transport
  • Ocean transport
  • Traffic station ⊓ Water transport
  • By type of water transport
    • Traffic station ⊓ Inland water tr.
    • Traffic station ⊓ Ocean transport
  • By component of traffic station
    • Vehicle parking ⊓ Water transport
    • Terminal facilities ⊓ Water transport

LCSH/AAT
  • Shipping
  • water transport
  • Inland water transport
  • Merchant marine
  • Harbors
  • ports
  • harbors

Mapping through a Hub

LCC
  • TL681.S6 Airplanes. Soundproofing
  • VM367.S6 Submarines. Soundproofing

Hub
  • L17 Vehicles ⊓ L33 Air transport ⊓ R37 Soundproofing
  • L17 Vehicles ⊓ L37 Water transport ⊓ R37 Soundproofing
  • L17 Vehicles ⊓ L37 Water transport ⊓ R37 Soundproofing ⊓ T73 Military ⊓ Underwater

LCSH
  • Aeroplanes-Soundproofing
  • Ships-Soundproofing

Mapping user queries

User query
  • Free text
  • Combination of elemental concepts through facets (guided query formulation)
  • Controlled term(s) from a KOS, possibly found through browsing a KOS

Hub
  • Canonical form of query (DL formula)

Final query
  • (Enriched) free text query
  • Query in terms of a KOS

Examples from NALT, LCSH, DDC, and SWD
  • NALT National Agricultural Library Thesaurus
  • LCSH Library of Congress Subject Headings
  • DDC Dewey Decimal Classification
  • SWD Schlagwortnormdatei
Mapping through a Hub

LCSH
  • Air – Pollution
  • Laws and regulations
  • Air – Pollution – Laws and regulations

Hub
  • [isa] Condition [isConditionOf] Air [causedBy] Pollutant [property] Undesirable
  • [isa] Legal rule
  • [isa] Legal rule [appliedTo] {[isa] Condition [isConditionOf] Air [causedBy] Pollutant [property] Undesirable}

NALT
  • Air pollution
  • Laws and regulations
  • Air pollution AND Laws and regulations

Mapping through a Hub

DDC
  • 363.739 2 Air pollution
  • 340 Law
  • 344.046 342 Air pollution [Law]
  • 363.739 26 Air pollution rights

Hub
  • [isa] Condition [isConditionOf] Air [causedBy] Pollutant [property] Undesirable
  • [isa] Legal rule
  • [isa] Legal rule [appliedTo] {[isa] Condition [isConditionOf] Air [causedBy] Pollutant [property] Undesirable}
  • [isa] International treaty [appliedTo] {[isa] Condition [isConditionOf] Air [causedBy] Pollutant [property] Undesirable}
  • [isa] Rights [appliedTo] {[isa] Condition [isConditionOf] Air [causedBy] Pollutant [property] Undesirable}

SWD
  • Luftverschmutzung (air pollution)
  • Gesetz (law)
  • ???
  • Übereinkommen über weiträumige grenzüberschreitende Luftverschmutzung (Convention on Long-Range Transboundary Air Pollution)
  • Umweltzertifikat (environmental certificate)

Soil moisture vs. Soil water

LCSH term

Soil moisture

[isa] Water [containedIn] Soil

NALT term

Soil water

[isa] Water [containedIn] Soil

Mapping LCSH ▬► NALT

Soil moisture ▬► Soil water
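
The exact-match case can be sketched in Python as follows. The sketch assumes that a flat expression in the `[role] Filler` notation can be treated as a set of (role, filler) pairs; the regular expression `PAIR` and the functions `parse` and `exact_matches` are hypothetical helpers, and nested expressions in braces are not handled.

```python
import re

# One "[role] Filler" pair; fillers may contain spaces but no brackets or braces.
PAIR = re.compile(r"\[(\w+)\]\s*([^\[\]{}]+)")

def parse(expression):
    """Parse a flat '[role] Filler [role] Filler' string into a set of (role, filler) pairs."""
    return frozenset((role, filler.strip()) for role, filler in PAIR.findall(expression))

def exact_matches(term, source, target):
    """Target terms whose canonical expression is identical to that of `term`."""
    return [t for t, expr in target.items() if expr == source[term]]

lcsh = {"Soil moisture": parse("[isa] Water [containedIn] Soil")}
nalt = {"Soil water": parse("[isa] Water [containedIn] Soil")}

print(exact_matches("Soil moisture", lcsh, nalt))
```

Because both terms reduce to the same set of pairs, the different wording of the two headings plays no part in the match.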

Greenhouse gardening

LCSH term

Greenhouse gardening

[isa] Gardening [inEnvironment] Greenhouse [inEnvironment] Home

NALT terms

Home gardening

[isa] Gardening [inEnvironment] Home

Greenhouse

[isa] Greenhouse

Mapping LCSH ▬► NALT

Greenhouse gardening ▬► Home gardening AND Greenhouse
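
A mapping to a combination of target terms can be sketched as a cover problem. The Python sketch below is a deliberate simplification: it compares only the atomic concepts (fillers) and ignores the roles, and it uses a greedy cover. The function names `atoms` and `decompose` and the extra NALT entry are assumptions for illustration.

```python
def atoms(expression):
    """Atomic concepts (fillers) used in an expression, ignoring the roles."""
    return {filler for _role, filler in expression}

def decompose(source_expr, target):
    """Greedy cover: pick target terms whose atoms all occur in the source."""
    remaining = atoms(source_expr)
    candidates = {t: atoms(e) for t, e in target.items() if atoms(e) <= remaining}
    chosen = []
    while remaining and candidates:
        best = max(candidates, key=lambda t: len(candidates[t] & remaining))
        if not candidates[best] & remaining:
            break  # no candidate covers anything that is still missing
        chosen.append(best)
        remaining -= candidates.pop(best)
    return chosen, remaining

greenhouse_gardening = [
    ("isa", "Gardening"),
    ("inEnvironment", "Greenhouse"),
    ("inEnvironment", "Home"),
]
NALT = {
    "Home gardening": [("isa", "Gardening"), ("inEnvironment", "Home")],
    "Greenhouse": [("isa", "Greenhouse")],
    "Soil water": [("isa", "Water"), ("containedIn", "Soil")],  # unrelated entry
}

print(decompose(greenhouse_gardening, NALT))
```

The second element of the result lists any atomic concepts that no target term covers; an empty set means the AND-combination expresses the source term completely under this simplification.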

Salad greens

LCSH term

Salad greens

[isa] Green leafy vegetable [usedFor] Salad

NALT term

Green leafy vegetables

[isa] Green leafy vegetable

Mapping LCSH ▬► NALT

Salad greens ▬► BT Green leafy vegetables
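
The broader-term case can be sketched by comparing expressions as sets of (role, filler) pairs: a target that states only some of the source's constraints is broader. This is a minimal Python sketch under that assumption; `relation` is a hypothetical helper.

```python
def relation(source_expr, target_expr):
    """Relation of the target term to the source term, or None if neither contains the other."""
    source, target = frozenset(source_expr), frozenset(target_expr)
    if source == target:
        return "="
    if target < source:
        return "BT"  # target has fewer constraints, so it is broader
    if source < target:
        return "NT"  # target has more constraints, so it is narrower
    return None

salad_greens = [("isa", "Green leafy vegetable"), ("usedFor", "Salad")]
green_leafy_vegetables = [("isa", "Green leafy vegetable")]

print(relation(salad_greens, green_leafy_vegetables))
```

The same test shows why the choice of canonical expression matters: adding or omitting a single pair in the target expression switches the result between an exact match and a broader-term mapping.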

Emerging diseases

LCSH term

Emerging infectious diseases

[isa] Disease [hasProperty] Infectious [hasProperty] Emerging

NALT term

Emerging diseases

[isa] Disease [hasProperty] Infectious ??? [hasProperty] Emerging

Mapping LCSH ▬► NALT ???

Emerging infectious diseases ▬► Emerging diseases

Emerging infectious diseases ▬► BT Emerging diseases

Mapping through a Hub

DDC
  • 331.4 Women workers

Hub
  • [isa] Worker [hasGender] Female
  • [isa] Worker [hasGender] Female [hasStatus] Employee
  • [isa] Worker [hasGender] Female [hasStatus] Employee [hasPayStatus] HourlyPay
  • [isa] Worker [hasGender] Female [hasStatus] Employee [hasPayStatus] HourlyPay [hasQualification] Unskilled
  • [isa] Worker [hasGender] Female [hasStatus] Employee [hasPayStatus] HourlyPay [hasQualification] Skilled
  • [isa] Worker [hasGender] Female [hasStatus] Employee [hasPayStatus] Salaried
  • [isa] Work BeingDone [executedBy] {Worker [hasGender] Female}

SWD
  • Arbeitnehmerin (female employee)
  • Arbeiterin (female worker)
  • Ungelernte Arbeiterin (unskilled female worker)
  • Hilfsarbeiterin (female laborer)
  • Facharbeiterin (skilled female worker)
  • Angestellte (female salaried employee)
  • Frauenarbeit (women's work)


Knowledge base for query formulation

Physician = [isa] Worker [profLevel] Doctoral [domain] Medicine

Oncologist = [isa] Worker [profLevel] Doctoral [domain] Oncology

Ophthalmologist = [isa] Worker [profLevel] Doctoral [domain] Ophthalmology

Physician ST Doctor

Ophthalmologist ST Eye doctor

Medicine BT Health care

[isa] Worker [profLevel] Doctoral BT Professional

Income ST Earnings

Income NT Compensation

Compensation ET Pay

Compensation NT Wages

Fee schedule [usedBy] {Insurance company [domain] Health care} <influences> Compensation [receivedBy] Physician

Mapping user queries

User query
  • Doctor's pay

Hub
  • Compensation [receivedBy] Physician

Final query
  • (Enriched) free text query:

[(Physician OR Doctor OR Oncologist OR Ophthalmologist OR (Professional AND (Medicine OR "Health care" OR Oncology OR Ophthalmology))) AND
(Pay OR Earnings OR Compensation OR Wages OR Income)]
OR
[("fee schedule" OR fee) AND ("health insurance" OR "Blue Cross" OR Medicare OR Medicaid)]
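
How a knowledge base could enrich a free-text query can be sketched in Python. The sketch covers only part of the expansion shown above: it adds synonyms and narrower terms, and it does not add broader terms or the fee-schedule alternative. The dictionaries and the functions `expand` and `boolean_query` are assumptions for illustration; treating Oncologist and Ophthalmologist as narrower than Physician is inferred from their definitions in the knowledge base, not stated there.

```python
# Synonym-type relationships (ST, ET) taken from the knowledge base slide.
SYNONYMS = {
    "Physician": ["Doctor"],
    "Ophthalmologist": ["Eye doctor"],
    "Compensation": ["Pay"],
}
# Narrower terms (NT), plus the assumed narrower professions.
NARROWER = {
    "Physician": ["Oncologist", "Ophthalmologist"],
    "Compensation": ["Wages"],
}

def expand(term):
    """Collect a term, its synonyms, and all narrower terms with their synonyms."""
    terms = [term] + SYNONYMS.get(term, [])
    for narrower in NARROWER.get(term, []):
        terms += expand(narrower)
    return terms

def boolean_query(concepts):
    """OR together the expansion of each concept, then AND the groups."""
    groups = []
    for concept in concepts:
        quoted = ['"%s"' % t if " " in t else t for t in expand(concept)]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)

# Canonical form of "Doctor's pay": Compensation [receivedBy] Physician
print(boolean_query(["Physician", "Compensation"]))
```

The role `[receivedBy]` is reduced to a plain AND here, which is the usual loss when a structured expression is turned into a free-text query.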

Examples from the realm of AAT Taiwan

AAT Art and Architecture Thesaurus (Getty)

AAT Taiwan TELDAP, Institute for Information Science Academia Sinica

TGM Thesaurus of Graphic Materials, Library of Congress

E-HowNet A Lexical Knowledge Base for Semantic Composition, Academia Sinica

Mapping through a Hub

TGM
  • temples
  • synagogues
  • churches
  • mosques
  • Buddhist temples
  • Taoist temples

Hub
  • Facility ⊓ Worship
  • Facility ⊓ Worship ⊓ Judaism
  • Facility ⊓ Worship ⊓ Christianity
  • Facility ⊓ Worship ⊓ Islam
  • Facility ⊓ Worship ⊓ Buddhism
  • Facility ⊓ Worship ⊓ Taoism

AAT / Chinese
  • temples (buildings)
  • synagogues (buildings)
  • churches (buildings)
  • mosques (buildings)
  • 禪寺
  • 道觀

Mapping to Chinese
  • Use E-HowNet formal semantic expressions
  • Use terms that already exist in E-HowNet
  • Add terms using computer-assisted derivation of semantic expressions as described later for English

E-HowNet ontology 廣義知識知識本體
  • Building| 建築物

Facilities |設施

Chinese Word: 廟

English: Temple

Conceptual expression: {facilities |設施: domain = {religion |宗教}}

Chinese Word: 禪寺

English: Buddhist temple

Conceptual expression: {facilities |設施: domain = {Buddhist |佛教}}

Chinese Word: 道觀

English: Taoist temple / Taoist guan

Conceptual expression: {facilities |設施: domain = {Taoism |道教}}
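
Conceptual expressions of this shape can be read into a structured form with a few lines of Python. This is a sketch that handles only the single-feature pattern shown on this slide, `{head |中文: feature = {value |中文}}`; it is not a parser for E-HowNet in general, and `PATTERN`, `parse`, and `words_with` are hypothetical names.

```python
import re

# Matches only the pattern {head_en |head_zh: feature = {value_en |value_zh}}.
PATTERN = re.compile(
    r"\{\s*(?P<head_en>[^|{}]+?)\s*\|\s*(?P<head_zh>[^:{}]+?)\s*:"
    r"\s*(?P<feature>\w+)\s*=\s*"
    r"\{\s*(?P<value_en>[^|{}]+?)\s*\|\s*(?P<value_zh>[^{}]+?)\s*\}\s*\}"
)

def parse(expression):
    """Return the named parts of a single-feature conceptual expression."""
    match = PATTERN.fullmatch(expression.strip())
    if match is None:
        raise ValueError("unsupported expression: %r" % expression)
    return match.groupdict()

ENTRIES = {
    "廟": "{facilities |設施: domain = {religion |宗教}}",
    "禪寺": "{facilities |設施: domain = {Buddhist |佛教}}",
    "道觀": "{facilities |設施: domain = {Taoism |道教}}",
}

def words_with(feature, value_en):
    """Chinese words whose expression has the given feature value."""
    return [
        word for word, expression in ENTRIES.items()
        if parse(expression)["feature"] == feature
        and parse(expression)["value_en"] == value_en
    ]

print(parse(ENTRIES["禪寺"]))
print(words_with("domain", "Taoism"))
```

Once the expressions are structured, the head and feature values can be aligned with atomic concepts of the hub, which is the step that makes the Chinese terms mappable.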

Examples of deriving canonical expressions
  • Creating canonical expressions is key
  • Start out with some examples

Distributed implementation
  • Key principle: Canonical expressions can be created locally; the hub places each concept in a global structure
  • The person or algorithm producing canonical expressions needs to know only the core classification, not the structure of the often large KOS to be mapped

Distributed implementation
  • Ideally, use one central faceted classification of core concepts, but multiple mapped core classifications could be used
  • The central core classification is extensible and should be continuously updated by many contributors
  • The central core classification must be able to express shades of meaning and, in the long run, usage information

Distributed implementation
  • A KOS could assign canonical expressions to its concepts; let's call this a semantically enhanced KOS, or SEKOS
  • It is now a simple matter to map from any SEKOS to any other (somewhat dependent on the core classifications used)
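
Why mapping between SEKOS becomes simple can be shown with a small Python sketch: each KOS is annotated once with canonical expressions, the hub indexes all terms by expression, and any pair of KOS can then be mapped by lookup, with no pairwise mapping tables. The sketch handles exact matches only, and `build_hub` and `map_between` are hypothetical names. The expressions are those given for the LCSH and NALT examples.

```python
AIR_POLLUTION = [
    ("isa", "Condition"), ("isConditionOf", "Air"),
    ("causedBy", "Pollutant"), ("property", "Undesirable"),
]
SOIL_WATER = [("isa", "Water"), ("containedIn", "Soil")]

# Each SEKOS: term -> canonical expression as (role, filler) pairs.
SEKOS = {
    "LCSH": {"Soil moisture": SOIL_WATER, "Air – Pollution": AIR_POLLUTION},
    "NALT": {"Soil water": SOIL_WATER, "Air pollution": AIR_POLLUTION},
}

def build_hub(sekos_by_name):
    """Index every term of every SEKOS under its canonical expression."""
    hub = {}
    for kos, terms in sekos_by_name.items():
        for term, expression in terms.items():
            hub.setdefault(frozenset(expression), {}).setdefault(kos, []).append(term)
    return hub

def map_between(hub, sekos_by_name, source, term, target):
    """Terms of `target` with the same canonical expression as `term` in `source`."""
    expression = frozenset(sekos_by_name[source][term])
    return hub.get(expression, {}).get(target, [])

hub = build_hub(SEKOS)
print(map_between(hub, SEKOS, "LCSH", "Soil moisture", "NALT"))
print(map_between(hub, SEKOS, "NALT", "Air pollution", "LCSH"))
```

Adding a third SEKOS means adding one more entry to `SEKOS`; mappings to and from every other participating system follow from the shared index.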

Efficient creation of canonical expressions
  • Apply existing knowledge: large knowledge base ▬► less effort for processing a new KOS
  • Use knowledge of KOS structure for hierarchical inheritance
  • Use linguistic analysis of terms and captions
  • Eliminate redundant atomic concepts
  • Check or produce mapping results from assignment of concepts to the same records
  • Get human editors’ input and verification where needed through a user-friendly interface. Crowdsourcing, one term at a time
  • KOS “owners” may verify and edit data pertaining to their KOS
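
Hierarchical inheritance, one of the techniques listed above, can be sketched in Python: a class receives the atomic concepts of its ancestors plus those derived from its own caption, so only the difference has to be supplied for each class. The parent links, the atoms assigned to each class, and the function `inherit` are assumptions for illustration.

```python
def inherit(parents, own_atoms):
    """Expression of each class = its own atoms plus those inherited from its ancestors."""
    def resolve(cls):
        atoms = set(own_atoms.get(cls, ()))
        parent = parents.get(cls)
        if parent is not None:
            atoms |= resolve(parent)
        return atoms
    return {cls: resolve(cls) for cls in own_atoms}

# Assumed fragment of the DDC hierarchy and the atoms found in each caption.
PARENTS = {"386.8": "386"}
OWN_ATOMS = {
    "386": {"Inland water transport"},
    "386.8": {"Traffic station"},  # caption "Ports" contributes only this atom
}

expressions = inherit(PARENTS, OWN_ATOMS)
print(sorted(expressions["386.8"]))
```

Inheritance of this kind is a heuristic, since KOS hierarchies are not always strictly generic, so the resulting expressions would still go to human editors for verification where needed.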

Knowledge base

Requires an ever larger classification and lexical knowledge base containing many kinds of data:

1. A faceted classification of atomic concepts, seeded from sources with well-developed facets such as
  • UDC
  • the Alcohol and Other Drug (AOD) Thesaurus
  • the Harvard Business Thesaurus
  • the Art and Architecture Thesaurus
  • various systems called ontologies

Knowledge base 2

Requires an ever larger classification and lexical knowledge base containing many kinds of data:

2. Linguistic knowledge bases such as WordNet, E-HowNet (Chinese), FrameNet, and mono-, bi-, and multi-lingual dictionaries and thesauri

3. Many KOS (Knowledge Organization Systems), such as LCC, UDC, DDC, DMOZ directory, LCSH, Schlagwortnormdatei, MeSH and UMLS, AGROVOC, Gene Ontology

4. These will over time be fused into one large multilingual knowledge base with many terminological and translation relationships and relationships linking terms to concepts, with an increasing number of concepts semantically represented by a canonical expression. One database: Intellectual, not physical. Could be in Linked Data

Take-home message

It is time to unify many disparate mapping efforts on a sound semantic footing

Dagobert Soergel

dsoergel@buffalo.edu

www.dsoergel.com

Air pollution laws

LCSH term

Air – Pollution – Laws and regulations

[isa] Legal rule [appliedTo] {[isa] Condition [isConditionOf] Air [causedBy] Pollutant [property] Undesirable}

NALT terms

Air pollution

[isa] Condition [isConditionOf] Air [causedBy] Pollutant [prop.] Undesirable

Laws and regulations

[isa] Legal rule

Mapping LCSH ▬► NALT

Air – Pollution – Laws and regulations ▬► Air pollution AND Laws and regulations

Interpretation for indexing and searching in both directions


Means

Create a comprehensive knowledge base relating many classification schemes and subject heading lists used in libraries and in other contexts (LCC, DDC, DMOZ directory, LCSH, European schemes).

Use combinations of atomic concepts taken from a well-structured underlying faceted classification to represent the meaning of classes and subject headings.

This project will achieve the following:
  • Interoperability between any two participating Knowledge Organization Systems (KOS) (to the extent the two schemes allow)
  • Facet-based search
    • for any collection indexed by a participating KOS
    • for free-text search
  • Assistance in cataloging (metadata creation) by catalogers or users (social tagging)
  • Long-range goal: Web service where a KOS can be uploaded and mappings to specified target KOS are returned

Why might this work
  • Principled, powerful, concept-based intellectual technology. Long-established ideas revived using modern linguistic and AI methods
  • Large scale creates synergy
    • As more KOS participate in the system, processing a new KOS requires less effort
    • Unified access to mapping data from many projects. Need to talk to MACS, CrissCross, Stitch, OAEI (Ontology Alignment Evaluation Initiative), etc. about importing their mapping data
  • Configured as a Semantic Web Resource

  • We will seed the faceted classification from many sources that have well-developed facets, such as the AOD Thesaurus, the Harvard Business Thesaurus (if it is made available), the Art and Architecture Thesaurus (some facets have mainly elemental concepts), and various ontologies to be identified. Will consult CIDOC-CRM for structure

  • The internal data format will be rich to deal with any kind of information on concepts and terms that will be useful
  • Will keep very detailed track of sources
  • Will keep track of access rights
  • Will import and export to any format required, especially to SKOS and OWL. Note: Many of these formats are limited and will not preserve all information available in the proposed system
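
An export of mapping results to SKOS could look like the following Python sketch, which writes Turtle using the standard SKOS mapping properties `skos:exactMatch`, `skos:broadMatch`, and `skos:narrowMatch`. The `example.org` namespaces, the local names such as `lcsh:SoilMoisture`, and the function `to_turtle` are placeholders invented for illustration; they are not real identifiers of LCSH or NALT.

```python
# Mapping relation -> SKOS mapping property.
SKOS_PROPERTY = {
    "=": "skos:exactMatch",
    "BT": "skos:broadMatch",   # the target concept is broader than the source
    "NT": "skos:narrowMatch",  # the target concept is narrower than the source
}

def to_turtle(mappings, prefixes):
    """Serialize (source, relation, target) triples as SKOS mapping statements in Turtle."""
    lines = ["@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."]
    lines += ["@prefix %s: <%s> ." % (p, uri) for p, uri in prefixes.items()]
    lines.append("")
    for source, relation, target in mappings:
        lines.append("%s %s %s ." % (source, SKOS_PROPERTY[relation], target))
    return "\n".join(lines)

# Placeholder namespaces and local names (assumptions, not real URIs).
PREFIXES = {
    "lcsh": "http://example.org/lcsh/",
    "nalt": "http://example.org/nalt/",
}
MAPPINGS = [
    ("lcsh:SoilMoisture", "=", "nalt:SoilWater"),
    ("lcsh:SaladGreens", "BT", "nalt:GreenLeafyVegetables"),
]

output = to_turtle(MAPPINGS, PREFIXES)
print(output)
```

The export carries only the resulting mapping relations. The canonical expressions from which they were derived have no place in these statements, which illustrates the loss of information noted above.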

Koeln 20090706
  • Themes
  • Role indicators for building themes
  • arrangement of themes for exploration under user control
  • carry-over from citation order
  • Practical problem of connection to the participating systems – should use IDs for combinations in Hub. Make sure that hub stays consistent with participating systems.

Mapping Issues 1: Terms related to Chinese religious concepts

The word “temples” is frequently treated as equivalent to the Chinese term 廟 (miao).

However, the names of religious buildings in Taiwan vary with the purpose of the building and the spirits worshipped in it.

Temples (buildings) (religious buildings, <religious structures>, ... Built Environment (Hierarchy Name))

Note: Buildings housing places devoted to the worship of a deity or deities. In the strictest sense, it refers to the dwelling place of a deity, and thus often houses a cult image. In modern usage a temple is generally a structure, but it was originally derived from the Latin "templum" and historically has referred to an uncovered place affording a view of the surrounding region. For Christian or Islamic religious buildings the terms "churches" or "mosques" are generally used, but an exception is that "temples" is used for Protestant, as opposed to Roman Catholic, places of worship in France and some French-speaking regions.

Q1. The mapping team has found that “temples” in AAT is broader than the corresponding concepts in Chinese. It is therefore necessary to distinguish the meanings of the individual Chinese terms before mapping.

Mapping Issues 1: Terms related to Chinese religious concepts

  • Despite their similar appearance, each differs slightly from the others.
  • miao (廟): Originally a place to worship ancestors. Since the Han dynasty, it has been used as a place to worship both ancestors and spirits.
  • ci (祠): Built to worship or commemorate saints, famous scholars, poets, or people of great achievement. Sometimes also refers to places where ancestors are worshipped.
  • si (寺): Generally refers to a place where Buddhist spirits are worshipped. Sometimes also refers to the place where Buddhist monks live.
  • an (庵): Formerly referred to a scholar's study (書齋). Nowadays it refers to the place where Buddhist nuns live.
  • guan (觀): Refers only to Taoist buildings.
  • yan (巖): Refers to miao (廟) established on or near a mountain.

Mapping Issues 2: A Chinese set term stands for a broader meaning

  • 文玩 (wenwan)
  • A word combining 文物 (cultural object) and 古玩 (antique curio).
  • (文玩兼有文物與古玩的特點)
  • It refers specifically to objects used in a scholar's study, including writing equipment, small tools, and decorations.
  • (特指文人書齋中的書寫設備、小工具和擺飾)
  • It represents the culture of the study, combining the practical objects used by scholars with art objects made for appreciation.
  • (文玩是種書齋文化,結合了文人書生的實用器物與具觀賞價值的藝術品)
  • Common objects include ink stones, seals, washing vessels, finely sculpted decorations, etc. Elegance and exquisiteness are its essential characteristics.
  • (文玩為以下器物的泛稱: 古硯、印章、洗器、牙雕…等,“雅” 與“巧”是其基本特徵 )
  • The objects are produced in a highly artistic manner. Nowadays they have become popular collectibles, valued more as artifacts than as equipment.
  • (以高藝術性的方式製造,現今多為賞而勿用的文房珍玩)

Mapping Issues 2: A Chinese set term stands for a broader meaning

  • olive stone boat sculpture 果核小舟
  • seal 鴛錦雲章循連環田黃石印
  • blue snuff bottle 藍地金星套料鼻煙壺
  • ivory desk tidy 象牙雕山水人物筆筒
  • lotus pod shaped vessel for injecting water 雙蓮房水注
  • banana leaf shaped wooden plate 癭木蕉葉盤
  • lotus leaf shaped washing vessel 白玉荷葉式洗

Mapping Issues 2: A Chinese set term stands for a broader meaning

desk sets (sets (groups), <object groupings by general context>, ... Object Groupings and Systems)

Note: Sets of matching articles intended to be used on a desk including such articles as inkstands, pen trays, and stamp boxes.

Q2. The mapping team has found that the meaning of Wenwan is broader than that of the term “desk sets”, although the two partly coincide. The two terms are therefore inexact equivalents.

Is it more suitable to create a new term “Wenwan” in the structure, or should it be treated as equivalent to desk sets?

When English terms have broader meanings (1/2)

EX1:

ID: 300053660 Record Type: concept

stitching (<processes and techniques by specific type>, <processes and techniques>, Processes and Techniques)

Note: Refers to the process of fastening, joining, closing, uniting, mending, or creating ornamentation by stitches, which are the portions of thread left in fabric or another material by the in and out movement of a threaded needle through the thickness or surface of the material, or the loops of thread created on a needle in knitting or other needlework. In the context of textiles and needleworking, its meaning overlaps with "sewing." In the context of bookbinding, it refers to the fastening together a number of leaves or gatherings by passing the thread or wire through all of the sheets at once; it is distinct from "sewing," which, in the context of bookbinding, is used for the joining of leaves or gatherings together one by one by drawing thread or wire backwards and forwards through the back fold of each sheet to attach it to the cords.

縫綴/縫訂(<依特定種類區分之過程與技術>, <過程與技術>,過程與技術)

範圍註:意指藉由針線進出穿過材料或其表面的動作,將針腳留在布料或其他材料上,或是在編織或針織時形成針目,以固定、結合、閉合、合併、修補或製作裝飾的過程。若指涉的是紡織品與手工繡品方面,則其意義與「縫紉 (sewing)」一詞重疊。若指涉的是書籍裝幀方面,則意指將若干頁面或疊層,用線或金屬線一次穿過所有紙張固定在一起。而「線訂(sewing)」在書籍裝幀方面,是指用針線或金屬線,在一疊書頁的摺縫處上下穿梭,使其與裝訂線固定的方法。

In different contexts (bookbinding vs. needleworking), the meaning of stitching may change accordingly. In AAT, two kinds of meanings are explained in the same record, but when translating the term into Chinese, there will be two ways of translation, 縫合 (feng he) for needleworking and 縫訂 (feng ding) for bookbinding. The same problem occurs in the record of sewing (ID: 300053658).

Stitching in bookbinding

Stitching in needleworking

When English terms have broader meanings (2/2)

EX2:

300004184 Record Type: concept

patios (<uncovered spaces>, <rooms and spaces by form>, ... Components (Hierarchy Name))

Note: Paved recreation areas adjoining contemporary houses and the paved interior courts of Spanish or Spanish-style buildings.

The term refers to two types of open spaces, so the translations could be 屋外休憩區 or (西班牙)內院.

Patio adjoining a house

Spanish patio

When English terms have broader meanings (2/2)

EX3:

300266238

Record Type: concept

maculatures (<prints by process or technique>, prints (visual works), ... Visual and Verbal Communication)

Note: Prints made by taking a second impression without reinking the plate, often used for cleaning the plate. May also refer to blotting paper. Also used for scrap paper that can reinforce fabric in Medieval embroidery.

The term maculatures can be used in three different contexts (prints, blotting paper, and scrap paper), and there are three corresponding translations (吸墨紙版畫、吸墨紙、固定刺繡布料的紙片).

Q3: In this case the record contains multiple meanings, so the issue is not which translation is the preferred term. How should the Chinese translations be displayed?