Semantic Enrichment of Mappings

Patrick Arnold

WDI-Lab, Abteilung für Datenbanken, Universität Leipzig


Outline

Abteilung für Datenbanken, Inst. für Informatik, Universität Leipzig

1. Motivation

2. Goals

3. Related Work

4. Determining the Relation Type

5. Implementation

6. First Results

7. Conclusions


1. Motivation

  • Classic approaches in schema/ontology matching provide only little information about each correspondence:

    • Source node

    • Target node

    • Confidence

  • Further details are commonly omitted

    • What kind of relation?

      • equal, is-a, part-of, overlap

    • Simple correspondence vs. complex correspondence?

      • (first name, last name) ↔ name

    • Transformation functions?

      • gross price = net price * (1 + sales taxes)

      • name = first name + “ “ + last name
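The two transformation functions above can be written down directly. The following is a minimal sketch; the function names `gross_price` and `full_name` and the use of `Decimal` for currency values are assumptions made for illustration, not part of the presentation.

```python
from decimal import Decimal


def gross_price(net_price: Decimal, sales_tax_rate: Decimal) -> Decimal:
    """gross price = net price * (1 + sales taxes), with the tax given as a rate (0.19 = 19 %)."""
    return net_price * (Decimal(1) + sales_tax_rate)


def full_name(first_name: str, last_name: str) -> str:
    """name = first name + " " + last name (a complex 2:1 correspondence)."""
    return first_name + " " + last_name


if __name__ == "__main__":
    # Example values are made up for illustration.
    print(gross_price(Decimal("100.00"), Decimal("0.19")))
    print(full_name("Ada", "Lovelace"))
```

An enriched mapping could attach such a function to a correspondence so that a later schema transformation step can apply it.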


1. Motivation

  • Our intentions: Mapping enrichment

    • Enhance a mapping by adding further or more-specific information to its correspondences

    • Useful for merging and transforming schemas/ontologies

  • Workflow:

    • Input: A mapping

    • Mapping enrichment is carried out in an independent system (black box)

    • Output is an enriched mapping

      • Implies a new, more-specific format


1. Motivation

  • Typical relation types

    • Equal

    • Is-a

    • Part-of

    • Overlap

  • Inverse types:

    • Equal

    • Inverse is-a

    • Has-a

    • Overlap
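The pairing of relation types with their inverses can be captured in a small lookup table. This is a sketch under assumptions: the names `RelationType`, `INVERSE` and `invert` are hypothetical, and the pairing simply follows the order of the two lists above.

```python
from enum import Enum


class RelationType(Enum):
    EQUAL = "equal"
    IS_A = "is-a"
    INVERSE_IS_A = "inverse is-a"
    PART_OF = "part-of"
    HAS_A = "has-a"
    OVERLAP = "overlap"


# Equal and overlap are symmetric; is-a and part-of each have a distinct inverse.
INVERSE = {
    RelationType.EQUAL: RelationType.EQUAL,
    RelationType.IS_A: RelationType.INVERSE_IS_A,
    RelationType.INVERSE_IS_A: RelationType.IS_A,
    RelationType.PART_OF: RelationType.HAS_A,
    RelationType.HAS_A: RelationType.PART_OF,
    RelationType.OVERLAP: RelationType.OVERLAP,
}


def invert(source: str, target: str, relation: RelationType):
    """Swap source and target and replace the relation by its inverse."""
    return target, source, INVERSE[relation]
```

Reading a correspondence in the opposite direction then amounts to one call, for example `invert("cookbook", "book", RelationType.IS_A)`.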


2. Goals

  • First Focus: Detecting the relation type of a correspondence

    • Investigate linguistic methods on element level

    • Can be extended with existing strategies

    • Relation types considered: equal, is-a, inverse is-a

  • Later…

    • Relation type detection on instance level

    • Exploiting background knowledge

    • Correspondence type, transformation rules, …


3. Related Work

  • Several projects dealing with this problem

  • Mainly based on the following methods:

    • Using dictionaries, thesauri, corpora

      • WordNet, GermaNet

      • Includes tokenization, normalization of strings etc.

    • Using background knowledge

      • The Open University: Using Swoogle to retrieve multiple ontologies referring to a concept

    • Exploiting the structure between ontologies

    • Exploiting Reasoning, Bayes Nets, Feature Vectors etc.

    • Search Engines (Google)


3. Related Work

  • S-Match

    • Complex strategy using WordNet to determine the following relations:

      • Equal, more-general, less-general, overlap, mismatch

      • “Overlap” offers little useful information (concepts are somehow related…)

    • Approach: For each word in a label, annotate all meanings of this word found in WordNet

      • Compare/match the meanings of the words

      • Exploit the relations offered by WordNet


3. Related Work

  • TaxoMap

    • Focus on geographic ontologies

    • Detects the relations equal, is-a, inverse is-a and is-close

      • Focus rather on the correspondence itself, not on the type

      • Is-a relation if a label in node S appears in node T and is a full word

    • Use WordNet as additional source

      • Working on manually pre-defined branches of WordNet instead of the entire thesaurus

      • Useful for domain-specific ontologies

      • Recall: 23 %, Precision: 83 %
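The full-word rule mentioned above ("a label in node S appears in node T and is a full word") can be approximated with a word-boundary check. This is only a sketch of that one rule, not TaxoMap's actual implementation; the helper name `contains_as_full_word` is an assumption.

```python
import re


def contains_as_full_word(label: str, other: str) -> bool:
    """True if `label` occurs inside `other` as a complete word, not as a mere substring."""
    short, long_ = label.lower(), other.lower()
    if short == long_:
        return False  # identical labels point to "equal", not to is-a
    pattern = r"\b" + re.escape(short) + r"\b"
    return re.search(pattern, long_) is not None


if __name__ == "__main__":
    # "lake" is a full word in "mountain lake"; "tool" is only a substring of "stool".
    print(contains_as_full_word("lake", "mountain lake"))
    print(contains_as_full_word("tool", "stool"))
```

Under this reading, the longer label would be the more specific concept; which of the two nodes plays which role is left open here.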


3. Related Work

  • LogMap

    • Uses reasoning algorithms to repair/discover mappings

      • Based on Horn logic and the Dowling-Gallier algorithm

      • Use background knowledge (thesauri)

    • Detects full correspondences and weak correspondences

    • No specific relation detection per se


4. Relation Type Determination
4.1 Introduction

  • Typically, there is no link between the syntax and semantics of words

    • stool, chair, seat… refer to the same object

    • stool, school, tool, pool, wool… have nothing in common!

  • Things change when it comes to compounds…

    • blackbird is a bird

    • high school is a school
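The gap between spelling and meaning can be made visible with a plain string similarity. The sketch below uses `difflib.SequenceMatcher` from the Python standard library as one possible measure; the helper name `string_similarity` is an assumption.

```python
from difflib import SequenceMatcher


def string_similarity(a: str, b: str) -> float:
    """Character-based similarity in [0, 1]; it knows nothing about meaning."""
    return SequenceMatcher(None, a, b).ratio()


if __name__ == "__main__":
    for a, b in [("stool", "chair"), ("stool", "seat"), ("stool", "tool"), ("stool", "school")]:
        print(f"{a} / {b}: {string_similarity(a, b):.2f}")
```

The semantically related pair stool / chair shares no characters at all, while the unrelated pair stool / tool scores high, which is why spelling alone is a poor indicator for single words.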


4. Relation Type Determination
4.1 Introduction

  • Compound: Two words A, B of a language form a new word AB

    • apple + tree → apple tree

    • sun + glasses → sunglasses

    • forth + with → forthwith

  • A, B can be noun, verb, adjective/adverb, preposition

    • We are normally interested in nouns


4. Relation Type Determination
4.1 Introduction

  • Not compounds:

    • Compositions AB where A (or B) is not an actual word

      • broom, nausea

    • Derivations

      • discard, unload, increase, compound

    • Compositions AB where A and B are not semantically related

      • door (do + or), wither (wit + her)
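A naive dictionary lookup illustrates why the last category is the hard one. The sketch below assumes a tiny hypothetical word list (`DICTIONARY`) and a helper `split_compound`; neither is taken from the presentation.

```python
# Hypothetical mini dictionary, for illustration only.
DICTIONARY = {"sun", "glasses", "apple", "tree", "room", "do", "or", "wit", "her"}


def split_compound(word: str, dictionary=DICTIONARY):
    """Return every split (A, B) of `word` where both parts are dictionary words."""
    w = word.lower()
    return [(w[:i], w[i:]) for i in range(1, len(w)) if w[:i] in dictionary and w[i:] in dictionary]


if __name__ == "__main__":
    for word in ["sunglasses", "broom", "door", "wither"]:
        print(word, split_compound(word))
```

The lookup rejects "broom" because "b" is not a word, but it still accepts "door" as do + or. Ruling out such splits needs a semantic check on top of the lexical one.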


4. Relation Type Determination
4.1 Introduction

  • Unlike for non-compounds, the semantics can generally be derived from the compound’s syntax

    • Especially in nouns

      • blackboard is a board

      • handbag is a bag

  • Germanic languages are left-branching (the modifier precedes the head)

    • Germanic: school bus, central intelligence agency

    • Romance: rio de las palmas (= palm river)

  • In English, unlike German, the words are joined unchanged:

    • German: Ort + Eingang → Ortseingang, Stadt + Bau → Städtebau

    • English: city + limit → city limit, city + planning → city planning
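For open and hyphenated English compounds, the observation that the head comes last can be sketched in a few lines. The helper name `head_of` is an assumption; closed compounds such as "blackboard" would additionally need a dictionary to find the split point.

```python
import re


def head_of(compound: str) -> str:
    """Return the right-most token of an open or hyphenated compound (its assumed head)."""
    tokens = [t for t in re.split(r"[\s\-]+", compound.strip().lower()) if t]
    if not tokens:
        raise ValueError("empty label")
    return tokens[-1]


if __name__ == "__main__":
    for label in ["school bus", "central intelligence agency", "bus-driver"]:
        print(label, "->", head_of(label))
```

If the head of the source label equals the target label, the pair is a candidate for an is-a relation rather than equality.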


4. Relation Type Determination
4.2 Classification

From a linguistic point of view…

* C⊈A, C⊈B, AB ~R B


4. Relation Type Determination
4.2 Classification

  • From the English point of view…

    • Closed form

      • database, playground, blackbird

    • Hyphenated form

      • bus-driver, single-minded, small-appliance industry

    • Open form

      • web space, container ship, computer scientist

  • From a POS point of view…

    • noun-noun, adjective-noun, verb-verb, …


4. Relation Type Determination
4.3 First Conclusions

  • From the knowledge now gained, we can enrich correspondences in schemas in two ways:

    • Set the relation type to is-a instead of equal (1)

    • Remove or at least doubt an existing correspondence (2)

  • For (1) we assume that AB ⊂ B

    • (cookbook, book, 0.8, equal) → (cookbook, book, 0.8, is-a)

  • For (2) we assume that if A is not a word in AB, the correspondence is likely to be false:

    • (stool, tool, 0.9, equal) → false?

    • (refund, fund, 0.7, equal) → false?

    • (discharge, charge, 0.7, equal) → false?
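Both rules can be combined into one small check. The following is a sketch only: the word list `KNOWN_WORDS`, the function `enrich` and the extra "doubtful" flag in the result are assumptions made for illustration.

```python
# Hypothetical mini dictionary of words that may act as modifier A in a compound AB.
KNOWN_WORDS = {"cook", "high", "black", "apple"}


def enrich(source: str, target: str, confidence: float, relation: str = "equal"):
    """Return (source, target, confidence, relation, doubtful).

    Rule (1): source = A + target with A a known word  -> relation becomes is-a.
    Rule (2): source ends with target but A is no word -> keep relation, flag as doubtful.
    """
    s, t = source.lower(), target.lower()
    if relation == "equal" and t and s != t and s.endswith(t):
        modifier_tokens = s[: -len(t)].replace("-", " ").split()
        if modifier_tokens and all(tok in KNOWN_WORDS for tok in modifier_tokens):
            return (source, target, confidence, "is-a", False)
        return (source, target, confidence, relation, True)
    return (source, target, confidence, relation, False)


if __name__ == "__main__":
    print(enrich("cookbook", "book", 0.8))
    print(enrich("stool", "tool", 0.9))
    print(enrich("refund", "fund", 0.7))
```

Whether a doubtful correspondence is removed or only reported is a policy decision that this sketch leaves to the caller.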


4. Relation Type Determination
4.4 Mismatches

  • A word changed its spelling over the centuries:

    • butterfly (“flutter-by”, “beat fly”, …)

    • Weiße Elster (from Czech: alstra = water)

  • A compound has a figurative rather than literal meaning (metaphor):

    • Completely different meaning

      • computer mouse, gravy train, buttercup

    • Obvious origin (related in a broad sense):

      • airport, birdhouse, downtown, snowman


4. Relation Type Determination
4.4 Mismatches

  • Inaccuracies in (vernacular) language

    • e.g., in biology: strawberry, blackberry, raspberry etc.

      • None of these is a berry in the biological sense

      • (yet tomato, banana, grape, pumpkin, melon etc. are)


4. Relation Type Determination
4.4 Mismatches

  • For detecting the relation type, the mismatch problem has no negative effect on the mapping

    • The correspondence is wrong either way

      • (buttercup, cup, equal) is as wrong as (buttercup, cup, is-a)

    • Enrichment has no negative effect on the mapping per se

      • Still, enhanced methods can be used to reduce the mismatches


5. Implementation
5.1 Goals

  • Determine the following relation types using linguistic methods: equal (default), is-a, inverse is-a

    • Not yet covered: part-of and overlap

    • English and German language

      • Main focus on English language

  • Possibly apply mapping repair

    • Remove correspondences that seem clearly wrong

  • Test & Evaluation


5. Implementation
5.1 Goals

  • First concentrate on the element level

    • Use linguistic knowledge as presented before

    • Different cases to be distinguished

      • Single items vs. itemizations


5. Implementation
5.2 Cases

  • Simple Case (1:1)

    • Source and target node consist of one item

      • blackboard ↔ board

      • high school ↔ school

      • international database conference ↔ conference


5. Implementation
5.2 Cases

  • Complex Cases (1:n, n:1, n:m)

    • Source/target node consists of several items

      • blackboard, whiteboard ↔ board

      • wine ↔ white wine, red wine

      • beer, wine ↔ wine

      • computers, laptops ↔ computers
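Distinguishing the simple case from the complex cases only requires counting the items on each side. A minimal sketch, assuming that items within a node label are separated by commas; the helper names `split_items` and `cardinality` are hypothetical.

```python
def split_items(label: str) -> list:
    """Split a node label such as "blackboard, whiteboard" into its items."""
    return [part.strip() for part in label.split(",") if part.strip()]


def cardinality(source_label: str, target_label: str) -> str:
    """Classify a correspondence as 1:1, 1:n, n:1 or n:m by the number of items per node."""
    n_source = len(split_items(source_label))
    n_target = len(split_items(target_label))
    if n_source == 0 or n_target == 0:
        raise ValueError("both labels need at least one item")
    if n_source == 1 and n_target == 1:
        return "1:1"
    if n_source == 1:
        return "1:n"
    if n_target == 1:
        return "n:1"
    return "n:m"


if __name__ == "__main__":
    print(cardinality("blackboard", "board"))
    print(cardinality("blackboard, whiteboard", "board"))
    print(cardinality("wine", "white wine, red wine"))
```

Real labels may use other separators ("and", "&", "/"), which this sketch does not handle.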


5. Implementation
5.3 Node Level vs. Path Level

  • Relation type depends on the perspective…

    • Node level vs. Path level

    • Relation is often…

      • is-a on node level

      • equal on path level




5. Implementation
5.4 Requirements

  • Benchmarks / Gold Standards (English language)

    • Manually defined

  • Dictionaries / Thesauri

  • More-specific data structure

    • Correspondence: source node, target node, confidence, type

    • Node: A list of items

    • Item: A list of words

    • Word: single word vs. compound
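The nested structure listed above maps naturally onto a few record types. This is a sketch under assumptions: the class and field names, the string-valued `relation_type` and the comma-based `parse_node` helper are illustrative choices, not the presented implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    text: str
    is_compound: bool = False  # would be set by a dictionary-based compound check


@dataclass
class Item:
    words: List[Word]


@dataclass
class Node:
    items: List[Item]


@dataclass
class Correspondence:
    source: Node
    target: Node
    confidence: float
    relation_type: str = "equal"  # default type before enrichment


def parse_node(label: str) -> Node:
    """Build a Node from a label; items are assumed to be comma-separated."""
    parts = [p.strip() for p in label.split(",") if p.strip()]
    return Node([Item([Word(w) for w in part.split()]) for part in parts])


if __name__ == "__main__":
    c = Correspondence(parse_node("red wine, white wine"), parse_node("wine"), 0.8)
    print(c)
```

Keeping the relation type as a separate field is what distinguishes this enriched format from the classic (source, target, confidence) triple.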


5. Implementation
5.5 Generating Benchmarks

  • Benchmarks

    • More difficult than in standard mappings

    • In some cases difficult to decide even for humans

      • Birdhouse is a house?

      • Airport is a port?

    • How to judge correspondences in an evaluation?

      • car = bike → FALSE

      • car = auto → TRUE

      • motorbike ⊂ bike → ?


5. Implementation
5.6 Challenges

  • Exocentric compounds

    • Airport, buttercup, saw tooth, …

  • Compounds in itemizations

    • (French wine, German wine ↔ French wine) inverse is-a

    • (French wine, German wine ↔ European wine) is-a

    • (French wine, German wine ↔ Mosel wine) overlap

    • (French wine, German wine ↔ Italian wine) mismatch


5. Implementation
5.6 Challenges

  • Plurals

    • (Christian churches ↔ church)

    • (red wine, white wine ↔ wines)

  • Short forms

    • Infant colic ↔ colic (equal instead of is-a)

  • Node Level vs. Path Level

    • Compound extending/skipping levels in the schema
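The plural problem can be reduced by normalizing the head noun before labels are compared. The rules below are a deliberately naive sketch (irregular plurals and words such as "series" are not handled); the helper names `singularize` and `normalize_label` are assumptions.

```python
def singularize(word: str) -> str:
    """Very rough English singularization based on common suffixes."""
    w = word.lower()
    if w.endswith("ies") and len(w) > 4:
        return w[:-3] + "y"
    if w.endswith(("ches", "shes", "sses", "xes", "zes")):
        return w[:-2]
    if w.endswith("s") and not w.endswith("ss"):
        return w[:-1]
    return w


def normalize_label(label: str) -> str:
    """Lower-case a label and singularize its last word (the assumed head noun)."""
    tokens = label.lower().split()
    if not tokens:
        return ""
    tokens[-1] = singularize(tokens[-1])
    return " ".join(tokens)


if __name__ == "__main__":
    for label in ["Christian churches", "wines", "computers", "colic"]:
        print(label, "->", normalize_label(label))
```

After normalization, "Christian churches" ends in "church" and can be compared with the target label "church" like any other compound.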


5. Implementation
5.6 Challenges

  • Limited recall

    • Strong dependency on the input mapping

    • Some is-a relations cannot be detected with simple linguistic methods

      • (car, vehicle)

      • (wine, beverage)

      • (cell phones, communication devices)


6. First Results

  • Web ↔ Yahoo

    • 421 Correspondences

    • 68 subset-correspondences

  • Found 50 subset-relations, with 34 being correct

    • Recall: 50.0 %

    • Precision: 68.0 %

    • f-Measure: 59.0 %


6. First Results

  • Google Health ↔ Yahoo Health (excerpt)

    • 396 Correspondences

    • 31 subset-correspondences

  • Found 20 subset-relations, with 15 being correct

    • Recall: 48.3 %

    • Precision: 75.0 %

    • f-Measure: 61.6 %
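Precision and recall follow directly from the reported counts (relations found, relations correct, subset-correspondences in the gold standard). The sketch below recomputes them for both mappings; the function name `evaluate` is an assumption, and F1 is computed here as the conventional harmonic mean.

```python
def evaluate(found: int, correct: int, expected: int):
    """Return (precision, recall, f1) for a set of detected relations."""
    precision = correct / found if found else 0.0
    recall = correct / expected if expected else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    runs = [
        ("Web / Yahoo", 50, 34, 68),
        ("Google Health / Yahoo Health", 20, 15, 31),
    ]
    for name, found, correct, expected in runs:
        p, r, f1 = evaluate(found, correct, expected)
        print(f"{name}: precision={p:.1%} recall={r:.1%} F1={f1:.1%}")
```

With these counts the harmonic mean comes out at roughly 57.6 % and 58.8 %. The f-measure figures on the slides are closer to the arithmetic mean of precision and recall, so the two should not be compared directly.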


6. First Results

  • Main issues observed…

    • Imprecise labels

      • infant colic ↔ colic (equal)

      • Uterine-Fibroids ↔ Uterus.Fibroids (equal)

      • picture frames ↔ frames (equal in field “arts”)

    • Node-Path-Discrepancies

    • “No-Compound”-Subsets

      • vehicle ↔ car (is-a)


7. Conclusions

  • Mapping Enrichment

    • Relation type

    • Simple vs. complex correspondences

      • Transformation rules

  • Relation Type Determination

    • Linguistic approach on element level

      • Compounds, itemizations

    • Advanced methods

      • Instance level, background knowledge etc.

      • Increase recall, keep up precision


Discussion

Thank you!

