Semantic Enrichment of Mappings

Patrick Arnold

WDI-Lab, Abteilung für Datenbanken, Institut für Informatik, Universität Leipzig

Outline

1. Motivation

2. Goals

3. Related Work

4. Determining the Relation Type

5. Implementation

6. First Results

7. Conclusions

1. Motivation

  • Classic approaches in schema/ontology matching provide only limited information about a correspondence:
    • Source node
    • Target node
    • Confidence
  • Further details are commonly omitted
    • What kind of relation?
      • equal, is-a, part-of, overlap
    • Simple correspondence vs. complex correspondence?
      • (first name, last name) ↔ name
    • Transformation functions?
      • gross price = net price * (1 + sales tax rate)
      • name = first name + " " + last name
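The transformation functions above can be made concrete as ordinary functions attached to a complex correspondence. The following is a minimal Python sketch; the function names, the dictionary layout of the correspondence and the example tax rate of 0.19 are illustrative assumptions, not part of the presented system.

```python
def full_name(first_name: str, last_name: str) -> str:
    """Transformation for the n:1 correspondence (first name, last name) -> name."""
    return first_name + " " + last_name


def gross_price(net_price: float, sales_tax_rate: float) -> float:
    """Transformation deriving the gross price from the net price and a tax rate."""
    return net_price * (1 + sales_tax_rate)


# Hypothetical layout: a complex correspondence carrying its transformation.
correspondence = {
    "source": ("first name", "last name"),
    "target": "name",
    "transformation": full_name,
}

print(correspondence["transformation"]("Ada", "Lovelace"))  # Ada Lovelace
print(round(gross_price(100.0, 0.19), 2))                   # 119.0
```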

  • Our intention: mapping enrichment
    • Enhance a mapping by adding further or more specific information to its correspondences
    • Useful for merging and transforming schemas/ontologies
  • Workflow:
    • Input: a mapping
    • Mapping enrichment is carried out in an independent system (black box)
    • Output: an enriched mapping
      • Requires a new, more specific format

  • Typical relation types
    • Equal
    • Is-a
    • Part-of
    • Overlap
  • Inverse types:
    • Equal
    • Inverse is-a
    • Has-a
    • Overlap
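Each inverse type above is what the corresponding typical type becomes when a correspondence is read in the opposite direction (from target to source). A minimal sketch of this pairing as a Python enum; the class name `RelationType` and the method `inverse` are hypothetical.

```python
from enum import Enum


class RelationType(Enum):
    EQUAL = "equal"
    IS_A = "is-a"
    PART_OF = "part-of"
    OVERLAP = "overlap"
    INVERSE_IS_A = "inverse is-a"
    HAS_A = "has-a"

    def inverse(self) -> "RelationType":
        """Relation type of the same correspondence read from target to source."""
        return _INVERSE[self]


# Typical type <-> inverse type, as listed above.
_INVERSE = {
    RelationType.EQUAL: RelationType.EQUAL,
    RelationType.IS_A: RelationType.INVERSE_IS_A,
    RelationType.INVERSE_IS_A: RelationType.IS_A,
    RelationType.PART_OF: RelationType.HAS_A,
    RelationType.HAS_A: RelationType.PART_OF,
    RelationType.OVERLAP: RelationType.OVERLAP,
}

print(RelationType.PART_OF.inverse().value)  # has-a
```

Equal and overlap are symmetric, so they are their own inverses.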
2. Goals

  • First focus: detecting the relation type of a correspondence
    • Investigate linguistic methods on the element level
    • Can be extended with existing strategies
    • Relation types: equal, is-a, inverse is-a
  • Later…
    • Relation type detection on the instance level
    • Exploiting background knowledge
    • Correspondence type, transformation rules, …
3. Related Work

  • Several projects deal with this problem
  • Mainly based on the following methods:
    • Using dictionaries, thesauri, corpora
      • WordNet, GermaNet
      • Includes tokenization, string normalization, etc.
    • Using background knowledge
      • The Open University: using Swoogle to retrieve multiple ontologies referring to a concept
    • Exploiting the structure of the ontologies
    • Using reasoning, Bayesian networks, feature vectors, etc.
    • Using search engines (Google)

  • S-Match
    • Complex strategy using WordNet to determine the following relations:
      • Equal, more-general, less-general, overlap, mismatch
      • “Overlap” offers little useful information (the concepts are merely somehow related)
    • Approach: annotate each word in a label with all meanings of this word found in WordNet
      • Compare/match the meanings of the words
      • Exploit the relations offered by WordNet

  • TaxoMap
    • Focus on geographic ontologies
    • Detects the relations equal, is-a, inverse is-a (invis-a) and is-close
      • Focuses on finding the correspondence itself rather than on its type
      • Is-a relation if the label of node S appears in the label of node T as a full word
    • Uses WordNet as an additional source
      • Works on manually predefined branches of WordNet instead of the entire thesaurus
      • Useful for domain-specific ontologies
      • Recall: 23 %, Precision: 83 %
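Our reading of TaxoMap's full-word rule can be sketched as follows. This is not TaxoMap's code; the function names are hypothetical, and we assume that the label containing the other one denotes the more specific concept. The rule cannot fire for closed compounds such as blackboard/board, since board is not a separate word there.

```python
import re
from typing import Optional


def contains_as_full_word(label: str, part: str) -> bool:
    """True if `part` occurs in `label` as one or more whole words."""
    pattern = r"\b" + re.escape(part.lower()) + r"\b"
    return re.search(pattern, label.lower()) is not None


def taxomap_like_relation(s: str, t: str) -> Optional[str]:
    """Hypothetical reading of the full-word rule (not TaxoMap's implementation)."""
    if s.lower() == t.lower():
        return "equal"
    if contains_as_full_word(s, t):
        return "is-a"          # s contains t, so s is assumed to be more specific
    if contains_as_full_word(t, s):
        return "inverse is-a"  # t contains s, so t is assumed to be more specific
    return None                # no full-word containment: the rule does not apply


print(taxomap_like_relation("high school", "school"))  # is-a
print(taxomap_like_relation("blackboard", "board"))    # None
```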

  • LogMap
    • Uses reasoning algorithms to repair/discover mappings
      • Based on Horn logic and the Dowling-Gallier algorithm
      • Uses background knowledge (thesauri)
    • Detects full correspondences and weak correspondences
    • No dedicated relation type detection
4. Relation Type Determination

4.1 Introduction

  • Typically, there is no link between the syntax and semantics of words
    • stool, chair, seat… denote closely related objects, yet look completely different
    • stool, school, tool, pool, wool… look alike, yet have nothing in common semantically!
  • Things change when it comes to compounds…
    • blackbird is a bird
    • high school is a school

  • Compound: Two words A, B of a language form a new word AB
    • apple + tree → apple tree
    • sun + glasses → sunglasses
    • forth + with → forthwith
  • A and B can be nouns, verbs, adjectives/adverbs or prepositions
    • We are mainly interested in nouns
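Recognizing a closed compound such as blackbird requires splitting it into two known words. A minimal sketch, assuming a small hand-written vocabulary in place of a real dictionary; `split_closed_compound` and `VOCABULARY` are hypothetical names.

```python
from typing import Optional, Set, Tuple

# Toy vocabulary standing in for a real dictionary (assumption).
VOCABULARY: Set[str] = {"apple", "tree", "sun", "glasses", "black", "bird",
                        "board", "hand", "bag", "cook", "book"}


def split_closed_compound(word: str, vocabulary: Set[str] = VOCABULARY
                          ) -> Optional[Tuple[str, str]]:
    """Split a closed compound AB into (A, B) if both parts are known words.

    Split points are tried from left to right, so the longest candidate
    head B (the rightmost constituent) is tested first.
    """
    word = word.lower()
    for i in range(1, len(word)):
        modifier, head = word[:i], word[i:]
        if modifier in vocabulary and head in vocabulary:
            return modifier, head
    return None


print(split_closed_compound("blackbird"))   # ('black', 'bird')
print(split_closed_compound("sunglasses"))  # ('sun', 'glasses')
print(split_closed_compound("stool"))       # None
```

With a vocabulary containing do and or, the same check would also accept door, so a successful split alone does not prove that a word is a compound.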

  • The following are not compounds:
    • Compositions AB where A (or B) is not a word of its own
      • broom, nausea
    • Derivations
      • discard, unload, increase, compound
    • Compositions AB where A and B are not semantically related
      • door (do + or), wither (wit + her)

  • Unlike with non-compounds, the semantics of a compound can generally be derived from its syntax
    • Especially for nouns
      • blackboard is a board
      • handbag is a bag
  • Germanic languages are left-branching (the head is the rightmost constituent)
    • Germanic: school bus, central intelligence agency
    • Romance: rio de las palmas (= palm river)
  • In English, the constituent words are not modified:
    • German: Ort + Eingang → Ortseingang, Stadt + Bau → Städtebau
    • English: city + limit → city limit, city + planning → city planning
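Because Germanic compounds are left-branching, the head of an open or hyphenated English compound is its last word, while a Romance phrase such as rio de las palmas puts its head first. A minimal sketch; the function name and the `head_final` flag are hypothetical, and German linking elements (Ortseingang) or umlauts (Städtebau) are not handled.

```python
def head_of_open_compound(label: str, head_final: bool = True) -> str:
    """Return the head of an open or hyphenated compound.

    head_final=True models left-branching (Germanic) compounds, whose head is
    the rightmost word; head_final=False models head-initial (Romance) ones.
    """
    words = label.replace("-", " ").split()
    if not words:
        raise ValueError("empty label")
    return words[-1] if head_final else words[0]


print(head_of_open_compound("central intelligence agency"))         # agency
print(head_of_open_compound("bus-driver"))                          # driver
print(head_of_open_compound("rio de las palmas", head_final=False))  # rio
```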
4.2 Classification

From a linguistic point of view…

* C⊈A, C⊈B, AB ~R B


  • From the point of view of English spelling…
    • Closed form
      • database, playground, blackbird
    • Hyphenated form
      • bus-driver, single-minded, small-appliance industry
    • Open form
      • web space, container ship, computer scientist
  • From a part-of-speech (POS) point of view…
    • noun-noun, adjective-noun, verb-verb, …
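The three written forms can be told apart by looking for hyphens and spaces. A minimal sketch with a hypothetical function name; note that a closed form cannot be distinguished from a simple word without a splitting step such as a dictionary lookup.

```python
def compound_form(label: str) -> str:
    """Classify the written form of a (potential) compound label."""
    label = label.strip()
    if "-" in label:
        return "hyphenated"  # checked first: "small-appliance industry" also has a space
    if " " in label:
        return "open"
    return "closed"          # may also be a simple word; splitting is needed to tell


for label in ["database", "bus-driver", "small-appliance industry", "web space"]:
    print(label, "->", compound_form(label))
```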
4.3 First Conclusions

  • With this knowledge, we can enrich correspondences between schemas in two ways:
    • Set the relation type to is-a instead of equal (1)
    • Remove, or at least doubt, an existing correspondence (2)
  • For (1), we assume that AB ⊂ B
    • (cookbook, book, 0.8, equal) → (cookbook, book, 0.8, is-a)
  • For (2), we assume that if the remainder A in AB is not a word of its own, the correspondence is likely to be false:
    • (stool, tool, 0.9, equal) → false?
    • (refund, fund, 0.7, equal) → false?
    • (discharge, charge, 0.7, equal) → false?
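Rules (1) and (2) can be combined into a single check on an equal correspondence (AB, B). A minimal sketch, assuming a toy word list in place of a dictionary and handling only the direction in which the source is the longer label; all names are hypothetical.

```python
from typing import NamedTuple, Optional


class Correspondence(NamedTuple):
    source: str
    target: str
    confidence: float
    type: str


# Toy word list standing in for a real dictionary (assumption).
WORDS = {"cook", "high", "black", "hand"}


def enrich(c: Correspondence, words=WORDS) -> Optional[Correspondence]:
    """Refine an 'equal' correspondence (AB, B); None means 'doubt it'."""
    src, tgt = c.source.lower(), c.target.lower()
    if c.type != "equal" or src == tgt or not src.endswith(tgt):
        return c  # the rules only apply if the target is a proper suffix of the source
    modifier = src[: -len(tgt)].strip(" -")
    if modifier in words:
        return c._replace(type="is-a")  # rule (1): AB is a subset of B
    return None  # rule (2): A is not a word, so the correspondence is suspicious


print(enrich(Correspondence("cookbook", "book", 0.8, "equal")))
print(enrich(Correspondence("stool", "tool", 0.9, "equal")))  # None
```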
4.4 Mismatches

  • A word has changed its spelling over the centuries:
      • butterfly (“flutter-by”, “beat fly”, …)
      • Weiße Elster (from Czech: alstra = water)
  • A compound has a figurative rather than a literal meaning (metaphor):
    • Completely different meaning
      • computer mouse, gravy train, buttercup
    • Obvious origin (related in a broad sense):
      • airport, birdhouse, downtown, snowman
      • airport, birdhouse, downtown, snowman

  • Inaccuracies in (vernacular) language
    • e.g., in biology: strawberry, blackberry, raspberry etc.
      • None of them is a berry in the biological sense
      • (yet tomato, banana, grape, pumpkin, melon etc. are)

  • For detecting the relation type, the mismatch problem has no negative effect on the mapping
    • The correspondence is wrong after all
      • (buttercup, cup, equal) is as wrong as (buttercup, cup, is-a)
    • Enrichment has no negative effect on the mapping per se
      • Still, enhanced methods can be used to reduce the mismatches
5. Implementation

5.1 Goals

  • Determine the following relation types using linguistic methods: equal (default), is-a, inverse is-a
    • Missing: part-of and overlap
    • English and German language
      • Main focus on English language
  • Possibly apply mapping repair
    • Remove correspondences that seem clearly wrong
  • Test & Evaluation

  • First concentrate on the element level
    • Use linguistic knowledge as presented before
    • Different cases to be distinguished
      • Single items vs. itemizations
5.2 Cases

  • Simple Case (1:1)
    • Source and target node each consist of a single item
      • blackboard ↔ board
      • high school ↔ school
      • international database conference ↔ conference

  • Complex Cases (1:n, n:1, n:m)
    • Source and/or target node consist of several items
      • blackboard, whiteboard ↔ board
      • wine ↔ white wine, red wine
      • beer, wine ↔ wine
      • computers, laptops ↔ computers
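For itemizations, one simple reading treats a node as the union of its items: the source is contained in the target if every source item equals, or is a compound headed by, some target item, and vice versa. A sketch under that assumption, with a toy word list and hypothetical names; it reproduces the four examples above but returns "undecided" wherever background knowledge would be needed.

```python
from typing import List

# Toy list of modifier words standing in for a dictionary (assumption).
WORDS = {"black", "white", "red", "french", "german", "italian"}


def item_subsumed(a: str, b: str, words=WORDS) -> bool:
    """True if item a equals item b or is a compound with head b."""
    a, b = a.lower(), b.lower()
    if a == b:
        return True
    if not a.endswith(b):
        return False
    modifier = a[: -len(b)].strip(" -")
    return modifier in words


def node_relation(source: List[str], target: List[str]) -> str:
    """Compare two nodes, each treated as the union of its items."""
    s_in_t = all(any(item_subsumed(s, t) for t in target) for s in source)
    t_in_s = all(any(item_subsumed(t, s) for s in source) for t in target)
    if s_in_t and t_in_s:
        return "equal"
    if s_in_t:
        return "is-a"
    if t_in_s:
        return "inverse is-a"
    return "undecided"  # overlap or mismatch: needs background knowledge


print(node_relation(["blackboard", "whiteboard"], ["board"]))  # is-a
print(node_relation(["wine"], ["white wine", "red wine"]))     # inverse is-a
print(node_relation(["beer", "wine"], ["wine"]))               # inverse is-a
print(node_relation(["computers", "laptops"], ["computers"]))  # inverse is-a
```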
5.3 Node Level vs. Path Level

  • Relation type depends on the perspective…
    • Node level vs. Path level
    • Relation is often…
      • is-a on node level
      • equal on path level

5.4 Requirements

  • Benchmarks / Gold Standards (English language)
    • Manually defined
  • Dictionaries / thesauri
  • A more specific data structure
    • Correspondence: source node, target node, confidence, type
    • Node: a list of items
    • Item: a list of words
    • Word: single word vs. compound
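The data structure listed above can be written down as Python dataclasses. This is a sketch; the class and field names (including the `constituents` list used to mark compounds) are assumptions derived from the bullet points, not the actual implementation, and the confidence in the example is illustrative.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Word:
    text: str
    # Constituents of a compound (e.g. ["data", "base"]); empty for a single word.
    constituents: List[str] = field(default_factory=list)

    @property
    def is_compound(self) -> bool:
        return len(self.constituents) >= 2


@dataclass
class Item:
    words: List[Word]


@dataclass
class Node:
    items: List[Item]


@dataclass
class Correspondence:
    source: Node
    target: Node
    confidence: float
    type: str = "equal"  # equal is the default relation type


# "international database conference" <-> "conference"
source = Node([Item([Word("international"), Word("database", ["data", "base"]),
                     Word("conference")])])
target = Node([Item([Word("conference")])])
corr = Correspondence(source, target, 0.8)
```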
5.5 Generating Benchmarks

  • Benchmarks
    • More difficult to create than for standard mappings
    • In some cases, even humans find it difficult to decide
      • Birdhouse is a house?
      • Airport is a port?
    • How to judge correspondences in an evaluation?
      • car = bike → FALSE
      • car = auto → TRUE
      • motorbike ⊂ bike → ?
5.6 Challenges

  • Exocentric compounds
    • Airport, buttercup, saw tooth, …
  • Compounds in itemizations
    • (French wine, German wine ↔ French wine) inverse is-a
    • (French wine, German wine ↔ European wine) is-a
    • (French wine, German wine ↔ Mosel wine) overlap
    • (French wine, German wine ↔ Italian wine) mismatch

  • Plurals
    • (Christian churches ↔ church)
    • (red wine, white wine ↔ wines)
  • Short forms
    • Infant colic ↔ colic (equal instead of is-a)
  • Node Level vs. Path Level
    • Compounds extending or skipping levels in the schema
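Plural handling can be approached by singularizing the head of a label before the compound check. The sketch below is deliberately naive (regular English plurals only, hypothetical function names); irregular or misleading forms such as children or series would need a dictionary.

```python
def singularize(word: str) -> str:
    """Naive singularization for regular English plurals (assumption)."""
    w = word.lower()
    if w.endswith("ss"):
        return w  # e.g. "glass" is already singular
    for suffix, replacement in (("ies", "y"), ("sses", "ss"), ("ches", "ch"),
                                ("shes", "sh"), ("xes", "x"), ("s", "")):
        if w.endswith(suffix) and len(w) > len(suffix) + 1:
            return w[: -len(suffix)] + replacement
    return w


def normalize_label(label: str) -> str:
    """Lower-case a label and singularize only its last word (the head)."""
    words = label.lower().split()
    if not words:
        return label
    return " ".join(words[:-1] + [singularize(words[-1])])


print(normalize_label("Christian churches"))  # christian church
print(normalize_label("wines"))               # wine
```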

  • Limited recall
    • Strong dependency on the input mapping
    • Some is-a relations cannot be detected with simple linguistic methods
      • (car, vehicle)
      • (wine, beverage)
      • (cell phones, communication devices)
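Such is-a relations require background knowledge, for instance the hypernym hierarchy of a thesaurus. The sketch below follows a hypernym chain in a small hand-made table; the table, its entries and the function name are hypothetical stand-ins for a resource like WordNet, not actual WordNet data.

```python
from typing import Dict, Optional

# Hypothetical hypernym table (child -> parent) standing in for a thesaurus.
HYPERNYMS: Dict[str, str] = {
    "car": "motor vehicle",
    "motor vehicle": "vehicle",
    "wine": "alcoholic beverage",
    "alcoholic beverage": "beverage",
    "cell phone": "communication device",
}


def subsumed_by(a: str, b: str, hypernyms: Dict[str, str] = HYPERNYMS) -> bool:
    """Follow the hypernym chain of a; True if b is reached (a == b counts)."""
    current: Optional[str] = a
    seen = set()
    while current is not None and current not in seen:  # guard against cycles
        if current == b:
            return True
        seen.add(current)
        current = hypernyms.get(current)
    return False


print(subsumed_by("car", "vehicle"))  # True
print(subsumed_by("vehicle", "car"))  # False
```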
6. First Results

  • Web ↔ Yahoo
    • 421 correspondences
    • 68 subset correspondences
  • Found 50 subset relations, 34 of them correct
    • Recall: 50.0 %
    • Precision: 68.0 %
    • F-measure: 57.6 %

  • Google Health ↔ Yahoo Health (excerpt)
    • 396 correspondences
    • 31 subset correspondences
  • Found 20 subset relations, 15 of them correct
    • Recall: 48.4 %
    • Precision: 75.0 %
    • F-measure: 58.8 %
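The figures above can be recomputed from the raw counts. A minimal sketch, assuming the standard F-measure, i.e. the harmonic mean of precision and recall (equivalently 2 · correct / (found + expected)); the function name `evaluate` is hypothetical.

```python
from typing import Tuple


def evaluate(found: int, correct: int, expected: int) -> Tuple[float, float, float]:
    """Precision, recall and F-measure (harmonic mean) from raw counts."""
    precision = correct / found if found else 0.0
    recall = correct / expected if expected else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


runs = [("Web <-> Yahoo", 50, 34, 68),
        ("Google Health <-> Yahoo Health", 20, 15, 31)]
for name, found, correct, expected in runs:
    p, r, f = evaluate(found, correct, expected)
    print(f"{name}: P={p:.1%} R={r:.1%} F={f:.1%}")
# Web <-> Yahoo: P=68.0% R=50.0% F=57.6%
# Google Health <-> Yahoo Health: P=75.0% R=48.4% F=58.8%
```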

  • Main issues observed…
    • Imprecise labels
      • infant colic ↔ colic (equal)
      • Uterine-Fibroids ↔ Uterus.Fibroids (equal)
      • picture frames ↔ frames (equal in the category “arts”)
    • Node/path discrepancies
    • Subset relations not based on compounds
      • vehicle ↔ car (is-a)
7. Conclusions

  • Mapping Enrichment
    • Relation type
    • Simple vs. complex correspondences
      • Transformation rules
  • Relation Type Determination
    • Linguistic approach on element level
      • Compounds, itemizations
    • Advanced methods
      • Instance level, background knowledge etc.
      • Increase recall while maintaining precision
Discussion

Thank You!