Semantic Enrichment of Mappings

Semantic Enrichment of Mappings. Patrick Arnold. Outline: 1. Motivation 2. Goals 3. Related Work 4. Determining the Relation Type 5. Implementation 6. First Results 7. Conclusions.

Presentation Transcript


  1. Semantic Enrichment of Mappings • Patrick Arnold • WDI-Lab, Abteilung für Datenbanken, Universität Leipzig

  2. Outline (Abteilung für Datenbanken, Inst. für Informatik, Universität Leipzig) • 1. Motivation • 2. Goals • 3. Related Work • 4. Determining the Relation Type • 5. Implementation • 6. First Results • 7. Conclusions

  3. 1. Motivation • Classic approaches in schema/ontology matching provide little information about the correspondences: • Source node • Target node • Confidence • Further details are commonly omitted • What kind of relation? • equal, is-a, part-of, overlap • Simple correspondence vs. complex correspondence? • (first name, last name) ↔ name • Transformation functions? • gross price = net price * (1 + sales tax) • name = first name + " " + last name
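
The two transformation functions named on this slide can be written out directly. The following is a minimal sketch, not the author's implementation; the function names are hypothetical, and the sales tax is assumed to be given as a fraction (e.g. 0.19 for 19 %).

```python
def gross_price(net_price: float, sales_tax: float) -> float:
    """gross price = net price * (1 + sales tax).

    Assumption: sales_tax is a fraction such as 0.19, not a percentage.
    """
    return net_price * (1 + sales_tax)


def full_name(first_name: str, last_name: str) -> str:
    """name = first name + " " + last name (a complex, 2:1 correspondence)."""
    return first_name + " " + last_name


if __name__ == "__main__":
    print(gross_price(100.0, 0.19))
    print(full_name("Patrick", "Arnold"))
```

An enriched mapping could attach such a function to the complex correspondence (first name, last name) ↔ name, so that instance data can be transformed as well as matched.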

  4. 1. Motivation • Our intention: mapping enrichment • Enhance a mapping by adding further or more specific information to its correspondences • Useful for merging and transforming schemas/ontologies • Workflow: • Input: a mapping • Mapping enrichment is carried out in an independent system (black box) • Output: an enriched mapping • Implies a new, more specific format

  5. 1. Motivation • Typical relation types: • Equal • Is-a • Part-of • Overlap • Inverse types: • Equal • Inverse is-a • Has-a • Overlap
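
The pairing of relation types and inverse types on this slide amounts to a small lookup table. As an illustrative sketch (the names `INVERSE` and `invert` are assumptions, not part of the presented system), reversing the direction of a correspondence could look like this:

```python
# Relation type -> inverse type, as listed on the slide.
# equal and overlap are symmetric; is-a and part-of are not.
INVERSE = {
    "equal": "equal",
    "is-a": "inverse is-a",
    "inverse is-a": "is-a",
    "part-of": "has-a",
    "has-a": "part-of",
    "overlap": "overlap",
}


def invert(source: str, target: str, relation: str) -> tuple:
    """Swap source and target and replace the relation by its inverse."""
    return (target, source, INVERSE[relation])


if __name__ == "__main__":
    print(invert("cookbook", "book", "is-a"))
```

Inverting a correspondence twice yields the original again, which is a convenient sanity check for the table.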

  6. 2. Goals • First focus: detecting the relation type of a correspondence • Investigate linguistic methods on element level • Extension by existing strategies possible • equal, is-a, inverse is-a • Later… • Relation type detection on instance level • Exploiting background knowledge • Correspondence type, transformation rules, …

  7. 3. Related Work • Several projects deal with this problem • Mainly based on the following methods: • Using dictionaries, thesauri, corpora • WordNet, GermaNet • Includes tokenization, normalization of strings etc. • Using background knowledge • The Open University: using Swoogle to retrieve multiple ontologies referring to a concept • Exploiting the structure between ontologies • Exploiting reasoning, Bayes nets, feature vectors etc. • Search engines (Google)

  8. 3. Related Work • S-Match • Complex strategy using WordNet to determine the following relations: • Equal, more-general, less-general, overlap, mismatch • "Overlap" offers little useful information (concepts are somehow related…) • Approach: annotate each word in a label with all of its meanings found in WordNet • Compare/match the meanings of the words • Exploit the relations offered by WordNet

  9. 3. Related Work • TaxoMap • Focus on geographic ontologies • Detects the relations equal, is-a, inverse is-a and is-close • Focuses on the correspondence itself rather than on the type • Is-a relation if a label in node S appears in node T and is a full word • Uses WordNet as an additional source • Works on manually pre-defined branches of WordNet instead of the entire thesaurus • Useful for domain-specific ontologies • Recall: 23 %, Precision: 83 %

  10. 3. Related Work • LogMap • Uses reasoning algorithms to repair/discover mappings • Based on Horn logic and the Dowling-Gallier algorithm • Uses background knowledge (thesauri) • Detects full correspondences and weak correspondences • No specific relation detection per se

  11. 4. Relation Type Determination: 4.1 Introduction • Typically, there is no link between the syntax and semantics of words • stool, chair, seat… refer to the same object • stool, school, tool, pool, wool… have nothing in common! • Things change when it comes to compounds… • blackbird is a bird • high school is a school

  12. 4. Relation Type Determination: 4.1 Introduction • Compound: two words A, B of a language form a new word AB • apple + tree → apple tree • sun + glasses → sunglasses • forth + with → forthwith • A, B can be noun, verb, adjective/adverb, preposition • We are normally interested in nouns

  13. 4. Relation Type Determination: 4.1 Introduction • Not compounds: • Compositions AB where A (or B) is not an actual word • broom, nausea • Derivations • discard, unload, increase, compound • Compositions AB where A and B are not semantically related • door (do + or), wither (wit + her)

  14. 4. Relation Type Determination: 4.1 Introduction • Unlike with non-compounds, the semantics can generally be derived from the compound's syntax • Especially in nouns • blackboard is a board • handbag is a bag • Germanic languages are left-branching • Germanic: school bus, central intelligence agency • Romance: rio de las palmas (= palm river) • In English, no changes are applied to the words: • German: Ort + Eingang → Ortseingang, Stadt + Bau → Städtebau • English: city + limit → city limit, city + planning → city planning

  15. 4. Relation Type Determination: 4.2 Classification • From a linguistic point of view… • * C ⊈ A, C ⊈ B, AB ~R B

  16. 4. Relation Type Determination: 4.2 Classification • From the English point of view… • Closed form • database, playground, blackbird • Hyphenated form • bus-driver, single-minded, small-appliance industry • Open form • web space, container ship, computer scientist • From a POS point of view… • noun-noun, adjective-noun, verb-verb, …

  17. 4. Relation Type Determination: 4.3 First Conclusions • With the knowledge gained so far, we can enrich correspondences in schemas in two ways: • Set the relation type to is-a instead of equal (1) • Remove or at least doubt an existing correspondence (2) • For (1) we assume that AB ⊂ B • (cookbook, book, 0.8, equal) → (cookbook, book, 0.8, is-a) • For (2) we assume that if A is not a word in AB, the correspondence is likely to be false: • (stool, tool, 0.9, equal) → false? • (refund, fund, 0.7, equal) → false? • (discharge, charge, 0.7, equal) → false?
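
Rules (1) and (2) can be sketched as a single function over a (source, target, confidence, type) tuple. This is a simplified illustration under stated assumptions, not the presented system: `KNOWN_WORDS` is a hypothetical toy dictionary standing in for a real word list or thesaurus, and the label "doubtful" is an invented marker for correspondences that rule (2) would question.

```python
# Hypothetical toy dictionary; a real system would consult a full word list.
KNOWN_WORDS = {"cook", "book", "black", "board", "high", "school",
               "international", "database", "conference"}


def enrich(source, target, confidence, relation="equal", vocabulary=KNOWN_WORDS):
    """Apply rules (1) and (2) to a correspondence whose source ends with its target."""
    s, t = source.lower(), target.lower()
    if s == t or not s.endswith(t):
        # The compound rules do not apply; keep the correspondence unchanged.
        return (source, target, confidence, relation)
    # Modifier part A of the compound AB, split into tokens for open/hyphenated forms.
    modifier_tokens = s[: -len(t)].replace("-", " ").split()
    if modifier_tokens and all(tok in vocabulary for tok in modifier_tokens):
        return (source, target, confidence, "is-a")      # rule (1): AB is-a B
    return (source, target, confidence, "doubtful")      # rule (2): A is not a word


if __name__ == "__main__":
    print(enrich("cookbook", "book", 0.8))
    print(enrich("stool", "tool", 0.9))
```

With this toy dictionary, cookbook ↔ book and high school ↔ school become is-a, while stool ↔ tool and refund ↔ fund are flagged, because "s" and "re" are not listed as words. The mismatches discussed on the following slides (e.g. buttercup ↔ cup) are not caught by such a rule.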

  18. 4. Relation Type Determination: 4.4 Mismatches • A word changed its spelling over the centuries: • butterfly ("flutter-by", "beat fly", …) • Weiße Elster (from Czech: alstra = water) • A compound has a non-literal meaning (metaphor): • Completely different meaning • computer mouse, gravy train, buttercup • Obvious origin (related in a broad sense): • airport, birdhouse, downtown, snowman

  19. 4. Relation Type Determination: 4.4 Mismatches • Inaccuracies in (vernacular) language • e.g., in biology: strawberry, blackberry, raspberry etc. • None of them is a berry in the biological sense • (yet tomato, banana, grape, pumpkin, melon etc. are)

  20. 4. Relation Type Determination: 4.4 Mismatches • For detecting the relation type, the mismatch problem has no negative effect on the mapping • The correspondence is wrong after all • (buttercup, cup, equal) is as wrong as (buttercup, cup, is-a) • Enrichment has no negative effect on the mapping per se • Still, enhanced methods can be used to reduce the mismatches

  21. 5. Implementation: 5.1 Goals • Determine the following relation types using linguistic methods: equal (default), is-a, inverse is-a • Missing: part-of and overlap • English and German language • Main focus on English • Possibly apply mapping repair • Remove correspondences that seem clearly wrong • Test & evaluation

  22. 5. Implementation: 5.1 Goals • First concentrate on the element level • Use linguistic knowledge as presented before • Different cases to be distinguished • Single items vs. itemizations

  23. 5. Implementation: 5.2 Cases • Simple case (1:1) • Source and target node consist of one item • blackboard ↔ board • high school ↔ school • international database conference ↔ conference

  24. 5. Implementation: 5.2 Cases • Complex cases (1:n, n:1, n:m) • Source/target node consists of several items • blackboard, whiteboard ↔ board • wine ↔ white wine, red wine • beer, wine ↔ wine • computers, laptops ↔ computers

  25. 5. Implementation: 5.3 Node Level vs. Path Level • Relation type depends on the perspective… • Node level vs. path level • Relation is often… • is-a on node level • equal on path level

  26. 5. Implementation: 5.3 Node Level vs. Path Level

  27. 5. Implementation: 5.4 Requirements • Benchmarks / gold standards (English language) • Manually defined • Dictionaries / thesauri • More specific data structure • Correspondence: source node, target node, confidence, type • Node: a list of items • Item: a list of words • Word: single word vs. compound
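
The data structure listed on this slide maps naturally onto a few nested record types. The sketch below is one possible rendering, assuming comma-separated items and whitespace-separated words; the class and function names are illustrative, and the `is_compound` flag is left at its default because deciding it requires a dictionary.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    text: str
    is_compound: bool = False  # single word vs. compound; needs a dictionary to decide


@dataclass
class Item:
    words: List[Word]


@dataclass
class Node:
    items: List[Item]


@dataclass
class Correspondence:
    source: Node
    target: Node
    confidence: float
    relation_type: str = "equal"  # equal is the default type


def parse_node(label: str) -> Node:
    """Split a label into items (assumed comma-separated) and words (whitespace)."""
    items = []
    for part in label.split(","):
        tokens = part.split()
        if tokens:
            items.append(Item([Word(tok) for tok in tokens]))
    return Node(items)


if __name__ == "__main__":
    c = Correspondence(parse_node("white wine, red wine"), parse_node("wine"), 0.8)
    print(len(c.source.items), len(c.target.items))
```

Keeping items and words separate is what allows the simple (1:1) and complex (1:n, n:1, n:m) cases to be distinguished by counting the items per node.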

  28. 5. Implementation: 5.5 Generating Benchmarks • Benchmarks • More difficult than in standard mappings • In some cases difficult to decide even for humans • Birdhouse is a house? • Airport is a port? • How to judge correspondences in an evaluation? • car = bike → FALSE • car = auto → TRUE • motorbike ⊂ bike → ?

  29. 5. Implementation: 5.6 Challenges • Exocentric compounds • Airport, buttercup, saw tooth, … • Compounds in itemizations • (French wine, German wine ↔ French wine) inverse is-a • (French wine, German wine ↔ European wine) is-a • (French wine, German wine ↔ Mosel wine) overlap • (French wine, German wine ↔ Italian wine) mismatch
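
Only part of the itemization problem can be handled by comparing the items literally. The following sketch is an assumption about how such a first step might look, not the presented method: it treats each node as a set of items and reproduces the first example (inverse is-a when the target's items are a proper subset of the source's). The other three examples need background knowledge (European, Mosel, Italian), so the sketch returns the invented label "undecided" for them.

```python
def normalize(label: str) -> set:
    """Turn a comma-separated label into a set of lower-cased items."""
    return {item.strip().lower() for item in label.split(",") if item.strip()}


def itemization_relation(source_label: str, target_label: str) -> str:
    """Compare two itemizations by literal set inclusion of their items."""
    source, target = normalize(source_label), normalize(target_label)
    if source == target:
        return "equal"
    if target < source:          # target covers only some of the source items
        return "inverse is-a"
    if source < target:          # source covers only some of the target items
        return "is-a"
    if source & target:          # some shared items, neither side contains the other
        return "overlap"
    return "undecided"           # no shared items: requires further knowledge


if __name__ == "__main__":
    print(itemization_relation("French wine, German wine", "French wine"))
```

A fuller approach would combine this with the compound rule per item, so that, for example, wine ↔ white wine, red wine is not left undecided.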

  30. 5. Implementation: 5.6 Challenges • Plurals • (Christian churches ↔ church) • (red wine, white wine ↔ wines) • Short forms • Infant colic ↔ colic (equal instead of is-a) • Node level vs. path level • Compound extending/skipping levels in the schema

  31. 5. Implementation: 5.6 Challenges • Limited recall • Strong dependency on the input (mapping) • Some is-a relations cannot be detected with simple linguistic methods • (car, vehicle) • (wine, beverage) • (cell phones, communication devices)

  32. 6. First Results • Web ↔ Yahoo • 421 correspondences • 68 subset correspondences • Found 50 subset relations, with 34 being correct • Recall: 50.0 % • Precision: 68.0 % • F-measure: 59.0 %

  33. 6. First Results • Google Health ↔ Yahoo Health (excerpt) • 396 correspondences • 31 subset correspondences • Found 20 subset relations, with 15 being correct • Recall: 48.3 % • Precision: 75.0 % • F-measure: 61.6 %
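
The recall and precision figures on the two result slides follow directly from the counts given there. As a worked check (function and variable names are illustrative), the sketch below recomputes them and also reports both the harmonic mean (the usual F1 definition) and the arithmetic mean of precision and recall. The reported F-measure values of 59.0 % and 61.6 % coincide with the arithmetic mean; the harmonic mean for the same counts comes out at about 57.6 % and 58.8 %.

```python
def evaluate(expected: int, found: int, correct: int) -> dict:
    """Recompute evaluation figures from raw counts.

    expected: subset correspondences in the gold standard
    found:    subset relations returned by the system
    correct:  returned relations that are in the gold standard
    """
    recall = correct / expected
    precision = correct / found
    return {
        "recall": recall,
        "precision": precision,
        "f1_harmonic": 2 * precision * recall / (precision + recall),
        "arithmetic_mean": (precision + recall) / 2,
    }


if __name__ == "__main__":
    for name, counts in {"Web / Yahoo": (68, 50, 34),
                         "Google Health / Yahoo Health": (31, 20, 15)}.items():
        scores = evaluate(*counts)
        print(name, {k: round(100 * v, 1) for k, v in scores.items()})
```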

  34. 6. First Results • Main issues observed… • Imprecise labels • infant colic ↔ colic (equal) • Uterine-Fibroids ↔ Uterus.Fibroids (equal) • picture frames ↔ frames (equal in field "arts") • Node-path discrepancies • "No-compound" subsets • vehicle ↔ car (is-a)

  35. 7. Conclusions • Mapping enrichment • Relation type • Simple vs. complex correspondences • Transformation rules • Relation type determination • Linguistic approach on element level • Compounds, itemizations • Advanced methods • Instance level, background knowledge etc. • Increase recall, keep up precision

  36. Discussion • Thank you!