1 / 23

Reducing the Cost of Validating Mapping Compositions by Exploiting Semantic Relationships

Reducing the Cost of Validating Mapping Compositions by Exploiting Semantic Relationships. Eduard C. Dragut Ramon Lawrence. University of Illinois at Chicago University of British Columbia Okanagan. ODBASE 2006, Montpellier, France. Talk Overview. Introduction Background

kalona
Download Presentation

Reducing the Cost of Validating Mapping Compositions by Exploiting Semantic Relationships

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Reducing the Cost of Validating Mapping Compositions by Exploiting Semantic Relationships Eduard C. Dragut Ramon Lawrence University of Illinois at Chicago University of British Columbia Okanagan ODBASE 2006, Montpellier, France

  2. Talk Overview • Introduction • Background • Model and Mapping representation systems • Proposed Mapping Representation System • Invert and Compose operator definitions and properties • Mappings Composition • Experiment • Estimate the quality of the proposed system E. Dragut and R. Lawrence - Reducing the Cost of Validating Mapping Compositions by Exploiting Semantic Relationships

  3. Introduction - Terminology • Models • denote a representation of a domain in a formal language (e.g., EER, Relational, Description Logic) • has two components [Russell et al 2003] • terminological (or metadata) • This is the focus of this work and talk. • extensional (i.e. facts or instances) • Mappings • describe how two models are related to each other E. Dragut and R. Lawrence - Reducing the Cost of Validating Mapping Compositions by Exploiting Semantic Relationships

  4. Introduction - Mappings • Ways to define mappings between models • binary relationships • called morphisms[Melnik et al 2003] or inter-schema correspondences [Popa et al. 2002] • mapping using a helper model • [Bernstein et al. 2003] • mapping as queries • [Madhavan et al. 2003, Berstein et al. 2006] • Our work falls in the class of the first two types of mappings. • We call them metadata level mappings. • They are not concerned with the instances of a model. E. Dragut and R. Lawrence - Reducing the Cost of Validating Mapping Compositions by Exploiting Semantic Relationships

  5. Introduction - Examples • Examples of models • diagrams, interface definitions, database schemas, web site layouts, control flow, XML schemas • Applications of mappings • mapping between XML schemas to drive message translation; • schema and database integration; • mapping between ontologies to help in the process of merging and alignment E. Dragut and R. Lawrence - Reducing the Cost of Validating Mapping Compositions by Exploiting Semantic Relationships

  6. Background - Mapping creation • The creation will be rarely completely automated. • General strategy is to semi-automatically build mappings • use heuristics to generate matchings (e.g. name similarity) • [Rahm and Bernstein 2001, Shvaiko and Euzenat 2005] (surveys) • translate matches into formulas • E.g., Clio project [Popa et al. 2002] • generate new mappings from existing mappings • Composition • E.g, [Madhavan et al. 2003, Berstein et al. 2006] • Invert • E.g, [Fagin 2006] • Semi-automatic tools can significantly speed up the process. E. Dragut and R. Lawrence - Reducing the Cost of Validating Mapping Compositions by Exploiting Semantic Relationships

  7. Background - Morphisms • Mapping: • is just a set of binary relations between the elements of two models • is a set of pairs < l, r > • Advantages/Disadvantages • their expressiveness is enough for certain classes of problems and they exhibit certain mathematical properties [Melnik et al. 2003] • main drawback • assumes similarity to be transitive E. Dragut and R. Lawrence - Reducing the Cost of Validating Mapping Compositions by Exploiting Semantic Relationships

  8. Background –Morphisms problems • Composition • <ID, ActID> ○ <ActID, ActorID> = <ID, ActorID> • due to transitivity assumption • Problems with this technique • Whenever m:1 correspondence is composed with a 1:n correspondence, the composition result is a cross-product; many being false positives. • It may miss or suggest false relationships. • Legend: • Blue  correct • Red  false positive or missed E. Dragut and R. Lawrence - Reducing the Cost of Validating Mapping Compositions by Exploiting Semantic Relationships

  9. Background – Mapping with helper models • Algorithm (for right compose)[Bernstein et al 2002] • copy the right hand side mapping • for each mapping element, m, on the right, i.e. in map2 • compute its Input(m) • for each mapping element, m, on the right, i.e. in map2 • set its domain to the union of the domains of Input(m) • Example: E. Dragut and R. Lawrence - Reducing the Cost of Validating Mapping Compositions by Exploiting Semantic Relationships

  10. Background – Mapping with helper models • Composition result • Problems with this technique • It may miss or suggest false relationships. • Legend: • Red  missed relationships E. Dragut and R. Lawrence - Reducing the Cost of Validating Mapping Compositions by Exploiting Semantic Relationships

  11. The Objectives • The driving motivation • The need for a mapping definition subsuming the relationship kinds that the state of the art matching algorithms discover with high precision. • Investigate to what extent a set of operations over this mapping definition can be defined. • Provide a mapping representation at the metadata level combining the advantages of morphisms and mappings with helper models. • The former has good mathematical properties. • The latter is more expressive. • Provide a compose algorithmthat exploits the semantic relationships within the mapping expression to produce correct semantic relationships whenever these can be determined automatically and to isolate those instances that require human intervention. E. Dragut and R. Lawrence - Reducing the Cost of Validating Mapping Compositions by Exploiting Semantic Relationships

  12. Proposed Mappings Representation • Model • A model has similar expressiveness as an EER model and is consistent with the definition of model used in previous work on model management. [Bernstein et al 2002, Pottinger and Bernstein 2003] • Mapping Representation • A mapping consists of a set of mapping elements, each mapping element is a directed, kinded binary relationship between a pair of elements not in the same model: • Triplets of form < m1,type,m2 >, type = {IsA, AKindOf, HasA, PartOf, =, Contains, ContainedBy, Unknown, Complex} • Comments • Some of these types were introduced in other works. • E.g, [Euzenat 2004, Giunchiglia et al. 2004, Pottinger and Bernstein 2003, Xu and Embley 2003, Wu et al. 2004] E. Dragut and R. Lawrence - Reducing the Cost of Validating Mapping Compositions by Exploiting Semantic Relationships

  13. Proposed Mappings Representation • An example • Most of the relationship kinds in the mapping representation are well-known except for Unknownand Complex • <a,Unknown,b >, means that the relationship between concept a and b is not precisely known. • < a,Complex,b >, the relationship between concept a and b may require a functional specification: a = f(b) • e.g., Price = PriceVat(VAT + 1) E. Dragut and R. Lawrence - Reducing the Cost of Validating Mapping Compositions by Exploiting Semantic Relationships

  14. Operators -Invert • Each of the relationship types introduced have well defined inversion properties: • IsA inverted is AKindOf, HasA inverted is PartOf, Contains inverted is ContainedBy • Definition [invert for mapping elements]: • Consider m = < a,type,b >. Then its corresponding inverted mapping element, denoted m-1, is given by the following expression: < b, type-1,a> • Mathematical form: < a,type, b >-1 = < b, type-1,a > • E.g. < a,HasA,b >-1 = < b,HasA-1,a > • Definition [invert for mappings]: • Given two models A and B and a mapping, map, between them, the invert of map denoted by map-1, is defined from B to A and its expression is given by: map-1 = {< b,type-1,a >| < a,type,b > map} E. Dragut and R. Lawrence - Reducing the Cost of Validating Mapping Compositions by Exploiting Semantic Relationships

  15. Operators -Compose • Composing two mappings involves defining a composition operation between the elements of the mappings (i.e. between triplets of form < a,type,b >) • Example • <HomePhone, IsA, Phone> ○ <Phone, = , Telephone> = < HomePhone, IsA, Telephone> E. Dragut and R. Lawrence - Reducing the Cost of Validating Mapping Compositions by Exploiting Semantic Relationships

  16. Compose Properties • Remarks: • The result of composing two mappings where mapping elements are expressed as triplets < a,type,b > is closed. • Mapping composition is symmetric in this framework: (<a, type,b > ○ < b,type,c >)-1 =< c,type-1,b > ○ < b,type-1,a> • The result of composing two mappings does not produce false correspondences between the elements of the two models, i.e. it does not suggest false directed, kinded relationships. • The Compose operator uses the Unknown relationship to indicate when it is not possible (in general) to suggest a relationship type given only the information expressed in the two mappings. E. Dragut and R. Lawrence - Reducing the Cost of Validating Mapping Compositions by Exploiting Semantic Relationships

  17. Experiment - Setup • Experiment goal: • Show that the composition framework is robust when applied to real world application and that we are able to correctly identify problematic cases. • We compare it against mappings as morphisms. • Five real-world XML schemas in the purchase order domain: CIDR, Excel, Noris, Paragon, and Apertum from www.biztalk.org • They were used in other projects: • [Dragut and Lawrence 2004, Madhavan et al. 2001] • And a reference ontology • to which each XML schema is manually mapped both using morphisms [Dragut and Lawrence 2004] and using the new mapping definition. E. Dragut and R. Lawrence - Reducing the Cost of Validating Mapping Compositions by Exploiting Semantic Relationships

  18. Experiment - Setup • Example of XML schemas: • XML Excel and CIDR schemas E. Dragut and R. Lawrence - Reducing the Cost of Validating Mapping Compositions by Exploiting Semantic Relationships

  19. Experiment - Intermediary model • Comments: • The intermediary model does not have all concepts in the schemas (e.g. unitOfMeasure, count, and VAT). • The intermediary model is structurally different from the five schemas considered and it is defined using OWL. E. Dragut and R. Lawrence - Reducing the Cost of Validating Mapping Compositions by Exploiting Semantic Relationships

  20. Experiment - Methodology • Step 1: map the five schemas to the intermediary model: • First, using morphisms • Second, using the proposed mapping • Step 2: apply the compose operators to compute direct mappings between the schemas • First, employing composition over morphisms • Second, using the new compose operator • Step 3: measure the quality of the two compositions in terms of Precision, Recall, and Overall • A new metric is introduced User Effort. E. Dragut and R. Lawrence - Reducing the Cost of Validating Mapping Compositions by Exploiting Semantic Relationships

  21. Experiment -Stats • Overall after composition was computed • CIDR, Excel, Noris, Paragon, and Apertum are assigned numbers 1, 2, 3, 4, and 5 respectively. E. Dragut and R. Lawrence - Reducing the Cost of Validating Mapping Compositions by Exploiting Semantic Relationships

  22. Experiment - Stats • User effort is the % of mappings that must be validated by a user. • For morphisms, user effort is 100% as there is no way to distinguish true over false relationships. • In our framework, it is the ratio of the number of Unknown relationships to the number of all produced relationships. On average it is only 19%. E. Dragut and R. Lawrence - Reducing the Cost of Validating Mapping Compositions by Exploiting Semantic Relationships

  23. End Thank you for your time and patience! E. Dragut and R. Lawrence - Reducing the Cost of Validating Mapping Compositions by Exploiting Semantic Relationships

More Related