Schema Matching

Matching Large XML Schemas

Erhard Rahm, Hong-Hai Do, Sabine Maßmann

Putting Context into Schema Matching

Philip Bohannon, Eiman Elnahrawy, Wenfei Fan, Michael Flaster

COMA - A System for Flexible Combination of Schema Matching Approaches

Hong-Hai Do, Erhard Rahm

Christiano Santiago


Goals

  • Introductory concepts on Schema Matching

  • Context-Sensitive versus Context-Insensitive

  • Complexity on XSD schemas

Agenda

  • Terminology

  • Different Approaches

  • XML Schema Definition

  • Context-Insensitive

  • Context-Sensitive

  • Q&A

Terminology

  • Schema matching: the process of identifying that two objects are semantically related (meaning).

  • Mapping: the transformations between the matched objects (conversion).

Terminology

Student(Name, SSN, Level, Major, Marks)

GradStudent(Name, ID, Major, Grades)
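
As a rough illustration of what a purely name-based matcher would find between the Student and GradStudent schemas, here is a minimal Python sketch. The similarity measure (difflib's ratio) and the 0.8 threshold are assumptions chosen for illustration; they do not come from the papers presented.

```python
from difflib import SequenceMatcher

student = ["Name", "SSN", "Level", "Major", "Marks"]
grad_student = ["Name", "ID", "Major", "Grades"]

def name_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def candidate_matches(source, target, threshold=0.8):
    """Return (source, target) pairs whose names are similar enough."""
    return [(s, t) for s in source for t in target
            if name_similarity(s, t) >= threshold]

matches = candidate_matches(student, grad_student)
print(matches)
```

Only the identically named attributes are paired. Semantically related pairs such as SSN/ID and Marks/Grades are missed, which is why names alone are rarely enough.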

Context

Context-insensitive

Context-sensitive

Different Approaches

  • Schema-level matchers

  • Instance-level matchers

  • Hybrid matchers

  • Reusing matching information

Schema-Level Matchers

  • Only consider schema information

    • Name

    • Description

    • Data type

    • Relationship

    • Constraints

    • Number of nesting levels

Instance-Level Matchers

  • Use instance-level data to gain insight into the content and meaning of schema elements

    • Linguistic

      • Dept

      • DeptName

      • EmpName

    • Constraints

      • 416-7362100

      • M3J1P3
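
The constraint-based idea can be sketched in Python as follows: sample values are tested against value patterns, and two columns whose values fit the same pattern become match candidates. The two regular expressions and the pattern names are assumptions derived only from the example values on this slide.

```python
import re

# Hypothetical value patterns inferred from the slide's sample values.
PATTERNS = {
    "phone": re.compile(r"^\d{3}-\d{7}$"),
    "postal_code": re.compile(r"^[A-Z]\d[A-Z]\d[A-Z]\d$"),
}

def infer_kind(values):
    """Return the name of the first pattern that every sample value fits, else None."""
    for kind, pattern in PATTERNS.items():
        if values and all(pattern.match(v) for v in values):
            return kind
    return None

def same_kind(column_a, column_b):
    """Two columns are match candidates if their values fit the same pattern."""
    kind = infer_kind(column_a)
    return kind is not None and kind == infer_kind(column_b)

print(infer_kind(["416-7362100"]))
print(infer_kind(["M3J1P3"]))
```

A real instance-level matcher would use many more samples and tolerate noise; this sketch only shows how value patterns hint at the meaning of an attribute regardless of its name.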

Hybrid Matchers

  • Combine more than one matching approach

Reusing Matching Information

  • Use previous matching information for future matching tasks

    • Structures or substructures often repeat

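    • Caution: a match that holds in one application may not hold in another

      • Salary & Income

        • Payroll: may be treated as the same

        • Tax Reporting: not the same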

XML Schema Definition (XSD)

  • Data types

    • 19 built-in primitive data types

    • 25 built-in derived data types

    • User-defined complex types

XML Schema Definition (XSD)

  • Complex type definition:

    <complexType name="myNewNameType">
      <complexContent>
        <restriction base="anyType">
          <sequence>
            <!-- child elements -->
            <element name="name" type="string" />
            <element name="location" type="string" />
          </sequence>
          <!-- attribute -->
          <attribute name="position" type="string" />
        </restriction>
      </complexContent>
    </complexType>

    <element name="employee" type="dc:myNewNameType" />

    <dc:employee position="trainer">
      <dc:name>Don Smith</dc:name>
      <dc:location>Dallas, TX</dc:location>
    </dc:employee>

XML Schema Definition (XSD)

  • Shared schema components

XML Schema Definition (XSD)

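  • Match system approaches to shared components

    • COMA: path-based (a shared component is identified by each path that leads to it)

    • Cupid: materialized (a shared component is copied into every place it is used)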

  • Scalability issue: XCBL Order schema contains 1451 components, including 91 shared types. After resolving the shared components, 26000+ nodes/paths were identified.
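
To see why resolving shared components inflates the search space, the following Python sketch counts root-to-node paths in a small schema graph. The toy schema (one Address type shared by two parties) is a hypothetical example, not taken from XCBL.

```python
from functools import lru_cache

# Hypothetical toy schema: Buyer and Seller share one Address type.
schema = {
    "Order": ["Buyer", "Seller"],
    "Buyer": ["Address"],
    "Seller": ["Address"],
    "Address": ["Street", "City"],
    "Street": [],
    "City": [],
}

def count_paths(graph, root):
    """Count root-to-node paths, i.e. the nodes of the fully unfolded tree."""
    @lru_cache(maxsize=None)
    def paths_below(node):
        # One path ends at this node, plus every path continuing into a child.
        return 1 + sum(paths_below(child) for child in graph[node])
    return paths_below(root)

components = len(schema)
paths = count_paths(schema, "Order")
print(components, paths)
```

Here 6 components unfold into 9 paths because Address and its children are reached twice. With many shared types nested inside each other, the growth multiplies, which is consistent with the jump from 1451 components to more than 26000 paths reported above.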

XML Schema Definition (XSD)

  • Distributed schemas

    • XSD allows a schema to be distributed over several schema documents (.xsd files) and namespaces

XML Schema Definition (XSD)

Determining the similarity between complex types, and matching them, can be as difficult as matching two complete schemas.

Standard Schema Matching (Context-Insensitive)

  • Matchers

    • Matching algorithms to compute similarity scores between a pair of attributes

  • Weights

    • Scores are weighted

    • Confidence scores are identified based on standard statistical techniques

  • Selection of best matches
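
The three steps above can be sketched in Python. This is a simplified illustration: the weighted average stands in for the statistical confidence computation, and the matcher names, scores, weights, and threshold are all made-up assumptions.

```python
def combine(scores_by_matcher, weights):
    """Weighted average of the per-matcher similarity scores for each attribute pair."""
    total = sum(weights.values())
    pairs = next(iter(scores_by_matcher.values())).keys()
    return {
        pair: sum(weights[m] * scores_by_matcher[m][pair] for m in weights) / total
        for pair in pairs
    }

def select_best(combined, threshold=0.5):
    """For each source attribute, keep the highest-scoring target at or above the threshold."""
    best = {}
    for (src, tgt), score in combined.items():
        if score >= threshold and score > best.get(src, (None, 0.0))[1]:
            best[src] = (tgt, score)
    return best

# Hypothetical scores from two matchers for three candidate pairs.
scores = {
    "name":     {("Marks", "Grades"): 0.4, ("Marks", "Major"): 0.6, ("SSN", "ID"): 0.0},
    "datatype": {("Marks", "Grades"): 1.0, ("Marks", "Major"): 0.2, ("SSN", "ID"): 1.0},
}
weights = {"name": 1.0, "datatype": 1.0}

combined = combine(scores, weights)
best = select_best(combined)
print(best)
```

With these numbers the name matcher alone would prefer Marks/Major, but the combined score selects Marks/Grades, which shows why several matchers are combined before selection.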

Fragment-Based Schema Matching (Context-Insensitive)

  • Fragment identification

  • Identifying fragment-pair candidates

  • Fragment matching

  • Result combination
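
The four steps above can be sketched in Python as a small pipeline. Everything here is an assumption made for illustration: fragments are simply the top-level entries of each schema, similarity is a plain string ratio, and the schemas themselves are hypothetical.

```python
from difflib import SequenceMatcher

def sim(a, b):
    """Case-insensitive name similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Hypothetical schemas: each top-level key is treated as one fragment.
source = {"BuyerParty": ["name", "street"], "OrderItem": ["itemNo", "price"]}
target = {"Buyer": ["name", "streetName"], "Item": ["number", "unitPrice"]}

def fragment_pairs(src, tgt, threshold=0.5):
    """Steps 1-2: take each top-level entry as a fragment and pair similar fragment roots."""
    return [(s, t) for s in src for t in tgt if sim(s, t) >= threshold]

def match_fragment(src_elems, tgt_elems):
    """Step 3: within one fragment pair, pick the most similar target per source element."""
    return {s: max(tgt_elems, key=lambda t: sim(s, t)) for s in src_elems}

def match_schemas(src, tgt):
    """Step 4: combine the per-fragment results into one schema-level result."""
    result = {}
    for s_frag, t_frag in fragment_pairs(src, tgt):
        for s, t in match_fragment(src[s_frag], tgt[t_frag]).items():
            result[f"{s_frag}.{s}"] = f"{t_frag}.{t}"
    return result

pairs = fragment_pairs(source, target)
result = match_schemas(source, target)
print(pairs)
print(result)
```

Elements are compared only inside the selected fragment pairs instead of across the whole schemas, which is where the reduction in search space comes from.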

Prototype

  • Based on COMA: COmbining MAtch algorithm

  • Support for schemas spread over multiple files

  • Multiple matching strategies

  • Fragment-based approach

  • Result combination

COMA

  • Schema representation

  • Schemas are represented by rooted DAGs (Directed Acyclic Graphs).

COMA

  • Directed Acyclic Graphs

    • Directed graph

    • With no cycles

    • Part tree & part graph

    • Used in Critical Path Analysis, Expression Tree Evaluation, and Game Evaluation

COMA

  • Match processing

reusability

Continuity of this work

  • 2004: COMA prototype

  • 2005: COMA++, extended previous COMA prototype

    • High quality and fast execution times

    • Default combination of 4 matchers

  • 2007: MOMA: Mapping-based Object Matching

Context Schema Matching (Context-Sensitive)

  • False Negatives

RS.price.price → RT.music.price, when RS.price.prcode = “reg”

RS.price.price → RT.music.sale, when RS.price.prcode = “sale”
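
A contextual match can be read as an ordinary attribute match guarded by a condition on the source row. The Python sketch below applies the two conditional matches from this slide to sample rows. The `item` key, the row values, and the dictionary layout are hypothetical; only the attribute names price, sale, and prcode come from the slide.

```python
# Hypothetical rows of the source table RS.price.
rs_price = [
    {"item": "cd1", "prcode": "reg", "price": 15.0},
    {"item": "cd1", "prcode": "sale", "price": 12.0},
]

# Each contextual match: (source attribute, target attribute, condition on the source row).
contextual_matches = [
    ("price", "price", lambda row: row["prcode"] == "reg"),
    ("price", "sale", lambda row: row["prcode"] == "sale"),
]

def to_target(rows, matches):
    """Build RT.music-style records, applying each match only where its condition holds."""
    out = {}
    for row in rows:
        record = out.setdefault(row["item"], {})
        for src_attr, tgt_attr, condition in matches:
            if condition(row):
                record[tgt_attr] = row[src_attr]
    return out

rt_music = to_target(rs_price, contextual_matches)
print(rt_music)
```

Without the conditions, a context-insensitive matcher has to map RS.price.price to a single target attribute, so one of the two correct matches is lost. That loss is the false negative this slide refers to.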

Context Schema Matching (Context-Sensitive)

  • Two techniques for selecting contextual matches:

    • MultiTable: find the single match with the highest confidence for every target attribute

    • QualTable: find the best matches on a per-table basis

Context Schema Matching (Context-Sensitive)

  • Experimental Results

    “Because of its poor performance, MultiTable is not considered further”

Conclusion

  • Current schema matching approaches still need to improve on large and complex schemas.

  • The large search space increases both the likelihood of false matches and the execution time.

  • Further difficulties for schema matching are posed by the high expressive power and versatility of modern schema languages like XSD.

Questions
