Schema Matching

Matching Large XML Schemas

Erhard Rahm, Hong-Hai Do, Sabine Maßmann

Putting Context into Schema Matching

Philip Bohannon, Eiman Elnahrawy, Wenfei Fan, Michael Flaster

COMA - A System for Flexible Combination of Schema Matching Approaches

Hong-Hai Do, Erhard Rahm

Christiano Santiago

Goals
  • Introductory concepts of Schema Matching
  • Context-Sensitive versus Context-Insensitive matching
  • Complexity of XSD schemas

Agenda
  • Terminology
  • Different Approaches
  • XML Schema Definition
  • Context-Insensitive
  • Context-Sensitive
  • Q&A

Terminology
  • Schema matching: the process of identifying that two objects are semantically related (meaning).
  • Mapping: the transformations between the objects (conversion).

Terminology

  • Student: Name, SSN, Level, Major, Marks
  • GradStudent: Name, ID, Major, Grades
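To make the difference between a match and a mapping concrete, here is a minimal Python sketch of the Student/GradStudent example. The attribute correspondences (SSN to ID, Marks to Grades) and the `marks_to_grade` conversion are hypothetical illustrations; the slide does not state them. The match only records which attributes correspond, while the mapping also says how values are converted.

```python
# Match: which attributes are semantically related (no transformation yet).
# The SSN -> ID and Marks -> Grades pairs are illustrative assumptions.
MATCH = {
    "Name": "Name",
    "SSN": "ID",
    "Major": "Major",
    "Marks": "Grades",
}


def marks_to_grade(mark: int) -> str:
    """Hypothetical conversion from a numeric mark to a letter grade."""
    if mark >= 80:
        return "A"
    if mark >= 70:
        return "B"
    if mark >= 60:
        return "C"
    return "F"


# Mapping: how each matched source value is transformed for the target.
CONVERSIONS = {"Marks": marks_to_grade}


def map_student(student: dict) -> dict:
    """Apply the mapping to one Student record, producing a GradStudent record."""
    grad = {}
    for src_attr, tgt_attr in MATCH.items():
        convert = CONVERSIONS.get(src_attr, lambda v: v)  # identity by default
        grad[tgt_attr] = convert(student[src_attr])
    return grad  # Level has no counterpart in GradStudent and is dropped


print(map_student({"Name": "Ann", "SSN": "123", "Level": "MSc", "Major": "CS", "Marks": 85}))
# {'Name': 'Ann', 'ID': '123', 'Major': 'CS', 'Grades': 'A'}
```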

Schema Matching

Context

  • Context-insensitive
  • Context-sensitive

Different Approaches
  • Schema-level matchers
  • Instance-level matchers
  • Hybrid matchers
  • Reusing matching information

Schema-Level Matchers
  • Only consider schema information
    • Name
    • Description
    • Data type
    • Relationship
    • Constraints
    • Number of nesting levels
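A schema-level matcher looks only at schema information such as names and data types. The minimal Python sketch below combines two of the listed kinds of evidence: name similarity (using the standard library's `difflib.SequenceMatcher.ratio()`) and a crude data type compatibility check. The weights 0.7/0.3 and the numeric type group are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Similarity of two element names in [0, 1], via difflib's ratio()."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


NUMERIC = {"int", "decimal", "float"}  # assumed group of compatible types


def type_similarity(t1: str, t2: str) -> float:
    """Crude data type compatibility: 1.0 if equal, 0.5 if both numeric, else 0.0."""
    if t1 == t2:
        return 1.0
    if t1 in NUMERIC and t2 in NUMERIC:
        return 0.5
    return 0.0


def schema_level_score(e1, e2, w_name=0.7, w_type=0.3):
    """Weighted combination of name and type evidence; e1, e2 are (name, datatype)."""
    return w_name * name_similarity(e1[0], e2[0]) + w_type * type_similarity(e1[1], e2[1])


def best_match(element, candidates):
    """Return the candidate with the highest schema-level score."""
    return max(candidates, key=lambda c: schema_level_score(element, c))


target = [("EmployeeName", "string"), ("Pay", "float")]
print(best_match(("EmpName", "string"), target))  # ('EmployeeName', 'string')
print(best_match(("Salary", "decimal"), target))  # ('Pay', 'float')
```

Note that "Salary" and "Pay" share few characters; here the type evidence tips the balance, which is why schema-level matchers usually combine several criteria.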

Instance-Level Matchers
  • Use instance-level data to gather insight into the content and meaning of schema elements
    • Linguistic
      • Dept
      • DeptName
      • EmpName
    • Constraints
      • 416-7362100
      • M3J1P3
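The constraint examples above (416-7362100, M3J1P3) suggest how instance values can reveal what a column holds even when names differ. A minimal Python sketch, assuming two hand-written, simplified regular expressions (a phone number shaped like the slide's example and a Canadian-style postal code) and an assumed 80% agreement threshold:

```python
import re

# Simplified, hypothetical value patterns modelled on the slide's examples.
PATTERNS = {
    "phone": re.compile(r"^\d{3}-\d{7}$"),                # e.g. 416-7362100
    "postal_code": re.compile(r"^[A-Z]\d[A-Z]\d[A-Z]\d$"),  # e.g. M3J1P3
}


def infer_type(values, min_fraction=0.8):
    """Return the pattern name matched by at least min_fraction of values, else None."""
    for name, pattern in PATTERNS.items():
        hits = sum(1 for v in values if pattern.match(v))
        if values and hits / len(values) >= min_fraction:
            return name
    return None


def instance_matches(source_cols, target_cols):
    """Pair source and target columns whose instances share an inferred type."""
    matches = []
    for s_name, s_vals in source_cols.items():
        s_type = infer_type(s_vals)
        if s_type is None:
            continue
        for t_name, t_vals in target_cols.items():
            if infer_type(t_vals) == s_type:
                matches.append((s_name, t_name, s_type))
    return matches


source = {"Tel": ["416-7362100", "905-5551234"], "Zip": ["M3J1P3", "L4B2T9"]}
target = {"PhoneNo": ["647-1234567"], "PostCode": ["M5V3L9"], "City": ["Toronto"]}
print(instance_matches(source, target))
# [('Tel', 'PhoneNo', 'phone'), ('Zip', 'PostCode', 'postal_code')]
```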

Hybrid-Level Matchers
  • Combines more than one approach
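As a minimal illustration of a hybrid matcher, the Python sketch below averages a schema-level score (name similarity via `difflib`) with an instance-level score (Jaccard overlap of observed values). The equal weighting and the example columns are assumptions. In the example, names alone prefer "DeptNo", while the combined evidence prefers "Division".

```python
from difflib import SequenceMatcher


def name_score(a, b):
    """Schema-level evidence: string similarity of the column names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def instance_score(values_a, values_b):
    """Instance-level evidence: Jaccard overlap of the observed values."""
    sa, sb = set(values_a), set(values_b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def hybrid_score(col_a, col_b):
    """Equal-weight average of the two kinds of evidence (an assumed choice)."""
    (name_a, vals_a), (name_b, vals_b) = col_a, col_b
    return (name_score(name_a, name_b) + instance_score(vals_a, vals_b)) / 2


dept = ("Dept", ["Sales", "HR", "IT"])
division = ("Division", ["Sales", "HR", "IT", "Legal"])
dept_no = ("DeptNo", ["10", "20", "30"])
print(name_score("Dept", "Division") < name_score("Dept", "DeptNo"))  # True
print(hybrid_score(dept, division) > hybrid_score(dept, dept_no))     # True
```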

Reusing Matching Information
  • Use previous matching information for future matching tasks
    • Structures or substructures often repeat
    • Caution: a reused match may be valid only in its original context
      • Salary & Income
        • Payroll
        • Tax Reporting
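COMA supports reusing earlier match results, roughly by composing stored results between schemas A-B and B-C into a new A-C result. The Python sketch below illustrates that idea under stated assumptions: the repository layout, the schema names, the domain tag (used to respect the Salary/Income caution by reusing only results validated in the same context), and multiplying similarities are all hypothetical choices, not COMA's actual implementation.

```python
# Hypothetical repository of confirmed match results, tagged with the domain
# (context) in which they were validated: (domain, schema1, schema2, {(e1, e2): sim}).
REPOSITORY = [
    ("payroll", "HR", "Payroll", {("Salary", "Pay"): 0.9}),
    ("payroll", "Payroll", "Finance", {("Pay", "Income"): 0.8}),
    ("tax", "Payroll", "Finance", {("Pay", "GrossIncome"): 0.7}),
]


def lookup(repository, domain, s1, s2):
    """Find a stored match result for (s1, s2) validated in the given domain."""
    return next((m for d, a, b, m in repository if (d, a, b) == (domain, s1, s2)), None)


def compose(match_ab, match_bc):
    """Derive A-C correspondences through shared B elements.

    Similarities are multiplied (one possible choice); the best value per pair is kept.
    """
    derived = {}
    for (a, b1), s1 in match_ab.items():
        for (b2, c), s2 in match_bc.items():
            if b1 == b2:
                derived[(a, c)] = max(derived.get((a, c), 0.0), s1 * s2)
    return derived


def reuse(repository, domain, s1, s2, s3):
    """Match s1 with s3 by composing s1-s2 and s2-s3 results from the same domain only."""
    ab = lookup(repository, domain, s1, s2)
    bc = lookup(repository, domain, s2, s3)
    if ab is None or bc is None:
        return {}
    return compose(ab, bc)


print(reuse(REPOSITORY, "payroll", "HR", "Payroll", "Finance"))  # Salary -> Income, about 0.72
print(reuse(REPOSITORY, "tax", "HR", "Payroll", "Finance"))      # {}
```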

XML Schema Definition (XSD)
  • Data types
    • 19 built-in primitive data types
    • 25 built-in derived data types
    • User-defined complex types
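Because derived built-in types restrict a base type, a data type matcher can treat, say, `int` and `decimal` as related. The minimal Python sketch below walks a partial base-type table taken from the XSD 1.0 built-in type hierarchy (only a subset of the 25 derived types is listed) up to the primitive type. The 1.0 / 0.5 / 0.0 scoring is an assumed heuristic.

```python
# Partial base-type table for XSD 1.0 built-in derived types (subset only).
BASE_TYPE = {
    "integer": "decimal",
    "long": "integer",
    "int": "long",
    "short": "int",
    "byte": "short",
    "nonNegativeInteger": "integer",
    "positiveInteger": "nonNegativeInteger",
    "normalizedString": "string",
    "token": "normalizedString",
    "language": "token",
    "Name": "token",
    "NCName": "Name",
    "ID": "NCName",
}


def primitive_of(type_name):
    """Follow the derivation chain until reaching a type with no listed base."""
    t = type_name
    while t in BASE_TYPE:
        t = BASE_TYPE[t]
    return t


def type_similarity(t1, t2):
    """1.0 for identical types, 0.5 for types sharing a primitive base, else 0.0."""
    if t1 == t2:
        return 1.0
    if primitive_of(t1) == primitive_of(t2):
        return 0.5
    return 0.0


print(primitive_of("short"))            # decimal
print(type_similarity("int", "decimal"))  # 0.5
print(type_similarity("ID", "int"))       # 0.0
```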

XML Schema Definition (XSD)
  • Complex type definition (example not preserved; it showed child elements with values such as "Don Smith" and "Dallas, TX", plus an attribute)
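A matcher first has to extract the child elements and attributes of each complex type. The minimal Python sketch below does that with the standard library's `xml.etree.ElementTree`, using namespace prefixes in `findall`. The `PersonType` schema is a hypothetical stand-in for the lost slide example. Note that `.//xs:element` would also collect elements of nested anonymous types, which this flat example does not have.

```python
import xml.etree.ElementTree as ET

# Hypothetical complex type with two child elements and one attribute.
XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="PersonType">
    <xs:sequence>
      <xs:element name="Name" type="xs:string"/>
      <xs:element name="City" type="xs:string"/>
    </xs:sequence>
    <xs:attribute name="id" type="xs:ID"/>
  </xs:complexType>
</xs:schema>"""

NS = {"xs": "http://www.w3.org/2001/XMLSchema"}


def describe_complex_types(xsd_text):
    """Map each named top-level complex type to its child elements and attributes."""
    root = ET.fromstring(xsd_text)
    result = {}
    for ct in root.findall("xs:complexType", NS):
        result[ct.get("name")] = {
            "elements": [(e.get("name"), e.get("type")) for e in ct.findall(".//xs:element", NS)],
            "attributes": [(a.get("name"), a.get("type")) for a in ct.findall("xs:attribute", NS)],
        }
    return result


print(describe_complex_types(XSD))
```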

XML Schema Definition (XSD)
  • Shared schema components
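A shared component (for example, one address type referenced from several places) is defined once but occurs in several contexts. The minimal Python sketch below, on a hypothetical schema graph, expands every root-to-node path and shows that the shared `Address` fragment yields one path per context, so a path-based matcher can tell a shipping street from a billing street.

```python
# Hypothetical schema graph: Address is a shared component used by ShipTo and BillTo.
SCHEMA = {
    "Order": ["ShipTo", "BillTo"],
    "ShipTo": ["Address"],
    "BillTo": ["Address"],
    "Address": ["Street", "City", "Zip"],
    "Street": [],
    "City": [],
    "Zip": [],
}


def expand_paths(graph, node, prefix=()):
    """Yield every root-to-node path; a shared node yields one path per context."""
    path = prefix + (node,)
    yield ".".join(path)
    for child in graph[node]:
        yield from expand_paths(graph, child, path)


paths = list(expand_paths(SCHEMA, "Order"))
print(len(SCHEMA), "components ->", len(paths), "paths")  # 7 components -> 11 paths
print([p for p in paths if p.endswith("Street")])
# ['Order.ShipTo.Address.Street', 'Order.BillTo.Address.Street']
```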

XML Schema Definition (XSD)
  • Approaches taken by match systems:
    • COMA: path-based
    • Cupid: materialized
  • Scalability issue: the xCBL Order schema contains 1,451 components, including 91 shared types. After resolving the shared components, more than 26,000 nodes/paths were identified.
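The jump from about 1,450 components to more than 26,000 paths is what happens when shared types are referenced from many places: every distinct root-to-node path becomes a separate match candidate. The minimal Python sketch below counts paths in a schema DAG with memoization instead of enumerating them. The `shared_chain` schema is a synthetic, hypothetical example (each type is referenced twice by its parent, modelled as two edges), not the xCBL schema.

```python
from functools import lru_cache


def count_paths(graph, root):
    """Number of distinct root-to-node paths in a DAG, without enumerating them."""

    @lru_cache(maxsize=None)
    def count_from(node):
        # one path ending at this node, plus all paths continuing through each child
        return 1 + sum(count_from(child) for child in graph[node])

    return count_from(root)


def shared_chain(depth):
    """Synthetic schema: type Ti contains two elements, both of the shared type T(i+1)."""
    graph = {f"T{i}": [f"T{i + 1}", f"T{i + 1}"] for i in range(depth)}
    graph[f"T{depth}"] = []
    return graph


g = shared_chain(10)
print(len(g), "components ->", count_paths(g, "T0"), "paths")  # 11 components -> 2047 paths
```

The count doubles with every level of sharing, which is why materializing shared components (or matching all paths) scales poorly for large XSDs.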

XML Schema Definition (XSD)
  • Distributed schemas
    • XSD allows a schema to be distributed over several schema documents (.xsd files) and namespaces
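Before matching a distributed schema, all of its documents have to be gathered by following `xs:include`, `xs:import`, and `xs:redefine` references. The minimal Python sketch below does this with `xml.etree.ElementTree`. The file names and contents are hypothetical and kept in memory; a real resolver would also resolve `schemaLocation` relative to the referencing document's URI, which this sketch does not do.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema documents keyed by file name.
DOCS = {
    "order.xsd": f'<xs:schema xmlns:xs="{XS}">'
                 '<xs:include schemaLocation="address.xsd"/>'
                 '<xs:import namespace="urn:example:party" schemaLocation="party.xsd"/>'
                 '</xs:schema>',
    "address.xsd": f'<xs:schema xmlns:xs="{XS}"><xs:include schemaLocation="common.xsd"/></xs:schema>',
    "party.xsd": f'<xs:schema xmlns:xs="{XS}" targetNamespace="urn:example:party">'
                 '<xs:include schemaLocation="common.xsd"/></xs:schema>',
    "common.xsd": f'<xs:schema xmlns:xs="{XS}"/>',
}


def collect_documents(start, load):
    """Return every schema document reachable from `start`, each listed once."""
    seen, stack, order = set(), [start], []
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        order.append(name)
        root = ET.fromstring(load(name))
        for tag in ("include", "import", "redefine"):
            for ref in root.findall(f"{{{XS}}}{tag}"):
                location = ref.get("schemaLocation")  # optional on xs:import
                if location and location not in seen:
                    stack.append(location)
    return order


print(collect_documents("order.xsd", DOCS.__getitem__))
```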

XML Schema Definition (XSD)

Determining the similarity between two complex types, and matching them, can be as difficult as matching two complete schemas.

Standard Schema Matching Context-Insensitive
  • Matchers
    • Matching algorithms that compute similarity scores between pairs of attributes
  • Weights
    • The scores from the individual matchers are weighted
    • Confidence scores are derived using standard statistical techniques
  • Selection of best matches
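The three steps above (matcher scores, weighting, selection) can be sketched in a few lines of Python. This is a minimal illustration, assuming fixed hand-chosen weights (it does not reproduce the statistical confidence estimation mentioned on the slide) and a simple selection rule: keep, per target attribute, the best source above a threshold. The matcher names and scores are hypothetical.

```python
def aggregate(matcher_scores, weights):
    """Weighted average of per-matcher similarity scores.

    matcher_scores: {matcher: {(src, tgt): score}}; weights: {matcher: weight}.
    Missing scores count as 0.0.
    """
    total = sum(weights.values())
    pairs = set().union(*(scores.keys() for scores in matcher_scores.values()))
    return {
        p: sum(weights[m] * scores.get(p, 0.0) for m, scores in matcher_scores.items()) / total
        for p in pairs
    }


def select_best(combined, threshold=0.5):
    """For each target attribute, keep the highest-scoring source at or above threshold."""
    best = {}
    for (src, tgt), score in combined.items():
        if score >= threshold and score > best.get(tgt, (None, -1.0))[1]:
            best[tgt] = (src, score)
    return best


scores = {
    "name": {("EmpName", "Name"): 0.8, ("Dept", "Name"): 0.3, ("Dept", "Department"): 0.6},
    "type": {("EmpName", "Name"): 1.0, ("Dept", "Name"): 1.0, ("Dept", "Department"): 1.0},
}
weights = {"name": 0.7, "type": 0.3}
result = select_best(aggregate(scores, weights))
print({tgt: src for tgt, (src, _) in result.items()})
```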

Fragment-Based Schema Matching Context-Insensitive
  • Fragment identification
  • Identifying fragment-pair candidates
  • Fragment matching
  • Result combination
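The four steps above can be illustrated with a deliberately simple Python sketch. It assumes fragments are just the subtrees directly under the root, that candidate fragment pairs are chosen by name similarity of the fragment roots, and that leaves are matched by name similarity only within candidate pairs; the thresholds and schemas are hypothetical. The actual prototype supports other fragment types and uses COMA's full matcher library.

```python
from difflib import SequenceMatcher


def sim(a, b):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def last(path):
    return path.rsplit(".", 1)[-1]


def fragments(schema):
    """Step 1 (fragment identification): here simply the subtrees under the root."""
    root, children = next(iter(schema.items()))
    return {f"{root}.{name}": subtree for name, subtree in children.items()}


def leaves(tree, prefix):
    for name, subtree in tree.items():
        path = f"{prefix}.{name}"
        if subtree:
            yield from leaves(subtree, path)
        else:
            yield path


def fragment_match(s1, s2, frag_threshold=0.5, leaf_threshold=0.5):
    f1, f2 = fragments(s1), fragments(s2)
    # Step 2: candidate fragment pairs, by similarity of the fragment root names
    candidates = [(p1, p2) for p1 in f1 for p2 in f2 if sim(last(p1), last(p2)) >= frag_threshold]
    best = {}
    for p1, p2 in candidates:
        targets = list(leaves(f2[p2], p2))
        if not targets:
            continue
        # Step 3: match leaves only within the candidate fragment pair
        for l1 in leaves(f1[p1], p1):
            score, l2 = max((sim(last(l1), last(t)), t) for t in targets)
            # Step 4: combine per-fragment results, keeping the best target per source leaf
            if score >= leaf_threshold and score > best.get(l1, (0.0, None))[0]:
                best[l1] = (score, l2)
    return {l1: l2 for l1, (score, l2) in best.items()}


po = {"PO": {"ShipTo": {"Street": {}, "City": {}}, "Items": {"Qty": {}, "Price": {}}}}
order = {"Order": {"ShipToAddr": {"StreetName": {}, "City": {}},
                   "LineItems": {"Quantity": {}, "UnitPrice": {}}}}
print(fragment_match(po, order))
```

Restricting step 3 to candidate pairs is what shrinks the search space: leaves of unrelated fragments are never compared.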

Prototype
  • Based on COMA (COmbination of MAtch algorithms)
  • Support for schemas spread over multiple files
  • Multiple matching strategies
  • Fragment-based approach
  • Result combination

COMA
  • Schema representation
  • Schemas are represented by rooted DAGs (Directed Acyclic Graphs).

COMA
  • Directed Acyclic Graphs
    • Directed graph
    • With no cycles
    • Partly tree, partly graph: a node may have more than one parent
    • Used in critical path analysis, expression tree evaluation, and game evaluation
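A schema graph qualifies as a rooted DAG if it has no cycles and every node is reachable from the root; it is "part graph" when some node (such as a shared type) has several parents. The minimal Python sketch below checks acyclicity with Kahn's topological-sort algorithm and rootedness with a depth-first walk. The graphs are hypothetical; XSD permits recursive type definitions, which form cycles and are simply reported as not being a DAG here.

```python
from collections import Counter, deque


def is_rooted_dag(graph, root):
    """True if graph (node -> children) is acyclic and every node is reachable from root."""
    # Kahn's algorithm: repeatedly remove nodes with no remaining incoming edges.
    indegree = {node: 0 for node in graph}
    for children in graph.values():
        for child in children:
            indegree[child] += 1
    queue = deque(node for node, d in indegree.items() if d == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for child in graph[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    if removed != len(graph):
        return False  # the remaining nodes lie on or behind a cycle
    # Rootedness: a depth-first walk from root must reach every node.
    seen, stack = {root}, [root]
    while stack:
        for child in graph[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return len(seen) == len(graph)


def has_shared_nodes(graph):
    """True if some node has more than one parent, i.e. the DAG is not a tree."""
    parents = Counter(child for children in graph.values() for child in children)
    return any(count > 1 for count in parents.values())


ORDER = {"Order": ["ShipTo", "BillTo"], "ShipTo": ["Address"], "BillTo": ["Address"], "Address": []}
RECURSIVE = {"Part": ["SubParts"], "SubParts": ["Part"]}
print(is_rooted_dag(ORDER, "Order"), has_shared_nodes(ORDER))  # True True
print(is_rooted_dag(RECURSIVE, "Part"))                        # False
```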

COMA
  • Match processing, with support for reusability

Continuation of this work
  • 2004: COMA prototype
  • 2005: COMA++, which extends the earlier COMA prototype
    • High match quality and fast execution times
    • Default combination of 4 matchers
  • 2007: MOMA: Mapping-based Object Matching

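Context Schema Matching Context-Sensitive
  • False negatives: a context-insensitive matcher can miss matches that hold only under a condition, e.g.:
    • RS.price.price → RT.music.price when RS.price.prcode = “reg”
    • RS.price.price → RT.music.sale when RS.price.prcode = “sale”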
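The two contextual matches above route the same source attribute to different target attributes depending on a condition. The minimal Python sketch below applies them to hypothetical RS.price rows. The `id` key used to group rows into one RT.music row per item is an assumption; the slide does not name a key.

```python
# The two contextual matches from the slide: (source attr, target attr, (condition attr, value)).
CONTEXTUAL_MATCHES = [
    ("price", "price", ("prcode", "reg")),
    ("price", "sale", ("prcode", "sale")),
]


def apply_contextual(rows, matches, key="id"):
    """Build RT.music-style rows from RS.price rows; `key` is a hypothetical item id."""
    music = {}
    for row in rows:
        target = music.setdefault(row[key], {key: row[key]})
        for src_attr, tgt_attr, (cond_attr, cond_value) in matches:
            if row.get(cond_attr) == cond_value:  # the match applies only in this context
                target[tgt_attr] = row[src_attr]
    return list(music.values())


rs_price = [
    {"id": 1, "prcode": "reg", "price": 9.99},
    {"id": 1, "prcode": "sale", "price": 7.99},
    {"id": 2, "prcode": "reg", "price": 12.50},
]
print(apply_contextual(rs_price, CONTEXTUAL_MATCHES))
# [{'id': 1, 'price': 9.99, 'sale': 7.99}, {'id': 2, 'price': 12.5}]
```

Without the condition, a matcher would see only one source attribute named `price` and would likely map it to RT.music.price alone, missing the match to `sale`.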

Context Schema Matching Context-Sensitive
  • Two techniques for selecting contextual matches:
    • MultiTable: find the single match with the highest confidence for every target attribute
    • QualTable: find the best matches on a per-table basis

Context Schema Matching Context-Sensitive
  • Experimental Results

“Because of its poor performance, MultiTable is not considered further”

Conclusion
  • Current schema matching approaches still need to improve for large and complex schemas.
  • The large search space increases both the likelihood of false matches and the execution time.
  • The high expressive power and versatility of modern schema languages such as XSD pose further difficulties for schema matching.

Questions
