Generic Schema Matching with Cupid

Jayant Madhavan, Philip A. Bernstein, Erhard Rahm. Proceedings of the 27th VLDB Conference. Schema matching is the task of finding a mapping between those elements of two schemas that semantically correspond to each other.

Presentation Transcript


  1. Generic Schema Matching with Cupid. Jayant Madhavan, Philip A. Bernstein, Erhard Rahm. Proceedings of the 27th VLDB Conference

  2. Schema Matching

  3. Schema Matching (Cont.) • Definition: Finding a mapping between those elements of two schemas that semantically correspond to each other • Applications • Schema integration • Data translation • XML message mapping • Data warehouse loading • Goal

  4. Taxonomy • Schema vs. Instance based • Element vs. Structure granularity • Linguistic based • Constraint based • Matching cardinality • Auxiliary information • Individual vs. Combinational

  5. Cupid • Schema-based • Automated linguistic-based matching • Both element-based and structure-based • Biased toward similarity of atomic elements • Exploits internal structure • Exploits keys, referential constraints and views • Makes context-dependent matches of a shared type • 1:n mapping

  6. Similarity Coefficient Computation • First Phase: Linguistic matching • Names • Data types • Domains • Linguistic similarity coefficient: lsim • Second Phase: Structural matching • Contexts • Linguistic similarity coefficients • Structural similarity coefficient: ssim • Hybrid: wsim = w_struct * ssim + (1 - w_struct) * lsim
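The hybrid coefficient is a convex combination of the structural and linguistic scores. The Python sketch below assumes both scores already lie in [0, 1]. The default `w_struct`, the acceptance threshold `th_accept`, and the helper `select_mappings` are hypothetical, chosen only to show how thresholding wsim can let one element map to several elements of the other schema (the 1:n case listed on slide 5).

```python
def wsim(ssim: float, lsim: float, w_struct: float = 0.5) -> float:
    """Hybrid similarity: wsim = w_struct * ssim + (1 - w_struct) * lsim."""
    if not 0.0 <= w_struct <= 1.0:
        raise ValueError("w_struct must lie in [0, 1]")
    return w_struct * ssim + (1.0 - w_struct) * lsim


def select_mappings(scores: dict[tuple[str, str], float],
                    th_accept: float = 0.5) -> dict[str, list[str]]:
    """Hypothetical helper: for each target element, keep every source
    element whose wsim reaches th_accept, so one target may collect
    several sources."""
    mapping: dict[str, list[str]] = {}
    for (src, tgt), score in sorted(scores.items()):
        if score >= th_accept:
            mapping.setdefault(tgt, []).append(src)
    return mapping


# Illustrative (ssim, lsim) inputs, not data from the paper.
scores = {
    ("PO.City", "Order.City"): wsim(0.8, 0.9),
    ("PO.Town", "Order.City"): wsim(0.8, 0.6),
    ("PO.Zip", "Order.City"): wsim(0.2, 0.1),
}
print(select_mappings(scores))  # {'Order.City': ['PO.City', 'PO.Town']}
```

Raising `w_struct` makes context dominate; lowering it makes names dominate. The sketch leaves the choice to the caller rather than guessing the paper's settings.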

  7. Linguistic Matching • Normalization • Tokenization • Expansion • Elimination • Categorization • Data types • Schema hierarchy • Linguistic contents • Comparison: Linguistic Similarity Coefficient (lsim) • Thesaurus • Sub-string matching
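The normalization and comparison steps can be sketched as follows. This is a simplified illustration, not Cupid's implementation: the abbreviation table, stop-word list, synonym scores, and the 0.5 sub-string score are all hypothetical stand-ins for a real thesaurus, and the categorization step is omitted. `name_sim` averages each token's best match in the other name, in both directions.

```python
import re

# Hypothetical stand-ins for a thesaurus and abbreviation dictionary.
ABBREVIATIONS = {"qty": "quantity", "num": "number", "po": "purchase order"}
STOP_WORDS = {"of", "the", "a"}
SYNONYMS = {frozenset({"city", "town"}): 0.8, frozenset({"zip", "postal"}): 0.7}


def normalize(name: str) -> list[str]:
    """Tokenize (camelCase and punctuation), expand abbreviations,
    and eliminate stop words."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name)
    tokens = [t.lower() for t in re.split(r"[^A-Za-z0-9]+", spaced) if t]
    expanded: list[str] = []
    for tok in tokens:
        expanded.extend(ABBREVIATIONS.get(tok, tok).split())
    return [t for t in expanded if t not in STOP_WORDS]


def token_sim(a: str, b: str) -> float:
    if a == b:
        return 1.0
    pair = frozenset({a, b})
    if pair in SYNONYMS:  # thesaurus lookup
        return SYNONYMS[pair]
    # Sub-string matching, e.g. "addr" inside "address" (hypothetical score).
    if len(a) >= 3 and len(b) >= 3 and (a in b or b in a):
        return 0.5
    return 0.0


def name_sim(n1: str, n2: str) -> float:
    """Average over both names' tokens of each token's best match."""
    t1, t2 = normalize(n1), normalize(n2)
    if not t1 or not t2:
        return 0.0
    best1 = sum(max(token_sim(a, b) for b in t2) for a in t1)
    best2 = sum(max(token_sim(a, b) for a in t1) for b in t2)
    return (best1 + best2) / (len(t1) + len(t2))


print(normalize("NumOfItems"))             # ['number', 'items']
print(name_sim("Qty", "Quantity"))         # 1.0
print(name_sim("ShipToCity", "ship_to_town"))
```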

  8. Bottom-up Mutually Recursive Structural Matching
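A simplified sketch of bottom-up, mutually recursive structural matching over two schema trees follows. Assumptions, all labeled in the code: leaf ssim starts from a hypothetical data-type compatibility score (0.5 for equal types, else 0.0); an inner pair's ssim is the share of leaves in both subtrees with a "strong link" (wsim at or above `th_accept`) into the other subtree; a pair whose wsim exceeds `th_high` multiplies its leaf pairs' ssim by `c_inc`, and one below `th_low` multiplies it by `c_dec`. Threshold values are illustrative, and leaf-versus-inner pairs are simply skipped.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass(eq=False)  # identity hashing, so nodes can be dict keys
class Node:
    name: str
    dtype: Optional[str] = None  # only meaningful on leaves
    children: list["Node"] = field(default_factory=list)

    def leaves(self) -> list["Node"]:
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]


def post_order(node: Node) -> list[Node]:
    """Children before parents, so subtrees are scored bottom-up."""
    order: list[Node] = []
    for child in node.children:
        order.extend(post_order(child))
    order.append(node)
    return order


def tree_match(src: Node, tgt: Node, lsim: Callable[[str, str], float],
               w_struct: float = 0.5, th_accept: float = 0.5,
               th_high: float = 0.6, th_low: float = 0.35,
               c_inc: float = 1.2, c_dec: float = 0.9) -> dict[tuple[Node, Node], float]:
    ssim: dict[tuple[Node, Node], float] = {}
    # Hypothetical data-type compatibility as the leaves' initial ssim.
    for s in src.leaves():
        for t in tgt.leaves():
            ssim[(s, t)] = 0.5 if s.dtype == t.dtype else 0.0

    def wsim(s: Node, t: Node) -> float:
        return w_struct * ssim[(s, t)] + (1 - w_struct) * lsim(s.name, t.name)

    result: dict[tuple[Node, Node], float] = {}
    for s in post_order(src):
        for t in post_order(tgt):
            if not s.children or not t.children:
                continue  # leaf pairs are scored at the end; mixed pairs skipped
            ls, lt = s.leaves(), t.leaves()
            # Share of leaves with a strong link into the other subtree.
            linked = sum(any(wsim(x, y) >= th_accept for y in lt) for x in ls)
            linked += sum(any(wsim(x, y) >= th_accept for x in ls) for y in lt)
            ssim[(s, t)] = linked / (len(ls) + len(lt))
            score = wsim(s, t)
            result[(s, t)] = score
            # Similar ancestors reinforce their leaves; dissimilar ones weaken them.
            factor = c_inc if score > th_high else c_dec if score < th_low else 1.0
            for x in ls:
                for y in lt:
                    ssim[(x, y)] = min(1.0, ssim[(x, y)] * factor)
    # Leaf wsim uses the final, context-adjusted leaf ssim.
    for s in src.leaves():
        for t in tgt.leaves():
            result[(s, t)] = wsim(s, t)
    return result


# Hypothetical name similarities standing in for lsim.
NAME_SIM = {("City", "City"): 1.0, ("Zip", "PostCode"): 0.6, ("PO", "Order"): 0.5}


def toy_lsim(a: str, b: str) -> float:
    return NAME_SIM.get((a, b), 0.0)


src = Node("PO", children=[Node("City", "string"), Node("Zip", "string")])
tgt = Node("Order", children=[Node("City", "string"), Node("PostCode", "string")])
scores = tree_match(src, tgt, toy_lsim)
for (s, t), score in scores.items():
    print(f"{s.name:>5} ~ {t.name:<8} {score:.2f}")
```

In this toy run the similar parents (PO, Order) raise their leaves' structural similarity, so Zip ~ PostCode ends up above the 0.5 threshold while City ~ PostCode stays below it.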

  9. Example

  10. Example (Cont.)

  11. Example (Cont.)

  12. General Schemas • Schema Graphs • Elements • Relationships (containment, aggregation, and IsDerivedFrom) • Matching Shared Types (context-dependent mappings) • Matching Referential Constraints

  13. Matching Shared Types
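Context-dependent matching of a shared type can be illustrated by expanding a schema graph into a tree, copying the shared type into every context that references it. The schema below (a `PurchaseOrder` whose `DeliverTo` and `InvoiceTo` both use an `Address` type) and the `expand` helper are hypothetical; the sketch expands eagerly, whereas the slides list lazy expansion as a separate feature, and it simply rejects recursive types.

```python
# Hypothetical schema graph: each complex type lists (element, type) pairs;
# type names that are not keys of the dict are treated as atomic.
SCHEMA = {
    "PurchaseOrder": [("DeliverTo", "Address"), ("InvoiceTo", "Address")],
    "Address": [("Street", "string"), ("City", "string")],
}


def expand(type_name: str, path: str,
           schema: dict[str, list[tuple[str, str]]],
           seen: tuple[str, ...] = ()) -> list[tuple[str, str]]:
    """Return (leaf path, atomic type) pairs for the tree obtained by
    copying each shared type into every context that uses it."""
    if type_name not in schema:
        return [(path, type_name)]
    if type_name in seen:
        raise ValueError(f"recursive type {type_name!r} cannot be fully expanded")
    leaves: list[tuple[str, str]] = []
    for elem, child_type in schema[type_name]:
        leaves.extend(expand(child_type, f"{path}/{elem}", schema, seen + (type_name,)))
    return leaves


for path, dtype in expand("PurchaseOrder", "PurchaseOrder", SCHEMA):
    print(path, dtype)
```

Because `Address` now appears once under `DeliverTo` and once under `InvoiceTo`, each copy can be matched separately, for example a target shipping street against `DeliverTo/Street` rather than `InvoiceTo/Street`.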

  14. Matching Referential Constraints
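One plausible way to let structural matching exploit a referential constraint is to add an extra node for each foreign key whose leaves are the columns of both the referencing and the referenced table, so a nested target element (say, an order containing its customer) can match the joined pair as one subtree. The sketch below assumes that encoding; `TreeNode`, `build_tree`, and the `_join_` naming are hypothetical, and the table definitions are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class TreeNode:
    name: str
    children: list["TreeNode"] = field(default_factory=list)


def build_tree(schema_name: str, tables: dict[str, list[str]],
               foreign_keys: list[tuple[str, str]]) -> TreeNode:
    """One child per table, plus one hypothetical join node per
    (referencing table, referenced table) pair whose leaves are the
    columns of both tables."""
    root = TreeNode(schema_name)
    for table, columns in tables.items():
        root.children.append(TreeNode(table, [TreeNode(c) for c in columns]))
    for fk_table, pk_table in foreign_keys:
        join_leaves = [TreeNode(f"{t}.{c}") for t in (fk_table, pk_table)
                       for c in tables[t]]
        root.children.append(TreeNode(f"{fk_table}_join_{pk_table}", join_leaves))
    return root


tables = {"Orders": ["OrderID", "CustID"], "Customers": ["CustID", "Name"]}
root = build_tree("Sales", tables, [("Orders", "Customers")])
print([c.name for c in root.children])
# ['Orders', 'Customers', 'Orders_join_Customers']
```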

  15. Other Features • Optionality • Views • Initial Mappings • Lazy Expansion • Pruning Leaves

  16. Comparative Study • Algorithms • MOMIS • DIKE • Cupid • Canonical Examples • Real World Example

  17. Canonical Examples • Identical schemas • Atomic elements with same names, but different data types • Atomic elements with same data types, but different names (a prefix or suffix is added) • Different class names, but atomic elements same names and data types • Different nesting of the data – similar schemas with nested and flat structures • Type substitution or context dependent mapping
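The "different nesting" case shows why leaf-level comparison matters. In the hypothetical fixture below (not the paper's test data), comparing full paths finds only one of three correspondences between a nested and a flat customer schema, while comparing leaf names, the atomic-element granularity Cupid is biased toward, recovers all three.

```python
# Hypothetical fixture: the same customer data, nested and flattened.
NESTED = {"Customer": {"Name": "string",
                       "Address": {"Street": "string", "City": "string"}}}
FLAT = {"Customer": {"Name": "string", "Street": "string", "City": "string"}}


def leaf_paths(schema: dict, prefix: str = "") -> dict[str, str]:
    """Map each atomic element's full path to its data type."""
    out: dict[str, str] = {}
    for name, value in schema.items():
        path = f"{prefix}/{name}" if prefix else name
        if isinstance(value, dict):
            out.update(leaf_paths(value, path))
        else:
            out[path] = value
    return out


def shared_leaf_names(a: dict, b: dict) -> set[str]:
    """Leaf names (last path step) present in both schemas, ignoring nesting."""
    def last_steps(paths: dict[str, str]) -> set[str]:
        return {p.rsplit("/", 1)[-1] for p in paths}
    return last_steps(leaf_paths(a)) & last_steps(leaf_paths(b))


print(sorted(set(leaf_paths(NESTED)) & set(leaf_paths(FLAT))))  # ['Customer/Name']
print(sorted(shared_leaf_names(NESTED, FLAT)))  # ['City', 'Name', 'Street']
```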

  18. Real World Example

  19. Experimental Conclusions • Linguistic matching • Thesaurus • Linguistic similarity with no structure similarity • Granularity of similarity computation • Leaves • Structure information beyond the immediate vicinity • Context-dependent mappings • Performance parameters

  20. Future Work • A Truly Robust Solution • Machine learning applied to instances • Natural language technology • Pattern matching to reuse known matches • Immediate Challenges • Off-the-shelf thesaurus • Schema annotations • Automatic tuning of the control parameters • Scalability analysis and testing • More comparative analysis of algorithms
