
A Survey of Approaches to Automatic Schema Matching

Erhard Rahm, Philip A. Bernstein. The VLDB Journal 10:334-350 (2001).


Presentation Transcript


  1. A Survey of Approaches to Automatic Schema Matching • Erhard Rahm, Philip A. Bernstein • The VLDB Journal 10:334-350 (2001)

  2. The Problem • Schema matching • Input schemas • Output mappings • Motivations • Manual schema matching • Generic and customizable schema matching

  3. Application Domains • Schema Integration: Structures and Terminological relationships • Data warehouses: Source-to-warehouse Transformation • E-commerce: Message Translation • Semantic query processing: A Run-time Scenario

  4. The Match Operator • Representations of Input Schemas and Output Mapping • Schema representation • Schema elements • Structure • Mapping representation • Mapping elements • Mapping expressions • Matching Function • Mathematically unsatisfying • Heuristics
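A minimal sketch of how the Match operator's inputs and output could be represented, assuming schemas are reduced to flat lists of element names. The names `MappingElement`, `match`, `similar`, and `threshold` are hypothetical and chosen for illustration; the heuristic similarity function is supplied by the caller.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class MappingElement:
    """One element of the output mapping (hypothetical representation)."""
    source: Tuple[str, ...]            # elements of schema S1
    target: Tuple[str, ...]            # elements of schema S2
    expression: Optional[str] = None   # optional mapping expression


def match(s1: List[str], s2: List[str],
          similar: Callable[[str, str], float],
          threshold: float = 0.5) -> List[MappingElement]:
    """Return 1:1 mapping elements whose heuristic similarity reaches threshold."""
    mapping = []
    for a in s1:
        for b in s2:
            if similar(a, b) >= threshold:
                mapping.append(MappingElement((a,), (b,)))
    return mapping
```

Because matching relies on heuristics, the result is best read as a set of match candidates for a user to confirm rather than a final mapping.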

  5. Architecture for Generic Match • Tool 1 (Portal schemas) • Tool 2 (E-business schemas) • Tool 3 (Data warehousing schemas) • Global libraries (dictionaries, schemas, …) • Schema import/export • Generic Match Implementation • Internal schema representation

  6. Classification of Approaches • Individual matchers • Instance vs Schema • Element vs Structure Matching • Language vs Constraint • Matching Cardinality (1:1, 1:n, n:1, and n:m) • Auxiliary Information • Combinations of multiple matchers

  7. Schema-level Approaches • Granularity of match (element-level vs. structure-level) • Match cardinality • Linguistic approaches • Constraint-based approaches • Reusing schema and mapping information

  8. Granularity of match

  9. Match Cardinality

  10. Linguistic Approaches • Name Matching • Equality of names • Equality of canonical name representations • Equality of synonyms • Equality of hypernyms • Similarity of names based on common substrings, edit distance, pronunciation, and Soundex • User-provided name matches • Description Matching • Ex. S1: empn // employee name • Ex. S2: name // name of employee
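A few of the name-matching criteria above can be sketched as follows. This is an illustrative sketch, not the method of any surveyed system: the `SYNONYMS` table is a hypothetical stand-in for a real thesaurus, and the function names and the choice to normalize edit distance by the longer name are assumptions.

```python
import re

# Hypothetical synonym table; a real matcher would consult a thesaurus.
SYNONYMS = {frozenset({"car", "automobile"})}


def canonical(name):
    """Canonical form: lowercase with non-alphanumeric characters removed."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def name_similarity(n1, n2):
    """Score in [0, 1]: canonical equality, then synonyms, then edit distance."""
    a, b = canonical(n1), canonical(n2)
    if a == b or frozenset({a, b}) in SYNONYMS:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))
```

For the description-matching example, `name_similarity("empn", "name")` stays below 1.0, which is why the comments attached to the two elements are a useful second source of evidence.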

  11. Constraint-based Approaches

  12. Reusing Schema and Mapping Information

  13. Instance-level Approaches • Linguistic characterization • Information retrieval techniques • Ex. Extracting keywords and themes • Constraint-based characterization • Numeric value ranges • Numeric value averages • Character patterns (PhoneNr, ISBNs, SSNs, …)
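The constraint-based characterization above can be illustrated with a small profiling sketch. The regular expressions are simplified, US-style assumptions rather than complete validators, and `PATTERNS`, `numeric_profile`, and `pattern_label` are hypothetical names.

```python
import re
from statistics import mean

# Simplified, assumed patterns for illustration only.
PATTERNS = {
    "phone": re.compile(r"\(?\d{3}\)?[ -]?\d{3}-\d{4}"),
    "ssn": re.compile(r"\d{3}-\d{2}-\d{4}"),
}


def numeric_profile(values):
    """Value range and average of a numeric column."""
    return {"min": min(values), "max": max(values), "mean": mean(values)}


def pattern_label(values):
    """Label of the first pattern that every value matches, else None."""
    for label, pattern in PATTERNS.items():
        if values and all(pattern.fullmatch(v) for v in values):
            return label
    return None
```

Two columns whose profiles or pattern labels agree become match candidates even when their names share nothing.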

  14. Combining Different Matchers • Hybrid matchers • Hard-wired combination of multiple matching criteria • Better performance • Composite matchers • Independent basic matchers • Flexible execution order
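A composite matcher can be sketched as independent basic matchers whose scores are combined afterwards. The weighted average used here is only one possible combination rule and is an assumption, as are the names `exact_matcher`, `prefix_matcher`, and `composite_score`.

```python
def exact_matcher(a, b):
    """1.0 when the names are equal ignoring case, else 0.0."""
    return 1.0 if a.lower() == b.lower() else 0.0


def prefix_matcher(a, b):
    """Length of the shared prefix divided by the longer name's length."""
    a, b = a.lower(), b.lower()
    shared = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        shared += 1
    longest = max(len(a), len(b))
    return shared / longest if longest else 1.0


def composite_score(a, b, matchers):
    """Weighted average over (weight, matcher) pairs, each run independently."""
    total = sum(weight for weight, _ in matchers)
    if total == 0:
        raise ValueError("at least one matcher with non-zero weight is required")
    return sum(weight * fn(a, b) for weight, fn in matchers) / total
```

A hybrid matcher would instead fix the criteria and their order inside a single function, giving up the freedom to add, drop, or reweight matchers per application.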

  15. Sample Approaches • SEMINT • LSD • SKAT • TranScm • DIKE • ARTEMIS • CUPID

  16. Sample Approaches • SEMINT • LSD • SKAT • TranScm • DIKE • ARTEMIS • CUPID

  17. Columns: SEMINT | LSD | TranScm | Cupid | BYU Approach
      • Schema type: Relational, files | XML | SGML, OO | XML, relational | OSM
      • Metadata representation: Attribute-based | XML | Labeled graph | Extended ER | OSM
      • Match granularity: 1:1 | 1:1 | 1:1 | 1:1 and 1:n | 1:1 and n:m
      • Schema-level match (the transcript does not preserve which columns the * marks belong to)
          Name-based: * * * *
          Constraint-based: * * * *
          Structure matching: * * * *
      • Instance-level match (column positions likewise not preserved)
          Text-oriented: * *
          Constraint-oriented: * * *
      • Reuse/auxiliary information used: * * * *
      • Combination of matches: Hybrid | Composite | Hybrid | Hybrid | Composite
      • Manual work/user input: * | * | * | * | *
      • Application area: Data integration | Data integration | Data translation | Generic | Generic
      • Remarks: Neural network

  18. Conclusion • Propose a taxonomy that covers many of the existing approaches • Suggest quantitative work on the relative performance and accuracy of different approaches
