
A survey of approaches to automatic schema matching

A survey of approaches to automatic schema matching. Erhard Rahm, Institut für Informatik, Universität Leipzig; Philip A. Bernstein, Microsoft Research. VLDB 2001.


Presentation Transcript


  1. A survey of approaches to automatic schema matching. Erhard Rahm, Institut für Informatik, Universität Leipzig; Philip A. Bernstein, Microsoft Research. VLDB 2001

  2. Problem
     Schema matching: produce a mapping between elements of two schemas such that the elements in the mapping correspond semantically to each other.
     [Diagram labels: Schema 1, Schema 2]
     A real-world problem:
     • Schema integration
     • Data warehouses
     • E-commerce
     • Semantic query processing

  3. Problem (cont.)
     • Manual schema matching: tedious, time-consuming, error-prone, and therefore expensive.
     • Automated schema matching: the solution.
     The paper surveys approaches to automated schema matching and presents a taxonomy.

  4. Overview
     • Problem and applications
     • Match operator
     • Classification
     • Schema-level matchers
     • Instance-level matchers
     • Combining matchers
     • Prototype implementations
     • Conclusion
     • Critique

  5. Match
     • Match is an abstract operator for implementing schema matching.
     • Input: two schemas
     • Output: a set of mapping elements
     • Match is based on heuristics that approximate what the user considers to be a good match.
     • Implementations of Match produce "match candidates".
     • It is not possible to determine all matches automatically.
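
The Match operator described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `MatchCandidate`, `match`, `exact_name`, and the `threshold` parameter are hypothetical and do not come from the paper, and schemas are reduced to plain lists of element names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchCandidate:
    """One mapping element: a pair of schema elements plus a heuristic score."""
    source: str   # element of schema 1
    target: str   # element of schema 2
    score: float  # heuristic similarity in [0, 1]


def match(schema1, schema2, similarity, threshold=0.5):
    """Return match candidates whose heuristic score reaches the threshold.

    The result is a list of candidates, not a final mapping: a user is
    still expected to accept or reject each one.
    """
    candidates = [
        MatchCandidate(a, b, similarity(a, b))
        for a in schema1
        for b in schema2
    ]
    kept = [c for c in candidates if c.score >= threshold]
    return sorted(kept, key=lambda c: c.score, reverse=True)


def exact_name(a, b):
    """Hypothetical heuristic: names are equal when case is ignored."""
    return 1.0 if a.lower() == b.lower() else 0.0


candidates = match(["Name", "City"], ["name", "Phone"], exact_name)
for c in candidates:
    print(c.source, "->", c.target, c.score)
```

Any heuristic with the same two-argument signature can be passed as `similarity`, which keeps the operator independent of the matching criteria.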

  6. Match (cont.)
     [Diagram labels: Schema 1, Schema 2, Match, User acceptance]

  7. Generic Match Architecture

  8. Classification

  9. Classification

  10. Schema-level matchers
      Element-level
      • Linguistic approaches:
        - Similarity of names, e.g. FirstName ↔ first_name
        - Equality of synonyms, e.g. car ↔ automobile
        - Equality of hypernyms, i.e. book ↔ publication, article ↔ publication
        - Description matching: S1: empn // employee name; S2: name // name of employee
      • Constraint-based approaches:
        - Data types, e.g. varchar ↔ text
        - Value ranges
        - Uniqueness
      Structural-level
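
A minimal sketch of the element-level criteria listed on this slide, using the slide's own examples. The lookup tables `SYNONYMS`, `HYPERNYMS`, and `COMPATIBLE_TYPES` and all function names are hypothetical; a real matcher would presumably consult a thesaurus or dictionary rather than hard-coded tables.

```python
import re

# Hypothetical lookup tables filled with the examples from the slide.
SYNONYMS = {frozenset({"car", "automobile"})}
HYPERNYMS = {"book": "publication", "article": "publication"}
COMPATIBLE_TYPES = {frozenset({"varchar", "text"})}


def tokens(name):
    """Split 'FirstName' or 'first_name' into lowercase tokens."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name)
    return tuple(t.lower() for t in re.split(r"[\s_\-]+", spaced) if t)


def names_match(a, b):
    """Linguistic criteria: normalized name equality, synonyms, hypernyms."""
    ta, tb = tokens(a), tokens(b)
    if ta == tb:
        return True
    wa, wb = " ".join(ta), " ".join(tb)
    if frozenset({wa, wb}) in SYNONYMS:
        return True
    return HYPERNYMS.get(wa) == wb or HYPERNYMS.get(wb) == wa


def types_compatible(t1, t2):
    """Constraint-based criterion: equal or declared-compatible data types."""
    t1, t2 = t1.lower(), t2.lower()
    return t1 == t2 or frozenset({t1, t2}) in COMPATIBLE_TYPES


print(names_match("FirstName", "first_name"))
print(names_match("car", "automobile"))
print(names_match("book", "publication"))
print(types_compatible("varchar", "text"))
```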

  11. Classification

  12. Instance-level matchers
      Linguistic characterization: keywords, frequencies of words, combinations, etc.
      [Diagram labels: Schema 1, Schema 2, match]

  13. Instance-level matchers
      Constraint-based characterization: character patterns and numerical value ranges
      [Diagram labels: Schema 1, Schema 2, match]
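
The constraint-based characterization of instance data can be illustrated as follows. This is a sketch under assumptions: the pattern alphabet (`9` for digits, `A` for letters), the Jaccard overlap measure, and all names and sample values are chosen here for illustration and are not taken from the paper.

```python
import re


def char_pattern(value):
    """Abstract a value into a pattern: digits become 9, letters become A."""
    pattern = re.sub(r"[0-9]", "9", value)
    return re.sub(r"[A-Za-z]", "A", pattern)


def profile(values):
    """Characterize a column by its character patterns and numeric range."""
    patterns = {char_pattern(v) for v in values}
    numbers = []
    for v in values:
        try:
            numbers.append(float(v))
        except ValueError:
            pass  # non-numeric values do not contribute to the range
    value_range = (min(numbers), max(numbers)) if numbers else None
    return patterns, value_range


def pattern_overlap(col1, col2):
    """Jaccard overlap of the two columns' pattern sets, in [0, 1]."""
    p1, _ = profile(col1)
    p2, _ = profile(col2)
    union = p1 | p2
    return len(p1 & p2) / len(union) if union else 0.0


# Hypothetical instance data for three columns.
phones_a = ["555-1234", "555-9876"]
phones_b = ["444-0000"]
names = ["Smith", "Jones"]

print(pattern_overlap(phones_a, phones_b))
print(pattern_overlap(phones_a, names))
print(profile(["10", "25", "7"])[1])
```

Two columns with different names but the same value patterns score high here, which is the kind of evidence a schema-level matcher alone cannot see.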

  14. Classification

  15. Combining matchers
      The best result is achieved by combining multiple matchers. Two types:
      • Hybrid matchers
      • Composite matchers
      Hybrid matcher [Diagram labels: Datatypes, Names, Value ranges]

  16. Combining matchers
      The best result is achieved by combining multiple matchers. Two types:
      • Hybrid matchers
      • Composite matchers
      Composite matcher [Diagram labels: Name matcher, Datatype matcher]
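
The difference between the two combination styles can be sketched as below. The weighted average used to merge results, the weights, and every function name are assumptions made for this example; the slides do not say how a composite matcher should combine its inputs.

```python
def name_matcher(a, b):
    """Independent matcher 1: names equal after dropping case and underscores."""
    def norm(s):
        return s.replace("_", "").lower()
    return 1.0 if norm(a["name"]) == norm(b["name"]) else 0.0


def datatype_matcher(a, b):
    """Independent matcher 2: identical declared data types."""
    return 1.0 if a["type"] == b["type"] else 0.0


def composite(matchers, weights):
    """Composite style: run matchers independently, then merge their results."""
    total = sum(weights)

    def combined(a, b):
        return sum(w * m(a, b) for m, w in zip(matchers, weights)) / total

    return combined


def hybrid(a, b):
    """Hybrid style: one matcher that applies several criteria together.

    Here the data type acts as a filter before names are compared.
    """
    if a["type"] != b["type"]:
        return 0.0
    return name_matcher(a, b)


a = {"name": "FirstName", "type": "varchar"}
b = {"name": "first_name", "type": "text"}

combined = composite([name_matcher, datatype_matcher], weights=[3, 1])
print(combined(a, b))  # names agree, types differ
print(hybrid(a, b))    # the type filter rejects the pair outright
```

The composite form lets matchers be added or reweighted without touching each other, while the hybrid form fixes the interplay of criteria inside a single matcher.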

  17. Prototype implementations

  18. Example: SemInt
      • 15 constraint-based and 5 content-based matching criteria
      • Each criterion maps every element to a value in the range [0..1].
      • This yields an N-dimensional point for N matching criteria.
      [Figure: elements CName, Company, C#, and CustID plotted against the axes Data type and Field length, each ranging from 0 to 1]
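
The idea of mapping each element to a point in [0..1]^N can be sketched with the two criteria shown in the figure, data type and field length. Everything concrete here is an assumption for illustration: the type encoding `TYPE_CODE`, the normalization cap `MAX_LENGTH`, the field lengths, and the use of Euclidean distance to compare points are not taken from the slide.

```python
import math

# Hypothetical encodings that map each criterion into [0, 1].
TYPE_CODE = {"char": 0.0, "number": 1.0}
MAX_LENGTH = 100  # assumed cap used to normalize field length


def signature(data_type, field_length):
    """Map one schema element to a 2-dimensional point in [0, 1]^2."""
    return (TYPE_CODE[data_type], min(field_length, MAX_LENGTH) / MAX_LENGTH)


def distance(p, q):
    """Euclidean distance between two signatures (an illustrative choice)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


# Element names follow the figure; their types and lengths are invented.
cname = signature("char", 40)
others = {
    "Company": signature("char", 50),
    "CustID": signature("number", 10),
}

nearest = min(others, key=lambda name: distance(cname, others[name]))
print(nearest)
```

With N criteria the signature simply becomes an N-tuple, and the same distance function applies unchanged.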

  19. Conclusion
      • Proposes a taxonomy
      • Characterizes and compares previous implementations using this taxonomy
      Useful for:
      • Programmers who need to implement Match
      • Researchers looking to develop better algorithms
      Proposes subjects for further research:
      • Testing the performance and accuracy of existing approaches
      • Better utilization of instance-level information

  20. Critique
      Good:
      • Provides a good overview of the subject, Fig. 2 and Table 5 in particular
      • Good at pointing out subjects that should be researched further
      • The taxonomy is easy to understand and is explained well
      Could be improved:
      • Does not compare the performance or correctness of implementations
      • No examples in the description of existing implementations
      • Lacks good examples of structural-level matching
      • The relative performance of implementations is mentioned only once: "Cupid performed somewhat better overall". Cupid was developed by the authors.

  21. Questions?

  22. Questions?
