
Discovering simple mappings between Relational database schemas and ontologies

Discovering simple mappings between Relational database schemas and ontologies. Wei Hu, Yuzhong Qu {whu, yzqu}@seu.edu.cn Institute of Web Science School of Computer Science and Engineering Southeast University, China. Outline. Introduction Our approach Evaluation Related work


Presentation Transcript


  1. Discovering simple mappings between Relational database schemas and ontologies • Wei Hu, Yuzhong Qu {whu, yzqu}@seu.edu.cn • Institute of Web Science, School of Computer Science and Engineering, Southeast University, China • ISWC2007, Nov. 14.

  2. Outline • Introduction • Our approach • Evaluation • Related work • Summary and future work

  3. Introduction • The popularity of ontologies has been growing rapidly since the emergence of the Semantic Web. • Swoogle has collected more than 10,000 ontologies so far. • Falcons has indexed more than 2 million classes/properties. • However, most of the world’s data today is still locked in data stores and is not published as an open Web of inter-referring resources. [Ref.4. Creating a science of the Web. 2006] • About 77.3% of the data on the current Web is stored in relational databases. [Ref.6. SIGMOD Record. 33(3) (2004)] • So, to create a Web of data, it is necessary to establish interoperability between (Semantic) Web applications using relational databases and ontologies.

  4. Introduction – By an example • Left part: relations, attributes, primary keys, foreign keys. • Right part: classes, properties (data valued or object properties).

  5. Introduction (cont’d) • Manually discovering such simple mappings is tedious and impractical at Web scale. • So (semi-)automatic approaches have been proposed. • They do not consider well the characteristics of the relational data model and the ontology model. • The mappings are not accurate enough. • Most of the present approaches cannot construct semantic mappings. • The (missed) semantic mappings are useful in various practical applications.

  6. Introduction – the contribution • We propose a new approach to discovering simple mappings. • It constructs virtual documents for the entities • to discover mappings by comparing virtual documents. • It validates mapping consistency • to eliminate certain incorrect mappings. • It explores contextual mappings • These can be transformed directly to view-based mappings with selection conditions. • They are useful for applications in real-world domains. [Ref. 5. Putting context into schema matching. VLDB'06]

  7. Introduction – Terminology • R denotes a relation, and A denotes an attribute. • type(A): the domain name of A; • rel(A): the relation which specifies A; • pk(R): the attributes that appear in the primary key of R; • ref(A): the attributes referenced by A; • C represents a class, and P represents a property. PD denotes a data valued property and PO denotes an object property. • d(P): the domain(s) of P; • r(P): the range(s) of P.

  8. Introduction – Terminology (cont’d) • A mapping m is a 5-tuple < id, u, v, t, f >, where: • id is a unique identifier; • u is an entity in {R} ∪ {A}, and v is an entity in {C} ∪ {P}; • t is a relationship, e.g. equivalence or subsumption, holding between u and v; • f is a confidence measure in the [0, 1] range. • Examples • < 1, writes, hasAuthor, , 1.0 > • < 2, id, hasID, , 1.0 > • < 3, Paper, JournalPaper, , 0.8 >
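The 5-tuple above could be represented as a small record type. This is only a minimal sketch in Python: the class name `Mapping`, the helper `above_threshold`, the relationship labels, and the sample entities (`Person`, `Human`, `name`, `label`) are hypothetical and are not taken from the slides or from Marson.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    """A mapping < id, u, v, t, f > between a schema entity and an ontology entity."""
    id: int    # unique identifier
    u: str     # relation or attribute in the relational schema
    v: str     # class or property in the ontology
    t: str     # relationship label, e.g. "equivalence" or "subsumption" (labels assumed)
    f: float   # confidence measure in [0, 1]

    def __post_init__(self):
        # Reject confidence values outside the [0, 1] range.
        if not 0.0 <= self.f <= 1.0:
            raise ValueError("confidence f must lie in [0, 1]")


def above_threshold(mappings, threshold):
    """Keep only the mappings whose confidence reaches the threshold."""
    return [m for m in mappings if m.f >= threshold]


# Hypothetical sample data, for illustration only.
sample = [
    Mapping(1, "Person", "Human", "equivalence", 0.9),
    Mapping(2, "name", "label", "equivalence", 0.3),
]
kept = above_threshold(sample, 0.5)
```

Making the record immutable (`frozen=True`) lets mappings be stored in sets or used as dictionary keys, which is convenient when comparing a discovered alignment with a reference one.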

  9. Outline • Introduction • Our approach • Evaluation • Related work • Summary and future work

  10. Overview of the approach • Phase 1: Classifying entity types (a preprocessing step) • Heuristically classifies entities into different groups and coordinates their different characteristics. • Phase 2: Discovering simple mappings • Constructs virtual documents for entities and calculates confidence measures via the TF/IDF model. • Phase 3: Validating mapping consistency • Uses <relation, class> mappings to validate the consistency of <attribute, property> mappings; • Also checks the comparability between the data types of attributes and data valued properties. • Phase 4: Constructing contextual mappings • <relation, class> mappings + sample instances → contextual mappings.

  11. Phase 1: Classifying entity types • Relation: strong entity relation (SER), weak entity relation (WER), regular relationship relation (RRR), specific relationship relation (SRR). • Attribute: foreign key attribute (FKA), non-foreign key attribute (NFKA). [Ref.9. Data & Knowledge Engineering. 12 (1994)] • Group 1: {{SER}∪{WER}}×{C}; Group 2: {{RRR}∪{SRR}}×{PO}; Group 3: {FKA}×{PO}; Group 4: {NFKA}×{{PD}∪{PO}}. • Coordinating different characteristics • Reifying n-ary relationships (n > 2) • Others.
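The four groups restrict which pairs of entities are compared at all. A minimal sketch of that restriction in Python follows; the function name `candidate_group` and the string tags are assumptions made for illustration, and the heuristics that assign a tag such as SER or WER to a relation are not shown.

```python
def candidate_group(schema_kind, onto_kind):
    """Return the group number (1-4) for a schema/ontology entity pair,
    or None when the pair falls outside all four groups.

    schema_kind: "SER", "WER", "RRR", "SRR", "FKA" or "NFKA"
    onto_kind:   "C" (class), "PO" (object property) or "PD" (data valued property)
    """
    if schema_kind in {"SER", "WER"} and onto_kind == "C":
        return 1  # entity relations are compared with classes
    if schema_kind in {"RRR", "SRR"} and onto_kind == "PO":
        return 2  # relationship relations are compared with object properties
    if schema_kind == "FKA" and onto_kind == "PO":
        return 3  # foreign key attributes are compared with object properties
    if schema_kind == "NFKA" and onto_kind in {"PD", "PO"}:
        return 4  # other attributes are compared with either kind of property
    return None


# Example: a foreign key attribute is never compared with a class.
fka_vs_class = candidate_group("FKA", "C")
```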

  12. Phase 2: Discovering simple mappings • We construct virtual documents for the entities in both the relational schema and the ontology to capture their structural information. • A virtual document represents a collection of weighted tokens, which are derived not only from the description of the entity itself, but also from the descriptions of its neighbors. The weights of the tokens indicate their importance, so a virtual document can be viewed as a vector in the TF/IDF model. • Rationale: the semantic information of a relational schema is characterized mainly by its integrity constraints (ICs); an OWL ontology can be mapped to an RDF graph, which also indicates the semantic information in its structure.

  13. Discovering simple mappings (cont’d.) • Relations and attributes: • Classes and properties:
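The transcript does not preserve the formulas for building virtual documents, so the following is only a generic sketch of the idea in Python: tokens from an entity's own description are combined with down-weighted tokens from its neighbors, and the resulting bags are compared by cosine similarity under a TF/IDF weighting. The `neighbor_weight` parameter, the IDF variant `1 + log(N/df)`, and the sample tokens are all assumptions, not the weighting used by Marson.

```python
import math
from collections import Counter


def virtual_document(own_tokens, neighbor_tokens, neighbor_weight=0.5):
    """Build a weighted bag of tokens from an entity and its neighbors."""
    doc = Counter()
    for tok in own_tokens:
        doc[tok] += 1.0
    for tok in neighbor_tokens:
        doc[tok] += neighbor_weight  # neighbors contribute less (assumed weight)
    return doc


def tfidf_vectors(docs):
    """Turn {name: weighted bag} into {name: TF/IDF vector}."""
    n = len(docs)
    df = Counter()
    for doc in docs.values():
        df.update(doc.keys())  # each document counts a token once
    vectors = {}
    for name, doc in docs.items():
        total = sum(doc.values())
        if total == 0:
            vectors[name] = {}
            continue
        vectors[name] = {
            tok: (w / total) * (1.0 + math.log(n / df[tok]))
            for tok, w in doc.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors, used here as the confidence f."""
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical entities: one relation and two classes.
docs = {
    "R:Paper": virtual_document(["paper"], ["title", "author"]),
    "C:Paper": virtual_document(["paper"], ["title"]),
    "C:Person": virtual_document(["person"], ["name"]),
}
vectors = tfidf_vectors(docs)
sim_paper = cosine(vectors["R:Paper"], vectors["C:Paper"])
sim_person = cosine(vectors["R:Paper"], vectors["C:Person"])
```

In this toy data the relation shares tokens only with the first class, so `sim_paper` is larger than `sim_person`, and a threshold on the similarity would keep only the first pair.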

  14. Phase 3: Validating mapping consistency • Mappings between <relations, classes> are used to validate the consistency of <attributes, properties> mappings. • Attributes cannot stand alone without relations. • The restriction construct in an OWL ontology specifies local domain and range constraints on the classes.
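One plausible reading of this check, sketched in Python: an <attribute, property> mapping is kept only when the relation that owns the attribute has been mapped to one of the classes in the property's domain. The function `validate` and its parameters `rel_of` and `domains_of` (standing in for rel(A) and d(P)) are hypothetical, and the actual validation rules, including the data type check and OWL restrictions, may differ.

```python
def validate(attr_prop_mappings, rel_class_mappings, rel_of, domains_of):
    """Filter <attribute, property> pairs using <relation, class> pairs.

    attr_prop_mappings: iterable of (attribute, property) pairs
    rel_class_mappings: iterable of (relation, class) pairs already accepted
    rel_of:             dict attribute -> relation, i.e. rel(A)
    domains_of:         dict property -> iterable of classes, i.e. d(P)
    """
    accepted = set(rel_class_mappings)
    kept = []
    for attribute, prop in attr_prop_mappings:
        relation = rel_of[attribute]
        # Consistent if rel(A) is mapped to at least one class in d(P).
        if any((relation, cls) in accepted for cls in domains_of.get(prop, ())):
            kept.append((attribute, prop))
    return kept


# Hypothetical example: "title" belongs to Paper, which is mapped to Article.
kept = validate(
    attr_prop_mappings=[("title", "hasTitle"), ("title", "jobTitle")],
    rel_class_mappings=[("Paper", "Article")],
    rel_of={"title": "Paper"},
    domains_of={"hasTitle": ["Article"], "jobTitle": ["Employee"]},
)
```

Here the pair with `jobTitle` is dropped because its domain class is not mapped to the relation that owns the attribute.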

  15. Phase 4: Constructing contextual mappings • Focuses on a special type of mapping – contextual mappings. • These can be directly translated to conditional mappings or view-based mappings.

  16. Constructing contextual mappings (cont’d.)

  17. Outline (cont’d.) • Introduction • Our approach • Evaluation • Related work • Summary and future work

  18. Evaluation – Data sets • Data sets: http://www.cs.toronto.edu/~yuana/research/maponto/relational/testData.html [Ref.1. MapOnto] • We implemented our approach in Java as a tool called Marson.

  19. Evaluation – Experimental methodology • Experiment 1. Discovering simple mappings: • Marson vs. Simple, VDoc, Valid, RONTO • Simple: does not construct virtual documents and does not check mapping consistency; • VDoc: constructs virtual documents but does not validate mapping consistency; • Valid: does not construct virtual documents but validates mapping consistency; • RONTO: an existing prototype that distinguishes the types of entities and uses I-Sub. • F1-Measure: a combination of precision and recall. • We tested various thresholds for each approach and selected the best ones. • Experiment 2. Constructing contextual mappings • Instances were collected from the Web for the first three data sets: • More than 50 instances for each relation and class. • Results were compared with the mappings established by experienced volunteers.
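For reference, the F1-measure is the harmonic mean of precision and recall, F1 = 2PR / (P + R). A minimal Python sketch of computing it for a set of discovered mappings against a reference set follows; the function name and the sample pairs are illustrative only and are not results from the experiments.

```python
def precision_recall_f1(found, reference):
    """Compare discovered mappings with reference mappings.

    Both arguments are iterables of hashable items, e.g. (u, v) pairs.
    Returns (precision, recall, f1).
    """
    found, reference = set(found), set(reference)
    correct = len(found & reference)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(reference) if reference else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative data: 2 of 3 found mappings are correct, 2 of 4 reference ones are found.
p, r, f1 = precision_recall_f1(
    found=[("a", "x"), ("b", "y"), ("c", "z")],
    reference=[("a", "x"), ("b", "y"), ("d", "w"), ("e", "v")],
)
```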

  20. Evaluation – Experiment 1 • On an Intel Pentium IV 2.8GHz processor with 512MB DDR2 memory, Windows XP Professional, and Java SE 6, Marson takes about 5 seconds to complete all five tests (including the parsing time).

  21. Evaluation – Experiment 2 • In Case 1, the mapping <academic_staff, Professor (subclass of Faculty)> is missed. • The mapping <academic_staff, Faculty> is not found: • No background knowledge is used.

  22. Evaluation – Experiment 2 (cont’d.) • In Case 2: a mapping between the relation Event and the class Conference is found. • When the value of the attribute type in Event equals “Research Session” or “Industrial Session”, the subsumption relationship between Event and Conference can be converted to an equivalence relationship.
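The Case 2 condition can be read as a view over Event with a selection condition on the attribute type. A minimal Python sketch of that reading follows; the class `ContextualMapping`, its field names, and the sample rows (including the "Coffee Break" value) are hypothetical and only illustrate how such a condition selects instances.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextualMapping:
    """A <relation, class> mapping that holds only under a selection condition."""
    relation: str
    cls: str
    attribute: str      # attribute the condition is placed on
    values: frozenset   # attribute values for which the mapping holds

    def select(self, rows):
        """Return the rows of the relation that satisfy the condition."""
        return [row for row in rows if row.get(self.attribute) in self.values]


event_as_conference = ContextualMapping(
    relation="Event",
    cls="Conference",
    attribute="type",
    values=frozenset({"Research Session", "Industrial Session"}),
)

# Hypothetical rows of the Event relation.
rows = [
    {"id": 1, "type": "Research Session"},
    {"id": 2, "type": "Coffee Break"},
    {"id": 3, "type": "Industrial Session"},
]
selected = event_as_conference.select(rows)
```

Only the rows whose type is one of the two listed values are treated as instances of the mapped class, which corresponds to a view with a selection condition on that attribute.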

  23. Outline (cont’d.) • Introduction • Our approach • Evaluation • Related work • Summary and future work

  24. Related work • Of interest to both the Database and Semantic Web communities. • At an early stage: visual toolkits that help users specify mappings manually. • At present: discovering mappings (semi-)automatically. • For example, COMA, RONTO: • They do not consider the structural differences in models; • They do not validate the consistency between mappings. • Other research directions: • Describing system frameworks, e.g., OntoGrate; • Defining mapping expression languages, e.g., R2O; • Extending OWL with ICs; • Inferring complex mappings, e.g., MapOnto.

  25. Summary and future work • Summary • An approach to discovering simple mappings; • An algorithm to build contextual mappings; • Experiments to evaluate our approach. • Future work • Instance matching; • Machine learning techniques for mining semantic mappings; • Others.

  26. Thanks for your attention! Any comments are welcome! http://iws.seu.edu.cn/ Tools: Marson, Falcon-AO, OntoSum Services: Falcons (Searching the SW with CSpaces)
