
Semantic integration of data in database systems and ontologies

Technical University of Liberec, Faculty of Mechatronics. Semantic integration of data in database systems and ontologies. Ing. Petra Šeflová. Integration of data: merging a set of given schemas into a global schema. Semantic integration: a part of the concept of data integration.


Presentation Transcript


  1. Technical University of Liberec, Faculty of Mechatronics. Semantic integration of data in database systems and ontologies. Ing. Petra Šeflová

  2. Integration of data: merging a set of given schemas into a global schema • Semantic integration: a part of the concept of data integration; it focuses on data exchange between applications in light of the data's meaning, content and required business rules

  3. Integration of data: example. [Figure: A data integration system in the real estate domain. The query "Find houses with four bathrooms and price under $500,000" is posed against a mediated schema; three wrappers connect the mediated schema to the source schemas of realestate.com, homeseekers.com and greathomes.com.]
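
The mediated-schema idea in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the attribute names of each source, the rows, and the helper names `wrap` and `query_mediated` are all hypothetical and do not come from the presentation. Each wrapper translates one source's rows into the mediated schema, so the user query is written once.

```python
from typing import Callable, Dict, Iterator, List

Row = Dict[str, object]

# Hypothetical source rows; every source names its attributes differently.
# Each entry: (rows, name of the bathroom field, name of the price field).
SOURCES = {
    "realestate.com": (
        [{"baths": 4, "list_price": 450_000}, {"baths": 2, "list_price": 300_000}],
        "baths",
        "list_price",
    ),
    "homeseekers.com": ([{"num_bathrooms": 4, "cost": 620_000}], "num_bathrooms", "cost"),
    "greathomes.com": ([{"bath_count": 4, "amount": 499_000}], "bath_count", "amount"),
}


def wrap(source: str) -> Iterator[Row]:
    """Wrapper: translate one source's rows into the (assumed) mediated schema."""
    rows, bath_field, price_field = SOURCES[source]
    for row in rows:
        yield {"source": source, "bathrooms": row[bath_field], "price": row[price_field]}


def query_mediated(predicate: Callable[[Row], bool]) -> List[Row]:
    """Pose one query against the mediated schema and collect answers from all sources."""
    return [row for source in SOURCES for row in wrap(source) if predicate(row)]


# "Find houses with four bathrooms and price under $500,000"
results = query_mediated(lambda r: r["bathrooms"] == 4 and r["price"] < 500_000)
for row in results:
    print(row)
```

In this sketch the correspondences between source attributes and the mediated schema are written by hand inside `SOURCES`; finding them automatically is the matching problem the rest of the deck discusses.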

  4. Applications • Catalog integration in B2B applications • E-commerce • Bioinformatics • P2P databases • Agent communication • Web services integration

  5. Key commonalities of semantic integration applications • Use structured representations (e.g. relational schemas and XML DTDs) • Must resolve heterogeneities with respect to the schemas and their data • Enable manipulation of the schemas: merging the schemas, computing differences • Enable translation of data and queries across the schemas/ontologies

  6. Database schema: presents the definition of the physical layout of a system (database) • Ontology: a system of knowledge about the world • Makes no claim to coherence (many partial ontologies exist) • Frequently an artefact created for a specific purpose • Definition (after Gruber): an ontology is a formal, explicit specification of a shared conceptualization.

  7. Problems of semantic integration • The semantics of elements can be inferred from only a few information sources: the creators of the data, documentation, the associated schema and data • Schema elements are typically matched based on clues in the schema and data • Schema and data clues are often incomplete • Matching is often subjective, depending on the application

  8. Matching process • Takes as input two schemas/ontologies, each consisting of a set of discrete entities, and determines as output the relationships holding between these entities

  9. Example: the schemas of two relational databases S and T on house listings, and the semantic correspondences between them. [Figure: Schema S (Houses) and Schema T (Agents).]

  10. Matching techniques: two groups • Rule-based • Learning-based

  11. Rule-based solutions • Many of the early as well as current matching solutions employ hand-crafted rules • Exploit schema information: element names, data types, structures, integrity constraints • Can provide a quick and concise method to capture valuable user knowledge about the domain

  12. Rule-based solutions • Benefits: "relatively inexpensive", do not require training, operate only on the schema • Drawbacks: cannot exploit data instances effectively, cannot exploit previous matching efforts • Examples: TranScm, DIKE, MOMIS, CUPID

  13. TranScm: employs rules such as "two elements match if they have the same name (allowing synonyms) and the same number of subelements" • DIKE: computes the similarity between two schema elements based on the similarity of the characteristics of the elements and the similarity of related elements • MOMIS: computes the similarity of schema elements as a weighted sum of the similarities of name, data type and substructure • CUPID: employs rules that categorize elements based on names, data types and domains
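
Two of the rules named on this slide are simple enough to sketch. The Python below is a minimal illustration, not the actual TranScm or MOMIS code: the `Element` class, the synonym table, the weights and the crude 0/1 similarity measures are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Element:
    """A schema element; the field names are assumptions for this sketch."""
    name: str
    data_type: str = "string"
    children: List["Element"] = field(default_factory=list)


# Hypothetical synonym pairs; a real matcher would consult a thesaurus.
SYNONYMS = {
    frozenset({"price", "cost"}),
    frozenset({"house", "home"}),
    frozenset({"address", "location"}),
}


def same_name(a: str, b: str) -> bool:
    """Names match if they are equal (ignoring case) or listed as synonyms."""
    a, b = a.lower(), b.lower()
    return a == b or frozenset({a, b}) in SYNONYMS


def name_and_arity_rule(a: Element, b: Element) -> bool:
    """Rule in the style quoted for TranScm: same name and same number of subelements."""
    return same_name(a.name, b.name) and len(a.children) == len(b.children)


def weighted_similarity(a: Element, b: Element,
                        w_name: float = 0.5, w_type: float = 0.2,
                        w_struct: float = 0.3) -> float:
    """Weighted sum of name, data type and substructure similarity (weights assumed)."""
    name_sim = 1.0 if same_name(a.name, b.name) else 0.0
    type_sim = 1.0 if a.data_type == b.data_type else 0.0
    larger = max(len(a.children), len(b.children))
    # Placeholder structure measure: ratio of subelement counts (1.0 for two leaves).
    struct_sim = 1.0 if larger == 0 else min(len(a.children), len(b.children)) / larger
    return w_name * name_sim + w_type * type_sim + w_struct * struct_sim


house = Element("house", "record", [Element("price", "int"), Element("address")])
home = Element("home", "record", [Element("cost", "int"), Element("location")])

print(name_and_arity_rule(house, home))
print(weighted_similarity(house.children[0], Element("cost", "string")))
```

The first rule gives a yes/no answer, while the weighted sum gives a score that still has to be thresholded; both look only at the schema, which is the limitation noted for rule-based solutions.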

  14. Learning-based solutions • Exploit both schema and data information • Can exploit previous matching efforts • Examples: SemInt, LSD, iMAP, Autoplex, Automatch

  15. SemInt: uses a neural-network learning approach; matches schema elements based on attribute specifications and statistics of the data content • LSD: employs Naive Bayes over data instances and develops a novel learning solution that exploits the hierarchical nature of XML data • iMAP: matches the schemas of two sources by analyzing the descriptions of objects that are found in both sources • Autoplex and Automatch: use a Naive Bayes learning approach that exploits data instances to match elements
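
To make "Naive Bayes over data instances" concrete, here is a small self-contained sketch. It is not the LSD, Autoplex or Automatch implementation; the tokenizer, the training values, the smoothing choice and the class name `NaiveBayesMatcher` are assumptions for illustration. The classifier learns from values already labeled with a mediated-schema element (that is, from previous matching efforts) and then labels an unseen column by its values.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(value: str) -> List[str]:
    """Split a value into words, punctuation, and digit runs abstracted to their length."""
    tokens = re.findall(r"[a-z]+|\d+|[^\sa-z\d]", value.lower())
    return ["#" + str(len(t)) if t.isdigit() else t for t in tokens]


class NaiveBayesMatcher:
    """Multinomial Naive Bayes with add-one smoothing over tokens of data values."""

    def __init__(self, training: Dict[str, List[str]]) -> None:
        self.value_counts = {label: len(values) for label, values in training.items()}
        self.token_counts = {label: Counter() for label in training}
        for label, values in training.items():
            for value in values:
                self.token_counts[label].update(tokenize(value))
        self.vocabulary = {t for counts in self.token_counts.values() for t in counts}

    def log_score(self, label: str, values: List[str]) -> float:
        counts = self.token_counts[label]
        # The extra +1 reserves probability mass for tokens never seen in training.
        denominator = sum(counts.values()) + len(self.vocabulary) + 1
        prior = self.value_counts[label] / sum(self.value_counts.values())
        score = 0.0
        for value in values:
            score += math.log(prior)
            for token in tokenize(value):
                score += math.log((counts[token] + 1) / denominator)
        return score

    def predict(self, values: List[str]) -> str:
        """Return the mediated-schema element whose training values best explain the column."""
        return max(self.token_counts, key=lambda label: self.log_score(label, values))


# Hypothetical training data: values from already matched sources, by mediated element.
TRAINING = {
    "price": ["$450,000", "$300,000", "$1,250,000"],
    "phone": ["(206) 555-0100", "(425) 555-0199"],
    "city": ["Seattle, WA", "Portland, OR"],
}

matcher = NaiveBayesMatcher(TRAINING)
print(matcher.predict(["$620,000", "$499,000"]))
print(matcher.predict(["(360) 555-0142"]))
```

The column names of the new source are never consulted here; only the data instances are, which is exactly the information that rule-based matchers cannot use.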

  16. The matching dimensions • Input dimensions • Process dimensions • Output dimensions

  17. Input dimensions • Concern the kind of input on which the algorithms operate • First dimension: algorithms depend on the data/conceptual model in which the ontologies or schemas are expressed • Second dimension: depends on the kind of data the algorithms exploit • Different approaches exploit different information from the input data/conceptual models: schema-level information, instance data, or both

  18. Process dimensions • The matching process can be classified by its general properties • One property is the approximate or exact nature of its computation • Exact algorithms compute the absolute solution to a problem • Approximate algorithms sacrifice exactness for performance • Three large classes, based on intrinsic input, external resources or some semantic theory: syntactic, external, semantic

  19. Output dimensions • Concern the form of the result the algorithms produce • Is the correspondence one-to-one? • Is any relation suitable? • Does it have to be a final mapping element? • Does the system deliver a graded answer (the correspondence holds with 98% confidence, or with 4/5 probability) or an all-or-nothing answer? • Correspondences using distance measures • Kinds of relations between entities a system can provide: equivalence, subsumption, incompatibility
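
The output dimensions above can be read as a description of the data structure a matcher returns. The following Python sketch shows one possible shape; the class and function names, the candidate list and the greedy one-to-one selection are assumptions for illustration, not something the presentation prescribes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Relation(Enum):
    """Kinds of relations between entities that a system may report."""
    EQUIVALENCE = "equivalent"
    SUBSUMPTION = "subsumed by"
    INCOMPATIBILITY = "incompatible"


@dataclass(frozen=True)
class Correspondence:
    source: str
    target: str
    relation: Relation
    confidence: float  # graded answer in [0, 1]


def all_or_nothing(candidates: List[Correspondence], threshold: float) -> List[Correspondence]:
    """Turn graded answers into yes/no answers by cutting at a threshold."""
    return [c for c in candidates if c.confidence >= threshold]


def one_to_one(candidates: List[Correspondence]) -> List[Correspondence]:
    """Greedy selection: best correspondences first, each entity used at most once."""
    chosen: List[Correspondence] = []
    used_sources, used_targets = set(), set()
    for c in sorted(candidates, key=lambda c: c.confidence, reverse=True):
        if c.source not in used_sources and c.target not in used_targets:
            chosen.append(c)
            used_sources.add(c.source)
            used_targets.add(c.target)
    return chosen


# Hypothetical matcher output with graded answers (0.98 confidence, 4/5 probability, ...).
CANDIDATES = [
    Correspondence("S.price", "T.cost", Relation.EQUIVALENCE, 0.98),
    Correspondence("S.price", "T.amount", Relation.EQUIVALENCE, 4 / 5),
    Correspondence("S.house", "T.building", Relation.SUBSUMPTION, 0.60),
]

for c in one_to_one(CANDIDATES):
    print(c.source, c.relation.value, c.target, c.confidence)
```

Greedy selection is only one way to enforce a one-to-one result; it is used here because it is short, not because it is optimal.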

  20. [Figure: Classification of elementary schema-based matching approaches. Granularity/input interpretation layer: element-level (syntactic, external) and structure-level (syntactic, external, semantic). Basic techniques layer: string-based, language-based, linguistic resources, constraint-based, alignment reuse, upper-level formal ontologies, graph-based, taxonomy-based, repository of structures, model-based. These are also grouped as terminological (linguistic), structural (internal, relational) and semantic.]

  21. Element-level vs structure-level • Element-level matching techniques compute mapping elements by analyzing entities in isolation, ignoring their relations with other entities • Structure-level techniques compute mapping elements by analyzing how entities appear together in a structure
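
The contrast on this slide can be shown with two small functions. This is a sketch under assumptions: the two toy schemas are hypothetical, the element-level score uses the standard-library `difflib.SequenceMatcher` on names only, and the structure-level score uses Jaccard overlap of sub-entity names, chosen for brevity rather than taken from any particular system.

```python
from difflib import SequenceMatcher
from typing import Dict, List

# Hypothetical schemas: each entity maps to the names of its sub-entities.
SCHEMA_S: Dict[str, List[str]] = {"house": ["price", "bathrooms", "address"]}
SCHEMA_T: Dict[str, List[str]] = {"listing": ["price", "bathrooms", "location"]}


def element_level(a: str, b: str) -> float:
    """Compare two entities in isolation, using nothing but their names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def structure_level(a_children: List[str], b_children: List[str]) -> float:
    """Compare two entities by how much of their surrounding structure they share."""
    a, b = set(a_children), set(b_children)
    return len(a & b) / len(a | b) if a | b else 0.0


name_score = element_level("house", "listing")
context_score = structure_level(SCHEMA_S["house"], SCHEMA_T["listing"])
print(f"element-level: {name_score:.2f}, structure-level: {context_score:.2f}")
```

With these toy schemas the names "house" and "listing" look unrelated in isolation, while their shared sub-entities suggest a correspondence, which is the kind of evidence structure-level techniques add.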

  22. Internal vs external techniques • Internal: exploit only the information that comes with the input schemas/ontologies • Syntactic interpretation of the input • Semantic interpretation of the input • External: exploit auxiliary (external) resources of a domain to interpret the input • Resources: human input, a thesaurus expressing the relationships between terms

  23. Schema matching vs ontology matching: differences • Database schemas often do not provide explicit semantics for their data • Semantics is usually specified explicitly at design time and frequently does not become part of the database specification • Schema matching is therefore usually performed with techniques that try to guess the meaning encoded in the schemas • Ontologies are logical systems that themselves obey some formal semantics • Ontology matching primarily tries to exploit knowledge explicitly encoded in the ontologies

  24. Schema matching vs ontology matching: commonalities • Ontologies and schemas are similar in that both: • Provide a vocabulary of terms that describes a domain of interest • Constrain the meaning of the terms used in the vocabulary • Schemas and ontologies are both found in environments such as the Semantic Web

  25. Sources: • Natalya F. Noy: Semantic Integration: A Survey of Ontology-Based Approaches • AnHai Doan, Alon Y. Halevy: Semantic Integration Research in the Database Community: A Brief Survey • P. Shvaiko, J. Euzenat: A Survey of Schema-Based Matching Approaches • G. Antoniou, F. van Harmelen: A Semantic Web Primer • R. Araújo, H. Sofia Pinto: Toward Semantics-Based Ontology Similarity • H. Wache, T. Vögele, U. Visser, H. Stuckenschmidt, G. Schuster, H. Neumann and S. Hübner: Ontology-Based Integration of Information: A Survey of Existing Approaches • E. Rahm, P. A. Bernstein: A Survey of Approaches to Automatic Schema Matching
