
Joachim Hammer and Dennis McLeod. Received 17th January, 1993. Revised 5th April, 1993.

An approach to resolving semantic heterogeneity in a federation of autonomous, heterogeneous database systems. Presenter: Apurv Upasani.


Presentation Transcript


  1. An approach to resolving semantic heterogeneity in a federation of autonomous, heterogeneous database systems. Joachim Hammer and Dennis McLeod. Received 17th January, 1993. Revised 5th April, 1993. Presenter: Apurv Upasani

  2. Overview • Introduction • Related Research • The Federated Database Context • The Interoperability Context for Semantic Heterogeneity Resolution • The Object Database Model • Mechanism for Semantic Heterogeneity Resolution • The Unification of Remote and Local information • Concluding Remarks • Subsequent Research & Future Work

  3. Introduction An approach to resolving semantic heterogeneity in a federation of autonomous, heterogeneous database systems • Need for creation of federated database systems • Storing large amounts of heterogeneous data • Allowing sharing and exchange of data

  4. Important Terms Database System: A system that provides the capability to define a database, to query and manipulate it, to control it (i.e., semantic integrity, concurrency, recovery, access control), and to store the database and its contents. Heterogeneous Database Systems: A subset of multiple database systems in which each component database system is modeled differently from the others, syntactically, schematically and/or semantically. Autonomous Database System: A self-governing database system that has its own set of rules. Can be single or multiple.

  5. Federated Database Systems (FDBS) Many definitions! • A federated database is a relational database whose data is stored in multiple data sources (such as separate relational databases). The data appears as if it were all in a single large database and can be accessed through traditional SQL queries. Changes to the data can be explicitly directed to the appropriate data source. http://www.dbforums.com/db2/1642674-distributed-vs-federated.html • An FDBS is a collection of co-operating heterogeneous, autonomous database systems. Key Characteristic of FDBS: Co-operation among systems, reflected by controlled and sometimes limited integration among its autonomous components. This is called interoperability. • Loosely coupled collection, with stress on autonomy and flexible sharing patterns

  6. Semantic Heterogeneity Differences in the meaning of data that make it difficult to identify the various relationships that exist between similar or related objects. Goal: To resolve semantic heterogeneity by: • Determining relationships between objects that model similar information • Determining possible conflicts in their representation, which may pose problems during unification of shared data. Approach: Minimal Object Data Model

  7. Related Research Research on heterogeneous “distributed” databases in the 1980s. [Figure: DB1 and DB2 interact through views in a tightly coupled HDBS] • Problems with this approach • Not useful, as an FDBS supports a loosely coupled architecture. • Unlike views, heterogeneous components in an FDBS provide complementary information. • Schema integration in an FDBS is more complex than view integration because of the naming of objects in heterogeneous components • Integrating schemas consists of inter-database object correspondence.

  8. Related Research • Basic Principle of Integrated Attributes (Larson et al.) • Any pair of objects whose identifying attributes can be integrated can themselves be integrated. • Problem: Cardinality constraints, integrity constraints and allowable operations. • Using Behavior to resolve domain and schema mismatch problems (Kent et al.) • Using an object-oriented programming language to create mappings between common concepts. • Problem: Creating a language sophisticated enough to do this. • Using Support-Path Methods (Mehta et al.) • Using support-path methods to access distant information in a federation of database components • Use mappings between objects to access information • Problem: Large overhead of calculating and maintaining mappings • No mention of how the relationship between objects is determined

  9. The Federated database context An example scenario Federation of Travel Agencies (FOTA)

  10. The Federated database context

  11. Semantic Heterogeneity Spectrum • Meta-Data Language (Conceptual Database Model): EER vs CIOM • Meta-Data Specification (Conceptual Schema): DB1 - Person(SSN, Name, Email), Marriage(Person1, Person2); DB2 - Person(Email, Name, Spouse) • Object Comparability: Hotels in Northeastern US and Accommodation in New England may contain similar information. • Low-level data format: units of measure, e.g. pounds vs dollars • Tools (Database Management System): DB1 on Oracle 10g, DB2 on MySQL, DB3 on SQL Server
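The meta-data specification mismatch on this slide can be made concrete with a small sketch. It re-expresses DB1's Person and Marriage records in DB2's Person(Email, Name, Spouse) shape. The slide does not say how Marriage refers to persons or what Spouse holds, so the sketch assumes Marriage stores SSNs and Spouse stores the partner's email; the function name `db1_to_db2` and the sample data are hypothetical.

```python
def db1_to_db2(persons, marriages):
    """Map DB1 rows (Person + Marriage) to DB2-style Person rows.

    Assumption: Marriage.Person1/Person2 hold SSNs, and DB2's Spouse
    attribute holds the partner's email address.
    """
    by_ssn = {p["SSN"]: p for p in persons}
    spouse_ssn = {}
    for m in marriages:
        # A marriage is symmetric, so record it in both directions.
        spouse_ssn[m["Person1"]] = m["Person2"]
        spouse_ssn[m["Person2"]] = m["Person1"]
    rows = []
    for p in persons:
        partner = by_ssn.get(spouse_ssn.get(p["SSN"]))
        rows.append({
            "Email": p["Email"],
            "Name": p["Name"],
            "Spouse": partner["Email"] if partner else None,
        })
    return rows


# Hypothetical sample data.
persons = [
    {"SSN": "001", "Name": "Ann", "Email": "ann@example.com"},
    {"SSN": "002", "Name": "Bob", "Email": "bob@example.com"},
    {"SSN": "003", "Name": "Cy", "Email": "cy@example.com"},
]
marriages = [{"Person1": "001", "Person2": "002"}]
db2_rows = db1_to_db2(persons, marriages)
```

The same real-world fact (who is married to whom) is a separate relationship in DB1 and an attribute in DB2, which is why the mapping has to be written by hand here.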

  12. Causes of Semantic Heterogeneity • Different perspectives • Different user groups use different viewpoints when modeling the same information.

  13. Equivalent constructs • Several combinations of constructs can model the same real-world domain equivalently. • Incompatible design specifications • Different design specifications lead to different schemas.

  14. The Interoperability Context for Semantic Heterogeneity Resolution • Resource discovery and identification • Resolution of semantic heterogeneity • Sharing and transmission These steps are performed iteratively.

  15. Interoperability Resolution Mechanism

  16. Resource discovery and identification • Sharing advisor uses the semantic dictionary to return the information that it considers relevant to the component initiating the inquiry. • The goal of the advisor is to identify relevant information in other components that is identical, similar or equivalent to the requested information

  17. Resolution of Semantic Heterogeneity • Meta-functions • Return structural information about an object (supertype, subtype, properties, etc.). • Lexicon (Local Dictionary) • Contains a semantic description of every sharable type of object in the database • Semantic Dictionary • Describes relationships between terms in the local lexica.

  18. Sharing and Transmission • Two approaches • Maintain the local copy of the foreign object in the importing database. • Use a local surrogate of the shared object so that no physical copies are made.
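The difference between the two approaches can be sketched as follows. This is a minimal illustration, not the paper's mechanism: the `Component`, `import_copy` and `Surrogate` names are hypothetical, and the exporting component is modeled as an in-memory dictionary.

```python
class Component:
    """Stand-in for an exporting database component (hypothetical)."""

    def __init__(self, objects):
        self._objects = {oid: dict(obj) for oid, obj in objects.items()}

    def fetch(self, oid):
        # Return a snapshot so callers cannot mutate the exporter's state.
        return dict(self._objects[oid])

    def update(self, oid, **changes):
        self._objects[oid].update(changes)


def import_copy(exporter, oid):
    """Approach 1: keep a physical local copy of the foreign object."""
    return exporter.fetch(oid)


class Surrogate:
    """Approach 2: a local stand-in that reads through to the exporter."""

    def __init__(self, exporter, oid):
        self._exporter = exporter
        self._oid = oid

    def __getitem__(self, key):
        return self._exporter.fetch(self._oid)[key]


agency_c = Component({"train-7": {"fare": 40}})
local_copy = import_copy(agency_c, "train-7")
proxy = Surrogate(agency_c, "train-7")
agency_c.update("train-7", fare=45)  # the exporter changes its data later
```

After the update, `local_copy["fare"]` still holds the old value while `proxy["fare"]` reflects the new one, which is the trade-off between the two approaches: copies are cheap to read but can go stale, surrogates stay current but depend on the exporter being reachable.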

  19. The Object Database Model • What type of model must be used? • Must be a common model • Must be semantically expressive enough to capture the intended meanings of conceptual schemas • Must be simple enough to be readily understood and implemented. • Advantage of a simple model • Can be implemented using a variety of existing object-oriented DBMSs, saving both time and effort. • Hence we choose the Minimal Object Data Model (MODM)

  20. Minimal Object Data Model (MODM) • Generic functional object model, which supports the usual object based constructs. • Supports aggregation, classification, generalization, inheritance of stored functions and user defined functions. • Does not support dynamic binding of functions, overloading of operations, constraints on types and functions and remote transparency. • Ability to encapsulate the functionality of shared objects, extensibility and object uniformity. Extremely important during unification phase.

  21. Relationships Among Objects Various relationships can exist among objects that model the same or similar concepts in different components of a federation. • Common Concepts: Identical, Equivalent, Compatible, Incompatible • Related Concepts: Generalization/Specialization, Positive Association

  22. Mechanism for Semantic Heterogeneity Resolution • Goal: To provide a mechanism that supports resolving semantic heterogeneity when sharing information among components in the federation • We limit our investigation to the sharing of type objects (type-level sharing) • Most of the previous research focused on two areas: • Structural Equivalence • Behavioral Equivalence

  23. Three-pronged approach to relative object equivalence • Semantic Dictionary • Meta-functions • Local Lexicon Components that agree to participate in the federation must agree on the common interface (MODM) that can provide the functionalities described above.

  24. Meta-functions A set of functions that return meta-data about objects in remote database components.
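As an illustration, meta-functions can be pictured as ordinary functions over a schema description. The sketch below is an assumption throughout: the slides do not give the actual meta-function names or signatures, so `supertypes`, `subtypes`, `properties` and the sample `SCHEMA` are hypothetical.

```python
# Hypothetical export schema of one component: type name -> description.
SCHEMA = {
    "Accommodation": {"supertype": None, "properties": ["name", "city"]},
    "Hotel": {"supertype": "Accommodation", "properties": ["stars"]},
    "Hostel": {"supertype": "Accommodation", "properties": ["beds_per_room"]},
}


def supertypes(schema, type_name):
    """Return the chain of supertypes, nearest first."""
    chain = []
    parent = schema[type_name]["supertype"]
    while parent is not None:
        chain.append(parent)
        parent = schema[parent]["supertype"]
    return chain


def subtypes(schema, type_name):
    """Return the direct subtypes of a type, sorted by name."""
    return sorted(n for n, t in schema.items() if t["supertype"] == type_name)


def properties(schema, type_name):
    """Return inherited properties followed by the type's own properties."""
    inherited = []
    for ancestor in reversed(supertypes(schema, type_name)):
        inherited.extend(schema[ancestor]["properties"])
    return inherited + schema[type_name]["properties"]
```

A remote component could call `properties(SCHEMA, "Hotel")` to learn the structure of a type before deciding whether to import it.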

  25. Local Lexicon Each component maintains a local lexicon that holds the semantic information about the objects in its export schema. Knowledge is represented as a static collection of facts of the form: <term> relationship descriptor <term>
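The fact format above maps naturally onto triples. The following sketch assumes that reading; the descriptor names used (`KindOf`, `Has`, `Identical`) and the sample facts are illustrative choices, not the set defined in the paper.

```python
from typing import NamedTuple


class Fact(NamedTuple):
    """One lexicon entry: <term> relationship descriptor <term>."""
    term: str
    descriptor: str
    target: str


# Hypothetical lexicon of one travel-agency component.
LEXICON = [
    Fact("Hotel", "KindOf", "Accommodation"),
    Fact("Hotel", "Has", "Room"),
    Fact("Fare", "Identical", "Price"),
]


def describe(lexicon, term):
    """Return every (descriptor, target) pair recorded for a term."""
    return [(f.descriptor, f.target) for f in lexicon if f.term == term]
```

Because the lexicon is a static collection of facts, a lookup is a simple filter; no inference is performed in this sketch.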

  26. Semantic Dictionary • Created and maintained by the sharing advisor • Local lexica contain only semantic information about the type objects. The semantic dictionary contains partial knowledge about the relationships between all terms in the local lexica of the federation.
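One way to picture how a semantic dictionary could be derived from local lexica is shown below. The heuristic is an assumption made for illustration, not the sharing advisor's actual algorithm: two terms from different components are linked when their lexica relate them to the same target term with the same descriptor. The component names and facts are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_semantic_dictionary(lexica):
    """Link terms from different components that share a (descriptor, target).

    lexica: {component_name: [(term, descriptor, target), ...]}
    Returns a set of ((component, term), (component, term)) pairs.
    """
    by_target = defaultdict(set)
    for component, facts in lexica.items():
        for term, descriptor, target in facts:
            by_target[(descriptor, target)].add((component, term))
    related = set()
    for members in by_target.values():
        for a, b in combinations(sorted(members), 2):
            if a[0] != b[0]:  # only cross-component links are of interest
                related.add((a, b))
    return related


lexica = {
    "AgencyA": [("Hotel", "KindOf", "Accommodation")],
    "AgencyB": [("Inn", "KindOf", "Accommodation"), ("Fare", "KindOf", "Price")],
}
links = build_semantic_dictionary(lexica)
```

The result is partial knowledge in the sense of the slide: it records that Hotel (AgencyA) and Inn (AgencyB) are related, but says nothing about terms whose lexicon entries do not overlap.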

  27. Unification of local and remote information • Extremely important step when importing a type object. • Two-step process when adding imported meta-data: • Conflict Resolution • Removes inconsistencies (naming, modelling, scaling) between the imported type and the target schema • Unification • Process of merging the foreign object into the local schema as gracefully and naturally as possible.

  28. Conflict Resolution • Attempts to resolve inconsistencies between the imported type(s) and the target schema before the unification step. • Automatic conflict resolution is not possible in all cases. Specific cases include: • Operations on atomic data values • Attempt to resolve type (iv) heterogeneities (Domain Mismatch) • Renaming • Attempts to resolve the problem of homonyms (same name for different concepts) and synonyms (same concept, multiple names)
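The two automatable cases, renaming and operations on atomic values, can be sketched as a pair of lookup tables applied to an imported record. Everything concrete here is hypothetical: the function name `resolve_conflicts`, the attribute names, and the conversion factor are placeholders rather than values from the paper.

```python
def resolve_conflicts(record, renames, converters):
    """Apply renaming and value conversion to one imported record.

    renames:    {foreign_attribute: local_attribute}
    converters: {foreign_attribute: function(value) -> converted value}
    """
    resolved = {}
    for key, value in record.items():
        local_key = renames.get(key, key)      # resolve synonyms by renaming
        convert = converters.get(key)
        resolved[local_key] = convert(value) if convert else value
    return resolved


GBP_TO_USD = 1.5  # placeholder rate, for illustration only

imported = {"fare_gbp": 100.0, "dest": "Boston"}
resolved = resolve_conflicts(
    imported,
    renames={"fare_gbp": "price_usd", "dest": "destination"},
    converters={"fare_gbp": lambda v: v * GBP_TO_USD},
)
```

Homonyms would need the opposite treatment, giving the imported attribute a new, distinct local name, which the `renames` table can also express.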

  29. Unification Process of unifying the foreign object with the corresponding local object in the target schema. The target schema must be restructured so that the result is: 1) Complete 2) Minimal 3) Understandable • Three scenarios describe the complexity of the schemas to be integrated • Import a single foreign object • Import one or more foreign objects • A relationship exists between the objects that are to be unified

  30. Unification - Import a single foreign object • F does not exist in Cimp’s schema • F can be added to Cimp’s schema without further modification. • Some additional work may be required, such as importing value types if they do not exist in the schema. • F is semantically equivalent to L in Cimp’s schema • Make F a sub-type of L and add the necessary functions to both F and L • The case where F is identical to L is merely a simplification and requires importing only the relevant type instances. • F is related to an object L in Cimp’s schema • Create a super-type that holds the commonalities between F and L, and then create two sub-types that hold the unique properties of F and L respectively. Notation: Cimp - importing component schema; Cexp - exporting component schema; Fi - ith foreign object; Li - ith local object
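The three cases can be sketched as one function over a toy schema. This is a simplification under stated assumptions: a schema is a dictionary from type name to its supertype and its set of functions, the caller already knows which relationship holds between F and L, and in the equivalent case only F's extra functions are kept on the new sub-type. The function name `unify` and the example type and function names are illustrative.

```python
def unify(schema, f_name, f_funcs, relation="new", local=None, super_name=None):
    """Place foreign type F into the importing schema (toy model).

    schema: {type_name: {"supertype": str or None, "functions": set}}
    relation: "new", "equivalent", or "related" (decided beforehand).
    """
    f_funcs = set(f_funcs)
    if relation == "new":
        # Case 1: F does not exist locally, so add it as is.
        schema[f_name] = {"supertype": None, "functions": f_funcs}
    elif relation == "equivalent":
        # Case 2: F becomes a sub-type of L; it inherits what L already has.
        extra = f_funcs - schema[local]["functions"]
        schema[f_name] = {"supertype": local, "functions": extra}
    elif relation == "related":
        # Case 3: factor the commonalities into a new super-type.
        common = f_funcs & schema[local]["functions"]
        unique_local = schema[local]["functions"] - common
        schema[super_name] = {
            "supertype": schema[local]["supertype"],
            "functions": common,
        }
        schema[local] = {"supertype": super_name, "functions": unique_local}
        schema[f_name] = {"supertype": super_name, "functions": f_funcs - common}
    else:
        raise ValueError(f"unknown relation: {relation}")
    return schema


agency_d = {"Entertainment": {"supertype": None, "functions": {"cities", "venue"}}}
unify(agency_d, "Sightseeing", {"cities", "guide"},
      relation="related", local="Entertainment", super_name="Things-To-Do")
```

After the call, the shared function `cities` lives on `Things-To-Do`, while `venue` and `guide` stay on their respective sub-types.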

  31. Unification - Import several inter-related objects • F1 and F2 do not exist in Cimp’s schema • The F1 and F2 type objects and all their instances can be added. • E.g. Trains and Train-fares can be added from Agency C to Agency B • Either F1 or F2 is semantically equivalent to L in Cimp’s schema • Assume that F1 is equivalent to L1 • First, a new subtype of L1 is created to hold the imported instances of F1. • Then F2 is added as L2 in Cimp’s schema. • Finally, new functions relating L1’s subtype and L2 are created. • E.g. Agency D importing Air-Travel and Price from Agency C. Since Air-Travel and Flights are related, create the subtype Air-Travel under Flights, import Price as L2 in D’s schema and create a relationship between Air-Travel and Price
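The three steps of the equivalent case can be written down directly. The sketch below is a toy model with assumed structure: types map to their supertype, and functions map a name to a (domain, range) pair. The function name `import_pair` and the link name `price_of` are hypothetical.

```python
def import_pair(schema, f1, f2, l1, link_name):
    """Import two inter-related foreign types where F1 is equivalent to L1.

    schema: {"types": {name: supertype or None},
             "functions": {name: (domain_type, range_type)}}
    """
    schema["types"][f1] = l1                    # step 1: subtype of L1 for F1's instances
    schema["types"][f2] = None                  # step 2: F2 becomes a new local type (L2)
    schema["functions"][link_name] = (f1, f2)   # step 3: function relating the two
    return schema


agency_d = {"types": {"Flights": None}, "functions": {}}
import_pair(agency_d, "Air-Travel", "Price", "Flights", "price_of")
```

Keeping the imported instances in a separate subtype, rather than merging them into Flights, leaves the local type untouched while still making the foreign data reachable through it.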

  32. Unification - Import several inter-related objects • Either F1 or F2 is related to L in Cimp’s schema • Assume that F1 is related to L1. • Functions common to F1 and L1 are associated with a common super-type (S1). • Since F2 was related to F1, it is now also related to S1 • E.g. Agency D importing Sightseeing and Cities from Agency C. Since Sightseeing in C is related to Entertainment in D, create a super-type Things-To-Do and make Sightseeing and Entertainment its sub-types. Cities, being a stored function of both sub-types, is connected to Things-To-Do. • Both F1 and F2 are related to L1 and L2 in Cimp’s schema • The F1 and F2 type objects are imported separately. • If the association between F1 and F2 is important, it can be added afterwards.

  33. Unification - Modeling relationships between objects • Consider the case where component Cimp wishes to import types from Cexp whose relationships are modeled differently from the corresponding type relationships in Cimp’s local schema. • Example: L1 and L2 are related through a ternary relationship that includes an additional type L3, whereas the corresponding foreign types F1 and F2 are related directly through functions and their inverses. • Apply the subsetting method

  34. Concluding Remarks For the mechanism to operate effectively, each participating component must meet the following conditions: • MODM must be supported at the federation interface, including sharing of meta-data functions. • A local lexicon must be provided by each component. The approach is based on the following services provided by the federation to the components: • Sharing advisor • Disambiguation algorithm • Unification tool [Figure: federation framework. Component DBs (Comp DB1, Comp DB2, new component DB) each provide a lexicon whose data is fetched into the semantic dictionary; the sharing advisor selects relevant foreign objects to integrate and invokes the unification tool, which places the foreign object into the local meta-data framework of the importing component DB.]

  35. Concluding Remarks • The results of this research may have a direct and practical impact on information sharing among heterogeneous databases, specifically in the following areas: • Framework • For accommodating semantic heterogeneity. • Use of a functional object-based model for describing sharable data and relationships to real-world concepts. • Architecture • Designed to resolve semantic heterogeneity by making use of meta-functions, local lexica and the semantic dictionary. • Existing components and Autonomy • Requires no change to the conceptual schemas and existing DBMSs. • Each database can function independently.

  36. Subsequent Research • Semantic Heterogeneity in Multidatabase Systems: A Review and Proposed Meta-Data Structure - Te-Wei Wang, 2004 • Resolving Semantic Heterogeneity in Schema Integration: an Ontology Based Approach - Farshad Hakimpour & Andreas Geppert • Reconciliation of temporal semantic heterogeneity in evolving information systems - Hongwei Zhu & Stuart E. Madnick, 2008 • Reconciling Semantic Heterogeneity in Web Services Composition - Xitong Li, Stuart Madnick, Hongwei Zhu, Yushun Fan, Sept 2009 • Integration and Disambiguation Techniques for Semantic Heterogeneity Reduction on the Web - Jorge Gracia del Rio, Doctoral Thesis, 2009

  37. Future Work • Research Issues in Federated Database Systems (S. Conrad et al.), EFDBS Workshop, 1997 • Use of the ODMG object model as the standard data model • Use of CASE tools or existing tools in schema integration • Semantics and implementation techniques for inter- and intra-model mappings • Integration of semi-structured data into files • Integration and reuse of behavior implemented in component databases • Support for versioning in FDBS • Tech support for integration with legacy systems • The Federated Database System for Multimedia, Dalen Kambur, Interoperable Systems Group, Dublin University

  38. Thank You
