
Integrating Multiple Data Sources using a Standardized XML Dictionary

Ramon Lawrence, University of Manitoba, umlawren@cs.umanitoba.ca. Supervisor: Dr. Ken Barker. TRLabs - Winnipeg.


Presentation Transcript


  1. Integrating Multiple Data Sources using a Standardized XML Dictionary Ramon Lawrence University of Manitoba umlawren@cs.umanitoba.ca Supervisor: Dr. Ken Barker TRLabs - Winnipeg

  2. Outline • Introduction, Motivation, and Background • Integration architecture components • Integration architecture • Example integration • Applications to the WWW • Future work and conclusions • Demonstration of Unity

  3. Introduction • Integration of data is required when accessing multiple databases within an organization or on the WWW. • Our focus is automatically combining database schemas using schema integration. • Schema integration requires knowledge of data semantics and use of metadata.

  4. Motivation • Organizations have several database systems which must interoperate. • Users often access multiple Web databases whose knowledge must be integrated and presented in a useful form. • Data warehouses and OLAP systems require data semantics to be understood and data to be cleansed and summarized.

  5. Background • Schema integration involves combining diverse database schemas into an integrated view by resolving conflicts. • Schema conflicts include naming, structural, and semantic conflicts. • Schema integration is required for database interoperability, but it is currently a manual process.

  6. MDBS Architecture • Global Transaction Manager (GTM) • processes global transactions • ensures information in all LDBSs is consistent • submits subtransactions to the GTSs for each LDBS • Global Transaction Servers (GTSs) • one for each LDBS • converts subtransactions from the GTM into a form usable by the LDBS and vice versa • Local Database Systems (LDBSs) • databases combined into the MDBS • unchanged, as they still process local transactions • [Figure labels: Global Transactions, GTM, subtransactions, four GTSs, four LDBSs, Local Transactions]

  7. Previous Work • Research systems: • integrating systems by logical rules (Sheth) • defining global dictionaries (Castano) • Carnot Project using the Cyc knowledge base • Industrial systems and standards: • Metadata Interchange Specification (MDIS) • XML, BizTalk, E-commerce portals

  8. Architecture Objective • The objective of our architecture is to provide a system for automatically integrating diverse relational schemas into a multidatabase • Desirable properties: • individual mappings - information sources integrated one-at-a-time and independently • global view constructed for query transparency • handles schema conflicts - including semantic, structural, and naming conflicts • automated global integration - global view constructed efficiently and automatically

  9. The Idea • The major idea is that schema conflicts can be resolved if we: • eliminate all naming conflicts • define a language capable of determining schema equivalence and performing transformations • With these two properties, schema conflicts can be resolved automatically at the global level

  10. Architecture Components: The Global Dictionary • A global dictionary (GD) provides standardized terms to capture data semantics. • Hierarchy of terms related by IS-A or Has-A links • Contains base set of common database concepts, but new concepts can be added • A GD term is a single, unambiguous semantic definition. • Several GD entries for a single English word are required if the word has multiple definitions.
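
The dictionary described on this slide can be pictured as a small term hierarchy. The following is only a minimal sketch: the class names, the `is_a`/`has_a` fields, and the sample terms and definitions are assumptions made for illustration, not the structure used by the authors' system.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Term:
    """One unambiguous definition in the global dictionary (hypothetical layout)."""
    name: str
    definition: str
    is_a: Optional[str] = None                      # parent term via an IS-A link
    has_a: list = field(default_factory=list)       # component terms via Has-A links


class GlobalDictionary:
    def __init__(self):
        self.terms = {}

    def add(self, term):
        # A word with several meanings needs one uniquely named entry per meaning.
        if term.name in self.terms:
            raise ValueError(f"duplicate term: {term.name}")
        self.terms[term.name] = term

    def ancestors(self, name):
        """Follow IS-A links from a term up to the root of its hierarchy."""
        chain = []
        current = self.terms[name].is_a
        while current is not None:
            chain.append(current)
            current = self.terms[current].is_a
        return chain

    def is_a(self, name, other):
        """True if `name` is `other` or a descendant of it via IS-A links."""
        return name == other or other in self.ancestors(name)


# Sample terms are assumptions, chosen to echo the claims example in the talk.
gd = GlobalDictionary()
gd.add(Term("Person", "a human being", has_a=["Name"]))
gd.add(Term("Customer", "a person who buys goods or services", is_a="Person"))
gd.add(Term("Claimant", "a customer who files a claim", is_a="Customer"))
gd.add(Term("Name", "the label by which something is known"))

print(gd.ancestors("Claimant"))
```

New concepts are added with further `add` calls, which matches the slide's point that the base set of terms can be extended.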

  11. Architecture Components: Using the Global Dictionary • GD terms are used to build semantic names to describe the semantics of schema elements. • Semantic names have the form: • semantic name = "[" CT [[; CT] | [, CT]] "]" CN • CT = context term, CN = concept name • each CT and CN is a single term from the GD • Semantic names are included in specifications describing a data source.
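
To make the semantic-name form concrete, here is a minimal parsing sketch. The function and type names are hypothetical, the sample names are invented, and the parser records the `;` and `,` separators without interpreting them, because the slide does not say how they differ. Accepting an empty concept name is also an assumption, based on entries such as `[Claim]` in the integrated view later in the talk.

```python
import re
from typing import NamedTuple


class SemanticName(NamedTuple):
    context_terms: tuple   # the CT entries, in order
    separators: tuple      # the ';' or ',' found between context terms
    concept_name: str      # the CN entry (may be empty in this sketch)


# "[" context "]" followed by an optional concept name.
_PATTERN = re.compile(r"^\[([^\[\]]+)\]\s*(.*)$")


def parse_semantic_name(text):
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a semantic name: {text!r}")
    context, concept = match.groups()
    # Splitting on a capturing group keeps the separators in the result.
    parts = re.split(r"([;,])", context)
    terms = tuple(part.strip() for part in parts[0::2])
    if any(not term for term in terms):
        raise ValueError(f"empty context term in: {text!r}")
    return SemanticName(terms, tuple(parts[1::2]), concept.strip())


print(parse_semantic_name("[Claim;Payment] Amount"))
print(parse_semantic_name("[Claim] Id"))
```

A real implementation would also check that every parsed term exists in the global dictionary; that lookup is left out here.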

  12. Architecture Components: X-Specs • Database metadata and semantic names are combined into specifications called X-Specs, which: • are stored and transmitted using XML • contain information on a relational schema • are organized into database, table, and field levels • store semantic names to describe and integrate schema elements
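
The three-level organization of an X-Spec can be sketched with Python's standard XML library. The element and attribute names (`xspec`, `database`, `table`, `field`, `semanticName`) are assumptions, since the transcript does not preserve the actual X-Spec format, and the semantic names assigned to the ABC claims table are likewise illustrative.

```python
import xml.etree.ElementTree as ET


def build_xspec(database, tables):
    """Build an X-Spec-like XML tree with database, table, and field levels."""
    root = ET.Element("xspec")
    db = ET.SubElement(root, "database", name=database)
    for table_name, table in tables.items():
        table_el = ET.SubElement(
            db, "table", name=table_name, semanticName=table["semantic_name"]
        )
        for field_name, semantic_name in table["fields"].items():
            ET.SubElement(
                table_el, "field", name=field_name, semanticName=semantic_name
            )
    return root


# Table and field names come from the ABC example in the talk;
# the semantic names attached to them are assumptions.
abc_tables = {
    "Claims_tb": {
        "semantic_name": "[Claim]",
        "fields": {
            "claim_id": "[Claim] Id",
            "claimant": "[Claimant] Name",
            "net_amount": "[Claim] Net Amount",
            "paid_amount": "[Payment] Amount",
        },
    }
}

xml_text = ET.tostring(build_xspec("ABC", abc_tables), encoding="unicode")
print(xml_text)
```

Because the result is ordinary XML, it can be stored or transmitted and parsed again at the integration site with any XML parser.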

  13. Architecture Components: Integrating X-Specs • Each database to be integrated is described using an X-Spec. • Identical concepts in different databases are identified by similar semantic names. • Concepts with identical (or hierarchically related) semantic names are combined regardless of their physical representation in the individual databases.

  14. Integration Architecture • Our integration architecture consists of two separate phases: • capture process: X-Specs are constructed for each data source independently • integration process: X-Specs are combined using the integration algorithm which matches semantic names using the global dictionary

  15. Integration Architecture: The Capture Process • Capture process involves: • automatically extracting the schema information and metadata using a specification editor • assigning semantic names to each schema element (tables and fields) to capture their semantics

  16. Integration Architecture: The Capture Process • [Figure labels: Relational Schema, Automatic Extraction, Specification Editor, DBA, Lookup of terms, Global Dictionary, X-Spec]

  17. Integration Architecture: The Integration Process • Integration process involves: • automatically identifying identical concepts by matching semantic names • constructing a global view of database concepts consisting of a hierarchy of concept terms • resolving structural differences during query generation and submission (e.g. a concept may be represented as a table in one database and a field (attribute) in another)
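
The matching and query-generation steps above can be sketched as an index from semantic names to physical locations. This is not the authors' integration algorithm: the function names are hypothetical, the table and field names are borrowed from the claims example later in the talk, and the semantic names assigned to them are assumptions. The generated SQL is deliberately minimal.

```python
from collections import defaultdict

# (database, table, field, semantic name); the semantic names are assumptions.
MAPPINGS = [
    ("ABC", "Claims_tb", "claim_id", "[Claim] Id"),
    ("ABC", "Claims_tb", "paid_amount", "[Payment] Amount"),
    ("XYZ", "T_claims", "id", "[Claim] Id"),
    ("XYZ", "T_payments", "amount", "[Payment] Amount"),
]


def build_index(mappings):
    """Group physical locations under the semantic name they share."""
    index = defaultdict(list)
    for database, table, column, semantic_name in mappings:
        index[semantic_name].append((database, table, column))
    return dict(index)


def subqueries(index, semantic_name):
    """Translate one semantic name into a query per physical location."""
    return [
        (database, f"SELECT {column} FROM {table}")
        for database, table, column in index.get(semantic_name, [])
    ]


index = build_index(MAPPINGS)
for database, sql in subqueries(index, "[Payment] Amount"):
    print(database, "->", sql)
```

The payment amount is a field of the claims table in ABC and a field of a separate payments table in XYZ, yet the user asks for it by one semantic name; the structural difference is only visible in the generated subqueries.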

  18. Integration Architecture: The Integration Process • [Figure labels: Clients, Integration Site, Subtransactions, X-Specs, RDBMSs]

  19. Integration Architecture Benefits • The benefits of the two-phase architecture are: • Dynamic integration: schemas integrated as needed • X-Specs are constructed only once and independently of each other • Automatic conflict resolution by integrating based on semantic name rather than physical structure • Users are isolated from system names and organization by querying through a global view using semantic names for concepts

  20. Integration Example • Two claims databases to be integrated: • ABC Company: Claims_tb(claim_id, claimant, net_amount, paid_amount) • XYZ Company: T_claims(id, customer, claim_amt), T_payments(cid, pid, amount) • First step is to construct X-Specs for each database.

  21. Integration Example: ABC Database X-Spec

  22. Integration Example: XYZ Database X-Spec

  23. Integration Example: Integrated View • Global view after integration: • [Claim] • id • net amount • [Customer] • name • [Payment] • id • amount

  24. Integration Example: Discussion • Important points: • system and field names are not presented to the user, who queries based on semantic names • database structure is not shown to the user • different physical representations for the same concept are combined (e.g. payment (attribute) in ABC with payment table in XYZ database) • hierarchically related concepts (customer vs. claimant) are combined based on their IS-A relationship in the global dictionary
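
The points above can be traced with a small sketch that rebuilds the integrated view from the two claims schemas. The X-Spec slides did not survive in the transcript, so the semantic names assigned to each field are assumptions, as is the direction of the IS-A link (claimant IS-A customer) and every function name. Semantic names are written as (context term, concept name) pairs to keep the sketch short.

```python
# Assumed IS-A link from the global dictionary: child term -> parent term.
IS_A = {"Claimant": "Customer"}

# Physical location -> (context term, concept name); assignments are assumptions.
XSPECS = {
    "ABC": {
        "Claims_tb.claim_id": ("Claim", "Id"),
        "Claims_tb.claimant": ("Claimant", "Name"),
        "Claims_tb.net_amount": ("Claim", "Net Amount"),
        "Claims_tb.paid_amount": ("Payment", "Amount"),
    },
    "XYZ": {
        "T_claims.id": ("Claim", "Id"),
        "T_claims.customer": ("Customer", "Name"),
        "T_claims.claim_amt": ("Claim", "Net Amount"),
        "T_payments.cid": ("Claim", "Id"),
        "T_payments.pid": ("Payment", "Id"),
        "T_payments.amount": ("Payment", "Amount"),
    },
}


def generalize(term):
    """Climb IS-A links so hierarchically related terms share one entry."""
    while term in IS_A:
        term = IS_A[term]
    return term


def global_view(xspecs):
    """Map context term -> concept name -> physical locations across sources."""
    view = {}
    for database, fields in xspecs.items():
        for location, (context, concept) in fields.items():
            concepts = view.setdefault(generalize(context), {})
            concepts.setdefault(concept, []).append(f"{database}.{location}")
    return view


view = global_view(XSPECS)
for context in sorted(view):
    print(f"[{context}]", sorted(view[context]))
```

Only the context terms and concept names would be shown to a user; the location lists stay internal and are what query generation would consult.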

  25. Applications to the WWW • Integrating diverse data sources is involved in constructing a data warehouse and other operational systems. • The WWW is a diverse collection of databases which users access. • Automatically integrating web data sources by a browser or portal reduces query complexity and the effort of integrating results for the user.

  26. Conclusions • Automatic integration of database schemas is possible by using a global dictionary of terms and constructing semantic names for schema elements. • Integration of data sources has applications to the WWW and construction of data warehouses.

  27. Future Work • The integration architecture is evolving with standards on XML and captures metadata information in XML documents. • The system is being tested on sample problems, and a query mechanism is a work in progress. • We are refining a prototype of the system called Unity.
