
Integrating data sources on the World-Wide Web

Ramon Lawrence and Ken Barker

U. of Manitoba, U. of Calgary

umlawren,[email protected]

Introduction


  • Integration of data is required when accessing multiple databases within an organization or on the WWW.

  • Our focus is on automatically combining database schemas through schema integration.

  • Schema integration requires knowledge of data semantics and use of metadata.


  • Organizations have several database systems which must interoperate.

  • Users often access multiple Web databases whose knowledge must be integrated and presented in a useful form.

  • Data warehouses and OLAP systems require data semantics to be understood and data to be cleansed and summarized.


  • Schema integration involves combining diverse database schemas into an integrated view by resolving conflicts.

  • Schema conflicts include naming, structural, and semantic conflicts.

  • Schema integration is required for database interoperability, but it is currently a manual process.

Previous Work

  • Research systems:

    • integrating systems by logical rules (Sheth)

    • defining global dictionaries (Castano)

    • Carnot Project using the Cyc knowledge base

  • Industrial systems and standards:

    • Metadata Interchange Specification (MDIS)

    • XML, BizTalk, E-commerce portals

Architecture Components: The Global Dictionary

  • A global dictionary (GD) provides standardized terms to capture data semantics.

    • Hierarchy of terms related by IS-A or Has-A links

    • Contains base set of common database concepts, but new concepts can be added

  • A GD term is a single, unambiguous semantic definition.

    • Several GD entries for a single English word are required if the word has multiple definitions.
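
A minimal sketch of how such a dictionary could be held in memory, assuming Python. The class names, the `word#n` key format, and the sample definitions are illustrative assumptions, not the actual Unity design; Has-A links are left out for brevity, and the direction "claimant IS-A customer" is assumed.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Term:
    """One GD entry: a single, unambiguous definition."""
    key: str                     # unique key; the "word#n" form is an assumption
    definition: str
    is_a: Optional[str] = None   # key of the more general term, if any


class GlobalDictionary:
    def __init__(self) -> None:
        self.terms: Dict[str, Term] = {}

    def add(self, term: Term) -> None:
        """Register a term; its IS-A parent must already exist."""
        if term.key in self.terms:
            raise ValueError(f"duplicate GD term: {term.key}")
        if term.is_a is not None and term.is_a not in self.terms:
            raise KeyError(f"unknown parent term: {term.is_a}")
        self.terms[term.key] = term

    def ancestors(self, key: str) -> List[str]:
        """Follow IS-A links from a term up to its root."""
        chain: List[str] = []
        parent = self.terms[key].is_a
        while parent is not None:
            chain.append(parent)
            parent = self.terms[parent].is_a
        return chain


gd = GlobalDictionary()
gd.add(Term("customer", "a party that does business with an organization"))
gd.add(Term("claimant", "a customer who files a claim", is_a="customer"))
# One English word with two definitions needs two GD entries.
gd.add(Term("bank#1", "a financial institution"))
gd.add(Term("bank#2", "the land beside a river"))

print(gd.ancestors("claimant"))  # ['customer']
```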

Architecture Components: Using the Global Dictionary

  • GD terms are used to build semantic names to describe the semantics of schema elements.

  • Semantic names have the form:

    • semantic name = "[" CT [[;CT] | [,CT]] "]" CN

    • CT = context term, CN = concept name

    • each CT and CN is a single term from the GD

  • Semantic names are included in RIM specifications describing a data source.
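
The form above can be read by a small parser. The following is a sketch under assumptions: the function and type names are hypothetical, the separators `;` and `,` are recorded but not interpreted (the slides do not say how they differ), and a name with no concept name, such as `[Claim]`, is accepted so that tables can be named too.

```python
import re
from typing import NamedTuple, Tuple


class SemanticName(NamedTuple):
    contexts: Tuple[str, ...]     # context terms (CT), in order
    separators: Tuple[str, ...]   # ";" or "," found between context terms
    concept: str                  # concept name (CN); "" when absent


# "[" context terms "]" followed by an optional concept name.
_PATTERN = re.compile(r"^\[([^\[\]]+)\]\s*(.*)$")


def parse_semantic_name(text: str) -> SemanticName:
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a semantic name: {text!r}")
    inside, concept = match.groups()
    # Splitting on a captured group keeps the separators in the result.
    parts = re.split(r"([;,])", inside)
    contexts = tuple(part.strip() for part in parts[0::2])
    if not all(contexts):
        raise ValueError(f"empty context term in {text!r}")
    return SemanticName(contexts, tuple(parts[1::2]), concept.strip())


name = parse_semantic_name("[Claim;Payment] Amount")
print(name.contexts, name.concept)
```

A real implementation would also check that every CT and CN is a term in the GD; that lookup is omitted here.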

Architecture Components: The Relational Integration Model

  • Database metadata and semantic names are combined into Relational Integration Model (RIM) Specifications (RIM Specs)

    • contains information on a relational schema

    • organized into database, table, and field levels

    • stores semantic names to describe and integrate schema elements
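
One way to picture the three levels is as nested records. This is only a sketch: the class and attribute names are assumptions, other metadata a real RIM spec would carry is omitted, and the semantic names attached to the `Claims_tb` table (taken from the example later in this presentation) are guesses.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class FieldSpec:
    system_name: str      # column name in the source database
    semantic_name: str    # built from GD terms


@dataclass
class TableSpec:
    system_name: str
    semantic_name: str
    fields: List[FieldSpec] = field(default_factory=list)


@dataclass
class RimSpec:
    database: str
    tables: List[TableSpec] = field(default_factory=list)

    def elements(self) -> Iterator[Tuple[str, str, str]]:
        """Yield (level, system path, semantic name) for every schema element."""
        for table in self.tables:
            yield "table", table.system_name, table.semantic_name
            for column in table.fields:
                path = f"{table.system_name}.{column.system_name}"
                yield "field", path, column.semantic_name


abc = RimSpec("ABC", [
    TableSpec("Claims_tb", "[Claim]", [
        FieldSpec("claim_id", "[Claim] Id"),
        FieldSpec("net_amount", "[Claim] Net amount"),
    ]),
])

for level, path, name in abc.elements():
    print(level, path, name)
```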

Architecture Components: Integrating RIM Specs

  • Each database to be integrated is described using a RIM specification.

  • Identical concepts in different databases are identified by similar semantic names.

  • Concepts with identical (or hierarchically related) semantic names are combined regardless of their physical representation in the individual databases.
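
The matching rule can be sketched as follows. This is a simplification that compares single GD terms rather than full semantic names; the function names and the small IS-A fragment are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical GD fragment: child term -> parent term (IS-A).
IS_A: Dict[str, str] = {"claimant": "customer"}


def ancestors(term: str, is_a: Dict[str, str]) -> List[str]:
    """Collect every term reachable by following IS-A links upward."""
    chain = []
    while term in is_a:
        term = is_a[term]
        chain.append(term)
    return chain


def terms_match(a: str, b: str, is_a: Dict[str, str] = IS_A) -> bool:
    """True when the terms are identical or one is an IS-A ancestor of the other."""
    a, b = a.lower(), b.lower()
    return a == b or a in ancestors(b, is_a) or b in ancestors(a, is_a)


def match_elements(left: Dict[str, str],
                   right: Dict[str, str]) -> List[Tuple[str, str]]:
    """Pair schema elements (system name -> GD term) whose terms match."""
    return [(l_name, r_name)
            for l_name, l_term in left.items()
            for r_name, r_term in right.items()
            if terms_match(l_term, r_term)]


abc = {"Claims_tb.claimant": "claimant", "Claims_tb.claim_id": "id"}
xyz = {"T_claims.customer": "customer", "T_claims.id": "id"}
print(match_elements(abc, xyz))
```

Note that the physical names (`claimant` vs. `customer`, `claim_id` vs. `id`) play no part in the decision; only the assigned terms do.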

Integration Architecture

  • Our integration architecture consists of two separate phases:

    • capture process: RIM specs are constructed for each data source independently

    • integration process: RIM specs are combined using the integration algorithm which matches semantic names using the global dictionary

Integration Architecture: The Capture Process

  • Capture process involves:

    • automatically extracting the schema information and metadata using a specification editor

    • assigning a semantic name to each schema element (table or field) to capture its semantics

Integration Architecture: The Capture Process

[Figure: capture process diagram. Surviving label: "DBA lookup of terms".]

Integration Architecture: The Integration Process

  • Integration process involves:

    • automatically identifying identical concepts by matching semantic names

    • constructing a global view of database concepts consisting of a hierarchy of concept terms

    • resolving structural differences during query generation and submission (e.g. a concept may be represented as a table in one database and a field (attribute) in another)
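
To illustrate the last point, here is a hedged sketch of query generation, using the two claims databases from the example later in this presentation. The mapping table, the semantic name `[Claim;Payment] Amount`, and the function name are all assumptions; the actual query mechanism is not described in these slides.

```python
from typing import Dict, NamedTuple


class Source(NamedTuple):
    table: str
    column: str


# Hypothetical lookup from a semantic name to where each database
# physically stores it (information a RIM spec would record).
SOURCES: Dict[str, Dict[str, Source]] = {
    "[Claim;Payment] Amount": {
        "ABC": Source("Claims_tb", "paid_amount"),   # payment is a field here
        "XYZ": Source("T_payments", "amount"),       # payment is a table here
    },
}


def generate_queries(semantic_name: str) -> Dict[str, str]:
    """Return one SQL query per database that stores the concept."""
    return {
        database: f"SELECT {src.column} FROM {src.table}"
        for database, src in SOURCES[semantic_name].items()
    }


for database, sql in generate_queries("[Claim;Payment] Amount").items():
    print(database, "->", sql)
```

The user asks for one concept by its semantic name; the structural difference (field in one database, separate table in the other) is absorbed by the per-database mapping.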

Integration Architecture: The Integration Process

[Figure: integration process diagram. Surviving labels: "Integration Site", "RIM spec", "RIM spec".]

Integration Architecture Benefits

  • The benefits of the two-phase architecture are:

    • Dynamic integration: schemas integrated as needed

    • RIM Specs are constructed only once and independently of each other

    • Automatic conflict resolution by integrating based on semantic name rather than physical structure

    • Users are isolated from system names and organization by querying through a global view using semantic names for concepts

Integration Example

  • Two claims databases to be integrated:

    • ABC Company: Claims_tb(claim_id, claimant, net_amount, paid_amount)

    • XYZ Company: T_claims(id, customer, claim_amt), T_payments(cid, pid, amount)

  • First step is to construct RIM specs for each database.
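
A hedged sketch of what the two RIM specs might record, reduced to a mapping from system names to semantic names. Every semantic name below is an assumption chosen to be consistent with the integrated view; in particular, treating `claim_amt` as the net amount and `cid` as a reference to the claim id are guesses.

```python
from typing import Dict, List

# System name -> semantic name (all semantic names are hypothetical).
ABC_SPEC: Dict[str, str] = {
    "Claims_tb": "[Claim]",
    "Claims_tb.claim_id": "[Claim] Id",
    "Claims_tb.claimant": "[Claim;Claimant] Name",
    "Claims_tb.net_amount": "[Claim] Net amount",
    "Claims_tb.paid_amount": "[Claim;Payment] Amount",
}

XYZ_SPEC: Dict[str, str] = {
    "T_claims": "[Claim]",
    "T_claims.id": "[Claim] Id",
    "T_claims.customer": "[Claim;Customer] Name",
    "T_claims.claim_amt": "[Claim] Net amount",      # guess
    "T_payments": "[Claim;Payment]",
    "T_payments.cid": "[Claim] Id",                  # guess: refers to the claim
    "T_payments.pid": "[Claim;Payment] Id",
    "T_payments.amount": "[Claim;Payment] Amount",
}


def shared_semantic_names(a: Dict[str, str], b: Dict[str, str]) -> List[str]:
    """Semantic names that occur in both specs, i.e. identical concepts."""
    return sorted(set(a.values()) & set(b.values()))


for name in shared_semantic_names(ABC_SPEC, XYZ_SPEC):
    print(name)
```

Under these assumptions, the claimant and customer names do not match exactly; they would be combined only through the IS-A link between the two terms in the global dictionary.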

Integration Example: ABC Database RIM Spec

Integration Example: XYZ Database RIM Spec

Integration Example: Integrated View

  • Global view after integration:

    • [Claim]

      • Id

      • Net amount

      • [Customer]

        • name

      • [Payment]

        • id

        • amount
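
The view above can be derived mechanically from a list of semantic names by nesting each concept under its chain of context terms. The sketch below assumes hypothetical semantic names (with claimant already merged into customer) and handles only the `;` separator.

```python
from typing import Dict, List

# Hypothetical semantic names gathered from both RIM specs.
NAMES = [
    "[Claim] Id",
    "[Claim] Net amount",
    "[Claim;Customer] Name",
    "[Claim;Payment] Id",
    "[Claim;Payment] Amount",
]


def build_view(names: List[str]) -> Dict[str, dict]:
    """Nest each concept name under its chain of context terms."""
    view: Dict[str, dict] = {}
    for name in names:
        contexts, _, concept = name[1:].partition("]")
        node = view
        for term in contexts.split(";"):
            node = node.setdefault(f"[{term.strip()}]", {})
        if concept.strip():
            node.setdefault(concept.strip(), {})
    return view


def render(view: Dict[str, dict], depth: int = 0) -> List[str]:
    """Flatten the nested view into indented lines."""
    lines: List[str] = []
    for key, children in view.items():
        lines.append("  " * depth + key)
        lines.extend(render(children, depth + 1))
    return lines


print("\n".join(render(build_view(NAMES))))
```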

Integration Example: Discussion

  • Important points:

    • system and field names are not presented to the user, who queries using semantic names

    • database structure is not shown to the user

    • different physical representations for the same concept are combined (e.g. payment (attribute) in ABC with payment table in XYZ database)

    • hierarchically related concepts (customer vs. claimant) are combined based on their IS-A relationship in the global dictionary

Applications to the WWW

  • Integrating diverse data sources is involved in constructing a data warehouse and other operational systems.

  • The WWW is a diverse organization of databases which users access.

  • Having a browser or portal integrate web data sources automatically reduces query complexity for the user and removes the need to combine results by hand.


  • Automatic integration of database schema is possible by using a global dictionary of terms and constructing semantic names for schema elements.

  • Integration of data sources has applications to the WWW and construction of data warehouses.

Important Changes

  • The integration architecture is constantly being refined. Some notable differences in this presentation versus the paper:

    • Our integration system uses XML to represent a RIM spec, which has been renamed an X-Spec.

    • An integration site is used as a central portal for integration and management.

    • Semantic distance calculations between terms are no longer used.

    • Format of semantic name has been simplified.
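
As a rough illustration of the first change, a spec could be written out as XML with Python's standard library. The element and attribute names (`Spec`, `Table`, `Field`, `semanticName`) are invented for this sketch; the actual X-Spec format is not shown in these slides.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

# table name -> (table semantic name, {field name: field semantic name})
Tables = Dict[str, Tuple[str, Dict[str, str]]]


def to_xspec(database: str, tables: Tables) -> str:
    """Serialize a spec to an XML string (hypothetical layout)."""
    root = ET.Element("Spec", {"database": database})
    for table_name, (table_sem, fields) in tables.items():
        table_el = ET.SubElement(
            root, "Table", {"name": table_name, "semanticName": table_sem})
        for field_name, field_sem in fields.items():
            ET.SubElement(
                table_el, "Field",
                {"name": field_name, "semanticName": field_sem})
    return ET.tostring(root, encoding="unicode")


xml_text = to_xspec("ABC", {
    "Claims_tb": ("[Claim]", {
        "claim_id": "[Claim] Id",
        "net_amount": "[Claim] Net amount",
    }),
})
print(xml_text)
```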

Future Work

  • The integration architecture is evolving alongside XML standards and now captures metadata in XML documents.

  • The system is being tested on sample problems, and a query mechanism is a work in progress.

  • We are refining a prototype of the system called Unity.