technologie van informatiesystemen tis l.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
Technologie van Informatiesystemen TIS PowerPoint Presentation
Download Presentation
Technologie van Informatiesystemen TIS

Loading in 2 Seconds...

play fullscreen
1 / 56

Technologie van Informatiesystemen TIS - PowerPoint PPT Presentation


  • 205 Views
  • Uploaded on

Technologie van Informatiesystemen TIS college 3 Inhoud Inleiding, 30/11 Web engineering & Web information systems, 7/12 Data transformatie & Data integratie, 14/12 ERP, Smulders (Deloitte), 21/12 + 11/1 Flower, Berens (Pallas Athena), 25/1 + 1/2 Biztalk, van den Boom (Microsoft), 15+22/2

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Technologie van Informatiesystemen TIS' - jacob


Download Now An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
inhoud
Inhoud
  • Inleiding, 30/11
  • Web engineering & Web information systems, 7/12
  • Data transformatie & Data integratie, 14/12
  • ERP, Smulders (Deloitte), 21/12 + 11/1
  • Flower, Berens (Pallas Athena), 25/1 + 1/2
  • Biztalk, van den Boom (Microsoft), 15+22/2
inhoud3
Inhoud
  • Inleiding, 30/11
  • Web engineering & Web information systems, 7/12
  • Data transformatie & Data integratie, 14/12
  • ERP, Smulders (Deloitte), 21/12 + 11/1
  • Flower, Berens (Pallas Athena), 25/1 + 1/2
  • Biztalk, van den Boom (Microsoft), 15+22/2

Philippe Thiran

philippe thiran computer science department technische universiteit eindhoven the netherlands
Philippe Thiran

Computer Science Department

Technische Universiteit Eindhoven

The Netherlands

Data TransformationData Integration
data transformation integration
Data Transformation & Integration
  • Agenda
    • Problem Statement
      • Existing database systems
      • Heterogeneity, distribution, autonomy
    • Data Transformation
      • Schema conversion
      • Query conversion: Wrapper
    • Data Integration
      • Schema integration
      • Query processing: Multidatabase and Federation
problem statement

Problem Statement

Existing database systems

Heterogeneity, distribution, autonomy

problem statement existing database systems
Problem StatementExisting Database Systems
  • Existing Database Systems
    • Data are recorded in existing database systems
    • Existing database systems are:
      • Mission critical (essential to the organization business)
      • To be operational at all times
      • Inflexible
    • Typically, existing database systems are:
      • Very large (millions of lines of code)
      • Old (often more than 10 years old)
      • Written in old programming language like COBOL, PL/1, SQL!
      • Built around an old DBMS
problem statement existing database systems8
Problem StatementExisting Database Systems
  • Existing Database Systems
    • Data are recorded in existing database systems
    • Answer of old requirements
  • New functions and services
      • New user requirements
      • New technology (Web)
  • Communication among them?
problem statement existing database systems9
Problem StatementExisting Database Systems
  • Existing Systems: New Services
    • How to deal with existing database systems ?
      • Abandon the existing systems: migration to a new system
      • Keep and modify the existing systems
      • Keep the existing systems and wrap them: autonomy
  • Existing Systems: Communication
    • How to integrate existing database systems?
problem statement data integration

Distribution

Distributed databases

Autonomy

Centralized

DBMS

Heterogeneity

Problem StatementData Integration
  • Data Integration Problems
    • Integrating database systems is very hard and costly
    • Three main dimension of the problem:
      • Distribution
      • Autonomy
      • Heterogeneity
data integration problem statement

Distribution

Autonomy

Heterogeneity

Data IntegrationProblem Statement
  • Autonomy
    • Autonomy refers to the distribution of control
    • Four dimensions of autonomy:
      • Design: own data models and own transaction management technique
      • Communication: nor knowledge of the existence of other system nor how to communicate with them
      • Execution: independently of the other systems
      • Association: each system decides how much of its data and processing capabilities it will share with the other system
data integration problem statement12

Distribution

Autonomy

Heterogeneity

Data IntegrationProblem Statement
  • Heterogeneity
    • Heterogeneity may exist at three basic levels:
      • DBMS level. Data is managed by a variety of DBMS based on different data models and data languages
        • Data models : relational model, hierarchical model and file model
        • Data languages : SQL, DL/1, COBOL programs
      • Platform level. Different hardwares, different network protocols
      • Semantic level. Different designer viewpoints in modelling the same objects of the application domain. Incompatible design specifications which lead to different naming, types or integrity constraints
data integration generic integration architecture
Data IntegrationGeneric Integration Architecture
  • Schema Hierarchy

Homogenizes and unions import

schemas

Integrated

Schema

Common Model

Import

Schema 1

Import

Schema 2

Import

Schema 3

View on export schema available for

non-local access

Export

Schema 1

Export

Schema 2

Export

Schema 3

Unifies data models

Database

Schema 1

Database

Schema 2

Data

Schema 3

Local Models

DB1

DB2

File System

OO DBMS

Relational DBMS

data integration generic integration architecture14
Data IntegrationGeneric Integration Architecture
  • Schema Hierarchy

Integrated

Schema

Data and Schema Integration

Common Model

Import

Schema 1

Import

Schema 2

Import

Schema 3

Export

Schema 1

Export

Schema 2

Export

Schema 3

Data and Schema Transformation

Database

Schema 1

Database

Schema 2

Data

Schema 3

Local Models

DB1

DB2

File System

OO DBMS

Relational DBMS

data transformation

Data Transformation

Schema Conversion

Query Conversion: Wrapper

data transformation schema conversion
Data TransformationSchema Conversion
  • Introduction
    • Schema conversion
    • Query/Data conversion

Common

Data

Model

Export

Schema 1

Export

Schema 2

Query1

Data1

Data2

Query2

Local

Data

Models

Database

Schema 1

Database

Schema 2

Query1’

Data1’

Data2’

Query2’

Data

Source 1

Data

Source 2

data transformation schema conversion17
Data TransformationSchema Conversion
  • Schema Conversion
    • Schema transformation
      • Transformation of a schema expressed in a data model (Ms) into an equivalent schema expressed in another data model (Mt)
      • Examples
        • ER model  Relational model (lecture ISO)
        • Relational model XML Schema (see later)
      • Schema transformation operators
      • Schema conversion consists in applying the relevant transformations on the relevant constructs of the schema expressed in Ms in such a way that the final result complies with Mt
data transformation schema conversion18

B

A

B1

A1

A

B

B2

A1

B1

id:

B1

B1

B2

ref:

B1

id:

B1

R

1-1

0-N

Data TransformationSchema Conversion
  • Schema Conversion
    • Schema transformation
      • A (schema) transformation basically is an operator by which a source data structure C is replaced with a target structure C'.
      • Example of a semantics-preserving transformation: transforming a relationship type into an attribute
data transformation schema conversion19

B

A

B1

A1

A

B

B2

A1

B1

id:

B1

B1

B2

ref:

B1

id:

B1

R

1-1

0-N

Data TransformationSchema Conversion
  • Schema Conversion
    • 2 main schema transformations for ER model  Relational model
data transformation schema conversion20
Data TransformationSchema Conversion
  • Schema Conversion
    • Exercice: FromER model  Relational model
data transformation schema conversion21
Data TransformationSchema Conversion
  • Schema Conversion
    • Exercice: FromER model  Relational model
data transformation schema conversion22
Data TransformationSchema Conversion
  • Schema Conversion
    • Exercice: FromER model  Relational model
data transformation wrappers

Common

Data

Model

Common Data Model

Common Query Language

Export Schema

Data TransformationWrappers

Wrapper

Local

Data

Models

Database Schema

Data

Source

  • Definition
    • A wrapper controls a (legacy) data source
    • Basically a wrapper is a software component that offers an homogeneous query interface based on a common data model (XML for the Web)
    • It converts data and queries from the common data model to a local data model

It offers an adequate way for solving the DBMS heterogeneity that appears when one wants to integrate existing and heterogeneous data systems

data transformation wrappers24
Data TransformationWrappers
  • Definition (ctd)
    • A data wrapper is basically defined as a converter of data and queries
    • That is, a wrapper:
      • Offers an export schema in the common data model
      • Accepts queries against the export schema
      • Translates them into queries understandable by the data system
      • Transforms the results of the local queries into a format understood by the application

Query

Data

Common

Data

Model

Common Data Model

Common Query Language

Export Schema

Wrapper

Local

Data

Models

Local Data Model

Local Query Language

Database Schema

Data

Source

data transformation wrappers25
Data TransformationWrappers
  • Categories of Wrappers
    • There exists no standard approach to build wrappers
    • Functionality
      • One-way: only transformation of data (e.g., for data warehouses)
      • Two-way: transformation of requests and data
    • Development
      • Hard-wired wrappers, for specific data sources
      • Semi-automated generation: wrapper development tools
      • Automatically generated wrappers
    • Availability
      • Standalone programs (data conversion, data migration)
      • Components of a federation (see later)
      • Database interface for foreign data
data transformation wrappers26
Data TransformationWrappers
  • Wrappers and the Web
    • Wrapper interface
      • Data format: XML
      • Common data model: XML DTD and Schema
      • Common query language: XPath, XQuery, none
    • Wrapper mapping
      • Generally between relational data and XML
      • Two translation types
        • Automated
        • Defined by the user
      • XML- or SQL-oriented query language
data transformation wrappers27
Data TransformationWrappers
  • XML Views of Relational Databases
    • Automated translation

Order

Item

Payement

<db>

<order>

<row><id>10</id><custname>Philips</custname><custum>7734</custnum></row>

<row><id>9</id><custname>Unilever</custname><custum>7725</custnum></row>

</order>

<item>

<row><oid>10</oid><desc>Ship</desc><cost>24000</cost></row>

<row><oid>10</oid><desc>Generator</desc><cost>8000</cost></row>

</item>

<payement>

similar to <order> and <item>

</payement>

</db>

data transformation wrappers28
Data TransformationWrappers
  • XML Views of Relational Databases
    • User-defined Translation

Order

<order id=’10’>

<custname> Philips </custname>

<items>

<item description=“Ship”>

<cost> 24000 </cost>

</item>

<item description=“Generator”>

<cost> 800 <cost>

</item>

</items>

</payments>

<payement due=’1/10/01’>

<amount> 20000 </amount>

</payement>

</payements>

</order>

<order id =‘9’>

</order>

Item

Payement

data transformation wrappers29
Data TransformationWrappers
  • XML Views of Relational Databases
    • Exercises
      • What is the XML Document of this relational database?
data transformation wrappers30
Data TransformationWrappers
  • XML Views of Relational Databases
    • Exercises
      • What is the XML Document of this relational database?

<!ELEMENT Catalog (Order*, Product*)>

<!ELEMENT Order (Customer, Date, Total?, Detail+)>

<!ATTLIST Order

OrderID ID #REQUIRED>

<!ELEMENT Customer ANY>

<!ELEMENT Date (#PCDATA)>

<!ELEMENT Total (#PCDATA)>

<!ELEMENT Detail (Quantity, Amount)>

<!ATTLIST Detail

Product IDREF #REQUIRED>

<!ELEMENT Quantity (#PCDATA)>

<!ELEMENT Amount (#PCDATA)>

<!ELEMENT Product (Supplier+)>

<!ATTLIST Product

Reference ID #REQUIRED

Label CDATA #IMPLIED

UnitPrice CDATA #REQUIRED>

<!ELEMENT Supplier ANY>

data transformation wrappers31
Data TransformationWrappers
  • XML Views of Existing Relational Databases
    • Mapping definition
      • SQL-oriented query language

For $b in

SQL(select * from Order where Custname=“’ +$x + ‘””)

return <order> {$b/Id} <Custname>{$x}</Custname></order>

Order

Order

Id

Custname

data transformation wrappers32
Data TransformationWrappers
  • XML Views of Existing Relational Databases
    • XML View definition
      • Bottom-up (from the relational schema)
      • Top-Down (from a given XML schema)
    • Mappings between XML views and relational schemas
      • Automated (algorithm)
      • Manual (defined by the user)
data transformation wrappers33
Data TransformationWrappers
  • XML Views of Existing Relational Databases
    • Examples
data integration

Data Integration

Generic Integration Architecture

Schema Integration

Query Processing: multidatabase and federation

data integration35

Data Integration

Generic Integration Architecture

Schema Integration

data integration generic integration architecture36
Data IntegrationGeneric Integration Architecture
  • Schema Hierarchy

Homogenizes and unions import

schemas

Integrated

Schema

Common Model

Import

Schema 1

Import

Schema 2

Import

Schema 3

View on export schema available for

non-local access

Export

Schema 1

Export

Schema 2

Export

Schema 3

Unifies data models

Database

Schema 1

Database

Schema 2

Data

Schema 3

Local Models

DB1

DB2

File System

OO DBMS

Relational DBMS

data integration generic integration architecture37
Data IntegrationGeneric Integration Architecture
  • Component Architecture

Application 1

Application 2

Application 3

Integrated

Schema

Common DDL/DML

Meditor

Offers an abstract integrated view of sources

Reconciles independent data structures to yield a unique, coherent, view of the data

Import

Schema 1

Export

Schema 1

Wrapper

Wrapper

Wrapper

Local DDL/DML

Database

Schema 1

DBMS 1

DBMS 2

DBMS 3

Controls a local data source

Offers an homogeneous query interface based on a common data model

DB1

DB2

DB3

data integration generic integration architecture38
Data IntegrationGeneric Integration Architecture
  • Aspects to Consider for Integration
    • General Issues
      • Bottom-up vs. top-down engineering
        • From existing schema to integrated or vice-versa
        • Schema integration vs. schema matching
      • Virtual vs. materialized integration
      • Read-only vs. read-write access
      • Transparency
        • Language, schema, location
    • Data Model related issues
      • Types of sources
        • Structured, semi-structured, unstructured
      • Common data model of integrated system
      • Tight vs. loose integration
        • Use of a global schema
      • Query model
data integration schema integration
Data IntegrationSchema Integration
  • Methodology
    • Bottom-up process
    • Four main steps
      • Preparing the local schemas
      • Detecting what is common between the components of local schemas
        • Correspondence (what is common)
      • Solving the conflicts
        • Conflict (what is incompatible)
      • Integrating the different schemas according to the correspondences and conflicts detected in the previous steps
data integration schema integration40
Data IntegrationSchema Integration
  • Concept of Correspondence
    • Two complementary views of correspondence:
      • Structural correspondence (schema level: concepts)
      • Instance correspondence (instance level: data)
    • Structural correspondence
      • Five types of structural correspondence:
        • Identity
        • Independence
        • Complementarity
        • Subtyping
        • Common supertype
data integration schema integration41
Data IntegrationSchema Integration
  • Concept of Correspondence
    • Instance correspondence
      • Four types of instance correspondence:
        • Disjointed: the instances classes are disjointed
        • Inclusion: the set of one class is included to another class
        • Equivalence: the classes contain the same instances
        • Overlapping: the classes share some instances but not all
data integration schema integration42
Data IntegrationSchema Integration
  • Concept of Conflict
    • Conflicts occur in three possible ways : syntactic (naming conflicts), structural, semantic or instance
    • Syntactic conflicts (resolution: use of an ontology)
      • Synonyms. Two identical objects (entities, attributes, relationships) that have different names are synonyms
      • Homonyms. Two different objects that have identical names are homonyms
    • Structural conflicts (resolution: mapping function or transformation)
      • Domain. Two identical objects have different domains (Differences in dimension, units and scales)
      • Structure. The same concept is presented by different data structures (e.g., different attributes)
data integration schema integration43
Data IntegrationSchema Integration
  • Concept of Conflict
    • Structural conflict
      • In the left-hand schema, Address is an compound attribute, whereas in the right-hand one, Address is represented by an entity type
      • Resolution: transformation

Site 2

Site 1

data integration schema integration44
Data IntegrationSchema Integration
  • Concept of Conflict
    • Semantic conflicts
      • A semantic conflict appears when a contradiction appears between two representations A and B of the same application domain concept or between two integrity constraints (resolution?)
      • Example
        • In the left-hand schema, Customer is identified by CustId, whereas in the right-hand one, it is identified by Name

Site 1

Site 2

data integration schema integration45
Data IntegrationSchema Integration
  • Concept of Conflict
    • Instance conflicts
      • Instance conflicts are specific to existing data
      • Modelling constructs A and B that are recognized as corresponding can cover sets with different scopes
      • Examples
        • ZIP codes of addresses can be written like “NL-5600 MB” or “56oo MB” or “5600”
        • Different ZIP codes can be recorded for the same address (encoding errors)
        • Resolution: Data transforming… cleaning?
data integration46

Data Integration

Query Processing: multidatabase and federation

data integration integration architecture
Data IntegrationIntegration Architecture
  • Three Classical Architectures
    • Multidatabases
      • No integrated schema
      • Integrated access to different relational DBMS
    • Federated Databases
      • Integrated schema
      • Integrated access to different DBMS
      • Integrated access to different data sources (on the Web)
    • Data Warehouses
      • Materialized integrated data sources
      • Not here
data integration query processing
Data IntegrationQuery Processing
  • Classical Architecture: Multidatabase
    • Enable transparent access to multiple (relational) databases
      • Hides distribution, different SQL variants
      • Processes queries and updates against multiple databases (2-phase commit)
      • Does not provide any type of global schema (does not hide the different database schemas)
      • Example: IBM DataJoiner

DataJoiner

Sybase

Open Client

Oracle

SQL*Net

TCP/IP

Network

Sybase

Server

Oracle

Server

data integration query processing49
Data IntegrationQuery Processing
  • Classical Architecture: Multidatabase
    • Multidatabase schema

Multidatabase Schema

Sybase

Oracle

Source 1

Source 2

data integration query processing50
Data IntegrationQuery Processing
  • Classical Architecture: Multidatabase
    • Query processing

Multidatabase Schema

SELECT p2.title

FROM Sybase.PUBLICATIONS p1, Oracle.PAPERS p2

WHERE p1.title = p2.title

SELECT title

FROM PUBLICATIONS

SELECT title

FROM PAPERS

Sybase

Oracle

Source 1

Source 2

Sybase

Data

Oracle

Data

data integration query processing51
Data IntegrationQuery Processing
  • Classical Architecture: Multidatabase
    • Main properties
      • Transparency
        • Low level of transparency provided to the user

(The user is responsible for finding the relevant information, understanding each database schema, detecting and resolving the semantic conflicts, and finally, building the required view of the data in the sources)

      • Autonomy
        • Not intrusive against the autonomy of the data sources
        • Suitable when component systems are strongly autonomous
      • Methodology
        • Simplicity since there is no schema integration
      • Maintenance and evolution
        • No integrated schema maintenance
data integration query processing52
Data IntegrationQuery Processing
  • Classical Architecture: Federation
    • Integrated schema(s) and unique interface
      • Hides the semantic and location heterogeneity
      • Wrapper/Mediator hierarchy
        • Wrapper
          • Controls a local data source
          • Offers an homogeneous query interface based on a common data model
        • Mediator
          • Offers an abstract integrated view of several sources
          • Reconciles independent data structures to yield a unique, coherent, view of the data
    • Research projects
      • Tsimmis (Stanford)
      • Garlic (IBM)
      • Oasis (Dublin University)
data integration query processing53

Authors

Publication

ANR

PNR

Title

Title

FirstName

Authors

Surname

Journal

Affiliation

Pages

id:

ANR

id:

PNR

Data IntegrationQuery Processing
  • Classical Architecture: Federation
    • Typical example

Meditor

Views

Integrated schema

Import schemas

Wrapper (provides export schema)

Wrapper (provides export schema)

<complexType name=“Book”>

<element name=“title” type=“string”/>

<element name=“authors” type=“string”/>

<element name=“pages” type=“string”/>

</complexType>

<complexType name=“Book”>

<element name=“title” type=“string”/>

<element name=“author” type=“string”/>

</complexType>

<!ELEMENT Book(title,author)>

<!ELEMENT title(#PCDATA)>

<!ELEMENT author(#PCDATA)>

Oracle SQL DBMS

XML DBMS

data integration query processing54
Data IntegrationQuery Processing
  • Classical Architecture: Federation
    • Typical example

Views

  • <complexType name=“Book”>
  • <element name=“title” type=“string”/>
  • <element name=“author” type=“string”/>
  • </complexType>

Integrated schema

<complexType name=“Book”>

<element name=“title” type=“string”/>

<element name=“authors” type=“string”/>

</complexType>

<complexType name=“Book”>

<element name=“title” type=“string”/>

<element name=“author” type=“string”/>

</complexType>

Import schema DB2

Import schema DB1

<complexType name=“Book”>

<element name=“title” type=“string”/>

<element name=“authors” type=“string”/>

<element name=“pages” type=“string”/>

</complexType>

<complexType name=“Book”>

<element name=“title” type=“string”/>

<element name=“author” type=“string”/>

</complexType>

data integration query processing federation
Data IntegrationQuery Processing: Federation

Submit query Q

Return result A

Q = FOR $b IN //Book

RETURN $b/author

A1’={<author> … <\author>}

A = A1’  A2

Q1 = FOR $b IN //Book

RETURN $b/authors

Q2 = FOR $b IN //book

RETURN $b/author

A1

A2

A1=

{<authors> … <\authors>}

Q1

Q2

A2

Q1’ = SELECT a.name

FROM AUTHORS A

Q2’ = //book/author

A2=

{<author> … <\author>}

Q2’

Q1’

XML

DBMS

ORACLE

SQL DBMS

data integration query processing56
Data IntegrationQuery Processing
  • Classical Architecture: Federation
    • Main properties
      • Transparency
        • High level of transparency provided to the user. The user is not aware of the distribution and the heterogeneity of the integrated data sources
      • Autonomy
        • Each local data source have control over its sharable information
      • Methodology
        • Problems of defining an integrated schema
    • Web as Loosely Coupled Federation
      • Many different, widely distributed information systems
      • Heterogeneity
        • Structural homogeneous: XML
        • Semantically heterogeneous: no explicit schemas (ontology?)
      • Autonomy
        • Runtime autonomy: pages change on average every 4 weeks, dangling links
      • Distribution
        • Replication (proxies) and caching frequently used