Lessons from the TSIMMIS Project

Yannis Papakonstantinou

Department of Computer Science & Engineering

University of California, San Diego


Overview

  • TSIMMIS’ goals, technical challenges, and solutions

  • Insufficiencies of the TSIMMIS framework

  • Going forward


Information Resides on Heterogeneous Information Sources

Sources: personal database, Ticker Tape, WWW, Dialog

  • different interfaces

  • different data representations

  • redundant and conflicting information


Goal: System Providing Integrated View of Heterogeneous Data

Integration System

  • collects and combines information

  • provides integrated view, uniform user interface

Sources: personal database, Ticker Tape, WWW, Dialog


The Wrapper and Mediator Architecture

Diagram: a Client queries a Mediator, which exports “portfolios for each company” in the Common Data Model. The Mediator draws on two Wrappers that export “stock market prices” and “business reports” over the Ticker Tape and Dialog sources.


The Data Warehousing Approach to Integration

Diagram: the Client queries a Stored Integrated View. The view is maintained by a Mediator that sits on two Wrappers over the Ticker Tape and Dialog sources.


The Lazy Integration Approach

Diagram: the Client asks the Mediator for the IBM portfolio. The Mediator performs Query Decomposition, Translation and Result Fusion: it requests the IBM price and IBM related reports (in common model) from two Wrappers, and the Wrappers pass the requests (for example, IBM related reports) on to the Ticker Tape and Dialog sources.


Wrappers & Mediators from High-Level Specifications

Diagram: the Client talks to a Mediator, which the Mediator Specification Interpreter derives from a Mediator Specification. Each Wrapper is derived by the Wrapper Generator from a Wrapper Specification and sits on top of a Source.


Challenge: Sources Without a Well-Structured Schema

  • semistructured

    • irregular

    • deeply nested

    • cross-referenced

  • incomplete schema knowledge

    • autonomous

    • dynamic

Examples

  • HTML pages

  • SGML documents

  • genome data

  • chemical structures

  • bibliographic information

  • results of the integration process


Challenge: Different and Limited Source Capabilities

Diagram: the Client sends “retrieve IBM data” to the Mediator, whose view is U = A + B. The Mediator forwards “retrieve IBM data” to both Wrapper (A) and Wrapper (B).


Mediator Has to Adapt to Query Capabilities of Sources

Diagram: the Client sends “retrieve IBM data” to the Mediator (U = A + B). Source (A) does not allow selection, so the Mediator sends “retrieve everything” to Wrapper (A) and “retrieve IBM data” to Wrapper (B).


Part B

  • Semistructured Data Representation

  • Mediator Generation

  • Wrapper Generation

  • Capabilities-Based Rewriting


Representation of Semistructured Information using OEM

<http://www/~doe, faculty, {&f1,&l1,&r1}>

<&f1, first_name, “John”>

<&l1, last_name, “Doe”>

<&r1, rank, “professor”>

Callouts on the slide:

  • http://www/~doe is a semantic object-id

  • &f1, &l1, &r1 are structural object-ids

  • faculty is a label

  • {&f1,&l1,&r1} is a set value

  • “professor” is an atomic value
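The OEM triples above can be pictured as a small in-memory structure. The following is only a sketch: the class name `OEMObject`, the `store` dictionary, and the helper `children` are hypothetical names chosen for illustration and are not part of TSIMMIS.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple, Union

# An OEM value is either atomic (a string here) or a set of object-ids.
Value = Union[str, FrozenSet[str]]


@dataclass(frozen=True)
class OEMObject:
    oid: str      # object-id: semantic (a URL) or structural (e.g. "&f1")
    label: str    # e.g. "faculty", "first_name"
    value: Value  # atomic value, or the ids of the subobjects


objects = [
    OEMObject("http://www/~doe", "faculty", frozenset({"&f1", "&l1", "&r1"})),
    OEMObject("&f1", "first_name", "John"),
    OEMObject("&l1", "last_name", "Doe"),
    OEMObject("&r1", "rank", "professor"),
]

# Index the objects by id so that set values can be dereferenced.
store: Dict[str, OEMObject] = {o.oid: o for o in objects}


def children(oid: str) -> List[Tuple[str, Value]]:
    """Return (label, value) pairs of the subobjects, sorted by label."""
    obj = store[oid]
    if isinstance(obj.value, str):  # atomic objects have no subobjects
        return []
    pairs = [(store[c].label, store[c].value) for c in obj.value]
    return sorted(pairs, key=lambda pair: pair[0])
```

Calling `children("http://www/~doe")` walks one level of the graph. Nothing in the structure fixes which labels an object must have, which is the point of a self-describing model.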


Graph Representation of OEM Data

<http://www/~doe, faculty, {&f1,&l1,&r1}>

<&f1, first_name, “John”>

<&l1, last_name, “Doe”>

<&r1, rank, “professor”>

Graph: a node http://www/~doe labeled faculty, with edges to three leaf nodes:

  • first_name “John”

  • last_name “Doe”

  • rank “professor”


OEM Structures Represent Arbitrary Labeled Graphs

Graph nodes, in the order listed on the slide:

  • http://www/~smith faculty

    • name “Mary Smith”

    • project “Air DB”

    • paper

      • author: name “John Doe”

      • author: name “Mary Smith”

      • title “Thin Air DB”

  • http://www/~doe faculty

    • first_name “John”

    • last_name “Doe”

    • rank “professor”


Overview

  • Semistructured Data Representation

  • Mediator Generation

    • Example of mediator specification

    • Language expressiveness

    • Implementation and performance

  • Wrapper Generation

  • Capabilities-Based Rewriting


Merge Information Relating to a Faculty

Integrated view: faculty {name “John Doe”, rank “professor”, birthday “April 1”, papers ...}

s1: faculty {name “John Doe”, rank “professor”, papers ...}

s2: person {name “John Doe”, birthday “April 1”}


Mediator Specification Example

Integrated view: faculty {name “John Doe”, rank “professor”, birthday “April 1”, papers ...}

<N faculty {<L V>}> :- <faculty {<name N> <L V>}>@s1

<N faculty {<L V>}> :- <person {<name N> <L V>}>@s2

s1: faculty {name “John Doe”, rank “professor”, papers ...}

s2: person {name “John Doe”, birthday “April 1”}


Mediator Specification Example: Semantics of Rule Bodies

Integrated view: faculty {name “John Doe”, rank “professor”, birthday “April 1”, papers ...}

<N faculty {<L V>}> :- <faculty {<name N> <L V>}>@s1

<N faculty {<L V>}> :- <person {<name N> <L V>}>@s2

s1: faculty {name “John Doe”, rank “professor”, papers ...}

s2: person {name “John Doe”, birthday “April 1”}


Mediator Specification Example: Semantics of Rule Heads

Integrated view: object “John Doe”, faculty {name “John Doe”, rank “professor”, birthday “April 1”, papers ...}

<N faculty {<L V>}> :- <faculty {<name N> <L V>}>@s1

<N faculty {<L V>}> :- <person {<name N> <L V>}>@s2

s1: faculty {name “John Doe”, rank “professor”, papers ...}

s2: person {name “John Doe”, birthday “April 1”}


Incrementally Add to Semantically Identified Object

Integrated view: object “John Doe”, faculty {name “John Doe”, rank “professor”, birthday “April 1”, papers ...}

<N faculty {<L V>}> :- <faculty {<name N> <L V>}>@s1

<N faculty {<L V>}> :- <person {<name N> <L V>}>@s2

s1: faculty {name “John Doe”, rank “professor”, papers ...}

s2: person {name “John Doe”, birthday “April 1”}


Irregularities & Incomplete Schema Knowledge

Integrated view:

  • object “John Doe”, faculty {name “John Doe”, rank “professor”, birthday “April 1”, papers}

  • object “Mary Smith”, faculty {name “Mary Smith”, project “Air DB”}

<N faculty {<L V>}> :- <faculty {<name N> <L V>}>@s1

s1:

  • faculty {name “John Doe”, rank “professor”, papers}

  • faculty {name “Mary Smith”, project “Air DB”}

s2: person {name “John Doe”, birthday “April 1”}


Second Rule Attaches More Subobjects to View Objects

Integrated view: object “John Doe”, faculty {name “John Doe”, rank “professor”, birthday “April 1”, papers ...}

<N faculty {<L V>}> :- <faculty {<name N> <L V>}>@s1

<N faculty {<L V>}> :- <person {<name N> <L V>}>@s2

s1: faculty {name “John Doe”, rank “professor”, papers ...}

s2: person {name “John Doe”, birthday “April 1”}
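The effect of the two rules can be mimicked in a few lines of Python. This is a rough sketch under simplifying assumptions, not the MSL interpreter: sources are modeled as lists of (label, subobjects) pairs, the papers subobject is left out, and the function name `apply_rule` is hypothetical. The variable N plays the role of the semantic object-id, so objects from different sources that share a name are fused into one view object.

```python
from collections import defaultdict

# Hypothetical source contents: (top-level label, [(sublabel, value), ...]).
s1 = [
    ("faculty", [("name", "John Doe"), ("rank", "professor")]),
    ("faculty", [("name", "Mary Smith"), ("project", "Air DB")]),
]
s2 = [
    ("person", [("name", "John Doe"), ("birthday", "April 1")]),
]


def apply_rule(view, source, top_label):
    """Mimic: <N faculty {<L V>}> :- <top_label {<name N> <L V>}>@source."""
    for label, subobjects in source:
        if label != top_label:
            continue
        # N binds to the value of each name subobject.
        for n in [v for (l, v) in subobjects if l == "name"]:
            # <L V> binds to every subobject, whatever its label is,
            # so no schema knowledge beyond "name" is needed.
            for (l, v) in subobjects:
                view[n].add((l, v))


# View objects are keyed by N, so both rules add to the same object.
view = defaultdict(set)
apply_rule(view, s1, "faculty")
apply_rule(view, s2, "person")
```

After both calls, `view["John Doe"]` holds the rank from s1 and the birthday from s2, while `view["Mary Smith"]` keeps its irregular project subobject even though no rule mentions that label.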


Language Expressiveness

  • Information fusion problems solved by MSL

    • Irregularities

    • Incomplete knowledge of source structure

    • Transformation of cross-referenced structures

    • Inconsistent and redundant data

    • Use of arbitrary matching criteria

  • Theoretical analysis of expressiveness

    • Consider the relational representation of OEM graphs. Then MSL is equivalent to “SQL + special form of transitive closure”


Inconsistent and Redundant Information

Integrated view: object “John Doe”, faculty {name “John Doe”, rank “associate”, rank “assistant”}

<N faculty {<L V>}> :- <faculty {<name N> <L V>}>@s1

<N faculty {<L V>}> :- <person {<name N> <L V>}>@s2 AND NOT <faculty {<name N> <L V1>}>@s1

s1: faculty {name “John Doe”, rank “associate”}

s2: person {name “John Doe”, rank “assistant”}
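Read literally, the negated condition in the second rule lets a subobject from s2 through only when s1 has no subobject with the same label for the same name, so s1 takes precedence on conflicts. The sketch below illustrates that reading. The dictionaries keyed by name and the function name `fuse` are assumptions made for the example.

```python
# Hypothetical source contents, keyed by the value of the name subobject.
s1 = {"John Doe": [("name", "John Doe"), ("rank", "associate")]}  # faculty
s2 = {"John Doe": [("name", "John Doe"), ("rank", "assistant")]}  # person


def fuse(s1, s2):
    """Fuse s1 and s2, preferring s1 whenever both provide a label."""
    view = {}
    # Rule 1: everything from s1 goes into the view.
    for n, subobjects in s1.items():
        view.setdefault(n, set()).update(subobjects)
    # Rule 2: a subobject <L V> from s2 is added only if s1 has no
    # subobject with label L for the same name (the AND NOT condition).
    for n, subobjects in s2.items():
        s1_labels = {l for (l, _) in s1.get(n, [])}
        for (l, v) in subobjects:
            if l not in s1_labels:
                view.setdefault(n, set()).add((l, v))
    return view
```

With the data above, `fuse(s1, s2)` keeps rank “associate” from s1 and drops the conflicting rank “assistant” from s2. A label that only s2 provides, such as a birthday, would still be added.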


Overview

  • Semistructured Data Representation

  • Mediator Generation

    • Example of mediator specification

    • Language expressiveness

    • Implementation and performance

  • Wrapper Generation

  • Capabilities-Based Rewriting


Mediator Specification Interpreter Architecture

Diagram: a Query and the Mediator Specification enter the Query Rewriter, which produces a logical datamerge program. The Cost-Based Optimizer turns the program into a plan. The Datamerge Engine executes the plan, sending Queries to Wrappers, collecting their Results, and returning the Result.


Query Rewriting When the Origins of Information Are Known

  • <N faculty {<salary S>}> :- <faculty {<name N> <salary S>}>@s1
    <N faculty {<rank R>}> :- <person {<name N> <rank R>}>@s2

  • <well-paid {<name N> <salary X>}> :- <N faculty {<salary X> <rank assistant>}> AND X>65000


Query Rewriter Pushes Conditions to Sources

  • <N faculty {<salary S>}> :- <faculty {<name N> <salary S>}>@s1
    <N faculty {<rank R>}> :- <person {<name N> <rank R>}>@s2

  • <well-paid {<name N> <salary X>}> :- <N faculty {<salary X> <rank assistant>}> AND X>65000

  • logical datamerge program: <well-paid {<name N> <salary X>}> :- (<faculty {<name N> <salary X>}> AND X>65000)@s1 AND <person {<name N> <rank assistant>}>@s2
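The logical datamerge program evaluates each condition at the source that owns the data and leaves only the join on N to the mediator. A minimal sketch of that division of labor follows. The sample rows, the stand-in wrapper call `query_source`, and the function `well_paid` are hypothetical.

```python
# Hypothetical contents of the two sources.
s1_faculty = [
    {"name": "Ann", "salary": 70000},
    {"name": "Bob", "salary": 60000},
    {"name": "Cy", "salary": 90000},
]
s2_person = [
    {"name": "Ann", "rank": "assistant"},
    {"name": "Bob", "rank": "assistant"},
    {"name": "Cy", "rank": "professor"},
]


def query_source(rows, predicate):
    """Stand-in for a wrapper call: the predicate is applied at the source."""
    return [r for r in rows if predicate(r)]


def well_paid():
    # X > 65000 is pushed to s1, rank = assistant is pushed to s2.
    from_s1 = query_source(s1_faculty, lambda r: r["salary"] > 65000)
    from_s2 = query_source(s2_person, lambda r: r["rank"] == "assistant")
    # Only the join on name remains for the mediator.
    assistants = {r["name"] for r in from_s2}
    return [
        {"name": r["name"], "salary": r["salary"]}
        for r in from_s1
        if r["name"] in assistants
    ]
```

Here only two rows leave each source instead of three, and the mediator never sees Bob's salary or Cy's rank.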


Passing Bindings & Local Join Plans

Passing Bindings

  • s1: <salary X> :- <faculty {<name $N> <salary X>}> AND X>65000

  • s2: <name N> :- <person {<rank assistant>}>

  • each N returned by s2 is passed to s1 as the parameter $N

Local Join

  • s1: <a {<s X> <n N>}> :- <faculty {<name N> <salary X>}> AND X>65000

  • s2: <name N> :- <person {<rank assistant>}>

  • the two results are joined on N
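The two plans return the same answer but differ in how the sources are called. The sketch below contrasts them on toy data. All names (`FACULTY`, `PERSON`, `s1_salary_for`, and so on) are hypothetical stand-ins for the source queries shown above.

```python
FACULTY = [  # hypothetical contents of s1
    {"name": "Ann", "salary": 70000},
    {"name": "Bob", "salary": 60000},
    {"name": "Cy", "salary": 90000},
]
PERSON = [  # hypothetical contents of s2
    {"name": "Ann", "rank": "assistant"},
    {"name": "Bob", "rank": "assistant"},
    {"name": "Cy", "rank": "professor"},
]


def s2_assistants():
    """<name N> :- <person {<rank assistant>}>"""
    return [p["name"] for p in PERSON if p["rank"] == "assistant"]


def s1_salary_for(name):
    """Parameterized: <salary X> :- <faculty {<name $N> <salary X>}> AND X>65000"""
    return [f["salary"] for f in FACULTY
            if f["name"] == name and f["salary"] > 65000]


def s1_well_paid():
    """<a {<s X> <n N>}> :- <faculty {<name N> <salary X>}> AND X>65000"""
    return [(f["salary"], f["name"]) for f in FACULTY if f["salary"] > 65000]


def passing_bindings():
    # One call to s1 per name that s2 returns.
    return sorted((n, x) for n in s2_assistants() for x in s1_salary_for(n))


def local_join():
    # One call per source, then a join on N at the mediator.
    names = set(s2_assistants())
    return sorted((n, x) for (x, n) in s1_well_paid() if n in names)
```

Which plan is cheaper depends on how many bindings s2 produces and how expensive each call to s1 is, which is the kind of decision left to the cost-based optimizer.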


Query Decomposition When the Origins of Information Are Unknown

<N faculty {<L V>}> :- <faculty {<name N> <L V>}>@s1

<N faculty {<L V>}> :- <person {<name N> <L V>}>@s2

<X faculty {<S Y>}> :- <X faculty {<birthday “1/20”> <S Y>}>


Plan Considers All Possible Sources of birthday

<N faculty {<L V>}> :- <faculty {<name N> <L V>}>@s1

<N faculty {<L V>}> :- <person {<name N> <L V>}>@s2

<X faculty {<S Y>}> :- <X faculty {<birthday “1/20”> <S Y>}>

Diagram: sources s1 and s2, each annotated with both name and birthday.


Overview

  • Semistructured-Data Representation

  • Mediator Generation

  • Wrapper Generation

  • Capabilities-Based Rewriting


Query Translation in Wrappers

Incoming queries:

SELECT * FROM person

SELECT * FROM person WHERE name=“Smith”

Wrapper components: Query Translator, Result Translator

Native commands sent to the Source:

find -all

find -n Smith


Rapid Query Translation Using Templates and Actions

Incoming queries:

SELECT * FROM person

SELECT * FROM person WHERE name=“Smith”

Templates and actions, processed by the Template Interpreter:

SELECT * FROM person {emit “find -all”}

SELECT * FROM person WHERE name=$N {emit “find -n $N”}

Native commands sent to the Source (results come back through the Result Translator):

find -all

find -n Smith
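A template interpreter of this kind can be approximated with regular expressions: each template becomes a pattern in which a placeholder such as $N captures a constant, and the action is instantiated with the captured bindings. This is only a sketch of the idea, not the TSIMMIS wrapper toolkit. The names `TEMPLATES`, `template_to_regex`, and `translate` are hypothetical, and constants are assumed to be optionally wrapped in straight double quotes.

```python
import re

# (template, action) pairs; $N is a placeholder that binds a constant.
TEMPLATES = [
    ("SELECT * FROM person", "find -all"),
    ("SELECT * FROM person WHERE name=$N", "find -n $N"),
]


def template_to_regex(template):
    """Compile a template into a regex with one named group per placeholder."""
    parts = re.split(r"(\$[A-Z]\w*)", template)
    pattern = ""
    for part in parts:
        if part.startswith("$"):
            # A constant, optionally enclosed in straight double quotes.
            pattern += r'"?(?P<%s>[^"\s]+)"?' % part[1:]
        else:
            pattern += re.escape(part)
    return re.compile(pattern)


def translate(query):
    """Return the native command for a query, or None if unsupported."""
    query = " ".join(query.split())  # normalize whitespace
    for template, action in TEMPLATES:
        match = template_to_regex(template).fullmatch(query)
        if match:
            command = action
            for name, value in match.groupdict().items():
                command = command.replace("$" + name, value)
            return command
    return None
```

For instance, `translate('SELECT * FROM person WHERE name="Smith"')` yields `find -n Smith`, and a query that matches no template is reported as unsupported rather than guessed at.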


Description of Infinite Sets of Supported Queries

  • uses recursive nonterminals

  • Example:

    • job description contains word w1 and word w2 and ...

    • SELECT subset(person) FROM person WHERE \CJob
      \CJob: job LIKE $W AND \CJob
      \CJob: TRUE
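Because the nonterminal \CJob refers to itself, two productions describe conditions with any number of LIKE conjuncts. A recursive-descent check makes this concrete. The sketch assumes whitespace-separated tokens and single-word constants, and the function names `match_cjob` and `supported` are hypothetical.

```python
PREFIX = "SELECT subset(person) FROM person WHERE "


def match_cjob(tokens):
    """Match the CJob nonterminal.

    CJob: job LIKE $W AND CJob
    CJob: TRUE

    Returns the list of words bound to $W, or None if there is no match.
    """
    if tokens == ["TRUE"]:
        return []
    if (len(tokens) >= 4 and tokens[0] == "job"
            and tokens[1] == "LIKE" and tokens[3] == "AND"):
        rest = match_cjob(tokens[4:])  # recursive use of the nonterminal
        if rest is not None:
            return [tokens[2]] + rest
    return None


def supported(query):
    """Return the searched words if the source supports the query, else None."""
    if not query.startswith(PREFIX):
        return None
    return match_cjob(query[len(PREFIX):].split())
```

A query with two conjuncts, `... WHERE job LIKE db AND job LIKE web AND TRUE`, is accepted with the bindings `["db", "web"]`, and so is one with twenty. No finite list of templates could cover them all.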


Overview

  • Semistructured-Data Representation

  • Mediator Generation

  • Wrapper Generation

  • Capabilities-Based Rewriting


Capabilities-Based Rewriter in Mediator Architecture

Diagram: the Query and the Mediator Specification enter the Query Rewriter, which produces a logical datamerge program. The Capabilities-Based Rewriter takes that program together with each Wrapper's Supported Queries Description and outputs supported plans. The Cost-Based Optimizer picks the optimal plan, which the Datamerge Engine executes.


Capabilities-Based Rewriter Finds Supported Plans

Query: SELECT * FROM A WHERE salary>65000

Supported Queries: SELECT * FROM A


Capabilities-Based Rewriter Finds Most-Selective Supported Plans

Query: SELECT * FROM B WHERE salary>65000

Supported Queries:

SELECT * FROM B WHERE salary>65000

SELECT * FROM B
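The two cases, source A and source B, can be captured by a small planning function: send the most selective query that the wrapper supports and apply whatever condition is left over at the mediator. This is a deliberately simplified sketch. Real capability descriptions are grammars rather than a set of pairs, and the names `SUPPORTED` and `plan` are hypothetical.

```python
# Hypothetical capability table: (source, condition the wrapper accepts).
# A condition of None means "SELECT * FROM source" without a WHERE clause.
SUPPORTED = {
    ("A", None),
    ("B", None),
    ("B", "salary>65000"),
}


def plan(source, condition, supported=SUPPORTED):
    """Pick the most selective supported query for source and condition.

    Returns a dict naming the query to send and the filter, if any,
    that the mediator must still apply. Returns None when the source
    supports no usable query.
    """
    if (source, condition) in supported:
        # The source can evaluate the condition itself.
        return {"send": (source, condition), "mediator_filter": None}
    if (source, None) in supported:
        # Fall back to retrieving everything and filtering at the mediator.
        return {"send": (source, None), "mediator_filter": condition}
    return None
```

For source A the plan fetches everything and filters at the mediator. For source B both plans are supported, and the one that pushes the selection to the source is preferred because it transfers less data.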


Capabilities-Based Rewriter Architecture

Diagram: the Query and the Query Capabilities Description feed Component SubQuery Discovery, which yields Component SubQueries. Plan Construction builds Plans (not fully optimized), and Plan Refinement produces Algebraically optimal plans.


What TSIMMIS Achieved

  • system for integration of heterogeneous sources

  • challenges and solutions

    • semistructured data & incomplete schema knowledge

      • appropriate specification language and query processing algorithms

    • limited and different query capabilities

      • query translation algorithm

      • capabilities-based query rewriting algorithm


Overview

  • TSIMMIS’ goals, technical challenges, and solutions

  • Insufficiencies of the TSIMMIS framework

  • Going forward


Insufficiencies of the TSIMMIS Framework

  • OEM was really unstructured data

    • some loose and partial schematic info may pay off tremendously

  • too “databasy” user/mediator/source interaction


Overview

  • TSIMMIS’ goals, technical challenges, and solutions

  • Insufficiencies of the TSIMMIS framework

  • Going forward


Web Emerges as a Distributed DB and XML as Its Data Model

Diagram: the XMAS Query Language operates over three sets of XML View Document(s). Beneath them the slide shows a Data Source, a Wrapper, a Native XML Database, and a Legacy Source. Sources also export:

1. Schemas & Metadata (XML-Data, RDF,…)

2. Description of supported queries


Definition of Integrated Views

Diagram: a Mediator exports an Integrated XML View, defined by a View Definition in XMAS over the XML View Document(s) of three Data Sources.


Non-Materialized Views in the MIX Mediator System

Diagram: the Blended Browsing & Querying (BBQ) GUI and the Application send an XMAS query to the MIX Mediator and receive an XML document, accessed through a DOM for Virtual XML Doc’s. They also receive the Integrated View DTD. Inside the MIX Mediator, the View Definition in XMAS is used by DTD Inference and by the Query Processor. The mediator sits on two XML Sources, which supply a Source DTD.


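Diagram: detailed MIX architecture. The Application and the Blended Browsing & Querying (BBQ) GUI use the DOM (VXD) Client API, sending an XMAS Query and receiving XML Document Fragments and the View DTD. Inside the MIX Mediator, the XMAS Mediator View Definition feeds Resolution, which yields the Unfolded Query. The slide then lists DTD Inference, Simplification, Translation to Algebra, Optimization, and Execution. The mediator sends XMAS Queries to the sources and receives XML Document Fragments. Sources shown: XML Source 1, XML Source 2, and an RDB2XML Wrapper over an RDB.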

