CSE 636 Data Integration

XML Distributed Query Processing

Slides by Yannis Papakonstantinou

Overview
  • The Virtual XML View Approach towards Data Integration
  • Query Processing in XML Mediators
    • Issues Overview
    • An Algebra-Based Architecture
    • Navigation-driven Evaluation

Data Integration Requirements in eBusiness Applications

  • It starts with… “Provide to customers, partners, employees Application X”, where X may be in Business Intelligence, Customer Support, …
  • Then the problem comes up… “The application uses information assets widely distributed across my enterprise”
  • If only… “Give the application a single place to go to access all the information required. Requirements are evolving, so make sure the system can be easily maintained and upgraded”
View-Based Approach: Wrappers Export Basic Source Views

<customer_table>
  <customer>
    <name>John</name>
    <id>56</id>
    <city>Chicago</city>
  </customer>
  <customer>
    <name>George</name>
    <id>58</id>
    <city>Chicago</city>
  </customer>
</customer_table>

[Figure: the Client Application reads the Integrated (XML) View exported by the Mediator. The Mediator reads two (XML) Views, each exported by a Wrapper: one over the Customers Rel. DB, one over the Orders Rel. DB. The tree drawn on the slide is the customer_table document above.]

Wrappers Export Basic Source Views

order_table
  order
    id: 1034
    cid: 56
    item: chips
  order
    id: 1567
    cid: 56
    item: salsa

[Figure: Client Application over the Integrated (XML) View of the Mediator, over the (XML) Views of two Wrappers, over the Customers and Orders Rel. DBs.]

Mediators Export Integrated Views, Tailored to Application Needs

Integrated view:

customers
  customer
    name: John
    id: 56
    city: Chicago
    orders
      order
        id: 1034
        item: chips
      order
  customer

Source views:

order_table
  order
    id: 1034
    cid: 56
    item: chips
  order
    id: 1567
    cid: 56
    item: salsa

customer_table
  customer
    name: John
    id: 56
    city: Chicago
  customer
    name: George
    id: 58
    city: Chicago

[Figure: Client Application over the Integrated (XML) View of the Mediator, over the (XML) Views of two Wrappers, over the Customers and Orders Rel. DBs.]

Virtual Views: Query-Driven Mediator Operation

  • Application to Mediator: “Find all Chicago customer names, along with their ordered items”
  • Mediator to Customers Wrapper: “Retrieve Chicago customer names and id’s”
  • Mediator to Orders Wrapper: “Retrieve all cid’s and item names of orders”

[Figure: Application over Mediator, over two Wrappers: one on the Customers Database, one on the Orders Database.]
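
The decomposition on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the in-memory lists stand in for the two databases, and the function names `customers_wrapper`, `orders_wrapper`, and `mediator` are hypothetical, not part of any system described in the slides.

```python
# Hypothetical in-memory stand-ins for the two wrapped sources.
CUSTOMERS = [
    {"name": "John", "id": 56, "city": "Chicago"},
    {"name": "George", "id": 58, "city": "Chicago"},
]
ORDERS = [
    {"id": 1034, "cid": 56, "item": "chips"},
    {"id": 1567, "cid": 56, "item": "salsa"},
]


def customers_wrapper(city):
    """Source query 1: names and ids of the customers in one city."""
    return [{"name": c["name"], "id": c["id"]}
            for c in CUSTOMERS if c["city"] == city]


def orders_wrapper():
    """Source query 2: cid and item name of every order."""
    return [{"cid": o["cid"], "item": o["item"]} for o in ORDERS]


def mediator(city):
    """Combine the two source answers into the integrated result."""
    items_by_cid = {}
    for o in orders_wrapper():
        items_by_cid.setdefault(o["cid"], []).append(o["item"])
    return [{"name": c["name"],
             "ordered_items": items_by_cid.get(c["id"], [])}
            for c in customers_wrapper(city)]


result = mediator("Chicago")
```

Nothing is materialized ahead of time: the source queries run only when the application asks, which is the point of a virtual view.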

On-Demand (Query-Driven) Mediator Operation

Result returned to the Application:

customers
  customer
    name: John
    ordered_items
      item: chips
      item: salsa
  customer

Data returned by the Wrappers to the Mediator:

customer
  name: John
  id: 56
order
  cid: 56
  item: chips
order
  cid: 56
  item: salsa

[Figure: Application over Mediator, over two Wrappers: one on the Customers Database, one on the Orders Database.]

Multiple Plans are Possible
  • Retrieve customers
  • For each customer find matching orders
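
A minimal sketch of the plan listed above, assuming the Orders source accepts a lookup by `cid` (that capability, and the names `retrieve_customers` and `retrieve_orders_of`, are assumptions made for illustration). The `calls` list records every source request so the cost of the plan is visible.

```python
CUSTOMERS = [
    {"name": "John", "id": 56, "city": "Chicago"},
    {"name": "George", "id": 58, "city": "Chicago"},
]
ORDERS = [
    {"id": 1034, "cid": 56, "item": "chips"},
    {"id": 1567, "cid": 56, "item": "salsa"},
]
calls = []  # one entry per request sent to a source


def retrieve_customers(city):
    calls.append("customers")
    return [c for c in CUSTOMERS if c["city"] == city]


def retrieve_orders_of(cid):
    # Assumed capability: the Orders source can filter on cid.
    calls.append(f"orders(cid={cid})")
    return [o for o in ORDERS if o["cid"] == cid]


def plan_per_customer(city):
    """Retrieve customers, then find matching orders for each one."""
    out = []
    for c in retrieve_customers(city):
        items = [o["item"] for o in retrieve_orders_of(c["id"])]
        out.append((c["name"], items))
    return out


result = plan_per_customer("Chicago")
```

This plan sends one request per customer, whereas a plan that fetches all orders once and joins at the mediator sends two requests in total but may transfer orders nobody asked for. Which one is cheaper depends on the data, which is why plan choice matters.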

A New Kind of Query Processing Problem

  • Build and Run “Optimal” Plan
    • Consisting of operators that
      • Collect source info using supported queries and commands
      • Combine info into XML result

Challenges in Query Processing & Optimization

  • Operate within the Limited and Different Capabilities of the Sources
    • Describe sets of supported queries
    • Use most efficient supported queries
  • Optimize plans/queries sent to sources
    • Estimate Costs of Plans
    • Adapt Plans Along the Way
    • Beyond Conjunctive Queries
    • Compose Queries/Views Efficiently
  • Schema inference & optimization
  • Combine navigation & querying

From Limited Wrappers to Efficient Plans for Extended Query Sets

  • Answering Queries Using Views
  • But with Infinite Sets of Views
  • Increasing Relevance due to Web Services

[Figure labels: “Queries supported by mediator”, “Queries supported by wrapper”, “all queries over schema”, and “Source Data & Schema” (twice).]
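
One way to picture limited source capabilities is a wrapper that only accepts some query shapes, with the mediator pushing down what it can and finishing the rest itself. The sketch below assumes a very simple capability description (a set of attributes usable in an equality condition); real capability descriptions are richer, and every name here is hypothetical.

```python
ORDERS = [
    {"id": 1034, "cid": 56, "item": "chips"},
    {"id": 1567, "cid": 56, "item": "salsa"},
]

# Hypothetical capability description: attributes the wrapper can filter on.
SUPPORTED_CONDITIONS = {"cid"}


def wrapper_query(attr, value):
    """The only query shape the wrapper supports: attr = value."""
    if attr not in SUPPORTED_CONDITIONS:
        raise ValueError(f"wrapper cannot filter on {attr}")
    return [o for o in ORDERS if o[attr] == value]


def mediator_query(conditions):
    """Answer a conjunction of equalities that the wrapper alone cannot."""
    pushed = next((a for a in conditions if a in SUPPORTED_CONDITIONS), None)
    if pushed is None:
        raise ValueError("no supported source query covers this request")
    rows = wrapper_query(pushed, conditions[pushed])
    # Conditions the source cannot evaluate are applied at the mediator.
    rest = {a: v for a, v in conditions.items() if a != pushed}
    return [r for r in rows if all(r[a] == v for a, v in rest.items())]


result = mediator_query({"cid": 56, "item": "salsa"})
```

The mediator thus supports a larger query set than the wrapper does, at the price of extra filtering work on its side.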

Challenges in Query Processing & Optimization

  • Operate within the Limited and Different Capabilities of the Sources
    • Describe sets of supported queries
    • Use most efficient supported queries
  • Optimize plans/queries sent to sources
    • Estimate Costs of Plans
    • Adapt Plans Along the Way
    • Beyond Conjunctive Queries
    • XQuery processing
  • Schema inference & optimization
  • Combine navigation & querying
    • Build iterator models for low memory footprint

Navigation-Driven Evaluation of Query Result

[Figure: the integrated customers view (customer John, id 56, city Chicago, with orders) drawn over the order_table and customer_table source views, the same trees as on the slide “Mediators Export Integrated Views, Tailored to Application Needs”.]

Navigation-Driven Evaluation

  • Client: navigates the result from a node p with down(p) and right(p)
  • Lazy Mediator: holds the view definition ans = q( s1 … sn )
    • Input: client navigations
    • Output: source navigations
  • XML sources: s1 ... sn
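
To make the idea concrete, here is a small sketch of a lazy mediator for one hard-coded view (the names of all customers). It assumes node identifiers are index paths and that sources expose only `down` and `right`; the classes `Source` and `LazyMediator` are illustrative, not the deck's implementation.

```python
class Source:
    """A tree reachable only through down/right; logs every navigation."""

    def __init__(self, tree):
        self.tree = tree  # (label, [children])
        self.log = []

    def _node(self, path):
        node = self.tree
        for i in path:
            node = node[1][i]
        return node

    def label(self, path):
        return self._node(path)[0]

    def down(self, path):
        self.log.append(("down", path))
        return path + (0,) if self._node(path)[1] else None

    def right(self, path):
        self.log.append(("right", path))
        if not path:
            return None  # the root has no sibling
        parent, i = path[:-1], path[-1]
        return parent + (i + 1,) if i + 1 < len(self._node(parent)[1]) else None


class LazyMediator:
    """View: one result node per source customer, carrying its name."""

    def __init__(self, source):
        self.src = source

    def down(self, p):
        return self.src.down(()) if p == () else None

    def right(self, p):
        return self.src.right(p) if p else None

    def value(self, p):
        child = self.src.down(p)
        while child is not None and self.src.label(child) != "name":
            child = self.src.right(child)
        return None if child is None else self.src.label(self.src.down(child))


tree = ("customer_table", [
    ("customer", [("name", [("John", [])]), ("id", [("56", [])])]),
    ("customer", [("name", [("George", [])]), ("id", [("58", [])])]),
])
src = Source(tree)
med = LazyMediator(src)
p = med.down(())
first = med.value(p)
touched_second = any(path[:1] == (1,) for _, path in src.log)
second = med.value(med.right(p))
```

Until the client moves right, the source is never asked about the second customer, so work is proportional to what the client actually visits.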

Mixing Querying & Navigation

customers
  customer
    name: John
    id: 56
    city: Chicago
    orders
      order
        id: 1034
        item: chips
      order
  customer

Query issued during navigation: “Find details of all salsa orders below visited node”

Challenges in Mixing Querying & Navigation

  • Two-dimensional navigation
    • Reminiscent of cursors, but with multiple continuation points
  • Controlling size + shape
  • Contextualizing queries by navigation

An Algebra-Based Query Processor Architecture

  • Client: sends XQuery and Navigation Requests, receives Results
  • Translation to Algebra: takes the XQuery and the XQuery Views, produces an Algebra Plan
  • Rewriter/Optimizer: uses Source Schemas & Types and the Source Description, produces a Physical Algebra Plan
  • Plan Execution Engine: uses Functions and the Function Description, sends Queries & Fetch Requests to Sources

Query Processing on Tuple-Oriented Algebra Enables…

  • Well-known efficient physical implementations of the operators
  • Join optimization
  • Nested data by nested plans or group-by
  • Efficient iterator model
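
The iterator model mentioned in the last bullet can be sketched with Python generators. This is a generic illustration of pull-based operators (scan, select, hash join), not the engine described in the slides; the `log` list exists only to show which tuples were actually pulled.

```python
CUSTOMERS = [
    {"name": "John", "id": 56, "city": "Chicago"},
    {"name": "George", "id": 58, "city": "Chicago"},
]
ORDERS = [
    {"id": 1034, "cid": 56, "item": "chips"},
    {"id": 1567, "cid": 56, "item": "salsa"},
]


def scan(rows, log, name):
    for r in rows:
        log.append(name)  # record each tuple actually pulled
        yield r


def select(pred, child):
    return (t for t in child if pred(t))


def hash_join(left, right, lkey, rkey):
    table = {}
    for r in right:  # build side: consumed fully
        table.setdefault(r[rkey], []).append(r)
    for l in left:  # probe side: streamed one tuple at a time
        for r in table.get(l[lkey], []):
            yield (l, r)


log = []
plan = hash_join(
    select(lambda c: c["city"] == "Chicago", scan(CUSTOMERS, log, "customer")),
    scan(ORDERS, log, "order"),
    "id", "cid",
)
first = next(plan)
pulled_for_first = list(log)
```

Asking for the first result pulls only one customer tuple from the probe side, which is what keeps the memory footprint low.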
XQuery: Queries & Views for XML

<customers>
{
  for $cust in document("db")/customer
  return
    <customer>
    {
      $cust/id,
      for $order in document("db")/order
      where $order/cid = $cust/id
      return <order> { $order/id } </order>
    }
    </customer>
}
</customers>

Access and Navigation

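Plan, bottom-up, with the binding table each operator outputs:

  • source db, [$db1]
      $db1
      ct
  • getD $db1, customer → $cust
      $db1  $cust
      ct    c1
      ct    c2
  • getD $cust, id → $cust_id
      $db1  $cust  $cust_id
      ct    c1     i1
      ct    c2     i2

Source data (ct, c1, c2, i1, i2 are the node ids used in the tables):

db
  customer_table (ct)
    customer (c1)
      name: John
      id: 56 (i1)
    customer (c2)
      name: George
      id: 58 (i2)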
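
The two operators on this slide can be sketched as Python generators over binding tuples. The sketch assumes that getD follows a single child step by label, which is a simplification; the function names `source` and `get_d` and the tuple-as-dict representation are illustrative choices.

```python
def node(label, *children):
    """Tree nodes are (label, children); text values are leaf labels."""
    return (label, list(children))


DB = node(
    "customer_table",
    node("customer", node("name", node("John")), node("id", node("56"))),
    node("customer", node("name", node("George")), node("id", node("58"))),
)


def source(root, var):
    """source db, [$db1]: a single tuple binding var to the root."""
    yield {var: root}


def get_d(tuples, in_var, label, out_var):
    """getD in_var, label -> out_var: one output tuple per matching child."""
    for t in tuples:
        for child in t[in_var][1]:
            if child[0] == label:
                yield {**t, out_var: child}


plan = get_d(
    get_d(source(DB, "$db1"), "$db1", "customer", "$cust"),
    "$cust", "id", "$cust_id",
)
rows = list(plan)
ids = [t["$cust_id"][1][0][0] for t in rows]  # text under each id node
```

Each operator only adds a column to the tuples flowing through it, which is what makes the algebra tuple-oriented.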

Simplification Using Schema Inference

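Plan, bottom-up, with the binding table each operator outputs:

  • source db, [$db1]
      $db1
      ct
  • getD $db1, customer/id → $cust_id
      $db1  $cust_id
      ct    i1
      ct    i2

Since $cust_id  $cust and $cust is “useless” otherwise

Source data (ct, i1, i2 are the node ids used in the tables):

db
  customer_table (ct)
    customer
      name: John
      id: 56 (i1)
    customer
      name: George
      id: 58 (i2)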
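
A quick way to see that the simplified plan is equivalent is to run both forms and compare. The sketch below assumes getD accepts a slash-separated child path and that tuples are dicts; it checks that two getD steps with $cust dropped give the same table as one getD over customer/id.

```python
def node(label, *children):
    return (label, list(children))


DB = node(
    "customer_table",
    node("customer", node("name", node("John")), node("id", node("56"))),
    node("customer", node("name", node("George")), node("id", node("58"))),
)


def source(root, var):
    yield {var: root}


def get_d(tuples, in_var, path, out_var):
    """Follow a slash-separated child path from in_var; bind each hit."""
    steps = path.split("/")
    for t in tuples:
        frontier = [t[in_var]]
        for step in steps:
            frontier = [c for n in frontier for c in n[1] if c[0] == step]
        for n in frontier:
            yield {**t, out_var: n}


# Original plan: two getD operators, then $cust is projected away.
two_step = [
    {k: v for k, v in t.items() if k != "$cust"}
    for t in get_d(
        get_d(source(DB, "$db1"), "$db1", "customer", "$cust"),
        "$cust", "id", "$cust_id",
    )
]

# Simplified plan: a single getD with the composed path.
one_step = list(get_d(source(DB, "$db1"), "$db1", "customer/id", "$cust_id"))
```

The composed form never materializes the $cust column, so there is one operator less to run and narrower tuples to carry.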

Nested Plans

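Plan p (nested plan, reading its input through nestedSrc $part)

  • for $part
      Input:
        $db1  $cust_id
        ct    i1
        ct    i2
      Output, one partition per tuple:
        $db1  $cust_id  $part
        ct    i1        [ $db1 $cust_id : ct i1 ]
        ct    i2        [ $db1 $cust_id : ct i2 ]
  • nestedSrc $part (inside p), once per partition:
        $db1  $cust_id
        ct    i1
      and
        $db1  $cust_id
        ct    i2
  • apply $part, p → $orders
        $db1  $cust_id  $orders
        ct    i1        [o11…]
        ct    i2        [o21…]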
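
The interplay of apply and nestedSrc can be sketched as follows. This is one reading of the slide, stated as an assumption: apply runs the nested plan p once per input tuple on the partition bound to $part, and nestedSrc is the leaf through which p reads that partition. The contents of `plan_p` are hypothetical.

```python
ORDERS = [
    {"id": 1034, "cid": 56},
    {"id": 1567, "cid": 56},
]


def nested_src(partition):
    """Leaf of a nested plan: yields the tuples of the partition it is given."""
    yield from partition


def plan_p(partition):
    """Hypothetical nested plan p: ids of orders matching the partition."""
    return [o["id"]
            for t in nested_src(partition)
            for o in ORDERS
            if o["cid"] == t["$cust_id"]]


def apply(tuples, part_var, plan, out_var):
    """apply part_var, plan -> out_var: run plan on each tuple's partition."""
    for t in tuples:
        yield {**t, out_var: plan(t[part_var])}


outer = [
    {"$cust_id": 56, "$part": [{"$cust_id": 56}]},
    {"$cust_id": 58, "$part": [{"$cust_id": 58}]},
]
rows = list(apply(outer, "$part", plan_p, "$orders"))
```

The result keeps one tuple per customer, with the nested list in $orders, which mirrors the nesting of the XQuery result.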

Joins and Selections

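Operators in the nested plan, bottom-up:

  • source db, [$db2]
  • getD $db2, order → $order
  • getD $order, cid → $cust_id2
  • getD $order, id → $order_id
      Output columns: $db2 $order $cust_id2 $order_id
  • nestedSrc $part
      $db1  $cust_id
      ct    i1
  • Join of the two branches, with labels $cust_id and $cust_id2=?
      Output columns: $db1 $cust_id $db2 $order $cust_id2 $order_id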

Constructors

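Plan, bottom-up, with the binding table each operator outputs:

  • Input
      … $order_id
      … o1
      … o2
  • crList $order_id → $oidL
      … $order_id  $oidL
      … o1         [o1]
      … o2         [o2]
  • crEl order, $oidL → $oidE
      … $oidL  $oidE
      … [o1]   e1
      … [o2]   e2
      where e1 is an order element containing o1 and e2 is an order element containing o2
  • listify $oidE → $orders
      $orders
      [e1, e2]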
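
The three constructor operators can be mimicked in a few lines. The sketch assumes tuples are dicts and elements are (label, children) pairs; the function names follow the slide's operator names, but the implementation is only illustrative.

```python
def cr_list(tuples, in_var, out_var):
    """crList: wrap the value of in_var in a one-element list."""
    for t in tuples:
        yield {**t, out_var: [t[in_var]]}


def cr_el(tuples, label, in_var, out_var):
    """crEl: build an element with the given label and children list."""
    for t in tuples:
        yield {**t, out_var: (label, t[in_var])}


def listify(tuples, in_var, out_var):
    """listify: collapse all tuples into one, collecting in_var into a list."""
    return {out_var: [t[in_var] for t in tuples]}


rows = [{"$order_id": "o1"}, {"$order_id": "o2"}]
result = listify(
    cr_el(cr_list(rows, "$order_id", "$oidL"), "order", "$oidL", "$oidE"),
    "$oidE", "$orders",
)
```

Note that crList and crEl work tuple by tuple, while listify is the only operator here that has to see every input tuple before it can finish.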

Plan Decomposition
  • Within Rewriting Optimizer
  • Rules replacing “leaf” trees
  • May move commutable parts
  • Catch: No projection limitation

Replacing Nested Plans with GroupBy/Outerjoin Combinations

[Figure: two plan trees side by side. Both are rooted at apply $part, p → $R, where the nested plan p consists of p3 over nestedSrc $part. Remaining labels: “for $part”, “groupBy S(p1) → $part”, p1 (twice), p2 (twice).]
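
The intuition behind this rewriting can be checked on a toy example: evaluating a nested plan once per outer tuple gives the same nested result as a left outer join followed by a group-by. The sketch below is a generic illustration of that equivalence, not the exact rule on the slide; the data and function names are assumptions.

```python
CUSTOMER_IDS = [56, 58]
ORDERS = [
    {"id": 1034, "cid": 56},
    {"id": 1567, "cid": 56},
]


def nested(customers, orders):
    """Nested plan: for every customer, evaluate the inner plan again."""
    return [(c, [o["id"] for o in orders if o["cid"] == c]) for c in customers]


def outerjoin_groupby(customers, orders):
    """Same result from a left outer join followed by a group-by."""
    joined = []
    for c in customers:
        matches = [o["id"] for o in orders if o["cid"] == c] or [None]
        joined.extend((c, m) for m in matches)  # None pads unmatched customers
    groups = {}
    for c, oid in joined:
        groups.setdefault(c, [])
        if oid is not None:
            groups[c].append(oid)
    return list(groups.items())


a = nested(CUSTOMER_IDS, ORDERS)
b = outerjoin_groupby(CUSTOMER_IDS, ORDERS)
```

The outer join is needed rather than a plain join so that customers without orders still appear, with an empty list. Once the plan is in join form, the optimizer is free to pick any join algorithm or push the join to a source.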

Building Navigation-Driven Evaluation on the Algebra

[Figure: Client on top; two “Source access” operators at the bottom, each over a Source.]

Think of Each Operator as a Lazy Mediator

Operator: getD $cust, id → $cust_id

Input table:

  $db1  $cust
  ct    c1
  ct    c2

Output table:

  $db1  $cust  $cust_id
  ct    c1     i1
  ct    c2     i2

The output table viewed as a navigable tree:

root
  tuple
    $db1
      customer_table
        customer
          name: John
          id: 56
        customer
          name: George
          id: 58
    $cust: c1
    $cust_id: i1
  tuple
    $db1
    $cust: c2
    $cust_id: i2

Navigation-Driven Evaluation of Operators

  • Navigation commands augmented with
    • nextTuple(p)
    • p.attr
  • Lazy Operator
    • Input: client navigations on its result
    • Output: source navigations on s1 ... sn, each being the result of an operator below
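
A rough sketch of an operator that answers nextTuple(p) and p.attr lazily, under simplifying assumptions: tuple pointers are integer positions, customers are plain dicts, and each input tuple yields exactly one output tuple. The classes `LazyScan` and `LazyGetD` are hypothetical names.

```python
class LazyScan:
    """Bottom operator: hands out tuples one at a time and counts them."""

    def __init__(self, rows):
        self.rows = rows
        self.pulled = 0

    def next_tuple(self, p):
        """p is None before the first tuple, else the current position."""
        i = 0 if p is None else p + 1
        if i >= len(self.rows):
            return None
        self.pulled = max(self.pulled, i + 1)
        return i

    def attr(self, p, name):
        return self.rows[p][name]


class LazyGetD:
    """Lazy getD in_var, label -> out_var over the operator below."""

    def __init__(self, child, in_var, label, out_var):
        self.child, self.in_var = child, in_var
        self.label, self.out_var = label, out_var

    def next_tuple(self, p):
        # Client navigation is translated into navigation on the input.
        return self.child.next_tuple(p)

    def attr(self, p, name):
        if name == self.out_var:
            # The new column is computed only when somebody asks for it.
            return self.child.attr(p, self.in_var)[self.label]
        return self.child.attr(p, name)


scan = LazyScan([
    {"$cust": {"name": "John", "id": 56}},
    {"$cust": {"name": "George", "id": 58}},
])
op = LazyGetD(scan, "$cust", "id", "$cust_id")
p = op.next_tuple(None)
first_id = op.attr(p, "$cust_id")
pulled_after_first = scan.pulled
```

Each operator plays the role of a small lazy mediator: it receives navigations from above and issues navigations to the operator below, so a whole plan can be evaluated on demand.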

Use of Semantic Id’s in Navigation-Driven Evaluation

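[Figure: an operator receives a navigation command r/d(<f1, f2, …, fn>). The ids f1, f2, …, fn set the Operator State (V1, V2, …, Vn, Other: …). The operator then proceeds down/right, and the resulting Operator State produces the new id <f’1, f’2, …, f’n> from f’1, f’2, …, f’n.]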

Fragments Reduce the “Set State” – “Produce State” Overhead

Fragment tree with holes for the parts not yet fetched:

root
  customer
    name, “John”
    order
      oid, 123
      lineitem
      lineitem
      lineitem
      Hole 1
    Hole 2
  Hole 3

On the next slide, Hole 2 is gone; a new order with Hole 4 and a new Hole 5 appear:

root
  customer
    name, “John”
    order
      oid, 123
      lineitem
      lineitem
      lineitem
      Hole 1
    order ordnum=16
      lineitem
      lineitem
      Hole 4
    Hole 5
  Hole 3
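
A client-side view of fragments with holes might look like the sketch below. It assumes a hypothetical `Server.fill(hole_id)` call that returns the list of nodes replacing a hole (possibly ending in another hole); none of these names come from the slides.

```python
class Hole:
    """Placeholder for a part of the tree that has not been shipped yet."""

    def __init__(self, hole_id):
        self.id = hole_id


class Server:
    """Hypothetical server: maps a hole id to the fragment that fills it."""

    def __init__(self, fragments):
        self.fragments = fragments
        self.requests = 0

    def fill(self, hole_id):
        self.requests += 1
        return self.fragments[hole_id]


def children(node, server):
    """Iterate over a node's children, filling holes only when reached."""
    _, kids = node
    i = 0
    while i < len(kids):
        if isinstance(kids[i], Hole):
            kids[i:i + 1] = server.fill(kids[i].id)  # splice fragment in place
            continue
        yield kids[i]
        i += 1


order = ("order", [("oid", []), ("lineitem", []), Hole(1)])
server = Server({
    1: [("lineitem", []), Hole(2)],
    2: [("lineitem", [])],
})

it = children(order, server)
first_two = [next(it)[0], next(it)[0]]
requests_before = server.requests
rest = [n[0] for n in it]
```

Navigation inside an already shipped fragment costs no round trip; the server is contacted only when the client runs into a hole.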

Controlling the Size and Shape of Fragments

[Figure: a Client and a Controller connected by “Client-Server Interaction”; listify operators (labelled twice) in the plan; two “Source access” operators at the bottom, each over a Source.]

Fragmentation Strategies
  • Fixed Fragment Size
    • Ideal for depth-first, left-to-right navigation
  • Adaptive Fragment Size
    • Assign larger pieces to those who use them
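
To compare the two strategies, one can count the round trips needed to stream a long list. The doubling policy below is only one hypothetical reading of "adaptive" (grow the fragment while the client keeps asking, up to a cap); the sizes 10 and 160 are arbitrary assumptions.

```python
def round_trips(n_items, next_size):
    """Fetches needed to ship n_items when fetch k returns next_size(k) items."""
    sent, trips = 0, 0
    while sent < n_items:
        sent += next_size(trips)
        trips += 1
    return trips


def fixed(k):
    return 10  # every fragment carries 10 items


def adaptive(k):
    # Hypothetical policy: double the fragment on each further request,
    # capped at 160 items.
    return min(10 * 2 ** k, 160)


fixed_trips = round_trips(1000, fixed)
adaptive_trips = round_trips(1000, adaptive)
short_visit = (round_trips(5, fixed), round_trips(5, adaptive))
```

With these numbers a client that reads all 1000 items needs 100 fetches with fixed fragments and 10 with the adaptive policy, while a client that looks at only 5 items pays a single small fetch either way.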

Response Performance for Breadth-First and Depth-First

[Figure: response performance plots, one for depth-first traversal and one for breadth-first traversal.]

References
  • Navigation-Driven Evaluation of Virtual Mediated Views
    • Bertram Ludäscher, Yannis Papakonstantinou, Pavel Velikhov
    • EDBT 2000
  • Architecture and Implementation of an XQuery-based Information Integration Platform
    • Yannis Papakonstantinou, Vasilis Vassalos
    • IEEE Data Eng. Bull. 25(1), 2002
  • XML queries and algebra in the Enosys integration platform
    • Yannis Papakonstantinou, Vinayak R. Borkar, Maxim Orgiyan, Konstantinos Stathatos, Lucian Suta, Vasilis Vassalos, Pavel Velikhov
    • Data Knowl. Eng. 44(3), 2003