Answering tree pattern queries using views
Download
1 / 39

Answering Tree Pattern Queries Using Views - PowerPoint PPT Presentation


  • 132 Views
  • Uploaded on

Answering Tree Pattern Queries Using Views. Laks V.S. Lakshmanan , Hui (Wendy) Wang , and Zheng (Jessica) Zhao University of British Columbia Vancouver, BC Amazon.com. Outline. Motivation Problems Studied Without schema With schema Recursive schemas Related Work

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about ' Answering Tree Pattern Queries Using Views ' - jaimin


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Answering tree pattern queries using views

Answering Tree Pattern Queries Using Views

Laks V.S. Lakshmanan, Hui (Wendy) Wang, and Zheng (Jessica) Zhao

University of British Columbia

Vancouver, BC

Amazon.com


Outline
Outline

  • Motivation

  • Problems Studied

  • Without schema

  • With schema

  • Recursive schemas

  • Related Work

  • Summary & Future Work


Motivation 1 3
Motivation 1/3

  • Integration of existing data sources.

    • Local as view (LAV) – one of the well-known approaches.

    • Each source = a materialized view over some global database.

    • Answer to query over global DB = answer to query using (materialized) views.


Motivation 2 3
Motivation 2/3

<Trial> (3) <Patient> (4) John Doe </Patient> …

<Status> (10) Complete </Status> </Trial>

<Trial> (11) <Patient> (12) Jen Bloe </Patient> … </Trial>

<Trial> (14) <Patient> (15) Mary Moore </Patient> … </Trial>

Source = View “//Trials//Trial” over some DB containing

clinical data – trials, their status, patient data, etc.

Consider query Q: //Trials[//Status]//Trial over [unknown] original DB.

How can and should we answer it using above source?


Motivation 3 3

?

?

Motivation 3/3

<PharmaLab> (1)

<Trials @type=“T1”> (2)

<Trial> (3) <Patient> (4) John Doe </Patient> …

<Status> (10) Complete </Status> </Trial>

<Trial> (11) <Patient> (12) Jen Bloe </Patient> … </Trial>

</Trials>

<Trials @type=“T2”> (13)

<Trial> (14) <Patient> (15) Mary Moore </Patient> … </Trial>

</Trials>

</PharmaLab>

One possible original DB

//Trials//Trial

<Trial> (3) <Patient> (4) John Doe </Patient> …

<Status> (10) Complete </Status> </Trial>

<Trial> (11) <Patient> (12) Jen Bloe </Patient> … </Trial>

<Trial> (14) <Patient> (15) Mary Moore </Patient> … </Trial>


Motivation 3 31

?

?

Motivation 3/3

Q: //Trials[//Status]//Trial

<PharmaLab> (1)

<Trials @type=“T1”> (2)

<Trial> (3) <Patient> (4) John Doe </Patient> …

<Status> (10) Complete </Status> </Trial>

<Trial> (11) <Patient> (12) Jen Bloe </Patient> … </Trial>

</Trials>

<Trials @type=“T2”> (13)

<Trial> (14) <Patient> (15) Mary Moore </Patient> … </Trial>

</Trials>

</PharmaLab>

One possible original DB

//Trials//Trial

<Trial> (3) <Patient> (4) John Doe </Patient> …

<Status> (10) Complete </Status> </Trial>

<Trial> (11) <Patient> (12) Jen Bloe </Patient> … </Trial>

<Trial> (14) <Patient> (15) Mary Moore </Patient> … </Trial>


Motivation 3 32
Motivation 3/3

<PharmaLab> (1)

<Trials @type=“T1”> (2)

<Trial> (3) <Patient> (4) John Doe </Patient> …

<Status> (10) Complete </Status> </Trial>

<Trial> (11) <Patient> (12) Jen Bloe </Patient> … </Trial>

</Trials>

<Trials @type=“T2”> (13)

<Trial> (14) <Patient> (15) Mary Moore </Patient> … </Trial>

</Trials>

</PharmaLab>

One possible original DB

Contained rewriting

◦ “●[//Status]”

{ (3) }

//Trials//Trial

<Trial> (3) <Patient> (4) John Doe </Patient> …

<Status> (10) Complete </Status> </Trial>

<Trial> (11) <Patient> (12) Jen Bloe </Patient> … </Trial>

<Trial> (14) <Patient> (15) Mary Moore </Patient> … </Trial>


Problems studied 1 3
Problems Studied 1/3

  • Equivalent Rewriting: Given Q and views V, find an equivalent rewriting of Q using V, i.e., an expression E s.t. V◦E Ξ Q, over all possible input DBs.

    • Appropriate for query optimization.

  • Contained Rewriting: Given Q and V, find an expression E s.t. V◦E  Q overall all possible input DBs, and V◦E is maximal among all such rewritings.

    • Most appropriate for information integration [Halevy, Lenzerini, Pottinger & Halevy].


Problems studied 2 3
Problems Studied 2/3

  • No Schema: Given Q and V, find a maximally contained rewriting (MCR) of Q using V.

  • With Schema: Given Q and V, and a schema prescribing possible input DBs, find a maximally contained rewriting of Q using V.

  • Focus: Tree Pattern Queries (XP/,//, [ ]).

    Schema without cycles, union, and recursion.


Problems studied 3 3

//a

//a[//b]/c 

b

c

Problems Studied 3/3

  • Given Q & V:

    • RΞ V ◦ E Q.

Rewriting query

Compensation query

  • Want MCR in the absence and in the presence of a schema.


Without schema 1 6

/b

d

Q1

Without Schema 1/6

  • Question 1: Does an MCR always exist?

/a

/a

No MCR for

Q1 and for Q2.

What went

wrong?

b

b

c

d

Q2

V

distinguished (answer) node


Without schema 2 6

//Trials

Status

Patient

Without Schema 2/6

(1)

f

//Trials

//Trials

(1)

V

Trial

Trial

(2)

(3)

E

V

Q

Status

Patient

(2)

(3)

Unfulfilled obligations

Clip Away Tree (CAT)

f – useful embedding


Without schema 3 6
Without Schema 3/6

Theorem: Q, V – tree pattern queries. Then Q is answerable using V iff there is a useful embedding from Q to V.

Testing Existence of MCR:

//a

1

1,2

//a

a

2

1:{2}, 2:{}

1:{2,3}, 2:{3}

a

a

b

6

a

b

2:{6}

2:{6}, 3:{4}

3

b

b

c

6:{7}

c

4

7

c

4:{5}, 6:{7}

Q

5

c

V

d

e


Without schema 4 6
Without Schema 4/6

 Two embeddings – corresponding irredundant CRs.

//a

a

//a

b

a

a

b

b

c

a

need  for

expressing

MCR!

b

c

a

e

c

b

c

d

e

c

d


Without schema 5 6
Without Schema 5/6

  • Can test existence of MCR in poly time.

  • However, MCRs can be exponentially large (closure issue).

//a

//a

a

a

a

How many

irredundant CRs

are possible?

b

b

V

b

Q

c

c

c

d

e


Without schema 5 61
Without Schema 5/6

//a

//a

//a

a

a

a

a

b

b

V

b

b

Q

c

c

c

c

d

e

d

e


Without schema 5 62
Without Schema 5/6

//a

//a

//a

a

a

a

a

b

b

V

b

b

Q

c

c

c

c

d

e

a/b/c/e

d


Without schema 5 63
Without Schema 5/6

//a

//a

//a

a

a

a

a

b

b

V

b

b

Q

c

c

c

c

d

e

a/b

e

c

e


Without schema 5 64
Without Schema 5/6

//a

//a

//a

a

a

a

a

b

b

V

b

b

Q

c

c

c

c

d

e

a/b

a/b/c/e

MCR = union of exponential

# CRs in the worst case!

c

e


Without schema 6 6
Without Schema 6/6

  • Summary:

  • Can test existence of MCR in poly time.

    • Exact characterization.

  • MCR may be union of exponentially many CRs in the worst case.

  • Algorithm for generating MCR.


With schema 1 6
With Schema 1/6

  • Given Query Q, view V, schema S.

  • Infer all constraints C implied by S.

  • Chase V w.r.t. C.

  • Look for MCR of Q w.r.t. chased view.


With schema 2 6
With Schema 2/6

Auctions

*

  • E.g. constraints:

  • c_a has ≤ 1 bids

  • child

  • Every Auction having

  • a person desc also has

  • an item desc.

  • every path from

  • Auction to name

  • goes via bids.

Auction

*

?

open_auction

closed_auction

+

?

bids

+

+

person

item

name


With schema 3 6
With Schema 3/6

//Auction

//Auction

bids

bids

o_a

c_a

person

item

bids

bids

name

Q

V


With schema 3 61
With Schema 3/6

Auctions

//Auction

*

Auction

o_a

c_a

?

*

open_auction

closed_auction

bids

bids

+

?

bids

person

item

+

+

p

i

person

item

name

n

name


With schema 4 6
With Schema 4/6

//Auction

//Auction

o_a

c_a

bids

bids

person

item

bids

bids

name

person

item

p

i

Q

MCR = identity query.

name

n


With schema 5 6
With Schema 5/6

Another Example:

Auctions

*

Auction

//Auction

*

?

closed_auction

item

name

open_auction

Q

+

?

//Auction

item

bids

buyer

+

person

person

V

name

How to answer Q using V?


With schema 5 61
With Schema 5/6

Another Example:

Auctions

*

Auction

//Auction

*

?

closed_auction

item

name

open_auction

Q

+

?

//Auction

item

bids

buyer

+

person

item

person

So what’s the

compensation query?

name

name


With schema 5 62
With Schema 5/6

Another Example:

Auctions

*

Auction

//Auction

*

?

closed_auction

item

name

open_auction

Q

+

?

//Auction

item

bids

buyer

+

person

item

person

name

MCR = V ◦ “●//name”

name


With schema 6 6
With Schema 6/6

  • Challenges and Highlights:

    • Naïve chase can explode.

      • Make chase context aware.

  • Exact characterization of schema w/o recursion and union in terms of constraints.

  • Efficient algo. for inferring the constraints.

  • Efficient algo. for chase.

  • And for finding MCR.

  • MCR is unique, if it exists.


Recursive schemas 1 2
Recursive Schemas 1/2

a

//a

//a

?

b

b

b

b

*

*

V

d

c

c

d

Q

What is the MCR?


Recursive schemas 2 2
Recursive Schemas 2/2

a

//a

//a

?

b

b

b

b

*

*

V

d

c

c

d

Q

//a

b

c

d


Recursive schemas 2 21
Recursive Schemas 2/2

a

//a

//a

?

b

b

b

b

*

*

V

d

c

c

d

Q

//a

b

b

c

d


Recursive schemas 2 22
Recursive Schemas 2/2

a

//a

//a

?

b

b

b

b

*

*

V

d

c

c

d

Q

//a

b

b

d

c


Recursive schemas 2 23
Recursive Schemas 2/2

a

//a

//a

?

b

b

b

b

*

*

V

d

c

c

d

Q

//a

MCR = union of

four CRs.

Behavior similar

to no schema.

b

b

b

c

d


Related work 1 2
Related Work 1/2

  • QAV for relational – huge body of work [Halevy 01].

  • Regular path queries and semi-structured DBs [Grahne&Thomo 03, Calvenese 00,Papakonstantinou&Vassalos 99].

  • Equivalent rewrites for fragments of XQuery and XPath [Deutsch&Tannen 03, Tang&Zhou 05, Xu&Ozsoyoglu 05].


Related work 2 2
Related Work 2/2

  • Key differences b/w equivalent & contained rewriting:

  • Unique rewriting (even w/o schema).

  • MCR may involve union of (possibly exponentially many) CRs.

  • Study of contained rewriting in presence of schema.

  • Lot of work on semantic caching [Chen+ 02], heuristics for using materialized views for optimizing XPath [Balmin+ 04], mine views worth materializing, XPath containment, … .


Summary future work 1 2
Summary & Future Work 1/2

  • QAV using (maximally) contained rewriting ( information integration).

  • Without schema: existence, characterization, closure, generation of MCR.

  • With Schema: extract essence using constraints, chase, similar problems as above.

  • Impact of recursion.

  • Experiments.


Summary future work 2 2
Summary & Future Work 2/2

  • Impact of wildcard, disjunction, order …

  • Impact of union, recursion, …

  • Other integration models (e.g., GLAV)

  • QAV for XQuery.


ad