data integration n.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
Data integration PowerPoint Presentation
Download Presentation
Data integration

Loading in 2 Seconds...

play fullscreen
1 / 58

Data integration - PowerPoint PPT Presentation


  • 145 Views
  • Uploaded on

Data integration. Most slides are borrowed from Dr. Chen Li, UC Irvine. Motivation. Biblio sever. Legacy database. Plain text files. Support seamless access to autonomous and heterogeneous information sources. Comparison Shopping. Applications. Comparison shopping. Lowest price of the

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

Data integration


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
data integration

Data integration

Most slides are borrowed from Dr. Chen Li, UC Irvine

motivation
Motivation

Biblio sever

Legacy database

Plain text files

Support seamless access to autonomous and heterogeneous information sources.

applications

Comparison Shopping

Applications
  • Comparison shopping

Lowest price of the

DVD: “The Matrix”?

  • Supply-chain management

Buyer 1

Supplier 1

Buyer 2

Supplier 2

Integrator

Supplier M

Buyer M

mediation architecture
Mediation architecture

Mediator

Wrapper

Wrapper

Wrapper

Source 1

Source 2

Source n

challenges
Challenges
  • Sources are heterogeneous:
    • Different data models: relational, object-oriented, XML, …
    • Different schemas and representations. E.g.,

“Keanu Reeves” or “Reeves, Keanu” or “Reeves, K.” etc.

  • Describe source contents
  • Use source data to answer queries
  • Sources have limited query capabilities
  • Data quality
  • Performance
  • … …
research projects
Research projects
  • Garlic (IBM),
  • Information Manifold (AT&T)
  • InfoSleuth (MCC),
  • Tsimmis, InfoMaster (Stanford)
  • Internet Softbot/Razor/Tukwila (U Wash.)
  • Hermes (Maryland)
  • Telegraph / Eddies (UC Berkeley)
  • Niagara (Univ Wisconsin)
  • DISCO, Agora (INRIA, France)
  • SIMS/Ariadne (USC/ISI)
  • Emerac/Havasu (ASU)
industry
Industry
  • Nimble Technology
  • Enosys Markets
  • IBM
  • BEA
virtual integration
Virtual integration
  • Leave the data in the sources
  • When a query comes in:
    • Determine the relevant sources to the query
    • Break down the query into sub-queries for the sources
    • Get the answers from the sources, filter them if needed and combine them appropriately
  • Data is fresh
  • Otherwise known as

On Demand Integration

Slides from Dr. Michalis Petropoulos

virtual integration architecture

Wrapper

Wrapper

Virtual Integration Architecture

Design-Time

Run-Time

Mapping Tool

Query

Reformulation

Query

Result

End User

Mediation

Language

Optimization

& Execution

Mediator

Global

Schema

Web Services

XML

1

Data

Source

Data

Source

Local

Schema

Local

Schema

Slides from Dr. Michalis Petropoulos

virtual integration architecture1

Wrapper

Wrapper

Virtual Integration Architecture

Design-Time

Run-Time

Mapping Tool

Query

Reformulation

Query

Result

End User

Mediation

Language

Optimization

& Execution

Mediator

Global

Schema

Web Services

2

XML

1

Data

Source

Data

Source

Local

Schema

Local

Schema

Slides from Dr. Michalis Petropoulos

virtual integration architecture2

Wrapper

Wrapper

Virtual Integration Architecture

Design-Time

Run-Time

Mapping Tool

Query

Reformulation

Query

Result

End User

Mediation

Language

Optimization

& Execution

3

Mediator

Global

Schema

Web Services

2

XML

1

Data

Source

Data

Source

Local

Schema

Local

Schema

Slides from Dr. Michalis Petropoulos

virtual integration architecture3

Wrapper

Wrapper

Virtual Integration Architecture

Design-Time

Run-Time

Mapping Tool

Query

Reformulation

4

Query

Result

End User

Mediation

Language

Optimization

& Execution

3

Mediator

Global

Schema

Web Services

2

XML

1

Data

Source

Data

Source

Local

Schema

Local

Schema

Slides from Dr. Michalis Petropoulos

virtual integration architecture4

Wrapper

Wrapper

Virtual Integration Architecture

Design-Time

Run-Time

Mapping Tool

Query

Reformulation

4

Query

Result

5

End User

Mediation

Language

Optimization

& Execution

3

Mediator

Global

Schema

Web Services

2

XML

1

Data

Source

Data

Source

Local

Schema

Local

Schema

Slides from Dr. Michalis Petropoulos

virtual integration architecture5
Virtual Integration Architecture

Design-Time

Run-Time

Mapping Tool

Query

Reformulation

4

Query

Result

5

End User

Mediation

Language

Optimization

& Execution

3

6

Mediator

Global

Schema

Web Services

2

XML

1

Wrapper

Wrapper

Data

Source

Data

Source

Local

Schema

Local

Schema

Slides from Dr. Michalis Petropoulos

outline
Outline
  • Basics: theories of conjunctive queries
  • Global-as-view (GAV) approach to data integration
  • Local-as-view (LAV) approach to data integration
conjunctive queries cq s in datalog

head

Conjunctive Queries (CQ’s) in Datalog
  • Most common form of query; equivalent to select-project-join (SPJ) queries
  • Useful for data integration
  • Form: q(X) :- p1(X1), p2(X2),…, pn(Xn).
  • Head q(X) represents the query answers
  • Body p1(X1), p2(X2),…, pn(Xn)represents the query conditions
    • The head is true if all the subgoals are true.
    • Each pi(Xi) is called a subgoal. Xi is a vector of variables or constants.
    • Shared variables represent join conditions
    • Constants represent “Attribute=const” selection conditions
    • A relation can appear in multiple predicates (subgoals)

body

q(X) :- p1(X1), p2(X2), …, pn(Xn)

subgoals

conjunctive queries
Conjunctive queries
  • Head and subgoals are atoms.
  • An atom consists of a predicate applied to zero or more arguments
  • Predicates represent relations.
  • An atom is true for given values of its variables iff the arguments form a tuple of the relation.
  • Whenever an assignment of values to all variables makes all subgoals true, the rule asserts that the resulting head is also true.
conjunctive queries example
Conjunctive Queries: example
  • Schema

student(name, courseNum), course(number, Instructor)

  • SQL

SELECT name

FROM student, course

WHERE student.courseNum=course.number AND instructor=‘Li’;

  • Equal to:

ans(SN) :- student(SN, CN), course(CN,’Li’).

    • Predicates student and course correspond to relations names
    • Two subgoals: student(SN, CN) andcourse(CN,’Li’)
    • Variables: SN, CN. Constant: ‘Li’
    • Shared variable, CN, corresponds to “student.courseNum=course.number”
    • Variable SN in the head: the answer to the query
why not sql
Why not SQL
  • Datalog is more concise
  • Let us state some general principles
    • e.g., containment of rules that are almost impossible to state correctly in SQL.
    • Will see that later
  • Recursion is much easier to express in Datalog.
answer to a cq
Answer to a CQ
  • For a CQ Q on database D, the answer Q(D) is a set of heads of Q if we:
    • Substitute constants for variables in the body of Q in all possible ways
    • Require all subgoals to be true
  • Example: ans(SN) :- student(SN, CN), course(CN,’Li’).
    • Tuples are also called facts:

student(Jack, 184), student(Tom,215), …, course(184,Li), course(215,Li), …

    • Answer “Jack”: SNJack,CN184
    • Answer “Tom”: SNTom,CN215
    • Answer “Jack”: SNJack,CN215 (duplicate eliminated)

Course

Student

query containment
Query containment
  • For two queries Q1 and Q2, we say Q1 is contained in Q2, denoted Q1Q2, if any database D, we have Q1(D)Q2(D).
  • We say Q1 and Q2 are equivalent, denoted Q1Q2, if Q1(D)Q2(D) and Q2(D)Q1(D).
  • Example:

Q1: ans(SN) :- student(SN, CN), course(CN, ’Li’).

Q2: ans(SN) :- student(SN, CN), course(CN, INS).

We have: Q1(D)Q2(D).

another example
Another example

Q1: p(X,Y) :- r(X,W), b(W,Z), r(Z,Y).

Q2: p(X,Y) :- r(X,W), b(W,W), r(W,Y).

  • We have: Q2 Q1
  • Proof:
    • For any DB D, suppose p(x,y) is in Q2(D). Then there is a w such that

r(x,w), b(w,w), and r(w,y) are in D.

    • For Q1, consider the substitution: X x, W w, Z w, Y y.
    • Thus the head of Q1 becomes p(x,y), meaning that p(x,y) is also in Q1(D).
  • In general, how to test containment of CQ’s?
    • Containment mappings
test containment
Test containment
  • Two approaches:
    • Containment mappings.
    • Canonical databases.
  • Really the same in the simple CQ case covered so far.
  • Containment test is NP-complete, but CQ’s tend to be small so here is one case where intractability doesn’t hurt you.
containment mappings
Containment mappings
  • A containment mapping from Q2 to Q1: Map variables of Q2 to variables of Q1, such that:
    • Head of Q2 becomes head of Q1;
    • Each subgoal of Q2 becomes somesubgoal of Q1.
      • It is not necessary that every subgoal of Q1 is the target of some subgoal of Q2.
  • Q1 Q2iff there is a containment mapping from Q2 to Q1.
    • Note that the containment mapping is opposite the containment --- it goes from the larger (containing CQ) to the smaller (contained CQ).
  • Example:

Q1: p(X,Y) :- r(X,W), b(W,Z), r(Z,Y).

Q2: p(X,Y) :- r(X,W), b(W,W), r(W,Y).

    • Containment mapping from Q1 to Q2: X  X, Y  Y, W  W, Z  W
    • No containment mapping from Q2 to Q1:
      • For b(W,W) in Q2, its only possible target in Q1 is b(W,Z)
      • However, we cannot have a mapping WW and WZ, since each variable cannot be mapped to two different variables
a slightly different example
A slightly different example
  • Predicates can be repeated

Q1: p(X,Y):- r(X,Z), g(Z,Z), r(Z,Y).

Q2: p(A,B):- r(A,C), g(C,D), r(D,B).

Containment mapping m:m(A)=X;m(B)=Y;m(C)=m(D)=Z.

another example1

X

Y

Z

Another Example

Q1: p(X,Y):- r(X,Y), g(Y,Z).

Q2: p(A,B):- r(A,B), r(A,C).

Q1 looks for:

Q2 looks for:

A

B

C

example continued

And not

every subgoal

need be a

target.

Notice two

subgoals can

map to one.

Example - Continued

Q1: p(X,Y):- r(X,Y), g(Y,Z).

Q2: p(A,B):- r(A,B), r(A,C).

Containment mapping:m(A)=X;m(B)=m(C)=Y.

example concluded
Example - Concluded
  • Q1: p(X,Y):- r(X,Y), g(Y,Z).
  • Q2: p(A,B):- r(A,B), r(A,C).
  • No containment mapping from Q1 to Q2.
    • g(Y,Z) cannot map anywhere, since there is no gsubgoal in Q2.
  • Thus, Q1 properly contained in Q2.
extending cq s
Extending CQ’s
  • CQ’s with built-in predicates:
    • We can add more conditions to variables in a CQ.
    • Example:

student(name, GPA, courseNum), course(number,instructor,year)

Q1(SN) :- student(SN, G, CN), course(CN, ’Li’), G>=3.5.

Q2(SN) :- student(SN, G, CN), course(CN, ’Li’), G>=3.5, Y < 2002.

Q2(SN)  Q1(SN).

  • Datalog queries:
    • a (possibly infinite) set of CQ’s with (possibly) recursion
    • Example: parent(Parent, Child)
    • Query: finding all ancestors of Tom

ancestor(P,C) :- parent(P, C).

ancestor(P,C) :- ancestor(P,X), parent(X, C).

result(P) :- ancestor(P, ‘tom’).

slide30
Although CQ theory first appeared at a database conference, the AI community has taken CQ’s to heart.
  • CQ’s, or similar logics like description logic, are used in a number of AI applications.
    • Again, their design theory is really containment and equivalence.
outline1
Outline
  • Basics: theories of conjunctive queries
  • Global-as-view (GAV) approach to data integration
  • Local-as-view (LAV) approach to data integration
gav approach to data integration
GAV approach to data integration
  • Readings:
    • Jeffrey Ullman, Information Integration Using Logical Views, ICDT 1997.
    • RamanaYerneni, Chen Li, Hector Garcia-Molina, and Jeffrey Ullman, Computing Capabilities of Mediators, SIGMOD 1999.
global as view approach
Global-as-view Approach

med(Dealer,City,Make,Year) = R1 R2

Mediator

R1(Dealer,City)

R2(Dealer, Make, Year)

  • Mediator exports views defined on source relations
  • med(Dealer,City,Make,Year) = R1 R2
  • A query is posted on mediator views:
    • SELECT * FROM med
    • WHERE Year = ‘2001’;
    • ans(D,C,M, ‘2001’) :- med(D,C,M,‘2001’).
  • Mediator expands query to source queries:
  • SELECT * FROM R1, R2
  • WHERE Year = ‘2001’;
  • ans(D,C,M,’2001’) :- R1(D,C), R2(D,M, ‘2001’).
gav approach
GAV Approach
  • Project: TSIMMIS at Stanford
  • Advantages:
    • User queries are easy to define
    • Query transformation generation is straightforward
  • Disadvantages:
    • Not all source information is exported:
    • Not easily scalable: every time a new source is added, mediator views need to be changed.
  • Research issues
    • Efficient query execution?
    • Deal with limited source capabilities?
limited source capabilities
Limited source capabilities
  • Complete scans of relations not possible
  • Reasons:
    • Legacy databases or structured files: limited interfaces
    • Security/Privacy
    • Performance concerns
  • Example 1: legacy databases with restrictive interfaces

title

author

Given an author,

return the books.

Ullman

DBMS

TeX

Knuth

problems
Problems
  • How to describe source restrictions?
  • How to compute mediator restrictions from sources?
  • How to answer queries efficiently given these restrictions?
  • How to compute as many answers as possible to a query?
computing mediator restrictions
Computing mediator restrictions
  • Motivation: do not want users to be frustrated by submitting a query that cannot be answerable by the mediator
  • Example:
    • Source 1: book(author?, title, price)
      • Capability: “bff”. b—boufnd, f--free
      • i.e., we must provide an author, and can get title and price info
    • Source 2: review(title?, reviewer, rate)
      • Capability: “bff”
      • i.e., we must provide a book title, and can get other info
    • Mediator view:

MedView(A?,T,P,RV,RT) :- book(A,T,P),review(T,RV,RT).

    • Query on the mediator view:
      • Ans(RT) :- MedView(A, ‘db’, P, RV, RT).
      • I.e., “find the review rates of DB books”
    • But the mediator cannot answer this query, since we do not know the authors.
    • There are various ways to compute the mediator view
  • We want to tell the user beforehand what queries can be answered
outline2
Outline
  • Basics: theories of conjunctive queries;
  • Global-as-view (GAV) approach to data integration;
  • Local-as-view (LAV) approach to data integration.
local as view lav approach
Local-as-view (LAV) approach

Mediator

sources

  • There are global predicates, e.g., “car,” “person,” “book,” etc.
  • They can been seen as mediator views
  • The content of each source is described using these global predicates
  • A query to the mediator is also defined on the global predicates
  • The mediator finds a way to answer the query using the source contents
example
Example

Mediator

S1(Dealer,City)

S2(Dealer,Make,Year)

  • Global predicates: Loc(Dealer,City),Sell(Dealer,Make,Year)
  • Source content defined on global predicates:
    • S1(Dealer,City) :- Loc(Dealer, City).
    • S2(Dealer,Make,Year) :- Sell(Dealer, Make, Year).
    • In general, each definition could be more complicated, rather than direct copies.
  • Queries defined on global predicates.
  • Q: ans(D,M,Y) :- Loc(D, ’windsor’), Sell(D, M, Y).
    • Users do not know source views.
  • The mediator decides how to use source views to answer queries.
    • “Answering queries using views”:
    • ans(D, M, Y) :- S1(D,’windsor’), S2(D,M,Y).
answering queries using views
Answering queries using views

Mediator

Query

V1(Dealer,City):- Loc(Dealer, City).

V2(Dealer,Make,Year):-Sell(Dealer, Make, Year).

V3(D,C,M,Y) :- Loc(D,C),Sell(D,M,Y).

V4(D,C,M,Y) :- Loc(D,C),Sell(D,M,Y), Y<1970.

  • Source views can be complicated: SPJs, arithmetic comparisons,…
  • Not easy to decide how to answer a query using source views
    • Query: ans(D,M) :- Loc(D,‘windsor'), Sell(D,M,Y).
    • Rewriting:
    • ans(D,M) :- V3(D,‘windsor’, M,Y).
    • ans(D,M) :- V1(D,’windsor’), V2(D,M,Y).
    • “Equivalent rewriting”: compute the “same” answer as the query
    • A rewriting can join multiple source views
arithmetic comparisons
Arithmetic comparisons

Mediator

V(D,C,M,Y):- Loc(D,C),Sell(D,M,Y),Y<1970.

  • Comparisons can make the problem even trickier
  • Query: ans(D,M) :- Loc(D,‘windsor'), Sell(D,M,Y).
    • Rewriting: ans(D,M) :- V(D,‘windsor’, M,Y).
    • Contained rewriting: only retrieve cars before 1970.
  • Query: ans(D,M):- Loc(D, ‘windsor'), Sell(D,M,Y), Y < 1960.
    • Rewriting: ans(D,M) :- V(D,‘windsor’, M, Y), Y < 1960.
local as view lav

Create View R5 AS

SELECT B.ISBN, B.Title

FROM Book B

WHERE B.Genre = ‘Humor’

R5(ISBN, Title)

:-Book(ISBN, Title, ‘humor’, Year).

Local-as-View (LAV)
  • Create View R1 AS
  • SELECT B.ISBN, B.Title, A.Name
  • FROM Book B, Author A
  • WHERE A.ISBN = B.ISBN
  • AND B.Year < 1970
    • R1(ISBN, Title, Name):-
    • Book(ISBN, Title, Genre,Year),
    • Author(ISBN, Name), Year<1970.

Global Schema

Book

ISBN

Title

Genre

Year

Author

ISBN

Name

Humor Books

Books before 1970

Source

1

Source

2

Source

3

Source

4

Source

5

Local Schema

Local Schema

Local

Schema

Local

Schema

Local

Schema

R5

ISBN

Title

R1

ISBN

Title

Name

lav details
LAV details
  • Query: Find authors of humor books

Q(Name):-Book(ISBN,Title,”humor”,YEAR), Author(ISBN, Name)

  • Views:

R1(ISBN, Title, Name):-

Book(ISBN, Title, Genre,Year), Author(ISBN, Name), Year<1970.

R5(ISBN, Title) :-Book(ISBN, Title, ‘humor’, Year).

  • Rewriting of Q using views:

Q’(Name):-R1(ISBN, Title, Name), R2(ISBN, Title)

  • Expansion of Q’

Q’’(Name):- Book(ISBN, Title, Genre,Year), Author(ISBN, Name), Year<1970,

Book(ISBN, Title, ‘humor’, Year).

Q’’’(Name):- Author(ISBN, Name), Year<1970, Book(ISBN, Title, ‘humor’, Year).

  • Q’’’ is contained in Q
query rewritings
Query Rewritings
  • Given a query Q and a set of views V:
    • A conjunctive query P is called arewritingof Q using V if P only uses views in V, and P computes a partial answer of Q. That is: PexpQ. A rewriting is also called acontained rewriting (CR).
    • A conjunctive query P is called an “equivalent rewriting” (ER) of Q using V if P only uses views in V, and P computes the exact answer of Q. That is: Pexp Q.
  • How to rewrite a query?
    • Bucket algorithm is the starting point
bucket algorithm
Bucket algorithm
  • It is the basic method for query rewriting
  • Each subgoal must be “covered” by some view
  • Make a list of candidates (buckets) per query subgoal
  • Consider combinations of candidates from different buckets
  • Not all combos are “compatible”
  • Keep the compatible ones and minimize them
  • Discard the ones contained in another
  • Take their union
the bucket algorithm example
The Bucket Algorithm: Example

//Students taking 500 or higher courses after aug 98.

V1(Std,Crs,Qtr,Title) :- reg(Std,Crs,Qtr), course(Crs,Title), Crs≥ 500, Qtr ≥ Aut98

//students and profs

V2(Std,Prof,Crs,Qtr):- reg(Std,Crs,Qtr), teaches(Prof,Crs,Qtr)

//students before aug 94

V3(Std,Crs):- reg(Std,Crs,Qtr), Qtr ≤ Aut94

//profs before aug 97

V4(Prof,Crs,Title,Qtr):- reg(Std,Crs,Qtr), course(Crs,Title),

teaches(Prof,Crs,Qtr), Qtr ≤ Aut97

//students, profs after aug 95, courses higher than 300.

q(S,C,P):- teaches(P,C,Q), reg(S,C,Q), course(C,T),

C ≥ 300, Q ≥ Aut95

Step 1: For each query subgoal, put the relevant sources into a bucket

the bucket algorithm example1
The Bucket Algorithm: Example

V1(Std,Crs,Qtr,Title) :- reg(Std,Crs,Qtr), course(Crs,Title),

Crs ≥ 500, Qtr ≥ Aut98

V2(Std,Prof,Crs,Qtr) :- reg(Std,Crs,Qtr), teaches(Prof,Crs,Qtr)

V3(Std,Crs) :- reg(Std,Crs,Qtr), Qtr ≤ Aut94

V4(Prof,Crs,Title,Qtr) :- reg(Std,Crs,Qtr), course(Crs,Title),

teaches(Prof,Crs,Qtr), Qtr ≤ Aut97

q(S,C,P) :- teaches(P,C,Q), reg(S,C,Q), course(C,T),

C ≥ 300, Q ≥ Aut95

PProf, CCrs, QQtr

Note: Arithmetic predicates don’t pose a problem in this step

Buckets

teaches

reg

course

V2

V4

the bucket algorithm example2
The Bucket Algorithm: Example

V1(Std,Crs,Qtr,Title):- reg(Std,Crs,Qtr), course(Crs,Title),

Crs ≥ 500, Qtr ≥ Aut98

V2(Std,Prof,Crs,Qtr) :- reg(Std,Crs,Qtr), teaches(Prof,Crs,Qtr)

V3(Std,Crs) :- reg(Std,Crs,Qtr), Qtr ≤ Aut94

V4(Prof,Crs,Title,Qtr) :- reg(Std,Crs,Qtr), course(Crs,Title),

teaches(Prof,Crs,Qtr), Qtr ≤ Aut97

q(S,C,P) :- teaches(P,C,Q), reg(S,C,Q), course(C,T),

C ≥ 300, Q ≥ Aut95

SStd, CCrs, QQtr

Note: V3 doesn’t work: arithmetic predicates not consistent

V4 doesn’t work: S not in the output of V4

Buckets

teaches

reg

course

V2

V1

V4

V2

the bucket algorithm example3
The Bucket Algorithm: Example

V1(Std,Crs,Qtr,Title) :- reg(Std,Crs,Qtr), course(Crs,Title),

Crs ≥ 500, Qtr ≥ Aut98

V2(Std,Prof,Crs,Qtr) :- reg(Std,Crs,Qtr), teaches(Prof,Crs,Qtr)

V3(Std,Crs) :- reg(Std,Crs,Qtr), Qtr ≤ Aut94

V4(Prof,Crs,Title,Qtr) :- reg(Std,Crs,Qtr), course(Crs,Title),

teaches(Prof,Crs,Qtr), Qtr ≤ Aut97

q(S,C,P) :- teaches(P,C,Q), reg(S,C,Q), course(C,T),

C ≥ 300, Q ≥ Aut95

CCrs, TTitle

Buckets

teaches

reg

course

V2

V1

V1

V4

V2

V4

the bucket algorithm example4
The Bucket Algorithm: Example
  • Step 2:
  • Try all combos of views, one each from a bucket
  • Test satisfaction of arithmetic predicates in each case
    • e.g., two views may not overlap, i.e., they may be inconsistent
  • Desired rewriting = union of surviving ones
  • Query rewriting 1:
  • q1(S,C,P) :- V2(S’,P,C,Q), V1(S,C,Q,T’), V1(S”,C,Q’,T)
    • no problem from arithmetic predicates (none in V2)
    • May or may not be minimal (why?)

teaches

reg

course

V2

V1

V1

V4

V2

V4

the bucket algorithm example5
The Bucket Algorithm: Example
  • Unfolding of rewriting 1:
  • q1’(S,C,P) :- r(S’,C,Q), t(P,C,Q), r(S,C,Q), c(C,T’),r(S”,C,Q’),
  • c(C,T), C ≥ 500, Q ≥ Aut98, C ≥ 500, Q’ ≥ Aut98
  • Black r’s can be mapped to green r:S’S, S”S, Q’Q
  • Black c can be mapped to green c: just extend above mapping to TT’
  • Minimized unfolding of rewriting 1:
  • q1m’(S,C,P) :- t(P,C,Q), r(S,C,Q), c(C,T’), C ≥ 500, Q ≥ Aut98
  • Minimized rewriting 1:
  • q1m(S,C,P) :- V2(S’,P,C,Q), V1(S,C,Q,T’)
the bucket algorithm example6
The Bucket Algorithm: Example

teaches

reg

course

  • Query Rewriting 2:
  • q2(S,C,P) :- V2(S’,P,C,Q), V1(S,C,Q,T’), V4(P’,C,T,Q’)
  • q2’(S,C,P) :- r(S’,C,Q), t(P,C,Q), r(S,C,Q),
  • r(S,C,Q), c(C,T’), C ≥ 500, Q ≥ Aut98,
  • r(S”,C,Q’), c(C,T), t(P’,C,Q’), Q’ ≤ Aut97
  • This combo is infeasible: consider the conjunction of arithmetic predicates in V1 and V4
  • Query rewriting 3:
  • q3(S,C,P) :- V2(S’,P,C,Q), V2(S,P’,C,Q), V4(P”,C,T,Q’)

V2

V1

V1

V4

V2

V4

teaches

reg

course

V2

V1

V1

V4

V2

V4

the bucket algorithm example7
The Bucket Algorithm: Example
  • Unfolding of rewriting 3:
  • q3’(S,C,P) :- r(S’,C,Q),t(P,C,Q), r(S,C,Q), t(P’,C,Q), r(S”,C,Q’),
  • c(C,T), t(P”,C,Q’), Q’ ≤ Aut97
  • The green subgoals can cover the black ones under the mapping: S’S, S”S, P’P, P”P, Q’Q
  • Minimized rewriting 3:
  • q3m(S,C,P) :- V2(S,P,C,Q), V4(P,C,T,Q)
  • Verify that there are only two rewritings that are not covered by others
  • Maximally Contained Rewriting:
  • q’ = q1m  q3m
the bucket algorithm example 2
The Bucket Algorithm: Example 2

Query:

q(X) :- cites(X,Y), cites(Y,X), sameTopic(X,Y)

Views:

V4(A) :- cites(A,B), cites(B,A)

V5(C,D) :- sameTopic(C,D)

V6(F,H) :- cites(F,G), cites(G,H), sameTopic(F,G)

Note: Should we list V4(X) twice in the buckets?

Buckets

cites

cites

sameTopic

V4

V4

V5

V6

V6

V6

bucket algorithm1
Bucket algorithm
  • Query:
    • q(x):-car(x), sell(x, d), loc(d, ’windsor’).
  • Views:
    • v1(x) :- car(x).
    • v2(x) :- car(x), sell(x, d).
    • v3(x,d) :- sell(x, d), loc(d, ’windsor’).
    • v4(x) :- sell(x, d), loc(d, ’windsor’).

q(x):-v1(x), v2(x), v3(x,d).

q(x):-v1(x), v3(x,d).

q(x):-v1(x), v4(x).

q(x):-v2(x), v3(x,d).

q(x):-v2(x), v4(x).

projects using the lav approach
Projects using the LAV approach
  • Projects: Information Manifold, Infomaster, Tukwila, …
  • Advantages:
    • Scalable: new sources easy to add without modifying the mediator views
    • All we need to do is to define the new source using the existing mediator views (predicates)
  • Disadvantages:
    • Hard to decide how to answer a query using views
  • Reading: Alon Halevy, Answering Queries Using Views: A Survey.