
Debugging Schema Mappings with Routes

Laura Chiticariu

UC Santa Cruz

(joint work with Wang-Chiew Tan)


SPIDER: A Schema Mapping Debugger

Demo group B

Today 14:00-15:30

Thursday 11:00-12:30


Schema Mappings

  • A schema mapping is a logical assertion that describes the correspondence between two schemas

    • Key element in data exchange and data integration systems

  • Data Exchange [FKMP05]

    • Translate data conforming to a source schema S into data conforming to a target schema T so that the schema mapping M is satisfied

[Figure: schema mapping M between Schema S and Schema T; source instance I conforms to S, target instance J conforms to T]


Debugging a Data Exchange Today

  • Debugging at the (low) level of the implementation

    • Specific to the data exchange engine

    • Specific to the implementation language: XQuery, SQL, etc.

  • Debugging at the level of schema mappings

    NO SUPPORT!!!

[Figure: schema mapping M between Schema S and Schema T, implemented in XQuery/XSLT/Java to translate source instance I into target instance J]


Debugging Schema Mappings

  • Debugging schema mappings: the process of exploring, understanding and refining a schema mapping through the use of (test) data at the level of schema mappings

[Figure: schema mapping M between Schema S and Schema T, with source instance I and target instance J]


Outline

  • Overview

  • Motivation

  • Debugging schema mappings with routes

    • Motivating example

    • What are routes?

    • Computing routes

    • Related work

  • Performance evaluation

  • Conclusions


Motivation

  • Schema mappings are good

    • Higher-level, declarative programming constructs

    • Hide implementation details, allow for optimization

    • Typically easier to understand vs. SQL/XSLT/XQuery/Java

    • Serve a similar goal as model management [Bernstein03, MBHR05]

  • Uniformity in specifying and debugging

    • Reduce programming effort by allowing a user to specify and debug at the level of schema mappings

  • Schema mappings are often generated by schema matching tools

    • Close to user’s intention, but may need further refinements

    • Hard to understand without the help of tools


Language for Schema Mappings

  • Tuple generating dependencies (tgds)

    • $\forall \mathbf{x}\, (\varphi(\mathbf{x}) \rightarrow \exists \mathbf{y}\, \psi(\mathbf{x},\mathbf{y}))$

  • Equality generating dependencies (egds)

    • $\forall \mathbf{x}\, (\varphi(\mathbf{x}) \rightarrow x_1 = x_2)$

  • Remarks:

    • Widely used for relational schema mappings in data exchange and data integration [Kolaitis05,Lenzerini02]

    • TGDs generalize LAV, GAV and are equivalent to GLAV assertions in the terminology of data integration

    • Extended to handle XML data exchange [PVMHF02]


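Relational Schema Mappings [FKMP03]

  • Schema mapping $M = (S, T, \Sigma_{st} \cup \Sigma_t)$

  • S, T: relational schemas with no relation symbols in common

  • Source-to-target dependencies $\Sigma_{st}$:

    • Source-to-target tgds (s-t tgds): $\varphi_S(\mathbf{x}) \rightarrow \exists \mathbf{y}\, \psi_T(\mathbf{x},\mathbf{y})$

  • Target dependencies $\Sigma_t$:

    • Target tgds: $\varphi_T(\mathbf{x}) \rightarrow \exists \mathbf{y}\, \psi_T(\mathbf{x},\mathbf{y})$

    • Target egds: $\varphi_T(\mathbf{x}) \rightarrow x_1 = x_2$

[Figure: $\Sigma_{st}$ relates Schema S (source instance I) to Schema T (target instance J); $\Sigma_t$ applies to Schema T]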


Example Schema Mapping

S: MANHATTAN CREDIT

  • CardHolders(cardNo, limit, ssn, name)

  • Dependents(accNo, ssn, name)

T: FARGO FINANCE

  • Accounts(accNo, creditLine, accHolder)

  • Clients(ssn, name)

Source-to-target dependencies, $\Sigma_{st}$:

$m_1$: $\mathrm{CardHolders}(\mathit{cn},l,s,n) \rightarrow \exists L\, (\mathrm{Accounts}(\mathit{cn},L,s) \wedge \mathrm{Clients}(s,n))$

$m_2$: $\mathrm{Dependents}(\mathit{an},s,n) \rightarrow \mathrm{Clients}(s,n)$

Target dependencies, $\Sigma_t$:

$m_3$: $\mathrm{Clients}(s,n) \rightarrow \exists A\, \exists L\, (\mathrm{Accounts}(A,L,s))$

[Figure: arrows labeled m1, m2, m3 and fk1 between the two schemas; source instance I with the CardHolders and Dependents tables; target instance J with the Accounts and Clients tables, a solution for I under the schema mapping]


Example Debugging Scenario 1

  • Unknown credit limit?

  • A route for the Accounts tuple: m1 takes CardHolders(123, $15K, ID1, Alice) in the source instance I to Accounts(123, L1, ID1) and Clients(ID1, Alice) in the target instance J

  • 15K is not copied over to the target

$m_1$: $\mathrm{CardHolders}(\mathit{cn},l,s,n) \rightarrow \exists L\, (\mathrm{Accounts}(\mathit{cn},L,s) \wedge \mathrm{Clients}(s,n))$


Example Debugging Scenario 1

  • Unknown credit limit? 15K is not copied over to the target

  • m1 refined so that the limit l is copied into Accounts:

$m_1$: $\mathrm{CardHolders}(\mathit{cn},l,s,n) \rightarrow (\mathrm{Accounts}(\mathit{cn},l,s) \wedge \mathrm{Clients}(s,n))$
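
The difference between the two versions of m1 can be made concrete with a small Python sketch. It is not how a data exchange engine works internally; the function name `exchange_m1` and the `copy_limit` flag are hypothetical, and the generated labels L1, L2, … only imitate the unknown value shown on the slide.

```python
from itertools import count


def exchange_m1(card_holders, copy_limit):
    """Produce Accounts and Clients tuples from CardHolders tuples.

    copy_limit=False mimics the original m1 (creditLine is existential);
    copy_limit=True mimics the refined m1 (creditLine is the source limit).
    """
    fresh = count(1)  # numbering for labeled nulls L1, L2, ...
    accounts, clients = [], []
    for card_no, limit, ssn, name in card_holders:
        credit_line = limit if copy_limit else f"L{next(fresh)}"
        accounts.append((card_no, credit_line, ssn))
        clients.append((ssn, name))
    return accounts, clients


card_holders = [("123", "$15K", "ID1", "Alice")]
print(exchange_m1(card_holders, copy_limit=False))
print(exchange_m1(card_holders, copy_limit=True))
```

With the original m1 the Accounts tuple carries a placeholder for the credit line; with the refined m1 it carries the limit from the source tuple.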


Example Debugging Scenario 2

  • Unknown account number?

  • Route for Accounts tuple with accNo A2: m2 takes Dependents(123, ID2, Bob) to Clients(ID2, Bob), and m3 takes Clients(ID2, Bob) to Accounts(A2, L2, ID2)

  • 123 is not copied over to the target as Bob’s account number

$m_2$: $\mathrm{Dependents}(\mathit{an},s,n) \rightarrow \mathrm{Clients}(s,n)$


Example Debugging Scenario 2

  • Unknown account number? 123 is not copied over to the target as Bob’s account number

  • m2 refined to m’2:

$m'_2$: $\mathrm{CardHolders}(\mathit{an},l,s',n') \wedge \mathrm{Dependents}(\mathit{an},s,n) \rightarrow \mathrm{Accounts}(\mathit{an},l,s) \wedge \mathrm{Clients}(s,n)$


Debugging Schema Mappings with Routes

  • Main intuition: routes describe the relationships between source and target data with the schema mapping

  • Definition: Let:

    • M be a schema mapping

    • I be a source instance

    • J be a solution for I under M and $J_s \subseteq J$

      A route for $J_s$ with M and (I,J) is a finite non-empty sequence of satisfaction steps

      $(I,\emptyset) \xrightarrow{m_1, h_1} (I,J_1) \xrightarrow{m_2, h_2} \dots \xrightarrow{m_n, h_n} (I,J_n)$

      such that:

    • $J_i \subseteq J$ and $m_i \in \Sigma_{st} \cup \Sigma_t$, where $1 \le i \le n$

    • $J_s \subseteq J_n$
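
The definition can be read as a checklist, sketched here in Python under simplifying assumptions: dependencies are stored as lists of atoms whose terms are all variables, assignments h are plain dictionaries, and the names `M`, `image` and `is_route` are illustrative rather than taken from SPIDER. The data is the Bob example from Debugging Scenario 2.

```python
# Atoms are (relation, variables). m2 and m3 follow the example slides.
M = {
    "m2": ([("Dependents", ("an", "s", "n"))], [("Clients", ("s", "n"))]),
    "m3": ([("Clients", ("s", "n"))], [("Accounts", ("A", "L", "s"))]),
}
I = {("Dependents", ("123", "ID2", "Bob"))}
J = {("Clients", ("ID2", "Bob")), ("Accounts", ("A2", "L2", "ID2"))}


def image(atoms, h):
    """Apply assignment h to a list of atoms, giving a set of facts."""
    return {(rel, tuple(h[v] for v in terms)) for rel, terms in atoms}


def is_route(steps, selected):
    """Check that steps = [(name, h), ...] is a route for `selected`."""
    if not steps:
        return False  # a route is a non-empty sequence
    current = set()  # the target side starts empty
    for name, h in steps:
        lhs, rhs = M[name]
        if not image(lhs, h) <= I | current:
            return False  # premises must already be available
        produced = image(rhs, h)
        if not produced <= J:
            return False  # every J_i must stay inside the solution J
        current |= produced
    return selected <= current  # J_s must be contained in J_n


route = [
    ("m2", {"an": "123", "s": "ID2", "n": "Bob"}),
    ("m3", {"s": "ID2", "n": "Bob", "A": "A2", "L": "L2"}),
]
print(is_route(route, {("Accounts", ("A2", "L2", "ID2"))}))
```

Dropping the first step makes the check fail, because the Clients tuple that m3 needs has not been produced yet.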


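Example of Satisfaction Step

  • Unknown credit limit?

  • Satisfaction step with $m_1, h_1$: from the CardHolders tuple in the source instance I to the Accounts and Clients tuples in the target instance J

$m_1$: $\mathrm{CardHolders}(\mathit{cn},l,s,n) \rightarrow \exists L\, (\mathrm{Accounts}(\mathit{cn},L,s) \wedge \mathrm{Clients}(s,n))$

$h_1 = \{\mathit{cn} \mapsto \text{'123'},\ l \mapsto \$15\mathrm{K},\ s \mapsto \mathrm{ID1},\ n \mapsto \mathrm{Alice},\ L \mapsto \mathrm{L1}\}$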


Compute all routes

  • The schema mapping M is fixed

  • Input: source instance I, a solution J for I under M, a set of target tuples $J_s \subseteq J$

  • Output: a forest representing all routes for $J_s$

  • Algorithm idea:

    • For each tuple t in $J_s$, consider every possible $\sigma \in \Sigma_{st} \cup \Sigma_t$ and h for witnessing t

    • Do the same for all target tuples encountered during the process until tuples from the source instance are obtained


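Compute all routes: A simple example

  • $\Sigma_{st}$:

    • $\sigma_1$: $S_1(x) \rightarrow T_1(x)$

    • $\sigma_2$: $S_2(x) \rightarrow T_2(x) \wedge T_6(x)$

  • $\Sigma_t$:

    • $\sigma_3$: $T_2(x) \rightarrow T_3(x)$

    • $\sigma_4$: $T_3(x) \rightarrow T_4(x)$

    • $\sigma_5$: $T_4(x) \wedge T_1(x) \rightarrow T_5(x)$

    • $\sigma_6$: $T_4(x) \wedge T_6(x) \rightarrow T_7(x)$

    • $\sigma_7$: $T_5(x) \rightarrow T_3(x)$

  • Source instance, I:

    • S1(a), S2(a)

  • A solution, J:

    • T1(a), …, T7(a)

[Figure: the route forest starts at the selected tuple T7(a); its first branch is labeled 6 with $x \mapsto a$ and leads to T4(a) and T6(a)]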


Compute all routes: A simple example

[Figure: completed route forest for T7(a); each line gives a tuple, the label of a branch, and the tuples that branch leads to]

  • T7(a): 6, leading to T4(a) and T6(a)

  • T4(a): 4, leading to T3(a)

  • T6(a): 8, leading to S2(a)

  • T3(a): 7, leading to T5(a); 3, leading to T2(a)

  • T5(a): 5, leading to T4(a) and T1(a)

  • T2(a): 2, leading to S2(a)

  • T1(a): 1, leading to S1(a)

Route for T7(a): 2, 3, 4, 8, 6
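
The construction of the forest can be sketched in Python as follows. This is a simplified illustration, not the implementation in SPIDER: it handles only dependencies without existential variables (which is all this example needs), runs in memory rather than in the database, and the names `s1` to `s7` are illustrative stand-ins for the seven dependencies listed in the example.

```python
# name -> (lhs atoms, rhs atoms); an atom is (relation, tuple of variables).
SIGMA = {
    "s1": ([("S1", ("x",))], [("T1", ("x",))]),
    "s2": ([("S2", ("x",))], [("T2", ("x",)), ("T6", ("x",))]),
    "s3": ([("T2", ("x",))], [("T3", ("x",))]),
    "s4": ([("T3", ("x",))], [("T4", ("x",))]),
    "s5": ([("T4", ("x",)), ("T1", ("x",))], [("T5", ("x",))]),
    "s6": ([("T4", ("x",)), ("T6", ("x",))], [("T7", ("x",))]),
    "s7": ([("T5", ("x",))], [("T3", ("x",))]),
}
I = {("S1", ("a",)), ("S2", ("a",))}
J = {(f"T{i}", ("a",)) for i in range(1, 8)}


def match(atom, fact, h):
    """Extend assignment h so that atom maps onto fact, or return None."""
    rel, variables = atom
    if rel != fact[0] or len(variables) != len(fact[1]):
        return None
    h = dict(h)
    for v, c in zip(variables, fact[1]):
        if h.setdefault(v, c) != c:
            return None
    return h


def assignments(atoms, facts, h):
    """Yield every extension of h that maps all atoms into facts."""
    if not atoms:
        yield h
        return
    for fact in facts:
        h2 = match(atoms[0], fact, h)
        if h2 is not None:
            yield from assignments(atoms[1:], facts, h2)


def route_forest(selected):
    """Map each explored target tuple to its branches (name, premises)."""
    forest, todo = {}, list(selected)
    facts = sorted(I | J)
    while todo:
        t = todo.pop()
        if t in forest or t in I:
            continue  # already explored, or a source tuple (a leaf)
        forest[t] = []
        for name, (lhs, rhs) in SIGMA.items():
            for atom in rhs:
                h0 = match(atom, t, {})
                if h0 is None:
                    continue
                for h in assignments(lhs + rhs, facts, h0):
                    premises = tuple((r, tuple(h[v] for v in vs)) for r, vs in lhs)
                    if (name, premises) not in forest[t]:
                        forest[t].append((name, premises))
                        todo.extend(premises)
    return forest


for tup, branches in route_forest([("T7", ("a",))]).items():
    print(tup, branches)
```

Each tuple is a key in `forest` at most once, which mirrors the idea that a tuple's branches are explored a single time even when the tuple is reached along several paths (as T4(a) is here).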


Properties of compute all routes

  • Completeness:

    Let F denote the route forest returned by our algorithm on Js. If R is a minimal route for Js, then it is represented in F.

  • Running time: polynomial in the sizes of I, J and Js

    • Every “branch of a tuple”, once explored, is never explored again

    • Polynomial number of branches for each tuple since M is fixed

  • Challenge:

    • Exponentially many routes, but polynomial-size representation constructed in polynomial time


Compute one route

  • Our experimental results indicate that compute all routes can be expensive

  • Generate one route fast and alternative routes as needed?

  • Our solution: adapt compute all routes to compute only one route

    • Non-exhaustive: Stops when one witness is found. A witness that uses source tuples is preferred

    • Inference procedure: to deduce all consequences of a proven tuple and avoid recomputation of “branches”

      • Key step for polynomial time analysis

    • Completeness: If there is a route for Js, then our algorithm will produce a route for Js
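
As a rough illustration of producing a single route, the sketch below uses a simpler forward (bottom-up) variant rather than the ComputeOneRoute algorithm of the talk: it repeatedly applies steps whose premises are already proven, remembers which step first proved each tuple, and then walks back from the selected tuples. The dependencies of the simple example are written as ground instances (x mapped to a), all conclusions are assumed to lie inside the solution J, and the names `STEPS`, `SOURCE` and `one_route` are illustrative.

```python
# Ground instances of the seven dependencies: name -> (premises, conclusions).
STEPS = {
    "s1": ({"S1(a)"}, {"T1(a)"}),
    "s2": ({"S2(a)"}, {"T2(a)", "T6(a)"}),
    "s3": ({"T2(a)"}, {"T3(a)"}),
    "s4": ({"T3(a)"}, {"T4(a)"}),
    "s5": ({"T4(a)", "T1(a)"}, {"T5(a)"}),
    "s6": ({"T4(a)", "T6(a)"}, {"T7(a)"}),
    "s7": ({"T5(a)"}, {"T3(a)"}),
}
SOURCE = {"S1(a)", "S2(a)"}


def one_route(selected):
    """Return a list of step names witnessing `selected`, or None."""
    proven = set(SOURCE)
    reason = {}  # target tuple -> step that first proved it
    progress = True
    while progress and not selected <= proven:
        progress = False
        for name, (premises, conclusions) in STEPS.items():
            new = conclusions - proven
            if premises <= proven and new:
                for fact in new:
                    reason[fact] = name
                proven |= new
                progress = True
    if not selected <= proven:
        return None  # no route exists for the selected tuples

    route, visited = [], set()

    def visit(fact):
        # Emit the steps a tuple depends on before the step that proves it.
        if fact in SOURCE or fact in visited:
            return
        visited.add(fact)
        name = reason[fact]
        for premise in sorted(STEPS[name][0]):
            visit(premise)
        if name not in route:
            route.append(name)

    for fact in sorted(selected):
        visit(fact)
    return route


print(one_route({"T7(a)"}))
```

Because every tuple records only the first step that proved it, the walk back never revisits a tuple, which keeps the whole procedure polynomial in the number of tuples and ground steps.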


Related work

  • Commercial data exchange systems

    • e.g., Altova MapForce, Stylus Studio

    • Use “lower-level” languages (e.g., XSLT, XQuery) to specify the exchange

      • Debugging is done at this low level

      • Source tuple centric

  • Data viewer [YMHF01]

    • Constructs an “example” source instance illustrative for the behavior of the schema mapping

      • Complementary to our approach

    • Works only for relational schema mappings


Related work

  • Computing routes for target data is related to computing provenance (aka lineage) of data


Empirical Evaluation

  • Implementation: on top of the Clio data exchange system from IBM Almaden Research Center

    • Scalable: push computation to the database

    • Handles relational and XML schema mappings [PVMHF02]

  • Testbed:

    • Created relational and XML schema mappings based on the TPCH schema

    • Created schema mappings based on Mondial, DBLP and Amalgam schemas

  • Methodology - measured the influence of:

    • The sizes of I, J and Js

    • The complexity of $\Sigma_{st} \cup \Sigma_t$

      • i.e., the number of tgds and the number of atoms in each tgd

  • Setup: P4 2.8GHz, 2GB RAM, 256MB DB2 buffer pool

  • Our regret: no benchmark on which to base our comparisons


ComputeOneRoute with Rel. schema mapping: Influence of the Sizes of I and J


ComputeOneRoute with Rel. schema mapping: Influence of the Complexity of $\Sigma_{st} \cup \Sigma_t$



Experimental results with Mondial, DBLP and Amalgam

  • Two DBLP schemas and datasets, both XML:

    • DBLP1, DBLP2

  • First relational schema from Amalgam test suite

  • Two Mondial schemas and datasets:

    • one relational (Mondial1), the other XML (Mondial2)

  • Designed $\Sigma_{st}$ and used the foreign key constraints as $\Sigma_t$


Experimental results with Mondial, DBLP and Amalgam

  • Compute one route: under 3 seconds for 1-10 randomly selected tuples

  • Compute all routes: can take much longer

    • 18 seconds to construct the route forest for 10 selected tuples in the target instance of Mondial

    • Compute one route took under 1 second


Conclusions

  • Debugging schema mappings with routes

    • Complete, polynomial time algorithms for computing routes

    • Extension for routes for selected source data

  • Routes have declarative semantics, based on the logical satisfaction of tgds

    • What we don’t do: illustrate data merging

  • Future work:

    • Illustrate grouping semantics for nested schema mappings

    • Adapt target instance to changes in the schema mapping and data sources


SPIDER: A Schema Mappings Debugger

  • Compute one/all routes

  • Alternative routes

  • Guided computation of routes

  • Standard debugging features

    • Breakpoints

    • “Watch” windows

  • Schema-level routes

Demo group B

Today 14:00-15:30

Thursday 11:00-12:30



How do we do it?

[Figure: the schema mappings debugger takes the schema mapping M (between source schema S and target schema T), the source instance I and the target instance J, and produces routes]

Witness selected target data with source data and M


How do we do it?

[Figure: the schema mappings debugger takes the schema mapping M (between source schema S and target schema T), the source instance I and the target instance J, and produces routes]

Illustrate consequences of selected source data with M


Key Concept: ROUTES - describe the relationships between source and target data with the schema mapping

[Figure: the schema mappings debugger takes the schema mapping M (between source schema S and target schema T), the source instance I and the target instance J, and produces routes]


Clio

  • A semi-automatic schema mapping system

    • Supports user-guided mapping from source to target with constraints

    • Schema mapping language: a nested extension of tgds and egds

    • Automatically generates XQuery/SQL/XSLT scripts that perform the actual data transfer based on the schema mapping

    • Generates universal solutions under relational-to-relational schema mappings

  • Implemented our techniques on top of Clio, but…

    • Routes have declarative semantics

    • Independent of Clio’s transformation engine

[Figure: Clio takes the source and target schemas and data, and produces a mapping and XQuery/SQL/XSLT]


Related work

  • Computing routes for target data is related to computing provenance (aka lineage) of data

[Figure: queries Q and Q’, where Q’ also carries provenance information]


Related work

  • Computing routes for target data is related to computing provenance (aka lineage) of data

[Figure: query Q, with no reengineering of the query]


Related work

  • Approaches to computing provenance:

    • Eager: changes the transformation to carry provenance information

      • Requires re-engineering of Q to Q’. No subsequent source access or access to the definition of Q or Q’.

    • Lazy: does not change the transformation

      • No re-engineering of Q. Subsequent source access and access to the definition of Q may be needed.

[Figure: eager approach, with queries Q and Q’, where Q’ also carries provenance information]




Programming Languages vs. Schema Mappings

  • Debugging programming languages vs. debugging schema mappings

    • Procedural PL

      • We may have a specification (e.g. compute $x^2$ on input x) which completely determines the output

        • Well-defined notion of correct answer

        • The program is an implementation of the specification

        • If the correct answer is not obtained, there’s a bug – need to debug the implementation

      • However, the specification may also not be that concrete

        • E.g., build a visual interface for …

    • Functional PL

      • Debugging is performed by analyzing a trace of the execution

      • Declarative approach for debugging [Nilsson94]

    • Schema mapping IS the specification

      • Infinite number of solutions consistent with the schema mapping

      • Best we can do: look at the target instance – if something looks wrong (e.g., the clients’ names are not copied to the target) go back to the schema mapping and try to refine it (or debug it)


Related Work: Computing Provenance of Data over SQL queries

  • Compute the provenance of relational data in a view in data warehouses [CWW2000]

    • The provenance of a tuple t in a view is described as the tuples in the base tables that witness the existence of t

  • View definition (view T over the database DB with relations R and S):

    $T(a,c) \mathrel{:-} R(a,b) \wedge S(b,c)$

  • Provenance answered using two reverse queries:

    $R(a,b) \mathrel{:-} R(a,b) \wedge S(b,c) \wedge a=1 \wedge c=3$

    $S(b,c) \mathrel{:-} R(a,b) \wedge S(b,c) \wedge a=1 \wedge c=3$
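
The two reverse queries for the view tuple T(1,3) can be tried out with Python's built-in `sqlite3` module. This is only a sketch of the idea, not the system described in [CWW2000]; the table contents and the function name `provenance` are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE R(a, b);
    CREATE TABLE S(b, c);
    INSERT INTO R VALUES (1, 2), (1, 5), (4, 2);
    INSERT INTO S VALUES (2, 3), (5, 3), (2, 7);
""")


def provenance(a, c):
    """Return the R tuples and S tuples that witness T(a, c)."""
    r_rows = conn.execute(
        "SELECT DISTINCT R.a, R.b FROM R JOIN S ON R.b = S.b "
        "WHERE R.a = ? AND S.c = ? ORDER BY R.a, R.b", (a, c)).fetchall()
    s_rows = conn.execute(
        "SELECT DISTINCT S.b, S.c FROM R JOIN S ON R.b = S.b "
        "WHERE R.a = ? AND S.c = ? ORDER BY S.b, S.c", (a, c)).fetchall()
    return r_rows, s_rows


print(provenance(1, 3))
```

Both queries reuse the body of the view definition and add the selections a = 1 and c = 3; they differ only in which base relation they project onto.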


Related Work: Computing Provenance of data over SQL queries

  • DBNotes: an annotation management system for relational databases

    • Each data value has zero or more annotations

    • pSQL: a query language for propagating annotations

      • 3 propagation schemes: DEFAULT, DEFAULT-ALL, CUSTOM

      • By default, annotations propagate according to provenance

    • Eager approach: annotations propagate along with data as data is transformed through queries

      • Provenance information readily available in the output

    • Automatically trace the provenance and flow of data over multiple transformation steps

      • Systematically maintains provenance annotations that describe the exact location of data values

[Figure: databases DB1 and DB2, with relations R, S and T]

Transformation: $T(a,c) \mathrel{:-} R(a,b) \wedge S(b,c)$


Related Work: Computing the Provenance of Data over Schema Mappings

  • MXQL system over relational/XML schema mappings

    • Eager approach

      • Additional info about source schema elements and mappings that contribute to the creation of target data is propagated and stored

      • Our approach is lazy: no reengineering

    • Non-automatic approach for answering provenance

      • The additional info needs to be queried using MXQL

      • We automatically compute routes for selected data

    • Data involved in the transformation not considered

      • Our routes contain information about schema elements, dependencies and data involved


Related Work: the Data Viewer

  • Schema mapping $M = (S, T, \Sigma_{st} \cup \Sigma_t)$

    • S: Dept(dID,dName) and Emp(dID,name)

    • T: DeptEmp(dID,dName,employee)

    • $\Sigma_{st}$: $\mathrm{Dept}(\mathit{id},n) \rightarrow \exists E\, \mathrm{DeptEmp}(\mathit{id},n,E)$

      $\mathrm{Dept}(\mathit{id},n) \wedge \mathrm{Emp}(\mathit{id},e) \rightarrow \mathrm{DeptEmp}(\mathit{id},n,e)$

    • $\Sigma_t = \emptyset$

  • Example source instance created to illustrate M

    • Dept: department that has at least one employee (will join with Emp)

    • Dept: department with no employee (will not join with Emp)

    • Emp: employee of a department

    • Emp: employee with no department (will not appear in the target)


Universal Solutions [FKMP02]

  • Definition: Given two instances $K_1$ and $K_2$, a homomorphism $h: K_1 \rightarrow K_2$ is a function $h: \mathrm{Const} \cup \mathrm{Var} \rightarrow \mathrm{Const} \cup \mathrm{Var}$ such that:

    • $h(c) = c$ for all constants c

    • For every fact $R(a_1, \dots, a_n) \in K_1$, the fact $R(h(a_1), \dots, h(a_n)) \in K_2$

  • Example: $J_1 = \{V(1,N_1), V(N_2,2)\}$, $J_2 = \{V(1,2)\}$

    • $h: J_1 \rightarrow J_2$ is $h = \{1 \mapsto 1,\ N_1 \mapsto 2,\ N_2 \mapsto 1,\ 2 \mapsto 2\}$

  • Definition: Let $M = (S, T, \Sigma_{st} \cup \Sigma_t)$ be a schema mapping. If I is a source instance, then a universal solution for I is a solution J for I such that for every solution J’ for I, there exists a homomorphism $h: J \rightarrow J'$

  • Example: $\Sigma_{st}$: $R(x) \rightarrow \exists N\, V(x,N)$

    $U(x) \rightarrow \exists N\, V(N,x)$

    • Source instance $I = \{R(1), U(2)\}$

    • $J_2 = \{V(1,2)\}$ is not a universal solution for I

    • $J_1 = \{V(1,N_1), V(N_2,2)\}$ is a universal solution for I
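
The example can be checked mechanically with a brute-force search for a homomorphism between two small instances. The sketch below is illustrative only: it assumes that labeled nulls are strings starting with "N" and that everything else is a constant, and the names `is_null` and `find_homomorphism` are hypothetical. The search is exponential in the number of nulls, which is fine for toy instances.

```python
from itertools import product


def is_null(value):
    """Assumption for this sketch: nulls are strings such as 'N1', 'N2'."""
    return isinstance(value, str) and value.startswith("N")


def find_homomorphism(k1, k2):
    """Return a mapping of the nulls of k1 that sends k1 into k2, or None."""
    nulls = sorted({v for _, args in k1 for v in args if is_null(v)})
    domain = sorted({v for _, args in k2 for v in args}, key=str)
    for choice in product(domain, repeat=len(nulls)):
        h = dict(zip(nulls, choice))
        # Constants are mapped to themselves; nulls follow h.
        if all((rel, tuple(h.get(v, v) for v in args)) in k2 for rel, args in k1):
            return h
    return None


J1 = {("V", (1, "N1")), ("V", ("N2", 2))}
J2 = {("V", (1, 2))}
print(find_homomorphism(J1, J2))
print(find_homomorphism(J2, J1))
```

A homomorphism exists from J1 to J2 but not from J2 to J1, which is why J2 cannot be a universal solution.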


Homomorphism

  • Definition: Let $\varphi(\mathbf{x})$ be a conjunction of atoms and K be an instance.

    • A homomorphism $h: \varphi(\mathbf{x}) \rightarrow K$ is such that $h(\varphi(\mathbf{x})) = \{ R(h(\mathbf{z})) \in K \mid R(\mathbf{z}) \text{ is a rel. atom in } \varphi(\mathbf{x}) \}$

  • Example:

    • Two homomorphisms from $\mathrm{Accounts}(u,v,w) \wedge \mathrm{Clients}(w,x)$ to the target instance J

[Figure: target instance J with the Accounts and Clients tables]


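A Satisfaction Step

  • Definition: Let $\sigma$ be a tgd $\forall \mathbf{x}\, \varphi(\mathbf{x}) \rightarrow \exists \mathbf{y}\, \psi(\mathbf{x},\mathbf{y})$.

    Let K and $K_1$ be instances such that:

    • $K_1 \subseteq K$

    • $K \models \sigma$

      Let $h: \varphi(\mathbf{x}) \wedge \psi(\mathbf{x},\mathbf{y}) \rightarrow K$ be a homomorphism such that h is also a homomorphism from $\varphi(\mathbf{x})$ to $K_1$.

      Let $K_2 = K_1 \cup h(\psi(\mathbf{x},\mathbf{y}))$.

      Then the result of satisfying $\sigma$ on $K_1$ with homomorphism h and solution K is $K_2$, written $K_1 \xrightarrow{\sigma, h} K_2$.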


Satisfaction Step: Remark 1

  • Satisfaction step $\neq$ chase step [FKMP02]

    • Definition based on logical satisfaction of tgds, not tied to implementation of the exchange

  • Example:

    • $\Sigma_{st}$: $\mathrm{EmpPhone}(x,y) \rightarrow \exists z\, \mathrm{Emp}(x,y,z)$ (1)

    • $\mathrm{EmpFax}(x,z) \rightarrow \exists y\, \mathrm{Emp}(x,y,z)$ (2)

    • $\Sigma_t$: $\mathrm{Emp}(x,y,z) \wedge \mathrm{Emp}(x,u,v) \rightarrow y=u \wedge z=v$

    • I = { s1: EmpPhone(Mary, p123), s2: EmpFax(Mary, f567) }

    • J = { t: Emp(Mary, p123, f567) }

    • Two routes for t: $s_1 \xrightarrow{(1)} t$ and $s_2 \xrightarrow{(2)} t$

      • Both routes make an assumption about the values taken by the existentials (z and y are assumed to be f567 and p123, respectively)

      • The egd is not used in the routes

  • We don’t have satisfaction steps with egds

    • If K satisfies an egd, then $K_1$ also satisfies it, since $K_1 \subseteq K$


Satisfaction Step: Remark 2

  • Satisfaction step $\neq$ solution-aware chase step [FKT05]

  • Example:

    • $\Sigma_{st}$: $\sigma$: $S(x) \rightarrow \exists N\, T(x,N)$

    • $I = \{S(1)\}$

    • $J = \{t_1: T(1,N_1),\ t_2: T(1,N_2)\}$ is a solution for I

    • A route for J: $\langle I, \emptyset \rangle \xrightarrow{\sigma, h_1} \langle I, \{t_1\} \rangle \xrightarrow{\sigma, h_2} \langle I, \{t_1,t_2\} \rangle$

      • $h_1 = \{x \mapsto 1,\ N \mapsto N_1\}$ and $h_2 = \{x \mapsto 1,\ N \mapsto N_2\}$

  • No solution-aware chase sequence produces both $t_1$ and $t_2$


Computing all routes for target tuples

  • The schema mapping M is fixed

  • Input:

    • source instance I

    • target instance J

    • a set of target tuples $J_s \subseteq J$

  • Output: a route forest for $J_s$ that concisely represents all routes for $J_s$

  • Algorithm idea: reverse chase

    • For each tuple R(a) in $J_s$, consider every possible $\sigma$ and h for witnessing R(a)

    • Do the same for all target tuples encountered during the construction

    • Do not consider the same tuple twice


Computing all routes: Properties

  • Running time: polynomial in the sizes of M, I and J

    • At most |I|+|J| tuples in the forest

    • Polynomial number of branches for each tuple

    • A branch is not explored twice

    • Reverse chase is efficient: push the computation to the database

  • Completeness: the route forest embeds every minimal route for Js

    • A minimal route for Js is a route for Js with no redundant satisfaction steps


Computing one route for $J_s$

  • Running time: polynomial in the sizes of M, I and J

  • Completeness: if there is a route for Js, then the algorithm will find a route for Js

  • Much faster compared to computing all routes

    • No need to explore the entire route forest

    • Possible to construct additional routes as needed


Some implementation details

  • Scalable approach: steps in routes are discovered by pushing computation to the database engine

  • Example: Source-to-target tgd: $S(x,y) \rightarrow \exists U\, \exists V\, (T_1(x,U) \wedge T_2(U,V,y))$

    • $T_1(a,b)$ matched against RHS

    • LHS query:

      • $S(a,y)$ is executed against the source instance using the db

    • RHS query:

      • $T_1(a,b) \wedge T_2(b,V,c)$ is executed against the target instance using the db, where c is a binding for y returned by the LHS query

    • Each binding for y generates one RHS query

    • Design choice to decouple LHS and RHS queries

  • Extended for XML schema mappings
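
The decoupled LHS and RHS queries of this example can be imitated with Python's built-in `sqlite3` module. The sketch is an assumption-laden illustration, not the code of the implementation on top of Clio: source and target relations live in one in-memory database, the table contents are invented, and the function name `witnesses_for_t1` is hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE S(x, y);
    CREATE TABLE T1(x, u);
    CREATE TABLE T2(u, v, y);
    INSERT INTO S VALUES ('a', 'c'), ('a', 'd');
    INSERT INTO T1 VALUES ('a', 'b');
    INSERT INTO T2 VALUES ('b', 'N1', 'c');
""")


def witnesses_for_t1(x, u):
    """Find assignments that witness the target tuple T1(x, u) with the tgd."""
    found = []
    # LHS query: S(x, y) with x fixed by the selected tuple.
    lhs_rows = db.execute(
        "SELECT y FROM S WHERE x = ? ORDER BY y", (x,)).fetchall()
    for (y,) in lhs_rows:
        # RHS query: one per binding of y, run against the target relations.
        rhs_rows = db.execute(
            "SELECT T2.v FROM T1 JOIN T2 ON T1.u = T2.u "
            "WHERE T1.x = ? AND T1.u = ? AND T2.y = ? ORDER BY T2.v",
            (x, u, y)).fetchall()
        found.extend({"x": x, "y": y, "U": u, "V": v} for (v,) in rhs_rows)
    return found


print(witnesses_for_t1("a", "b"))
```

Only the binding of y to 'c' yields a witness here, because no T2 tuple matches the binding 'd'.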


Comparison with Approaches for Evaluating Datalog

  • Top-down techniques: OLDT, QSQ, Rule/goal graphs

    • Similarities: use memoization to avoid redundant computation and infinite loops

    • Major difference: the target instance J is available and we leverage it to:

      • Obtain completely instantiated facts during reverse chase

      • Hence, avoid redundant computation earlier

  • Magic set rewriting technique:

    • Possible to obtain all tuples that contribute to the creation of Js

    • However, need to recover the routes from the evaluation of the magic rules


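Top-down Example

  • $\Sigma_{st}$:

    • $\sigma_1$: $S_1(x,y) \rightarrow T_1(x,y)$

    • $\sigma_2$: $S_2(x,y,u) \rightarrow T_2(x,y,u)$

  • $\Sigma_t$:

    • $\sigma_3$: $T_1(x,y) \wedge T_2(y,z,u) \rightarrow T_3(x,z)$

  • I: S1(1,2)

  • J: T1(1,2), T3(1,3)

[Figure: top-down evaluation tree rooted at T3(1,3), which expands to $T_1(1,y) \wedge T_2(y,3,u)$; $T_1(1,y)$ leads to $S_1(1,y)$ with $y \mapsto 2$, and $T_2(2,3,u)$ (with $y \mapsto 2$) leads to $S_2(2,3,u)$]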

