1 / 24

Implementing Mapping Composition

Implementing Mapping Composition. Todd J. Green * University of Pennsylania with Philip A. Bernstein (Microsoft Research), Sergey Melnik (Microsoft Research), Alan Nash (UC San Diego) VLDB 2006 Seoul, Korea *Work partially supported by NSF grants IIS0513778 and IIS0415810. Schema mappings.

gizi
Download Presentation

Implementing Mapping Composition

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Implementing Mapping Composition Todd J. Green* University of Pennsylania with Philip A. Bernstein (Microsoft Research), Sergey Melnik (Microsoft Research), Alan Nash (UC San Diego) VLDB 2006 Seoul, Korea *Work partially supported by NSF grants IIS0513778 and IIS0415810

  2. Schema mappings • Mapping: a correspondence between instances of different schemas Names SID, Name Students Name, Address m Addresses SID, Address S1 S2 Students Name,Address(Names ⋈ Addresses)

  3. Names SID, Name Names SID, Name Local SID, Address Students Name, Address, Country Addresses SID, Address, Country Foreign SID, Address, Country Applications of mappings Names  Names σCountry = KR(Addresses)  SID,Address(Local)£{KR} σCountry  KR(Addresses)  Foreign • Schema evolution Students Name,Address,Country(Names ⋈ Addresses) ... m12 m23 S1 S2 S3

  4. Applications of mappings • Data integration, data exchange Sn Addresses SID, Address, Country Names SID, Name ... m1 mn Students Name,Address (Names ⋈ Addresses) NamesNames Local SID,Address(Country = KR(Addresses)) Foreign Country KR(Addresses) S1 Sn−1 Foreign SID, Address, Country Students Name, Address, Country Names SID, Name Local SID, Address ...

  5. Requirements for constraints • “First attribute in R is a key for R” 2,4(R ⋈1=3 R) µ2,2(R) • “View V equals R joined with S” VµR ⋈ S, V¶R ⋈ S • “Second attribute of R is a foreign key in S” 2(R) µ1(S) 2,4(S ⋈1=3 S) µ2,2(S) • Data integration, data exchange – GLAV R ⋈ S µT ⋈ U

  6. Names Names σCountry = KR(Addresses)  SID,Address(Local)£{KR} σCountry  KR(Addresses)  Foreign Students Name,Address, Country (Names⋈ (SID,Address(Local)£{KR} [ Foreign)) Names SID, Name Students Name,Address,Country(Names⋈Addresses) Names SID, Name Local SID, Address Students Name, Address, Country m12 m12  m23 m23 Addresses SID, Address, Country Foreign SID, Address, Country S2 Mapping composition S1 S3

  7. Composition is hard • Hard part: write composition in the same language as the input mappings. Depending on language: • Not always possible • Not even decidable whether possible • Strategy 1: use powerful (second-order) mapping language closed under composition [FKPT04] • Not supported by DBMS today • Expensive to check • Source-target restriction • Strategy 2: settle for partial solutions [NBM05] • Containment mappings  easier integration with DBMS • The strategy we adopt in this work

  8. Our contributions • New algorithm for composition problem • Incorporates view unfolding and left-composition (new technique) • Makes best effort in failure cases • Algebraic rather than logic-based mappings • Use of monotonicity to handle more operators • Modular and extensible factoring of algorithm • First implementation of composition • Experimental evaluation

  9. U(∙,∙,∙) S(∙,∙) V(∙,∙) m12 m23 R(∙,∙,∙) T(∙,∙) W(∙,∙) S ⊆(U), T= V – W S1 S2 S3 R⊆ S⋈T Formal definition of composition • Mapping: set of pairs of instances of db schemas • The composition m12 ± m23 is the mapping {hA,Ci : (9B)(hA,Bi2m12 and hB,Ci2m23)} where A,B,C are instances of S1,S2,S3 • Composition problem: find constraints in same language as input mappings giving the composition of the input mappings • Example: S1 = {R}, S2 = {S,T}, S3 = {U,V,W} R⊆S⋈T, S⊆(U), T=V – W ) R⊆(U)⋈(V - W)

  10. Best-effort composition problem • Composition not always possible • “Best-effort” composition problem: compute set of constraints equivalent to input constraints, but with as many symbols from S2 eliminated as possible R⊆U, R⊆V, 1,4(2=3(UU))⊆U, 1,4(2=3(VV))⊆V, U⊆T, V⊆T Can eliminate U (cross out left column) or V(right column), but not both [NBM05]

  11. Composition algorithm overview For each relation Rin S2 • Try to eliminate R via (1)view unfolding Replace = by pairs of ⊆, ⊇ For each relation R in S2 not yet eliminated • Try to eliminate Rvia (2)left compose • Else, try to eliminate R via (3)right compose Output: New constraints and list of relations successfully eliminated

  12. (1) View unfolding • Idea: exploit equality constraints (if we have any) • Standard technique: substitute view definition for occurrences of view relation in mappings T=V – W, R⊆S ⋈T, TX⊆(U) R⊆S ⋈(V – W), (V – W) X⊆(U) • Body must not mention view relation itself • Doesn’t matter what else is in body • Can substitute everywhere

  13. (2) Left compose • “View unfolding” for containment constraints (V) ⊆R – U, R⊆S ⋈ T  (V) ⊆ (S ⋈T) – U • Needs monotonicity of expressions in R. E1⊆E2(R), R⊆E3´E1⊆E2(E3) if E2(R) is monotone in R(and R not in E3) • Partial check for monotonicity “Is S – (T – R) monotone in R?”

  14. Normalization for left compose • Need one constraint of form R⊆E1 • Use identities to normalize, e.g.: • R⊆E1 and R⊆E2 iff R⊆E1E2 • E1E2⊆E3 iff E1⊆E3 and E2⊆E3 • (E1) ⊆E2 iff E1⊆E2Dr • More identities in paper • After left compose, try to eliminate D

  15. (3) Right compose • Dual to left compose, from [NBM05] • Example: S ⋈TR, R – U(V)  (S ⋈T) – U (V) • Monotonicity check needed here too • Normalization may introduce Skolem functions • E1(E2) iff f(E1) E2 • Must eliminate Skolem functions after composition • Lots of effort coding this step!

  16. User-defined operators • User specifies: • Monotonicity of operator in its arguments “If E1 monotone in R and E2 antimonotone in R or independent of R, then E1 * E2 monotone in R” “if E1 monotone in R or independent of R and E2 antimonotone in R, then E1 * E2 monotone in R” • Identities for normalization “E1 * E2E3 iff E1E2E3 ” • User-defined operators and standard relational operators treated uniformly

  17. Implementation • 12K lines of C# code, command-line tool # Test case 13: PODS05 example 2 SCHEMA R(2), S(2), T(2) CONSTRAINTS R <= S, P_{0,2} J_{0,1:1,2} (S S) <= R, S <= T ELIMINATE S; Output: P_{0,2} J_{0,1:1,2}(R R) <= R, R <= T

  18. Experimental evaluation • First attempt at a composition benchmark • Schema editing and schema reconciliation scenarios • “Add a column to R to produce S”: (R) = S • Measure • % of symbols eliminated • Running time • As a function of • Editing primitives allowed, length of edit sequence, presence/absence of keys, starting schema size, … • Synthetic data

  19. Summary of results • Algorithm often effective in eliminating most or even all relation symbols from S2 • Running time in subsecond range even for large problems containing hundreds of constraints • Certain schema editing primitives problematic • Key constraints did not reduce effectiveness, although did increase running time (and output size)

  20. Schema editing • Random starting schema (30 relations of 2-10 attributes) • 100 random edits • 100 different runs, sorted by execution time

  21. Schema reconciliation (1) • Random schema (30 relations of 2-10 attributes), random edits • Point represents median time of reconciliation step of 500 runs

  22. Schema reconciliation (2) • Random schema (variable # relations of 2-10 attributes) • 100 random edits • 100 different runs, sorted by execution time

  23. Related work • [MH03] J. Madhavan, A. Y. Halevy. Composing mappings among data sources. VLDB, 2003. • [FKPT04] R. Fagin, Ph. G. Kolaitis, L. Popa, W.C. Tan. Composing schema mappings: second-order dependencies to the rescue. PODS, 2004. • [NBM05] A. Nash, P. A. Bernstein, S. Melnik. Composition of mappings given by embedded dependencies. PODS, 2005.

  24. Conclusion and future work • We motivated and described the mappingcomposition problem • We presented an implementation of a practical new algorithm for the composition problem • We also presented an experimental evaluation • To do: theoretical analysis of impact of user-defined operators • To do: output constraints from algorithm can be a mess! How to clean up?

More Related