CS240A: Databases and Knowledge Bases From Differential Fixpoints to Magic Sets

Notes from Chapter 9 of Advanced Database Systems by Zaniolo, Ceri, Faloutsos, Snodgrass, Subrahmanian and Zicari, Morgan Kaufmann, 1997. Carlo Zaniolo, Department of Computer Science

Presentation Transcript


  1. CS240A: Databases and Knowledge Bases. From Differential Fixpoints to Magic Sets
     Notes from Chapter 9 of Advanced Database Systems by Zaniolo, Ceri, Faloutsos, Snodgrass, Subrahmanian and Zicari, Morgan Kaufmann, 1997
     Carlo Zaniolo, Department of Computer Science, University of California, Los Angeles
     January, 2002

  2. Recursive Predicates
     r1: anc(X, Y) ← parent(X, Y).
     r2: anc(X, Z) ← anc(X, Y), parent(Y, Z).
     r2 is a recursive rule, and a left-linear one. r1 is a nonrecursive rule defining a recursive predicate; such a rule is called an exit rule.
     An alternative definition for anc:
     r3: anc(X, Y) ← parent(X, Y).
     r4: anc(X, Z) ← anc(X, Y), anc(Y, Z).
     Here r4 is a quadratic rule.

  3. Fixpoint Computation
     The inflationary immediate consequence operator for P:
     Γ_P(I) = T_P(I) ∪ I
     We have:
     Γ_P↑n(∅) = T_P↑n(∅)
     lfp(T_P) = T_P↑ω(∅) = lfp(Γ_P) = Γ_P↑ω(∅)

  4. Fixpoint Computation (cont.)
     Naïve Fixpoint Algorithm for P (M = ∅, for now):
     { S := M; S′ := Γ_P(M);
       while S ⊂ S′ { S := S′; S′ := Γ_P(S) } }
     We can replace the first Γ_P with Γ_E and the second one with Γ_R, respectively denoting the immediate consequence operators for the exit rules and the recursive ones.
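The naïve loop above can be sketched in Python for the anc program of slide 2. This is only an illustration under stated assumptions: relations are modelled as sets of tuples, the two parent facts are made up for the example, and the function names are hypothetical.

```python
def immediate_consequences(parent, anc):
    """T_P for rules r1 and r2, applied once to the current anc atoms."""
    derived = set(parent)  # r1: anc(X, Y) <- parent(X, Y).
    # r2: anc(X, Z) <- anc(X, Y), parent(Y, Z).
    derived |= {(x, z) for (x, y) in anc for (y2, z) in parent if y == y2}
    return derived


def naive_fixpoint(parent):
    """Iterate the inflationary operator from the empty set until S stops growing."""
    s = set()                                        # S := M, with M empty
    s_next = s | immediate_consequences(parent, s)   # S' := Gamma_P(M)
    while s < s_next:                                # S is a proper subset of S'
        s = s_next
        s_next = s | immediate_consequences(parent, s)
    return s


# Hypothetical database, chosen only for illustration.
parent = {("anne", "silvia"), ("silvia", "marc")}
anc = naive_fixpoint(parent)
```

Every pass re-derives all atoms found so far, which is the redundancy the differential fixpoint is meant to remove.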

  5. Differential Fixpoint (a.k.a. Seminaive Computation)
     Redundant computation: the jth iteration step also re-computes all atoms obtained in the (j−1)th step.
     Finite-difference techniques, tracing the derivations over two steps:
     1. S: the set of atoms obtained up to step j−1
     2. S′: the set of atoms obtained up to step j
     3. δS = Γ_R(S) − S = T_R(S) − S denotes the new atoms at step j (i.e., the atoms that were not in S at step j−1)
     4. δ′S = Γ_R(S′) − S′ = T_R(S′) − S′ are the new atoms obtained at step j+1.

  6. Differential Fixpoint Algorithm (M = ∅, for now)
     { S := M; δS := T_E(M); S′ := S ∪ δS;
       while δS ≠ ∅ { δ′S := T_R(S′) − S′; S := S′; δS := δ′S; S′ := S ∪ δS } }
     anc, δanc, and anc′, respectively, denote ancestor atoms that are in S, δS, and S′ = S ∪ δS.

  7. Rule Differentiation
     • To compute δ′S := T_R(S′) − S′ we can use a T_R defined by the following rule:
       δ′anc(X, Z) ← anc′(X, Y), parent(Y, Z).
     • Since anc′ = anc ∪ δanc, this can be rewritten as:
       δ′anc(X, Z) ← δanc(X, Y), parent(Y, Z).
       δ′anc(X, Z) ← anc(X, Y), parent(Y, Z).
     The second rule can now be eliminated, since it produces only atoms that were already contained in anc′, i.e., in the S′ computed in the previous iteration.
     Thus, for linear rules, replace δ′S := T_R(S′) − S′ by δ′S := T_R(δS) − S′.
     For nonlinear rules the rewriting is more complex.
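A minimal Python sketch of the differential fixpoint for the left-linear anc rule follows. It is an illustration only: relations are sets of tuples, the three-edge parent chain is invented, and the `differentiated` flag and `joined` counter are hypothetical additions that let the two variants, T_R(δS) and T_R(S′), be compared.

```python
def apply_recursive_rule(anc_atoms, parent):
    """T_R on the given atoms: anc(X, Z) <- anc(X, Y), parent(Y, Z)."""
    return {(x, z) for (x, y) in anc_atoms for (y2, z) in parent if y == y2}


def differential_fixpoint(parent, differentiated=True):
    s = set()                     # S := M, with M empty
    delta = set(parent)           # dS := T_E(M), the exit rule
    s_next = s | delta            # S' := S u dS
    joined = 0                    # anc atoms fed into the join: a rough cost measure
    while delta:
        # Differentiated rule joins only dS; the undifferentiated one joins all of S'.
        source = delta if differentiated else s_next
        joined += len(source)
        new_delta = apply_recursive_rule(source, parent) - s_next
        s, delta = s_next, new_delta
        s_next = s | delta
    return s_next, joined


# Hypothetical chain a -> b -> c -> d, used only as test data.
parent = {("a", "b"), ("b", "c"), ("c", "d")}
anc_diff, cost_diff = differential_fixpoint(parent, differentiated=True)
anc_full, cost_full = differential_fixpoint(parent, differentiated=False)
```

Both variants should return the same set of anc atoms; the differentiated one feeds fewer atoms into the join on this input.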

  8. Nonlinear Rules
     ancs(X, Y) ← parent(X, Y).
     ancs(X, Z) ← ancs(X, Y), ancs(Y, Z).
     The differentiated recursive rule r is first split into r1 and r2:
     r: δ′ancs(X, Z) ← ancs′(X, Y), ancs′(Y, Z).
     r1: δ′ancs(X, Z) ← δancs(X, Y), ancs′(Y, Z).
     r2: δ′ancs(X, Z) ← ancs(X, Y), ancs′(Y, Z).
     Now, we can rewrite r2 as:
     r2,1: δ′ancs(X, Z) ← ancs(X, Y), δancs(Y, Z).
     r2,2: δ′ancs(X, Z) ← ancs(X, Y), ancs(Y, Z).
     Rule r2,2 produces only 'old' values, and can be eliminated. We are left with rules r1 and r2,1:
     δ′ancs(X, Z) ← δancs(X, Y), ancs′(Y, Z).
     δ′ancs(X, Z) ← ancs(X, Y), δancs(Y, Z).
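The two surviving rules, r1 and r2,1, can be sketched in Python as follows. This is an illustrative sketch, not a definitive implementation: relations are sets of pairs, `compose` is a hypothetical helper that performs the equijoin on the shared variable Y, and the five-node chain is made-up test data.

```python
def compose(left, right):
    """Join left(X, Y) with right(Y, Z) on Y and project onto (X, Z)."""
    successors = {}
    for (y, z) in right:
        successors.setdefault(y, set()).add(z)
    return {(x, z) for (x, y) in left for z in successors.get(y, ())}


def seminaive_quadratic(parent):
    ancs = set()                  # S
    delta = set(parent)           # dS, from the exit rule
    ancs_next = ancs | delta      # S'
    while delta:
        new_delta = compose(delta, ancs_next)   # r1:   dancs(X, Y), ancs'(Y, Z)
        new_delta |= compose(ancs, delta)       # r2,1: ancs(X, Y),  dancs(Y, Z)
        new_delta -= ancs_next                  # keep only atoms that are really new
        ancs, delta = ancs_next, new_delta
        ancs_next = ancs | delta
    return ancs_next


# Hypothetical chain n0 -> n1 -> n2 -> n3 -> n4.
nodes = ["n0", "n1", "n2", "n3", "n4"]
parent = {(nodes[i], nodes[i + 1]) for i in range(len(nodes) - 1)}
ancs = seminaive_quadratic(parent)
```

On a chain, the result should contain every pair (ni, nj) with i < j.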

  9. Seminaive Fixpoint (cont.)
     • Analogy with symbolic differentiation.
     • Performance improvements: it is typically the case that n = |δS| << N = |S| ≈ |S′|.
     • The original ancs rule, for instance, requires the equijoin of two relations of size N; after the differentiation we need to compute two equijoins, each joining a relation of size n with one of size N.

  10. General Nonlinear Rules
      A recursive rule of rank k is as follows:
      r: Q0 ← c0, Q1, c1, Q2, …, Qk, ck
      It is rewritten as follows:
      r1: δ′Q0 ← c0, δQ1, c1, Q′2, …, Q′k, ck
      r2: δ′Q0 ← c0, Q1, c1, δQ2, …, Q′k, ck
      …
      rk: δ′Q0 ← c0, Q1, c1, Q2, …, δQk, ck
      Thus the jth rule has the form:
      rj: δ′Q0 ← … Q … δQj … Q′ …
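The rewriting pattern can be sketched as a small rule generator in Python. The sketch makes several assumptions: goals are plain strings, the non-recursive goals c0 … ck are omitted, `d` stands for δ and a trailing `'` for the prime, and the helper names `mark` and `differentiate` are hypothetical.

```python
def mark(goal, prefix="", suffix=""):
    """Decorate the predicate name of a goal such as 'ancs(X, Y)'."""
    name, args = goal.split("(", 1)
    return f"{prefix}{name}{suffix}({args}"


def differentiate(head, recursive_goals):
    """Return the k differentiated rules for a rule with k recursive goals."""
    new_head = mark(head, prefix="d'")
    rules = []
    for j in range(len(recursive_goals)):
        body = [mark(g) for g in recursive_goals[:j]]                     # Q1 .. Q(j-1)
        body.append(mark(recursive_goals[j], prefix="d"))                 # dQj
        body += [mark(g, suffix="'") for g in recursive_goals[j + 1:]]    # Q'(j+1) .. Q'k
        rules.append(f"{new_head} <- {', '.join(body)}.")
    return rules


rules = differentiate("ancs(X, Z)", ["ancs(X, Y)", "ancs(Y, Z)"])
```

For the quadratic ancs rule this should reproduce the two rules kept on slide 8.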

  11. Iterated Fixpoint Computation for a program P stratified in n strata
      Let Pj, 1 ≤ j ≤ n, denote the rules with their head in the j-th stratum. Then, let Mj be inductively constructed as follows:
      1. M0 = ∅, and
      2. Mj = Γ_Pj↑ω(Mj−1).
      The naïve fixpoint algorithm remains the same, but M := Mj−1 and P is replaced by Pj.
      Theorem: Let P be a positive program stratified in n strata, and let Mn be the result produced by the iterated fixpoint computation. Then, Mn = lfp(T_P).
      For programs with negated goals the computation by strata is necessary to produce the correct result (i.e., Mn is the stable model for P; not discussed here).
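A rough Python sketch of the stratum-by-stratum computation follows. Everything concrete in it is an assumption made for illustration: atoms are (predicate, arguments) pairs, each rule is a function from the current model to derived atoms, and the `unrelated` predicate with its negated anc goals is a hypothetical second stratum that does not appear in the notes.

```python
def saturate(rules, model):
    """Iterate the rules of one stratum to a fixpoint, starting from M(j-1)."""
    current = set(model)
    while True:
        derived = set()
        for rule in rules:
            derived |= rule(current)
        if derived <= current:
            return current
        current |= derived


def iterated_fixpoint(strata):
    model = set()                       # M0 is empty
    for rules in strata:                # Mj is computed from M(j-1)
        model = saturate(rules, model)
    return model


# Hypothetical database facts, treated as body-less rules of the first stratum.
FACTS = {("parent", ("a", "b")), ("parent", ("a", "c")),
         ("person", ("a",)), ("person", ("b",)), ("person", ("c",))}


def facts(m):
    return set(FACTS)


def anc_exit(m):      # anc(X, Y) <- parent(X, Y).
    return {("anc", args) for (p, args) in m if p == "parent"}


def anc_rec(m):       # anc(X, Z) <- anc(X, Y), parent(Y, Z).
    ancs = [args for (p, args) in m if p == "anc"]
    pars = [args for (p, args) in m if p == "parent"]
    return {("anc", (x, z)) for (x, y) in ancs for (y2, z) in pars if y == y2}


def unrelated(m):     # unrelated(X, Y) <- person(X), person(Y), X != Y, not anc(X, Y), not anc(Y, X).
    people = [args[0] for (p, args) in m if p == "person"]
    return {("unrelated", (x, y)) for x in people for y in people
            if x != y and ("anc", (x, y)) not in m and ("anc", (y, x)) not in m}


model = iterated_fixpoint([[facts, anc_exit, anc_rec], [unrelated]])
```

The negated goals are evaluated only after the anc stratum is complete, which is why the second stratum can test membership in the model directly.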

  12. Bottom-Up versus Top-Down Computation
      Compiled rules:
      anc(X, Y) ← parent(X, Y).
      anc(X, Z) ← anc(X, Y), parent(Y, Z).
      parent(X, Y) ← father(X, Y).
      parent(X, Y) ← mother(X, Y).
      Database:
      mother(anne, silvia).
      mother(silvia, marc).
      • The differential fixpoint is computed in a bottom-up fashion. For a query ?anc(X, Y) this is optimal.
      • But for many queries, such as ?anc(marc, Y), we want to propagate the 'marc' constraint down. The same holds for the query forms ?anc($X, Y), ?anc(X, $Y), or ?anc($X, $Y).

  13. Specialization for Left-linear Recursive Rules
      ?anc(tom, Desc).
      anc(Old, Young) ← parent(Old, Young).
      anc(Old, Young) ← anc(Old, Mid), parent(Mid, Young).
      This is changed into:
      ?anc(tom, Desc).
      anc(Old/tom, Young) ← parent(Old/tom, Young).
      anc(Old/tom, Young) ← anc(Old/tom, Mid), parent(Mid, Young).
      This is similar to the pushing of selections inside recursion done by query optimizers.
      This works for left-linear rules with the query form ?anc($Someone, Desc).
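The effect of the specialization can be sketched in Python: with Old fixed to the query constant, the bottom-up computation only ever derives atoms of the form anc(tom, _). The sketch assumes parent is a set of (Old, Young) pairs; the facts and the function name `descendants_of` are hypothetical.

```python
def descendants_of(person, parent):
    """Bottom-up evaluation of the program specialized to anc(person, Young)."""
    children = {}
    for (old, young) in parent:
        children.setdefault(old, set()).add(young)

    # Specialized exit rule: anc(tom, Young) <- parent(tom, Young).
    found = set()
    delta = set(children.get(person, ()))
    while delta:
        found |= delta
        # Specialized recursive rule: anc(tom, Young) <- anc(tom, Mid), parent(Mid, Young).
        delta = {young for mid in delta for young in children.get(mid, ())} - found
    return found


# Hypothetical facts; the sue/joe family is never touched by the query.
parent = {("tom", "ann"), ("ann", "bob"), ("sue", "joe")}
desc = descendants_of("tom", parent)
```

Only the second argument is stored, since the first one is the constant from the query.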

  14. Right-linear Rules
      anc(Old, Young) ← parent(Old, Young).
      anc(Old, Young) ← parent(Old, Mid), anc(Mid, Young).
      Descendants of Tom: ?anc(tom, X)
      • This query can no longer be implemented by specializing the program. Solution: turn the rules into equivalent left-linear ones!
      • The situation is symmetric. A query such as ?anc(X, $Y) cannot be supported on the left-linear version of the program, but that program can be transformed into the right-linear rules above, to which specialization applies.
      • For each left (right) linear program there exists an equivalent right (left) linear program, similar to regular grammars in programming languages.
      • Deductive database compilers do that.

  15. The Magic Set Method
      • Specialization only works for left/right linear programs. It does not work in general, even for linear rules. The same generation example:
        sg(A, A).
        sg(X, Y) ← parent(XP, X), sg(XP, YP), parent(YP, Y).
        ?sg(marc, Who).
      • This program cannot be computed in a bottom-up fashion because the exit rule is not safe.
      • We can compute a "magic" set containing all the ancestors of marc and add it to the two rules.

  16. Magic Sets for Non-recursive Rules
      • Find the graduating seniors and their parents' addresses:
        spa(SN, PN, Paddr) ← senior(SN), parent(SN, PN), address(PN, Paddr).
        senior(SN) ← student(SN, _, senior), graduating(SN).
      • To find the address of the parent named 'Joe Doe': ?spa(SN, 'Joe Doe', Paddr)
      • Suppose that computing parent(X, $Y) is safe and not too expensive.

  17. Magic Set Rewriting
      spa_q('Joe Doe').
      m.senior(SN) ← spa_q(PN), parent(SN, PN).
      senior(SN) ← m.senior(SN), student(SN, _, senior), graduating(SN).
      The rest remains unchanged:
      spa(SN, PN, Paddr) ← senior(SN), parent(SN, PN), address(PN, Paddr).
      ?spa(SN, 'Joe Doe', Paddr).
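The rewritten non-recursive program can be sketched in Python as set comprehensions. All data below is invented for illustration, the relation layouts (e.g. student as (name, major, year)) are assumptions, and the query constant is taken to bind the parent name PN, as in the query ?spa(SN, 'Joe Doe', Paddr).

```python
# Hypothetical database.
student = {("ann", "cs", "senior"), ("bob", "ee", "senior"), ("cat", "cs", "junior")}
graduating = {"ann", "bob"}
parent = {("ann", "Joe Doe"), ("bob", "Mary Roe"), ("cat", "Joe Doe")}   # (SN, PN)
address = {("Joe Doe", "1 Elm St"), ("Mary Roe", "2 Oak St")}            # (PN, Paddr)


def spa_with_magic(parent_name):
    spa_q = {parent_name}                                   # spa_q('Joe Doe').
    # m.senior(SN) <- spa_q(PN), parent(SN, PN).
    m_senior = {sn for (sn, pn) in parent if pn in spa_q}
    # senior(SN) <- m.senior(SN), student(SN, _, senior), graduating(SN).
    senior = {sn for (sn, _, year) in student
              if sn in m_senior and year == "senior" and sn in graduating}
    # spa(SN, PN, Paddr) <- senior(SN), parent(SN, PN), address(PN, Paddr).
    return {(sn, pn, addr)
            for (sn, pn) in parent if sn in senior and pn == parent_name
            for (pn2, addr) in address if pn2 == pn}


answers = spa_with_magic("Joe Doe")
```

The magic predicate restricts senior to the children of the named parent, so students unrelated to the query are never tested for seniority or graduation.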

  18. The Same Generation Example
      sg(A, A).
      sg(X, Y) ← parent(XP, X), sg(XP, YP), parent(YP, Y).
      ?sg(marc, Who).
      • This program cannot be computed in a bottom-up fashion because the exit rule is not safe.
      • We can compute a "magic" set containing all the ancestors of marc and add it to the two rules.
      • The magic set computation utilizes the bound arguments and goals in rules (blue). The first argument of sg is bound in the query. Thus X is bound, and through the goal parent(XP, X) the binding is passed to XP in the recursive goal. The variables Y and YP remain unbound.

  19. Magic Sets (cont.)
      Magic set rules:
      m.sg(marc).
      m.sg(XP) ← m.sg(X), parent(XP, X).
      Transformed rules:
      sg′(X, X) ← m.sg(X).
      sg′(X, Y) ← parent(XP, X), sg′(XP, YP), parent(YP, Y), m.sg(X).
      Query: ?sg′(marc, Who).
      • The rules for the magic predicates are built by using: (1) the query constant as the exit rule (a fact); (2) the bound arguments and predicates from the recursive rules, but the head and tail must be switched!
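The magic set rules and the transformed sg′ rules can be sketched in Python as two bottom-up loops. This is a sketch under assumptions: parent is a set of (parent, child) pairs, the small family below is invented, and the function name `same_generation` is hypothetical.

```python
def same_generation(parent, start):
    parents_of, children_of = {}, {}
    for (p, c) in parent:                       # parent(P, C): P is a parent of C
        parents_of.setdefault(c, set()).add(p)
        children_of.setdefault(p, set()).add(c)

    # Magic set rules: m.sg(start).  m.sg(XP) <- m.sg(X), parent(XP, X).
    magic = {start}
    frontier = {start}
    while frontier:
        frontier = {xp for x in frontier for xp in parents_of.get(x, ())} - magic
        magic |= frontier

    # Transformed exit rule: sg'(X, X) <- m.sg(X).
    sg = {(x, x) for x in magic}
    delta = set(sg)
    while delta:
        # sg'(X, Y) <- parent(XP, X), sg'(XP, YP), parent(YP, Y), m.sg(X).
        new = {(x, y)
               for (xp, yp) in delta
               for x in children_of.get(xp, ()) if x in magic
               for y in children_of.get(yp, ())}
        delta = new - sg
        sg |= delta

    return {y for (x, y) in sg if x == start}   # ?sg'(start, Who).


# Hypothetical family: gp has children p1 and p2; p1 has marc and sib; p2 has cousin.
parent = {("gp", "p1"), ("gp", "p2"), ("p1", "marc"), ("p1", "sib"), ("p2", "cousin")}
who = same_generation(parent, "marc")
```

The exit rule becomes safe because X now ranges only over the magic set, i.e. marc and his ancestors.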

  20. Recursive Methods
      • There are many other recursive methods, but the magic set method is the most general and the most widely used in deductive systems, including LDL++.
