On the Minimization of XPath Queries

Paper by S. Flesca, F. Furfaro, E. Masciari. CmpE 521 presentation by Emre Yurtsever (2002701372) and Muammer Yüzügüldü (2003700183).

Presentation Transcript


  1. On the Minimization of XPath Queries Paper by S. Flesca, F. Furfaro, E. Masciari CmpE 521 Presentation Emre Yurtsever – 2002701372 Muammer Yüzügüldü – 2003700183

  2. Outline • Introduction • Trees & Tree Patterns • Problem Statement • A Framework for minimizing XPath queries • Complexity Results • Tractability Results • Conclusions & Future Works

  3. Introduction • XML queries are usually expressed by means of XPath expressions. • An XPath expression is a way of navigating an XML tree: it returns the set of nodes reached through the paths specified by the expression.

  4. Introduction – cont’d • An XPath expression can be represented graphically as a tree pattern. [Figure: an XML tree with elements <bib>, <book>, <authors>, <author>, <title>, <editor>, <genre>]

  5. Introduction – cont’d • For example: “find the titles of all the books for which at least one author is known.” • XPath expression: bib/book[//author]/title [Figure: a tree pattern with nodes <bib>, <book>, <author>, <title>; the legend marks the descendant edge and the output node]
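
The example query on this slide can be tried out on a small document. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the sample document and the helper name `titles_of_books_with_author` are made up for illustration, and the predicate `[//author]` is evaluated by hand as "the book has an author somewhere below it".

```python
import xml.etree.ElementTree as ET

# Invented sample data: one book with a known author, one without.
doc = ET.fromstring("""
<bib>
  <book><authors><author>A. Writer</author></authors><title>Known Author</title></book>
  <book><title>Anonymous</title></book>
</bib>
""")

def titles_of_books_with_author(bib):
    """Hand evaluation of bib/book[//author]/title on the element `bib`."""
    titles = []
    for book in bib.findall("book"):             # step bib/book (child edge)
        if book.find(".//author") is not None:   # predicate [//author] (descendant edge)
            titles += [t.text for t in book.findall("title")]  # step /title (output node)
    return titles

print(titles_of_books_with_author(doc))  # prints ['Known Author']
```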

  6. Introduction – cont’d • The efficiency of evaluating an XPath expression depends on its size. • Optimization ≈ minimization: to optimize a query, we should minimize its expression.

  7. Introduction – cont’d • Example query: “Retrieve the editors that published thrillers and whose authors have written a thriller.” • Key notion: “query containment”. • Reduced query: “Retrieve the editors that published a thriller.”

  8. Introduction – cont’d • For some XPath fragments the minimization problem can be solved efficiently because: • it reduces to solving a number of containment instances between pairs of tree patterns; • for these fragments, containment in turn reduces to finding a homomorphism between the two patterns.

  9. Trees & Tree Patterns • A tree t is a tuple (r_t, N_t, E_t, λ_t) where: • N_t ⊆ ℕ is the set of nodes; • λ_t : N_t → Σ is a node labelling function; • r_t ∈ N_t is the distinguished root of t; • E_t ⊆ N_t × N_t is the set of edges.

  10. Trees & Tree Patterns – cont’d • Given a tree t = (r_t, N_t, E_t, λ_t), a tree t’ = (r_t’, N_t’, E_t’, λ_t’) is a subtree of t if: • N_t’ ⊆ N_t; • the edge (n_i, n_j) belongs to E_t’ iff n_i ∈ N_t’, n_j ∈ N_t’ and (n_i, n_j) ∈ E_t.

  11. Trees & Tree Patterns – cont’d • Definition: A tree pattern p is a pair (t_p, o_p), where: • t_p = (r_p, N_p, E_p, λ_p) is a tree; • E_p is partitioned into the two disjoint sets C_p and D_p denoting, respectively, the child and descendant branches; • o_p ∈ N_p is a distinguished output node.

  12. Trees & Tree Patterns – cont’d • Grammar for XPath expressions: exp → exp | exp/exp | exp//exp | exp[exp] | σ | * | . where σ is a symbol in Σ, and the symbol ‘.’ stands for the current node. • Example: the XPath expression a[b/*//c]//d [Figure: the corresponding tree pattern, with nodes a, b, d, *, c]
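
To make the correspondence between expressions and tree patterns concrete, here is a small sketch that builds the pattern for a[b/*//c]//d by hand and writes it back as an expression. The class `PNode` and the functions `contains_output` and `to_xpath` are hypothetical names chosen for this illustration, and the serialization convention (the branch leading to the output node is the main path, every other branch becomes a predicate) is an assumption that matches the notation used on the slides.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class PNode:
    label: str                       # a symbol of the alphabet or "*"
    edge: str = "/"                  # edge from the parent: "/" (child) or "//" (descendant)
    children: List["PNode"] = field(default_factory=list)
    output: bool = False             # True for the distinguished output node

def contains_output(n: PNode) -> bool:
    return n.output or any(contains_output(c) for c in n.children)

def to_xpath(n: PNode) -> str:
    """Serialize the subpattern rooted at n in the slide notation."""
    main = None
    if not n.output:
        # Continue the path towards the output node; inside a predicate
        # (no output below) continue along the last child.
        main = next((c for c in n.children if contains_output(c)), None)
        if main is None and n.children:
            main = n.children[-1]
    s = n.label
    for c in n.children:
        if c is not main:
            s += "[" + ("//" if c.edge == "//" else "") + to_xpath(c) + "]"
    if main is not None:
        s += main.edge + to_xpath(main)
    return s

# Tree pattern of a[b/*//c]//d: a has child b and descendant d (output);
# b has child *, and * has descendant c.
pattern = PNode("a", children=[
    PNode("b", "/", [PNode("*", "/", [PNode("c", "//")])]),
    PNode("d", "//", output=True),
])

print(to_xpath(pattern))  # prints a[b/*//c]//d
```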

  13. Trees & Tree Patterns – cont’d • Given a tree t and a tree pattern p, an embedding e of p into t is a total function e : N_p → N_t such that: • e(r_p) = r_t; • ∀(x, y) ∈ C_p, e(y) is a child of e(x) in t; • ∀(x, y) ∈ D_p, e(y) is a descendant of e(x) in t; • ∀x ∈ N_p, if λ_p(x) = a (where a ≠ *) then λ_t(e(x)) = a.
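
The embedding conditions above translate fairly directly into a recursive check. The sketch below computes p(t), read here as the set of tree nodes onto which some embedding maps the output node; `TNode`, `PNode`, `images` and `evaluate` are hypothetical names, and the sample tree is invented. Because the branches of a pattern are independent, each child branch can be matched separately.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

@dataclass(eq=False)                 # eq=False keeps identity hashing, so nodes can go in sets
class TNode:
    label: str
    children: List["TNode"] = field(default_factory=list)

@dataclass(eq=False)
class PNode:
    label: str
    edge: str = "/"                  # "/" child edge, "//" descendant edge
    children: List["PNode"] = field(default_factory=list)
    output: bool = False

def descendants(t: TNode):
    for c in t.children:
        yield c
        yield from descendants(c)

def images(p: PNode, t: TNode) -> Optional[Set[TNode]]:
    """Output-node images over all embeddings of the subpattern rooted at p
    that map p to t; None if no such embedding exists."""
    if p.label != "*" and p.label != t.label:
        return None                                  # label condition violated
    found = {t} if p.output else set()
    for c in p.children:
        cands = t.children if c.edge == "/" else descendants(t)
        sub = [r for r in (images(c, u) for u in cands) if r is not None]
        if not sub:
            return None                              # this branch cannot be embedded
        found.update(*sub)
    return found

def evaluate(p: PNode, t: TNode) -> Set[TNode]:
    res = images(p, t)                               # the root must map to the root
    return set() if res is None else res

# Invented tree: only the first book has an author.
title1, title2 = TNode("title"), TNode("title")
bib = TNode("bib", [
    TNode("book", [TNode("authors", [TNode("author")]), title1]),
    TNode("book", [title2]),
])
# Pattern for bib/book[//author]/title
pattern = PNode("bib", children=[
    PNode("book", "/", [PNode("author", "//"), PNode("title", "/", output=True)]),
])
print(evaluate(pattern, bib) == {title1})  # prints True
```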

  14. Trees & Tree Patterns – cont’d • Models and Canonical Models of Tree Patterns • The models of a tree pattern p defined over the alphabet Σ are the trees of T into which p can be embedded. The set of models of p is Mod(p) = {t ∈ T | p(t) ≠ ∅} • Canonical models of a tree pattern p are models having the same shape as p. That is, a canonical

  15. Trees & Tree Patterns – cont’d Model and Canonical Model of a tree pattern

  16. Trees & Tree Patterns – cont’d • Given two tree patterns p1, p2, we say that p1 is contained in p2 (p1 ⊆ p2) iff ∀t, p1(t) ⊆ p2(t). • We say that p1 and p2 are equivalent (p1 ≡ p2) iff p1 ⊆ p2 and p2 ⊆ p1 (i.e. ∀t, p1(t) = p2(t)). • The set of patterns which are equivalent to a given pattern p will be denoted as Eq(p).

  17. Trees & Tree Patterns – cont’d • Notations on tree patterns. [Figure: a pattern p and its subpatterns sp_b, sp_d, sp_a]

  18. Trees & Tree Patterns – cont’d • Tree pattern p whose root has 2 children Subpattern examples

  19. Problem Statement “Given a tree pattern p, construct a tree pattern p_min which is equivalent to p and has minimum size (i.e. size(p_min) = minsize(p))”

  20. Problem Statement – cont’d • (Property 1) A minimum size tree pattern equivalent to p can be found among the subpatterns of p; • the containment between two tree patterns p, q (p ⊆ q) is equivalent to the problem of finding a homomorphism from q to p. A homomorphism h from a pattern q to a pattern p is a total mapping from the nodes of q to the nodes of p such that: • h preserves node types (i.e. ∀u ∈ N_q, λ_q(u) ≠ ‘*’ ⇒ λ_q(u) = λ_p(h(u))); • h preserves structural relationships (i.e. whenever v is a child (resp. descendant) of u in q, h(v) is a child (resp. descendant) of h(u) in p).
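
A homomorphism test following the two conditions above can be sketched as a short recursion. The names `PNode`, `hom` and `has_homomorphism` are hypothetical. Two points are assumptions added for this sketch rather than statements from the slide: the root of q is mapped to the root of p, and the output node of q is mapped to the output node of p.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass(eq=False)
class PNode:
    label: str
    edge: str = "/"                  # edge from the parent: "/" child, "//" descendant
    children: List["PNode"] = field(default_factory=list)
    output: bool = False

def descendants(n: PNode):
    for c in n.children:
        yield c
        yield from descendants(c)

def hom(u: PNode, x: PNode) -> bool:
    """Can the subpattern of q rooted at u be mapped into p with u sent to x?"""
    if u.label != "*" and u.label != x.label:
        return False                                 # node types must be preserved
    if u.output and not x.output:
        return False                                 # assumption: output maps to output
    for v in u.children:
        if v.edge == "/":
            # a child edge of q must land on a child edge of p
            cands = [y for y in x.children if y.edge == "/"]
        else:
            # a descendant edge of q may land on any proper descendant in p
            cands = list(descendants(x))
        if not any(hom(v, y) for y in cands):
            return False
    return True

def has_homomorphism(q: PNode, p: PNode) -> bool:
    return hom(q, p)                                 # assumption: root maps to root

# p = a[b/c]/d and q = a[b]//d (d is the output node in both)
p = PNode("a", children=[PNode("b", "/", [PNode("c")]), PNode("d", "/", output=True)])
q = PNode("a", children=[PNode("b"), PNode("d", "//", output=True)])

print(has_homomorphism(q, p))  # prints True  (q maps into p, so p ⊆ q)
print(has_homomorphism(p, q))  # prints False
```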

  21. Problem Statement– cont’d A homomorphism between two tree patterns

  22. Problem Statement– cont’d Two tree patterns not related with homomorphism

  23. A framework for minimizing XPath Queries • Two fundamental contributions: • proving that Property 1 holds for XP{/, //, [], *}; • an algorithm for minimizing a tree pattern query.

  24. Proving that Property 1 holds for XP{/, //, [], *} • Various lemmas are introduced. • Lemma 1: Let p and q be two patterns with root r, such that p ⊆ q. Then, for each subpattern q_j ∈ P(q) there exists a subpattern p_i ∈ P(p) such that p_i ⊆ q_j.

  25. Example for Lemma 1

  26. Proving that Property 1 holds for XP{/, //, [], *} • Lemma 2: Let p and q be two patterns rooted in r such that p ≡ q, and let m and n, with m > n, be the number of children of r in p and q, respectively. Then there exists a set S ⊆ SP(p) consisting of m − n subpatterns sp_i such that p − S ≡ p.

  27. Example for Lemma 2

  28. Proving that Property 1 holds for XP{/, //, [], *} • Lemma 3: Let p and q be two equivalent patterns rooted in r having the same number of child and descendant nodes of r, and let q be of minimum size. Then there does not exist a subpattern sp_k ∈ SP(p) such that p − sp_k ≡ p.

  29. Proving that Property 1 holds for XP{/, //, [], *} • Lemma 4: Let p and q be two equivalent patterns whose roots have the same number of child and descendant nodes, and let q be of minimum size. For each subpattern p_i ∈ P(p) there exists a unique subpattern q_j ∈ P(q) directly connected to r_q such that p_i ≡ q_j.

  30. Proving that Property 1 holds for XP{/, //, [], *} • Lemma 5: A pattern p in XP{/, //, [], *} is not of minimum size iff at least one of the following conditions holds: • there exists a pair of subpatterns p_i, p_j such that p_i ⊆ p_j; • there exists a subpattern p_i of p which is not of minimum size.

  31. Proving that Property 1 holds for XP{/, //, [], *} • Theorem 1: Given a pattern p in XP{/, //, [], *}, if minsize(p) = k then there exists a subpattern p_min of p such that p ≡ p_min and size(p_min) = k.

  32. An Algorithm for tree pattern minimization
      Function Minimize(p: a tree pattern): p_min, a minimum tree pattern equivalent to p
      Begin
        p_min = p;
        For each sp_i ∈ SP(p_min) do
          if (p_min − sp_i ⊆ p_min) p_min = p_min − sp_i;
        SP_new = ∅;
        For each sp_i ∈ SP(p_min) do
          SP_new = SP_new ∪ Minimize(sp_i);
        p_min = assemble(p_min, SP_new);
        return p_min;
      End
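
The pruning loop of this algorithm can be sketched in runnable form. The version below is not the paper's implementation: as a stand-in for the containment test p_min − sp_i ⊆ p_min it uses a homomorphism check, which in the presence of ‘*’ is only a sufficient condition for containment, so this sketch may leave some redundant branches in place. All names (`PNode`, `hom`, `size`, `minimize`) are hypothetical.

```python
import copy
from dataclasses import dataclass, field
from typing import List

@dataclass(eq=False)
class PNode:
    label: str
    edge: str = "/"                  # "/" child edge, "//" descendant edge
    children: List["PNode"] = field(default_factory=list)
    output: bool = False

def descendants(n):
    for c in n.children:
        yield c
        yield from descendants(c)

def hom(u, x):
    """Homomorphism test (root to root, output to output): a sufficient
    condition for 'pattern rooted at x is contained in pattern rooted at u'."""
    if u.label != "*" and u.label != x.label:
        return False
    if u.output and not x.output:
        return False
    for v in u.children:
        cands = ([y for y in x.children if y.edge == "/"]
                 if v.edge == "/" else list(descendants(x)))
        if not any(hom(v, y) for y in cands):
            return False
    return True

def size(p):
    return 1 + sum(size(c) for c in p.children)

def minimize(p):
    p_min = copy.deepcopy(p)                         # leave the input untouched
    for sp in list(p_min.children):                  # try to drop each branch in turn
        rest = [c for c in p_min.children if c is not sp]
        candidate = PNode(p_min.label, p_min.edge, rest, p_min.output)
        if hom(p_min, candidate):                    # stand-in for p_min − sp ⊆ p_min
            p_min.children = rest
    # minimize the surviving subpatterns and reassemble
    p_min.children = [minimize(sp) for sp in p_min.children]
    return p_min

# a[b/c][b]/d : the branch [b] is implied by [b/c]
p = PNode("a", children=[
    PNode("b", "/", [PNode("c")]),
    PNode("b"),
    PNode("d", "/", output=True),
])
p_min = minimize(p)
print(size(p), size(p_min))  # prints 5 4
```

The branch that contains the output node is never dropped here, because the output node would have nothing to map to in the pruned pattern.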

  33. Upper Bound • Algorithm 1 works in O(b · r · |p|^2 · (w+1)^(d+1)), where: • |p| is the size of p • d is the number of descendant edges in p • w is the length of the longest chain of ‘*’ nodes in p • b is the number of branches of p • r is the maximum degree of any node of p

  34. Complexity Results • In XP{/, //, [], *} it is not possible to define an algorithm performing much better than Algorithm 1. • Lemma 6: Let p be a pattern in XP{/, //, [], *} and k a positive integer. The problem of testing whether minsize(p) > k is NP-complete.

  35. Complexity Results • Theorem 2: Let p be a pattern in XP{/, //, [], *} and k a positive integer. The problem of testing whether there exists a pattern p’ equivalent to p such that size(p’) ≤ k is coNP-complete.

  36. Tractability Results • Definition: A limited branched tree pattern p is a tree pattern in XP{/, //, [], *} such that: • every non-leaf node of p may have any number of children; • if a node n has k children n_1 ... n_k, then at least k − 1 of the subpatterns sp_{n_i} (where i ∈ [1..k]) are linear.
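
The definition can be checked mechanically: "at least k − 1 of the k child subpatterns are linear" is the same as "at most one child subpattern is non-linear", applied at every node. The following sketch uses hypothetical names (`PNode`, `is_linear`, `is_limited_branched`) and ignores edge types, since the definition only depends on the shape of the pattern.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class PNode:
    label: str
    children: List["PNode"] = field(default_factory=list)

def is_linear(n: PNode) -> bool:
    """A subpattern is linear if it is a single chain (no node has two children)."""
    while n.children:
        if len(n.children) > 1:
            return False
        n = n.children[0]
    return True

def is_limited_branched(n: PNode) -> bool:
    non_linear = [c for c in n.children if not is_linear(c)]
    # At most one child subpattern may branch, and it must satisfy the
    # same condition recursively (linear children satisfy it trivially).
    return len(non_linear) <= 1 and all(is_limited_branched(c) for c in non_linear)

ok = PNode("r", [
    PNode("a", [PNode("b")]),                             # linear branch
    PNode("b"),                                           # linear branch
    PNode("*", [PNode("c"), PNode("d", [PNode("a")])]),   # the single branching child
])
not_ok = PNode("r", [
    PNode("a", [PNode("b"), PNode("c")]),                 # branching child
    PNode("d", [PNode("b"), PNode("c")]),                 # second branching child
])
print(is_limited_branched(ok), is_limited_branched(not_ok))  # prints True False
```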

  37. Example [Figure: a limited branched tree pattern with nodes labelled r, a, b, c, d and *]

  38. Tractability Results • Theorem: Let p be a limited branched tree pattern. A minimum pattern p_min equivalent to p can be found in polynomial time (w.r.t. the size of p). • Linear patterns have minimum size. • The containment between pairs of linear patterns can be decided in polynomial time.

  39. Tractability Results • Algorithm 2
      Function Minimize(p: a limited branched tree pattern): p_min, a minimum tree pattern equivalent to p
      Begin
        p_min = p;
        B = {b_1, ..., b_m};
        while (B ≠ ∅)
          b = deepest(B);
          B = B − {b};
          q = sp_b;
          Red_q = ∅;
          For each q_i ∈ P(q) do
            For each q_j ∈ P(q) do
              if ((i ≠ j) ∧ (q_i is linear) ∧ (q_j ∉ Red_q) ∧ (q_j ⊆ q_i))
                Red_q = Red_q ∪ {q_i};
          q = q − Red_q;
          p_min = replace(p_min, sp_b, q);
        end while
        return p_min;
      End

  40. Conclusion • The global minimality property has been proved: a minimum tree pattern equivalent to a given tree pattern p can be found among the subpatterns of p, and thus obtained by pruning “redundant” branches from p. • The complexity of the minimization problem has been characterized, showing that the corresponding decision problem is coNP-complete.

  41. Conclusion • A “tractable” form of tree pattern, which can be minimized in polynomial time, has been studied. • Minimization algorithms for both the general and the tractable case are proposed in the paper.

  42. Future Works • Extending the minimization framework to deal with XPath queries that must satisfy some constraints, such as join conditions on tree pattern nodes. • The introduction of these constraints makes the minimization problem harder, and the global minimality property no longer holds.

  43. Questions? Thank you...
