From tree patterns to generalized tree patterns: On efficient evaluation of XQuery. Z.M. Chen, H.V. Jagadish, L.V.S. Lakshmanan, S. Paparizos (VLDB 2003) Fatih Gön 2002701366 Mehmet Şenvar 2003700221 Bogazici University Department of Computer Engineering. Overview.
Motivation: current approaches to XQuery evaluation are inefficient.
We need a concise model of XQuery as the basis for generating an efficient physical evaluation plan.
Main contribution:
Current approaches
Navigational plan (NAV): traverses down the path by recursively fetching all child nodes and filtering out the unwanted ones before the next iteration.
Baseline plan (BASE): uses TAX operators, which take a tree pattern and a sequence of trees as input. Some tree patterns may be evaluated repeatedly.
Our approach
Generalized Tree Pattern (GTP): uses a GTP as the XQuery model to generate an efficient evaluation plan.
Figure: two tree pattern queries, each a tree T with a Boolean formula F.
(a) Tree T over nodes $p, $s, $l, $g.
Boolean formula F: $p.tag = person & $s.tag = state & $l.tag = profile & $g.tag = age & $g.content > 25 & $s.content != 'MI'
(b) Tree T over nodes $p, $w, $t.
Boolean formula F: $p.tag = person & $w.tag = watches & $t.tag = watch
FOR $p IN document("auction.xml")//person, $l IN $p/profile
WHERE $l/age > 25 AND $p//state != 'MI'
RETURN <result> {$p//watches/watch} {$l/interest} </result>
(a) An XQuery example
(b) Generalized tree pattern
Nodes with group numbers: $p (0), $s (0), $l (0), $g (0), $w (1), $t (1), $i (2).
Formula F: $p.tag = person & $s.tag = state & $l.tag = profile & $i.tag = interest & $w.tag = watches & $t.tag = watch & $g.tag = age & $g.content > 25 & $s.content != 'MI'
GTP: A pair G=(T,F), where T is a tree and F is a boolean formula.
Group: each maximal set of nodes in a GTP connected to each other by paths not involving optional edges. By convention, group 0 includes the GTP root.
A pattern match of G into a collection of trees C is a partial mapping h: G → C from the pattern nodes to nodes in an XML database such that the formula associated with the pattern and the structural relationships among pattern nodes are satisfied.
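The group definition can be sketched in Python. This is a minimal illustration, not the paper's code: the edge-list encoding, the `optional` flag and the function name `groups` are assumptions, and the edges of the example GTP are inferred from the example query (RETURN arguments hang off optional edges). Group numbers other than 0 are assigned in discovery order, so only the partition is meaningful.

```python
from collections import defaultdict

def groups(root, edges):
    """Partition GTP nodes into groups: maximal sets of nodes connected
    by paths that use no optional edge. `edges` is a list of
    (parent, child, optional) triples (a hypothetical encoding)."""
    children = defaultdict(list)
    for parent, child, optional in edges:
        children[parent].append((child, optional))
    group_of = {root: 0}      # by convention, group 0 contains the root
    next_group = 1
    stack = [root]
    while stack:
        node = stack.pop()
        for child, optional in children[node]:
            if optional:      # an optional edge starts a new group
                group_of[child] = next_group
                next_group += 1
            else:             # a solid edge keeps the parent's group
                group_of[child] = group_of[node]
            stack.append(child)
    return group_of

# Assumed edges for the example query's GTP.
EDGES = [
    ("$p", "$s", False), ("$p", "$l", False), ("$l", "$g", False),
    ("$p", "$w", True), ("$w", "$t", False), ("$l", "$i", True),
]
group_of = groups("$p", EDGES)
print(group_of)
```

Here $p, $s, $l and $g share group 0, $w and $t share a second group, and $i forms a third.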
Universal GTP: a GTP G=(T,F) in which some solid edges may be labeled 'EVERY'.
The 'SOME' quantifier is already handled.
E.g. FOR $o IN document("auction.xml")//open_auction
WHERE EVERY $b IN $o/bidder SATISFIES $b/increase > 100
RETURN <result> {$o} </result>
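Figure: universal GTP for the query above.
Nodes with group numbers: $o (0), $b (1), $i (2); one edge is labeled EVERY.
F_L: pc($o,$b) & $b.tag = bidder
F_R: pc($b,$i) & $i.tag = increase & $i.content > 100
Quantified formula: ∀ $b: [F_L → ∃ $i: (F_R)]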
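The meaning of the EVERY condition can be checked on a small document. The sketch below uses Python's standard `xml.etree.ElementTree`; the sample document, the `id` attributes and the function name `every_bidder_satisfies` are made up for illustration. It evaluates the condition directly and is not the GTP-based plan.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, not taken from auction.xml.
DOC = """
<site>
  <open_auction id="a1">
    <bidder><increase>150</increase></bidder>
    <bidder><increase>200</increase></bidder>
  </open_auction>
  <open_auction id="a2">
    <bidder><increase>150</increase></bidder>
    <bidder><increase>50</increase></bidder>
  </open_auction>
  <open_auction id="a3"/>
</site>
"""

def every_bidder_satisfies(auction):
    # For every bidder child $b there must exist an increase child $i
    # of $b whose content is greater than 100.
    return all(
        any(float(i.text) > 100 for i in b.findall("increase"))
        for b in auction.findall("bidder")
    )

root = ET.fromstring(DOC)
result = [o.get("id") for o in root.iter("open_auction")
          if every_bidder_satisfies(o)]
print(result)  # ['a1', 'a3']
```

Auction a3 qualifies because it has no bidder at all, so the universal condition holds vacuously.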
Function-free XQuery is captured by the following grammar:
FLWR ::= ForClause LetClause WhereClause ReturnClause.
ForClause ::= FOR $fv1 IN E1, … , $fvn IN En.
LetClause ::= LET $lv1 := E1, … , $lvn := En.
WhereClause ::= WHERE φ(E1, … , En).
ReturnClause ::= RETURN {E1} … {En}.
Ei ::= FLWR | XPATH.
Algorithm GTP
Input: a FLWR expression Exp, a context group number g
Output: a GTP or GTPs with a join formula
if (g's last level != 0)
    let g = g + ".0";
foreach ("FOR $fv IN E") do
    parse(E, g);
let ng = g;
foreach ("LET $lv := E") do {
    let ng = ng + 1;
    parse(E, ng);
}
foreach predicate p in WHERE do {
    if (p is "EVERY El SATISFIES Er") {
        let ng = ng + 1;
        parse(El, ng);
        let F_L be the formula associated with the pattern resulting from El;
        let ng = ng + 1;
        parse(Er, ng);
        let F_R be the formula associated with the pattern resulting from Er;
    }
    else {
        foreach Ei as p's argument do
            parse(Ei, g);
    }
}
foreach "{Ei}" in RETURN do {
    let ng = ng + 1;
    parse(Ei, ng);
}
Procedure parse
Input: a FLWR expression or XPath expression E, a context group number g
Output: the part of the GTP resulting from E
if (E is a FLWR expression)
    GTP(E, g);
else buildTPQ(E);
end procedure
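The group numbering performed by the algorithm can be sketched in Python. This is a simplified illustration under stated assumptions: the `FLWR` dataclass, `bump` and `assign_groups` are hypothetical names, XPath expressions are kept as plain strings, the EVERY branch is omitted, and instead of building tree patterns (buildTPQ) the sketch only records which group number each expression receives.

```python
from dataclasses import dataclass, field

@dataclass
class FLWR:
    # Hypothetical minimal AST; each entry is an XPath string or a nested FLWR.
    for_exprs: list = field(default_factory=list)
    let_exprs: list = field(default_factory=list)
    where_args: list = field(default_factory=list)   # args of non-EVERY predicates
    return_args: list = field(default_factory=list)

def bump(g):
    """Increment the last level of a dotted group number, e.g. '1.0' -> '1.1'."""
    *prefix, last = g.split(".")
    return ".".join(prefix + [str(int(last) + 1)])

def assign_groups(exp, g="0", out=None):
    """Return (expression, group number) pairs in processing order."""
    out = [] if out is None else out
    if g.split(".")[-1] != "0":
        g += ".0"                      # open a new nesting level

    def parse(e, grp):
        if isinstance(e, FLWR):
            assign_groups(e, grp, out)  # nested FLWR: recurse
        else:
            out.append((e, grp))        # stands in for buildTPQ(E)

    for e in exp.for_exprs:
        parse(e, g)
    ng = g
    for e in exp.let_exprs:
        ng = bump(ng)
        parse(e, ng)
    for e in exp.where_args:
        parse(e, g)
    for e in exp.return_args:
        ng = bump(ng)
        parse(e, ng)
    return out

example = FLWR(
    for_exprs=['document("auction.xml")//person', "$p/profile"],
    where_args=["$l/age", "$p//state"],
    return_args=["$p//watches/watch", "$l/interest"],
)
labels = assign_groups(example)
for expr, grp in labels:
    print(grp, expr)
```

For the example query, the FOR and WHERE expressions land in group 0 and the two RETURN arguments in groups 1 and 2.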
The GTP can be informally understood as follows:
1) Find matches for all nodes connected to the root by only solid edges.
2) Next, find matches for the remaining nodes (whose path to the GTP root involves one or more dotted edges), if they exist.
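The two steps can be illustrated on the person/profile example query. The sketch below is a naive direct evaluation with Python's standard `xml.etree.ElementTree`, not the physical plan built from the GTP; the sample document, the `id` attributes and the function name `evaluate` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data.
DOC = """
<site>
  <person id="p1">
    <address><state>NY</state></address>
    <profile><age>30</age><interest>art</interest></profile>
    <watches><watch>w1</watch></watches>
  </person>
  <person id="p2">
    <address><state>MI</state></address>
    <profile><age>40</age></profile>
  </person>
  <person id="p3">
    <address><state>CA</state></address>
    <profile><age>35</age></profile>
  </person>
</site>
"""

def evaluate(root):
    results = []
    for p in root.iter("person"):
        for l in p.findall("profile"):
            # Step 1: mandatory part (nodes reached by solid edges only).
            if not any(float(g.text) > 25 for g in l.findall("age")):
                continue
            if not any(s.text != "MI" for s in p.iter("state")):
                continue
            # Step 2: optional parts; an empty match does not discard $p.
            watches = [t.text for w in p.iter("watches")
                       for t in w.findall("watch")]
            interests = [i.text for i in l.findall("interest")]
            results.append((p.get("id"), watches, interests))
    return results

results = evaluate(ET.fromstring(DOC))
print(results)  # [('p1', ['w1'], ['art']), ('p3', [], [])]
```

Person p3 is kept although it has neither watches nor interest, because those nodes are reached through dotted edges; p2 is dropped in step 1.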
Index Scan ISp(S): outputs each node satisfying the predicate p, using an index over the input trees S.
Filter Fp(S): outputs only those trees of S that satisfy the predicate p. Order is preserved.
Sort Sb(S): sorts the input sequence of trees S on the sorting basis b.
Value Join Jp(S1,S2): joins the two input sequences of trees by a value-based comparison via the join predicate p. The output order follows the order of the left input S1.
Structural Join SJr(S1,S2): the input tree sequences S1 and S2 must be sorted on node id. The operator joins S1 and S2 on the structural relationship r between each pair. The output is sorted on S1 or S2 as needed. Variants: Outer Structural Join (OSJ), where all of S1 is included in the output, and Structural Semi-Join (SSJ), where only S1 is retained in the output.
Group By Gb(S): the input is sorted on the grouping basis b. Groups the trees on the grouping basis b.
Merge M(S1,…,Sn): the Sj's are assumed to have the same cardinality k. For each 1 <= i <= k, merges tree i from each input under an artificial root and produces an output tree. Order is preserved.
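The difference between SJ, SSJ and OSJ can be sketched in Python. The sketch assumes node ids are (start, end) region numbers, so that a is an ancestor of d exactly when a's interval strictly contains d's; this encoding and the function names are assumptions. It uses a naive nested loop for clarity rather than a sort-based structural join, and it handles only the ancestor-descendant relationship.

```python
def is_ancestor(a, d):
    """a, d are (start, end) intervals; containment means ancestorship."""
    return a[0] < d[0] and d[1] < a[1]

def sj(s1, s2):
    """Structural join: every (ancestor, descendant) pair."""
    return [(a, d) for a in s1 for d in s2 if is_ancestor(a, d)]

def ssj(s1, s2):
    """Structural semi-join: only the S1 nodes that have a match."""
    return [a for a in s1 if any(is_ancestor(a, d) for d in s2)]

def osj(s1, s2):
    """Outer structural join: every S1 node appears, padded with None."""
    out = []
    for a in s1:
        matches = [d for d in s2 if is_ancestor(a, d)]
        if matches:
            out.extend((a, d) for d in matches)
        else:
            out.append((a, None))
    return out

persons = [(1, 10), (11, 14)]   # hypothetical node ids
profiles = [(2, 5)]
print(sj(persons, profiles))    # [((1, 10), (2, 5))]
print(ssj(persons, profiles))   # [(1, 10)]
print(osj(persons, profiles))   # [((1, 10), (2, 5)), ((11, 14), None)]
```

OSJ is the variant that suits optional (dotted) edges, since an S1 node with no partner still survives.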
Figure: physical plan for the example query. A merge (M) combines RETURN ARGUMENT #1 and RETURN ARGUMENT #2; each branch applies a group-by (G) and a sort (S) on person, profile.
Operator labels in the plan: IS on person, profile, state, age, watches, watch, interest; F on content != 'MI' and content > 25; SSJ on person//state and profile/age; SJ and OSJ on person/profile, person/watches, watches/watch and profile/interest.
Legend:
F : filter
IS : tag index scan
SSJ : structural semi-join
SJ : structural join
OSJ : outer structural join
S : sort
M : merge
- simplify the GTP by eliminating nodes using a DTD or XML schema
- eliminate unnecessary operators (e.g. sorting, duplicate elimination)
Internal node elimination
a//b//c → a//c, if the schema implies every path from a to c passes through b.
Pattern before: $a, $b, $c; after: $a, $c.
a/b/c → a//c?
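The schema condition for internal node elimination can be tested on a simple schema graph. The sketch below assumes the schema is given as a dict mapping each element name to the set of element names that may appear as its children; this encoding and the names `reachable` and `can_drop_internal` are hypothetical. It covers only the a//b//c case.

```python
def reachable(schema, src, dst, banned=None):
    """True if dst can occur below src on a path that avoids `banned`."""
    seen, stack = set(), [src]
    while stack:
        node = stack.pop()
        for child in schema.get(node, ()):
            if child == banned or child in seen:
                continue
            if child == dst:
                return True
            seen.add(child)
            stack.append(child)
    return False

def can_drop_internal(schema, a, b, c):
    """a//b//c may be rewritten to a//c when no path from a to c avoids b."""
    return not reachable(schema, a, c, banned=b)

# Hypothetical schemas.
only_via_b = {"a": {"b"}, "b": {"c"}}
also_via_d = {"a": {"b", "d"}, "b": {"c"}, "d": {"c"}}
print(can_drop_internal(only_via_b, "a", "b", "c"))  # True
print(can_drop_internal(also_via_d, "a", "b", "c"))  # False
```

In the second schema a c can be reached through d without any b, so dropping $b would change the query result.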
Identifying two nodes with same tag
FOR $b IN …//book
WHERE $b/title = 'DB'
RETURN <x> {$b/title} {$b/year} </x>
Pattern before: $b, $t, $t2, $y; after: $b, $t, $y.
$t2 can be eliminated if the schema says every book has at most one title child.
Eliminate redundant leaves
FOR $a IN …./a[b]
RETURN {$a/c}
Pattern before: $a, $b, $c; after: $a, $c.
$b can be eliminated if the schema implies every a has at least one b.
Elimination of sorting
Structural join SJ on person/profile, with inputs person and profile.
Given two sorted inputs, the output will be in either person order or profile order, but in general not both.
Example: for persons "p1", "p2" and profiles "l1", "l2", the join output {p1 – l2, p2 – l1} is in person order but not in profile order.
However, if the schema implies no person can have person descendants, the output of the structural join ordered by person node id will also be in profile node id order.
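The ordering problem can be reproduced with a few node ids. The sketch assumes (start, end, level) region numbering as node ids and a nested layout in which p2 lies inside p1; the `Node` tuple, the numbers and the function names are illustrative assumptions. The join is a naive nested loop sorted on the person id.

```python
from collections import namedtuple

Node = namedtuple("Node", "name start end level")

def parent_child_join(persons, profiles):
    """person/profile join, output ordered by person node id."""
    pairs = [(p, l) for p in persons for l in profiles
             if p.start < l.start and l.end < p.end and l.level == p.level + 1]
    return sorted(pairs, key=lambda pair: pair[0].start)

def in_profile_order(pairs):
    starts = [l.start for _, l in pairs]
    return starts == sorted(starts)

# Nested persons: p2 is a descendant of p1, l1 is under p2, l2 is under p1.
nested = parent_child_join(
    [Node("p1", 1, 12, 1), Node("p2", 2, 7, 2)],
    [Node("l1", 3, 6, 3), Node("l2", 8, 11, 2)],
)
# Flat persons: no person has a person descendant.
flat = parent_child_join(
    [Node("p1", 1, 6, 1), Node("p2", 7, 12, 1)],
    [Node("l1", 2, 5, 2), Node("l2", 8, 11, 2)],
)
print([(p.name, l.name) for p, l in nested], in_profile_order(nested))
print([(p.name, l.name) for p, l in flat], in_profile_order(flat))
```

With nested persons the person-ordered output is p1-l2, p2-l1, which is not in profile order, so a sort is needed; with flat persons both orders coincide and the sort can be dropped.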
Elimination of group-by
Return argument: {$l/interest}
In general, we must group the return argument results by the FOR variable.
However, if the schema implies each profile has at most one interest subelement, then grouping on interest can be eliminated.
Elimination of duplicate elimination
$p//watches//watch
Example: watches "ws1" and "ws2" with watch elements "w1" and "w2" match as ws1: {w1, w2} and ws2: {w2}, so w2 is produced twice.
If the schema implies watches cannot have watches descendants, the duplicate elimination is unnecessary.
$p//watches/watch?
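The duplicate can be reproduced with Python's standard `xml.etree.ElementTree`. The document below is a made-up instance of the ws1/ws2 example in which ws2 is nested inside ws1; it is a naive evaluation meant only to show where the duplicate comes from.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: watches ws2 is a descendant of watches ws1.
DOC = """
<person>
  <watches id="ws1">
    <watch>w1</watch>
    <watches id="ws2">
      <watch>w2</watch>
    </watches>
  </watches>
</person>
"""

p = ET.fromstring(DOC)

# $p//watches//watch: each watches node contributes all watch descendants.
hits_desc = [t.text for ws in p.iter("watches") for t in ws.iter("watch")]

# $p//watches/watch: each watches node contributes only its watch children.
hits_child = [t.text for ws in p.iter("watches") for t in ws.findall("watch")]

print(hits_desc)   # ['w1', 'w2', 'w2']
print(hits_child)  # ['w1', 'w2']
```

With the descendant axis w2 is reached from both ws1 and ws2; with the child axis each watch is reached only from its single parent, so no duplicate arises on this data.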
Note: 1. t can not have t descendants
2. A can only have one child B
Simplifies the GTP based on child/descendant constraints and avoidance constraints.
Let C be a set of child/descendant constraints.
Let G be a GTP.
There is a unique GTP Hmin equivalent to G under C that has the smallest size among all equivalent GTPs.
The GTP simplification algorithm correctly simplifies G to Hmin in polynomial time.
Questions ?