
XPath Query Evaluation


duaa

Presentation Transcript


  1. XPath Query Evaluation

  2. Goal • Evaluate an XPath query against a given document • Find all matches • We will also consider the use of types • Complexity is important • Documents can be huge

  3. Data complexity vs. combined complexity • Two inputs to the query evaluation problem • Data (XML document) of size |D| • Query (XPath expression) of size |Q| • Usually |Q| << |D| • Polynomial data complexity • Complexity that is polynomial in |D|, possibly exponential in |Q| • Polynomial combined complexity • Complexity that is polynomial in both |D| and |Q| • Fixed-parameter tractable complexity • Complexity Poly(|D|)*f(|Q|)

  4. XPath Query Evaluation • Input: XML document D, XPath query Q • Output: a subset of the nodes of D, as defined by Q • We will follow "Efficient Algorithms for Processing XPath Queries", Gottlob, Koch, Pichler, TODS 2005

  5. Simple algorithm process-location-step(n, Q) { S := apply Q.first to n; if |Q| > 1 then for each node n' in S do process-location-step(n', Q.next) }
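
The recursion on this slide can be sketched in Python as follows. This is a minimal illustration, not the code of any real XPath engine: the `Node` class, the `AXES` table (only `child` and `descendant`), the representation of a query as a list of `(axis, tag)` pairs, the `out` list that collects matches, and the `calls` counter are all assumptions introduced here.

```python
class Node:
    """Hypothetical XML element: a tag plus an ordered list of children."""
    def __init__(self, tag, children=()):
        self.tag = tag
        self.children = list(children)

def descendants(n):
    """Yield all proper descendants of n in document order."""
    for c in n.children:
        yield c
        yield from descendants(c)

# Only two axes are modelled in this sketch.
AXES = {
    "child": lambda n: list(n.children),
    "descendant": lambda n: list(descendants(n)),
}

calls = 0  # counts recursive invocations, to make the cost visible

def process_location_step(n, steps, out):
    """Apply steps[0] to n, then recurse on every selected node.

    steps is a non-empty list of (axis, tag) pairs; '*' matches any tag.
    Matches of the last step are appended to out (duplicates are possible).
    """
    global calls
    calls += 1
    axis, tag = steps[0]
    S = [m for m in AXES[axis](n) if tag in ("*", m.tag)]
    if len(steps) > 1:
        for m in S:
            process_location_step(m, steps[1:], out)
    else:
        out.extend(S)

# Example document <a><b><c/></b><b><c/></b></a> and the query child::b/child::c.
c1, c2 = Node("c"), Node("c")
root = Node("a", [Node("b", [c1]), Node("b", [c2])])
result = []
process_location_step(root, [("child", "b"), ("child", "c")], result)
```

Because the same node can be reached through several intermediate nodes, nothing in this scheme prevents a sub-query from being re-evaluated on the same node many times.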

  6. Complexity • Worst case: in each step of Q the axis is “following” • So we apply the query in each step on O(|D|) nodes • And we get Time(|Q|)= |D|*Time(|Q|-1) • I.e. the complexity is O(|D|^|Q|)
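
Unrolling the recurrence on this slide makes the exponential bound explicit. The base case $\mathrm{Time}(1) = O(|D|)$ is an assumption added here (a single location step applied to one node selects at most $|D|$ nodes).

```latex
\begin{align*}
\mathrm{Time}(|Q|) &= |D| \cdot \mathrm{Time}(|Q|-1) \\
                   &= |D|^{2} \cdot \mathrm{Time}(|Q|-2) \\
                   &\;\;\vdots \\
                   &= |D|^{|Q|-1} \cdot \mathrm{Time}(1) \;=\; O\!\left(|D|^{|Q|}\right)
\end{align*}
```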

  7. Early Systems Performance Figure taken from Gottlob, Koch, Pichler ‘05

  8. Internet Explorer 6 Figure taken from Gottlob, Koch, Pichler ‘05

  9. IE6 – performance as a function of document size Figure taken from Gottlob, Koch, Pichler ‘05

  10. Polynomial data complexity • Polynomial data complexity is sometimes considered good even if the algorithm is exponential in the query size • But can we have polynomial combined complexity for XPath query evaluation? • Yes!

  11. Two main principles • Query parse trees: the query is divided into parts according to its structure (not to be confused with the XML tree structure) • Context-value tables: for every expression e occurring in the parse tree, compute a table of all valid combinations of context c and value v such that e evaluates to v in c

  12. XPath query parse tree • Example query: descendant::b/following-sibling::*[position() != last()]

  13. Bottom-up vs. Top-down evaluation • We will discuss two kinds of query evaluation algorithms: • Bottom-up means that the query parse tree is processed from the leaves up to the root • Top-down means that the parse tree is processed from the root to the leaves • When processing we will fill in the context-value table

  14. Bottom-up evaluation • Main idea: compute the value for each leaf for every possible context • Propagate upwards until the root • Dynamic programming algorithm to avoid re-evaluation of queries in the same context

  15. Operational semantics • Needed as a first step for evaluation algorithms • Similar ideas are used in compiler design • Here the semantics is based on the notion of contexts

  16. Contexts • The domain of contexts is C = dom × {<k,n> | 1 ≤ k ≤ n ≤ |dom|} • A context is c = <x,k,n>, where x is the context node, k is the context position, and n is the context size
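
To get a feel for the size of the context domain, the following sketch enumerates every context for a small node set. It assumes that position and size satisfy 1 ≤ k ≤ n ≤ |dom|, as in Gottlob, Koch and Pichler; the function name `contexts` and the string node identifiers are illustrative only.

```python
def contexts(dom):
    """Yield every context (x, k, n): context node, position, size."""
    size = len(dom)
    for x in dom:
        for n in range(1, size + 1):       # context size
            for k in range(1, n + 1):      # context position, at most n
                yield (x, k, n)

dom = ["n1", "n2", "n3"]
all_contexts = list(contexts(dom))
# There are |dom| * |dom| * (|dom| + 1) / 2 contexts, i.e. O(|dom|^3).
```

The cubic growth of this domain is what drives the size of the tables discussed later.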

  17. Semantics for XPath expressions • The semantics of evaluating an expression is a 4-tuple where the first 3 elements are the context, and the fourth is the value obtained by evaluation in the context

  18. Some notation • T(t): all nodes satisfying a predicate t • E(e): all nodes satisfying a regular expression e (applied with respect to a given axis) • idx_χ(x,S): the index of a node x in the set S with respect to a given axis χ and the document order

  19. Context-value table • Given a query sub-expression e, the context-value table of e specifies all combinations of context c and value v such that computing e in the context c results in v • The bottom-up algorithm follows directly: compute the context-value tables bottom-up with respect to the query parse tree
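
As an illustration of context-value tables, the sketch below builds the tables for the predicate `position() != last()` bottom-up: first the two leaves, then the comparison at their parent. Tables are modelled as Python dicts from a context `(x, k, n)` to a value; this representation and all function names are assumptions made for this example, not the data structures of the original paper.

```python
def contexts(dom):
    """All contexts (node, position, size) with 1 <= position <= size <= |dom|."""
    size = len(dom)
    return [(x, k, n) for x in dom
            for n in range(1, size + 1)
            for k in range(1, n + 1)]

def table_position(dom):
    """Leaf table for position(): the value is the context position k."""
    return {c: c[1] for c in contexts(dom)}

def table_last(dom):
    """Leaf table for last(): the value is the context size n."""
    return {c: c[2] for c in contexts(dom)}

def table_not_equal(left, right):
    """Parent table for 'left != right', combined context by context."""
    return {c: left[c] != right[c] for c in left}

dom = ["n1", "n2"]
pred = table_not_equal(table_position(dom), table_last(dom))
```

Each table is computed once and then only looked up, which is the dynamic-programming idea that avoids re-evaluating an expression in the same context.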

  20. Bottom-up algorithm

  21. Example 4 times

  22. Complexity • O(|D|^3*|Q|) space ignoring strings and numbers • O(|Q|) tables, with 3 columns, each including values in 1…|D| thus O(|D|^3*|Q|) • An extra O(|D|*|Q|) multiplicative factor for strings and numbers • O(|D|^5*|Q|) time ignoring strings and numbers • It can take O(|D|^2) to combine two nodesets • Extra O(|Q|) in case of strings and numbers
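
One way to read the bounds on this slide is as simple products. The decomposition of the time bound into "contexts times cost of combining node sets times number of tables" is an interpretation added here, not a statement from the slides.

```latex
\begin{align*}
\text{Space (node sets only):}\quad
  & \underbrace{O(|Q|)}_{\text{tables}} \cdot \underbrace{O(|D|^{3})}_{\text{contexts}}
    = O(|D|^{3}\,|Q|) \\
\text{Space (with strings and numbers):}\quad
  & O(|D|^{3}\,|Q|) \cdot O(|D|\,|Q|) = O(|D|^{4}\,|Q|^{2}) \\
\text{Time (node sets only):}\quad
  & \underbrace{O(|D|^{3})}_{\text{contexts}} \cdot \underbrace{O(|D|^{2})}_{\text{combine}}
    \cdot \underbrace{O(|Q|)}_{\text{tables}} = O(|D|^{5}\,|Q|) \\
\text{Time (with strings and numbers):}\quad
  & O(|D|^{5}\,|Q|) \cdot O(|Q|) = O(|D|^{5}\,|Q|^{2})
\end{align*}
```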

  23. Optimization • Represent contexts as pairs of current and previous node • This brings the time complexity down to O(|D|^4*|Q|^2) • Space complexity can be brought down to O(|D|^2*|Q|^2) via further optimizations

  24. Top-down evaluation • Similar idea • But computes values only for the contexts that are actually needed • Same worst-case bounds

  25. Top-down or bottom-up? • General question in processing XML trees • The tradeoff: • Usually easier to combine results computed in children to obtain the result at the parent • So bottom-up traversal is usually easier to design • On the other hand, some of the computation is redundant since we don’t know if it will become relevant • So top-down traversal may be more efficient

  26. Linear-time fragment • Core XPath includes only navigation • / and // • Core XPath can be evaluated in O(|D|*|Q|) • Observation: no need to consider the entire context triple, only the current context node • Top-down or bottom-up evaluation with essentially the same algorithm • But smaller tables (for every query node, all document nodes and values of evaluation) are maintained
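
The linear bound can be illustrated by evaluating a path one step at a time on whole node sets, so that every step costs O(|D|) regardless of how many nodes the previous step selected. The sketch below covers only predicate-free paths over the `child` and `descendant` axes, which is a small part of Core XPath; the `Node` class and function names are assumptions for this example.

```python
class Node:
    """Hypothetical XML element: a tag plus an ordered list of children."""
    def __init__(self, tag, children=()):
        self.tag = tag
        self.children = list(children)

def child_of_set(S):
    """All children of nodes in S; each tree edge is looked at once."""
    return {c for n in S for c in n.children}

def descendant_of_set(root, S):
    """All nodes with a proper ancestor in S, found in one tree traversal."""
    out = set()
    stack = [(root, False)]
    while stack:
        node, below = stack.pop()
        if below:
            out.add(node)
        inside = below or node in S
        for c in node.children:
            stack.append((c, inside))
    return out

def eval_core(root, steps):
    """Evaluate a list of (axis, tag) steps starting from {root}."""
    S = {root}
    for axis, tag in steps:
        S = child_of_set(S) if axis == "child" else descendant_of_set(root, S)
        S = {n for n in S if tag in ("*", n.tag)}
    return S

# Document <a><b><c/><b><c/></b></b><c/></a>, query descendant::b/child::c.
c1, c2, c3 = Node("c"), Node("c"), Node("c")
b2 = Node("b", [c2])
b1 = Node("b", [c1, b2])
root = Node("a", [b1, c3])
matches = eval_core(root, [("descendant", "b"), ("child", "c")])
```

With |Q| steps and O(|D|) work per step, the total is O(|D|*|Q|); the intermediate results are sets of nodes rather than tables over full context triples.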

  27. Types are helpful • Can direct the search • In some parts of the tree there is no hope of matching a given sub-expression of the query • As a result we may have tables with fewer entries • Whiteboard discussion

  28. Type Checking and Inference • Type checking a single document: straightforward • Polynomial combined complexity if the automaton representing the type is deterministic; otherwise exponential in the automaton size but polynomial in the document size • Type checking the results of an (XPath) query • Inferring the type of the results of a query
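
Type checking a single document against a deterministic type can be sketched as follows. The type here is a DTD-style description in which each tag has a deterministic finite automaton over the tags of its children; this is a simplified special case of the tree automata mentioned on the slide, and the `TYPE` table, its tag names, and the `(tag, children)` tuple encoding of documents are all hypothetical.

```python
# Hypothetical type: book = title, chapter+ ; title and chapter are empty.
TYPE = {
    "book": {
        "start": 0,
        "accept": {2},
        "delta": {(0, "title"): 1, (1, "chapter"): 2, (2, "chapter"): 2},
    },
    "title": {"start": 0, "accept": {0}, "delta": {}},
    "chapter": {"start": 0, "accept": {0}, "delta": {}},
}

def conforms(node, types):
    """Check a (tag, children) tree against the per-tag child automata."""
    tag, children = node
    dfa = types.get(tag)
    if dfa is None:
        return False                      # unknown tag
    state = dfa["start"]
    for child in children:
        state = dfa["delta"].get((state, child[0]))
        if state is None:
            return False                  # no transition: children out of order
        if not conforms(child, types):
            return False
    return state in dfa["accept"]

good = ("book", [("title", []), ("chapter", []), ("chapter", [])])
bad = ("book", [("chapter", [])])
```

Because every automaton is deterministic, each node triggers exactly one transition in its parent's automaton, so the check runs in time linear in the document size.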

  29. Type Inference • An (incomplete) algorithm for type inference can work its way to the top of the query parse tree to infer a type in a bottom-up fashion • Start by inferring a type for the leaves (simple queries), then use it for their parents • Type Inference is inherently incomplete. • Can be performed for some languages that are “regular” in a sense.

  30. Restricted language allowing for type inference • Axes: child, descendant, parent, ancestor, following-sibling, etc. • Variables can be bound to nodes in the input tree, then passed as parameters • An equality test can be performed between node IDs, but not between node values

  31. Type Checking • In addition to inferring a type we need to verify containment in another type. • Type Inference can be used as a tool for Type Checking. • Type Checking was shown to be decidable for the same language fragment, but with high complexity.

  32. Intuitive connection to text • Queries => regular expressions • Types (tree automata) => context-free languages • Type Inference => intersection of a context-free language and a regular language, resulting in a context-free one • Type Checking => Type Inference + inclusion of context-free languages (with some restrictions to guarantee decidability)
