
Containment and Equivalence for an XPath Fragment




By

Gerome Miklau

Dan Suciu

Presented By

Roy Ionas

- Presenting the problem of non-polynomial complexity for containment and equivalence of XPath fragments.
- Presenting two algorithms that improve the cost of the XPath containment and equivalence problem.
- Presenting tree patterns as an effective tool for proofs about XPath fragments.

- XPath: a simple language for navigating XML documents and selecting sets of nodes.
- With XPath we can query XML data, describe key constraints, express transformations and reference elements in remote documents.
- XPath's influence can be found in other XML query languages and standards such as XQuery, XSLT, XML Schema, XLink, XPointer and more.

- Simple XPath fragment.
- Containment between two XPath fragments.
- Equivalence between two XPath fragments.
- Complexity definitions.
- Tree patterns as a proving tool for XPath fragments.

- An XPath expression that contains the three most important features for navigation:
- Child and descendant axes: “/” and “//”.
- Wildcards: “*”.
- Qualifiers: “[]”.

- We disregard attributes, conditions, etc.
- We identify and compare nodes only by their label.
- We disregard order completely.
- Example: a//*[b//d][c]

- Are these all the features XPath has?
- Are these all the features we need to represent navigation in XML documents?

No!

Yes!

At least, these are the features needed for the proofs in this paper.

- Containment between two XPath fragments A and B means that for every XML document, the result of applying A is contained in the result of applying B.
- The result is a set of nodes; order is not considered.
- Can we check this containment over the entire (infinite) world of XML documents?
- Is there another way to determine containment between two XPath fragments?

- Equivalence between two XPath fragments A and B means that for every XML document, the result of applying A equals the result of applying B.
- The problem of equivalence can be reduced to the problem of containment:
- Equivalence = containment in both directions between the patterns.
- Conversely, containment can be reduced to equivalence in polynomial time, so the two problems have the same complexity.

- From now on we discuss only containment; the results hold for equivalence as well.

- NP stands for “Nondeterministic Polynomial time”.
- P: the class of decision problems that can be solved in polynomial time.
- NP: the class of decision problems whose proposed solutions (certificates) can be verified in polynomial time; equivalently, problems solvable in polynomial time by a nondeterministic machine. P ⊆ NP, and for the hardest problems in NP no polynomial-time algorithm is known (the best known algorithms take exponential time).
- coNP: the class of problems whose complements are in NP.
- NP-hard problem: a problem to which every problem in NP can be reduced in polynomial time (it may be even harder than the problems in NP).
- NP-complete problem: a problem that belongs to NP and is NP-hard. coNP-hard and coNP-complete are defined in the same way, with coNP in place of NP.
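The following Python sketch illustrates the difference between finding a solution and verifying one. It uses 3CNF satisfiability, the problem that appears again in the hardness proof. The clause encoding (tuples of signed integers, in the style of the DIMACS format) and the function names are our own assumptions. `verify` checks a proposed assignment in time linear in the formula, while `brute_force_sat` may have to try all 2^n assignments.

```python
from __future__ import annotations

from itertools import product


def verify(formula: list[tuple[int, ...]], assignment: dict[int, bool]) -> bool:
    """Polynomial-time certificate check: every clause contains a true literal.

    Literal k means "variable k is true", -k means "variable k is false".
    """
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in formula
    )


def brute_force_sat(formula: list[tuple[int, ...]]) -> dict[int, bool] | None:
    """Exhaustive search over all 2^n assignments (exponential time)."""
    variables = sorted({abs(lit) for clause in formula for lit in clause})
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if verify(formula, assignment):
            return assignment
    return None


# (x1 or x2 or not x3) and (not x1 or x3 or x2) and (not x2 or not x3 or x1)
phi = [(1, 2, -3), (-1, 3, 2), (-2, -3, 1)]
print(brute_force_sat(phi))  # {1: False, 2: False, 3: False}
```

Checking a certificate is cheap. The exponential cost lies entirely in the search, and no polynomial replacement for that search is known for NP-complete problems.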

- A tree pattern is an unordered tree over the alphabet of the XPath expression, plus the wildcard *.
- Each XPath step is represented by a node in the tree pattern.
- Child axes are drawn as single edges.
- Descendant axes are drawn as double-line edges.
- A k-tuple of pattern nodes, called the result tuple, marks the return nodes.
- For a tree pattern P, the arity of the result tuple is called the arity of P.
- A tree pattern P is Boolean iff its arity is 0.

- Tree patterns are more elegant and general than XPath fragments.
- We can translate from XPath to tree patterns and vice versa quite easily.

Now we can prove properties of XPath fragments using graph theory.

- The XPath expression a//*[b//d][c] yields the following tree pattern:

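[Figure: tree pattern for a//*[b//d][c]. The root a has a descendant (double) edge to a wildcard node *, which is the return node; * has child edges to b and c; b has a descendant edge to d.]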
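A small recursive-descent parser for the fragment (labels, `*`, `/`, `//`, `[]`) makes the translation from XPath to tree patterns concrete. This is a sketch under our own assumptions. The names `PNode`, `parse` and `render` are hypothetical. Qualifiers are assumed to start with a child step. The return node is taken to be the last step of the main path.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass(eq=False)  # pattern nodes compare by identity
class PNode:
    label: str  # element name or "*"
    # (axis, child) pairs; axis is "/" (child edge) or "//" (descendant edge)
    edges: list[tuple[str, PNode]] = field(default_factory=list)


TOKEN = re.compile(r"//|/|\[|\]|\*|[A-Za-z_][\w.-]*")


def tokenize(expr: str) -> list[str]:
    text = expr.replace(" ", "")
    tokens = TOKEN.findall(text)
    if "".join(tokens) != text:  # something outside the fragment was skipped
        raise ValueError(f"unsupported syntax in {expr!r}")
    return tokens


class Parser:
    def __init__(self, expr: str):
        self.tokens = tokenize(expr)
        self.pos = 0

    def peek(self) -> str | None:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected: str | None = None) -> str:
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise ValueError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    def path(self) -> tuple[PNode, PNode]:
        """path := step (('/' | '//') step)*; returns (first node, last node)."""
        first = last = self.step()
        while self.peek() in ("/", "//"):
            axis = self.take()
            nxt = self.step()
            last.edges.append((axis, nxt))
            last = nxt
        return first, last

    def step(self) -> PNode:
        """step := (label | '*') ('[' path ']')*"""
        tok = self.take()
        if tok in ("/", "//", "[", "]"):
            raise ValueError(f"expected a label or '*', got {tok!r}")
        node = PNode(tok)
        while self.peek() == "[":
            self.take("[")
            branch, _ = self.path()
            node.edges.append(("/", branch))  # qualifier = child-rooted branch
            self.take("]")
        return node


def parse(expr: str) -> tuple[PNode, PNode]:
    """Return (root, return node) of the tree pattern for expr."""
    parser = Parser(expr)
    root, ret = parser.path()
    if parser.peek() is not None:
        raise ValueError(f"trailing input at {parser.peek()!r}")
    return root, ret


def render(node: PNode, ret: PNode, axis: str = "", depth: int = 0) -> None:
    tag = "  <- return node" if node is ret else ""
    print("  " * depth + axis + node.label + tag)
    for child_axis, child in node.edges:
        render(child, ret, child_axis, depth + 1)


root, ret = parse("a//*[b//d][c]")
render(root, ret)
# Prints:
# a
#   //*  <- return node
#     /b
#       //d
#     /c
```

Children are stored in insertion order only for convenience. The semantics treats each node's edges as an unordered set, matching the unordered tree patterns on the slide.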

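- An embedding maps a tree pattern into an XML tree.
- Think of it as a function from pattern nodes to XML nodes that must:
- Preserve the root.
- Respect node labels (a * node can map to a node with any label).
- Respect edge relationships (a child edge maps to a parent-child pair, a descendant edge to an ancestor-descendant pair).

- After embedding, the result is formed from the XML nodes that the return nodes map to (together with the subtrees below them).
- For Boolean patterns, the result is true iff such an embedding exists.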
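Here is a minimal Python sketch of embedding-based evaluation. It assumes patterns are built by hand from a hypothetical `PNode` class and documents are parsed with the standard `xml.etree.ElementTree` module. `matches` decides whether a pattern subtree embeds with a given pattern node mapped to a given XML node, which is the Boolean case. `evaluate` collects the XML nodes that the return node can be mapped to. Candidates are enumerated naively, so this illustrates the semantics rather than an efficient evaluator. The sample document is our own.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based equality and hashing for pattern nodes
class PNode:
    label: str                                 # element name or "*"
    edges: list = field(default_factory=list)  # (axis, PNode), axis "/" or "//"


def targets(x, axis):
    """XML nodes an edge may map to: children for "/", proper descendants for "//"."""
    if axis == "/":
        return list(x)
    return [d for d in x.iter() if d is not x]


def label_ok(p, x):
    return p.label == "*" or p.label == x.tag


def matches(p, x):
    """True iff the pattern subtree rooted at p embeds with p mapped to x."""
    return label_ok(p, x) and all(
        any(matches(c, t) for t in targets(x, axis)) for axis, c in p.edges
    )


def on_path(p, ret):
    """True iff the return node lies in the pattern subtree rooted at p."""
    return p is ret or any(on_path(c, ret) for _, c in p.edges)


def images(p, x, ret):
    """Images of `ret` over all embeddings of p's subtree that map p to x."""
    if p is ret:
        return {x} if matches(p, x) else set()
    if not label_ok(p, x):
        return set()
    result = set()
    for axis, c in p.edges:
        if on_path(c, ret):
            for t in targets(x, axis):
                result |= images(c, t, ret)
        elif not any(matches(c, t) for t in targets(x, axis)):
            return set()  # a side branch cannot be embedded
    return result


def evaluate(pattern_root, ret, doc_root):
    """Unary pattern: the pattern root must map to the document root."""
    return images(pattern_root, doc_root, ret)


# Pattern for a//*[b//d][c]; `star` is the return node.
star = PNode("*", [("/", PNode("b", [("//", PNode("d"))])), ("/", PNode("c"))])
pattern = PNode("a", [("//", star)])

doc = ET.fromstring("<a><s><t><b><d/></b><c/></t></s><x><c/></x></a>")
print(sorted(n.tag for n in evaluate(pattern, star, doc)))  # ['t']
print(matches(pattern, doc))  # Boolean reading of the same pattern: True
```

Only t qualifies. It is a proper descendant of the root a, it has a child b with a descendant d, and it has a child c. The element x has a c child but no b child.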

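[Figure: an embedding of a tree pattern into an XML tree. The pattern has nodes a, *, b, c and d; the XML tree has nodes labeled a, s, t, b, c and d.]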

- Testing containment between two XPath fragments that use /, //, * and [] is a coNP-complete problem.
- Hardness can be proven by a reduction from 3CNF satisfiability to non-containment, which makes containment coNP-hard.

When do we need to test for containment or equivalence between fragments?

- In almost all the applications we described so far, for example:
- Inference of keys.
- Optimization of XPath queries.

I guess we care...

- An algorithm that is both efficient (polynomial time) and complete for this problem would imply P = NP, so finding one is very unlikely.

So we settle for one of two compromises:
- An algorithm that is efficient but not complete.
- An algorithm that is complete but not always efficient.

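- A homomorphism h from a tree pattern p’ to a tree pattern p is a function h: Nodes(p’) → Nodes(p) that satisfies the following conditions:
- Root preserving: h maps the root of p’ to the root of p.
- Label preserving: for each node x in p’, label(x) is either * or equal to label(h(x)).
- Edge preserving: a child edge (x, y) in p’ maps to a child edge (h(x), h(y)) in p, and a descendant edge (x, y) maps to a downward path of length at least 1 from h(x) to h(y).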

- Whether a homomorphism exists between two patterns can be decided efficiently (in polynomial time).
- The algorithm is sound: whenever there is a homomorphism from p’ to p, p ⊆ p’.
- So the existence of a homomorphism is always a sufficient condition for containment.
- But is it a necessary condition?
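The homomorphism test can be sketched in Python as a memoized search over pairs of pattern nodes. The sketch assumes Boolean patterns, so return nodes are ignored, and uses a hypothetical `PNode` class. `homomorphism_exists(src, dst)` checks the three conditions above. Each (source node, target node) pair is decided once, which keeps the check polynomial.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # nodes compare and hash by identity
class PNode:
    label: str                                 # element name or "*"
    edges: list = field(default_factory=list)  # (axis, PNode), axis "/" or "//"


def proper_descendants(y):
    """All nodes strictly below y in its pattern, over any kind of edge."""
    for _, c in y.edges:
        yield c
        yield from proper_descendants(c)


def homomorphism_exists(src, dst):
    """Is there a root-preserving homomorphism h: Nodes(src) -> Nodes(dst)?"""
    memo = {}

    def can_map(x, y):
        if (x, y) not in memo:
            ok = x.label == "*" or x.label == y.label  # label preserving
            for axis, c in x.edges:
                if not ok:
                    break
                if axis == "/":  # child edge -> child edge
                    candidates = [z for ax, z in y.edges if ax == "/"]
                else:            # descendant edge -> path of length >= 1
                    candidates = list(proper_descendants(y))
                ok = any(can_map(c, z) for z in candidates)
            memo[(x, y)] = ok
        return memo[(x, y)]

    return can_map(src, dst)  # root preserving


p = PNode("a", [("/", PNode("b")), ("/", PNode("c"))])  # a[b][c]
q = PNode("a", [("/", PNode("*"))])                      # a/*
print(homomorphism_exists(q, p))  # True: so a[b][c] is contained in a/*
print(homomorphism_exists(p, q))  # False: the test alone is inconclusive here
```

A `True` answer proves containment. A `False` answer proves nothing by itself, which is the incompleteness discussed next.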

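[Figure: a homomorphism between two tree patterns, one with nodes a and *, the other with nodes a, b and c; it maps a to a and pairs the wildcard * with b.]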

- A homomorphism between the two tree patterns does not exist, even though they are equivalent.

[Figure: two equivalent tree patterns, each with nodes labeled a, * and b, with no homomorphism between them.]
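One standard pair with these properties is a/*//b and a//*/b. This is our own choice and not necessarily the pair drawn on the slide. Both patterns say "the root is labeled a and some node labeled b lies at depth at least 2", so they are equivalent. Yet neither has a homomorphism into the other, because a child edge must map onto a child edge. In a//*/b the only edge leaving a is a descendant edge, and in a/*//b no node below a has an outgoing child edge. Both patterns are linear, so a tree matches one of them iff some root-to-node path does. The bounded check below over label sequences is therefore a sanity check of the equivalence, not a proof. The label "x" stands for any other label.

```python
import re
from itertools import product


def steps(expr):
    """Split a linear expression like "a/*//b" into (axis, label) steps."""
    return re.findall(r"(//|/)?([^/]+)", expr)


def matches_path(pattern, path):
    """Can the linear pattern embed into the label sequence `path` (root first)?"""
    def go(i, pos):
        # Step i of the pattern is mapped to position pos of the path.
        label = pattern[i][1]
        if label != "*" and label != path[pos]:
            return False
        if i == len(pattern) - 1:
            return True
        axis = pattern[i + 1][0]
        nxt = [pos + 1] if axis == "/" else range(pos + 1, len(path))
        return any(go(i + 1, q) for q in nxt if q < len(path))

    return len(path) > 0 and go(0, 0)


P1, P2 = steps("a/*//b"), steps("a//*/b")


def first_disagreement(max_len, labels=("a", "b", "x")):
    """Shortest path (up to max_len) on which P1 and P2 disagree, else None."""
    for n in range(1, max_len + 1):
        for path in product(labels, repeat=n):
            if matches_path(P1, path) != matches_path(P2, path):
                return path
    return None


print(first_disagreement(7))  # None
```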

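The homomorphism test is complete (a homomorphism exists whenever containment holds) for:
- Fragments that contain only * and [] (no //).
- Fragments that contain only // and [] (no *).
- Fragments that contain all three features but can be translated, without changing the semantics, into an expression that belongs to one of the above.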

To summarize, the homomorphism algorithm is:
- Sound.
- Efficient.
- Incomplete in general.

Next we look for an algorithm that is sound and complete, and efficient at least in some cases.

- Reduce the problem of containment between two XPath fragments to containment between two regular tree languages, by translating each tree pattern into a tree automaton.
- The algorithm is complete: the translation rules are exact, so each automaton accepts precisely the trees matched by its pattern.

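- Defined on ranked trees.
- Bottom-up structure: states are computed from the leaves up to the root.
- A tree is accepted iff the state reached at its root is an accepting state.
- Processing starts at the leaves, whose states come from transitions with no child states.
- The transitions are of the form (q1, q2, …, qn; a) → q: a node labeled a whose children are in states q1, …, qn moves to state q.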
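A deterministic bottom-up tree automaton can be run in a single post-order pass. In the sketch below, transitions are stored in a dictionary keyed by (child states, label), mirroring the (q1, …, qn; a) → q form. The example automaton is our own illustration and not from the slides. It accepts the Boolean expressions over and/or (rank 2) and 0/1 (rank 0) that evaluate to true. Trees are nested `(label, [subtrees])` tuples by assumption.

```python
from itertools import product

# (child states, label) -> state, i.e. (q1, ..., qn; a) -> q
TRANSITIONS = {((), "0"): "q0", ((), "1"): "q1"}  # leaves: rank-0 symbols
for s1, s2 in product(("q0", "q1"), repeat=2):
    TRANSITIONS[((s1, s2), "and")] = "q1" if s1 == s2 == "q1" else "q0"
    TRANSITIONS[((s1, s2), "or")] = "q1" if "q1" in (s1, s2) else "q0"
ACCEPTING = {"q1"}


def run(tree):
    """Assign states bottom-up; return the root state, or None if stuck."""
    label, children = tree
    states = [run(c) for c in children]  # children first (post-order)
    if None in states:
        return None
    return TRANSITIONS.get((tuple(states), label))


def accepts(tree):
    return run(tree) in ACCEPTING


t = ("or", [("0", []), ("and", [("1", []), ("1", [])])])
print(run(t), accepts(t))  # q1 True
```

A node with the wrong number of children, such as a unary `and`, has no matching transition. The run then gets stuck and the tree is rejected.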

- FTA: a finite tree automaton, consisting of a finite set of states and transitions of the form described above.
- An FTA can be deterministic (a DFTA).
- Each FTA A with n states can be translated into an equivalent DFTA B with at most 2^n states (each state of B is a set of states of A).
- AFTA: an alternating finite tree automaton extends the FTA definition with “AND transitions” of the form (q1, q2, …, qm) → q.
- An AFTA can also be determinized into a DFTA, at no more cost than determinizing an ordinary FTA.
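The subset construction behind determinization can be sketched without building all 2^n subsets. For a given input tree, we compute bottom-up the set of states a nondeterministic automaton can be in at each node. That set is exactly the state the DFTA would reach there. The sketch also closes each set under AND transitions, reading (q1, …, qm) → q as "a node that is in all of q1, …, qm is also in q". Several parts are our own assumptions, not the paper's exact construction:

- that reading of AND transitions,
- the "*" wildcard transitions,
- the restriction to binary trees,
- the example automaton, which accepts trees with both a b leaf and a c leaf.

```python
ANY = "*"  # assumption: a transition labeled "*" applies to nodes with any label

# Nondeterministic transitions: (child states, label, target state).
TRANSITIONS = [
    ((), "b", "B"),          # a leaf labeled b
    ((), "c", "C"),          # a leaf labeled c
    ((), ANY, "T"),          # T can be reached at every node of a binary tree
    (("T", "T"), ANY, "T"),
]
for q in ("B", "C"):         # nondeterministically pick the child carrying B or C
    TRANSITIONS += [((q, "T"), ANY, q), (("T", q), ANY, q)]

# AND transitions: a node that is in every required state also gets the target.
AND_TRANSITIONS = [(frozenset({"B", "C"}), "OK")]
ACCEPTING = {"OK"}


def states_of(tree):
    """Set of all states reachable at the root of `tree` (one DFTA state)."""
    label, children = tree
    child_sets = [states_of(c) for c in children]
    reached = {
        target
        for kids, lab, target in TRANSITIONS
        if lab in (label, ANY)
        and len(kids) == len(child_sets)
        and all(q in s for q, s in zip(kids, child_sets))
    }
    changed = True
    while changed:  # close the set under the AND transitions
        changed = False
        for required, target in AND_TRANSITIONS:
            if required <= reached and target not in reached:
                reached.add(target)
                changed = True
    return frozenset(reached)


def accepts(tree):
    return bool(states_of(tree) & ACCEPTING)


def leaf(label):
    return (label, [])


print(accepts(("f", [leaf("b"), leaf("c")])))  # True
print(accepts(("f", [leaf("b"), leaf("x")])))  # False
```

A full Det(B) would precompute a transition for every tuple of subsets. The on-the-fly version only visits the subsets that actually occur in the input tree.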

To test whether P ⊆ P’:
- Construct the DFTA A accepting the trees that match P.
- Construct the AFTA A’ accepting the trees that match P’.
- Compute the AFTA B = A × A’.
- Compute the DFTA C = Det(B).
- If lang(A) ⊆ lang(C), return true; otherwise return false.

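[Figure: two example tree patterns rooted at r, with nodes labeled a, b and *, and the question of whether one is contained in the other.]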

- States(A) = Nodes(p).
- For each node x with children x1, …, xk, we add a transition (x1, x2, …, xk; x) → x.
- For each descendant edge e from node x to node y, we add (y; e) → x, and we add a self-loop (y, *) → y.

- The only terminal (accepting) state is the root.

[Figure: the construction of A illustrated on example patterns rooted at r, with nodes labeled a, b and *.]

- States(A’) = Nodes(p’) ∪ Edges(p’).
- (q, a) → … for every symbol a that has an outgoing edge e; if e is a descendant edge, we also add a self-loop on the source node.
- (e1, e2, e3, …) → a for every a that has incoming edges.

[Figure: the construction of A’ illustrated on example patterns rooted at r, with nodes labeled a, b and *.]

To summarize, the automata-based algorithm is:
- Sound.
- Complete.
- Not always efficient (determinization can be exponential).

THE END