VAMANA (Talk 2) ( vǎ - mǎ - nǎ )

An Efficient XPath Query Engine Exploiting the MASS Index. Venkatesh Raghavan & Prof. Elke Rundensteiner. DSRG Talk, 1st May 2003.


Introduction
  • Purpose of the talk.
    • Generation of Execution Tree
    • Execution
      • Running Example 1.
      • Running Example 2.
    • XPath Expression Execution.
    • Cost Estimation.
    • Heuristics and Transformation.
Running Examples

E.g. 1: //name/parent::person/descendant::watch

<people>
  <person id="person1">
    <name> Hayato Cappelletti </name>
    <watches>
      <watch open_auction="open_auction82" />

E.g. 2: //name[ text() = "Klemens Pelz" ]/parent::person

<people>
  <person id="person1">
    <name> Klemens Pelz </name>
Bigger Picture

[Figure: architecture diagram with the components XQuery Engine (future development), XPath Processor, VAMANA (XPath Query Engine), and MASS (A Multi-Axis Storage Structure for Large XML Documents), connected by the labels XPath Expression, Execution Tree, Mass Interface, and Node Set.]

How many “ROOT(s)” are there?
  • Root of the Document
    • We call it “Document Root”
  • Root of the expression
      • //name/parent::person/descendant::watch
    • We call it “First Location Step”
  • Root of Execution Tree
    • We call it “ROOT”
XPath Processor

E.g. 2: //name[ text() = "Klemens Pelz" ]/parent::person

Phase I: Parse Tree

[Figure: parse tree produced by the XPath Processor from the XPath expression. Node labels: ROOT, person (Parent), name (//), CONTEXT, PRED, BIPRED (=), OPERAND: text (child), OPERAND: LITERAL "Klemens Pelz".]

Contd.

[Figure: Phase I (Parse Tree) and Phase II (Transformed Parse Tree) for E.g. 2. Node labels: ROOT, person (Parent), name (//), CONTEXT, PRED, BIPRED (=), OPERAND: text (child), OPERAND: LITERAL "Klemens Pelz".]

Phase III: Execution Tree Generation

[Figure: Phase II (Transformed Parse Tree) with the node labels ROOT, person (Parent), name (//), CONTEXT, PRED, BIPRED (=), OPERAND: text (child), OPERAND: LITERAL "Klemens Pelz"; and Phase III (VAMANA Execution Tree) with the nodes "person" (X: Parent), "name" (X: //), "" (X: child), "Klemens Pelz", and BI_PREDICATE "EQ".]

VAMANA Nodes (VNode)
  • VRootNode
  • MassNode
  • Node Base
  • VBinaryPredicateNode
  • VExistPredicateNode
  • VJoinNode
  • VLiteralNode

VNode Structure

[Figure: VNode layout with the labels Root Node, Expression Side, child, and Context Side.]

VNode Flow Structure
  • Data-flow style of querying.
    • Used by most commercial relational database systems.
  • The nodes are arranged so that data “flows” from one node to another in a producer-consumer fashion.
    • Correctness.
    • Each node performs some operation on the data that flows through it.
    • The result is produced by the last node on the dataflow chain.
  • IN SHORT:
    • Data flows upwards.
    • Control flows downwards.
  • Iterative.
Contd.
  • Iterative.
    • Currently VAMANA executes nodes iteratively.
    • So no copies of the data are made.
  • IS IT A PROBLEM?
    • MASS produces nodes in document order, so it is not a problem.
    • But some expressions yield nodes in sibling order.
      • Work in progress.
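
The pull-style flow described on these two slides can be sketched with Python generators. This is a minimal illustration, not VAMANA's implementation: the class names `Source` and `Select` and the `fetched` counter are hypothetical, and a plain list stands in for the MASS index.

```python
from typing import Callable, Iterable, Iterator


class Source:
    """Leaf operator standing in for an index scan (hypothetical)."""

    def __init__(self, items: Iterable[str]) -> None:
        self.items = list(items)
        self.fetched = 0  # how many items have been handed upwards so far

    def __iter__(self) -> Iterator[str]:
        for item in self.items:
            self.fetched += 1
            yield item


class Select:
    """Inner operator: pulls from its child and passes on matching items."""

    def __init__(self, child: Iterable[str], keep: Callable[[str], bool]) -> None:
        self.child = child
        self.keep = keep

    def __iter__(self) -> Iterator[str]:
        for item in self.child:   # control flows down: ask the child
            if self.keep(item):
                yield item        # data flows up: hand the item to the parent


source = Source(["name", "watch", "name", "person"])
root = Select(source, lambda tag: tag == "watch")

# The consumer asks the top operator for one item only.
first = next(iter(root))
print(first, source.fetched)
```

Asking the top operator for a single item pulls only as many items from the source as are needed to produce it, and nothing is buffered in between.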
Execution Tree

E.g. 1: //name/parent::person/descendant::watch

[Figure: execution tree under the Root Node, Context Side, with the nodes "name" (X: //), "person" (X: AXIS_PARENT), and "watch" (X: AXIS_DESCENDANT).]

How Do We EXECUTE?
  • Step 1:
    • Set the context node for the root of the expression (the “First Location Step”).
      • In this example the root of the expression is the root of the document.
  • Step 2:
    • Ask the VAMANA Root Node for nodes.

//name/parent::person/descendant::watch
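
The two steps can be tried out on the sample document from the running example. The sketch below is only an approximation: it uses Python's `xml.etree.ElementTree` instead of MASS, the step functions are hypothetical names, the closing tags of the sample fragment are filled in so that it parses, and the document element stands in for the document root.

```python
import xml.etree.ElementTree as ET

# Sample document from the running example; the closing tags are added here
# (an assumption) so that the fragment parses.
DOC = """
<people>
  <person id="person1">
    <name> Hayato Cappelletti </name>
    <watches>
      <watch open_auction="open_auction82" />
    </watches>
  </person>
</people>
"""

doc_root = ET.fromstring(DOC)
# ElementTree has no parent pointers, so build a child -> parent map once.
parent_of = {child: parent for parent in doc_root.iter() for child in parent}


def name_step(context):
    """//name: every <name> element below the context node."""
    for node in context.iter("name"):
        if node is not context:
            yield node


def parent_person_step(nodes):
    """parent::person, skipping parents that were already produced."""
    seen = set()
    for node in nodes:
        parent = parent_of.get(node)
        if parent is not None and parent.tag == "person" and id(parent) not in seen:
            seen.add(id(parent))
            yield parent


def descendant_watch_step(nodes):
    """descendant::watch below each incoming node."""
    for node in nodes:
        for found in node.iter("watch"):
            if found is not node:
                yield found


# Step 1: the context of the first location step is the document root.
# Step 2: ask the top of the chain for nodes; each step pulls from the one below.
top = descendant_watch_step(parent_person_step(name_step(doc_root)))
result = [watch.get("open_auction") for watch in top]
print(result)
```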

Step 1: Setting Context for the “First Location Step”

//name/parent::person/descendant::watch

[Figure: execution tree with the nodes "watch" (X: AXIS_DESCENDANT), "person" (X: AXIS_PARENT), and "name" (X: //).]

//name/parent::person/descendant::watch

[Figure: execution trace over the tree "watch" (X: AXIS_DESCENDANT), "person" (X: AXIS_PARENT), "name" (X: //). Node states shown: INITIAL, FETCHING, OUT OF NODE. Keys shown: b.i.c.c, b.i.c, b.i.c.m.c.]

//name/parent::person/descendant::watch

[Figure: execution trace, continued. Keys shown: b.i.c.c, b.i.c, b.i.c.m.c, b.i.c.m.e.]

//name/parent::person/descendant::watch

[Figure: execution trace, continued. Keys shown: b.i.c, b.i.c.c, b.i.c.m.e, b.i.i, b.i.i.c, b.i.i.m.c.]
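
The keys in the trace read like dotted hierarchical paths. Assuming that reading (name at b.i.c.c, its parent person at b.i.c, and the watch nodes at b.i.c.m.c and b.i.c.m.e), the parent and descendant steps reduce to simple string operations. The helper names below are hypothetical, and how MASS actually encodes and compares its keys is not shown on the slides.

```python
from typing import List, Optional


def parent_key(key: str) -> Optional[str]:
    """Drop the last dotted component; a single-component key has no parent."""
    head, sep, _ = key.rpartition(".")
    return head if sep else None


def is_descendant(key: str, ancestor: str) -> bool:
    """A key lies below an ancestor when the ancestor is a proper dotted prefix."""
    return key.startswith(ancestor + ".")


# Keys that appear in the trace slides.
keys: List[str] = [
    "b.i.c", "b.i.c.c", "b.i.c.m.c", "b.i.c.m.e",
    "b.i.i", "b.i.i.c", "b.i.i.m.c",
]

person = parent_key("b.i.c.c")                         # name -> its parent
below = [k for k in keys if is_descendant(k, person)]  # everything under it
print(person, below)
```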

IO Operation

** Please see handout

[Figure: step //y with the keys a.a, a.b, a.c; step /z with the keys a.a.a, a.b.a, a.b.b, a.c.a, a.c.a, a.c.b.]

Example 2

//name[ text() = "Klemens Pelz" ]/parent::person

[Figure: execution tree with the nodes "name" (X: //), "person" (X: AXIS_PARENT), " " (X: AXIS_CHILD), "Klemens Pelz", and BI_PREDICATE EQ, with the labels Context Side and Expression Side.]

//name[ text() = "Klemens Pelz" ]/parent::person

[Figure: execution trace over the tree "person" (X: AXIS_PARENT), BI_PREDICATE EQ, "name" (X: //), " " (X: AXIS_CHILD), "Klemens Pelz". Keys shown: b.i.e.c, b.i.e, b.i.e.c.b. Value shown: Klemens Pelz.]

Determining Selectivity

[Figure: VNode box with the fields NodeType, NodeTest, X, Count, IN, OUT, I_Tuples.]

  • Count.
    • The exact number of nodes in the MASS storage structure that match that particular node test.
  • IN.
    • The number of tuples that are fetched by the child VNode.
  • OUT.
    • The number of tuples produced by the VNode.
  • I_Tuples.
    • Total number of tuples processed up to that VNode.
    • This includes the current node as well.
Example 1: //name/parent::person/emailaddress

NodeType  NodeTest  X            Count  IN   OUT
MASS      person    AXIS_PARENT  255    482  ?
MASS      name      //           482    482  482

Worst Case – Costing
  • Categorize the axes into three divisions.
  • Division 1:
    • child | descendant | descendant-or-self
  • Cases:
  • #X > #Y
  • #Y > #X

[Figure: two VNode boxes, X and Y, each with the fields NodeType, NodeTest, X, Count, IN, OUT; annotated with #X.]

Contd.
  • Division 2:
    • parent, ancestor, ancestor-or-self, following, following-sibling, preceding, preceding-sibling
  • Cases:
  • #X > #Y
  • #Y > #X

[Figure: two VNode boxes, X and Y, each with the fields NodeType, NodeTest, X, Count, IN, OUT; annotated with #Y.]

Contd.
  • Division 3:
    • self
  • For Example:
    • //*/self::X
    • Y/self::*
  • Cases:
  • #X > #Y → #Y
  • #Y > #X → #X

[Figure: two VNode boxes, X and Y, each with the fields NodeType, NodeTest, X, Count, IN, OUT.]
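
One way to read the three divisions is as a function that picks the worst-case OUT of a VNode. The sketch below assumes that #X is the Count of the node being costed and #Y is the number of tuples arriving from its child. That reading is an interpretation, chosen because it reproduces the OUT values on the costing example for //name/parent::person/descendant::watch; the function name is hypothetical.

```python
DIVISION_1 = {"child", "descendant", "descendant-or-self"}
DIVISION_2 = {
    "parent", "ancestor", "ancestor-or-self",
    "following", "following-sibling", "preceding", "preceding-sibling",
}


def worst_case_out(axis: str, count_x: int, count_y: int) -> int:
    """Worst-case OUT for node X (Count = count_x) fed by child Y (count_y tuples)."""
    if axis in DIVISION_1:
        return count_x                 # Division 1: annotated with #X
    if axis in DIVISION_2:
        return count_y                 # Division 2: annotated with #Y
    if axis == "self":
        return min(count_x, count_y)   # Division 3: the smaller of the two
    raise ValueError(f"axis not covered by the three divisions: {axis}")


# Values taken from the costing example slide.
print(worst_case_out("parent", 255, 482))      # person node
print(worst_case_out("descendant", 488, 482))  # watch node
```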

NodeType  NodeTest  X                Count  IN   OUT  I_Tuple
MASS      watch     AXIS_DESCENDANT  488    482  488  1225
MASS      person    AXIS_PARENT      255    482  482  737
MASS      name      //               482    482  482  482
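
The I_Tuple values above match a running sum of the Count values taken from the bottom of the chain upwards (482, then 482 + 255, then 737 + 488). Whether VAMANA computes the figure exactly this way is not stated on the slides, so the following is only a consistency check.

```python
from itertools import accumulate

# Count values from the slide, listed from the bottom of the chain upwards.
counts = {"name": 482, "person": 255, "watch": 488}

# Running total of Count per node test.
i_tuples = dict(zip(counts, accumulate(counts.values())))
print(i_tuples)
```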

What about Binary Operators?
  • Cost the expression side with respect to the child.
  • Operator = AND | OR | EQ.
    • ALL go out.
  • Arithmetic Operators.
    • ALL go out.
    • Because the result cannot be predicted before execution.
Heuristics
  • The higher the ratio, the better the selectivity.
  • Generate a multimap <scaled(IN/OUT), VNode>.
  • Each optimizable node can then have the rules that apply to it applied.

Ratio = IN/OUT

Scaled Ratio = IN/OUT scaled to the range 0..1
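
A small sketch of the ratio bookkeeping, using the IN and OUT values from the costing example for //name/parent::person/descendant::watch. The slides do not say how the ratio is scaled to 0..1; min-max scaling is assumed here, a dictionary of lists stands in for the multimap, and OUT is assumed to be non-zero.

```python
from collections import defaultdict

# (node test, IN, OUT) taken from the costing example.
stats = [("name", 482, 482), ("person", 482, 482), ("watch", 482, 488)]


def scaled_ratios(rows):
    """Map each node test to IN/OUT, min-max scaled into [0, 1] (assumed scaling)."""
    ratios = {name: in_ / out for name, in_, out in rows}
    lo, hi = min(ratios.values()), max(ratios.values())
    span = hi - lo
    return {name: (r - lo) / span if span else 0.0 for name, r in ratios.items()}


# Multimap: one scaled ratio may map to several nodes.
multimap = defaultdict(list)
for name, score in scaled_ratios(stats).items():
    multimap[score].append(name)

# Highest scaled ratio first.
ranked = sorted(multimap.items(), reverse=True)
print(ranked)
```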

Transformation Rule 1:

Binary Predicate with text comparison → Value Index

[Figure: execution tree before and after the rule. Node labels: "name" (X: //), "person" (X: AXIS_PARENT), BI_PREDICATE EQ, " " (X: AXIS_CHILD), "Klemens Pelz", "name" (X: AXIS_PARENT), and "Klemens Pelz" (X: AXIS_VALUE).]

Transformation Rule 2

//name/parent::person/descendant::watch

  • Mass Node to Join

[Figure: execution tree before and after the rule, under the Root Node. Node labels: "name" (X: //), "person" (X: AXIS_PARENT), "watch" (X: AXIS_DESCENDANT), and JOIN (X: AXIS_DESCENDANT).]

* Removal

Rule:

p/descendant::*/child::n ≡ p/descendant::n

Where,

p : path expression

  • Need for this rule:
    • Nodes with "*" as their node test can distort the cost estimation.
“Axis::self” Removal

Rule:

p/descendant::*/self::m ≡ p/descendant::m

Rule:

p/descendant-or-self::*/self::m ≡ p/descendant-or-self::m

  • Need for the rule:
    • A “self” step in combination with * or a node test is not necessary.
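
The two “self” rules can be applied as a purely syntactic rewrite over a list of location steps. The sketch below assumes a path is represented as a list of (axis, node test) pairs; that representation and the function name are illustrative only and are not taken from VAMANA.

```python
from typing import List, Tuple

Step = Tuple[str, str]  # (axis, node test)


def remove_self_steps(steps: List[Step]) -> List[Step]:
    """Fold descendant::*/self::m and descendant-or-self::*/self::m into one step."""
    out: List[Step] = []
    for axis, test in steps:
        previous = out[-1] if out else None
        if (
            axis == "self"
            and previous is not None
            and previous[0] in ("descendant", "descendant-or-self")
            and previous[1] == "*"
        ):
            out[-1] = (previous[0], test)  # keep the axis, take over the node test
        else:
            out.append((axis, test))
    return out


before = [("child", "p"), ("descendant", "*"), ("self", "m")]
after = remove_self_steps(before)
print(after)
```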
Reverse Axes Rules
  • Rule: p/descendant::n/parent::m ≡ p/descendant-or-self::m[child::n]
  • Rule: p/descendant::n/m ≡ p/descendant::m[parent::n]
  • Rule: /descendant::m/preceding::n ≡ /descendant::n[following::m]

From Paper: Symmetry in XPath by Dan Olteanu, Holger Meuss, Tim Furche, Francois Br
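
The first rule can be checked by hand on a small document. The sketch below takes p to be the document element, so both sides start from the same context, and evaluates each side with `xml.etree.ElementTree`. The sample document and the helper names `lhs` and `rhs` are made up for the illustration.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<site><people>"
    "<person><name/></person>"
    "<person><name/><name/></person>"
    "<person/>"
    "</people><name/></site>"
)
# ElementTree has no parent pointers, so build a child -> parent map once.
parent_of = {child: parent for parent in doc.iter() for child in parent}


def lhs(p, n, m):
    """p/descendant::n/parent::m, without duplicates."""
    found = []
    for node in p.iter(n):
        if node is p:
            continue  # descendant excludes the context node itself
        parent = parent_of.get(node)
        if parent is not None and parent.tag == m and parent not in found:
            found.append(parent)
    return found


def rhs(p, n, m):
    """p/descendant-or-self::m[child::n]."""
    # find() with a bare tag name looks at direct children only.
    return [node for node in p.iter(m) if node.find(n) is not None]


left = lhs(doc, "name", "person")
right = rhs(doc, "name", "person")
same_nodes = {id(e) for e in left} == {id(e) for e in right}
print(len(left), same_nodes)
```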

Predicate Axis Rules
  • Rule:

p/descendant::*[child::n] ≡ p[descendant::n]/descendant::*

  • Predicate Node to Join.
Conclusion
  • Work in progress in THREE main areas.
    • Framework for XPath expression execution.
    • Selectivity Determination.
    • Transformation Rules.
References

1. James Clark and Steve DeRose. XML Path Language (XPath), http://www.w3.org/TR/xpath, 2002.

2. S. Boag, D. Chamberlin, Mary F. Fernandez, D. Florescu, J. Robie and J. Siméon. XQuery 1.0: An XML Query Language. W3C Working Draft, http://www.w3.org/TR/xquery/, 2002.

3. Kurt W. Deschler and Elke Rundensteiner. MASS - Multi Axis Storage Structure, 2002, Technical Report in progress.

4. T. Milo and D. Suciu. Index structure for path expression, In Proceedings of 7th International Conference on Database Theory, 1999, pages 277-295.

5. Flavio Rizzolo, Alberto Mendelzon. Indexing XML Data with ToXin, WebDB, pages 49-54, Santa Barbara, USA, 2001.

6. Q. Li and B. Moon. Indexing and Querying XML Data for Regular Path Expressions, Proceedings of 27th International Conference on Very Large Database (VLDB'2001), Rome, Italy, September 2001, pages 361-370.

7. XMark - The XML Benchmark project. http://monetdb.cwi.nl/xml/.
