VAMANA (Talk 2) (vǎ - mǎ - nǎ)

An Efficient XPath Query Engine Exploiting the MASS Index

Venkatesh Raghavan & Prof. Elke Rundensteiner

DSRG Talk

1st May 2003


Introduction

  • Purpose of the talk.

    • Generation of Execution Tree

    • Execution

      • Running Example 1.

      • Running Example 2.

    • XPath Expression Execution.

    • Cost Estimation.

    • Heuristics and Transformation.


Running Examples

E.g. 1: //name/parent::person/descendant::watch

<people>
  <person id="person1">
    <name> Hayato Cappelletti </name>
    <watches>
      <watch open_auction="open_auction82" />

E.g. 2: //name[text() = "Klemens Pelz"]/parent::person

<people>
  <person id="person1">
    <name> Klemens Pelz </name>
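As a reference point for the execution walkthrough, both running examples can be evaluated with Python's standard `xml.etree.ElementTree` module. This is only a sketch: the fragments above are truncated, so the completed document below (closing tags, a second person with the hypothetical id `person2`, whitespace trimmed from the names) is an assumption, and ElementTree's limited XPath subset (the `[.='text']` predicate needs Python 3.7+) stands in for VAMANA's own axes.

```python
import xml.etree.ElementTree as ET

# Hypothetical completion of the two truncated fragments above, merged into one
# document. Closing tags, the id "person2" and the trimmed names are assumptions.
DOC = """
<people>
  <person id="person1">
    <name>Hayato Cappelletti</name>
    <watches>
      <watch open_auction="open_auction82"/>
    </watches>
  </person>
  <person id="person2">
    <name>Klemens Pelz</name>
  </person>
</people>
""".strip()


def example1(root):
    """E.g. 1: //name/parent::person/descendant::watch"""
    # ElementPath has no named axes: ".." plays parent:: (it has no name test,
    # so the person check is explicit) and iter("watch") plays descendant::watch
    # (iter() also visits p itself, which cannot match because p is a person).
    persons = [p for p in root.findall(".//name/..") if p.tag == "person"]
    return [w for p in persons for w in p.iter("watch")]


def example2(root):
    """E.g. 2: //name[text() = "Klemens Pelz"]/parent::person"""
    # [.='...'] compares the element's string value with the literal.
    parents = root.findall(".//name[.='Klemens Pelz']/..")
    return [p for p in parents if p.tag == "person"]


root = ET.fromstring(DOC)
print([w.get("open_auction") for w in example1(root)])  # ['open_auction82']
print([p.get("id") for p in example2(root)])            # ['person2']
```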


XPath Expression

Node Set

Execution Tree

Mass Interface

Node Set

Bigger Picture

XQuery Engine (future development)

XPath Processor

VAMANA (XPath Query Engine)

MASS (A Multi-Axis Storage Structure for Large XML Documents)


How many “ROOT(s)” are there?

  • Root of the Document

    • We call it “Document Root”

  • Root of the expression

    • //name/parent::person/descendant::watch

    • We call it “First Location Step”

  • Root of Execution Tree

    • We call it “ROOT”


    XPath Processor

    XPath Expression → XPath Processor → Execution Tree

    E.g. 2: //name[text() = "Klemens Pelz"]/parent::person

    Phase I: Parse Tree

    (figure; parse-tree nodes: ROOT; person, Parent; name, //, CONTEXT; PRED; BIPRED, =; OPERAND, text, child; OPERAND, LITERAL, “Klemens Pelz”)


    Contd.

    Phase II: Transformed Parse Tree

    (figure; the Phase I parse tree (ROOT; person, Parent; name, //, CONTEXT) shown beside the transformed tree; predicate nodes in both: PRED; BIPRED, =; OPERAND, text, child; OPERAND, LITERAL, “Klemens Pelz”)


    Phase III: Execution Tree Generation

    XPath Expression → XPath Processor → Execution Tree

    (figure; Phase II: Transformed Parse Tree (ROOT; person, Parent; name, //, CONTEXT; PRED; BIPRED, =; OPERAND, text, child; OPERAND, LITERAL, “Klemens Pelz”) beside Phase III: VAMANA Execution Tree (“person”, X: Parent; “name”, X: //; “”, X: child; “Klemens Pelz”; BI_PREDICATE, “EQ”))
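Phase I starts by splitting the expression into location steps. The sketch below is a hypothetical, much-reduced version of that step for predicate-free paths: each step becomes an (axis, node test) pair, and a `//` is kept as the axis of the step it introduces, mirroring the `X: //` label in the execution trees, rather than being expanded to `descendant-or-self::node()/child::`. The `Step` type and `parse_location_path` function are illustrative names, not VAMANA's API.

```python
import re
from typing import List, NamedTuple


class Step(NamedTuple):
    axis: str  # "child", "parent", "descendant", ... or "//" as in the execution trees
    test: str  # a name test or "*"


# Optional "axis::" prefix followed by "*" or an XML-style name.
STEP_RE = re.compile(r"(?:([a-z-]+)::)?(\*|[A-Za-z_][\w.-]*)")


def parse_location_path(path: str) -> List[Step]:
    """Split an absolute, predicate-free location path into (axis, test) steps."""
    steps: List[Step] = []
    pos = 0
    while pos < len(path):
        if path.startswith("//", pos):
            abbrev, pos = "//", pos + 2
        elif path.startswith("/", pos):
            abbrev, pos = None, pos + 1
        else:
            raise ValueError(f"expected '/' at position {pos}")
        m = STEP_RE.match(path, pos)
        if m is None:
            raise ValueError(f"cannot parse step at position {pos}")
        if abbrev and m.group(1):
            raise ValueError("'//' before an explicit axis is not handled here")
        # No explicit axis: "//" if the step followed "//", else the default child axis.
        steps.append(Step(abbrev or m.group(1) or "child", m.group(2)))
        pos = m.end()
    return steps


# E.g. 1, listed from the first location step (context side) to the last
print(parse_location_path("//name/parent::person/descendant::watch"))
# [Step(axis='//', test='name'), Step(axis='parent', test='person'), Step(axis='descendant', test='watch')]
```

In the execution tree the list is used in reverse: the last step sits just below ROOT and the first step is on the context side.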


    Execution Tree

    VAMANA

    (XPath Query Engine)

    Mass Interface

    Node Set

    MASS

    VAMANA Nodes (VNode)

    VRootNode

    MassNode

    Node Base

    VBinaryPredicateNode

    VExistPredicateNode

    VJoinNode

    VLiteralNode


    VNode Structure

    Root Node

    Expression Side

    child

    Context Side


    VNode Flow Structure

    • Dataflow-style querying.

      • As in most commercial relational database systems.

    • Nodes are arranged so that data “flows” from one node to another in a producer-consumer fashion.

      • Correctness.

      • Each node performs some operation on the data that flows through it.

      • The result is produced by the last node on the dataflow chain.

    • IN SHORT:

      • Data flows upwards.

      • Control flows downwards.

    • Iterative.


    Contd.

    • Iterative.

      • Currently VAMANA executes nodes iteratively.

      • So no copies of the data are made.

    • IS IT A PROBLEM?

      • MASS produces nodes in document order, so this is not a problem.

      • But some expressions work in sibling order.

        • Work in progress.


    Execution Tree

    E.g. 1: //name/parent::person/descendant::watch

    (figure; Root Node; “watch”, X: AXIS_DESCENDANT; “person”, X: AXIS_PARENT; “name”, X: //; Context Side)


    How Do We EXECUTE?

    • Step 1:

      • Set the context node of the root of the expression (the “First Location Step”).

        • In this example, the context node is the root of the document.

    • Step 2:

      • Ask the VAMANA Root Node for nodes.

    //name/parent::person/descendant::watch
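The dataflow structure and the two execution steps can be sketched with Python generators: each VNode pulls context nodes from its child only when its parent asks for the next result, so control flows down and data flows up without intermediate copies. Everything here is an illustrative assumption, not VAMANA's code: a dictionary of (tag, parent) pairs stands in for MASS, node ids follow document order, only the axes needed for E.g. 1 are implemented, `//` is evaluated as `descendant` (for a name test, `p//n` selects the same nodes as `p/descendant::n`), and no duplicate elimination is done.

```python
from typing import Dict, Iterator, List, Optional, Tuple

# Hypothetical stand-in for MASS: node id -> (tag, parent id).
# Ids are assigned in document order; node 0 is the document root.
DOC: Dict[int, Tuple[str, Optional[int]]] = {
    0: ("#document", None),
    1: ("people", 0),
    2: ("person", 1), 3: ("name", 2), 4: ("watches", 2), 5: ("watch", 4), 6: ("watch", 4),
    7: ("person", 1), 8: ("name", 7),
}


def is_proper_ancestor(a: int, d: int) -> bool:
    p = DOC[d][1]
    while p is not None:
        if p == a:
            return True
        p = DOC[p][1]
    return False


def axis_nodes(axis: str, ctx: int) -> Iterator[int]:
    """Nodes on `axis` from context node `ctx`, in document order."""
    if axis == "parent":
        p = DOC[ctx][1]
        if p is not None:
            yield p
    elif axis in ("descendant", "//"):
        yield from (n for n in DOC if is_proper_ancestor(ctx, n))
    else:
        raise ValueError(f"axis {axis!r} is not handled in this sketch")


class VNode:
    """One location step; iterating it pulls results lazily."""

    def __init__(self, axis: str, test: str, child: Optional["VNode"] = None):
        self.axis, self.test, self.child = axis, test, child
        self.context: Optional[int] = None  # set only on the first location step

    def __iter__(self) -> Iterator[int]:
        # Control flows down: the child is asked for its next node only when needed.
        contexts = iter(self.child) if self.child is not None else iter([self.context])
        for ctx in contexts:
            for n in axis_nodes(self.axis, ctx):
                if DOC[n][0] == self.test:
                    yield n  # data flows up to whoever is iterating this node


def run(top: VNode, document_root: int = 0) -> List[int]:
    first = top
    while first.child is not None:  # Step 1: find the first location step
        first = first.child
    first.context = document_root   #         and set its context node
    return list(top)                # Step 2: ask the top of the tree for nodes


# E.g. 1: //name/parent::person/descendant::watch, last step on top
tree = VNode("descendant", "watch", VNode("parent", "person", VNode("//", "name")))
print(run(tree))  # [5, 6]
```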


    Step 1: Setting Context for the “First Location Step”

    //name/parent::person/descendant::watch

    (figure; execution tree: “watch”, X: AXIS_DESCENDANT; “person”, X: AXIS_PARENT; “name”, X: //)


    (Animation frames over the execution tree for //name/parent::person/descendant::watch: “watch”, X: AXIS_DESCENDANT; “person”, X: AXIS_PARENT; “name”, X: //. Node states shown: INITIAL, FETCHING, OUT OF NODE.)

    Frame 1 node keys: b.i.c.m.c, b.i.c, b.i.c.c

    Frame 2 node keys: b.i.c.m.e, b.i.c, b.i.c.c, b.i.c.m.c

    Frame 3 node keys: b.i.c, b.i.i, b.i.c.c, b.i.i.c, b.i.c.m.e, b.i.i.m.c
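The node keys in these frames (b.i.c, b.i.c.c, b.i.c.m.c, and so on) appear to be the lexicographic keys MASS assigns to nodes. Assuming, as the labels suggest, that a node's key is its parent's key plus one dot-separated component, the parent and descendant axes reduce to key arithmetic and document order reduces to sorting by key components. The helpers below are a hypothetical sketch of that idea; which key belongs to which element in the animation is not stated on the slides.

```python
from typing import Iterable, List


def components(key: str) -> List[str]:
    return key.split(".")


def parent_key(key: str) -> str:
    """Parent axis: drop the last component."""
    return ".".join(components(key)[:-1])


def is_descendant(key: str, ancestor: str) -> bool:
    """Descendant axis: the ancestor's key is a proper prefix, on component boundaries."""
    return key.startswith(ancestor + ".")


def doc_order(keys: Iterable[str]) -> List[str]:
    """Document order, assuming components compare in sibling order as strings;
    a parent sorts before its children because a list sorts before its extensions."""
    return sorted(keys, key=components)


# Keys taken from the animation frames; reading them as one parent/descendant
# walk is an assumption made for illustration.
candidates = ["b.i.i.m.c", "b.i.c.m.e", "b.i.c.m.c"]
ctx = parent_key("b.i.c.c")                                        # 'b.i.c'
print(doc_order(k for k in candidates if is_descendant(k, ctx)))  # ['b.i.c.m.c', 'b.i.c.m.e']
```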


    IO Operation

    ** Please see handout

    /z: a.a.a, a.b.a, a.b.b, a.c.a, a.c.a, a.c.b

    //y: a.a, a.b, a.c


    Example 2

    //name[text() = "Klemens Pelz"]/parent::person

    (figure; execution tree: “person”, X: AXIS_PARENT; “name”, X: //; BI_PREDICATE, EQ with operands “ ”, X: AXIS_CHILD and “Klemens Pelz”; sides labeled Context Side and Expression Side)


    (Animation frame for //name[text() = "Klemens Pelz"]/parent::person over the Example 2 execution tree; node keys shown: b.i.e.c, b.i.e, b.i.e.c.b; value shown: Klemens Pelz)


    Determining Selectivity

    VNode statistics: NodeType, NodeTest, X, Count, IN, OUT, I_Tuples

    • Count.

      • The exact number of nodes in the MASS storage structure that match the node test.

    • IN.

      • The number of tuples fetched from the child VNode.

    • OUT.

      • The number of tuples produced by the VNode.

    • I_Tuples.

      • Total number of tuples processed up to that VNode.

      • This includes the current node.


    NodeType: MASS

    NodeTest: person

    X: AXIS_PARENT

    Count: 255

    IN: 482

    OUT: ?

    Example 1: //name/parent::person/emailaddress

    NodeType: MASS

    NodeTest: name

    X: //

    Count: 482

    IN: 482

    OUT: 482


    Worst Case – Costing

    • Categorize the axes into three divisions.

    • Division 1:

      • child | descendant | descendant-or-self

    • Cases:

      • #X > #Y

      • #Y > #X

    (figure; VNode X above its child VNode Y, each showing NodeType, NodeTest, X, Count, IN, OUT; annotated #X)


    Contd.

    • Division 2:

      • parent, ancestor, ancestor-or-self, following, following-sibling, preceding, preceding-sibling

    • Cases:

      • #X > #Y

      • #Y > #X

    (figure; VNode X above its child VNode Y, each showing NodeType, NodeTest, X, Count, IN, OUT; annotated #Y)


    Contd.

    • Division 3:

      • self

    • For Example:

      • //*/self::X

      • Y/self::*

    • Cases:

      • #X > #Y → #Y

      • #Y > #X → #X

    (figure; VNode X above its child VNode Y, each showing NodeType, NodeTest, X, Count, IN, OUT)


    NodeType: MASS, NodeTest: watch, X: AXIS_DESCENDANT, Count: 488, IN: 482, OUT: 488, I_Tuples: 1225

    NodeType: MASS, NodeTest: person, X: AXIS_PARENT, Count: 255, IN: 482, OUT: 482, I_Tuples: 737

    NodeType: MASS, NodeTest: name, X: //, Count: 482, IN: 482, OUT: 482, I_Tuples: 482
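The figures for E.g. 1 can be reproduced with a few lines of code. This sketch reads the Division figures as: worst-case OUT is the node's Count (#X) for Division 1, the child's output (#Y) for Division 2, and the smaller of the two for Division 3. Two further rules are inferred from the numbers rather than stated on the slides: the first location step's IN equals its Count, and I_Tuples adds the node's Count to the child's I_Tuples (482, then 482 + 255 = 737, then 737 + 488 = 1225). `VNodeStats` and `cost` are hypothetical names, and `//` is placed in Division 1.

```python
from dataclasses import dataclass
from typing import Optional

DIVISION_1 = {"child", "descendant", "descendant-or-self", "//"}
DIVISION_2 = {"parent", "ancestor", "ancestor-or-self", "following",
              "following-sibling", "preceding", "preceding-sibling"}
DIVISION_3 = {"self"}


@dataclass
class VNodeStats:
    node_test: str
    axis: str
    count: int                          # Count: MASS nodes matching node_test
    child: Optional["VNodeStats"] = None
    in_count: int = 0                   # IN
    out_count: int = 0                  # OUT (worst case)
    i_tuples: int = 0                   # I_Tuples


def worst_case_out(axis: str, count: int, in_count: int) -> int:
    if axis in DIVISION_1:
        return count                    # #X
    if axis in DIVISION_2:
        return in_count                 # #Y
    if axis in DIVISION_3:
        return min(count, in_count)     # smaller of #X and #Y
    raise ValueError(f"unknown axis {axis!r}")


def cost(v: VNodeStats) -> VNodeStats:
    """Fill IN, OUT and I_Tuples bottom-up, from the first location step upwards."""
    if v.child is None:
        v.in_count, below = v.count, 0  # inferred: first step's IN equals its Count
    else:
        cost(v.child)
        v.in_count, below = v.child.out_count, v.child.i_tuples
    v.out_count = worst_case_out(v.axis, v.count, v.in_count)
    v.i_tuples = below + v.count        # inferred from the slide's figures
    return v


name = VNodeStats("name", "//", 482)
person = VNodeStats("person", "parent", 255, child=name)
watch = cost(VNodeStats("watch", "descendant", 488, child=person))
for v in (watch, person, name):
    print(v.node_test, v.in_count, v.out_count, v.i_tuples)
# watch 482 488 1225
# person 482 482 737
# name 482 482 482
```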


    What about Binary Operators?

    • Cost the expression sides with respect to the child.

    • Operator = AND | OR | EQ.

      • All tuples go out.

    • Arithmetic operators.

      • All tuples go out,

      • because the result cannot be predicted before execution.


    Heuristics

    • The higher the ratio, the better the selectivity.

    • Generate a multimap <scaled(IN/OUT), VNode>.

    • Each optimizable node can then have the rules that apply to it applied.

    Ratio = IN/OUT

    Scaled Ratio = IN/OUT scaled to the range [0, 1]
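A minimal sketch of the multimap, assuming min-max scaling for the 0..1 scaling (the slide does not say which scaling is used) and excluding nodes with OUT = 0, which would need separate handling. The IN/OUT pairs are those of Example 1 plus one hypothetical predicate node; nodes with equal scaled ratios share a key, as in a multimap.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def scaled_ratios(stats: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """stats maps a VNode label to (IN, OUT); returns IN/OUT min-max scaled to [0, 1]."""
    ratios = {v: i / o for v, (i, o) in stats.items() if o > 0}
    lo, hi = min(ratios.values()), max(ratios.values())
    span = hi - lo
    return {v: (r - lo) / span if span else 1.0 for v, r in ratios.items()}


def selectivity_multimap(stats: Dict[str, Tuple[int, int]]) -> List[Tuple[float, List[str]]]:
    """Group VNodes by scaled ratio, best selectivity (highest ratio) first."""
    groups: Dict[float, List[str]] = defaultdict(list)
    for v, s in scaled_ratios(stats).items():
        groups[s].append(v)
    return sorted(groups.items(), reverse=True)


stats = {
    "name": (482, 482),
    "person": (482, 482),
    "watch": (482, 488),
    "BI_PREDICATE": (482, 1),  # hypothetical: a predicate that keeps one tuple
}
for key, nodes in selectivity_multimap(stats):
    print(key, nodes)
```

The optimizer would then walk the groups from the front, trying the transformation rules on the most selective VNodes first.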


    (figure; nodes: “name”, X: //; “person”, X: AXIS_PARENT; “name”, X: //; “name”, X: AXIS_PARENT; “Klemens Pelz”, X: AXIS_VALUE; BI_PREDICATE, EQ; “ ”, X: AXIS_CHILD; “Klemens Pelz”)

    Transformation Rule 1:

    Binary Predicate with text comparison → Value Index
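One reading of this rule, suggested by the AXIS_VALUE and AXIS_PARENT nodes in the figure: a first location step `//n[text() = "v"]` can be answered by looking up "v" in a value index and stepping to the parent `n`, instead of scanning every `n` and comparing its text. The sketch shows only the rewrite, on a hypothetical list-of-steps representation; `Step`, the `"value"` axis name and `rule1_value_index` are illustrative, not VAMANA's classes.

```python
from typing import List, NamedTuple, Optional


class Step(NamedTuple):
    axis: str                      # "//", "child", "parent", or "value" (index lookup)
    test: str                      # node test, or the literal for a "value" step
    text_eq: Optional[str] = None  # predicate [text() = "..."], if any


def rule1_value_index(steps: List[Step]) -> List[Step]:
    """Rewrite //n[text() = "v"]/rest into value("v")/parent::n/rest.

    Only a leading // step is rewritten: a value-index lookup returns matching
    text nodes from the whole document, which is the scope // searches when
    the context is the document root.
    """
    if not steps or steps[0].axis != "//" or steps[0].text_eq is None:
        return steps
    first = steps[0]
    return [Step("value", first.text_eq), Step("parent", first.test)] + steps[1:]


# E.g. 2: //name[text() = "Klemens Pelz"]/parent::person
q = [Step("//", "name", text_eq="Klemens Pelz"), Step("parent", "person")]
for step in rule1_value_index(q):
    print(step.axis, step.test)
# value Klemens Pelz
# parent name
# parent person
```

The rewrite keeps the result because `text() = "v"` holds exactly when some text child of the `n` element has the value "v", and those text nodes are what the rewritten plan starts from (an element with two such text children would be reached twice unless duplicates are removed).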


    Transformation Rule 2

    //name/parent::person/descendant::watch

    • Mass Node to Join

    (figure; two execution trees under a Root Node: one with “name”, X: //; “watch”, X: AXIS_DESCENDANT; “person”, X: AXIS_PARENT, and one with the same three nodes plus JOIN, X: AXIS_DESCENDANT)
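Turning the descendant MASS node into a join means the person nodes and the watch nodes can be produced independently and then matched. One simple way to match them, assuming node keys in which a node's key extends its parent's key by one dot-separated component (as the keys in the execution animation appear to), is to check each candidate descendant's key prefixes against the set of ancestor keys. This is a hypothetical sketch of a descendant join, not VAMANA's JOIN node; the keys reuse labels from the animation frames purely for illustration.

```python
from typing import List, Tuple


def proper_prefixes(key: str) -> List[str]:
    """Keys of all proper ancestors, e.g. 'b.i.c.m' -> ['b', 'b.i', 'b.i.c']."""
    parts = key.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts))]


def descendant_join(ancestors: List[str], descendants: List[str]) -> List[Tuple[str, str]]:
    """All (a, d) pairs where a is a proper ancestor of d, in the order of `descendants`."""
    anc = set(ancestors)
    return [(a, d) for d in descendants for a in proper_prefixes(d) if a in anc]


persons = ["b.i.c", "b.i.i"]
watches = ["b.i.c.m.c", "b.i.c.m.e", "b.i.i.m.c"]
pairs = descendant_join(persons, watches)
print(pairs)
# [('b.i.c', 'b.i.c.m.c'), ('b.i.c', 'b.i.c.m.e'), ('b.i.i', 'b.i.i.m.c')]
```

For the query result only the distinct descendant side is needed, e.g. `sorted({d for _, d in pairs})`; the prefix check costs one set lookup per key component instead of comparing every person with every watch.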


    * Removal

    Rule:

    p/descendant-or-self::*/child::n ≡ p/descendant::n

    Where,

    p : path expression selecting element nodes

    • Need for this rule:

      • A VNode with "*" as its node test can spoil cost estimation, because "*" matches every element.


    “Axis::self” Removal

    Rule:

    p/descendant::*/self::m ≡ p/descendant::m

    Rule:

    p/descendant-or-self::*/self::m ≡ p/descendant-or-self::m

    • Need for this rule:

      • A “self” node in combination with * or a node test is not necessary.
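Both self-removal rules are local, so they can be applied as a peephole pass over a list of (axis, node test) steps. This is a hypothetical sketch; the step representation and function name are illustrative. It folds a `self` name test (or `*`) into a preceding `descendant::*` or `descendant-or-self::*` step and leaves kind tests such as `node()` alone, because folding `node()` into the previous step would widen it beyond elements.

```python
from typing import List, Tuple

Step = Tuple[str, str]  # (axis, node test)


def remove_self_steps(steps: List[Step]) -> List[Step]:
    """p/descendant::*/self::m -> p/descendant::m, and likewise for descendant-or-self."""
    out: List[Step] = []
    for axis, test in steps:
        foldable = (
            axis == "self"
            and not test.endswith("()")   # only name tests and "*"
            and bool(out)
            and out[-1][0] in ("descendant", "descendant-or-self")
            and out[-1][1] == "*"
        )
        if foldable:
            out[-1] = (out[-1][0], test)  # merge the test into the previous step
        else:
            out.append((axis, test))
    return out


print(remove_self_steps([("child", "people"), ("descendant", "*"), ("self", "watch")]))
# [('child', 'people'), ('descendant', 'watch')]
```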


    Reverse Axes rules

    • Rule: p/descendant::n/parent::m

      ≡ p/descendant-or-self::m[child::n]

    • Rule: /descendant::n/m ≡ /descendant::m[parent::n]

    • Rule: /descendant::m/preceding::n

      ≡ /descendant::n[following::m]

      From the paper: Symmetry in XPath, by Dan Olteanu, Holger Meuss, Tim Furche, Francois Br
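Rewrite rules like these are easy to sanity-check by brute force: evaluate both sides on a small document and compare the node sets. The sketch below does that for the three rules with the path starting at the document root, considering element nodes only and using Python's `xml.etree.ElementTree` just for parsing; the test document and helper names are hypothetical. Agreement on one document is a smoke test, not a proof of equivalence.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Set

# Hypothetical test document; only element nodes are considered.
DOC = "<r><m><n/><x><m/></x></m><n><m><n/></m></n><m/></r>"

root = ET.fromstring(DOC)
order: List[ET.Element] = list(root.iter())                   # document order
pos: Dict[ET.Element, int] = {e: i for i, e in enumerate(order)}
parent: Dict[ET.Element, ET.Element] = {c: p for p in order for c in p}


def named(tag: str) -> List[ET.Element]:
    """/descendant::tag (the document node itself is never an element)."""
    return [e for e in order if e.tag == tag]


def ancestors(e: ET.Element) -> Set[ET.Element]:
    out: Set[ET.Element] = set()
    while e in parent:
        e = parent[e]
        out.add(e)
    return out


def preceding(e: ET.Element) -> List[ET.Element]:
    """Before e in document order, excluding e's ancestors."""
    anc = ancestors(e)
    return [x for x in order[:pos[e]] if x not in anc]


def following(e: ET.Element) -> List[ET.Element]:
    """After e in document order, excluding e's descendants."""
    below = set(e.iter())
    return [x for x in order[pos[e] + 1:] if x not in below]


# /descendant::n/parent::m  vs  /descendant-or-self::m[child::n]
lhs1 = {parent[n] for n in named("n") if n in parent and parent[n].tag == "m"}
rhs1 = {m for m in named("m") if any(c.tag == "n" for c in m)}

# /descendant::n/child::m  vs  /descendant::m[parent::n]
lhs2 = {c for n in named("n") for c in n if c.tag == "m"}
rhs2 = {m for m in named("m") if m in parent and parent[m].tag == "n"}

# /descendant::m/preceding::n  vs  /descendant::n[following::m]
lhs3 = {x for m in named("m") for x in preceding(m) if x.tag == "n"}
rhs3 = {n for n in named("n") if any(y.tag == "m" for y in following(n))}

print(lhs1 == rhs1, lhs2 == rhs2, lhs3 == rhs3)  # True True True
```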


    Predicate Axis Rules

    • Rule:

      p/descendant::*[child::n] ≡ p[descendant::n]/descendant::*

    • Predicate Node to Join.


    Conclusion

    • Work in progress in THREE main areas.

      • Framework for XPath expression execution.

      • Selectivity Determination.

      • Transformation Rules.


    References

    1. James Clark and Steve DeRose. XML Path Language (XPath). http://www.w3.org/TR/xpath, 2002.

    2. S. Boag, D. Chamberlin, Mary F. Fernandez, D. Florescu, J. Robie and J. Siméon. XQuery 1.0: An XML Query Language. W3C Working Draft, http://www.w3.org/TR/xquery/, 2002.

    3. Kurt W. Deschler and Elke Rundensteiner. MASS: Multi-Axis Storage Structure. Technical Report in progress, 2002.

    4. T. Milo and D. Suciu. Index Structures for Path Expressions. In Proceedings of the 7th International Conference on Database Theory, 1999, pages 277-295.

    5. Flavio Rizzolo and Alberto Mendelzon. Indexing XML Data with ToXin. WebDB, Santa Barbara, USA, 2001, pages 49-54.

    6. Q. Li and B. Moon. Indexing and Querying XML Data for Regular Path Expressions. In Proceedings of the 27th International Conference on Very Large Data Bases (VLDB 2001), Rome, Italy, September 2001, pages 361-370.

    7. XMark - The XML Benchmark Project. http://monetdb.cwi.nl/xml/.

