BLAS: An Efficient XPath Processing System

Zhimin Song

Advanced Database System

Professor: Dr. Mengchi Liu


Outline

  • Introduction

  • BLAS System

  • Experimental Results

  • Conclusions


<ProteinDatabase>
  <ProteinEntry>
    <protein>
      <name>cytochrome c [validated]</name>
      <classification>
        <superfamily>cytochrome c</superfamily>
      </classification>…
    </protein>
    <reference>
      <refinfo>
        <authors>
          <author>Evans, M.J.</author>…
        </authors>
        <year>2001</year>
        <title>The human somatic cytochrome c gene</title> …
      </refinfo>…
    </reference>…
  </ProteinEntry> …
</ProteinDatabase>

Figure 1: Sample XML protein repository


Introduction

  • XML documents have a complex, tree-like structure made up of nodes.

  • Languages for querying XML are based on path navigation (e.g. XPath [1]).

    Given node → child node (child axis)

    Given node → descendant node (descendant axis)


Introduction (cont.)

  • Several techniques have already been proposed to improve XPath processing. For example, D-labeling handles descendant axis traversal efficiently.

  • What about complex queries that also include child axis steps and branches?

  • For this case the paper proposes P-labeling, which optimizes an important class of queries called suffix path queries.


BLAS (Bi-LAbeling based System)

  • Basic definitions

  • The labeling scheme (index generator)

  • Query translator


  • Basic definitions:

    • BLAS: a system for efficiently processing complex queries, based on D-labeling and P-labeling.

    • BLAS deals with a subset of XPath queries consisting of:

      • Child axis navigation ( / )

      • Descendant axis navigation ( // )

      • Branches ( [ … ] )

    • The evaluation of a path expression P, written [P], returns the set of nodes in an XML tree T that are reachable by P starting from the root of T.

    • Since P can be evaluated to retrieve a set of XML nodes, we use “path expression” and “query” interchangeably.

    • Containment: P ⊆ Q if and only if [P] ⊆ [Q].

    • Non-overlapping: P ∩ Q = ∅ if and only if [P] ∩ [Q] = ∅.


  • Basic definitions (cont.):

    • Suffix path expression: a path expression P that optionally begins with a descendant axis step (//), followed by zero or more child axis steps (/).

      • Example: //protein/name

      • Another example: /proteinDatabase/proteinEntry/protein/name

    • SP(n): the unique simple path P from the root to the node n.

    • Evaluating a suffix path expression Q therefore means finding all the nodes n such that SP(n) ⊆ Q.
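The suffix path definition can be illustrated with a naive string-based check. This is only a sketch of what "SP(n) is contained in Q" means, not how BLAS evaluates such queries (BLAS uses P-labels for that); the function name `matches_suffix_path` is made up for this example.

```python
def matches_suffix_path(sp, query):
    """Return True if the root-to-node path `sp` is selected by the suffix path `query`.

    sp    -- absolute simple path of a node, e.g. "/a/b/c"
    query -- absolute ("/a/b/c") or starting with "//" ("//b/c")
    """
    node_steps = sp.strip("/").split("/")
    if query.startswith("//"):
        query_steps = query[2:].split("/")
        # The query steps must be the trailing steps of the node's path.
        return node_steps[-len(query_steps):] == query_steps
    # Without a leading "//" the whole path has to match.
    return node_steps == query.strip("/").split("/")


sp = "/proteinDatabase/proteinEntry/protein/name"
print(matches_suffix_path(sp, "//protein/name"))    # selected
print(matches_suffix_path(sp, "//reference/name"))  # not selected
```

Comparing strings for every node is exactly the per-node work that the labeling schemes are meant to avoid.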


Architecture of BLAS

[Figure: Architecture of BLAS. Components shown:
  • Index generator: SAX parser (XML in, events out), P-labeling generator (P-labelings), D-labeling generator (D-labelings) and data loader (data values), all feeding the storage.
  • Query translator: query decomposition (XPath query into suffix path queries, together with the ancestor-descendant relationship between the results of the suffix path queries), subquery generator (based on P-labeling), subquery composition (based on D-labeling).
  • Query engine: takes the composed query and returns the query result.]


  • The labeling scheme (index generator)

    • D-labeling scheme: each XML node n is assigned a triplet <d1,d2,d3> with n.d1 <= n.d2. For any two nodes n and m:

      • m is a descendant of n if and only if n.d1 < m.d1 and n.d2 > m.d2.

      • m is a child of n if and only if m is a descendant of n and n.d3 + 1 = m.d3.

      • d1 and d2 of a node n are the positions of its start tag and end tag.

      • d3 is the level of n in the XML tree, i.e. the length of the path from the root to n.

      • A D-label is therefore written as <start,end,level>.
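A minimal sketch of D-labeling, under two assumptions that are not from the slides: positions are counted in tags (1, 2, 3, ...) rather than in bytes, and the root is given level 1. The names `DLabel`, `assign_dlabels`, `is_descendant` and `is_child` are made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple


class DLabel(NamedTuple):
    start: int
    end: int
    level: int


def assign_dlabels(root):
    """Map every element to a DLabel, counting start and end tags in document order."""
    labels = {}
    position = 0

    def visit(elem, level):
        nonlocal position
        position += 1            # position of the start tag
        start = position
        for child in elem:
            visit(child, level + 1)
        position += 1            # position of the end tag
        labels[elem] = DLabel(start, position, level)

    visit(root, 1)               # assumption: the root has level 1
    return labels


def is_descendant(n, m):
    """m is a descendant of n: n's interval strictly encloses m's."""
    return n.start < m.start and n.end > m.end


def is_child(n, m):
    """m is a child of n: a descendant exactly one level below."""
    return is_descendant(n, m) and n.level + 1 == m.level


root = ET.fromstring("<a><b><c/></b><d/></a>")
by_tag = {elem.tag: label for elem, label in assign_dlabels(root).items()}
print(by_tag["a"], by_tag["c"], is_descendant(by_tag["a"], by_tag["c"]))
```

Both checks are pure integer comparisons, which is what makes them easy to express as join conditions in SQL.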


  • Example: using D-labeling

    Query: //proteinDatabase//refinfo

    First retrieve all the nodes reachable by proteinDatabase and by refinfo. Let pDB and refinfo be two relations that store these nodes, then D-join them:

    SELECT pDB.start, pDB.end, refinfo.start, refinfo.end
    FROM pDB, refinfo
    WHERE pDB.start < refinfo.start AND pDB.end > refinfo.end

    [Figure: example query tree with the nodes proteinDatabase, proteinEntry, protein, reference, refinfo, superfamily = “cytochrome c”, author = “Evans, M.J.”, title and year = “2001”; two of its edges are descendant (“//”) edges.]
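The D-join above can be tried out with Python's built-in sqlite3 module. The table contents are invented for illustration; the column name end is quoted because END is an SQL keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pDB     (start INTEGER, "end" INTEGER);
    CREATE TABLE refinfo (start INTEGER, "end" INTEGER);
    -- Hypothetical D-labels: one proteinDatabase node and three refinfo nodes,
    -- the last of which lies outside the proteinDatabase interval.
    INSERT INTO pDB     VALUES (1, 100);
    INSERT INTO refinfo VALUES (20, 40), (50, 70), (150, 160);
""")

rows = conn.execute("""
    SELECT pDB.start, pDB."end", refinfo.start, refinfo."end"
    FROM pDB, refinfo
    WHERE pDB.start < refinfo.start AND pDB."end" > refinfo."end"
    ORDER BY refinfo.start
""").fetchall()
print(rows)  # [(1, 100, 20, 40), (1, 100, 50, 70)]
```

Only the two refinfo nodes whose intervals are enclosed by the proteinDatabase interval survive the join.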


  • P-labeling scheme

    • It is also important to implement child axis navigation efficiently,

      e.g. /proteinDatabase/proteinEntry/protein/name

    • Target: improve the evaluation of “/”.

    • Focus on suffix path queries,

      e.g. //protein/name


  • Assign each node a number <p1> and each suffix path an interval <p1,p2> such that:

  • For any two suffix paths Q1 and Q2, Q1 is contained in Q2 if the interval of Q1 lies within the interval of Q2:

    Q2.p1 <= Q1.p1 and Q1.p2 <= Q2.p2

  • A node n is contained in the suffix path Q if

    Q.p1 <= SP(n).p1 <= Q.p2

  • Let Q be a suffix path query. Then

    [Q] = {n | Q.p1 <= n.plabel <= Q.p2}, where n.plabel = SP(n).p1
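Because [Q] is defined by a single interval test, evaluating a suffix path query reduces to a range selection over the node P-labels. The sketch below imitates an index range scan with a sorted list and binary search; the label values and the function name `select_by_plabel` are hypothetical.

```python
from bisect import bisect_left, bisect_right


def select_by_plabel(sorted_plabels, p1, p2):
    """Return all node P-labels in the closed interval <p1, p2>."""
    lo = bisect_left(sorted_plabels, p1)    # first label >= p1
    hi = bisect_right(sorted_plabels, p2)   # one past the last label <= p2
    return sorted_plabels[lo:hi]


node_plabels = [5, 12, 17, 23, 42]          # hypothetical, already sorted
print(select_by_plabel(node_plabels, 10, 25))  # [12, 17, 23]
```

No join is needed for this step, however many child axis steps the suffix path contains.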


  • P-labeling construction (algorithm)

    • Suppose that there are n distinct tags (t1, t2, …, tn).

    • Assign “/” a ratio r0 and each tag ti a ratio ri such that

      r0 + r1 + r2 + … + rn = 1.

    • Let ri = 1/(n+1).

    • Define the domain of the numbers in a P-label to be the integers in [0, m-1], where m is chosen such that

      m >= (n+1)^h, where h is the length of the longest path in an XML tree.

    • The algorithm is as follows:

      • The path // is assigned the interval (P-label) <0, m-1>.

      • Partition the interval <0, m-1> in tag order, giving each path //ti a share proportional to its ratio ri and the child axis step “/” a share proportional to r0.

      • This means we allocate the interval <0, m*r0 - 1> to “/” and <p_i, p_(i+1) - 1> to each ti such that (p_(i+1) - p_i)/m = ri and p_1/m = r0.
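The partitioning can be sketched in a few lines. This is one reading of the steps above, applied repeatedly from the last tag of the path backwards; it assumes equal ratios 1/(n+1) and a domain size m that is a power of n+1, so that every division is exact. The tag order (ProteinDatabase, ProteinEntry, protein, name as tags 1 to 4) and the function name `plabel` are assumptions chosen to match the slide example with m = 10^12 and 99 tags.

```python
def plabel(path, tag_index, n_tags, m):
    """Return the inclusive P-label interval <p1, p2> of a suffix path.

    path      -- "//t1/.../tk" or "/t1/.../tk"
    tag_index -- maps each tag to its position 1..n_tags in the tag order
    n_tags    -- number of distinct tags n; every ratio is 1 / (n + 1)
    m         -- size of the P-label domain, assumed to be a power of n + 1
    """
    absolute = not path.startswith("//")
    tags = [t for t in path.split("/") if t]
    lo, width = 0, m                       # the path "//" owns the whole domain
    for tag in reversed(tags):             # refine from the last tag backwards
        width //= n_tags + 1               # each slot is 1/(n+1) of the interval
        lo += tag_index[tag] * width       # slot 0 is reserved for "/"
    if absolute:
        width //= n_tags + 1               # a leading "/" takes slot 0
    return lo, lo + width - 1


TAG_INDEX = {"ProteinDatabase": 1, "ProteinEntry": 2, "protein": 3, "name": 4}
print(plabel("//name", TAG_INDEX, 99, 10**12))
print(plabel("//protein/name", TAG_INDEX, 99, 10**12))
print(plabel("/protein/name", TAG_INDEX, 99, 10**12))
```

Each additional step narrows the interval by a factor of n+1, so the interval of a longer suffix path is always nested inside the interval of each of its own suffixes.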


  • P-labeling construction (example)

    Setting: m = 10^12, 99 tags, ri = 0.01. Query: //protein/name

    [Figure: three nested partitions of the P-label domain.
     Top level, from 0 to 10^12: “/” starts at 0, //proteinDatabase at 10^10, //proteinEntry at 2*10^10, //protein at 3*10^10, //name at 4*10^10 (ending at 5*10^10), ...
     Inside //name: /name starts at 4*10^10, //proteinDatabase/name at 4.01*10^10, //proteinEntry/name at 4.02*10^10, //protein/name at 4.03*10^10 (ending at 4.04*10^10), ...
     Inside //protein/name: /protein/name runs from 4.03*10^10 to 4.0301*10^10, ...]


  • Query translator: translates an input XPath query into standard SQL.

    • Query decomposition

      • Splits the query into a set of suffix path queries and records the ancestor-descendant relationships between them.

    • SQL generation

      • Computes each suffix path query’s P-label and generates a corresponding SQL subquery.

    • SQL composition

      • Combines the subqueries into a single SQL query based on D-labeling and the ancestor-descendant relationships.
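The generation and composition steps might look roughly like the sketch below. The table `nodes(plabel, d_start, d_end)`, the helper names and the P-label intervals are all hypothetical; the slides do not show the actual schema or the SQL that BLAS emits.

```python
import sqlite3


def suffix_subquery(p1, p2):
    """SQL for one suffix path query: a range selection on the P-label."""
    return (f"SELECT d_start, d_end FROM nodes "
            f"WHERE plabel BETWEEN {int(p1)} AND {int(p2)}")


def compose(ancestor_sql, descendant_sql):
    """Join two subquery results by the D-label containment condition."""
    return (f"SELECT d.d_start, d.d_end "
            f"FROM ({ancestor_sql}) AS a, ({descendant_sql}) AS d "
            f"WHERE a.d_start < d.d_start AND a.d_end > d.d_end")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (plabel INTEGER, d_start INTEGER, d_end INTEGER)")
conn.executemany("INSERT INTO nodes VALUES (?, ?, ?)", [
    (312, 3, 10),    # a protein node
    (431, 4, 5),     # a name node inside that protein
    (402, 12, 13),   # a name node outside any protein
])

# //protein//name, decomposed into //protein (<300, 399>) and //name (<400, 499>).
sql = compose(suffix_subquery(300, 399), suffix_subquery(400, 499))
result = conn.execute(sql).fetchall()
print(result)  # [(4, 5)]
```

Each suffix path query costs one range selection, and a join is needed only where the original query had a descendant step or a branch.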


  • Split algorithm:

    • D-elimination (query tree Q)

      • Traverse Q depth-first and split p//q into p and //q:

        p//q → p, //q

      • Invoke B-elimination if there are branches in Q; otherwise, evaluate Q using P-labels.

      • Join the intermediate results by their D-labels.

    [Figure: the example query tree (proteinDatabase, proteinEntry, protein, reference, refinfo, superfamily = “cytochrome c”, year = “2001”, title, author = “Evans, M.J.”) is cut at its two “//” edges into the subqueries Q1, Q2 and Q3.]
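For a linear path without branches, the D-elimination split is just a cut at every descendant step. The sketch below works on path strings only and does not handle "//" inside branch predicates; the function name `d_eliminate` is made up for this example.

```python
def d_eliminate(path):
    """Split p//q into p and //q at every descendant axis step."""
    segments = path.split("//")
    result = []
    if segments[0]:
        # The part before the first "//" keeps its leading "/".
        result.append(segments[0])
    # Every later segment becomes a suffix path starting with "//".
    result.extend("//" + s for s in segments[1:])
    return result


print(d_eliminate("/proteinDatabase/proteinEntry/protein//superfamily"))
# ['/proteinDatabase/proteinEntry/protein', '//superfamily']
```

Every piece is a suffix path query, so each can be answered with one P-label range selection before the results are joined by their D-labels.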


  • B-elimination (query tree Q1)

    p[q1, q2, …, qi]/r → p, //q1, //q2, …, //qi, //r

    [Figure: query tree Q1 (proteinDatabase, proteinEntry, protein, reference, refinfo, year = “2001”, title) is split into the subqueries Q4, Q5 and Q6, which are connected by “//” edges.]
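The B-elimination rewrite rule can be written down directly. This sketch takes the already-parsed parts p, q1..qi and r as plain strings; the function name `b_eliminate` and the simplified example inputs are assumptions, not the exact subqueries from the figure.

```python
def b_eliminate(p, branches, r):
    """Rewrite p[q1, ..., qi]/r into p, //q1, ..., //qi, //r."""
    return [p] + ["//" + q for q in branches] + ["//" + r]


subqueries = b_eliminate(
    "/proteinDatabase/proteinEntry",   # p
    ["protein"],                       # branches q1..qi
    "reference/refinfo",               # r
)
print(subqueries)
# ['/proteinDatabase/proteinEntry', '//protein', '//reference/refinfo']
```

The pieces are later rejoined by their D-labels, using the child relationship between p and each branch.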


  • B-elimination (cont.)

    [Figure: branch elimination continues on the remaining branches, leaving the subqueries Q4, Q5, Q7, Q8 and Q9 over the nodes proteinDatabase, proteinEntry, protein, reference, refinfo, year = “2001” and title, connected by “//” edges.]


  • Push-up algorithm: optimizes branch elimination (B-elimination).

    Since p/qi and p/r are more specific than //qi and //r, split

    p[q1, q2, …, qi]/r → p, p/q1, p/q2, …, p/qi, p/r

    [Figure: after push-up, every subquery carries its full path prefix: proteinDatabase/proteinEntry, proteinDatabase/proteinEntry/protein, proteinDatabase/proteinEntry/reference/refinfo, proteinDatabase/proteinEntry/reference/refinfo/year = “2001” and proteinDatabase/proteinEntry/reference/refinfo/title.]
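The push-up variant of the rule differs only in the prefix that each piece receives. As before, this is a string-level sketch with a made-up function name, and the value test on year is left out for brevity.

```python
def push_up(p, branches, r):
    """Rewrite p[q1, ..., qi]/r into p, p/q1, ..., p/qi, p/r."""
    return [p] + [p + "/" + q for q in branches] + [p + "/" + r]


prefix = "/proteinDatabase/proteinEntry/reference/refinfo"
for subquery in push_up(prefix, ["year"], "title"):
    print(subquery)
```

Because every piece now starts at the root, its P-label interval is much narrower than that of the corresponding //qi or //r, so each range selection returns fewer candidate nodes to join.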


  • Unfold algorithm: a further optimization of descendant-axis elimination (D-elimination).

    p//q → p/r1/q, p/r2/q, …, p/ri/q

    For example,

    Q2 = /ProteinDatabase/ProteinEntry/protein//superfamily=“cytochrome c”

    unfolds to

    Q21 = /ProteinDatabase/ProteinEntry/protein/classification/superfamily=“cytochrome c”
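Unfolding needs to know which paths can connect p and q. The sketch below assumes this comes from a non-recursive schema given as a parent-to-children map; the map contents, the function name `unfold`, and the choice to also return the direct child case p/q are assumptions for illustration.

```python
def unfold(p, q, children):
    """Expand p//q into every child-only path p/r/q allowed by the schema.

    children -- maps a tag to the list of tags that may appear directly below it;
                the schema is assumed to be non-recursive.
    """
    results = []

    def walk(tag, prefix):
        for child in children.get(tag, []):
            path = prefix + "/" + child
            if child == q:
                results.append(path)
            walk(child, path)

    last_tag = p.rstrip("/").split("/")[-1]
    walk(last_tag, p)
    return results


schema = {
    "protein": ["name", "classification"],
    "classification": ["superfamily"],
}
print(unfold("/ProteinDatabase/ProteinEntry/protein", "superfamily", schema))
# ['/ProteinDatabase/ProteinEntry/protein/classification/superfamily']
```

When the schema allows only one connecting path, as here, the descendant step disappears entirely and the whole query becomes a single suffix path query.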


Experimental Results

  • Data sets

  • Query sets

    • Suffix path queries

    • Path queries

    • XPath queries

  • Query engine: RDBMS or file system


Query Execution Time

[Figure: query time for the Shakespeare, Protein and Auction data sets. Legend: 1 = suffix path query, 2 = path query, 3 = XPath query; A = Auction, P = Protein, S = Shakespeare.]


Scalability

[Figure: the performance of D-labeling, Split and Push up for the suffix path query.]


Conclusion

  • The P-labeling scheme is proposed to evaluate suffix path queries efficiently.

  • BLAS combines P-labeling and D-labeling to evaluate XPath queries.

  • BLAS is more efficient because the queries translated from XPath queries require:

    • fewer disk accesses

    • fewer joins

  • Experiments show the effectiveness of BLAS.


References

  • [1] J. Clark and S. DeRose. XML Path Language (XPath), November 1999. http://www.w3.org/TR/xpath.

  • [13] D. DeHaan, D. Toman, M. Consens, and M. T. Ozsu. A comprehensive XQuery to SQL translation using dynamic interval encoding. In Proceedings of SIGMOD, 2001.

  • [26] J.-K. Min, M.-J. Park, and C.-W. Chung. XPRESS: A queriable compression for XML data. In Proceedings of SIGMOD, 2003.


    Thank you!

    Questions?
