Beyond the rainbow a pot of gold ala xml database projects
Download
1 / 46

Beyond the Rainbow: —— A Pot of Gold ala XML Database Projects - PowerPoint PPT Presentation


  • 57 Views
  • Uploaded on

Beyond the Rainbow: —— A Pot of Gold ala XML Database Projects. WPI DSRG GROUP. Motivation. XML is new, and here to stay … Universal flexible representation of data De facto standard for information exchange XQuery is useful, and here to stay… Powerful query language for XML

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about ' Beyond the Rainbow: —— A Pot of Gold ala XML Database Projects' - sylvester-dillard


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Beyond the rainbow a pot of gold ala xml database projects

Beyond the Rainbow:—— A Pot of Gold ala XML Database Projects

WPI DSRG GROUP

DSRG, Worcester Polytechnic Institute


Motivation
Motivation

  • XML is new, and here to stay …

    • Universal flexible representation of data

    • De facto standard for information exchange

  • XQuery is useful, and here to stay…

    • Powerful query language for XML

    • De facto standard for XML querying

  • Plentitude of relevant new issues …

DSRG, Worcester Polytechnic Institute


Xml paradigm

  • searching

  • querying

  • integrating

  • restructuring

  • updating

XML

Queries

And

Updates

XML Paradigm

EVE-Middleware

Internet

WWW:

global scale

distributed

information system

for sharing data

XML1

XML5

RDBL6

XML2

XML3

RDB4

XMLn

DSRG, Worcester Polytechnic Institute


What we aim for

XML Data Management

Middleware Technology

What We Aim For…

  • efficient

  • flexible

  • scalable

  • lightweight

  • resource-sensitive

  • adaptive

Internet

EVE-Middleware

RDB5

XML1

XML6

XML2

RDB3

XML4

XMLn

DSRG, Worcester Polytechnic Institute


Wpi project directions
WPI Project Directions

  • RAINBOW: Exploiting RDB for XML management:

    • Algebraic-XQuery processing

  • XCube: Flexible XML Mapping Tool:

    • Flexible loading/extracting XML to RDB via XQuery

  • Updating Virtual XML Views:

    • Update decomposition and trigger-propagation

  • MASS: Native XML Query Engine:

    • Multi-axis compressed order-preserving XML storage

DSRG, Worcester Polytechnic Institute


Wpi project directions1
WPI Project Directions

  • XCache: XML Query Caching:

    • Cache containment and query rewriting

  • Materialized XML View Maintenance:

    • Incremental algebraic maintenance strategy

  • SAXE: XML Incremental Updating & Evolution:

    • Lightweight updating by update query rewriting

  • RAINDROP: XQuery-based Stream Processing:

    • Adaptive on-fly multi-subscription optimization

DSRG, Worcester Polytechnic Institute


The rainbow project

THE RAINBOW PROJECT

DSRG, Worcester Polytechnic Institute


Xml meets relational dbs

The look and feel of an XML query system

with maturity and technology support of RDB

XML

Relational Database

  • Emerging web standard

  • Flexible data representation

  • Powerful query language

1) Widely used to store business data

2) Efficient, reliable, secure DBMS

3) Mature query processing techniques

XML meets Relational DBs

Rainbow

+

DSRG, Worcester Polytechnic Institute


Running example

<results>

<title>TCP/IP Illustrated</title>

<title>Data on the Web</title>

</results>

<dxv>

<book>

<row>

<bid>001</bid>

<title>TCP/IP Illustrated</title>

</row>

<row>

<bid>002</bid>

<title>Data on the Web</title>

</row>

</book>

<prices>

<row>

<bid>001</bid>

<price>65.95</price>

</row>

<row>

<bid>002</bid>

<price>34.95</price>

</row>

</prices>

</dxv>

<prices>

FOR $book IN document(“dxv.xml”)/book/row

$prices IN document(“dxv.xml”)/prices/row

WHERE $book/bid = $prices/bid

RETURN

<book>

$book/title,

$prices/price

</book>

</prices>

Bid

Title

Bid

Price

001

TCP/IP Illustrated

001

65.95

002

Data on the Web

002

34.95

Running Example

<result>

FOR $t IN

document(“prices.xml”)/book/title

RETURN

$t

</result>

<prices>

<book>

<title>TCP/IP Illustrated</title>

<price>65.95</price>

</book>

<book>

<title>Data on the Web</title>

<price>34.95</price>

</book>

</prices>

DSRG, Worcester Polytechnic Institute


Xml default view
XML Default View

XML Default View

  • Fixed and straight-forward mapping scheme.

<DB>

<BOOKS>

<ROW>

<Cover>Paperback</Cover>

<TITLE>Texas Holdem'</TITLE>

<AUTHORS>David Sklansky, Straight Flush</AUTHORS>

</ROW>

<ROW>

<Cover>Paperback</Cover>

<TITLE>Dracula</TITLE>

<AUTHORS>Bram Stoker</AUTHORS>

</ROW>

</BOOKS>

<…>

</DB>

DSRG, Worcester Polytechnic Institute


Generic loading
Generic Loading

  • Knowledge of schema of XML document to be loaded helps to reduce unnecessary parts.

FUNCTION Q1($root){

LET $maintag := gettag($root)

RETURN

<$maintag $root/@*>

FOR $actual IN $root/*

LET $innertag := gettag($actual)

RETURN

IF ($actual/element())

THEN

Q1($actual)

ELSE

<$innertag $actual/@*>

IF ($actual/text())

THEN

<PCDATA value=$actual/text()/>

ELSE

""

</$innertag>

</$maintag>

}

DSRG, Worcester Polytechnic Institute


Instantiation
Instantiation

  • Generic loading XQuery expression recursive.

  • + It works for every XML document.

  • Many recursive calls return no value.- Unnecessary FOR-loops, IF-clauses, and getName()-fct.

XML Schema

XML

Schema

XQuery

Expression

(flat)

XQuery

Expression

Instantiator

XQuery

Expression

(recursive)

XQuery

Expression

DSRG, Worcester Polytechnic Institute


Instantiation example
Instantiation (Example)

(First Step of CLOCK mapping scheme)

Instantiated Loading Query

  • Short, non-recursive, more efficient …

  • But: XML schema dependent!

FUNCTION Q1($root){

<BOOKLIST>

FOR $book IN $root/BOOK

RETURN

<BOOK$book/@cover>

<TITLE>

<PCDATAvalue=$book/TITLE/text()/>

</TITLE>

<AUTHOR>

FOR $name IN $book/AUTHOR/NAME

RETURN

<NAME>

<PCDATAvalue=$book/A…/>

</NAME>

</AUTHOR>

</BOOK>

</BOOKLIST>

}

DSRG, Worcester Polytechnic Institute


Flexible mapping management

1

XML

XML’

2

H

F

G

g

Relation

Relation’

f

RDB

Default

XQuery

(Load)

XML

XML

View

Reverser

RDB

Default

XQuery

(Extract)

XML

XML

View

Flexible Mapping Management

DSRG, Worcester Polytechnic Institute


Xcube in a nutshell
XCube in a Nutshell

  • Easy-to-use (no new transformation language).

  • Flexible (interchangeable XQuery expressions).

  • Adaptable (to workload, data specifics, …).

  • General (Schema independent).

  • Extendable (with new mapping schemes).

  • Tunable (Loading manager).

  • Generic XQuery loading expressions

  • XQuery load expression instantiation

DSRG, Worcester Polytechnic Institute


Architecture

XML Document

Virtual XML Document

Virtual XML Document

Virtual XML Document

Virtual XML Document

Virtual XML Document

Architecture

Rainbow

XAT Generator

XAT Decorrelator

XAT Merger

User XAT

User XAT

User Query Results in XML

User XQuery

XAT

Executor

XAT

Optimizer

SQL

Generator

XAT

XAT

XAT

Tuples

XAT

View XAT

View XAT

View XQuery

SQL

RDBMS

XAT: XML Algebra Tree

DSRG, Worcester Polytechnic Institute


Xquery level optimization
XQuery-Level Optimization

  • XAT - XML Algebra Tree Model

  • XAT Algebraic Query Plan Optimization

  • XAT Query Plan Reduction

DSRG, Worcester Polytechnic Institute


User xml algebra tree xat

XAT Generator

XAT Decorrelator

XAT Merger

User XAT

User XAT

User XQuery

XAT

Executor

XAT

Optimizer

XAT

XAT

XAT

SQL

Generator

XAT

View XAT

View XAT

View XQuery

User XML Algebra Tree (XAT)

User Query

col3

1:

<result>

FOR $t IN

document(“prices.xml”)/book/title

RETURN

$t

</result>

2:

T<results>$t</result>col3

Agg

3:

R0, book/title$t

6:

S”prices.xml”R0

7:

DSRG, Worcester Polytechnic Institute


View xml algebra tree xat

XAT Generator

XAT Decorrelator

XAT Merger

User XAT

User XAT

User XQuery

XAT

Executor

XAT

Optimizer

XAT

XAT

XAT

SQL

Generator

XAT

View XAT

View XAT

View XQuery

View XML Algebra Tree (XAT)

View Query

<prices>

FOR $book IN document(“dxv.xml”)/book/row

$prices IN document(“dxv.xml”)/prices/row

WHERE $book/bid = $prices/bid

RETURN

<book>

$book/title,

$prices/price

</book>

</prices>

col6=col7

26:

$book, bidcol6

27:

$prices, bidcol7

28:

31:

T<book> [col10][col12] </book>col5

22:

23:

T<prices>col5</prices>col4

11:

$book, titlecol10

R3, /prices/row$prices

R1, /book/row$book

20:

14:

Agg

12:

$prices, pricecol12

25:

15:

S“dxv.xml” R3

S“dxv.xml” R1

21:

DSRG, Worcester Polytechnic Institute


Merged xml algebra tree xat

XAT Generator

XAT Decorrelator

XAT Merger

User XAT

User XAT

User XQuery

XAT

Executor

XAT

Optimizer

XAT

XAT

XAT

SQL

Generator

XAT

View XAT

View XAT

View XQuery

Merged XML Algebra Tree (XAT)

User Query

View Query

col6=col7

col3

26:

1:

2:

T<prices>col5</prices>col4

$book, bidcol6

T<results>$t</result>col3

11:

27:

Agg

3:

Agg

$prices, bidcol7

12:

28:

R0, book/title$t

T<book> [col10][col12] </book>col5

6:

31:

22:

col4 R0

23:

7:

$book, titlecol10

R3, /prices/row$prices

R1, /book/row$book

20:

14:

$prices, pricecol12

25:

15:

S“dxv.xml” R3

S“dxv.xml” R1

21:

DSRG, Worcester Polytechnic Institute


Xquery level optimization1
XQuery-Level Optimization

  • XML Algebra Representation: XAT

  • XAT Query Plan Rewriting

  • XAT Query Plan Reduction

DSRG, Worcester Polytechnic Institute


Xat rewrite

XAT Generator

XAT Decorrelator

XAT Merger

User XAT

User XAT

User XQuery

XAT

Executor

XAT

Optimizer

XAT

XAT

XAT

SQL

Generator

XAT

View XAT

View XAT

View XQuery

XAT Rewrite

  • Query Optimization at Logic Algebra Level.

  • Goals:

    • Redundancy Elimination.

    • Computation Pushdown.

  • Technique:

    • Equivalence Rewrite Rules.

    • Heuristics:

      • Pushdown Navigates

      • Remove Construction of Intermediate Result

      • Combine Multiple Operators.

DSRG, Worcester Polytechnic Institute


Before navigation pushdown
Before Navigation Pushdown

User Query

View Query

col6=col7

col3

26:

1:

$book, bidcol6

2:

T<results>$t</result>col3

27:

Agg

3:

$prices, bidcol7

28:

T<book> [col10][col12] </book>col5

R0, book/title$t

6:

31:

22:

col4 R0

23:

7:

$book, titlecol10

R3, /prices/row$prices

R1, /book/row$book

20:

T<prices>col5</prices>col4

14:

11:

$prices, pricecol12

25:

15:

Agg

S“dxv.xml” R3

S“dxv.xml” R1

12:

21:

DSRG, Worcester Polytechnic Institute


After navigation pushdown

R3, /prices/row$prices

20:

14:

R1, /book/row$book

S“dxv.xml” R1

S“dxv.xml” R3

15:

21:

After Navigation Pushdown

User Query

View Query

T<book> [col10][col12] </book>col5

22:

col3

1:

col6=col7

26:

2:

T<results>$t</result>col3

31:

Agg

3:

$prices, pricecol12

$book, titlecol10

23:

25:

R0, book/title$t

6:

$prices, bidcol7

$book, bidcol6

28:

27:

T<prices>col5</prices>R0

11:

Agg

12:

DSRG, Worcester Polytechnic Institute


Remove any taggers

R3, /prices/row$prices

20:

14:

R1, /book/row$book

S“dxv.xml” R1

S“dxv.xml” R3

15:

21:

Remove any Taggers?

User Query

View Query

T<book> [col10][col12] </book>col5

22:

col3

1:

col6=col7

26:

2:

T<results>$t</result>col3

31:

Agg

3:

$prices, pricecol12

$book, titlecol10

23:

25:

R0, book/title$t

6:

$prices, bidcol7

$book, bidcol6

28:

27:

T<prices>col5</prices>R0

11:

Agg

12:

DSRG, Worcester Polytechnic Institute


After tagger cancel out

$book, bidcol6

col3

1:

27:

$prices, pricecol12

25:

T<results>$t</result>col3

2:

$prices, bidcol7

28:

Agg

3:

R3, /prices/row$prices

20:

14:

R1, /book/row$book

S“dxv.xml” R1

S“dxv.xml” R3

15:

21:

After Tagger Cancel Out

User Query

View Query

col6=col7

26:

31:

$book, title$t

23:

DSRG, Worcester Polytechnic Institute


After making join

$book, bidcol6

col3

1:

27:

T<results>$t</result>col3

2:

$book, title$t

23:

Agg

3:

14:

R1, /book/row$book

S“dxv.xml” R1

15:

After Making Join

User Query

View Query

JOINcol6=col7

31:

$prices, pricecol12

25:

$prices, bidcol7

28:

R3, /prices/row$prices

20:

S“dxv.xml” R3

21:

DSRG, Worcester Polytechnic Institute


Xquery level optimization2
XQuery-Level Optimization

  • XML Algebra Representation: XAT

  • XAT Query Plan Rewriting

  • XAT Query Plan Reduction

DSRG, Worcester Polytechnic Institute


Xat cleanup

XAT Generator

XAT Decorrelator

XAT Merger

User XAT

User XAT

User XQuery

XAT

Executor

XAT

Optimizer

XAT

XAT

XAT

SQL

Generator

XAT

View XAT

View XAT

View XQuery

XAT Cleanup

  • Why:

    • SQL engine cannot reduce redundancy in XQuery.

  • How:

    • Data Redundancy by Schema Cleanup

      • Each operator produced, consumed and modified some columns.

      • Minimum schema is then computed.

    • Tree Redundancy by Unused Operator Cutting

      • Cutting matrix generation.

      • Required columns analysis.

      • Operator cutting.

DSRG, Worcester Polytechnic Institute


Xat operator properties
XAT Operator Properties

  • Produced

    • Desc: New column generated by operator.

    • Example: , S, T

  • Consumed

    • Desc: Columns required by operator.

    • Example: , 

  • Modified

    • Desc: Columns modified by operator.

    • Example: , , 

DSRG, Worcester Polytechnic Institute


Schema computation
Schema Computation

Node

Parent

Produced

Consumed

Old Schema

1

{}

{col3}

{col3}

1:

col3

2

1

{col3}

{$t}

{col3, R1, $book, col6, $t, R3, $prices, col7, col12}

T<results>$t</result>col3

2:

3

2

{}

{}

{R1, $book, col6, $t, R3, $prices, col7, col12}

Agg

3:

31

3

{}

{col6, col7}

{R1, $book, col6, $t, R3, $prices, col7, col12}

col6=col7

31:

23

31

{$t}

{$book}

{R1, $book, col6, $t}

25:

$prices, pricecol12

$book, title$t

23:

27

23

{col6}

{$book}

{R1, $book, col6}

14

27

{$book}

{R1}

{R1, $book}

28:

$prices, bidcol7

$book, bidcol6

27:

15

14

{R1}

{}

{R1}

20:

25

31

{col12}

{$prices}

{R3, $prices, col7, col12}

R3, /prices/row$prices

R1, /book/row$book

14:

28

25

{col7}

{$prices}

{R3, $prices, col7}

S“dxv.xml” R3

21:

20

28

{$prices}

{R3}

{R3, $prices}

S“dxv.xml” R1

15:

21

20

{R3}

{}

{R3}

DSRG, Worcester Polytechnic Institute


Schema computation1

Intuition: Don’t keep anything that’s not used later.

Schema Computation

#

Parent()

col3

$t

col6

col7

$book

R1

col12

$prices

R3

New Schema

1:

1

C

{col3}

col3

2

1

P

C

{col3}

T<results>$t</result>col3

2:

3

2

{$t}

31*

3

C

C

{$t}

Agg

3:

23

31

P

C

{col6, $t}

27

23

P

C

{$book, col6}

col6=col7

31:

14

27

P

C

{$book}

25:

$prices, pricecol12

$book, title$t

23:

15

14

P

{R1}

25

31

P

C

{col7, col12}

28:

$prices, bidcol7

$book, bidcol6

27:

28

25

P

C

{$prices, col7}

20:

R3, /prices/row$prices

R1, /book/row$book

20

28

P

C

{$prices}

14:

21

20

P

{R3}

S“dxv.xml” R3

21:

S“dxv.xml” R1

*We assume Join didn’t modify $t. Otherwise, only node 25 will be deleted.

15:

DSRG, Worcester Polytechnic Institute


Schema cleanup result
Schema Cleanup Result

DSRG, Worcester Polytechnic Institute


Xat cleanup1
XAT Cleanup

  • Schema Cleanup

    • Each operator produced, consumed and modified some columns.

    • Minimum schema is then computed.

  • Unused Operator Cutting

    • Cutting matrix generation.

    • Required columns analysis.

    • Operator cutting.

DSRG, Worcester Polytechnic Institute


Cutting matrix
Cutting Matrix

  • Purpose:

    • Get rid of unused operators.

  • Equations:

    • Propagation of modified

    • Propagation of required

  • Identify cuttable node.

DSRG, Worcester Polytechnic Institute


Matrix computation
Matrix Computation

1:

col3

T<results>$t</result>col3

2:

Agg

3:

JOINcol6=col7

31:

25:

$prices, pricecol12

$book, title$t

23:

28:

$prices, bidcol7

$book, bidcol6

27:

20:

R3, /prices/row$prices

R1, /book/row$book

*We assume Join didn’t modify $t. Otherwise, only node 25 will be deleted.

14:

S“dxv.xml” R3

21:

S“dxv.xml” R1

15:

DSRG, Worcester Polytechnic Institute


Matrix computation cont 1

Intuition: Give me only the required columns in order to get the final result.

Matrix Computation (Cont.1)

1:

col3

#

Parent()

col3

$t

col6

col7

$book

R1

col12

$prices

R3

Cut?

T<results>$t</result>col3

2:

1

R

R

R

R

2

1

P

C

3

2

-

M

-

-

-

-

-

-

-

Agg

3:

31*

3

C

C

23

31

P

C

JOINcol6=col7

31:

27

23

P

C

14

27

P

C

25:

$prices, pricecol12

$book, title$t

23:

15

14

P

25

31

P

C

28:

$prices, bidcol7

28

25

P

C

$book, bidcol6

27:

20

28

P

C

20:

21

20

P

R3, /prices/row$prices

R1, /book/row$book

*We assume Join didn’t modify $t. Otherwise, only node 25 will be deleted.

14:

S“dxv.xml” R3

21:

S“dxv.xml” R1

15:

DSRG, Worcester Polytechnic Institute


Matrix computation cont 2
Matrix Computation (Cont. 2) the final result.

1:

col3

T<results>$t</result>col3

2:

Agg

3:

JOINcol6=col7

31:

25:

$prices, pricecol12

$book, title$t

23:

28:

$prices, bidcol7

$book, bidcol6

27:

20:

R3, /prices/row$prices

R1, /book/row$book

*We assume Join didn’t modify $t. Otherwise, only node 25 will be deleted.

14:

S“dxv.xml” R3

21:

S“dxv.xml” R1

15:

DSRG, Worcester Polytechnic Institute


Xat after cutting
XAT after Cutting the final result.

1:

col3

col3

1:

T<results>$t</result>col3

2:

Agg

T<results>$t</result>col3

3:

2:

Agg

JOINcol6=col7

31:

3:

Reduced To

25:

$prices, pricecol12

$book, title$t

$book, title$t

23:

23:

28:

R1, /book/row$book

$prices, bidcol7

14:

$book, bidcol6

27:

20:

S“dxv.xml” R1

R3, /prices/row$prices

15:

R1, /book/row$book

14:

S“dxv.xml” R3

21:

S“dxv.xml” R1

15:

DSRG, Worcester Polytechnic Institute


Sql generated

XAT Generator the final result.

XAT Decorrelator

XAT Merger

User XAT

User XAT

User XQuery

XAT

Executor

XAT

Optimizer

XAT

XAT

XAT

SQL

Generator

XAT

View XAT

View XAT

View XQuery

SQL Generated

SELECT “$book”.title as “$t”,

“$book”.bid as “col6”,

“$prices”.price as “col12”,

“$prices”.bid as “col7”

FROM book “$book”,

prices “$prices”

WHERE “col6”=“col7”

1:

col3

col3

1:

2:

T<results>$t</result>col3

T<results>$t</result>col3

2:

Agg

3:

Agg

3:

31:

JOINcol6=col7

$book, title$t

23:

25:

$prices, pricecol12

23:

$book, title$t

R1, /book/row$book

14:

28:

$prices, bidcol7

27:

$book, bidcol6

S“dxv.xml” R1

15:

20:

R3, /prices/row$prices

14:

R1, /book/row$book

SELECT “$book”.title as “$t”,

FROM book “$book”,

S“dxv.xml” R3

21:

15:

S“dxv.xml” R1

DSRG, Worcester Polytechnic Institute


Xquery level optimization3
XQuery-Level Optimization the final result.

  • XML Algebra Representation: XAT

  • XAT Query Plan Rewriting

  • XAT Query Plan Reduction

DSRG, Worcester Polytechnic Institute


Performance gain in execution
Performance Gain in Execution the final result.

DSRG, Worcester Polytechnic Institute


Rainbow engine overhead
Rainbow Engine Overhead the final result.

XAT

Optimizer

XAT Generator

XAT Decorrelator

XAT Merger

User XAT

User XAT

XAT

Rewrite

User XQuery

XAT

XAT

XAT

XAT

Executor

SQL

Generator

XAT

View XAT

View XAT

XAT

Cleanup

View XQuery

Total:

32,522 ms

Ack.: XQuery using Kweelt Parser

DSRG, Worcester Polytechnic Institute


Http davis wpi edu dsrg rainbow https sourceforge net projects rainbow engine

http://davis.wpi.edu/dsrg/rainbow the final result.

https://sourceforge.net/projects/rainbow-engine/

Rainbow

DSRG, Worcester Polytechnic Institute


Related work
Related Work the final result.

  • XPERANTO[VLDBJ2000]: XQGM vs. XAT

    • Xquery Views over RDB, Extension by UDFs for XML features

  • SilkRoute[IEEE2001(24:2)]:

    • Xquery Views over RDB, Generate SQL Efficiently

  • AGORA[VLDB2000]:

    • Syntax level rewriting.

DSRG, Worcester Polytechnic Institute


Summary
Summary the final result.

  • Efficient XQuery Processing

  • XML Algebra Tree (XAT)

  • XAT Optimization:

    • Rewrite by using equivalent rules

    • Cleanup

      • Schema cleanup

      • Operator cutting

  • Prototype system implementation.

DSRG, Worcester Polytechnic Institute


ad