TIMBER A Native XML Database - PowerPoint PPT Presentation



TIMBER A Native XML Database. The Overview of the TIMBER System in University of Michigan. Xiali He. Outline. Introduction Motivations and Related Work System Architecture Tree Algebra Query Evaluation Query Optimization Updates Issue. Introduction. Why Native XML Database?

TIMBER: A Native XML Database

An Overview of the TIMBER System at the University of Michigan

Xiali He


Outline

  • Introduction

  • Motivations and Related Work

  • System Architecture

  • Tree Algebra

  • Query Evaluation

  • Query Optimization

  • Updates Issue


Introduction

  • Why Native XML Database?

    • Mapping XML data onto existing database systems is problematic because of XML’s flexible nature

      • Results in an unnormalized relational representation

      • Results in a large number of tables

  • Challenges in TIMBER system:

    • Start from scratch

    • Retain XML data’s natural structure, flexibility, and heterogeneity

    • Efficient processing on tree structures

    • Updates


  • Reuse the existing database technologies

    • Transaction Management Facilities

    • Declarative Querying

    • Set-at-a-time Processing

  • Redesign and tailor certain components for the XML domain

    • Bulk Algebra – TAX

    • Query Evaluation

    • Query Optimization


Motivations and Related Work

  • Mapping techniques between tree-based XML data and flat relational schemas

    Problems:

    • XML has very rich tree structure.

    • Relational has rigid table structure.

    • A simple tree schema produces complex relational schema with many tables.

    • A simple XML query gets translated into an expensive sequence of joins in a relational database.



System Architecture

TIMBER: an efficient XML database engine

  • Data Storage

  • Index Storage

  • Metadata Storage

  • Query Processing



System Architecture

Data Storage

  • Nodes in Timber System:

    • Node for each element

    • Child node for each sub-element

    • Child node for all attributes of an element

    • Child node for content of an element node

    • Child node for all processing instructions and comments (planned for the future)

  • Node Identifier in Timber System:

    (S, E, L) – Start label, End Label, Level Label

  • Physical Storage Order:

    Nodes are sorted by the value of their start labels.
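
The (S, E, L) identifiers can be illustrated with a small sketch. The slides do not say how the labels are assigned, so the single counter incremented on entering and leaving each element is an assumption, and only element nodes are labeled here (attribute and content nodes are left out). The helper names `label_nodes`, `is_ancestor`, and `is_parent` are hypothetical.

```python
import xml.etree.ElementTree as ET

def label_nodes(root):
    """Assign (start, end, level) to every element in one depth-first pass.

    Assumption: one shared counter, incremented on entry and on exit.
    """
    labels = {}
    counter = 0

    def visit(elem, level):
        nonlocal counter
        counter += 1
        start = counter
        for child in elem:
            visit(child, level + 1)
        counter += 1
        labels[elem] = (start, counter, level)

    visit(root, 0)
    return labels

def is_ancestor(a, d):
    # a contains d exactly when d's interval lies strictly inside a's interval
    return a[0] < d[0] and d[1] < a[1]

def is_parent(a, d):
    # a parent is an ancestor exactly one level above
    return is_ancestor(a, d) and a[2] + 1 == d[2]

doc = ET.fromstring("<dept><faculty><name/><RA/></faculty><secretary/></dept>")
labels = label_nodes(doc)
by_tag = {elem.tag: labels[elem] for elem in doc.iter()}

# Physical storage order: nodes sorted by start label (document order)
storage_order = sorted(labels.values())
```

With labels of this kind, structural relationships can be decided from the identifiers alone, without touching the stored data.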



System Architecture

Index Storage

  • Indices in Timber System:

    • On attribute values

    • On element content

    • On tag name

  • Index structures return lists of (S, E, L) labels



System Architecture

Metadata Storage

  • Use histograms for cost estimation

  • Timber is independent of XML schema

Query Processing


Tree Algebra - TAX

The Timber system develops a suite of operators suited to manipulating trees instead of tuples:

  • Selection

  • Projection

  • Ordering

  • Grouping

  • Product

  • Set Union

  • Set Difference

  • Renaming


Pattern Tree

Tree Algebra - TAX

  • XML: components of a tree cannot be referenced by position or name!

  • Solution: pattern trees specify homogeneous tuples of node bindings. A witness tree is produced for each combination of node bindings that matches the pattern.

  • A pattern tree can bind as many variables as it has nodes, whereas XPath binds only one variable.

[Figure: a pattern tree and the witness tree it produces]
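
As a rough illustration of pattern trees and witness trees, the sketch below enumerates every combination of node bindings that matches a pattern. It is not TIMBER's implementation: the `Node` and `PNode` classes are hypothetical, predicates are reduced to a tag test, and the two edge kinds assumed are `pc` (parent-child) and `ad` (ancestor-descendant).

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    tag: str
    children: list = field(default_factory=list)

@dataclass
class PNode:
    var: str                                   # variable name such as "$1"
    tag: str                                   # tag the bound node must have
    edges: list = field(default_factory=list)  # (axis, PNode), axis is "pc" or "ad"

def descendants(node):
    for child in node.children:
        yield child
        yield from descendants(child)

def embed(pnode, node):
    """Yield one binding dict per way of matching the pattern rooted at pnode onto node."""
    if node.tag != pnode.tag:
        return
    partial = [{pnode.var: node}]
    for axis, pchild in pnode.edges:
        candidates = node.children if axis == "pc" else list(descendants(node))
        # Combine every binding found so far with every match of this pattern child
        partial = [{**b, **sub}
                   for b in partial
                   for cand in candidates
                   for sub in embed(pchild, cand)]
    yield from partial

def witnesses(pattern, root):
    """Try the pattern root at every node of the data tree."""
    yield from embed(pattern, root)
    for node in descendants(root):
        yield from embed(pattern, node)

dept = Node("department", [
    Node("faculty", [Node("name"), Node("RA"), Node("RA")]),
    Node("faculty", [Node("name"), Node("TA")]),
])
pattern = PNode("$1", "faculty",
                [("pc", PNode("$2", "RA")), ("pc", PNode("$3", "name"))])
matches = list(witnesses(pattern, dept))
```

Each binding dict stands for one witness tree and binds all three variables at once. The first faculty member contributes two witnesses (one per RA), and the second contributes none.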



Tree Algebra - TAX

Selection

More than just a filter!

Order is preserved!

C - Collection

P - Pattern

SL - Selection List (lists the nodes of P for which not just the nodes themselves, but all their descendants, are to be returned in the output)

Output: the witness trees induced by the embeddings of P into C, each modified as prescribed in SL.


Tree Algebra - TAX

Projection

C - Collection

P - Pattern

PL - Projection List (a list of node labels from P, possibly with *)

Output: a projection may produce zero, one, or more output trees.


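Example - Projection

Pattern tree: $1 is the root, with pc (parent-child) edges to $2 and $3

$1.tag = faculty & $2.tag = RA & $3.tag = name

PL: $1, $3

[Figure: projection applied to two faculty trees. The tree with RA and name children matches and is projected onto faculty and name; the tree with only name and TA children produces no match.]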
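
The projection example can be traced with a short sketch. It hard-codes the pattern from this slide ($1 = faculty with pc children $2 = RA and $3 = name) rather than implementing a general matcher. The tuple-based tree encoding and the helper names are assumptions, and the `*` option of the projection list is not modeled.

```python
def embeddings(tree):
    """Bindings of the slide's pattern: $1=faculty with pc children $2=RA, $3=name."""
    found = []

    def walk(node):
        tag, children = node
        if tag == "faculty":
            for ra in (c for c in children if c[0] == "RA"):
                for name in (c for c in children if c[0] == "name"):
                    found.append({"$1": node, "$2": ra, "$3": name})
        for child in children:
            walk(child)

    walk(tree)
    return found

def prune(node, keep):
    """Drop nodes whose id is not in keep; surviving nodes keep their relative order."""
    tag, children = node
    kept_children = [t for c in children for t in prune(c, keep)]
    if id(node) in keep:
        return [(tag, kept_children)]
    return kept_children

def project(collection, pl):
    """Keep only nodes bound to a variable listed in pl in some embedding."""
    out = []
    for tree in collection:
        keep = {id(b[v]) for b in embeddings(tree) for v in pl}
        out.extend(prune(tree, keep))
    return out

# Trees are (tag, children) tuples
f1 = ("faculty", [("RA", []), ("name", []), ("TA", [])])
f2 = ("faculty", [("name", []), ("TA", [])])
result = project([f1, f2], ["$1", "$3"])
```

The second tree has no RA child, so it matches nothing and contributes no output tree. The first tree is reduced to faculty with its name child.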


Tree Algebra - TAX

Ordering

The Timber system treats pattern trees as unordered except where ordering constraints are explicitly specified!


Tree Algebra - TAX

Grouping

With the use of grouping, we can produce a simpler and more efficient execution!

Grouping may not induce a partitioning

C - Collection

P - Pattern

OL - Ordering List (each component comprises an order direction and an element or element attribute, with values drawn from an ordered domain)

GB - Grouping Basis (lists elements, by label in P, whose values are used to partition the set W of witness trees of P against the collection C)

Output: one output tree Si for each group Wi of witness trees, with the structure shown on the next slide.


Output tree: Si

  • tax_group_root

    • tax_grouping_basis: one child for each element in the grouping basis

    • tax_group_subroot: roots of the input trees in C that correspond to Wi


How to make FLWR execution more efficient by using grouping operator?

FOR $a IN distinct-values(document("bib.xml")//author)
RETURN
  <authorpubs>
    {$a}
    {
      FOR $b IN document("bib.xml")//article
      WHERE $a = $b/author
      RETURN $b/title
    }
  </authorpubs>


Algorithm:

  1. Construct an initial pattern tree from the "inner" FLWR statement, consisting of the bound variables and their paths from the document root.

     Pattern tree: $1 with a pc edge to $2, where $1.tag = doc_root & $2.tag = article

  2. Construct the input for the GROUPBY operator.

     Pattern tree: $1 with a pc edge to $2, where $1.tag = article & $2.tag = author

  3. Apply the GROUPBY operator on the collection of trees generated from step 1.

     [Figure: a TAX group root with two children: the TAX grouping basis, holding an author, and the TAX group subroot, holding the article trees (title, year, author) of that group]

  4. Apply a projection that keeps only the nodes necessary for the outcome.

     $1.tag = TAX group root & $2.tag = TAX grouping basis & $3.tag = TAX group subroot & $4.tag = author & $5.tag = article & $6.tag = title

     PL: $1, $4*, $6*

  5. Use the rename operator to change the dummy root to the tag specified in the return clause.
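
The benefit of the grouping rewrite can be sketched outside TAX. The snippet below contrasts a direct reading of the nested FLWR query, which rescans every article once per distinct author, with a single grouping pass. The in-memory `articles` list is a hypothetical stand-in for bib.xml, and plain dictionaries replace the TAX group trees.

```python
from collections import defaultdict

# Hypothetical stand-in for bib.xml: each article has a title and its authors
articles = [
    {"title": "T1", "authors": ["Ann", "Bob"]},
    {"title": "T2", "authors": ["Ann"]},
    {"title": "T3", "authors": ["Cy"]},
]

def nested_loop(articles):
    """Direct reading of the FLWR query: one scan of all articles per distinct author."""
    distinct = []
    for art in articles:
        for author in art["authors"]:
            if author not in distinct:
                distinct.append(author)
    return [(a, [art["title"] for art in articles if a in art["authors"]])
            for a in distinct]

def group_by(articles):
    """Single pass: each (article, author) witness joins the group of its author."""
    groups = defaultdict(list)
    for art in articles:
        for author in art["authors"]:
            groups[author].append(art["title"])
    return list(groups.items())

per_author_nested = nested_loop(articles)
per_author_grouped = group_by(articles)
```

Both functions return the same author-to-titles pairs here. Article T1 has two authors and therefore lands in two groups, which is one way to see that grouping on trees need not partition the input.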


Query Evaluation

  • Physical Algebra

    • Separation of physical algebra and logical algebra

    • Pattern Tree Reuse

    • Node Materialization

  • Structural Joins in Pattern Tree Matching

  • GroupBy


Physical Algebra

Query Evaluation

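  • Pattern Tree Reuse

    Query: find the secretary of each faculty member.

    [Figure: the pattern trees used by a selection followed by a projection]

    Pattern tree over $1 to $4: $1.tag = department & $2.tag = faculty & $3.tag = RA & $4.tag = name

    Pattern tree over $1 and $2: Isroot($1) & $2.tag = secretary

    Pattern tree over $1 and $2: $1.tag = PID1WID2 & $2.tag = secretary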


  • Node Materialization

    The Timber physical algebra includes a materialization operator, which takes one or more node identifiers as input and returns the corresponding XML tree(s).

    Partial materialization is needed to minimize the size of the intermediate results being manipulated.


Query Evaluation

Structural Joins in Pattern Tree Matching

  • A full database scan performs poorly and cannot find all the matches in a single pass.

  • Locating one node of each pattern match through an index and then scanning part of the database is better, but still expensive.

  • Timber: use all available indices to independently locate candidates for as many nodes of the pattern tree as possible.


Q: Find each faculty member who has a secretary reporting to them.

[Figure: stack-based structural join. The ancestor list (AList) and the descendant list (DList) are merged, and candidate ancestors are pushed onto a stack.]
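
The merge, push, and stack elements of the figure suggest a stack-based structural join. The sketch below is one simplified way to write such a join over (S, E, L) labels, assuming both lists are sorted by start label and come from a properly nested document. It is not taken from the slides, and the labels are made-up values.

```python
def stack_tree_join(alist, dlist):
    """Return (ancestor, descendant) pairs from two start-sorted (S, E, L) lists."""
    out, stack = [], []
    i = 0
    for d in dlist:
        # Push every candidate ancestor that starts before this descendant
        while i < len(alist) and alist[i][0] < d[0]:
            a = alist[i]
            # Entries that ended before a starts cannot contain a or anything later
            while stack and stack[-1][1] < a[0]:
                stack.pop()
            stack.append(a)
            i += 1
        # Entries that ended before d starts cannot contain d
        while stack and stack[-1][1] < d[0]:
            stack.pop()
        # Everything left on the stack is a nested chain of ancestors of d
        out.extend((a, d) for a in stack)
    return out

# Hypothetical labels: three faculty elements and three secretary elements
faculty = [(2, 9, 1), (10, 13, 1), (16, 19, 1)]
secretary = [(5, 6, 2), (14, 15, 1), (17, 18, 2)]

pairs = stack_tree_join(faculty, secretary)
# Keep parent-child pairs only, using the level label
reports_to = [(a, d) for a, d in pairs if a[2] + 1 == d[2]]
```

Each input list is read once, and only node identifiers are compared. The secretary labeled (14, 15, 1) sits outside every faculty interval and produces no pair.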


Query Evaluation

GroupBy

  • RDBMSs implement grouping by relying on sorting (or hashing).

  • Grouping over tree structures does not necessarily partition the set. The Timber system therefore uses a pattern tree to identify the nodes in the grouping list and produce all possible tuples of bindings. Sorting (or hashing) can then be performed on these tuples.


Query Optimization

  • Structural Join Order Selection

    • In relational query processing, it is almost always a good idea to evaluate selections first.

    • Not in XML! A structural join may sometimes be more selective than a selection predicate. Also, structural joins can be computed with node identifiers alone, while a selection predicate may require access to the actual data.

    • The FP-Optimization algorithm finds the best fully pipelined evaluation plan.


  • Result Size Estimation

    • Need an accurate estimate of the cardinality of the final query result, as well as of each intermediate result, for each query plan!

    • Position Histogram

      [Figure: position histogram of faculty and TA nodes, with start position on the X axis and end position on the Y axis]

      Cross product of all nodes: 5 (faculty) * 3 (TA) = 15

      Upper bound on the number of matches: 2*2 + 1*3 = 7
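
The gap between the two numbers can be reproduced with a small sketch. The slide's figure is lost, so the cell layout below is hypothetical and was chosen only so that the totals agree with the arithmetic on the slide. The bound rests on one assumption about the histogram: a descendant must start no earlier and end no later than its ancestor, so only descendant cells at or to the right of, and at or below, the ancestor's cell can contribute.

```python
def cross_product(anc_hist, desc_hist):
    """Estimate that ignores position: every ancestor pairs with every descendant."""
    return sum(anc_hist.values()) * sum(desc_hist.values())

def upper_bound(anc_hist, desc_hist):
    """Upper bound from position histograms keyed by (start_bucket, end_bucket)."""
    total = 0
    for (a_start, a_end), a_count in anc_hist.items():
        # Descendants can only sit in cells that start no earlier and end no later
        reachable = sum(d_count
                        for (d_start, d_end), d_count in desc_hist.items()
                        if d_start >= a_start and d_end <= a_end)
        total += a_count * reachable
    return total

# Hypothetical 3x3 grid: 5 faculty nodes and 3 TA nodes
faculty_hist = {(0, 2): 1, (1, 2): 2, (2, 2): 2}
ta_hist = {(0, 0): 1, (1, 1): 2}

naive = cross_product(faculty_hist, ta_hist)
bound = upper_bound(faculty_hist, ta_hist)
```

In this layout the single faculty node in cell (0, 2) can reach all 3 TA nodes, the two in cell (1, 2) can reach 2 each, and the two in cell (2, 2) can reach none, which gives 1*3 + 2*2 = 7 instead of 15.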


Update Issue

  • How to maintain start and end labels? (floating-point numbers)

  • Changes in the sizes and numbers of elements could cause pages to overflow or underflow. Space management!
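
The slide only hints at floating-point labels, so the following is a guess at the idea rather than TIMBER's scheme: a new leaf receives a start and end value placed in the gap between its neighbors, so existing labels stay untouched. The function name `insert_between` and the choice of splitting the gap into thirds are assumptions.

```python
def insert_between(left, right):
    """Return (start, end) for a new leaf placed strictly between two label values."""
    if not left < right:
        raise ValueError("left must be smaller than right")
    gap = (right - left) / 3.0
    start, end = left + gap, left + 2 * gap
    # Repeated inserts at one spot eventually run out of floating-point precision
    if not (left < start < end < right):
        raise ValueError("precision exhausted; relabeling needed")
    return start, end

# Siblings name (3.0, 4.0) and RA (5.0, 6.0) under faculty (2.0, 7.0):
# place a new child between the end of name and the start of RA
new_start, new_end = insert_between(4.0, 5.0)
```

The new interval lies inside the parent and overlaps neither sibling, so containment tests keep working. The precision check is the catch: a hot insertion point still forces occasional relabeling.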


Discussions

Thank You!

