
On Relational Support for XML Publishing

Beyond Sorting and Tagging

Surajit Chaudhuri

Raghav Kaushik

Jeffrey F. Naughton

Presented by:

Conn Doherty



Outline

  • Motivation & Observations

  • XML

  • Topic of Paper

  • GApply Operator Approach

  • Transformation Rules

  • Experiments and Results

  • Related Work

  • Conclusions

  • Future Problems



Motivation

  • Does the need for efficient XML publishing bring any new requirements for relational query engines, or is sorting query results in the relational engine and tagging them in middleware sufficient?



Observations

  • The mismatch between the XML data model and the relational model requires relational engines to be enhanced for efficiency

  • Need support for relation-valued variables


XML

  • Extensible Markup Language (strictly speaking, a metalanguage or meta-metalanguage)

  • Rapidly emerging as a standard for exchanging business data

  • Substantial interest in publishing existing relational data as XML


Current XML Publishing

  • Most focus has been on issues external to the RDBMS

    • Determining the class of XML views that can be defined

    • Languages used to specify the conversion from relational data to XML

    • Methods of composing XML queries with XML views

  • Data warehousing has caused focus on similar issues internal to RDBMS


Primary Topic of Paper

  • Focus closely on the class of SQL queries that are typically generated by XML publishing applications

  • Ask whether anything needs to change within the relational engine to evaluate these queries efficiently



  • Differences in the XML and relational data models

    • cause awkward and inefficient translations of XML queries to relational SQL queries

  • Main Issue

    • XML’s hierarchical model makes it very convenient and natural to apply operators to subtrees


Part Supplier Example

  • Part and Supplier Data Set

    • supplier(s_suppkey, s_name)

    • partsupp(ps_suppkey, ps_partkey)

    • part(p_partkey, p_name, p_retailprice)


Part Supplier Example

  • Query Q1: For each supplier element, return the names and retail prices of all parts supplied by that supplier, and also the overall average retail price of all parts supplied

[Figure: Example XML Document]

Example Queries

XQuery:

For $s in /doc(tpch.xml)/suppliers/supplier
Return <ret> $s/s_suppkey
  For $p in $s/part
  Return <part>

SQL:

(select ps_suppkey, p_name, p_retailprice, null
 from partsupp, part
 where ps_partkey = p_partkey
 union all
 select ps_suppkey, null, null, avg(p_retailprice)
 from partsupp, part
 where ps_partkey = p_partkey
 group by ps_suppkey)
order by ps_suppkey

  • The query is hard to express in SQL (relational data model), and the resulting SQL is inefficient

    • SQL cannot bind a variable to a set of tuples and execute subqueries on that set
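To make the cost of the sorted outer union concrete, here is a minimal sketch that runs the SQL above on a toy data set using Python's built-in sqlite3 module. The three parts, the two suppliers, and the `avg_price` column alias are made up for illustration; only the table and column names come from the slides. The parentheses around the union are dropped because SQLite does not accept them at the top level.

```python
import sqlite3

# Hypothetical toy data; table and column names follow the slides' schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table partsupp(ps_suppkey integer, ps_partkey integer);
    create table part(p_partkey integer, p_name text, p_retailprice real);
    insert into part values (1, 'bolt', 10.0), (2, 'nut', 20.0), (3, 'gear', 60.0);
    insert into partsupp values (100, 1), (100, 2), (200, 3);
""")

# Sorted outer union: one branch per kind of output row, padded with nulls,
# sorted on the supplier key so a tagger can rebuild each supplier subtree.
SORTED_OUTER_UNION = """
    select ps_suppkey, p_name, p_retailprice, null as avg_price
    from partsupp, part
    where ps_partkey = p_partkey
    union all
    select ps_suppkey, null, null, avg(p_retailprice)
    from partsupp, part
    where ps_partkey = p_partkey
    group by ps_suppkey
    order by ps_suppkey
"""

rows = conn.execute(SORTED_OUTER_UNION).fetchall()
for row in rows:
    print(row)
```

Note that the join of partsupp and part is written out in both branches, which is the redundancy the slides point to.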


3 Angle Approach

  • 1) New operator, GApply

    • Binds variable to sets of tuples

    • Allows subqueries to be executed over the set of tuples (a temporary relation) bound to a variable

  • 2) Propose transformation rules to modify query plan trees with GApply operator

  • 3) Expose GApply operator in SQL syntax


GApply Operator

  • Syntax: GApply(GCols, PGQ)

    • GCols: grouping/partitioning columns

    • PGQ: per-group query

  • Input tuple stream is partitioned on GCols

  • PGQ applied to each group

  • Output is the union of all above results taken over all groups
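The operator semantics listed above can be sketched in a few lines of Python. This is an illustrative model only, not the paper's implementation: rows are modeled as dicts, the per-group query is an ordinary function, and the names `gapply`, `most_expensive`, and the sample data are assumptions introduced here.

```python
from collections import defaultdict
from typing import Callable, Iterable

Row = dict  # assumption: a tuple is modeled as a column-name -> value mapping


def gapply(rows: Iterable[Row], gcols: list[str],
           pgq: Callable[[list[Row]], list[Row]]) -> list[Row]:
    """Partition rows on gcols, run pgq on each group, union all the results."""
    groups: dict[tuple, list[Row]] = defaultdict(list)
    for row in rows:
        groups[tuple(row[c] for c in gcols)].append(row)
    output: list[Row] = []
    for group in groups.values():
        output.extend(pgq(group))  # union (all) over the per-group results
    return output


def most_expensive(group: list[Row]) -> list[Row]:
    """Hypothetical per-group query: the priciest part in the group."""
    top = max(group, key=lambda r: r["p_retailprice"])
    return [{"ps_suppkey": top["ps_suppkey"], "p_name": top["p_name"]}]


# Hypothetical input tuple stream (partsupp joined with part).
joined = [
    {"ps_suppkey": 100, "p_name": "bolt", "p_retailprice": 10.0},
    {"ps_suppkey": 200, "p_name": "gear", "p_retailprice": 60.0},
    {"ps_suppkey": 100, "p_name": "nut", "p_retailprice": 20.0},
]
result = gapply(joined, ["ps_suppkey"], most_expensive)
print(result)
```

The per-group function sees the whole group at once, which is what a plain SQL group by with scalar aggregates cannot offer.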



  • Outer tuple stream: input tuple stream

  • Inner query: per-group query

  • Outer child of GApply: root of outer query

  • Inner child of GApply: root of inner query


PGQ Restrictions

  • The PGQ may operate only on the temporary relation associated with the current group of tuples

  • This style of evaluation is also known as groupwise processing

  • Operators allowed in PGQ: scan, select, project, distinct, apply, exists, union(all), groupby, aggregate, and orderby


Physical Implementation

  • Two Phases:

    • Partitioning Phase

      • Implemented using sorting or hashing

    • Execution Phase

      • Performed in nested loop fashion

      • PGQ is evaluated on each group of tuples

        • Each group is a temporary relation bound to a relation-valued parameter $group
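As a rough illustration of the two phases, the following Python generator uses the sort-based variant of partitioning and then loops over the groups. It is a sketch under simplifying assumptions (rows are dicts, everything fits in memory); the function names and sample rows are hypothetical, and the local variable `group` stands in for the relation-valued parameter $group.

```python
from itertools import groupby
from operator import itemgetter


def gapply_sorted(outer_rows, gcols, pgq):
    """Sort-based GApply: partitioning phase, then nested-loop execution phase."""
    key = itemgetter(*gcols)
    # Partitioning phase: sorting clusters tuples with equal grouping values.
    partitioned = sorted(outer_rows, key=key)
    # Execution phase: the outer loop walks the groups; for each one the
    # per-group query runs over a temporary relation holding that group.
    for _, members in groupby(partitioned, key=key):
        group = list(members)  # plays the role of $group
        yield from pgq(group)


def avg_price(group):
    """Hypothetical per-group query: average retail price of the group."""
    prices = [r["p_retailprice"] for r in group]
    return [{"ps_suppkey": group[0]["ps_suppkey"],
             "avg_price": sum(prices) / len(prices)}]


# Hypothetical outer tuple stream, deliberately unsorted.
outer = [
    {"ps_suppkey": 200, "p_retailprice": 60.0},
    {"ps_suppkey": 100, "p_retailprice": 10.0},
    {"ps_suppkey": 100, "p_retailprice": 20.0},
]
averages = list(gapply_sorted(outer, ["ps_suppkey"], avg_price))
print(averages)
```

A hash-based partitioning phase would replace the `sorted` call with a dictionary keyed on the grouping columns; the execution phase stays the same.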


Implementation Diagram

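[Figure: GApply plan. The outer child (outer query) feeds the partition phase. In the execution phase a nested loop (NL) binds each group to the temporary relation $group and evaluates the inner child (inner query) on it.]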


Expose GApply in Syntax

  • Difficult for the parser and optimizer to determine when GApply applies

  • Tests on Microsoft SQL Server 2000 with GApply operator not exposed in syntax

    • The optimizer identifies the need for GApply only some of the time

    • Whenever GApply is used, it speeds up the query considerably


Proposed Syntax

  • Proposed extension to SQL syntax

  • SQL query performing groupwise processing:

    • Select gapply(PGQ(x)) as <column list>

      from <relation list>

      where <conditions>

      group by <grouping columns> : x

    • x is a relation-valued variable


Example Query in Syntax

  • Query Q1:

    • select gapply(PGQ1(tmpSupp))

      from partsupp, part

      where ps_partkey = p_partkey

      group by ps_suppkey: tmpSupp

    • PGQ1(tmpSupp)

      • select p_name, p_retailprice, null

        from tmpSupp

        union all

        select null, null, avg(p_retailprice)

        from tmpSupp


Transformation Rules

  • Precise semantics of the operators

  • Three categories

    • 1) Pushing Computation into the Outer Query

      • Placing Projections Before GApply

      • Placing Selections Before GApply

      • Converting GApply to groupby

    • 2) Group Selection

    • 3) Pushing GApply Below Joins


Rule 2

  • Group Selection

    • Consider a PGQ that returns either the whole group (subtree) or nothing, based on a predicate

    • Two methods to evaluate

      • Join suppliers and parts, group by suppkey, check the selection predicate on each group, and return the group if it holds

      • Apply the selection first to get the qualifying suppkeys, then join to retrieve those groups

    • The second method wins if the predicate is highly selective


Rule 2 cont.

  • Example

    For $s in /doc(tpch.xml)/suppliers/supplier[/part/p_retailprice > 1000]

    Return $s
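The two evaluation methods for group selection can be compared with a small Python sketch. The data and function names are hypothetical, and both functions scan the whole input here, so the sketch only demonstrates that the plans return the same groups. In a real engine the second plan can pay off because the selection can be evaluated first (for example through an index) and the join then touches only qualifying suppliers.

```python
from collections import defaultdict

# Hypothetical supplier/part pairs (partsupp joined with part).
joined = [
    {"ps_suppkey": 100, "p_name": "bolt", "p_retailprice": 10.0},
    {"ps_suppkey": 200, "p_name": "gear", "p_retailprice": 60.0},
    {"ps_suppkey": 100, "p_name": "turbine", "p_retailprice": 1500.0},
]


def group_then_select(rows, threshold):
    """Method 1: form every group, then keep groups that satisfy the predicate."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["ps_suppkey"]].append(row)
    out = []
    for group in groups.values():
        if any(r["p_retailprice"] > threshold for r in group):
            out.extend(group)  # return the whole group
    return out


def select_then_join(rows, threshold):
    """Method 2: find qualifying suppkeys first, then join back for the groups."""
    keys = {r["ps_suppkey"] for r in rows if r["p_retailprice"] > threshold}
    return [r for r in rows if r["ps_suppkey"] in keys]


def sort_key(row):
    return (row["ps_suppkey"], row["p_name"])


a = sorted(group_then_select(joined, 1000), key=sort_key)
b = sorted(select_then_join(joined, 1000), key=sort_key)
print(a == b, [r["p_name"] for r in a])
```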


Integrating Rules in Optimizer

  • None of the rules above can fire in a loop, so the optimizer terminates

  • Optimizer must estimate the cost of the GApply operation


Preliminary Experiments

  • Performance study

    • Measure how effectively the GApply operator speeds up queries

    • Understand impact of each proposed transformation rule

  • Microsoft SQL Server 2000

    • Supports GApply without exposing it in the syntax

    • The experiments need control over when GApply is invoked

      • So GApply's operation is simulated on the client side


Client Side Simulation of GApply

  • Partition

    • Sorting

    • Hashing (simulation)

  • Execute

    • Store result of outer query in temporary table

    • For each distinct tmp group relation, evaluate PGQ on that relation, then union all results
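A client-side simulation along these lines might look like the following sketch, written with Python's built-in sqlite3 module purely for illustration (the experiments in the slides used Microsoft SQL Server 2000). The toy data, the temporary table name `outer_result`, and the use of a key filter in place of a relation-valued variable are all assumptions made here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table partsupp(ps_suppkey integer, ps_partkey integer);
    create table part(p_partkey integer, p_name text, p_retailprice real);
    insert into part values (1, 'bolt', 10.0), (2, 'nut', 20.0), (3, 'gear', 60.0);
    insert into partsupp values (100, 1), (100, 2), (200, 3);

    -- Store the result of the outer query in a temporary table.
    create temp table outer_result as
        select ps_suppkey, p_name, p_retailprice
        from partsupp, part
        where ps_partkey = p_partkey;
""")

# Per-group query for Q1. The client has no relation-valued variable, so the
# group is simulated by filtering the temporary table on the group key :k.
PGQ1 = """
    select p_name, p_retailprice, null from outer_result where ps_suppkey = :k
    union all
    select null, null, avg(p_retailprice) from outer_result where ps_suppkey = :k
"""

# Partition: enumerate the distinct group keys in sorted order.
keys = [k for (k,) in conn.execute(
    "select distinct ps_suppkey from outer_result order by ps_suppkey")]

# Execute: evaluate the per-group query once per group and union the results.
results = []
for k in keys:
    results.extend(conn.execute(PGQ1, {"k": k}).fetchall())

for row in results:
    print(row)
```

Each group costs one round trip from the client, which is one reason a simulation like this can be slower than an operator running inside the server.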


Estimate Running Time

  • Measure both elapsed time and CPU time

  • Operator trees have GApply as the topmost operator

  • Real elapsed time is expected to be lower in a full server implementation



  • Experimental Setup

    • TPCH benchmark data

    • 5GB database

    • Server

      • 1 GHz processor

      • 784 MB main memory

      • 512 MB buffer pool

    • Each query was run several times and the average taken



  • Effectiveness of GApply

    • Results are comparable whether partitioning uses sorting or hashing

    • Tested 4 queries representing a wide range of queries


GApply Effectiveness Results

  • Main conclusions:

    • GApply is a useful operator even for simple XQuery queries

    • Yields speedups of up to a factor of 2

    • Queries representative of a wide class of queries

    • Q4 took 20% longer with the client side implementation

    • Q1, Q2, and Q3 are expected to improve further with a server-side implementation

(hash-based partitioning)


Results cont.

  • Effectiveness of Optimization Rules

    • Tested the improvement obtained by firing each rule

    • Performance metric is elapsed time

    • Method:

      • Choose relevant parameterized query

      • Vary parameter and find performance benefit for each value

      • Benefit ratio: the ratio of elapsed time without the rule to elapsed time with the rule fired
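The benefit ratio is a simple quotient; a value above 1 means the rule helped and a value below 1 means it hurt. The short sketch below spells this out. The function name and the timings are hypothetical, not measurements from the paper.

```python
def benefit_ratio(elapsed_without_rule: float, elapsed_with_rule: float) -> float:
    """Elapsed time without the rule divided by elapsed time with it fired."""
    if elapsed_with_rule <= 0:
        raise ValueError("elapsed time with the rule must be positive")
    return elapsed_without_rule / elapsed_with_rule


# Hypothetical timings in seconds, for illustration only.
ratio = benefit_ratio(12.0, 4.0)     # rule fired: query ran 3x faster
slowdown = benefit_ratio(4.0, 5.0)   # below 1: firing the rule hurt
print(ratio, slowdown)
```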


Rule Effectiveness Example

  • Query:

    • For $s in /doc(tpch.xml)/suppliers/supplier[/part/p_retailprice > x]

      Return $s

    • x parameter determines the selectivity of selection


Results cont.

  • Effectiveness of Optimization Rules

    • Main conclusions:

      • Proposed rules can have significant impact on elapsed time of a query involving GApply

      • Some rules always lowered the cost of the query, while others sometimes lowered and sometimes increased it

      • Benefit of converting GApply to groupby is comparatively lower


Related Work

  • Xperanto Project

    • Concluded that pushing as much computation as possible into the relational engine is best

  • SilkRoute Project

    • Language to specify the conversion between relational data and XML

  • ROLEX Project

    • To avoid inefficient parsing in applications, the relational engine returns a navigable result tree

  • Difference

    • This paper asks whether the whole process of XML publishing has any impact on the core relational operators (answer: yes)



Conclusions

  • Relational engines must provide support for binding variables to sets of tuples

  • Required support can be enabled through the GApply operator with seamless integration into existing relational engines

  • Operator should be exposed in the syntax

  • Optimization rules are needed


Future Problems

  • How should modified syntax be exploited by algorithms to translate XML queries over XML views of relational data?

  • Any other changes needed to meet the requirements of XML publishing?

  • What changes are needed in the optimizer if the relational database returns navigable results?


Other Papers

  • D. Chatziantoniou and K. A. Ross. Querying multiple features of groups in relational databases. In VLDB, 1996.

    • Extension to SQL syntax with relational algebra implementation

  • D. Chatziantoniou and K. A. Ross. Groupwise processing of relational queries. In VLDB, 1997.

    • Methods to identify group query components

  • C. A. Galindo-Legaria and M. M. Joshi. Orthogonal optimization of subqueries and aggregation. In SIGMOD, 2001.

    • Introduction of segmentApply operator and many transformation rules