On Relational Support for XML Publishing

Beyond Sorting and Tagging

Surajit Chaudhuri

Raghav Kaushik

Jeffrey F. Naughton

Presented by:

Conn Doherty

CS561

Outline
  • Motivation & Observations
  • XML
  • Topic of Paper
  • GApply Operator Approach
  • Transformation Rules
  • Experiments and Results
  • Related Work
  • Conclusions
  • Future Problems

Motivation
  • Does the need for efficient XML publishing bring any new requirements for relational query engines, or is sorting query results in the relational engine and tagging them in middleware sufficient?

Observations
  • The mismatch between the XML and relational data models requires relational engines to be enhanced for efficiency
  • Need support for relation-valued variables

XML
  • Extensible Markup Language (strictly speaking, a metalanguage for defining markup languages)
  • Rapidly emerging as a standard for exchanging business data
  • Substantial interest in publishing existing relational data as XML

Current XML Publishing
  • Most focus has been on issues external to the RDBMS
    • Determining the class of XML views that can be defined
    • Languages used to specify the conversion from relational data to XML
    • Methods of composing XML queries with XML views
  • By contrast, data warehousing led to a focus on analogous issues internal to the RDBMS

Primary Topic of Paper
  • Focus closely on the class of SQL queries that are typically generated by XML publishing applications
  • Ask whether anything needs to change within the relational engine to evaluate these queries efficiently

YES!
  • Differences between the XML and relational data models
    • cause awkward and inefficient translations of XML queries into SQL
  • Main issue
    • XML’s hierarchical model makes it convenient and natural to apply operators to subtrees, which SQL cannot express directly

Part Supplier Example
  • Part and Supplier Data Set
    • supplier(s_suppkey, s_name)
    • partsupp(ps_suppkey, ps_partkey)
    • part(p_partkey, p_name, p_retailprice)

Part Supplier Example
  • Query Q1: For each supplier, return the names and retail prices of all parts supplied by that supplier, along with the overall average retail price of those parts

Example XML Document

    <suppliers>
      <supplier>
        <sname>S1</sname>
        <parts>
          <part>
            <pname>P1</pname>
            <retailprice>10</retailprice>
          </part>
          <part>
            <pname>P2</pname>
            <retailprice>10</retailprice>
          </part>
        </parts>
      </supplier>
      <supplier>
        <sname>S2</sname>
        <parts>
          <part>
            <pname>P21</pname>
            <retailprice>12</retailprice>
          </part>
          <part>
            <pname>P22</pname>
            <retailprice>13</retailprice>
          </part>
        </parts>
      </supplier>
    </suppliers>

Example Queries

XQuery

    for $s in doc("tpch.xml")/suppliers/supplier
    return
      <ret>
        { $s/s_suppkey }
        <parts>
          { for $p in $s/part
            return <part>{ $p/p_name, $p/p_retailprice }</part> }
        </parts>
        { avg($s/part/p_retailprice) }
      </ret>

SQL

    select ps_suppkey, p_name, p_retailprice, null as avg_price
    from partsupp, part
    where ps_partkey = p_partkey
    union all
    select ps_suppkey, null, null, avg(p_retailprice)
    from partsupp, part
    where ps_partkey = p_partkey
    group by ps_suppkey
    order by ps_suppkey

  • Expressing Q1 in SQL (the relational data model) is awkward, and the result is inefficient: the join of partsupp and part is computed twice, once per branch of the union
    • SQL cannot bind a variable to a set of tuples and execute subqueries over that set

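The "sort in the engine, tag in middleware" strategy questioned in the Motivation slide can be sketched as follows: the engine returns the outer union sorted by ps_suppkey, and middleware makes one streaming pass, starting a new element whenever the key changes. This is a minimal Python sketch, not the paper's code; the rows and the element names suppliers, ret, and avg are hypothetical, and it assumes every part row has a non-null p_name.

```python
import xml.etree.ElementTree as ET
from itertools import groupby
from operator import itemgetter

# Hypothetical rows shaped like the outer-union SQL result, already sorted by
# ps_suppkey: (ps_suppkey, p_name, p_retailprice, avg_price).
# Part rows carry avg_price = None; each supplier's aggregate row has p_name = None.
rows = [
    (1, "P1", 10, None),
    (1, "P2", 10, None),
    (1, None, None, 10.0),
    (2, "P21", 12, None),
    (2, "P22", 13, None),
    (2, None, None, 12.5),
]


def tag(sorted_rows):
    """Middleware tagging: turn a sorted outer-union result into nested XML."""
    root = ET.Element("suppliers")
    # Sorting on ps_suppkey guarantees each supplier's rows are adjacent,
    # so a single streaming pass with groupby is enough.
    for suppkey, group in groupby(sorted_rows, key=itemgetter(0)):
        ret = ET.SubElement(root, "ret")
        ET.SubElement(ret, "s_suppkey").text = str(suppkey)
        parts = ET.SubElement(ret, "parts")
        avg_price = None
        for _, p_name, p_retailprice, avg in group:
            if p_name is None:  # aggregate row for this supplier
                avg_price = avg
            else:               # one part row
                part = ET.SubElement(parts, "part")
                ET.SubElement(part, "p_name").text = p_name
                ET.SubElement(part, "p_retailprice").text = str(p_retailprice)
        # <avg> is a hypothetical element name for the average price.
        ET.SubElement(ret, "avg").text = str(avg_price)
    return root


print(ET.tostring(tag(rows), encoding="unicode"))
```

Note that the tagger only works because the engine sorted the rows; it never needs more than one supplier's rows in memory.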
Three-Part Approach
  • 1) A new operator, GApply
    • Binds a variable to a set of tuples
    • Allows subqueries to be executed over the set of tuples (a temporary relation) bound to that variable
  • 2) Transformation rules that rewrite query plan trees containing GApply
  • 3) Exposing the GApply operator in SQL syntax

GApply Operator
  • Syntax: GApply(GCols, PGQ)
    • GCols: grouping/partitioning columns
    • PGQ: per-group query
  • The input tuple stream is partitioned on GCols
  • The PGQ is applied to each group
  • The output is the union of the per-group results over all groups

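The logical semantics of GApply(GCols, PGQ) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's implementation: rows are dicts, the output is exactly what the PGQ returns (no extra grouping columns are added), and the PGQ most_expensive_parts is a hypothetical per-group query chosen because "top row per group" is awkward in plain SQL.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


def gapply(rows: List[Row], gcols: List[str],
           pgq: Callable[[List[Row]], List[Row]]) -> List[Row]:
    """Logical GApply(GCols, PGQ): partition on GCols, run PGQ per group, union all."""
    groups: Dict[Tuple, List[Row]] = defaultdict(list)
    for r in rows:
        groups[tuple(r[c] for c in gcols)].append(r)  # partition the input stream
    out: List[Row] = []
    for group in groups.values():  # groups are never empty
        out.extend(pgq(group))     # PGQ sees only this group's tuples
    return out


def most_expensive_parts(group: List[Row]) -> List[Row]:
    """Hypothetical PGQ: the most expensive part(s) of one supplier."""
    top = max(r["p_retailprice"] for r in group)
    return [{"ps_suppkey": r["ps_suppkey"], "p_name": r["p_name"]}
            for r in group if r["p_retailprice"] == top]


rows = [
    {"ps_suppkey": 1, "p_name": "P1", "p_retailprice": 10},
    {"ps_suppkey": 1, "p_name": "P2", "p_retailprice": 10},
    {"ps_suppkey": 2, "p_name": "P21", "p_retailprice": 12},
    {"ps_suppkey": 2, "p_name": "P22", "p_retailprice": 13},
]
result = gapply(rows, ["ps_suppkey"], most_expensive_parts)
print(result)
```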
Terminology
  • Outer tuple stream: input tuple stream
  • Inner query: per-group query
  • Outer child of GApply: root of outer query
  • Inner child of GApply: root of inner query

PGQ Restrictions
  • The PGQ may operate only on the temporary relation associated with the current group of tuples
  • This style of evaluation is also known as groupwise processing
  • Operators allowed in the PGQ: scan, select, project, distinct, apply, exists, union (all), groupby, aggregate, and orderby

Physical Implementation
  • Two Phases:
    • Partitioning Phase
      • Implemented using sorting or hashing
    • Execution Phase
      • Performed in nested loop fashion
      • PGQ is evaluated on each group of tuples
        • Each group is a temporary relation bound to a relation-valued parameter $group
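
The two phases can be sketched in Python with both partitioning strategies side by side. This is a minimal sketch, assuming rows are dicts and a single grouping column; per_supplier_stats is a hypothetical PGQ, and the Python list named group merely plays the role of the temporary relation $group.

```python
from itertools import groupby
from operator import itemgetter


def per_supplier_stats(group):
    """Hypothetical PGQ: (supplier key, number of parts, average retail price)."""
    prices = [r["p_retailprice"] for r in group]
    return [(group[0]["ps_suppkey"], len(prices), sum(prices) / len(prices))]


def gapply_sort(rows, gcol, pgq):
    """Sort-based partitioning phase, then a nested-loop execution phase."""
    out = []
    ordered = sorted(rows, key=itemgetter(gcol))            # partitioning phase
    for _, grp in groupby(ordered, key=itemgetter(gcol)):   # execution phase
        group = list(grp)  # plays the role of the temporary relation $group
        out.extend(pgq(group))
    return out


def gapply_hash(rows, gcol, pgq):
    """Hash-based partitioning phase, then the same nested-loop execution."""
    buckets = {}
    for r in rows:
        buckets.setdefault(r[gcol], []).append(r)
    out = []
    for group in buckets.values():
        out.extend(pgq(group))
    return out


rows = [
    {"ps_suppkey": 2, "p_retailprice": 12},
    {"ps_suppkey": 1, "p_retailprice": 10},
    {"ps_suppkey": 2, "p_retailprice": 13},
    {"ps_suppkey": 1, "p_retailprice": 10},
]
print(gapply_sort(rows, "ps_suppkey", per_supplier_stats))  # [(1, 2, 10.0), (2, 2, 12.5)]
```

Both strategies produce the same multiset of rows; only the order of groups differs (key order for sorting, first-appearance order for this hash table).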

Implementation Diagram

[Diagram: the outer query (outer child) feeds the partition phase; in the execution phase, a nested loop (NL) evaluates the inner query (inner child) over the temporary relation $group]

Expose GApply in Syntax
  • Without explicit syntax, it is difficult for the parser and optimizer to determine when GApply can be used
  • Tests on Microsoft SQL Server 2000, where the GApply operator is not exposed in the syntax:
    • The optimizer identifies the opportunity to use GApply only in some cases
    • Using GApply in each such case speeds up the query considerably

Proposed Syntax
  • Proposed extension to SQL syntax
  • SQL query performing groupwise processing:

    select gapply(PGQ(x)) as <column list>
    from <relation list>
    where <conditions>
    group by <grouping columns> : x

    • x is a relation-valued variable

Example Query in Syntax
  • Query Q1:

    select gapply(PGQ1(tmpSupp))
    from partsupp, part
    where ps_partkey = p_partkey
    group by ps_suppkey : tmpSupp

  • PGQ1(tmpSupp):

    select p_name, p_retailprice, null
    from tmpSupp
    union all
    select null, null, avg(p_retailprice)
    from tmpSupp

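To see what the GApply form of Q1 buys over the union-all SQL, here is a small Python simulation, assuming hypothetical tables that mirror the example document. The join of partsupp and part is evaluated once and partitioned on ps_suppkey; pgq1 then reads only its own group, whereas the union-all formulation joins the two tables twice.

```python
from collections import defaultdict

# Hypothetical tables mirroring the example document.
partsupp = [(1, 101), (1, 102), (2, 201), (2, 202)]   # (ps_suppkey, ps_partkey)
part = {101: ("P1", 10), 102: ("P2", 10),             # p_partkey -> (p_name, p_retailprice)
        201: ("P21", 12), 202: ("P22", 13)}


def pgq1(tmp_supp):
    """PGQ1 evaluated over one supplier's group (tmpSupp)."""
    # select p_name, p_retailprice, null from tmpSupp
    out = [(name, price, None) for name, price in tmp_supp]
    # union all select null, null, avg(p_retailprice) from tmpSupp
    prices = [price for _, price in tmp_supp]
    out.append((None, None, sum(prices) / len(prices)))  # groups are never empty
    return out


# Outer query: partsupp joined with part on ps_partkey = p_partkey,
# evaluated once and partitioned on ps_suppkey (group by ps_suppkey : tmpSupp).
groups = defaultdict(list)
for suppkey, partkey in partsupp:
    if partkey in part:
        groups[suppkey].append(part[partkey])

# gapply(PGQ1(tmpSupp)): union all of PGQ1 over every group.
q1 = [row for key in groups for row in pgq1(groups[key])]
print(q1)
```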
Transformation Rules
  • The rules are derived from the precise semantics of the operators
  • Three categories
    • 1) Pushing Computation into the Outer Query
      • Placing Projections Before GApply
      • Placing Selections Before GApply
      • Converting GApply to groupby
    • 2) Group Selection
    • 3) Pushing GApply Below Joins
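
One instance of "Converting GApply to groupby" can be checked on toy data. This is an illustration under an assumption, not the paper's formal rule: it assumes the PGQ is a single scalar aggregate over $group (here, select avg(p_retailprice) from $group). Because GApply never forms an empty group, such a PGQ yields exactly one row per group, which is what an ordinary group by computes, and the group by can keep a running (sum, count) per key instead of materializing each group.

```python
from collections import defaultdict


def gapply_avg(rows):
    """GApply(ps_suppkey, select avg(p_retailprice) from $group): materializes each group."""
    groups = defaultdict(list)
    for key, price in rows:
        groups[key].append(price)
    # The PGQ is a scalar aggregate, so it yields exactly one row per group.
    return {key: sum(g) / len(g) for key, g in groups.items()}


def groupby_avg(rows):
    """Equivalent group by: one running (sum, count) per key, no group materialization."""
    acc = {}
    for key, price in rows:
        s, n = acc.get(key, (0, 0))
        acc[key] = (s + price, n + 1)
    return {key: s / n for key, (s, n) in acc.items()}


rows = [(1, 10), (2, 12), (1, 10), (2, 13), (3, 7)]  # hypothetical (ps_suppkey, p_retailprice)
assert gapply_avg(rows) == groupby_avg(rows)
print(groupby_avg(rows))  # {1: 10.0, 2: 12.5, 3: 7.0}
```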

Rule 2
  • Group Selection
    • Consider a PGQ that returns either the whole group (subtree) or nothing, depending on a predicate
    • Two methods to evaluate it:
      • Join suppliers and parts, group by suppkey, evaluate the selection predicate on each group, and return the group if it holds
      • Evaluate the selection predicate first to find the qualifying suppkeys, then return the join restricted to those suppliers
    • The second method wins if the predicate is highly selective

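The two evaluation methods can be compared on toy data with a Python sketch, assuming hypothetical tables and the predicate "some part of the supplier has p_retailprice > threshold". Both return the whole group of every qualifying supplier. Method 2 feeds only the qualifying suppliers' partsupp rows into the join, which is why it wins when few suppliers qualify; this list-based sketch does not model indexes or real join costs.

```python
# Hypothetical tables: partsupp as (ps_suppkey, ps_partkey);
# part as p_partkey -> (p_name, p_retailprice).
partsupp = [(1, 101), (1, 102), (2, 201), (2, 202), (3, 301), (3, 302)]
part = {101: ("P1", 10), 102: ("P2", 10), 201: ("P21", 12),
        202: ("P22", 13), 301: ("P31", 1500), 302: ("P32", 5)}


def join(ps_rows):
    """partsupp join part on ps_partkey = p_partkey -> (suppkey, name, price)."""
    return [(s, *part[k]) for s, k in ps_rows if k in part]


def method1(threshold):
    """Join everything, group by suppkey, keep a group if the predicate holds."""
    groups = {}
    for row in join(partsupp):
        groups.setdefault(row[0], []).append(row)
    return [row for g in groups.values()
            if any(price > threshold for _, _, price in g) for row in g]


def method2(threshold):
    """Find qualifying suppkeys first, then join only those suppliers' rows."""
    hot_parts = {k for k, (_, price) in part.items() if price > threshold}
    keys = {s for s, k in partsupp if k in hot_parts}
    return join([(s, k) for s, k in partsupp if s in keys])


print(method2(1000))  # [(3, 'P31', 1500), (3, 'P32', 5)]
```

Note that supplier 3's cheap part P32 is still returned: the predicate selects the group, not individual tuples.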
Rule 2 cont.
  • Example

    for $s in doc("tpch.xml")/suppliers/supplier[part/p_retailprice > 1000]
    return $s

Integrating Rules in Optimizer
  • None of the rules above can fire in a cycle, so the optimizer is guaranteed to terminate
  • Optimizer must estimate the cost of the GApply operation

Preliminary Experiments
  • Performance study
    • Measure how effectively the GApply operator speeds up queries
    • Understand the impact of each proposed transformation rule
  • Microsoft SQL Server 2000
    • Supports GApply internally, but does not expose it in the syntax
    • The experiments need control over when GApply is invoked
      • So the operation of GApply is simulated on the client side

Client Side Simulation of GApply
  • Partition
    • Sorting
    • Hashing (simulation)
  • Execute
    • Store the result of the outer query in a temporary table
    • For each group, evaluate the PGQ on that group's temporary relation, then union all the results

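The execute steps can be sketched with Python's built-in sqlite3 module. SQLite here is only a stand-in for the SQL Server 2000 setup in the slides, the table outer_result is a hypothetical name, and restricting each PGQ call to one group through a where clause on the grouping key is one plausible way to do it, not necessarily the authors' mechanism.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table partsupp(ps_suppkey integer, ps_partkey integer);
    create table part(p_partkey integer, p_name text, p_retailprice real);
    insert into partsupp values (1, 101), (1, 102), (2, 201), (2, 202);
    insert into part values (101, 'P1', 10), (102, 'P2', 10),
                            (201, 'P21', 12), (202, 'P22', 13);
    -- Execute step 1: store the outer query's result in a temporary table.
    create temp table outer_result as
        select ps_suppkey, p_name, p_retailprice
        from partsupp, part
        where ps_partkey = p_partkey;
""")

# PGQ1, restricted to one group via the named parameter :g.
PGQ1 = """
    select p_name, p_retailprice, null from outer_result where ps_suppkey = :g
    union all
    select null, null, avg(p_retailprice) from outer_result where ps_suppkey = :g
"""

result = []
group_keys = conn.execute(
    "select distinct ps_suppkey from outer_result order by ps_suppkey").fetchall()
for (g,) in group_keys:
    # Execute step 2: evaluate the PGQ on each group, union all the results.
    result.extend(conn.execute(PGQ1, {"g": g}).fetchall())
print(result)
```

Each PGQ call is a separate round trip, which is one reason a client-side simulation is expected to be slower than a server-side GApply.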
Estimate Running Time
  • Measure both elapsed time and CPU time
  • Test queries use operator trees in which GApply is the topmost operator
  • Real elapsed times are expected to be lower with a full server-side implementation

Setup
  • Experimental Setup
    • TPC-H benchmark data
    • 5GB database
    • Server
      • 1 GHz processor
      • 784 MB main memory
      • 512 MB buffer pool
    • Each query was run several times and the average time was taken

Results
  • Effectiveness of GApply
    • Results are comparable whether partitioning uses sorting or hashing
    • Four queries were tested, representing a wide range of queries

GApply Effectiveness Results
  • Main conclusions:
    • GApply is a useful operator even for simple XQuery queries
    • GApply yields speedups of up to 2x
    • The test queries are representative of a wide class of queries
    • Q4 took 20% longer with the client-side implementation
    • Q1, Q2, and Q3 are expected to perform better with a server-side implementation

(hash-based partitioning)

Results cont.
  • Effectiveness of Optimization Rules
    • Tested the improvement obtained by firing each rule
    • Performance metric is elapsed time
    • Method:
      • Choose relevant parameterized query
      • Vary parameter and find performance benefit for each value
      • Benefit ratio: elapsed time without the rule divided by elapsed time with the rule fired

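Written as a formula, with x the query parameter and T the measured elapsed time:

```latex
\text{benefit}(x) = \frac{T_{\text{without rule}}(x)}{T_{\text{with rule}}(x)}
```

A ratio above 1 means firing the rule made the query faster at that parameter value; a ratio below 1 means it made the query slower.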
Rule Effectiveness Example
  • Query:

    for $s in doc("tpch.xml")/suppliers/supplier[part/p_retailprice > x]
    return $s

    • The parameter x determines the selectivity of the selection

Results cont.
  • Effectiveness of Optimization Rules
    • Main conclusions:
      • Proposed rules can have significant impact on elapsed time of a query involving GApply
      • Some rules always lowered the cost of the query, while others sometimes lowered and sometimes increased it
      • Benefit of converting GApply to groupby is comparatively lower

Related Work
  • Xperanto Project
    • Concluded that pushing as much computation as possible into the relational engine is best
  • SilkRoute Project
    • Language to specify the conversion between relational data and XML
  • ROLEX Project
    • To avoid inefficient parsing in applications, the relational engine returns a navigable result tree
  • Difference
    • This paper asks whether the process of XML publishing has any impact on the core relational operators (answer: yes)

Conclusions
  • The relational engine must support binding a variable to a set of tuples
  • Required support can be enabled through the GApply operator with seamless integration into existing relational engines
  • Operator should be exposed in the syntax
  • Optimization rules are needed

Future Problems
  • How should algorithms that translate XML queries over XML views of relational data exploit the modified syntax?
  • Are any other changes needed to meet the requirements of XML publishing?
  • What changes are needed in the optimizer if the relational database returns navigable results?

Other Papers
  • D. Chatziantoniou and K. A. Ross. Querying multiple features of groups in relational databases. In VLDB, 1996.
    • Extension to SQL syntax with relational algebra implementation
  • D. Chatziantoniou and K. A. Ross. Groupwise processing of relational queries. In VLDB, 1997.
    • Methods to identify group query components
  • C. A. Galindo-Legaria and M. M. Joshi. Orthogonal optimization of subqueries and aggregation. In SIGMOD, 2001.
    • Introduction of segmentApply operator and many transformation rules
