R-SOX: Runtime Semantic Query Optimization over XML Streams. Song Wang, Hong Su, Ming Li, Mingzhu Wei, Shoushen Yang, Drew Ditto, Elke A. Rundensteiner and Murali Mani. Database Systems Research Group, Department of Computer Science, Worcester Polytechnic Institute.


R-SOX:Runtime Semantic Query Optimization over XML Streams

Song Wang, Hong Su, Ming Li, Mingzhu Wei, Shoushen Yang

Drew Ditto, Elke A. Rundensteiner and Murali Mani

Database Systems Research Group

Department of Computer Science

Worcester Polytechnic Institute

Worcester, Massachusetts, USA

VLDB 2006, Seoul, Korea


Background: XML Stream Applications

  • Wide-range and growing applications

    • Examples: news publishing and on-line auction systems

  • Characteristics

    • Real-time processing: short response time

    • Limited resources: minimize memory

News Publishing

On-line Auction



Background: Optimization Using Constraints

  • Constraint Properties

    • Document Type Definition (DTD) or XML Schema

    • Constraints are statically available beforehand

  • General XML Semantic Query Optimization (SQO)

    • Tree minimization

    • Recursion optimization

  • Stream-specific XML SQO

    • Context-aware shortcutting

    • Token-granularity data output



R-SOX: Motivation and Goal

  • Motivation

    • Scenarios where static schema cannot be applied

    • Challenges when the schema arrives dynamically:

      - how to represent and manage the runtime schema

      - how to exploit the dynamic schema for runtime optimization

      - how to propagate the runtime schema downstream

  • Goals

    • Runtime schema encoding and synchronization

    • Semantic query optimization techniques

    • Runtime schema propagation



R-SOX: Architecture and Workflow

[Figure: R-SOX system architecture. Diagram labels: Input Stream, Stream Annotator, Annotated Output Stream, Result Stream, Extended Raindrop XQuery Engine, Plan Refinement, RSI, Result Schema, Query Plan, Generator, Schema Inf. Manager, Query Plan Adaptor, Schema knowledge, Query Plan, XQuery, R-SOX System.]

Workflow steps:

  • Basic XQuery Evaluation

  • Runtime Schema Refinement

  • Runtime Semantic Query Optimization

  • Downstream Schema Propagation

Legend: Raindrop Engine, Demo Focus, R-SOX Contributions, Future Work



Basic XQuery Evaluation

XQuery Q1-1:

FOR $o in document("news.xml")/stream/news

RETURN <result> $o/source, $o/comments </result>

  • Raindrop XQuery Engine

    • Construction of Raindrop plan

    • Automaton-based query evaluation

SJoin on $x

ExtractNest $b

ExtractNest $c

Nav $x//source -> $b

Nav $x//comments -> $c

Nav stream//news -> $x

Input Token Stream:

<stream>

<news>

<source>

<content>CNN…</content> <rank>…</rank>…

</source>

<comments>

<content>President…</content>…

</comments>

……

</news>

……

Raindrop XQuery Plan

[Figure: Query Automata over the stream data. States s0 to s6; transition labels: stream, news, source, content, comments.]
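To make the automaton-based evaluation above concrete, here is a minimal sketch in Python. It is not the Raindrop implementation: the transition topology (s0 to s6), the token format, and the names `TRANSITIONS`, `ACCEPTING` and `run` are assumptions made for illustration, loosely following the state labels in the figure.

```python
# Hypothetical sketch of automaton-based evaluation over a token stream.
# The topology below is an assumed reading of the figure's labels.
TRANSITIONS = {
    ("s0", "stream"): "s1",
    ("s1", "news"): "s2",
    ("s2", "source"): "s3",
    ("s3", "content"): "s4",
    ("s2", "comments"): "s5",
    ("s5", "content"): "s6",
}
# Assumed accepting states and the pattern each one stands for.
ACCEPTING = {"s4": "source/content", "s6": "comments/content"}


def run(tokens):
    """Process ("start"|"text"|"end", value) tokens and collect matched text."""
    stack = ["s0"]  # one automaton state per open element; None = no match
    out = []
    for kind, value in tokens:
        if kind == "start":
            cur = stack[-1]
            # Follow the transition if one exists; otherwise park in None.
            stack.append(TRANSITIONS.get((cur, value)) if cur else None)
        elif kind == "end":
            stack.pop()  # return to the state of the parent element
        elif kind == "text" and stack[-1] in ACCEPTING:
            out.append((ACCEPTING[stack[-1]], value))
    return out


tokens = [
    ("start", "stream"), ("start", "news"),
    ("start", "source"), ("start", "content"), ("text", "CNN"),
    ("end", "content"), ("end", "source"),
    ("start", "comments"), ("start", "content"), ("text", "President"),
    ("end", "content"), ("end", "comments"),
    ("end", "news"), ("end", "stream"),
]
matches = run(tokens)
```

The stack keeps one state per open element, so an end tag simply restores the parent's state. Unmatched subtrees are tracked as `None` and cost only a push and a pop.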



Runtime Schema Refinement

Example of RSI:

News ((source | comment)+, date+)

RSI 1: ((news, inf, TIME), (/news/comment, , ), -)

News (source+, date+)

RSI 2: ((/news, 200, COUNT), (/news/comment, /news/source, *), +)

News (source*, comment+, date+)

  • Runtime Schema Information (RSI)

    • Representing RSI: RSI Grammar

    • Encoding RSI:

      - embedded into input XML token stream

      - extracted using DFA stream loader

  • Managing Schema Information

    • Schema Graph: directed ordered graph

    • Schema graph synchronization with the newly received RSIs

    • History-aware RSI rollback
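The schema management ideas above (an ordered schema graph, synchronization with incoming RSIs, and history-aware rollback) can be sketched as follows. This is an illustrative simplification, not the R-SOX design: the class `SchemaManager`, its methods, and the reading of an RSI as "remove a child" or "add a child after a sibling" are all assumptions, and occurrence indicators and RSI lifetimes are left out.

```python
import copy


class SchemaManager:
    """Hypothetical ordered schema graph with snapshot-based rollback."""

    def __init__(self, children):
        # Maps each parent element to its ordered list of child elements.
        self.children = {p: list(cs) for p, cs in children.items()}
        self.history = []  # snapshots taken before each RSI is applied

    def _snapshot(self):
        self.history.append(copy.deepcopy(self.children))

    def remove_child(self, parent, child):
        """Assumed effect of a '-' RSI: the child no longer occurs."""
        self._snapshot()
        self.children[parent].remove(child)

    def add_child(self, parent, child, after=None):
        """Assumed effect of a '+' RSI: insert the child after a sibling."""
        self._snapshot()
        siblings = self.children.setdefault(parent, [])
        index = siblings.index(after) + 1 if after in siblings else len(siblings)
        siblings.insert(index, child)

    def rollback(self):
        """Undo the most recently applied RSI, if any."""
        if self.history:
            self.children = self.history.pop()


mgr = SchemaManager({"news": ["source", "comment", "date"]})
mgr.remove_child("news", "comment")               # in the spirit of RSI 1
after_rsi1 = list(mgr.children["news"])
mgr.add_child("news", "comment", after="source")  # in the spirit of RSI 2
after_rsi2 = list(mgr.children["news"])
mgr.rollback()                                    # RSI 2 expires
after_rollback = list(mgr.children["news"])
```

Full snapshots keep the sketch short. A real system would more likely log inverse operations so that rollback cost does not grow with schema size.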



Runtime SQO: Overview

Supported SQO Techniques:

  (1) Tree Minimization

  (2) Recursion Optimization

  (3) Fast Data Output

  (4) Navigation Shortcutting

  • Runtime Plan Adaptor

    • Incremental plan migration

    • Rule library

    • Rule applier

  • Query Execution

    • Modifying automata computations

    • Switching execution modes

    • Performing event-condition actions



Runtime SQO: Tree Minimization

XQuery Q1:

FOR $o in document("news.xml")/stream/news

RETURN <result> $o/source, $o/comments </result>

  • Benefits

    • Expedite document traversal on pattern retrieval by avoiding unnecessary navigation

    • Change query plan at run-time by adjusting automata

  • Query Execution

    • Temporarily removing and adding automaton states

RSIs:

P1: ((stream,inf,Count), (/news, source , ), -)

P2: ((stream,inf,Count), (/news, comments ,), -)

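[Figure: Schema Graph Refinement. Schema graph with stream above news, and source, comments and date below news; edges carry (1, ∞) occurrence labels. The source edge is cut by P1 and the comments edge is cut by P2.]

[Figure: Query Automata Refinement. States s0 to s6; transition labels: stream, news, source, content, comments. P1 disables the source transition and P2 disables the comments transition.]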
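As a rough illustration of "temporarily removing and adding automaton states", the sketch below keeps the transition table intact and toggles a set of disabled transitions. The class `QueryAutomaton`, its methods, and the state names are hypothetical and only mirror the labels in the figure.

```python
class QueryAutomaton:
    """Hypothetical automaton whose transitions can be switched off and on."""

    def __init__(self, transitions):
        self.transitions = dict(transitions)
        self.disabled = set()  # (state, label) pairs currently cut by an RSI

    def disable(self, state, label):
        self.disabled.add((state, label))

    def enable(self, state, label):
        self.disabled.discard((state, label))

    def step(self, state, label):
        """Return the next state, or None if there is no usable transition."""
        key = (state, label)
        if state is None or key in self.disabled:
            return None
        return self.transitions.get(key)


automaton = QueryAutomaton({
    ("s0", "stream"): "s1",
    ("s1", "news"): "s2",
    ("s2", "source"): "s3",
    ("s2", "comments"): "s5",
})

before = automaton.step("s2", "source")
automaton.disable("s2", "source")    # an RSI like P1 says no source will arrive
during = automaton.step("s2", "source")
automaton.enable("s2", "source")     # the RSI is revoked or expires
after = automaton.step("s2", "source")
```

Keeping the original transitions and masking them makes re-enabling cheap, which matters if RSIs are expected to expire or be rolled back.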



Runtime SQO: Recursion Optimization

Recursion-aware operators are switched to non-recursive operators when the input XML data is not recursive.

RSIs:

P1: ((news,inf,Count), (/news, news, ), - )

P2: ((news,inf,Count), (/news, news, ), +)

  • Benefits

    • Improve performance by avoiding the unnecessary overhead of recursion handling

  • Optimization Processing

    • Detect recursion by analyzing the runtime schema knowledge

    • Switch between recursion-aware and non-recursive operators

    • Characterize safe moments for runtime migration

[Figure: Operator Switching in the Query Plan. Recursive plan over the stream data: RecurSJoin on $x; RecurExtractNest $b; RecurExtractNest $c; RecurNav $x//source -> $b; RecurNav $x//comments -> $c; RecurNav stream//news -> $x. P1 and P2 trigger switching between the recursive operators and the non-recursive operators.]

XQuery Q2: (slightly different from Q1)

FOR $o in document("news.xml")/stream//news

RETURN <result> $o/source, $o/comments </result>
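Detecting recursion from runtime schema knowledge amounts to checking whether an element type can contain itself, directly or transitively. A minimal sketch, assuming the schema graph is a plain dictionary from element name to child names; the function names and the returned operator labels are illustrative choices, not the system's API.

```python
def is_recursive(schema, root):
    """Return True if a cycle is reachable from `root` in the schema graph."""
    visiting, done = set(), set()

    def visit(node):
        if node in visiting:   # back edge: the element can contain itself
            return True
        if node in done:       # already explored without finding a cycle
            return False
        visiting.add(node)
        found = any(visit(child) for child in schema.get(node, ()))
        visiting.discard(node)
        done.add(node)
        return found

    return visit(root)


def choose_operator(schema, root="stream"):
    """Pick the recursion-aware operator only when the schema needs it."""
    return "RecurNav" if is_recursive(schema, root) else "Nav"


schema = {"stream": ["news"], "news": ["source", "comments", "date"]}
plain = choose_operator(schema)

schema["news"].append("news")    # an RSI like P2 adds news under news
recursive = choose_operator(schema)

schema["news"].remove("news")    # an RSI like P1 removes it again
plain_again = choose_operator(schema)
```

The check only decides which operator is appropriate. Finding a safe moment to actually swap operators at runtime, as the slide notes, is a separate problem this sketch does not address.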



Runtime SQO: Fast Data Output

[Figure: automaton with states S1 to S4; transition labels: source, comments, date.]

  • Benefits

    • Minimize memory consumption by avoiding unnecessary data storage and releasing buffered data at the earliest moment

  • Optimization Processing

    • Augment query automata with Glushkov automata

    • Encode event-condition actions

Glushkov Automata for Type "News" (figure; initial state marked "start")

  • Case 1: Overall schema knowledge is news((source | comments | date)+)

    • No order constraints can be used.

    • comments/content must be stored.

  • Case 2: Overall schema knowledge is news(source+, comments+, date+)

    • Global order constraint: Order(source, comments)

    • No storage is needed.

  • Case 3: Overall schema knowledge is news((source | comments)+, date+, comments+)

    • Local order constraint: LocalOrder(source, comments)

    • Same as Case 1 at the beginning. The Glushkov automaton on the type "news" indicates when the source elements are complete; after that, storing comments/content is not needed.

XQuery Q1:

FOR $o in document("news.xml")/stream/news

RETURN <result> $o/source, $o/comments </result>

[Figure: Actions Encoded into the Automata. States s1 to s7; transition labels: stream, news, source, content, comments.]
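The difference between Case 1 and Case 2 can be illustrated with a small sketch. Q1 returns the source elements before the comments, so without an order guarantee the comments have to be buffered until the news element ends; with Order(source, comments) they can be written out on arrival. The function `evaluate`, its flat (name, text) input, and the buffer accounting are simplifying assumptions, not the engine's actual data path.

```python
def evaluate(children, ordered):
    """Evaluate Q1 over one news element.

    children: (name, text) pairs in arrival order.
    ordered:  True when the schema guarantees source before comments.
    Returns (output, peak_buffer_size).
    """
    output, buffer, peak = [], [], 0
    for name, text in children:
        if name == "source":
            output.append(text)          # source always comes first in the result
        elif name == "comments":
            if ordered:
                output.append(text)      # no later source can arrive: emit now
            else:
                buffer.append(text)      # a later source may still arrive
                peak = max(peak, len(buffer))
    output.extend(buffer)                # flush at the end of the news element
    return output, peak


news = [("source", "CNN"), ("comments", "c1"), ("comments", "c2"), ("date", "d")]
case1 = evaluate(news, ordered=False)
case2 = evaluate(news, ordered=True)
```

Both modes produce the same result; only the peak buffer size differs, which is the memory saving the slide refers to. Case 3 would start in the unordered mode and switch once the automaton signals that no more source elements can occur.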



Runtime SQO: Navigation Shortcut (I)

  • Benefit

    • Expedite document-order traversal on pattern retrieval by early filtering of failed patterns

  • Optimization Rules

    • Order, occurrence and exclusive rules

    • Completeness and minimal-cost optimization are guaranteed

  • Query Execution

    • Introduce new pattern look-up into query automata

    • Encode event-condition actions



Runtime SQO: Navigation Shortcut (II)

XQuery Q3:

FOR $a in stream(bids)/auction, $b in $a/seller[homepage], $c in $a/bidder[sameAddr]

WHERE $b/*/phone = "508"

RETURN <auction> $b, $c </auction>

Actions Encoded into the Automata

Utilizing Occurrence Constraints

Overall schema knowledge: Occurrence(phone, 2)

When </phone> has been encountered twice, check /*/phone: if the predicate fails, suspend states s2 and s3.

Utilizing Order Constraints

Overall schema knowledge: Order(primary, homepage)

When <primary> is first encountered, check /homepage: if it is not present, suspend states s10, s3 and s2.
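The occurrence-based shortcut can be sketched as an early exit: once the known number of phone elements has been seen and none satisfied the predicate, the rest of the seller can be skipped. The function `seller_matches`, its flat (name, text) input, and the token counter are assumptions for illustration; the real system suspends automaton states instead of breaking out of a loop.

```python
def seller_matches(children, max_phones=2, wanted="508"):
    """Check the phone predicate with an occurrence-based shortcut.

    children: (name, text) pairs of one seller, in document order.
    Returns (matched, tokens_inspected).
    """
    phones_seen = 0
    matched = False
    inspected = 0
    for name, text in children:
        inspected += 1
        if name == "phone":
            phones_seen += 1
            matched = matched or text == wanted
            # Occurrence(phone, 2): no further phone can arrive, so a failed
            # predicate is final and the remaining tokens need no processing.
            if phones_seen == max_phones and not matched:
                break
    return matched, inspected


failing = [("phone", "617"), ("phone", "781"), ("name", "x"), ("homepage", "h")]
passing = [("phone", "508"), ("phone", "781"), ("name", "x")]
fail_result = seller_matches(failing)
pass_result = seller_matches(passing)
```

For the failing seller only the first two tokens are inspected; the passing seller is read to the end because its content is still needed for the result.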



R-SOX System Demonstration

Algebraic Query Plan Generation

Runtime Schema Refinement

  • Application Scenarios:

    • On-line auction data

    • News publishing data

Runtime SQO



Raindrop Project

http://davis.wpi.edu/dsrg/raindrop

Recent Publications

  • S. Wang et al. R-SOX: Runtime Semantic Query Optimization over XML Streams. VLDB 2006.

  • H. Su et al. Automata Meets Algebra. DKE Journal 2006.

  • M. Wei et al. Processing Recursive XQuery over XML Streams: the Raindrop Approach. XSDM 2006.

  • H. Su et al. Semantic Query Optimization in an Automata-Algebra Combined XQuery Engine. VLDB 2004.

  • H. Su et al. Semantic Query Optimization for XQuery over XML Streams. VLDB 2005.

Source Code Release

  • Raindrop 1.0 is released: http://davis.wpi.edu/dsrg/raindrop/release

Acknowledgement

  • NSF, for support under grants IIS 0414567 and CNS 0551584

