430 likes | 575 Views
XQuery to SQL by XML Algebra Tree. Brad Pielech, Brian Murphy Thanks: Xin. Outline. Overview of Rainbow System Process of translating XQuery -> SQL XML Operators Partial translation walkthrough with running example. Rainbow System. Complete XML <-> SQL system
E N D
XQuery to SQL by XML Algebra Tree Brad Pielech, Brian Murphy Thanks: Xin
Outline • Overview of Rainbow System • Process of translating XQuery -> SQL • XML Operators • Partial translation walkthrough with running example
Rainbow System • Complete XML <-> SQL system • Uses some ideas from XPERANTO, Niagara, and other systems • Several main subsystems: • Document Shredder • View Generator • Query Translation, Query Rewrite • Result Generation • Work in progress
Steps in Translation • User inputs XQuery query • User Query is converted into an XML Algebra Tree (XAT) • Database Mapping Query’s XAT generated • Queries are Decorrelated • Trees are merged, unnecessary branches cut
Steps Continued • Computation Pushdown (presentation concludes here) • SQL Generation • Query Execution • Tagging of Results
What is the difference between the two queries? • The user query is executed over a view of the XML document and specifies what to return and how to return it • The mapping query specifies how the view the user is querying “maps” to the database • Therefore, combining the two queries into one is necessary in order to correctly process the user’s request
XAT Operators • Each XAT is comprised of XAT Operators. • Similar in concepts to Relational Algebra • Operator set is combination between Niagara and Xperanto papers
Set of Operators • SQL like (9): • Project, Select, Join (Theta, Outer, Semi), Groupby, Orderby, Union (Node, Outer), Cartesian Product. • XML like (4): • Tagger, Navigate, is(Element, Text), Aggregate. • Special: • SQL, Function, Source, NameColumn, FOR
Sports XML Document <sports> <organization> <team> Boston Red Sox </team> <stadium sname = "Fenway Park"/> <starPlayer> <pname> Nomar </pname> <position> Shortstop </position> </starPlayer> <starPlayer> <pname> Pedro </pname> <position> Pitcher </position> </starPlayer> <starPlayer> <pname> Manny </pname> <position> Outfield </position> </starPlayer> </organization> <organization> … </organization> <stadium> <sname> Fenway Park </sname> <capacity> 33,000 </capacity> <yearBuilt> 1912 </yearBuilt> <ticket_high rate = "55"/> <ticket_low rate = "18"/> </stadium> <stadium> … <stadium> <player name="Pedro" number="45" rookieYear = "1991" /> <player name="Nomar" number="5" rookieYear = "1997" /> <player name="Manny" number="24" rookieYear = "1993" /> </sports>
Example XQuery <bestPlayers> { For $p in document("sports.xml")/sports/organization Let $a = $p/team/text() Where $a = "Boston Red Sox" Return <playerName> $p/starPlayer/pname/text() <playerName> } </bestPlayers> List all of the star players’ names on the Boston Red Sox
Tagger(<bestPlayers> V1 </bestPlayers> $pname = Navigate($p, starPlayer/pname/text()) V1 := Aggregate Tagger(<playerName> $pname </playerName> $a := Navigate($p, team/text()) $p := Navigate(“/”, sports/organization) Select($a = "Boston Red Sox") Source(“sports.xml”) XAT Tree for Example Query
RDBMS Tables of Sports Info PlayerInfo Organization Stadium StarPlayer
Partial Default XML View <Organization> <row> <organizationId> 1 </organizationId> <teamName> Boston Red Sox </teamName> <stadiumName> Fenway Park </stadiumName> </row> </Organization> <Stadium> <row> <stadiumID> 1 </stadiumID> <sname> Fenway Park </sname> … </row> </Stadium>
Challenge Question I What is the XQuery that converts the document on the left (default XML view) to the document on the right (user view)? <Organization> <row> <organizationId> 1 </organizationId> <teamName> Boston Red Sox </teamName> <stadiumName> Fenway Park </stadiumName> </row> </Organization> … <StarPlayer> <row> <starPlayerName> Nomar </starPlayerName> <starPlayerPosition> shortstop</starPlayerPosition> <organizationId> 1 </organizationId> </row> </StarPlayer> … <organization> <team> Boston Red Sox </team> <stadium sname = "Fenway Park"/> <starPlayer> <pname> Nomar </pname> <position> Shortstop </position> </starPlayer> <starPlayer> <pname> Pedro </pname> <position> Pitcher </position> </starPlayer> <starPlayer> <pname> Manny </pname> <position> Outfield </position> </starPlayer> </organization>
Mapping Query Part I B2 Create view invoice as ( <sports> FOR $organization IN view ("default") /Organization/row RETURN <organization> <team> $organization/teamName/text() </team> <stadium sname = $organization/stadiumName/text() /> FOR$starPlayer IN view ("default") /StarPlayer/row WHERE$starPlayer/organizationID = $organization/organizationID RETURN <starPlayer> <pname> $starPlayer/starPlayerName/text() </pname> <position> $starPlayer/starPlayerPosition/text() </position> </starPlayer> </organization> B1
Mapping Query Part II FOR$stadium IN view ("default") /Stadium/row RETURN <stadium> <sname> $stadium/sname/text() </sname> <capacity> $stadium/capacity/text() </capacity> <yearBuilt> $stadium/yearBuilt/text() </yearBuilt> <ticket_high rate = $stadium/ticket_high_rate/text() /> <ticket_low rate = $stadium/ticket_low_rate/text() /> </stadium> FOR$player IN view ("default") /PlayerInfo/row RETURN <player name = $player/playerName/text() number = $player/playerNumber/text() rookieYear = $player/rookieYear/text() /> </sports> ) B3 B4
Cutting Mapping Query • The mapping query has data that is unused by the user query, so we can get rid of it • B3 and B4 are completely removed • Remove stadium from B1 • Remove position from B2
Mapping Query XAT General Form B1 FOR $organization Some Stuff $organization := Navigate("/",Organization/row) FOR $starPlayer $starPlayer := Navigate("/", StarPlayer/row) Source(“default.xml”) More Stuff Source(“default.xml”) B2 • Some Stuff will be shown in Part I • More Stuff in Part II
O := Tagger(<sports> All </sports) All = Aggregate Tagger(<organization> V0 </organization) V0 := Aggregate Tagger (<team>$tname </team> ) Mapping Query XAT Part I B1 FOR $organization $tname := Navigate($organization, teamName/text()) FOR $starPlayer $starPlayer := Navigate("/", StarPlayer/row) To: Part II Source("default.xml") Some Stuff
Select($starPlayerID = $ID) $sname := Navigate($starPlayer, starPlayerName) $starPlayerID := Navigate($starPlayer, OrganizationID) $ID := Navigate($organization, organizationID) Mapping Query XAT Part II More Stuff To: Part I Aggregate Tagger(<starPlayer> <pname> $sname </pname> </starPlayer) B2
Decorrelated Mapping XAT Part I <sports> <organization> <team> Boston Red Sox </team> <starPlayer> <pname> Nomar </pname> </starPlayer> <starPlayer> <pname>Pedro </pname> </starPlayer> <starPlayer> <pname> Manny </pname> </starPlayer> </organization> </sports> O:= Tagger(<sports> All </sports>) All = Aggregate Tagger(<organization> V0 </organization) V0 := Aggregate Tagger (<team>$tname </team> $tname := Navigate($organization, teamName/text()) From Part II
$ID := Navigate($organization, organizationID) Select($starPlayerID = $ID) Cartesian Product $sname := Navigate($starPlayer, starPlayerName) $starPlayerID := Navigate($starPlayer, organizationID) Decorrelated Mapping XAT Part II To Part I Aggregate Tagger(<starPlayer> <pname> $sname </pname> </starPlayer) $starPlayer := (Navigate"/", StarPlayer/row) $organization = Navigate("/", Organization/row) Source("default.xml") Source("default.xml")
Progress Report • User inputs XQuery query • User Query is converted into an XML Algebra Tree (XAT) • Database Mapping Query’s XAT generated • Queries are Decorrelated • Trees are merged, unnecessary branches cut
XAT merging • Input: • User Query XAT + Mapping Query XAT • Output: • Simplified composite XAT • Approach: • The Tagger from the top of the Mapping Query is linked to the bottom of the User Query. • The Source Operator at the bottom of the User Query is deleted • Pushdown Navigation • By using the commutative rules • Cancel out the navigation operators • By using the composition rules
O:= Tagger(<sports> All </sports>) All = Aggregate Tagger(<organization> V0 </organization) V0 := Aggregate Tagger (<team>$tname </team> $tname := Navigate($organization, teamName/text()) Combined XAT Top of Mapping Query Tagger(<bestPlayers> V1 </bestPlayers> V1 := Aggregate Tagger(<playerName> $pname </playerName> Select($a = "Boston Red Sox") $a := Navigate($p, team/text()) User Query $pname = Navigate($p, starPlayer/pname/text()) $p := Navigate(O, sports/organization) Rest of Mapping Query
Computation Pushdown Part I What is PushDown? • After merging the 2 XATs, there may be redundancies in the larger tree. • Ex: The user query and mapping query may navigate to the same thing • The decorrelated query tree may be unorganized and inefficient • Pushdown aims to eliminate these problems
Computation Pushdown Part II • XPERANTO mentions pushdown as a means of pushing computation to relational engine • Niagara defines equivalence rules and specifies several different heuristics for using the rules
XAT Pushdown Example Part I Top of Mapping Query Tagger(<bestPlayers> V1 </bestPlayers> V1 := Aggregate O:= Tagger(<sports> All </sports>) Tagger(<playerName> $pname </playerName> All = Aggregate Tagger(<organization> V0 </organization) Select($a = "Boston Red Sox") V0 := Aggregate $a := Navigate($p, team/text()) Tagger (<team>$tname </team> User Query $pname = Navigate($p, starPlayer/pname/text()) $tname := Navigate($organization, teamName/text()) $p := Navigate(O, sports/organization) Rest of Mapping Query
XAT Pushdown Example Part II Top of Mapping Query Tagger(<bestPlayers> V1 </bestPlayers> V1 := Aggregate O:= Tagger(<sports> All </sports>) Tagger(<playerName> $pname </playerName> All = Aggregate Tagger(<organization> V0 </organization) Select($a = "Boston Red Sox") V0 := Aggregate $a := Navigate($p, team/text()) Tagger (<team>$tname </team> User Query $pname = Navigate($p, starPlayer/pname/text()) $tname := Navigate($organization, teamName/text()) $p := Navigate(O, sports/organization) Rest of Mapping Query
XAT Pushdown Example Part III Tagger(<starPlayer> <pname> $sname </pname> </starPlayer) Tagger(<bestPlayers> V1 </bestPlayers> Select($starPlayerID = $ID) V1 := Aggregate $sname := Navigate($starPlayer, starPlayerName) Tagger(<playerName> $pname </playerName> $starPlayerID := Navigate($starPlayer, organizationID) Select($a = "Boston Red Sox") $ID := Navigate($organization, organizationID) Cartesian Product $a := Navigate($p, team/text()) $pname = Navigate($p, starPlayer/pname/text()) $starPlayer := (Navigate"/", StarPlayer/row) $organization = Navigate("/", Organization/row) User Query $p := Navigate(O, sports/organization) Source("default.xml") Source("default.xml")
XAT Pushdown Example Part IV Tagger(<starPlayer> <pname> $sname </pname> </starPlayer) Tagger(<bestPlayers> V1 </bestPlayers> Select($starPlayerID = $ID) V1 := Aggregate $sname := Navigate($starPlayer, starPlayerName) Tagger(<playerName> $pname </playerName> $starPlayerID := Navigate($starPlayer, organizationID) Select($a = "Boston Red Sox") $ID := Navigate($organization, organizationID) Cartesian Product $a := Navigate($p, team/text()) $pname = Navigate($p, starPlayer/pname/text()) $starPlayer := (Navigate"/", StarPlayer/row) $organization = Navigate("/", Organization/row) User Query $p := Navigate(O, sports/organization) Source("default.xml") Source("default.xml")
Challenge Questions II & III • What are some of the heuristics we could use during Pushdown? • What can / should we try to accomplish? • What should the tree look like afterwards? • How could we go about pushing things down? • What would the algorithm be? • How do we know if an operator can be pushed down? • When do we stop pushing an operator down?
Computation Pushdown Part III • Goal: Tagger + SQL operators + XML operators • Use Equivalence rules repository to swap operators • Step 1: Navigation Pushdown. • Cancel Mapping Query Taggers and corresponding Aggregates • Delete redundant Navigates from User Query • Rename columns in Mapping Query • Step 2: SQL Computation Pushdown. • By commutative and composition rules.
Equivalence Rules • Pair-wise rules that determine if one operator (parent) may be pushed through another (child) • Navigate / Navigate rule: If the parent depends on the child, they may not be swapped • Navigate / Join: Navigate is pushed to the side of the join that its entry point comes from • And many, many more
Pushdown Results • Push Navigates to the correct side of Cartesian Product • Create a NameColumn operator that renames $tname into $a • Create a 2nd NameColumn operator that renames $pname into $sname • Get rid of all Taggers and Aggregates from Mapping Query and Navigates that were crossed out from User Query • Merge Select($starPlayerID = $ID) and Cartesian into a Join
Tagger(<bestPlayers> V1 </bestPlayers> V1 := Aggregate Tagger(<playerName> $pname </playerName> Select($a = "Boston Red Sox") NameColumn( $a = $tname) NameColumn( $pname = $sname) XAT After Computation PushDown Part I From Part II
$tname := Navigate($organization, teamName/text()) $starPlayerID := Navigate($starPlayer, OrganizationID) $ID := Navigate($organization, organizationID) $sname := Navigate($starPlayer, starPlayerName) $organization := Navigate("/",Organization/row) $starPlayer := Navigate("/", StarPlayer/row) Source("default.xml") Source("default.xml") XAT After Computation PushDown Part II To Part I Join on ($ID = $starPlayerID)
Rest of the Process • Take the Combined XAT from the previous slide and generate a single SQL query. • Execute query on local RDBMS • Format result tuples according to Tagger • Return XML document to user
Summary • Created XAT of the user query • Created XAT for mapping query • Cut information unused by user query • Decorrelated Mapping query • Merged two queries into 1 larger XAT • Identified weaknesses in combined tree • Walked through pushdown steps • Displayed final, optimized tree