CSE 636 Data Integration

CSE 636 Data Integration: XML Distributed Query Processing. Slides by Yannis Papakonstantinou. Overview: the virtual XML view approach towards data integration; query processing in XML mediators (issues overview, an algebra-based architecture, navigation-driven evaluation).

  1. CSE 636 Data Integration: XML Distributed Query Processing. Slides by Yannis Papakonstantinou

  2. Overview • The Virtual XML View Approach towards Data Integration • Query Processing in XML Mediators • Issues Overview • An Algebra-Based Architecture • Navigation-driven Evaluation

  3. Data Integration Requirements in eBusiness Applications • It starts with… “Provide Application X to customers, partners, and employees,” where X may be Business Intelligence, Customer Support, … • Then the problem comes up… “The application uses information assets widely distributed across my enterprise.” • If only… “Give the application a single place to go to access all the information required. Requirements are evolving, so make sure the system can be easily maintained and upgraded.”

  4. View-Based Approach: Wrappers Export Basic Source Views
     customer_table
       customer: name John, id 56, city Chicago
       customer: name George, id 58, city Chicago
       …
     The same view as XML:
       <customer_table>
         <customer> <name>John</name> <id>56</id> <city>Chicago</city> </customer>
         <customer> <name>George</name> <id>58</id> <city>Chicago</city> </customer>
         …
       </customer_table>
     [Figure: Client Application, Integrated (XML) View, Mediator, two (XML) Views, two Wrappers, Customers Rel. DB, Orders Rel. DB]
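A wrapper of this kind can be sketched in a few lines. This is a minimal illustration, assuming the relational rows arrive as Python dictionaries; the function name `export_view` is hypothetical and not part of the slides.

```python
import xml.etree.ElementTree as ET

def export_view(table_name, row_name, rows):
    """Wrap relational rows (dicts) as an XML element tree."""
    root = ET.Element(table_name)
    for row in rows:
        elem = ET.SubElement(root, row_name)
        # One child element per column, holding the column value as text.
        for column, value in row.items():
            ET.SubElement(elem, column).text = str(value)
    return root

customers = [
    {"name": "John", "id": 56, "city": "Chicago"},
    {"name": "George", "id": 58, "city": "Chicago"},
]
view = export_view("customer_table", "customer", customers)
xml_text = ET.tostring(view, encoding="unicode")
print(xml_text)
```

The wrapper here materializes the whole view; a virtual-view wrapper would instead translate incoming queries into source queries and build only the requested elements.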

  5. Wrappers Export Basic Source Views
     order_table
       order: id 1034, cid 56, item chips
       order: id 1567, cid 56, item salsa
       …
     [Figure: Client Application, Integrated (XML) View, Mediator, two (XML) Views, two Wrappers, Customers Rel. DB, Orders Rel. DB]

  6. Mediators Export Integrated Views, Tailored to Application Needs
     Integrated view:
       customers
         customer: name John, id 56, city Chicago
           orders
             order: id 1034, item chips
             order …
         customer …
     Source views:
       order_table
         order: id 1034, cid 56, item chips
         order: id 1567, cid 56, item salsa
         …
       customer_table
         customer: name John, id 56, city Chicago
         customer: name George, id 58, city Chicago
         …
     [Figure: Client Application, Integrated (XML) View, Mediator, two (XML) Views, two Wrappers, Customers Rel. DB, Orders Rel. DB]

  7. Virtual Views: Query-Driven Mediator Operation
     Application query: “Find all Chicago customer names, along with their ordered items”
     Mediator request to the Customers wrapper: “Retrieve Chicago customer names and ids”
     Mediator request to the Orders wrapper: “Retrieve all cids and item names of orders”
     [Figure: Application, Mediator, two Wrappers, Customers Database, Orders Database]

  8. On-Demand (Query-Driven) Mediator Operation
     Result returned to the application:
       customers
         customer: name John
           ordered_items: item chips, item salsa
         customer …
     Data returned by the wrappers:
       customer: name John, id 56 …
       order: cid 56, item chips
       order: cid 56, item salsa
       …
     [Figure: Application, Mediator, two Wrappers, Customers Database, Orders Database]

  9. Multiple Plans are Possible • Retrieve customers • For each customer find matching orders
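To make the choice between plans concrete, the sketch below answers the Chicago query in two ways and counts the requests each plan sends to the orders source. The `Wrapper` class, its `query` method, and both plan functions are hypothetical stand-ins for real wrappers, which would expose whatever queries their sources support.

```python
CUSTOMERS = [
    {"name": "John", "id": 56, "city": "Chicago"},
    {"name": "George", "id": 58, "city": "Chicago"},
]
ORDERS = [
    {"id": 1034, "cid": 56, "item": "chips"},
    {"id": 1567, "cid": 56, "item": "salsa"},
]

class Wrapper:
    """Hypothetical wrapper: answers equality-filter queries and counts calls."""
    def __init__(self, rows):
        self.rows = rows
        self.calls = 0

    def query(self, **filters):
        self.calls += 1
        return [r for r in self.rows
                if all(r[k] == v for k, v in filters.items())]

def plan_per_customer(cust_src, order_src, city):
    """Plan A: retrieve customers, then ask for each customer's orders."""
    result = []
    for c in cust_src.query(city=city):
        items = [o["item"] for o in order_src.query(cid=c["id"])]
        result.append({"name": c["name"], "ordered_items": items})
    return result

def plan_bulk_join(cust_src, order_src, city):
    """Plan B: retrieve both sides once and join inside the mediator."""
    items_by_cid = {}
    for o in order_src.query():
        items_by_cid.setdefault(o["cid"], []).append(o["item"])
    return [{"name": c["name"], "ordered_items": items_by_cid.get(c["id"], [])}
            for c in cust_src.query(city=city)]

a_cust, a_ord = Wrapper(CUSTOMERS), Wrapper(ORDERS)
b_cust, b_ord = Wrapper(CUSTOMERS), Wrapper(ORDERS)
answer_a = plan_per_customer(a_cust, a_ord, "Chicago")
answer_b = plan_bulk_join(b_cust, b_ord, "Chicago")
print(answer_a == answer_b, a_ord.calls, b_ord.calls)
```

Both plans return the same answer. Plan A sends one orders request per customer, while Plan B sends a single request but transfers every order, so which one is cheaper depends on source capabilities and data sizes.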

  10. A New Kind of Query Processing Problem • Build and Run “Optimal” Plan • Consisting of operators that • Collect source info using supported queries and commands • Combine info into XML result

  11. Challenges in Query Processing & Optimization • Operate within the Limited and Different Capabilities of the Sources • Describe sets of supported queries • Use most efficient supported queries • Optimize plans/queries sent to sources • Estimate Costs of Plans • Adapt Plans Along the Way • Beyond Conjunctive Queries • Compose Queries/Views Efficiently • Schema inference & optimization • Combine navigation & querying

  12. From Limited Wrappers to Efficient Plans for Extended Query Sets • Answering Queries Using Views • But with Infinite Sets of Views • Increasing Relevance due to Web Services
      [Figure: over each source's data and schema, the set of all queries over the schema, the queries supported by the mediator, and the queries supported by the wrapper]

  13. Challenges in Query Processing & Optimization • Operate within the Limited and Different Capabilities of the Sources • Describe sets of supported queries • Use most efficient supported queries • Optimize plans/queries sent to sources • Estimate Costs of Plans • Adapt Plans Along the Way • Beyond Conjunctive Queries • XQuery processing • Schema inference & optimization • Combine navigation & querying • Build iterator models for low memory footprint

  14. Navigation-Driven Evaluation of Query Result
      Query result:
        customers
          customer: name John, id 56, city Chicago
            orders
              order: id 1034, item chips
              order …
          customer …
      Source views:
        order_table
          order: id 1034, cid 56, item chips
          order: id 1567, cid 56, item salsa
          …
        customer_table
          customer: name John, id 56, city Chicago
          customer: name George, id 58, city Chicago
          …

  15. Navigation-Driven Evaluation
      Input: client navigations down(p) and right(p) from a node p; view definition ans = q(s1 … sn)
      Lazy Mediator: returns the result to the client
      Output: source navigations on the XML sources s1 … sn
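The idea of turning client navigations into source navigations can be sketched with a node class that pulls from its source only when the client calls down() or right(). This is an illustrative sketch, not the mediator described in the slides; `LazyNode`, `customers_source`, and `fetch_log` are hypothetical names.

```python
_UNSET = object()

def _pull(iterator):
    """Fetch the next node from an iterator and remember where its siblings come from."""
    node = next(iterator, None)
    if node is not None:
        node._rest = iterator
    return node

class LazyNode:
    """A result node whose children and siblings are fetched only on demand."""
    def __init__(self, label, make_children=None):
        self.label = label
        self._make_children = make_children  # callable returning an iterable of LazyNode
        self._rest = None                    # iterator over the siblings to the right
        self._down = _UNSET                  # cached answers, so repeated calls are stable
        self._right = _UNSET

    def down(self):
        if self._down is _UNSET:
            children = self._make_children() if self._make_children else ()
            self._down = _pull(iter(children))
        return self._down

    def right(self):
        if self._right is _UNSET:
            self._right = _pull(self._rest) if self._rest is not None else None
        return self._right

fetch_log = []  # records each simulated source navigation

def customers_source():
    for name in ["John", "George"]:
        fetch_log.append(name)
        yield LazyNode("customer", lambda n=name: [LazyNode("name:" + n)])

root = LazyNode("customers", customers_source)
first = root.down()
after_down = list(fetch_log)    # only the first customer has been fetched
second = first.right()
after_right = list(fetch_log)   # the second customer is fetched on right()
print(after_down, after_right)
```

Caching the answers to down() and right() matters because a client may revisit a node; without the cache, a second right() on the same node would silently advance the source.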

  16. Navigation-Driven Evaluation
      Input: client navigations; view definition ans = q(s1 … sn)
      Lazy Mediator: returns the result to the client
      Output: source navigations on the XML sources s1 … sn

  20. Mixing Querying & Navigation
      Query issued during navigation: “Find details of all salsa orders below visited node”
      View being navigated:
        customers
          customer: name John, id 56, city Chicago
            orders
              order: id 1034, item chips
              order …
          customer …

  21. Challenges in Mixing Querying & Navigation • Two-dimensional navigation • Reminds of cursors but there are multiple continuation points • Controlling size + shape • Contextualizing queries by navigation

  22. Overview • The Virtual XML View Approach towards Data Integration • Query Processing in XML Mediators • Issues Overview • An Algebra-Based Architecture • Navigation-driven Evaluation

  23. An Algebra-Based Query Processor Architecture Client XQuery Navigation Requests Results XQuery Views Translation to Algebra Algebra Plan Source Schemas & Types Source Description Rewriter/Optimizer Physical Algebra Plan Functions Plan Execution Engine Function Description Queries & Fetch Requests to Sources

  24. Query Processing on Tuple-Oriented Algebra Enables… • Well-known efficient physical implementations of the operators • Join optimization • Nested data by nested plans or group-by • Efficient iterator model
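The iterator model mentioned above can be illustrated with Python generators, where each operator pulls tuples from its input only when asked for output. The operator names `scan`, `select`, and `hash_join` are illustrative choices rather than operators named in the slides.

```python
def scan(rows, log):
    """Leaf operator: yields rows one at a time and logs each row id it reads."""
    for row in rows:
        log.append(row["id"])
        yield row

def select(child, predicate):
    """Pass through only the tuples that satisfy the predicate."""
    for t in child:
        if predicate(t):
            yield t

def hash_join(left, right, left_key, right_key):
    """Build a hash table on the right input, then stream the left input."""
    table = {}
    for r in right:
        table.setdefault(r[right_key], []).append(r)
    for l in left:
        for r in table.get(l[left_key], []):
            yield (l, r)

customers = [
    {"id": 56, "name": "John", "city": "Chicago"},
    {"id": 58, "name": "George", "city": "Chicago"},
]
orders = [
    {"id": 1034, "cid": 56, "item": "chips"},
    {"id": 1567, "cid": 56, "item": "salsa"},
]
cust_log, order_log = [], []
plan = hash_join(
    select(scan(customers, cust_log), lambda c: c["city"] == "Chicago"),
    scan(orders, order_log),
    "id", "cid",
)
first = next(plan)             # pulling one result reads only one customer
read_so_far = list(cust_log)
rest = list(plan)
print(first[0]["name"], first[1]["item"], read_so_far)
```

Asking for the first result reads the whole orders input (the build side) but only one customer, which is the behaviour that keeps the memory footprint and response time low for the streamed side.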

  25. XQuery: Queries & Views for XML
      <customers> {
        for $cust in document("db")/customer
        return
          <customer> {
            $cust/id,
            for $order in document("db")/order
            where $order/cid = $cust/id
            return <order> { $order/id } </order>
          } </customer>
      } </customers>

  26. Access and Navigation
      Source db: customer_table (ct) with customer c1 (name John, id 56) and customer c2 (name George, id 58)
      source db, [$db1] produces: ($db1 = ct)
      getD $db1, customer → $cust produces: ($db1 = ct, $cust = c1), ($db1 = ct, $cust = c2)
      getD $cust, id → $cust_id produces: ($db1 = ct, $cust = c1, $cust_id = i1), ($db1 = ct, $cust = c2, $cust_id = i2)

  27. Simplification Using Schema Inference $db1 $cust_id ct i1 ct i2 ct $db1 ct Since $cust_id  $cust and $cust is “useless” otherwise db customer_table customer name John id 56 customer name George id 58 getD $db1, customer/id  $cust_id i1 i2 source db, [$db1]
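The behaviour of source and getD on binding tuples can be approximated as follows. This is a sketch under the assumption that getD extends each input tuple with one new binding per node reached by the path; it uses `xml.etree.ElementTree` paths, and the Python function signatures are invented for illustration.

```python
import xml.etree.ElementTree as ET

def source(doc, var):
    """Produce a single tuple binding var to the document root."""
    yield {var: doc}

def getD(tuples, in_var, path, out_var):
    """For each tuple, bind out_var to every node reached from in_var via path."""
    for t in tuples:
        for node in t[in_var].findall(path):
            yield {**t, out_var: node}

db = ET.fromstring(
    "<customer_table>"
    "<customer><name>John</name><id>56</id></customer>"
    "<customer><name>George</name><id>58</id></customer>"
    "</customer_table>"
)

# Two-step plan: bind $cust first, then $cust_id.
two_step = getD(getD(source(db, "$db1"), "$db1", "customer", "$cust"),
                "$cust", "id", "$cust_id")
ids_two_step = [t["$cust_id"].text for t in two_step]

# Simplified plan: a single getD with the composed path, with no $cust binding.
one_step = list(getD(source(db, "$db1"), "$db1", "customer/id", "$cust_id"))
ids_one_step = [t["$cust_id"].text for t in one_step]
print(ids_two_step, ids_one_step)
```

Both plans bind the same `$cust_id` values; the one-step plan simply never creates the intermediate `$cust` binding, which is safe only when nothing else in the plan uses it.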

  28. Nested Plans
      Operators: for $part; apply $part, p → $orders; nested plan p (…) reading its input from nestedSrc $part
      Input tuples: ($db1 = ct, $cust_id = i1), ($db1 = ct, $cust_id = i2)
      Output tuples: ($db1 = ct, $cust_id = i1, $orders = [o11…]), ($db1 = ct, $cust_id = i2, $orders = [o21…])

  29. Joins and Selections
      Operators shown (top to bottom): getD $order, id → $order_id; getD $order, cid → $cust_id2; getD $db2, order → $order; nestedSrc $part; source db, [$db2]
      Condition shown: $cust_id2=?
      Tuple schemas shown: ($db1, $cust_id) and ($db2, $order, $cust_id2, $order_id, …)

  30. Constructors
      crList $order_id → $oidL: ($order_id = o1) gives ($oidL = [o1]); ($order_id = o2) gives ($oidL = [o2])
      crEl order, $oidL → $oidE: ($oidL = [o1]) gives ($oidE = e1), an order element containing o1; likewise e2 for o2
      listify $oidE → $orders: gives ($orders = [e1, e2])
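The three constructors can be mimicked on dictionaries of bindings. This is a rough sketch: the Python signatures are invented, and `listify` is assumed to collapse all input tuples into a single tuple holding one list.

```python
import xml.etree.ElementTree as ET

def crList(tuples, in_var, out_var):
    """Wrap the value bound to in_var in a one-element list."""
    for t in tuples:
        yield {**t, out_var: [t[in_var]]}

def crEl(tuples, label, in_var, out_var):
    """Create a new element named label whose children are the listed nodes."""
    for t in tuples:
        elem = ET.Element(label)
        elem.extend(t[in_var])
        yield {**t, out_var: elem}

def listify(tuples, in_var, out_var):
    """Collapse all input tuples into one tuple holding the list of in_var values."""
    return {out_var: [t[in_var] for t in tuples]}

order_ids = [
    {"$order_id": ET.fromstring("<id>1034</id>")},
    {"$order_id": ET.fromstring("<id>1567</id>")},
]
result = listify(
    crEl(crList(order_ids, "$order_id", "$oidL"), "order", "$oidL", "$oidE"),
    "$oidE", "$orders",
)
serialized = [ET.tostring(e, encoding="unicode") for e in result["$orders"]]
print(serialized)
```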

  31. Algebra Example

  32. Plan Decomposition • Within Rewriting Optimizer • Rules replacing “leaf” trees • May move commutable parts • Catch: No projection limitation

  33. Plan After Decomposition

  34. Replacing Nested Plans with GroupBy/Outerjoin Combinations
      [Figure: the plan before and after the rewrite, built from apply $part, p → $R; for $part; groupBy S(p1) → $part; nestedSrc $part; and subplans p1, p2, p3]
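The intent of this rewrite can be illustrated on the customers and orders example: a nested plan that runs once per outer tuple gives the same result as a left outer join followed by a group-by. The sketch below is an assumption-laden illustration on plain Python lists, not the rewrite rule from the slide; both function names are hypothetical.

```python
CUSTOMERS = [{"id": 56}, {"id": 58}]
ORDERS = [{"id": 1034, "cid": 56}, {"id": 1567, "cid": 56}]

def apply_nested(customers, orders):
    """Nested plan: run the inner plan once per outer tuple."""
    return [
        {"cid": c["id"],
         "orders": [o["id"] for o in orders if o["cid"] == c["id"]]}
        for c in customers
    ]

def outerjoin_groupby(customers, orders):
    """Flat plan: left outer join, then group the joined tuples by customer."""
    joined = []
    for c in customers:
        # Customers without orders keep a single None partner (outer join).
        matches = [o for o in orders if o["cid"] == c["id"]] or [None]
        joined.extend((c, o) for o in matches)
    groups = {}
    for c, o in joined:
        group = groups.setdefault(c["id"], [])
        if o is not None:
            group.append(o["id"])
    return [{"cid": cid, "orders": ids} for cid, ids in groups.items()]

nested_result = apply_nested(CUSTOMERS, ORDERS)
flat_result = outerjoin_groupby(CUSTOMERS, ORDERS)
print(nested_result == flat_result)
```

The outer join is what keeps customers with no orders in the result. In a real engine the benefit of the flat form is that the join can use an efficient physical algorithm instead of re-running the inner plan per tuple.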

  35. Multiple Possible Plans

  36. Overview • The Virtual XML View Approach towards Data Integration • Query Processing in XML Mediators • Issues Overview • An Algebra-Based Architecture • Navigation-driven Evaluation

  37. Building Navigation-Driven Evaluation on the Algebra Client Source access Source access Source Source

  38. Think of Each Operator as a Lazy Mediator $db1 $cust $cust_id ct c1 i1 ct c2 i2 $db1 $cust ct c1 ct c2 root tuple $db1 customer_table customer name John id 56 customer name George id 58 c1 $cust $cust_id i1 tuple getD $cust, id  $cust_id c2 $db1 $cust i2 $cust_id

  39. Navigation-Driven Evaluation of Operators • Augmented with • nextTuple(p) • p.attr Input: client navigations result Lazy Operator Output: source navigations s1 sn ... Result of Operator below Result of Operator below

  40. Use of Semantic Id’s in Navigation-Driven Evaluation <f’1, f’2, …, f’n> Operator State V1: V2: … Vn: Other: … Proceed down/right f’1 f’2 … f’n r/d(<f1, f2, …, fn>) Operator State V1: V2: … Vn: Other: … f1 f2 … fn

  41. Fragments Reduce the “Set State” – “Produce State” Overhead root customer Hole 3 name, “John” order Hole 2 oid, 123 lineitem lineitem lineitem Hole 1

  42. Fragments Reduce the “Set State” – “Produce State” Overhead root customer Hole 3 name, “John” order Hole 5 order ordnum=16 oid, 123 lineitem lineitem lineitem Hole 1 Hole 4 lineitem lineitem

  43. Controlling the Size and Shape of Fragments Client listify Client-Server Interaction Controller listify Source access Source access Source Source

  44.  Fragment Size causes  Memory Footprint causes  Performance

  45. Fragmentation Strategies • Fixed Fragment Size • Ideal for depth-first, left-to-right navigation • Adaptive Fragment Size • Assign larger pieces to those who use them
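The difference between the two strategies can be sketched by splitting a list of child nodes into fragments, where each fragment stands for one client-server round trip. The doubling policy used for the adaptive case is an assumption chosen for illustration; the slides do not specify how fragment sizes adapt.

```python
def fragments(children, size_for_request):
    """Yield successive fragments; size_for_request(i) sizes the i-th request."""
    start, request = 0, 0
    while start < len(children):
        size = size_for_request(request)
        yield children[start:start + size]
        start += size
        request += 1

def fixed_size(request):
    """Fixed strategy: every request returns two nodes."""
    return 2

def doubling_size(request):
    """Hypothetical adaptive strategy: a client that keeps asking gets larger pieces."""
    return 2 ** request

lineitems = ["lineitem%d" % i for i in range(1, 11)]
fixed_frags = list(fragments(lineitems, fixed_size))
adaptive_frags = list(fragments(lineitems, doubling_size))
print(len(fixed_frags), [len(f) for f in adaptive_frags])
```

With ten nodes the fixed strategy needs five round trips, while the doubling strategy needs four and starts with a smaller first fragment, trading initial response size against the number of later requests.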

  46. Response Performance for Breadth-First and Depth-First Depth First traversal Breadth First traversal

  47. References
      • Bertram Ludäscher, Yannis Papakonstantinou, Pavel Velikhov. Navigation-Driven Evaluation of Virtual Mediated Views. EDBT 2000.
      • Yannis Papakonstantinou, Vasilis Vassalos. Architecture and Implementation of an XQuery-based Information Integration Platform. IEEE Data Eng. Bull. 25(1), 2002.
      • Yannis Papakonstantinou, Vinayak R. Borkar, Maxim Orgiyan, Konstantinos Stathatos, Lucian Suta, Vasilis Vassalos, Pavel Velikhov. XML queries and algebra in the Enosys integration platform. Data Knowl. Eng. 44(3), 2003.