1 / 73

Bridging Relational Technology and XML

Bridging Relational Technology and XML. Jayavel Shanmugasundaram. University of Wisconsin & IBM Almaden Research Center. eXtensible Markup Language (XML). Internet. Order Fulfillment Application. Purchasing Application. Relational Database System. Relational Database System.

finn
Download Presentation

Bridging Relational Technology and XML

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Bridging Relational Technologyand XML Jayavel Shanmugasundaram University of Wisconsin &IBM Almaden Research Center

  2. eXtensible Markup Language (XML) Internet Order Fulfillment Application Purchasing Application Relational DatabaseSystem Relational DatabaseSystem Business to Business Interactions Cars R Us Tires R Us

  3. XML applicationdevelopment XML applicationdevelopment Code to convert XMLdata to relational data Code to convert relational data to XML data Relations Relational datamanipulation Shift in Application Developers’ Conceptual Data Model XML

  4. eXtensible Markup Language (XML) Internet Order Fulfillment Application XML DatabaseSystem Are XML Database Systemsthe Answer? Cars R Us Tires R Us Purchasing Application XML DatabaseSystem

  5. Why use Relational Database Systems? • Highly reliable, scalable, optimized for performance, advanced functionality • Result of 30+ years of Research & Development • XML database systems are not “industrial strength” … and not expected to be in the foreseeable future • Existing data and applications • XML applications have to inter-operate with existing relational data and applications • Not enough incentive to move all existing business applications to XML database systems • Remember object-oriented database systems?

  6. XML Translation Layer XML Translation Layer A Solution Cars R Us Tires R Us eXtensible Markup Language (XML) Internet Order Fulfillment Application Purchasing Application Relational DatabaseSystem Relational DatabaseSystem

  7. XML Translation Layer(Contributions) • Store and query XML documents • Harnesses relational database technology for this purpose • Publish existing relational data as XML documents • Allows relational data to be viewed in XML terms

  8. XML applicationdevelopment XML applicationdevelopment XML TranslationLayer Code to convert XMLdata to relational data Code to convert relational data to XML data Relations Relational datamanipulation Bridging Relational Technologyand XML XML

  9. Outline • Motivation & High-level Solution • Background (Relations, XML) • Storing and Querying XML Documents • Publishing Relational Data as XML Documents • Conclusion

  10. Relational Schema and Data PurchaseOrder Id Customer Day Month Year 200I Cars R Us 10 June 1999 300I Bikes R Us null July 1999 Item Payment Name Quantity Cost Installment Percentage Pid Pid 1 200I 200I Firestone Tire 50 2000.00 40% 200I 300I Schwinn Tire 100 2500.00 2 60% 300I 300I Trek Tire 20 400.00 1 100% 200I Goodyear Tire 200 8000.00

  11. Self-describing tags XML Document <PurchaseOrderid=“200I” customer=“Cars R Us”> <Date> <Day> 10 </Day> <Month> June </Month> <Year> 1999 </Year> </Date> <Itemname=“Firestone Tire” cost=“2000.00”> <Quantity> 50 </Quantity> </Item> <Itemname=“Goodyear Tire” cost=“8000.00”> <Quantity> 200 </Quantity> </Item> <Payment> 40% </Payment> <Payment> 60% </Payment> </PurchaseOrder>

  12. XML Document <PurchaseOrderid=“200I” customer=“Cars R Us”> <Date> <Day> 10 </Day> <Month> June </Month> <Year> 1999 </Year> </Date> <Itemname=“Firestone Tire” cost=“2000.00”> <Quantity> 50 </Quantity> </Item> <Itemname=“Goodyear Tire” cost=“8000.00”> <Quantity> 200 </Quantity> </Item> <Payment> 40% </Payment> <Payment> 60% </Payment> </PurchaseOrder> Self-describing tags Nested structure

  13. XML Document <PurchaseOrderid=“200I” customer=“Cars R Us”> <Date> <Day> 10 </Day> <Month> June </Month> <Year> 1999 </Year> </Date> <Itemname=“Firestone Tire” cost=“2000.00”> <Quantity> 50 </Quantity> </Item> <Itemname=“Goodyear Tire” cost=“8000.00”> <Quantity> 200 </Quantity> </Item> <Payment> 40% </Payment> <Payment> 60% </Payment> </PurchaseOrder> Self-describing tags Nested structure Nested sets

  14. XML Document <PurchaseOrderid=“200I” customer=“Cars R Us”> <Date> <Day> 10 </Day> <Month> June </Month> <Year> 1999 </Year> </Date> <Itemname=“Firestone Tire” cost=“2000.00”> <Quantity> 50 </Quantity> </Item> <Itemname=“Goodyear Tire” cost=“8000.00”> <Quantity> 200 </Quantity> </Item> <Payment> 40% </Payment> <Payment> 60% </Payment> </PurchaseOrder> Self-describing tags Nested structure Nested sets Order

  15. XML Schema PurchaseOrder <PurchaseOrderid={integer} customer={string}>Date (Item)* (Payment)*</PurchaseOrder> Date <Date>Day? MonthYear</Date> Day <Day> {integer} </Day> Month <Month> {string} </Month> Year <Year> {integer} </Year> Item <Item name={string} cost={float}>Quantity</Item> … and so on

  16. XML Schema (contd.) PurchaseOrder <PurchaseOrderid={integer} customer={string}>Date? (Item | Payment)*</PurchaseOrder> PurchaseOrder <PurchaseOrderid={integer} customer={string}> (Date | Payment*) (Item (ItemItem)* Payment)*</PurchaseOrder> PurchaseOrder <PurchaseOrderid={integer} customer={string}>DateItem (PurchaseOrder)* Payment</PurchaseOrder>

  17. XML Query Find all the items bought by “Cars R Us” in 1999 For $po in /PurchaseOrderWhere$po/@customer = “Cars R Us” and$po/date/year = 1999 Return$po/Item

  18. XML Query (contd.) //Item //Item[5] //Item Before //Payment /(Item/(Item/Payment)*/(Payment | Item))*/Date

  19. Outline • Motivation & High-level Solution • Background (Relations, XML) • Storing and Querying XML Documents • Publishing Relational Data as XML Documents • Conclusion

  20. XML Documents XMLSchema XML Query XML Result Relational Schema Tuples SQL Query Relational Result Translation Information Storing and Querying XML Documents[Shanmugasundaram et. al., VLDB’99] XML Translation Layer Relational Database System

  21. Outline • Motivation & High-level Solution • Background (Relations, XML) • Storing and Querying XML Documents • Relational Schema Design and XML Storage • Query Mapping and Result Construction • Publishing Relational Data as XML Documents • Conclusion

  22. XML Schema PurchaseOrder <PurchaseOrderid={integer} customer={string}> (Date | (Payment)*) (Item (ItemItem)* Payment)*</PurchaseOrder>

  23. Desired Properties of Generated Relational Schema R • All XML documents conforming to XML schema should be “mappable” to tuples in R • All queries over XML documents should be “mappable” to SQL queries over R • Not Required: Ability to re-generate XML schema from R

  24. Simplifying XML Schemas • XML schemas can be “simplified” for translation purposes PurchaseOrder <PurchaseOrderid={integer} customer={string}> (Date | (Payment)*) (Item (ItemItem)* Payment)*</PurchaseOrder> PurchaseOrder <PurchaseOrderid={integer} customer={string}>Date? (Item)* (Payment)*</PurchaseOrder> • All without undermining storage and query functionality!

  25. Why is Simplification Possible? • Structure in XML schemas can be captured: • Partly in relational schema • Partly as data values PurchaseOrder <PurchaseOrderid={integer} customer={string}>Date? (Item)* (Payment)*</PurchaseOrder> • Order field to capture order among siblings • Sufficient to answer ordered XML queries • PurchaseOrder/Item[5] • PurchaseOrder/Item AFTER PurchaseOrder/Payment • Sufficient to reconstruct XML document

  26. Simplification Desiderata • Simplify structure, but preserve differences that matter in relational model • Single occurrence (attribute) • Zero or one occurrences (nullable attribute) • Zero or more occurrences (relation) PurchaseOrder <PurchaseOrderid={integer} customer={string}> (Date | (Payment)*) (Item (ItemItem)* Payment)*</PurchaseOrder> PurchaseOrder <PurchaseOrderid={integer} customer={string}>Date? (Item)* (Payment)*</PurchaseOrder>

  27. Translation Normal Form • An XML schema production is either of the form: <Pattr1={type1} … attrm={typem}>a1 … ap ap+1? … aq? aq+1*… ar* </P> P where aiaj • … or of the form: <Pattr1={type1} … attrm={typem}> {type} </P> P

  28. Example Simplification Rules (e1 | e2) e1? e2? (Date | (Payment)*) (Item (ItemItem)* Payment)* Date? (Item)*? (Item (ItemItem)* Payment)* e*? e* Date? (Payment)*? (Item (ItemItem)* Payment)* Date? (Item)* (Item (ItemItem)* Payment)*

  29. Simplified XML Schema PurchaseOrder <PurchaseOrderid={integer} customer={string}>Date (Item)* (Payment)*</PurchaseOrder> Date <Date>Day? MonthYear</Date> Day <Day> {integer} </Day> Month <Month> {string} </Month> Year <Year> {integer} </Year> Item <Item name={string} cost={float}>Quantity</Item> … and so on

  30. Relational Schema Generation PurchaseOrder (id, customer) 1 * * Date Item (name, cost) Payment ? 1 1 1 Day Month Year Quantity Satisfy: Fourth normal form Minimize: Number of joins for path expressions

  31. Generated Relational Schemaand Shredded XML Document PurchaseOrder Id Customer Day Month Year 200I Cars R Us 10 June 1999 Item Payment Name Order Value Pid Order Cost Quantity Pid 2 200I 200I Firestone Tire 2000.00 40% 1 50 200I 200I 3 Goodyear Tire 8000.00 200 4 60%

  32. PurchaseOrder Id Customer Order Pid 200I Cars R Us null null 900I Us Again 1 5 Item Name Pid Order Quantity Id 200I Firestone Tire 1 50 5 200I 2 Goodyear Tire 200 6 Recursive XML Schema PurchaseOrder (id, customer) * * Item (name) 1 Quantity

  33. Relational Schema Generation and XML Document Shredding (Completeness and Optimality) • Any XML Schema X can be mapped to a relational schema R, and … • Any XML document XD conforming to X can be converted to tuples in R • Further, XD can be recovered from the tuples in R • Also minimizes the number of joins for path expressions (given fourth normal form)

  34. Outline • Motivation & High-level Solution • Background (Relations, XML) • Storing and Querying XML Documents • Relational Schema Design and XML Storage • Query Mapping and Result Construction • Publishing Relational Data as XML Documents • Conclusion

  35. XML Query Find all the items bought by “Cars R Us” in 1999 For $po in /PurchaseOrderWhere$po/@customer = “Cars R Us” and$po/date/year = 1999 Return$po/Item

  36. Path Expression Automata(Moore Machines) PurchaseOrder /PurchaseOrder/Item Item # Item //Item

  37. XML Schema Automaton(Mealy Machine) PurchaseOrder (id, customer) Date Item (name, cost) Payment Day Month Year Quantity

  38. Intersected Automaton PurchaseOrder (customer) = “Cars R Us” Date Item (name, cost) Year Quantity = 1999

  39. Generated SQL Query Select i.name, i.cost, i.quantity From PurchaseOrder p, Item i Where p.customer = “Cars R Us” and p.year = 1999 and p.id = i.pid Predicates Join condition

  40. Recursive XML Query PurchaseOrder (id, customer) Find all items (directly or indirectly)under a “Cars R Us” purchase order Item (name) For $po in /PurchaseOrderWhere$po/@customer = “Cars R Us” Return$po//Item Quantity

  41. Item (name) Quantity PurchaseOrder Recursive Automata Intersection PurchaseOrder (id, customer) PurchaseOrder (customer) = “Cars R Us” Item (name) Quantity

  42. Item (name) Union all Select it.id, it.name, it.quantityFromResultItems rit, PurchaseOrder po, Item itWhere rit.id = po.pid and po.id = it.pid Quantity PurchaseOrder ) Recursive SQL Generation ResultItems (id, name, quantity) as ( PurchaseOrder (customer) Select it.id, it.name, it.quantityFrom PurchaseOrder po, Item itWhere po.customer = “Cars R Us”and po.id = it.pid = “Cars R Us”

  43. SQL Generation for Path Expressions (Completeness) • (Almost) all path expressions can be translated to SQL • SQL does not support seamless querying across data and meta-data (schema) • SQL based on first-order logic (with least fix-point recursion) • Meta-data query capability provided in the XML translation layer • Implemented and optimized using a higher-order operator

  44. Constructing XML Results <itemname = “Firestone Tire” cost=“2000.00”> <quantity> 50 </quantity> </item> <itemname=“Goodyear Tire” cost=“8000.00”> <quantity> 200 </quantity> </item> (“Firestone Tire”, 2000.00, 50) (“Goodyear Tire”, 8000.00, 200)

  45. Complex XML Construction PurchaseOrder (id, customer) = “Cars R Us” Date Item (name, cost) Payment Day Month Year Quantity = 1999

  46. Outline • Motivation & High-level Solution • Background (Relations, XML) • Storing and Querying XML Documents • Publishing Relational Data as XML Documents • Conclusion

  47. Relational Schema and Data PurchaseOrder Id Customer Day Month Year 200I Cars R Us 10 June 1999 Item Payment Name Quantity Cost Installment Percentage Pid Pid 1 200I 200I Firestone Tire 50 2000.00 40% 200I 200I Goodyear Tire 200 8000.00 2 60%

  48. XML Document <PurchaseOrderid=“200I” customer=“Cars R Us”> <Date> <Day> 10 </Day> <Month> June </Month> <Year> 1999 </Year> </Date> <Itemname=“Firestone Tire” cost=“2000.00”> <Quantity> 50 </Quantity> </Item> <Itemname=“Goodyear Tire” cost=“8000.00”> <Quantity> 200 </Quantity> </Item> <Payment> 40% </Payment> <Payment> 60% </Payment> </PurchaseOrder>

  49. Naïve Approach • Issue many SQL queries that mirror the structure of the XML document to be constructed • Tag nested structures as they are produced (200I, “Cars R Us”, 10, June, 1999) DBMS Engine PurchaseOrder Item (“Firestone Tire”, 2000.00, 50) (“Goodyear Tire”, 8000.00, 200) (40%) (60%) Payment Problem 1: Too many SQL queries Problem 2: Fixed (nested loop) join strategy

  50. Outside Engine Outside Engine Inside Engine Inside Engine Outside Engine Inside Engine Relations to XML: Issues[Shanmugasundaram et. al., VLDB’00] • Two main differences: • Ordered nested structures • Self-describing tags • Space of alternatives: Early Tagging Late Tagging Early Structuring Late Structuring

More Related