1 / 28

Using XML to View Relational Data

Using XML to View Relational Data. Xin He AMPS Seminar November 30, 2001. Outline. Introduction An XML Primer Several related systems and results An XML middle-ware system: SilkRoute XML and Spreadsheets XML and OLAP. Introduction.

yanka
Download Presentation

Using XML to View Relational Data

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Using XML to View Relational Data Xin He AMPS Seminar November 30, 2001

  2. Outline • Introduction • An XML Primer • Several related systems and results • An XML middle-ware system: SilkRoute • XML and Spreadsheets • XML and OLAP

  3. Introduction • XML -- Basic ideas are simple but potential impact is significant • Easy to read • Simple and flexible • Easy to extract useful information • Research opportunities XML brings to Database Management • XML will “turn the Web into a database” • Thus general Database Management issues arise for XML

  4. Introduction Why using XML to view Relational Data • XML is emerging as the standard data-exchange format between applications on the Web • While most existing data is stored in relational databases • This scenario is common • This scenario is challenging • Relational data is flat, normalized, its schema is often proprietary • XML data is nested, unnormalized, its schema is public • So, the mapping is inherently complex and maybe difficult to compute efficiently

  5. An XML(eXtensible Markup Language) Primer Example XML file (From Apache Tomcat 4.0 configuration file): 1. <Server> 2. <Service name=“Tomcat-Standalone”> 3. <Connector className=“http.HttpConnector” port=“80”> 4. <Connector className=“http.HttpConnector” port=“8443”> 5. <Factory className=“SSLServerSocketFactory”> 6. </Connector> 7. <Engine> 8. <parameter> 9. <name>mail.smtp.host</name> 10. <value>localhost</value> 11. </parameter> 12. </Engine> 13. </Service> 14. <Service name=“Tomcat-Apache”> 15. <Connector className=“warp.WarpConnector” port=“8008”> 16. </Service> 17.</Server>

  6. An XML Primer Example DTD(Document Type Definitions) 1. <?xml encoding="US-ASCII"?> 2. <!ELEMENT Server (Service*)> 3. <!ELEMENT Service (Connector*, Engine*)> 4. <!ATTLIST Service name ID> 5. <!ELEMENT Connector (Factory?)> 6. <!ATTLIST Connector className #REQUIRED 7. port #REQUIRED> 8. <!ELEMENT Engine (Parameter)> 9. <!ELEMENT Parameter (name, value)> 10. <!ELEMENT name (#PCDATA)> 11. <!ELEMENT value (#PCDATA)>

  7. An XML Primer • XML is a method for putting structured data in a text file • XML looks a bit like HTML but isn't HTML • XML is a family of technologies • http://www.w3c.org • XML is new, but not that new • XML is license-free, platform-independent and well-supported

  8. Several Related Systems and Results • An XML middle-ware system in AT&T research labs: SilkRoute • Automate the conversion of realtional data into XML • A new paper published to optimize the query processing algorithm • IBM research center: Efficiently Publishing Relational Data as XML Documents • Language specification is based on SQL with minor extension • So standard APIs like ODBC can be used • Query performance is worse than the revised SilkRoute

  9. Several Related Systems and Results • UCSD: MIX--Mediation of Information using XML • DTD inference • Concentrate on Information Integration • A more complicated architecture • University of Wisconsin-Madison: Relational Databases for Querying XML Documents • Objective is different, but part of techniques is related • Limitations and opportunities: some valuable points

  10. SilkRoute Introduction • Public DTDs: Numerious industries are working on it • http://www.oasis-open.org/cover • Construct XML views which conform to the public DTDs from vast stores of relational data automatically • The system is general, dynamic, and efficient

  11. Motivating Example A simple example from electronic commerce: Suppliers provide product information to resellers • For mutual benefit, they have agreed on a particular DTD • Supplier's business data is organized according to a relational schema • Supplier: convert its relational data into an XML view conforms to the DTD and make the XML view available to resellers • Assume supplier wants to export a subset of its inventory (e.g. only its winter-outerwear stock) • Resellers: access that data by formulating queries over the XML view • Reseller is typically only interested in a small subset of the info (e.g. sale price less than half of the retail price) • Relational schemas differ from supplier to supplier

  12. Architecture SilkRoute’s Architecture

  13. The View Query: RXL • Full power on both sides • Joins, selection conditions, aggregates, and nested queries • Generate XML data with arbitrary levels of nesting • RXL has three powerful features which make it possible to create arbitrary complex XML structures • Nested queries, Skolem functions, and Block structure

  14. construct • <supplier ID=Supp()> • <company ID=Comp()>”Acme Clothing”</company> • { • from Clothing $c • where $c.category = “outerwear” • construct • <product ID=Prod($c.pid)> • <name ID=Name($c.pid,$c.item)>$c.item</name> • <category ID=Cat($c.pid,$c.category)>$c.category</category> • <retail ID=Retail($c.pid,$c.price)>$c.price</retail> • {from SalePrice $s • where $s.pid = $c.pid • construct • <sale ID=Sale($c.pid,$s.pid,$s.price)>$s.price</retail> • } • {from Problems $p • where $p.pid = $c.pid • construct • <report code=$p.code ID=Prob($c.pid,$p.pid,$p.code,$p.comments)> • $p.comments • </report> • } • </product> • } • </supplier> RXL (Relational to XML transformation Language) view query ( V )

  15. The User Query: XML-QL • Construct • <results> • { where <supplier> • <company>$company</company> • <product> • <name>$name</name> • <retail>$retail</retail> • <sale>$sale</sale> • </product> • </supplier> in http://acme.com/products.xml, • $sale < 0.5 * $retail • construct • <result ID=Result($Company)> • <supplier>$company</supplier> • <name>$name</name> • </result> • } • </results> XML-QL user query (U)

  16. The Query Composer • The composed RXL query is equivalent to the user query evaluated on the materialized view • Composed query often contain constraints on scalar values that can be evaluated using indexes in the relational database

  17. The Query Composer • Construct • <results> • { from Clothing $c, SalePrice $s • where $c.category = “outerwear”, • $c.pid = $s.pid, • $s.price < 0.5 * $c.retail • construct • <result ID=Result( “Acme Clothing” )> • <supplier> ”Acme Clothing” </supplier> • <name ID=Name($c.pid, $c.item)>$c.item </name> • } • </results> Composed RXL query ( C )

  18. Composition Algorithm Problem Statement: * C = UοV * XD = V(RDB) * A = U(XD) = U(V(RDB)) * C(RDB) = A = U(V(RDB)) = (UοV)(RDB)

  19. Composition Algorithm Key Idea: Match U's pattern on V directly, without constructing XD • First step: match U's pattern with V's template (Next slide shows V again with U's patterns matched in it highlighted) • Second step: Construct C • C's construct clause is the same as U's construct clause, with variable substitutions • C's from and where clauses consist of all the "relevant" from and where in V and all the where filters in U, with variable renaming

  20. construct • <supplier ID=Supp()> • <company ID=Comp()>”Acme Clothing”</company> • { • from Clothing $c • where $c.category = “outerwear” • construct • <product ID=Prod($c.pid)> • <name ID=Name($c.pid,$c.item)>$c.item</name> • <category ID=Cat($c.pid,$c.category)>$c.category</category> • <retail ID=Retail($c.pid,$c.price)>$c.price</retail> • {from SalePrice $s • where $s.pid = $c.pid • construct • <sale ID=Sale($c.pid,$s.pid,$s.price)>$s.price</retail> • } • {from Problems $p • where $p.pid = $c.pid • construct • <report code=$p.code ID=Prob($c.pid,$p.pid,$p.code,$p.comments)> • $p.comments • </report> • } • </product> • } • </supplier> RXL view query ( V ) with patterns from XML-QL query in RED

  21. Composition Algorithm Diagram of Query Composition

  22. Translator and XML Generator • The translator takes an RXL query and decomposes it into one or more SQL queries and an XML template • Initial SilkRoute uses full partition strategy • The IBM research paper: sorted, outer union strategy • The 2001 SIGMOD paper gives an optimal algorithm • XML generator merges the result tuples into XML document in a single pass

  23. Other Scenarios Minor changes to the information flow permit other scenarios • Export the entire database as one, large XML document by materializing the view query • The result of query composition can be kept virtual for later composition with other user queries

  24. Alternative Approaches • Materialized XML view • Precompute or compute on demand • Feasible when the XML view is small and the applicaton needs to load the entire view in memory • Data may become stale • Use a native XML database engine • Stanford DB group: Lore Project • One can materialize an XML view using SilkRoute and store the result in an XML engine • Avoid the cost of query composition • Performance is unlikely to compete with SQL engine anytime soon • Can't guarantee data freshness and incur a high space cost

  25. XML and Spreadsheets XML support in Microsoft Excel 2002 for Office XP • "..these new features mean that Microsoft Excel is set to play an important role in any organization's application environment." • Bi-direction transformation • Excel can recognize and open XML documents including XSL processing • XML flattening • Any Excel Spreadsheet can be saved as an XML file while preserving "the new XML Spreadsheet file format"

  26. XML and Spreadsheets Well, Microsoft enable Spreadsheets to manipulate XML documents We can enable Spreadsheets to manipulate relational data using XML view!

  27. XML and Spreadsheets Can we execute Spreadsheet style processing directly on XML files ? • XML is hierarchical, and it's unnormalized, which is exactly what people would like to see in Spreadsheets • Given system like SilkRoute Can we define Spreadsheet function map on XML view? • Think the function executions as XML view queries • Define one function on the other is similar as define a new XML view from an existing composed view • It will also arise challenge in the view generation systems

  28. XML and OLAP (OnLine Analytical Processing) • Physically integrating unexpected data into OLAP systems is time-consuming • Logical integration is the better choice • XML’s increasing use in data-exchange suggests that the required data can be available through XML views • Possibilities: • Reference external XML data in OLAP queries • XML data can be presented along with dimensional data in the result of an OLAP query • Use XML data for selection and grouping • Microsoft and Hyperion published "Open XML for Analysis Specification" in April 2001

More Related