290 likes | 379 Views
Using XML to View Relational Data. Xin He AMPS Seminar November 30, 2001. Outline. Introduction An XML Primer Several related systems and results An XML middle-ware system: SilkRoute XML and Spreadsheets XML and OLAP. Introduction.
E N D
Using XML to View Relational Data Xin He AMPS Seminar November 30, 2001
Outline • Introduction • An XML Primer • Several related systems and results • An XML middle-ware system: SilkRoute • XML and Spreadsheets • XML and OLAP
Introduction • XML -- Basic ideas are simple but potential impact is significant • Easy to read • Simple and flexible • Easy to extract useful information • Research opportunities XML brings to Database Management • XML will “turn the Web into a database” • Thus general Database Management issues arise for XML
Introduction Why using XML to view Relational Data • XML is emerging as the standard data-exchange format between applications on the Web • While most existing data is stored in relational databases • This scenario is common • This scenario is challenging • Relational data is flat, normalized, its schema is often proprietary • XML data is nested, unnormalized, its schema is public • So, the mapping is inherently complex and maybe difficult to compute efficiently
An XML(eXtensible Markup Language) Primer Example XML file (From Apache Tomcat 4.0 configuration file): 1. <Server> 2. <Service name=“Tomcat-Standalone”> 3. <Connector className=“http.HttpConnector” port=“80”> 4. <Connector className=“http.HttpConnector” port=“8443”> 5. <Factory className=“SSLServerSocketFactory”> 6. </Connector> 7. <Engine> 8. <parameter> 9. <name>mail.smtp.host</name> 10. <value>localhost</value> 11. </parameter> 12. </Engine> 13. </Service> 14. <Service name=“Tomcat-Apache”> 15. <Connector className=“warp.WarpConnector” port=“8008”> 16. </Service> 17.</Server>
An XML Primer Example DTD(Document Type Definitions) 1. <?xml encoding="US-ASCII"?> 2. <!ELEMENT Server (Service*)> 3. <!ELEMENT Service (Connector*, Engine*)> 4. <!ATTLIST Service name ID> 5. <!ELEMENT Connector (Factory?)> 6. <!ATTLIST Connector className #REQUIRED 7. port #REQUIRED> 8. <!ELEMENT Engine (Parameter)> 9. <!ELEMENT Parameter (name, value)> 10. <!ELEMENT name (#PCDATA)> 11. <!ELEMENT value (#PCDATA)>
An XML Primer • XML is a method for putting structured data in a text file • XML looks a bit like HTML but isn't HTML • XML is a family of technologies • http://www.w3c.org • XML is new, but not that new • XML is license-free, platform-independent and well-supported
Several Related Systems and Results • An XML middle-ware system in AT&T research labs: SilkRoute • Automate the conversion of realtional data into XML • A new paper published to optimize the query processing algorithm • IBM research center: Efficiently Publishing Relational Data as XML Documents • Language specification is based on SQL with minor extension • So standard APIs like ODBC can be used • Query performance is worse than the revised SilkRoute
Several Related Systems and Results • UCSD: MIX--Mediation of Information using XML • DTD inference • Concentrate on Information Integration • A more complicated architecture • University of Wisconsin-Madison: Relational Databases for Querying XML Documents • Objective is different, but part of techniques is related • Limitations and opportunities: some valuable points
SilkRoute Introduction • Public DTDs: Numerious industries are working on it • http://www.oasis-open.org/cover • Construct XML views which conform to the public DTDs from vast stores of relational data automatically • The system is general, dynamic, and efficient
Motivating Example A simple example from electronic commerce: Suppliers provide product information to resellers • For mutual benefit, they have agreed on a particular DTD • Supplier's business data is organized according to a relational schema • Supplier: convert its relational data into an XML view conforms to the DTD and make the XML view available to resellers • Assume supplier wants to export a subset of its inventory (e.g. only its winter-outerwear stock) • Resellers: access that data by formulating queries over the XML view • Reseller is typically only interested in a small subset of the info (e.g. sale price less than half of the retail price) • Relational schemas differ from supplier to supplier
Architecture SilkRoute’s Architecture
The View Query: RXL • Full power on both sides • Joins, selection conditions, aggregates, and nested queries • Generate XML data with arbitrary levels of nesting • RXL has three powerful features which make it possible to create arbitrary complex XML structures • Nested queries, Skolem functions, and Block structure
construct • <supplier ID=Supp()> • <company ID=Comp()>”Acme Clothing”</company> • { • from Clothing $c • where $c.category = “outerwear” • construct • <product ID=Prod($c.pid)> • <name ID=Name($c.pid,$c.item)>$c.item</name> • <category ID=Cat($c.pid,$c.category)>$c.category</category> • <retail ID=Retail($c.pid,$c.price)>$c.price</retail> • {from SalePrice $s • where $s.pid = $c.pid • construct • <sale ID=Sale($c.pid,$s.pid,$s.price)>$s.price</retail> • } • {from Problems $p • where $p.pid = $c.pid • construct • <report code=$p.code ID=Prob($c.pid,$p.pid,$p.code,$p.comments)> • $p.comments • </report> • } • </product> • } • </supplier> RXL (Relational to XML transformation Language) view query ( V )
The User Query: XML-QL • Construct • <results> • { where <supplier> • <company>$company</company> • <product> • <name>$name</name> • <retail>$retail</retail> • <sale>$sale</sale> • </product> • </supplier> in http://acme.com/products.xml, • $sale < 0.5 * $retail • construct • <result ID=Result($Company)> • <supplier>$company</supplier> • <name>$name</name> • </result> • } • </results> XML-QL user query (U)
The Query Composer • The composed RXL query is equivalent to the user query evaluated on the materialized view • Composed query often contain constraints on scalar values that can be evaluated using indexes in the relational database
The Query Composer • Construct • <results> • { from Clothing $c, SalePrice $s • where $c.category = “outerwear”, • $c.pid = $s.pid, • $s.price < 0.5 * $c.retail • construct • <result ID=Result( “Acme Clothing” )> • <supplier> ”Acme Clothing” </supplier> • <name ID=Name($c.pid, $c.item)>$c.item </name> • } • </results> Composed RXL query ( C )
Composition Algorithm Problem Statement: * C = UοV * XD = V(RDB) * A = U(XD) = U(V(RDB)) * C(RDB) = A = U(V(RDB)) = (UοV)(RDB)
Composition Algorithm Key Idea: Match U's pattern on V directly, without constructing XD • First step: match U's pattern with V's template (Next slide shows V again with U's patterns matched in it highlighted) • Second step: Construct C • C's construct clause is the same as U's construct clause, with variable substitutions • C's from and where clauses consist of all the "relevant" from and where in V and all the where filters in U, with variable renaming
construct • <supplier ID=Supp()> • <company ID=Comp()>”Acme Clothing”</company> • { • from Clothing $c • where $c.category = “outerwear” • construct • <product ID=Prod($c.pid)> • <name ID=Name($c.pid,$c.item)>$c.item</name> • <category ID=Cat($c.pid,$c.category)>$c.category</category> • <retail ID=Retail($c.pid,$c.price)>$c.price</retail> • {from SalePrice $s • where $s.pid = $c.pid • construct • <sale ID=Sale($c.pid,$s.pid,$s.price)>$s.price</retail> • } • {from Problems $p • where $p.pid = $c.pid • construct • <report code=$p.code ID=Prob($c.pid,$p.pid,$p.code,$p.comments)> • $p.comments • </report> • } • </product> • } • </supplier> RXL view query ( V ) with patterns from XML-QL query in RED
Composition Algorithm Diagram of Query Composition
Translator and XML Generator • The translator takes an RXL query and decomposes it into one or more SQL queries and an XML template • Initial SilkRoute uses full partition strategy • The IBM research paper: sorted, outer union strategy • The 2001 SIGMOD paper gives an optimal algorithm • XML generator merges the result tuples into XML document in a single pass
Other Scenarios Minor changes to the information flow permit other scenarios • Export the entire database as one, large XML document by materializing the view query • The result of query composition can be kept virtual for later composition with other user queries
Alternative Approaches • Materialized XML view • Precompute or compute on demand • Feasible when the XML view is small and the applicaton needs to load the entire view in memory • Data may become stale • Use a native XML database engine • Stanford DB group: Lore Project • One can materialize an XML view using SilkRoute and store the result in an XML engine • Avoid the cost of query composition • Performance is unlikely to compete with SQL engine anytime soon • Can't guarantee data freshness and incur a high space cost
XML and Spreadsheets XML support in Microsoft Excel 2002 for Office XP • "..these new features mean that Microsoft Excel is set to play an important role in any organization's application environment." • Bi-direction transformation • Excel can recognize and open XML documents including XSL processing • XML flattening • Any Excel Spreadsheet can be saved as an XML file while preserving "the new XML Spreadsheet file format"
XML and Spreadsheets Well, Microsoft enable Spreadsheets to manipulate XML documents We can enable Spreadsheets to manipulate relational data using XML view!
XML and Spreadsheets Can we execute Spreadsheet style processing directly on XML files ? • XML is hierarchical, and it's unnormalized, which is exactly what people would like to see in Spreadsheets • Given system like SilkRoute Can we define Spreadsheet function map on XML view? • Think the function executions as XML view queries • Define one function on the other is similar as define a new XML view from an existing composed view • It will also arise challenge in the view generation systems
XML and OLAP (OnLine Analytical Processing) • Physically integrating unexpected data into OLAP systems is time-consuming • Logical integration is the better choice • XML’s increasing use in data-exchange suggests that the required data can be available through XML views • Possibilities: • Reference external XML data in OLAP queries • XML data can be presented along with dimensional data in the result of an OLAP query • Use XML data for selection and grouping • Microsoft and Hyperion published "Open XML for Analysis Specification" in April 2001