
XML + Databases = ? (DIMACS Workshop, 3/2000)

XML + Databases = ? (DIMACS Workshop, 3/2000). Mike Carey Exploratory Database Systems Department IBM Almaden Research Center carey@almaden.ibm.com. Plan for Today’s Talk. Thoughts on DB and web technologies The web and web “querying” Semistructured databases Object-relational databases


Presentation Transcript


  1. XML + Databases = ? (DIMACS Workshop, 3/2000) Mike Carey Exploratory Database Systems Department IBM Almaden Research Center carey@almaden.ibm.com

  2. Plan for Today’s Talk • Thoughts on DB and web technologies • The web and web “querying” • Semistructured databases • Object-relational databases • XML and databases • XML/DB research at IBM Almaden • The XPERANTO project • Motivation and approach • Whirlwind tour of the system

  3. The Web is Great at Supporting URL-Based Sharing • Ex: Online conference proceedings • Web browsers have given us • Universal file access (ftp++) • Universal document access (html) • Universal service access (forms) • What more could we navigational couch potatoes possibly want? • Universal platform for e-shopping!

  4. The Web is Lousy at Supporting Parametric Searches • Ex: Find all the used Musicman Sterling bass guitars currently available for under $750 within a 50-mile radius of my San Jose home • This is hard for a number of reasons • Data buried in web pages, news groups, classified ads, store sites, auction sites, … • No schema (no metal fish, please!) • No data types (miles, US$, instruments) • No regularity within/across (good!) sites

  5. Aren’t We Supposed to be the Experts on Data Management? • The DB community brought the world • Data models, schemas, and views • Query languages, optimizers, fast joins • Scalable parallel servers • Federated database systems • What do we have in our bag of tricks? • Semistructured databases • Object-relational database systems

  6. Is Semistructured Database Technology the Answer? • Database characteristics • Collections of [name, value] pairs or maybe [name, type, value] triples • Collections typically set<any> or list<any> • System characteristics • “Typeloose” query languages • Indexes for nested, typeloose structures • Appropriate query processing techniques
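A minimal sketch of the characteristics listed on this slide, assuming a toy in-memory representation: each object is a list of [name, value] pairs, values may nest, and the query is "typeloose" in that mistyped or missing values simply fail to match. The sample listings and the helper names `values` and `cheaper_than` are hypothetical, not taken from any semistructured DBMS.

```python
from typing import Any, Iterator

# Hypothetical semistructured collection: every object is a list of
# (name, value) pairs; a value is a scalar or a nested list of pairs.
listings: list[list[tuple[str, Any]]] = [
    [("item", "bass guitar"), ("price", 700), ("city", "San Jose")],
    [("item", "bass guitar"), ("price", "call for price")],
    [("item", "amplifier"), ("price", 250.0),
     ("seller", [("name", "Al"), ("city", "Fremont")])],
]

def values(obj: list[tuple[str, Any]], name: str) -> Iterator[Any]:
    """Yield every value stored under `name`, at any nesting depth."""
    for key, value in obj:
        if key == name:
            yield value
        if isinstance(value, list):
            # Nested object: keep searching below it.
            yield from values(value, name)

def cheaper_than(collection, limit):
    """Type-loose selection: non-numeric prices just do not match."""
    return [
        obj for obj in collection
        if any(isinstance(p, (int, float)) and p < limit
               for p in values(obj, "price"))
    ]

for hit in cheaper_than(listings, 750):
    print(hit)
```

The second listing is silently skipped because its price is a string, which is the forgiving behaviour the slide describes and also the reason a query processor cannot build a reliable range index without some schema information.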

  7. Are Semistructured Databases the Answer? (2) • No, because schemas are critical for • Data readers • What info is in a given collection? • Thus, what queries might make sense? • Data writers • What should I call this piece of info? • Is it okay to put this kind of data here? • Efficient/effective query processors • Indexing, statistics, ... (e.g., range queries) • Integration mappings (e.g., unit conversions)

  8. Are Semistructured Databases the Answer? (3) • They have some nice features, though • Flexible, dynamic schemas • Forgiving w.r.t. variations and exceptions • Schema evolution is not a big deal • Richer data modeling (vs. relational) • Nested structures, ordered collections • More powerful query languages • Blurring of schema and data querying • Ordering, nesting, restructuring handled

  9. Is Object-Relational Database Technology the Answer? • Database characteristics • Base types, user-defined structured types, inheritance, reference types, collections • Collections are well-typed • System characteristics • Extended SQL-based query languages • Support for methods (fenced/unfenced) • Also triggers, LOBs, extensible indexes

  10. Are Object-Relational Databases the Answer? (2) • No, because most O-R DBMSs have • Overly rigid schemas • Every instance is of one (known) type • Evolving a type can be a major burden • Distributed type management is hard • Crufty old storage managers • Ragged or sparse records poorly supported • Insufficient power in extended SQL • Prehistoric assumptions get in the way • Weak on restructuring, schema-querying

  11. Is XML the Answer? (Yes!! ...What Was the Question Again?) • Structured documents (for the web)
      <book>
        <booktitle> Tables Are The Answer </booktitle>
        <author id="cdate">
          <name>
            <firstname> Chris </firstname>
            <lastname> Date </lastname>
          </name>
          <address>
            <city> Saratoga </city>
            <state> CA </state>
          </address>
        </author>
      </book>

  12. Is XML the Answer? (2) • W3C’s XML Schema working group • Typed elements, attributes, documents • Simple types and complex types • Derived types (extension, restriction) • Facets, anonymous types, groups, … • Uniqueness, keys and key references • W3C’s XML Query working group • XML-QL, XPath, XQL, XSL/T, XSQL, … • Recommendation due in late 2000 (?)

  13. Is XML the Answer? (3) • XML Schema might help because • XML has achieved a huge mindshare for data interchange on the web • DTD standardization is happening for documents within vertical industries, and XML Schemas should take over • When finished, XML Schema should be a widely used schema description tool • Similar to O-R schemas, but with more flexibility (and web-based sex appeal)

  14. Some Useful XML+DB Topics • Publish documents with XML Schemas from O-R databases • B2B e-commerce messages • B2C comparison shopping (if permitted!) • Robust O-R DB-resident web sites with XML for page content generation • Use XML Schema as the central data model for data integration middleware • I.e., web information integration

  15. Useful XML+DB Topics (2) • Build a “native” XML Repository on top of an O-R DBMS • Map from XML Schema model to O-R DBMS modeling constructs • Map from XML queries to O-R queries (including tag variables and loose typing) • Thereby provide XML document storage management with industrial-strength robustness, scalability, and performance

  16. Useful XML+DB Topics (3) • Evolve XML-QL into a complete web data manipulation language • Typing a la XML Schema • Ordered/unordered collections • XPath-inspired expressions • Easier grouping and aggregation • Updates (insert/delete, modify) • Etc.

  17. The XPERANTO Project • Middleware for publishing O-R (or plain relational) DB content on the web • Provides a virtual XML document view • Based on a “pure XML” approach • Using XML-QL (as W3C placeholder) • Born at Almaden in summer of 1999 • Mike Carey, Dana Florescu, Zack Ives, Ying Lu, Jai Shanmugasundaram, Beau Shekita, Subbu Subramanian

  18. The XPERANTO Belief System • Databases contain, and will continue to contain, the world’s “data jewels” • Transactional data (RDBMS) • Important multimedia assets (ORDBMS) • XML application developers of the future may not love SQL like we do • View databases as default XML documents • Let them define appropriate (query-able) views of these XML documents

  19. XPERANTO Architecture [Figure: architecture diagram. Box labels: XML Schema Generator, XML Tagger, Query Translation, XML-QL Parser, Query Rewrite, SQL Translation, Metadata Services, View Services, Type & Table Services, O-R Database, SQL Query Processor, Stored Tables, System Catalog. Arrow labels: Views, XML Schema, Catalog Info, SQL Queries, Data Tuples, XQGM, Table & Type Info.]

  20. XPERANTO Components • XML-QL Parser • Neutral query representation (XQGM) • Query Rewrite • View composition and other rewrites • SQL Translation • Produce SQL query(s) to get the required data from the underlying DBMS • XML Tagger • Tag and structure the tabular results

  21. XPERANTO Components (2) • View Services • Repository for XML view definitions • Type & Table Services • Interface (and cache) for DB catalog info • XML Schema Generator • Give DB catalog info in XML Schema form for default views • Infer XML Schema info for queries and non-default view definitions

  22. Consider a Simple O-R Schema
      Create Table book AS (bookID CHAR(30), name VARCHAR(255), publisher VARCHAR(30))
      Create Table publisher AS (name VARCHAR(30), address VARCHAR(255))
      Create Type author_type AS (bookID CHAR(30), first VARCHAR(30), last VARCHAR(30))
      Create Table author OF author_type (REF IS ssn USER GENERATED)

  23. Part of the Default XML View
      <simpleType name="string255" source="string">
        <maxLength value="255" />
      </simpleType>
      <simpleType name="string30" source="string">
        <maxLength value="30" />
      </simpleType>
      <complexType name="bookTupleType">
        <element name="bookID" type="string30" />
        <element name="name" type="string255" />
        <element name="publisher" type="string30" />
      </complexType>
      <complexType name="bookSetType">
        <element name="bookTuple" type="bookTupleType" maxOccurs="*" />
      </complexType>
      <element name="book" type="bookSetType" />
      . . .
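The fragment on this slide is regular enough to be produced mechanically from catalog information. The sketch below is a guess at that idea, not XPERANTO's actual generator: the `CATALOG` dictionary, the `default_schema` function, and the enclosing `schema` root element are assumptions, and the attribute spellings (`source`, `maxOccurs="*"`) copy the slide rather than any later version of the XML Schema specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog entry: table name -> [(column name, max length)].
CATALOG = {"book": [("bookID", 30), ("name", 255), ("publisher", 30)]}

def default_schema(table, columns):
    """Build schema elements for one table whose columns are all strings."""
    root = ET.Element("schema")  # assumed wrapper; the slide shows no root
    # One bounded string type per distinct column length.
    for length in sorted({n for _, n in columns}, reverse=True):
        simple = ET.SubElement(root, "simpleType",
                               name=f"string{length}", source="string")
        ET.SubElement(simple, "maxLength", value=str(length))
    # A row becomes a <table>TupleType with one element per column.
    tuple_type = ET.SubElement(root, "complexType", name=f"{table}TupleType")
    for col, length in columns:
        ET.SubElement(tuple_type, "element", name=col, type=f"string{length}")
    # The table itself is a repeating set of tuples.
    set_type = ET.SubElement(root, "complexType", name=f"{table}SetType")
    ET.SubElement(set_type, "element", name=f"{table}Tuple",
                  type=f"{table}TupleType", maxOccurs="*")
    ET.SubElement(root, "element", name=table, type=f"{table}SetType")
    return root

schema = default_schema("book", CATALOG["book"])
print(ET.tostring(schema, encoding="unicode"))
```

Only CHAR and VARCHAR columns are handled here; other base types, typed tables, and references would each need their own mapping rule.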

  24. XPERANTO’s Default Views • XPERANTO generates default O-R to XML Schema mappings • Each DB shown as an XML file • Subtyping handled via XML Schema’s refinement facilities • OIDs and references become ids/idrefs • “Don’t use this at home!” • Application developers are expected to define the real view(s) using XML-QL
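To make "each DB shown as an XML file" concrete, here is a small sketch of what a default view could look like for plain relational rows, following the `library.book.bookTuple` naming that the view definition on a later slide navigates. The function `default_view`, the sample rows, and the choice to omit NULL columns are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET

def default_view(db_name, tables):
    """Render {table: (columns, rows)} as <db><table><tableTuple>...</...>."""
    root = ET.Element(db_name)
    for table, (columns, rows) in tables.items():
        table_el = ET.SubElement(root, table)
        for row in rows:
            tuple_el = ET.SubElement(table_el, f"{table}Tuple")
            for col, val in zip(columns, row):
                if val is not None:  # assumption: NULL columns are omitted
                    ET.SubElement(tuple_el, col).text = str(val)
    return root

# Made-up sample data shaped like the book and publisher tables.
tables = {
    "book": (["bookID", "name", "publisher"],
             [("b1", "Tables Are The Answer", "Kluwer")]),
    "publisher": (["name", "address"], [("Kluwer", None)]),
}
view = default_view("library", tables)
print(ET.tostring(view, encoding="unicode"))
```

A real system would presumably keep this document virtual and materialize only what a query asks for, which is why the slide warns against using the default view directly.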

  25. Creating a Better XML View
      WHERE <library.book.bookTuple>
              <bookID> $bid </>
              <name> $bname </>
              <publisher> $bpub </>
            </> IN "db2:xml:books/library",
            $bpub = "Kluwer"
      CONSTRUCT
        <book id=$bid>
          <name> $bname </>
          { WHERE <library.publisher.publisherTuple>
                    <name> $bpub </>
                    <address> $addr </>
                  </> IN "db2:xml:books/library"
            CONSTRUCT <publisher> <address> $addr </> </> }
          { WHERE <library.author.authorTuple>
                    <bookID> $bid </>
                    <first> $fname </>
                    <last> $lname </>
                  </> IN "db2:xml:books/library"
            CONSTRUCT <author first=$fname last=$lname/> }
        </>
      . . .

  26. XPERANTO Query Rewrite • XML-QL queries first translated into XQGM representation • Neutral, well-poised for more features • Easier to go from XML-QL to SQL • Borrow rewrites from DB2 UDB engine • XQGM is an extension of DB2’s QGM • XML data type for “columns” • Set of XML-specific functions

  27. SQL Generation and XML Document Tagging/Structuring • Sorted Outer Union queries are used to obtain the data • Fetch the data in one query that brings it back in the appropriate order • Tag and nest it to create XML document • Advantages of this approach • Shown to be stable as well as fast • Simple (linear-space) tagging possible • Just watch for nesting-related changes
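The tagging step can be sketched as a single pass over the sorted rows that remembers only the current parent key. This is an illustrative reading of the slide, not XPERANTO's tagger: the row layout (type, bookID, bookName, pubName, pubAddr, authFirst, authLast), the function name `tag`, and the assumption that rows arrive sorted by bookID and then by type are mine.

```python
from xml.sax.saxutils import escape, quoteattr

def tag(rows):
    """Yield XML fragments for outer-union rows sorted by (bookID, type)."""
    current = None  # the only state kept: id of the open <book>, if any
    for kind, book_id, book_name, _pub_name, pub_addr, first, last in rows:
        if book_id != current:
            # Nesting-related change: close the old book, open the new one.
            if current is not None:
                yield "</book>"
            yield f"<book id={quoteattr(book_id)}>"
            current = book_id
        if kind == "0":      # the book row itself
            yield f"<name>{escape(book_name)}</name>"
        elif kind == "1":    # a publisher row for this book
            yield f"<publisher><address>{escape(pub_addr)}</address></publisher>"
        elif kind == "2":    # an author row for this book
            yield f"<author first={quoteattr(first)} last={quoteattr(last)}/>"
    if current is not None:
        yield "</book>"

# Made-up rows in the order the sorted query is assumed to return them.
sample_rows = [
    ("0", "b1", "Tables Are The Answer", None, None, None, None),
    ("1", "b1", None, "Kluwer", "Norwell, MA", None, None),
    ("2", "b1", None, None, None, "Chris", "Date"),
]
document = "".join(tag(sample_rows))
print(document)
```

Because the generator emits text as it reads rows, the whole result never has to be held in memory, and the only thing it watches for is a change in the parent key.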

  28. Outer Union Query Example
      WITH OuterUnion (type, bookID, bookName, pubName, pubAddr, authFirst, authLast) AS (
        SELECT '0', b.bookID, b.name, NULL, NULL, NULL, NULL
        FROM book b
        WHERE b.publisher = 'Kluwer'
        UNION ALL
        SELECT '1', b.bookID, NULL, p.name, p.address, NULL, NULL
        FROM book b, publisher p
        WHERE b.publisher = 'Kluwer' AND b.publisher = p.name
        UNION ALL
        SELECT '2', b.bookID, NULL, NULL, NULL, a.first, a.last
        FROM book b, author a
        WHERE b.publisher = 'Kluwer' AND b.bookID = a.bookID)
      SELECT * FROM OuterUnion ORDER BY bookID
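The query can be tried out on a small scale with Python's built-in `sqlite3` module. This is only a sketch under several assumptions: SQLite stands in for the O-R DBMS, `author` is an ordinary table rather than a typed table, the sample rows are invented, and `type` is added to the ORDER BY so that the row order within a book is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE book (bookID CHAR(30), name VARCHAR(255), publisher VARCHAR(30));
CREATE TABLE publisher (name VARCHAR(30), address VARCHAR(255));
CREATE TABLE author (bookID CHAR(30), first VARCHAR(30), last VARCHAR(30));
-- Made-up sample data.
INSERT INTO book VALUES ('b1', 'Tables Are The Answer', 'Kluwer'),
                        ('b2', 'Other Book', 'Elsewhere Press');
INSERT INTO publisher VALUES ('Kluwer', 'Norwell, MA');
INSERT INTO author VALUES ('b1', 'Chris', 'Date'), ('b2', 'Pat', 'Doe');
""")

QUERY = """
WITH OuterUnion (type, bookID, bookName, pubName, pubAddr, authFirst, authLast) AS (
  SELECT '0', b.bookID, b.name, NULL, NULL, NULL, NULL
  FROM book b
  WHERE b.publisher = 'Kluwer'
  UNION ALL
  SELECT '1', b.bookID, NULL, p.name, p.address, NULL, NULL
  FROM book b, publisher p
  WHERE b.publisher = 'Kluwer' AND b.publisher = p.name
  UNION ALL
  SELECT '2', b.bookID, NULL, NULL, NULL, a.first, a.last
  FROM book b, author a
  WHERE b.publisher = 'Kluwer' AND b.bookID = a.bookID
)
SELECT * FROM OuterUnion ORDER BY bookID, type  -- 'type' added (assumption)
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Each branch of the union fills in only its own columns and pads the rest with NULL, so one result set carries the book, its publisher, and its authors in an order that a streaming tagger can nest directly.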

  29. XPERANTO Project Summary • Goal is to publish O-R data in XML form • Default XML views • XML-QL for defining useful views • “Look Ma, no SQL!” • Currently (re)building our prototype • View composition is our first stop • Updates in addition to queries • Queries over both data and metadata • Other needs for XML web sites...?

  30. A Few Closing Remarks • DB community must ensure that the web will support real queries…! • XML Schema and XML Query standards need ongoing input from DB researchers • Large-scale technologies needed for XML indexing, caching, querying, etc. • DB community should also work on important underlying technologies • Publishing XML both from and to RDBMSs and ORDBMSs, for example!
