1 / 34

Historical XML Databases

Historical XML Databases. Fusheng Wang and Carlo Zaniolo University of California, Los Angeles. Overview. State of the art Two scenarios of archiving history Publishing relational database history in XML Temporal queries with XQuery Historical database architecture

fclara
Download Presentation

Historical XML Databases

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Historical XML Databases Fusheng Wang and Carlo Zaniolo University of California, Los Angeles

  2. Overview • State of the art • Two scenarios of archiving history • Publishing relational database history in XML • Temporal queries with XQuery • Historical database architecture • Efficient query support for temporal queries • Conclusion

  3. State of the Art • Publishing Relational DBs as XML Documents: • as actual documents; to be processed using the very rich XML tool set (XSLT, DOM) • as views; to be queried by languages such as XPath or XQuery. Queries against these views are then mapped into SQL queries on the DB • DB vendors are very active in this area, e.g.: • SQLX and SQL functions for XML publishing • XTables ( XPeranto ) as a middleware

  4. Our Proposal:Publish the History of Relational DBs as XML Documents • Publish the history of relational DBs as XML documents: • Natural ways to represent such history in XML • Historical queries can be expressed in XQuery as is—no extensions to the data model or query language required for temporal queries • Approach amenable to efficient implementation: query and storage efficiency of alternative approaches • Gain: Temporal applications are very important and are not supported well by current databases

  5. Two Basic Scenarios • XML Data Warehouses archive the history: • change can be detected by current database update logs • or compute the delta between the published XML document snapshots of the new version and old version • Traditional version management (RCS, SCCS). More recent techniques (UBCC, RBVM) used for XML and complex queries • RDBMSs archive the history: • XML history is a view---and historical queries are mapped back into relational ones (e.g., using XTables)

  6. A Short History of Timein Databases • Between 33 and 48 proposals counted: • A perennial struggle to get around the limitations of relational (flat) tables and a rigid query language (SQL) • Clifford, Croker, Grandi, and Tuzhilin in their “On Temporal Grouping” paper show that the temporal-grouped models are more natural and powerful [Temp DB workshop, 1995] • But it is hard to fit temporally grouped models and query languages into SQL—an infinite morass

  7. Temporal Grouping in XML • XML makes it possible to express and support temporal grouping • The history of a relational DB can be viewed as an XML document, using such representation • Then, powerful temporal queries can be specified without requiring the introduction of new constructs in the language • There are many ways to publish DBs using XML—and not all will do

  8. History of Tables Transaction-Time Relational Tables • Timestamped tuple snapshots • Temporally grouped history of employees

  9. Publishing DB History in XML Many Alternatives • Each table as XML document: columns as attributes • Flat structure that corresponds to the tuple snapshots (employees2.xml) • Each table as an XML document: columns as elements • A natural structure [Clifford et al.] which simplifies many queries (employees.xml, depts.xml) • Multiple tables as a single XML document: flat structure • Good for some join queries but not for others(company.xml) • Multiple tables as a single XML document: hierarchy • similar but more intuitive than previous (depts3.xml) • Multiple tables as an XML document: flat structure with IDs • Can simplify join queries with IDs and IDREFs (company2.xml)

  10. XML Representation of DB HistoryTable Columns as XML Elements <employees tstart="1995-01-01" tend="1996-12-31"> <employee tstart="1995-01-01" tend="1996-12-31"> <empno tstart="1995-01-01" tend="1996-12-31">10003</empno> <name tstart="1995-01-01" tend="1996-12-31">Bob</name> <salary tstart="1995-01-01" tend="1995-05-31">60000</salary> <salary tstart="1995-06-01" tend="1996-12-31">70000</salary> <title tstart="1995-01-01" tend="1995-09-30">Engineer</title> <title tstart="1995-10-01" tend="1996-01-31">Sr Engineer</title> <title tstart="1996-02-01" tend="1996-12-31">Tech Leader</title> <dept tstart="1995-01-01" tend="1995-09-30">QA</dept> <dept tstart="1995-10-01" tend="1996-12-31">RD</dept> <DOB tstart="1995-01-01" tend="1996-12-31">1945-04-09</DOB> </employee> <!-- More… --> </employees>

  11. XML Representation of DB History(cont’d) • Historical data is represented in an XML document • Two attributes tstart and tend are used to represent the time interval • The value now is used to denote the ever-increasing current time • Node updates: • delete: tend is updated to the current timestamp • insert: a new node is appended with tend set as now • update: delete followed by an insert

  12. Schema of the XML Representation • The document has a well-defined schema derived from the snapshot document: <!ELEMENT employees (employee)*> <!ATTLIST employees tstart CDATA #REQUIREDtendCDATA #REQUIRED> <!ELEMENT employee (empno+, name+, salary+, title+, dept+, DOB+)> <!ATTLIST employee tstartCDATA #REQUIRED tendCDATA #REQUIRED> <!ELEMENT empno (#PCDATA)> <!ATTLIST empno tstartCDATA #REQUIREDtendCDATA #REQUIRED> <!ELEMENT salary (#PCDATA)> <!ATTLIST salary tstartCDATA #REQUIREDtendCDATA #REQUIRED> <!-- … -->

  13. Temporal Queries with XQuery • Temporal projection: retrieve the salary history of “Bob”: element salary_history { for$s in document("employees.xml")/ employees/employee/[name=“Bob”]/salary return$s } • Snapshot queries: retrieve the departments on 1996-01-31: for$din document("depts.xml")/depts/dept [tstart(.) <= "1996-01-31" and tend(.) >= "1996-01-31"] let$n := $d/name[tstart(.)<="1996-01-31" and tend(.)>="1996-01-31"] let$m := $d/manager[tstart(.)<="1996-01-31" and tend(.)>= "1996-01-31"] return( element dept{$n,$m } )

  14. Temporal Queries with XQuery (cont’d) • Interval Queries. Find employee(s) who worked in the “QA” department throughout the history of that department: for$din document("depts.xml")/depts/dept[deptname='QA']/deptno for$ein document("employees.xml")/employees/employee[deptno=$d] where tstart($e/deptno)=tstart($d) and tend($e/deptno)=tend($d) return$e/name

  15. Complex Temporal Queries with XQuery • A Since B. Find the employee who has been the manager of the dept since he/she joined the dept “d007”: for $e in document("employees.xml")/employees/employee let $m:= $e/title[title="Manager" and tend(.)=current-date()] let $d := $e/deptno[deptno ="d007" and tcontains($m, .) ] where not empty($d) and not empty($m) return <employee> { $e/empno, $e/firstname, $e/lastname}</employee>

  16. Complex Temporal Queries with XQuery (cont’d) • Period Containment. Find employees with same history as employee “10112”, i.e., they worked in the same dept(s) as employee “10112” and exactly for the same periods: • for $e1 in document("employees.xml")/employees/employee • [empno = '10112'] • for $e2 in document("employees.xml")/employees/employee • [empno != '10112'] • whereevery $d1 in $e1/deptno satisfies some $d2 in $e2/deptno satisfies(string($d1) = string( $d2 ) and tequals($d2, $d1)) • and every $d2 in $e2/deptno satisfies • some $d1 in $e1/deptno satisfies • (string($d2) = string( $d1 ) and tequals($d1, $d2)) • return <employee>{$e2/empno}</employee>

  17. User-Defined Temporal Functions • Shield the user from the low-level details used in representing time, e.g., “now” • Eliminate the need for the user to write complex functions, e.g., coalescing and diff • Predefined functions: • History functions: history($e,Ts, Te), snapshot($e, T), invariance($e, Ts, Te) • Restructuring functions: coalese($l) • Interval functions: toverlaps, tprecedes, tcontains, tequals, tmeets • Duration and date/time functions: timespan($e), tstart($e), tend($e), tinterval($e), telement(Ts, Te), getdbnow(), rtend($e), external($e)

  18. Support for ‘now’ • ‘now’: no change until now. Values of tuples are still current at the time the query is asked • Internally, “end of time” values are used to denote ‘now’, e.g., 9999-12-31 • Intervals are only accessed through built-in functions: tstart() returns the start of an interval, tend() returns the end or CURRENT_DATE if it’s different from 9999-12-31 • In the output, tend value can be: • “9999-12-31”, • CURRENT_DATE (through rtend() ), or • “now” (through externalnow() )

  19. Historical Database Architecture Current Database XML Publishing XML Views Active Rules/ update logs XML Data XML Queries XML Publishing XML Views Historical XML Data Temporal XML Queries Historical Database

  20. Historical XML Database ArchitectureTwo Approaches • XML-enabled RDBMS • Historical view decomposed into relational databases as binary tables • Historical data can then be published as XML document through SQL/XML publishing functions; or queried through a middleware as XML views • Native XML databases • Historical data are stored in native XML database • XML queries can be specified directly upon the database • Native XML databases: SoftwareAG’s Tamino, eXcelon’s XIS

  21. Relational Storage of Temporal Relational Data • Assumptions • Each entity or relation has a unique key ( or composite keys) to identify it which will not change along the history. e.g., employee: empno • Relational schema: • employee(empno, firstname, lastname, sex, DOB, deptno, salary, title) • The historical XML documents are decomposed into tables

  22. Relational Storage of Temporal Relational Data (cont’d) • Key table for keys: • employee_id(id, tstart, tend), where id =empno • For composite keys, the table will be like: • lineitem_id(id, supplierno, itemno, tstart, tend) • Attribute history tables: employee_lastname(id, lastname, tstart, tend) … employee_salary(id, salary, tstart, tend) employee_title(id, title, tstart, tend) … • Global relation table: keep all the relations history • relations(name, tstart, tend)

  23. Relational Storage of Temporal Relational Data (cont’d) • Sample contents of employee_salary: ID SALARY TSTART TEND ================================ 100022 58805 02/04/1985 02/04/1986 100022 61118 02/05/1986 02/04/1987 100022 65103 02/05/1987 02/04/1988 100022 64712 02/05/1988 02/03/1989 100022 65245 02/04/1989 02/03/1990 100023 43162 07/13/1988 07/13/1989 ...

  24. XML publishing and XML Queries • A middleware (XPERANTO/XTABLES) can be used to publish and query historical tables as XML documents • Create XML views over relational data • Each database has a default XML view • The temporal XML document representation can be reconstructed with user-defined XML views with XQuery, and be queried with XQuery • Query upon XML views with XQuery • Only the desired relational data items are materialized • Most computation pushed down to relational engine

  25. Automatic Archiving • Statement CREATE HISTORICAL VIEWviewnameAS SELECT col1, col2, … FROMtablename [ USING KEY coli, Colj, …] • Results: • Historical tables are created for each attribute of the current table • Temporal XML views are created with XPERANTO • The historical tables are initialized with the snapshot of the current table • Active rules are started to trace any changes and archive into the historical tables • Temporal XQuery can be specified on the XML views

  26. Implementation Comparisons • A temporal data simulation program automatically generates the historical data in XML • Total number of employees: 300,024 • Database systems and major supported query languages for comparison: • Relational: DB2. SQL • Native: • SoftwareAG’s Tamino (text-based storage). XPath • eXcelon’s XIS (XML Information Server) (OODBMS-based storage). XQuery

  27. Performance Comparisons Storage Size:

  28. Performance Comparisons (cont’d) Query Performance of DB2 and Tamino: Q2: history query Q4,Q6: snapshot queries Q3,Q5: interval queries Q1: scan of databases Q7: join

  29. Performance Comparisons (cont’d) Query Performance of Tamino and XIS (1/3 data size)

  30. Efficient Query Support for Temporal Queries • H-document is first clustered by document structure, and then by the change history • Tamino will preserve the clustering structure thus retrieving the history of a node can be efficient • In RDBMS approach, tuples are stored in the order of updates, neither temporarily clustered nor clustered by objects • Traditional B+ Tree index will not help on interval-related temporal queries • A segment-based archiving scheme was used in this project

  31. Segment-based Archiving Scheme (cont’d) • Sample segments: Segment1 (01/01/1985 - 10/17/1991): ID SALARY TSTART TEND 100002 40000 02/20/1988 02/19/1989 100002 42010 02/20/1989 02/19/1990 100002 42525 02/20/1990 02/19/1991 100002 42727 02/20/1991 12/31/9999 ... Segment2 (10/18/1991 - 07/08/1995): ID SALARY TSTART TEND 100002 42727 02/20/1991 02/19/1992 100002 45237 02/20/1992 02/18/1993 100002 46465 02/19/1993 02/18/1994 100002 47418 02/19/1994 02/18/1995 100002 47273 02/19/1995 12/31/9999 ... segment3 (07/09/1995 - 01/08/1999): ...

  32. Query Performance Query Performance with different usefulness Q1,Q3: snapshot queries Q5: interval queries Q2, Q4: history queries

  33. Conclusion • XML can be used to support a temporally grouped data model, and represent temporal relational data • The framework supports complex temporal queries with XQuery, without extension to XQuery • The XML-viewed history of database tables can be stored using a native XML database or using a RDBMS • RDBMS has significant query performance compared to native XML database, while the latter can be more effective in terms of storage due to compression techniques • A segment-based archiving scheme based on usefulness can significantly boost the performance on most temporal queries

  34. History of XML Documents • The temporal representation in XML not only applies to historical relational data, but also historical XML documents <document tstart="2002-01-01" tend="now"> <chapter tstart="2002-01-01" tend="now"> <no isAttr="yes" tstart="2002-01-01" tend="now">1</no> <title tstart="2002-01-01" tend="2002-01-01">Introduction</title> <title tstart="2002-01-02" tend="now">Introduction and Overview</title> <section tstart="2002-01-01" tend="now"> <title tstart="2002-01-01" tend="now">Background</title> <subsection tstart="2002-01-01" tend="now"> <title tstart="2002-01-01" tend="now">Previous Work</title> <content tstart="2002-01-01" tend="now">...</content> </subsection> </section> </chapter> <!-- ... --> </document>

More Related