Updating XML and Maintaining XML Views

Updating XML and Maintaining XML Views
• Updating XML
• Incremental maintenance of XML views (materialized ATG):
  • A reduction approach
  • A bud-cut approach
• Updating XML views
  • Virtual views
  • Materialized views
QSX (LN 7)

XML updates
• Input: an XML tree T and an XML update ΔT
• Output: the updated XML tree T’ = T + ΔT

Incremental view maintenance
Incremental maintenance: propagation from relations to XML
• Input: a mapping σ from DB R to XML T, and DB updates ΔR
• Output: XML updates ΔT such that T + ΔT = σ(R + ΔR)
[Figure: XML publishing middleware (query translation, XML publishing) sits between application queries/answers and the DBMS over the relational database; updates ΔR on the database are propagated as incremental updates ΔT on the XML view and its schema]

XML view updates
XML view updates: propagation from XML to relations
• Input: a mapping σ from DB R to XML T, and XML updates ΔT
• Output: database updates ΔR such that T + ΔT = σ(R + ΔR)
Already hard for relational views
[Figure: XML publishing middleware (query translation, XML publishing) sits between application queries/answers and the DBMS over the relational database; view updates ΔT on the XML view are propagated down to updates ΔR on the database]


XML updates
Native support of XML updates:
• update languages for XML data
• implementation of XML updates
• native storage of XML data
• consistency and integrity
• concurrency control for XML “databases”
• recovery
• . . .
Question: is there a method to
• support updates commonly found in practice?
• give an existing XML query engine the immediate capability to support XML updates?
• avoid the trouble of concurrency control, consistency checking, etc.?

XML updates
• Input: an XML tree T and an XML update ΔT
• Output: the updated XML tree T’ = T + ΔT
Update language
• insert e into p
• delete p
where
• p: an XPath query
• e: an XML element/subtree
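The two update forms can be illustrated with a minimal Python sketch that applies them destructively to an in-memory tree. It relies on the limited XPath subset of the standard `xml.etree.ElementTree` module; the helper names `apply_insert` and `apply_delete` and the sample tree are assumptions made for illustration, not part of the update language above.

```python
import copy
import xml.etree.ElementTree as ET

def apply_insert(root, path, subtree):
    """insert e into p: append a copy of `subtree` under every node selected by `path`."""
    targets = root.findall(path)
    for target in targets:
        target.append(copy.deepcopy(subtree))
    return len(targets)

def apply_delete(root, path):
    """delete p: detach every node selected by `path` from its parent."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    targets = root.findall(path)
    for target in targets:
        parent_of[target].remove(target)
    return len(targets)

# Hypothetical sample tree T (much smaller than the course catalog in the slides).
T = ET.fromstring(
    "<db>"
    "<course><cno>cs843</cno><title>Intro to Quantum Query Languages</title></course>"
    "<course><cno>cs331</cno><title>Some other course</title></course>"
    "</db>"
)
ST = ET.fromstring("<takenBy><student><ssn>s1</ssn></student></takenBy>")

deleted = apply_delete(T, ".//course[cno='cs843']")      # delete //course[cno = 'cs843']
inserted = apply_insert(T, ".//course[cno='cs331']", ST)  # insert ST into //course[cno = 'cs331']
```

Both helpers modify T in place, which is exactly the destructive behaviour that a native update implementation has to manage.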

Example XML updates
• delete //course[cno = ‘cs843’]
• insert ST into //course[cno = ‘cs331’]
[Figure: XML tree rooted at db with course children; each course has cno, title, prereq and takenBy children, and prereq contains further course subtrees. Shown values: cno “cs843” with title “Intro to Quantum Query Languages”, and cno “cs99” with title “Advanced Quantum Query Languages”. ST marks the subtree to insert.]

From XML updates to queries
Find an automatic method to rewrite XML updates to queries
• Input: an XML update ΔT
• Output: an equivalent XML query Q, referred to as a complement query, such that for any XML tree T, Q(T) = T + ΔT
Motivation:
• Updates and queries can be processed in a uniform framework:
  • evaluation and optimization
  • no need to implement updates separately
• Immediate support of XML updates by an existing XML query engine
  • no need to hack the engine
• Composition of updates and queries becomes trivial: it is composition of queries
Always possible? Certainly: XQuery and XSLT are Turing-complete
But how efficient?

Rewrite XML updates to queries
Rewriting insert e into p to a complement query in XQuery (p and e stand for the update's XPath query and element; T is the document):
  declare function local:insert($n as node(), $xp as node()*) as node() {
    if ($n instance of element()) then
      element { fn:node-name($n) } {                            (: copy the element tag :)
        $n/@*,                                                  (: copy its attributes :)
        for $c in $n/node() return local:insert($c, $xp),       (: recursive call: children :)
        if (some $x in $xp satisfies $x is $n) then e else ()   (: insertion :)
      }
    else $n                                                     (: copy the node if it is not an element :)
  };
  let $xp := doc("T")/p
  return local:insert(doc("T")/*, $xp)
• Recursively traverse T and insert the subtree e; efficient methods exist
• Similarly for deletions
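The same rewriting idea can be sketched in Python: a complement query is a function that returns T + ΔT as a fresh tree and leaves T untouched. The function names, the use of `id()` to identify the nodes selected by p, and the sample data are assumptions for illustration only.

```python
import copy
import xml.etree.ElementTree as ET

def complement_insert(node, target_ids, subtree):
    """Return a fresh copy of `node` with `subtree` appended under each target node."""
    out = ET.Element(node.tag, dict(node.attrib))    # copy the element tag and attributes
    out.text, out.tail = node.text, node.tail        # copy the text content
    for child in node:                               # recursive call: children
        out.append(complement_insert(child, target_ids, subtree))
    if id(node) in target_ids:                       # insertion
        out.append(copy.deepcopy(subtree))
    return out

def complement_delete(node, target_ids):
    """Return a fresh copy of `node` without the target nodes and their subtrees."""
    out = ET.Element(node.tag, dict(node.attrib))
    out.text, out.tail = node.text, node.tail
    for child in node:
        if id(child) not in target_ids:              # skip deleted subtrees
            out.append(complement_delete(child, target_ids))
    return out

# Hypothetical sample data.
T = ET.fromstring("<db><course><cno>cs331</cno></course><course><cno>cs843</cno></course></db>")
e = ET.fromstring("<takenBy><student>s1</student></takenBy>")
before = ET.tostring(T)

xp = {id(n) for n in T.findall(".//course[cno='cs331']")}
T_ins = complement_insert(T, xp, e)
T_del = complement_delete(T, {id(n) for n in T.findall(".//course[cno='cs843']")})
```

Each node is visited once and the membership test is a set lookup, so the traversal itself is linear in the size of T once the target nodes are known.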

Updating virtual views
How can one update a virtual XML view V of distributed sources?
• Rewrite the update to an equivalent complement query Q
• Querying the “updated” virtual view: a query Q’ posed on it is answered by the composition Q’(Q(V))
[Figure: a mediator exposes a virtual XML view over wrappers for XML, RDB and OODB sources; query Q’ is posed against the virtual view]

Hypothetical queries
Q when {{U}}
• Q: query, U: update
• What would Q return if U were executed against the DB?
  – U is not actually executed and thus has no destructive effect
• Important for decision making, active databases, version control
Example: how much will a laptop cost if 15% more tax is imposed on some parts made in certain countries?
Complement query solution:
• Rewrite U into a complement query Q’
• Compose Q(Q’(DB)), without any destructive impact on the DB


Review: XML Publishing Flow
• (offline) The application defines an XML publishing view
• The application requests an XML document from the middleware
• The middleware translates the request into some number of SQL queries and sends them to the DB
• The DB optimizes and executes the queries and returns tuple streams
• The middleware merges and tags the streams to produce XML
• The application parses and uses the result
[Figure: the application (application logic, parser) exchanges the view definition and XML text with the XML publishing middleware (query rewrite, tagger), which sends SQL to the DBMS (optimize, execute) and receives tuple streams]

Goal: Incremental Update
[Figure: the XML publishing flow, now with a view query and cached trees T in the middleware; base updates ΔI arrive from the DBMS alongside SQL and tuple streams. Open question: how should the middleware process updates?]

Why incremental update?
Goal: update the external materialized XML tree in response to changes ΔI to the underlying database
• Batch computation: recompute the entire tree from scratch; large XML views may take multiple hours or days to produce!
• Incremental computation: compute the XML change ΔT
  • Idea: the new view T’ = the old view T + ΔT
  • Why? The new view T’ often differs only slightly from the old view T, so partial results computed earlier can be reused
  • Typically it is more efficient to compute ΔT (small) and update the old view T with ΔT
Incremental computation: an effective technique with a wide range of applications

Coping with source updates
[Figure: an ATG publishes the source database as a cached XML tree T; source updates ΔI are turned into incremental updates ΔT on the tree]
Problem: the underlying database may be updated constantly (ΔI), e.g., modifying Procedure(tname1, tname2) by insertions/deletions
Goal: update the published (materialized) XML tree in response to source changes ΔI (updating the treatment hierarchies)
• Incremental approach: compute the XML change ΔT such that the new view T’ = the old view T + ΔT

Incremental evaluation of ATG: Running Example
Source relational schema:
  course(cno, title, dept), project(cno, title, dept)
  student(ssn, name)
  enroll(ssn, cno)
  prereq(cno1, cno2)
Target DTD D for course catalogs:
  db → course*
  course → (cno, title, type, prereq, takenBy)
  type → (regular | project)
  prereq → course*
  takenBy → student*
  student → (ssn, name)
Remark: the DTD is recursive

ATG Rules
• db → course*
    $course ← Q1
    Q1: select distinct c.cno, c.title, 1 as tag
        from course c
        where c.dept = 'CS'
        union
        select distinct p.cno, p.title, 2 as tag
        from project p
        where p.dept = 'CS'
• course → cno, title, type, prereq, takenBy
    $cno = $course.cno, $title = $course.title, $type = $course.tag,
    $prereq = $course.cno, $takenBy = $course.cno

ATG definition (cont.)
• type → (regular | project)
    ($regular, $project) = case $type of
        1: ($type, null)
        2: (null, $type)
• prereq → course*
    $course ← Q2($prereq)
    Q2(p1): select distinct c.cno, c.title, 1 as tag
            from prereq p, course c
            where p.cno1 = p1 and p.cno2 = c.cno
• takenBy → student*
    $student ← Q3($takenBy)
    Q3(t): select distinct s.ssn, s.name
           from enroll e, student s
           where e.cno = t and e.ssn = s.ssn
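To make the rules concrete, here is a minimal sketch of a batch (non-incremental) evaluation of this ATG in Python over an in-memory SQLite database. Q1, Q2 and Q3 are the queries from the rules; the sample tuples, the helper names `leaf` and `build_course`, and the assumption that the prereq data is acyclic (so that the recursion terminates) are illustrative and not taken from the slides.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table course(cno text, title text, dept text);
create table project(cno text, title text, dept text);
create table student(ssn text, name text);
create table enroll(ssn text, cno text);
create table prereq(cno1 text, cno2 text);
-- hypothetical sample data
insert into course values ('cs999', 'Advanced Quantum Query Languages', 'CS'),
                          ('cs843', 'Intro to Quantum Query Languages', 'CS');
insert into student values ('s1', 'Ann');
insert into enroll values ('s1', 'cs999');
insert into prereq values ('cs999', 'cs843');
""")

Q1 = """select distinct c.cno, c.title, 1 as tag from course c where c.dept = 'CS'
        union
        select distinct p.cno, p.title, 2 as tag from project p where p.dept = 'CS'
        order by 1"""
Q2 = """select distinct c.cno, c.title, 1 as tag from prereq p, course c
        where p.cno1 = ? and p.cno2 = c.cno"""
Q3 = """select distinct s.ssn, s.name from enroll e, student s
        where e.cno = ? and e.ssn = s.ssn"""

def leaf(parent, tag, text):
    ET.SubElement(parent, tag).text = text

def build_course(parent, cno, title, tag):
    """course -> cno, title, type, prereq, takenBy, driven by $course = (cno, title, tag)."""
    course = ET.SubElement(parent, "course")
    leaf(course, "cno", cno)
    leaf(course, "title", title)
    type_el = ET.SubElement(course, "type")
    ET.SubElement(type_el, "regular" if tag == 1 else "project")
    prereq = ET.SubElement(course, "prereq")
    for row in conn.execute(Q2, (cno,)).fetchall():      # prereq -> course*
        build_course(prereq, *row)
    taken_by = ET.SubElement(course, "takenBy")
    for ssn, name in conn.execute(Q3, (cno,)).fetchall():  # takenBy -> student*
        student = ET.SubElement(taken_by, "student")
        leaf(student, "ssn", ssn)
        leaf(student, "name", name)

db = ET.Element("db")
for row in conn.execute(Q1).fetchall():                  # db -> course*
    build_course(db, *row)
```

Note that the subtree for cs843 is generated twice here, once under db and once under the prereq of cs999; a tree built this way repeats shared subtrees rather than storing them once.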

Example XML View
[Figure: XML view rooted at db with course children; each course has cno, title, prereq and takenBy children, and prereq contains further course subtrees. Shown values: cno “cs843” with title “Intro to Quantum Query Languages”, and cno “cs99” with title “Advanced Quantum Query Languages”]

Sub-Tree Property
• Tree depth unbounded: exception handling
• Sub-tree property: a sub-tree is determined by its root attribute ($course)
[Figure: the example XML view, in which the course sub-tree with cno “cs843” and title “Intro to Quantum Query Languages” appears twice; the other course shown has cno “cs999” and title “Advanced Quantum Query Languages”]

Middleware XML Representation
• Associate an ID with each node in the tree: a small, unique value derived from the node’s semantic attribute
• Store each unique sub-tree once (has pros and cons)
• Use a hash table H to map from (type, ID) to a node in the graph
• Each node has a reference count and a children list [(type1, ID1), (type2, ID2), …]
• A sub-tree pool stores any nodes with reference count 0
[Figure: hash table H maps (course, “c999”) to a node with children list [(cno, “cs999”), (title, “advanced…”), (prereq, p999)], and (prereq, p999) to a node with children list [(course, “c843”), (course, “m530”), …]]

Processing Updates
• An edge is a pair ((type, ID), (type2, ID2))
• An update to the tree, ΔT, consists of two sets of edges, E+ and E-
• To process E-, find (type2, ID2) in the child list of (type, ID), remove it, and decrement the reference count of (type2, ID2)
• To process E+, insert (type2, ID2) into the child list of (type, ID) and increment the reference count of (type2, ID2)
• Nodes with reference count 0 move to the sub-tree pool
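The representation and the E+/E- processing rules can be sketched as a small Python class. The class name `SubtreeStore`, its method names and the sample edges are assumptions for illustration; the slides do not prescribe an API.

```python
class SubtreeStore:
    """Shared sub-tree (DAG) store keyed by (type, ID)."""

    def __init__(self):
        self.children = {}   # H: (type, ID) -> children list
        self.refcount = {}   # (type, ID) -> number of incoming edges
        self.pool = set()    # sub-tree pool: nodes whose count dropped to 0

    def _ensure(self, key):
        self.children.setdefault(key, [])
        self.refcount.setdefault(key, 0)

    def add_edge(self, parent, child):
        """Process one E+ edge."""
        self._ensure(parent)
        self._ensure(child)
        self.children[parent].append(child)
        self.refcount[child] += 1
        self.pool.discard(child)          # a pooled sub-tree is reused

    def remove_edge(self, parent, child):
        """Process one E- edge."""
        self.children[parent].remove(child)
        self.refcount[child] -= 1
        if self.refcount[child] == 0:
            self.pool.add(child)

    def apply(self, e_minus, e_plus):
        """Apply an update given as two edge sets (E-, E+)."""
        for parent, child in e_minus:
            self.remove_edge(parent, child)
        for parent, child in e_plus:
            self.add_edge(parent, child)

# Hypothetical node keys, loosely following the figure's naming.
DB = ("db", "root")
C999, C843 = ("course", "c999"), ("course", "c843")
P999 = ("prereq", "p999")

store = SubtreeStore()
store.apply([], [(DB, C999), (DB, C843), (C999, P999), (P999, C843)])
shared = store.refcount[C843]             # referenced from db and from prereq p999
store.apply([(P999, C843), (DB, C843)], [])
pooled = C843 in store.pool               # no references left: moved to the pool
```

Keeping unreferenced nodes in a pool instead of deleting them immediately is what allows a later E+ edge to reattach the same sub-tree without regenerating it.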

Reduction Approach
• Most XML middleware takes a “reduction approach”:
  • treat the relational database system (DBMS) as a black box
  • re-use as much functionality as possible
• Top commercial systems support incremental update of some views
• Idea: treat the view query as a view, and ask the DBMS to update it incrementally
• Capture these changes and propagate them to the XML tree

Reduction Approach Overview
• For each semantic attribute $foo in an ATG σ, define a “virtual relation” gen_foo
• For each rule “A → … $foo …” where $foo appears on the RHS, define a query Q_foo_from_A
  • It generates entries in gen_foo from $A, the semantic variable for A
• Now treat these queries as nodes in a graph, and create edges for dependencies…

Example Dependency Graph
[Figure: dependency graph over the rules db → course, course → prereq, prereq → course, course → cno, course → type, type → regular, type → project, course → takenBy, takenBy → student]
• Cyclic components: require the “novel” approach
• Contiguous collections of single-input nodes: use techniques from “standard XML publishing”

The “Novel” Approach
• Regions without recursion can be mapped to relational views using techniques from PRATA
• Regions with recursion depend on support for recursion in the DBMS
• Many algorithms are known for incremental update of recursive views, but unfortunately they are not available in practice
• Limited support for recursion exists in a recent, but not yet widely implemented, standard (SQL:1999), as the WITH…RECURSIVE clause

Reducing Recursive Components to WITH…RECURSIVE
• Recursive components of “virtual relation graphs” can be translated to WITH…RECURSIVE queries by adapting a technique from relational incremental view updates
• Are we done? No:
  • Very few database systems implement WITH…RECURSIVE
  • Fewer support its use in views
  • None support incremental update of these views
  • At least one DBMS (IBM DB2) has no obvious way to capture incremental updates to views
For XML publishing middleware, incremental update processing should work with the lowest common denominator of functionality
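As an illustration of what a WITH…RECURSIVE query for the recursive prereq component might look like, the following sketch computes all direct and indirect prerequisites of a course. It assumes an engine that accepts the clause (the SQLite library bundled with Python does); the relation name `reach`, the helper `all_prereqs` and the sample tuples are hypothetical, and this is not the translation technique the slides refer to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table prereq(cno1 text, cno2 text);
-- hypothetical sample data
insert into prereq values ('cs999', 'cs843'), ('cs843', 'cs331'), ('cs500', 'cs331');
""")

CLOSURE = """
with recursive reach(cno) as (
    select cno2 from prereq where cno1 = ?               -- base case: direct prerequisites
    union
    select p.cno2 from prereq p join reach r on p.cno1 = r.cno   -- recursive step
)
select cno from reach order by cno
"""

def all_prereqs(cno):
    """Return every course reachable from `cno` through prereq edges."""
    return [row[0] for row in conn.execute(CLOSURE, (cno,))]

prereqs_of_cs999 = all_prereqs("cs999")
```

Using `union` rather than `union all` removes duplicates, which also keeps the recursion finite if the prereq data happens to contain a cycle.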

Bud-Cut Approach
1. For a set of base table changes ΔI, execute a set of non-recursive queries which determine the direct edge changes E- and E+
   • E- are cuts
   • E+ are buds (or cross edges)
2. Generate the sub-trees under the buds, re-using existing and deleted sub-trees where possible
3. Collect garbage
[Figure: the example XML view with cut edges marked X at a course node and a prereq node; shown values: cno cs843 with title “Intro to Quantum Query Languages”, and cno “cs999” with title “Advanced Quantum Query Languages”]

The Bud-Cut Approach
• The “delta-edge” queries needed for the first phase can be generated using any of several techniques from the literature
• “Sub-tree reuse”:
  • Other recursive update algorithms first recursively delete everything reached from E-, then add it back based on E+, whereas bud-cut generates E- by executing a fixed number of delta-edge queries
  • The implementation depends on the sub-tree property
• The generation phase is similar to ATG construction and can use similar optimizations (grouping of queries, etc.)
• The two phases can be overlapped
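The three phases can be sketched in Python for one narrow case: changes to the prereq relation only. This is a heavily simplified illustration under stated assumptions, not the published algorithm: the function `bud_cut`, the plain-dict store (`children`, `refcount`), the use of the course number as the ID of its prereq node, and the `prereq_of` lookup standing in for the database after the update are all hypothetical.

```python
def bud_cut(children, refcount, prereq_of, inserted, deleted):
    """children: (type, ID) -> child keys; prereq_of: cno -> prerequisite cnos after the update."""
    # Phase 1: direct edge changes from the base-table changes (no recursion).
    e_minus = [(("prereq", c1), ("course", c2)) for c1, c2 in deleted if ("prereq", c1) in children]
    e_plus = [(("prereq", c1), ("course", c2)) for c1, c2 in inserted if ("prereq", c1) in children]
    for parent, child in e_minus:                    # cuts
        children[parent].remove(child)
        refcount[child] -= 1

    # Phase 2: generate the sub-trees under the buds, reusing stored ones.
    def generate(cno):
        key, pkey = ("course", cno), ("prereq", cno)
        if key in children:                          # existing or just-cut sub-tree: reuse
            return key
        children[key], children[pkey] = [pkey], []
        refcount[pkey] = 1
        for p in prereq_of.get(cno, []):
            child = generate(p)
            children[pkey].append(child)
            refcount[child] = refcount.get(child, 0) + 1
        return key

    for parent, child in e_plus:                     # buds
        generate(child[1])
        children[parent].append(child)
        refcount[child] = refcount.get(child, 0) + 1

    # Phase 3: collect garbage, i.e. sub-trees that are no longer referenced.
    dead = [k for k, n in refcount.items() if n == 0]
    while dead:
        k = dead.pop()
        for c in children.pop(k, []):
            refcount[c] -= 1
            if refcount[c] == 0:
                dead.append(c)
        del refcount[k]

# Hypothetical view: cs999 has prerequisite m530; the update swaps it for m101.
children = {
    ("db", "root"): [("course", "cs999")],
    ("course", "cs999"): [("prereq", "cs999")],
    ("prereq", "cs999"): [("course", "m530")],
    ("course", "m530"): [("prereq", "m530")],
    ("prereq", "m530"): [],
}
refcount = {("course", "cs999"): 1, ("prereq", "cs999"): 1,
            ("course", "m530"): 1, ("prereq", "m530"): 1}
prereq_of = {"cs999": ["m101"], "m101": ["m100"]}
bud_cut(children, refcount, prereq_of, inserted=[("cs999", "m101")], deleted=[("cs999", "m530")])
```

Because cuts are applied before generation and garbage is collected only at the end, a sub-tree that is cut in one place and needed under a bud elsewhere is found in the store and reused rather than rebuilt.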

Incremental ATG Evaluation
• New reduction of ATGs to advanced relational database functionality
• New bud-cut algorithm for incremental ATG evaluation
  • Available when advanced functionality is not present
  • Has optimizations based on the XML structure of the output
Does the incremental approach always outperform the batch method?

Incremental: cost “proportional” to the size of the update


XML view updates
XML view updates: propagation from XML to relations
XML views:
• Published from relational data (typically materialized)
• Stored in relations via XML shredding (virtual)
[Figure: XML publishing middleware (query translation, XML publishing) between application queries/answers and the DBMS over the relational database; view updates on the XML view T are propagated to the relations R]

View updates: hard even for relational views
• Input: a relational view definition σ, an instance I of a relational database of schema R, a view V = σ(I), and view updates ΔV
• Output: database updates ΔI such that V + ΔV = σ(I + ΔI)
May not be updatable:
• Schema: R(A, B), S(B, C)
• View: π_AC(R ⋈ S)
• View delete: remove (a1, c1). Not doable without a side effect (the deletion of (a1, c2) or (a2, c1))
[Tables: instances of R, S and V]
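The side effect can be checked mechanically. The sketch below uses an assumed instance (the slide's tables did not survive extraction) chosen so that it reproduces the side effects named above; `view` and `side_effects` are hypothetical helper names.

```python
def view(R, S):
    """pi_AC(R join S) over sets of tuples."""
    return {(a, c) for (a, b) in R for (b2, c) in S if b == b2}

def side_effects(R, S, del_R, del_S, target):
    """View tuples other than `target` that disappear when the base deletions are applied."""
    before = view(R, S)
    after = view(R - del_R, S - del_S)
    return (before - after) - {target}

# Assumed instance: both a1 and a2 join with both c1 and c2 through b1.
R = {("a1", "b1"), ("a2", "b1")}
S = {("b1", "c1"), ("b1", "c2")}
target = ("a1", "c1")

via_R = side_effects(R, S, {("a1", "b1")}, set(), target)   # delete (a1, b1) from R
via_S = side_effects(R, S, set(), {("b1", "c1")}, target)   # delete (b1, c1) from S
```

On this instance the only base tuples that contribute to (a1, c1) are (a1, b1) and (b1, c1), and deleting either one also removes another view tuple, so no side-effect-free translation exists.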

More on relational view updates
May not have a unique answer:
• Schema: R(A, B), S(B, C)
• View: π_AC(R ⋈ S)
• View delete: remove (a1, c1). Four possible ways: remove either (a1, b1), (a1, b2), (b1, c1) or (b2, c1)
Question: can we find a minimal update?
[Tables: instances of R, S and V]

Complexity of relational updates
• View updatability problem: given a relational view definition σ, an instance I of a relational database of schema R, a view V = σ(I), and view updates ΔV, determine whether the view is updatable, i.e., whether there exists a side-effect-free database update ΔI such that V + ΔV = σ(I + ΔI)
• Minimal view update problem: given a relational view definition σ, an instance I of a relational database of schema R, a view V = σ(I), and view insertions (resp. deletions) ΔV, find the smallest database update ΔI such that ΔV takes place, i.e., the tuples of ΔV are in (resp. not in, for deletions) σ(I + ΔI)
• The view updatability problem is NP-hard for relational views defined with PJ (projection, join) or JU (join, union) only, even for deletions alone
• The minimal view update problem is NP-hard for relational views defined with PJ or JU only, even for deletions alone
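For very small instances, the minimal view update problem for a single view deletion can be solved by brute force, which makes the definitions concrete (and hints at why the general problem is hard: the search space is exponential). The instance below is assumed, with two join paths producing (a1, c1); `minimal_deletions` is a hypothetical helper.

```python
from itertools import combinations

def view(R, S):
    """pi_AC(R join S) over sets of tuples."""
    return {(a, c) for (a, b) in R for (b2, c) in S if b == b2}

def minimal_deletions(R, S, target):
    """All smallest sets of base tuples whose deletion removes exactly `target` from the view."""
    base = [("R", t) for t in sorted(R)] + [("S", t) for t in sorted(S)]
    wanted = view(R, S) - {target}                  # side-effect-free result
    for k in range(1, len(base) + 1):               # try smaller updates first
        found = []
        for combo in combinations(base, k):
            del_R = {t for rel, t in combo if rel == "R"}
            del_S = {t for rel, t in combo if rel == "S"}
            if view(R - del_R, S - del_S) == wanted:
                found.append(combo)
        if found:
            return found
    return []

# Assumed instance: (a1, c1) is derived through b1 and through b2.
R = {("a1", "b1"), ("a1", "b2")}
S = {("b1", "c1"), ("b2", "c1")}
solutions = minimal_deletions(R, S, ("a1", "c1"))
```

On this instance no single deletion suffices, because each one leaves the other join path intact; the minimal updates delete one tuple from each path.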

XML view updates
• Input: an ATG σ from DB R to XML T, and XML updates ΔT
• Output: database updates ΔR such that T + ΔT = σ(R + ΔR)
XML updates:
• insert e into p
• delete p
where p: an XPath query; e: an XML element/subtree
Recall that T is stored as edge relations: a relational view V
Approach:
• Translate ΔT to ΔV
• Resolve the relational view update problem: from ΔV to base relation updates ΔR

Relational encoding of XML views
• An edge is a pair ((type, ID), (type2, ID2))
• Each edge relation is a relational view, defined in terms of the queries embedded in the ATG
• Relational coding: a collection V of relational views (edges)
• An update to the tree, ΔT, consists of two sets of edges, E+ and E-
• To process E-, find (type2, ID2) in the child list of (type, ID), remove it, and decrement the reference count of (type2, ID2)
• To process E+, insert (type2, ID2) into the child list of (type, ID) and increment the reference count of (type2, ID2)
• DAG (directed acyclic graph)
  • Compressed: each subtree is stored only once, which could lead to an exponential saving in space
  • Finite representation of XML data of recursive DTDs

Processing XML view updates
Two-phase solution:
• From updates on XML T to updates on relational views V
  • Input: an ATG σ from DB R to XML T, stored as multiple relational views V, and XML updates ΔT
  • Output: relational view updates ΔV such that V + ΔV is the relational representation of T + ΔT
  • There is an efficient algorithm to compute ΔV
  • XPath evaluation on DAGs rather than trees: complications
• From relational view updates ΔV to base relation updates ΔR: processing of relational view updates
  • Heuristic in general (NP-complete)
  • Polynomial time for special cases, e.g., key preservation

Summary and review
• Why rewrite XML updates into complement queries?
• What is a hypothetical query? Can it be implemented for XML?
• How to (a) update a virtual view? (b) compose a query and an update?
• Why do we want to incrementally maintain XML views?
• What is the key criterion for an incremental algorithm?
• What is the view update problem? Why is it hard?
Exercises:
• Write an algorithm that, given an XML insert/delete, computes an equivalent complement query in XQuery. What is the complexity of your complement query? (Hint: the upper bound is linear time in data complexity)
• Study XML view updates in the shredding context (virtual views)
