Scheduling of Transactions on XML Documents

Scheduling of Transactions on XML Documents Author: Stijin Dekeyser Jan Hidders Reviewed by Jason Chen, Glenn, Steven, Christian

Native XML Database • It’s a database that stores and retrieves XML documents efficiently • The query language should be based on XML, and the structure of XML • The indexes should ensure that the fast execution of queries against XML documents G.Dobbie 2004

Pros NXD is more efficient in terms of documents retrieval NXD requires less system space compare to relational DB since it store less nulls and tables compare to relational DB Cons NXD is slow to retrieve a view compare to relational database Pros and cons of NXD compare to relational Database G.Dobbie 2004

Motivation for conflict schduler • NXD becomes more popular for companies to use • Currently there is no NXD scheduler that can guarantee high concurrency and serializability • State of the art of concurrency control in NXD is currently locking the entire document

n1 Root n2 n3 n4 Name = “Jason” Occupation= “Research Assistant” Id = 1 element attribute text Data Model of xml document in Conflict scheduler • Similar to Xpath data model representation <document id = “0”> <person id = “1”, age = “22”> <name>Jason</name> <address>33 Colmar Rd </address> <occupation>Student</occupation> <father> <person id = 3, age = “54”> <name>Ming</name> </person> </father> </person> <person id = “2”, age = “23”> <name>Tanya</name> <occupation>Office Assistant</occupation> </person> </document> figure 1 Figure 1 Figure 2

Data Manipulation language in conflict scheduler • DML is used to query a resultset in a DB, in this case it is a NXD DB • Also similar to Xpath DML • P ::= F | P/ F | P// F • F ::= E | * • Where f can be root or an element of a xml document Dekeyser, S. & Hidders, J. (2004)

Operations in Conflict Scheduler • A(n, a) This is the update operation which add a new node a to an existing node n The new node a is returned as the result of the operation The operation fails if the document can not represent as a tree structure. • D(n) The update operation delete an existing node n from the document tree This operation does not return any result This operation fails if the document structure is not a tree after the updates • Q(n, p) This query operation return all the nodes which is connected from node n by path p Dekeyser, S. & Hidders, J. (2004)

Locking Scheme in Conflict Scheduler • Locking scheme is used to lock the node when one user is accessing it. In this way it prevents other user to modify it at the same time to protect data integrity • There are 2 locking scheme involved in the conflict scheduler • Path Lock Propagation Scheme • Path Lock Satisfiability Scheme Dekeyser, S. & Hidders, J. (2004)

Path Lock Propagation Scheme • In this locking scheme, if node n is locked, all nodes which connect from n with path expression “p” is also locked <a1(Q(, document/person/father/person), t1)> will perform the locking of this node rl(t1, r, document/person/father/person) the propagation process will also perform the following locking rl(t1, n1, person/father/person) rl(t1, n2, father/person) rl(t1, n3, father/person) rl(t1, n8, person) Dekeyser, S. & Hidders, J. (2004)

Path Lock Satisfiability Scheme • This locking generates less locks than Path Lock Propagation Scheme but is hard to complex in terms of testing conflict therefore can also effect the efficiency in terms of processing speed of the database system. <a1(Q(, document/person/father/person), t1)> will perform the locking of this node rl(t1, r, document/person/father/person) Dekeyser, S. & Hidders, J. (2004)

The Conflict Scheduler • The conflict scheduler is the automaton whose state consists of a schedule S of actions that it has previously accepted and processed, a set of locks L, a dependency graph G which is directed graph whose nodes are transaction identifiers, and an instance graph I. Dekeyser, S. & Hidders, J. (2004)

The Conflict Scheduler (cont) • Consider the following schedule S in given the instance graph in figure 1 S in = < a1( Q(r,document/person/occupation, t1), a2 (Q(r, document), t2), a3 (A (n1, person), t2), a4( A(n28, occupation), a5(D ( n22)>

t1 The Conflict Scheduler (cont) • Initially : • S: empty, G empty • First operation ( Q(r, document/person/occupation, t1) dependency graph G1 with only 1 node

t1 t2 The Conflict Scheduler (cont) • Second operation: a2 (Q(r, document), t2) • Dependency graph G2 • two nodes without any edge between them

Root n1 new person node n2 n3 n28 n4 n22 n20 n9 n5 n21 n23 Name = “Jason” Occupation= “Research Assistant” Id = 1 element attribute text The Conflict Scheduler (cont) • Third operation: a3 (A (n1, person), t2) • Dependency graph G3 is still the same to the G2 from the second operation

t1 t2 The Conflict Scheduler (cont) • Forth operation: a4( A(n28, occupation) • This operation conflicts with operation a1. ie the node a4 required (occupation is locked by a1 ) thus, G4 now contains an edge from t1 to t2, after a4, no further conflicts appear. The conflict scheduler finishes and accepts Sin as its output schedule • Dependency graph G4: t1

The Conflict Scheduler (cont) • So now a serial schedule equivalent to S in is the schedule obtained by first taking all actions of transaction t1 and then all actions of transaction t2. The order is based on the dependency graph.

Conclusion • Concurrency becomes important as NXD is gaining popularity • Conflict scheduler has improved concurrency compare to commit scheduler • The serializability of conflict scheduler has also been proven • Future development will aim to realize it in the industry

Scheduling of Transactions on XML Documents