Efficient Incremental Validation of XML Documents

Denilson Barbosa

Alberto O. Mendelzon

Leonid Libkin

Laurent Mignet

Marcelo Arenas

Presented by Daria Barger

Daria Barger – DB Seminar


Outline

  • Introduction

  • Types of constraints

  • Update operations

  • Incremental validation

  • Experiments

  • Conclusions

  • Future work

Introduction

  • The problems of storing and querying XML documents have attracted a great deal of interest.

  • Other aspects of XML data management, however, have not yet been satisfactorily explored.

  • Among them is the problem of checking that documents are valid with respect to their specifications, and that they remain valid after updates.

DTD

  • One popular form of XML document specification is the Document Type Definition (DTD).

  • A DTD D is a grammar that defines a set of documents L(D).

  • Each document in L(D) is said to be valid with respect to D.

The Validation Problem

The validation problem is:

Given a DTD D and an XML document X, is it the case that X ∈ L(D)?

The incremental validation problem is:

Let U be some update operation. Given X ∈ L(D), is it the case that U(X) ∈ L(D)?

Validation of structural constraints

Elements are declared in a DTD by rules of the form:

<!ELEMENT e c>

where c is the content model of e.

Content model:

  • A regular expression E: the element is valid iff the string formed by concatenating the names of its child elements belongs to L(E), the language denoted by E.

  • #PCDATA: validation can be done trivially.

Example:

<?xml version="1.0"?>
<!ELEMENT db (person*)>
<!ELEMENT person (name, dep, email, tel*)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT dep (#PCDATA)>
<!ELEMENT email (#PCDATA)>
<!ELEMENT tel (#PCDATA)>
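As an illustration of full (non-incremental) validation against the example DTD, here is a minimal Python sketch. The encoding of content models as Python regular expressions over space-terminated child names, the `(name, children)` tuple representation of elements, and the names `CONTENT_MODELS`, `PCDATA`, and `is_valid` are assumptions made for this sketch; they do not come from the presentation.

```python
import re

# Content models of the example DTD, rewritten as Python regexes over
# space-terminated child names (an encoding assumed for this sketch).
CONTENT_MODELS = {
    "db": r"(person )*",
    "person": r"name dep email (tel )*",
}
# Elements declared as (#PCDATA): they may not contain child elements.
PCDATA = {"name", "dep", "email", "tel"}


def is_valid(element):
    """Check an element given as a (name, children) pair, recursively."""
    name, children = element
    if name in PCDATA:
        return not children
    model = CONTENT_MODELS.get(name)
    if model is None:
        return False  # element not declared in the DTD
    # Concatenate the child names into one word and match it against L(E).
    word = "".join(child[0] + " " for child in children)
    return re.fullmatch(model, word) is not None and all(map(is_valid, children))


person = ("person", [("name", []), ("dep", []), ("email", []), ("tel", [])])
db = ("db", [person, person])
```

This check visits every element, which is the cost that incremental validation tries to avoid after an update.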

Validation of attributes

Attribute validation is trivial, except for the ID and IDREF attribute types.

A valid XML document must satisfy:

  • The values of all ID attributes are unique.

  • The value of each IDREF attribute equals the value of some ID attribute.

1-unambiguous regular expressions

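The specification of XML DTDs restricts the regular expressions used for defining element content to be 1-unambiguous (deterministic).

Marking: occurrences of the same symbol in E are distinguished by subscripts; the marked expression is denoted E'.

Position: a subscripted symbol in E'. The set of positions is denoted pos(E').

For a given position x, χ(x) denotes the corresponding (unmarked) symbol in Σ.

For example: pos(E') = {a, b1, b2, c} and χ(b1) = b.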

1-unambiguous regular expressions

A regular expression E is 1-unambiguous if and only if, for all words u, v, w over the subscripted alphabet pos(E') and all x, y in pos(E'), the conditions

uxv, uyw ∈ L(E') and x ≠ y

imply χ(x) ≠ χ(y).

Which regular expression is deterministic?

  • (ab)|(ac)

  • a(b|c)

  • a(a+b)*ac

The Glushkov automaton for Regular Expressions

first(E'): the set of positions that appear as the first symbol of some word in L(E')

follow(E', x): the set of positions that appear immediately after position x in some word in L(E')

last(E'): the set of positions that appear as the last symbol of some word in L(E')
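The first, follow, and last sets can be computed by one pass over the syntax tree of the expression, and an expression is 1-unambiguous exactly when no two positions in first(E') or in any follow(E', x) carry the same unmarked symbol. The Python sketch below shows one way to do this. The tuple-based syntax tree, the integer positions, and the function names `glushkov` and `is_one_unambiguous` are assumptions of this sketch, as is reading `+` in a(a+b)*ac as union.

```python
def glushkov(expr):
    """Return (first, last, follow, chi, nullable) for a regex syntax tree.

    Nodes: ("sym", a), ("cat", e1, e2), ("alt", e1, e2), ("star", e).
    Positions are integers assigned to symbol occurrences, left to right.
    """
    chi = {}     # position -> unmarked symbol
    follow = {}  # position -> positions that may come immediately after it

    def walk(e):
        kind = e[0]
        if kind == "sym":
            x = len(chi)
            chi[x] = e[1]
            follow[x] = set()
            return False, {x}, {x}
        if kind == "star":
            _, first, last = walk(e[1])
            for x in last:  # a repetition may start again after any last position
                follow[x] |= first
            return True, first, last
        n1, f1, l1 = walk(e[1])
        n2, f2, l2 = walk(e[2])
        if kind == "alt":
            return n1 or n2, f1 | f2, l1 | l2
        for x in l1:  # concatenation: first(e2) may follow last(e1)
            follow[x] |= f2
        return n1 and n2, (f1 | f2 if n1 else f1), (l1 | l2 if n2 else l2)

    nullable, first, last = walk(expr)
    return first, last, follow, chi, nullable


def is_one_unambiguous(expr):
    """True iff the Glushkov automaton of expr is deterministic."""
    first, _, follow, chi, _ = glushkov(expr)

    def distinct(positions):
        symbols = [chi[x] for x in positions]
        return len(symbols) == len(set(symbols))

    return distinct(first) and all(distinct(s) for s in follow.values())


a, b, c = ("sym", "a"), ("sym", "b"), ("sym", "c")
e1 = ("alt", ("cat", a, b), ("cat", a, c))                         # (ab)|(ac)
e2 = ("cat", a, ("alt", b, c))                                     # a(b|c)
e3 = ("cat", ("cat", ("cat", a, ("star", ("alt", a, b))), a), c)   # a(a+b)*ac
```

Under this check only `e2` is reported as deterministic: in `e1` two different `a` positions can start a word, and in `e3` two different `a` positions can follow the first `a`.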

Update operations

  • Append(p,y): insert element y as the last child of element p.

[Figure: Append(p,y) on an example tree]

Update operations (2)

  • InsertBefore(x,y): insert element y as the immediate left sibling of element x. (This operation is not defined if x is the root of the document.)

[Figure: InsertBefore(x,y) on an example tree]

Update operations (3)

  • Delete(x): delete element x from the document. Note that if x is the root of the document, the operation is trivially valid.

[Figure: Delete(x) on an example tree]

Observation

Incremental validation concerns only the content of the element where the update takes place. For example, after an Append(p,y) operation, only the content of p needs to be revalidated.

The approach

[Figure: element p with children w1, w2, w3, …, wk]

  • Together with the i-th child of p we store the state that the automaton validating the content model of p reaches after reading w1 … wi.

  • This requires auxiliary storage of size O(n log d), where n is the size of the XML document and d is the size of the DTD.

Append at the end

Append(p,y) operation

[Figure: element p with children w1, w2, w3, …, wk and the appended element y]
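One plausible reading of this step, sketched in Python: if every child of p carries the automaton state reached after reading it, then Append(p,y) needs only one transition from the state stored with the last child, followed by a check that the new state is final. The automaton below is hand-written for the content model (name, dep, email, tel*) from the example DTD; the names `START`, `DELTA`, `FINAL`, `Node`, `build`, and `append` are assumptions of this sketch, and using a single automaton for every element is a simplification.

```python
START = "q0"
# Automaton for (name, dep, email, tel*); each state is named after a position.
DELTA = {
    (START, "name"): "name",
    ("name", "dep"): "dep",
    ("dep", "email"): "email",
    ("email", "tel"): "tel",
    ("tel", "tel"): "tel",
}
FINAL = {"email", "tel"}


class Node:
    def __init__(self, label):
        self.label = label
        self.children = []  # child nodes in document order
        self.states = []    # states[i]: state reached after reading children[0..i]


def build(label, child_labels):
    """Build a node from a valid child sequence, recording the state per child."""
    node, state = Node(label), START
    for child_label in child_labels:
        state = DELTA[(state, child_label)]
        node.children.append(Node(child_label))
        node.states.append(state)
    return node


def append(p, y):
    """Apply Append(p, y) only if the content of p stays valid; report success."""
    previous = p.states[-1] if p.states else START
    state = DELTA.get((previous, y.label))
    if state is None or state not in FINAL:
        return False  # update rejected, document left unchanged
    p.children.append(y)
    p.states.append(state)
    return True


p = build("person", ["name", "dep", "email"])
```

No earlier child is read again, so the cost of the check does not depend on how many children p already has.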

Arbitrary insertions and deletions

[Figure: Delete(x) operation on element p with children w1, w2, …, wi, …, wk]

Problem: Complexity

1,2 Conflict Free Regular Expression

Possible solution:

This condition does not always hold. For example, consider E = a(b*|cb*), marked as E' = a(b1*|cb2*).

Let w = acb…b. All b’s match state b2.

Delete c from w to obtain w' = ab…b.

Now all b’s match state b1.

We must revalidate the entire string.
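The state flipping in the example a(b1*|cb2*) can be reproduced with a few lines of Python. The transition table is written by hand from the marked expression; the names `DELTA` and `run` are assumptions of this sketch.

```python
# Automaton of the marked expression a(b1*|c b2*); states are positions.
DELTA = {
    ("q0", "a"): "a",
    ("a", "b"): "b1",
    ("a", "c"): "c",
    ("b1", "b"): "b1",
    ("c", "b"): "b2",
    ("b2", "b"): "b2",
}


def run(word):
    """Return the state reached after each symbol of word."""
    states, state = [], "q0"
    for symbol in word:
        state = DELTA[(state, symbol)]
        states.append(state)
    return states


before = run("acbbb")  # states stored with the children before Delete(c)
after = run("abbb")    # states that are correct after Delete(c)
# Count the b's whose stored state is no longer correct.
changed = sum(s != t for s, t in zip(before[2:], after[1:]))
```

Every `b` moves from state `b2` to state `b1`, so the stored state of each sibling to the right of the deleted element has to be rewritten.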

Definition of 1,2 Conflict-free

Let E be a regular expression over alphabet Σ.

Follow(E,x): the set of positions in E that can follow x in some path through E.

Define

such that

E is a 1,2 conflict-free regular expression if:

Restricted forms of DTD

  • 1,2 conflict-free DTD:

    • There is no “flipping” between automaton states after an update.

    • The per-update complexity is O(log n + log d) time and O(n log d) auxiliary space.

  • Conflict-free DTD:

    • No repeated symbols.

    • The per-update complexity is O(log n + log d) time and constant auxiliary space.

Incremental validation of ID and IDREF for adding element

The Append(p,y) and InsertBefore(x,y) operations require checking that no two ID attributes have the same value and that every IDREF attribute in y refers to an ID value present in the document.

Complexity: O(|y| log n) time and linear auxiliary space, where |y| is the size of the added subtree.
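A minimal Python sketch of this check follows. It assumes the document's ID values are kept in a set, that the inserted subtree y is given as a list of `(id_or_None, idrefs)` pairs with one entry per node, and that an IDREF inside y may point to an ID that is also inside y. The name `can_insert` is hypothetical, and a hash-based Python set stands in for whatever index structure yields the O(log n) lookups.

```python
def can_insert(doc_ids, subtree_nodes):
    """Check the ID/IDREF constraints for inserting a subtree.

    doc_ids: set of ID values already present in the document.
    subtree_nodes: (id_or_None, list_of_idrefs) for every node of y.
    """
    new_ids = set()
    for node_id, _refs in subtree_nodes:
        if node_id is None:
            continue
        if node_id in doc_ids or node_id in new_ids:
            return False  # duplicate ID value
        new_ids.add(node_id)
    # Every IDREF in y must match an ID in the document or in y itself.
    return all(ref in doc_ids or ref in new_ids
               for _node_id, refs in subtree_nodes
               for ref in refs)


doc_ids = {"p1", "p2"}
```

The function does one lookup per ID and per IDREF in y, so its work grows with |y| and not with the size of the document.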

Incremental validation of ID and IDREF for deleting element

After a Delete(x) operation we have to check that the subtree rooted at x contains no node whose ID attribute is referenced by a node that is not a descendant of x.

[Figure: example tree with nodes a, b, c]

Checking the reference counter on delete requires O(log n) time.

Updating the reference counters when inserting or removing an IDREF attribute requires O(h log n) time.
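The deletion condition can be illustrated with a flat reference count per ID value. This is a simplified Python sketch, not the counter scheme behind the bounds quoted above: it scans the whole deleted subtree, and the name `can_delete` and the `(id_or_None, idrefs)` node representation are assumptions made here.

```python
from collections import Counter


def can_delete(ref_count, subtree_nodes):
    """Check that no ID inside the deleted subtree is referenced from outside.

    ref_count: Counter mapping each ID value to the number of IDREF
        attributes in the whole document that refer to it.
    subtree_nodes: (id_or_None, list_of_idrefs) for every node under x,
        including x itself.
    """
    # References that originate inside the subtree disappear with it.
    internal = Counter(ref for _node_id, refs in subtree_nodes for ref in refs)
    return all(ref_count[node_id] == internal[node_id]
               for node_id, _refs in subtree_nodes
               if node_id is not None)


ref_count = Counter({"a": 1, "c": 1})
```

Deleting a subtree is allowed only when, for each ID it contains, all references to that ID also come from inside the subtree.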

Valid Insertion

[Figure: time in microseconds (log scale, 100 to 1e+08) versus document size (64K to 2G) for Incr CF, Incr 1,2 CF, Incr Arb, Full Arb, and Full CF]

Valid Deletion

[Figure: time in microseconds (log scale, 100 to 1e+08) versus document size (64K to 2G) for Incr CF, Incr 1,2 CF, Incr Arb, Full Arb, and Full CF]

Invalid Deletion

[Figure: time in microseconds (log scale, 10 to 1000) versus document size (64K to 2G) for Incr CF, Incr 1,2 CF, Incr Arb, Full Arb, and Full CF]

Conclusions

  • Handled insertion and deletion of subtrees (not leaf nodes only).

  • Validated ID and IDREF attributes.

  • Characterized a class of DTDs, which appears to capture most real-life DTDs, that admits a logarithmic-time, constant-space incremental validation algorithm.

  • Conducted experiments showing that the method is practical for large data documents and behaves much better than full revalidation.

Future Work

Handling complex updates that involve several insertions and deletions as a single transaction.
