
Version Management for XML Documents Copy-Based vs Edit-Based Schemes

Version Management for XML Documents Copy-Based vs Edit-Based Schemes. Vassilis J. Tsotras Department of Computer Science and Engineering University of California, Riverside tsotras@cs.ucr.edu. Carlo Zaniolo Computer Science Department University of California, Los Angeles


Presentation Transcript


  1. Version Management for XML Documents: Copy-Based vs Edit-Based Schemes
     Vassilis J. Tsotras, Department of Computer Science and Engineering, University of California, Riverside, tsotras@cs.ucr.edu
     Carlo Zaniolo, Computer Science Department, University of California, Los Angeles, zaniolo@cs.ucla.edu
     Shu-Yao Chien, Computer Science Department, University of California, Los Angeles, csy@cs.ucla.edu

  2. The Problem
     • Managing (storing, querying) multiple versions of documents is important for content providers and cooperative work
     • Temporal DBs: transaction time, CAD/OO applications
     • Web/XML changes and unifies everything
     • Traditional schemes (RCS, SCCS): not optimized for secondary storage, no temporal clustering
     • DB-oriented approaches: not optimized for retrieval of complete documents
     • Transport level: exchange and processing (browser side) of multiversion documents are also critical; storage and exchange representations need to be reconciled.

  3. Version Management: Approaches
     • Time stamping of objects
     • Store all snapshots: fast retrieval, excessive storage
     • Edit-based schemes store the deltas: minimal storage but slow retrieval
     • Traditionally line-oriented DIFF, but applied to semistructured objects in Lorel
     • Our scheme: Usefulness Based Copy Control (UBCC)
       - Separate edit scripts from the objects.
       - Temporal clustering of objects using page usefulness.

  4. Example: an Evolving XML Document
     VERSION 1 (objects in document order 1 to 8)
       <root>
         <ch A>
           <sec D> … </sec>
           <sec E> … </sec>
         </ch>
         <ch B>
           <sec F> … </sec>
           <sec G> … </sec>
           <sec H> … </sec>
         </ch>
       </root>
     VERSION 2 (objects in document order 1 to 9)
       <root>
         <ch A>
           <sec J> … </sec>
           <sec E> … </sec>
         </ch>
         <ch B>
           <sec F> … </sec>
           <sec G’> … </sec>
         </ch>
         <ch K>
           <sec L> … </sec>
         </ch>
       </root>

  5. Temporal Clustering by Page Usefulness
     • Usefulness: percentage of a page occupied by objects from the current version; the rest is occupied by ‘dead’ objects from previous versions
     • We set a minimum usefulness requirement, e.g. 50%
     • When the usefulness of a page falls below this minimum, we copy its live objects to a new page
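The usefulness test can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Page` class, its field names, and the four-unit page capacity are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """A fixed-capacity page holding versioned objects (hypothetical layout)."""
    capacity: int                                  # total space on the page, arbitrary units
    objects: dict = field(default_factory=dict)    # object id -> (size, alive flag)

    def usefulness(self) -> float:
        """Fraction of the page occupied by objects alive in the current version."""
        live = sum(size for size, alive in self.objects.values() if alive)
        return live / self.capacity

    def needs_copy(self, u_min: float) -> bool:
        """True when usefulness has dropped below the minimum requirement."""
        return self.usefulness() < u_min


# Four equally sized objects, two of them deleted in the current version.
page = Page(capacity=4, objects={"O5": (1, True), "O6": (1, True),
                                 "O7": (1, False), "O8": (1, False)})
print(page.usefulness())      # 0.5
print(page.needs_copy(0.7))   # True
```

Measuring usefulness against the page capacity rather than the occupied space is one possible reading of the definition; a partially filled page would need its own rule.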

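  6. Maintaining Page Usefulness above 70% by Copying Alive Objects
     [Figure: VERSION 1 stores objects O1 to O8 on pages P1 and P2. In VERSION 2 three objects are deleted (DEL), leaving U(P1) = 75% and U(P2) = 50% < Umin = 70%. Objects O5 and O6 are copied and stored together with O9 and O10 on a new page P3, U(P3) = 100%.]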

  7. Usefulness Based Copy Control (UBCC)
     • STEP 1: Determine page usefulness for copying.
     • STEP 2: Append new/copied objects into new pages in their logical order.
     [Figure: VERSION 1 stores root, ch A, sec D, sec E, ch B, sec F, sec G, sec H on pages P1 and P2. VERSION 2 deletes three objects (DEL), giving U(P1) = 75% and U(P2) = 50% < Umin = 70%, and applies INS(sec J), INS(sec G’), INS(ch K), INS(sec L) and COPY. The objects sec J, ch B, sec F, sec G’, ch K, sec L are written to the new pages P3, U(P3) = 100%, and P4, U(P4) = 100%.]
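The two steps can be sketched on the running example. This is a simplified illustration under stated assumptions, not the authors' code: pages hold four equally sized objects, a copied object is marked dead on its old page, and the function name `ubcc_step` and the list-based page layout are hypothetical.

```python
PAGE_SIZE = 4   # objects per page (assumed; consistent with the 75% and 50% figures)
U_MIN = 0.7     # minimum usefulness


def ubcc_step(pages, deleted, inserted, new_order):
    """Apply one version change and return the newly written pages.

    pages:     list of pages, each a list of [object_name, alive] entries
    deleted:   names removed in the new version
    inserted:  names added in the new version
    new_order: all names of the new version in logical (document) order
    """
    # Mark objects deleted by the new version as dead.
    for page in pages:
        for entry in page:
            if entry[0] in deleted:
                entry[1] = False

    # STEP 1: find pages whose usefulness fell below U_MIN. Their live
    # objects are copied out, so the old copies no longer count as alive.
    copied = set()
    for page in pages:
        live = [e for e in page if e[1]]
        if live and len(live) / PAGE_SIZE < U_MIN:
            for entry in live:
                copied.add(entry[0])
                entry[1] = False

    # STEP 2: append new and copied objects to fresh pages in logical order.
    to_write = [n for n in new_order if n in copied or n in inserted]
    new_pages = [[[n, True] for n in to_write[i:i + PAGE_SIZE]]
                 for i in range(0, len(to_write), PAGE_SIZE)]
    pages.extend(new_pages)
    return new_pages


p1 = [["root", True], ["ch A", True], ["sec D", True], ["sec E", True]]
p2 = [["ch B", True], ["sec F", True], ["sec G", True], ["sec H", True]]
pages = [p1, p2]
new_pages = ubcc_step(
    pages,
    deleted={"sec D", "sec G", "sec H"},
    inserted={"sec J", "sec G'", "ch K", "sec L"},
    new_order=["root", "ch A", "sec J", "sec E", "ch B", "sec F",
               "sec G'", "ch K", "sec L"],
)
for page in new_pages:
    print([name for name, _ in page])
```

With these inputs the sketch writes sec J, ch B, sec F, sec G' to one new page and ch K, sec L to the next. It ignores how a partially filled last page is treated in later steps.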

  8. Document Object Order
     • Version 2 objects are not stored in sequence (the trailing number is the object's position in Version 2):
       VERSION 2 = (root1, ch A2, sec J3, sec E4, ch B5, sec F6, sec G’7, ch K8, sec L9)
       P1: root1, ch A2, sec D, sec E4
       P2: ch B, sec F, sec G, sec H
       P3: sec J3, ch B5, sec F6, sec G’7
       P4: ch K8, sec L9
     • Hence, we use the edit script to recover the logical order.
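One way to picture the role of the edit script is to replay it over the previous version's object order and then look up where each object is stored. The operation format below, `("DEL", name)` and `("INS", name, after)`, is a hypothetical encoding chosen for this sketch; the slides do not specify one.

```python
def apply_edit_script(order, script):
    """Replay DEL/INS operations on a list of object names in document order."""
    result = list(order)
    for op in script:
        if op[0] == "DEL":
            result.remove(op[1])
        elif op[0] == "INS":
            _, name, after = op
            result.insert(result.index(after) + 1, name)
        else:
            raise ValueError(f"unknown operation: {op[0]}")
    return result


v1 = ["root", "ch A", "sec D", "sec E", "ch B", "sec F", "sec G", "sec H"]
script = [
    ("DEL", "sec D"), ("INS", "sec J", "ch A"),
    ("DEL", "sec G"), ("INS", "sec G'", "sec F"),
    ("DEL", "sec H"), ("INS", "ch K", "sec G'"), ("INS", "sec L", "ch K"),
]
v2 = apply_edit_script(v1, script)

# Page holding the current copy of each Version 2 object (from the layout above).
location = {"root": "P1", "ch A": "P1", "sec E": "P1",
            "sec J": "P3", "ch B": "P3", "sec F": "P3", "sec G'": "P3",
            "ch K": "P4", "sec L": "P4"}
print([location[name] for name in v2])
```

The printed page sequence jumps between P1 and P3, which is why the physical layout alone cannot give the document order.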

  9. Beyond Edit-Based Versioning
     • The UBCC scheme achieves good storage and retrieval efficiency.
     • But it is not suitable at the transport level or for queries on content.
     • Thus, we propose a copy-based model which:
       - exploits shared elements
       - needs no edit script
       - yields a simple XML representation for the document history

  10. The XML Version Model (XVM)
      • XVM is a list of version nodes
      • Each version node is an ordered tree consisting of four types of nodes:
        - element node
        - attribute node
        - text node
        - copy record node
      • Minimal extensions to the XPath data model: the copy record node is actually a link.

  11. Copy-Based XML Version Model (XVM)
      [Figure: two version trees whose nodes are labeled by type. Legend: V = version node, E = element node, T = text node, A = attribute node, C = copy record node. A copy record holds a tree address reference, e.g. Ref: V1.2.1.]

  12. XVM --- Example
      [Figure: three version trees V1, V2 and V3 built from element (E) and copy record (C) nodes. Node labels: chapter “Intro”, chapter “Tutorial”, chapter “Second Ex”; section “Concepts”, section “Scope”, section “Test Data”, section “Context”. Copy record references: V1.1, V2.1, V2.2.1, V2.1.2.]
      Changes from V1 to V2:
        1. DELETE chapter “Tutorial”
        2. INSERT chapter “Second Ex”
      Changes from V2 to V3:
        1. UPDATE the textual content of chapter “Second Ex”
        2. COPY the “Concepts” section and insert after section “Test Data”.

  13. XVM Version Retrieval --- Example
      [Figure: the version trees V1, V2 and V3 of the previous example, with copy record references V1.1, V2.1, V2.2.1 and V2.1.2.]
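Version retrieval in a copy-based model amounts to expanding copy records until none remain. The sketch below is an illustration under assumptions: the `Node` class is hypothetical, and a tree address such as `V2.1` is read as "version 2, first child", with 1-based child positions. The example trees keep only the chapter level of the slides and omit sections.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    kind: str                     # "version", "element", "attribute", "text" or "copy"
    label: str = ""               # element name, text value, etc.
    children: list = field(default_factory=list)
    ref: Optional[str] = None     # tree address, set only on copy record nodes


def resolve(versions, address):
    """Follow a tree address such as 'V1.2' to the node it names."""
    head, *steps = address.split(".")
    node = versions[int(head[1:]) - 1]           # 'V1' -> first version node
    for step in steps:
        node = node.children[int(step) - 1]      # 1-based child position
        if node.kind == "copy":                  # an address may pass through a copy record
            node = resolve(versions, node.ref)
    return node


def materialize(versions, node):
    """Return a copy-free tree for node by expanding every copy record."""
    if node.kind == "copy":
        node = resolve(versions, node.ref)
    return Node(node.kind, node.label,
                [materialize(versions, c) for c in node.children])


v1 = Node("version", "V1", [Node("element", "chapter Intro"),
                            Node("element", "chapter Tutorial")])
v2 = Node("version", "V2", [Node("copy", ref="V1.1"),
                            Node("element", "chapter Second Ex")])
v3 = Node("version", "V3", [Node("copy", ref="V2.1"),
                            Node("element", "chapter Second Ex (updated)")])
versions = [v1, v2, v3]

snapshot = materialize(versions, v3)
print([c.label for c in snapshot.children])
```

Here the copy record in V3 points at V2.1, which is itself a copy record for V1.1, so retrieving V3 follows the chain back to chapter "Intro" in V1.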

  14. XVM Benefits
      • Transport level: represent XVM as an XML document whose DTD is generated automatically from the document DTD
      • Storage level: we extended the usefulness-based temporal clustering scheme to XVM

  15. XVM Implementation --- Use XML to Represent XVM
      • DTD Transformation:
        - Define three new elements: <Repository>, <Version> and <CopyRecord>.
        - For each element in the original DTD, add a CopyRecord to its content model as an alternate.
      • Example:
        Original DTD
          <!ELEMENT volumn (chapter)*>
          <!ELEMENT chapter (title,(sec)*)>
          <!ELEMENT title (#PCDATA)>
          <!ELEMENT sec (#PCDATA)>
          . . .
        Version DTD
          <!ELEMENT Repository (Version)+>
          <!ELEMENT Version (volumn)>
          <!ELEMENT CopyRecord EMPTY>
          <!ATTLIST CopyRecord Ref IDREF #REQUIRED>
          <!ELEMENT volumn (chapter)*>
          <!ELEMENT chapter ((title,(sec)*)|CopyRecord)>
          <!ELEMENT title (#PCDATA|CopyRecord)*>
          <!ELEMENT sec (#PCDATA|CopyRecord)*>
          . . .
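The per-element rewrite can be sketched as a small string transformation. This is an illustrative sketch only: the function name `add_copy_record` is hypothetical, the regular expression handles just simple one-line declarations, and text-only elements are emitted in the mixed-content form `(#PCDATA|CopyRecord)*` because that is the form a DTD accepts for content that includes `#PCDATA`.

```python
import re

# Matches a simple one-line declaration: <!ELEMENT name content-model>
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+(\w+)\s*(.+?)\s*>")


def add_copy_record(decl: str) -> str:
    """Rewrite one element declaration so a CopyRecord may stand in for its content."""
    match = ELEMENT_DECL.fullmatch(decl.strip())
    if match is None:
        raise ValueError(f"not an element declaration: {decl!r}")
    name, model = match.groups()
    if "#PCDATA" in model:
        # Assumes a text-only element; other mixed-content members are not preserved.
        new_model = "(#PCDATA|CopyRecord)*"
    else:
        new_model = f"({model}|CopyRecord)"
    return f"<!ELEMENT {name} {new_model}>"


original = [
    "<!ELEMENT chapter (title,(sec)*)>",
    "<!ELEMENT title (#PCDATA)>",
    "<!ELEMENT sec (#PCDATA)>",
]
for decl in original:
    print(add_copy_record(decl))
```

A complete generator would also emit the `Repository`, `Version` and `CopyRecord` declarations and decide how to treat the root element, which the example on the slide leaves unchanged.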

  16. Performance and Storage Cost

  17. Conclusion
      • UBCC is efficient at the storage level.
      • The copy-based scheme is effective as a storage representation and a transport representation.
      • Our current research focuses on efficient evaluation of queries on versions:
        - content queries,
        - snapshot queries,
        - history queries.
