University of Crete, Department of Computer Science, ΗΥ-561 Web Data Management


Presentation Transcript


  1. University of Crete, Department of Computer Science
     ΗΥ-561 Web Data Management
     XML Data Archiving
     Konstantinos Kouratoras

  2. What is the problem?
     • Most research focuses on database content
     • Updates usually overwrite the existing state
     • Research on database history is needed
     • Scientific evidence is lost
     • The basis of findings cannot be verified

  3. Why is this interesting?
     • History of the data
     • Scientific research
       • SWISS-PROT (protein sequences)
       • OMIM (human genes and genetic disorders)
     • Great deal of manual labour
     • Continuous changes
     • Access to old versions

  4. First Approach
     • Object matching across versions
     • Change descriptions
     • Archive space
     • Efficient queries over history

  5. Proposed technique (1/2)
     Based on:
     • Hierarchical data
     • Key-structured databases
     • Accretive databases

  6. Proposed technique (2/2)
     • Merging versions into one hierarchy
     • Elements stored once
     • Timestamps
       • Sequence of versions
       • Time intervals
       • Inheritance
     • Keys for element identification
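
The timestamp bullets above can be illustrated with a minimal Python sketch. It assumes versions are numbered with consecutive integers, that a timestamp is stored as a list of inclusive intervals, and that a node without its own timestamp inherits its parent's. The function names are hypothetical and not taken from the slides.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # inclusive (first_version, last_version)


def to_intervals(versions: List[int]) -> List[Interval]:
    """Collapse a sequence of version numbers into inclusive intervals."""
    intervals: List[Interval] = []
    for v in sorted(set(versions)):
        if intervals and v == intervals[-1][1] + 1:
            # v directly follows the last interval, so extend it
            intervals[-1] = (intervals[-1][0], v)
        else:
            intervals.append((v, v))
    return intervals


def format_intervals(intervals: List[Interval]) -> str:
    """Render intervals compactly, e.g. [(1, 3), (5, 5)] as '1-3,5'."""
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in intervals)


def effective_timestamp(own: Optional[List[Interval]],
                        parent: List[Interval]) -> List[Interval]:
    """Inheritance: a node that stores no timestamp uses its parent's."""
    return parent if own is None else own
```

Storing intervals instead of every version number keeps the timestamp short for elements that persist across many versions, and inheritance means only nodes whose lifetime differs from their parent's need a timestamp at all.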

  7. Example

  8. XML Model (1/3)
     • Node values
       • T-node: data value
       • A-node: attribute name, attribute value
       • E-node (internal node): tag name
         • List of values of E and T children
         • Set of values of A children
     • Node value equality
       • Two nodes are equal if they agree on their value
     • Path expression
       • Sequence of node names

  9. XML Model (2/3)
     • Key
       • Pair of path expressions (Q, {P1,…,Pk})
       • Q: target set of nodes
       • {P1,…,Pk}: key constraints on Q
     • Relative key
       • Description dependent on ancestor node key
       • Weak entities

  10. XML Model (3/3)
      • Keys for the previous example
        • (/,(db,{}))
          At most one db element at the root
        • (/db,(address,{}))
          At most one address under the db node
        • (/db,(emp,{id}))
          Every employee within a db element can be uniquely identified by the id subelement
        • (/db/emp,(name,{})), (/db/emp,(sal,{})), (/db/emp,(tel,{}))
          There can be at most one name, sal and tel node for each employee
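
As a rough illustration of what these keys require, the following Python sketch checks a document against them using the standard `xml.etree.ElementTree` module. The tuple encoding `(context path, target tag, key paths)` and the names `KEYS`, `context_nodes` and `violations` are assumptions made for this example. The root key (/,(db,{})) is omitted because a parsed XML document always has exactly one root element.

```python
import xml.etree.ElementTree as ET

# (context path, target tag, key paths), following the keys listed above
KEYS = [
    ("/db", "address", ()),
    ("/db", "emp", ("id",)),
    ("/db/emp", "name", ()),
    ("/db/emp", "sal", ()),
    ("/db/emp", "tel", ()),
]


def context_nodes(root, context):
    """Return the elements selected by an absolute path such as /db/emp."""
    steps = [s for s in context.split("/") if s]
    if not steps or steps[0] != root.tag:
        return []
    nodes = [root]
    for step in steps[1:]:
        nodes = [child for n in nodes for child in n.findall(step)]
    return nodes


def violations(root, keys=KEYS):
    """List (context, target, key value) for every duplicated key value."""
    found = []
    for context, target, key_paths in keys:
        for ctx in context_nodes(root, context):
            seen = set()
            for el in ctx.findall(target):
                # An empty key path set yields (), so a second element
                # with the same tag under one context is a violation.
                key_value = tuple(el.findtext(p) for p in key_paths)
                if key_value in seen:
                    found.append((context, target, key_value))
                seen.add(key_value)
    return found
```

A document with two `emp` elements carrying the same `id` would be reported once, under the `/db` context.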

  11. Components (1/4)
      • Archiver components overview
      [Diagram of the Archiver. Labels: Archive, New version, Keys, Annotate Keys (twice), Keys and Timestamps, Nested Merge, New Archive]

  12. Components (2/4)
      • Annotate keys
        • Annotation of elements with key values
        • Uniquely identified nodes
        • Path from root to node
        • Key annotation
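
A minimal sketch of key annotation in Python, assuming a simplified key specification that maps each tag to its key paths (ignoring the context path) and a bracketed label syntax such as `emp[id=7]`. Both are illustrative choices, not the notation of the original work.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified key specification: tag -> key paths
KEY_PATHS = {
    "db": (), "address": (), "emp": ("id",),
    "name": (), "sal": (), "tel": (),
}


def annotate(el, prefix=""):
    """Yield (annotated path, element) for every keyed element."""
    if el.tag not in KEY_PATHS:
        return  # unkeyed element: treated as plain content of its parent
    key = ",".join(f"{p}={el.findtext(p)}" for p in KEY_PATHS[el.tag])
    label = f"{prefix}/{el.tag}" + (f"[{key}]" if key else "")
    yield label, el
    for child in el:
        yield from annotate(child, label)
```

The annotated path from the root identifies each keyed node uniquely, which is what allows the same element to be recognised in two different versions.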

  13. Components (3/4)
      • Nested merge
        • Identify corresponding elements
        • Merge elements
        • Update sets of timestamps
        • Nodes with no corresponding element
          • Simply added
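
The merge steps can be sketched in Python under some simplifying assumptions: a version is a nested dict keyed by key-annotated labels, every archive node stores its full set of version numbers (no interval compression or inheritance), and leaf values are ignored. The name `nested_merge` and the dict layout are hypothetical.

```python
def nested_merge(archive, version, number):
    """Merge one key-annotated version (a nested dict) into the archive in place.

    archive: {label: {"t": set of version numbers, "children": {...}}}
    version: {label: {child label: {...}}}
    """
    for label, subtree in version.items():
        # A node with no corresponding archive node is simply added.
        node = archive.setdefault(label, {"t": set(), "children": {}})
        # Corresponding nodes are merged by extending the timestamp.
        node["t"].add(number)
        nested_merge(node["children"], subtree, number)
    return archive
```

Archive nodes that do not appear in the new version are left untouched, so their timestamps end at the previous version.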

  14. Components (4/4)

  15. Experimental Results (1/2)
      • Competing techniques
        • Incremental diff
        • Cumulative diff
      • Compression methods
        • Gzip (text)
        • XMill (XML)
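
To make the two diff baselines concrete, here is a rough Python sketch using the standard `difflib` module. It assumes that an incremental-diff repository stores the first version plus the diff between each pair of consecutive versions, and that a cumulative-diff repository stores the first version plus the diff of every later version against the first. The slides do not spell out these definitions, so treat them as an assumption. Sizes are counted in characters of unified-diff output, which is only a stand-in for real storage cost.

```python
import difflib


def diff_size(a, b):
    """Number of characters in the unified diff between two lists of lines."""
    return sum(len(line) for line in difflib.unified_diff(a, b, lineterm=""))


def incremental_cost(versions):
    """Total diff size when each version is diffed against its predecessor."""
    return sum(diff_size(p, c) for p, c in zip(versions, versions[1:]))


def cumulative_cost(versions):
    """Total diff size when each version is diffed against the first one."""
    return sum(diff_size(versions[0], v) for v in versions[1:])
```

Both functions exclude the cost of storing the first version itself, which is the same in either scheme.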

  16. Experimental Results (2/2)

  17. Efficient Retrievals (1/2)
      • Version retrieval
        • Binary tree for each node x with children as leaves
          • Timestamp
          • Archive offset
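
One way to read the binary tree above is sketched below in Python. The slide only names the tree, the timestamp and the archive offset; the assumption here is that each leaf holds one child's timestamp and offset, and that each internal node holds the union of the timestamps beneath it so that a search for version i can skip whole subtrees. The names `TsNode`, `build` and `offsets_for` are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass
class TsNode:
    versions: FrozenSet[int]          # union of the timestamps of all leaves below
    offset: Optional[int] = None      # archive offset, set on leaves only
    left: Optional["TsNode"] = None
    right: Optional["TsNode"] = None


def build(children: List[Tuple[FrozenSet[int], int]]) -> Optional[TsNode]:
    """Build a balanced tree over the (timestamp, offset) pairs of one node's children."""
    if not children:
        return None
    if len(children) == 1:
        versions, offset = children[0]
        return TsNode(versions, offset)
    mid = len(children) // 2
    left, right = build(children[:mid]), build(children[mid:])
    return TsNode(left.versions | right.versions, None, left, right)


def offsets_for(tree: Optional[TsNode], version: int) -> List[int]:
    """Archive offsets of the children that exist in the given version."""
    if tree is None or version not in tree.versions:
        return []  # prune: nothing below this node belongs to the version
    if tree.offset is not None:
        return [tree.offset]
    return offsets_for(tree.left, version) + offsets_for(tree.right, version)
```

With such an index, retrieving a version reads only the archive regions at the returned offsets instead of scanning every child.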

  18. Efficient Retrievals (2/2)
      • Temporal history retrieval
        • Find keyed node x
        • Set of keyed children
          • Archive offset, timestamp offset
        • Sort list
        • Repeat for each keyed node
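
A simplified, in-memory Python sketch of temporal history retrieval follows. It does not model the archive offsets or the sorted child lists mentioned above. Instead it assumes the archive is a nested dict keyed by key-annotated labels, where a node may omit its timestamp `"t"` and then inherits the nearest ancestor's. The function name `history` and the dict layout are illustrative only.

```python
def history(archive, path):
    """Return the set of versions in which the node at `path` exists, or None.

    archive: {label: {"t": set of versions (optional), "children": {...}}}
    path:    list of key-annotated labels from the root to the node
    """
    children, versions = archive, None
    for label in path:
        node = children.get(label)
        if node is None:
            return None  # no node with this key was ever archived
        # Inherit the ancestor's timestamp when the node stores none.
        versions = node.get("t", versions)
        children = node.get("children", {})
    return versions
```

Because each step follows a key, the lookup visits one node per level rather than comparing whole versions.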

  19. Conclusion
      • Efficient archiving technique
      • Meaningful change descriptions
      • Space overhead comparable to the diff approach
      • OMIM archive for a year
        • Less than 1.12 times the space of the last version
        • Less than 1.08 times the size of incremental diff
        • 40% compression with an XML compression tool
      • Works well with XML compression
      • Basic operations in a single pass
      • XML output (further use)

  20. Xarch (1/2)
      • Archiving tool
      • Extends the archiving technique
        • Sort elements by key
        • External merge sort
      • Query language
        • Version retrieval
        • History tracking

  21. Xarch (2/2)
      • Query language example
