ViST: a dynamic index method for querying XML data by tree structures

Authors: Haixun Wang, Sanghyun Park, Wei Fan, Philip Yu

Presenter: Elena Zheleva, November 2003

Overview
  • Modeling XML Queries
  • Structure-encoded sequences
  • Indexing
  • ViST
  • Experimental Results
DTD of purchase records:

<!ELEMENT purchases (purchase*)>
<!ELEMENT purchase (seller, buyer)>
<!ATTLIST seller ID ID location CDATA name CDATA>
<!ELEMENT seller (item*)>
<!ATTLIST buyer ID ID location CDATA name CDATA>
<!ELEMENT item (item*)>
<!ATTLIST item name CDATA manufacturer CDATA>

Modeling XML Queries
  • Focus in XML query language design: ability to express complex structural or graphical queries
Modeling XML Queries
  • Querying XML data = finding substructures of the data graph that match the query structure
  • Structure-encoded sequences: a sequential representation of both XML data and XML queries
Structure-Encoded Sequences
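  • Both the XML data and the queries are mapped to sequences
  • A query is answered by finding (non-contiguous) subsequence matches in the data sequences
  • Purpose: to avoid as many join operations as possible
  • Def.: a sequence of (symbol, prefix) pairs, where symbol is a node of the document tree and prefix is the path from the root to that node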
Mapping Data
  • Represent the XML document tree in preorder
  • Encode the preorder traversal as a structure-encoded sequence
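
The mapping above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function name `structure_encoded_sequence` is hypothetical, full tag names stand in for symbols, and attributes and text values are ignored.

```python
import xml.etree.ElementTree as ET

def structure_encoded_sequence(root):
    """Return [(symbol, prefix), ...] in preorder.

    symbol is the element tag; prefix is the tuple of tags on the path
    from the document root down to the element's parent.
    Assumption: attributes and text values are not encoded here.
    """
    seq = []

    def visit(node, prefix):
        seq.append((node.tag, prefix))          # emit the node before its children (preorder)
        for child in node:
            visit(child, prefix + (node.tag,))  # children see this node in their prefix

    visit(root, ())
    return seq

# A tiny document shaped like the purchase records DTD (made-up sample).
doc = ET.fromstring("<purchase><seller><item/><item/></seller><buyer/></purchase>")
seq = structure_encoded_sequence(doc)
for symbol, prefix in seq:
    print(symbol, "/".join(prefix))
```

Each pair carries its own root path, so the sequence alone is enough to recover where a node sits in the tree.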
Mapping Queries
  • Benefit of sequence matching: the query is processed as a whole, not split into sub-queries whose results must be joined
  • Path Expression
Querying XML
  • through Structure-Encoded Sequence Matching
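
As a rough sketch of the matching idea, the test below checks whether a query sequence occurs in a data sequence as a non-contiguous subsequence. It assumes both are lists of (symbol, prefix) pairs, that the helper name `is_subsequence` is hypothetical, and that wildcards (// and *) are not handled; it shows only the plain subsequence test, not the full matching procedure.

```python
def is_subsequence(query, data):
    """True if the items of query appear in data in the same order, gaps allowed."""
    it = iter(data)
    # `q in it` advances the iterator until q is found, so order is enforced.
    return all(q in it for q in query)

# Data sequence for a small purchase record (made-up sample).
data = [
    ("purchase", ()),
    ("seller", ("purchase",)),
    ("item", ("purchase", "seller")),
    ("item", ("purchase", "seller")),
    ("buyer", ("purchase",)),
]

# Query: purchase/seller/item
query_hit = [
    ("purchase", ()),
    ("seller", ("purchase",)),
    ("item", ("purchase", "seller")),
]
# Query: purchase/buyer/item (no such path in the data)
query_miss = [
    ("purchase", ()),
    ("buyer", ("purchase",)),
    ("item", ("purchase", "buyer")),
]

print(is_subsequence(query_hit, data))
print(is_subsequence(query_miss, data))
```

A linear scan like this is what the index is meant to replace: the indexing methods that follow aim to find the same matches without reading every data sequence.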
Role of Indexing
  • To provide an algorithm to perform this sequence matching
  • Desired features for algorithm:
    • Efficient support for subsequence matching
    • Use well-supported DB indexing techniques such as B+ trees
    • Allow dynamic index insertion
What is indexing useful for?
  • Auxiliary access structures
    • Used to speed up the retrieval of records
    • In response to certain search conditions
  • Provide efficient support for arbitrary structured queries
    • Using wild-cards // and *
Indexing
  • State-of-the-art approaches
    • Indexes on paths
    • Indexes on nodes
    • Indexes on both (structures) – ViST
Algorithms
  • Naïve Algorithm based on Suffix Trees
  • RIST: Relationships Indexed Suffix Tree
  • ViST: Virtual Suffix Tree
Algorithm Using Suffix Trees
  • Suffix Tree: a compact index to all distinct, contiguous substrings of a string
  • D-Ancestorship – in XML doc tree
  • Through structure-encoded sequence
  • S-Ancestorship – in suffix tree
Algorithm Using Suffix Trees
  • Searches
    • first by S-Ancestorship: searching under suffix tree
    • then by D-Ancestorship: matching nodes and prefixes
  • Disadvantages:
    • Costly – traverse large portion of subtree
    • Most commercial DBMSs do not support suffix trees
RIST: Indexing by Ancestor-Descendant Relationships
  • Jumps directly to the nodes Y to which X is both a D-Ancestor and S-Ancestor
  • Index Construction: uses B+ trees
RIST: Indexing by Ancestor-Descendant Relationships
  • Subsequence Matching
  • Determine D-Ancestorship by prefixes
  • Determine S-Ancestorship by label <n_x, size_x>
  • x – suffix tree node (root of S-tree)
  • n_x – prefix traversal order of x
  • size_x – number of descendants of x
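
The labeling can be illustrated with a small sketch. It assumes a tree given as a dict of child lists, and the helper names `label_tree` and `is_s_ancestor` are hypothetical. With preorder numbers and descendant counts, y lies below x exactly when n_y falls in the range (n_x, n_x + size_x].

```python
def label_tree(children, root):
    """Assign each node the label (n, size): preorder number and descendant count."""
    labels = {}
    counter = 0

    def visit(node):
        nonlocal counter
        counter += 1
        n = counter                      # number the node before its children (preorder)
        size = sum(visit(c) + 1 for c in children.get(node, []))
        labels[node] = (n, size)
        return size                      # number of descendants of this node

    visit(root)
    return labels

def is_s_ancestor(labels, x, y):
    """True if x is a proper ancestor of y, decided from the labels alone."""
    n_x, size_x = labels[x]
    n_y, _ = labels[y]
    return n_x < n_y <= n_x + size_x

# Made-up tree: a has children b and e; b has children c and d.
children = {"a": ["b", "e"], "b": ["c", "d"]}
labels = label_tree(children, "a")
print(labels)
print(is_s_ancestor(labels, "b", "d"), is_s_ancestor(labels, "b", "e"))
```

Because the test is a range comparison on integers, it fits an ordered index such as a B+ tree, where all descendants of x can be fetched with one range scan.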
ViST: the Virtual Suffix Tree
  • Same sequence algorithm as RIST
  • BUT supports dynamic insertions
  • Uses dynamic method to assign labels
  • Once assigned, the labels are fixed and are not affected by subsequent data insertion or deletion
  • Labeling the suffix tree w/o building it
  • Relies on statistical information about the XML data
ViST: the Virtual Suffix Tree

Index structure contains the sequence:

Sequence to be inserted:

Dynamic scope of x = <n_x, size_x, k_x>
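
One way to picture dynamic scopes is the sketch below. It is not the allocation rule from the slides: the class `Scope`, the parameter `lam`, and the rule "each new child receives 1/lam of the parent's remaining range" are assumptions chosen for illustration, with k counting the subscopes handed out so far. The point it shows is that a label, once assigned, does not change when later children are inserted.

```python
class Scope:
    """A node label <n, size, k>: the node owns the numbers n .. n + size."""

    def __init__(self, n, size):
        self.n = n
        self.size = size
        self.k = 0            # number of subscopes allocated so far
        self._next = n + 1    # first number not yet given to a child

    def allocate_child(self, lam=2):
        """Give a new child 1/lam of the remaining range (assumed rule)."""
        remaining = self.n + self.size - self._next + 1
        share = remaining // lam
        if share < 1:
            raise OverflowError("scope exhausted; relabeling would be needed")
        child = Scope(self._next, share - 1)   # child owns _next .. _next + share - 1
        self._next += share
        self.k += 1
        return child

root = Scope(0, 1023)
first = root.allocate_child()
second = root.allocate_child()
print((first.n, first.size), (second.n, second.size), root.k)
```

In this sketch later children receive smaller ranges, which is why an allocation rule informed by statistics about the data, as the slides mention, would matter in practice.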

Experimental Results
  • Datasets used
    • DBLP: CS bibliography DB
      • 289,627 records/publications
      • Each publication – tree of max depth 6
      • Avg length of structure-encoded seq = 31
    • XMARK
      • 1 record
      • Complicated tree structure
    • Synthetic
Experimental Results
  • Comparison Methods
    • Index Fabric Algorithm – XML paths
    • XISS – uses nodes as basic query unit
    • ViST – takes approx. 1/10 of the time to perform queries, because the other methods need (multiple) join operations
Experimental Results - remove
  • Index Structure and Size (1/3 less from suffix tree)
    • DocId B+ Tree – N elements
    • Combined D-ancestor and S-ancestor B+ tree - N x L elements
  • Index Construction
Conclusion
  • XML Queries = Subsequence Matching
  • Advantages of ViST – algorithm for subsequence matching
    • Avoids expensive join operations
    • Index on both content and structure of XML documents
    • B+ trees – well supported for disk-based data
    • Dynamic data insertion and deletion