Covering Index for Branching Path Queries

Raghav Kaushik

University of Wisconsin

Philip Bohannon

Bell Laboratories

Jeffrey F. Naughton

University of Wisconsin

Henry F. Korth

Bell Laboratories

SIGMOD 2002

Presented by: Yu Fan

Overview
  • Motivation
  • Problem
  • Introduction
  • Background
  • Covering Index Definition Scheme
  • Performance Study
  • Conclusion
Motivation
  • Covering indexes are a well-known technique in relational database systems
    • Define an index that “covers” all attributes of a table that are referenced in a query
    • Evaluate the query without accessing the table
    • Speed up query processing
  • Can covering indexes be used to accelerate branching path queries?
    • Yes
Problem
  • Existing indexes are large in practice
    • DataGuide
    • 1-Index
    • Forward and Backward Index (F&B-Index)
The Labeled Graph Data Model
  • Model XML or semi-structured data as a directed, node-labeled tree with an extra set of special edges called idref edges
  • Directed graph
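
The data model above can be sketched as a small Python structure. This is only an illustration: the class name `DataGraph`, its fields, and the sample node ids are assumptions, not something taken from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class DataGraph:
    """Node-labeled tree plus a separate set of idref edges (hypothetical layout)."""
    labels: dict = field(default_factory=dict)     # node id -> label
    tree_edges: set = field(default_factory=set)   # (parent, child) pairs
    idref_edges: set = field(default_factory=set)  # (source, target) pairs

    def add_node(self, node, label, parent=None):
        self.labels[node] = label
        if parent is not None:
            self.tree_edges.add((parent, node))

    def add_idref(self, source, target):
        self.idref_edges.add((source, target))

    def children(self, node):
        return sorted(c for p, c in self.tree_edges if p == node)


# Sample data (made up for illustration).
g = DataGraph()
g.add_node(1, "ROOT")
g.add_node(2, "metro", parent=1)
g.add_node(3, "neighborhoods", parent=2)
g.add_node(4, "neighborhood", parent=3)
g.add_node(5, "business", parent=4)
g.add_node(6, "hotel", parent=2)
g.add_idref(5, 6)  # the business node points to the hotel node through an idref edge
```

Keeping tree edges and idref edges in separate sets makes it easy to index only one kind of edge later.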
Branching Path Expressions
  • Forward and Backward Separators
    • If n_i and n_{i+1} are separated by
      • /: then n_i is the parent of n_{i+1}
      • //: then n_i is an ancestor of n_{i+1}
      • ⇒: then n_i points to n_{i+1} through an idref edge
      • \: then n_i is the child of n_{i+1}
      • \\: then n_i is a descendant of n_{i+1}
      • ⇐: then n_i is pointed to by n_{i+1} through an idref edge
Branching Path Expressions
  • Label-path
    • A sequence of labels l_1, l_2, …, l_p separated by the separators
  • Node-path
    • A sequence of nodes n_1, n_2, …, n_p separated by the separators
  • A node-path matches a label-path if the corresponding separators are the same and label(n_i) = l_i
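
The matching rule can be written as a short check. In this sketch a path is represented as a pair (items, separators), which is an assumed encoding. The function only compares labels and separators, as in the definition above; it does not verify that the node-path actually exists in a data graph.

```python
def matches(node_path, label_path, label_of):
    """Return True if the node-path matches the label-path.

    node_path  = (nodes, separators), label_path = (labels, separators);
    label_of maps a node id to its label. All names here are hypothetical.
    """
    nodes, node_seps = node_path
    labels, label_seps = label_path
    return (
        len(nodes) == len(labels)
        and list(node_seps) == list(label_seps)
        and all(label_of[n] == l for n, l in zip(nodes, labels))
    )


label_of = {1: "ROOT", 2: "metro", 3: "neighborhoods", 4: "neighborhood"}
node_path = ([1, 2, 4], ["/", "//"])

same_separators = matches(node_path, (["ROOT", "metro", "neighborhood"], ["/", "//"]), label_of)
different_separators = matches(node_path, (["ROOT", "metro", "neighborhood"], ["/", "/"]), label_of)
```

Here `same_separators` is true and `different_separators` is false, because the second label-path uses `/` where the node-path uses `//`.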
Branching Path Expressions
  • Primary path is the path that remains when all parts between brackets “[” and “]” are removed.
  • Example:

ROOT/metro/neighborhoods/neighborhood[/business⇒hotel]/cultural⇒museum
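
A minimal sketch of extracting the primary path from an expression string follows. It assumes the expression is plain text with balanced brackets, and it writes the idref separator as the ASCII stand-in `=>`. The function name is hypothetical.

```python
def primary_path(expr):
    """Drop every bracketed part, including nested brackets."""
    kept = []
    depth = 0
    for ch in expr:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        elif depth == 0:
            kept.append(ch)
    return "".join(kept)


example = "ROOT/metro/neighborhoods/neighborhood[/business=>hotel]/cultural=>museum"
result = primary_path(example)
# result == "ROOT/metro/neighborhoods/neighborhood/cultural=>museum"
```

Tracking the bracket depth rather than using a single regular expression keeps nested conditions such as `a/b[/c[\d]]/e` correct.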

Index Graph
  • Index graph I(G), where G is the data graph
  • For each node A in I(G), ext(A), the extent of A, is a subset of V_G, the set of data nodes
  • Query result
    • Evaluate a branching path expression P on I(G)
    • The result is the union of the extents of the index nodes that result from evaluating P on I(G)
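
To make the "union of extents" step concrete, here is a sketch that evaluates a simple child-only path (every separator is `/`) on a tiny index graph. The function name, the dictionary layout, and the sample index are illustrative assumptions. Full branching path evaluation is not attempted.

```python
def evaluate_simple_path(path_labels, index_label, index_edges, extent, roots):
    """Evaluate l_1/l_2/.../l_p on the index graph and return the union of extents."""
    current = {a for a in roots if index_label[a] == path_labels[0]}
    for lab in path_labels[1:]:
        current = {b for (a, b) in index_edges
                   if a in current and index_label[b] == lab}
    result = set()
    for a in current:
        result |= extent[a]  # union of the extents of the matching index nodes
    return result


# Hypothetical index graph: four index nodes with their extents (data node ids).
index_label = {"A": "ROOT", "B": "metro", "C": "neighborhood", "D": "business"}
index_edges = {("A", "B"), ("B", "C"), ("C", "D")}
extent = {"A": {1}, "B": {2}, "C": {4, 7}, "D": {5, 8}}

answer = evaluate_simple_path(["ROOT", "metro", "neighborhood", "business"],
                              index_label, index_edges, extent, roots={"A"})
```

The query touches only the four index nodes, yet it returns both data nodes in the extent of `D`.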
Bisimilarity
  • Definition: a symmetric, binary relation ≈ on V_G is called a bisimulation if, for any two data nodes u and v with u ≈ v, we have that:
    • u and v have the same label
    • If par_u is the parent of u and par_v is the parent of v, then par_u ≈ par_v
    • If u’ points to u through an idref edge, then there is a v’ that points to v through an idref edge such that u’ ≈ v’, and vice versa.
DataGuide
  • Concise and accurate structural summaries of semi-structured databases
1-Index
  • Index graph constructed on the data graph G using bisimulation
  • Intuition: group nodes together if they have the same incoming paths
Forward and Backward Index
  • Construct the F&B-Index on an edge-labeled data graph
    • For every (edge) label l, add a new label l^-1
    • For every edge e labeled l from node u to node v, add an (inverse) edge e^-1 with label l^-1 from v to u
    • Compute the 1-Index (or DataGuide) on this modified graph
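
The inverse-edge step of this construction can be sketched as follows. Edges are assumed to be `(source, label, target)` triples, and the `^-1` suffix is just a naming convention chosen for this example.

```python
def add_inverse_edges(edges):
    """Return the edge set extended with one inverse edge per original edge."""
    inverse = {(v, label + "^-1", u) for (u, label, v) in edges}
    return edges | inverse


edges = {(1, "metro", 2), (2, "hotel", 3)}
extended = add_inverse_edges(edges)
# extended now also contains (2, "metro^-1", 1) and (3, "hotel^-1", 2)
```

Computing a 1-Index on `extended` then takes both incoming and outgoing paths of the original graph into account.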
Succ-Stable and Pred-Stable
  • For a set of nodes A, let Succ(A) denote the set of successors of the nodes in A.
  • Given two sets of data graph nodes A and B, A is said to be succ-stable with respect to B if either A is a subset of Succ(B) or A and Succ(B) are disjoint
  • Pred-stable is defined in the same way, using predecessors instead of successors
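
The succ-stable test translates directly into set operations. In this sketch edges are plain `(u, v)` pairs and the function names are hypothetical.

```python
def succ(nodes, edges):
    """Successors of a set of nodes: targets of edges that start in the set."""
    return {v for (u, v) in edges if u in nodes}


def is_succ_stable(a, b, edges):
    """A is succ-stable w.r.t. B if A is inside Succ(B) or disjoint from it."""
    s = succ(b, edges)
    return a <= s or a.isdisjoint(s)


edges = {(1, 2), (1, 3), (2, 4)}
inside = is_succ_stable({2, 3}, {1}, edges)    # {2, 3} is a subset of Succ({1})
disjoint = is_succ_stable({4}, {1}, edges)     # {4} does not meet Succ({1})
straddles = is_succ_stable({2, 4}, {1}, edges) # 2 is a successor of 1, 4 is not
```

A pred-stable check would look the same with the edge direction swapped.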
Stability
  • If A is succ-stable with respect to B and there is an edge from B to A, then every node in the extent of A has a parent in the extent of B
  • Important for the precision of the index graph
  • Stabilize A with respect to B
    • Split A into A1 and A2
    • A1 is A ∩ Succ(B)
    • A2 is A – Succ(B)
  • 1-Index
    • Initialization by label grouping
    • Split the label grouping until we obtain a succ-stable refinement
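
The split-until-stable procedure can be sketched as a naive refinement loop. It is written for clarity rather than speed (efficient partition-refinement algorithms exist), and the function names and the sample tree are assumptions made for illustration.

```python
from collections import defaultdict


def succ(nodes, edges):
    return {v for (u, v) in edges if u in nodes}


def one_index_partition(label, edges):
    """Start from label grouping, then split blocks until all are succ-stable."""
    groups = defaultdict(set)
    for node, lab in label.items():
        groups[lab].add(node)
    partition = list(groups.values())

    changed = True
    while changed:
        changed = False
        for b in list(partition):
            s = succ(b, edges)
            refined = []
            for a in partition:
                a1, a2 = a & s, a - s  # A1 = A ∩ Succ(B), A2 = A – Succ(B)
                if a1 and a2:
                    refined += [a1, a2]
                    changed = True
                else:
                    refined.append(a)
            partition = refined
    return {frozenset(block) for block in partition}


# Sample tree: ROOT(1) has children a(2), a(3), b(4); each has one child labeled c.
label = {1: "ROOT", 2: "a", 3: "a", 4: "b", 5: "c", 6: "c", 7: "c"}
edges = {(1, 2), (1, 3), (1, 4), (2, 5), (3, 6), (4, 7)}
blocks = one_index_partition(label, edges)
```

In this sample the three `c` nodes start in one block, and node 7 is split off because its parent is labeled `b` rather than `a`.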
Another View of F&B-Index
  • Another way to build F&B-Index
    • Reverse all edges in G
    • Compute the bisimilarity partition
    • Set the current partition to what is output by the previous step
    • Reverse edges in G again
    • Compute the bisimilarity partition
    • Set the current partition to what is output by the previous step
    • Repeat the above steps till the current partition does not change
  • Obtain a partition of the data nodes that is both succ-stable and pred-stable
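
The alternating procedure can be sketched with a signature-based refinement, which is a swapped-in but equivalent way to reach a stable partition. A node's signature is its current block plus the set of blocks of its predecessors. The function names, the `max_iterations` parameter, and the sample tree are assumptions.

```python
def refine_until_stable(block_of, edges):
    """Refine a partition until nodes in one block see the same predecessor blocks."""
    while True:
        preds = {n: set() for n in block_of}
        for u, v in edges:
            preds[v].add(block_of[u])
        ids = {}
        new = {}
        for n in block_of:
            signature = (block_of[n], frozenset(preds[n]))
            new[n] = ids.setdefault(signature, len(ids))
        if len(set(new.values())) == len(set(block_of.values())):
            return new  # no block was split, so the partition is stable
        block_of = new


def fb_partition(label, edges, max_iterations=None):
    """Alternate refinement on reversed and original edges until nothing changes."""
    reversed_edges = {(v, u) for (u, v) in edges}
    block_of = dict(label)  # initial partition: group by label
    iterations = 0
    while max_iterations is None or iterations < max_iterations:
        before = len(set(block_of.values()))
        block_of = refine_until_stable(block_of, reversed_edges)
        block_of = refine_until_stable(block_of, edges)
        iterations += 1
        if len(set(block_of.values())) == before:
            break
    blocks = {}
    for n, b in block_of.items():
        blocks.setdefault(b, set()).add(n)
    return {frozenset(s) for s in blocks.values()}


# Sample tree: three "a" nodes under ROOT; two have a "b" child, one has a "c" child.
label = {1: "ROOT", 2: "a", 3: "a", 4: "a", 5: "b", 6: "b", 7: "c"}
edges = {(1, 2), (1, 3), (1, 4), (2, 5), (3, 6), (4, 7)}
fb_blocks = fb_partition(label, edges)
```

Refining on the original edges alone would keep nodes 2, 3 and 4 together. The reversed-edge pass separates node 4 because its child has a different label.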
Size of the F&B-Index
  • F&B-Index over a data graph G covers all branching path expressions over G
  • Any index graph that covers all branching path expressions over G must be a refinement of the F&B-Index
  • F&B-Index is the smallest index graph that covers all branching path expressions over G
  • F&B-Index is often big. It can approach the size of the base data itself
Covering Index Definition Scheme
  • Eliminate branching path expressions that are deemed less important
  • A smaller index handles the remaining branching path queries more efficiently
  • Four approaches toward this goal
    • Tags to be indexed
    • Tree edges vs idref edges
    • Exploiting local similarity
    • Restricting tree depth
Tags to Be Indexed
  • Tags that are never queried
    • Need not be indexed
    • Replace the label with a single special label: other
    • If such a node is not on the tree path to any indexed node, it can be treated as absent
  • Can have a large effect in practice
    • XMark data, 100 MB (1.43M nodes)
    • F&B-Index has 436,000 nodes
    • Ignore text tags such as bold and emph
    • Number of nodes drops to 18,000
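
One way to apply this restriction before building an index is sketched below. Nodes with an indexed tag and their ancestors are kept, and kept nodes with a non-indexed tag are relabeled `other`. The function name, the `parent` map, and the sample tags are assumptions.

```python
def restrict_tags(label, parent, indexed_tags):
    """Return the labels of the nodes that remain after restricting the tag set."""
    keep = set()
    for node, lab in label.items():
        if lab in indexed_tags:
            cur = node
            # Keep the indexed node and every ancestor on its tree path.
            while cur is not None and cur not in keep:
                keep.add(cur)
                cur = parent.get(cur)
    return {n: (label[n] if label[n] in indexed_tags else "other") for n in keep}


label = {1: "ROOT", 2: "item", 3: "description", 4: "bold", 5: "name"}
parent = {2: 1, 3: 2, 4: 3, 5: 3}
reduced = restrict_tags(label, parent, indexed_tags={"ROOT", "item", "name"})
```

In this sample the `description` node is kept as `other` because it lies on the path to `name`, while the `bold` node is dropped.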
Tree Edges vs idref Edges
  • Effect of idref edges
    • XMark data
    • F&B-Index on tree edges and idref edges has 1.35M nodes (ignoring text nodes)
    • F&B-Index on only tree edges has 18,000 nodes (ignoring text nodes)
  • Give tree edges priority
  • Specify the set of idref edges to be indexed
Exploiting Local Similarity
  • Observations:
    • Most queries refer to short paths and seldom ask for long paths
    • Two nodes may be locally similar yet be stored in different extents because of a variety of long, complex paths
  • Exploiting local similarity
    • Give up absolute precision and group similar pieces of data together
    • A(k)-Index
K-Bisimulation
  • Definition: ≈_k (k-bisimilarity) is defined inductively
    • For any two nodes u and v, u ≈_0 v iff u and v have the same label
    • u ≈_k v iff u ≈_{k-1} v, par_u ≈_{k-1} par_v, and
    • for every u’ that points to u through an idref edge, there is a v’ that points to v through an idref edge such that u’ ≈_{k-1} v’, and vice versa
A(k)-Index
  • Constructed on the data graph G using k-bisimulation
  • Precise for any simple path expression of length less than or equal to k
  • Use k to control the size of the index and the maximum area of the index graph affected
  • Increasing k refines the partition until a fixed point is reached, which is the 1-Index.
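
The effect of k can be seen in a small sketch that runs exactly k refinement rounds starting from label grouping. It treats tree edges and idref edges uniformly as `(u, v)` pairs, which is a simplifying assumption. The function name and the sample tree are hypothetical.

```python
def k_bisimulation(label, edges, k):
    """Partition nodes by k-bisimilarity using k rounds of refinement."""
    block_of = dict(label)  # round 0: same label
    for _ in range(k):
        preds = {n: set() for n in label}
        for u, v in edges:
            preds[v].add(block_of[u])
        # Round i: same (i-1)-block and same set of predecessor (i-1)-blocks.
        block_of = {n: (block_of[n], frozenset(preds[n])) for n in label}
    blocks = {}
    for n, b in block_of.items():
        blocks.setdefault(b, set()).add(n)
    return {frozenset(s) for s in blocks.values()}


label = {1: "ROOT", 2: "a", 3: "a", 4: "b", 5: "c", 6: "c", 7: "c", 8: "d", 9: "d"}
edges = {(1, 2), (1, 3), (1, 4), (2, 5), (3, 6), (4, 7), (5, 8), (7, 9)}

sizes = [len(k_bisimulation(label, edges, k)) for k in range(4)]
```

In this sample the two `d` nodes stay together for k = 0 and k = 1 and are separated at k = 2, when the difference two levels up becomes visible. Larger k leaves the partition unchanged.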
Restricting Tree Depth
  • Tree Depth
    • Given a branching path expression
    • All nodes on the primary path have tree-depth 0
    • Nodes that do not have tree-depth 0 and have a path from some node in the primary path have tree-depth 1
    • Nodes that do not have tree-depth 1 and have a path to some node of tree-depth 1 have tree-depth 2
    • Nodes that do not have tree-depth 2 and have a path from some node of tree-depth 2 have tree-depth 3
    • And so on…
  • Tree depth of a query is the maximum tree-depth of its nodes
Tree Depth Example
  • Query example
    • //museums/history/museum[/featured and cultural\neighborhood[/cultural⇒museum[\art]]]
    • Asks for history museums that have a featured exhibit and also have an art museum in the same neighborhood
F+B-Index
  • Consider one iteration of F&B-Index Computation
    • Reverse all edges in G.
    • Compute the bisimilarity partition
    • Reverse edges in G again
    • Compute the bisimilarity partition
  • Call this index graph F+B-Index
  • F+B+F+B-Index: two iterations
F+B-Index
  • F+B-Index is accurate for branching path expressions that have tree depth at most 1
  • F+B+F+B-Index is accurate for branching path expressions that have tree depth at most 3
  • Cannot handle all queries
  • Meaningful queries often have small tree depth
Putting It Together
  • Index definition
    • A set of tags T to be indexed.
    • For each of the forward and backward directions
      • Set of idref edges to be indexed (denoted ref_fwd and ref_back)
      • The extent of local similarity desired (denoted k_fwd and k_back)
    • Tree depth td, the number of iterations in the F&B-Index computation to be performed
Example
  • Tags to be indexed
    • ROOT, metro, cinema-hall, neighborhoods, neighborhood, business
  • Local similarity
    • k_fwd = k_back = ∞
  • Tree depth
    • td = ∞

Figure: resulting index graph, with index nodes labeled ROOT, metro, business, neighborhoods, neighborhood, Cinema-halls, and Cinema-hall; the extents shown include 9,10 and 24,26.

Index Selection
  • Given a query, an index covers it if
    • Every tag in the query is indexed
    • k_fwd ≥ path length of the query
    • k_back ≥ path length of the query
    • td ≥ tree depth of the query
  • The more generic the index, the more queries it covers, but the worse the performance we get.
  • Depends heavily on the data and the queries
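
The selection conditions can be sketched as a simple predicate over an index definition and a query profile. Both classes and their field names are assumptions made for this example. The idref edge sets (ref_fwd, ref_back) are left out because the conditions above do not mention them.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexDefinition:
    tags: frozenset
    k_fwd: float = math.inf
    k_back: float = math.inf
    td: float = math.inf


@dataclass(frozen=True)
class QueryProfile:
    tags: frozenset
    path_length: int
    tree_depth: int


def covers(index, query):
    """Check the four conditions: tags, k_fwd, k_back, and td."""
    return (
        query.tags <= index.tags
        and index.k_fwd >= query.path_length
        and index.k_back >= query.path_length
        and index.td >= query.tree_depth
    )


index = IndexDefinition(
    tags=frozenset({"ROOT", "metro", "neighborhoods", "neighborhood", "business"}),
    k_fwd=3, k_back=3, td=1,
)
short_query = QueryProfile(frozenset({"neighborhood", "business"}), path_length=2, tree_depth=1)
long_query = QueryProfile(frozenset({"ROOT", "business"}), path_length=4, tree_depth=1)
unindexed_tag = QueryProfile(frozenset({"neighborhood", "hotel"}), path_length=2, tree_depth=0)

candidates = [q for q in (short_query, long_query, unindexed_tag) if covers(index, q)]
```

Only `short_query` passes: `long_query` exceeds k_fwd and k_back, and `unindexed_tag` refers to a tag outside the indexed set.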
Performance Study
  • XMark XML benchmark dataset
    • Models an auction site
Performance on Queries
  • Use index definitions 5, 6, and 8, called I_all, I_almost-all, and I_specific
  • Use 5 different queries
  • Some indexes may not cover all of the queries because of the reductions
  • Three scenarios
    • RELSTORE: data stored in a relational system
    • NSTORE: data stored using a native storage engine
    • RELPUBLISH: data stored in a relational system, with queries over an XML view of the data
Performance on Queries

Figure: three panels, (a) RELSTORE, (b) NSTORE, (c) RELPUBLISH.

Conclusion
  • Covering indexes are a promising approach to the efficient evaluation of branching path queries
  • The F&B-Index is a covering index for the set of all branching path queries, but it is too big in practice
  • Using the index definition scheme, we can get much smaller covering indexes that cover certain classes of queries