Covering index for branching path queries

Raghav Kaushik (University of Wisconsin), Philip Bohannon (Bell Laboratories), Jeffrey F. Naughton (University of Wisconsin), Henry F. Korth (Bell Laboratories). SIGMOD 2002. Presented by: Yu Fan.

Covering Index for Branching Path Queries

Raghav Kaushik

University of Wisconsin

Philip Bohannon

Bell Laboratories

Jeffrey F Naughton

University of Wisconsin

Henry F Korth

Bell Laboratories

SIGMOD 2002

Presented by: Yu Fan


Overview

  • Motivation

  • Problem

  • Introduction

  • Background

  • Covering Index Definition Scheme

  • Performance Study

  • Conclusion


Motivation

  • Covering indexes are a well-known technique in relational database systems

    • Define an index that “covers” all attributes of a table that are referenced in a query

    • Evaluate the query from the index alone, without accessing the table

    • Speed up query performance

  • Can covering indexes be used to accelerate branching path queries?

    • Yes


Problem

  • Existing indexes are large in practice

    • DataGuide

    • 1-Index

    • Forward and Backward Index (F&B Index)


The Labeled Graph Data Model

  • Model XML or semi-structured data as a directed, node-labeled tree with an extra set of special edges called idref edges

  • Directed graph



Branching Path Expressions

  • Forward and Backward Separators

    • If n_i and n_{i+1} are separated by

      • /: then n_i is the parent of n_{i+1}

      • //: then n_i is an ancestor of n_{i+1}

      • ⇒: then n_i points to n_{i+1} through an idref edge

      • \: then n_i is the child of n_{i+1}

      • \\: then n_i is a descendant of n_{i+1}

      • ⇐: then n_i is pointed to by n_{i+1} through an idref edge


Branching Path Expressions

  • Label-path

    • A sequence of labels l_1, l_2, …, l_p separated by the separators

  • Node-path

    • A sequence of nodes n_1, n_2, …, n_p separated by the separators

  • A node-path matches a label-path if the corresponding separators are the same and label(n_i) = l_i for every i


Branching Path Expressions

  • Primary path is the path that remains when all parts between brackets “[” and “]” are removed.

  • Example:

    ROOT/metro/neighborhoods/neighborhood[/business ⇒ hotel]/cultural ⇒ museum
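
As a rough illustration of the primary-path definition, the Python sketch below drops every bracketed part of an expression, including nested brackets. The function name `primary_path` and the ASCII spelling `=>` for an idref separator are assumptions made for this sketch; they are not notation from the slides.

```python
def primary_path(expr: str) -> str:
    """Return expr with every bracketed part removed (nested brackets included)."""
    kept = []
    depth = 0  # current bracket nesting level
    for ch in expr:
        if ch == "[":
            depth += 1
        elif ch == "]":
            if depth == 0:
                raise ValueError("unbalanced ']' in expression")
            depth -= 1
        elif depth == 0:
            kept.append(ch)  # only characters outside all brackets survive
    if depth != 0:
        raise ValueError("unbalanced '[' in expression")
    return "".join(kept)


# Hypothetical ASCII rendering of the example expression.
example = "ROOT/metro/neighborhoods/neighborhood[/business=>hotel]/cultural=>museum"
print(primary_path(example))
```

For the example expression this leaves ROOT/metro/neighborhoods/neighborhood/cultural=>museum.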


Index Graph

  • Index Graph I(G), where G is the data graph

  • For each node A in I(G), ext(A), the extent of A, is a subset of V_G, the set of data nodes

  • Query result

    • Evaluate a branching path expression P on I(G)

    • The result is the union of the extents of the index nodes that result from evaluating P on I(G)
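
To make the "union of extents" step concrete, here is a minimal Python sketch that evaluates a child-only label path on a toy index graph. The dictionary layout, the node names `A` to `D`, and the function `evaluate_on_index` are all hypothetical, and the sketch handles only the `/` separator, starting at any index node whose label equals the first label.

```python
def evaluate_on_index(labels, edges, extents, path):
    """Evaluate a child-only label path on an index graph.

    labels:  index node -> label
    edges:   index node -> list of child index nodes
    extents: index node -> set of data nodes (the extent)
    path:    list of labels, read as l_1/l_2/.../l_p
    """
    # Index nodes that match the first label.
    current = {a for a, lab in labels.items() if lab == path[0]}
    # Follow one index edge per remaining label.
    for lab in path[1:]:
        current = {b for a in current for b in edges.get(a, ()) if labels[b] == lab}
    # The query result is the union of the extents of the matching index nodes.
    result = set()
    for a in current:
        result |= extents[a]
    return result


labels = {"A": "ROOT", "B": "metro", "C": "hotel", "D": "hotel"}
edges = {"A": ["B"], "B": ["C", "D"]}
extents = {"A": {1}, "B": {2}, "C": {3, 4}, "D": {5}}
print(evaluate_on_index(labels, edges, extents, ["ROOT", "metro", "hotel"]))
```

Both index nodes labeled hotel match, so their extents {3, 4} and {5} are merged into one answer.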


Bisimilarity

  • Definition: a symmetric, binary relation ≈ on V_G is called a bisimulation if, for any two data nodes u and v with u ≈ v, we have that:

    • u and v have the same label

    • If par_u is the parent of u and par_v is the parent of v, then par_u ≈ par_v

    • If u’ points to u through an idref edge, then there is a v’ that points to v through an idref edge such that u’ ≈ v’, and vice versa


DataGuide

  • Concise and accurate structural summaries of semi-structured databases


1-index

  • Index graph constructed on the data graph G using bisimulation

  • Intuition: group nodes together if they have the same incoming paths


Forward and Backward index

  • Construct the F&B-Index on an edge-labeled data graph

    • For every (edge) label l, add a new label l^-1

    • For every edge e labeled l from node u to node v, add an (inverse) edge e^-1 with label l^-1 from v to u

    • Compute the 1-Index (or DataGuide) on this modified graph
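
The first two steps of this construction can be sketched in Python as follows. The edge-list representation `(u, label, v)`, the function name `add_inverse_edges`, and the `^-1` suffix used to spell the inverse label are assumptions for illustration; the final step (computing the 1-Index on the modified graph) is not shown here.

```python
def add_inverse_edges(edges):
    """Return the edge list extended with one inverse edge per original edge.

    edges: list of (u, label, v) triples for an edge-labeled data graph.
    Each edge (u, l, v) contributes an extra edge (v, l^-1, u).
    """
    inverse = [(v, label + "^-1", u) for (u, label, v) in edges]
    return list(edges) + inverse


graph = [(1, "metro", 2), (2, "hotel", 3)]
for edge in add_inverse_edges(graph):
    print(edge)
```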


Succ-Stable and Pred-Stable

  • For a set of nodes A, let Succ(A) denote the set of successors of the nodes in A

  • Given two sets of data graph nodes A and B, A is said to be succ-stable with respect to B if either A is a subset of Succ(B) or A and Succ(B) are disjoint

  • Pred-stable is defined analogously, using predecessors in place of successors
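
A small Python sketch of the two stability tests, under the assumption that the data graph is stored as a dictionary from each node to its list of successors. The helper names (`succ`, `pred`, `is_succ_stable`, `is_pred_stable`) are hypothetical, and the pred-stable test assumes Pred(B) simply means the set of predecessors of the nodes in B.

```python
def succ(edges, nodes):
    """Successors of a set of nodes."""
    return {v for u in nodes for v in edges.get(u, ())}


def pred(edges, nodes):
    """Predecessors of a set of nodes."""
    return {u for u, vs in edges.items() if any(v in nodes for v in vs)}


def is_succ_stable(a, b, edges):
    """A is succ-stable w.r.t. B if A is inside Succ(B) or disjoint from it."""
    s = succ(edges, b)
    return a <= s or a.isdisjoint(s)


def is_pred_stable(a, b, edges):
    """Same test with predecessors in place of successors."""
    p = pred(edges, b)
    return a <= p or a.isdisjoint(p)


# Toy graph: node 1 has children 2 and 3; 2 has children 4 and 6; 3 has child 5.
edges = {1: [2, 3], 2: [4, 6], 3: [5]}
print(is_succ_stable({4, 5, 6}, {2}, edges))  # only 4 and 6 are successors of 2
print(is_succ_stable({4, 6}, {2}, edges))
```

In the toy graph, {4, 5, 6} is not succ-stable with respect to {2} because it straddles Succ({2}) = {4, 6}, while {4, 6} and {5} taken separately are.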


Stability

  • If A is succ-stable with respect to B and there is an edge from B to A, then every node in the extent of A has a parent in the extent of B

  • Important for the precision of the index graph

  • Stabilize A with respect to B

    • Split A into A1 and A2

    • A1 is A ∩ Succ(B)

    • A2 is A − Succ(B)

  • 1-Index

    • Initialize by grouping nodes by label

    • Keep splitting the label groups until we obtain a succ-stable refinement
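
The splitting loop for the 1-Index can be sketched as below. This is a naive refinement written for readability, not the efficient algorithm a real system would use; the graph representation (a successor dictionary that mixes tree and idref edges) and the name `one_index_partition` are assumptions.

```python
def one_index_partition(labels, edges):
    """Split label groups until every block is succ-stable w.r.t. every block.

    labels: data node -> label
    edges:  data node -> list of successor data nodes
    Returns a set of frozensets, one per index node extent.
    """
    # Initialization: one block per label.
    groups = {}
    for node, lab in labels.items():
        groups.setdefault(lab, set()).add(node)
    partition = [frozenset(g) for g in groups.values()]

    changed = True
    while changed:
        changed = False
        for b in partition:
            succ_b = {v for u in b for v in edges.get(u, ())}
            refined = []
            for a in partition:
                inside, outside = a & succ_b, a - succ_b
                if inside and outside:
                    # A straddles Succ(B): split it into A1 and A2.
                    refined += [inside, outside]
                    changed = True
                else:
                    refined.append(a)
            if changed:
                partition = refined
                break  # restart with the refined partition
    return set(partition)


labels = {1: "ROOT", 2: "a", 3: "b", 4: "c", 5: "c", 6: "c"}
edges = {1: [2, 3], 2: [4, 6], 3: [5]}
print(one_index_partition(labels, edges))
```

Nodes 4 and 6 stay together because both are reached by ROOT/a/c, while node 5 is separated because it is reached by ROOT/b/c.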


Another View of F&B-Index

  • Another way to build F&B-Index

    • Reverse all edges in G

    • Compute the bisimilarity partition

    • Set the current partition to what is output by the previous step

    • Reverse edges in G again

    • Compute the bisimilarity partition

    • Set the current partition to what is output by the previous step

    • Repeat the above steps until the current partition no longer changes

  • The result is a partition of the data nodes that is both succ-stable and pred-stable
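
The alternating procedure above might look like this in Python. It is a naive sketch: `stabilize` refines a partition until it is succ-stable under a given edge map, and running it on the reversed edges yields pred-stability on the original graph. All names (`reverse`, `stabilize`, `fb_partition`) and the dictionary representation are assumptions, and tree and idref edges are treated uniformly.

```python
def reverse(edges):
    """Return the edge map with every edge reversed."""
    rev = {}
    for u, vs in edges.items():
        for v in vs:
            rev.setdefault(v, []).append(u)
    return rev


def stabilize(partition, edges):
    """Refine `partition` until every block is succ-stable under `edges`."""
    partition = set(partition)
    while True:
        for b in partition:
            succ_b = {v for u in b for v in edges.get(u, ())}
            to_split = {a for a in partition if a & succ_b and a - succ_b}
            if to_split:
                pieces = {p for a in to_split for p in (a & succ_b, a - succ_b)}
                partition = (partition - to_split) | pieces
                break  # restart the scan on the refined partition
        else:
            return partition  # no block needed splitting


def fb_partition(labels, edges):
    """Alternate backward and forward refinement until nothing changes."""
    groups = {}
    for node, lab in labels.items():
        groups.setdefault(lab, set()).add(node)
    partition = {frozenset(g) for g in groups.values()}
    rev = reverse(edges)
    while True:
        refined = stabilize(stabilize(partition, rev), edges)
        if refined == partition:
            return partition
        partition = refined


labels = {1: "ROOT", 2: "a", 3: "a", 4: "b", 5: "c"}
edges = {1: [2, 3], 2: [4], 3: [5]}
print(fb_partition(labels, edges))
```

In this toy graph, nodes 2 and 3 have the same incoming path, so forward refinement alone keeps them together; the backward pass separates them because one has a b child and the other a c child.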


Size of the F&B-Index

  • The F&B-Index over a data graph G covers all branching path expressions over G

  • Any index graph that covers all branching path expressions over G must be a refinement of the F&B-Index

  • The F&B-Index is therefore the smallest index graph that covers all branching path expressions over G

  • The F&B-Index is often big: it can approach the size of the base data itself


Covering Index Definition Scheme

  • Eliminate branching path expressions that are deemed less important

  • A smaller index handles the remaining branching path queries more efficiently

  • Four approaches towards the goal

    • Tags to be indexed

    • Tree edges vs idref edges

    • Exploiting local similarity

    • Restricting tree depth


Tags to be indexed

  • Tags that are never queried

    • Need not be indexed

    • Replace the label with a single special label: other

    • If such a node is not on the tree path to any indexed node, it can be treated as absent

  • Can have a large effect in practice

    • XMark data, 100 MB (1.43M nodes)

    • F&B-Index has 436,000 nodes

    • Ignore text tags such as bold and emph

    • Number of nodes drops to 18,000
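
The relabeling step can be sketched in a few lines of Python. The function name `relabel_unindexed` and the dictionary representation are assumptions; the sketch only replaces labels and does not remove nodes that lie off every tree path to an indexed node.

```python
def relabel_unindexed(labels, indexed_tags, other="other"):
    """Map every label outside indexed_tags to the single label `other`."""
    return {
        node: (lab if lab in indexed_tags else other)
        for node, lab in labels.items()
    }


labels = {1: "item", 2: "description", 3: "bold", 4: "emph"}
print(relabel_unindexed(labels, {"item", "description"}))
```

After relabeling, bold and emph nodes share one label, so a partition that starts from label groups can no longer split blocks on the difference between them.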


Tree Edges vs idref Edges

  • Effect of idref edges

    • XMark data

    • F&B-Index on tree edges and idref edges has 1.35M nodes (ignore text nodes)

    • F&B-Index on only tree edges has 18000 nodes (ignore text nodes)

  • Give tree edges priority

  • Specify the set of idref edges to be indexed


Exploiting Local similarity

  • Observations:

    • Most queries refer to short paths and seldom ask for long paths

    • Two nodes may be locally similar yet be placed in different extents because of differences in long, complex paths

  • Exploiting local similarity

    • Give up absolute precision and group similar pieces of data together

    • A(k)-Index


K-bisimulation

  • Definition: k (k-bisimilarity) is defined inductively

    • For any two nodes, v and v, u 0 v iff u and v have the same label

    • Node u kv iff u k-1v, paru k-1 parv

    • For every u’ that points to u through an idref edge, there is a v’ that points to v through an idref edge such that u’ k-1 v’, and vice versa


A(k)-index

  • Constructed on data graph G using k-bisimulation

  • Precise for any simple path expression of length less than or equal to k

  • Use k to control the size of the index and the maximum area of the index graph affected

  • Increasing k refines the partition until a fixed point is reached, which is the 1-Index
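
A minimal Python sketch of how the k-bisimilarity classes could be computed by k rounds of refinement. As a simplification, it treats tree parents and idref predecessors uniformly as "incoming edges"; the function name `ak_partition` and the dictionary representation are assumptions, and the nested class identifiers are only practical for small k.

```python
def ak_partition(labels, edges, k):
    """Partition data nodes into k-bisimilarity classes.

    labels: data node -> label
    edges:  data node -> list of successor data nodes
    """
    # Collect the predecessors of every node.
    preds = {n: set() for n in labels}
    for u, vs in edges.items():
        for v in vs:
            preds[v].add(u)

    # Round 0: the class of a node is its label.
    cls = dict(labels)
    # Round i: pair the previous class of a node with the set of previous
    # classes of its predecessors.
    for _ in range(k):
        cls = {n: (cls[n], frozenset(cls[p] for p in preds[n])) for n in labels}

    blocks = {}
    for n, c in cls.items():
        blocks.setdefault(c, set()).add(n)
    return {frozenset(b) for b in blocks.values()}


labels = {1: "ROOT", 2: "a", 3: "b", 4: "c", 5: "c", 6: "d", 7: "d"}
edges = {1: [2, 3], 2: [4], 3: [5], 4: [6], 5: [7]}
for k in range(3):
    print(k, len(ak_partition(labels, edges, k)))
```

In this toy graph the two d nodes share a block for k = 0 and k = 1, since their parents are both labeled c, and are separated only at k = 2, when the different grandparents a and b become visible.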



Restricting Tree Depth

  • Tree Depth

    • Given a branching path expression

    • All nodes on the primary path have tree-depth 0

    • Nodes that do not have tree-depth 0 and have a path from some node in the primary path have tree-depth 1

    • Nodes that do not have tree-depth 1 and have a path to some node of tree-depth 1 have tree-depth 2

    • Nodes that do not have tree-depth 2 and have a path from some node of tree-depth 2 have tree-depth 3

    • And so on…

  • Tree depth of a query is the maximum tree-depth of its nodes


Tree Depth Example

  • Query example

    • //museums/history/museum[/featured and ⇐cultural\neighborhood[/cultural ⇒ museum[\art]]]

    • Asks for history museums that have a featured exhibit and also have an art museum in the same neighborhood


F+B-Index

  • Consider one iteration of F&B-Index Computation

    • Reverse all edges in G.

    • Compute the bisimilarity partition

    • Reverse edges in G again

    • Compute the bisimilarity partition

  • Call this index graph F+B-Index

  • F+B+F+B-Index: two iterations


F+B-Index

  • F+B-Index is accurate for branching path expressions that have tree depth at most 1

  • F+B+F+B-Index is accurate for branching path expressions that have tree depth at most 3

  • Cannot handle all queries

  • Meaningful queries often have a small tree depth


Putting it together

  • Index definition

    • A set of tags T to be indexed.

    • For each of the forward and backward directions

      • Set of idref edges to be indexed (denoted ref_fwd and ref_back)

      • The extent of local similarity desired (denoted k_fwd and k_back)

    • Tree depth td: the number of iterations of the F&B-Index computation to be performed




Example

  • Tags to be indexed

    • ROOT, metro, cinema-hall, neighborhoods, neighborhood, business

  • Local similarity

    • k_fwd = k_back = ∞

    • td = ∞

[Figure: resulting index graph over the nodes ROOT, metro, neighborhoods, neighborhood, business, cinema-halls and cinema-hall; the annotations 9,10 and 24,26 appear next to two of the nodes]



Index Selection

  • Given a query, an index covers it if

    • Every tag in the query is indexed

    • k_fwd ≥ path length of the query

    • k_back ≥ path length of the query

    • td ≥ tree depth of the query

  • The more general the index, the more queries it covers, but the worse its performance

  • Depends heavily on the data and the queries
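
The selection conditions above can be written as a simple check. The class and field names below (`IndexDefinition`, `QueryProfile`, `covers`) are hypothetical, `math.inf` stands in for ∞, and the idref edge sets ref_fwd and ref_back are left out of this sketch for brevity.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexDefinition:
    tags: frozenset            # tags T to be indexed
    k_fwd: float = math.inf    # forward local similarity
    k_back: float = math.inf   # backward local similarity
    td: float = math.inf       # tree depth (number of F&B iterations)


@dataclass(frozen=True)
class QueryProfile:
    tags: frozenset    # tags mentioned in the query
    path_length: int   # path length of the query
    tree_depth: int    # tree depth of the query


def covers(index: IndexDefinition, query: QueryProfile) -> bool:
    """Check the four selection conditions listed on the slide."""
    return (
        query.tags <= index.tags
        and index.k_fwd >= query.path_length
        and index.k_back >= query.path_length
        and index.td >= query.tree_depth
    )


general = IndexDefinition(tags=frozenset({"ROOT", "metro", "business"}))
narrow = IndexDefinition(tags=frozenset({"ROOT", "metro"}), k_fwd=2, k_back=2, td=1)
query = QueryProfile(tags=frozenset({"ROOT", "metro"}), path_length=3, tree_depth=1)
print(covers(general, query), covers(narrow, query))
```

Here the general definition covers the query, while the narrow one does not because its k values are smaller than the query's path length.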


Performance study

  • XMark XML benchmark dataset

    • Models an auction site




Performance on Queries

  • Use index definitions 5, 6 and 8, called I_all, I_almost-all and I_specific

  • Use 5 different queries

  • Some indexes may not cover every query because of the reductions applied

  • Three scenarios

    • RELSTORE: data stored in a relational system

    • NSTORE: data stored using a native storage engine

    • RELPUBLISH: data stored in a relational system, with queries posed over an XML view of the data



Performance on Queries

[Figure: query performance under (a) RELSTORE, (b) NSTORE and (c) RELPUBLISH]


Conclusion

  • Covering indexes are a promising approach to the efficient evaluation of branching path queries

  • The F&B-Index is a covering index for the set of all branching path queries, but it is too big in practice

  • Using the index definition scheme, we can get much smaller covering indexes that cover certain classes of queries


