
Covering Indexes for Branching Path Queries

Covering Indexes for Branching Path Queries. Raghav Kaushik, Philip Bohannon, Jeffrey F Naughton and Henry F Korth. XML as Graph Data. Leaf nodes with attributes are suppressed. Non-tree edges model IDREF relationships in the document. Branching Path Expression.


Presentation Transcript


  1. Covering Indexes for Branching Path Queries • Raghav Kaushik, Philip Bohannon, Jeffrey F Naughton and Henry F Korth • Abdullah Mueen

  2. XML as Graph Data • Leaf nodes with attributes are suppressed. • Non-tree edges model IDREF relationships in the document. [Figure: XML document drawn as a data graph; callouts: oid, label(3).]

  3. Branching Path Expression • ROOT/metro/neighborhoods/neighborhood[/business=>cinema-hall]/cultural=>museum

  4. Example (1) • //hotel[/star][<=business\neighborhood[/cultural=>museum[\art]]]

  5. Covering Index • A covering index can answer any query from a given set of queries without consulting the original document. • The goal of this paper is to find a covering index for "Branching Path Queries".

  6. k-bisimilarity • Two nodes u and v are called k-bisimilar (u ≈k v) if label(u) = label(v) and every incoming label path of length ≤ k to u matches at least one incoming label path of length ≤ k to v, and vice versa. • ≈k is an equivalence relation on the nodes of G; its equivalence classes partition the nodes. • The algorithm for computing k-bisimulation will be shown later. • Examples: 2,4 are 0-bisimilar; 5,7 are 1-bisimilar; 8,9 are 2-bisimilar; 6,8 are 1-bisimilar. [Figure: example data graph with nodes 0:R, 1:A, 2:B, 3:C, 4:B, 5:C, 6:D, 7:C, 8:D, 9:D.]
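A minimal sketch of how a k-bisimilarity partition can be computed, using the usual inductive formulation (start from label grouping, then refine k times by the blocks of each node's parents). The node labels are taken from the slide; the edge list is an assumption, chosen only so that the slide's four examples hold, because the figure's edges are not recoverable from the transcript. The function name `k_bisimilar_blocks` is hypothetical.

```python
from collections import defaultdict

# Node labels as listed on the slide (oid -> label).
LABELS = {0: "R", 1: "A", 2: "B", 3: "C", 4: "B",
          5: "C", 6: "D", 7: "C", 8: "D", 9: "D"}

# ASSUMPTION: parent -> child edges. The original figure's edges are lost;
# these were picked to be consistent with the examples on the slide.
EDGES = [(0, 1), (0, 2), (0, 3), (1, 4), (2, 5),
         (3, 6), (4, 7), (5, 8), (7, 9)]


def k_bisimilar_blocks(labels, edges, k):
    """Return the blocks of the k-bisimilarity partition as sorted lists."""
    parents = defaultdict(set)
    for src, dst in edges:
        parents[dst].add(src)

    # 0-bisimilarity: two nodes are equivalent iff they carry the same label.
    block = dict(labels)

    for _ in range(k):
        # A node's signature is its current block plus the set of blocks
        # its parents belong to; equal signatures stay together.
        signature = {
            n: (block[n], frozenset(block[p] for p in parents[n]))
            for n in labels
        }
        ids = {}
        block = {n: ids.setdefault(s, len(ids)) for n, s in signature.items()}

    groups = defaultdict(list)
    for n in sorted(labels):
        groups[block[n]].append(n)
    return sorted(groups.values())


if __name__ == "__main__":
    for k in range(4):
        print(k, k_bisimilar_blocks(LABELS, EDGES, k))
```

Under the assumed edges, k = 1 keeps 5 with 7 and 6 with 8 and 9, while k = 2 leaves only 8 and 9 together, which agrees with the examples listed on the slide.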

  7. 1-index: Covering Index for Simple Path Expressions [Figure: data graph G and the index graphs A(0), A(1), A(2) and A(3) = 1-index, each derived from the previous one by a SuccStable step; every index node is annotated with its extent, e.g. {2,4}, {3,5,7}, {6,8,9} in A(0).]

  8. Inverse edges [Figure: the example data graph (nodes 0-9) shown twice, one caption per copy.] • 5,7 are not 1-bisimilar • 5,7 are 1-bisimilar

  9. The F&B index • Repeat until there is no change: • Reverse all edges • Compute the Forward Bisimilarity Partition • Reverse all edges again • Compute the Backward Bisimilarity Partition
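The loop above can be sketched as follows, reading it as "alternate the two refinements until the partition stops changing". This is an illustration, not the authors' implementation: instead of physically reversing edges it refines by children (forward) and by parents (backward), and the small graph and all names (`fb_blocks`, `stabilize`, `EXAMPLE_EDGES`) are made up for the example.

```python
from collections import defaultdict

# Hypothetical toy graph: node -> label, and parent -> child edges.
EXAMPLE_LABELS = {"r": "r", "a1": "a", "a2": "a", "a3": "a",
                  "b1": "b", "b2": "b", "b3": "b", "c": "c"}
EXAMPLE_EDGES = [("r", "a1"), ("r", "a2"), ("r", "a3"),
                 ("a1", "b1"), ("a2", "b2"), ("a2", "c"), ("a3", "b3")]


def refine(block, neighbours):
    """One refinement step: split blocks by the blocks of each node's neighbours."""
    signature = {
        n: (block[n], frozenset(block[m] for m in neighbours.get(n, ())))
        for n in block
    }
    ids = {}
    return {n: ids.setdefault(s, len(ids)) for n, s in signature.items()}


def stabilize(block, neighbours):
    """Refine until stable. A step never merges blocks, so an unchanged
    block count means the partition itself is unchanged."""
    while True:
        new = refine(block, neighbours)
        if len(set(new.values())) == len(set(block.values())):
            return new
        block = new


def fb_blocks(labels, edges):
    """Alternate forward and backward bisimulation until nothing changes."""
    children, parents = defaultdict(set), defaultdict(set)
    for src, dst in edges:
        children[src].add(dst)
        parents[dst].add(src)

    block = dict(labels)  # start from label grouping
    while True:
        before = len(set(block.values()))
        block = stabilize(block, children)  # forward bisimilarity partition
        block = stabilize(block, parents)   # backward bisimilarity partition
        if len(set(block.values())) == before:
            break

    groups = defaultdict(list)
    for n in sorted(labels):
        groups[block[n]].append(n)
    return sorted(groups.values())


if __name__ == "__main__":
    print(fb_blocks(EXAMPLE_LABELS, EXAMPLE_EDGES))
```

In the toy graph, a2 has a child labelled c that a1 and a3 lack, so the forward step separates a2, and the following backward step then separates b2 from b1 and b3. Backward bisimulation alone would keep all three a nodes together.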

  10. Forward Bisimulation [Figure: four copies of the example data graph (nodes 0-9); the partition markings are not recoverable from the transcript.]

  11. Backward Bisimulation [Figure: three copies of the example data graph (nodes 0-9); the partition markings are not recoverable from the transcript.]

  12. Properties of the F&B index • The F&B index over a data graph G covers all branching path expressions. • The F&B index is the smallest index that covers branching path queries. • In practice, the F&B index is large for most real documents.

  13. 1. Tags to be indexed • Some tags are not used in queries, e.g. bold, emph. • We specify the set of tags to be indexed. • In a 100MB document, the F&B index on all tags has 436,000 nodes, while the F&B index that ignores formatting tags has 18,000 nodes.

  14. 2. IDREF edges to be indexed • IDREF edges are not followed by the // operator. • IDREF edges are named explicitly in the path expression with the => operator. • We specify the set of IDREF edges to be indexed. • For the 100MB document, the F&B index has 1.3 million nodes with all IDREF edges indexed, versus 18,000 nodes with no IDREF edges and no formatting tags.

  15. 3. Exploiting Local Similarity • Long queries are neither frequent nor interesting. • If we restrict the length of the possible queries, we can get a much smaller index than the F&B index. • We specify the length of the local path by using k-bisimilarity instead of bisimilarity while computing the F&B index.

  16. 4. Restricting Tree Depth • Deeply nested conditions are less likely to occur. • We specify the maximum depth of the conditional path expression with the tree depth (defined next).

  17. Tree depth • //museums/history/museum[/featured and <=cultural\neighborhood[/cultural=>museum[\art]]]

  18. Definition of an Index • A set of tags T • Sets of IDREF edges for the two directions, reffwd and refbwd • Two parameters, kbwd and kfwd, to restrict the length of the path queries • One parameter, td, to restrict the depth of the nested conditional expressions.

  19. The BPCI index • Remove all tags not in T such that the removal does not cut out a tag in T. • Start with the label grouping as the current partition P. • Set i = 0; while i ≤ td: • Reverse all edges in G, retaining IDREF edges only in reffwd. • P ← Forward kfwd-Bisimilar Partition of P, then increment i. • Reverse all edges in G again, retaining IDREF edges only in refbwd. • P ← Backward kbwd-Bisimilar Partition of P, then increment i.
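One literal reading of the steps above, as a sketch under stated assumptions: the graph is taken to be already restricted to the tags in T (the tag-removal step is not implemented), IDREF edges are passed as plain (source, target) pairs, forward and backward refinement use children and parents instead of physically reversing edges, and i is incremented after each of the two refinements exactly as listed. The names `bpci_blocks`, `refine_k` and `adjacency` are hypothetical.

```python
from collections import defaultdict


def refine_k(block, neighbours, k):
    """Apply k refinement steps with respect to the given neighbour map."""
    for _ in range(k):
        signature = {
            n: (block[n], frozenset(block[m] for m in neighbours.get(n, ())))
            for n in block
        }
        ids = {}
        block = {n: ids.setdefault(s, len(ids)) for n, s in signature.items()}
    return block


def adjacency(edges, outgoing):
    """Map each node to its children (outgoing=True) or parents (False)."""
    adj = defaultdict(set)
    for src, dst in edges:
        if outgoing:
            adj[src].add(dst)
        else:
            adj[dst].add(src)
    return adj


def bpci_blocks(labels, tree_edges, ref_fwd, ref_bwd, k_fwd, k_bwd, td):
    """Partition nodes following the slide's loop (assumed reading).

    labels     : node -> tag, ASSUMED already restricted to the tag set T
    tree_edges : (parent, child) pairs
    ref_fwd    : IDREF edges retained for the forward step
    ref_bwd    : IDREF edges retained for the backward step
    """
    children = adjacency(list(tree_edges) + list(ref_fwd), outgoing=True)
    parents = adjacency(list(tree_edges) + list(ref_bwd), outgoing=False)

    block = dict(labels)  # label grouping is the starting partition P
    i = 0
    while i <= td:
        block = refine_k(block, children, k_fwd)  # forward k_fwd-bisimilarity
        i += 1
        block = refine_k(block, parents, k_bwd)   # backward k_bwd-bisimilarity
        i += 1

    groups = defaultdict(list)
    for n in sorted(labels):
        groups[block[n]].append(n)
    return sorted(groups.values())
```

Setting k_fwd = 0 makes the forward step a no-op, so nodes are distinguished only by their incoming paths; raising the parameters yields finer partitions, up to the F&B partition.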

  20. Variations of BPCI

  21. Testing if an Index covers a Query • Build the query graph. • Check that all tags in the query are in T and all IDREF edges in the query are in (refbwd ∪ reffwd). • Check that the tree depth of the query is less than the td of the index. • Check that all paths in the query with even tree depth have length < kbwd. • Check that all paths in the query with odd tree depth have length < kfwd.
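The checks above can be sketched as a small predicate. This assumes the query graph has already been built and summarised as its tags, its IDREF edges, and one (tree depth, length) pair per path; that summary format, the class names `IndexDef` and `QuerySummary`, and the function `covers` are all hypothetical. The strict inequalities follow the slide as written.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexDef:
    tags: frozenset      # T
    ref_fwd: frozenset   # IDREF edges indexed in the forward direction
    ref_bwd: frozenset   # IDREF edges indexed in the backward direction
    k_fwd: int
    k_bwd: int
    td: int


@dataclass(frozen=True)
class QuerySummary:
    tags: frozenset
    idrefs: frozenset
    paths: tuple         # (tree_depth, length) for each path in the query


def covers(index: IndexDef, query: QuerySummary) -> bool:
    """Apply the slide's four checks to an already-summarised query."""
    # All tags must be indexed.
    if not query.tags <= index.tags:
        return False
    # All IDREF edges must be indexed in at least one direction.
    if not query.idrefs <= (index.ref_fwd | index.ref_bwd):
        return False
    # The query's tree depth must be below the index's td.
    depth = max((d for d, _ in query.paths), default=0)
    if not depth < index.td:
        return False
    # Even-depth paths are bounded by k_bwd, odd-depth paths by k_fwd.
    for d, length in query.paths:
        bound = index.k_bwd if d % 2 == 0 else index.k_fwd
        if not length < bound:
            return False
    return True


if __name__ == "__main__":
    idx = IndexDef(
        tags=frozenset({"hotel", "star", "neighborhood"}),
        ref_fwd=frozenset(), ref_bwd=frozenset(),
        k_fwd=2, k_bwd=3, td=2,
    )
    q = QuerySummary(
        tags=frozenset({"hotel", "star"}),
        idrefs=frozenset(),
        paths=((0, 1), (1, 1)),
    )
    print(covers(idx, q))
```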

  22. Results on the XMark benchmark • Iall is the F&B index • Iallmost-all is the F&B index with kfwd = 1 • Ispecific is built on the query

  23. Result

  24. Conclusion • BPCI is a covering index for Branching Path Queries. • By setting the parameters appropriately, the index can cover a wide range of queries, to suit various applications. • Extensions: • Updating and bulk loading • Integration with value indexes
