
The Priority R-Tree: A Practically Efficient and Worst-Case Optimal R-Tree



Presentation Transcript


  1. The Priority R-Tree: A Practically Efficient and Worst-Case Optimal R-Tree • Lars Arge¹, Mark de Berg², Herman Haverkort³ and Ke Yi¹ • ¹ Department of Computer Science, Duke University • ² Department of Computer Science, TU Eindhoven • ³ Institute of Information and Computing Sciences, Utrecht University

  2. Problem Definition • Input: • N rectangles in the plane • Window query Q • Output: • All rectangles intersecting Q • Applications • Spatial databases • GIS • CAD • Computer vision • Robotics • …
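
The problem statement on this slide can be made concrete with a brute-force baseline. This is a minimal sketch, not part of the presentation: the `Rect` type and the function names are hypothetical, and rectangles are assumed to be closed and axis-parallel. An R-tree aims to return the same answer while reading far fewer than all N rectangles.

```python
from typing import List, NamedTuple


class Rect(NamedTuple):
    """Axis-parallel rectangle (hypothetical helper type for this sketch)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def intersects(a: Rect, b: Rect) -> bool:
    """True if the closed rectangles a and b share at least one point."""
    return (a.xmin <= b.xmax and b.xmin <= a.xmax
            and a.ymin <= b.ymax and b.ymin <= a.ymax)


def window_query(rects: List[Rect], q: Rect) -> List[Rect]:
    """Brute-force window query: scan all N rectangles and keep those hitting q."""
    return [r for r in rects if intersects(r, q)]
```

The scan costs N/B I/Os regardless of the output size T, which is the baseline an index has to beat.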

  3. R-Tree • Fanout: Θ(B), where B is the disk block size • Definition [Guttman84]: [Figure: rectangles A to I in the plane and the corresponding R-tree] • Advantages: • Little redundancy • Multi-purpose • Easy to update

  4. How to Build an R-Tree • Repeated insertions • [Guttman84] • R+-tree [Sellis et al. 87] • R*-tree [Beckmann et al. 90] • Bulkloading • Hilbert R-Tree [Kamel and Faloutsos 94] • Top-down Greedy Split [Garcia et al. 98] • Advantages: • Much faster than repeated insertions • Better space utilization • Usually produces R-trees of higher quality

  5. R-Tree Variant: Hilbert R-Tree • To build a Hilbert R-Tree (cost: O((N/B) log_{M/B} N) I/Os) • Sort the rectangles by the Hilbert values of their centers • Build a B-tree on top • 4D Hilbert R-tree • [Figure: Hilbert curve]
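
The two build steps on this slide can be sketched in memory as follows. This is an illustration under stated assumptions, not the presenters' code: rectangles are `(xmin, ymin, xmax, ymax)` tuples, the input is non-empty, centers are snapped to a 2^order x 2^order grid, and only the leaf level is packed. The names `hilbert_index` and `hilbert_pack_leaves` are hypothetical. A real bulk-loader would use an external-memory sort to reach the stated I/O bound.

```python
def hilbert_index(order, x, y):
    """Position of grid cell (x, y) along the Hilbert curve of a 2^order x 2^order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def hilbert_pack_leaves(rects, B, order=16):
    """Sort rectangles by the Hilbert value of their centers and cut into leaves of B."""
    n = 1 << order
    x_lo, x_hi = min(r[0] for r in rects), max(r[2] for r in rects)
    y_lo, y_hi = min(r[1] for r in rects), max(r[3] for r in rects)

    def to_cell(v, lo, hi):
        # Map a coordinate to an integer grid cell in [0, n - 1].
        if hi == lo:
            return 0
        return min(n - 1, int((v - lo) / (hi - lo) * n))

    def key(r):
        cx = (r[0] + r[2]) / 2
        cy = (r[1] + r[3]) / 2
        return hilbert_index(order, to_cell(cx, x_lo, x_hi), to_cell(cy, y_lo, y_hi))

    ordered = sorted(rects, key=key)
    return [ordered[i:i + B] for i in range(0, len(ordered), B)]
```

Upper levels would be built the same way, by packing the bounding boxes of consecutive groups of B nodes.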

  6. R-Tree Variant: TGS R-Tree (Top-down Greedy Split) • To build a TGS R-tree • Start from the root and build the tree top-down • To build one node, use binary cuts until the desired fan-out is reached • To make a binary cut, consider 4 orderings of the rectangles: xmin, ymin, xmax, ymax • In each ordering, consider the B cutting positions • Choose the one that minimizes the sum of the areas of the two resulting bounding boxes • Typical bulk-load cost: O((N/B) log_2 N) I/Os
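
A single greedy binary cut, as described on this slide, might look like the sketch below. It is an in-memory illustration only: rectangles are `(xmin, ymin, xmax, ymax)` tuples, and the `step` parameter (the spacing between candidate cut positions) is a hypothetical stand-in for the rule that fixes the cutting positions.

```python
def bbox(rects):
    """Minimal bounding box of a non-empty list of rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def area(b):
    return (b[2] - b[0]) * (b[3] - b[1])


def best_binary_cut(rects, step=1):
    """Try all four orderings and every candidate cut; return (cost, left, right).

    The cost of a cut is the sum of the areas of the two bounding boxes.
    Returns None if there are fewer than two candidate groups.
    """
    best = None
    for dim in range(4):  # 0: xmin, 1: ymin, 2: xmax, 3: ymax
        ordered = sorted(rects, key=lambda r: r[dim])
        for i in range(step, len(ordered), step):
            left, right = ordered[:i], ordered[i:]
            cost = area(bbox(left)) + area(bbox(right))
            if best is None or cost < best[0]:
                best = (cost, left, right)
    return best
```

Applying such cuts repeatedly until a node has the desired fan-out, then recursing into each part, gives the top-down construction. This version recomputes bounding boxes from scratch for clarity rather than speed.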

  7. Our Results • No existing R-tree variant has a worst-case query performance guarantee! • In the worst case, a query can visit all nodes in the tree even when the output size is zero • Priority R-Tree • The first R-tree variant that answers a query by visiting O(√(N/B) + T/B) nodes in the worst case • T: Output size • It is optimal! • There exists a dataset such that for any R-tree, there is an empty query that visits Ω(√(N/B)) nodes. [Kanth and Singh 99, Agarwal et al. 02]

  8. Roadmap • Pseudo-PR-Tree • Has the desired worst-case guarantee • Not a real R-tree • Transform a pseudo-PR-Tree into a PR-tree • A real R-tree • Maintains the worst-case guarantee • Experiments • PR-tree • Hilbert R-tree (2D and 4D) • TGS R-tree

  9. Building a Pseudo-PR-Tree • Step 1: Take out B extreme rectangles from each direction and put them into priority leaves • [Figure: root node with its priority leaves]

  10. Building a Pseudo-PR-Tree • Step 2: Divide the remaining rectangles by their xmin coordinates and build subtrees recursively. Division is performed using xmin, ymin, xmax, ymax in a round-robin fashion, like a 4D kd-tree • Analysis sketch: • # nodes with at least one priority leaf completely reported: O(T/B) • # nodes with no priority leaf completely reported: O(√(N/B))
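
Steps 1 and 2 can be combined into a short recursive sketch. Several details are assumptions rather than statements from the slides: rectangles are `(xmin, ymin, xmax, ymax)` tuples, the four "extreme" directions are taken to be smallest xmin, smallest ymin, largest xmax and largest ymax (extracted in that order), the split is at the median, and nodes are plain dictionaries. The function names are hypothetical.

```python
def build_pseudo_pr_tree(rects, B, depth=0):
    """In-memory sketch of a pseudo-PR-tree over (xmin, ymin, xmax, ymax) tuples."""
    if len(rects) <= B:
        return {"leaf": list(rects)}

    remaining = list(rects)
    priority_leaves = []
    # Step 1: peel off the B most extreme rectangles in each of the four directions.
    for dim, largest in ((0, False), (1, False), (2, True), (3, True)):
        remaining.sort(key=lambda r: r[dim], reverse=largest)
        priority_leaves.append(remaining[:B])
        remaining = remaining[B:]

    node = {"priority_leaves": priority_leaves, "children": []}
    # Step 2: split the rest at the median of one coordinate, chosen round-robin
    # (xmin, ymin, xmax, ymax) by depth, as in a 4D kd-tree.
    if remaining:
        split_dim = depth % 4
        remaining.sort(key=lambda r: r[split_dim])
        mid = len(remaining) // 2
        for part in (remaining[:mid], remaining[mid:]):
            if part:
                node["children"].append(build_pseudo_pr_tree(part, B, depth + 1))
    return node


def all_rects(node):
    """Collect every rectangle stored below a node (used to check nothing is lost)."""
    if "leaf" in node:
        return list(node["leaf"])
    out = [r for leaf in node["priority_leaves"] for r in leaf]
    for child in node["children"]:
        out.extend(all_rects(child))
    return out
```

Each rectangle ends up in exactly one leaf, either a priority leaf of some node or an ordinary leaf at the bottom of the recursion.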

  11. Pseudo-PR-Tree to a Real R-tree

  12. Query Complexity Remains Unchanged Next level: # nodes visited on leaf level

  13. PR-Tree: Bulkload & Updates • Bulkload • O((N/B) log_2 N) I/Os → O((N/B) log_{M/B} N) I/Os, using the “grid method” [Agarwal et al. 01] • The same as the Hilbert R-tree, but with a larger constant • Updates • Can use any previous heuristic to update in O(log_B N) I/Os • Without worst-case query guarantee • Use the logarithmic method • Insert: O(log_B N + (1/B) · log_{M/B} N · log_2(N/M)) I/Os • Delete: O(log_B N) I/Os • Extending to d dimensions • Query bound: O((N/B)^(1-1/d) + T/B), still optimal • Bulkload & update bounds remain the same
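
The logarithmic method mentioned on this slide is a general way to make a bulk-loaded static structure support insertions. The sketch below shows the generic in-memory scheme only, not the external-memory version behind the stated bounds. The class name, the `build` callback (standing in for a bulk-loader) and `base_size` (standing in for the memory-sized buffer) are all hypothetical.

```python
class LogarithmicIndex:
    """Keeps static structures of sizes base_size * 2**i and merges them like a binary counter."""

    def __init__(self, build, base_size=4):
        self.build = build          # bulk-load function: list of items -> static structure
        self.base_size = base_size
        self.buffer = []            # unindexed recent insertions
        self.levels = []            # levels[i] is None or (items, structure)

    def insert(self, item):
        self.buffer.append(item)
        if len(self.buffer) < self.base_size:
            return
        # Buffer is full: carry it upward, merging with every occupied level on the way.
        carry, self.buffer = self.buffer, []
        i = 0
        while True:
            if i == len(self.levels):
                self.levels.append(None)
            if self.levels[i] is None:
                self.levels[i] = (carry, self.build(carry))
                return
            carry = carry + self.levels[i][0]
            self.levels[i] = None
            i += 1

    def items(self):
        out = list(self.buffer)
        for level in self.levels:
            if level is not None:
                out.extend(level[0])
        return out
```

A query would be answered by querying the buffer and every non-empty level and combining the results. Each item takes part in at most a logarithmic number of rebuilds, which is where the logarithmic factor in the amortized insertion cost comes from.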

  14. Experiments • Implemented with TPIE • Priority R-tree • Hilbert R-tree • 4D Hilbert R-tree • TGS R-tree • Real-life data • TIGER datasets • 16 million rectangles • Synthetic data • Varying from normal to extreme data • 10 million rectangles

  15. Experiments with Real-Life Data • Query performance on the TIGER datasets • Shown: (# I/Os spent in answering a query) / (T/B)

  16. Experiments with Synthetic Data: SIZE • Each side of a rectangle is uniformly distributed in [0, max_side] • Queries are squares with area 1%
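
A generator for data of this kind could be sketched as follows. The slide does not say where the rectangles are placed or what the 1% is measured against, so this sketch assumes that centers are uniform in the unit square and that the query area is a fraction of the unit square. The function names and the `seed` parameter are hypothetical.

```python
import random


def size_dataset(n, max_side, seed=0):
    """n rectangles whose width and height are each uniform in [0, max_side]."""
    rng = random.Random(seed)
    rects = []
    for _ in range(n):
        cx, cy = rng.random(), rng.random()  # assumed: centers uniform in the unit square
        w, h = rng.uniform(0, max_side), rng.uniform(0, max_side)
        rects.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return rects


def square_query(area_fraction, rng):
    """A random square inside the unit square covering the given fraction of its area."""
    side = area_fraction ** 0.5
    x = rng.uniform(0, 1 - side)
    y = rng.uniform(0, 1 - side)
    return (x, y, x + side, y + side)
```

With `area_fraction=0.01` the query squares have side 0.1.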

  17. Experiments with Synthetic Data: ASPECT Fix the area, vary aspect ratio

  18. Experiments with Synthetic Data: SKEWED • Randomly place points, then apply y' = y^c to the y-coordinates
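
The skewing transformation can be illustrated in a few lines. This sketch assumes the points are drawn uniformly from the unit square before the transformation, which the slide does not state explicitly. The function name and the `seed` parameter are hypothetical.

```python
import random


def skewed_points(n, c, seed=0):
    """n points with x uniform in [0, 1) and y replaced by y**c."""
    rng = random.Random(seed)
    return [(rng.random(), rng.random() ** c) for _ in range(n)]
```

For c = 1 the points stay uniform. As c grows, more and more of them are squeezed toward y = 0, since y^c < t exactly when y < t^(1/c).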

  19. Experiments with Synthetic Data: CLUSTER

  20. Conclusions • In theory • The PR-tree is the first R-tree variant that answers a window query in O(√(N/B) + T/B) I/Os in the worst case, which is optimal • In practice • Roughly the same as the previous best R-trees on real-life and relatively nicely distributed data • Outperforms them significantly on more extreme data • Future work • How previous heuristics may affect the performance of the PR-tree in the dynamic case

  21. Lower Bound Construction • Each bounding box intersects at least queries • N/B bounding boxes • queries • There exists a query that intersects at least bounding boxes

  22. Pseudo-PR-Tree: Query Complexity • Nodes v visited where all rectangles in at least one of the priority leaves of v’s parent are reported: O(T/B) • Let v be a visited node such that none of the priority leaves at its parent is reported completely; consider v’s parent u • [Figure: the query Q in 2D and the corresponding view in 4D, with the hyper-planes ymin = ymax(Q) and xmax = xmin(Q)]

  23. Pseudo-PR-Tree: Query Complexity • The cell in the 4D kd-tree of u is intersected by two different 3-dimensional hyper-planes • The intersection of each pair of such 3-dimensional hyper-planes is a 2-dimensional hyper-plane • Lemma: # of cells in a d-dimensional kd-tree that intersect an axis-parallel f-dimensional hyper-plane is O((N/B)^(f/d)) • So, # such cells in a 4D kd-tree: O((N/B)^(2/4)) = O(√(N/B)) • Total # nodes visited: O(√(N/B) + T/B)
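
Written out, the counting step amounts to plugging f = 2 and d = 4 into the lemma and adding the O(T/B) term from the previous slide. The block below is only a worked restatement of that arithmetic, with the same symbols as the slides.

```latex
\[
  \#\,\text{cells}
  \;=\; O\!\left( (N/B)^{f/d} \right)\Big|_{f=2,\; d=4}
  \;=\; O\!\left( (N/B)^{2/4} \right)
  \;=\; O\!\left( \sqrt{N/B} \right)
\]
\[
  \#\,\text{nodes visited}
  \;=\; \underbrace{O(T/B)}_{\text{some priority leaf fully reported}}
  \;+\; \underbrace{O\!\left( \sqrt{N/B} \right)}_{\text{no priority leaf fully reported}}
  \;=\; O\!\left( \sqrt{N/B} + T/B \right)
\]
```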

  24. Experiments with Real-Life Data • Datasets: TIGER/Line data • Bulk-loading:
