The Priority R-Tree: A Practically Efficient and Worst-Case Optimal R-Tree

Lars Arge (1), Mark de Berg (2), Herman Haverkort (3) and Ke Yi (1) • (1) Department of Computer Science, Duke University • (2) Department of Computer Science, TU Eindhoven • (3) Institute of Information and Computing Sciences, Utrecht University

Problem Definition • Input: • N rectangles in the plane • Window query Q • Output: • All rectangles intersecting Q • Applications • Spatial databases • GIS • CAD • Computer vision • Robotics • …
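A minimal in-memory sketch of the problem statement above. It assumes axis-parallel rectangles stored as (xmin, ymin, xmax, ymax) tuples, a representation chosen here for illustration and not taken from the slides. The query scans all N rectangles, which is the baseline an R-tree tries to beat.

```python
from typing import List, Tuple

# Assumed representation: (xmin, ymin, xmax, ymax)
Rect = Tuple[float, float, float, float]


def intersects(r: Rect, q: Rect) -> bool:
    """True if the two axis-parallel rectangles share at least one point."""
    return r[0] <= q[2] and q[0] <= r[2] and r[1] <= q[3] and q[1] <= r[3]


def window_query(rects: List[Rect], q: Rect) -> List[Rect]:
    """Brute-force window query: report every rectangle intersecting q."""
    return [r for r in rects if intersects(r, q)]


rects = [(0, 0, 2, 2), (3, 3, 5, 5), (1, 1, 4, 4)]
result = window_query(rects, (2.5, 2.5, 3.5, 3.5))
```

Here `result` holds the second and third rectangles; the first lies entirely to the lower left of the query window.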

R-Tree • Fanout: Θ(B), where B is the disk block size • Definition [Guttman84]: • Advantages: • Little redundancy • Multi-purpose • Easy to update • [Figure: example R-tree over rectangles A to I]

How to Build an R-Tree • Repeated insertions • [Guttman84] • R+-tree [Sellis et al. 87] • R*-tree [Beckmann et al. 90] • Bulkloading • Hilbert R-Tree [Kamel and Faloutsos 94] • Top-down Greedy Split [Garcia et al. 98] • Advantages: • Much faster than repeated insertions • Better space utilization • Usually produces R-trees of higher quality

R-Tree Variant: Hilbert R-Tree • To build a Hilbert R-Tree (cost: $O(\frac{N}{B} \log_{M/B} N)$ I/Os) • Sort the rectangles by the Hilbert values of their centers • Build a B-tree on top • 4D Hilbert R-tree • [Figure: Hilbert curve]
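The sort-and-pack step above can be sketched as follows. This is an in-memory illustration only, assuming integer coordinates on a 2^order × 2^order grid; the names `hilbert_index` and `hilbert_pack` are hypothetical, and the index computation is the standard iterative 2D Hilbert mapping rather than anything taken from the slides. Only the leaf level is built.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # assumed: (xmin, ymin, xmax, ymax) on an integer grid


def hilbert_index(order: int, x: int, y: int) -> int:
    """Position of cell (x, y) along a Hilbert curve on a 2**order x 2**order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the right orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def mbr(rects: List[Rect]) -> Rect:
    """Minimal bounding rectangle of a non-empty list of rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def hilbert_pack(rects: List[Rect], B: int, order: int) -> List[Tuple[Rect, List[Rect]]]:
    """Sort by the Hilbert value of the center, then cut into leaves of B rectangles."""
    def key(r: Rect) -> int:
        return hilbert_index(order, (r[0] + r[2]) // 2, (r[1] + r[3]) // 2)

    ordered = sorted(rects, key=key)
    leaves = [ordered[i:i + B] for i in range(0, len(ordered), B)]
    return [(mbr(leaf), leaf) for leaf in leaves]


rects = [(0, 0, 1, 1), (2, 0, 3, 1), (0, 2, 1, 3), (2, 2, 3, 3)]
leaves = hilbert_pack(rects, B=2, order=2)
```

Upper levels would be obtained by applying the same grouping to the leaf bounding boxes, in the order produced here.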

R-Tree Variant: TGS R-Tree (Top-down Greedy Split) • To build a TGS R-tree • Start from the root and build the tree top-down • To build one node, use binary cuts until the desired fan-out is reached • To make a binary cut, consider 4 orderings of the rectangles: xmin, ymin, xmax, ymax • In each ordering, consider the B cutting positions • Choose the one that minimizes the sum of the areas of the two resulting bounding boxes • Typical bulk-load cost: $O(\frac{N}{B} \log_2 N)$ I/Os
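One greedy binary cut, as described above, might look like the sketch below. It is an unoptimized in-memory illustration: the function name `best_binary_cut` and the `step` parameter (the spacing between candidate cut positions) are assumptions made for this example, and bounding boxes are recomputed from scratch for each candidate instead of incrementally.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # assumed: (xmin, ymin, xmax, ymax)


def mbr(rects: List[Rect]) -> Rect:
    """Minimal bounding rectangle of a non-empty list of rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def area(b: Rect) -> float:
    return (b[2] - b[0]) * (b[3] - b[1])


def best_binary_cut(rects: List[Rect], step: int):
    """Try all four orderings and every step-th cut position; keep the cheapest cut.

    Returns (cost, left, right), where cost is the sum of the two bounding-box areas,
    or None if there are too few rectangles to cut.
    """
    best = None
    for dim in range(4):  # 0: xmin, 1: ymin, 2: xmax, 3: ymax
        ordered = sorted(rects, key=lambda r: r[dim])
        for i in range(step, len(ordered), step):
            left, right = ordered[:i], ordered[i:]
            cost = area(mbr(left)) + area(mbr(right))
            if best is None or cost < best[0]:
                best = (cost, left, right)
    return best


rects = [(0, 0, 1, 1), (1, 0, 2, 1), (10, 0, 11, 1), (11, 0, 12, 1)]
cost, left, right = best_binary_cut(rects, step=2)
```

In this example the cheapest cut separates the two rectangles near x = 0 from the two near x = 10, giving two bounding boxes of area 2 each.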

Our Results • None of the existing R-tree variants has a worst-case query performance guarantee! • In the worst case, a query can visit all nodes in the tree even when the output size is zero • Priority R-Tree • The first R-tree variant that answers a query by visiting $O(\sqrt{N/B} + T/B)$ nodes in the worst case • T: Output size • It is optimal! • There exists a dataset such that for any R-tree, there is an empty query that visits $\Omega(\sqrt{N/B})$ nodes. [Kanth and Singh 99, Agarwal et al. 02]

Roadmap • Pseudo-PR-Tree • Has the desired worst-case guarantee • Not a real R-tree • Transform a pseudo-PR-Tree into a PR-tree • A real R-tree • Maintain the worst-case guarantee • Experiments • PR-tree • Hilbert R-tree (2D and 4D) • TGS-R-tree

Building a Pseudo-PR-Tree • Step 1: Take out the B most extreme rectangles in each direction and put them into priority leaves • [Figure: root with its priority leaves]

Building a Pseudo-PR-Tree • Step 2: Divide the remaining rectangles by their xmin coordinates and build subtrees recursively • Division is performed using xmin, ymin, xmax, ymax in a round-robin fashion, like a 4D kd-tree • Analysis sketch: • # nodes with at least one priority leaf completely reported: $O(T/B)$ • # nodes with no priority leaf completely reported: $O(\sqrt{N/B})$
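The two construction steps can be sketched in memory as follows. This is an illustration under stated assumptions, not the authors' I/O-efficient implementation: the class name `Node`, the tuple layout (xmin, ymin, xmax, ymax), and the median split are choices made for this example. Each node takes up to four priority leaves (smallest xmin, smallest ymin, largest xmax, largest ymax), then splits the remainder in kd-tree fashion.

```python
import random
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # assumed: (xmin, ymin, xmax, ymax)


def mbr(rects: List[Rect]) -> Rect:
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class Node:
    def __init__(self, rects: List[Rect], B: int, depth: int = 0):
        self.box = mbr(rects)
        self.leaves = []    # (bounding box, rectangles) per priority leaf
        self.children = []  # up to two recursively built subtrees
        rest = list(rects)
        if len(rest) <= B:  # small enough: store everything in one leaf
            self.leaves.append((self.box, rest))
            return
        # Step 1: pull out the B most extreme rectangles in each direction.
        for dim, take_largest in ((0, False), (1, False), (2, True), (3, True)):
            if not rest:
                break
            rest.sort(key=lambda r: r[dim], reverse=take_largest)
            extreme, rest = rest[:B], rest[B:]
            self.leaves.append((mbr(extreme), extreme))
        # Step 2: split the remainder at the median, cycling xmin, ymin, xmax, ymax.
        if rest:
            dim = depth % 4
            rest.sort(key=lambda r: r[dim])
            mid = len(rest) // 2
            for part in (rest[:mid], rest[mid:]):
                if part:
                    self.children.append(Node(part, B, depth + 1))


def query(node: Node, q: Rect) -> List[Rect]:
    """Report all stored rectangles intersecting q, pruning by bounding boxes."""
    if not intersects(node.box, q):
        return []
    out = []
    for box, rects in node.leaves:
        if intersects(box, q):
            out.extend(r for r in rects if intersects(r, q))
    for child in node.children:
        out.extend(query(child, q))
    return out


rng = random.Random(0)
rects = []
for _ in range(200):
    x, y = rng.uniform(0, 100), rng.uniform(0, 100)
    rects.append((x, y, x + rng.uniform(0, 5), y + rng.uniform(0, 5)))
tree = Node(rects, B=4)
q = (20.0, 20.0, 40.0, 40.0)
found = query(tree, q)
```

Because every rectangle is stored in exactly one leaf, `found` contains the same rectangles as a brute-force scan of `rects`, possibly in a different order.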

Pseudo-PR-Tree to a Real R-tree

Query Complexity Remains Unchanged Next level: # nodes visited on leaf level

PR-Tree: Bulkload & Updates • Bulkload • $O(\frac{N}{B} \log_2 N)$ I/Os → $O(\frac{N}{B} \log_{M/B} N)$ I/Os, using the “grid method” [Agarwal et al. 01] • The same as Hilbert R-tree, but with a larger constant • Updates • Can use any previous heuristic to update in $O(\log_B N)$ I/Os • Without worst-case query guarantee • Use logarithmic method • Insert: $O(\log_B N + \frac{1}{B} \log_{M/B} N \cdot \log_2 \frac{N}{M})$ I/Os • Delete: $O(\log_B N)$ I/Os • Extending to d dimensions • Query bound: $O((N/B)^{1-1/d} + T/B)$, still optimal • Bulkload & update bounds remain the same
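The logarithmic method mentioned above keeps a collection of static, bulk-loaded structures of geometrically growing sizes and rebuilds them on insertion. The sketch below shows the textbook in-memory binary variant, assuming level i holds either nothing or exactly 2^i rectangles. The class name `LogMethodIndex` is hypothetical, and `bulk_load` is a stand-in that just freezes the rectangles into a tuple where a real implementation would bulk-load a PR-tree. Deletions are not covered.

```python
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # assumed: (xmin, ymin, xmax, ymax)


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def bulk_load(rects: List[Rect]) -> Tuple[Rect, ...]:
    """Stand-in for bulk-loading a static structure (here: an immutable tuple)."""
    return tuple(rects)


class LogMethodIndex:
    def __init__(self) -> None:
        # levels[i] is None or a static structure holding exactly 2**i rectangles
        self.levels: List[Optional[Tuple[Rect, ...]]] = []

    def insert(self, rect: Rect) -> None:
        carry = [rect]
        i = 0
        while True:
            if i == len(self.levels):
                self.levels.append(None)
            if self.levels[i] is None:
                self.levels[i] = bulk_load(carry)  # rebuild at the first empty level
                return
            carry.extend(self.levels[i])  # merge the full level into the carry
            self.levels[i] = None
            i += 1

    def query(self, q: Rect) -> List[Rect]:
        out: List[Rect] = []
        for structure in self.levels:  # a query must consult every non-empty level
            if structure is not None:
                out.extend(r for r in structure if intersects(r, q))
        return out


index = LogMethodIndex()
for k in range(5):
    index.insert((k, k, k + 1, k + 1))
sizes = [len(s) if s is not None else 0 for s in index.levels]
hits = index.query((0, 0, 1.5, 1.5))
```

After five insertions the occupied levels mirror the binary representation of 5: one structure of size 1 and one of size 4.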

Experiments • Implemented with TPIE • Priority R-tree • Hilbert R-tree • 4D Hilbert R-tree • TGS R-tree • Real-life data • TIGER datasets • 16 million rectangles • Synthetic data • Varying from normal to extreme data • 10 million rectangles

Experiments with Real-Life Data • Query performance on the TIGER datasets • Shown: # I/Os spent in answering a query, divided by T/B

Experiments with Synthetic Data: SIZE • Each side of a rectangle is uniformly distributed in [0, max_side] • Queries are squares with area 1%

Experiments with Synthetic Data: ASPECT • Fix the area, vary aspect ratio

Experiments with Synthetic Data: SKEWED • Randomly place points, then apply $y' = y^c$ to the y-coordinates
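A small sketch of how such a dataset could be generated. It assumes the points are drawn uniformly from the unit square before the transformation; the function name `skewed_points` and the seeding are illustrative choices, not details from the slides.

```python
import random
from typing import List, Tuple


def skewed_points(n: int, c: float, seed: int = 0) -> List[Tuple[float, float]]:
    """Draw n points uniformly from the unit square, then replace y by y**c."""
    rng = random.Random(seed)
    return [(rng.random(), rng.random() ** c) for _ in range(n)]


points = skewed_points(10000, c=5)
mean_y = sum(y for _, y in points) / len(points)
```

With c = 1 the data stay uniform; larger values of c push the points toward y = 0, so `mean_y` drops well below 0.5.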

Experiments with Synthetic Data: CLUSTER

Conclusions • In theory • The PR-tree is the first R-tree variant that answers a window query in $O(\sqrt{N/B} + T/B)$ I/Os in the worst case, which is optimal • In practice • Roughly the same as previous best R-trees on real-life and relatively nicely distributed data • Outperforms them significantly on more extreme data • Future work • How previous heuristics may affect the performance of the PR-tree in the dynamic case

Lower Bound Construction • Each bounding box intersects at least queries • N/B bounding boxes • queries • There exists a query that intersects at least bounding boxes

Pseudo-PR-Tree: Query Complexity • Nodes v visited where all rectangles in at least one of the priority leaves of v’s parent are reported: $O(T/B)$ • Let v be a visited node such that none of the priority leaves at its parent is reported completely, and consider v’s parent u • [Figure: query Q in 2D and the corresponding 4D view, with boundaries ymin = ymax(Q) and xmax = xmin(Q)]

Pseudo-PR-Tree: Query Complexity • The cell in the 4D kd-tree of u is intersected by two different 3-dimensional hyper-planes • The intersection of each pair of such 3-dimensional hyper-planes is a 2-dimensional hyper-plane • Lemma: # of cells in a d-dimensional kd-tree that intersect an axis-parallel f-dimensional hyper-plane is $O((N/B)^{f/d})$ • So, # such cells in a 4D kd-tree (f = 2, d = 4): $O((N/B)^{2/4}) = O(\sqrt{N/B})$ • Total # nodes visited: $O(\sqrt{N/B} + T/B)$

Experiments with Real-Life Data • Datasets: TIGER/Line data • Bulk-loading:
