R-Trees

R-Trees: a 2-dimensional indexing structure

R-trees • 2-dimensional version of the B-tree • B-tree of maximum degree 8; degree between 3 and 8 • Internal nodes with k children have k-1 split values

R-trees • Can store: • a set of polygons (regions of a subdivision) • a set of polygonal lines (or boundaries) • a set of points • a mix of the above • Stored objects may overlap

R-trees • Originally by Guttman, 1984 • Dozens of variations and optimizations since • Suitable for windowing, point location and intersection queries • Heuristic structure: no asymptotic bounds (O(..)) • Tree of higher degree: suitable for secondary (disk) storage (short search paths); one node per disk block

Definition R-tree • Every internal node contains entries (rectangle, pointer to child node) • All leaves contain entries (rectangle, pointer to object in the database or file) • Rectangles are minimal bounding rectangles (MBRs) • The root has ≥ 2 and ≤ M entries • All other nodes have at least m and at most M entries • All leaves have the same depth • m > 1 and M > 2m (e.g. m = 200, M = 1000)
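To make the definition concrete, here is a minimal Python sketch of the node layout together with a checker for the invariants listed above. It assumes rectangles are tuples (xmin, ymin, xmax, ymax); the names `Node`, `mbr` and `check` are hypothetical, chosen only for this illustration. Following Guttman's paper, the check lets a root that is itself a leaf hold fewer than two entries.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def mbr(rects: List[Rect]) -> Rect:
    """Minimal bounding rectangle of a non-empty list of rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


@dataclass
class Node:
    leaf: bool
    # Internal node: entries are (rectangle, child Node).
    # Leaf: entries are (rectangle, reference to the stored object).
    entries: List[Tuple[Rect, Any]] = field(default_factory=list)


def check(node: Node, m: int, M: int, is_root: bool = True) -> int:
    """Assert the R-tree invariants below `node`; return its height (0 for a leaf)."""
    n = len(node.entries)
    assert n <= M, "every node has at most M entries"
    if is_root:
        # A root that is itself a leaf may hold fewer than 2 entries.
        assert node.leaf or n >= 2, "an internal root has at least 2 entries"
    else:
        assert n >= m, "every non-root node has at least m entries"
    if node.leaf:
        return 0
    heights = set()
    for rect, child in node.entries:
        # Each stored rectangle must be the MBR of the child's entries.
        assert rect == mbr([r for r, _ in child.entries]), "rectangle is not the child's MBR"
        heights.add(check(child, m, M, is_root=False))
    assert len(heights) == 1, "all leaves have the same depth"
    return heights.pop() + 1
```

With m = 2 and M = 5, a root holding two leaves of two entries each passes the check and has height 1; storing a rectangle that is larger than the child's MBR fails it.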

Object descriptions

Grouping of objects • Windowing query: the fewer rectangles intersected, the fewer subtrees to descend into

Grouping of objects • Objects close together in the same leaves → small rectangles → queries descend into only a few subtrees • Group the child nodes under a parent node such that small rectangles arise

Heuristics for fast queries • Small area of rectangles • Small perimeter of rectangles • Little overlap among rectangles • Well-filled nodes (shallower tree → fewer disk accesses on each search path)
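The four heuristics can be measured directly on the entry rectangles of one node. A minimal sketch, assuming (xmin, ymin, xmax, ymax) tuples; the function names are hypothetical, and summing pairwise intersection areas is just one simple way to quantify overlap.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def perimeter(r: Rect) -> float:
    return 2 * ((r[2] - r[0]) + (r[3] - r[1]))


def overlap(a: Rect, b: Rect) -> float:
    """Area of the intersection of a and b (0 if they are disjoint)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0) * max(h, 0)


def node_quality(rects: List[Rect], M: int) -> Dict[str, float]:
    """Summarise the four heuristics for the entry rectangles of one node."""
    return {
        "total_area": sum(area(r) for r in rects),            # smaller is better
        "total_perimeter": sum(perimeter(r) for r in rects),  # smaller is better
        "total_overlap": sum(overlap(a, b) for a, b in combinations(rects, 2)),
        "fill": len(rects) / M,                               # closer to 1 is better
    }
```

For two 2×2 squares (0,0,2,2) and (1,1,3,3) in a node with M = 4, the total area is 8, the total perimeter 16, the overlap 1 and the fill 0.5.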

Example R-tree

Object descriptions

point containment query

Searching in an R-tree • Q is the query object (point, window, object) • For each rectangle R in the current node, if Q and R intersect: • search recursively in the subtree under the pointer at R (at an internal node) • get the object corresponding to R and test it for intersection with Q (at a leaf)
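The search procedure can be sketched in Python as follows. This assumes the query Q is given by its bounding rectangle (a point query uses a degenerate rectangle (x, y, x, y)), and that the exact test against the real object geometry is supplied as a hypothetical `refine` callback; the names `Node`, `intersects` and `search` are illustrative only.

```python
from typing import Any, Callable, List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


class Node:
    def __init__(self, leaf: bool, entries: List[Tuple[Rect, Any]]):
        self.leaf = leaf
        self.entries = entries  # (rectangle, child Node) or (rectangle, object)


def intersects(a: Rect, b: Rect) -> bool:
    """Closed rectangles a and b share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(node: Node, q: Rect,
           refine: Callable[[Rect, Any], bool] = lambda q, obj: True) -> List[Any]:
    """Return the objects that intersect the query rectangle q.

    `refine(q, obj)` is the exact test against the object's real geometry; the
    default accepts every candidate, which is exact only when the stored
    objects are themselves axis-parallel rectangles.
    """
    result = []
    for rect, ptr in node.entries:
        if not intersects(q, rect):
            continue  # Q misses R: skip this subtree or object
        if node.leaf:
            if refine(q, ptr):  # at a leaf: test the object itself against Q
                result.append(ptr)
        else:
            result.extend(search(ptr, q, refine))  # internal node: recurse
    return result
```

For example, a window query (0.5, 0.5, 1.5, 1.5) on a tree whose leaves hold rectangles (0,0,1,1), (1,1,2,2) and (5,5,6,6), (6,6,7,7) descends only into the first leaf.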

Inserting in an R-tree • Determine the minimal bounding rectangle (MBR) R of the new object • While not yet at a leaf (Choose Subtree): • determine the entry rectangle whose area increment after insertion of R is smallest • enlarge this rectangle if necessary and insert R into its subtree • At a leaf: • if there is space, insert; otherwise Split Node
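The descent to a leaf can be sketched as below, assuming (xmin, ymin, xmax, ymax) tuples and hypothetical names `choose_subtree` and `insert`. Ties in the area increment go to the entry with the smaller rectangle, as in Guttman's ChooseLeaf. Split Node itself is not implemented here; the function only reports the overfull leaf.

```python
from typing import Any, List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


class Node:
    def __init__(self, leaf: bool, entries: List[Tuple[Rect, Any]]):
        self.leaf = leaf
        self.entries = entries


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def enlarge(a: Rect, b: Rect) -> Rect:
    """MBR of the two rectangles a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def choose_subtree(entries: List[Tuple[Rect, Any]], r: Rect) -> int:
    """Index of the entry whose rectangle grows least in area when r is added."""
    def cost(i: int) -> Tuple[float, float]:
        rect = entries[i][0]
        return (area(enlarge(rect, r)) - area(rect), area(rect))  # tie: smaller area
    return min(range(len(entries)), key=cost)


def insert(root: Node, r: Rect, obj: Any, M: int) -> Optional[Node]:
    """Insert obj with MBR r; return the leaf if it now overflows (needs Split Node)."""
    node = root
    while not node.leaf:
        i = choose_subtree(node.entries, r)
        rect, child = node.entries[i]
        node.entries[i] = (enlarge(rect, r), child)  # enlarge the rectangle if necessary
        node = child
    node.entries.append((r, obj))
    return node if len(node.entries) > M else None
```

Enlarging the rectangles on the way down keeps every stored rectangle a bounding rectangle of its subtree; after a split, the MBRs on the path are recomputed anyway.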

Split Node • Divide the M+1 rectangles into two groups, each with at least m and at most M rectangles • Make a node for each group, with the rectangles and corresponding subtrees as entries • Hang the two new nodes under the parent node in the place of the overfull node; determine the new MBRs (if the root was overfull, make a new root with two child nodes) • If the parent has M+1 children, repeat Split Node with this parent

Split Node, example: new MBRs

Strategies for Split Node, I: linear R-tree of Guttman, 1984 • Determine the pair R1, R2 with the largest MBR: the seeds for sets S1 and S2 • While |S1|, |S2| < M - m and not all rectangles are distributed: • take a not yet distributed rectangle Rj and add it to the set whose MBR increases least
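A sketch of this strategy, following the slide literally. Assumptions: rectangles are (xmin, ymin, xmax, ymax) tuples, and the seeds are found by brute force over all pairs. That step is quadratic; Guttman's linear version picks its seeds by normalized separation instead, to keep the split linear. The slide does not say what happens to the rectangles left over when one set is full; here they go to the other set, which keeps both sets between m and M.

```python
from itertools import combinations
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def enlarge(a: Rect, b: Rect) -> Rect:
    """MBR of the two rectangles a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def linear_split(rects: List[Rect], m: int, M: int) -> List[List[Rect]]:
    """Distribute M+1 rectangles over two sets S1, S2 as on the slide (sketch)."""
    assert len(rects) == M + 1
    # Seeds: the pair R1, R2 whose common MBR is largest (brute force over all pairs).
    i, j = max(combinations(range(len(rects)), 2),
               key=lambda p: area(enlarge(rects[p[0]], rects[p[1]])))
    groups = [[rects[i]], [rects[j]]]
    boxes = [rects[i], rects[j]]  # current MBR(S1), MBR(S2)
    rest = [r for k, r in enumerate(rects) if k not in (i, j)]
    while rest and len(groups[0]) < M - m and len(groups[1]) < M - m:
        r = rest.pop(0)  # next undistributed rectangle, in input order
        inc = [area(enlarge(b, r)) - area(b) for b in boxes]
        g = 0 if inc[0] <= inc[1] else 1  # set whose MBR increases least
        groups[g].append(r)
        boxes[g] = enlarge(boxes[g], r)
    if rest:  # one set is full: the remaining rectangles go to the other set
        g = 0 if len(groups[0]) < len(groups[1]) else 1
        groups[g].extend(rest)
    return groups
```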

Example Split Node I

Strategies for Split Node, II: quadratic R-tree of Guttman, 1984 • Determine R1 and R2 with the largest area(MBR(R1 ∪ R2)) - area(R1) - area(R2): the seeds for sets S1 and S2 • While |S1|, |S2| < M - m and not all rectangles are distributed: • for every not yet distributed rectangle Rj, determine d1 = area increment of MBR(S1 ∪ {Rj}) (* w.r.t. MBR(S1) *) and d2 = area increment of MBR(S2 ∪ {Rj}) (* w.r.t. MBR(S2) *) • choose the Rj with maximal |d1 - d2| and add it to the set with the smaller area increment
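The quadratic strategy can be sketched as follows, again assuming (xmin, ymin, xmax, ymax) tuples and a hypothetical function name. The handling of leftover rectangles once one set is full (they go to the other set) is an assumption filling in a step the slide leaves implicit.

```python
from itertools import combinations
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def enlarge(a: Rect, b: Rect) -> Rect:
    """MBR of the two rectangles a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def quadratic_split(rects: List[Rect], m: int, M: int) -> List[List[Rect]]:
    """Distribute M+1 rectangles over two sets S1, S2 (quadratic strategy, sketch)."""
    assert len(rects) == M + 1
    # Seeds: the pair wasting most area, area(MBR(R1 u R2)) - area(R1) - area(R2).
    i, j = max(combinations(range(len(rects)), 2),
               key=lambda p: area(enlarge(rects[p[0]], rects[p[1]]))
               - area(rects[p[0]]) - area(rects[p[1]]))
    groups = [[rects[i]], [rects[j]]]
    boxes = [rects[i], rects[j]]  # current MBR(S1), MBR(S2)
    rest = [r for k, r in enumerate(rects) if k not in (i, j)]
    while rest and len(groups[0]) < M - m and len(groups[1]) < M - m:
        # (d1, d2) for every undistributed Rj.
        d = [(area(enlarge(boxes[0], r)) - area(boxes[0]),
              area(enlarge(boxes[1], r)) - area(boxes[1])) for r in rest]
        # Pick the Rj with the strongest preference |d1 - d2|.
        k = max(range(len(rest)), key=lambda k: abs(d[k][0] - d[k][1]))
        r = rest.pop(k)
        g = 0 if d[k][0] <= d[k][1] else 1  # set with the smaller area increment
        groups[g].append(r)
        boxes[g] = enlarge(boxes[g], r)
    if rest:  # one set is full: the remaining rectangles go to the other set
        g = 0 if len(groups[0]) < len(groups[1]) else 1
        groups[g].extend(rest)
    return groups
```

Recomputing d1 and d2 for all remaining rectangles in every round is what makes this strategy quadratic in the number of entries.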

Example Split Node, II

Strategies for Split Node, III: Greene's split, 1989 • Determine R1 and R2 with the largest area(MBR(R1 ∪ R2)) - area(R1) - area(R2): the seeds for sets S1 and S2 (* same as quadratic R-tree *) • Determine the axis with the largest normalized separation of R1 and R2 (x-separation / x-range of MBR(R1 ∪ R2), or y-separation / y-range of MBR(R1 ∪ R2)) • Sort the rectangles along that axis (by lower-left corner) and split them evenly into two subsets of size (M+1) / 2
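A sketch of Greene's split under stated assumptions. The separation of R1 and R2 along an axis is taken to be the gap between them (largest lower side minus smallest upper side, negative if they overlap), divided by the extent of MBR(R1 ∪ R2) on that axis. When M+1 is odd, this sketch simply puts the extra rectangle in the second half. The function name is hypothetical.

```python
from itertools import combinations
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def enlarge(a: Rect, b: Rect) -> Rect:
    """MBR of the two rectangles a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def greene_split(rects: List[Rect], m: int, M: int) -> Tuple[int, List[Rect], List[Rect]]:
    """Return (axis, S1, S2): sort along the best-separating axis and cut in half."""
    assert len(rects) == M + 1
    # Seeds exactly as in the quadratic strategy.
    i, j = max(combinations(range(len(rects)), 2),
               key=lambda p: area(enlarge(rects[p[0]], rects[p[1]]))
               - area(rects[p[0]]) - area(rects[p[1]]))
    a, b = rects[i], rects[j]
    box = enlarge(a, b)

    def normalized_separation(axis: int) -> float:
        lo, hi = axis, axis + 2  # indices of the lower and upper coordinate
        gap = max(a[lo], b[lo]) - min(a[hi], b[hi])
        extent = box[hi] - box[lo]
        return gap / extent if extent > 0 else 0.0

    axis = 0 if normalized_separation(0) >= normalized_separation(1) else 1
    ordered = sorted(rects, key=lambda r: r[axis])  # by lower-left corner coordinate
    half = (M + 1) // 2
    assert half >= m  # holds whenever M > 2m
    return axis, ordered[:half], ordered[half:]
```

For two rows of three unit squares, one at y = 0 and one at y = 10, the seeds are separated far more in y than in x, so the split is along the y-axis and each row becomes one set.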

Example Split Node, III: the y-axis has the largest normalized separation

Deletion from an R-tree • Find the leaf (node) and delete the object; determine the new (possibly smaller) MBR • If the node is too empty (< m entries): • delete the node recursively at its parent • insert all entries of the deleted node into the R-tree • Note: entries/subtrees are always reinserted at the level they came from

Insert as rectangle on middle level

Insert in a leaf object

R*-trees • Experimentally determined measures for choices at insertion (Choose Subtree, Split Node) • Experimentally determined algorithms for: • Choose Subtree • Split Node

R*-trees; Choose Subtree • At nodes directly above the leaves: choose the entry (rectangle) with the smallest overlap increase • At higher nodes: choose the entry (rectangle) with the smallest area increase (same as before) • R1, …, Rp are the entry rectangles
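The rule can be sketched as follows. This assumes that the overlap of an entry Rk is the summed intersection area of Rk with the other entry rectangles R1, …, Rp of the same node, and that rectangles are (xmin, ymin, xmax, ymax) tuples; the function name and the tie-breaking order are assumptions of this sketch.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def enlarge(a: Rect, b: Rect) -> Rect:
    """MBR of the two rectangles a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def overlap(a: Rect, b: Rect) -> float:
    """Area of the intersection of a and b (0 if they are disjoint)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0) * max(h, 0)


def rstar_choose_subtree(rects: List[Rect], r: Rect, children_are_leaves: bool) -> int:
    """Index of the entry rectangle R1..Rp that should receive the new rectangle r."""
    def area_inc(k: int) -> float:
        return area(enlarge(rects[k], r)) - area(rects[k])

    if not children_are_leaves:
        # Higher levels: smallest area increase (ties: smaller area).
        return min(range(len(rects)), key=lambda k: (area_inc(k), area(rects[k])))

    def overlap_with_others(box: Rect, k: int) -> float:
        return sum(overlap(box, rects[j]) for j in range(len(rects)) if j != k)

    def overlap_inc(k: int) -> float:
        return overlap_with_others(enlarge(rects[k], r), k) - overlap_with_others(rects[k], k)

    # Directly above the leaves: smallest overlap increase (ties: area increase, then area).
    return min(range(len(rects)), key=lambda k: (overlap_inc(k), area_inc(k), area(rects[k])))
```

For instance, with entries (0,0,1,1), (3,5,4,6), (1.5,-10,2.5,10) and new rectangle (3,0,4,1), the area rule picks the first entry, but enlarging it would cut through the tall third rectangle, so the overlap rule picks the second.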

R*-trees; Split Node: determine the split axis • For both the x- and the y-axis: • sort the rectangles by their smallest and by their largest coordinate • determine the M - 2m + 2 allowed distributions into two groups • determine for each distribution the perimeters of the two MBRs • add up the M - 2m + 2 perimeter lengths • Choose the axis with the smallest sum of perimeters
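A sketch of the axis choice, assuming (xmin, ymin, xmax, ymax) tuples and a hypothetical function name. For each sorting, the first group takes k = m, …, M - m + 1 rectangles, which gives exactly M - 2m + 2 distributions with at least m rectangles in each group. The perimeter sums of both sortings of an axis are added together.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def mbr(rects: List[Rect]) -> Rect:
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def perimeter(r: Rect) -> float:
    return 2 * ((r[2] - r[0]) + (r[3] - r[1]))


def choose_split_axis(rects: List[Rect], m: int, M: int) -> int:
    """Return 0 (x) or 1 (y): the axis with the smallest sum of perimeters."""
    assert len(rects) == M + 1
    sums = []
    for axis in (0, 1):
        total = 0.0
        # Sort once by the smaller and once by the larger coordinate on this axis.
        for corner in (axis, axis + 2):
            ordered = sorted(rects, key=lambda r: r[corner])
            # The M - 2m + 2 distributions: the first group gets k = m .. M-m+1 rectangles.
            for k in range(m, M - m + 2):
                total += perimeter(mbr(ordered[:k])) + perimeter(mbr(ordered[k:]))
        sums.append(total)
    return 0 if sums[0] <= sums[1] else 1
```

Perimeter favours square-ish MBRs, which is why it serves as the criterion for the axis; the concrete cut on that axis is then chosen by overlap.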

R*-trees; Split Node: determine the split index (given the split axis) • Choose the distribution, among the M - 2m + 2, with the smallest area of intersection of the MBRs
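Given the split axis, the distribution can be chosen as sketched below. Assumptions: (xmin, ymin, xmax, ymax) tuples, both sortings of the chosen axis are considered, and ties in the intersection area are broken by the smaller total area of the two MBRs. The function name is hypothetical.

```python
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def mbr(rects: List[Rect]) -> Rect:
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def overlap(a: Rect, b: Rect) -> float:
    """Area of the intersection of a and b (0 if they are disjoint)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0) * max(h, 0)


def choose_split_index(rects: List[Rect], axis: int, m: int,
                       M: int) -> Tuple[List[Rect], List[Rect]]:
    """Among the distributions on `axis`, pick the one whose two MBRs overlap least."""
    best: Optional[Tuple[List[Rect], List[Rect]]] = None
    best_key = None
    for corner in (axis, axis + 2):
        ordered = sorted(rects, key=lambda r: r[corner])
        for k in range(m, M - m + 2):
            g1, g2 = ordered[:k], ordered[k:]
            b1, b2 = mbr(g1), mbr(g2)
            key = (overlap(b1, b2), area(b1) + area(b2))  # overlap first, then area
            if best_key is None or key < best_key:
                best, best_key = (g1, g2), key
    return best
```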

Nearest neighbor queries • An R-tree can be used for nearest neighbor queries • The idea is to perform a DFS, maintain the closest object found so far, and use its distance to prune subtrees that cannot contain a closer object
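A minimal sketch of this depth-first branch and bound. Assumptions: leaf objects are represented by their rectangles, so the distance to an object is the distance from the query point to its rectangle, and entries are visited closest-first, an optimisation not stated on the slide. The names `mindist` and `nearest` are hypothetical.

```python
import math
from typing import Any, List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Point = Tuple[float, float]


class Node:
    def __init__(self, leaf: bool, entries: List[Tuple[Rect, Any]]):
        self.leaf = leaf
        self.entries = entries  # (rectangle, child Node) or (rectangle, object)


def mindist(p: Point, r: Rect) -> float:
    """Euclidean distance from p to the closest point of rectangle r (0 if inside)."""
    dx = max(r[0] - p[0], 0.0, p[0] - r[2])
    dy = max(r[1] - p[1], 0.0, p[1] - r[3])
    return math.hypot(dx, dy)


def nearest(node: Node, p: Point,
            best: Tuple[float, Any] = (math.inf, None)) -> Tuple[float, Any]:
    """Return (distance, object) of the object nearest to p."""
    # Visit entries closest-first, so a good candidate is found early.
    for rect, ptr in sorted(node.entries, key=lambda e: mindist(p, e[0])):
        d = mindist(p, rect)
        if d >= best[0]:
            break  # this entry and all later ones cannot beat the closest object so far
        if node.leaf:
            best = (d, ptr)  # object distance = distance to its rectangle (assumption)
        else:
            best = nearest(ptr, p, best)
    return best
```

A subtree is pruned as soon as the distance to its rectangle is at least the distance to the closest object found so far, since every object inside it lies at least that far away.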


Forced reinsert • When an R-tree is built by repeated insertion, the first inserted rectangles are possibly badly placed • Experiment: • build an R-tree by inserting 20,000 rectangles • do it again, but afterwards delete the first 10,000 inserted rectangles and insert them again • Search time improvement of 20-50%!

Summary R-trees • Versatile 2-dimensional search tree (referred to as: indexing structure, or spatial index) • Some variant is used in most GIS • Well-suited for windowing, point location, intersection, and nearest neighbor queries • Heuristic structure: no asymptotic bounds (O(..)) • Dynamic: insertions and deletions supported • Tree of higher degree: well-suited for secondary (disk) storage (short search paths)
