R-TREES: A Dynamic Index Structure for Spatial Searching by A. Guttman, SIGMOD 1984.


  1. R-TREES: A Dynamic Index Structure for Spatial Searching by A. Guttman, SIGMOD 1984. Shahram Ghandeharizadeh, Computer Science Department, University of Southern California

  2. Motivating Example • Type in your street address in Google

  3. Example (Cont…) • Show me all the pizza places close by:

  4. Terminology • The example query is termed a spatial query. • An R-tree is a spatial index structure. • K-D-B trees are useful for point data only: exact-point lookup! • Example: Show me the USC Salvatori Computer Science building. • An R-tree represents data objects by intervals in several dimensions: exact-point and range lookups! • Example: Show me all pizza places within a 2-mile radius of the USC Salvatori Computer Science building. • An R-tree is a height-balanced tree similar to a B-tree, with index records in its leaf nodes containing pointers to data objects. • A node is a disk page. • Each tuple is assumed to have a unique identifier, RID.

  5. R-Tree: Leaf Nodes • Leaf nodes contain index records: • (I, tuple-identifier) • tuple-identifier is RID, • I is an n-dimensional rectangle that bounds the indexed spatial object • I = (I0, I1, …, In-1) where n is the number of dimensions. • Ii is a closed bounded interval [a,b] describing the extent of the object along dimension i. • Values for a and b might be infinity, indicating an unbounded object along dimension i.
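
The leaf index record described on this slide can be sketched in Python as follows. This is a minimal illustration, not code from the paper: the names `Rect`, `LeafEntry`, and `overlaps` are assumptions introduced here, and an unbounded side is modelled with `float("inf")`.

```python
from dataclasses import dataclass
from typing import Tuple

Interval = Tuple[float, float]   # closed interval [a, b] along one dimension


@dataclass(frozen=True)
class Rect:
    """n-dimensional rectangle I = (I0, ..., In-1); hypothetical helper type."""
    intervals: Tuple[Interval, ...]

    def __post_init__(self):
        for a, b in self.intervals:
            if a > b:
                raise ValueError(f"empty interval [{a}, {b}]")

    def overlaps(self, other: "Rect") -> bool:
        # Closed intervals overlap when neither lies strictly past the other.
        return all(a1 <= b2 and a2 <= b1
                   for (a1, b1), (a2, b2) in zip(self.intervals, other.intervals))


@dataclass(frozen=True)
class LeafEntry:
    rect: Rect   # I: rectangle that bounds the indexed spatial object
    rid: int     # tuple-identifier (RID)


# A 2-D object that is unbounded to the right along dimension 0.
strip = LeafEntry(Rect(((3.0, float("inf")), (1.0, 2.0))), rid=42)
query = Rect(((100.0, 200.0), (1.5, 1.8)))
print(strip.rect.overlaps(query))   # True
```

Storing one (a, b) pair per dimension keeps the overlap test a simple per-dimension comparison, which is all that searching needs.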

  6. R-Tree: Non-leaf nodes • Non-leaf nodes contain entries of the form: • (I, child-pointer) • Child-pointer is the address of a lower node in the R-Tree. • I covers all rectangles in the lower node’s entries.

  7. R-Tree: A 2-D (n=2) Example

  8-12. R-Tree: Non-leaf nodes (questions) • What is the child-pointer? A disk page address! • How about I? What is it? An n-dimensional rectangle: I = (I0, I1, …, In-1)

  13. R-tree: Properties • Assume: • M = maximum number of entries in a node. • m = minimum number of entries in a node, with m <= M/2. • N = number of index records. • An R-tree has the following properties: • Every leaf node contains between m and M index records, unless it is the root. • For each index record (I, tuple-identifier) in a leaf node, I is the smallest rectangle that spatially contains the n-dimensional data object represented by the indicated tuple. • Every non-leaf node has between m and M children, unless it is the root. • For each entry (I, child-pointer) in a non-leaf node, I is the smallest rectangle that spatially contains the rectangles in the child node. • The root node has at least two children unless it is a leaf. • All leaves appear on the same level. • The height of a tree with N index records is at most ceiling(log_m N) - 1. • Worst-case space utilization for all nodes except the root is m/M.
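
As a worked example of the last two properties, the sketch below evaluates the height bound, read as ceiling(log base m of N) minus 1, with integer arithmetic. The function name `max_height` and the sample values M = 50, m = 25, N = 1,000,000 are assumptions chosen for illustration.

```python
def max_height(n_records: int, m: int) -> int:
    """Upper bound ceiling(log_m N) - 1 on the tree height (hypothetical helper)."""
    if m < 2 or n_records < 2:
        raise ValueError("need m >= 2 and N >= 2")
    levels, capacity = 0, 1
    while capacity < n_records:   # smallest k with m**k >= N equals ceiling(log_m N)
        capacity *= m
        levels += 1
    return levels - 1


M, m, N = 50, 25, 1_000_000
print(max_height(N, m))   # 4
print(m / M)              # 0.5, the worst-case utilization m/M
```

Integer multiplication is used instead of `math.log` so that exact powers of m are not misrounded by floating-point error.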

  14. Searching • Descend from root to leaf in a B+-tree manner. If multiple sub-trees overlap the search rectangle, follow all of them. • Assume: EI denotes the rectangle part of an index entry E, and Ep denotes its tuple-identifier or child-pointer. • Search (T: Root of the R-tree, S: Search Rectangle) • If T is not a leaf, check each entry E to determine whether EI overlaps S. For all overlapping entries, invoke Search on the subtree whose root is pointed to by Ep. • If T is a leaf, check all entries E to determine whether EI overlaps S. If so, E is a qualifying record.
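
The Search procedure on this slide can be sketched as a short recursive function. This is an in-memory illustration under stated assumptions: `Node`, `overlaps`, and the tuple-based rectangle type are names introduced here, and child pointers are Python references rather than disk page addresses.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

Rect = Tuple[Tuple[float, float], ...]   # one (low, high) pair per dimension


def overlaps(a: Rect, b: Rect) -> bool:
    return all(lo1 <= hi2 and lo2 <= hi1
               for (lo1, hi1), (lo2, hi2) in zip(a, b))


@dataclass
class Node:
    is_leaf: bool
    # Each entry is (E_I, E_p): E_p is a RID in a leaf, a child Node otherwise.
    entries: List[Tuple[Rect, Union[int, "Node"]]] = field(default_factory=list)


def search(t: Node, s: Rect) -> List[int]:
    hits: List[int] = []
    for e_i, e_p in t.entries:
        if not overlaps(e_i, s):
            continue
        if t.is_leaf:
            hits.append(e_p)               # E is a qualifying record
        else:
            hits.extend(search(e_p, s))    # follow every overlapping subtree
    return hits


leaf_a = Node(True, [(((0, 2), (0, 2)), 1), (((3, 5), (1, 4)), 2)])
leaf_b = Node(True, [(((4, 8), (3, 9)), 3), (((9, 10), (0, 1)), 4)])
root = Node(False, [(((0, 5), (0, 4)), leaf_a), (((4, 10), (0, 9)), leaf_b)])
print(search(root, ((4, 6), (3, 5))))   # [2, 3]
```

In the usage example both root entries overlap the query, so both subtrees are visited. This is the case where an R-tree search cannot follow a single path the way a B+-tree lookup does.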

  15. Insertion • Similar to B-trees, new index records are added to the leaves, nodes that overflow are split, and splits propagate up the tree. • Insert (T: Root of the R-tree, E: new index entry) • Find position for new record: Invoke ChooseLeaf to select a leaf node L in which to place E. • Add record to leaf node: If L has room for E then insert E and return. Otherwise, invoke SplitNode to obtain L and LL containing E and all the old entries of L. • Propagate changes upwards: Invoke AdjustTree on L, also passing LL if a split was performed. • Grow tree taller: If node split propagation caused the root to split, create a new root whose children are the two resulting nodes.

  16. Insertion: ChooseLeaf • ChooseLeaf (E: new index entry) • Step 1, Initialize: Set N to be the root node. • Step 2, Leaf check: If N is a leaf, return N. • Step 3, Choose subtree: Let F be the entry in N whose rectangle FI needs least enlargement to include E. Resolve ties by choosing the entry with the rectangle of smallest area. • Step 4, Descend until a leaf is reached: Set N to be the child node pointed to by Fp and repeat from step 2.
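
A minimal sketch of ChooseLeaf, assuming an in-memory tree in which child pointers are Python references. The names `Node`, `area`, `union`, and `choose_leaf` are introduced here for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

Rect = Tuple[Tuple[float, float], ...]   # one (low, high) pair per dimension


def area(r: Rect) -> float:
    result = 1.0
    for lo, hi in r:
        result *= hi - lo
    return result


def union(a: Rect, b: Rect) -> Rect:
    # Smallest rectangle that covers both a and b.
    return tuple((min(lo1, lo2), max(hi1, hi2))
                 for (lo1, hi1), (lo2, hi2) in zip(a, b))


@dataclass
class Node:
    is_leaf: bool
    entries: List[Tuple[Rect, Union[int, "Node"]]] = field(default_factory=list)


def choose_leaf(root: Node, e_i: Rect) -> Node:
    n = root
    while not n.is_leaf:
        # Key is (enlargement needed, current area), so ties go to the smaller rectangle.
        f_i, f_p = min(
            n.entries,
            key=lambda f: (area(union(f[0], e_i)) - area(f[0]), area(f[0])),
        )
        n = f_p
    return n


leaf_a = Node(True, [(((0, 5), (0, 4)), 1)])
leaf_b = Node(True, [(((4, 10), (0, 9)), 2)])
root = Node(False, [(((0, 5), (0, 4)), leaf_a), (((4, 10), (0, 9)), leaf_b)])
print(choose_leaf(root, ((6, 7), (1, 2))) is leaf_b)   # True
```

The new rectangle already lies inside the second root entry, so that entry needs zero enlargement and its subtree is chosen.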

  17. SplitNode: Node Splitting • A full node contains M entries. • Divide the collection of M+1 entries between 2 nodes. • Objective: Make it as unlikely as possible for the resulting two new nodes to be examined on subsequent searches. • Heuristic: The total area of the two covering rectangles after a split should be minimized. • (Figure annotation: Total area is larger!)

  19. Node Splitting: How? • How to find the minimum area node split? • Exhaustive algorithm, • Quadratic-cost algorithm, • Linear cost algorithm.

  21. Exhaustive Algorithm • Generate all possible groupings and choose the one with minimum area. • Number of possibilities ~ 2 to the power of M-1 • M ~ 50 → number of possibilities ~ 600 trillion • US deficit pales!
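
The arithmetic behind the slide's estimate can be checked directly. The snippet below only evaluates 2 to the power of M-1 for M = 50; the variable names are chosen here for illustration.

```python
M = 50
possibilities = 2 ** (M - 1)
print(f"{possibilities:,}")   # 562,949,953,421,312
```

That is about 563 trillion, which the slide rounds to roughly 600 trillion.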

  22. Quadratic-Cost Algorithm • A heuristic to find a small-area split. • Cost is quadratic in M and linear in the number of dimensions. • Pick two of the M+1 entries to be the first elements of the two new groups. Choose the pair that would waste the most area if both were put in the same group. • Assign the remaining entries to groups one at a time.
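
The seed-picking step of the quadratic algorithm can be sketched as below. It covers only the choice of the two seeds, not the later assignment of the remaining entries. Wasted area is taken to be the area of the pair's covering rectangle minus the areas of the two rectangles themselves, and the names `area`, `union`, and `pick_seeds_quadratic` are assumptions of this sketch.

```python
from itertools import combinations
from typing import List, Tuple

Rect = Tuple[Tuple[float, float], ...]   # one (low, high) pair per dimension


def area(r: Rect) -> float:
    result = 1.0
    for lo, hi in r:
        result *= hi - lo
    return result


def union(a: Rect, b: Rect) -> Rect:
    return tuple((min(lo1, lo2), max(hi1, hi2))
                 for (lo1, hi1), (lo2, hi2) in zip(a, b))


def pick_seeds_quadratic(rects: List[Rect]) -> Tuple[int, int]:
    """Return the indices of the pair that wastes the most area when grouped."""
    def waste(pair: Tuple[int, int]) -> float:
        i, j = pair
        return area(union(rects[i], rects[j])) - area(rects[i]) - area(rects[j])

    # Examining every pair is what makes this step quadratic in the entry count.
    return max(combinations(range(len(rects)), 2), key=waste)


rects = [((0, 1), (0, 1)), ((1, 2), (0, 1)), ((8, 9), (8, 9)), ((0, 2), (5, 6))]
print(pick_seeds_quadratic(rects))   # (0, 2)
```

Entries 0 and 2 sit in opposite corners, so covering both with a single rectangle would leave the largest empty region. They become the first elements of the two groups.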

  25. Linear Cost Algorithm • Identical to Quadratic with the following differences: • Uses a different version of PickSeeds. • PickNext simply chooses any of the remaining entries. • Linear: Choose two objects that are furthest apart. • Quadratic: Choose two objects that create as much empty space as possible.
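
One way to make "furthest apart" concrete in a single pass per dimension is sketched below. It follows the usual description of linear PickSeeds: along each dimension, take the rectangle with the highest low side and the one with the lowest high side, normalize their separation by the width of the whole set, and keep the best pair. The function name and the fallback for degenerate input are assumptions of this sketch.

```python
from typing import List, Tuple

Rect = Tuple[Tuple[float, float], ...]   # one (low, high) pair per dimension


def pick_seeds_linear(rects: List[Rect]) -> Tuple[int, int]:
    best = None   # (normalized separation, seed index, seed index)
    ids = range(len(rects))
    for d in range(len(rects[0])):
        hi_low = max(ids, key=lambda i: rects[i][d][0])    # highest low side
        lo_high = min(ids, key=lambda i: rects[i][d][1])   # lowest high side
        width = max(r[d][1] for r in rects) - min(r[d][0] for r in rects)
        if width == 0 or hi_low == lo_high:
            continue   # no usable separation along this dimension
        sep = (rects[hi_low][d][0] - rects[lo_high][d][1]) / width
        if best is None or sep > best[0]:
            best = (sep, lo_high, hi_low)
    if best is None:
        return 0, 1    # degenerate input; arbitrary fallback (an assumption)
    return best[1], best[2]


rects = [((0, 1), (0, 1)), ((1, 2), (0, 1)), ((8, 9), (3, 4)), ((0, 2), (5, 6))]
print(pick_seeds_linear(rects))   # (0, 2)
```

Each dimension is scanned a constant number of times, so the cost grows linearly with the number of entries, in contrast to the pairwise comparison used by the quadratic version.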

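  26. Comparison • Linear node-split is simple and fast, and search performance with it is about as good as with quadratic! • The quality of its splits is slightly worse, but this does not noticeably affect search performance.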

  28. AdjustTree • Ascend from a leaf node L to the root, adjusting covering rectangles and propagating node splits.

  29. Deletes • Straightforward. The only complication is underflow: • An under-full node could be merged with whichever sibling will have its area increased least. • Guttman's algorithm instead removes the under-full node and inserts its orphaned entries back into the R-Tree.

  30. R-Tree

  31. R-tree Variations • R+-tree enhances retrieval performance by avoiding visits to multiple paths when answering point queries. • No overlap between minimum bounding rectangles at the same level. • A specific object’s entry might be duplicated. • Insertions might lead to a series of update operations in a chain reaction. • Under certain circumstances, the structure may lead to a deadlock, e.g., every rectangle encloses a smaller one.

  32. R*-tree [1990] • Node split is more sophisticated. • Does not obey the limitation of the number of pairs per node. • When a node overflows, p entries are extracted and reinserted in the tree (p might be 25%). • Considers minimization of: • the overlapping between minimum bounding rectangles at the same level. • the perimeter of the produced minimum bounding rectangles. • Insertion is more expensive while retrievals are faster.

  33. Static R-trees • Assumes the dataset is known in advance. • Static R-trees are more efficient than dynamic ones: • Tree structure is more compact, • Contains fewer nodes, • Overlap between minimum bounding rectangles is reduced.

  34. Summary • R-tree is a spatial index structure that provides competitive average performance. • Many different variations in the literature: • Spatio-temporal access methods, 3-d R-tree. • Historical R-trees and Time-Parameterized R-tree for spatiotemporal applications. • Have been used to speed up operations in OLAP applications, data warehouses and data mining.
