
R-TREES: A Dynamic Index Structure for Spatial Searching by A. Guttman, SIGMOD 1984.



Shahram Ghandeharizadeh

Computer Science Department

University of Southern California



Motivating Example

  • Type in your street address in Google



Example (Cont…)

  • Show me all the pizza places close by:



Terminology

  • Example query is termed a spatial query.

  • R-tree is a spatial index structure.

    • K-D-B trees are useful for point data only.

      • Exact-point lookup!

        • Show me the USC Salvatori Computer Science building.

    • R-tree represents data objects by intervals in several dimensions.

      • Exact-point and range lookups!

        • Show me all pizza places within a 2-mile radius of the USC Salvatori Computer Science building.

  • R-tree is:

    • A height-balanced tree similar to a B-tree, with index records in its leaf nodes containing pointers to data objects.

    • A node is a disk page.

    • Assumes each tuple has a unique identifier, RID.



R-Tree: Leaf Nodes

  • Leaf nodes contain index records:

    • (I, tuple-identifier)

  • tuple-identifier is the RID.

  • I is an n-dimensional rectangle that bounds the indexed spatial object.

  • I = (I_0, I_1, …, I_{n-1}) where n is the number of dimensions.

  • I_i is a closed bounded interval [a, b] describing the extent of the object along dimension i.

  • Values for a and b might be infinity, indicating an unbounded object along dimension i.
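
A leaf index record can be sketched in Python as follows. This is only an illustration under assumed names (`LeafEntry`, `bounding_rectangle`, and the RID values are hypothetical, not from the paper): a rectangle is a tuple of closed intervals, one per dimension.

```python
import math
from dataclasses import dataclass
from typing import Tuple

# One closed interval [a, b] per dimension (illustrative representation).
Interval = Tuple[float, float]

@dataclass(frozen=True)
class LeafEntry:
    I: Tuple[Interval, ...]   # n-dimensional bounding rectangle
    rid: int                  # tuple-identifier (RID), hypothetical values below

def bounding_rectangle(points):
    """Smallest axis-aligned rectangle containing all the given points."""
    dims = range(len(points[0]))
    return tuple((min(p[i] for p in points), max(p[i] for p in points))
                 for i in dims)

# A 2-D object (n = 2) described by three of its points.
entry = LeafEntry(I=bounding_rectangle([(1, 4), (3, 2), (2, 6)]), rid=42)

# An object that is unbounded along dimension 1.
strip = LeafEntry(I=((0.0, 5.0), (-math.inf, math.inf)), rid=43)
```

Here `entry.I` is `((1, 3), (2, 6))`: the interval along dimension 0 followed by the interval along dimension 1.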



R-Tree: Non-leaf nodes

  • Non-leaf nodes contain entries of the form:

  • (I, child-pointer)

  • Child-pointer is the address of a lower node in the R-Tree.

  • I covers all rectangles in the lower node’s entries.
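
As a rough sketch of how I in a non-leaf entry relates to the lower node, the snippet below computes the covering rectangle of a child's entries. The helper name `cover` and the page number 17 are assumptions made for illustration.

```python
def cover(rects):
    """Smallest rectangle covering every rectangle in rects."""
    n = len(rects[0])
    return tuple((min(r[i][0] for r in rects), max(r[i][1] for r in rects))
                 for i in range(n))

# Rectangles stored in the entries of a lower node (2-D, illustrative data).
child_rects = [((0, 2), (0, 1)), ((1, 4), (3, 5)), ((2, 3), (1, 2))]

# Non-leaf entry (I, child-pointer); 17 stands in for a disk page address.
non_leaf_entry = (cover(child_rects), 17)
```

For this data the covering rectangle is `((0, 4), (0, 5))`.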



R-Tree: A 2-D (n=2) Example


R-Tree: Non-leaf nodes (Cont…)

  • Questions?

    • What is the child-pointer? A disk page address!

    • How about I? What is it? An n-dimensional rectangle: I = (I_0, I_1, …, I_{n-1})



R-tree: Properties

  • Assume:

    • M = Maximum number of entries in a node.

    • m = Minimum number of entries in a node, with m <= M/2.

    • N = Number of records

  • R-tree has the following properties:

    • Every leaf node contains between m and M index records. Root node is the exception.

    • For each index record (I, tuple-identifier) in a leaf node, I is the smallest rectangle that spatially contains the n dimensional data object represented in the indicated tuple.

    • Every non-leaf node has between m and M children. Root node is the exception.

    • For each entry (I, child-pointer) in a non-leaf node, I is the smallest rectangle that spatially contains the rectangles in the child node.

    • The root node has at least two children unless it is a leaf.

    • All leaves appear on the same level.

    • Height of a tree containing N records is at most Ceiling(log_m N) - 1.

    • Worst case utilization for all nodes except the root is m/M.
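
The height bound and the worst-case utilization can be checked with a few lines of Python. This is a small worked example with assumed values (M = 50, N = 1,000,000); the function name `max_height` is hypothetical.

```python
def max_height(N, m):
    """Ceiling(log_m N) - 1, computed with integers to avoid rounding errors.

    Assumes N > 1 and m >= 2.
    """
    levels, capacity = 0, 1
    while capacity < N:      # smallest `levels` with m ** levels >= N
        capacity *= m
        levels += 1
    return levels - 1

M = 50        # assumed maximum number of entries per node
m = M // 2    # largest permitted minimum, since m <= M/2

height_bound = max_height(1_000_000, m)   # 4, because 25**4 < 10**6 <= 25**5
worst_case_utilization = m / M            # 0.5
```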


Searching

  • Descend from root to leaf in a B+-tree manner.

  • If multiple sub-trees contain the point of interest then follow all.

  • Assume:

    • E_I denotes the rectangle part of an index entry E,

    • E_p denotes the tuple-identifier or child-pointer.

  • Search (T: Root of the R-tree, S: Search Rectangle)

    • If T is not a leaf, check each entry E to determine whether E_I overlaps S. For all overlapping entries, invoke Search(E_p, S).

    • If T is a leaf, check all entries E to determine whether E_I overlaps S. If so, E is a qualifying record.
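
The Search procedure can be sketched in Python as below. It is a minimal in-memory illustration, not a disk-based implementation: the `Node` class, the `overlaps` helper, and the sample tree are assumptions, and child-pointers are plain object references rather than disk page addresses.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    leaf: bool
    entries: list = field(default_factory=list)   # (E_I, E_p) pairs

def overlaps(a, b):
    """True if closed rectangles a and b intersect in every dimension."""
    return all(lo1 <= hi2 and lo2 <= hi1
               for (lo1, hi1), (lo2, hi2) in zip(a, b))

def search(T, S):
    """Return the tuple-identifiers of all records whose rectangle overlaps S."""
    hits = []
    for E_I, E_p in T.entries:
        if overlaps(E_I, S):
            if T.leaf:
                hits.append(E_p)                # E is a qualifying record
            else:
                hits.extend(search(E_p, S))     # follow every overlapping subtree
    return hits

# A two-level tree over four 2-D rectangles (illustrative data).
leaf1 = Node(True, [(((0, 1), (0, 1)), "r1"), (((2, 3), (2, 3)), "r2")])
leaf2 = Node(True, [(((5, 6), (5, 6)), "r3"), (((2, 4), (0, 1)), "r4")])
root = Node(False, [(((0, 3), (0, 3)), leaf1), (((2, 6), (0, 6)), leaf2)])

S = ((2.5, 3.5), (0.5, 2.5))
result = search(root, S)   # both subtrees overlap S, so both are visited
```

In this example `result` is `["r2", "r4"]`; the query has to descend into both leaves because both covering rectangles overlap S.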



Insertion

  • Similar to B-trees, new index records are added to the leaves, nodes that overflow are split, and splits propagate up the tree.

  • Insert (T: Root of the R-tree, E: new index entry)

    • Find position for new record: Invoke ChooseLeaf to select a leaf node L in which to place E.

    • Add record to leaf node: If L has room for E then install E. Otherwise, invoke SplitNode to obtain L and LL containing E and all the old entries of L.

    • Propagate changes upwards: Invoke AdjustTree on L, also passing LL if a split was performed.

    • Grow tree taller: If node split propagation caused the root to split, create a new root whose children are the two resulting nodes.


Insertion: ChooseLeaf

  • ChooseLeaf (E: new index entry)

    1. Initialize: Set N to be the root node.

    2. Leaf check: If N is a leaf, return N.

    3. Choose subtree: Let F be the entry in N whose rectangle F_I needs least enlargement to include E_I. Resolve ties by choosing the entry with the rectangle of smallest area.

    4. Descend until a leaf is reached: Set N to be the child node pointed to by F_p and repeat from step 2.
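
The ChooseLeaf steps can be sketched in Python as follows. The `Node` class, the helpers `area` and `union`, and the sample tree are assumptions for illustration; child-pointers are object references rather than disk page addresses.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    leaf: bool
    entries: list = field(default_factory=list)   # (F_I, F_p) pairs

def area(r):
    """Area (volume in n dimensions) of a rectangle given as (lo, hi) intervals."""
    result = 1
    for lo, hi in r:
        result *= hi - lo
    return result

def union(a, b):
    """Smallest rectangle covering both a and b."""
    return tuple((min(l1, l2), max(h1, h2)) for (l1, h1), (l2, h2) in zip(a, b))

def choose_leaf(root, E_I):
    N = root                                    # Initialize
    while not N.leaf:                           # Leaf check
        # Choose subtree: least enlargement first, smallest area on ties.
        F_I, F_p = min(
            N.entries,
            key=lambda F: (area(union(F[0], E_I)) - area(F[0]), area(F[0])),
        )
        N = F_p                                 # Descend
    return N

leaf_a = Node(True, [(((0, 3), (0, 3)), "r1")])
leaf_b = Node(True, [(((4, 8), (3, 6)), "r2")])
root = Node(False, [(((0, 3), (0, 3)), leaf_a), (((4, 8), (3, 6)), leaf_b)])

# The new rectangle already lies inside leaf_b's covering rectangle,
# so that entry needs zero enlargement and is chosen.
chosen = choose_leaf(root, ((4, 5), (4, 5)))
```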


SplitNode: Node Splitting

  • A full node contains M entries. Divide the collection of M+1 entries between 2 nodes.

  • Objective: Make it as unlikely as possible for both of the resulting nodes to be examined on subsequent searches.

  • Heuristic: The total area of the two covering rectangles after a split should be minimized.

  • (Figure callout) Total area is larger!
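
To make the area heuristic concrete, the sketch below compares two ways of dividing the same four rectangles. The data and the helper names (`area`, `cover`, `total_area`) are made up for illustration.

```python
def area(r):
    """Area (volume in n dimensions) of a rectangle given as (lo, hi) intervals."""
    result = 1
    for lo, hi in r:
        result *= hi - lo
    return result

def cover(rects):
    """Smallest rectangle covering every rectangle in rects."""
    n = len(rects[0])
    return tuple((min(r[i][0] for r in rects), max(r[i][1] for r in rects))
                 for i in range(n))

def total_area(group1, group2):
    """Sum of the areas of the two covering rectangles after a split."""
    return area(cover(group1)) + area(cover(group2))

# Four unit squares: two near x = 0 and two near x = 9.
a, b = ((0, 1), (0, 1)), ((0, 1), (2, 3))
c, d = ((9, 10), (0, 1)), ((9, 10), (2, 3))

good = total_area([a, b], [c, d])   # neighbours grouped together: 3 + 3 = 6
bad = total_area([a, c], [b, d])    # distant squares paired: 10 + 10 = 20
```

The second grouping has the larger total area, so a search rectangle is more likely to overlap both of its covering rectangles.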



Node Splitting: How?

  • How to find the minimum area node split?

    • Exhaustive algorithm,

    • Quadratic-cost algorithm,

    • Linear cost algorithm.


Exhaustive Algorithm

  • Generate all possible groupings and choose the one with minimum area.

  • Number of possibilities ~ 2^(M-1)

    • M ~ 50 → Number of possibilities ~ 600 Trillion

    • US deficit pales!
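
A brute-force version is easy to write for a handful of entries, which also shows why it does not scale. The sketch below is an illustration under simplifying assumptions: the function name `exhaustive_split` is hypothetical, and the minimum fill m per node is ignored.

```python
from itertools import combinations

def area(r):
    """Area (volume in n dimensions) of a rectangle given as (lo, hi) intervals."""
    result = 1
    for lo, hi in r:
        result *= hi - lo
    return result

def cover(rects):
    """Smallest rectangle covering every rectangle in rects."""
    n = len(rects[0])
    return tuple((min(r[i][0] for r in rects), max(r[i][1] for r in rects))
                 for i in range(n))

def exhaustive_split(rects):
    """Return (total_area, group1, group2) with minimum total covering area."""
    others = range(1, len(rects))
    best = None
    # Keep entry 0 in group 1 so that each grouping is generated only once.
    for k in range(len(rects) - 1):
        for extra in combinations(others, k):
            g1 = [rects[0]] + [rects[i] for i in extra]
            g2 = [rects[i] for i in others if i not in extra]
            cost = area(cover(g1)) + area(cover(g2))
            if best is None or cost < best[0]:
                best = (cost, g1, g2)
    return best

possibilities = 2 ** (50 - 1)   # 562,949,953,421,312 for M = 50
```

The number of groupings examined doubles with every additional entry, which is why the heuristics on the following slides are used instead.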


Quadratic-Cost algorithm

  • A heuristic to find a small-area split.

  • Cost is quadratic in M and linear in the number of dimensions.

  • Pick two of the M+1 entries to be the first elements of the two new groups.

    • Choose the pair that would waste the most area if both were put in the same group.

  • Assign remaining entries to groups one at a time.
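
The quadratic heuristic might look roughly like this in Python. It is a simplified sketch, not a faithful reproduction of the paper's procedure: the seed choice follows the "most wasted area" rule from the slide, the next entry is the one with the strongest preference for one group, and the minimum-fill rule and the finer tie-breaks are left out. All function names are illustrative.

```python
from itertools import combinations

def area(r):
    """Area (volume in n dimensions) of a rectangle given as (lo, hi) intervals."""
    result = 1
    for lo, hi in r:
        result *= hi - lo
    return result

def union(a, b):
    """Smallest rectangle covering both a and b."""
    return tuple((min(l1, l2), max(h1, h2)) for (l1, h1), (l2, h2) in zip(a, b))

def pick_seeds(rects):
    """Indices of the pair that wastes the most area if grouped together."""
    def waste(pair):
        i, j = pair
        return area(union(rects[i], rects[j])) - area(rects[i]) - area(rects[j])
    return max(combinations(range(len(rects)), 2), key=waste)

def quadratic_split(rects):
    i, j = pick_seeds(rects)
    groups = [[rects[i]], [rects[j]]]
    covers = [rects[i], rects[j]]
    rest = [r for k, r in enumerate(rects) if k not in (i, j)]

    def growth(r):
        # Enlargement each group's covering rectangle needs to include r.
        return [area(union(c, r)) - area(c) for c in covers]

    while rest:
        # Next entry: the one with the greatest preference for one group.
        r = max(rest, key=lambda x: abs(growth(x)[0] - growth(x)[1]))
        rest.remove(r)
        d = growth(r)
        g = 0 if d[0] <= d[1] else 1     # simplified tie-break (assumption)
        groups[g].append(r)
        covers[g] = union(covers[g], r)
    return groups
```

Evaluating every pair in `pick_seeds` is what makes the cost quadratic in the number of entries.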


Linear Cost Algorithm

  • Identical to Quadratic with the following differences:

    • Uses a different version of PickSeeds.

    • PickNext simply chooses any of the remaining entries.

  • PickSeeds compared:

    • Linear: Choose two objects that are furthest apart.

    • Quadratic: Choose two objects that create as much empty space as possible.
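
One way to realize "furthest apart" in a single pass per dimension is sketched below: along each dimension, take the rectangle with the highest low side and the one with the lowest high side, normalize their separation by the width of the whole set, and keep the best pair. The function name and the fallback for degenerate input are assumptions.

```python
def linear_pick_seeds(rects):
    """Indices of two rectangles with the greatest normalized separation."""
    ids = range(len(rects))
    best = None
    for d in range(len(rects[0])):
        hi_low = max(ids, key=lambda k: rects[k][d][0])    # highest low side
        lo_high = min(ids, key=lambda k: rects[k][d][1])   # lowest high side
        width = max(r[d][1] for r in rects) - min(r[d][0] for r in rects)
        if hi_low == lo_high or width == 0:
            continue                                       # no usable pair here
        separation = (rects[hi_low][d][0] - rects[lo_high][d][1]) / width
        if best is None or separation > best[0]:
            best = (separation, lo_high, hi_low)
    if best is None:
        return 0, 1          # fallback for degenerate input (assumption)
    return best[1], best[2]

a, b = ((0, 1), (0, 1)), ((0, 1), (2, 3))
c, d = ((9, 10), (0, 1)), ((9, 10), (2, 3))
seeds = linear_pick_seeds([a, b, c, d])   # (0, 2): a and c, separated along x
```

Each dimension is scanned a constant number of times, so the cost grows linearly with the number of entries and with the number of dimensions.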



Comparison

  • Linear node-split is simple, fast, and nearly as good as quadratic!

    • Quality of the splits is slightly worse, with little noticeable effect on search performance.



AdjustTree

  • Ascend from a leaf node L to the root, adjusting covering rectangles and propagating node splits.
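
The covering-rectangle half of AdjustTree can be sketched as below. Split propagation is deliberately omitted, and the `Node` class with a `parent` reference is an assumption made to keep the example short.

```python
class Node:
    def __init__(self, leaf, parent=None):
        self.leaf = leaf
        self.parent = parent
        self.entries = []        # [rectangle, pointer] lists, so rectangles can be updated

def cover(rects):
    """Smallest rectangle covering every rectangle in rects."""
    n = len(rects[0])
    return tuple((min(r[i][0] for r in rects), max(r[i][1] for r in rects))
                 for i in range(n))

def adjust_covering_rectangles(L):
    """Ascend from leaf L to the root, tightening each parent entry's rectangle."""
    N = L
    while N.parent is not None:              # the root has no parent entry
        P = N.parent
        for entry in P.entries:
            if entry[1] is N:
                entry[0] = cover([e[0] for e in N.entries])
        N = P

root = Node(leaf=False)
leaf = Node(leaf=True, parent=root)
leaf.entries.append([((0, 1), (0, 1)), "r1"])
root.entries.append([((0, 1), (0, 1)), leaf])

# Insert a new record into the leaf, then propagate the change upwards.
leaf.entries.append([((3, 4), (2, 5)), "r2"])
adjust_covering_rectangles(leaf)
```

After the call, the root's entry for the leaf covers both records: `((0, 4), (0, 5))`.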



Deletes

  • Straightforward. The only complication is under-flows:

  • An under-full node can be merged with whichever sibling will have its area increased least.

  • Alternatively, the under-full node is eliminated and its orphaned entries are inserted back into the R-Tree.



R-Tree



R-tree Variations

  • R+-tree enhances retrieval performance by avoiding visits to multiple paths when processing point queries.

    • No overlap between minimum bounding rectangles at the same level.

    • Specific object’s entry might be duplicated.

    • Insertions might lead to a series of update operations in a chain-reaction.

    • Under certain circumstances, the structure may lead to a deadlock, e.g., every rectangle encloses a smaller one.



R*-tree [1990]

  • Node split is more sophisticated.

    • Does not obey the limitation of the number of pairs per node.

    • When a node overflows, p entries are extracted and reinserted in the tree (p might be 25%).

    • Considers minimization of:

      • the overlap between minimum bounding rectangles at the same level.

      • the perimeter of the produced minimum bounding rectangles.

  • Insertion is more expensive while retrievals are faster.
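
The two quantities named above can be computed as in the following sketch. The helper names are illustrative, and how the R*-tree combines these measures when choosing a split is not shown here.

```python
def perimeter(r):
    """Perimeter of a 2-D rectangle; in n dimensions, twice the sum of the side lengths."""
    return 2 * sum(hi - lo for lo, hi in r)

def overlap_area(a, b):
    """Area shared by rectangles a and b, or 0 if they do not intersect."""
    result = 1
    for (l1, h1), (l2, h2) in zip(a, b):
        side = min(h1, h2) - max(l1, l2)
        if side <= 0:
            return 0
        result *= side
    return result

mbr1 = ((0, 4), (0, 4))
mbr2 = ((2, 6), (1, 3))

shared = overlap_area(mbr1, mbr2)   # 2 * 2 = 4
edge = perimeter(((0, 4), (0, 2)))  # 2 * (4 + 2) = 12
```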



Static R-trees

  • Assumes the dataset is known in advance.

  • Static R-trees are more efficient than dynamic ones:

    • Tree structure is more compact,

    • Contains fewer nodes,

    • Overlap between minimum bounding rectangles is reduced.



Summary

  • R-tree is a spatial index structure that provides competitive average performance.

  • Many different variations in the literature:

    • Spatio-temporal access methods, 3-d R-tree.

    • Historical R-trees and Time-Parameterized R-tree for spatiotemporal applications.

  • Have been used to speed-up operations in OLAP applications, data warehouses and data mining.

