
R-TREES: A Dynamic Index Structure for Spatial Searching by A. Guttman, SIGMOD 1984.


Shahram Ghandeharizadeh

Computer Science Department

University of Southern California

- Type in your street address in Google, then ask: "Show me all the pizza places close by."
- This example query is termed a spatial query.
- R-tree is a spatial index structure.
- K-D-B trees are useful for point data only, i.e., exact-point lookups such as "Show me the USC Salvatori Computer Science building."
- R-tree represents data objects as intervals in several dimensions, so it supports both exact-point and range lookups, e.g., "Show me all pizza places within a 2-mile radius of the USC Salvatori Computer Science building."
- R-tree is:
- A height-balanced tree similar to B-tree with index records in its leaf nodes containing pointers to data objects.
- A node is a disk page.
- Assumes each tuple has a unique identifier, RID.

- Leaf nodes contain index records of the form:
- (I, tuple-identifier)

- tuple-identifier is the RID.
- I is an n-dimensional rectangle that bounds the indexed spatial object.
- I = (I_0, I_1, …, I_{n-1}), where n is the number of dimensions.
- I_i is a closed bounded interval [a, b] describing the extent of the object along dimension i.
- Values for a and b might be infinity, indicating an unbounded object along dimension i.
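
A minimal sketch of how such a leaf index record might be represented. The names `Rect`, `overlaps`, `area`, and the sample RID are illustrative assumptions, not part of the original slides.

```python
import math

# A rectangle is one closed interval (lo, hi) per dimension.
Rect = tuple[tuple[float, float], ...]

def overlaps(a: Rect, b: Rect) -> bool:
    """True if the closed rectangles a and b share at least one point."""
    return all(alo <= bhi and blo <= ahi
               for (alo, ahi), (blo, bhi) in zip(a, b))

def area(r: Rect) -> float:
    """Product of the interval lengths (infinite for unbounded objects)."""
    return math.prod(hi - lo for lo, hi in r)

building = ((2.0, 4.0), (1.0, 3.0))           # bounded in both dimensions
strip = ((-math.inf, math.inf), (2.5, 2.6))   # unbounded along dimension 0
leaf_entry = (building, "RID-17")             # (I, tuple-identifier), hypothetical RID
```

Using plain tuples keeps the rectangle immutable and lets infinity stand in for an unbounded side without special cases.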

- Non-leaf nodes contain entries of the form:
- (I, child-pointer)
- Child-pointer is the address of a lower node in the R-Tree.
- I covers all rectangles in the lower node’s entries.

- Questions?
- What is the child-pointer? A disk page address.
- What is I? An n-dimensional rectangle: I = (I_0, I_1, …, I_{n-1}).

- Assume:
- M = Maximum number of entries in a node.
- m <= M/2 = Minimum number of entries in a node.
- N = Number of records.

- R-tree has the following properties:
- Every leaf node contains between m and M index records. Root node is the exception.
- For each index record (I, tuple-identifier) in a leaf node, I is the smallest rectangle that spatially contains the n dimensional data object represented in the indicated tuple.
- Every non-leaf node has between m and M children. Root node is the exception.
- For each entry (I, child-pointer) in a non-leaf node, I is the smallest rectangle that spatially contains the rectangles in the child node.
- The root node has at least two children unless it is a leaf.
- All leaves appear on the same level.
- Height of the tree is at most ceiling(log_m N) - 1.
- Worst case utilization for all nodes except the root is m/M.
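
As a rough illustration of the height bound, the following sketch computes ceiling(log_m N) - 1 with integer arithmetic. The function name `max_height` and the sample values of N and m are assumptions chosen for the example.

```python
def max_height(num_records: int, m: int) -> int:
    """Upper bound ceiling(log_m N) - 1, using integers to avoid float error."""
    if num_records < 2 or m < 2:
        raise ValueError("this sketch assumes N >= 2 and m >= 2")
    levels, capacity = 0, 1
    while capacity < num_records:   # smallest k with m**k >= N
        capacity *= m
        levels += 1
    return levels - 1

print(max_height(1_000_000, 25))    # one million records, m = 25
print(max_height(1_000_000, 2))     # same records, minimum fan-out of 2
```

A larger m gives a shallower worst-case tree, which is one reason to keep nodes well filled.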

Descend from root to leaf in a B+-tree manner.

If multiple sub-trees contain the point of interest then follow all.

Assume:

EI denotes the rectangle part of an index entry E,

Ep denotes the tuple-identifier or child-pointer.

Search (T: Root of the R-tree, S: Search Rectangle)

If T is not a leaf, check each entry E to determine whether EI overlaps S. For all overlapping entries, invoke Search(Ep, S).

If T is a leaf, check all entries E to determine whether EI overlaps S. If so, E is a qualifying record.
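
The Search steps above can be sketched as follows. The `Node` class, the entry layout, and the sample tree are illustrative assumptions; a real R-tree would read each node from a disk page.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    is_leaf: bool
    # Each entry is (rect, payload): payload is a tuple-identifier in a leaf,
    # or a child Node otherwise. rect holds one (lo, hi) pair per dimension.
    entries: list = field(default_factory=list)

def overlaps(a, b):
    return all(alo <= bhi and blo <= ahi
               for (alo, ahi), (blo, bhi) in zip(a, b))

def search(t, s):
    """Return tuple-identifiers of all leaf entries whose rectangle overlaps s."""
    hits = []
    for rect, payload in t.entries:
        if overlaps(rect, s):
            if t.is_leaf:
                hits.append(payload)              # qualifying record
            else:
                hits.extend(search(payload, s))   # follow every overlapping subtree
    return hits

leaf1 = Node(True, [(((0, 2), (0, 2)), "a"), (((1, 3), (1, 3)), "b")])
leaf2 = Node(True, [(((5, 6), (5, 6)), "c"), (((2, 6), (2, 4)), "d")])
root = Node(False, [(((0, 3), (0, 3)), leaf1), (((2, 6), (2, 6)), leaf2)])

print(search(root, ((2.5, 2.8), (2.5, 2.8))))
```

In this sample tree the two covering rectangles in the root overlap, so the query descends into both leaves. This is the situation in which an R-tree search follows more than one path.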

- Similar to B-trees, new index records are added to the leaves, nodes that overflow are split, and splits propagate up the tree.
- Insert (T: Root of the R-tree, E: new index entry)
- Find position for new record: Invoke ChooseLeaf to select a leaf node L in which to place E.
- Add record to leaf node: If L has room for E then insert E and return. Otherwise, invoke SplitNode to obtain L and LL containing E and all the old entries of L.
- Propagate changes upwards: Invoke AdjustTree on L, also passing LL if a split was performed.
- Grow tree taller: If node split propagation caused the root to split, create a new root whose children are the two resulting nodes.

ChooseLeaf (E: new index entry)

1. Initialize: Set N to be the root node.

2. Leaf check: If N is a leaf, return N.

3. Choose subtree: Let F be the entry in N whose rectangle FI needs least enlargement to include E. Resolve ties by choosing the entry with the rectangle of smallest area.

4. Descend until a leaf is reached: Set N to be the child node pointed to by Fp and repeat from step 2.
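
A minimal sketch of ChooseLeaf under the same assumptions as before (an in-memory `Node` with `(rect, child)` entries). The helper names `union` and `enlargement` are hypothetical.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    is_leaf: bool
    entries: list = field(default_factory=list)   # (rect, child Node or RID)

def area(r):
    return math.prod(hi - lo for lo, hi in r)

def union(a, b):
    """Smallest rectangle covering both a and b."""
    return tuple((min(alo, blo), max(ahi, bhi))
                 for (alo, ahi), (blo, bhi) in zip(a, b))

def enlargement(r, new):
    """Extra area r needs in order to include new."""
    return area(union(r, new)) - area(r)

def choose_leaf(root, new_rect):
    n = root
    while not n.is_leaf:
        # Least enlargement first; ties are broken by smallest area.
        _, child = min(n.entries,
                       key=lambda e: (enlargement(e[0], new_rect), area(e[0])))
        n = child
    return n

left = Node(True, [(((0, 1), (0, 1)), "a")])
right = Node(True, [(((8, 9), (8, 9)), "b")])
root = Node(False, [(((0, 1), (0, 1)), left), (((8, 9), (8, 9)), right)])
chosen = choose_leaf(root, ((1, 2), (1, 2)))
```

Here the new rectangle sits next to the left leaf, so the left covering rectangle needs far less enlargement and that leaf is chosen.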

A full node contains M entries. Divide the collection of M+1 entries between 2 nodes.

Objective: Make it as unlikely as possible for the resulting two new nodes to be examined on subsequent searches.

Heuristic: The total area of two covering rectangles after a split should be minimized.

Total area is larger!


- How to find the minimum area node split?
- Exhaustive algorithm,
- Quadratic-cost algorithm,
- Linear cost algorithm.

- Exhaustive algorithm: generate all possible groupings and choose the one with minimum area.
- Number of possibilities ~ 2^(M-1).
- For M ~ 50, the number of possibilities ~ 600 trillion.
- US deficit pales!
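
A quick check of the arithmetic, taking M = 50 as in the example above:

```python
M = 50
possibilities = 2 ** (M - 1)
print(f"{possibilities:,}")   # 562,949,953,421,312
```

That is roughly 563 trillion candidate splits for a single overflowing node, which is why the exhaustive approach is impractical.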

A heuristic to find a small-area split.

Cost is quadratic in M and linear in the number of dimensions.

Pick two of the M+1 entries to be the first elements of the two new groups.

Choose the pair that would waste the most area if both were put in the same group.

Assign remaining entries to groups one at a time.
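
A sketch of the quadratic-cost split described above. The seed selection follows the slide text. The rule for choosing the next entry (largest difference in required enlargement between the two groups) and the tie-breaking order are assumptions based on Guttman's paper, not stated on the slides. Entries are represented only by their rectangles, and groups hold indices into the input list.

```python
import math
from itertools import combinations

def area(r):
    return math.prod(hi - lo for lo, hi in r)

def union(a, b):
    return tuple((min(al, bl), max(ah, bh))
                 for (al, ah), (bl, bh) in zip(a, b))

def pick_seeds(rects):
    """Indices of the pair wasting the most area if placed in one group."""
    def waste(pair):
        i, j = pair
        return area(union(rects[i], rects[j])) - area(rects[i]) - area(rects[j])
    return max(combinations(range(len(rects)), 2), key=waste)

def quadratic_split(rects, m):
    """Split M+1 rectangles into two groups of at least m entries each."""
    s1, s2 = pick_seeds(rects)
    groups = [[s1], [s2]]
    covers = [rects[s1], rects[s2]]
    rest = [i for i in range(len(rects)) if i not in (s1, s2)]

    def growth(i, g):
        return area(union(covers[g], rects[i])) - area(covers[g])

    while rest:
        # If a group needs every remaining entry to reach m, give it all of them.
        for g in (0, 1):
            if len(groups[g]) + len(rest) <= m:
                groups[g].extend(rest)
                return groups
        # Next entry: the one with the strongest preference for one group.
        nxt = max(rest, key=lambda i: abs(growth(i, 0) - growth(i, 1)))
        g = min((0, 1),
                key=lambda g: (growth(nxt, g), area(covers[g]), len(groups[g])))
        groups[g].append(nxt)
        covers[g] = union(covers[g], rects[nxt])
        rest.remove(nxt)
    return groups

rects = [((0, 1), (0, 1)), ((1, 2), (0, 1)), ((8, 9), (8, 9)),
         ((9, 10), (8, 9)), ((0, 1), (1, 2))]
groups = quadratic_split(rects, m=2)
```

For this input the three rectangles near the origin end up in one group and the two far rectangles in the other.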


The linear-cost algorithm is identical to Quadratic with the following differences:

Uses a different version of PickSeeds.

PickNext simply chooses any of the remaining entries.

Linear: Choose two objects that are furthest apart.

Quadratic: Choose two objects that create as much empty space as possible.
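
One way to read "furthest apart" for the linear seed selection: in each dimension, take the entry with the highest low side and the entry with the lowest high side, normalize their separation by the total extent in that dimension, and keep the best pair. This normalization is an assumption drawn from Guttman's paper rather than from the slides, and the function name is hypothetical.

```python
def linear_pick_seeds(rects):
    """Indices of two entries with the greatest normalized separation."""
    best = None
    indices = range(len(rects))
    for d in range(len(rects[0])):
        # Entry with the highest low side and entry with the lowest high side.
        hi_low = max(indices, key=lambda i: rects[i][d][0])
        lo_high = min(indices, key=lambda i: rects[i][d][1])
        width = max(r[d][1] for r in rects) - min(r[d][0] for r in rects)
        if hi_low == lo_high or width == 0:
            continue   # this dimension cannot separate two distinct entries
        sep = (rects[hi_low][d][0] - rects[lo_high][d][1]) / width
        if best is None or sep > best[0]:
            best = (sep, lo_high, hi_low)
    if best is None:
        return 0, 1    # degenerate input: fall back to the first two entries
    return best[1], best[2]

rects = [((0, 1), (0, 1)), ((4, 5), (0, 2)),
         ((9, 10), (1, 2)), ((2, 3), (0.5, 1.5))]
seeds = linear_pick_seeds(rects)
```

Each dimension is scanned once, so the cost is linear in the number of entries and in the number of dimensions.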

- Linear node-split is simple and fast, and its search performance is about as good as quadratic!
- The quality of its splits is only slightly worse.


- AdjustTree: ascend from a leaf node L to the root, adjusting covering rectangles and propagating node splits.
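
A partial sketch of the upward pass, assuming in-memory nodes with a `parent` link (an assumption; the slides do not say how the path to the root is tracked). It tightens covering rectangles and installs a split-off sibling in the parent, but it deliberately omits the further SplitNode call that a full implementation needs when the parent overflows.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass(eq=False)
class Node:
    is_leaf: bool
    entries: list = field(default_factory=list)   # [rect, child-or-RID] pairs
    parent: Optional["Node"] = None

def cover(node):
    """Smallest rectangle enclosing every entry rectangle in node."""
    rects = [e[0] for e in node.entries]
    return tuple((min(r[d][0] for r in rects), max(r[d][1] for r in rects))
                 for d in range(len(rects[0])))

def adjust_tree(n, nn=None):
    """Ascend from n to the root; return (root, split-off sibling of the root or None)."""
    while n.parent is not None:
        p = n.parent
        for e in p.entries:
            if e[1] is n:
                e[0] = cover(n)          # tighten n's covering rectangle
        if nn is not None:
            nn.parent = p
            p.entries.append([cover(nn), nn])
            # Omitted: if p now holds more than M entries, invoke SplitNode
            # on p and carry the resulting new node upward instead of None.
            nn = None
        n = p
    return n, nn

leaf = Node(True, [[((0, 1), (0, 1)), "a"]])
root = Node(False, [[((0, 1), (0, 1)), leaf]])
leaf.parent = root
leaf.entries.append([((3, 4), (2, 5)), "b"])      # insert without overflow
top, extra = adjust_tree(leaf)
```

If the second return value is not None, the root itself was split, which is the case where Insert grows the tree taller.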

- Deletion is straightforward. The only complication is underflows:
- An under-full node can be merged with whichever sibling will have its area increased least.
- Orphaned entries are inserted back into the R-Tree.

- R+-tree enhances retrieval performance by avoiding visiting multiple paths when searching for point queries.
- No overlap for minimum bounding rectangles at the same level.
- Specific object’s entry might be duplicated.
- Insertions might lead to a series of update operations in a chain-reaction.
- Under certain circumstances, the structure may lead to a deadlock, e.g., every rectangle encloses a smaller one.

- In the R*-tree, node split is more sophisticated.
- Does not obey the limitation of the number of pairs per node.
- When a node overflows, p entries are extracted and reinserted in the tree (p might be 25%).
- Considers minimization of:
- the overlapping between minimum bounding rectangles at the same level.
- the perimeter of the produced minimum bounding rectangles.

- Insertion is more expensive while retrievals are faster.

- A static R-tree assumes the dataset is known in advance.
- Static R-trees are more efficient than dynamic ones:
- Tree structure is more compact,
- Contains fewer nodes,
- Overlap between minimum bounding rectangles is reduced.

- R-tree is a spatial index structure that provides competitive average performance.
- Many different variations in the literature:
- Spatio-temporal access methods, 3-d R-tree.
- Historical R-trees and Time-Parameterized R-tree for spatiotemporal applications.

- Have been used to speed-up operations in OLAP applications, data warehouses and data mining.