
R-Tree Index


Presentation Transcript


    1. R-Tree Index: Basics, Variations, and Cost. By: Michael Lindemuth & Mark Turner

    2. Overview: Spatial Data; Spatial Queries; R-Tree Index; Queries and Complexities; Variations; Implementations; Conclusions

    3. Spatial Data. Any type of geometry: point (e.g., a city), line (e.g., a trail), polygon (e.g., a border), or a collection of geometries (e.g., ski resort trails). Any coordinate system: meters, pixels, WGS84 (GPS).

    4. Spatial Queries. Standard insert and delete queries. Spatial range queries: find all cities within 20 miles of Tampa. Nearest neighbor queries: find the closest pizza place to my address. Spatial join queries: find all neighborhoods that are within 20 miles of a university. Geometry set operations: Equal(), Disjoint(), Intersect(), Touch(), Cross(), Within(), Contains(), Overlap(), Distance(), Buffer(), ConvexHull(), Intersection(), Union(), Difference(), SymmDiff(),… OGIS Standard for SQL (http://www.opengeospatial.org/standards/sfs)

    5. R-Tree Overview. Proposed by Antonin Guttman (UC Berkeley) at ACM SIGMOD 1984. All spatial data is enveloped in a Minimum Bounding Rectangle (MBR), then stored and indexed according to that MBR. The structure resembles a B+-tree: it is height balanced and is a dynamic index. Order of queries makes no difference.

    6. R-Tree Index Structure. A leaf index record has the form <I, tuple-identifier>, where I = (I0, I1, …, In-1) and n is the number of dimensions in the geometry. Each Ii is a closed interval [a, b] describing the extent of the rectangle along dimension i; a or b can be equal to infinity. Tuple-identifier points to a record. Entries in non-leaf nodes have the form <I, child-pointer>. Same space complexity as a B+-tree: O(N) for N indexed records.
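
The record layout on this slide can be sketched as a few Python dataclasses. This is only an illustration under stated assumptions: the class and field names (`Rect`, `LeafEntry`, `BranchEntry`, `Node`, `tuple_id`) are hypothetical and do not come from the slides or from any particular library.

```python
from dataclasses import dataclass, field
from typing import Tuple

# One closed interval [a, b] per dimension; a or b may be +/- infinity.
Interval = Tuple[float, float]


@dataclass
class Rect:
    """An n-dimensional rectangle I = (I_0, ..., I_{n-1})."""
    intervals: Tuple[Interval, ...]

    def __post_init__(self):
        # Reject intervals whose lower bound exceeds the upper bound.
        for a, b in self.intervals:
            if a > b:
                raise ValueError("interval lower bound exceeds upper bound")


@dataclass
class Node:
    """A tree node; leaves hold LeafEntry items, other nodes hold BranchEntry items."""
    is_leaf: bool
    entries: list = field(default_factory=list)


@dataclass
class LeafEntry:
    """<I, tuple-identifier>: an MBR plus a pointer to the stored record."""
    rect: Rect
    tuple_id: int  # hypothetical record identifier


@dataclass
class BranchEntry:
    """<I, child-pointer>: an MBR plus a pointer to a child node."""
    rect: Rect
    child: Node
```

For instance, `Rect(((0.0, 2.0), (1.0, float("inf"))))` describes a two-dimensional rectangle that is unbounded above along the second dimension.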

    7. Six R-Tree Properties. Let M be the maximum number of entries in one node; the parameter m ≤ M/2 specifies the minimum number of entries in a node. (1) Every leaf node contains between m and M index records unless it is the root. (2) For each index record <I, tuple-identifier> in a leaf node, I is the smallest rectangle that spatially contains the n-dimensional data object. (3) Every non-leaf node has between m and M children unless it is the root. (4) For each entry <I, child-pointer> in a non-leaf node, I is the smallest rectangle that spatially contains the rectangles in the child node. (5) The root node has at least two children unless it is a leaf. (6) All leaves appear on the same level.
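
A small validator makes these properties concrete. The sketch below assumes a hypothetical `Node` class whose entries are `(rect, child_or_tuple_id)` pairs, with each rectangle stored as a tuple of `(low, high)` pairs. It checks the occupancy, root, tight-MBR, and same-level properties; the leaf-level MBR property is not checked because the sketch does not store the underlying geometries.

```python
class Node:
    def __init__(self, leaf, entries):
        self.leaf = leaf        # True for leaf nodes
        self.entries = entries  # list of (rect, child_or_tuple_id)


def mbr(rects):
    """Smallest rectangle covering every rectangle in rects."""
    dims = range(len(rects[0]))
    return tuple(
        (min(r[d][0] for r in rects), max(r[d][1] for r in rects)) for d in dims
    )


def check(node, m, M, is_root=True):
    """Return the height of the subtree; raise ValueError on a violation."""
    count = len(node.entries)
    if count > M:
        raise ValueError("node holds more than M entries")
    if is_root:
        if not node.leaf and count < 2:
            raise ValueError("non-leaf root needs at least two children")
    elif count < m:
        raise ValueError("node holds fewer than m entries")
    if node.leaf:
        return 0
    heights = set()
    for rect, child in node.entries:
        # Validate the child first so it is known to be non-empty.
        heights.add(check(child, m, M, is_root=False))
        if rect != mbr([r for r, _ in child.entries]):
            raise ValueError("entry rectangle is not the tight MBR of its child")
    if len(heights) != 1:
        raise ValueError("leaves are not all on the same level")
    return heights.pop() + 1
```

Calling `check(root, m, M)` on a well-formed tree returns its height (0 for a tree that is a single leaf).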

    8. R-Tree Structure. Figure: an example structure of an R-tree. Source: http://en.wikipedia.org/wiki/Image:R-tree.jpg

    9. Queries. For all queries, checking whether a point is within a rectangle takes time linear in the number of dimensions. Query types to be reviewed: insert, delete, nearest neighbor, and multidimensional range queries.
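
The point-in-rectangle test mentioned on this slide does one comparison pair per dimension. A minimal sketch, assuming a rectangle is stored as a tuple of `(low, high)` pairs and using the hypothetical name `contains_point`:

```python
def contains_point(rect, point):
    """True if point lies inside rect (boundaries included).

    rect is a sequence of (low, high) pairs, one per dimension,
    so the loop runs once per dimension.
    """
    return all(low <= x <= high for (low, high), x in zip(rect, point))
```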

    10. Insert Query. Very similar to B+-trees. Start at the root node. Select the child that needs the least enlargement in order to fit the new geometry; repeat until at a leaf node. If the leaf node has available space, insert the entry; otherwise split the node into two nodes. Then update the parent nodes: update the entry that pointed to the node with a new MBR, and add a new entry for the second new node. If there is no space in the parent node, split it and repeat.
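
The descent step ("select the child that needs the least enlargement") can be sketched as follows. The `Node` class and the function names are hypothetical, rectangles are tuples of `(low, high)` pairs, and ties are broken by the smaller existing area, which is the rule in Guttman's original algorithm. Splitting and parent updates are left out.

```python
from math import prod


class Node:
    def __init__(self, leaf, entries):
        self.leaf = leaf        # True for leaf nodes
        self.entries = entries  # list of (rect, child_or_tuple_id)


def area(rect):
    """Area (or volume) of a rectangle given as (low, high) pairs."""
    return prod(high - low for low, high in rect)


def union(a, b):
    """Smallest rectangle covering both a and b."""
    return tuple(
        (min(al, bl), max(ah, bh)) for (al, ah), (bl, bh) in zip(a, b)
    )


def enlargement(rect, new_rect):
    """Extra area rect would need in order to also cover new_rect."""
    return area(union(rect, new_rect)) - area(rect)


def choose_leaf(root, new_rect):
    """Walk from the root to the leaf that should receive new_rect."""
    node = root
    while not node.leaf:
        _, node = min(
            node.entries,
            key=lambda e: (enlargement(e[0], new_rect), area(e[0])),
        )
    return node
```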

    11. Insert Query Complexity. IMPORTANT: make sure nodes are split so that they cover the smallest possible area; this minimizes search time. Example from textbook slides: given N = number of entries in each node and T = tree height, the worst case is 2 * N * T, O(n).
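
To illustrate what "split so they cover the smallest possible area" means, the sketch below tries every way of dividing an overfull node's rectangles into two groups of at least m entries and keeps the division with the smallest total MBR area. This brute-force search is exponential in the node size and is meant only as an illustration; Guttman's paper also describes cheaper quadratic and linear heuristics. The function names are hypothetical.

```python
from itertools import combinations
from math import prod


def mbr(rects):
    """Smallest rectangle covering every rectangle in rects."""
    dims = range(len(rects[0]))
    return tuple(
        (min(r[d][0] for r in rects), max(r[d][1] for r in rects)) for d in dims
    )


def area(rect):
    return prod(high - low for low, high in rect)


def exhaustive_split(rects, m):
    """Return (group_a, group_b) minimizing area(mbr(a)) + area(mbr(b)).

    Each group gets at least m rectangles. Returns None when
    len(rects) < 2 * m, because no valid split exists.
    """
    n = len(rects)
    best_cost, best = None, None
    for size in range(m, n - m + 1):
        # Rectangle 0 always goes into group a, so mirrored splits are skipped.
        for rest in combinations(range(1, n), size - 1):
            first = {0, *rest}
            a = [rects[i] for i in range(n) if i in first]
            b = [rects[i] for i in range(n) if i not in first]
            cost = area(mbr(a)) + area(mbr(b))
            if best_cost is None or cost < best_cost:
                best_cost, best = cost, (a, b)
    return best
```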

    12. Delete Query. Also similar to B+-trees. Search for the entry to remove. If the node that held the removed entry now has too few entries, reallocate them. Recursively check the parent nodes until reaching the root. Update all MBRs and remove all nodes that are underfull. Reinsert all entries from the removed nodes according to the INSERT algorithm.

    13. Delete Query Complexity. Given N = number of entries in each node and T = tree height, the complexity is 2 * N * T.

    14. Nearest Neighbor Query. Two options: branch-and-bound search and best-first search. Branch-and-bound finds two distances to each object: (1) the minimum distance from the search point to any side of the object's MBR, and (2) the minmax distance, the least of the furthest distances in every dimension, which is the lowest upper bound on the distance from the point to an object. Best-first search calculates the minmax distance for all objects, keeps the R-tree sorted by minmax distance, and removes nodes from the sorted tree; if a removed node has no children, it is the nearest neighbor.
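
The two distances used for pruning can be computed directly from an MBR. The sketch below follows the usual MINDIST and MINMAXDIST definitions from the nearest-neighbor literature and returns squared Euclidean distances to avoid square roots. The function names are hypothetical, and rectangles are tuples of `(low, high)` pairs.

```python
def mindist_sq(point, rect):
    """Squared distance from point to the closest point of rect (0 if inside)."""
    total = 0.0
    for p, (low, high) in zip(point, rect):
        nearest = min(max(p, low), high)  # clamp p into [low, high]
        total += (p - nearest) ** 2
    return total


def minmaxdist_sq(point, rect):
    """Squared MINMAXDIST: an upper bound on the distance to the nearest
    object inside rect, assuming every face of the MBR touches some object."""
    pairs = list(zip(point, rect))
    # Per dimension: the nearer and the farther of the two face coordinates.
    near = [low if p <= (low + high) / 2 else high for p, (low, high) in pairs]
    far = [low if p >= (low + high) / 2 else high for p, (low, high) in pairs]
    far_sq = [(p - f) ** 2 for p, f in zip(point, far)]
    total_far = sum(far_sq)
    # For each dimension k: the near face along k, far corners elsewhere.
    return min(
        total_far - far_sq[k] + (point[k] - near[k]) ** 2
        for k in range(len(point))
    )
```

For the point (3, 1) and the rectangle [0, 2] x [0, 2], the closest point of the rectangle is (2, 1), so the squared minimum distance is 1. The nearest face is the edge x = 2, whose farthest points from (3, 1) are its end corners, giving a squared minmax distance of 2.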

    15. Nearest Neighbor Complexity. Branch-and-bound takes longer because it searches all nodes that have not been pruned. Best-first search investigates only the closest nodes, but it keeps a large priority queue data structure in memory, which can cause thrashing. Run-time complexity is subject to the geometries: how many overlap and how large they are.
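
The priority queue behind best-first search can be sketched with Python's `heapq`. Note the assumptions: this sketch keys the queue on the minimum distance from the query point to each MBR, which is the ordering commonly used for best-first search; it treats each stored object as identical to its MBR; and the `Node` class and function names are hypothetical.

```python
import heapq
from itertools import count


class Node:
    def __init__(self, leaf, entries):
        self.leaf = leaf        # True for leaf nodes
        self.entries = entries  # list of (rect, child_or_tuple_id)


def mindist_sq(point, rect):
    """Squared distance from point to the closest point of rect."""
    return sum((p - min(max(p, lo), hi)) ** 2 for p, (lo, hi) in zip(point, rect))


def nearest(root, point):
    """Return (tuple_id, squared_distance) of the nearest stored rectangle."""
    tie = count()  # unique tie-breaker so the heap never compares Node objects
    heap = [(0.0, next(tie), False, root)]
    while heap:
        dist, _, is_object, item = heapq.heappop(heap)
        if is_object:
            # Nothing left in the queue can be closer than this object.
            return item, dist
        for rect, child in item.entries:
            # Entries of a leaf are objects; entries of other nodes are subtrees.
            heapq.heappush(
                heap, (mindist_sq(point, rect), next(tie), item.leaf, child)
            )
    return None  # empty tree
```

The queue holds every entry of every node that has been opened so far, which is the memory cost this slide warns about.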

    16. Multidimensional Range Queries. If the current node is not a leaf, check all the children whose MBR overlaps the range, and search the child node of every entry that overlaps. If the node is a leaf, check all entries; any that overlap the range are a match.
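
The range search on this slide is a short recursion. A minimal sketch, assuming a hypothetical `Node` class with `(rect, child_or_tuple_id)` entries and rectangles stored as tuples of `(low, high)` pairs:

```python
class Node:
    def __init__(self, leaf, entries):
        self.leaf = leaf        # True for leaf nodes
        self.entries = entries  # list of (rect, child_or_tuple_id)


def overlaps(a, b):
    """True if rectangles a and b intersect in every dimension."""
    return all(al <= bh and bl <= ah for (al, ah), (bl, bh) in zip(a, b))


def range_search(node, query):
    """Return the tuple ids of all leaf entries whose MBR overlaps query."""
    results = []
    for rect, child in node.entries:
        if not overlaps(rect, query):
            continue  # prune this entry (and its whole subtree)
        if node.leaf:
            results.append(child)  # child is a tuple id here
        else:
            results.extend(range_search(child, query))
    return results
```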

    17. Multidimensional Complexity. Worst case: linear search, when every MBR overlaps the search area. Best case: no more than one overlap at each level, O(log_M n). Again, dependent on the geometries.
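
A quick calculation shows how small the best case is in practice. The sketch below counts how many levels a tree needs when every node is completely full (an idealizing assumption); the function name and the example figures are hypothetical.

```python
def best_case_levels(n, M):
    """Levels needed to index n entries when every node holds M entries.

    Uses integer arithmetic rather than floating-point logarithms.
    """
    if n < 1 or M < 2:
        raise ValueError("need n >= 1 and M >= 2")
    levels, capacity = 1, M
    while capacity < n:
        capacity *= M
        levels += 1
    return levels
```

With one million entries and M = 50, a fully packed tree has 4 levels, so a range query that overlaps only one MBR per level reads about 4 nodes, compared with every node in the worst case.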

    18. Variations. R+-tree: split entries in the tree so that there is no overlap; there are no more multiple paths to reach a solution, but child pointers are duplicated within the tree. R*-tree: avoid splitting nodes immediately on insert; instead, take entries from the overfull node and reinsert them into the tree. This changes MBRs, saves time, and possibly rebalances the tree.

    19. Implementations. PostgreSQL (PostGIS extensions), MySQL, and Oracle all use R-trees for spatial indexing. Also used for CAD/CAM software, circuit design, and geographic information systems. Other alternatives: B+-trees (single- and multi-dimensional), which transpose many dimensions to a single one using some function such as Hilbert curves, and make it hard to find nearest neighbors; k-d trees, where nearest neighbors is more difficult and the tree is not balanced; grid files, which are larger than an R-tree and not balanced.
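
As an illustration of "transposing many dimensions to a single one", the sketch below maps a cell of an n x n grid to its position along a Hilbert curve, using the widely published iterative bit-manipulation conversion. The resulting integer could serve as a B+-tree key. The function name is hypothetical, and n is assumed to be a power of two.

```python
def hilbert_index(n, x, y):
    """Map cell (x, y) of an n x n grid (n a power of two) to its
    distance along the Hilbert curve, in the range 0 .. n*n - 1."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate or flip the quadrant so the sub-curve is oriented correctly.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d
```

Cells that are close along the curve are close in the plane, but the converse does not always hold, which is one reason nearest-neighbor search is harder with this approach.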

    20. Conclusions. R-trees are everywhere. MBRs are the defining concept; the rest is mostly B+-tree. Good for defining and relating spatial data. Multiple variations exist, but the basic R-tree is still used by commercial DBMS platforms.

    21. Works Cited. N. Beckmann, H.-P. Kriegel, R. Schneider, and B. Seeger, "The R*-tree: an efficient and robust access method for points and rectangles," SIGMOD Rec., vol. 19, pp. 322-331, 1990. A. Guttman, "R-trees: a dynamic index structure for spatial searching," in Proceedings of the 1984 ACM SIGMOD International Conference on Management of Data. Boston, Massachusetts: ACM, 1984. C. Murray, Oracle Spatial Developer's Guide, 11g Release 1 (11.1). Redwood City, CA: Oracle USA, Inc., 2007. T. K. Sellis, N. Roussopoulos, and C. Faloutsos, "The R+-Tree: A Dynamic Index for Multi-Dimensional Objects," in Proceedings of the 13th International Conference on Very Large Data Bases. Morgan Kaufmann Publishers Inc., 1987. S. Shekhar and S. Chawla, Spatial Databases: A Tour. Upper Saddle River, New Jersey: Pearson Education, Inc., 2003.
