R-Trees: A Dynamic Index Structure for Spatial Data

Antonin Guttman

R-Tree: Why, What … ? • Why do we need R-Trees? • What are R-Trees? • How do I perform operations? • Alternatives? Why not a B+ tree?

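Properties of R-Trees • Height-balanced: all leaves are at the same level • Two types of nodes: leaf and non-leaf • Leaves point to disk pages • Records in the leaves point to the actual data objects • For a maximum capacity of M entries per node, the minimum occupancy m is at most M/2 • Completely dynamic: inserts and deletes can be mixed with searches • Guaranteed minimum fan-out of m (up to M/2) for every node except the root • Every leaf record holds the smallest bounding box of its data object • The root has at least two children unless it is a leaf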

R-Trees: The Structure • Internal nodes: (rectangle, child pointer) • The rectangle is N-dimensional • The child pointer leads to a lower node whose entry rectangles are all contained in that rectangle • Leaf nodes: (MBR, tuple-identifier) • MBR is the minimum bounding rectangle of the data object • Tuple-identifier is a pointer to the data object
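The two entry shapes above can be sketched as plain data types. This is a minimal illustration, not the paper's implementation: the names `Rect`, `Entry`, `Node`, and `mbr` are assumptions made here, and the tuple-identifier is modeled as an integer.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Rect:
    """Axis-aligned N-dimensional rectangle: one low and one high value per dimension."""
    lows: Tuple[float, ...]
    highs: Tuple[float, ...]

    def union(self, other: "Rect") -> "Rect":
        # Smallest rectangle that covers both self and other.
        return Rect(
            tuple(min(a, b) for a, b in zip(self.lows, other.lows)),
            tuple(max(a, b) for a, b in zip(self.highs, other.highs)),
        )


@dataclass
class Entry:
    rect: Rect
    child: Optional["Node"] = None   # set for internal-node entries
    tuple_id: Optional[int] = None   # set for leaf entries (hypothetical integer id)


@dataclass
class Node:
    is_leaf: bool
    entries: List[Entry] = field(default_factory=list)

    def mbr(self) -> Rect:
        # Minimum bounding rectangle of every entry in this node.
        rect = self.entries[0].rect
        for entry in self.entries[1:]:
            rect = rect.union(entry.rect)
        return rect


# A leaf with two 2D records, and a root whose single entry bounds that leaf.
leaf = Node(is_leaf=True, entries=[
    Entry(Rect((0, 0), (2, 1)), tuple_id=101),
    Entry(Rect((1, 2), (3, 4)), tuple_id=102),
])
root = Node(is_leaf=False, entries=[Entry(leaf.mbr(), child=leaf)])
```

The root entry's rectangle is computed from the leaf rather than stored independently, which is the invariant the later operations have to maintain.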

R-tree of order 4

Example (figure sequence): data rectangles a through l are grouped under the bounding rectangles m, n, o, and p, and the tree is built up one subtree at a time.

R-Trees: Operations • Inserts • Deletes • Updates ( delete and re-insert) • Queries/Searches • Names of all the roads in 1 sq km area? • Which buildings would be encountered between Roger’s Hall and Reitz Union? • Give me all rectangles that are contained in the input rectangle. • Give me all rectangles intersecting this rectangle.
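The last two queries (contained in, or intersecting, an input rectangle) can be sketched as one recursive descent that skips any subtree whose bounding rectangle misses the query. The sketch assumes 2D rectangles stored as `(xmin, ymin, xmax, ymax)` tuples; `Node`, `search`, and the sample data are illustrative names, not taken from the paper.

```python
def intersects(a, b):
    """True if rectangles a and b share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def contains(outer, inner):
    """True if inner lies entirely within outer."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


class Node:
    def __init__(self, is_leaf, entries):
        self.is_leaf = is_leaf
        self.entries = entries  # list of (rect, child node or tuple id)


def search(node, query, only_contained=False):
    """Collect tuple ids whose rectangles intersect (or lie inside) the query."""
    hits = []
    for rect, ref in node.entries:
        if not intersects(rect, query):
            continue  # nothing below this entry can match
        if node.is_leaf:
            if not only_contained or contains(query, rect):
                hits.append(ref)
        else:
            hits.extend(search(ref, query, only_contained))
    return hits


leaf1 = Node(True, [((0, 0, 2, 2), "a"), ((3, 3, 4, 4), "b")])
leaf2 = Node(True, [((10, 10, 12, 12), "c"), ((11, 0, 13, 1), "d")])
root = Node(False, [((0, 0, 4, 4), leaf1), ((10, 0, 13, 12), leaf2)])

overlapping = search(root, (1, 1, 3.5, 3.5))
inside = search(root, (2.5, 2.5, 5, 5), only_contained=True)
```

Unlike a B+-tree lookup, more than one subtree may have to be visited, because bounding rectangles of sibling entries can overlap.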

Insert • Similar to insertion into a B+-tree, but the new record may go into any leaf; a leaf splits when its capacity is exceeded. • Which leaf to insert into? (Choose Leaf) • How to split a node? (Node Split)

Insert: Choose Leaf (figure sequence): the candidate subtrees m, n, o, and p are considered in turn.
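The slides do not state the selection rule. Guttman's paper descends into the entry whose rectangle needs the least enlargement to include the new one, breaking ties by smaller area. The sketch below assumes that rule and 2D `(xmin, ymin, xmax, ymax)` tuples. `Node` and `choose_leaf` are illustrative names, and the step that enlarges the rectangles along the chosen path afterwards is left out.

```python
def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a, b):
    """Smallest rectangle covering both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def enlargement(rect, new_rect):
    """Extra area rect would need in order to cover new_rect as well."""
    return area(union(rect, new_rect)) - area(rect)


class Node:
    def __init__(self, is_leaf, entries):
        self.is_leaf = is_leaf
        self.entries = entries  # list of (rect, child node or tuple id)


def choose_leaf(node, new_rect):
    """Walk down from node to the leaf that should receive new_rect."""
    while not node.is_leaf:
        # Least enlargement first; smaller area breaks ties.
        _, node = min(
            node.entries,
            key=lambda e: (enlargement(e[0], new_rect), area(e[0])),
        )
    return node


leaf_m = Node(True, [((0, 0, 4, 4), "a")])
leaf_n = Node(True, [((6, 0, 10, 4), "e")])
root = Node(False, [((0, 0, 4, 4), leaf_m), ((6, 0, 10, 4), leaf_n)])

# Covering (5, 1, 6, 2) costs m 8 units of extra area but n only 4.
target = choose_leaf(root, (5, 1, 6, 2))
```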

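Node Splitting • Quadratic method • Pick as seeds the two entries that would waste the most area if placed in the same node • Grow the two groups from the seeds, assigning the remaining entries one at a time • Linear method • Pick as seeds the two entries with the greatest separation along the x or y axis • Assign the remaining rectangles to the seeds in arbitrary order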
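The two seed-picking rules can be sketched side by side. This is an illustration only: it covers seed selection, not the distribution of the remaining entries, it assumes 2D `(xmin, ymin, xmax, ymax)` tuples, and the function names are invented here. The linear variant normalizes each separation by the extent of the whole set along that axis, as in Guttman's paper, and does not guard against both seeds being the same rectangle.

```python
from itertools import combinations


def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def quadratic_pick_seeds(rects):
    """Pair that would waste the most area if stored together (checks every pair)."""
    def waste(pair):
        a, b = pair
        return area(union(a, b)) - area(a) - area(b)
    return max(combinations(rects, 2), key=waste)


def linear_pick_seeds(rects):
    """Pair with the greatest normalized separation along any one axis."""
    best = None
    for lo, hi in ((0, 2), (1, 3)):  # x axis, then y axis
        highest_low = max(rects, key=lambda r: r[lo])
        lowest_high = min(rects, key=lambda r: r[hi])
        width = max(r[hi] for r in rects) - min(r[lo] for r in rects)
        separation = (highest_low[lo] - lowest_high[hi]) / width if width else 0
        if best is None or separation > best[0]:
            best = (separation, lowest_high, highest_low)
    return best[1], best[2]


a, b, c, d = (0, 0, 1, 1), (1, 0, 2, 1), (8, 0, 9, 1), (3, 6, 4, 7)
rects = [a, b, c, d]

quadratic_seeds = quadratic_pick_seeds(rects)
linear_seeds = linear_pick_seeds(rects)
```

On this sample the two rules disagree: the quadratic rule picks c and d, whose joint bounding box is mostly empty, while the linear rule picks a and c, which are farthest apart along x.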

Delete • Search for the rectangle • If the rectangle is found, remove it • If the node is deficient (below the minimum occupancy), • Remove the node and put its remaining entries in a re-insert queue • Adjust the parent rectangle if needed • Continue until you reach the root • Re-insert the queued entries at their original level so that all leaves stay at the same depth • Adjust the rectangles, making them smaller • Alternative: merge with a sibling, as in a B-tree • But re-insertion shows similar performance and is simpler to implement
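The upward pass of the delete steps above can be sketched as follows. This is a simplified illustration under stated assumptions: 2D `(xmin, ymin, xmax, ymax)` tuples, a hypothetical minimum occupancy `MIN_FILL = 2`, and invented names throughout. It returns the re-insert queue instead of performing the re-insertion, and it does not shorten the tree when the root is left with a single child.

```python
MIN_FILL = 2  # assumed minimum number of entries per non-root node


class Node:
    def __init__(self, is_leaf, entries):
        self.is_leaf = is_leaf
        self.entries = entries  # list of [rect, child node or tuple id]


def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def contains(outer, inner):
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def mbr(node):
    rect = node.entries[0][0]
    for r, _ in node.entries[1:]:
        rect = union(rect, r)
    return rect


def find_path(node, rect, tuple_id, path=()):
    """Root-to-leaf tuple of nodes leading to the record, or None."""
    path = path + (node,)
    for r, ref in node.entries:
        if node.is_leaf:
            if r == rect and ref == tuple_id:
                return path
        elif contains(r, rect):
            found = find_path(ref, rect, tuple_id, path)
            if found:
                return found
    return None


def delete(root, rect, tuple_id):
    """Remove one record; return the (level, entry) re-insert queue, or None if absent."""
    path = find_path(root, rect, tuple_id)
    if path is None:
        return None
    leaf = path[-1]
    leaf.entries = [e for e in leaf.entries
                    if not (e[0] == rect and e[1] == tuple_id)]
    queue = []
    for depth in range(len(path) - 1, 0, -1):  # leaf first, root excluded
        node, parent = path[depth], path[depth - 1]
        slot = next(e for e in parent.entries if e[1] is node)
        if len(node.entries) < MIN_FILL:
            # Deficient: drop the node and queue its entries for re-insertion.
            parent.entries.remove(slot)
            level = len(path) - 1 - depth  # 0 means leaf level
            queue.extend((level, e) for e in node.entries)
        else:
            slot[0] = mbr(node)  # shrink the parent rectangle to fit
    return queue


leaf1 = Node(True, [[(0, 0, 1, 1), "a"], [(2, 2, 3, 3), "b"]])
leaf2 = Node(True, [[(6, 6, 7, 7), "c"], [(8, 8, 9, 9), "d"], [(5, 5, 6, 6), "e"]])
root = Node(False, [[(0, 0, 3, 3), leaf1], [(5, 5, 9, 9), leaf2]])

orphans = delete(root, (2, 2, 3, 3), "b")   # leaf1 becomes deficient
shrunk = delete(root, (8, 8, 9, 9), "d")    # leaf2 stays valid and shrinks
```

Recording the level with each queued entry is what lets a later re-insert put subtrees back at the height they came from, so leaves stay at a common depth.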

Performance Tests • R-trees implemented in C under UNIX on a VAX 11/780, run on 2D data (1057 rectangles) for 5 page sizes • Linear node split performed better than quadratic, as expected. • CPU time was unchanged across page sizes, indicating that when one side became full, all split algorithms simply put everything in the other side. • Delete is affected by the fill factor. • Search is insensitive to the fill factor and to the split algorithm used. • Storage space is a function of the fill factor, page size, and split algorithm. • All split algorithms came within 10% of the best one, the exhaustive search-and-split algorithm.

Performance: 2nd Innings • Same configuration, but on data sets of 1057, 2238, 3295, and 4559 rectangles. • Low CPU cost, close to 150 microseconds. • Comparable performance across split algorithms • Most space was used by the leaf nodes

Conclusions from the paper • R-trees perform well for spatial data objects of non-zero size. • With a smaller node size, the structure can be used as an in-memory spatial index. • CPU performance of the in-memory R-tree index is comparable, and there is no I/O cost. • Linear split was almost as good as the others. • It was fast. • Node split quality was a bit off-target, but it did not hurt search performance noticeably. • Possible use with abstract data types and abstract indexes to streamline the handling of spatial data.
