134 Views

Download Presentation
## R-Trees: A Dynamic Index Structure for Spatial Data

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -

**R-Trees: A Dynamic Index Structure for Spatial Data**Antonin Guttman**R-Tree: Why, What … ?**• Why do we need R-Trees? • What are R-Trees? • How do I perform operations? • Alternatives? Why not a B+ tree?**Properties of R-Trees**• Height Balanced • 2 types of nodes • Leaves point to disk pages • Records in the leaves point to actual data objects • For a max capacity of M, min occupancy should be M/2 • Completely dynamic • Guaranteed Fan-out of M/2 • Every leaf record is a smallest bounding box. • Root has at least two children**R-Trees: The Structure.**• Internal nodes : ( rectangle, child pointer) • N dimensional rectangle. • Pointer to all rectangles that are cointained. • Leaf Nodes : (MBR , tuple-identifier) • MBR is minimum bounding rectangle • Tuple-identifier is a pointer to the data object.**m**n o p a c d e k l b f g h i j Example**m**n o p a c d e k l b f g h i j m Example a b c d**m**n o p a c d e k l b f g h i j n m Example e f a b c d**m**n o p a c d e k l b f g h i j n m o p Example e f a b g h c d i**R-Trees: Operations**• Inserts • Deletes • Updates ( delete and re-insert) • Queries/Searches • Names of all the roads in 1 sq km area? • Which buildings would be encountered between Roger’s Hall and Reitz Union? • Give me all rectangles that are contained in the input rectangle. • Give me all rectangles intersecting this rectangle.**Insert**• Similar to insertion into B+-tree but may insert into any leaf; leaf splits in case capacity exceeded. • Which leaf to insert into? (Choose Leaf) • How to split a node? (Node Split)**n**m o p Insert: Choose Leaf**Node Splitting**• Quadratic method • Select max area gradient in the nodes as seeds. • Start clustering from the seeds • Linear method • Select seeds with max separation using max x, y • Randomly assign rectangles to seeds**Delete**• Search for the rectangle • If the rectangle is found, remove it. • If the node is deficient, • Put the remaining entries in a re-insert queue. • Adjust the parent rectangle if needed. • Continue this till you reach the root. • Re-insert in such a way that all internal nodes remain above the leaf nodes. • Adjust the rectangles making them smaller. • Alternative sibling combination like a B-tree. • But re-insertion shows similar performance and is simple to implement.**Performance Tests**• R-Trees in C under UNIX on VAX11/780 computer running on 2D data(1057) for 5 page sizes • Linear node split was better than quadratic as expected. • CPU time unchanged with page sizes, indicating that when one side became full all split algorithms simply put everything in the other side. • Delete is affected by the fill factor. • Search insensitive to the fill factor and split algorithm used. • Storage space is a function of the fill factor, page size and split algorithm • All split algorithms came in 10% of the best exhaustive search and split algorithm.**Performance: 2nd Innings**• Same configuration but on various data sizes 1057, 2238, 3295 and 4559 rectangles. • Low CPU cost, close to 150 micro seconds. • Comparable performance of split algorithms • Most space was used by the leaf nodes**Conclusions from the paper.**• R-Tree perform well for spatial data with non zero node sizes. • With smaller node structure can be used as an in-memory spatial data index. • CPU performance of in-memory R-tree index is comparable and there is no IO cost. • Linear split was almost as good as others. • It was fast. • Node split quality was a bit off-target, but it did not hurt the search performance noticeably. • Possible use with abstract data types and abstract indexes to streamline handling of spatial data.