1 / 37

R-Tree: Spatial Representation on a Dynamic-Index Structure

R-Tree: Spatial Representation on a Dynamic-Index Structure. Advanced Algorithms & Data Structures Lecture Theme 03 – Part I K. A. Mohamed Summer Semester 2006. Overview. Representing and handling spatial data The R-Tree indexing approach , style and structure Properties and notions

Download Presentation

R-Tree: Spatial Representation on a Dynamic-Index Structure

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. R-Tree: Spatial Representation on a Dynamic-Index Structure Advanced Algorithms & Data Structures Lecture Theme 03 – Part I K. A. Mohamed Summer Semester 2006

  2. Overview • Representing and handling spatial data • The R-Tree indexing approach, style and structure • Properties and notions • Searching and inserting index Entry-records • Deleting and updating • Performance analyses • Node splitting algorithms • Derivatives of the R-Trees • Applications

  3. Spatial Database (Ia) • Consider: Given a city map, ‘index’ all university buildings in an efficient structure for quick topological search.

  4. Spatial Database (Ib) • Consider: Given a city map, ‘index’ all university buildings in an efficient structure for quick topological search. Spatial object: Contour (outline) of the area around the building(s). Minimum bounding region (MBR) of the object.

  5. Spatial Database (Ic) • Consider: Given a city map, ‘index’ all university buildings in an efficient structure for quick relational-topological search. MBR of the city neighbourhoods. MBR of the city defining the overall search region.

  6. Spatial Database (II) Notion: To retrieve data items quickly and efficiently according to their spatial locations. • Involves 2D regions. • Need to support 2D range queries. • Multiple return values desired: Answering a query region by reporting all spatial objects that are fully-contained-in or overlapping the query region (Spatial-Access Method – SAM). In general: • Spatial data objects often cover areas in multidimensional spaces. • Spatial data objects are not well-represented by point-location. • An ‘index’ based on an object’s spatial location is desirable.

  7. M E H P T X B D F G I K L N O Q S V W Y Z The Indexing Approach • A B-Tree (Rosenberg & Snyder, 1981) is an ordered, dynamic, multi-way structure of order m (i.e. each node has at most m children). • The keys and the subtrees are arranged in the fashion of a search tree. • Each node may contain a large number of keys, and the number of subtrees in each node, then, may also be large. • The B-Tree is designed (among other objectives): • to branch out this large number of directions, and • to contain a lot of keys in each node so that the height of the tree is relatively short.

  8. The R-Tree Index Structure • An R-Tree is a height-balanced tree, similar to a B-Tree. • Index records in the leaf nodes contain pointers to the actual spatial-objects they represent. • Leaves in the structure all appear on the same level. • Spatial searching requires visiting only a small number of nodes. • The index is completely dynamic: inserts and deletes can be intermixed with searches. • No periodic reorganisation is required.

  9. The R-Tree Index Structure • A spatial database consists of a collection of tuples representing spatial objects, known as Entries. • Each Entry has a unique identifier that points to one spatial object, and its MBR; i.e. Entry = (MBR, pointer).

  10. R-Tree Index Structure – Leaf Entries • An entryE in a leaf node is defined as (Guttman, 1984): E = (I, tuple-identifier) • Where I refers to the smallest bindingn-dimensional region (MBR) that encompasses the spatial data pointed to by its tuple-identifier. • I is a series of closed-intervals that make up each dimension of the binding region. Example. In 2D, I = (Ix, Iy), where Ix = [xa, xb], and Iy = [ya, yb].

  11. R-Tree Index Structure – Leaf Entries • In general I = (I0, I1, …, In-1) for n-dimensions, and that Ik = [ka, kb]. • If either ka or kb (or both) are equal to , this means that the spatial object extends outward indefinitely along that dimension.

  12. B c I(A) I(a) I(B) I(b) … I(c) I(M) I(d) d b a R-Tree Index Structure – Non-Leaf Entries • An entryE in a non-leaf node is defined as: E = (I, child-pointer) • Where the child-pointer points to the child of this node, and I is the MBR that encompasses all the regions in the child-node’s pointer’s entries.

  13. Properties Let M be the maximum number of entries that will fit in one node. Let m≤ M/2 be a parameter specifying the minimum number of entries in one node. Then an R-Tree must satisfy the following properties: • Every leaf node contains between m and M index records, unless it is the root. • For each index-record Entry (I, tuple-identifier) in a leaf node, I is the MBR that spatially contains the n-dimensional data object represented by the tuple-identifier. • Every non-leaf node has between m and M children, unless it is the root. • For each Entry (I, child-pointer) in a non-leaf node, I is the MBR that spatially contains the regions in the child node. • The root has two children unless it is a leaf. • All leaves appear on the same level.

  14. Node Overflow and Underflow • A Node-Overflow happens when a new Entry is added to a fully packed node, causing the resulting number of entries in the node to exceed the upper-boundM. • The ‘overflow’ node must be split, and all its current entries, as well as the new one, consolidated for local optimum arrangement. • A Node-Underflow happens when one or more Entries are removed from a node, causing the remaining number of entries in that node to fall below the lower-boundm. • The underflow node must be condensed, and its entries dispersed for global optimum arrangement.

  15. Search Typical query: Find and report all university building sites that are within 5km of the city centre. Key: a: FAW b: Uni-Klinikum c: Psychologie d: Biotechnology e: Institutsviertel f: Rektorat g: Uni-zentrum h: Biology / Botanischer i: Sportszentrum Approach: • Build the R-Tree using rectangular regions a, b, … i. • Formulate the query range Q. • Query the R-Tree and report all regions overlappingQ.

  16. Search Strategy Let Q be the query region. Let T be the root of the R-Tree.  Search all entry-records whose regions overlapsQ. Search sub-trees: • If T is not leaf, then apply Search on ever child-node entry E whose I overlaps Q. Search leaf nodes: • If T is leaf, then check each entry E in the leaf and return E if E.I overlaps Q.

  17. @ r6 r2 r5 r8 @ r2 @ r5 @ r8 r0 r1 r7 r3 r4 @ r0 @ r1 @ r7 @ r3 @ r4 a d b f h c g e i Search Typical query: Find and report all university buildings/sites that are within 5km of the city centre. R-Tree settings: M = m =

  18. Search • The search algorithm descends the tree from the root in a manner similar to a B-Tree. • More than one subtree under a node visited may need to be searched. • Cannot guarantee good worst-case performance. • Countered by the algorithms during insertion, deletion, and update that maintain the tree in a form that allows the search algorithm to eliminate irrelevant regions of the indexed space. • So that only data near the search area need to be examined. • Emphasis is on the optimal placement of spatial objects with respect to the spatial location of other objects in the structure.

  19. Insertion • New index entry-records are added to the leaves. • Nodes that overflow are split, and splits propagate up the tree. • A split-propagation may cause the tree to grow in height. The mainInsertroutine • Let E = (I, tuple-identifier) be the new entry to be inserted. • Let T be the root of the R-Tree. • [Ins_1]Locate a leafL starting from T to insert E. • [Ins_2]AddE to L. If L is already full (overflow), splitL into L and L’. • [Ins_3]Propagate MBR changes (enlarged or reduced) upwards. • [Ins_4]Growtreetaller if node split propagation causes T to split.

  20. B @rN A A B C E.I C rN Insertion – Choose Leaf  [Ins_1]Locate a leaf L starting from T to insert E = (I, tuple-identifier). • Notion (i): Select the path that would require the least enlargement to include E.I. • Notion (ii): Resolve ties by choosing the child-node with the smallest MBR. • Invoke: L = ChooseLeaf(E, T).

  21. Insertion – Choose Leaf Algorithm: ChooseLeaf (E, N) Inputs: (i) Entry E = (I, tuple-identifier), (ii) A valid R-Tree node N. Output: The leaf L where E should be inserted. • IfN is leaf ThenReturnN • Let FS be the set of current entries in the node N • Let F =(I, child-pointer)  FS, so that F.I satisfies the Insertion-Notions • ReturnChooseLeaf (E, F.child-pointer)

  22. Insertion – Add and Adjust  [Ins_2]AddE to L. • Notion (i): If L has room for another entry, install E. • Notion (ii): Otherwise splitL to obtain L and L’, which between them, will contain all previous entries in L and the new E (consolidated for local optima).  [Ins_3]Propagate MBR changes upwards by invoking AdjustTree (L, L’). • Notion (i): Ascend from leaf L to the root T while adjusting the covering rectangles MBR. • Notion (ii): If L’ exists, propagate node splits as necessary; i.e. attempt to install a new entry in the parent of L to point to L’.

  23. @G K @K X Y Z @Y a b c Insertion – Propagating Node Splits Example. Found L = @Y to insert new E = e. R-Tree settings: M = 3, m = 1.

  24. Insertion – Adjust Tree (I) Algorithm: AdjustTree (N, N’) Inputs: (i) A node N that has had its contents modified, (ii) The resultant split node N’, if not NULL, that accompanies N. Outputs: (i) N as above, (ii) N’ as above. • IfN is the root ThenReturn {(i) N, (ii) N’} • Let PN be the parent node of N. • Let EN = (I_N, child-pointer_N) in PN, where child-pointer_N points to N. • AdjustI_N so that it tightly encloses all entry regions in N.

  25. Insertion – Adjust Tree (II) • IfN’ is Not NULL Then • If number of entries in PN < M-1 Then • Create a new Entry EN’ = (I_N’, child-pointer_N’) • Install EN’ in PN • ReturnAdjustTree (PN, NULL) • Else • Set {PN, PN’} = SplitNode(PN, EN’) • ReturnAdjustTree (PN, PN’) • End If • Else • ReturnAdjustTree (PN, NULL) • End If

  26. @T (root) A B C @C @C’ E G F H Insertion – Grow Tree Taller  [Ins_4]Grow Tree taller. • Notion: If the recursive node split propagation causes the root to split, then create a new root whose children are the two resulting nodes.

  27. Insertion – Summary • The height of the R-Tree containing n entry-records is at most logmn – 1, because the branching factor of each node is at least m. • The maximum number of nodes is: • Worst case space utilisation for all nodes except the root is: • Nodes will tend to have more than m entries, and this will:

  28. Overview • Representing and handling spatial data • The R-Tree indexing approach, style and structure • Properties and notions • Searching and inserting index Entry-records • Deleting and updating • Performance analyses • Node splitting algorithms • Derivatives of the R-Trees • Applications

  29. Deletion • Current index entry-records are removed from the leaves. • Nodes that underflow are condensed, and its contents redistributed appropriately throughout the tree. • A condense propagation may cause the tree to shorten in height. The main Delete routine • Let E = (I, tuple-identifier) be a current entry to be removed. • Let T be the root of the R-Tree. • [Del_1]Find the leafL starting from T that contains E. • [Ins_2]RemoveE from L, and condense ‘underflow’ nodes. • [Ins_3]Propagate MBR changes upwards. • [Ins_4]Shortentree if T contains only 1 entry after condense propagation.

  30. Deletion – Find Leaf  [Del_1]Find the leafL starting from T that contains E. Algorithm: FindLeaf (E, N) Inputs: (i) Entry E = (I, tuple-identifier), (ii) A valid R-Tree node N. Output: The leaf L containing E. • IfN is leaf Then • IfN contains EThenReturnN • ElseReturn NULL • Else • Let FS be the set of current entries in N. • For each F = (I, child-pointer)  FS where F.I overlaps E.IDo • Set L = FindLeaf (E, F.child-pointer) • IfL is not NULL ThenReturnL • NextF • Return NULL • End If

  31. Deletion – Remove and Adjust  [Del_2]RemoveE from L, and condense ‘underflow’ nodes.  [Del_3]Propagate MBR changes upwards. • Notion (i): Ascend from leaf L to root T while adjusting covering rectangles MBR. • Notion (ii): If after removing the entry E in L and the number of entries in L becomes fewer than m, then the node L has to be eliminated and its remaining contents relocated.

  32. Deletion – Remove and Adjust • Propagate these notions upwards by invoking CondenseTree (N, QS), where N is an R-Tree node whose entries have been modified, and QS is the set of eliminated nodes. • Start the propagation by setting N = L, and QS = . • Re-insert the entries from the eliminated nodes in QS back into the tree. • Entries from eliminated leaf nodes are re-inserted as new entries using the Insert routine discussed earlier. • Entries from higher-level nodes must be placed higher in the tree so that leaves of their dependent subtrees will be on the same level as the leaves on the main tree.

  33. @ r6 @ r7 @ r2 r0 r2 r3 r4 r1 r6 r5 @ r0 @ r3 @ r4 @ r5 @ r1 a k i f c d g b j l h m e n Deletion – Propagating Node Condensation • Example: Delete the index entry-record b. R-Tree settings: M = 4, m = 2. • Spatial constraint: a.I will form smallest MBR with r4.

  34. Deletion – Condense Tree (I) Algorithm: CondenseTree (N, QS) Inputs: (i) A node N whose entries have been modified, (ii) A set of eliminated nodes QS. • IfN is NOT the root Then • Let PN be the parent node of N. • Let EN = (I_N, child-pointer_N) in PN. • IfN.entries < mThen • Delete EN from PN • Add N to QS • Else • Adjust I_N so that it tightly encloses all entry regions in N. • End If • CondenseTree (PN, QS)

  35. Deletion – Condense Tree (II) • Else If N is root AND Q is NOT  Then • For each QQSDo • For each EQDo • IfQ is leaf ThenInsert (E) • ElseInsert (E) as a node entry at the same node level as Q • End If • NextE • NextQ • End If

  36. Deletion – Summary Why ‘re-insert’ orphaned entries? • Alternatively, like the delete routine in B-Tree (Rosenberg & Snyder, 1981), an ‘underflow’ node can be merged with whichever adjacent sibling that will have its area increased the least, or its entries re-distributed among sibling nodes. • Both methods can cause the nodes to split. • Eventually all changes need to be propagated upwards, anyway.  Re-insertion accomplishes the same thing, and: • It is simpler to implement (and at comparable efficiency). • It incrementally refines the spatial structure of the tree. • It prevents gradual deterioration if each entry was located permanently under the same parent node.

  37. Performance with respect to Parameter m • A high value of m, nearer to M, is useful when the underlying database represented by the R-Tree is mostly used for search inquiries with very few updates. • The height of the tree will be kept to a minimum. • High search performance is maintained. • However, the risk of overflow and underflow is also high. • A small value of m is good when frequent updates and modifications of the underlying database is required. • The nodes are less dense. • Maintenance is less costly.

More Related