1 / 34

Search Trees

Search Trees. R-Trees. Introduction In order to handle spatial data efficiently, as required in CAD and Geo-data applications, a database system needs an index mechanism that will help it retrieve data items quickly according to their spatial locations. R-Trees.

landen
Download Presentation

Search Trees

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Search Trees

  2. R-Trees • Introduction In order to handle spatial data efficiently, as required in CAD and Geo-data applications, a database system needs an index mechanism that will help it retrieve data items quickly according to their spatial locations

  3. R-Trees • R-Tree Index Structure An R-tree is a height-balanced tree similar to a B-tree with index records in its leaf nodes containing pointers to data objects Leaf nodes in an R-tree contain index record entries of the form (I, tuple-identifier) where tuple-identifier refers to a tuple in the database and I is an n-dimensional rectangle Non-leaf nodes contain entries of the form (I, child-pointer) where child-pointer is the address of a lower node in the R-tree and I covers all rectangles in the lower node’s entries

  4. R-Trees • Properties of R-Tree Every leaf node contains between m(<=M/2) and M index records unless it is the root For each index record (I, tuple-identifier) in a leaf node, I is the smallest rectangle that spatially contains the n-dimensional data object represented by the indicated tuple Every non-leaf node has between m and M children unless it is the root For each entry (I, child-pointer) in a non-leaf node, I is the smallest rectangle that spatially contains the rectangles in the child node The root node has at least two children unless it is a leaf and all leaves appear on the same level

  5. R-Tree structure

  6. R-Tree structure

  7. R-Trees …. contd • Algorithm Search Given an R-tree whose root node is T, find all index records whose rectangles overlap a search rectangle S S1 [Search subtrees] If T is not a leaf, check each entry E to determine whether EI overlaps S. For all overlapping entries, invoke Search on the tree whose root node is pointed to by Ep S2 [Search leaf node] If T is a leaf, check all entries E to determine whether EI overlaps S. If so, E is a qualifying record

  8. R-Trees …. contd • Algorithm Insert L1 [Find position for new record] Invoke ChooseLeaf to select a leaf node L in which to place E L2 [Add record to leaf node] If L has room for another entry, install E. Otherwise invoke SplitNode to obtain L and LL containing E and all the old entries of L L3 [Propagate changes upward] Invoke AdjustTree on L, also passing LL if a split was performed L4 [Grow tree taller] If node split propagation caused the root to split, create a new root whose children are the two resulting nodes

  9. R-Trees …. contd • Algorithm ChooseLeaf CL1 [Initialize] Set N to be the root node CL2 [Leaf check] If N is a leaf, return N CL3 [Choose subtree] If N is not a leaf, let F be the entry in N whose rectangle FI needs least enlargement to include EI. Resolve ties by choosing the entry with the rectangle of smallest area. CL4 [Descend until a leaf is reached] Set N to be the child node pointed to by Fp and repeat from CL2

  10. R-Trees …. contd • Algorithm AdjustTree AT1 [Initialize] Set N=L. If L was spilt previously, set NN to be the resulting second node AT2 [Check if done] If N is the root, stop AT3 [Adjust covering rectangle in parent entry] Let P be the parent node of N, and let En be N’s entry in P. Adjust EnI so that it tightly encloses all entry rectangles in N AT4 [Propagate node split upward] If N has a partner NN resulting from an earlier split, create a new entry Enn with EnnP pointing to NN and EnnI enclosing all rectangles in NN. Add Enn to P if there is room. Otherwise, invoke SpiltNode to produce P and PP containing Enn and all P’s old entries … contd

  11. R-Trees …. contd • Algorithm AdjustTree … contd AT5 [Move up to next level] Set N=P and set NN=PP if a split occurred. Repeat from AT2

  12. R-Trees …. contd • Algorithm Deletion D1 [Find node containing record] Invoke FindLeaf to locate the leaf node L containing E. Stop if the record was not found. D2 [Delete record] Remove E from L D3 [Propagate changes] Invoke CondenseTree, passing L D4 [Shorten tree] If the root node has only one child after the tree has been adjusted, make the child the new root

  13. R-Trees …. contd • Algorithm FindLeaf FL1 [Search subtrees] If T is not a leaf, check each entry F in T to determine if FI overlaps EI. For each such entry invoke FindLeaf on the tree whose root is pointed to by Fp until E is found or all entries have been checked D2 [Search leaf node for record] If T is a leaf, check each entry to see if it matches E. If E is found return T

  14. R-Trees …. contd • Algorithm CondenseTree CT1 [Initialize] Set N=L. Set Q, the set of eliminated nodes, to be empty CT2 [Find parent entry] If N is the root, go to CT6. Otherwise let P be the parent of N, and let En be N’s entry in P CT3 [Eliminate under-full node] If N has fewer than m entries, delete En from P and add N to set Q CT4 [Adjust covering rectangle] If N has not been eliminated, adjust EnI to tightly contain all entries in N … contd

  15. R-Trees …. contd • Algorithm CondenseTree … contd CT5 [Move up one level in the tree] Set N=P and repeat from CT2 CT6 [Re-insert orphaned entries] Re-insert all entries of nodes in set Q. Entries from eliminated leaf nodes are re-inserted in tree leaves as described in Algorithm Insert, but entries from higher-level nodes must be placed higher in the tree, so that leaves of their dependent subtrees will be on the same level as leaves of the main tree

  16. R-Trees …. contd • Algorithm Quadratic Split QS1 [Pick first entry for each group] Apply PickSeeds to choose two entries to be the first elements of the groups. Assign each to a group. QS2 [Check if done] If all entries have been assigned, stop. If one group has so few entries that all the rest must be assigned to it in order for it to have the minimum number m, assign them and stop. QS3 [Select entry to assign] Invoke PickNext to choose the next entry to assign. Add it to the group whose covering rectangle will have to be enlarged least to accommodate it. Resolve ties by adding the entry to the group with smaller area, then to the one with fewer entries, then to either. Repeat from QS2

  17. R-Trees …. contd • Algorithm PickSeeds PS1 [Calculate inefficiency of grouping entries together] For each pair of entries E1 and E2, compose a rectangle J including E1I and E2I. Calculate d=area(J)-area(E1I)-area(E2I). PS2 [Choose the most wasteful pair] Choose the pair with the largest d

  18. R-Trees …. contd • Algorithm PickNext PN1 [Determine cost of putting each entry in each group] For each entry E not yet in a group, calculate d1=the area increase required in the covering rectangle of Group 1 to include E1. Calculate d2 similarly for Group 2. PN2 [Find entry with greatest preference for one group] Choose any entry with the maximum difference between d1 and d2

  19. Generalized Search Tree (GiST) • Why GiST Extensible both in data types supported and in the queries applied on this data Allows new data types to be indexed in a manner that supports the queries natural to the data type Unifies previously disparate structures for currently common data types Example B+ and R trees can be implemented as extensions to GiST. Single code base for indexing multiple dissimilar applications

  20. GiST …. contd • Definition A GiST is a balanced multi-way tree of variable fan-out between kM and M Where k is the fill factor With the exception of the root node that can have fan-out from 2 to M Leaf nodes: (p,ptr) ptr: Identifier of some tuple of the DB Non-leaf nodes: (p,ptr) ptr: Pointer to another tree node and p: Predicate used as a search key

  21. GiST …. contd • Properties Every node contains between kM and M index entries unless it is the root. For each index entry (p,ptr) in a leaf node, p holds for the tuple For each index entry (p,ptr) in a non-leaf node, p is true when instantiated with the values of any tuple reachable from ptr The root has at least two children unless it is a leaf All leaves appear on the same level

  22. GiST …. contd • GiST Methods Key Methods The methods the user can specify to configure the GiST. The methods encapsulate the structure and behavior of the object class used for keys in the tree Tree Methods Provided by the GiST, and may invoke the required key methods

  23. GiST …. contd • GiST Key Methods … contd E is an entry of the form (p,ptr) , q is a query, P a set of entries Consistent(E,q) returns false if p^q guaranteed unsatisfiable, true otherwise. Union(P) returns predicate r that holds for all predicates in P Compress(E) returns (p’,ptr). Decompress(E) returns (r,ptr) where pr. This is a lossy compression as we do not require pr

  24. GiST …. contd • GiST Key Methods … contd Penalty(E1,E2): returns domain specific penalty for inserting E2 into the subtree rooted at E1. Typically the penalty metric is representation of the increase of size from E1.p to Union(E1,E2). PickSplit(P): M+1 entries, splits P into two sets of entries P1,P2, each of the size kM. The choice of the minimum fill factor is controlled here

  25. GiST …. contd • GiST Tree Methods Search Controlled by the Consistent Method. Insert Controlled by the Penalty and PickSplit. Delete Controlled by the Consistent

  26. R (p,ptr) (p,ptr) (p,ptr) (p,ptr) (p,ptr) New (q,ptr) (p,ptr) (p,ptr) (p,ptr) (p,ptr) (p,ptr) (p,ptr) (p,ptr) (q,ptr) (p,ptr) Example New (q,ptr) Penalty = m Penalty = n m < n Penalty =i Penalty = j j < i Full.. Then split according to PickSplit

  27. Applications • GiST Over Z (B+ Trees) • GiST Over Polygons in R2 (R Trees)

  28. B+ Trees Using GiST p here is on the form Contains([xp,yp),v) Consistent(E,q) returns true if If q= Contains([xq,yq),v): (xp<yq)^(yp>xq) If q= Equal (xq,v): xp xq <yp Union(P) returns [Min(x1,x2,…,xn),MAX(y1,y2,….,yn)).

  29. B+ Trees Using GiST … contd Penalty(E,F) If E is the leftmost pointer on its node, returns MAX(y2-y1,0) If E is the rightmost pointer on its node, returns MAX(x1-x2,0) Otherwise, returns MAX(y2-y1,0)+MAX(x1-x2,0) PickSplit(P) let the first entries in order to go to the left node and the remaining in the right node.

  30. B+ Trees Using GiST … contd Compress(E) if E is the leftmost key on a non-leaf node return 0 bytes otherwise, returns E.p.x Decompress(E) If E is the leftmost key on a non-leaf node let x= - otherwise let x=E.p.x If E is the rightmost key on a non-leaf node let y= . If E is other entry in a non-leaf node, let y = the value stored in the next key. Otherwise, let y = x+1

  31. R-Trees Using GiST The key here is in the form (xul,yul,xlr,ylr) Query predicates are: Contains ((xul1,yul1,xlr1,ylr1), (xul2,yul2,xlr2,ylr2)) Returns true if (xul1xul2) ^(yul1yul2) ^ (xlr1xlr2) ^ (ylr1ylr2) Overlaps ((xul1,yul1,xlr1,ylr1), (xul2,yul2,xlr2,ylr2)) Returns true if (xul1xlr2) ^(yul1ylr2) ^ (xul2xlr1) ^ (ylr1yul2) Equal ((xul1,yul1,xlr1,ylr1), (xul2,yul2,xlr2,ylr2)) Returns true if (xul1=xul2) ^(yul1=yul2) ^ (xlr1=xlr2) ^ (ylr1=ylr2)

  32. R-Trees Using GiST … contd Consistent(E,q) p contains (xul1,yul1,xlr1,ylr1), and q is either Contains, Overlap or Equal (xul2,yul2,xlr2,ylr2) Returns true if Overlaps ((xul1,yul1,xlr1,ylr1), (xul2,yul2,xlr2,ylr2)) Union(P) returns coordinates of the maximum bounding rectangles of all rectangles in P.

  33. R-Trees Using GiST … contd Penalty(E,F) Compute q= Union(E,F) and return area(q) – area(E.p) PickSplit(P) Variety of algorithms are provided to best split the entries in a over-full node

  34. R-Trees Using GiST … contd Compress(E) Form the bounding rectangle of E.p Decompress(E) The identity function

More Related