1 / 71

The Planar Point Location Problem

I/O-efficient Point Location using Persistent B-Trees Lars Arge, Andrew Danner, and Sha-Mayn Teh Department of Computer Science, Duke University (2003). The Planar Point Location Problem.

jock
Download Presentation

The Planar Point Location Problem

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. I/O-efficient Point Location using Persistent B-TreesLars Arge, Andrew Danner, and Sha-Mayn TehDepartment of Computer Science, Duke University (2003) Michal Balas

  2. The Planar Point Location Problem Storing a planar subdivision defined by N line segments such that the region containing a query point p can be computed efficiently Michal Balas

  3. Planar Point Location Applications • Geographic Information systems (GIS) • Spatial Databases • Graphics Usually the datasets are larger than the size of physical memory and must reside on disk Michal Balas

  4. Previous Works • So far, few theoretically I/O efficient structures were developed, but all are relatively complicated and none of them was implemented • Vahrenhold and Hinrichs (2001) suggested a heuristic structure that is simple and efficient but theoretically non optimal Michal Balas

  5. Goal find a planar point location structure that minimizes the number of I/Os needed to answer a query, which is efficient both in theory and in practice. Michal Balas

  6. Lecture’s Road Map • Motivation • The Vertical Ray Shooting problem and the need of persistent data structures • Review: • B-trees, B+ trees, and I/O model • Persistent B-trees • The modified Persistent B-tree • Experimental results • Open problems Michal Balas

  7. Vertical Ray Shooting • A generalized version of the Planar Point Location problem • Given a set of N non-intersecting segments in the plane, construct a data structure such that the segment directly above a query point p can be found efficiently • We will consider this problem. Michal Balas

  8. Example Michal Balas

  9. Vertical Ray Shooting • Based on the persistent search tree idea of Sarnak and Tarjan (1986). • Any vertical line l in the plane introduces an “above-below” order on the segments it intersects. • We will “sweep” the plane from left to right with a vertical line • Our “critical” x-axis points are the endpoints of all segments Michal Balas

  10. Vertical Ray Shooting & Persistent Search Trees • Sort critical points by x-values • For each critical point pi=(xi,yi) we can build a search tree for the segments intersecting a vertical line at xi according to the y-values (at xi) • Until the next critical point pi+1 the tree is static – it will change only in the next begin/end point of a segment Michal Balas

  11. Vertical Ray Shooting & Persistent Search Trees • Worst case analysis: • Hold a search tree to each critical point • Space: O(n2) Michal Balas

  12. Vertical Ray Shooting & Persistent Search Trees • We should use the fact that two consecutive trees (versions) differ only by one insertion or deletion (assuming distinct x-values for all endpoints). Michal Balas

  13. Vertical Ray Shooting & Persistent Search Trees • Persistent data structure • Preserves versions. In ordinary (ephemeral) data structures there is only one last version (every update changes the data structure so its state before the update can no longer be accessed) • Each update creates a version • The current version of the structure can be modified and all versions of the structure, past and present, can be accessed. Michal Balas

  14. Vertical Ray Shooting & Persistent Search Trees • We would like to save a version of the search tree for each critical point. Since we want to be space efficient, we will use persistent search tree. • A persistent search tree differs from an ordinary search tree in that after an insertion or deletion, the old version of the tree can still be accessed. • Here the persistent search tree should supports insertions and deletions in the present and queries in the past. (partially persistent) Michal Balas

  15. Vertical Ray Shooting & Persistent Search Trees • We will insert a segment into the persistent search tree when its left endpoint is encountered • We will delete a segment persistently from the tree when its right endpoint is encountered. • Two consecutive versions of the tree differ only by a certain number of deletions and insertions (in the distinct x-values case by 1 only) Michal Balas

  16. Vertical Ray Shooting & Persistent Search Trees • Given a query point p=(x,y) , we will search for the position of y in the version of the search tree when the sweep line was at x. Michal Balas

  17. x Vertical Ray Shooting & Persistent Search Trees • Path Copying: • A balanced search tree • When x is inserted the changes are only on the path from the root to x • Instead of copying the whole tree we will copy only the updated path • The roots will be ordered by version Michal Balas

  18. Vertical Ray Shooting & Persistent Search Trees • Path Copying: • Space: O(nlogn) – better, but not good enough r1 r2 x Michal Balas

  19. right left t1 t2 Vertical Ray Shooting & Persistent Search Trees • Extra Pointers : • Instead of copying the path, we will save for each node a few pointers ( a list of left children and right children, thought it’s a binary tree) Michal Balas

  20. Vertical Ray Shooting & Persistent Search Trees • Extra Pointers : • Here there is no limitation on the # of pointers per node • In the worst case, it will take O(logn) time to find the relevant version per node (the pointers are in a binary search tree) – which is not optimal • We need constant time per node Michal Balas

  21. Vertical Ray Shooting & Persistent Search Trees • Sarnak & Tarjan solution: • Limited node copying, k extra pointers per node • k should be a small positive number (k=1 will do) • When a pointer is added to a node, if there is no empty slot for a new pointer, we copy the node, setting the initial left and right pointers of the copy to their latest values. • Update the parent with the new copy, if the parent has no free slot the process is repeated. Michal Balas

  22. Vertical Ray Shooting & Persistent Search Trees • Sarnak & Tarjan solution - Space analysis • Amortized analysis: we will see that every set of m operations takes O(m) space. • The potential of the structure is defined to be: F = # live nodes – (1/k)*(# free slots in the live nodes) • amortized space cost of update = (actual # of nodes it creates) – DF Michal Balas

  23. Vertical Ray Shooting & Persistent Search Trees • Sarnak & Tarjan solution - Space analysis • We will show that amortized space cost of an update is bounded by O(1) per update. • If a new unused slot in node v is used, but the node is still not full, then the actual # of new nodes created is 0, DF is (-1/k) (#free slots in live nodes decreased by 1), thus amortized space cost of this update is 1/k. • If node copying has occurred, the actual # of new nodes created is 1, DF is 1 (#free slots in live nodes increased by k), thus amortized space cost of this update is 0. Michal Balas

  24. Vertical Ray Shooting & Persistent Search Trees • Sarnak & Tarjan solution - Space analysis • During an update, node copying continues in the path from node to root until the root is copied or a node with a free slot is reached. • The amortized space cost of node copying is 0 and of occupying a free slot is 1/k Michal Balas

  25. Vertical Ray Shooting & Persistent Search Trees • Sarnak & Tarjan solution - Space analysis • The total amortized space cost of an update is constant (0 or 1/k) • The space of rebalance information per node is constant • In red-black trees, rebalancing after deletion or insertion can be done in O(1) rotations and O(1) color changes per update in the amortized case • Since an insertion or deletion requires O(1) new pointers not counting node copying, the amortized space cost of an update is O(1) Michal Balas

  26. Vertical Ray Shooting & Persistent Search Trees • Sarnak & Tarjan solution - Space analysis • sum up over all updates: amortized space cost over all updates = cn = required space – (Fend – Fstart) • Fstart=0 (we start with an empty data structure) • Fend=O(n) (according to the potential function definition, this is an upper bound on the potential in the end) • Required space = cn + O(n) = O(n) (this is a bound on the number of nodes created) Michal Balas

  27. Vertical Ray Shooting & Persistent Search Trees • Sarnak & Tarjan solution – Complexity • O(log m) query time (m is the total # of updates) • O(log n) update time (n is the current size of the set) • O(1) amortized space per update • O(nlogn) preprocessing time Michal Balas

  28. Where are we going? The use of Persistent Data structures (always preserves the previous version of itself when it is modified) The use of B-trees in the I/O Model (B-tree is the I/O model equivalent of a search tree) I/O efficient Persistent B-tree (works great with totally ordered elements) Modified I/O efficient Persistent B-tree (only elements present in the same version of the structure need to be comparable) Michal Balas 28

  29. Vertical Ray Shooting & Persistent Search Trees • Two segments that cannot be intersected with the same vertical line are not comparable ( “above-below”) • Corollary: Not all segments stored in the persistent structure over its lifespan are comparable An I/O efficient structure cannot directly be obtained using a persistent B-tree (because standard persistent B-trees require total order on all elements) Michal Balas

  30. Vertical Ray Shooting & Persistent Search Trees • To make the structure I/O-efficient, we need to modify the tree so it will only require elements present in the same version of the structure to be comparable Michal Balas

  31. Lecture’s Road Map • Motivation • The Vertical Ray Shooting problem and the need of persistent data structures • Review: • B-trees, B+ trees, and I/O model • Persistent B-trees • The modified Persistent B-tree • Experimental results • Open problems Michal Balas

  32. Review: The I/O Model • Infinite disk size • M - Main Memory size • B - Block size • N - elements in the structure D M Block I/O Michal Balas

  33. Review: The I/O Model - Cont • Computation can only occur on data stored in main memory. • We are interested in the number of I/Os used to answer a query. • The B-tree is the external memory equivalent of the balanced search tree in internal memory. Michal Balas

  34. Review: B-tree • A balanced search tree • All leaves are on the same level • All internal nodes (except the root) have between B/2 and B children (q(B)) • A node/leaf can be stored in O(1) blocks Michal Balas

  35. Review: B-tree - Cont • Space complexity of the tree: O(N/B) blocks (where N is the number of elements) – linear • Tree height: O(logBN) • Insert/Delete can be done with O(logBN) I/Os Michal Balas

  36. Review: B+-tree • It is a B-tree in which all elements are stored in the leaves. • The internal nodes contain “routing elements”. Michal Balas

  37. B-tree Example (B+-tree) 3 5 1 2 3 4 5 6 7 d2 d3 d4 d5 d6 d7 d1 Michal Balas

  38. Where are we going? The use of Persistent Data structures (always preserves the previous version of itself when it is modified) The use of B-trees in the I/O Model (B-tree is the I/O model equivalent of a search tree) I/O efficient Persistent B-tree (works great with totally ordered elements) Modified I/O efficient Persistent B-tree (only elements present in the same version of the structure need to be comparable) Michal Balas 38

  39. Review: Persistent B-tree • Directed acyclic graph • The elements are in the sinks (leaves) • “routing elements” in internal nodes • Elements (and nodes) augmented with “existence interval” • In this interval the element is “alive” • An element is “alive” - between its insert and its delete version Michal Balas

  40. Review: Persistent B-tree - Cont • Nodes “alive” at time t form a (aB,B) B-tree, 0<a<1/2 • We will work with a=1/4 • Additional invariant: • A new node must contain between (a+g)B and (1-g)B alive elements ( a > g ) • For g=1/8, a=1/4,new node contains between (3/8)B and (7/8)B alive elements • We require that g>2/B, a-g >=1/B, 2a+3g<= 1-3/B Michal Balas

  41. Review: Persistent B-tree - Cont • In order to find the appropriate root at time t, the roots are stored in a standard B-tree • Takes O(logBN) I/Os • A node/leaf contains O(B) elements = O(1) blocks # Blocks needed to hold the structure: O(N/B) Michal Balas

  42. Persistent B-tree Insert • x is the element to insert into the current version of the tree • Search the leaf l and insert x (O(logBN) I/Os) • if l contains > B elements -> Block overflow • Version-Split (copy all k alive elements from l to a new node v and mark l as dead) • If k is in [(3/8)B,(7/8)B] - simple • If k > (7/8)B – strong overflow • If k < (3/8)B – strong underflow Strong overflow/underflow violates the additional invariant we defined earlier Michal Balas

  43. Persistent B-tree Insert • If k is in [(3/8)B,(7/8)B] : recursively update parent(l): persistently delete the reference to l and insert a reference to v Michal Balas

  44. Persistent B-tree Insert - Cont • If k > (7/8)B – strong overflow • split create nodes v1, v2 each with k/2 elements. k/2 is in ((3/8)B,(7/8)B) (this is not tight) • Update parent(l) recursively: persistently delete the reference to l and insert two references to v1, v2 Michal Balas

  45. Persistent B-tree Insert - Cont • If k < (3/8)B – strong underflow • Version-split of sibling l’ of l -> obtain k’ other alive elements (k’ is in [aB,B]) k+k’>= 2aB, and a > g, thus k+k’ > (a+g)B (the invariant…) 1) ifk+k’ <= (1-g)B: merge -create a new leaf with k+k’ elements 2) if k+k’ >(1-g)B: share – split to create two new leaves. • Update parent(l) recursively: persistently delete two references and insert one or two Michal Balas

  46. Persistent B-tree Delete • x is the element to delete from the current version of the tree • Search the leaf l that contains and mark x as dead (O(logBN) I/Os) • if l contains < (1/4)B alive elements -> Block underflow (this is also a strong underflow, since k < (3/8)B ) • Version-Split on a sibling node to obtain k+k’ elements. k+k’>= 2aB -1 , and a- g > =1/B, thus k+k’ > (a+g)B (the invariant…) mark l dead and create a new node v with k+k’ elements (merge) if there is a strong overflow in v – share (as in insert) • Update parent(l) recursively: persistently delete two references and insert one or two references Michal Balas

  47. Persistent B-tree – Rebalance Operations Delete Insert Block Overflow Block Underflow Done 0,0 Version-split Version-split Done -1,+1 Strong Underflow Strong Overflow Merge Split Done -2,+1 Done -1,+2 Strong Overflow Split Done -2,+2 Michal Balas

  48. Persistent B-tree - Complexity • Updates: O(logBN) I/Os • search and rebalance on one path from root to leaf • What about the required space? Michal Balas

  49. Persistent B-tree - Complexity • A few observations: • A rebalance operation on leaf creates <= 2 new nodes • Once a leaf is created, at least gB updates have to be performed on it before another rebalance operation will occur. • Two version-splits might only create one new leaf • Each time a leaf is created or a leaf version-split performed, a corresponding insertion or deletion is performed recursively one level up the tree. • During N updates: • # leaves created <= 2N/gB = O(N/B) • # leaf version-splits<= 2N/gB • # nodes created one level up the tree <= 22N/(gB)2 • By induction: # nodes created i levels up the tree <= 2i+1N/(gB)i+1 • Total # nodes created <= (it is also the # of blocks used after N updates) • Space: O(N/B) blocks Michal Balas

  50. Lecture’s Road Map • Motivation • The Vertical Ray Shooting problem and the need of persistent data structures • Review: • B-trees, B+ trees, and I/O model • Persistent B-trees • The modified Persistent B-tree • Experimental results • Open problems Michal Balas

More Related