1 / 43

I/O-Algorithms

I/O-Algorithms. Lars Arge University of Aarhus March 1, 2005. I/O-Model. Parameters N = # elements in problem instance B = # elements that fits in disk block M = # elements that fits in main memory T = # output size in searching problem

galeno
Download Presentation

I/O-Algorithms

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. I/O-Algorithms Lars Arge University of Aarhus March 1, 2005

  2. I/O-algorithms I/O-Model • Parameters N = # elements in problem instance B = # elements that fits in disk block M = # elements that fits in main memory T = # output size in searching problem • I/O: Movement of block between memory and disk D Block I/O M P

  3. I/O-algorithms Fundamental Bounds [AV88] Internal External • Scanning: N • Sorting: N log N • Permuting • List rank • Searching:

  4. I/O-algorithms Data Structures until now • One-dimensional searching • B-trees: Node degree (B)queries in Rebalancing using split/fuse  updates in • Weight-balanced B-tress: Weight rather than degree constraint  Ω(w(v)) updates below v between rebalancing operations • Persistent B-trees: NO(N/B) space Update in current version in Search in all previous versions in

  5. x I/O-algorithms Data structures until now • Last week: “dimension 1.5” searching: • Interval stabbing: External interval tree O(N/B) space, query, update • We utilized a number of tools/techniques • B-trees, weight-balanced B-trees, persistent B-trees • Logarithmic method • Global rebuilding • Construction using buffer technique

  6. v $m$ blocks I/O-algorithms Interval Management • External interval tree: • Fan-out weight-balanced B-tree on endpoints • Intervals stored in O(B) secondary structure in each internal node • Query efficiency using filtering • Bootstrapping used to avoid O(B) search cost in each node • Size O(B2) underflow structure in each node • Constructed using sweep and persistent B-tree • Dynamic using global rebuilding v

  7. (x1,x2) x1 x2 (x,x) x q3 q1 q2 I/O-algorithms 3-Sided Range Searching • Interval management corresponds to simple form of 2d range search • More general problem: Dynamic3-sidede range searching • Maintain set of points in plane such that given query (q1,q2, q3), all points (x,y) with q1x q2 and yq3 can be found efficiently

  8. q3 q1 q2 I/O-algorithms 3-Sided Range Searching : Static Solution • Construction: Sweep top-down inserting x in persistent B-tree at (x,y) • O(N/B) space • I/O construction using buffer technique • Query (q1,q2, q3): Perform range query with [q1,q2] in B-tree at q3 • I/Os • Dynamic?

  9. 9 16.20 4 16 5,6 19,9 5 1 13 19 13,3 1,2 9,4 20,3 1 4 5 9 13 16 20 19 4,1 I/O-algorithms Internal Priority Search Tree • Base tree on x-coordinates with nodes augmented with points • Heap on y-coordinates • Decreasing y values on root-leaf path • (x,y) on path from root to leaf holding x • If v holds point then parent(v) holds point

  10. I/O-algorithms Internal Priority Search Tree 9 • Linear space • Insert of (x,y) (assuming fixed x-coordinate set): • Compare y with y-coordinate in root • Smaller: Recursively insert (x,y) in subtree on path to x • Bigger: Insert in root and recursively insert old point in subtree  O(log N) update Insert (10,21) 16.20 10,21 4 16 5,6 19,9 5 1 13 19 13,3 1,2 9,4 20,3 1 4 5 9 13 16 20 19 4,1

  11. I/O-algorithms Internal Priority Search Tree 9 • Query with (q1,q2, q3) starting at root v: • Report point in v if satisfying query • Visit both children of v if point reported • Always visit child(s) of v on path(s) to q1 andq2  O(log N+T) query 16.20 4 4 16 5,6 19,9 19 4 5 1 13 19 13,3 1,2 9,4 20,3 1 4 5 9 13 16 20 19 4,1

  12. I/O-algorithms Externalizing Priority Search Tree 9 • Natural idea: Block tree • Problem: • I/Os to follow paths to to q1 andq2 • But O(T) I/Os may be used to visit other nodes (“overshooting”)  query 16.20 4 16 5,6 19,9 5 1 13 19 13,3 1,2 9,4 20,3 1 4 5 9 13 16 20 19 4,1

  13. I/O-algorithms Externalizing Priority Search Tree 9 • Solution idea: • Store B points in each node  • O(B2) points stored in each supernode • B output points can pay for “overshooting” • Bootstrapping: • Store O(B2) points in each supernode in static structure 16.20 4 16 5,6 19,9 5 1 13 19 13,3 1,2 9,4 20,3 1 4 5 9 13 16 20 19 4,1

  14. I/O-algorithms External Priority Search Tree • Base tree: Weight-balanced B-tree with branching parameter 1/4B and leaf parameter B on x-coordinates • Points in “heap order”: • Root stores B top points for each of the child slabs • Remaining points stored recursively • Points in each node stored in “O(B2)-structure” • Persistent B-tree structure for static problem  Linear space

  15. I/O-algorithms External Priority Search Tree • Query with (q1,q2, q3) starting at root v: • Query O(B2)-structure and report points satisfying query • Visit child v if • v on path to q1 orq2 • All points corresponding to v satisfy query

  16. I/O-algorithms External Priority Search Tree • Analysis: • I/Os used to visit node v • nodes on path to q1 orq2 • For each node v not on path to q1 orq2 visited, B points reported in parent(v)  query

  17. I/O-algorithms External Priority Search Tree • Insert (x,y) (ignoring insert in base tree - rebalancing): • Find relevant node v: • Query O(B2)-structure to find B points in root corresponding to node u on path to x • If y smaller than y-coordinates of all B points then recursively search in u • Insert (x,y) in O(B2)-structure of v • If O(B2)-structure contains >B points for child u, remove lowest point and insert recursively in u • Delete: Similarly u

  18. I/O-algorithms External Priority Search Tree • Analysis: • Query visits nodes • O(B2)-structure queried/updated in each node • One query • One insert and one delete • O(B2)-structure analysis: • Query: • Update in O(1) I/Os using update block and global rebuilding  I/Os u

  19. v v’ v’’ I/O-algorithms Dynamic Base Tree • Deletion: • Delete point as previously • Delete x-coordinate from base tree using global rebuilding  I/Os amortized • Insertion: • Insert x-coordinate in base tree and rebalance (using splits) • Insert point as previously • Split: Boundary in v becomes boundary in parent(v)

  20. v’ v’’ I/O-algorithms Dynamic Base Tree • Split: When v splits B new points needed in parent(v) • One point obtained from v’ (v’’) using “bubble-up” operation: • Find top point p in v’ • Insert p in O(B2)-structure • Remove p from O(B2)-structure of v’ • Recursively bubble-up point to v’ • Bubble-up in I/Os • Follow one path from v to leaf • Uses O(1) I/O in each node  Split in I/Os

  21. v’ v’’ I/O-algorithms Dynamic Base Tree • O(1) amortized split cost: • Cost: O(w(v)) • Weight balanced base tree: inserts below v between splits  • External Priority Search Tree • Space: O(N/B) • Query: • Updates: I/Os amortized • Amortization can be removed from update bound in several ways • Utilizing lazy rebuilding

  22. q4 q3 q1 q2 I/O-algorithms Two-Dimensional Range Search • We have now discussed structures for special cases of two-dimensional range searching • Space: O(N/B) • Query: • Updates: • Cannot be obtained for general 2d range searching: • query requires space • space requires query q q3 q1 q q2

  23. I/O-algorithms External Range Tree • Base tree: Weight balanced tree with branching parameter and leaf parameter B on x-coordinates  height • Points below each node stored in 4 linear space secondary structures: • “Right” priority search tree • “Left” priority search tree • B-tree on y-coordinates • Interval tree  space

  24. I/O-algorithms External Range Tree • Secondary interval tree structure: • Connect points in each slab in y-order • Project obtained segments in y-axis • Intervals stored in interval tree • Interval augmented with pointer to corresponding points in y-coordinate B-tree in corresponding child node

  25. I/O-algorithms External Range Tree • Query with (q1,q2, q3 , q4) answered in top node with q1 andq2 in different slabs v1 and v2 • Points in slab v1 • Found with 3-sided query in v1 using right priority search tree • Points in slab v2 • Found with 3-sided query in v2 using left priority search tree • Points in slabs between v1 and v2 • Answer stabbing query with q3 using interval tree  first point above q3 in each of the slabs • Find points using y-coordinate B-tree in slabs v1 v2

  26. I/O-algorithms External Range Tree • Query analysis: • I/Os to find relevant node • I/Os to answer two 3-sided queries • I/Os to query interval tree • I/Os to traverse B-trees  I/Os v1 v2

  27. I/O-algorithms External Range Tree • Insert: • Insert x-coordinate in weight-balanced B-tree • Split of v can be performed in I/Os  I/Os • Update secondary structures in all nodes on one root-leaf path • Update priority search trees • Update interval tree • Update B-tree  I/Os • Delete: • Similar and using global rebuilding v1 v2

  28. I/O-algorithms Summary: External Range Tree • 2d range searching in space • I/O query • I/O update • Optimal among query structures q4 q3 q1 q2

  29. I/O-algorithms kdB-tree • kd-tree: • Recursive subdivision of point-set into two half using vertical/horizontal line • Horizontal line on even levels, vertical on uneven levels • One point in each leaf  Linear space and logarithmic height

  30. I/O-algorithms kd-Tree: Query • Query • Recursively visit nodes corresponding to regions intersecting query • Report point in trees/nodes completely contained in query • Query analysis • Horizontal line intersect Q(N) = 2+2Q(N/4)= regions • Query covers T regions • I/Os worst-case

  31. I/O-algorithms kdB-tree • KdB-tree: • Blocking of kd-tree but with B point in each leaf • Query as before • Analysis as before except that each region now contains B points  I/O query

  32. I/O-algorithms Construction of kdB-tree • Simple algorithm • Find median of y-coordinates (construct root) • Distribute point based on median • Recursively build subtrees • Idea in improved algorithm [AAPV01] • Construct levels at a time using O(N/B) I/Os • Recursively build subtrees

  33. I/O-algorithms Construction of kdB-tree • Sort N points by x- and by y-coordinates using I/Os • Building levels ( nodes) in O(N/B) I/Os: 1. Construct by grid with points in each slab 2. Count number of points in each grid cell and store in memory 3. Find slab s with median x-coordinate 4. Scan slab s to find median x-coordinate and construct node 5. Split slab containing median x-coordinate and update counts 6. Recurse on each side of median x-coordinate using grid (step 3) • Grid grows to during algorithm • Each node constructed in I/Os

  34. I/O-algorithms kdB-tree • kdB-tree: • Linear space • Query in I/Os • Construction in I/Os • Dynamic? • Deletes in I/Os using global rebuilding • Insertions?

  35. I/O-algorithms Insertions using Logarithmic Method • Partition pointset S into subsets S0, S1, … Slog N, |Si| = 2i or |Si| = 0 • Build kdB-tree Di on Si • Operations: • Query: Query each Di • Insert: Find first empty Diand construct Di out of elements in S0,S1, … Si-1 • I/Os  per moved point • Point moved O(log N) times  I/Os amortized • Delete: Delete from relevant Di  (ignore in log-method)

  36. I/O-algorithms O-Tree Structure • O-tree: • B-tree on vertical slabs • B-tree on horizontal slabs in each vertical slab • kdB-tree on points in each leaf

  37. I/O-algorithms O-Tree Query • Perform rangesearch with q1 and q2 in vertical B-tree • Query all kdB-trees in leaves of two horizontal B-trees with x-interval intersected but not spanned by query • Perform rangesearch with q3 and q4horizontal B-trees with x-interval spanned by query • Query all kdB-trees with range intersected by query

  38. I/O-algorithms O-Tree Query Analysis • Vertical B-tree query: • Query of all kdB-trees in leaves of two horizontal B-trees: • Query horizontal B-trees: • Query kdB-trees not completely in query • Query in kdB-trees completely contained in query:  I/Os

  39. I/O-algorithms O-Tree Update • Insert: • Search in vertical B-tree: I/Os • Search in horizontal B-tree: I/Os • Insert in kdB-tree: I/Os • Use global rebuilding when structures grow too big/small • B-trees not contain elements • kdB-trees not contain elements  I/Os • Deletes can be handled in I/Os similarly

  40. I/O-algorithms Summary: O-Tree • 2d range searching in linear space • I/O query • I/O update • Optimal among structures using linear space • Can be extended to work in d-dimensions with optimal query bound q4 q3 q1 q2

  41. q3 q1 q2 q4 q3 q1 q2 I/O-algorithms Summary: 3 and 4-sided Range Search • 3-sided 2d range searching: External priority search tree • query, space, update • General (4-sided) 2d range searching: • External range tree: query, space, update • O-tree: query, space, update

  42. q3 q1 q2 q4 q3 q1 q2 I/O-algorithms Techniques (one final time) • Tools: • B-trees • Persistent B-trees • Buffer trees • Logarithmic method • Weight-balanced B-trees • Global rebuilding • Techniques: • Bootstrapping • Filtering (x,x)

  43. I/O-algorithms Other results • Many other results for e.g. • Higher dimensional range searching • Range counting • Halfspace (and other special cases) of range searching • Structures for moving objects • Proximity queries • Structures for objects other than points (bounding rectangles) • Many heuristic structures in database community • Implementation efforts: • LEDA-SM (MPI) • TPIE (Duke/Aarhus)

More Related