# I/O-Efficient Batched Union-Find and Its Applications to Terrain Analysis

##### Presentation Transcript

1. I/O-Efficient Batched Union-Find and Its Applications to Terrain Analysis
   - Pankaj K. Agarwal, Lars Arge, and Ke Yi
   - Duke University and University of Aarhus

2. The Union-Find Problem
   - A universe of N elements: x1, x2, …, xN
   - Initially N singleton sets: {x1}, {x2}, …, {xN}
   - Each set has a representative
   - Maintain the partition under two operations:
     - Union(xi, xj): joins the sets containing xi and xj
     - Find(xi): returns the representative of the set containing xi

3. The Solution
   - [Figure: a forest of trees over the elements; the root of each tree is the representative of its set.]
   - Union(d, h): link-by-rank
   - Find(n): path compression
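The two heuristics named on this slide can be sketched in a few lines. This is a minimal in-memory illustration, not the I/O-efficient algorithm of the talk; the class name `UnionFind` and its method names are chosen here for illustration only.

```python
class UnionFind:
    """Disjoint sets with link-by-rank and path compression (in-memory sketch)."""

    def __init__(self, elements):
        # Every element starts as the representative of its own singleton set.
        self.parent = {x: x for x in elements}
        self.rank = {x: 0 for x in elements}

    def find(self, x):
        # First pass: walk up to the root, which is the representative.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass (path compression): point every node on the path at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return rx  # redundant union: both elements are already together
        # Link-by-rank: hang the root of smaller rank below the other root.
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        return rx
```

For example, after `uf = UnionFind("abcdef")` and the unions `(a, b)`, `(c, d)`, `(a, c)`, the calls `uf.find("b")` and `uf.find("d")` return the same representative, while `uf.find("e")` still returns `"e"`.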

4. Complexity
   - O(N α(N)) for a sequence of N union and find operations [Tarjan 75]
     - α(·): inverse Ackermann function (grows very slowly)
     - Optimal in the worst case [Tarjan 79, Fredman and Saks 89]
   - Batched (off-line) version
     - Entire sequence known in advance
     - Can be improved to linear time on a RAM [Gabow and Tarjan 85]
     - Not possible on a pointer machine [Tarjan 79]

5. Simple and Good, as long as … The entire data structure fits in memory

6. The I/O Model
   - Main memory of size M
   - One I/O transfers B items between memory and disk
   - Disk of infinite size

7. Our Results
   - An I/O-efficient algorithm for the batched union-find problem using $O(\mathrm{sort}(N)) = O\left(\frac{N}{B} \log_{M/B} \frac{N}{B}\right)$ I/Os expected
     - Same as sorting
     - Optimal in the worst case
   - A practical algorithm using O(sort(N) log(N/M)) I/Os
   - Applications to terrain analysis
     - Topological persistence: O(sort(N)) I/Os
     - Contour trees: O(sort(N)) I/Os
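To get a feel for the two bounds, the expressions inside the O-notation can be evaluated for concrete parameters. The sketch below drops all constant factors and assumes a base-2 logarithm for the log(N/M) factor, which the slide does not specify; the function names are hypothetical.

```python
import math


def sort_ios(n, m, b):
    """(N/B) * log_{M/B}(N/B): the sort(N) bound without its constant factor."""
    blocks = n / b
    return blocks * math.log(blocks, m / b)


def practical_ios(n, m, b):
    """sort(N) * log2(N/M): the bound quoted for the practical algorithm.

    The base of the logarithm is an assumption made for this illustration.
    """
    return sort_ios(n, m, b) * math.log2(n / m)
```

With N = 2^30 items, M = 2^20 and B = 2^10, `sort_ios` gives about 2^21 (two passes over 2^20 blocks) and `practical_ios` is ten times that.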

8. I/O-Efficient Batched Union-Find
   - Assumption: no redundant unions
     - Each union must join two different sets
     - This assumption will be removed later
   - Two-stage algorithm
     - Convert to interval union-find
       - Compute an order on the elements such that each union joins two adjacent sets
     - Solve batched interval union-find

9. Union Graph (Tree if no redundant unions)
   - 1: Union(d, g)
   - 2: Union(a, c)
   - 3: Union(r, b)
   - 4: Union(a, e)
   - 5: Union(e, i)
   - 6: Union(r, a)
   - 7: Union(a, d)
   - 8: Union(d, h)
   - 9: Union(b, f)
   - [Figure: two equivalent union trees rooted at r; each edge is labeled with the time stamp of the union that created it.]
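As a small in-memory illustration of this slide, the nine unions can be turned into a rooted tree whose edges carry the time stamps. The helper name `rooted_union_tree` and the choice of `r` as root (taken from the figure) are assumptions of this sketch.

```python
from collections import defaultdict

# The union sequence from the slide; the 1-based position is the time stamp.
UNIONS = [("d", "g"), ("a", "c"), ("r", "b"), ("a", "e"), ("e", "i"),
          ("r", "a"), ("a", "d"), ("d", "h"), ("b", "f")]


def rooted_union_tree(unions, root):
    """Return {child: (parent, time stamp)} for the union graph rooted at root."""
    adj = defaultdict(list)
    for t, (x, y) in enumerate(unions, start=1):
        adj[x].append((y, t))
        adj[y].append((x, t))
    nodes = set(adj)
    parent = {}
    stack, seen = [root], {root}
    while stack:  # depth-first search from the root
        u = stack.pop()
        for v, t in adj[u]:
            if v not in seen:
                seen.add(v)
                parent[v] = (u, t)
                stack.append(v)
    # A connected graph is a tree exactly when it has (#nodes - 1) edges.
    if len(unions) != len(nodes) - 1 or seen != nodes:
        raise ValueError("union graph is not a tree")
    return parent
```

Calling `rooted_union_tree(UNIONS, "r")` makes `b` and `a` children of `r` (edges 3 and 6), `c`, `e`, `d` children of `a` (edges 2, 4, 7), and so on; appending a redundant union such as `("g", "h")` raises the error.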

10. Transforming the Union Tree
    - [Figure: the union tree is transformed step by step into an equivalent union tree.]
    - In the final tree, the weights along every root-to-leaf path decrease

11. Formulating as a Batched Problem
    - [Figure: the original union tree next to the transformed tree.]
    - For each edge, find the lowest ancestor edge with a higher weight
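The query on this slide can be illustrated with a naive walk towards the root. This sketch is in-memory and answers the queries one at a time, so it is not the batched I/O-efficient method; the tree is the running example written as `{child: (parent, weight)}`, and the function name is hypothetical.

```python
# Union tree of the running example; an edge's weight is its union time stamp.
PARENT = {"b": ("r", 3), "a": ("r", 6), "c": ("a", 2), "e": ("a", 4),
          "d": ("a", 7), "i": ("e", 5), "g": ("d", 1), "h": ("d", 8),
          "f": ("b", 9)}


def lowest_heavier_ancestor(parent):
    """Map each edge weight to the weight of its lowest heavier ancestor edge.

    Edges without a heavier ancestor edge map to None.
    """
    answer = {}
    for child, (node, weight) in parent.items():
        answer[weight] = None
        # Walk up: the edge above `node` is the next ancestor edge.
        while node in parent:
            up, w = parent[node]
            if w > weight:
                answer[weight] = w
                break
            node = up
    return answer
```

On the running example, edge 1 finds edge 7, edges 2, 4 and 5 find edge 6, and the remaining edges have no heavier ancestor edge.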

12. Cast in a Geometry Setting
    - [Figure: the union tree and, next to it, one horizontal segment per edge.]
    - Euler tour: x = positions in the tour, y = weight
    - In O(sort(N)) I/Os [Chiang et al. 95]

13. Cast in a Geometry Setting
    - [Figure: the union tree and, next to it, one horizontal segment per edge.]
    - For each edge, find the lowest ancestor edge with a higher weight
    - Equivalently: for each segment, find the shortest segment above and containing it
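The correspondence between ancestor edges and containing segments can be checked on the running example. The sketch below is a quadratic in-memory illustration under the assumption that each edge's segment spans the tour positions at which the edge is traversed downward and upward; the names `edge_segments` and `shortest_segment_above` are hypothetical.

```python
# Running example as {node: [(child, edge weight), ...]}.
CHILDREN = {"r": [("b", 3), ("a", 6)],
            "a": [("c", 2), ("e", 4), ("d", 7)],
            "b": [("f", 9)], "e": [("i", 5)], "d": [("g", 1), ("h", 8)]}


def edge_segments(children, root):
    """Return {weight: (x_first, x_last)} from an Euler tour of the tree."""
    segments = {}
    clock = 0

    def visit(u):
        nonlocal clock
        for v, w in children.get(u, []):
            down = clock          # tour position when the edge is walked down
            clock += 1
            visit(v)
            segments[w] = (down, clock)  # closes when the edge is walked up
            clock += 1

    visit(root)
    return segments


def shortest_segment_above(segments):
    """For each segment, the shortest higher segment that contains it (or None)."""
    answer = {}
    for w, (lo, hi) in segments.items():
        best = None
        for w2, (lo2, hi2) in segments.items():
            if w2 > w and lo2 < lo and hi < hi2:
                if best is None or hi2 - lo2 < segments[best][1] - segments[best][0]:
                    best = w2
        answer[w] = best
    return answer
```

Because the segments of ancestor edges are nested, the shortest containing segment belongs to the lowest ancestor, so the answers agree with the tree formulation: 1 → 7, and 2, 4, 5 → 6.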

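14. Distribution Sweeping
    - M/B vertical slabs
    - [Figure labels: part of the work is "checked here" at the current level; the rest is "checked recursively" inside the slabs.]
    - Total cost: O(sort(N))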

15. In-Order Traversal
    - [Figure: the transformed tree, in which the weights along every root-to-leaf path decrease.]
    - At u, with children u1, …, uk (in increasing order of weight):
      - Recursively visit the subtree at u1
      - Return u
      - For i = 2, …, k: recursively visit the subtree at ui
    - Resulting order: b r c a e i g d h f
    - Claim: this traversal produces the right order
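The traversal rule can be run directly on the running example. The tree below is the transformed tree as read from the slide-10 figure residue, so its exact shape is an assumption of this sketch; the function name `in_order` is hypothetical.

```python
# Transformed tree as {node: [(child, edge weight), ...]}.
TRANSFORMED = {"r": [("b", 3), ("a", 6), ("d", 7), ("h", 8), ("f", 9)],
               "a": [("c", 2), ("e", 4), ("i", 5)],
               "d": [("g", 1)]}


def in_order(children, u, out=None):
    """Visit the lightest child's subtree, then u, then the other subtrees."""
    out = [] if out is None else out
    kids = sorted(children.get(u, []), key=lambda vw: vw[1])
    if kids:
        in_order(children, kids[0][0], out)
    out.append(u)
    for v, _ in kids[1:]:
        in_order(children, v, out)
    return out
```

`"".join(in_order(TRANSFORMED, "r"))` yields `brcaeigdhf`, the order shown on the slide.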

16. Solving Interval Union-Find
    - Union: x = two operands, y = time stamp
    - Find: x = operand, y = time stamp
    - [Figure label: representative]

17. Solving Interval Union-Find
    - Union: x = two operands, y = time stamp
    - Find: x = operand, y = time stamp
    - Four instances of batched ray shooting: O(sort(N))

18. Solving Interval Union-Find
    - Union: x = two operands, y = time stamp
    - Find: x = operand, y = time stamp
    - Four instances of batched ray shooting: O(sort(N))
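Once the elements are laid out so that every union joins two adjacent intervals, a find at time t only needs to know which interval boundaries have already disappeared. The sketch below is a naive in-memory illustration of that idea, not the batched ray-shooting algorithm: the left and right scans play the role of rays that stop at the first boundary still present at time t. The function name, the operation encoding, and the choice of the leftmost element as representative are assumptions made here.

```python
import math


def interval_union_find(order, ops):
    """Answer the finds in ops, a list of ("union", x, y) or ("find", x).

    The index of an operation in ops is its time stamp. The representative
    of an interval is its leftmost element (an arbitrary choice).
    """
    pos = {x: i for i, x in enumerate(order)}
    # removed[i]: time at which the boundary between positions i and i+1 goes away.
    removed = [math.inf] * (len(order) - 1)

    def interval(x, t):
        lo = hi = pos[x]
        while lo > 0 and removed[lo - 1] < t:
            lo -= 1
        while hi < len(order) - 1 and removed[hi] < t:
            hi += 1
        return lo, hi

    answers = []
    for t, op in enumerate(ops):
        if op[0] == "union":
            (_, hi1), (lo2, _) = sorted([interval(op[1], t), interval(op[2], t)])
            if hi1 + 1 != lo2:
                raise ValueError("union does not join two adjacent intervals")
            removed[hi1] = t
        else:
            lo, _ = interval(op[1], t)
            answers.append(order[lo])
    return answers
```

With the order `brcaeigdhf` and the nine unions of the running example, every union passes the adjacency check; a `find` of `d` issued right after `Union(d, g)` returns `g`, the leftmost element of the interval `{g, d}`.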

19. Handling Redundant Unions
    - Union tree becomes a general graph
    - Compute the minimum spanning tree
      - O(sort(N)) I/Os (randomized) [Chiang et al. 95]
      - O(sort(N) loglog B) I/Os (deterministic) [Arge et al. 04]
      - Deterministic O(sort(N)) I/Os if the graph is planar
    - Only MST edges are non-redundant
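Assuming the edge weights are the union time stamps (the reading used in this sketch), the minimum spanning tree is what Kruskal's algorithm builds when it scans the unions in time order, and an edge is kept exactly when its union joins two different sets. The in-memory illustration below uses a hypothetical helper name and is not one of the cited I/O-efficient MST algorithms.

```python
def nonredundant_unions(unions):
    """Return the 1-based time stamps of the unions that join two different sets."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    kept = []
    for t, (x, y) in enumerate(unions, start=1):
        rx, ry = find(x), find(y)
        if rx != ry:          # this edge enters the spanning tree
            parent[rx] = ry
            kept.append(t)
    return kept
```

For the sequence `(a, b)`, `(b, c)`, `(a, c)`, `(c, d)` the third union is redundant, so the function returns `[1, 2, 4]`.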

20. Applications
    - Topological Persistence
    - Contour Trees

21. Application: Topological Persistence
    - Introduced by Edelsbrunner et al. 2000
    - Measure importance on a surface
      - Feature extraction
      - Topological de-noising
    - Many applications
      - Surface modeling
      - Shape analysis
      - Terrain analysis
      - Computational biology

22. Topological Persistence Illustrated

23. Formulated as Batched Union-Find
    - Represented as a triangulated mesh
    - Consider minimum-saddle pairs
    - When reaching:
      - A minimum or maximum: do nothing
      - A regular point u: issue union(u, v) for a lower neighbor v
      - A saddle u: let v and w be nodes from the two connected pieces of u's lower link; issue find(v), find(w), union(u, v), union(u, w)
    - [Figure label: lower link]
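The pairing idea can be illustrated on a plain graph with vertex heights instead of a triangulated mesh. This is a simplified in-memory variant, not the slide's procedure: it does not classify vertices by their lower link, it assumes distinct heights, and any vertex at which two components meet plays the role of the saddle. The function name and the input format are assumptions of this sketch.

```python
def persistence_pairs(height, edges):
    """Return (minimum, merging vertex) pairs for a graph with vertex heights."""
    nbrs = {v: [] for v in height}
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    parent = {}
    lowest = {}  # component root -> lowest vertex (minimum) of that component

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    pairs = []
    for u in sorted(height, key=height.get):  # sweep from low to high
        parent[u] = u
        lowest[u] = u
        for v in nbrs[u]:
            if height[v] >= height[u]:
                continue  # only lower neighbors have been swept already
            ru, rv = find(u), find(v)
            if ru == rv:
                continue
            # The component with the higher minimum ends at u.
            if height[lowest[ru]] > height[lowest[rv]]:
                young, old = ru, rv
            else:
                young, old = rv, ru
            if lowest[young] != u:  # skip the trivial merge of u itself
                pairs.append((lowest[young], u))
            parent[young] = old
    return pairs
```

On the path a-b-c-d-e with heights 1, 5, 2, 4, 3, the minimum `e` is paired with `d` and the minimum `c` with `b`; the global minimum `a` stays unpaired.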

24. Experiment 1: Random Union-Find (128 MB memory)

25. Experiment 2: Topological Persistence on Terrain Data Neuse River Basin of North Carolina: ~ 0.5 billion points

26. Experiment 2: Topological Persistence on Terrain Data
    - 128 MB memory
    - Entire data set (0.5 billion points): the internal-memory (IM) algorithm fails and the external-memory (EM) algorithm takes 10 hours

27. Contour Trees

28. Summary
    - An I/O-efficient algorithm for the batched union-find problem using $O(\mathrm{sort}(N)) = O\left(\frac{N}{B} \log_{M/B} \frac{N}{B}\right)$ I/Os
      - Optimal in the worst case
    - A practical algorithm using O(sort(N) log(N/M)) I/Os
    - Applications to terrain analysis
      - Topological persistence: O(sort(N)) I/Os
      - Contour trees: O(sort(N)) I/Os
    - Open question
      - On-line case: can we get below O(N α(N)) I/Os?

29. Thank you!

30. Previous Results
    - Directly maintain contours
      - O(N log N) time [van Kreveld et al. 97]
      - Needs union-split-find for circular lists
      - Does not extend to higher dimensions
    - Two sweeps by maintaining components, then merge
      - O(N log N) time [Carr et al. 03]
      - Extends to arbitrary dimensions

31. Join Tree and Split Tree
    - [Figure: a join tree and a split tree over nodes 1 to 9, shown twice; the second pair marks the "qualified nodes".]

32. Final Contour Tree
    - [Figure: join tree, split tree, and the resulting contour tree over nodes 1 to 9.]
    - Hard to BATCH!

33. Another Characterization
    - Let w be the highest node that is a descendant of v in the join tree and an ancestor of u in the split tree; then (u, w) is a contour tree edge
    - Now we can BATCH!
    - [Figure: join tree, split tree, and contour tree over nodes 1 to 9 with u, v, and w marked.]

34. Map to Rectangles
    - [Figure: join tree and split tree with u, v, and w marked, mapped to rectangles.]
    - Can be solved in O(sort(N)) I/Os (practical, too)

35. Topological Persistence

36. Label Nodes with Intervals
    - [Figure: a tree over nodes 1 to 9 with an interval attached to each node.]
    - Using an Euler tour (O(sort(N)) I/Os)

37. Map to Rectangles
    - [Figure: join tree and split tree with u, v, and w marked, mapped to rectangles.]
    - Can be solved in O(sort(N)) I/Os (practical, too)

38. Formulated as Batched Union-Find
    - Represented as a triangulated mesh
    - Consider minimum-saddle pairs
    - When reaching:
      - A minimum or maximum: do nothing
      - A regular point u: issue union(u, v) for a lower neighbor v
      - A saddle u: let v and w be nodes from the two connected pieces of u's lower link; issue find(v), find(w), union(u, v), union(u, w)
    - [Figure label: lower link]

39. Experiment 1: Random Union-Find

40. Experiment 2: Topological Persistence on Terrain Data

41. Experiment 2: Topological Persistence on Terrain Data