
UMass Lowell Computer Science 91.504 Advanced Algorithms Computational Geometry Prof. Karen Daniels Spring, 2001


Presentation Transcript


  1. UMass Lowell Computer Science 91.504 Advanced Algorithms, Computational Geometry, Prof. Karen Daniels, Spring 2001. Lecture 8: Approximate Nearest Neighbor Searching; Derandomization for Efficient Geometric Partitioning. Monday, 4/30/01

  2. Part 2: Advanced Topics • Applications: Manufacturing, Modeling/Graphics, Wireless Networks, Visualization • Techniques: (de)Randomization, Approximation, Robustness • Representations: Epsilon-net, Decomposition tree

  3. Literature for Part II

  4. Literature for Part II

  5. Approximate Nearest Neighbor Searching “An Optimal Algorithm for Approximate Nearest Neighbor Searching in Fixed Dimensions” Arya, Mount, Netanyahu, Silverman, Wu

  6. Goals • Fast nearest neighbor queries in a d-dimensional set of n points: • approximate nearest neighbor • distance within a factor of (1+ε) of the true closest neighbor • preprocess using O(dn log n) time, O(dn) space • Balanced-Box Decomposition (BBD) tree • note that space and preprocessing time are independent of ε • query in O(c_{d,ε} log n) time. C++ code for a simplified version is at http://www.cs.umd.edu/~mount/ANN

  7. Approach: Distance Assumptions • Use the Lp (also called Minkowski) metric • assume it can be computed in O(d) time • the pth root need not be computed when comparing distances (see the sketch below) • Approximate nearest neighbor • distance within a factor of (1+ε) of the true closest neighbor p* • Can change ε or the metric without rebuilding the data structure
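A minimal sketch, assuming Python and illustrative function names (not from the paper or the ANN library), of why the pth root can be skipped: taking the root is monotone, so comparing the power sums orders candidate points exactly as comparing true Lp distances would, and each sum costs O(d) time.

```python
# Hedged sketch: compare L_p distances without taking the p-th root.
def lp_power_sum(q, point, p=2):
    """Return sum_i |q_i - point_i|^p, i.e., the L_p distance raised to the p-th power."""
    return sum(abs(qi - xi) ** p for qi, xi in zip(q, point))

def closer(q, a, b, p=2):
    """True if a is at least as close to q as b under the L_p metric."""
    return lp_power_sum(q, a, p) <= lp_power_sum(q, b, p)

# Example: in the plane, (1, 1) is closer to the origin than (2, 0) under L_2.
assert closer((0, 0), (1, 1), (2, 0), p=2)
```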

  8. Approach: Overview • Preprocess the points to create a Balanced-Box Decomposition (BBD) tree • Query algorithm, for query point q: • locate the leaf cell containing q in O(log n) time • priority search: enumerate leaf cells in increasing order of distance from q • for each leaf cell, compute the distance from q to the cell’s point • keep track of the closest point p seen so far • stop when the distance from q to the current leaf exceeds dist(q,p)/(1+ε) • Return p as the approximate nearest neighbor to q (a query sketch follows).
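A simplified, hypothetical sketch of the query loop in Python, using a plain tree of axis-aligned cells as a stand-in for the BBD tree; the Node fields and function names are assumptions for illustration, not the paper's data structure or the ANN library's API. It shows the priority search and the (1+ε) termination test from the slide.

```python
import heapq

class Node:
    def __init__(self, point=None, left=None, right=None, lo=None, hi=None):
        self.point = point            # data point stored at a leaf (None for internal nodes)
        self.left, self.right = left, right
        self.lo, self.hi = lo, hi     # low/high corners of this node's rectangular cell

def dist2_point(q, p):
    return sum((qi - pi) ** 2 for qi, pi in zip(q, p))

def dist2_cell(q, node):
    """Squared L2 distance from q to the node's rectangle (0 if q lies inside it)."""
    return sum(max(lo - qi, 0.0, qi - hi) ** 2
               for qi, lo, hi in zip(q, node.lo, node.hi))

def approx_nn(root, q, eps):
    best, best_d2 = None, float("inf")
    # The root gets a dummy 0.0 priority so it is examined first;
    # children are enqueued with their exact cell distances.
    heap = [(0.0, id(root), root)]
    while heap:
        cell_d2, _, node = heapq.heappop(heap)
        # Stop when dist(q, cell) > dist(q, best) / (1 + eps), in squared form:
        if cell_d2 * (1.0 + eps) ** 2 > best_d2:
            break
        if node.point is not None:                 # leaf: check its point
            d2 = dist2_point(q, node.point)
            if d2 < best_d2:
                best, best_d2 = node.point, d2
        else:                                      # internal node: enqueue both children
            for child in (node.left, node.right):
                if child is not None:
                    heapq.heappush(heap, (dist2_cell(q, child), id(child), child))
    return best
```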

  9. Balanced Box Decomposition (BBD) Tree • Similar to a kd-tree [Samet handout] • Binary tree • Tree structure stored in main memory • Cutting planes orthogonal to axes • “Alternating” dimensions • O(log n) height • Subdivides space into regions of O(d) complexity using d-dimensional rectangles • Can be built in O(dn log n) time [figure: a point set, its subdivision, and one possible kd-like tree for the points (not a BBD tree, though)] (a construction sketch follows)
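For contrast, a minimal construction sketch in Python (hypothetical names, not the paper's code) of the kd-like tree the slide compares against: median splits along alternating dimensions halve the point set at every level, giving O(log n) height.

```python
def build_kd(points, depth=0):
    """Recursively split on the median along dimension (depth mod d)."""
    if len(points) <= 1:
        return {"leaf": True, "points": points}
    d = len(points[0])
    dim = depth % d                          # "alternating" dimensions
    pts = sorted(points, key=lambda p: p[dim])
    mid = len(pts) // 2                      # median split keeps the tree balanced
    return {"leaf": False, "dim": dim, "cut": pts[mid][dim],
            "left": build_kd(pts[:mid], depth + 1),
            "right": build_kd(pts[mid:], depth + 1)}

def height(node):
    return 0 if node["leaf"] else 1 + max(height(node["left"]), height(node["right"]))

# Eight points halve at each level, so the tree has height 3 = log2(8).
pts = [(3, 7), (1, 2), (8, 5), (4, 9), (6, 1), (2, 6), (7, 3), (5, 8)]
assert height(build_kd(pts)) == 3
```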

  10. Balanced Box Decomposition (BBD) Tree (continued) • Distinguishing features of the BBD tree: • a cell is either • a d-dimensional rectangle or • the difference of 2 nested d-dimensional rectangles • In this sense, the BBD tree is like: • an optimized kd-tree: partitions points into roughly equal-sized sets [inner-box shrink]; while descending the tree, the number of points on the path decreases exponentially • a specialized quadtree: the aspect ratio of each box is bounded by a constant [hyperplane split]; while descending the tree, the size of the region on the path decreases exponentially • A leaf may be associated with more than 1 point in/on its cell: O(n) nodes • Inner boxes are “sticky”: if an inner box is close to an edge of the outer box, it “sticks” to that edge [figure: a subdivision and its tree, with split and shrink nodes] (a cell sketch follows)
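A small hypothetical sketch (Python dataclasses; the names are mine, not the paper's) of the two cell shapes just listed: a plain d-dimensional rectangle, or a cell that is the difference of two nested rectangles produced by a shrink.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Box = Tuple[Tuple[float, ...], Tuple[float, ...]]   # (low corner, high corner)

@dataclass
class Cell:
    outer: Box                    # the outer d-dimensional rectangle
    inner: Optional[Box] = None   # nested inner rectangle, if this is a "difference" cell

def contains(cell, q):
    """q lies in the cell if it is inside the outer box but not inside the inner box."""
    def in_box(box, q):
        lo, hi = box
        return all(l <= x <= h for l, x, h in zip(lo, q, hi))
    if not in_box(cell.outer, q):
        return False
    return cell.inner is None or not in_box(cell.inner, q)

# Example: a unit square whose central quarter has been shrunk away.
cell = Cell(outer=((0, 0), (1, 1)), inner=((0.25, 0.25), (0.75, 0.75)))
assert contains(cell, (0.1, 0.1)) and not contains(cell, (0.5, 0.5))
```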

  11. Midpoint Algorithm for Splitting/Shrinking • Split box b using the hyperplane through the center of b and orthogonal to the ith coordinate axis (its longest dimension) • bounds the aspect ratio, but what’s wrong with this approach? (a midpoint split need not reduce the number of points) • Centroid shrink: produce O(1) subcells, each with at most 2n_c/3 points [n_c = number of points in the current cell] • 3-stage shrink: shrink, split, shrink [figures: single-stage (simplified) shrink; 3-stage shrink, split, shrink] (a split sketch follows)
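A minimal sketch in Python (illustrative, not the paper's code) of the midpoint split itself: cut the box through its center, orthogonal to its longest side, which is what keeps the aspect ratio bounded.

```python
def midpoint_split(lo, hi):
    """Split the box [lo, hi] into two boxes across its longest dimension."""
    extents = [h - l for l, h in zip(lo, hi)]
    i = extents.index(max(extents))          # longest dimension
    cut = (lo[i] + hi[i]) / 2.0              # hyperplane through the center of the box
    hi_left = list(hi); hi_left[i] = cut
    lo_right = list(lo); lo_right[i] = cut
    return (lo, tuple(hi_left)), (tuple(lo_right), hi)

# Example: splitting the 4 x 1 box [0,4] x [0,1] cuts across x at 2,
# so each half has a 2:1 aspect ratio.
left, right = midpoint_split((0.0, 0.0), (4.0, 1.0))
assert left == ((0.0, 0.0), (2.0, 1.0)) and right == ((2.0, 0.0), (4.0, 1.0))
```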

  12. Middle-Interval Algorithm for Splitting/Shrinking • Flexibility in the choice of splitting plane • Choose the plane from a central strip of the current outer box

  13. Packing Constraint • Each subdivision cell satisfies this packing constraint: given a BBD tree for a set of data points in R^d, the number of leaf cells of size at least s > 0 intersecting a (Minkowski L_m) open ball of radius r > 0 is at most ⌈1 + 6r/s⌉^d • Proof has 2 cases: • overlapping boxes • disjoint boxes: • a box of side 2r encloses a ball of radius r • aspect ratio 3:1 implies the smallest side length is >= s/3 • the densest packing is given by a regular grid of boxes of side length s/3 • an interval of length 2r can intersect no more than ⌈1 + 2r/(s/3)⌉ = ⌈1 + 6r/s⌉ such intervals • account for all dimensions by raising to the power d

  14. Priority Search from Query Point • Visit boxes in increasing order of distance from q • Similar to kd-tree priority search • Maintain a priority queue of tree nodes • a node’s priority is inversely related to dist(q, cell) • Search repeats: • extract the highest-priority node • descend its subtree • visit the leaf closest to q • add siblings to the queue [figure: at the start, the root and cells v1, v2, v3, v4 are in the priority queue; the node closest to the query point is visited first]

  15. Incremental, Relative Distance [Arya, Mount 93] • Maintain the sum of appropriate powers of the coordinate differences between the query point and the nearest point of the outer box • Incrementally update the distance from the parent box to each child when a split is performed: • the closer child has the same distance as the parent • the farther child’s distance needs only a 1-coordinate update (along the splitting dimension) • Can make a difference in higher dimensions! [figure: L1 distances from a query point to the child boxes before and after splits] (a sketch follows)
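A hypothetical sketch in Python (names are assumptions, not code from [Arya, Mount 93]) of the incremental update: the distance to a box is a sum of per-coordinate terms, so when a box is split along dimension i, only that one term changes for the farther child.

```python
def coord_term(q_i, lo_i, hi_i, p=2):
    """Contribution of one coordinate to the L_p power-sum distance to a box."""
    return max(lo_i - q_i, 0.0, q_i - hi_i) ** p

def box_dist(q, lo, hi, p=2):
    return sum(coord_term(qi, l, h, p) for qi, l, h in zip(q, lo, hi))

def child_dist(parent_dist, q, lo, hi, i, new_lo_i, new_hi_i, p=2):
    """Distance to a child box that differs from the parent only in dimension i."""
    return (parent_dist
            - coord_term(q[i], lo[i], hi[i], p)
            + coord_term(q[i], new_lo_i, new_hi_i, p))

# Example: split the unit square at x = 0.5 with the query at (-1, 0.5).
q, lo, hi = (-1.0, 0.5), (0.0, 0.0), (1.0, 1.0)
parent = box_dist(q, lo, hi)                        # squared L2 distance: 1.0
near = child_dist(parent, q, lo, hi, 0, 0.0, 0.5)   # closer child: unchanged, 1.0
far = child_dist(parent, q, lo, hi, 0, 0.5, 1.0)    # farther child: 1.5**2 = 2.25
assert (parent, near, far) == (1.0, 1.0, 2.25)
```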

  16. Experiments Experiments generated points from a variety of probability distributions: Uniform Gaussian Laplace Correlated Gaussian Correlated Laplacian Clustered Gaussian Clustered Segments

  17. Experiments

  18. Experiments

  19. Experiments

  20. Conclusions • The algorithm is not necessarily practical for large dimensions • but for dimensions up to roughly 20, it does well • Shrinking helps with highly clustered datasets, but was not often needed in their experiments • only needed for 5-20% of tree nodes • The BBD tree (in the paper’s form) is primarily for a static point set • but an auxiliary data structure could maintain changes

  21. Derandomization for Efficient Geometric Partitioning “Bounded-Independence Derandomization of Geometric Partitioning with Applications to Parallel Fixed-Dimensional Linear Programming” Goodrich, Ramos

  22. Overview • The paper concerns geometric partitioning: • Given: • a collection X of n hyperplanes in R^d • a parameter r • Partitioning goal: • partition R^d into O(r^d) constant-sized cells • so that each cell intersects few hyperplanes • Previous work: • random sampling yields a partition in which each cell intersects at most εn hyperplanes, where ε = log r / r • derandomization can be used for deterministic construction • Current work: • assume the set is a special space with a special property (made precise later: a range space with bounded VC-exponent) • for such a set, construct (efficiently, deterministically, and in parallel) a (small-sized) approximation for the space • apply this to solve parallel fixed-dimensional linear programming efficiently and deterministically. For other Goodrich papers, see http://www.cs.jhu.edu/~goodrich/cgc/pubs/

  23. Background: Derandomization • Common approach for randomized geometric algorithms: • use small-sized random samples • Derandomize: • quantify combinatorial properties of the random samples • show that sets with these properties can be constructed efficiently without randomization • Combinatorial properties often characterized by what the next long series of slides is about….

  24. Background: Configuration • Given an abstract set (universe) N of geometric objects • A configuration over N is a pair (D,L) = (D(s),L(s)), where D and L are disjoint subsets of N • Objects in D are: • triggers associated with s • objects that define s • d(s) = cardinality of D(s) = degree • Objects in L are: • stoppers associated with s • objects that conflict with s • l(s) = cardinality of L(s) = level = (absolute) conflict size (a small sketch follows). Source: “Computational Geometry: An Introduction Through Randomized Algorithms” by Ketan Mulmuley
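A small hypothetical sketch in Python (not from Mulmuley's book) of a configuration as a pair of disjoint object sets, with its degree and level.

```python
from dataclasses import dataclass
from typing import Any, FrozenSet

@dataclass(frozen=True)
class Configuration:
    triggers: FrozenSet[Any]    # D(s): objects that define s
    stoppers: FrozenSet[Any]    # L(s): objects that conflict with s

    def degree(self):           # d(s) = |D(s)|
        return len(self.triggers)

    def level(self):            # l(s) = |L(s)|, the (absolute) conflict size
        return len(self.stoppers)

# Example matching the trapezoid slides below: D(s) = {h3, h4}, L(s) = {h1, h2}.
s = Configuration(triggers=frozenset({"h3", "h4"}), stoppers=frozenset({"h1", "h2"}))
assert s.degree() == 2 and s.level() == 2
```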

  25. Background: Configuration Example • N = {h1, h2, h3, h4} = a set of line segments in the plane • s is feasible if s occurs in the trapezoidal decomposition H(R) for some subset R of N • the trapezoids arising in incremental computation of H(N) • Here, R = {h3, h4} [figure: the trapezoidal decomposition H(R) with a feasible trapezoid s; the legend distinguishes segments in R from segments in N \ R]. Source: “Computational Geometry: An Introduction Through Randomized Algorithms” by Ketan Mulmuley

  26. Background: Configuration Example • For a feasible trapezoid s, define its: • trigger set D(s) = segments of N adjacent to the boundary of s • conflict set L(s) = segments of N \ D(s) intersecting s • Here the configuration (D(s), L(s)) has D(s) = {h3, h4} and L(s) = {h1, h2} [figure: H(R) with the trapezoid s; the legend distinguishes segments in R from segments in N \ R]. Source: “Computational Geometry: An Introduction Through Randomized Algorithms” by Ketan Mulmuley

  27. Background: Configuration Space • A configuration space P(N) over N is a (multi)set of configurations with the • Bounded Degree Property: the degree of each configuration in P(N) is bounded (by a constant, i.e., something independent of N). Note: the term configuration space is also used in motion planning; in that context, it refers to the motion-planning search space. Source: “Computational Geometry: An Introduction Through Randomized Algorithms” by Ketan Mulmuley

  28. Background: Configuration Example • Associate with each feasible s a configuration (D(s), L(s)) • If N is in general position, d(s) = cardinality of D(s) <= 4 • since s is a trapezoid • Due to the bounded degree d(s), the resulting P(N), the set of all feasible trapezoids over N, is a configuration space [figure: H(R) with feasible trapezoids s1 and s2; D(s1) = D(s2) = {h3, h4}, L(s1) = {h1, h2}, L(s2) = ∅]. Source: “Computational Geometry: An Introduction Through Randomized Algorithms” by Ketan Mulmuley

  29. Background: Configuration Example [figure: 2 feasible trapezoids s1, s2 for N = {h3, h4}] • If we restrict N to be {h3, h4}, then • s1, s2 are 2 feasible trapezoids • D(s1) = D(s2) = {h3, h4} • L(s1) = L(s2) = ∅ • 2 “distinct” configurations: • (D(s1), L(s1)) = (D(s2), L(s2)) • The size of P(N) includes such “duplicate” configurations • The reduced size of P(N) excludes “duplicates”. Source: “Computational Geometry: An Introduction Through Randomized Algorithms” by Ketan Mulmuley

  30. Background: Configuration Example [figure: a cell s in an arrangement of segments h1, ..., h5] • Note that not every arrangement of line segments (before overlaying a trapezoidal decomposition on it) has the bounded degree property • in general, a cell can have d(s) = O(n) • Can you think of another type of decomposition that has bounded degree? Source: “Computational Geometry: An Introduction Through Randomized Algorithms” by Ketan Mulmuley

  31. Background: Configuration Example [figure: 2 feasible trapezoids s1, s2 for N = {h3, h4}] • Definition: P^i(N) is the set of configurations in P(N) with level i • [recall the level is the size of L(s), the conflict set] • P^0(N) is the set of configurations active over N • Example: P^0(N) = {(D(s1), L(s1)), (D(s2), L(s2))}. Source: “Computational Geometry: An Introduction Through Randomized Algorithms” by Ketan Mulmuley

  32. Background: Configuration Example • Definition: a configuration space P(N) has bounded valence if the number of configurations in P(N) sharing the same trigger set is bounded (by a constant) • Example: our configuration space P(N) of all feasible trapezoids over N has bounded valence • all feasible trapezoids with the same trigger set can be identified with trapezoids in the trapezoidal decomposition formed by that trigger set • the size of that trigger set is bounded by a constant, so the number of such trapezoids is also bounded by a constant [figure: trapezoidal decomposition induced by the trigger set {h3, h4}]. Source: “Computational Geometry: An Introduction Through Randomized Algorithms” by Ketan Mulmuley

  33. Background: Configuration Example • Theorem: • Let: • P(N) be a configuration space of bounded valence • n = size of N • d = maximum degree of a configuration in P(N) • R = a random sample of N of size r • Then: • for each active configuration s in P^0(R), with probability > 1/2, the conflict size of s relative to N is <= c(n/r) log r, for a large enough constant c • expected reduced size: E[reduced size of P(R)] is in O(r^d) • Example: for any random sample R of N of size r • each trapezoid in the trapezoidal decomposition H(R) has O((n/r) log r) conflict size with high probability • the size of P(R) is in O(r^d) • for bounded-valence P(N), size and reduced size differ only by a constant factor. Source: “Computational Geometry: An Introduction Through Randomized Algorithms” by Ketan Mulmuley

  34. Background: Range Space • Definition: • Let: • P(N) be a configuration space • n = size of N • p’(r) be the maximum reduced size function of P(N) for r <= n • P(N) has bounded dimension if there is a constant d such that p’(r) is in O(r^d) for all r <= n • in this case, d is the dimension of P(N) • Bounded valence implies bounded dimension • Some important types of configuration spaces don’t have bounded valence but do have bounded dimension • Range space: a configuration space in which the trigger set of every configuration is empty; in this case, a configuration is called a range • Half-space ranges: range = points in a halfspace; P(N) = the set of distinct ranges induced by (upper) halfspaces; dualize to get a line arrangement [it has bounded dimension]. Source: “Computational Geometry: An Introduction Through Randomized Algorithms” by Ketan Mulmuley

  35. Background: ε-net of a Range Space • Theorem: • If: • P(N) is a configuration space of bounded dimension d • ε >= 0 • R is a random subset of N formed via r independent draws from N with replacement • r >= 8/ε • then: • with probability at least 1 - 2 p’(2r) 2^(-εr/2), the conflict size (relative to N) of every configuration in P^0(R) is <= εn • For a range space P(N), such an R is called an ε-net of the range space P(N) • for large enough r, a random sample of size r is an ε-net with high probability (a small 1D check follows). Source: “Computational Geometry: An Introduction Through Randomized Algorithms” by Ketan Mulmuley
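A hypothetical one-dimensional sanity check in Python (not from the book or the paper), assuming ranges induced by rightward half-lines: with r >= 8/ε draws, the sample is usually an ε-net, i.e., every range containing no sample point conflicts with at most εn points of N.

```python
import random

def is_eps_net_rightward(N, R, eps):
    """Ranges are {x : x > t}. A range avoids R exactly when t >= max(R), and the
    largest such range conflicts with the points of N greater than max(R)."""
    conflicts = sum(1 for x in N if x > max(R))
    return conflicts <= eps * len(N)

random.seed(0)
N = [random.random() for _ in range(1000)]
eps = 0.1
r = int(8 / eps)                            # sample size suggested by the theorem
R = [random.choice(N) for _ in range(r)]    # r independent draws with replacement
print(is_eps_net_rightward(N, R, eps))      # True with high probability for this eps and r
```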

  36. Background: VC-Dimension of a Range Space • Used to bound the dimension when a direct argument fails • Let P(N) be a range space • A subset M of N is shattered if every subset of M occurs as a range in P(M) • the reduced size of P(M) is then 2^m, where m = |M| • The VC-dimension of P(N) is the maximum size of a shattered subset of N • Example [figure: 1D points p1, ..., p7 and rightward half-spaces h1, ..., h8]: N = a set of 1D points, P(N) = the space of ranges induced by rightward half-spaces. What is the VC-dimension of P(N)? (A small shattering check follows.) Source: “Computational Geometry: An Introduction Through Randomized Algorithms” by Ketan Mulmuley
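A hypothetical illustration in Python (not from the book) for the slide's 1D example, where ranges are rightward half-spaces (half-lines {x : x > t}): the code checks whether a subset M of N is shattered, i.e., whether every subset of M occurs as the restriction of some range to M.

```python
from itertools import combinations

def ranges_restricted_to(M, N):
    """Distinct subsets of M of the form {x in M : x > t}, over all thresholds t."""
    vals = sorted(set(N))
    # Thresholds below all points, between consecutive points, and above all
    # points realize every distinct rightward range over N (and hence over M).
    thresholds = ([vals[0] - 1]
                  + [(a + b) / 2 for a, b in zip(vals, vals[1:])]
                  + [vals[-1] + 1])
    return {frozenset(x for x in M if x > t) for t in thresholds}

def shattered(M, N):
    return len(ranges_restricted_to(M, N)) == 2 ** len(M)

N = [1.0, 2.0, 3.0, 4.0, 5.0]
print(any(shattered(set(M), N) for M in combinations(N, 1)))  # True: every singleton is shattered
print(any(shattered(set(M), N) for M in combinations(N, 2)))  # False: a rightward range can never
                                                              # pick out only the smaller of a pair
```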

  37. What does all this have to do with the paper? • The paper concerns geometric partitioning: • Given: • a collection X of n hyperplanes in R^d • a parameter r • Goal: • partition R^d into O(r^d) constant-sized cells • so that each cell intersects few hyperplanes • Previous work: • random sampling yields a partition in which each cell intersects at most εn hyperplanes, where ε = log r / r • derandomization can be used for deterministic construction • Current work: • assume the set is a range space with bounded VC-exponent • the VC-exponent is a more general concept than the VC-dimension • for such a set, construct (efficiently, deterministically, and in parallel) a (small-sized) approximation for the range space that is a variation on the ε-net concept.

  38. Additional Handouts • Parallel programming • PRAM CREW, EREW models • Parallel geometric algorithms

  39. Project Update

  40. Project Deliverables (25% of course grade) • Proposal: due Monday, 4/9 (2%) • Interim Report: due Monday, 4/23 (5%) • Final Presentation: due Monday, 5/7 (8%) • Final Submission: due Monday, 5/14 (10%)

  41. Guidelines: Presentation • 1/2 hour class presentation • Explain to the class what you did • Structure it any way you like! • Some ideas: • slides (electronic or transparency) • demo • handouts

  42. Guidelines: Final Submission • Abstract: Concise overview (at most 1 page) • Introduction: • Motivation: Why did you choose this project? • Related Work: Context with respect to the CG literature • Summary of Results • Main Body of Paper: (one or more sections) • Conclusion: • Summary: What did you accomplish? • Future Work: What would you do if you had more time? • References: Bibliography (papers, books that you used). Well-written final submissions with research content may be eligible for publication as UMass Lowell CS technical reports.

  43. Guidelines: Final Submission • Main Body of Paper: • If your project involves Theory/ Algorithm: • Informal algorithm description (& example) • Pseudocode • Analysis: • Correctness • Solutions generated by algorithm are correct • account for degenerate/boundary/special cases • If a correct solution exists, algorithm finds it • Control structures (loops, recursions,...) terminate correctly • Asymptotic Running Time and/or Space Usage

  44. Guidelines: Final Submission • Main Body of Paper: • If your project involves Implementation: • Informal description • Resources & Environment: • what language did you code in? • what existing code did you use? (software libraries, etc.) • what equipment did you use? (machine, OS, compiler) • Assumptions • parameter values • Test cases • tables, figures • representative examples

  45. Final Exam

  46. Final Exam: Date, Format • Format: • in class • open book, notes • similar to midterm: • 50% calculate/manipulate • 50% design, analyze • Date choices: • Friday, 18 May at • 1:00-4:00 pm or • 5:30-8:30 pm • Wednesday, 23 May at • 9:00 am-12:00 noon or • 1:00-4:00 pm or • 5:30-8:30 pm

  47. Final Exam: Part I Material • O’Rourke CH 1-8: emphasis on chapters omitted from the midterm (CH 7-8) • Some key themes • Common geometric/combinatorial structures: • Decomposition/Partition: • Triangulation • Trapezoidalization • Delaunay Triangulation • Voronoi Diagram • Arrangement (level, zone) • Enclosure: • Convex Hull • Nested Polytope Hierarchy • Visibility Polygon & Kernel of Star Polygon

  48. Final Exam: Part I Material • Some key themes (continued) • Algorithmic Paradigms • Sweep: sort, then sweep a line, parabolic front • Divide-and-Conquer • Incremental • Randomized • Output-Sensitive • Preprocessing for fast queries • Representations: • Quad-edge • O’Rourke • Geometric Primitives

  49. Final Exam: Part I Material • Some key themes (continued) • Math: • Convexity • Monotonicity • Distance Metrics • Visibility/ Star-shapedness • Euler’s Formula • Duality • Graphs • Point <-> Line • Parabolic • Minkowski Sum • Randomness • Graph Theory: Independent Set

  50. Final Exam: Part II Material • Part II • Translational Polygon Containment • Connected Dominating Sets for Wireless Networks • Mesh Generation using Delaunay Triangulation • Approximate Nearest Neighbor Searching • Derandomization for Efficient Geometric Partitioning
