
Navigating Nets: Simple algorithms for proximity search



Presentation Transcript


  1. Navigating Nets: Simple algorithms for proximity search Robert Krauthgamer (IBM Almaden) Joint work with James R. Lee (UC Berkeley)

  2. A classical problem
  • Fix a metric space (X,d): X = set of points; d = distance function over X.
  • Near-neighbor search (NNS) [Minsky-Papert]:
    • Preprocess a given n-point subset S ⊆ X.
    • Given a query point q ∈ X, quickly compute the closest point to q among S.

  3. Variations on NNS
  • (1+ε)-approximate nearest neighbor search: find a ∈ X such that d(q,a) ≤ (1+ε)·d(q,S).
  • Dynamic case: allow updates to S (insertions and deletions).
  • Distributed case: no central index (e.g., nodes in a network); other cost measures (e.g., communication, stretch, load).

  4. General metrics
  • Only oracle access to the distance function d(·,·).
  • Models a complicated metric or on-demand measurement.
  • No “hashing of coordinates” or tuning for a specific metric.
  • Goal: efficient query (sublinear or polylog time).
  • Impossible in general, even if the data set S is the path metric 1, 2, …, n.
  • What about approximate NNS?

  5. Approximate NNS
  • Hard even for (near-)uniform metrics: d(x,y) = 1 for all x, y ∈ S.
  • But many data sets lack large uniform subsets. Can we quantify this?

  6. Abstract dimension
  • The doubling constant λ_X of a metric (X,d) is the minimum λ such that every ball can be covered by λ balls of half the radius.
  • The metric is doubling if λ_X = O(1).
  • The (abstract) dimension is dim(X) = log_2 λ_X.
  • Immediate properties:
    • dim(R^d, ||·||_2) = O(d).
    • dim(X') ≤ dim(X) for all X' ⊆ X.
    • dim(X) ≤ log |X| (equality for a uniform metric).
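To make the definition concrete, here is a small sketch (illustrative only, not from the talk; the function names are hypothetical) that upper-bounds the doubling constant of a finite metric by greedily covering each ball with balls of half the radius:

```python
import itertools

def greedy_half_cover(points, dist, center, r):
    """Cover B(center, r) greedily with balls of radius r/2 centered at its
    own points; return the number of balls used (an upper bound on the
    minimum cover size, hence on what the definition demands)."""
    uncovered = [p for p in points if dist(center, p) <= r]
    count = 0
    while uncovered:
        c = uncovered[0]  # open a new half-radius ball at the first uncovered point
        uncovered = [p for p in uncovered if dist(c, p) > r / 2]
        count += 1
    return count

def doubling_constant_estimate(points, dist):
    """Upper bound on lambda_X: the maximum, over all centers and all
    occurring radii, of the greedy half-radius cover size."""
    radii = {dist(p, q) for p, q in itertools.combinations(points, 2)}
    return max(greedy_half_cover(points, dist, c, r)
               for c in points for r in radii)
```

On a path metric (points on a line) the estimate stays small and constant, matching dim = O(1) for subsets of the line.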

  7. Illustration
  • Grid with a missing piece.

  8. Illustration
  • Grid with a missing piece.
  • Low-dimensional manifold (bounded curvature).

  9. Illustration
  • Grid with a missing piece.
  • Manifold.
  • Union of curves in Euclidean space.

  10. Embedding doubling metrics
  • Theorem [Assouad 1983; Gupta, Krauthgamer, Lee 2003]: Fix 0 < ε < 1, and let (X,d) be a doubling metric. Then (X, d^ε) can be embedded with O(1) distortion into ℓ_2^{O(1)}, i.e., Euclidean space of constant dimension.
  • Not true for ε = 1 [Semmes 1996].
  • Motivation: embed S and then apply Euclidean NNS.

  11. Our results
  • Simple data structure for maintaining S:
    • (1+ε)-NNS query time: (1/ε)^{O(dim(S))} · log Δ (for ε < ½), where Δ = d_max/d_min is the normalized diameter of S (typically Δ = n^{O(1)}).
    • Space: n · 2^{O(dim(S))}.
  • Dynamic maintenance of S:
    • Insertion/deletion time: 2^{O(dim(S))} · log Δ · log log Δ.
  • Additional properties:
    • Best possible dependence on dim(S) (in a certain model).
    • Oblivious to dim(S) and robust against “bad localities”.
    • Matches or improves known (more specialized) results.

  12. Nets
  • Running example: a path metric, with a 16-net, an 8-net, and a 4-net.
  • Definition: An r-net of X is a subset Y with
    1. d(y_1, y_2) ≥ r for all y_1, y_2 ∈ Y;
    2. d(x, Y) < r for all x ∈ X∖Y
    (i.e., a maximal r-separated subset).
  • Note: compare vs. ε-nets.

  13. More nets
  • The same definition, illustrated: an r-net Y is a maximal r-separated subset, i.e., d(y_1, y_2) ≥ r for all y_1, y_2 ∈ Y, and d(x, Y) < r for all x ∈ X∖Y.
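The maximality in the definition is exactly what a greedy scan produces. A minimal sketch (illustrative; not the talk's code):

```python
def r_net(points, dist, r):
    """Greedy r-net: keep a point iff it lies at distance >= r from every
    point kept so far. The result is r-separated, and every discarded
    point is within distance < r of some net point (maximality)."""
    net = []
    for p in points:
        if all(dist(p, y) >= r for y in net):
            net.append(p)
    return net
```

On the running path metric 1, 2, …, 16, a greedy 4-net keeps every fourth point: `r_net(range(1, 17), lambda a, b: abs(a - b), 4)` returns `[1, 5, 9, 13]`.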

  14. The data structure (on the running example: a 16-net, an 8-net, a 4-net)
  • For every r = 2^i, let Y_r be an r-net of S.
  • Only O(log Δ) values of r are non-trivial.
  • For every y ∈ Y_r, maintain a navigation list L_{y,r} = {z ∈ Y_{r/2} : d(y,z) ≤ 2r}.

  15. More on the data structure
  • For every r = 2^i, let Y_r be an r-net of S; only O(log Δ) values of r are non-trivial.
  • For every y ∈ Y_r, maintain a navigation list L_{y,r} = {z ∈ Y_{r/2} : d(y,z) ≤ 2r}, i.e., the points of the next finer net within distance 2r of y.
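The navigation lists for one scale follow directly from the definition. A sketch (`navigation_lists` is an illustrative name, not from the talk), given the r-net and the (r/2)-net:

```python
def navigation_lists(net_r, net_half_r, r, dist):
    """For each y in the r-net Y_r, the list L_{y,r} of (r/2)-net points
    within distance 2r of y."""
    return {y: [z for z in net_half_r if dist(y, z) <= 2 * r]
            for y in net_r}
```

For the running example with r = 4, Y_4 = [1, 5, 9, 13] and Y_2 = [1, 3, 5, 7, 9, 11, 13, 15], the list of y = 1 is the Y_2 points within distance 8, namely [1, 3, 5, 7, 9].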

  16. Space requirement
  • Lemma: |L_{y,r}| ≤ 2^{O(dim(S))} for all y ∈ Y_r, r ≥ 0.
  • Proof:
    • L_{y,r} is contained in a ball of radius 2r.
    • This ball can be covered by λ_S^3 balls of radius r/4.
    • Every point of L_{y,r} ⊆ Y_{r/2} must be covered by a distinct ball, since net points are r/2-separated.
    • Hence |L_{y,r}| ≤ λ_S^3 = 2^{3·dim(S)}.
  • Corollary: total space is 2^{O(dim(S))} · n · log Δ. We actually improve it to 2^{O(dim(S))} · n.

  17. Back to the running example: a 16-net, an 8-net, a 4-net.

  18. Navigating nets
  • Let q denote the query point.
  • Initially z_16 = the only point in Y_16.
  • Find z_8 = the closest Y_8 point to q.
  • Find z_4 = the closest Y_4 point to q, etc.

  19. How to find z_{r/2}?
  • Assume each z_r ∈ Y_r is the closest point to a (instead of to q).
  • Then d(z_r, z_{r/2}) ≤ r + r/2 = 3r/2, so z_{r/2} must be in z_r's list L_{z_r,r}.
  • For z_r to be the closest Y_r point to q, it suffices that d(q,a) ≤ r/4; and then z_r's list L_{z_r,r} contains z_{r/2}.
  • Note: d(q, z_r) ≤ 3r/2.
  (Figure: d(a, z_r) ≤ r, d(a, z_{r/2}) ≤ r/2, d(q,a) ≤ r/4.)
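Interpreting a, as on the slide, as the exact nearest neighbor of q in S, the second bullet is just the triangle inequality combined with the net property (d(a, Y_r) < r and d(a, Y_{r/2}) < r/2):

```latex
d(z_r, z_{r/2}) \;\le\; d(z_r, a) + d(a, z_{r/2})
              \;\le\; r + \tfrac{r}{2} \;=\; \tfrac{3r}{2} \;\le\; 2r ,
```

so z_{r/2} indeed appears in the navigation list L_{z_r,r} = {z ∈ Y_{r/2} : d(z_r, z) ≤ 2r}.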

  20. Stopping point
  • If we find a point z_r with d(q, z_r) ≤ 3r/2, but not a point z_{r/2} with d(q, z_{r/2}) ≤ 3r/4, then we know that d(q,S) > r/4,
  • yielding a 6-NNS with query time 2^{O(dim(S))} · log Δ.
  • This can be extended to (1+ε)-NNS.
  • Similar principles yield insertions and deletions.
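The whole static scheme can be sketched end to end. This is a toy illustration, not the paper's optimized structure: it uses greedy nets, omits the slide's early-stopping test and the (1+ε) refinement, and always descends to the finest scale; the names `build` and `nns_query` are illustrative.

```python
import itertools

def r_net(points, dist, r):
    """Greedy r-net: keep a point iff it is >= r from all kept points."""
    net = []
    for p in points:
        if all(dist(p, y) >= r for y in net):
            net.append(p)
    return net

def build(points, dist):
    """Nets Y_r for r = 2^i, from above the diameter down to below the
    minimum interpoint distance, plus the navigation lists
    lists[(y, r)] = {z in Y_{r/2} : d(y, z) <= 2r}."""
    pair = [dist(p, q) for p, q in itertools.combinations(points, 2)]
    d_min, d_max = min(pair), max(pair)
    r = 1.0
    while r <= d_max:          # top scale exceeds the diameter,
        r *= 2                 # so the top net is a single point
    scales, nets = [], {}
    while r >= d_min / 2:      # bottom scale is below d_min,
        nets[r] = r_net(points, dist, r)   # so the bottom net is all of S
        scales.append(r)
        r /= 2
    lists = {}
    for i, r in enumerate(scales):
        finer = nets[scales[i + 1]] if i + 1 < len(scales) else nets[r]
        for y in nets[r]:
            lists[(y, r)] = [z for z in finer if dist(y, z) <= 2 * r]
    return scales, nets, lists

def nns_query(q, dist, scales, nets, lists):
    """Descend the scales: each z_{r/2} is found by scanning the navigation
    list of the current z_r (the list is nonempty since d(z_r, Y_{r/2}) < r/2).
    Returns an approximate nearest neighbor of q."""
    z = nets[scales[0]][0]     # the unique top-level net point
    for r in scales:
        z = min(lists[(z, r)], key=lambda c: dist(q, c))
    return z
```

Each query touches O(log Δ) scales and, per scale, one navigation list; by the space lemma each list has 2^{O(dim(S))} entries, which is where the 2^{O(dim(S))} · log Δ query bound comes from.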

  21. Near-optimality
  • The basic idea:
    • Consider a uniform metric on λ points; let the query point be at distance 1 from all of them, except for one point at distance 1−ε.
    • Finding this point requires (in an oracle model) computing all λ distances to q.
    • This can happen at every distance scale r.
  • We get a lower bound of 2^{Ω(dim(S))} · log Δ.

  22. Related work – general metrics
  • Let K_X be the smallest K such that |B(x,r)| ≤ K · |B(x,r/2)| for all x ∈ X, r ≥ 0. Define the KR-dimension as dim_KR(X) = log_2 K_X.
  • Randomized exact NNS [Karger-Ruhl ’02; Hildrum et al. ’04]:
    • Space: n · 2^{O(dim(S))} · log Δ.
    • Query time: 2^{O(dim(S))} · log Δ.
    • If dim_KR(S) = O(1), the log Δ term is actually O(log n).
  • Our results extend to this setting:
    1. KR-metrics are doubling: dim(X) ≤ 4·dim_KR(X).
    2. Our algorithms actually give exact NNS.
  • Assumptions on the query distribution [Clarkson ’99].

  23. Related work – Euclidean metrics
  • Exact NNS for R^d: O(d^5 log n) query time and O(n^{d+δ}) space [Meiser ’93].
  • (1+ε)-NNS for R^d:
    • O((d/ε)^d · log n) query time and O(dn) space via quad-tree-like decompositions [AMNSW ’94]. Our algorithm achieves similar bounds.
    • O(d · polylog(dn)) query time and (dn)^{O(1)} space is useful in higher dimensions [IM ’98; KOR ’98].

  24. Concluding remarks
  • Our approach: a “decision tree” that is not really a tree (saves space).
  • In progress:
    • A different (static) scheme where log Δ is replaced by log n.
    • Bounds on the help of “ambient” space points.
  • Our data structure yields a spanner of the metric:
    • Immediate: O(1) stretch with average degree 2^{dim(S)}.
    • More work: O(1) stretch with maximum degree 2^{dim(S)}.
  • [Guibas ’04] applied the nets data structure to moving points in the plane.

  25. Thank you!
