1 / 41

Approximations and Streaming Algorithms for Geometric Problems

This talk explores the use of approximations and streaming algorithms for solving various geometric problems, such as clustering, graph analysis, and range searching. It discusses the concept of geometric data stream algorithms as data structures and introduces the methods of insertions-only clustering in geometric spaces and tree computations. Additionally, it covers the techniques of sketches, core-sets, and reduction of geometric problems to vector problems for efficient processing of streaming data.

matthewg
Download Presentation

Approximations and Streaming Algorithms for Geometric Problems

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Approximations and Streaming Algorithms for Geometric Problems Piotr Indyk MIT

  2. Computational Model • Single* pass over the data: e1, e2, …,en • Bounded storage • Fast processing time per element *For the purpose of this talk

  3. Streaming Data Types • Vector problems: • Stream defines an array of numbers • Maintain stats of the array, e.g., median • Metric problems • Clustering • Graph problems, Text problems • Geometric Problems [this talk]

  4. Geometric Data Stream Algorithms as Data Structures • Data structures that support: • Insert(p) to P • Possibly: Delete(p) from P • Compute(P) • Use space that is sub-linear in |P|

  5. Insertions-only

  6. Clustering in Geometric Spaces • Problems: • k-center [Charikar-Chekuri-Feder-Motwani’97] • k-median [Guha-Mishra-Motwani-O’Callaghan’00, Meyerson’01, Charikar-O’Callaghan-Panigrahy’03] • Bounds: • poly(k,log n) space • O(1)-approximation

  7. k-median/k-center • k is given • Goal: choose k medians/centers to minimize: • k-median: the sum of the distances • k-center: the max distance

  8. Geometric Space • Bounds: • poly(k,log n) space • (1+)-approximation • Problems: • Diameter, Minimum Enclosing Ball [Agarwal-Har-Peled’01, Feigenbaum-Kannan-Zhang’02, Cormode-Muthukrishnan’02, Hershberger-Suri’04] • k-center [Agarwal-HarPeled’01, Agarwal-HarPeled-Varadarajan’04] • k-median [HarPeled-Mazumdar’04] • Range searching via -approximations: • [Suri-Toth-Zhou’04] • [Bagchi-Chaudhary-Eppstein-Goodrich’04] • …

  9. Dominant Approach: Merge and Reduce • Main ideas: • Design an (off-line) algorithm that converts the input into a “sketch”: • Small size • Sufficient to solve the problem • A sketch of sketches is a sketch • Partition the input in a tree-like fashion • Simulate tree computation in small space • Technique can traced back to ancient times i.e., 80’s [Munro-Paterson’80]

  10. Tree Computation p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 p11 p12 p13 p14 p15 p16

  11. Analysis • Space: (sketch size)*log n • Time: sketch computation time • Question: Where do sketches come from ?

  12. Idea I: solution=sketch • Consider k-median • [GMMO’00] : approximate k-median of approximate weighted k-medians is an approximate k-median • Result: • Constant depth tree • Space: kn, >0 • O(1) -approximation • Works for any metric space 3 2 1 3 2 1 k=3

  13. Use the solution, ctd. • -Approximations: find a subset SP , such that for any rectangle/halfspace/etc R, |RS|/|S|=|RP|/|P| • [Matousek] : approximation of a union of approximations is an approximation • [BCEG’04] : convert it into streaming algorithm, applications • 1/2space • [STZ’04] : better/optimal bounds for rectangles and halfspaces

  14. Idea 2: Core-Sets [AHP’01] • Assume we want to minimize CP(o) • SP is an -core-set for P, if for any o, and a set T: CPT (o) = (1 ±) CST (o) • Note: this must hold for all o, not just the optimal one o P

  15. Example: Core-set for MEB • Compute extremal points: • Choose “densely” spaced direction v1 …vk • I.e., for any u there is vi such that u*vi ≥ ||u||2 / (1+) • For each direction maintain extremal point • k=O(1/)(d-1)/2suffice

  16. Stream Algorithms via Core-sets • Diameter/MEB/width: O(1/)(d-1)/2 log n space [AHP’01] • k-center: O(k/d) log n [HP’01] • k-median: • O(k/d) log n [HPM’04] • O(k2/d) [HPK’05] • O(k2d log6 n/) [Chen’05] • O(d3/7), k=1 [Indyk’05] • Faster algorithms and other results

  17. Limitations • Small core-sets might not exist • Do not support deletions

  18. Insertions and Deletions

  19. Insertions and Deletions • Technique: • Reduction of geometric problems to vector problems • Use of randomized linear embeddings • Problems: • Maintaining histograms of the data • Classic geometric problems (matching, MST, clustering etc)

  20. Streaming Algorithms for Vector Problems • Norm estimation: • Stream elements: (i,b) , i=1…m • Interpretation: xi=xi+b • Want to maintain ||x||p • Why ? Examples: • ||x||pp =Σi xip = #non-zero coordinates in x, as p0 • … • How ?

  21. Dimensionality reduction • x is an m-dimensional vector • A is a “random” m times k matrix, k “small” • Store Ax • Recover (1±)||x||2 from Ax (with prob. 1-1/N ) • [Alon-Matias-Szegedy’96] • Estimator: median[ (A1x)2+..+ (Ac x)2, (Ac+1x)2+..+ (A2cx)2,..]1/2 , c=1/2 , k=c log N • A: constructed from 4-wise independent random variables • [Johnson-Lindenstrauss’85] • Estimator: ||Ax||2 • A: each entry independently drawn from e.g. Gaussian distribution • constructed using Nisan’s PRG [Indyk’00] • [Indyk’00] • Estimator: median[ (A1x),…, (Ak x) ] • A: as above • Works for ||x||p any p(0,2] (using p-stable distributions)

  22. What it means • To know ||x||2, suffices to know Ax • Can maintain Ax when the coordinates are incremented: A(x+ bei)=Ax+ bA ei Ax A x

  23. Applications of Vector Approach • Histograms/wavelet approximation • Classic geometric problems (matching, MST, clustering etc)

  24. Histograms • View x as a function x:[1…n]  [1…M] • Approximate it using piecewise constant function h, with B pieces (buckets) • Problem can be formulated in 2D as well (buckets become rectangular tiles)

  25. Results: 1D • [Gilbert-Guha-Indyk-Kotidis-Muthukrishnan-Strauss’02] : • Maintains h with B pieces such that ||x-h||2 ≤ (1+)||x-hOPT||2 • Under increments/decrements of x • Space: poly(B,1/,log n) • Time: poly(B,1/,log n)

  26. Results: 2D • [Thaper-Guha-Indyk-Koudas’02] : • Maintains h with Blog (nM) tiles such that ||x-h||2 ≤ (1+)||x-hOPT||2 • Under increments/decrements of x • Space/Update time: poly(B,1/,log n) • Histogram reconstruction time: poly(B,1/, n) • [Muthukrishnan-Strauss’03] : • Maintains h with 4B tiles • Time: poly(B,1/, log(nM))

  27. Minimum Weight Bi-chromatic Matching • Estimate the cost of MWBM

  28. Minimum Weight Matching • Estimate the cost of MWM

  29. Minimum Spanning Tree • Estimate the cost of MST

  30. Facility Location • Goal: choose a set F of facilities to minimize the • sum of the distances to nearest facility plus • the number of facilities times f • Again, report the cost

  31. Approach [Indyk’04] • Assume P{1…}2 • Reduce to vector problems • Impose square grids G0…Gk, with side lengths 20,21, …, 2k, shifted at random. • For each square cell c in Gi, let nP(c) be the number of points from P in c. • The algorithms will maintain certain statistics over nP(.), which will allow it to approximately solve the problems 1 2 1 3 1 5 1 1

  32. Estimators • MST: ∑i 2i ∑c Gi [nP(c)>0] • MWM: ∑i 2i∑c Gi [nP(c) is odd] • MWBM: ∑i 2i ∑c Gi |nG(c)-nB(c)| • Fac. Loc.: ∑i 2i∑c Gi min[nP(c), Ti] • K-median: ∑i 2i∑c Gi - B(Q, 2^i)nP(c) (given medians Q) Maintain #non-zero entries in nP[FM’85] Maintain L1 difference [I’00]

  33. Results [Frahling-Indyk-Sohler’05] […, Charikar’02, …] [Frahling-Sohler’05] Space: (log  +log n + K )O(1)

  34. XYZ Space: (K+log +  log n)O(1)

  35. Probabilistic embeddings into HST’s T 1 2 1 3 1 5 1 1 • Known[Bartal’96, Charikar-Chekuri-Goel-Guha-Plotkin’98]: • ||p-q|| ≤ Dtree (p,q) • E[ Dtree(p,q) ] ≤ ||p-q|| * O(log )

  36. MST • E[Cost(MST in T)] ≤ O(log ) Cost(MST) • Cost(MST in T)  Cost(T) • How to compute Cost(T) ? • Sum over all levels i, of the #nodes at i, times 2i • Node c exists iff ni(c)>0 1 2 1 3 1 5 1 1

  37. Matching • Algorithm: • Match what you can at the current level • Odd leftovers wait for the next level • Repeat • Optimal on the HST • Cost=∑i 2i ∑c Gi [nP(c) is odd] 1 0 1 1 1 0 1 1 0

  38. Conclusions • Algorithms for geometric data streams • Insertions-only: merge and reduce, coresets • Insertions and deletions: randomized linear embeddings

  39. Open Problems • Matchings, Facility Location, etc: • Replace log  by O(1) or even 1+ • Possible for: • MST [Frahling-Indyk-Sohler’05] • k-median [Frahling-Sohler’05] • Related to computing bi-chromatic matching [Agarwal-Varadarajan’04] • Min-sum clustering ?

  40. Open Problems • High dimensions: • Diameter: • 21/2-approx, O(d2 n1/2 ) space, follows from [Goel-Indyk-Varadarajan’01] • c-approx, O( dn1/(c2 - 1) )[Indyk’03] • Conjecture: 21/2-approx, O(d polylog n) space • Min-width cylinder: 18-approx, O(d) space [Chan’04]

  41. Open Problems • Range queries: • General lower bounds ? (Not just for - approximations) • (1/2) -bit bound for general queries follows from LB for dot product [Indyk-Woodruff’03] and is tight (for randomized algorithms) • What about e.g., half-space queries ? O(1/4/3) is known [STZ’04] • Other problems [STZ’04]

More Related