Stream-based Geometric Algorithms Piotr Indyk MIT
Streaming Algorithms for Geometric Problems • Input: a stream $S = p_1 \ldots p_n$ of points in $\mathbb{R}^d$ • Goal: compute a certain geometric quantity and/or structure • Variations: • Dynamic case: points can be deleted • Sliding window: points disappear after some time $t$
Minimum Spanning Tree • The tree itself has representation size $\Omega(n)$, too large to store in sublinear space • We therefore only estimate the cost of the MST
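To make the estimated quantity concrete, here is a minimal offline sketch of the exact Euclidean MST cost using Prim's algorithm in $O(n^2)$ time. The function name `mst_cost` is hypothetical, and the code stores every point, which is exactly what a streaming algorithm cannot afford to do. It serves only as a reference value against which an estimate could be compared.

```python
import math


def mst_cost(points):
    """Exact Euclidean MST cost via Prim's algorithm, O(n^2) time.

    Offline reference only: it keeps all n points in memory.
    """
    n = len(points)
    if n <= 1:
        return 0.0
    in_tree = [False] * n
    # best[v] = weight of the cheapest known edge from v into the current tree
    best = [math.inf] * n
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        # add the closest vertex that is not yet in the tree
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        total += best[u]
        # relax edges from the newly added vertex
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return total
```

For example, `mst_cost([(0, 0), (3, 0), (3, 4)])` returns `7.0`, from the two edges of length 3 and 4.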
Facility Location • Goal: choose a set $F$ of facilities minimizing the sum of the distances from each point to its nearest facility, plus the number of facilities times $f$, i.e. $\sum_{p} d(p, F) + f\,|F|$
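The objective can be evaluated directly for a given candidate set $F$. The sketch below is a plain offline evaluation under the assumption of Euclidean distances; the function name `facility_location_cost` is hypothetical, and the code does not choose $F$.

```python
import math


def facility_location_cost(points, facilities, f):
    """Sum over points of the distance to the nearest facility, plus f * |F|."""
    if not facilities:
        raise ValueError("need at least one facility")
    service = sum(min(math.dist(p, c) for c in facilities) for p in points)
    return service + f * len(facilities)
```

With points `(0, 0), (1, 0), (10, 0)` and opening cost `f = 2`, opening facilities at `(0, 0)` and `(10, 0)` costs 1 + 2·2 = 5. Opening only `(0, 0)` costs 11 + 2 = 13, so the second facility pays for itself.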
K-median • $K$ is given • Goal: choose $K$ medians minimizing the sum of the distances from each point to its nearest median
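For tiny inputs the K-median objective can be minimized by exhaustive search. The sketch below assumes Euclidean distances and, as a simplification not stated on the slide, restricts the medians to input points. The names `kmedian_cost` and `brute_force_kmedian` are hypothetical. The search is exponential in $K$, so it only illustrates the objective and is in no sense a streaming algorithm.

```python
import math
from itertools import combinations


def kmedian_cost(points, medians):
    """Sum of the distances from each point to its nearest median."""
    return sum(min(math.dist(p, m) for m in medians) for p in points)


def brute_force_kmedian(points, k):
    """Try every k-subset of the input points as medians (tiny inputs only)."""
    best = min(combinations(points, k), key=lambda ms: kmedian_cost(points, ms))
    return list(best), kmedian_cost(points, best)
```

On two well-separated pairs, `brute_force_kmedian([(0, 0), (1, 0), (10, 0), (11, 0)], 2)` picks one median per pair, e.g. `[(0, 0), (10, 0)]`, for a total cost of `2.0`.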
Known Results • Computing $L_p$ norms of a stream (Graham’s talk) • Clustering of points in metric spaces: • Charikar et al. ’97, ’03; Guha et al. ’00: • K-center and K-median • (K) space, no deletions • Meyerson ’02: • Facility location • (|F|) space, no deletions
More Known Results • Approximate diameter etc. • Indyk ’03: high dimensions • Feigenbaum et al., Hershberger et al., Cormode et al. ’03: low dimensions • Convex hulls etc.
Our Results • *Follows Charikar ’02; see also Varadarajan ’02 and Indyk-Thaper ’02
Applications • MST, MWM (minimum weight matching): ? • MWBM (minimum weight bichromatic matching): similarity of low-dimensional data sets • Fac. Loc.: “clusterability” of a data set • K-median: allocation of servers to clients (Muthu ’03) • The log D factor may not be so bad in practice (1.1 in Indyk-Thaper ’03)
Approach • Impose square grids $G_0, \ldots, G_k$ with side lengths $2^0, 2^1, \ldots, 2^k$, shifted at random • For each square cell $c$ in $G_i$, let $n_P(c)$ be the number of points from $P$ in $c$ • The algorithms maintain certain statistics over $n_P(\cdot)$, which allow them to approximately solve the problems
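The grid construction and the counts $n_P(c)$ can be sketched as follows. This minimal illustration is not space-efficient: it stores every nonempty cell explicitly, whereas the streaming algorithms keep only small summaries ("certain statistics") of these counts. The class name `ShiftedGrids` is hypothetical. The use of a single shift vector shared by all grids, drawn uniformly from $[0, 2^k)^d$, is also an assumption, since the slide only says the grids are shifted at random. Sharing the shift makes every cell of $G_i$ nest inside a cell of $G_{i+1}$. Because updates are $\pm 1$, the same code covers the dynamic case with deletions.

```python
import math
import random
from collections import defaultdict


class ShiftedGrids:
    """Explicit counts n_P(c) for the nonempty cells c of grids G_0..G_k.

    Assumption: all grids share one random shift vector drawn uniformly
    from [0, 2^k)^d, so the cells of G_i nest inside those of G_{i+1}.
    """

    def __init__(self, d, k, seed=None):
        rng = random.Random(seed)
        self.d, self.k = d, k
        self.shift = [rng.uniform(0, 2 ** k) for _ in range(d)]
        # counts[i][cell] = n_P(cell) for cell in G_i; zero entries are dropped
        self.counts = [defaultdict(int) for _ in range(k + 1)]

    def cell(self, p, i):
        """Integer index of the cell of G_i (side length 2^i) containing p."""
        side = 2 ** i
        return tuple(math.floor((x + s) / side) for x, s in zip(p, self.shift))

    def update(self, p, delta):
        """Insert (delta=+1) or delete (delta=-1) point p; O(k * d) work."""
        for i in range(self.k + 1):
            c = self.cell(p, i)
            self.counts[i][c] += delta
            if self.counts[i][c] == 0:
                del self.counts[i][c]
```

Every update touches exactly one cell per level, so a stream of insertions and deletions translates into $k+1$ streams of $\pm 1$ updates to the count vectors $n_P(\cdot)$.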
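Estimators • MST: $\sum_i 2^i \sum_{c \in G_i} [n_P(c) > 0]$ • MWM: $\sum_i 2^i \sum_{c \in G_i} [n_P(c) \text{ is odd}]$ • MWBM: $\sum_i 2^i \sum_{c \in G_i} |n_G(c) - n_B(c)|$ • Fac. Loc.: $\sum_i 2^i \sum_{c \in G_i} \min[n_P(c), T_i]$ • K-median: $\sum_{c \in B_j} n_P(c)$ for $B_1, \ldots, B_l$ sampled from the $G_i$'s with density $1/K$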
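The first four estimators are simple functions of the per-level counts. The sketch below evaluates them exactly from explicit count tables, assuming `counts[i]` maps each cell of $G_i$ to $n_P(c)$. That table layout, the function names, and the `thresholds` list standing in for the slide's $T_i$ are all hypothetical. A streaming algorithm would instead approximate these sums from small-space summaries of the counts. The K-median estimator is left out because its sampling step is not specified in enough detail here.

```python
def mst_estimate(counts):
    """sum_i 2^i * #{c in G_i : n_P(c) > 0}"""
    return sum(2 ** i * sum(1 for n in level.values() if n > 0)
               for i, level in enumerate(counts))


def mwm_estimate(counts):
    """sum_i 2^i * #{c in G_i : n_P(c) is odd}"""
    return sum(2 ** i * sum(1 for n in level.values() if n % 2 == 1)
               for i, level in enumerate(counts))


def mwbm_estimate(counts_g, counts_b):
    """sum_i 2^i * sum_c |n_G(c) - n_B(c)| over cells nonempty in either color."""
    total = 0
    for i, (lg, lb) in enumerate(zip(counts_g, counts_b)):
        cells = set(lg) | set(lb)
        total += 2 ** i * sum(abs(lg.get(c, 0) - lb.get(c, 0)) for c in cells)
    return total


def facility_estimate(counts, thresholds):
    """sum_i 2^i * sum_c min(n_P(c), T_i)"""
    return sum(2 ** i * sum(min(n, thresholds[i]) for n in level.values())
               for i, level in enumerate(counts))
```

For instance, with `counts = [{(0,): 1, (1,): 2}, {(0,): 3}]`, `mst_estimate` returns 1·2 + 2·1 = 4, and `mwm_estimate` returns 1·1 + 2·1 = 3, because only the cells holding an odd number of points contribute.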
Proofs • View the grids as a probabilistic embedding of $P$ into a tree (hierarchically well-separated trees, HSTs) • Show how to solve the problem in HSTs • Show how to express the solution using just the $n_P(c)$'s • First application of this kind of embedding to streaming
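The tree view can be made concrete. Cells of $G_0$ are the leaves, each cell of $G_{j-1}$ hangs below the cell of $G_j$ containing it, and the cells of $G_k$ hang below a virtual root. The sketch below computes the resulting tree distance, assuming an edge from level $j-1$ to level $j$ has weight $2^j$. These edge weights, the virtual root, and the function names are illustrative assumptions, not details given on the slide.

```python
import math


def cell(p, i, shift):
    """Index of the side-2^i grid cell containing p, with the grid shifted by `shift`."""
    return tuple(math.floor((x + s) / 2 ** i) for x, s in zip(p, shift))


def tree_distance(p, q, k, shift):
    """Distance between p and q in the tree induced by grids G_0..G_k.

    Assumed edge weights: a cell of G_{j-1} is joined to its parent cell in
    G_j by an edge of weight 2^j; cells of G_k join a virtual root at level k+1.
    """
    # lowest level at which p and q share a cell (k + 1 means only the root)
    top = next((i for i in range(k + 1)
                if cell(p, i, shift) == cell(q, i, shift)), k + 1)
    # walk up from the leaves to level `top` and back down: 2 * sum_{j=1}^{top} 2^j
    return 2 * sum(2 ** j for j in range(1, top + 1))
```

With shift `(0.5, 0.5)` and `k = 4`, the points `(0, 0)` and `(1, 0)` meet at level 1 and are at tree distance 4. The equally close points `(1, 0)` and `(2, 0)` are split by a level-1 cell boundary and are at tree distance 12. Such boundary effects are why the grids are shifted at random. Note also that for integer points this tree distance is never smaller than their $\ell_\infty$ distance.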
Conclusions and Open Problems • Replace log D by O(1) • Other applications?