
The Intrinsic Dimension of Metric Spaces

Anupam Gupta, Carnegie Mellon University. 3ème cycle romand de Recherche Opérationnelle, 2010 Seminar, Lecture #4.


Presentation Transcript


  1. The Intrinsic Dimension of Metric Spaces Anupam Gupta, Carnegie Mellon University. 3ème cycle romand de Recherche Opérationnelle, 2010 Seminar, Lecture #4

  2. Metric space M = (V, d): a set V of points, symmetric non-negative distances d(x,y), the triangle inequality d(x,y) ≤ d(x,z) + d(z,y), and d(x,x) = 0.

  3. in the previous two lectures… We saw: every metric is (almost) a (random) tree; every metric is (almost) a Euclidean metric; every Euclidean metric is (almost) a low-dimensional metric. Today: how to measure the complexity of a metric space, and how to do “better” dimension reduction.

  4. “better” dimension reduction Can we do “better” than random projections?

  5. tight bounds on dimension reduction Theorem [Johnson-Lindenstrauss 1984]: Given any n-point subset V of Euclidean space, one can map it into O(log n/ε²)-dimensional space while incurring distortion at most (1+ε). Theorem (Alon): a lower bound of Ω(log n/(ε² log ε⁻¹)) dimensions, again for the “uniform/equidistant” metric U_n.
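The random-projection statement is easy to test numerically. A minimal sketch (the constants, sizes, and sampling of pairs are illustrative choices, not from the lecture), using a scaled Gaussian projection matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, eps = 200, 1000, 0.5
k = int(8 * np.log(n) / eps ** 2)          # target dimension O(log n / eps^2)

X = rng.normal(size=(n, d))                # n points in R^d
P = rng.normal(size=(d, k)) / np.sqrt(k)   # scaled Gaussian projection
Y = X @ P                                  # projected points in R^k

# Measure the worst pairwise distortion over a sample of pairs.
worst = 1.0
for _ in range(2000):
    i, j = rng.integers(0, n, size=2)
    if i == j:
        continue
    ratio = np.linalg.norm(Y[i] - Y[j]) / np.linalg.norm(X[i] - X[j])
    worst = max(worst, ratio, 1.0 / ratio)
print("worst distortion on sampled pairs:", worst)
```

With these parameters the observed distortion typically stays below 1 + ε, matching the theorem's guarantee.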

  6. so what do we do now? Note that these are uniform results. (“All n-point Euclidean metrics have property blah(n)…”) Can we give a better “per-instance” guarantee? (“For any n-point Euclidean metric M, we get property blah(M)…”) Note: if the metric contains a copy of the equidistant metric U_t, we know we need Ω(log t) dimensions if we want O(1) distortion. Similar lower bounds hold if our metric contains almost-equidistant metrics. But are these the only obstructions to low dimensionality?

  7. more nuanced results To paraphrase the recently proven results: if a Euclidean metric embeds into R^k for some dimension k with distortion O(1), then we can find an embedding into R^{O(k)} with distortion O(log n) [Chan G. Talwar ’08, Abraham Bartal Neiman ’08]. JL says: we can find an embedding into R^{O(log n)} with distortion O(1).

  8. here’s how we do it Come up with convenient ways of capturing the property that there are no large near-equidistant metrics. Formally: define a notion of “intrinsic dimension” of a point set in Euclidean space. In fact, our definitions will hold for any metric space, not just for point sets in Euclidean space, and we will make use of this generality to get good algorithms…

  9. rest of this talk Give two definitions of metric dimension: a) pointwise dimension b) doubling dimension Develop good algorithms for metrics with low pointwise/doubling dimension. Also, get “better” dimension reduction for metrics with low doubling dimension

  10. pointwise dimension The pointwise dimension of a metric is at most k if for every point x and radius r, the number of points within distance 2r of x is at most 2^k times the number of points within distance r of x: |B(x,2r)| ≤ 2^k · |B(x,r)|.
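The condition |B(x,2r)| ≤ 2^k · |B(x,r)| can be checked directly on a finite point set. A small illustrative sketch (the grid and the radii tried are my choices, not from the slides):

```python
import itertools, math

# A 20x20 regularly spaced grid in the plane.
points = list(itertools.product(range(20), repeat=2))

def ball_size(pts, x, r):
    """Number of points of pts within distance r of x."""
    return sum(1 for p in pts if math.dist(x, p) <= r)

def pointwise_dim(pts, radii):
    """Smallest k (over the radii tried) with |B(x,2r)| <= 2^k |B(x,r)|."""
    k = 0.0
    for x in pts:
        for r in radii:
            k = max(k, math.log2(ball_size(pts, x, 2 * r) / ball_size(pts, x, r)))
    return k

print(pointwise_dim(points, [1, 2, 4]))  # close to 2 for a planar grid
```

This matches the slide's claim that pointwise dimension captures regularly spaced points in R^k: the doubling ratio of ball sizes approaches the area ratio 2².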

  11. pointwise dimension captures sets of regularly spaced points in R^k

  12. using it for near-neighbor searching Given a metric space M = (X, d) and a subset S of X, preprocess it so that given a query point q we can return the point in S (approximately) closest to q.

  13. using it for near-neighbor searching Query point q. Suppose I know a point x at distance d(q,x) = r. If I find a point y within distance r/2 of q ⇒ I have made progress. Allowed operations: we can sample points from “balls” around any point in the original point set S (i.e., sample from S ∩ Ball(point x, radius r)).

  14. algorithm by picture Query point q. Suppose I know a point x at distance d(q,x) = r. Want to find a point y within distance r/2 of q.

  15. algorithm by picture Query point q. Suppose I know a point x at distance d(q,x) = r. Want to find a point y within distance r/2 of q. Suppose we sample from B(x, 3r/2) to find such a point y. Bad case: most points in this ball lie close to x. Not possible! This would cause a high “growth rate”!!!

  16. algorithm by picture |B(x, 3r/2)| ≤ |B(q, 5r/2)| ≤ |B(q, 4r)| ≤ 2^{3k} |B(q, r/2)| ⇒ probability of hitting a good point ≥ 1/2^{3k}. Near-neighbor algorithm: 1. Let x = closest point seen so far, with distance r = d(x,q). 2. Pick 2^{3k} samples from B(x, 3r/2). 3. If we see some point at distance ≤ r/2 from q, go to line 1; else output the closest point seen so far.
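The sampling algorithm above can be sketched in a few lines. Here ball-sampling is simulated by brute force, and the number of samples per round and the starting point are illustrative constants, not the exact ones from the slide:

```python
import math, random

random.seed(1)

def sample_from_ball(S, center, radius):
    """Return a uniform sample from S ∩ Ball(center, radius)."""
    ball = [p for p in S if math.dist(p, center) <= radius]
    return random.choice(ball)

def near_neighbor(S, q, k, samples_per_round=None):
    """Home in on an (approximate) nearest neighbor of q by repeated sampling."""
    if samples_per_round is None:
        samples_per_round = 2 ** (3 * k) * 8   # ~2^{3k}, padded for safety
    x = random.choice(S)                       # arbitrary starting point
    while True:
        r = math.dist(x, q)
        if r == 0:
            return x
        for _ in range(samples_per_round):
            y = sample_from_ball(S, x, 1.5 * r)
            if math.dist(y, q) <= r / 2:       # progress: halve the distance
                x = y
                break
        else:
            return x                           # no improvement: output closest seen

S = [(i / 10, j / 10) for i in range(10) for j in range(10)]
print(near_neighbor(S, (0.33, 0.71), k=2))
```

Each successful round halves the distance to q, and the growth-rate argument above bounds the probability that a round fails.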

  17. pointwise dimension used widely Used to model “tractable/simple” networks by [Kamoun Kleinrock], [Kleinberg Tardos], [Plaxton Rajaraman Richa], [Karger Ruhl], [Beygelzimer Kakade Langford]…

  18. but… Drawback: the definition is not closed under taking subsets. (A subset of a low-dimensional metric can be high-dimensional.)

  19. and that’s not all Pointwise dimension is a somewhat restrictive notion (it is unclear whether “real” metrics have low pointwise dimension). We would like something a bit more robust and general.

  20. new notion: doubling dimension The doubling dimension dim_D(M) is at most k if for any set S of diameter D_S, S can be covered by 2^k sets of diameter ½·D_S.

  21. new notion: doubling dimension The doubling dimension dim_D(M) is at most k if for any set S of diameter D_S, S can be covered by 2^k sets of diameter ½·D_S.

  22. doubling generalizes geometric dimension Take k-dimensional Euclidean space R^k. Claim: dim_D(R^k) = Θ(k). Easy to see for boxes; the argument for spheres is a bit more involved. (Picture: 2³ boxes cover the larger box in R³.)

  23. “doubling metrics” Dimension at most k if every set S with diameter D_S can be covered by 2^k sets of diameter ½·D_S. A family of metric spaces is called “doubling” if there exists a constant k such that the doubling dimension of these metrics is bounded by k.
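One way to get a feel for the definition on a finite point set: greedily cover each ball B(x, r) by balls of radius r/2 (each of which is a set of diameter at most r, i.e., half the ball's diameter) and report log₂ of the largest cover found. A rough empirical sketch with my own function names and parameters, not from the talk:

```python
import math, random

def cover_size(points, center, r):
    """Greedily cover points ∩ B(center, r) by balls of radius r/2
    (each such ball has diameter at most r, half the ball's diameter)."""
    inside = [p for p in points if math.dist(p, center) <= r]
    centers = []
    while inside:
        c = inside[0]
        centers.append(c)
        inside = [p for p in inside if math.dist(p, c) > r / 2]
    return len(centers)

def doubling_dim_estimate(points, radii):
    """log2 of the largest half-diameter cover needed over all balls tried."""
    worst = 1
    for x in points:
        for r in radii:
            worst = max(worst, cover_size(points, x, r))
    return math.log2(worst)

random.seed(0)
pts = [(random.random(), random.random()) for _ in range(300)]
print(doubling_dim_estimate(pts, [0.1, 0.2, 0.4]))  # a small constant for planar data
```

The estimate stays bounded as the number of points grows, which is exactly what fails for the equidistant metric on the next slide.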

  24. what is not a doubling metric? The equidistant metric U_t on t points has dimension Ω(log t). Hence low doubling dimension captures the fact that the metric does not contain large equidistant metrics.

  25. doubling generalizes pointwise dim Fact: If a metric has pointwise dimension k, it has doubling dimension O(k)

  26. the picture thus far…

  27. rest of this talk Give two definitions of metric dimension: a) pointwise dimension b) doubling dimension Develop good algorithms for metrics with low pointwise/doubling dimension. Also get “better” dimension reduction for metrics with low doubling dimension

  28. some useful properties of doubling dimension

  29. small near-equidistant metrics (restated) Fact: Suppose a metric (X,d) has doubling dimension k. If a subset S ⊆ X has all inter-point distances lying between δ and Δ, then there are at most (Δ/δ)^{O(k)} points in S. (Picture: a 2-dimensional set with (Δ/δ)² such points.)

  30. the (simple) proof

  31. advantages of this fact Thm: Doubling metrics admit O(dim_D(M))-padded decompositions. Useful wherever padded decompositions are useful. E.g.: can prove that all doubling metrics embed into ℓ2 with distortion O(√(log n)).

  32. btw, just to check Natural Q: Do all doubling metrics embed into ℓ2 with distortion O(1)? (No: there are doubling metrics, e.g. the Laakso graphs, that require Ω(√(log n)) distortion.)

  33. a substantial generalization Many geometric algorithms can be extended to doubling spaces: small-world networks, traveling salesman, sparse spanners, approximate inference, network design, clustering problems, well-separated pair decompositions, data structures, learnability, near-neighbor search, compact routing, distance labeling, network triangulation, sensor placements…

  34. example application Assign labels L(x) to each host x in a metric space so that, looking just at L(x) and L(y), we can infer the distance: f(L(x), L(y)) ≈ d(x,y). Result: labels with (O(1)/ε)^dim × log n bits give estimates within a (1 + ε) factor. Contrast with a lower bound of n-bit labels in general for any factor < 2.

  35. another example [Arora 95] showed that TSP on R^k is (1+ε)-approximable in polynomial time (for fixed ε and k). [Talwar 04] extended this result to metrics with doubling dimension k.

  36. example in action: sparse spanners for doubling metrics [Chan G. Maggs Zhou]

  37. spanners Given a metric M = (V, d), a graph G = (V, E) is an (m, ε)-spanner if 1) the number of edges in G is at most m, and 2) d(x,y) ≤ d_G(x,y) ≤ (1 + ε)·d(x,y). A reasonable goal: (?) ε = 0.1, m = O(n). Fact: for the equidistant metric U_n, if ε < 1 then G = K_n.

  38. spanners for doubling metrics Theorem [Chan G. Maggs Zhou]: Given any metric M and any ε < ½, we can efficiently find an (m, ε)-spanner G with m = n·(1 + 1/ε)^{dim_D(M)}. Hence, for doubling metrics, linear-sized spanners! Independently proved by [Har-Peled Mendel], who give a better runtime. Generalizes a similar theorem for Euclidean metrics due to [Arya Das Narasimhan].

  39. standard tool: nets Nets: A set of points N is an r-net of a set S if • d(u,v) ≥ r for any u, v ∈ N • for every w ∈ S \ N, there is a u ∈ N with d(u,w) < r

  40. standard tool: nets Nets: A set of points N is an r-net of S if • d(u,v) ≥ r for any u, v ∈ N • for every w ∈ S \ N, there is a u ∈ N with d(u,w) < r Fact: If a metric has doubling dimension k and N is an r-net ⇒ |B(x,2r) ∩ N| ≤ O(1)^k
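An r-net can be built by the standard greedy procedure (pick a surviving point into N, discard everything strictly within distance r of it); this is a generic sketch, not code from the talk:

```python
import math

def r_net(S, r):
    """Greedy r-net: net points are pairwise >= r apart and cover S to < r."""
    remaining, net = list(S), []
    while remaining:
        u = remaining[0]
        net.append(u)
        remaining = [w for w in remaining if math.dist(u, w) >= r]
    return net

S = [(i, j) for i in range(8) for j in range(8)]
N = r_net(S, 2.0)

# Net property 1: d(u,v) >= r for all distinct u, v in N.
assert all(math.dist(u, v) >= 2.0 for u in N for v in N if u != v)
# Net property 2: every w in S is within distance < r of some net point.
assert all(min(math.dist(w, u) for u in N) < 2.0 for w in S)
print(len(N))  # 16: the even-coordinate sublattice of the 8x8 grid
```

Both defining properties of a net hold by construction: each chosen point survived every earlier distance filter, and each discarded point was within r of the point that discarded it.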

  41. recursive nets Suppose all the points are at least unit distance apart, so you take a 2-net N_1 of these points. Now you can take a 4-net N_2 of this net. And so on…

  42. recursive nets N_0 = V; N_t is a 2^t-net of the set N_{t-1} ⇒ N_t is a 2^{t+1}-net of the set V (almost)

  43. the spanner construction N_0 = V; N_t is a 2^t-net of the set N_{t-1} ⇒ N_t is a 2^{t+1}-net of the set V (almost)
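The construction can be sketched end-to-end: build the recursive nets, connect net points at level t that lie within c·2^t of each other, and measure the resulting stretch by shortest paths. The constant c and the test instance are illustrative choices, not the exact parameters of the [Chan G. Maggs Zhou] analysis:

```python
import math, heapq, itertools
from collections import defaultdict

def r_net(S, r):                            # greedy net, as in slide 39
    remaining, net = list(S), []
    while remaining:
        u = remaining[0]
        net.append(u)
        remaining = [w for w in remaining if math.dist(u, w) >= r]
    return net

def build_spanner(V, levels=6, c=6.0):
    """Add an edge between level-t net points at distance <= c * 2^t."""
    adj = defaultdict(dict)                 # u -> {v: edge length}
    net = list(V)
    for t in range(levels):
        net = r_net(net, 2.0 ** t)          # N_t is a 2^t-net of N_{t-1}
        for u, v in itertools.combinations(net, 2):
            duv = math.dist(u, v)
            if duv <= c * 2.0 ** t:
                adj[u][v] = adj[v][u] = duv
    return adj

def graph_dist(adj, s, goal):               # Dijkstra over the spanner
    dist, pq = {s: 0.0}, [(0.0, s)]
    while pq:
        du, u = heapq.heappop(pq)
        if u == goal:
            return du
        if du > dist.get(u, math.inf):
            continue
        for v, w in adj[u].items():
            if du + w < dist.get(v, math.inf):
                dist[v] = du + w
                heapq.heappush(pq, (dist[v], v))
    return math.inf

V = [(float(i), float(j)) for i in range(6) for j in range(6)]
E = build_spanner(V)
stretch = max(graph_dist(E, u, v) / math.dist(u, v)
              for u, v in itertools.combinations(V, 2))
print("edges:", sum(len(nbrs) for nbrs in E.values()) // 2,
      "max stretch:", round(stretch, 4))
```

The packing bound from slide 40 is what keeps the edge count per level small: each net point has only (O(1)/ε)^dim net neighbors within the connection radius.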

  44. the sparsity

  45. the stretch

  46. rest of this talk Give two definitions of metric dimension: a) pointwise dimension b) doubling dimension Develop good algorithms for metrics with low pointwise/doubling dimension. Also get “better” dimension reduction for metrics with low doubling dimension

  47. want to improve on J-L Theorem [Johnson-Lindenstrauss 1984]: Given any n-point subset V of Euclidean space, one can map it into O(log n/ε²)-dimensional space while incurring distortion at most (1+ε). We want the map to depend on the metric space. (JL just computes a random projection.) Fact: we need a non-linear map.

  48. back to dimensionality reduction To paraphrase the best currently-known results: if a Euclidean metric embeds into R^k for some dimension k with distortion O(1) ⇒ the metric has doubling dimension O(k) ⇒ we can find an embedding into R^{O(k)} with distortion O(log n).

  49. new: “per-instance” bounds Theorem [Chan G. Talwar]: Any metric with doubling dimension k embeds into Euclidean space with T dimensions, where T ∈ [k log log n, log n], with a distortion bound depending on T (formula on the original slide). Independently, [Abraham Bartal Neiman] obtained similar results.

  50. special cases of interest If the metric is doubling, this quantity is √(log n). In general, this is never more than O(log n); again, this generalizes the previous result. This generalizes the result we talked about in Lecture #2: any metric embeds into Euclidean space with O(log n) distortion. This is just the Johnson-Lindenstrauss lemma.
