## Advances in Metric Embedding Theory


**Advances in Metric Embedding Theory**

Ofer Neiman, Ittai Abraham, Yair Bartal
Hebrew University

**Talk Outline**

Current results:
• New method of embedding.
• New partition techniques.
• Constant average distortion.
• Extended notions of distortion.
• Optimal results for scaling embeddings.
• Tradeoff between distortion and dimension.

Work in progress:
• Low-dimension embedding for doubling metrics.
• Scaling distortion into a single tree.
• Nearest-neighbor preserving embedding.

**Embedding Metric Spaces**

• Metric spaces (X, d_X), (Y, d_Y).
• An embedding is a function f : X → Y.
• For a non-contracting embedding f and u, v ∈ X, let dist_f(u,v) = d_Y(f(u), f(v)) / d_X(u,v).
• f has distortion c if max_{u,v ∈ X} dist_f(u,v) ≤ c.

**Low-Dimension Embeddings into L_p**

For an arbitrary metric space on n points:
• [Bourgain 85]: distortion O(log n).
• [LLR 95]: distortion Θ(log n), dimension O(log^2 n).
• Can the dimension be reduced?
• For p = 2, yes: [JL] reduces the dimension to O(log n).
• Theorem: embedding into L_p with distortion O(log n) and dimension O(log n), for any p.
• Theorem: distortion O(log^{1+θ} n), dimension Θ(log n / (θ loglog n)).

**Average Distortion Embeddings**

• In many practical uses, the quality of an embedding is measured by its average distortion:
  • Network embedding
  • Multi-dimensional scaling
  • Biology
  • Vision
• Theorem: every n-point metric space can be embedded into L_p with average distortion O(1), worst-case distortion O(log n), and dimension O(log n).

**Variation on Distortion: The L_q-Distortion of an Embedding**

• Given a non-contracting embedding f from (X, d_X) to (Y, d_Y), define its L_q-distortion as the L_q norm of the distortion over a uniformly random pair: dist_q(f) = E[dist_f(u,v)^q]^{1/q}.
• Thm: the L_q-distortion is bounded by O(q).

**Partial & Scaling Distortion**

• Definition: a (1-ε)-partial embedding has distortion D(ε) if at least a 1-ε fraction of the pairs satisfy dist_f(u,v) ≤ D(ε).
• Definition: an embedding has scaling distortion D(·) if it is a (1-ε)-partial embedding with distortion D(ε), for all ε > 0 simultaneously.
• [KSW 04]:
  • Introduced the problem in the context of network embeddings.
  • Initial results.
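The distortion measures above are easy to state in code. A minimal sketch for a finite metric given as distance matrices, where `dY[u][v]` holds the distance between the images f(u), f(v) of a non-contracting embedding (all function names here are illustrative, not from the talk):

```python
import itertools

def distortions(dX, dY):
    """Per-pair distortions dist_f(u,v) = d_Y(f(u),f(v)) / d_X(u,v)
    for a non-contracting embedding given via its distance matrices."""
    n = len(dX)
    return [dY[u][v] / dX[u][v] for u, v in itertools.combinations(range(n), 2)]

def worst_case_distortion(dX, dY):
    """Classical distortion: the maximum of dist_f over all pairs."""
    return max(distortions(dX, dY))

def lq_distortion(dX, dY, q):
    """L_q-distortion: the L_q norm of dist_f over uniformly random pairs.
    q = 1 gives the average distortion; q -> infinity the worst case."""
    ds = distortions(dX, dY)
    return (sum(d ** q for d in ds) / len(ds)) ** (1.0 / q)
```

For example, on the 3-point path metric with one edge stretched by a factor 2, the worst-case distortion is 2 while the average (L_1) distortion is only 4/3, illustrating the gap between the two measures.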
• [A+ 05]:
  • Partial distortion and dimension O(log(1/ε)) for all metrics.
  • Scaling distortion O(log(1/ε)) for doubling metrics.
• Thm: scaling distortion O(log(1/ε)) for all metrics.

**L_q-Distortion vs. Scaling Distortion**

• A lower bound of Ω(log(1/ε)) on partial distortion implies: L_q-distortion = Ω(min{q, log n}).
• An upper bound of O(log(1/ε)) on scaling distortion implies:
  • L_q-distortion = O(min{q, log n}).
  • Average distortion = O(1).
  • Distortion = O(log n).
• For any metric:
  • ½ of the pairs have distortion ≤ c·log 2 = c
  • + ¼ of the pairs have distortion ≤ c·log 4 = 2c
  • + ⅛ of the pairs have distortion ≤ c·log 8 = 3c
  • …
  • + 1/n² of the pairs have distortion ≤ 2c·log n
  • For ε < 1/n², no pairs are ignored.

**Probabilistic Partitions**

• P = {S_1, S_2, …, S_t} is a partition of X; P(x) is the cluster containing x.
• P is Δ-bounded if diam(S_i) ≤ Δ for all i.
• A probabilistic partition 𝒫 is a distribution over a set of partitions.
• 𝒫 is η-padded if each point is padded with constant probability: Pr[B(x, ηΔ) ⊆ P(x)] ≥ ½.

**Partitions and Embedding**

• Let Δ_i = 4^i be the scales.
• For each scale i, create a probabilistic Δ_i-bounded partition P_i that is η-padded.
• For each cluster choose σ_i(S) ~ Ber(½) i.i.d., and set f_i(x) = σ_i(P_i(x)) · d(x, X∖P_i(x)).
• Repeat O(log n) times.
• Distortion: O(η^{-1} · log^{1/p} Δ).
• Dimension: O(log n · log Δ), where Δ is the diameter of X.

**Upper Bound**

f_i(x) = σ_i(P_i(x)) · d(x, X∖P_i(x)). For all x, y ∈ X:
• P_i(x) ≠ P_i(y) implies d(x, X∖P_i(x)) ≤ d(x,y).
• P_i(x) = P_i(y) implies |d(x,A) − d(y,A)| ≤ d(x,y) for A = X∖P_i(x).

**Lower Bound**

• Take a scale i such that Δ_i ≈ d(x,y)/4.
• It must be that P_i(x) ≠ P_i(y).
• With probability ½: d(x, X∖P_i(x)) ≥ ηΔ_i.
• With probability ¼: σ_i(P_i(x)) = 1 and σ_i(P_i(y)) = 0.

**η-Padded Partitions**

• The parameter η determines the quality of the embedding.
• [Bartal 96]: η = Ω(1/log n) for any metric space.
• [Rao 99]: η = Ω(1), used to embed planar metrics into L_2.
• [CKR01+FRT03]: improved partitions with η(x) = log^{-1} ρ(x, Δ).
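The single-scale coordinate f_i(x) = σ_i(P_i(x)) · d(x, X∖P_i(x)) of the partition-based embedding can be sketched as follows. The Δ-bounded partition here is a simple random ball-carving stand-in (an assumption for illustration; the talk's padded partitions carry stronger guarantees):

```python
import random

def ball_carving_partition(d, delta, rng):
    """Delta-bounded partition: carve balls of radius delta/2 around
    points in random order.  Clusters have diameter <= delta by the
    triangle inequality (a simple stand-in, not a padded partition)."""
    n = len(d)
    order = list(range(n))
    rng.shuffle(order)
    cluster = [None] * n
    for c in order:
        for x in range(n):
            if cluster[x] is None and d[c][x] <= delta / 2:
                cluster[x] = c
    return cluster  # cluster[x] identifies P(x)

def coordinate(d, cluster, sigma):
    """One coordinate f_i(x) = sigma(P(x)) * d(x, X \\ P(x))."""
    n = len(d)
    f = []
    for x in range(n):
        outside = [d[x][y] for y in range(n) if cluster[y] != cluster[x]]
        pad = min(outside) if outside else max(d[x])  # d(x, X \ P(x))
        f.append(sigma[cluster[x]] * pad)
    return f
```

As in the upper-bound slide, each coordinate is 1-Lipschitz: if x and y fall in different clusters, both d(x, X∖P(x)) and d(y, X∖P(y)) are at most d(x,y); if they share a cluster, the distance-to-complement values differ by at most d(x,y).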
• [KLMN 03]: used to embed general and doubling metrics into L_p: distortion O(η^{-(1-1/p)} · log^{1/p} n), dimension O(log^2 n).
• The local growth rate of x at radius r is ρ(x, r) = |B(x, 2r)| / |B(x, r/2)| (up to the choice of constants).

**Uniform Probabilistic Partitions**

• In a uniform probabilistic partition, η : X → [0,1] and all points in a cluster have the same padding parameter.
• Uniform partition lemma: there exists a uniform probabilistic Δ-bounded partition such that for every cluster C and every x ∈ C, η(x) = log^{-1} ρ(v, Δ), where v = argmin_{u∈C} ρ(u, Δ).

**Embedding into One Dimension**

• Let Δ_i = 4^i.
• For each scale i, create uniformly padded probabilistic Δ_i-bounded partitions P_i.
• For each cluster choose σ_i(S) ~ Ber(½) i.i.d., and set f_i(x) = σ_i(P_i(x)) · η_i^{-1}(x) · d(x, X∖P_i(x)).
• Upper bound: |f(x) − f(y)| ≤ O(log n) · d(x,y).
• Lower bound: E[|f(x) − f(y)|] ≥ Ω(d(x,y)).
• Replicate D = Θ(log n) times to get high probability.

**Upper Bound: |f(x) − f(y)| ≤ O(log n) · d(x,y)**

For all x, y ∈ X:
• P_i(x) ≠ P_i(y) implies f_i(x) ≤ η_i^{-1}(x) · d(x,y).
• P_i(x) = P_i(y) implies f_i(x) − f_i(y) ≤ η_i^{-1}(x) · d(x,y).
• Both use the uniform padding within each cluster.

**Lower Bound**

• Take a scale i such that Δ_i ≈ d(x,y)/4.
• It must be that P_i(x) ≠ P_i(y).
• With probability ½: f_i(x) = η_i^{-1}(x) · d(x, X∖P_i(x)) ≥ Δ_i.

**Lower Bound: E[|f(x) − f(y)|] ≥ Ω(d(x,y))**

Let R be the contribution of the scales below i. Two cases:
• R < Δ_i/2: with probability ⅛, σ_i(P_i(x)) = 1 and σ_i(P_i(y)) = 0; then f_i(x) ≥ Δ_i and f_i(y) = 0, so |f(x) − f(y)| ≥ Δ_i/2 = Ω(d(x,y)).
• R ≥ Δ_i/2: with probability ¼, σ_i(P_i(x)) = 0 and σ_i(P_i(y)) = 0; then f_i(x) = f_i(y) = 0, so |f(x) − f(y)| ≥ Δ_i/2 = Ω(d(x,y)).

**Coarse Scaling Embedding into L_p**

• Definition: for u ∈ X, r_ε(u) is the minimal radius such that |B(u, r_ε(u))| ≥ εn.
• Coarse scaling embedding: for each u ∈ X, preserves distances outside B(u, r_ε(u)).
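The radius r_ε(u) from the definition above, the minimal radius at which the ball around u captures an ε-fraction of the points, can be computed directly by sorting the distances from u:

```python
def r_eps(d, u, eps):
    """Minimal radius r such that |B(u, r)| >= eps * n, where
    B(u, r) = {v : d(u, v) <= r} is the closed ball (it contains u)."""
    n = len(d)
    need = eps * n
    radii = sorted(d[u])  # radii[0] == 0 is u itself
    for j, r in enumerate(radii):
        if j + 1 >= need:  # including the j-th closest gives j+1 points
            return r
    return radii[-1]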
**Scaling Distortion**

• Claim: if d(x,y) > r_ε(x) then 1 ≤ dist_f(x,y) ≤ O(log(1/ε)).
• Let l be the scale with d(x,y) ≤ Δ_l < 4·d(x,y).
• Lower bound: E[|f(x) − f(y)|] ≥ d(x,y).
• Upper bound for high-diameter terms.
• Upper bound for low-diameter terms.
• Replicate D = Θ(log n) times to get high probability.

**Upper Bound for High-Diameter Terms: |f(x) − f(y)| ≤ O(log(1/ε)) · d(x,y)**

• Scale l such that r_ε(x) ≤ d(x,y) ≤ Δ_l < 4·d(x,y).

**Upper Bound for Low-Diameter Terms: |f(x) − f(y)| = O(1) · d(x,y)**

• Scale l such that d(x,y) ≤ Δ_l < 4·d(x,y).
• All lower levels i ≤ l are bounded by Δ_i.

**Embedding into L_p**

• A partition P is (η,δ)-padded if each point is padded with probability at least δ: Pr[B(x, ηΔ) ⊆ P(x)] ≥ δ.
• Lemma: there exist (η,δ)-padded partitions with η(x) = log^{-1} ρ(v, Δ) · log(1/δ), where v = argmin_{u∈P(x)} ρ(u, Δ).
• Hierarchical partition: every cluster in level i is a refinement of a cluster in level i+1.
• Theorem: every n-point metric space can be embedded into L_p with dimension O(e^p log n), and for every q the L_q-distortion is O(min{q, log n}).

**Embedding into L_p**

• Embedding into L_p with scaling distortion:
  • Use partitions with a small probability of padding: δ = e^{-p}.
  • Hierarchical uniform partitions.
  • Combination with Matousek's sampling techniques.

**Low Dimension Embeddings**

• Embedding with distortion O(log^{1+θ} n), dimension Θ(log n / (θ loglog n)).
• Optimal trade-off between distortion and dimension.
• Use partitions with a high probability of padding: δ = 1 − log^{-θ} n.

**Additional Results: Weighted Averages**

• Embedding with weighted average distortion O(log Ψ) for weights with aspect ratio Ψ.
• Algorithmic applications:
  • Sparsest cut,
  • Uncapacitated quadratic assignment,
  • Multiple sequence alignment.

**Low Dimension Embeddings: Doubling Metrics**

• Definition: a metric space has doubling constant λ if any ball of radius r > 0 can be covered by λ balls of half the radius.
• Doubling dimension = log λ.
• [GKL03]: embedding doubling metrics, with tight distortion.
• Thm: embedding arbitrary metrics into L_p with distortion O(log^{1+θ} n), dimension O(log λ).
• Same embedding, with similar techniques.
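The doubling constant defined above can be bounded for a finite metric by brute force. A small sketch: for each ball of radius r, greedily cover it with balls of radius r/2; the greedy cover size upper-bounds the optimal one, so the maximum over all balls upper-bounds λ (illustration only, exponentially far from how such bounds are used in the talk's constructions):

```python
def doubling_constant_upper(d, radii):
    """Greedy upper bound on the doubling constant lambda: every ball
    of radius r in `radii` is covered by the returned number of balls
    of radius r/2 (greedy cover size >= optimal cover size)."""
    n = len(d)
    lam = 1
    for u in range(n):
        for r in radii:
            ball = [v for v in range(n) if d[u][v] <= r]
            uncovered = set(ball)
            centers = 0
            while uncovered:
                c = next(iter(uncovered))  # pick any still-uncovered point
                uncovered -= {v for v in ball if d[c][v] <= r / 2}
                centers += 1
            lam = max(lam, centers)
    return lam
```

On the integer path 0-1-2-3, for instance, every radius-2 ball is covered by two radius-1 balls, while a radius-1 ball around an interior point needs three radius-½ balls (singletons), so the estimate depends on which scales are probed.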
• Use nets.
• Use the Lovász Local Lemma.
• Thm: embedding arbitrary metrics into L_p with distortion O(log^{1-1/p} λ · log^{1/p} n), dimension Õ(log n · log λ).
• Use hierarchical partitions as well.

**Scaling Distortion into Trees**

• [A+ 05]: probabilistic embedding into a distribution of ultrametrics with scaling distortion O(log(1/ε)).
• Thm: embedding into an ultrametric with scaling distortion O(√(1/ε)).
• Thm: every graph contains a spanning tree with scaling distortion O(√(1/ε)).
• These imply:
  • Average distortion = O(1).
  • L_2-distortion = O(√log n).
• Can be viewed as a network design objective.
• Thm: probabilistic embedding into a distribution of spanning trees with scaling distortion Õ(log^2(1/ε)).

**New Results: Nearest-Neighbor Preserving Embeddings**

• Definition: x, y are k-nearest neighbors if |B(x, d(x,y))| ≤ k.
• Thm: embedding into L_p with distortion Õ(log k) on k-nearest neighbors, for all k simultaneously, and dimension O(log n).
• Thm: for fixed k, embedding into L_p with distortion O(log k) and dimension O(log k).
• Practically the same embedding.
• Every level is scaled down, higher levels more aggressively.
• Uses the Lovász Local Lemma.

**Nearest-Neighbor Preserving Embeddings**

• Thm: probabilistic embedding into a distribution of ultrametrics with distortion Õ(log k) for all k-nearest neighbors.
• Thm: embedding into an ultrametric with distortion k−1 for all k-nearest neighbors.
• Applications:
  • Sparsest cut with "neighboring" demand pairs.
  • Approximate ranking / k-nearest-neighbor search.

**Conclusions**

• A unified framework for embedding arbitrary metrics.
• New measures of distortion.
• Embeddings with improved properties:
  • Optimal scaling distortion.
  • Constant average distortion.
  • Tight distortion-dimension tradeoff.
  • Embedding metrics in their doubling dimension.
  • Nearest-neighbor preserving embedding.
  • Constant average distortion spanning trees.
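The k-nearest-neighbor relation used above, x and y are k-nearest neighbors when |B(x, d(x,y))| ≤ k, can be checked directly on a finite metric. A small sketch (note the relation is asymmetric, since the ball is centered at x):

```python
def is_k_nearest(d, x, y, k):
    """True if y is among the k nearest neighbors of x, in the sense
    of the definition |B(x, d(x,y))| <= k.  The closed ball counts
    x itself as well as any points tied at distance d(x, y)."""
    ball = sum(1 for v in range(len(d)) if d[x][v] <= d[x][y])
    return ball <= k
```

An embedding with the stated guarantee must then preserve d(x,y) up to Õ(log k) exactly on the pairs this predicate accepts, for every k at once.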