
Advances in Metric Embedding Theory

UCLA IPAM 07. Advances in Metric Embedding Theory. Yair Bartal, Hebrew University & Caltech.


Presentation Transcript


  1. UCLA IPAM 07 • Advances in Metric Embedding Theory • Yair Bartal, Hebrew University & Caltech

  2. Metric Spaces • Metric space: $(X,d)$, $d: X^2 \to \mathbb{R}_+$ • d(u,v) = d(v,u) • d(v,w) ≤ d(v,u) + d(u,w) • d(u,u) = 0 • Data Representation: Pictures (e.g. faces), web pages, DNA sequences, … • Network: communication distance
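
As a small illustration of the axioms on this slide, the following Python sketch checks them by brute force on a finite point set. The helper names (`is_metric`, `cycle_dist`, `squared_gap`) and the two example distance functions are illustrative assumptions, not part of the talk.

```python
from itertools import product

def is_metric(points, d, tol=1e-12):
    """Brute-force check of the metric axioms on a finite point set."""
    for u in points:
        if abs(d(u, u)) > tol:                 # d(u,u) = 0
            return False
    for u, v in product(points, repeat=2):
        if d(u, v) < -tol:                     # d maps into the non-negative reals
            return False
        if abs(d(u, v) - d(v, u)) > tol:       # symmetry
            return False
    for u, v, w in product(points, repeat=3):
        if d(v, w) > d(v, u) + d(u, w) + tol:  # triangle inequality
            return False
    return True

# Example: hop distance on a 5-cycle, a simple "network" metric.
cycle = list(range(5))

def cycle_dist(u, v):
    k = abs(u - v)
    return min(k, 5 - k)

def squared_gap(u, v):
    # Squared difference on the line: symmetric, but not a metric.
    return (u - v) ** 2

print(is_metric(cycle, cycle_dist))      # True
print(is_metric([0, 1, 2], squared_gap)) # False: d(0,2) = 4 > d(0,1) + d(1,2) = 2
```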

  3. Metric Embedding • Simple Representation: Translate metric data into an easy-to-analyze form and gain geometric structure, e.g. embed in low-dimensional Euclidean space • Algorithmic Application: Apply algorithms for a “nice” space to solve problems on “problematic” metric spaces

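  4. Embedding Metric Spaces • Metric spaces $(X,d_X)$, $(Y,d_Y)$ • An embedding is a function $f: X \to Y$ • For an embedding $f$ and $u,v \in X$, let $\mathrm{dist}_f(u,v) = d_Y(f(u),f(v)) / d_X(u,v)$ • Distortion $c = \max_{u,v \in X} \mathrm{dist}_f(u,v) \,/\, \min_{u,v \in X} \mathrm{dist}_f(u,v)$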
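
The distortion on this slide can be computed directly for small examples. The sketch below assumes the usual expansion ratio dist_f(u,v) = d_Y(f(u),f(v)) / d_X(u,v), since the transcript does not show that formula; the function name and the 4-cycle example are illustrative.

```python
from itertools import combinations

def distortion(points, d_X, d_Y, f):
    """max dist_f / min dist_f over all pairs, with dist_f = d_Y(f(u), f(v)) / d_X(u, v)."""
    ratios = [d_Y(f(u), f(v)) / d_X(u, v) for u, v in combinations(points, 2)]
    return max(ratios) / min(ratios)

# Example (hypothetical data): the 4-cycle mapped onto the line by f(i) = i.
cycle = [0, 1, 2, 3]
d_cycle = lambda u, v: min(abs(u - v), 4 - abs(u - v))
d_line = lambda a, b: abs(a - b)
identity = lambda i: i

print(distortion(cycle, d_cycle, d_line, identity))  # 3.0
```

Only the pair (0, 3) is stretched: it is adjacent on the cycle but at distance 3 on the line, while every other pair keeps its distance.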

  5. Special Metric Spaces • Euclidean space • $l_p$ metric in $\mathbb{R}^n$: $\|x-y\|_p = \bigl(\sum_{i=1}^{n} |x_i-y_i|^p\bigr)^{1/p}$ • Planar metrics • Tree metrics • Ultrametrics • Doubling metrics

  6. Embedding in Normed Spaces • [Fréchet Embedding]: Any n-point metric space embeds isometrically in $L_\infty$ • Proof. [Figure: points w, x, y]
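
The proof on the slide survives only as a figure, so the following is a sketch of the standard Fréchet construction rather than a transcription of it: map each point x to the vector of its distances to all points, f(x) = (d(x,w)) for w in X. By the triangle inequality |d(x,w) - d(y,w)| ≤ d(x,y), with equality at w = y, so the L∞ distance between images equals d(x,y). All names below are illustrative.

```python
from itertools import combinations

def frechet_embedding(points, d):
    """Map each x to the vector of its distances to all points: f(x)_w = d(x, w)."""
    return {x: [d(x, w) for w in points] for x in points}

def linf(a, b):
    """L-infinity distance between two coordinate vectors."""
    return max(abs(s - t) for s, t in zip(a, b))

# Example: hop distance on a 6-cycle.
points = list(range(6))
d = lambda u, v: min(abs(u - v), 6 - abs(u - v))
image = frechet_embedding(points, d)

# Every pairwise distance is preserved exactly.
print(all(linf(image[x], image[y]) == d(x, y) for x, y in combinations(points, 2)))
```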

  7. Embedding in Normed Spaces • [Bourgain 85]: Any n-point metric space embeds in $L_p$ with distortion $\Theta(\log n)$ • [Johnson-Lindenstrauss 85]: Any n-point subset of Euclidean space embeds with distortion $(1+\varepsilon)$ in dimension $\Theta(\varepsilon^{-2}\log n)$ • [ABN 06, B 06]: Dimension $\Theta(\log n)$; in fact $\Theta^*(\log n/\log\log n)$
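
The slide does not say how the Johnson-Lindenstrauss map is built. One standard realization is a random Gaussian projection, sketched below in plain Python. The function names, the target dimension k = 200 and the test data are assumptions chosen for illustration; the constant hidden in the dimension bound is not taken from the talk.

```python
import math
import random

def jl_project(vectors, k, seed=0):
    """Project vectors to k dimensions with a random Gaussian matrix."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    # Entries are N(0, 1/k), so squared lengths are preserved in expectation.
    scale = 1.0 / math.sqrt(k)
    matrix = [[rng.gauss(0.0, 1.0) * scale for _ in range(dim)] for _ in range(k)]
    return [[sum(row[j] * v[j] for j in range(dim)) for row in matrix] for v in vectors]

def euclid(a, b):
    return math.sqrt(sum((s - t) ** 2 for s, t in zip(a, b)))

# Hypothetical data: 8 random points in 400 dimensions, projected down to 200.
rng = random.Random(1)
pts = [[rng.uniform(-1.0, 1.0) for _ in range(400)] for _ in range(8)]
proj = jl_project(pts, k=200)
ratios = [euclid(proj[i], proj[j]) / euclid(pts[i], pts[j])
          for i in range(8) for j in range(i + 1, 8)]
print(min(ratios), max(ratios))  # both should be close to 1
```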

  8. Embedding Metrics in their Intrinsic Dimension • Definition: A metric space X has doubling constant λ if any ball of radius r > 0 can be covered by λ balls of half the radius. • Doubling dimension: dim(X) = log λ • [ABN 07b]: Any n-point metric space X can be embedded into $L_p$ with distortion $O(\log^{1+\theta} n)$ and dimension O(dim(X)) • Same embedding, using: • nets • Lovász Local Lemma • Distortion-Dimension Tradeoff

  9. Average Distortion • Practical measure of the quality of an embedding • Network embedding, multi-dimensional scaling, biology, vision, … • Given a non-contracting embedding $f: (X,d_X) \to (Y,d_Y)$: $\mathrm{avgdist}(f) = \binom{n}{2}^{-1} \sum_{u \ne v \in X} \mathrm{dist}_f(u,v)$ • [ABN06]: Every n-point metric space embeds into $L_p$ with average distortion O(1), worst-case distortion Θ(log n) and dimension Θ(log n).

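  10. The lq-Distortion • lq-distortion: $\mathrm{dist}_q(f) = \bigl(E[\mathrm{dist}_f(u,v)^q]\bigr)^{1/q}$, with the expectation over a uniformly random pair u,v • [ABN 06]: lq-distortion is bounded by Θ(q)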
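
A minimal sketch of how these averaged measures can be evaluated on a small example. It assumes the usual definition, the q-th moment of dist_f over uniformly random pairs raised to the power 1/q, so that q = 1 gives the average distortion and q = ∞ the worst case. The function name and example data are illustrative.

```python
from itertools import combinations

def lq_distortion(points, d_X, d_Y, f, q):
    """(average of dist_f^q over all pairs)^(1/q); q = inf gives the worst-case ratio."""
    ratios = [d_Y(f(u), f(v)) / d_X(u, v) for u, v in combinations(points, 2)]
    if q == float("inf"):
        return max(ratios)
    return (sum(r ** q for r in ratios) / len(ratios)) ** (1.0 / q)

# Hypothetical example: the 4-cycle mapped onto the line by f(i) = i (non-contracting).
cycle = [0, 1, 2, 3]
d_cycle = lambda u, v: min(abs(u - v), 4 - abs(u - v))
d_line = lambda a, b: abs(a - b)
identity = lambda i: i

for q in (1, 2, float("inf")):
    print(q, lq_distortion(cycle, d_cycle, d_line, identity, q))
```

Five of the six pairs have ratio 1 and one has ratio 3, so the average is 8/6 while the worst case is 3, which shows how a single bad pair dominates the worst-case measure but not the average.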

  11. Dimension Reduction into Constant Dimension • [B 07]: Any finite subset of Euclidean space embeds in dimension h with lq-distortion $e^{O(q/h)} \sim 1 + O(q/h)$ • Corollary: Every finite metric space embeds into $L_p$ in dimension h with lq-distortion

  12. Local Embeddings • Def: A k-local embedding has distortion D(k) if for every pair of k-nearest neighbors x,y: $\mathrm{dist}_f(x,y) \le D(k)$ • [ABN 07c]: For fixed k, k-local embedding into $L_p$ with distortion Θ(log k) and dimension Θ(log k) (under a very weak growth bound condition) • [ABN 07c]: k-local embedding into $L_p$ with distortion Õ(log k) on neighbors, for all k simultaneously, and dimension Θ(log n) • Same embedding method • Lovász Local Lemma

  13. Local Dimension Reduction • [BRS 07]: For fixed k, any finite set of points in Euclidean space has a k-local embedding with distortion $(1+\varepsilon)$ in dimension $\Theta(\varepsilon^{-2}\log k)$ (under a very weak growth bound condition) • New embedding ideas • Lovász Local Lemma

  14. Time for a…

  15. Metric Ramsey Problem • Given a metric space, what is the largest subspace that has some special structure, e.g. is close to being Euclidean? • Graph theory: Every graph of size n contains either a clique or an independent set of size Θ(log n) • Dvoretzky’s theorem… • [BFM 86]: Every n-point metric space contains a subspace of size $\Omega(c_\varepsilon \log n)$ which embeds in Euclidean space with distortion $(1+\varepsilon)$

  16. (u) (v) (w) x z (z)=0 0 = (z)  (w)/k(v)/k2(u)/k3 d(x,z)= (lca(x,z))= (v) Basic Structures: Ultrametric, k-HST [B 96] • An ultrametrick-embedsin ak-HST (moreover this • can be done so that labels are powers of k).

  17. 1 D2 D1/ k D2 1 1 1 D2 D2 D3 D2/ k 1 D3 D3 D3 D3 D3 Hierarchically Well-Separated Trees

  18. Properties of Ultrametrics • An ultrametric is a tree metric. • Ultrametrics embed isometrically in $l_2$. • [BM 04]: Any n-point ultrametric $(1+\varepsilon)$-embeds in $l_p^d$, where $d = O(\varepsilon^{-2}\log n)$.

  19. A Metric Ramsey Phenomenon • Consider n equally spaced points on the line. • Choose a “Cantor like” set of points, and construct a binary tree over them. • The resulting tree is a 3-HST, and the original subspace embeds in this tree with distortion 3. • Size of subspace: $n^{\log_3 2}$.
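
The slide does not spell out the construction, so the following is a sketch of one natural reading: take n = 3^k points 0, …, n-1 and keep those whose base-3 digits are all 0 or 2 (the middle-thirds choice, an assumption). This keeps 2^k = n^(log_3 2) points. The binary tree splits the set at the highest differing digit j, and labeling that split 3^(j+1) gives labels that are powers of 3. Function names are illustrative.

```python
def cantor_like_subset(k):
    """Points of {0, ..., 3**k - 1} whose base-3 digits are all 0 or 2."""
    pts = [0]
    for level in range(k):
        pts = pts + [p + 2 * 3 ** level for p in pts]
    return sorted(pts)

def hst_distance(a, b):
    """Label of the least common ancestor in the binary tree: 3**(j+1),
    where j is the highest base-3 digit position in which a and b differ."""
    label, j = 0, 0
    while a or b:
        if a % 3 != b % 3:
            label = 3 ** (j + 1)
        a, b, j = a // 3, b // 3, j + 1
    return label

pts = cantor_like_subset(4)   # n = 81 points on the line, 16 of them kept
print(len(pts))
# The tree metric dominates the line metric and exceeds it by a factor of at most 3.
print(all(abs(a - b) <= hst_distance(a, b) <= 3 * abs(a - b) for a in pts for b in pts))
```

If the highest differing digit is at position j, the two points are at least 3^j + 1 and at most 3^(j+1) - 1 apart, which is why the label 3^(j+1) overestimates the true distance by less than a factor of 3.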

  20. Metric Ramsey Phenomena • [BLMN 03, MN 06, B 06]: Any n-point metric space contains a subspace of size $n^{1-\varepsilon}$ which embeds in an ultrametric with distortion $\Theta(1/\varepsilon)$ • [B 06]: Any n-point metric space contains a subspace of linear size which embeds in an ultrametric with lq-distortion bounded by Õ(q)

  21. Metric Ramsey Theorems • Key Ingredient: Partitions

  22. Complete Representation via Ultrametrics? • Goal: Given an n-point metric space, we would like to embed it into an ultrametric with low distortion. • Lower Bound: Ω(n); in fact this holds even for embedding the n-cycle into arbitrary tree metrics [RR 95]

  23. Probabilistic Embedding • [Karp 89]: The n-cycle probabilistically embeds in n-line spaces with distortion 2 • If u,v are adjacent in the cycle C then $E(d_L(u,v)) = (n-1)/n + (n-1)/n < 2 = 2d_C(u,v)$
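
The expectation on this slide can be checked exactly. The sketch below assumes the standard reading of the construction: delete one of the n cycle edges uniformly at random and measure distances on the remaining path. For adjacent u,v, the edge between them survives with probability (n-1)/n (distance 1) and is cut with probability 1/n (distance n-1), giving (n-1)/n + (n-1)/n. Function names are illustrative.

```python
from fractions import Fraction

def cycle_dist(n, u, v):
    k = abs(u - v)
    return min(k, n - k)

def line_dist_after_cut(n, u, v, cut):
    """Distance between u and v on the path left after deleting edge (cut, cut+1 mod n)."""
    # Relabel so that the path runs from vertex cut+1 (position 0) to vertex cut (position n-1).
    pu = (u - cut - 1) % n
    pv = (v - cut - 1) % n
    return abs(pu - pv)

def expected_line_dist(n, u, v):
    """Exact expectation over a uniformly random deleted edge."""
    return sum(Fraction(line_dist_after_cut(n, u, v, cut), n) for cut in range(n))

n = 10
print(expected_line_dist(n, 0, 1))  # 9/5, i.e. (n-1)/n + (n-1)/n
```

The same computation for a pair at cycle distance k gives 2k(n-k)/n, which stays below 2k, and no cut ever shortens a distance, so the embedding is non-contracting with expected expansion below 2.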

  24. Probabilistic Embedding • [B 96,98,04, FRT 03]: Any n-point metric space probabilistically embeds into an ultrametric with distortion Θ(log n) • [ABN 05,06, CDGKS 05]: lq-distortion is Θ(q)

  25. Probabilistic Embedding • Key Ingredient: Probabilistic Partitions

  26. η x2 x1 η Probabilistic Partitions • P={S1,S2,…St} is a partition of X if • P(x)is the cluster containing x. • P is Δ-bounded if diam(Si)≤Δfor all i. • A probabilistic partitionP is a distribution over a set of partitions. • P is (η,d)-padded if • Call Pη-padded if d=1/2. • [B 96]h=Q(1/(log n)) • [CKR01+FRT03, ABN06]: η(x)= Ω(1/log (ρ(x,Δ))

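  27. Partitions and Embedding • [B 96, Rao 99, …] • Let $\Delta_i = 4^i$ be the scales. • For each scale i, create a probabilistic $\Delta_i$-bounded partition $P_i$ that is η-padded. • For each cluster choose $\sigma_i(S) \sim \mathrm{Ber}(1/2)$ i.i.d. and set $f_i(x) = \sigma_i(P_i(x)) \cdot d(x, X \setminus P_i(x))$ • Repeat O(log n) times. • Distortion: $O(\eta^{-1} \cdot \log^{1/p} \Delta)$. • Dimension: $O(\log n \cdot \log \Delta)$. • [Figure: a point x in nested clusters at scales $\Delta_i$ = 4, 16, showing d(x, X∖P(x)); diameter of X = Δ]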
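
The recipe on this slide can be sketched in Python under several stated assumptions. The slide does not say how the bounded partitions are sampled, so the code substitutes a CKR-style random ball carving (random center order, random radius in [Δ/4, Δ/2]) and does not verify its padding parameter. It also assumes the minimum distance is at least 1 so that scales can start at 1, and it uses Δ as the distance to the complement when a cluster is all of X. All names are illustrative.

```python
import random

def ckr_partition(points, d, delta, rng):
    """A random delta-bounded partition (CKR-style ball carving, used as a stand-in)."""
    radius = rng.uniform(delta / 4.0, delta / 2.0)
    order = points[:]
    rng.shuffle(order)
    # Each point joins the first center in the random order within the radius,
    # so every cluster has diameter at most 2 * radius <= delta.
    return {x: next(c for c in order if d(x, c) <= radius) for x in points}

def coordinate(points, d, delta, rng):
    """One coordinate: f_i(x) = sigma_i(P_i(x)) * d(x, complement of P_i(x))."""
    cluster_of = ckr_partition(points, d, delta, rng)
    sigma = {c: rng.randint(0, 1) for c in set(cluster_of.values())}  # Ber(1/2) per cluster
    f = {}
    for x in points:
        outside = [d(x, y) for y in points if cluster_of[y] != cluster_of[x]]
        # Assumption: if the cluster is all of X, use delta as the distance to the complement.
        f[x] = sigma[cluster_of[x]] * (min(outside) if outside else delta)
    return f

def embed(points, d, repetitions, seed=0):
    """Stack `repetitions` coordinates for every scale 1, 4, 16, ... up to the diameter."""
    rng = random.Random(seed)
    diameter = max(d(x, y) for x in points for y in points)
    coords, delta = [], 1.0
    while delta < 4.0 * diameter:
        coords.extend(coordinate(points, d, delta, rng) for _ in range(repetitions))
        delta *= 4.0
    return {x: [c[x] for c in coords] for x in points}

# Hypothetical example: 16 points on a path, 3 repetitions per scale.
pts = list(range(16))
dist = lambda a, b: abs(a - b)
emb = embed(pts, dist, repetitions=3)
print(len(emb[0]))  # number of coordinates: 3 scales x 3 repetitions
```

Each coordinate is 1-Lipschitz: within a cluster the distance to the complement changes by at most d(x,y), and across clusters both values are at most d(x,y). The lower bound on the distortion is where the padding property of the partition is needed.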

  28. Time to…

  29. Uniform Probabilistic Partitions • In a uniform probabilistic partition, η: X → [0,1], all points in a cluster have the same padding parameter. • [ABN 06]: Uniform partition lemma: There exists a uniform probabilistic Δ-bounded partition such that for any $x \in X$, $\eta(x) = \log^{-1}\rho(v,\Delta)$, where … • The local growth rate of x at radius r is: … • [Figure: clusters C1, C2 containing points v1, v2, v3, with padding parameters η(C1), η(C2)]

  30. Embedding into a single dimension • Let $\Delta_i = 4^i$. • For each scale i, create a uniformly padded probabilistic $\Delta_i$-bounded partition $P_i$. • For each cluster choose $\sigma_i(S) \sim \mathrm{Ber}(1/2)$ i.i.d., $f_i(x) = \sigma_i(P_i(x)) \cdot \eta_i^{-1}(x) \cdot d(x, X \setminus P_i(x))$ • Upper bound: |f(x)-f(y)| ≤ O(log n)·d(x,y). • Lower bound: E[|f(x)-f(y)|] ≥ Ω(d(x,y)) • Replicate D = Θ(log n) times to get high probability.

  31. Upper Bound: |f(x)-f(y)| ≤ O(log n) d(x,y) • For all x,y ∈ X: - $P_i(x) \ne P_i(y)$ implies $f_i(x) \le \eta_i^{-1}(x) \cdot d(x,y)$ - $P_i(x) = P_i(y)$ implies $f_i(x) - f_i(y) \le \eta_i^{-1}(x) \cdot d(x,y)$ • Use uniform padding in cluster

  32. Lower Bound: • [Figure: points x, y] • Take a scale i such that $\Delta_i \approx d(x,y)/4$. • It must be that $P_i(x) \ne P_i(y)$ • With probability ½: $\eta_i^{-1}(x)\, d(x, X \setminus P_i(x)) \ge \Delta_i$

  33. Lower bound: E[|f(x)-f(y)|] ≥ Ω(d(x,y)) • Two cases: • Case R < $\Delta_i/2$: with prob. ⅛, $\sigma_i(P_i(x)) = 1$ and $\sigma_i(P_i(y)) = 0$ • Then $f_i(x) \ge \Delta_i$, $f_i(y) = 0$ • |f(x)-f(y)| ≥ $\Delta_i/2$ = Ω(d(x,y)). • Case R ≥ $\Delta_i/2$: with prob. ¼, $\sigma_i(P_i(x)) = 0$ and $\sigma_i(P_i(y)) = 0$ • Then $f_i(x) = f_i(y) = 0$ • |f(x)-f(y)| ≥ $\Delta_i/2$ = Ω(d(x,y)).

  34. Partial Embedding & Scaling Distortion • Definition: A (1-ε)-partial embedding has distortion D(ε) if at least a 1-ε fraction of the pairs satisfy $\mathrm{dist}_f(u,v) \le D(\varepsilon)$ • Definition: An embedding has scaling distortion D(·) if it is a (1-ε)-partial embedding with distortion D(ε), for all ε > 0 • [KSW 04] • [ABN 05, CDGKS 05]: Partial distortion and dimension Θ(log(1/ε)) • [ABN06]: Scaling distortion Θ(log(1/ε)) for all metrics
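
The two definitions on this slide translate into a quantile computation over the per-pair ratios dist_f(u,v). The sketch below is a minimal illustration with a hypothetical list of ratios; the function name is an assumption.

```python
import math

def partial_distortion(ratios, eps):
    """Smallest D such that at least a (1 - eps) fraction of the pairs has dist_f <= D."""
    ordered = sorted(ratios)
    needed = math.ceil((1.0 - eps) * len(ordered))  # number of pairs that must be covered
    return ordered[max(needed, 1) - 1]

# Hypothetical ratios for six pairs: five are preserved, one is stretched by 3.
ratios = [1.0, 1.0, 3.0, 1.0, 1.0, 1.0]
print(partial_distortion(ratios, 0.5))  # 1.0: half of the pairs already have ratio 1
print(partial_distortion(ratios, 0.1))  # 3.0: covering 90% of six pairs needs all six
```

Evaluating this for every ε gives the empirical scaling-distortion curve D(ε) of a concrete embedding.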

  35. lq-Distortion vs. Scaling Distortion • Upper bound D(ε) = c log(1/ε) on scaling distortion: • ½ of pairs have distortion ≤ c log 2 = c • + ¼ of pairs have distortion ≤ c log 4 = 2c • + ⅛ of pairs have distortion ≤ c log 8 = 3c • …. • Average distortion = O(1) • Worst-case distortion = O(log n) • lq-distortion = O(min{q, log n})
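
The bucket argument on this slide can be written out as a short calculation. This is a sketch with logarithms taken base 2 and c treated as a constant: the i-th bucket holds a $2^{-i}$ fraction of the pairs, each with distortion at most $D(2^{-i}) = c\,i$.

```latex
\begin{align*}
\mathrm{avgdist}(f)
  &\le \sum_{i \ge 1} 2^{-i}\, D\!\left(2^{-i}\right)
   = \sum_{i \ge 1} 2^{-i}\, c\, i
   = 2c = O(1), \\
\mathrm{dist}_q(f)
  &\le \Bigl( \sum_{i \ge 1} 2^{-i}\, (c\, i)^q \Bigr)^{1/q}
   = O(q).
\end{align*}
```

The worst case follows by taking ε below $1/\binom{n}{2}$, so that no pair is excluded, which gives D(ε) = O(log n).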

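  36. Coarse Scaling Embedding into Lp • Definition: For u ∈ X, $r_\varepsilon(u)$ is the minimal radius such that $|B(u, r_\varepsilon(u))| \ge \varepsilon n$. • Coarse scaling embedding: For each u ∈ X, preserves distances to v s.t. $d(u,v) \ge r_\varepsilon(u)$. • [Figure: points u, v, w with balls of radii $r_\varepsilon(u)$, $r_\varepsilon(v)$, $r_\varepsilon(w)$]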

  37. Scaling Distortion • Claim: If $d(x,y) \ge r_\varepsilon(x)$ then $1 \le \mathrm{dist}_f(x,y) \le O(\log 1/\varepsilon)$ • Let l be the scale such that $d(x,y) \le \Delta_l < 4d(x,y)$ • Lower bound: E[|f(x)-f(y)|] ≥ Ω(d(x,y)) • Upper bound for high diameter terms • Upper bound for low diameter terms • Replicate D = Θ(log n) times to get high probability.

  38. Upper Bound for high diameter terms: |f(x)-f(y)| ≤ O(log 1/ε) d(x,y) • Scale l such that $r_\varepsilon(x) \le d(x,y) \le \Delta_l < 4d(x,y)$.

  39. Upper Bound for low diameter terms: |f(u)-f(v)| = O(1) d(u,v) • Scale l such that $d(x,y) \le \Delta_l < 4d(x,y)$. • All lower levels i ≤ l are bounded by $\Delta_i$.

  40. Embedding into trees with Constant Average Distortion • [ABN 07a]: An embedding of any n-point metric into a single ultrametric. • An embedding of any graph on n vertices into a spanning tree of the graph. • Average distortion = O(1). • L2-distortion = • Lq-distortion = $\Theta(n^{1-2/q})$, for 2 < q ≤ ∞

  41. Conclusion • Developing mathematical theory of embedding of finite metric spaces • Fruitful interaction between computer science and pure/applied mathematics • New concepts of embedding yield surprisingly strong properties

  42. Summary • Unified framework for embedding finite metrics. • Probabilistic embedding into ultrametrics. • Metric Ramsey theorems. • New measures of distortion. • Embeddings with strong properties: • Optimal scaling distortion. • Constant average distortion. • Tight distortion-dimension tradeoff. • Embedding metrics in their intrinsic dimension. • Embeddings that strongly preserve locality.
