
A Nonlinear Approach to Dimension Reduction


Presentation Transcript


  1. A Nonlinear Approach to Dimension Reduction
  Robert Krauthgamer, Weizmann Institute of Science
  Joint work with Lee-Ad Gottlieb

  2. Data As High-Dimensional Vectors
  • Data is often represented by vectors in R^d:
  • Image → color or intensity histogram
  • Document → word frequency
  • A typical goal – Nearest Neighbor Search: preprocess the data so that, given a query vector, the closest vector in the data set can be found quickly (see the sketch below)
  • Common in various data-analysis tasks: classification, learning, clustering
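
A minimal brute-force sketch of nearest neighbor search over vectors in R^d, the naive baseline that dimension reduction aims to speed up; the data and dimensions below are illustrative, not from the talk.

```python
import numpy as np

def nearest_neighbor(data, query):
    """Return the index of the data vector closest to `query` in l2 distance."""
    dists = np.linalg.norm(data - query, axis=1)  # distance to every data vector
    return int(np.argmin(dists))

# Toy usage: 1000 vectors in R^64 (e.g. color histograms), one query.
data = np.random.default_rng(0).normal(size=(1000, 64))
query = np.random.default_rng(1).normal(size=64)
print(nearest_neighbor(data, query))
```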

  3. Curse of Dimensionality [Bellman'61]
  • The cost of maintaining data is exponential in the dimension
  • This observation extends to many useful operations, e.g. Nearest Neighbor Search [Clarkson'94]
  • Dimension reduction = represent high-dimensional data in a low-dimensional space:
  • Map the given vectors into a low-dimensional space, while preserving most of the data
  • Goal: trade off accuracy for computational efficiency
  • Common interpretation: preserve pairwise distances

  4. The JL Lemma
  Theorem [Johnson-Lindenstrauss, 1984]: For every n-point set X ⊆ R^m and 0 < ε < 1, there is a map φ: X → R^k, for k = O(ε^{-2} log n), that preserves all distances within 1+ε:
  ||x-y||_2 ≤ ||φ(x)-φ(y)||_2 ≤ (1+ε)·||x-y||_2 for all x,y ∈ X.
  • Can be realized by a simple linear transformation: a random k×m matrix works, with entries from {-1,0,1} [Achlioptas'01] or Gaussian entries [Dasgupta-Gupta'98, Indyk-Motwani'98] (a sketch follows)
  • Applications in a host of problems in computational geometry
  • Can we do better? A nearly-matching lower bound was given by [Alon'03], but it is existential…
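
A minimal sketch of the Gaussian variant of a JL projection; the constant in the choice of k is one common choice, not taken from the talk.

```python
import numpy as np

def jl_project(X, eps, rng=np.random.default_rng(0)):
    """Project the rows of X (n x m) down to k = O(eps^-2 log n) dimensions."""
    n, m = X.shape
    k = int(np.ceil(4 * np.log(n) / eps**2))   # one common choice of constant
    A = rng.normal(size=(k, m)) / np.sqrt(k)   # random Gaussian matrix, rescaled
    return X @ A.T

# Usage: pairwise distances are preserved within ~(1+eps) with high probability.
X = np.random.default_rng(1).normal(size=(100, 10_000))
Y = jl_project(X, eps=0.5)
print(np.linalg.norm(X[0] - X[1]), np.linalg.norm(Y[0] - Y[1]))
```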

  5. Doubling Dimension
  • Definition: the ball B(x,r) = all points within distance r from x
  • The doubling constant λ (of a metric M) is the minimum value λ > 0 such that every ball can be covered by λ balls of half the radius (in the slide's figure, λ ≤ 7)
  • First used by [Assouad'83], algorithmically by [Clarkson'97]
  • The doubling dimension is dim(M) = log λ(M) [Gupta-K.-Lee'03]
  • Applications:
  • Approximate nearest neighbor search [Clarkson'97, K.-Lee'04, …, Cole-Gottlieb'06, Indyk-Naor'06, …]
  • Spanners [Talwar'04, …, Gottlieb-Roditty'08]
  • Compact routing [Chan-Gupta-Maggs-Zhou'05, …]
  • Network triangulation [Kleinberg-Slivkins-Wexler'04, …]
  • Distance oracles [Har-Peled-Mendel'06]
  • Embeddings [Gupta-K.-Lee'03, K.-Lee-Mendel-Naor'04, Abraham-Bartal-Neiman'08]
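
A rough empirical illustration of the doubling property: greedily cover a ball of radius r with balls of radius r/2 centered at data points. This heuristic sketch only bounds the cover size for the sample; it is not an exact computation of the doubling constant.

```python
import numpy as np

def half_radius_cover(P, center, r):
    """Greedily cover B(center, r) within P by radius-r/2 balls; return their number."""
    inside = P[np.linalg.norm(P - center, axis=1) <= r]
    count = 0
    while len(inside):
        c = inside[0]                                                # uncovered point
        inside = inside[np.linalg.norm(inside - c, axis=1) > r / 2]  # drop covered
        count += 1
    return count

# Points in the plane have doubling dimension O(1), so the count stays small
# no matter how many sample points fall inside the ball.
P = np.random.default_rng(0).normal(size=(2000, 2))
print(half_radius_cover(P, P[0], r=1.0))
```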

  6. A Stronger Version of JL?
  Theorem [Johnson-Lindenstrauss, 1984]: Every n-point set X ⊆ ℓ_2 and 0 < ε < 1 admits a linear embedding φ: X → ℓ_2^k, for k = O(ε^{-2} log n), such that for all x,y ∈ X,
  ||x-y||_2 ≤ ||φ(x)-φ(y)||_2 ≤ (1+ε)·||x-y||_2.
  • The near-tight lower bound of [Alon'03] is existential: it holds for X = the uniform metric, where dim(X) = log n
  • Open: extend the JL lemma to spaces with doubling dimension ≪ log n?
  • Open: a JL-like embedding into dimension k = O(dim(X))? Even constant distortion would be interesting [Lang-Plaut'01, Gupta-K.-Lee'03]
  • Cannot be attained by linear transformations [Indyk-Naor'06]
  • Example [Kahane'81, Talagrand'92] (Wilson's helix): x_j = (1,1,…,1,0,…,0) ∈ R^n, so that ||x_i-x_j|| = |i-j|^{1/2}; it embeds in low dimension with distortion 1+ε (a numerical check follows)
  • We present two partial resolutions, using Õ(dim²(X)) dimensions:
  1. Distortion 1+ε at a single scale, i.e. for pairs with ||x-y|| ∈ [εr, r].
  2. A global embedding of the snowflake metric ||x-y||^{1/2}.
  2'. Hence the conjecture is correct whenever the squared distance ||x-y||² is itself a Euclidean metric.
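
A quick numerical check of Wilson's helix: with x_j having j leading ones, the vectors x_i and x_j differ in exactly |i-j| coordinates, so ||x_i-x_j||_2 = |i-j|^{1/2}, i.e. these points realize the snowflake of the real line.

```python
import numpy as np

n = 20
X = np.tril(np.ones((n, n)))    # row j has j+1 leading ones, then zeros
i, j = 4, 13
print(np.linalg.norm(X[i] - X[j]), abs(i - j) ** 0.5)  # both print 3.0
```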

  7. I. Embedding for a Single Scale
  • Theorem 1. For every finite subset X ⊆ ℓ_2 and all 0 < ε < 1, r > 0, there is an embedding f: X → ℓ_2^k for k = Õ(log(1/ε)·(dim X)²), satisfying:
  1. Lipschitz: ||f(x)-f(y)|| ≤ ||x-y|| for all x,y ∈ X
  2. Bi-Lipschitz at scale r: ||f(x)-f(y)|| ≥ Ω(||x-y||) whenever ||x-y|| ∈ [εr, r]
  3. Boundedness: ||f(x)|| ≤ r for all x ∈ X
  • Compared to the open question: bi-Lipschitz only at one scale (weaker), but the distortion is an absolute constant (stronger)
  • This talk: illustrate the proof for constant distortion; the 1+ε distortion is later attained for distances in [ε'r, r]
  • Overall approach: divide and conquer

  8. Step 1: Net Extraction
  • Extract from the point set X an εr-net
  • Net properties: εr-covering (covering radius εr) and εr-packing (packing distance εr)
  • Packing ⇒ a ball of radius s contains O((s/εr)^{dim(X)}) net points
  • We shall build a good embedding for the net points
  • Why can we ignore the non-net points?
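
A minimal sketch of greedy net extraction: repeatedly pick an uncovered point and discard everything within distance delta of it. The result is a delta-covering and delta-packing; here delta plays the role of εr.

```python
import numpy as np

def greedy_net(P, delta):
    """Return indices of a delta-net of the point set P (an n x d array)."""
    remaining = list(range(len(P)))
    net = []
    while remaining:
        i = remaining[0]              # an uncovered point becomes a net point
        net.append(i)
        remaining = [j for j in remaining
                     if np.linalg.norm(P[j] - P[i]) > delta]  # keep only uncovered
    return net

P = np.random.default_rng(0).uniform(size=(500, 2))
net = greedy_net(P, delta=0.1)       # every point is within 0.1 of the net;
print(len(net))                      # net points are pairwise more than 0.1 apart
```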

  9. Step 1: Net Extraction and Extension
  • Recall: ||f||_Lip = min {L ≥ 0 : ||f(x)-f(y)|| ≤ L·||x-y|| for all x,y}
  • Lipschitz extension theorem [Kirszbraun'34]: for every X ⊆ ℓ_2, every map f: X → ℓ_2^k can be extended to f': ℓ_2 → ℓ_2^k such that ||f'||_Lip ≤ ||f||_Lip
  • Therefore, a good embedding just for the net points suffices
  • Smaller net resolution → less distortion for the non-net points

  10. Step 2: Padded Decomposition
  • Partition the space probabilistically into clusters
  • Properties [Gupta-K.-Lee'03, Abraham-Bartal-Neiman'08]:
  • Cluster diameter: bounded by dim(X)·r
  • Size: by the doubling property, bounded by (dim(X)/ε)^{dim(X)} points
  • Padding: each point is r-padded (its whole r-ball lies inside its own cluster) with probability > 9/10
  • Support: O(dim(X)) partitions
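
A minimal sketch of one standard way to build such a partition, randomized ball carving: visit centers in random order and carve a ball of random radius around each. The radius distribution and parameters below are illustrative stand-ins, not the talk's construction, which guarantees diameter dim(X)·r and padding probability 9/10.

```python
import numpy as np

def ball_carve(P, diameter, rng=np.random.default_rng(0)):
    """Assign each row of P a cluster id; every cluster has diameter <= `diameter`."""
    n = len(P)
    cluster = -np.ones(n, dtype=int)
    for cid, c in enumerate(rng.permutation(n)):      # random order of centers
        R = rng.uniform(diameter / 4, diameter / 2)   # random carving radius
        near = np.linalg.norm(P - P[c], axis=1) <= R
        cluster[(cluster == -1) & near] = cid         # claim unassigned points only
    return cluster

P = np.random.default_rng(1).uniform(size=(300, 2))
print(ball_carve(P, diameter=0.5)[:20])
```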

  11. Step 3: JL on Individual Clusters
  • For each partition, consider each individual cluster
  • Reduce the dimension using the JL lemma: constant distortion
  • Target dimension (logarithmic in the cluster size): O(log((dim(X)/ε)^{dim(X)})) = Õ(dim(X))
  • Then translate some point to the origin

  12. The story so far…
  • To review:
  • Step 1: Extract net points
  • Step 2: Build a family of partitions
  • Step 3: For each partition, apply JL in each cluster and translate to the origin
  • Embedding guarantees for a single partition:
  • Intracluster distances: constant distortion
  • Intercluster distances: min distance 0, max distance dim(X)·r. Not good enough!
  • Let's backtrack…

  13. The story so far…
  • To review:
  • Step 1: Extract net points
  • Step 2: Build a family of partitions
  • Step 3: For each partition, apply a Gaussian transform in each cluster
  • Step 4: For each partition, apply JL in each cluster and translate to the origin
  • Embedding guarantees for a single partition:
  • Intracluster distances: constant distortion
  • Intercluster distances: min distance 0, max distance dim(X)·r
  • Not good enough when ||x-y|| ≈ r
  • Let's backtrack…

  14. Step 3: Gaussian Transform
  • For each partition, apply within each cluster the Gaussian transform (kernel) to the distances [Schoenberg'38]: G(t) = (1-e^{-t²})^{1/2}
  • Adaptation to scale r: G_r(t) = r·(1-e^{-t²/r²})^{1/2}
  • This is realized by a map f: ℓ_2 → ℓ_2 such that ||f(x)-f(y)|| = G_r(||x-y||)
  • Threshold: the cluster diameter becomes at most r (instead of dim(X)·r)
  • Distortion: small distortion of distances in the relevant range
  • The transform can increase the dimension… but JL is the next step
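
A small numeric illustration of the scaled transform G_r: it is nearly the identity for t well below r (small distortion in the relevant range) yet never exceeds r (thresholding), matching the two properties above.

```python
import numpy as np

def G_r(t, r):
    """Scaled Gaussian transform: G_r(t) = r * sqrt(1 - exp(-t^2 / r^2))."""
    return r * np.sqrt(1.0 - np.exp(-(t / r) ** 2))

r = 1.0
for t in [0.1, 0.5, 1.0, 5.0, 100.0]:
    print(t, G_r(t, r))   # ~t for small t; approaches r = 1.0 for large t
```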

  15. Step 4: JL on Individual Clusters
  • Steps 3 & 4: the Gaussian transform gives a smaller diameter, then JL gives a smaller dimension
  • New embedding guarantees:
  • Intracluster: constant distortion
  • Intercluster: min distance 0, max distance r (instead of dim(X)·r)
  • Caveat: still not good enough when ||x-y|| ≈ r
  • Also smooth the map near the cluster boundaries

  16. Step 5: Glue Partitions
  • We have an embedding for each partition
  • For padded points the guarantees are perfect; for non-padded points the guarantees are weak
  • "Glue" together the embeddings of the different partitions: concatenate all O(dim(X)) embeddings (and scale down)
  • The non-padded case occurs ≤ 1/10 of the time, so it gets "averaged away":
  ||F(x)-F(y)||² = ||f_1(x)-f_1(y)||² + … + ||f_m(x)-f_m(y)||² ≈ m·||x-y||²
  • Example: f_1(x) = (1,7), f_2(x) = (2,8), f_3(x) = (4,9) ⇒ F(x) = f_1(x) ⊕ f_2(x) ⊕ f_3(x) = (1,7,2,8,4,9)
  • Final dimension = Õ(dim²(X)): O(dim(X)) partitions, Õ(dim(X)) dimensions per partition
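
A minimal sketch of the gluing step: concatenate the per-partition embeddings and rescale by sqrt(m), so that squared distances average across partitions. The maps f1, f2, f3 below are toy stand-ins reproducing the slide's example, not the talk's actual per-partition embeddings.

```python
import numpy as np

def glue(embeddings, x):
    """Concatenate the maps in `embeddings` applied to x, scaled by 1/sqrt(m)."""
    m = len(embeddings)
    return np.concatenate([f(x) for f in embeddings]) / np.sqrt(m)

f1 = lambda x: np.array([1.0, 7.0])   # toy constant maps, as on the slide
f2 = lambda x: np.array([2.0, 8.0])
f3 = lambda x: np.array([4.0, 9.0])

x = np.zeros(1)
print(glue([f1, f2, f3], x))          # (1,7,2,8,4,9) / sqrt(3)
```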

  17. Step 6: Kirszbraun Extension Theorem
  • Kirszbraun's theorem extends the embedding from the net points to the non-net points, without increasing the dimension
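
Kirszbraun's theorem concerns maps between Hilbert spaces and is not trivially constructive. As a simpler illustration of Lipschitz extension, here is the McShane formula for a real-valued function, which extends f from a finite set to all of R^d with the same Lipschitz constant; it is an analogue for intuition, not the construction used in the talk.

```python
import numpy as np

def mcshane_extend(points, values, L):
    """Extend f (defined on `points` with Lipschitz constant L) to any query q."""
    def f_ext(q):
        dists = np.linalg.norm(points - q, axis=1)
        return float(np.min(values + L * dists))  # inf over p of f(p) + L*d(q,p)
    return f_ext

pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
vals = np.array([0.0, 1.0, 2.0])                  # Lipschitz with constant L = 2
f = mcshane_extend(pts, vals, L=2.0)
print(f(pts[0]), f(np.array([0.5, 0.5])))         # agrees with vals on the net
```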

  18. Single Scale Embedding – Review
  • Steps:
  1. Net extraction
  2. Padded decomposition
  3. Gaussian transform
  4. JL
  5. Glue partitions
  6. Extension theorem
  • Theorem 1: every finite X ⊆ ℓ_2 admits an embedding f: X → ℓ_2^k for k = Õ((dim X)²), such that:
  1. Lipschitz: ||f(x)-f(y)|| ≤ ||x-y|| for all x,y ∈ X
  2. Bi-Lipschitz at scale r: ||f(x)-f(y)|| ≥ Ω(||x-y||) whenever ||x-y|| ∈ [εr, r]
  3. Boundedness: ||f(x)|| ≤ r for all x ∈ X

  19. Single Scale Embedding – Strengthened
  • Steps, with the strengthening of each:
  1. Net extraction → finer nets
  2. Padded decomposition → padding probability 1-ε
  3. Gaussian transform
  4. JL → already 1+ε distortion
  5. Glue partitions → a higher percentage of padded points
  6. Extension theorem
  • Theorem 1: every finite X ⊆ ℓ_2 admits an embedding f: X → ℓ_2^k for k = Õ((dim X)²), such that:
  1. Lipschitz: ||f(x)-f(y)|| ≤ ||x-y|| for all x,y ∈ X
  2. Gaussian at scale r: ||f(x)-f(y)|| = (1±ε)·G_r(||x-y||) whenever ||x-y|| ∈ [εr, r]
  3. Boundedness: ||f(x)|| ≤ r for all x ∈ X

  20. II. Snowflake Embedding
  • Theorem 2. For all 0 < ε < 1, every finite subset X ⊆ ℓ_2 admits an embedding F: X → ℓ_2^k, for k = Õ(ε^{-4}·(dim X)²), with distortion 1+ε for the snowflake metric ||x-y||^{1/2}
  • We'll illustrate the construction for constant distortion
  • The snowflake technique is due to [Assouad'83]; we give the first implementation achieving distortion 1+ε

  21. II. Snowflake Embedding
  • Theorem 2. For every 0 < ε < 1 and finite subset X ⊆ ℓ_2, there is an embedding F: X → ℓ_2^k of the snowflake metric ||x-y||^{1/2}, achieving dimension k = Õ(ε^{-4}·(dim X)²) and distortion 1+ε
  • Compared to the open question: we embed only the snowflake metric (weaker), but achieve distortion 1+ε (stronger)
  • We generalize [Kahane'81, Talagrand'92], who study Euclidean embeddings of Wilson's helix (the real line with distances |x-y|^{1/2})
  • We'll illustrate the construction for constant distortion
  • The snowflake technique is due to [Assouad'83]; we give the first implementation that achieves distortion 1+ε

  22. Assouad's Technique
  • Basic idea: consider the single-scale embeddings at all scales r = 2^i
  • Fix points x,y ∈ X, and suppose ||x-y|| ≈ s; the relevant scales are r = 16s, 8s, 4s, 2s, s, s/2, s/4, s/8, s/16, …
  • Each single-scale embedding f satisfies:
  • Lipschitz: ||f(x)-f(y)|| ≤ ||x-y|| = s
  • Gaussian: ||f(x)-f(y)|| = (1±ε)·G_r(||x-y||)
  • Boundedness: ||f(x)|| ≤ r

  23. Assouad's Technique
  • Now scale down each embedding by r^{1/2} (the snowflake scaling). The contribution ||f(x)-f(y)||/r^{1/2} at each scale:
  • r = 16s: s → s^{1/2}/4
  • r = 8s: s → s^{1/2}/8^{1/2}
  • r = 4s: s → s^{1/2}/2
  • r = 2s: s → s^{1/2}/2^{1/2}
  • r = s: s → s^{1/2}
  • r = s/2: s/2 → s^{1/2}/2^{1/2}
  • r = s/4: s/4 → s^{1/2}/2
  • r = s/8: s/8 → s^{1/2}/8^{1/2}
  • r = s/16: s/16 → s^{1/2}/4
  • Combine these embeddings by addition (and staggering); a numeric sketch follows
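
A toy numeric check of the geometric decay in this table: for a pair at distance s, the scale-r embedding contributes at most roughly min(s, r)/r^{1/2} after the scale-down (the Lipschitz bound for r ≥ s, boundedness for r ≤ s). The contributions decay geometrically away from r ≈ s, so the sum over all scales is a constant multiple of s^{1/2}.

```python
s = 1.0
total = 0.0
for i in range(-20, 21):              # scales r = 2^i around s = 1
    r = 2.0 ** i
    total += min(s, r) / r ** 0.5     # upper bound on this scale's contribution
print(total)                          # a constant multiple of s^{1/2} = 1.0
```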

  24. Snowflake Embedding – Review
  • Steps:
  1. Compute single-scale embeddings for all scales r = 2^i
  2. Scale down the embedding for scale r by r^{1/2}
  3. Combine the embeddings by addition (with some staggering)
  • By taking more refined scales (powers of 1+ε instead of 2) and further staggering, distortion 1+ε can be achieved for the snowflake
  • Theorem 2. For all 0 < ε < 1, every finite subset X ⊆ ℓ_2 embeds into ℓ_2^k, for k = Õ(ε^{-4}·(dim X)²), with distortion 1+ε for the snowflake

  25. Conclusion
  • We gave two (1+ε)-distortion, low-dimension embeddings for doubling subsets of ℓ_2: single scale and snowflake
  • This framework can be extended to ℓ_1 and ℓ_∞, with some obstacles:
  • Dimension reduction: can't use JL
  • Lipschitz extension: can't use Kirszbraun
  • Threshold: can't use the Gaussian transform
  • Many of the steps in the single-scale embedding are nonlinear, although most "localities" are mapped (nearly) linearly
  • Could this explain the empirical success of nonlinear methods (e.g. Locally Linear Embedding)?
  • Applications? Clustering is one potential area…
  Thank you!
