Learning Near-Isometric Linear Embeddings

    Presentation Transcript
    1. Learning Near-Isometric Linear Embeddings • Chinmay Hegde (MIT), Aswin Sankaranarayanan (CMU), Wotao Yin (UCLA), Edward Snowden (Ex-NSA), Richard Baraniuk (Rice University)

    2. NSA PRISM 4972 Gbps Source: Wikipedia.org

    3. NSA PRISM 4972 Gbps Source: Wikipedia.org

    4. NSA PRISM Source: Wikipedia.org

    5. NSA PRISM Source: Wikipedia.org

    6. NSA PRISM DIMENSIONALITY REDUCTION Source: Wikipedia.org

    7. Large Scale Datasets

    8. Intrinsic Dimensionality • Why? Geometry, that’s why • Exploit to perform more efficient analysis and processing of large-scale data • Intrinsic dimension << Extrinsic dimension!

    9. Dimensionality Reduction • Goal: create a (linear) mapping from R^N to R^M with M < N that preserves the key geometric properties of the data, e.g., the configuration of the data points

    10. Dimensionality Reduction • Given a training set of signals, find the “best” linear map Ψ that preserves its geometry

    11. Dimensionality Reduction • Given a training set of signals, find the “best” linear map Ψ that preserves its geometry • Approach 1: PCA via SVD of the training signals • find the average best-fitting subspace in the least-squares sense • the average error metric can distort point-cloud geometry
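    As a concrete illustration of Approach 1, here is a minimal numpy sketch of PCA computed via the SVD of the mean-centered training signals; the variable names and sizes (X, Q, N, M, Psi) are illustrative rather than taken from the slides.

        import numpy as np

        # X holds Q training signals of dimension N, stacked as rows (Q x N)
        rng = np.random.default_rng(0)
        Q, N, M = 500, 256, 40
        X = rng.standard_normal((Q, N))           # placeholder training set

        Xc = X - X.mean(axis=0)                   # center the point cloud
        # SVD of the centered data; rows of Vt are the principal directions
        U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
        Psi = Vt[:M]                              # M x N linear embedding (best average subspace)
        Y = X @ Psi.T                             # embedded points in R^M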

    12. Isometric Embedding • Given a training set of signals, find the “best” linear map Ψ that preserves its geometry • Approach 2: Inspired by RIP

    13. Isometric Embedding • Given a training set of signals, find the “best” linear map Ψ that preserves its geometry • Approach 2: Inspired by RIP • but not the Restricted Itinerary Property [Maduro, Snowden ’13]

    14. Isometric Embedding • Given a training set of signals, find the “best” linear map Ψ that preserves its geometry • Approach 2: Inspired by RIP and Whitney • design Ψ to preserve inter-point distances (secants) • more faithful to training data

    15. Near-Isometric Embedding • Given a training set of signals, find the “best” linear map Ψ that preserves its geometry • Approach 2: Inspired by RIP and Whitney • design Ψ to preserve inter-point distances (secants) • more faithful to training data • but exact isometry can be too much to ask

    16. Near-Isometric Embedding • Given a training set of signals, find the “best” linear map Ψ that preserves its geometry • Approach 2: Inspired by RIP and Whitney • design Ψ to preserve inter-point distances (secants) • more faithful to training data • but exact isometry can be too much to ask

    17. Why Near-Isometry? • Sensing • guarantees existence of a recovery algorithm • Machine learning applications • kernel matrix depends only on pairwise distances • Approximate nearest neighbors for classification • efficient dimensionality reduction

    18. Existence of Near Isometries • Johnson-Lindenstrauss Lemma • Given a set of Q points, there exists a Lipschitz map that achieves near-isometry (with constant δ) provided M = O(log Q / δ²) • Random matrices with iid subGaussian entries work • cf. so-called “compressive sensing” [J-L, 84] [Frankl and Maehara, 88] [Indyk and Motwani, 99] [Achlioptas, 01] [Dasgupta and Gupta, 02]

    19. L1 Energy http://dealbook.nytimes.com/2013/06/28/oligarchs-assemble-team-for-oil-deals/?_r=0 L1 Energy

    20. Existence of Near Isometries • Johnson-Lindenstrauss Lemma • Given a set of Q points, there exists a Lipschitz map that achieves near-isometry (with constant δ) provided M = O(log Q / δ²) • Random matrices with iid subGaussian entries work • cf. so-called “compressive sensing” • Existence of a solution! • but constants are poor • oblivious to data structure [J-L, 84] [Frankl and Maehara, 88] [Indyk and Motwani, 99] [Achlioptas, 01] [Dasgupta and Gupta, 02]
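    The random-projection baseline is easy to instantiate; the sketch below (hypothetical sizes and names) draws a Gaussian matrix scaled by 1/sqrt(M) and measures its empirical isometry constant over all normalized secants of a toy point set.

        import numpy as np
        from itertools import combinations

        rng = np.random.default_rng(0)
        Q, N, M = 200, 256, 40                            # illustrative sizes
        X = rng.standard_normal((Q, N))

        Phi = rng.standard_normal((M, N)) / np.sqrt(M)    # iid Gaussian (subGaussian) entries

        # empirical isometry constant over all normalized secants
        deltas = []
        for i, j in combinations(range(Q), 2):
            v = X[i] - X[j]
            v /= np.linalg.norm(v)
            deltas.append(abs(np.linalg.norm(Phi @ v) ** 2 - 1))
        print("empirical isometry constant:", max(deltas))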

    21. Near-Isometric Embedding • Q. Can we beat random projections? • A. … • on the one hand: lower bounds for JL [Alon ’03]

    22. Near-Isometric Embedding • Q. Can we beat random projections? • A. … • on the one hand: lower bounds for JL [Alon ’03] • on the other hand: carefully constructed linear projections can often do better • Our quest: an optimization-based approach for learning “good” linear embeddings

    23. Normalized Secants • Normalized pairwise vectors v_ij = (x_i - x_j) / ||x_i - x_j|| [Whitney; Kirby; Wakin, B ’09] • Goal is to approximately preserve the length of every Ψ v_ij • Obviously, projecting along the direction of a secant is a bad idea (that secant collapses to zero)

    24. Normalized Secants • Normalized pairwise vectors v_ij = (x_i - x_j) / ||x_i - x_j|| • Goal is to approximately preserve the length of every Ψ v_ij • Note: the total number of secants is large: Q(Q-1)/2 = O(Q²)
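    A minimal sketch of forming the normalized secants; since there are Q(Q-1)/2 of them, for large Q one would subsample or stream them rather than materialize the full set (the function name is illustrative).

        import numpy as np
        from itertools import combinations

        def normalized_secants(X):
            """All Q(Q-1)/2 unit-norm pairwise difference vectors of the rows of X."""
            secants = []
            for i, j in combinations(range(len(X)), 2):
                d = X[i] - X[j]
                secants.append(d / np.linalg.norm(d))
            return np.array(secants)              # shape (Q(Q-1)/2, N)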

    25. “Good” Linear Embedding Design • Given: normalized secants v_1, …, v_Q • Seek: the “shortest” matrix Ψ (fewest rows M) such that |‖Ψ v_k‖² - 1| ≤ δ for every secant • Erratum alert: we will use Q to denote both the number of data points and the number of secants

    26. “Good” Linear Embedding Design • Given: normalized secants v_1, …, v_Q • Seek: the “shortest” matrix Ψ (fewest rows M) such that |‖Ψ v_k‖² - 1| ≤ δ for every secant

    27. “Good” Linear Embedding Design • Given: normalized secants v_1, …, v_Q • Seek: the “shortest” matrix Ψ (fewest rows M) such that |‖Ψ v_k‖² - 1| ≤ δ for every secant

    28. Lifting Trick • Convert quadratic constraints in Ψ into linear constraints in P = Ψ^T Ψ: |v_k^T P v_k - 1| ≤ δ • After designing P, obtain Ψ via a matrix square root
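    Once the lifted variable P = Ψ^T Ψ has been designed, the embedding is recovered from an eigendecomposition; a minimal sketch, where the rank tolerance is an assumption:

        import numpy as np

        def embedding_from_lifted(P, tol=1e-6):
            """Factor a PSD matrix P into Psi^T Psi; the numerical rank of P sets M."""
            w, V = np.linalg.eigh(P)                          # eigenvalues in ascending order
            keep = w > tol * max(w.max(), 1.0)                # numerical rank of P
            Psi = np.sqrt(w[keep])[:, None] * V[:, keep].T    # M x N "matrix square root"
            return Psi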

    29. Relaxation • Convert quadratic constraints in Ψ into linear constraints in P = Ψ^T Ψ • Relax rank minimization to nuclear norm minimization

    30. NuMax • Semi-Definite Program (SDP): minimize the nuclear norm of P subject to |v_k^T P v_k - 1| ≤ δ for all secants, P PSD • Nuclear norm minimization with Max-norm constraints (NuMax) • Solvable by standard interior-point techniques • Rank of the solution is determined by δ
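    For small problems the SDP on this slide can be written down directly; below is a cvxpy sketch (for a PSD variable the nuclear norm is just the trace), assuming the secants are stacked as the rows of an array. This brute-force form is exactly what the practical-considerations and ADMM slides that follow work around.

        import cvxpy as cp

        def numax_sdp(secants, delta):
            """min trace(P) s.t. |v^T P v - 1| <= delta for every secant v, with P PSD."""
            N = secants.shape[1]
            P = cp.Variable((N, N), PSD=True)
            cons = [cp.abs(cp.quad_form(v, P) - 1) <= delta for v in secants]
            prob = cp.Problem(cp.Minimize(cp.trace(P)), cons)
            prob.solve(solver=cp.SCS)
            return P.value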

    31. Practical Considerations • In practice N is large and Q is very large! • Computational cost per iteration scales rapidly with both N and Q

    32. Solving NuMax • Alternating Direction Method of Multipliers (ADMM) • solve for P using spectral thresholding • solve for L using least-squares • solve for q using “clipping” • Computational/memory cost per iteration still grows quickly with N and Q
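    The P-update referred to above is a nuclear-norm proximal step, i.e. spectral soft-thresholding of a symmetric matrix; a minimal sketch, where the threshold tau stands in for the ADMM penalty/step-size ratio (an assumption, not the talk's exact update):

        import numpy as np

        def spectral_threshold(A, tau):
            """Soft-threshold the eigenvalues of the symmetric part of A, keeping the PSD part."""
            # tau is an assumed step-size/penalty parameter
            S = (A + A.T) / 2
            w, V = np.linalg.eigh(S)
            w_shrunk = np.maximum(w - tau, 0.0)   # shrink and clip to enforce P >= 0
            return (V * w_shrunk) @ V.T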

    33. Accelerating NuMax • Poor scaling with N and Q • least squares involves matrices with Q² rows • SVD of an N×N matrix • Observation 1 • intermediate estimates of P are low-rank • use a low-rank representation to reduce memory and accelerate computations • use incremental SVD for faster computations

    34. Accelerating NuMax • Observation 2 • by the KKT conditions (complementary slackness), only constraints that are satisfied with equality determine the solution (“active constraints”) • Analogy: in support vector machines (SVMs), the solution is determined only by the support vectors – the training points whose margin constraints are active

    35. NuMax-CG • Observation 2 • by the KKT conditions (complementary slackness), only constraints that are satisfied with equality determine the solution (“active constraints”) • Hence, given a feasible solution P*, only the secants v_k for which |v_k^T P* v_k - 1| = δ determine the value of P* • Key: the number of “support secants” << the total number of secants • so we only need to track the support secants • a “column generation” approach to solving NuMax (sketched below)
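    A schematic of the column-generation loop (names, batch sizes, and tolerances are assumptions): solve NuMax over a small working set of secants with any subproblem solver, e.g. the numax_sdp sketch above, then pull in the most violated secants and repeat until every constraint is satisfied.

        import numpy as np

        def numax_cg(secants, delta, solve_subproblem, init=100, batch=100, max_iter=50):
            """Column generation: grow a working set of secants until P is feasible for all of them."""
            # init, batch, and max_iter are illustrative choices
            idx = np.arange(min(init, len(secants)))           # initial working set
            P = None
            for _ in range(max_iter):
                P = solve_subproblem(secants[idx], delta)
                # constraint slack |v^T P v - 1| - delta over *all* secants
                viol = np.abs(np.einsum('kn,nm,km->k', secants, P, secants) - 1.0) - delta
                violated = np.flatnonzero(viol > 1e-8)
                if violated.size == 0:                         # feasible everywhere: done
                    break
                worst = violated[np.argsort(viol[violated])[::-1][:batch]]
                idx = np.union1d(idx, worst)                   # add the most violated secants
            return P

    In practice one would also drop secants whose constraints become inactive, so that the working set tracks only the support secants.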

    36. Computation Time • Can solve for datasets with Q = 100k points in N = 1000 dimensions in a few hours

    37. Squares – Near Isometry • Images of translating blurred squares live on a K=2 dimensional smooth manifold in N=256 dimensional space • Project a collection of these images into M-dimensional space while preserving structure (as measured by isometry constant δ) • N=16x16=256
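    For reference, one way such a translating blurred-square dataset could be generated (the square size and blur width here are assumptions, not the talk's exact parameters):

        import numpy as np
        from scipy.ndimage import gaussian_filter

        def blurred_square(tx, ty, n=16, side=5, sigma=1.0):
            """One n x n image of a blurred square whose top-left corner sits at (tx, ty)."""
            # side and sigma are illustrative defaults
            img = np.zeros((n, n))
            img[ty:ty + side, tx:tx + side] = 1.0
            return gaussian_filter(img, sigma).ravel()         # one point in R^{n*n}

        # the translation (tx, ty) is the intrinsic coordinate: K = 2, N = 16*16 = 256
        X = np.array([blurred_square(tx, ty) for tx in range(11) for ty in range(11)])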

    38. Squares – Near Isometry • M=40 linear measurements are enough to ensure an isometry constant of δ = 0.01 • N=16x16=256

    39. Squares – Near Isometry

    40. Squares – Near Isometry

    41. Squares – Near Isometry

    42. Squares – Near Isometry

    43. Squares – CS Recovery • Signal recovery in AWGN N=16x16=256

    44. MNIST (8) – Near Isometry • N=20x20=400 • M = 14 basis functions achieve δ = 0.05

    45. MNIST (8) – Near Isometry N=20x20=400

    46. MNIST – NN Classification • MNIST dataset • N = 20x20 = 400-dim images • 10 classes: digits 0-9 • Q = 60000 training images • Nearest neighbor (NN) classifier • Test on 10000 images • Misclassification rate of the NN classifier: 3.63%
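    A sketch of the 1-NN baseline and of the same classifier applied after a linear embedding (the MNIST arrays and the learned Psi are assumed to be available; names are illustrative):

        from sklearn.neighbors import KNeighborsClassifier

        def nn_error(X_train, y_train, X_test, y_test, Psi=None):
            """1-NN misclassification rate, optionally after projecting with an M x N embedding Psi."""
            if Psi is not None:                    # Psi is the learned embedding, assumed given
                X_train, X_test = X_train @ Psi.T, X_test @ Psi.T
            clf = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)
            return 1.0 - clf.score(X_test, y_test)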

    47. MNIST – Naïve NuMax Classification • MNIST dataset • N = 20x20 = 400-dim images • 10 classes: digits 0-9 • Q = 60000 training images, so >1.8 billion secants! • NuMax-CG took 3-4 hours to process • Misclassification rate of the NN classifier: 3.63% • NuMax provides the best NN-classification rates

    48. Task Adaptivity • Prune the secants according to the task at hand • If goal is signal reconstruction, then preserve all secants • If goal is signal classification, then preserve inter-class secants differently from intra-class secants • Can preferentially weight the training set vectors according to their importance (connections with boosting)