
Advanced Machine Learning & Perception

Advanced Machine Learning & Perception. Instructor: Tony Jebara. Topic 12: Manifold Learning (Unsupervised), Beyond Principal Components Analysis (PCA), Multidimensional Scaling (MDS), Generative Topographic Map (GTM), Locally Linear Embedding (LLE), Convex Invariance Learning (CoIL).


Presentation Transcript


  1. Advanced Machine Learning & Perception. Instructor: Tony Jebara, Columbia University

  2. Topic 12 • Manifold Learning (Unsupervised) • Beyond Principal Components Analysis (PCA) • Multidimensional Scaling (MDS) • Generative Topographic Map (GTM) • Locally Linear Embedding (LLE) • Convex Invariance Learning (CoIL) • Kernel PCA (KPCA)

  3. Manifolds • Data is often embedded in a lower dimensional space • Consider an image of a face being translated from left to right • How do we capture the true coordinates of the data on the manifold or embedding space and represent them compactly? • Open problem: many possible approaches… • PCA: linear manifold • MDS: get inter-point distances, find 2D data with the same distances • LLE: mimic neighborhoods using low dimensional vectors • GTM: fit a grid of Gaussians to the data via a nonlinear warp • Linear after nonlinear normalization/invariance of the data • Linear in Hilbert space (kernels)

  4. Principal Components Analysis • If we have the eigenvectors, mean and coefficients, we can reconstruct the data (a sketch of these computations follows this slide) • Getting the eigenvectors (i.e. approximating the covariance) • Eigenvectors are orthonormal • In the coordinates of v, the Gaussian is diagonal with covariance Λ • All eigenvalues are non-negative • Higher eigenvalues mean higher variance, so use those directions first • To compute the coefficients, project the centered data onto the eigenvectors
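
The PCA equations on this slide were rendered as figures in the original deck. Below is a minimal numpy sketch of the same steps, assuming a data matrix X of shape (N, D); the function name and the n_components argument are illustrative, not from the slides.

```python
import numpy as np

def pca(X, n_components):
    """PCA via eigendecomposition of the sample covariance.

    X is (N, D); returns the mean, the top eigenvectors (columns of V)
    and each point's coefficients in those coordinates.
    """
    mu = X.mean(axis=0)
    Xc = X - mu
    cov = Xc.T @ Xc / X.shape[0]           # approximate the covariance
    evals, evecs = np.linalg.eigh(cov)     # eigenvalues ascending, all >= 0
    order = np.argsort(evals)[::-1]        # highest-variance directions first
    V = evecs[:, order[:n_components]]     # orthonormal eigenvectors
    C = Xc @ V                             # coefficients: project centered data onto eigenvectors
    return mu, V, C

# Reconstruction from the mean, eigenvectors and coefficients (first bullet):
# X_hat = mu + C @ V.T
```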

  5. Multidimensional Scaling (MDS) • Idea: capture only the distances between points X in the original space • Construct another set of low-dimensional or 2D points Y having the same distances • A dissimilarity d(x,y) is a function of two objects x and y that is non-negative, zero when x = y, and symmetric • A metric also has to satisfy the triangle inequality (the defining conditions are written out after this slide) • Standard example: the Euclidean l2 metric • Assume that for N objects we compute an N×N dissimilarity matrix D which tells us how far apart they are
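
The defining conditions referred to above were equation images in the slides; the standard statements, consistent with the bullets, are:

```latex
% Standard dissimilarity and metric conditions
\begin{align*}
& d(x,y) \ge 0, \qquad d(x,x) = 0, \qquad d(x,y) = d(y,x)
  && \text{(dissimilarity)}\\
& d(x,z) \le d(x,y) + d(y,z)
  && \text{(metric: triangle inequality)}\\
& d(x,y) = \|x - y\|_2 = \Big(\textstyle\sum_i (x_i - y_i)^2\Big)^{1/2}
  && \text{(Euclidean $\ell_2$ example)}
\end{align*}
```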

  6. Multidimensional Scaling • Given the dissimilarity D between the original X points under the original d() metric, find Y points with dissimilarity D’ under another d’() metric such that D’ is similar to D • Want to find Y’s that minimize some measure of difference between D’ and D • E.g. Least Squares Stress • E.g. Invariant Stress • E.g. Sammon Mapping • E.g. Strain • Some of these criteria are global and some are local; minimize them by gradient descent (a least-squares-stress sketch follows this slide)
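
The stress and strain formulas on this slide were also figures. Below is a hedged numpy sketch of minimizing the least-squares stress, sum_ij (D_ij - D'_ij)^2, by plain gradient descent; the learning rate, iteration count and initialization scale are illustrative choices, not values from the lecture.

```python
import numpy as np

def mds_least_squares(D, dim=2, lr=0.01, n_iter=1000, seed=0):
    """Find low-dimensional points Y whose pairwise distances D' match a
    given dissimilarity matrix D, by gradient descent on the least-squares
    stress sum_ij (D_ij - D'_ij)^2."""
    N = D.shape[0]
    rng = np.random.default_rng(seed)
    Y = rng.normal(scale=1e-2, size=(N, dim))   # small random initialization
    for _ in range(n_iter):
        diff = Y[:, None, :] - Y[None, :, :]        # pairwise differences (N, N, dim)
        Dp = np.sqrt((diff ** 2).sum(-1) + 1e-12)   # current distances D'
        # gradient of the stress with respect to each y_i
        G = ((Dp - D) / Dp)[:, :, None] * diff
        Y -= lr * 4.0 * G.sum(axis=1)
    return Y
```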

  7. MDS Example 3D to 2D • Have distances from cities to cities; these lie on the surface of a sphere (the Earth) in 3D space • The reconstructed 2D points on a plane capture the essential properties (but what happens at the poles?)

  8. MDS Example Multi-D to 2D • A more elaborate example • Have a correlation matrix between crimes; these are of arbitrary dimensionality • Hack: convert correlation to dissimilarity and show the reconstructed Y

  9. Locally Linear Embedding • Instead of distances, look at the neighborhood of each point • Preserve the reconstruction of each point from its neighbors in the low-dimensional space • Find the K nearest neighbors of each point • Describe each neighborhood as the best weights on the neighbors for reconstructing the point • Find the best low-dimensional vectors that still have the same weights (the two cost functions are written out after this slide) • Why does this work? Because the reconstruction weights capture the local geometry of the manifold
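
The two cost functions referenced above appeared as images; in the standard LLE formulation of Roweis & Saul they read:

```latex
% LLE reconstruction cost (solved for W) and embedding cost (solved for Y)
\begin{align*}
\varepsilon(W) &= \sum_i \Big\| x_i - \sum_j W_{ij}\, x_j \Big\|^2,
  \qquad \text{s.t. } \sum_j W_{ij} = 1
  \text{ and } W_{ij} = 0 \text{ if } x_j \text{ is not a neighbor of } x_i,\\
\Phi(Y) &= \sum_i \Big\| y_i - \sum_j W_{ij}\, y_j \Big\|^2,
  \qquad \text{with the } W_{ij} \text{ held fixed from the first step.}
\end{align*}
```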

  10. Locally Linear Embedding • Finding the W’s (a convex combination of weights on the neighbors): 1) Take the derivative and set it to 0 2) Solve the linear system 3) Find the Lagrange multiplier λ 4) Find w (a sketch follows this slide)
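
A minimal numpy sketch of this weight-finding step, assuming the K nearest neighbors have already been found; the regularization constant reg is an illustrative choice that keeps the local covariance invertible when K exceeds the input dimensionality.

```python
import numpy as np

def lle_weights(X, neighbors, reg=1e-3):
    """Solve for the reconstruction weights W: for each point, the weights on
    its K nearest neighbors that best reconstruct it, constrained to sum to 1.
    `neighbors[i]` holds the indices of the K neighbors of point i."""
    N, K = len(X), len(neighbors[0])
    W = np.zeros((N, N))
    for i in range(N):
        Z = X[neighbors[i]] - X[i]             # shift neighbors so point i is at the origin
        C = Z @ Z.T                            # local K x K covariance
        C += reg * np.trace(C) * np.eye(K)     # regularize in case C is singular
        w = np.linalg.solve(C, np.ones(K))     # solve the linear system C w = 1
        W[i, neighbors[i]] = w / w.sum()       # rescale so the weights sum to one
    return W
```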

  11. Locally Linear Embedding • Finding the Y’s (new low-dimensional points that agree with the W’s) • Solve for Y as the bottom d+1 eigenvectors of M, discarding the very bottom (constant) eigenvector (a sketch follows this slide) • Plot the Y values
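
A sketch of the embedding step, assuming the weight matrix W from the previous slide; d is the target dimensionality.

```python
import numpy as np

def lle_embed(W, d=2):
    """Given the weight matrix W, find the embedding Y that minimizes
    sum_i ||y_i - sum_j W_ij y_j||^2, i.e. the bottom eigenvectors of
    M = (I - W)^T (I - W)."""
    N = W.shape[0]
    M = (np.eye(N) - W).T @ (np.eye(N) - W)
    evals, evecs = np.linalg.eigh(M)       # eigenvalues in ascending order
    # take the bottom d+1 eigenvectors and drop the first (constant) one
    return evecs[:, 1:d + 1]
```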

  12. LLE Examples • The original X data are raw images • The dots are the reconstructed two-dimensional Y points

  13. LLEs • Top = PCA • Bottom = LLE

  14. Generative Topographic Map • A principled alternative to the Kohonen map • Forms a generative model of the manifold; we can sample from it, etc. • Find a nonlinear mapping y() from a 2D grid of Gaussians • Pick the parameters W of the mapping such that the mapped Gaussians in data space maximize the likelihood of the observed data • Have two spaces: the data space t (the old notation was X) and the hidden latent space x (the old notation was Y) • The mapping goes from latent space to observed space

  15. GTM as a Grid of Gaussians • We choose our priors and conditionals for all variables of interest (written out after this slide) • Assume Gaussian noise on the y() mapping • Assume our prior latent variables are a grid model, equally spaced in latent space • Can now write out the full likelihood
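
The prior, conditional and likelihood referred to on this slide were figures; in the standard GTM formulation (Bishop, Svensén & Williams), which matches the bullets here, they are:

```latex
% GTM prior over latent points, conditional over data, and marginal likelihood
\begin{align*}
p(\mathbf{x}) &= \frac{1}{K} \sum_{k=1}^{K} \delta(\mathbf{x} - \mathbf{x}_k)
  && \text{grid of $K$ equally spaced latent points}\\
p(\mathbf{t} \mid \mathbf{x}, W, \beta)
  &= \Big(\tfrac{\beta}{2\pi}\Big)^{D/2}
     \exp\!\Big(-\tfrac{\beta}{2}\,\big\| \mathbf{y}(\mathbf{x}; W) - \mathbf{t} \big\|^2\Big)
  && \text{Gaussian noise on the mapping}\\
p(\mathbf{t} \mid W, \beta)
  &= \int p(\mathbf{t} \mid \mathbf{x}, W, \beta)\, p(\mathbf{x})\, d\mathbf{x}
   = \frac{1}{K} \sum_{k=1}^{K} p(\mathbf{t} \mid \mathbf{x}_k, W, \beta)
\end{align*}
```

The full likelihood of the data set is the product of p(t_n | W, β) over the N observations; taking its log gives the log-sum referred to on the next slide.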

  16. GTM Distribution Model • Integrating over the delta functions turns the integral into a summation • Note the log-sum; we need to apply EM to maximize it • Also, use a parametric form of the mapping that is linear in the basis functions, y(x;W) = Wφ(x) (a sketch follows this slide) • Examples of manifolds for randomly chosen W mappings • Typically, we are given the data and want to find the maximum-likelihood mapping W for it…
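
A minimal numpy sketch of this model: the linear-in-basis mapping y(x;W) = Wφ(x) with Gaussian RBF basis functions, and the resulting log-likelihood (the log-sum that EM would maximize). The basis width and the shapes assumed for the latent grid, the basis centers and W are illustrative, not values from the lecture.

```python
import numpy as np

def rbf_basis(latent_points, centers, width=1.0):
    """phi(x): Gaussian RBF basis functions evaluated at each latent point.
    latent_points is (K, 2), centers is (M, 2); returns a (K, M) matrix."""
    d2 = ((latent_points[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * width ** 2))

def gtm_log_likelihood(T, latent_points, centers, W, beta):
    """Log-likelihood of data T (N x D) under the GTM mixture: each latent
    grid point x_k maps to a Gaussian in data space centred at y_k = W phi(x_k)
    (W stored as an (M, D) matrix so Y = Phi @ W), with isotropic noise
    precision beta and uniform mixing weights 1/K."""
    Phi = rbf_basis(latent_points, centers)              # (K, M)
    Y = Phi @ W                                          # mapped Gaussian centres, (K, D)
    K, D = Y.shape
    d2 = ((T[:, None, :] - Y[None, :, :]) ** 2).sum(-1)  # (N, K) squared distances
    log_comp = 0.5 * D * np.log(beta / (2.0 * np.pi)) - 0.5 * beta * d2
    # log of the mixture: the log-sum over the K grid Gaussians that calls for EM
    m = log_comp.max(axis=1, keepdims=True)
    log_mix = m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1)) - np.log(K)
    return log_mix.sum()
```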

  17. GTM Examples • Recover the non-linear manifold by warping the grid with the W parameters • Synthetic example: Left = Initialized, Right = Converged • Real example: Oil data, 3 classes, Left = GTM, Right = PCA
