
Manifold Learning


Presentation Transcript


  1. Manifold Learning Student: Ali Taalimi Advisor: Prof. Abidi 05/05/2012

  2. When can we avoid the curse of dimensionality? In what situations can we draw inferences from high-dimensional data without running into the curse of dimensionality? • Smoothness: the learning rate behaves like (1/n)^(s/d), where n is the number of examples needed to learn the function; if the smoothness s keeps pace with the dimensionality d, there is no curse of dimensionality. So if the function in high dimension is very smooth, and its smoothness grows at the rate of the dimensionality, the problem is solved by fitting the smoothest function to the data: splines, kernel methods, L2 regularization, … • Sparsity: maybe the function you are going to learn is not smooth, but it can be represented as a sparse combination of some basis functions (using few relevant features): wavelets, L1 regularization, LASSO, compressed sensing, … • Geometry (the most recent): graphs, simplicial complexes, Laplacians, diffusions.

  3. Geometry and Data: The Central Dogma • (A fact about natural datasets) In a very high-dimensional space, data does not distribute uniformly. The distribution of natural data is non-uniform and has some shape. Since it has a shape (geometry), it may concentrate around low-dimensional structures, because natural datasets come from systems with few free parameters. • The shape (geometry) of the distribution can be exploited for efficient learning.

  4. Manifold Learning One way of thinking about geometry is manifold learning. Manifold learning is not a single problem but a collection of problems unified by a common assumption: the data lives on or near some low-dimensional manifold embedded in a high-dimensional space. In other words, the data may have some geometry that is far from uniform, and we try to understand the consequences of that fact. You have to learn in the high-dimensional ambient space, yet the data is typically embedded in a low-dimensional structure. You want to learn a function whose natural domain is the manifold on which all the data lives, and whose range might be a finite set (clustering, classification, dimensionality reduction, …). PROBLEM: Although all the data lives near some manifold, for the most part we don't know what this manifold is. We have to discover it from the samples alone.

  5. Suppose a compact manifold embedded in Euclidean space. • Suppose I sample it and give you a collection of points that sit on this manifold. All you can see is the cloud of points. • What topology can you learn from these randomly drawn points? • How many connected components does my manifold have? (Example: given samples from a mixture of Gaussians, can you tell the number of Gaussians?) • In other words, we are trying to learn both the function and its domain simultaneously.

  6. PCA • the simplest solution: fit a linear manifold to the data • fit the best linear subspace/manifold of a given rank to the data • given X1, …, Xn • H is some subspace • P(Xi, H) is the projection of Xi onto the subspace H • fit the subspace H, among all choices of H, that minimizes the least-squares error: min over H of Σi ||Xi − P(Xi, H)||²
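
A minimal Python sketch of this least-squares subspace fit, done via SVD of the centered data; the array names, shapes, and rank are illustrative choices, not taken from the slides.

```python
import numpy as np

def pca_fit(X, r):
    """Return the mean and an orthonormal basis of the best rank-r subspace."""
    mu = X.mean(axis=0)                      # center of the data
    Xc = X - mu                              # centered data
    # The top-r right singular vectors span the least-squares subspace H.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    H = Vt[:r]                               # (r, D) orthonormal basis of H
    return mu, H

def pca_project(X, mu, H):
    """Project points onto the subspace: P(X_i, H) in the slide's notation."""
    return (X - mu) @ H.T @ H + mu

# usage (illustrative): X = np.random.randn(100, 10); mu, H = pca_fit(X, 2)
```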

  7. Manifold Model • Suppose the data does not lie on or near a linear subspace. • Yet the data inherently has one degree of freedom (even though it does not lie in a one-dimensional linear subspace).

  8. Vision Example • consider an image f: R × R → [0, 1] • f(x, y) is the intensity of the image at location (x, y) • Consider the following class of images: the set of images obtained by translating a fixed image by an amount (t, r). • This set is embedded in the nonlinear space of all images, • but there are only two degrees of freedom in this particular set: (t, r).
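
A small hedged illustration of such a two-parameter family, assuming (t, r) denote horizontal and vertical (circular) shifts of a fixed image; the image size and shift grid are made up for the example.

```python
import numpy as np

base = np.zeros((32, 32))
base[12:20, 12:20] = 1.0                      # a simple square "object"

images, params = [], []
for t in range(0, 32, 4):                     # horizontal shift
    for r in range(0, 32, 4):                 # vertical shift
        img = np.roll(np.roll(base, t, axis=1), r, axis=0)
        images.append(img.ravel())            # a point in R^1024
        params.append((t, r))

X = np.stack(images)                          # (64, 1024): high ambient dimension,
                                              # but only 2 intrinsic degrees of freedom
```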

  9. Manifold of a Sphere • Consider a sphere as a model manifold to think about. • Given a point p on the manifold, there is a tangent space at p, which for a k-dimensional manifold is essentially a k-dimensional space; you can think of it as a k-dimensional affine subspace of Rn. • This is exactly like the sphere embedded in R3, whose tangent space is a 2-dimensional affine subspace of R3. • Since the tangent space is a linear space, you can naturally think of tangent vectors.

  10. Relation of tangent vectors and curves A tangent vector can be thought of as a derivative. How? What is a curve on the manifold? φ(t): R → Mk. The derivative of the curve with respect to t, dφ(t)/dt, is a tangent vector, so every tangent vector v is identified with a curve φ(t). φ(t): R → Mk, f: Mk → R, f(φ(t)): R → R, df/dv = d f(φ(t)) / dt. So you can think of a tangent vector as an operator that acts on a function f and takes the directional derivative in a certain direction.
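
A small worked example, using the unit circle in R² as the manifold (an illustrative choice, not from the slides):

```latex
% curve on the circle and the tangent vector it defines
\varphi(t) = (\cos t,\ \sin t), \qquad
v = \frac{d\varphi}{dt} = (-\sin t,\ \cos t)

% directional derivative of f(x, y) = x along v, via f(\varphi(t)) = \cos t
\frac{df}{dv} = \frac{d}{dt} f(\varphi(t)) = -\sin t
```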

  11. Geodesic Length of a curve: a curve is a map φ: [0, 1] → Mk; if you take the derivative, you get a tangent vector. Since the curves live in a space where a norm and inner product are defined (Riemannian geometry), you have access to the norm of the derivative. So the length of the curve is the integral of that norm: length(φ) = ∫₀¹ ||dφ/dt|| dt. Geodesic: the shortest curve between two points.

  12. Gradient of a function • The gradient of a function acts as an operator on vectors in the tangent space: given any v in the tangent space, the inner product of the gradient with v is the derivative of the function in the direction v, ⟨∇f, v⟩ = df/dv. • So I fix a function f, pick any vector v in the tangent space, differentiate f in the direction v, and get a number; in this way the tangent vector acts as a map from functions to numbers.

  13. Exponential Map • The exponential map takes you from the tangent space back to the manifold. • So far we have the manifold, a point p on the manifold, the tangent space at p, and functions defined on the manifold. Now consider Tp(M), the tangent space of M at the point p; any element of this tangent space is a vector. • The exponential map takes me from Tp back to M. • How? You start moving along the geodesic in the direction of v until the curve has length equal to the norm of v. • In other words, the vector v that you picked has a certain norm, and you travel along the manifold for a distance equal to the length of v.

  14. Laplace-Beltrami operator on a manifold Define the Laplacian for functions defined on the manifold: if f is a twice-differentiable function from a k-dimensional space to R, differentiate f twice in each direction and sum, Δf = Σi ∂²f/∂xi². I have this k-dimensional tangent space, and if I pick any vector in this space and apply the exponential map, I get a point on the manifold. f is a function from the k-dimensional manifold to R, so f composed with the exponential map takes us from the tangent space to R, and the ordinary Laplacian of that composition gives the Laplace-Beltrami operator at p.

  15. Dimensionality Reduction • Given points x1, …, xn sampled from a manifold embedded in R^D • Find an embedding y1, …, yn in R^d. If I give you a bunch of points sampled from a manifold, can you discover the map that embeds this manifold isometrically in a d-dimensional space, then apply this map to the data and thereby embed the data in a d-dimensional space, even though the data originally lives in the D-dimensional space? • ISOMAP (Tenenbaum et al., 2000) • LLE (Roweis, Saul, 2000) • Laplacian Eigenmaps (Belkin, Niyogi, 2001) • Local Tangent Space Alignment (Zhang, Zha, 2002) • Hessian Eigenmaps (Donoho, Grimes, 2002) • Diffusion Maps (Coifman, Lafon, et al., 2004)

  16. Algorithmic framework There is a manifold in the high-dimensional space. We don't know this manifold; I only have a bunch of points sampled from it. What should I do? Build a graph/mesh structure by connecting nearby points to each other. "Nearby" means near in Euclidean distance, because Euclidean distance is the only thing I can measure. This graph is an approximation to the manifold, so if I want to do something on the manifold, I will do it on the graph instead.
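
A minimal sketch of this graph-construction step, assuming the samples are the rows of an (n, D) array X; the neighborhood size and the use of scikit-learn are illustrative choices.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

X = np.random.randn(200, 10)          # stand-in for points sampled near a manifold

# Connect each point to its k nearest neighbors in the ambient Euclidean metric,
# which is the only distance we can actually measure.
k = 8
G = kneighbors_graph(X, n_neighbors=k, mode='distance')   # sparse (n, n) matrix

# Symmetrize so that i~j whenever either point is among the other's neighbors.
G = G.maximum(G.T)
```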

  17. Isomap Construct a nearest-neighbor graph from all data points: build a graph in which every vertex corresponds to a data point. Find the shortest-path (geodesic) distances between all points on the graph (Dij). Dij is not the Euclidean distance between Xi and Xj; it is meant to approximate the geodesic distance between Xi and Xj when traveling along the manifold. Embed using Multidimensional Scaling.
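
A hedged sketch of the first two steps (neighbor graph, then approximate geodesic distances Dij); the data and parameter values are illustrative, not from the slides. The final MDS step is sketched under slide 19; in practice sklearn.manifold.Isomap runs the whole pipeline in one call.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import shortest_path

X = np.random.randn(300, 10)                        # points sampled near a manifold
G = kneighbors_graph(X, n_neighbors=8, mode='distance')
G = G.maximum(G.T)                                   # symmetric neighbor graph

# D_ij: length of the shortest path through the graph, an approximation of the
# geodesic distance along the manifold (not the straight-line distance in R^D).
D = shortest_path(G, directed=False)
```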

  18. Multidimensional Scaling • MDS: you give me a distance matrix; if there is a set of points in Euclidean space from which this distance matrix could have arisen, then I can find those points for you. If indeed there is a set of points in Euclidean space from which this distance matrix arose (if D arose from a set of vectors in Euclidean space), then the inner products between those points satisfy (A is the matrix of inner products): ⟨x, x⟩ − 2⟨x, y⟩ + ⟨y, y⟩ = ||x − y||², i.e. Aii − 2Aij + Ajj = Dij². This holds only if the distances are consistent with inner products, which is not the case when the distances are geodesic distances on the manifold. After finding the matrix of inner products A, how do we find the vectors? Simply by looking at the eigenvectors of this matrix.

  19. Multidimensional Scaling • 2) Embedding from inner products (same as PCA!) • The matrix of inner products A is positive semidefinite. • Then for any x ∈ {1,...,n}, map x to the top eigenvectors of A scaled by the square roots of their eigenvalues. • Ψ is this mapping function; Ψ reproduces the inner products and hence the distance matrix. • Summary: you start with the distance matrix, find the candidate set of inner products A, and find vectors that are consistent with that inner-product matrix.
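
A minimal sketch of classical MDS, assuming D is an (n, n) matrix of pairwise distances (for example the graph shortest-path distances from the Isomap sketch above); the function name and defaults are illustrative.

```python
import numpy as np

def classical_mds(D, d=2):
    n = D.shape[0]
    # Recover candidate inner products A by double-centering the squared distances.
    J = np.eye(n) - np.ones((n, n)) / n
    A = -0.5 * J @ (D ** 2) @ J
    # Embed from the inner products: top-d eigenvectors scaled by sqrt(eigenvalue).
    w, V = np.linalg.eigh(A)                 # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:d]            # take the d largest
    w_top = np.clip(w[idx], 0.0, None)       # geodesic D need not be exactly Euclidean
    return V[:, idx] * np.sqrt(w_top)        # (n, d) coordinates

# usage (illustrative): Y = classical_mds(D, d=2)
```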

  20. Isomap Images generated by a human hand varying in finger extension and wrist rotation, so there are only two degrees of freedom.

  21. Locally Linear Embedding • Construct the nearest-neighbor graph. • Let x1,...,xn be the neighbors of x. Project x onto the span of x1,...,xn; call that projection the local reconstruction of x. • Once the projection is found, find the set of coefficients, summing to one, for which the projection is the center of mass of the neighbors. These are called its barycentric coordinates. • Construct a sparse matrix W whose i-th row holds the barycentric coordinates of the i-th point's projection in the basis of its nearest neighbors. • Use the lowest eigenvectors of (I − W)ᵀ(I − W) to embed.
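
A hedged usage sketch: scikit-learn's LocallyLinearEmbedding follows this neighbors → barycentric weights W → eigenvectors of (I − W)ᵀ(I − W) recipe; the data and parameter values here are illustrative.

```python
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding

X = np.random.randn(500, 10)                 # stand-in for points near a manifold
Y = LocallyLinearEmbedding(n_neighbors=8, n_components=2).fit_transform(X)  # (500, 2)
```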

  22. Laplacian Eigenmaps (1) • Build a graph in which each data point is represented by a vertex (n vertices for n data points), with edges eij. Compute the Euclidean distances, which are the only thing we can measure. • n nearest neighbors [parameter n ∈ N]: nodes i and j are connected by an edge if i is among the n nearest neighbors of j or j is among the n nearest neighbors of i. • Heat kernel [parameter t ∈ R]: if nodes i and j are connected, put Wij = exp(−||xi − xj||²/t). If t is small, far-apart points are penalized heavily, and vice versa.

  23. Laplacian Eigenmaps (2) So far I have built a graph on n vertices from the n data points and computed weights for the edges of the graph. The matrix W is a random matrix, because the original points were randomly sampled from the manifold. The idea is that by looking at the spectrum of W, its eigenvalues and eigenvectors, we can recover the eigenvalues and eigenfunctions of the Laplace-Beltrami operator of the manifold from which the data was sampled. D is the diagonal matrix with Dii = Σj Wji. Construct the matrix L = D − W. [Eigenmaps] Compute eigenvalues and eigenvectors for the generalized eigenvector problem Lf = λDf. Let f0,...,fk−1 be the eigenvectors. Leave out the eigenvector f0 and use the next m lowest eigenvectors for embedding in an m-dimensional Euclidean space.
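
A minimal dense sketch of these steps, assuming X holds the samples as rows; the kernel width t, neighborhood size, and the use of SciPy's generalized eigensolver are illustrative choices.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import cdist

def laplacian_eigenmaps(X, m=2, n_neighbors=8, t=1.0):
    n = X.shape[0]
    sq = cdist(X, X, 'sqeuclidean')                     # ||x_i - x_j||^2
    # Keep only the n_neighbors nearest points for each row, then symmetrize.
    idx = np.argsort(sq, axis=1)[:, 1:n_neighbors + 1]
    adj = np.zeros((n, n), dtype=bool)
    rows = np.repeat(np.arange(n), n_neighbors)
    adj[rows, idx.ravel()] = True
    adj = adj | adj.T
    # Heat-kernel weights on the retained edges.
    W = np.where(adj, np.exp(-sq / t), 0.0)
    D = np.diag(W.sum(axis=1))                          # D_ii = sum_j W_ij
    L = D - W
    # Generalized eigenproblem L f = lambda D f; eigh returns ascending eigenvalues.
    _, vecs = eigh(L, D)
    # Drop the constant eigenvector f_0 and embed with the next m eigenvectors.
    return vecs[:, 1:m + 1]
```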

  24. Diffusion Distance Start with an initial distribution of heat on the manifold which is a pulse at x (δx), and do the same for y (δy). The diffusion distance between x and y, defined through the heat diffusion operator Ht, is the difference between the two heat distributions after time t: Dt(x, y) = ||Ht δx − Ht δy||. To measure this distance on the manifold, I take a point x, start with an initial pulse at x, and allow heat to dissipate over the manifold; heat flows along the geometry of the manifold. After time t, I obtain the distribution of heat from the initial location x. I do the same for location y and then measure the distance between these two distributions.

  25. Diffusion Distance • Relation of heat diffusion to the Laplacian: • the diffusion of heat on the manifold is governed by the heat equation on the manifold, • which sets the partial derivative of the heat distribution with respect to time equal to the Laplacian of the distribution. • So the way diffusion maps work, as opposed to Laplacian eigenmaps, is that they look at the eigenfunctions of the Laplacian and embed every point into the lower-dimensional space using coordinates e^(−λi t) fi(x), • where the λ's are the eigenvalues and the f's are the eigenfunctions.
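
A hedged sketch of a discrete diffusion-map embedding. Note the assumption: instead of e^(−λt) with Laplace-Beltrami eigenvalues, this common discrete variant uses the eigenvalues of the row-normalized heat-kernel matrix raised to the power t, where t is an integer number of diffusion steps; the kernel width eps and dimensions are illustrative.

```python
import numpy as np
from scipy.spatial.distance import cdist

def diffusion_maps(X, m=2, t=1, eps=1.0):
    W = np.exp(-cdist(X, X, 'sqeuclidean') / eps)       # heat-kernel affinities
    P = W / W.sum(axis=1, keepdims=True)                 # row-stochastic diffusion operator
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)                       # descending eigenvalues
    vals, vecs = vals.real[order], vecs.real[:, order]
    # Skip the trivial constant eigenvector; weight the rest by lambda^t.
    return vecs[:, 1:m + 1] * (vals[1:m + 1] ** t)
```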

  26. Justification for Laplacian Eigenmaps • We have a bunch of points sitting on a higher-dimensional manifold (x1, …, xn from M) • and want points in a lower-dimensional space (y1, …, yn in R). • Laplacian Eigenmaps tries to preserve locality: • if xi and xj are near each other, then yi and yj should be near each other (smoothness and locality preservation). • If Wij is large, xi and xj are close to each other, so yi and yj should be close too. • If Wij is small, xi and xj are far from each other, so we don't care.

  27. Justification of Laplacian Eigenmaps It can be shown that minimizing the locality-preserving objective Σij Wij (yi − yj)² amounts to finding eigenvectors of the Laplacian L, as sketched below. Use the eigenvectors of L to embed. Let Y = [y1, y2, ..., ym].
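
For completeness, here is the short derivation (using the symmetry of W and Dii = Σj Wij) showing why this objective leads to the Laplacian L = D − W:

```latex
\sum_{i,j} W_{ij}\,(y_i - y_j)^2
  = \sum_{i,j} W_{ij}\,(y_i^2 - 2 y_i y_j + y_j^2)
  = 2\sum_i D_{ii}\, y_i^2 \;-\; 2\sum_{i,j} W_{ij}\, y_i y_j
  = 2\, y^{\top}(D - W)\, y
  = 2\, y^{\top} L\, y
```

Minimizing this quadratic form (under a suitable normalization constraint) is therefore solved by the lowest eigenvectors of L.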

  28. On the Manifold • What is the relationship between the graph Laplacian and the manifold Laplacian? • I need a smooth (locality-preserving) map f: M → R. • On the graph, smoothness is handled with the quadratic form Σij Wij (f(i) − f(j))². • Finding a smooth function on the manifold is equivalent to finding a smooth function on the graph (Stokes' theorem). • A smooth function on the graph is one that does not change much from vertex to vertex. • Finding a smooth function on the manifold requires looking at the eigenfunctions of the Laplace-Beltrami operator on the manifold. • Minimizing the smoothness criterion on the graph leads us to the eigenvectors of the graph Laplacian. • These two, the manifold eigenfunctions and the graph eigenvectors, converge to each other.

  29. Given an arbitrary Riemannian manifold, we know about the Laplace-Beltrami operator on this manifold • Eigensystem: Δφi = λi φi • {φi} forms an orthonormal basis for L2(M) • The eigenvalues {λi} characterize the smoothness of the {φi} • If I know the manifold on which the data lives, the eigenfunctions of the Laplacian give me a set of basis functions adapted to the geometry of the manifold. • These functions can be used to build classifiers.

  30. I don't have the manifold; I only have a collection of points on the manifold. I should build a graph out of these points and look at the Laplacian of this graph, which is an operator on functions defined on the vertex set of the graph. I can look at the eigenvalues and eigenvectors of this graph Laplacian, and from them I can reconstruct the eigenvalues and eigenfunctions of the Laplace-Beltrami operator.

  31. Results Consider a manifold and the eigensystem Δf = λf, which is what I am interested in. This gives eigenvalues λ1, λ2, …, λi and eigenfunctions φi. I have randomly sampled points x1, x2, …, xn on the manifold and a graph G(V, E) with |V| = n. I look at functions f1: V → R defined on the vertex set, and another set of functions f2: M → R that solve the eigensystem, to which the Laplace-Beltrami operator applies. Apply the graph Laplacian to f1, where D is a function of t and L = D − W, and find the eigenvalues and eigenvectors of L. The eigenvalues of L converge to the eigenvalues of the eigensystem Δf = λf as n → ∞ and t → 0. The rate, (1/n)^(1/d), does not depend on D.

  32. Applications of Manifolds • Motion estimation (estimate the motion of a person) • Markerless motion estimation: inferring joint angles (16 cameras) • Corazza et al., Stanford Biomotion Lab, 2005 • Isometrically invariant representation: eigenfunctions of the Laplacian are invariant under isometries. What happens when we move? The surface of the body of a walker is a 2D manifold in 3-dimensional space. If I put two markers on the arm, the shortest geodesic distance between them does not change while walking.

  33. Motion estimation Two manifolds are isometrically equivalent if there is a correspondence between them that preserves geodesic distances. Moving the body is an isometric transformation of the surface of the body. So we take a bunch of data points from the body surface and compute the eigenvectors of the Laplacian matrix. Each eigenvector is a function defined on the surface of the body. The color of each data point corresponds to the value of a certain eigenvector at that point. When the person moves, the color does not change. This is useful because now I have a function that does not change during walking, and we can use several of these functions to segment the body. No time information is needed, only point clouds: compute the eigenvectors at one time instant, then use them for the whole sequence.
