
Graph Embedding and Extensions: A General Framework for Dimensionality Reduction


Presentation Transcript


  1. Graph Embedding and Extensions: A General Framework for Dimensionality Reduction IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE Shuicheng Yan, Dong Xu, Benyu Zhang, Hong-Jiang Zhang, Qiang Yang, Stephen Lin Presented by meconin

  2. Outline • Introduction • Graph Embedding (GE) • Marginal Fisher Analysis (MFA) • Experiments • Conclusion and Future Work

  3. Introduction • Dimensionality Reduction • Linear methods • PCA and LDA are the two most popular, owing to their simplicity and effectiveness • LPP preserves local relationships within the data set and uncovers its essential manifold structure

  4. Introduction • Dimensionality Reduction • Among nonlinear methods, ISOMAP, LLE, and Laplacian Eigenmap are three algorithms that have been developed recently • Kernel trick: • turns linear methods into nonlinear ones • by performing linear operations in a higher- or even infinite-dimensional space reached through a kernel mapping function

  5. Introduction • Dimensionality Reduction • Tensor-based algorithms • 2DPCA, 2DLDA, DATER

  6. Introduction • Graph Embedding is a general framework for dimensionality reduction • With its linearization, kernelization, and tensorization, we have a unified view for understanding DR algorithms • The above-mentioned algorithms can all be reformulated within it

  7. Introduction • This paper shows that GE can be used as a platform for developing new DR algorithms • Marginal Fisher Analysis (MFA) • Overcomes the limitations of LDA

  8. Introduction • LDA (Linear Discriminant Analysis) • Finds the linear combination of features that best separates the classes of objects • The number of available projection directions is less than the number of classes • Based on interclass and intraclass scatters, so it is optimal only when the data of each class are approximately Gaussian distributed

  9. Introduction • MFA advantages (compared with LDA): • The number of available projection directions is much larger • No assumption on the data distribution, so it is more general for discriminant analysis • The interclass margin can better characterize the separability of different classes

  10. Graph Embedding • For a classification problem, the sample set is represented as a matrix X = [x1, x2, …, xN], xi ∈ R^m • In practice, the feature dimension m is often very high, so it is necessary to transform the data to a low-dimensional representation yi = F(xi), for all i

  11. Graph Embedding

  12. Graph Embedding • Although the motivations of DR algorithms differ, their objectives are similar: to derive a lower-dimensional representation • Can we reformulate them within a unifying framework? Does such a framework assist in designing new algorithms?

  13. Graph Embedding • Graph embedding gives a possible answer • Represent each vertex of a graph as a low-dimensional vector that preserves the similarities between vertex pairs • The similarity matrix of the graph characterizes certain statistical or geometric properties of the data set

  14. Graph Embedding • Let G = {X, W} be an undirected weighted graph with vertex set X and similarity matrix W ∈ R^(N×N) • The diagonal matrix D and the Laplacian matrix L of the graph G are defined as L = D − W, with D_ii = Σ_j W_ij, ∀ i
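As a concrete illustration (not part of the original slides), here is a minimal NumPy sketch of the D and L construction above, assuming a dense symmetric similarity matrix W:

```python
import numpy as np

def graph_laplacian(W):
    """Return D and L = D - W for a symmetric similarity matrix W (N x N)."""
    D = np.diag(W.sum(axis=1))   # D_ii = sum_j W_ij
    L = D - W                    # unnormalized graph Laplacian
    return D, L

# toy example with three vertices
W = np.array([[0.0, 0.8, 0.1],
              [0.8, 0.0, 0.5],
              [0.1, 0.5, 0.0]])
D, L = graph_laplacian(W)
```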

  15. Graph Embedding • Graph embedding of G is an algorithm that finds low-dimensional vector representations preserving the relationships among the vertices of G • B is the constraint matrix and d is a constant, introduced to avoid a trivial solution
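The objective referred to on this slide (the paper's graph-preserving criterion; the formula itself is an image missing from the transcript) can be written as:

```latex
y^{*} \;=\; \arg\min_{\,y^{\top} B y \,=\, d}\; \sum_{i \neq j} \lVert y_i - y_j \rVert^{2}\, W_{ij}
      \;=\; \arg\min_{\,y^{\top} B y \,=\, d}\; y^{\top} L\, y
```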

  16. Graph Embedding • The larger the similarity between samples xi and xj, the smaller the distance between yi and yj should be in order to minimize the objective function • To offer mappings for data points throughout the entire feature space: • Linearization, Kernelization, Tensorization

  17. Graph Embedding • Linearization: assuming y = X^T w • Kernelization: mapping each sample x into a feature space F via φ, and assuming w = Σ_i α_i φ(x_i)
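For reference (reconstructed, since the formula images are not in the transcript), substituting these assumptions into the graph-preserving criterion gives the linearized and kernelized objectives, where K is the kernel Gram matrix with K_ij = k(x_i, x_j):

```latex
w^{*} \;=\; \arg\min_{\,w^{\top} X B X^{\top} w \,=\, d}\; w^{\top} X L X^{\top} w,
\qquad
\alpha^{*} \;=\; \arg\min_{\,\alpha^{\top} K B K \alpha \,=\, d}\; \alpha^{\top} K L K \alpha
```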

  18. Graph Embedding • The solutions are obtained by solving a generalized eigenvalue decomposition problem • F. Chung, "Spectral Graph Theory," Regional Conf. Series in Math., no. 92, 1997
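A minimal sketch of this step, assuming dense matrices and a symmetric positive-definite constraint matrix B, using scipy.linalg.eigh to solve L y = lambda B y:

```python
import numpy as np
from scipy.linalg import eigh

def solve_graph_embedding(L, B, dim):
    """Solve the generalized eigenproblem L y = lambda B y and return the
    eigenvectors with the smallest eigenvalues as the embedding coordinates.
    Assumes L is symmetric and B is symmetric positive definite."""
    eigvals, eigvecs = eigh(L, B)        # eigenvalues returned in ascending order
    # in practice the trivial all-constant eigenvector (eigenvalue ~0) is
    # discarded when B = D; here we simply return the leading columns
    return eigvecs[:, :dim]              # N x dim embedding
```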

  19. Graph Embedding • Tensor • the features extracted from an object may contain higher-order structure • Ex: • an image is a second-order tensor • sequential data such as a video sequence is a third-order tensor

  20. Graph Embedding • Tensor • In an n-dimensional space there are n^r projection directions, where r is the rank (order) of the tensor • For tensors A, B ∈ R^(m1×m2×…×mn), the inner product is ⟨A, B⟩ = Σ_{i1,…,in} A_{i1…in} B_{i1…in}

  21. Graph Embedding • Tensor • For a matrix U ∈ R^(m_k × m'_k), the k-mode product is B = A ×_k U
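A small NumPy sketch of the k-mode product B = A ×_k U (an illustrative implementation, not the authors' code):

```python
import numpy as np

def mode_k_product(A, U, k):
    """k-mode product of tensor A (shape m1 x ... x mn) with matrix U
    (shape m_k x m'_k): mode k of A is contracted with the rows of U."""
    Ak = np.moveaxis(A, k, 0)                      # bring mode k to the front
    out = np.tensordot(U.T, Ak, axes=([1], [0]))   # m'_k x (remaining modes)
    return np.moveaxis(out, 0, k)                  # move mode k back into place

# example: project a 4 x 5 x 6 tensor along mode 1 onto 2 dimensions
A = np.random.rand(4, 5, 6)
U = np.random.rand(5, 2)
B = mode_k_product(A, U, 1)   # resulting shape: (4, 2, 6)
```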

  22. Graph Embedding • The objective function: • In many cases there is no closed-form solution, but we can obtain a local optimum by optimizing one projection vector at a time while fixing the others

  23. General Framework for DR • The differences among DR algorithms lie in: • the computation of the similarity matrix of the graph • the selection of the constraint matrix

  24. General Framework for DR

  25. General Framework for DR • PCA • seeks the projection directions with maximal variance • in the graph-embedding view, the minimization finds the projection directions with minimal variance, which PCA then removes
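To make the PCA case concrete: in the graph-embedding view, PCA's intrinsic graph has W_ij = 1/N for all i ≠ j with constraint B = I, so X L X^T reduces to the centered scatter matrix. A quick NumPy check of this identity (a sketch under those assumptions):

```python
import numpy as np

N, m = 100, 5
X = np.random.rand(m, N)                 # samples as columns, as on slide 10

W = np.full((N, N), 1.0 / N)             # PCA's intrinsic graph: W_ij = 1/N, i != j
np.fill_diagonal(W, 0.0)
L = np.diag(W.sum(axis=1)) - W           # graph Laplacian

Xc = X - X.mean(axis=1, keepdims=True)   # centered data
print(np.allclose(X @ L @ X.T, Xc @ Xc.T))   # True: X L X^T is the scatter matrix
```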

  26. General Framework for DR • KPCA • applies the kernel trick to PCA, hence it is a kernelization of graph embedding • 2DPCA is a simplified second-order tensorization of PCA that optimizes only one projection direction

  27. General Framework for DR • LDA • searches for the directions that are most effective for discrimination by minimizing the ratio between the intraclass and interclass scatters
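In the usual notation (reconstructed here, since the slide's formula is an image), with S_w the within-class scatter, S_b the between-class scatter, π_c the index set and n_c the size of class c, x̄_c its mean, and x̄ the overall mean, the criterion is:

```latex
w^{*} \;=\; \arg\min_{w}\; \frac{w^{\top} S_w\, w}{w^{\top} S_b\, w},
\qquad
S_w = \sum_{c} \sum_{i \in \pi_c} (x_i - \bar{x}_c)(x_i - \bar{x}_c)^{\top},
\qquad
S_b = \sum_{c} n_c\, (\bar{x}_c - \bar{x})(\bar{x}_c - \bar{x})^{\top}
```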

  28. General Framework for DR • LDA

  29. General Framework for DR • LDA • follows the linearization of graph embedding • the intrinsic graph connects all pairs of samples that share the same class label • the weights are inversely proportional to the sample size of the corresponding class
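A sketch of the intrinsic-graph weights just described (same-class pairs weighted by 1/n_c), assuming integer class labels; with this W, X L X^T recovers the within-class scatter:

```python
import numpy as np

def lda_intrinsic_graph(labels):
    """W_ij = 1/n_c if x_i and x_j belong to the same class c (of size n_c), else 0."""
    labels = np.asarray(labels)
    W = np.zeros((len(labels), len(labels)))
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        W[np.ix_(idx, idx)] = 1.0 / len(idx)   # same-class pairs: weight 1/n_c
    np.fill_diagonal(W, 0.0)                   # no self-loops
    return W

W = lda_intrinsic_graph([0, 0, 1, 1, 1])
```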

  30. General Framework for DR • The intrinsic graph of PCA is used as the penalty graph of LDA • (figure: the intrinsic graphs of PCA and LDA)

  31. General Framework for DR • KDA is the kernel extension of LDA • 2DLDA is the second-order tensorization of LDA • DATER is the tensorization of LDA in arbitrary order

  32. General Framework for DR • LPP • ISOMAP • LLE • Laplacian Eigenmap (LE)

  33. Related Works • Kernel Interpretation • Ham et al.: KPCA, ISOMAP, LLE, and LE share a common KPCA formulation with different kernel definitions • Kernel matrix vs. Laplacian matrix derived from the similarity matrix • Unsupervised methods only vs. the more general graph-embedding view

  34. Related Works • Out-of-Sample Extension • Brand • Mentioned the concept of graph embedding • Brand’s work can be considered as a special case of our graph embedding

  35. Related Works • Laplacian Eigenmap • Works with only a single graph, i.e., the intrinsic graph, and cannot be used to explain algorithms such as ISOMAP, LLE, and LDA • Some works use a Gaussian function to compute the nonnegative similarity matrix
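For the Gaussian (heat-kernel) weighting mentioned above, a minimal sketch with samples as the columns of X and bandwidth t as a free parameter:

```python
import numpy as np

def gaussian_similarity(X, t=1.0):
    """W_ij = exp(-||x_i - x_j||^2 / t), for samples stored as the columns of X."""
    sq = np.sum(X ** 2, axis=0)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)   # pairwise squared distances
    return np.exp(-np.maximum(d2, 0.0) / t)            # clamp tiny negatives from rounding
```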

  36. Marginal Fisher Analysis • Marginal Fisher Analysis

  37. Marginal Fisher Analysis • Intraclass compactness (intrinsic graph)

  38. Marginal Fisher Analysis • Interclass separability (penalty graph)
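The two graphs on these slides are described in the paper as follows: the intrinsic graph connects each sample to its k1 nearest neighbors of the same class, and the penalty graph connects, for each class, the k2 nearest between-class (marginal) sample pairs. A rough NumPy sketch under that reading, with k1 and k2 as free parameters:

```python
import numpy as np

def mfa_graphs(X, labels, k1=5, k2=20):
    """Build MFA's intrinsic graph W (k1 nearest same-class neighbors) and
    penalty graph Wp (k2 closest between-class pairs per class).
    X holds the samples as columns; labels are integer class labels."""
    labels = np.asarray(labels)
    N = X.shape[1]
    sq = np.sum(X ** 2, axis=0)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)   # pairwise squared distances

    W = np.zeros((N, N))
    Wp = np.zeros((N, N))

    # intrinsic graph: connect each sample to its k1 nearest same-class neighbors
    for i in range(N):
        same = np.where((labels == labels[i]) & (np.arange(N) != i))[0]
        for j in same[np.argsort(d2[i, same])[:k1]]:
            W[i, j] = W[j, i] = 1.0

    # penalty graph: for each class, connect the k2 closest between-class pairs
    for c in np.unique(labels):
        inside = np.where(labels == c)[0]
        outside = np.where(labels != c)[0]
        pair_d = d2[np.ix_(inside, outside)]
        flat = np.argsort(pair_d, axis=None)[:k2]
        rows, cols = np.unravel_index(flat, pair_d.shape)
        for i, j in zip(inside[rows], outside[cols]):
            Wp[i, j] = Wp[j, i] = 1.0

    return W, Wp
```

With these graphs, the marginal Fisher criterion described in the paper minimizes the ratio of the intrinsic-graph form w^T X (D − W) X^T w to the penalty-graph form w^T X (Dp − Wp) X^T w.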

  39. The first step of MFA

  40. The second step of MFA

  41. Marginal Fisher Analysis • Intraclass compactness (intrinsic graph)

  42. Marginal Fisher Analysis • Interclass separability (penalty graph)

  43. The third step of MFA

  44. The fourth step of MFA

  45. LDA vs. MFA • The number of available projection directions in MFA is much greater than that of LDA • There is no assumption on the data distribution of each class • The interclass margin in MFA can better characterize the separability of different classes than the interclass variance in LDA

  46. Kernel MFA • The distance between two samples • For a new data point x, its projection onto the derived optimal direction
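The two missing formulas here are standard kernel-trick expressions rather than anything specific to MFA; a sketch assuming an RBF kernel and expansion coefficients alpha learned on the training set:

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    return np.exp(-gamma * np.sum((a - b) ** 2))

def feature_space_distance(xi, xj, kernel=rbf_kernel):
    """||phi(xi) - phi(xj)|| = sqrt(k(xi,xi) + k(xj,xj) - 2 k(xi,xj))."""
    return np.sqrt(kernel(xi, xi) + kernel(xj, xj) - 2.0 * kernel(xi, xj))

def project_new_point(x, X_train, alpha, kernel=rbf_kernel):
    """Projection of a new sample onto the direction w = sum_i alpha_i phi(x_i):
    y = sum_i alpha_i k(x, x_i).  X_train holds training samples as columns."""
    return sum(a * kernel(x, xi) for a, xi in zip(alpha, X_train.T))
```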

  47. Tensor MFA

  48. Experiments • Face Recognition • XM2VTS, CMU PIE, ORL • A Non-Gaussian Case

  49. Experiments • XM2VTS, PIE-1, PIE-2, ORL

  50. Experiments
