
Graph Embedding and Extensions: A General Framework for Dimensionality Reduction


Presentation Transcript


  1. Graph Embedding and Extensions: A General Framework for Dimensionality Reduction IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE Shuicheng Yan, Dong Xu, Benyu Zhang, Hong-Jiang Zhang, Qiang Yang, Stephen Lin Presented by meconin

  2. Outline • Introduction • Graph Embedding (GE) • Marginal Fisher Analysis (MFA) • Experiments • Conclusion and Future Work

  3. Introduction • Dimensionality Reduction • Linear methods • PCA and LDA are the two most popular, owing to their simplicity and effectiveness • LPP preserves local relationships within the data set and uncovers its essential manifold structure

  4. Introduction • Dimensionality Reduction • Among nonlinear methods, ISOMAP, LLE, and Laplacian Eigenmap are three algorithms that have been developed recently • Kernel trick: • turns linear methods into nonlinear ones • by performing linear operations in a higher- or even infinite-dimensional space reached through a kernel mapping function

  5. Introduction • Dimensionality Reduction • Tensor-based algorithms • 2DPCA, 2DLDA, DATER

  6. Introduction • Graph Embedding is a general framework for dimensionality reduction • With its linearization, kernelization, and tensorization, we have a unified view for understanding DR algorithms • The above-mentioned algorithms can all be reformulated within it

  7. Introduction • This paper shows that GE can be used as a platform for developing new DR algorithms • Marginal Fisher Analysis (MFA) • Overcomes the limitations of LDA

  8. Introduction • LDA (Linear Discriminant Analysis) • Finds the linear combination of features that best separates the classes of objects • The number of available projection directions is less than the number of classes • Based on interclass and intraclass scatters, so it is optimal only when the data of each class are approximately Gaussian distributed

  9. Introduction • MFA advantages (compared with LDA): • The number of available projection directions is much larger • No assumption on the data distribution, so it is more general for discriminant analysis • The interclass margin can better characterize the separability of different classes

  10. Graph Embedding • For a classification problem, the sample set is represented as a matrix X = [x1, x2, …, xN], xi ∈ R^m • In practice, the feature dimension m is often very high, so it is necessary to transform the data to a low-dimensional representation yi = F(xi), for all i

  11. Graph Embedding

  12. Graph Embedding • Although the motivations of DR algorithms differ, their objectives are similar: to derive a lower-dimensional representation • Can we reformulate them within a unifying framework? Does such a framework assist in designing new algorithms?

  13. Graph Embedding • Graph embedding gives a possible answer • Represent each vertex of a graph as a low-dimensional vector that preserves the similarities between vertex pairs • The similarity matrix of the graph characterizes certain statistical or geometric properties of the data set

  14. Graph Embedding • Let G = {X, W} be an undirected weighted graph with vertex set X and similarity matrix W ∈ R^(N×N) • The diagonal matrix D and the Laplacian matrix L of the graph G are defined as L = D − W, with D_ii = Σ_j W_ij, ∀ i
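As a concrete illustration (not part of the original slides), here is a minimal NumPy sketch of the D and L construction above, assuming a dense symmetric similarity matrix W:

```python
import numpy as np

def graph_laplacian(W):
    """Return D and L = D - W for a symmetric similarity matrix W (N x N)."""
    D = np.diag(W.sum(axis=1))   # D_ii = sum_j W_ij
    L = D - W                    # unnormalized graph Laplacian
    return D, L

# toy example with three vertices
W = np.array([[0.0, 0.8, 0.1],
              [0.8, 0.0, 0.5],
              [0.1, 0.5, 0.0]])
D, L = graph_laplacian(W)
```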

  15. Graph Embedding • Graph embedding of G is an algorithm that finds low-dimensional vector representations preserving the relationships among the vertices of G • B is the constraint matrix and d is a constant, introduced to avoid a trivial solution
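The objective referred to on this slide (the paper's graph-preserving criterion; the formula itself is an image missing from the transcript) can be written as:

```latex
y^{*} \;=\; \arg\min_{\,y^{\top} B y \,=\, d}\; \sum_{i \neq j} \lVert y_i - y_j \rVert^{2}\, W_{ij}
      \;=\; \arg\min_{\,y^{\top} B y \,=\, d}\; y^{\top} L\, y
```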

  16. Graph Embedding • The larger the similarity between samples xi and xj, the smaller the distance between yi and yj should be in order to minimize the objective function • To offer mappings for data points throughout the entire feature space: • Linearization, Kernelization, Tensorization

  17. Graph Embedding • Linearization: assuming y = X^T w • Kernelization: mapping each sample x into a feature space F via φ, and assuming w = Σ_i α_i φ(x_i)
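For reference (reconstructed, since the formula images are not in the transcript), substituting these assumptions into the graph-preserving criterion gives the linearized and kernelized objectives, where K is the kernel Gram matrix with K_ij = k(x_i, x_j):

```latex
w^{*} \;=\; \arg\min_{\,w^{\top} X B X^{\top} w \,=\, d}\; w^{\top} X L X^{\top} w,
\qquad
\alpha^{*} \;=\; \arg\min_{\,\alpha^{\top} K B K \alpha \,=\, d}\; \alpha^{\top} K L K \alpha
```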

  18. Graph Embedding • The solutions are obtained by solving a generalized eigenvalue decomposition problem • F. Chung, "Spectral Graph Theory," Regional Conf. Series in Math., no. 92, 1997
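A minimal sketch of this step, assuming dense matrices and a symmetric positive-definite constraint matrix B, using scipy.linalg.eigh to solve L y = lambda B y:

```python
import numpy as np
from scipy.linalg import eigh

def solve_graph_embedding(L, B, dim):
    """Solve the generalized eigenproblem L y = lambda B y and return the
    eigenvectors with the smallest eigenvalues as the embedding coordinates.
    Assumes L is symmetric and B is symmetric positive definite."""
    eigvals, eigvecs = eigh(L, B)        # eigenvalues returned in ascending order
    # in practice the trivial all-constant eigenvector (eigenvalue ~0) is
    # discarded when B = D; here we simply return the leading columns
    return eigvecs[:, :dim]              # N x dim embedding
```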

  19. Graph Embedding • Tensor • the features extracted from an object may contain higher-order structure • Ex: • an image is a second-order tensor • sequential data such as a video sequence is a third-order tensor

  20. Graph Embedding • Tensor • In an n-dimensional space there are n^r projection directions, where r is the rank (order) of the tensor • For tensors A, B ∈ R^(m1×m2×…×mn), the inner product is ⟨A, B⟩ = Σ_{i1,…,in} A_{i1…in} B_{i1…in}

  21. Graph Embedding • Tensor • For a matrix U ∈ R^(m_k × m'_k), the k-mode product is B = A ×_k U
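A small NumPy sketch of the k-mode product B = A ×_k U (an illustrative implementation, not the authors' code):

```python
import numpy as np

def mode_k_product(A, U, k):
    """k-mode product of tensor A (shape m1 x ... x mn) with matrix U
    (shape m_k x m'_k): mode k of A is contracted with the rows of U."""
    Ak = np.moveaxis(A, k, 0)                      # bring mode k to the front
    out = np.tensordot(U.T, Ak, axes=([1], [0]))   # m'_k x (remaining modes)
    return np.moveaxis(out, 0, k)                  # move mode k back into place

# example: project a 4 x 5 x 6 tensor along mode 1 onto 2 dimensions
A = np.random.rand(4, 5, 6)
U = np.random.rand(5, 2)
B = mode_k_product(A, U, 1)   # resulting shape: (4, 2, 6)
```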

  22. Graph Embedding • The objective function: • In many cases there is no closed-form solution, but we can obtain a local optimum by optimizing one projection vector at a time while fixing the others

  23. General Framework for DR • The differences among DR algorithms lie in: • the computation of the similarity matrix of the graph • the selection of the constraint matrix

  24. General Framework for DR

  25. General Framework for DR • PCA • seeks the projection directions with maximal variance • in the graph-embedding view, the minimization finds the projection directions with minimal variance, which PCA then removes
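To make the PCA case concrete: in the graph-embedding view, PCA's intrinsic graph has W_ij = 1/N for all i ≠ j with constraint B = I, so X L X^T reduces to the centered scatter matrix. A quick NumPy check of this identity (a sketch under those assumptions):

```python
import numpy as np

N, m = 100, 5
X = np.random.rand(m, N)                 # samples as columns, as on slide 10

W = np.full((N, N), 1.0 / N)             # PCA's intrinsic graph: W_ij = 1/N, i != j
np.fill_diagonal(W, 0.0)
L = np.diag(W.sum(axis=1)) - W           # graph Laplacian

Xc = X - X.mean(axis=1, keepdims=True)   # centered data
print(np.allclose(X @ L @ X.T, Xc @ Xc.T))   # True: X L X^T is the scatter matrix
```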

  26. General Framework for DR • KPCA • applies the kernel trick to PCA, hence it is a kernelization of graph embedding • 2DPCA is a simplified second-order tensorization of PCA that optimizes only one projection direction

  27. General Framework for DR • LDA • searches for the directions that are most effective for discrimination by minimizing the ratio between the intraclass and interclass scatters
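In the usual notation (reconstructed here, since the slide's formula is an image), with S_w the within-class scatter, S_b the between-class scatter, π_c the index set and n_c the size of class c, x̄_c its mean, and x̄ the overall mean, the criterion is:

```latex
w^{*} \;=\; \arg\min_{w}\; \frac{w^{\top} S_w\, w}{w^{\top} S_b\, w},
\qquad
S_w = \sum_{c} \sum_{i \in \pi_c} (x_i - \bar{x}_c)(x_i - \bar{x}_c)^{\top},
\qquad
S_b = \sum_{c} n_c\, (\bar{x}_c - \bar{x})(\bar{x}_c - \bar{x})^{\top}
```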

  28. General Framework for DR • LDA

  29. General Framework for DR • LDA • follows the linearization of graph embedding • the intrinsic graph connects all pairs of samples that share the same class label • the weights are inversely proportional to the sample size of the corresponding class
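A sketch of the intrinsic-graph weights just described (same-class pairs weighted by 1/n_c), assuming integer class labels; with this W, X L X^T recovers the within-class scatter:

```python
import numpy as np

def lda_intrinsic_graph(labels):
    """W_ij = 1/n_c if x_i and x_j belong to the same class c (of size n_c), else 0."""
    labels = np.asarray(labels)
    W = np.zeros((len(labels), len(labels)))
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        W[np.ix_(idx, idx)] = 1.0 / len(idx)   # same-class pairs: weight 1/n_c
    np.fill_diagonal(W, 0.0)                   # no self-loops
    return W

W = lda_intrinsic_graph([0, 0, 1, 1, 1])
```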

  30. General Framework for DR • The intrinsic graph of PCA is used as the penalty graph of LDA • (figure: the intrinsic graphs of PCA and LDA)

  31. General Framework for DR • KDA is the kernel extension of LDA • 2DLDA is the second-order tensorization of LDA • DATER is the tensorization of LDA in arbitrary order

  32. General Framework for DR • LPP • ISOMAP • LLE • Laplacian Eigenmap (LE)

  33. Related Works • Kernel Interpretation • Ham et al.: KPCA, ISOMAP, LLE, and LE share a common KPCA formulation with different kernel definitions • Kernel matrix vs. Laplacian matrix derived from the similarity matrix • Unsupervised methods only vs. the more general graph-embedding view

  34. Related Works • Out-of-Sample Extension • Brand • Mentioned the concept of graph embedding • Brand’s work can be considered as a special case of our graph embedding

  35. Related Works • Laplacian Eigenmap • Works with only a single graph, i.e., the intrinsic graph, and cannot be used to explain algorithms such as ISOMAP, LLE, and LDA • Some works use a Gaussian function to compute the nonnegative similarity matrix
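For the Gaussian (heat-kernel) weighting mentioned above, a minimal sketch with samples as the columns of X and bandwidth t as a free parameter:

```python
import numpy as np

def gaussian_similarity(X, t=1.0):
    """W_ij = exp(-||x_i - x_j||^2 / t), for samples stored as the columns of X."""
    sq = np.sum(X ** 2, axis=0)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)   # pairwise squared distances
    return np.exp(-np.maximum(d2, 0.0) / t)            # clamp tiny negatives from rounding
```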

  36. Marginal Fisher Analysis • Marginal Fisher Analysis

  37. Marginal Fisher Analysis • Intraclass compactness (intrinsic graph)

  38. Marginal Fisher Analysis • Interclass separability (penalty graph)
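The two graphs on these slides are described in the paper as follows: the intrinsic graph connects each sample to its k1 nearest neighbors of the same class, and the penalty graph connects, for each class, the k2 nearest between-class (marginal) sample pairs. A rough NumPy sketch under that reading, with k1 and k2 as free parameters:

```python
import numpy as np

def mfa_graphs(X, labels, k1=5, k2=20):
    """Build MFA's intrinsic graph W (k1 nearest same-class neighbors) and
    penalty graph Wp (k2 closest between-class pairs per class).
    X holds the samples as columns; labels are integer class labels."""
    labels = np.asarray(labels)
    N = X.shape[1]
    sq = np.sum(X ** 2, axis=0)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)   # pairwise squared distances

    W = np.zeros((N, N))
    Wp = np.zeros((N, N))

    # intrinsic graph: connect each sample to its k1 nearest same-class neighbors
    for i in range(N):
        same = np.where((labels == labels[i]) & (np.arange(N) != i))[0]
        for j in same[np.argsort(d2[i, same])[:k1]]:
            W[i, j] = W[j, i] = 1.0

    # penalty graph: for each class, connect the k2 closest between-class pairs
    for c in np.unique(labels):
        inside = np.where(labels == c)[0]
        outside = np.where(labels != c)[0]
        pair_d = d2[np.ix_(inside, outside)]
        flat = np.argsort(pair_d, axis=None)[:k2]
        rows, cols = np.unravel_index(flat, pair_d.shape)
        for i, j in zip(inside[rows], outside[cols]):
            Wp[i, j] = Wp[j, i] = 1.0

    return W, Wp
```

With these graphs, the marginal Fisher criterion described in the paper minimizes the ratio of the intrinsic-graph form w^T X (D − W) X^T w to the penalty-graph form w^T X (Dp − Wp) X^T w.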

  39. The first step of MFA

  40. The second step of MFA

  41. Marginal Fisher Analysis • Intraclass compactness (intrinsic graph)

  42. Marginal Fisher Analysis • Interclass separability (penalty graph)

  43. The third step of MFA

  44. The fourth step of MFA

  45. LDA vs. MFA • The number of available projection directions in MFA is much greater than that of LDA • There is no assumption on the data distribution of each class • The interclass margin in MFA can better characterize the separability of different classes than the interclass variance in LDA

  46. Kernel MFA • The distance between two samples • For a new data point x, its projection onto the derived optimal direction
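The two missing formulas here are standard kernel-trick expressions rather than anything specific to MFA; a sketch assuming an RBF kernel and expansion coefficients alpha learned on the training set:

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    return np.exp(-gamma * np.sum((a - b) ** 2))

def feature_space_distance(xi, xj, kernel=rbf_kernel):
    """||phi(xi) - phi(xj)|| = sqrt(k(xi,xi) + k(xj,xj) - 2 k(xi,xj))."""
    return np.sqrt(kernel(xi, xi) + kernel(xj, xj) - 2.0 * kernel(xi, xj))

def project_new_point(x, X_train, alpha, kernel=rbf_kernel):
    """Projection of a new sample onto the direction w = sum_i alpha_i phi(x_i):
    y = sum_i alpha_i k(x, x_i).  X_train holds training samples as columns."""
    return sum(a * kernel(x, xi) for a, xi in zip(alpha, X_train.T))
```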

  47. Tensor MFA

  48. Experiments • Face Recognition • XM2VTS, CMU PIE, ORL • A Non-Gaussian Case

  49. Experiments • XM2VTS, PIE-1, PIE-2, ORL

  50. Experiments
