Adaptive Graph Construction and Dimensionality Reduction

Songcan Chen, Lishan Qiao, Limei Zhang

http://parnec.nuaa.edu.cn/

{s.chen, qiaolishan, [email protected]

2009. 11. 06

Outline

- Why construct a graph?
- Typical graph construction
  - Review & Challenges
- Our works
  - (I) Task-independent graph construction (related work: Sparsity Preserving Projections)
  - (II) Task-dependent graph construction (related work: Soft LPP and Entropy-regularized LPP)
- Discussion and Next Work

Adaptive Graph Construction and Dimensionality Reduction Songcan Chen 2009-11

Why construct a graph?

A graph characterizes the geometry of the data (e.g., a manifold) and thus plays an important role in data analysis, including machine learning, for example in dimensionality reduction, semi-supervised learning, and spectral clustering.

Dimensionality reduction

[Figure: data points (Swiss roll), their 4-NN graph, and the resulting 2D embedding]

- Nonlinear manifold learning
  - E.g., Laplacian Eigenmaps, LLE, ISOMAP
- Linearized variants
  - E.g., LPP, NPE, and so on
- (Semi-)supervised and/or tensorized extensions
  - Too numerous to mention one by one

[Figure from [1]: PCA and LDA]

- Many classical DR algorithms
  - E.g., PCA (unsupervised), LDA (supervised)

According to [1], most current dimensionality reduction algorithms can be unified under a graph embedding framework.

[1] S. Yan, D. Xu, B. Zhang, H. Zhang, Q. Yang, S. Lin, Graph embedding and extensions: a general framework for dimensionality reduction. IEEE Trans. Pattern Anal. Mach. Intell., 2007, 29(1): 40-51.

Semi-supervised learning

[Figure: data points with a 4-NN graph; transductive learning (e.g., label propagation) vs. inductive learning (e.g., manifold regularization)]

- Typical graph-based semi-supervised algorithms
  - Local and global consistency
  - Label propagation
  - Manifold regularization
  - …

“Graph is at the heart of the graph-based semi-supervised learning methods” [1].

[1] X. Zhu, Semi-supervised learning literature survey. Technical Report, 2008.

Spectral clustering

[Figure: clustering structure vs. manifold structure]

- Typical graph-based clustering algorithms
  - Graph cut
  - Normalized cut
  - …

“Ncut on a kNN graph does something systematically different than Ncut on an ε-neighborhood graph! … shows that graph clustering criteria cannot be studied independently of the kind of graph they are applied to.” [1]

[1] M. Maier, U. Luxburg, Influence of graph construction on graph-based clustering measures. NIPS, 2008

Summary

- Dimensionality reduction
  - Linear/nonlinear, local/nonlocal, parametric/nonparametric
- Semi-supervised learning
  - Transductive/inductive
- Spectral clustering
  - Clustering structure/manifold structure

A well-designed graph tends to result in good performance [1].

How to construct a good graph?

What is the right graph for a given data set?

[1] S. I. Daitch, J. A. Kelner, D. A. Spielman, Fitting a graph to vector data. ICML, 2009

Generally speaking, despite its importance, “Graph construction has not been studied extensively” [1].

“The way to establish high-quality graphs is still an open problem”[2].

[1] X. Zhu, Semi-supervised learning literature survey. Technical Report, 2008.

[2] W. Liu and S.-F. Chang, Robust Multi-class Transductive Learning with Graphs. CVPR, 2009.

- Fortunately, the graph construction problem has attracted increasing attention, especially this year (2009).
- For example, graph construction by
  - sparse representation [1,2,3], i.e., the l1-graph;
  - minimizing the weighted sum of the squared distances from each vertex to the weighted average of its neighbors [4];
  - b-matching [5];
  - a symmetry-favored criterion, assuming that the graph is doubly stochastic [6];
  - learning the projection transform and the graph weights simultaneously [7].

[1] L. Qiao, S. Chen, X. Tan, Sparsity preserving projections with applications to face recognition. Pattern Recognition, 2009 (received 21 July 2008)

[2] S. Yan, H. Wang, Semi-supervised learning by sparse representation. SDM, 2009

[3] E. Elhamifar, R. Vidal, Sparse subspace clustering. CVPR, 2009

[4] S. I. Daitch, J. A. Kelner, D. A. Spielman, Fitting a graph to vector data. ICML, 2009

[5] T. Jebara, J. Wang, S. Chang, Graph construction and b-matching for semi-supervised learning. ICML, 2009

[6] W. Liu, S.-F. Chang, Robust multi-class transductive learning with graphs. CVPR, 2009

[7] L. Qiao, S. Chen, L. Zhang, A simultaneous learning framework for dimensionality reduction and graph construction. Submitted, 2009

Typical graph construction

Review

[Figure: a basic flow for graph-based machine learning: graph construction → edge weight assignment → learning tasks (dimensionality reduction, spectral clustering, semi-supervised learning, spectral kernel learning, …)]

Two basic characteristics

- Task-independent
- Two steps
  - Graph construction
  - Edge weight assignment

Graph construction

- Two basic criteria
  - k-nearest neighbor criterion (left)
  - ε-ball neighborhood criterion (right)
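The two criteria can be sketched in a few lines of standard-library Python. This is only an illustration under assumptions that the slides do not state: the k-NN graph is symmetrized with an "or" rule, the ε-ball test uses "≤", and the function names and toy points are hypothetical.

```python
import math

def pairwise_distances(points):
    """Euclidean distance matrix for a list of equal-length coordinate tuples."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]

def knn_graph(points, k):
    """k-NN adjacency, symmetrized: i~j if j is among the k nearest
    neighbors of i or i is among the k nearest neighbors of j (assumed rule)."""
    dist = pairwise_distances(points)
    n = len(points)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        for j in others[:k]:
            adj[i][j] = adj[j][i] = 1
    return adj

def epsilon_graph(points, eps):
    """ε-ball adjacency: i~j if the two points are at most eps apart."""
    dist = pairwise_distances(points)
    n = len(points)
    return [[1 if i != j and dist[i][j] <= eps else 0 for j in range(n)]
            for i in range(n)]

# Toy data: three nearby points and one far-away point (hypothetical values).
points = [(0.0, 0.0), (1.0, 0.0), (2.5, 0.0), (10.0, 0.0)]
knn = knn_graph(points, k=1)
ball = epsilon_graph(points, eps=1.6)
print(knn)
print(ball)
```

On this toy set the k-NN rule always links the far-away point to something, while the ε-ball rule can leave it isolated, which is one reason the choice of k or ε matters.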

Edge weight assignment

- Several basic ways
  - Gaussian function (heat kernel)
  - Inverse Euclidean distance
  - Local reconstructive relationship (as in LLE)
  - Non-negative local reconstruction [1]

[1] F. Wang and C. S. Zhang, Label propagation through linear Neighborhoods. NIPS, 2006
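As a rough illustration of the first two weighting schemes, the sketch below assigns weights to the edges of a given adjacency matrix. The kernel parameterization exp(-||xi - xj||² / (2σ²)), the small constant that guards against division by zero, and all names are assumptions made for this example; the reconstruction-based weights (LLE and its non-negative variant) need a small least-squares solve per sample and are not shown.

```python
import math

def heat_kernel_weight(xi, xj, sigma):
    """Gaussian (heat kernel) weight; the 2*sigma^2 scaling is one common convention."""
    d2 = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def inverse_distance_weight(xi, xj, tiny=1e-12):
    """Inverse Euclidean distance; `tiny` avoids division by zero for duplicates."""
    return 1.0 / (math.dist(xi, xj) + tiny)

def weight_matrix(points, adjacency, weight_fn):
    """Apply weight_fn to connected pairs only; unconnected pairs get weight 0."""
    n = len(points)
    return [[weight_fn(points[i], points[j]) if adjacency[i][j] else 0.0
             for j in range(n)] for i in range(n)]

points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # hypothetical chain graph 0-1-2
W_heat = weight_matrix(points, adjacency,
                       lambda a, b: heat_kernel_weight(a, b, sigma=1.0))
W_inv = weight_matrix(points, adjacency, inverse_distance_weight)
print(W_heat)
print(W_inv)
```

Both schemes give closer neighbors larger weights; they differ in how quickly the weight decays with distance and in which parameter (σ or none) has to be chosen.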

Challenges

Conditions assumed by typical graph construction:

- Few degrees of freedom
- Little noise
- Sufficient sampling (abundant samples)
- Smoothness assumption or clustering assumption

However,

- it works well only when these conditions are strictly satisfied;
- in practice, are they?

Challenges ①~⑤

① Tens or hundreds of degrees of freedom

- Recent research [1] estimated that the face subspace has at least 100 dimensions.
- More complex composite objects?

② Noise and other corruptions

[Figure: Euclidean distances between example images: 0.84×10³, 0.92×10³, 1.90×10³]

The locality preserving criterion may not work well in this scenario, especially when only a few training samples are available.

[1] M. Meytlis, L. Sirovich, On the dimensionality of face space. IEEE TPAMI, 2007, 29(7): 1262-1267.

③ Insufficient samples

[Figure: two sets of data points and their kNN graphs]

④ The sensitivity to neighborhood size

[Figure: another example, on the Wine data set, with 15 and with 5 training samples per class]

Also, this illustrates…

In fact, there are no reliable methods for assigning appropriate values to the parameters k and ε in the unsupervised scenario, or when only a few labeled samples are available [1].

[1] D. Y. Zhou, O. Bousquet, T. N. Lal, J. Weston, B. Scholkopf, Learning with local and global consistency. NIPS, 2004

⑤ Others. For example,

- The lingering “curse of dimensionality”
- Fixed neighborhood size
- Independence of subsequent learning tasks

Dimensionality reduction aims mainly at overcoming the “curse of dimensionality”, but unfortunately locality preserving algorithms construct the graph with the nearest neighbor criterion, which itself suffers from this curse. This seems to be a paradox.

Let’s try to address these problems…

Our work (I)

Task-independent graph construction

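[Figure: the basic flow for graph-based machine learning (graph construction → edge weight assignment → learning tasks such as dimensionality reduction, spectral clustering, semi-supervised learning, spectral kernel learning, …), annotated with the parts addressed by our work (I) and our work (II)]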

Motivation

- PCA: simple, but ignores local structure
- LLE: considers locality, but relies on a fixed, artificially defined neighborhood size that is difficult to set

From L0 to L1

[Figure: the solution of the L2-minimization problem (left) and of the L1-minimization problem (right)]

If the solution sought is sparse enough, the solution of the L0-minimization problem is equal to the solution of the L1-minimization problem [1].

[1] D. Donoho, Compressed sensing, IEEE Trans. Inform. Theory, 52(4) (2006) 1289-1306
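The contrast between L2 and L1 minimization can be tried numerically. The sketch below is not the solver used in the talk: it replaces the equality-constrained L1 problem with the penalized (LASSO-type) form and solves it with plain iterative soft-thresholding (ISTA), and compares the result with the minimum-L2-norm solution obtained from the pseudoinverse. The problem sizes, the penalty `lam`, and all names are assumptions chosen for illustration.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t*||.||_1: shrink every entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_objective(A, b, s, lam):
    """0.5*||A s - b||_2^2 + lam*||s||_1."""
    return 0.5 * np.sum((A @ s - b) ** 2) + lam * np.sum(np.abs(s))

def ista_lasso(A, b, lam, n_iter=2000):
    """Minimize the LASSO objective by iterative soft-thresholding."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant of the gradient
    s = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ s - b)
        s = soft_threshold(s - step * grad, step * lam)
    return s

# Underdetermined system whose true solution has only 3 non-zero entries.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50))
s_true = np.zeros(50)
s_true[[3, 17, 41]] = [1.5, -2.0, 1.0]
b = A @ s_true

lam = 0.05
s_l2 = np.linalg.pinv(A) @ b      # minimum-L2-norm solution of A s = b
s_l1 = ista_lasso(A, b, lam)      # L1-penalized solution

print("non-zeros, L2 solution:", np.count_nonzero(s_l2))
print("non-zeros, L1 solution:", np.count_nonzero(s_l1))
```

The L2 solution spreads energy over essentially all coordinates, whereas soft-thresholding sets many coordinates exactly to zero, which is the property exploited when sparse weights are used as graph edges.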

Modeling & Algorithms

① Nonsmooth optimization

- Subgradient-based algorithms [1]
- Also, for p=2, it can be recast as an SOCP

② Quasi LASSO

- p=2: LASSO, many algorithms: LARS… [2]
- p=1: linear programming (see next page)

③ L1-ball constrained optimization [3]

- E.g., SLEP: Sparse Learning with Efficient Projections, http://www.public.asu.edu/~jye02/Software/SLEP/index.htm

[1] Y. Nesterov. Introductory Lectures on Convex Optimization: A Basic Course. Kluwer Academic Publisher, 2003.

[2] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani, Least angle regression. Annals of Statistics, 2004, 32(2): 407-451.

[3] J. Liu, J. Ye, Efficient Euclidean Projections in Linear Time, ICML, 2009

Modeling (Example, p=1)

[Figure: (left) a sub-block of the weight matrix constructed by the above model; (right) the optimal t for 3 different samples (YaleB)]

Prior knowledge can be incorporated into the graph construction process!

Modeling (Example, p=2)

- L1-norm neighborhood and its weights: sparse, adaptive, discriminative, outlier-insensitive
- Conventional k-neighborhood and its weights: puts samples from different classes into one patch

[1] X. Tan, L. Qiao, W. Gao, J. Liu, Robust faces manifold modeling: most expressive vs. most sparse criterion. Subspace 2009 Workshop in conjunction with ICCV 2009, Kyoto, Japan

SPP: Sparsity Preserving Projections

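- The optimal sparse weight vector of each sample describes the sparse reconstructive relationship.
- So, we expect to preserve this relationship in the low-dimensional space.
- More specifically, SPP seeks projections under which each sample is still reconstructed by the same sparse weights.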
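The formulas of this slide did not survive extraction. As a hedged reconstruction, following the formulation in the SPP paper by Qiao, Chen and Tan cited in this talk rather than the slide itself, the model can be written as below. The notation is assumed here: X = [x_1, …, x_n] holds the samples as columns, s_i is the sparse weight vector of x_i (with a zero in position i), S = [s_1, …, s_n] collects these vectors as columns, and w is a projection direction.

```latex
% Sparse reconstruction weights, one problem per sample x_i:
\min_{s_i} \ \|s_i\|_1
\quad \text{s.t.} \quad x_i = X s_i, \quad \mathbf{1}^{\top} s_i = 1

% SPP: preserve the sparse reconstruction after projection
\min_{w} \ \sum_{i=1}^{n} \left\| w^{\top} x_i - w^{\top} X s_i \right\|^2
\quad \text{s.t.} \quad w^{\top} X X^{\top} w = 1

% Expanding the sum with S = [s_1, \dots, s_n]:
\sum_{i=1}^{n} \left\| w^{\top} x_i - w^{\top} X s_i \right\|^2
  = w^{\top} X (I - S)(I - S)^{\top} X^{\top} w
  = w^{\top} X \left( I - S - S^{\top} + S S^{\top} \right) X^{\top} w

% Under the constraint, the minimization is therefore equivalent to
\max_{w} \ \frac{ w^{\top} X \left( S + S^{\top} - S S^{\top} \right) X^{\top} w }
                { w^{\top} X X^{\top} w }
```

With this column convention the last problem reduces to a generalized eigenproblem, X (S + Sᵀ − S Sᵀ) Xᵀ w = λ X Xᵀ w, whose leading eigenvectors give the projection directions. If S is instead defined with the weight vectors as rows, the product S Sᵀ becomes Sᵀ S.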

Experiments: Toy

[Figure: the toy data and their 1D images under 4 different DR algorithms (PCA, LPP, NPE, SPP); panels annotated “Insufficient sampling” and “Additional prior”]

Experiments: Wine

Wine data set from UCI: 178 samples, 3 classes, 13 features

[Table: the basic statistics of the Wine data set]

[Figure: the 2D projections of the Wine data set by PCA, LPP, NPE, and SPP]

Experiments: Face

[Figure: the Yale, AR, and Extended Yale B face databases]

[Figure: panels for Yale, AR_Fixed, AR_Random, and Extended YaleB]

Related works

[Figure: the basic flow for graph-based machine learning (graph construction → edge weight assignment → learning tasks), with sparse-representation-based graphs applied to dimensionality reduction [1] (our work (I)), spectral clustering [2], and semi-supervised learning [3]]

Other extensions? From graph to data-dependent regularization, …

[1] L. Qiao, S. Chen, X. Tan, Sparsity preserving projections with applications to face recognition. Pattern Recognition, 2009 (received 21 July 2008)

[2] E. Elhamifar, R. Vidal, Sparse subspace clustering. CVPR, 2009

[3] S. Yan, H. Wang, Semi-supervised learning by sparse representation. SDM, 2009

Extensions

Semi-supervised classification

Semi-supervised dimensionality reduction

SPDA: Sparsity Preserving Discriminant Analysis

- Applied to the face recognition problem with a single labeled sample
- Compared with supervised LDA, unsupervised SPP, and semi-supervised SDA
- E1: 1 labeled and 2 unlabeled samples
- E2: 1 labeled and 30 unlabeled samples

Summary

- Adaptive “neighborhood” size
- Simpler parameter selection
- Fewer training samples needed
- Easier incorporation of prior knowledge (not so insensitive to noise)
- Stronger discriminating power
- Higher computational cost

Our work (II)

Task-dependent graph construction

[Figure: the basic flow for graph-based machine learning (graph construction → edge weight assignment → learning tasks such as dimensionality reduction, spectral clustering, semi-supervised learning, spectral kernel learning, …), annotated with the parts addressed by our work (I) and our work (II)]

- Task-independent graph construction
  - Advantage: applicable to any graph-based learning task
  - Disadvantage: does not necessarily help the subsequent learning task

Can we unify graph construction and the learning task? How?

Motivation (Cont’d)

Furthermore, taking LPP as an example:

- Step 1: Graph construction (k-nearest neighbor criterion)
- Step 2: Edge weight assignment
- Step 3: Learning the projection directions
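The three steps can be put together in a compact NumPy sketch of standard LPP. It is an illustration under assumptions, not the authors' code: samples are stored as rows (so XᵀLX here corresponds to XLXᵀ in column notation), the k-NN graph is symmetrized with an "or" rule, the heat kernel uses 2σ², a small ridge keeps the constraint matrix positive definite, and the function name and parameters are hypothetical.

```python
import numpy as np

def lpp(X, k=5, sigma=1.0, n_components=2, ridge=1e-8):
    """X: (n_samples, n_features). Returns (W, eigenvalues), W of shape
    (n_features, n_components)."""
    n, d = X.shape
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)  # squared distances

    # Step 1: graph construction with the k-nearest-neighbor criterion.
    ranked = sq.copy()
    np.fill_diagonal(ranked, np.inf)               # a point is not its own neighbor
    nearest = np.argsort(ranked, axis=1)[:, :k]
    adj = np.zeros((n, n), dtype=bool)
    adj[np.repeat(np.arange(n), k), nearest.ravel()] = True
    adj = adj | adj.T                              # symmetrize

    # Step 2: edge weight assignment with the heat kernel.
    S = np.where(adj, np.exp(-sq / (2.0 * sigma ** 2)), 0.0)

    # Step 3: projection directions from the generalized eigenproblem
    #         (X^T L X) w = lambda (X^T D X) w, smallest eigenvalues first.
    D = np.diag(S.sum(axis=1))
    L = D - S
    A = X.T @ L @ X
    B = X.T @ D @ X + ridge * np.eye(d)
    C = np.linalg.cholesky(B)                      # B = C C^T
    C_inv = np.linalg.inv(C)
    M = C_inv @ A @ C_inv.T                        # symmetric standard eigenproblem
    eigvals, eigvecs = np.linalg.eigh((M + M.T) / 2.0)
    W = C_inv.T @ eigvecs[:, :n_components]
    return W, eigvals[:n_components]

rng = np.random.default_rng(0)
X = rng.standard_normal((40, 4))                   # hypothetical toy data
W, eigvals = lpp(X, k=5, sigma=1.0, n_components=2)
Y = X @ W                                          # low-dimensional representation
print(W.shape, Y.shape)
```

Note that the graph (steps 1 and 2) is fixed before the projection is learned in step 3 and is never revisited.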

[Figure: results with 5 and with 15 training samples per class]

- In LPP, the “local geometry” is completely determined by the artificially pre-fixed neighborhood graph.
- As a result, its performance may drop seriously when it is given a “bad” graph. Unfortunately, it is generally hard to judge in advance whether a graph is good or not, especially in the unsupervised scenario.
- So, we expect the graph to be adjustable.

How to adjust it?

A natural question: can we obtain more locality preserving power or discriminating power by minimizing the objective function further?

The key: how to characterize such power formally!

- LPP seeks a low-dimensional representation that preserves the local geometry of the original data.
- Locality preserving power is potentially related to discriminating power [1].
- Locality preserving power is described by the minimized value of the objective function.

Our idea: optimize the graph and learn the projections simultaneously in a unified objective function.

[1] D. Cai, X. F. He, J. W. Han, and H. J. Zhang, Orthogonal laplacianfaces for face recognition. IEEE Transactions on Image Processing, 2006, 15(11): 3608-3614.

Modeling: SLPP

From LPP to Soft LPP (SLPP or SLAPP), four changes:

① Regard the graph S_ij as a new optimization variable, i.e., the graph is adjustable instead of pre-fixed. Also, note that we do not constrain S_ij to be symmetric.

② m (>1) is a new parameter which controls the uncertainty of S_ij and helps us obtain a closed-form solution. Without it, we would get a singular solution in which only one element in each row of S is 1 and all other elements are zero.

③ New constraints aim to avoid a degenerate solution and provide a natural probabilistic interpretation of the graph.

④ Remove d_ii from this constraint, mainly to make the optimization tractable.

Algorithm

- The problem is non-convex with respect to (W, S).
- Solve it by alternating optimization.
- Fortunately, each step has a closed-form solution.
  - Step 1: Calculate W by solving a generalized eigenproblem
  - Step 2: Update the graph; the update is a normalized inverse Euclidean distance!
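The graph update of Step 2 can be sketched as follows. The slide's formula is missing, so the sketch assumes that, for a fixed projection, each row of S minimizes Σ_j S_ij^m d_ij subject to Σ_j S_ij = 1 and S_ij ≥ 0, where d_ij is the squared distance between the projected samples. Setting the derivative of the Lagrangian to zero then gives S_ij proportional to d_ij^(-1/(m-1)). Excluding self-loops and adding a small constant to avoid division by zero are additional assumptions, and the function name is hypothetical.

```python
def slpp_graph_update(Y, m=2.0, tiny=1e-12):
    """Y: list of projected samples y_i = W^T x_i (tuples of floats).
    Returns a row-stochastic matrix S with
    S_ij proportional to (1 / d_ij)^(1 / (m - 1)), j != i."""
    n = len(Y)
    power = 1.0 / (m - 1.0)
    S = []
    for i in range(n):
        inv = [0.0] * n
        for j in range(n):
            if j != i:  # assumed: no self-loops
                d2 = sum((a - b) ** 2 for a, b in zip(Y[i], Y[j]))
                inv[j] = (1.0 / (d2 + tiny)) ** power
        total = sum(inv)
        S.append([v / total for v in inv])
    return S

Y = [(0.0,), (1.0,), (3.0,)]   # hypothetical 1D projections
S = slpp_graph_update(Y, m=2.0)
for row in S:
    print(row)
```

Each row sums to one, which matches the probabilistic reading of the graph, and the result is generally not symmetric because every row is normalized separately. As m approaches 1 the exponent grows and each row concentrates on the single nearest sample, which is the singular solution mentioned for the case without m.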

Modeling: ELPP

ELPP: Entropy-regularized LPP

The graph update is a normalized heat kernel!
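A hedged sketch of why a normalized heat kernel appears: assume that, for a fixed projection, each row of S minimizes Σ_j S_ij d_ij + γ Σ_j S_ij ln S_ij subject to Σ_j S_ij = 1, where d_ij is the squared distance between projected samples and γ > 0 weights the entropy term. Setting the derivative of the Lagrangian to zero gives S_ij proportional to exp(-d_ij / γ). The objective, the symbol γ, the exclusion of self-loops, and the function name are assumptions, since the slide's formulas are not available.

```python
import math

def elpp_graph_update(Y, gamma=1.0):
    """Y: list of projected samples y_i = W^T x_i (tuples of floats).
    Returns a row-stochastic matrix S with S_ij proportional to
    exp(-d_ij / gamma), j != i."""
    n = len(Y)
    S = []
    for i in range(n):
        d2 = [sum((a - b) ** 2 for a, b in zip(Y[i], Y[j])) for j in range(n)]
        others = [j for j in range(n) if j != i]   # assumed: no self-loops
        d_min = min(d2[j] for j in others)         # shift for numerical stability
        row = [0.0] * n
        for j in others:
            row[j] = math.exp(-(d2[j] - d_min) / gamma)
        total = sum(row)
        S.append([v / total for v in row])
    return S

Y = [(0.0,), (1.0,), (3.0,)]                       # hypothetical 1D projections
S_sharp = elpp_graph_update(Y, gamma=1.0)
S_flat = elpp_graph_update(Y, gamma=100.0)
print(S_sharp[0])
print(S_flat[0])
```

Under this assumed objective γ plays the role of the heat kernel width: a small γ concentrates each row on the nearest samples, while a large γ pushes the row toward a uniform distribution.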

Convergence

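The objective value does not increase under the alternating updates and is bounded below, so the sequence of objective values converges (Cauchy’s convergence criterion).

The procedure can be viewed as block-coordinate (gradient) descent!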

Experiments: Wine

[Figure: results on the Wine data set for LPP and for SLPP(1), SLPP(3), SLPP(5), SLPP(7), SLPP(9)]

Classification experiments with different initialized graphs

[Figure: classification experiments with different initialized graphs, here with random initialization of the weight matrix]

The above experiments illustrate that the graph may become better and better as it is gradually updated.

Experiments: Face

AR database

Yale B database

PIE database

Summary

- Inherits some good characteristics naturally;
- Provides a seamless link between dimensionality reduction and graph construction;
- Is quite insensitive to initial parameters such as the neighborhood size k and the heat kernel width σ;
- The learned graph can potentially be used in other graph-based learning algorithms, even though the graph learning is task-dependent, since DR itself is also a preprocessing step.

Discussion

- Graph construction is a common and important problem in many machine learning algorithms.
- There is no general way to establish high-quality graphs.
- Graph construction is based on different motivations, priors, or assumptions.
- Constructing graphs that rely on practical applications may be an interesting research topic.

Ongoing and Next Works

- For SPP
  - Fast sparse representation algorithm (by exploiting the data structure?)
  - Verify the effectiveness of the constructed graph for other graph-based learning algorithms (e.g., manifold regularization)
- For SLPP and ELPP
  - Semi-supervised extensions

- From Shannon entropy regularization to generalized entropy regularization. Thus, we can potentially obtain different graph update formulas, for example ones that are neither inverse Euclidean distances nor heat kernels.
- Apply the learned graph to other graph-based learning algorithms (e.g., manifold regularization)

- General simultaneous learning framework for DR (and other learning tasks)

[Flowchart: input training samples X → construct graph G{X,S} → learn projection W → stop condition? (N: update samples X = WᵀX and construct the graph again; Y: output G, W or X)]

Thanks!

Q&A
