Non-linear Dimensionality Reduction
CMPUT 466/551
Nilanjan Ray

Prepared on materials from the book
Non-linear Dimensionality Reduction by Lee and Verleysen, Springer, 2007

Agenda
What is dimensionality reduction?
Linear methods: principal components analysis
Let’s say the points yi are mapped to xi, i=1,2,…,N.
Distance-preserving methods try to preserve pairwise distances, i.e.,
d(yi, yj) = d(xi, xj), or pairwise dot products, <yi, yj> = <xi, xj>.
What is a distance?
Nondegeneracy: d(a, b) = 0 if and only if a = b
Triangle inequality: for any three points a, b, and c, d(a, b) ≤ d(c, a) + d(c, b)
The other two properties, nonnegativity and symmetry, follow from these two (see the derivation below)
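The slide states this claim without proof; here is a short check using only the two axioms above:

\[ d(a, b) \le d(b, a) + d(b, b) = d(b, a) \quad (\text{take } c = b), \]

and swapping the roles of a and b gives d(b, a) ≤ d(a, b), hence symmetry. For nonnegativity, take b = a:

\[ 0 = d(a, a) \le d(c, a) + d(c, a) = 2\, d(c, a). \]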
A multidimensional scaling (MDS) method is a linear generative model like PCA:
y = W x
y's are the d-dimensional observed variables and x's are the p-dimensional latent variables
W is a d-by-p matrix with orthonormal columns, i.e., W^T W = I
So, dot products are preserved. How about Euclidean distances?
So, Euclidean distances are preserved too!
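Spelled out, using the model y = W x with W^T W = I:

\[ \langle y_i, y_j \rangle = (W x_i)^\top (W x_j) = x_i^\top W^\top W x_j = \langle x_i, x_j \rangle, \]
\[ \| y_i - y_j \|^2 = (x_i - x_j)^\top W^\top W (x_i - x_j) = \| x_i - x_j \|^2. \]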
Center the data matrix Y; then compute the dot product matrix S = Y^T Y
If the data matrix is not available and only the distance matrix D is available, do
double centering to form the scalar product matrix:
S = -(1/2) H D² H, where D² holds the squared pairwise distances and H = I - (1/N) 1 1^T is the centering matrix
Compute the eigenvalue decomposition S = U Λ U^T
Construct the p-dimensional representation as:
X = Λp^(1/2) Up^T (the p leading eigenvectors Up, scaled by the square roots of their eigenvalues)
Metric MDS is actually PCA and is a linear method
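A minimal numpy sketch of the four steps above (the function name is mine, and D is assumed to hold plain, not squared, distances):

import numpy as np

def metric_mds(D, p=2):
    """Classical (metric) MDS from an N x N matrix of pairwise
    Euclidean distances D; returns an N x p embedding."""
    N = D.shape[0]
    H = np.eye(N) - np.ones((N, N)) / N          # centering matrix
    S = -0.5 * H @ (D ** 2) @ H                  # double centering -> Gram matrix
    vals, vecs = np.linalg.eigh(S)               # S = U Lambda U^T
    idx = np.argsort(vals)[::-1][:p]             # p leading eigenpairs
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))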
NLM (Sammon's non-linear mapping) minimizes the energy function:
E = (1/c) Σi<j ( d(yi, yj) − d(xi, xj) )² / d(yi, yj), where c = Σi<j d(yi, yj)
Start with initial x’s
Update x's by the quasi-Newton step:
xk,i ← xk,i − α (∂E/∂xk,i) / |∂²E/∂xk,i²|
xk,i is the kth component of vector xi
Step sizes α between 0.3 and 0.4 seem to be better in practice
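A runnable sketch of NLM; for brevity it uses plain gradient steps instead of the quasi-Newton rule above, and the iteration count and random initialization are illustrative choices:

import numpy as np

def sammon(Y, p=2, iters=500, alpha=0.3, eps=1e-12):
    """Sammon's NLM sketch: Y is an N x d data matrix; returns an N x p embedding."""
    N = Y.shape[0]
    Dy = np.linalg.norm(Y[:, None] - Y[None, :], axis=-1)    # input-space distances
    c = Dy[np.triu_indices(N, 1)].sum()                      # normalizing constant
    X = np.random.default_rng(0).normal(size=(N, p)) * 1e-2  # initial x's
    for _ in range(iters):
        diff = X[:, None] - X[None, :]                       # x_i - x_j, N x N x p
        Dx = np.linalg.norm(diff, axis=-1) + np.eye(N)       # output distances (diag padded)
        ratio = (Dy - Dx) / (Dy * Dx + eps)                  # per-pair error weight
        np.fill_diagonal(ratio, 0.0)
        grad = (-2.0 / c) * (ratio[..., None] * diff).sum(axis=1)  # dE/dx_i
        X -= alpha * grad                                    # gradient step
    return X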
ISOMAP = MDS with graph distance
One needs to decide how the graph is constructed: who is the neighbor of whom
The K-closest rule or the ε-distance rule can build such a graph
Closely related to the MDS algorithm
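Putting the pieces together, assuming the neighborhood graph is connected: build a K-closest-rule graph, compute graph (geodesic) distances by shortest paths, then run the metric_mds sketch from above (the scipy helpers are my choice):

import numpy as np
from scipy.sparse.csgraph import shortest_path
from scipy.spatial.distance import pdist, squareform

def isomap(Y, p=2, K=7):
    """ISOMAP sketch: MDS applied to graph distances of a K-nearest-neighbor
    graph. Reuses metric_mds defined above."""
    D = squareform(pdist(Y))                     # Euclidean distances, N x N
    N = D.shape[0]
    G = np.full((N, N), np.inf)                  # inf = no edge
    nbrs = np.argsort(D, axis=1)[:, 1:K + 1]     # K closest rule (skip self)
    for i in range(N):
        G[i, nbrs[i]] = D[i, nbrs[i]]
    G = np.minimum(G, G.T)                       # symmetrize the graph
    Dg = shortest_path(G, method='D')            # geodesic distances (Dijkstra)
    return metric_mds(Dg, p)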
KPCA using Gaussian kernel
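The slide only names the method, so here is a minimal kernel PCA sketch with a Gaussian kernel (the function name and the width sigma are my choices):

import numpy as np
from scipy.spatial.distance import pdist, squareform

def kpca_gaussian(Y, p=2, sigma=1.0):
    """Kernel PCA sketch: eigendecomposition of a double-centered
    Gaussian-kernel matrix, as in metric MDS."""
    D2 = squareform(pdist(Y)) ** 2
    K = np.exp(-D2 / (2.0 * sigma ** 2))         # Gaussian (RBF) kernel matrix
    N = K.shape[0]
    H = np.eye(N) - np.ones((N, N)) / N
    Kc = H @ K @ H                               # centering in feature space
    vals, vecs = np.linalg.eigh(Kc)
    idx = np.argsort(vals)[::-1][:p]
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))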
Step 1: Define a 2D lattice indexed by (l, k): l, k = 1, 2, …, K.
Step 2: For a set of data vectors yi, i=1,2,…,N, find a set of prototypes m(l, k). Note that by this indexing (l, k), the prototypes are mapped to the 2D lattice.
Step 3: Iterate over the data points yi: find the prototype m(l*, k*) closest to yi, then move that prototype and its lattice neighbors toward yi:
m(l, k) ← m(l, k) + α h((l, k), (l*, k*)) (yi − m(l, k))
(prepared from [HTF] book)
The neighborhood weight h((l, k), (l*, k*)) can be a hard threshold function:
h = 1 if the lattice distance between (l, k) and (l*, k*) is at most a radius r, and h = 0 otherwise
Or, a soft threshold function, e.g. a Gaussian:
h = exp( −||(l, k) − (l*, k*)||² / (2σ²) )
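A compact sketch of Steps 1-3 with the soft Gaussian neighborhood (lattice size, learning rate, and neighborhood width are illustrative choices, not from the slides):

import numpy as np

def som(Y, K=10, iters=20, alpha=0.1, sigma=1.5):
    """Self-organizing map sketch: fits a K x K lattice of prototypes to data Y (N x d)."""
    rng = np.random.default_rng(0)
    N, d = Y.shape
    M = rng.normal(size=(K, K, d))               # Steps 1-2: prototypes m(l, k)
    L = np.stack(np.meshgrid(np.arange(K), np.arange(K), indexing='ij'), axis=-1)
    for _ in range(iters):
        for i in rng.permutation(N):             # Step 3: iterate over the data
            dist = np.linalg.norm(M - Y[i], axis=-1)
            lk = np.unravel_index(np.argmin(dist), (K, K))   # closest prototype (l*, k*)
            h = np.exp(-((L - np.array(lk)) ** 2).sum(-1) / (2 * sigma ** 2))
            M += alpha * h[..., None] * (Y[i] - M)           # pull neighbors toward y_i
    return M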
Each data point yi is a local linear combination of its neighbors:
E(W) = Σi || yi − Σj wij yj ||²
Neighborhood of yi: determined by a graph
Constraints on wij: wij = 0 if yj is not a neighbor of yi, and Σj wij = 1 for each i
LLE first computes the matrix W by minimizing E. Then it assumes that in the low
dimensions the same local linear combinations hold:
F(X) = Σi || xi − Σj wij xj ||²
So, it minimizes F with respect to the x's (with W held fixed): this gives the low-dimensional mapping!
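A sketch of both minimizations (the covariance regularization and K-closest neighborhoods are standard practical choices, not from the slides):

import numpy as np
from scipy.spatial.distance import pdist, squareform

def lle(Y, p=2, K=7, reg=1e-3):
    """LLE sketch: Y is N x d. Step 1 solves for W row by row; step 2
    minimizes F(X) via the bottom eigenvectors of (I - W)^T (I - W)."""
    N = Y.shape[0]
    nbrs = np.argsort(squareform(pdist(Y)), axis=1)[:, 1:K + 1]
    W = np.zeros((N, N))
    for i in range(N):
        Z = Y[nbrs[i]] - Y[i]                    # neighbors, centered on y_i
        C = Z @ Z.T                              # local covariance
        C += reg * np.trace(C) * np.eye(K)       # regularize for stability
        w = np.linalg.solve(C, np.ones(K))
        W[i, nbrs[i]] = w / w.sum()              # enforce sum_j w_ij = 1
    M = (np.eye(N) - W).T @ (np.eye(N) - W)
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, 1:p + 1]                      # skip the constant eigenvector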
Let’s visit: http://www.cs.toronto.edu/~roweis/lle/