Nonlinear Dimensionality Reduction

Nonlinear Dimensionality Reduction Presented by Dragana Veljkovic

Overview • Curse-of-dimensionality • Dimension reduction techniques • Isomap • Locally linear embedding (LLE) • Problems and improvements

Problem description • Large amount of data being collected leads to creation of very large databases • Most problems in data mining involve data with a large number of measurements (or dimensions) • E.g. Protein matching, fingerprint recognition, meteorological predictions, satellite image repositories • Reducing dimensions increases capability of extracting knowledge

Problem definition • Original high dimensional data: X = (x1, …, xn) where xi=(xi1,…,xip)T • underlying low dimensional data: Y = (y1, …, yn) where yi=(yi1,…,yiq)T and q<<p • Assume X forms a smooth low dimensional manifold in high dimensional space • Find the mapping that captures the important features • Determine q that can best describe the data

Different approaches • Local or Shape preserving • Global or Topology preserving • Local embeddings • Local – simplify representation of each object regardless of the rest of the data • Features selected retain most of the information • Fourier decomposition, wavelet decomposition, piecewise constant approximation, etc.

Global or Topology preserving • Mostly used for visualization and classification • PCA or KL decomposition • MDS • SVD • ICA

Local embeddings (LE) • Overlapping local neighborhoods, collectively analyzed, can provide information on global geometry • LE preserves the local neighborhood of each object • preserving the global distances through the non-neighboring objects • Isomap and LLE

Another classification • Linear and Non Linear methods

Neighborhood Two ways to select neighboring objects: • k nearest neighbors (k-NN) – can make non-uniform neighbor distance across the dataset • ε-ball – prior knowledge of the data is needed to make reasonable neighborhoods; size of neighborhood can vary

Isomap – general idea • Only geodesic distances reflect the true low dimensional geometry of the manifold • MDS and PCA see only Euclidian distances and there for fail to detect intrinsic low-dimensional structure • Geodesic distances are hard to compute even if you know the manifold • In a small neighborhood Euclidian distance is a good approximation of the geodesic distance • For faraway points, geodesic distance is approximated by adding up a sequence of “short hops” between neighboring points

Isomap algorithm • Find neighborhood of each object by computing distances between all pairs of points and selecting closest • Build a graph with a node for each object and an edge between neighboring points. Euclidian distance between two objects is used as edge weight • Use a shortest path graph algorithm to fill in distance between all non-neighboring points • Apply classical MDS on this distance matrix

Isomap

Isomap on face images

Isomap on hand images

Isomap on written two-s

Isomap - summary • Inherits features of MDS and PCA: • guaranteed asymptotic convergence to true structure • Polynomial runtime • Non-iterative • Ability to discover manifolds of arbitrary dimensionality • Perform well when data is from a single well sampled cluster • Few free parameters • Good theoretical base for its metrics preserving properties

Problems with Isomap • Embeddings are biased to preserve the separation of faraway points, which can lead to distortion of local geometry • Fails to nicely project data spread among multiple clusters • Well-conditioned algorithm but computationally expensive for large datasets

Improvements to Isomap • Conformal Isomap – capable of learning the structure of certain curved manifolds • Landmark Isomap – approximates large global computations by a much smaller set of calculation • Reconstruct distances using k/2 closest objects, as well as k/2 farthest objects

Locally Linear Embedding (LLE) • Isomap attempts to preserve geometry on all scales, mapping nearby points close and distant points far away from each other • LLE attempts to preserve local geometry of the data by mapping nearby points on the manifold to nearby points in the low dimensional space • Computational efficiency • Representational capacity

LLE – general idea • Locally, on a fine enough scale, everything looks linear • Represent object as linear combination of its neighbors • Representation indifferent to affine transformation • Assumption: same linear representation will hold in the low dimensional space

LLE – matrix representation X = W*X where • X is p*n matrix of original data • W is n*n matrix of weights and • Wij =0 if Xj is not neighbor of Xi • rows of W sum to one Need to solve system Y = W*Y • Y is q*n matrix of underlying low dimensional data • Minimize error:

LLE - algorithm • Find k nearest neighbors in X space • Solve for reconstruction weights W • Compute embedding coordinates Y using weights W: • create sparse matrix M = (I-W)'*(I-W) • Compute bottom q+1 eigenvectors of M • Set i-th row of Y to be i+1 smallest eigen vector

Numerical Issues • Covariance matrix used to compute W can be ill-conditioned, regularization needs to be used • Small eigen values are subject to numerical precision errors and to getting mixed • But, sparse matrices used in this algorithm make it much faster then Isomap

LLE

LLE – effect of neighborhood size

LLE – with face picture

LLE – Lips pictures

PCA vs. LLE

Problems with LLE • If data is noisy, sparse or weakly connected coupling between faraway points can be attenuated • Most common failure of LLE is mapping close points that are faraway in original space – arising often if manifold is undersampled • Output strongly depends on selection of k

References • Roweis, S. T. and L. K. Saul (2000). "Nonlinear dimensionality reduction by locally linear embedding " Science290(5500): 2323-2326. • Tenenbaum, J. B., V. de Silva, et al. (2000). "A global geometric framework for nonlinear dimensionality reduction " Science290(5500): 2319-2323. • Vlachos, M., C. Domeniconi, et al. (2002). "Non-linear dimensionality reduction techniques for classification and visualization." Proc. of 8th SIGKDD, Edmonton, Canada. • de Silva, V. and Tenenbaum, J. (2003). “Local versus global methods for nonlinear dimensionality reduction”, Advances in Neural Information Processing Systems,15.

Questions?

Nonlinear Dimensionality Reduction