
Nonlinear Dimensionality Reduction



Presentation Transcript


  1. Nonlinear Dimensionality Reduction Presented by Dragana Veljkovic

  2. Overview • Curse-of-dimensionality • Dimension reduction techniques • Isomap • Locally linear embedding (LLE) • Problems and improvements

  3. Problem description • The large amounts of data being collected lead to very large databases • Most problems in data mining involve data with a large number of measurements (or dimensions) • E.g. protein matching, fingerprint recognition, meteorological prediction, satellite image repositories • Reducing the number of dimensions increases our ability to extract knowledge

  4. Problem definition • Original high dimensional data: X = (x1, …, xn) where xi = (xi1, …, xip)T • Underlying low dimensional data: Y = (y1, …, yn) where yi = (yi1, …, yiq)T and q << p • Assume X forms a smooth low dimensional manifold in high dimensional space • Find the mapping that captures the important features • Determine the q that best describes the data
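For a concrete picture (not from the slides), a minimal sketch assuming scikit-learn is available: the classic "swiss roll" data set is a q = 2 dimensional manifold observed in p = 3 dimensions.

```python
# Illustrative only: generate a 2-D manifold embedded in 3-D space.
from sklearn.datasets import make_swiss_roll

# X is the observed high-dimensional data (n x p); t is the intrinsic coordinate
# along the roll, i.e. part of the underlying low-dimensional description Y.
X, t = make_swiss_roll(n_samples=1000, noise=0.05, random_state=0)
print(X.shape)  # (1000, 3) -- n = 1000 points, p = 3 measured dimensions
```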

  5. Different approaches • Local or shape preserving • Global or topology preserving • Local embeddings • Local methods simplify the representation of each object regardless of the rest of the data • The selected features retain most of the information • Examples: Fourier decomposition, wavelet decomposition, piecewise constant approximation, etc.

  6. Global or Topology preserving • Mostly used for visualization and classification • PCA or KL decomposition • MDS • SVD • ICA

  7. Local embeddings (LE) • Overlapping local neighborhoods, collectively analyzed, can provide information on global geometry • LE preserves the local neighborhood of each object, while global distances are preserved indirectly through the non-neighboring objects • Examples: Isomap and LLE

  8. Another classification • Linear and Non Linear methods

  9. Neighborhood Two ways to select neighboring objects: • k nearest neighbors (k-NN) – can produce non-uniform neighbor distances across the dataset • ε-ball – prior knowledge of the data is needed to choose a reasonable ε; the size of each neighborhood can vary
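A small sketch of the two neighborhood choices, assuming scikit-learn is available (the data and the radius eps are made up for illustration):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))      # toy data: n = 200 points in p = 3 dimensions

# k-NN: every point gets exactly k neighbors, but the radius those neighbors
# span varies across the dataset (non-uniform neighbor distances).
knn = NearestNeighbors(n_neighbors=8).fit(X)
knn_graph = knn.kneighbors_graph(mode='distance')

# epsilon-ball: a fixed radius chosen from prior knowledge of the data scale;
# the number of neighbors per point can vary (and may even be zero).
eps = 0.7                          # hypothetical radius for this toy data
ball = NearestNeighbors(radius=eps).fit(X)
ball_graph = ball.radius_neighbors_graph(mode='distance')
```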

  10. Isomap – general idea • Only geodesic distances reflect the true low dimensional geometry of the manifold • MDS and PCA see only Euclidean distances and therefore fail to detect intrinsic low-dimensional structure • Geodesic distances are hard to compute even if you know the manifold • In a small neighborhood the Euclidean distance is a good approximation of the geodesic distance • For faraway points, the geodesic distance is approximated by adding up a sequence of “short hops” between neighboring points

  11. Isomap algorithm • Find the neighborhood of each object by computing distances between all pairs of points and selecting the closest ones • Build a graph with a node for each object and an edge between neighboring points; the Euclidean distance between two objects is used as the edge weight • Use a shortest-path graph algorithm to fill in the distances between all non-neighboring points • Apply classical MDS to this distance matrix
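A compact, illustrative sketch of these four steps, assuming NumPy, SciPy, and scikit-learn; it is not the authors' reference implementation:

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import shortest_path

def isomap(X, n_neighbors=8, n_components=2):
    # 1-2. Neighborhood graph with Euclidean edge weights.
    graph = kneighbors_graph(X, n_neighbors, mode='distance')
    # 3. Geodesic distances approximated by shortest paths in the graph
    #    (a disconnected graph would leave some entries infinite).
    D = shortest_path(graph, method='D', directed=False)
    # 4. Classical MDS on the geodesic distance matrix:
    #    double-center the squared distances, then take the top eigenvectors.
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J
    eigvals, eigvecs = np.linalg.eigh(B)
    idx = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, idx] * np.sqrt(np.maximum(eigvals[idx], 0))

# Y = isomap(X, n_neighbors=8, n_components=2)   # low-dimensional embedding
```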

  12. Isomap

  13. Isomap on face images

  14. Isomap on hand images

  15. Isomap on handwritten twos

  16. Isomap - summary • Inherits features of MDS and PCA: • guaranteed asymptotic convergence to the true structure • Polynomial runtime • Non-iterative • Ability to discover manifolds of arbitrary dimensionality • Performs well when the data comes from a single, well-sampled cluster • Few free parameters • Good theoretical basis for its metric-preserving properties

  17. Problems with Isomap • Embeddings are biased to preserve the separation of faraway points, which can lead to distortion of local geometry • Fails to nicely project data spread among multiple clusters • Well-conditioned algorithm but computationally expensive for large datasets

  18. Improvements to Isomap • Conformal Isomap – capable of learning the structure of certain curved manifolds • Landmark Isomap – approximates large global computations by a much smaller set of calculations • Reconstruct distances using the k/2 closest objects as well as the k/2 farthest objects

  19. Locally Linear Embedding (LLE) • Isomap attempts to preserve geometry on all scales, mapping nearby points close and distant points far away from each other • LLE attempts to preserve local geometry of the data by mapping nearby points on the manifold to nearby points in the low dimensional space • Computational efficiency • Representational capacity

  20. LLE – general idea • Locally, on a fine enough scale, everything looks linear • Represent object as linear combination of its neighbors • Representation indifferent to affine transformation • Assumption: same linear representation will hold in the low dimensional space

  21. LLE – matrix representation X ≈ W*X where • X is the n*p matrix of original data (one object per row) • W is the n*n matrix of weights, with • Wij = 0 if xj is not a neighbor of xi • rows of W summing to one • Weights are chosen to minimize the reconstruction error ε(W) = Σi ‖xi − Σj Wij xj‖² Then solve the system Y ≈ W*Y • Y is the n*q matrix of underlying low dimensional data • Minimize the embedding error Φ(Y) = Σi ‖yi − Σj Wij yj‖²
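A sketch of how the weights for a single point can be computed under these constraints (NumPy assumed, names are illustrative; the regularization term anticipates slide 23):

```python
import numpy as np

def reconstruction_weights(x, neighbors, reg=1e-3):
    """Weights w minimizing |x - sum_j w_j * neighbor_j|^2 subject to sum(w) = 1."""
    Z = neighbors - x                          # shift neighbors so x is the origin
    C = Z @ Z.T                                # local covariance (k x k)
    C += reg * np.trace(C) * np.eye(len(C))    # regularize ill-conditioned C
    w = np.linalg.solve(C, np.ones(len(C)))    # solve C w = 1
    return w / w.sum()                         # enforce the sum-to-one constraint
```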

  22. LLE - algorithm • Find the k nearest neighbors of each point in X space • Solve for the reconstruction weights W • Compute the embedding coordinates Y using the weights W: • create the sparse matrix M = (I-W)'*(I-W) • compute the bottom q+1 eigenvectors of M • set the i-th coordinate (column) of Y to the (i+1)-st smallest eigenvector, discarding the bottom constant eigenvector
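Putting the steps together, a minimal LLE sketch (NumPy, SciPy, and scikit-learn assumed; dense matrices are used for brevity instead of the sparse matrices mentioned on slide 23):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from scipy.linalg import eigh

def lle(X, n_neighbors=10, n_components=2, reg=1e-3):
    n = X.shape[0]
    # 1. k nearest neighbors of each point (index 0 is the point itself, dropped).
    nbrs = NearestNeighbors(n_neighbors=n_neighbors + 1).fit(X)
    _, idx = nbrs.kneighbors(X)
    idx = idx[:, 1:]
    # 2. Reconstruction weights W (each row sums to one, zero for non-neighbors).
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[idx[i]] - X[i]
        C = Z @ Z.T
        C += reg * np.trace(C) * np.eye(n_neighbors)   # regularize (slide 23)
        w = np.linalg.solve(C, np.ones(n_neighbors))
        W[i, idx[i]] = w / w.sum()
    # 3. Embedding: bottom eigenvectors of M = (I - W)'(I - W),
    #    skipping the constant eigenvector with eigenvalue ~0.
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    _, vecs = eigh(M, subset_by_index=[1, n_components])
    return vecs

# Y = lle(X, n_neighbors=10, n_components=2)
```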

  23. Numerical Issues • The covariance matrix used to compute W can be ill-conditioned, so regularization needs to be used • Small eigenvalues are subject to numerical precision errors and can get mixed • However, the sparse matrices used in this algorithm make it much faster than Isomap

  24. LLE

  25. LLE – effect of neighborhood size

  26. LLE – with face picture

  27. LLE – Lips pictures

  28. PCA vs. LLE

  29. Problems with LLE • If the data is noisy, sparse, or weakly connected, the coupling between faraway points can be attenuated • The most common failure of LLE is mapping points that are far apart in the original space close together – this often arises when the manifold is undersampled • The output strongly depends on the selection of k

  30. References • Roweis, S. T. and L. K. Saul (2000). "Nonlinear dimensionality reduction by locally linear embedding." Science 290(5500): 2323-2326. • Tenenbaum, J. B., V. de Silva, et al. (2000). "A global geometric framework for nonlinear dimensionality reduction." Science 290(5500): 2319-2323. • Vlachos, M., C. Domeniconi, et al. (2002). "Non-linear dimensionality reduction techniques for classification and visualization." Proc. of 8th SIGKDD, Edmonton, Canada. • de Silva, V. and Tenenbaum, J. (2003). "Local versus global methods for nonlinear dimensionality reduction." Advances in Neural Information Processing Systems, 15.

  31. Questions?
