
Dimensionality Reduction Part 2: Nonlinear Methods


Presentation Transcript


  1. Dimensionality Reduction Part 2: Nonlinear Methods (Comp 790-090, Spring 2007)

  2. Previously… • Linear Methods for Dimensionality Reduction • PCA: rotate the data so that the principal axes lie along the directions of maximum variance • MDS: find coordinates that best preserve pairwise distances • (Figure: PCA example)
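
As a refresher, here is a minimal sketch of both linear methods in NumPy. This is not from the original slides; the function and variable names are illustrative.

```python
import numpy as np

def pca(X, d):
    """Project the rows of X (n_samples x n_features) onto the top-d principal axes."""
    Xc = X - X.mean(axis=0)                    # mean-center the data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:d].T                       # coordinates along the directions of maximum variance

def classical_mds(D, d):
    """Embed an n x n pairwise distance matrix D in d dimensions."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centered squared distances
    w, V = np.linalg.eigh(B)                   # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:d]              # keep the d largest
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0))
```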

  3. Motivation • Linear dimensionality reduction doesn’t always work • Data violates the underlying “linear” assumptions • Data is not accurately modeled by “affine” combinations of measurements • The structure of the data, while apparent, is not simple • In the end, linear methods do nothing more than “globally transform” (rotate, translate, and scale) all of the data; sometimes what’s needed is to “unwrap” the data first

  4. Stopgap Remedies • Local PCA • Compute PCA models for small overlapping item neighborhoods • Requires a clustering preprocess • Fast and simple, but results in no global parameterization • Neural Networks • Assumes a solution of a given dimension • Uses relaxation methods to deform given solution to find a better fit • Relaxation step is modeled as “layers” in a network where properties of future iterations are computed based on information from the current structure • Many successes, but a bit of an art

  5. Why Linear Modeling Fails • Suppose that your sample data lies on some low-dimensional surface embedded within the high-dimensional measurement space • Linear models allow ALL affine combinations • Often, certain combinations are atypical of the actual data • Recognizing this is harder as dimensionality increases

  6. What does PCA Really Model? • Principal Component Analysis assumptions • Mean-centered distribution • What if the mean itself is atypical? • Eigenvectors of the covariance matrix • Basis vectors aligned with successive directions of greatest variance • Classic 1st-order statistical model • Distribution is characterized by its mean and variance (Gaussian hyperspheres)

  7. Non-Linear Dimensionality Reduction • Non-linear manifold learning • Instead of preserving global pairwise distances, non-linear dimensionality reduction tries to preserve only the geometric properties of local neighborhoods • Discover a lower-dimensional “embedding” manifold • Find a parameterization over that manifold • Linear parameter space • Projection mapping from the original M-D space to the d-D embedding space • (Figure: the map into the linear embedding space is the “projection”; the reverse map is called “reprojection”, “elevating”, or “lifting”)

  8. Nonlinear DimRedux Steps • Discover a low-dimensional embedding manifold • Find a parameterization over the manifold • Project data into parameter space • Analyze, interpolate, and compress in embedding space • Orient (by linear transformation) the parameter space to align axes with salient features • Linear (affine) combinations are valid here • In the case of interpolation and compression, use “lifting” to estimate the M-D original data

  9. Nonlinear Methods • Local Linear Embeddings [Roweis 2000] • Isomaps [Tenenbaum 2000] • These two papers ignited the field • Principled approach (asymptotically, as the amount of data goes to infinity, they have been proven to find the “real” manifold) • Widely applied • Hotly contested

  10. Local Linear Embeddings • First Insight • Locally, at a fine enough scale, everything looks linear

  11. Local Linear Embeddings • First Insight • Find the affine combination of the “neighborhood” about a point that best approximates it

  12. Finding a Good Neighborhood • This is the remaining “art” in nonlinear methods • Common choices (both are sketched below): • ε-ball: find all items that lie within an ε-ball of the target item, as measured under some metric • Best if the density of items is high and every point has a sufficient number of neighbors • K-nearest neighbors: find the K closest neighbors to a point under some metric • Guarantees all items are similarly represented; limits the dimension to K−1
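
A small sketch of both neighborhood rules, assuming SciPy's cKDTree; the values of eps and K are placeholders.

```python
import numpy as np
from scipy.spatial import cKDTree

X = np.random.rand(1000, 3)                    # toy data set: 1000 points in 3-D
tree = cKDTree(X)

# epsilon-ball: every item within radius eps of each point (variable-length index lists)
eps = 0.1
eps_neighbors = tree.query_ball_point(X, r=eps)

# K-nearest neighbors: the K closest items to each point (first hit is the point itself)
K = 8
_, knn = tree.query(X, k=K + 1)
knn = knn[:, 1:]                               # drop the self-match
```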

  13. Affine “Neighbor” Combinations • Within locally linear neighborhoods, each point can be considered an affine combination of its neighbors • Imagine cutting patches out of the manifold and placing them in a lower dimension so that the angles between points are preserved • The weights should still be valid in the lower-dimensional embedding space

  14. Find Weights • Each point is approximated by its neighbors: x_i ≈ Σ_j W_ij x_j, with W_ij = 0 whenever x_j is not a neighbor of x_i • Rewriting as a matrix equation over all points: X ≈ X Wᵀ, where X is M × N (one column per item) and the unknown W is N × N • Want to find the W that minimizes Σ_i ‖x_i − Σ_j W_ij x_j‖² and satisfies the “sum-to-one” constraint Σ_j W_ij = 1 • Ends up as a constrained “least-squares” problem for the unknown W matrix (a sketch follows below)
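
A sketch of the per-point weight solve in NumPy, including the regularization that slide 16 recommends. The function name and the regularization constant are illustrative, not from the original deck.

```python
import numpy as np

def lle_weights(x, neighbors, reg=1e-3):
    """Solve min ||x - sum_j w_j * neighbor_j||^2 subject to sum_j w_j = 1.

    x: (M,) data point; neighbors: (K, M) array of its K neighbors.
    """
    Z = neighbors - x                            # center the neighborhood on x
    C = Z @ Z.T                                  # K x K local covariance (Gram) matrix
    C += reg * np.trace(C) * np.eye(len(C))      # regularize: C can be ill-conditioned (slide 16)
    w = np.linalg.solve(C, np.ones(len(C)))      # solve against the all-ones vector
    return w / w.sum()                           # rescale to satisfy the sum-to-one constraint
```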

  15. Find Linear Embedding Space • Now that we have the weight matrix W, find the embedding coordinates y_i that satisfy the same relation, y_i ≈ Σ_j W_ij y_j, where W is N × N and X is M × N • This can be found from the (near-)null space of (I − W) • Classic problem: run an SVD on (I − W) and take the orthogonal vectors associated with its smallest singular values (the very smallest singular value is zero and reflects the system’s invariance to translation, so it is discarded and the next d singular vectors give the d embedding coordinates)
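
A matching sketch of the embedding step. It uses an eigendecomposition of (I − W)ᵀ(I − W), which is equivalent to the SVD of (I − W) described above; names are illustrative.

```python
import numpy as np

def lle_embedding(W, d):
    """Given the N x N weight matrix W, return the N x d embedding coordinates Y.

    Minimizes sum_i ||y_i - sum_j W_ij y_j||^2 by taking the bottom of the
    spectrum of (I - W)^T (I - W); the zero eigenvalue (constant vector,
    translation invariance) is skipped.
    """
    N = W.shape[0]
    M = np.eye(N) - W
    eigvals, eigvecs = np.linalg.eigh(M.T @ M)   # eigenvalues in ascending order
    return eigvecs[:, 1:d + 1]                   # drop the constant eigenvector, keep the next d
```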

  16. Numerical Issues • Numerical problems can arise in computing LLEs • The local least-squares covariance matrix that arises when solving for the weight matrix W can be ill-conditioned • Regularization: rescale the measurements by adding a small multiple of the identity to the covariance matrix • Finding small singular (eigen) values is not as well conditioned as finding large ones; the small ones are subject to numerical-precision errors and can get mixed together • Good (but slow) solvers exist; you have to use them

  17. Results • The resulting parameter vector y_i gives the embedding coordinates associated with the item x_i • The d-th embedding coordinate is formed from the orthogonal vector associated with the d-th smallest non-zero singular value of (I − W)

  18. Reprojection • Often, for data analysis, a parameterization is enough • For interpolation and compression we might want to map points from the parameter space back to the “original” space • No perfect solution, but a few approximations • Delaunay-triangulate the points in the embedding space, find the triangle (simplex) that the desired parameter setting falls into, compute its barycentric coordinates, and use them as lifting weights • Or interpolate by using a radially symmetric kernel centered about the desired parameter setting • Works, but the mappings might not be one-to-one
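
One way to realize the triangulation-based lifting, sketched with SciPy's Delaunay utilities. The function name lift and its arguments are illustrative, not part of the original method description.

```python
import numpy as np
from scipy.spatial import Delaunay

def lift(q, Y, X):
    """Map a query point q in the d-D parameter space back to the M-D measurement space.

    Y: (N, d) embedding coordinates; X: (N, M) original measurements.
    """
    tri = Delaunay(Y)
    s = tri.find_simplex(q)
    if s == -1:
        raise ValueError("query lies outside the triangulated parameter domain")
    T = tri.transform[s]                   # affine map onto barycentric coordinates
    b = T[:-1] @ (q - T[-1])               # first d barycentric coordinates
    bary = np.append(b, 1.0 - b.sum())     # weights sum to one
    return bary @ X[tri.simplices[s]]      # weighted combination of the original points
```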

  19. LLE Example • 3-D S-Curve manifold with points color-coded • Compute a 2-D embedding • The local affine structure is well maintained • The metric structure is okay locally, but can drift slowly over the domain (this causes the manifold to taper)

  20. More LLE Examples

  21. More LLE Examples

  22. LLE Failures • Does not work on closed manifolds • Cannot recognize topology

  23. Isomap • An alternative non-linear dimensionality reduction method that extends MDS • Key observation: on a manifold, distances are measured using geodesic distances rather than Euclidean distances • (Figure: two points with a small Euclidean distance but a large geodesic distance)

  24. Problem: How to Get Geodesics • Without knowledge of the manifold it is difficult to compute the geodesic distance between points • It is difficult even if you know the manifold • Solution • Use a discrete geodesic approximation • Apply a graph algorithm to approximate the geodesic distances

  25. Dijkstra’s Algorithm • Efficient solution to the single-source shortest-path problem; running it from every point yields all pairwise path lengths • Greedy breadth-first algorithm (a sketch follows below)

  26.–28. Dijkstra’s Algorithm • Slides 26–28 repeat this text while the accompanying figures step through the algorithm on an example graph
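
For concreteness, a compact Dijkstra sketch over an adjacency-list graph. This is plain, illustrative Python; Isomap implementations typically call an optimized graph library instead.

```python
import heapq

def dijkstra(graph, source):
    """Single-source shortest paths on a weighted graph.

    graph: dict mapping node -> list of (neighbor, edge_weight) pairs.
    Returns a dict of shortest distances from source to every reachable node.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                               # stale queue entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```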

  29. Isomap algorithm • Compute a neighborhood of points for each item • Can be K nearest neighbors or an ε-ball • Neighborhoods must be symmetric • Test that the resulting graph is fully connected; if not, increase either K or ε • Calculate pairwise Euclidean distances within each neighborhood • Use Dijkstra’s algorithm to compute the shortest paths from each point to its non-neighboring points • Run MDS on the resulting geodesic distance matrix (the full pipeline is sketched below)
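
Putting the steps together, a rough sketch of the pipeline using scikit-learn and SciPy helpers. The values of k and d are placeholders and the connectivity handling is simplified.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import connected_components, shortest_path

def isomap(X, d=2, k=8):
    # 1. K-nearest-neighbor graph with Euclidean edge weights, symmetrized
    G = kneighbors_graph(X, n_neighbors=k, mode="distance")
    G = G.maximum(G.T)
    # 2. the graph must be connected; otherwise increase k (or the epsilon radius)
    n_comp, _ = connected_components(G, directed=False)
    if n_comp > 1:
        raise ValueError("neighborhood graph is disconnected; increase k")
    # 3. approximate geodesic distances with graph shortest paths (Dijkstra)
    D = shortest_path(G, method="D", directed=False)
    # 4. classical MDS on the geodesic distance matrix
    n = len(D)
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:d]
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0))
```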

  30. Isomap Results • Find a 2D embedding of the 3D S-curve (also shown for LLE) • Isomap does a good job of preserving metric structure (not surprising) • The affine structure is also well preserved

  31. Residual Fitting Error

  32. Neighborhood Graph

  33. More Isomap Results

  34. More Isomap Results

  35. Isomap Failures • Isomap also has problems on closed manifolds of arbitrary topology

  36. Non-Linear Example • A Data-Driven Reflectance Model (Matusik et al., SIGGRAPH 2003) • Bidirectional Reflectance Distribution Functions (BRDFs) • Define the ratio of the reflected radiance in a particular outgoing direction to the incident irradiance arriving from a given direction • Isotropic BRDF
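
Written out, the standard BRDF definition the slide is referring to (this is the conventional formula, not copied from the slide images):

```latex
f_r(\omega_i, \omega_o)
  = \frac{\mathrm{d}L_o(\omega_o)}{\mathrm{d}E_i(\omega_i)}
  = \frac{\mathrm{d}L_o(\omega_o)}{L_i(\omega_i)\,\cos\theta_i\,\mathrm{d}\omega_i}
```

For an isotropic BRDF the value depends on only three angles rather than four, which is why the tabulation on the following slides needs only three angular dimensions (90 × 90 × 180 bins).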

  37. Measurement • Modeling Bidirectional Reflectance Distribution Functions (BRDFs)

  38. Measurement • A “fast” BRDF measurement device inspired by Marschner [1998]

  39. Measurement • 20–80 million reflectance measurements per material • Each tabulated BRDF entails 90 × 90 × 180 × 3 = 4,374,000 measurement bins

  40. Measurement • (Repeats the text of slide 39 alongside a different figure)

  41. Rendering from Tabulated BRDFs • Even without further analysis, our BRDFs are immediately useful • Renderings made with Henrik Wann Jensen’s Dali renderer • (Figure: rendered spheres of nickel, hematite, gold paint, and pink felt)

  42. BRDFs as Vectors in High-Dimensional Space • Each tabulated BRDF is a vector in a 90 × 90 × 180 × 3 = 4,374,000-dimensional space • (Figure: the 90 × 90 × 180 × 3 table is “unrolled” into a single 4,374,000-entry vector)
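
The “unroll” step is just a reshape; a toy sketch, where the array is a placeholder for a real tabulated BRDF:

```python
import numpy as np

brdf_table = np.zeros((90, 90, 180, 3))    # placeholder for a tabulated BRDF (angular bins x RGB)
brdf_vector = brdf_table.reshape(-1)       # "unrolled" into one long vector
assert brdf_vector.size == 90 * 90 * 180 * 3 == 4_374_000
```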

  43. Linear Analysis (PCA) • Find the optimal “linear basis” for our data set • 45 components are needed to reduce the residue to under the measurement error • (Plot: eigenvalue magnitude vs. dimension, 0–120, with curves for the mean and for 5, 10, 20, 30, 45, 60, and all components)

  44. Problems with Linear Subspace Modeling • Large number of basis vectors (45) • Some linear combinations yield invalid or unlikely BRDFs (outside convex hull)

  45. Problems with Linear Subspace Modeling • Large number of basis vectors (45) • Some linear combinations yield invalid or unlikely BRDFs (inside convex hull)

  46. Results of Non-Linear Manifold Learning • At 15 dimensions the reconstruction error is less than 1% • Parameter count is similar to analytical models • (Plot: reconstruction error vs. embedding dimensionality, 5–15)

  47. Non-Linear Advantages • 15-dimensional parameter space • More robust than the linear model • More extrapolations are plausible • (Figures: linear model extrapolation vs. non-linear model extrapolation)

  48. Non-Linear Model Results

  49. Non-Linear Model Results

  50. Non-Linear Model Results
