
Principal Component Analysis


Presentation Transcript


  1. Principal Component Analysis. Beatrice M. Ombuki-Berman. See attached Keller slides & paper.

  2. What is PCA? • A standard statistical technique for reducing the dimensionality of data (without using a neural approach). • Purpose: better understanding or communication of data; used widely in the sciences to select the most important features. • In so reducing, we want to lose as little information as possible, given the before and after dimensions. • Also known as the Karhunen-Loève (K-L) transformation (Watanabe, 1969).

  3. Comparison with Linear Regression • Linear regression requires one to pre-identify dependent vs. independent variables. • PCA does not.

  4. Suppose we are given a set of data points: http://144.124.112.51/auj/scattering/demo/page4.active.html. Transform coordinates to get a better understanding.

  5. Shift the set of points so that the average position is at the origin.
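
This centering step is the first computation in PCA and takes one line with NumPy. A minimal sketch in Python (the data values are made up for illustration):

```python
import numpy as np

# Example 2D points: one row per point, one column per variable.
points = np.array([[2.5, 2.4],
                   [0.5, 0.7],
                   [2.2, 2.9],
                   [1.9, 2.2],
                   [3.1, 3.0]])

# Shift so the average position (the column means) sits at the origin.
centered = points - points.mean(axis=0)

print(centered.mean(axis=0))  # approximately [0. 0.]
```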

  6. We may need answers to questions like: • What is the trend (direction) of the set of points? • Which is the direction of maximum variance (dispersion)? • Which is the direction of minimum variance (dispersion)?

  7. We may need answers to questions like (2): • Suppose you are only allowed to use a 1D plot for this set of 2D points. How should the points be represented so that the overall error is minimized? This is a data compression problem. • All of these questions can be answered using Principal Components Analysis (PCA).

  8. What is Principal Component Analysis? • A standard statistical technique for data reduction. • Also known as the Karhunen-Loève (K-L) transformation (in communications theory). • An effective data-reduction technique for representing the most common variations across all the training data.

  9. Principal component analysis • A useful statistical technique in fields such as face recognition and image compression. • A common technique for finding patterns in data of high dimension.

  10. Principal Component Analysis (a linear method) • Compute the covariance matrix. • Determine its principal components. • Project the data onto the plane spanned by the principal components.
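
These three steps map directly onto a few lines of NumPy. A minimal sketch, assuming the data comes as one row per point (the function and variable names are ours, not from the slides):

```python
import numpy as np

def pca(data, n_components):
    """Project `data` (one row per point) onto its top principal components."""
    # Center the data so the average position is at the origin.
    centered = data - data.mean(axis=0)
    # Step 1: compute the covariance matrix (columns are variables).
    cov = np.cov(centered, rowvar=False)
    # Step 2: the principal components are the eigenvectors of the
    # covariance matrix; eigh returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    components = eigenvectors[:, ::-1][:, :n_components]  # largest first
    # Step 3: project the centered data onto the chosen components.
    return centered @ components
```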

  11. What is a “Principal Component”? • The number of principal components depends on the number of dimensions of the data points. • The first principal component is the predominant direction in the data.

  12. Illustrative Example • Project 2-dimensional data down onto a 1-dimensional space, as in the usage sketch below.
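
Using the `pca` sketch from slide 10, the 2D-to-1D projection looks like this (the data is synthetic, made up for illustration):

```python
import numpy as np

# Assumes the pca() sketch from slide 10 is in scope.
rng = np.random.default_rng(0)
# Synthetic 2D points with a strong trend along one direction.
x = rng.normal(size=200)
data = np.column_stack([x, 2.0 * x + rng.normal(scale=0.3, size=200)])

projected = pca(data, n_components=1)  # one coordinate per point
print(projected.shape)                 # (200, 1)
```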

  13. Main idea • Transform the input data into fewer dimensions • Preserve as much of the variance as possible

  14. Transformation

  15. Predominant direction • Minimizes the reconstruction error • Spanned by the directions of largest variance • Spanned by the principal eigenvectors of the covariance matrix, i.e., the eigenvectors with maximal eigenvalues
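
The link between eigenvalues and variance is easy to check numerically. A small sketch (the covariance values here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic data drawn with a known covariance.
data = rng.multivariate_normal([0.0, 0.0], [[3.0, 1.5], [1.5, 1.0]], size=5000)
centered = data - data.mean(axis=0)

eigenvalues, eigenvectors = np.linalg.eigh(np.cov(centered, rowvar=False))
top = eigenvectors[:, -1]              # eigenvector with the maximal eigenvalue

# The variance of the data projected onto `top` equals the top eigenvalue.
print(np.var(centered @ top, ddof=1))  # close to eigenvalues[-1]
print(eigenvalues[-1])
```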

  16. Planets Example (from http://www.cs.mcgill.ca/~sqrt/dimr/dimreduction.html) • Suppose we have a 3-dimensional data set where the variables are the logarithms of: • distance to the sun • equatorial diameter • density

  17. Data Sets prior to taking logs

  18. Projecting data • It is possible to project a set of data points onto fewer dimensions by ignoring certain columns, as in the sketch below. • Projection is a special case of a linear transformation.
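
Dropping columns is itself a linear map, which a short sketch makes concrete (the data values are placeholders):

```python
import numpy as np

# Three variables per planet: log distance, log diameter, log density.
data = np.arange(12, dtype=float).reshape(4, 3)   # placeholder values

# Projection by ignoring the third column...
by_slicing = data[:, :2]

# ...is the same as multiplying by a selection matrix, i.e. a linear map.
P = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
by_matrix = data @ P

print(np.allclose(by_slicing, by_matrix))  # True
```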

  19. Data projections: In 2D, we can plot any one variable against another

  20. Data projection • Having the 2D data plot gives a nice representation with obvious properties: • close points mean similar planets • far-apart points mean dissimilar planets • convex hull points mean "extreme" planets • Now suppose that there is another planet feature that you find equally important: the density.

  21. Maximizing Variance • Transforming using the first two principal components preserves more of the variance (summing the variances in each dimension) in the projection than does projecting onto any two of the original variables.
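
A quick numerical check of this claim, sketched with synthetic 3-variable data standing in for the log planet features (all values are made up):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)
# Correlated 3-variable data with a known covariance.
data = rng.multivariate_normal([0, 0, 0],
                               [[2.0, 0.8, 0.3],
                                [0.8, 1.0, 0.5],
                                [0.3, 0.5, 0.7]], size=2000)
centered = data - data.mean(axis=0)

eigenvalues, eigenvectors = np.linalg.eigh(np.cov(centered, rowvar=False))
top2 = eigenvectors[:, -2:]                       # two largest eigenvalues
pca_var = np.var(centered @ top2, axis=0, ddof=1).sum()

# Best that projecting onto any pair of the original variables can do.
pair_var = max(np.var(centered[:, list(p)], axis=0, ddof=1).sum()
               for p in combinations(range(3), 2))

print(pca_var, ">=", pair_var)  # the PCA projection retains more variance
```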
