
Principal Component Analysis

Presentation Transcript


  1. Principal Component Analysis Dr Poonam Goyal CS & IS BITS, Pilani

  2. Introduction • How do we get from this data set to a simple equation in x?

  3. Introduction • How can we identify the most meaningful basis to re-express a data set? • We hope that it will filter out the noise and reveal hidden structure • In the previous example, the goal is to determine the dynamics along the x-axis • Or to determine that x̂, the unit basis vector along the x-axis, is important

  4. Introduction • D is an m×n matrix, where m is the number of measurements and n is the number of observations • Each observation is an m-dimensional vector lying in a space spanned by some orthogonal basis • What is this orthogonal basis?

  5. Preliminaries • Mean • Standard Deviation

  6. Preliminaries • Variance • Covariance • Symmetric: cov(X, Y) = cov(Y, X) • Can be negative, positive, or zero • Covariance in three dimensions • Covariance matrix S
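A minimal numpy sketch of these quantities; the data values are made up purely for illustration:

```python
import numpy as np

# Two measurements (rows) observed five times (columns); values are illustrative.
D = np.array([[2.5, 0.5, 2.2, 1.9, 3.1],
              [2.4, 0.7, 2.9, 2.2, 3.0]])

x, y = D[0], D[1]
n = x.size

var_x  = np.sum((x - x.mean()) ** 2) / (n - 1)               # sample variance of x
cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)   # sample covariance of x and y

S = np.cov(D)    # covariance matrix S: symmetric, entries may be negative, positive, or zero
print(var_x, cov_xy)
print(S)
```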

  7. Preliminaries • Eigenvectors • Eigenvalues
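As a quick illustration (the matrix below is arbitrary), eigenvectors and eigenvalues can be obtained with numpy:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])                     # a symmetric matrix, e.g. a covariance matrix

eigenvalues, eigenvectors = np.linalg.eigh(A)  # eigh is the routine for symmetric matrices
v, lam = eigenvectors[:, 0], eigenvalues[0]    # each column of `eigenvectors` is an eigenvector
print(np.allclose(A @ v, lam * v))             # True: A v = λ v
```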

  8. Principal Component Analysis • Characteristics • Tends to identify the strongest pattern in the dataset • Can be used as a pattern-finding technique • Retains most of the information present in the dataset • Most of the variability of the data can be captured by a small fraction of the total set of dimensions • Results in relatively low-dimensional data • Techniques can then be applied that don’t work well with high-dimensional data • Can eliminate much of the noise, if the noise in the data is weaker than the pattern (hopefully)

  9. Geometric picture of PCs • The 1st PC Z1 is a minimum-distance fit to a line in X space • The 2nd PC Z2 is a minimum-distance fit to a line in the plane perpendicular to the 1st PC • PCs are a series of linear least-squares fits to a sample, each orthogonal to all the previous ones.

  10. Principal Component Analysis • The goal is to find a transformation of the data which satisfies the following • Each pair of new attributes has covariance = 0 • Attributes are ordered by how much of the variance of the data each one captures • The first attribute captures as much of the variation of the data as possible • Subject to the orthogonality requirement, each successive attribute captures as much of the remaining variance as possible

  11. Data

  12. Data

  13. Data

  14. PCA • Covariance matrix • For the mean-adjusted data matrix, Cov = DᵀD (up to the usual 1/(n−1) scaling)
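A sketch of this step, assuming the rows of the adjusted matrix are observations and the columns are attributes (the data values are illustrative):

```python
import numpy as np

Data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9],
                 [1.9, 2.2], [3.1, 3.0], [2.3, 2.7]])   # rows = observations (assumption)

DataAdjust = Data - Data.mean(axis=0)                   # subtract each attribute's mean
n = DataAdjust.shape[0]

Cov = DataAdjust.T @ DataAdjust / (n - 1)               # DᵀD, scaled by 1/(n-1)
print(np.allclose(Cov, np.cov(Data, rowvar=False)))     # True
```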

  15. PCA • Choosing components and forming a feature vector • Either keep both eigenvectors, or leave out the one with the smaller eigenvalue and keep only a single column • The resulting matrix of chosen eigenvectors is called the feature vector • FeatureVector = (ev1, ev2, …, evp)
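One possible way to form the feature vector in code, continuing the illustrative two-attribute data; the cut-off p is an arbitrary choice here:

```python
import numpy as np

Data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9],
                 [1.9, 2.2], [3.1, 3.0], [2.3, 2.7]])
DataAdjust = Data - Data.mean(axis=0)

eigenvalues, eigenvectors = np.linalg.eigh(np.cov(DataAdjust, rowvar=False))
order = np.argsort(eigenvalues)[::-1]            # largest eigenvalue first
eigenvectors = eigenvectors[:, order]

p = 1                                            # keep only the strongest component
FeatureVector = eigenvectors[:, :p]              # columns are the chosen eigenvectors (ev1, ..., evp)
print(FeatureVector)
```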

  16. PCA • Deriving the new dataset D′ • D′ = FeatureVectorᵀ × DataAdjust
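A sketch of the projection, with the adjusted data arranged so that observations are columns (an assumption about the layout intended by the formula):

```python
import numpy as np

# Observations as columns, measurements as rows (assumed layout).
Data = np.array([[2.5, 0.5, 2.2, 1.9, 3.1, 2.3],
                 [2.4, 0.7, 2.9, 2.2, 3.0, 2.7]])
DataAdjust = Data - Data.mean(axis=1, keepdims=True)

eigenvalues, eigenvectors = np.linalg.eigh(np.cov(DataAdjust))
FeatureVector = eigenvectors[:, np.argsort(eigenvalues)[::-1]]   # strongest component first

D_new = FeatureVector.T @ DataAdjust       # D' = FeatureVectorᵀ x DataAdjust
print(np.round(np.cov(D_new), 3))          # (approximately) diagonal: new attributes are uncorrelated
```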

  17. PCA • Getting the original data set back • DataAdjust = (FeatureVectorᵀ)⁻¹ × D′ • OriginalData = ((FeatureVectorᵀ)⁻¹ × D′) + OriginalMean • Since the eigenvectors are orthonormal, (FeatureVectorᵀ)⁻¹ is just FeatureVector
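A sketch of the inverse transform under the same assumed layout; with all components kept, the original data is recovered exactly:

```python
import numpy as np

Data = np.array([[2.5, 0.5, 2.2, 1.9, 3.1, 2.3],
                 [2.4, 0.7, 2.9, 2.2, 3.0, 2.7]])
mean = Data.mean(axis=1, keepdims=True)
DataAdjust = Data - mean

eigenvalues, eigenvectors = np.linalg.eigh(np.cov(DataAdjust))
FeatureVector = eigenvectors[:, np.argsort(eigenvalues)[::-1]]

D_new = FeatureVector.T @ DataAdjust          # forward transform (slide 16)

# The eigenvectors are orthonormal, so (FeatureVectorᵀ)⁻¹ == FeatureVector:
DataAdjust_back = FeatureVector @ D_new
OriginalData    = DataAdjust_back + mean      # add the original mean back
print(np.allclose(OriginalData, Data))        # True
```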

  18. PCA

  19. PCA • Fraction of variance accounted for by each principal component [Figure: bar chart; roughly 0.9 for the first component and 0.1 for the second]
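The fraction for each component is its eigenvalue divided by the sum of all eigenvalues; a short sketch on the illustrative data used above:

```python
import numpy as np

Data = np.array([[2.5, 0.5, 2.2, 1.9, 3.1, 2.3],
                 [2.4, 0.7, 2.9, 2.2, 3.0, 2.7]])

eigenvalues = np.linalg.eigvalsh(np.cov(Data))[::-1]   # largest eigenvalue first
fraction = eigenvalues / eigenvalues.sum()             # fraction of total variance per component
print(np.round(fraction, 2))                           # entries sum to 1; the first PC dominates
```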

  20. Feature reduction • Transforming the original data onto a lower-dimensional space • All original features are used • The transformed features are linear combinations of the original features • Most machine learning and data mining techniques may not be effective for high-dimensional data • The intrinsic dimension may be small

  21. Feature Reduction Algorithms • Unsupervised • Singular Value Decomposition (SVD) • Independent Component Analysis (ICA) • Principal Component Analysis (PCA) • Correlation Analysis (CA) • Supervised • Linear Discriminant Analysis (LDA) • All are linear algorithms

  22. Singular Value Decomposition (SVD) • We know (DᵀD)evᵢ = λᵢ evᵢ • The values σᵢ = √λᵢ are positive and real, and are termed singular values

  23. Singular Value Decomposition (SVD) It is always possible to decompose a matrix Dm×n into D = U L Vᵀ, where UᵀU = I and VᵀV = I; the columns of U are orthonormal eigenvectors of DDᵀ, and the columns of V are orthonormal eigenvectors of DᵀD. L is a diagonal matrix containing the singular values (the square roots of the eigenvalues of DᵀD), which are positive and sorted in decreasing order. U is an m×r matrix, L is an r×r matrix, and V is an n×r matrix.
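A minimal numpy check of these properties on an arbitrary rectangular matrix (random data, purely for illustration):

```python
import numpy as np

D = np.random.default_rng(0).normal(size=(5, 3))     # any rectangular matrix

U, s, Vt = np.linalg.svd(D, full_matrices=False)     # D = U @ diag(s) @ Vt
L = np.diag(s)                                       # singular values, in decreasing order

print(np.allclose(D, U @ L @ Vt))                    # True: D = U L Vᵀ
print(np.allclose(U.T @ U, np.eye(3)))               # UᵀU = I
print(np.allclose(Vt @ Vt.T, np.eye(3)))             # VᵀV = I
print(np.allclose(s**2, np.sort(np.linalg.eigvalsh(D.T @ D))[::-1]))  # σᵢ² = λᵢ of DᵀD
```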

  24. Gram-Schmidt orthonormalization process • a1 is the first column eigenvector of A • Normalize it to get the first orthonormal vector v1 • The kth orthogonal vector is uk = ak − Σi<k (ak · vi) vi • Normalize it to get the orthonormal vector vk • In our case A is the matrix of eigenvectors of the square matrix DDᵀ or DᵀD
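A small sketch of the process; the input matrix is arbitrary and assumed to have linearly independent columns:

```python
import numpy as np

def gram_schmidt(A):
    """Orthonormalize the columns of A (assumed linearly independent)."""
    V = np.zeros_like(A, dtype=float)
    for k in range(A.shape[1]):
        u = A[:, k].copy()
        for i in range(k):                       # subtract projections onto previous vectors
            u -= (A[:, k] @ V[:, i]) * V[:, i]
        V[:, k] = u / np.linalg.norm(u)          # normalize to get v_k
    return V

A = np.array([[1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
V = gram_schmidt(A)
print(np.allclose(V.T @ V, np.eye(2)))           # True: the columns are orthonormal
```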

  25. Example • Problem: • #1: Find concepts in text • #2: Reduce dimensionality

  26. Singular Value Decomposition (SVD) For a general rectangular matrix D: D[m×n] = U[m×r] L[r×r] (V[n×r])ᵀ • D: m×n matrix (e.g., m documents, n terms) • U: m×r matrix (m documents, r concepts) • L: r×r diagonal matrix (strength of each ‘concept’) (r: rank of the matrix) • V: n×r matrix (n terms, r concepts)

  27. Singular Value Decomposition (SVD) • D = U L Vᵀ [Figure: decomposition of D as a matrix product — U with columns u1, u2, …, a diagonal matrix with singular values s1, s2, …, and Vᵀ with rows v1, v2, …]

  28. SVD - Interpretation ‘documents’, ‘terms’ and ‘concepts’: • U: document-to-concept similarity matrix • V: term-to-concept similarity matrix • L: its diagonal elements give the ‘strength’ of each concept Projection: • best axis to project on (‘best’ = minimum sum of squares of projection errors)

  29. SVD - Example • A = U L Vᵀ - example: [Figure: a document-term matrix with terms data, inf., retrieval, brain, lung; one group of documents is CS-related and the other MD-related, and the matrix is decomposed as U L Vᵀ]

  30. SVD - Example • A = U L Vᵀ - example: doc-to-concept similarity matrix [Figure: the columns of U correspond to a CS-concept and an MD-concept; CS documents load on the CS-concept, MD documents on the MD-concept]

  31. SVD - Example • A = U L Vᵀ - example: [Figure: the diagonal of L gives the ‘strength’ of the CS-concept and of the MD-concept]

  32. SVD - Example • A = U L Vᵀ - example: term-to-concept similarity matrix [Figure: the rows of Vᵀ map the terms data, inf., retrieval to the CS-concept and brain, lung to the MD-concept]
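The numerical matrices on these example slides are not recoverable from the transcript, but a toy document-term matrix with the same structure (invented counts; three CS documents using the terms data, inf., retrieval and two MD documents using brain, lung) reproduces the interpretation:

```python
import numpy as np

# Rows: documents (first 3 are CS papers, last 2 are medical papers).
# Columns: term counts for [data, inf., retrieval, brain, lung] (invented values).
A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [0, 0, 0, 1, 1],
              [0, 0, 0, 2, 2]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = np.sum(s > 1e-10)                 # rank = number of 'concepts' (2 here)

print(np.round(U[:, :r], 2))          # doc-to-concept: CS docs load on one concept, MD docs on the other
print(np.round(s[:r], 2))             # strength of each concept
print(np.round(Vt[:r].T, 2))          # term-to-concept similarity
```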

  33. SVD – Dimensionality reduction • Q: how exactly is dimensionality reduction done? • A: set the smallest singular values to zero

  34. SVD - Dimensionality reduction [Figure: the product U L Vᵀ with the smallest singular values zeroed]

  35. SVD - Dimensionality reduction [Figure: the truncated product approximately reconstructs the original matrix]
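A sketch of this truncation rule in numpy; the matrix is a variant of the toy document-term example above (with one document mixing CS terms and the term lung, so the rank exceeds 2), and k is an arbitrary choice of how many singular values to keep:

```python
import numpy as np

A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 2],
              [0, 0, 0, 2, 2]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
s_trunc = s.copy()
s_trunc[k:] = 0.0                          # set the smallest singular values to zero
A_approx = U @ np.diag(s_trunc) @ Vt       # rank-k approximation of A (best in a least-squares sense)

print(np.round(s, 2))                      # singular values, largest first
print(np.round(A_approx, 2))               # close to A, but described by only k 'concepts'
```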

  36. The best-fit regression line reduces the data from two dimensions to one. A regression line along the second dimension captures less of the variation in the original data.

  37. Properties of SVD • Patterns among the attributes are captured by the right singular vectors, i.e. the columns of V • Patterns among the objects are captured by the left singular vectors, i.e. the columns of U • The larger a singular value, the larger the fraction of the matrix that is accounted for by that singular value and its associated singular vectors
