
Data Mining: Data


elpida

Presentation Transcript


  1. Data Mining: Data Lecture Notes for Chapter 2 Introduction to PCA (Principal Component Analysis)

  2. What is PCA? • Stands for “Principal Component Analysis” • A useful technique in many applications, such as face recognition, image compression, and finding patterns in high-dimensional data • Before starting this topic, you should know the following background • Standard deviation • Covariance • Eigenvectors • Eigenvalues (elementary linear algebra)

  3. What is PCA? • “It is a way of identifying patterns in data and expressing the data in such a way as to highlight their similarities and differences” • PCA is a powerful tool for analyzing data • Finding patterns in the data (feature extraction): the “principal” components are the ones that carry the most information • Reducing the number of dimensions without much loss of information (data reduction, noise rejection, visualization, data compression, etc.)

  4. Application of PCA • Bivariate data set

  5. Tutorial by Example • Step1: Get some data

  6. Tutorial by Example • Step2: Make a data set whose mean is zero • Compute the mean and standard deviation, then subtract the mean from each data dimension
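
The mean-subtraction step can be sketched in plain Python as follows. The data values are hypothetical (they are not the values shown on the slides), and the variable names are chosen only for illustration.

```python
from statistics import mean, stdev

# Hypothetical 2-D data set (illustrative values, not the slide data).
x = [2.0, 4.0, 6.0, 8.0]
y = [1.0, 2.0, 4.0, 5.0]

# Mean and sample standard deviation (divides by n - 1) of each dimension.
x_mean, y_mean = mean(x), mean(y)
x_std, y_std = stdev(x), stdev(y)

# Subtract the mean from each dimension so the adjusted data has mean zero.
x_adj = [v - x_mean for v in x]
y_adj = [v - y_mean for v in y]

print(x_adj, y_adj)
```

Only the mean is needed to center the data; the standard deviation is computed here because the slide mentions it.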

  7. Tutorial by Example

  8. Tutorial by Example • Step3: Calculate the covariance matrix (see PCATutorial.pdf). Since the data is 2-dimensional, the covariance matrix will be 2x2 • What to notice?
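
A minimal sketch of the 2x2 covariance matrix for a two-dimensional data set, again on hypothetical values. The helper name `covariance` is an assumption made for this example, and the sample covariance (dividing by n - 1) is assumed.

```python
# Hypothetical 2-D data set (illustrative values, not the slide data).
x = [2.0, 4.0, 6.0, 8.0]
y = [1.0, 2.0, 4.0, 5.0]


def covariance(a, b):
    """Sample covariance of two equal-length sequences (divides by n - 1)."""
    a_mean = sum(a) / len(a)
    b_mean = sum(b) / len(b)
    return sum((p - a_mean) * (q - b_mean) for p, q in zip(a, b)) / (len(a) - 1)


# Diagonal entries are the variances; off-diagonal entries are cov(x, y).
cov = [
    [covariance(x, x), covariance(x, y)],
    [covariance(y, x), covariance(y, y)],
]

for row in cov:
    print(row)
```

The matrix is symmetric because cov(x, y) = cov(y, x). A positive off-diagonal entry means the two dimensions tend to increase together.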

  9. Tutorial by Example • Step4: Calculate the eigenvectors and eigenvalues of the covariance matrix
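
For the 2x2 case the eigenvalues and eigenvectors have a closed form, sketched below in plain Python. The function name `eig_symmetric_2x2` and the input matrix are hypothetical; for larger matrices a numerical library would normally be used instead.

```python
import math


def eig_symmetric_2x2(m):
    """Eigenpairs of a symmetric 2x2 matrix [[a, b], [b, d]].

    Returns a list of (eigenvalue, unit eigenvector) pairs, largest
    eigenvalue first.
    """
    a, b, d = m[0][0], m[0][1], m[1][1]
    if b == 0:
        # Already diagonal: the coordinate axes are the eigenvectors.
        pairs = [(a, (1.0, 0.0)), (d, (0.0, 1.0))]
        return sorted(pairs, key=lambda pair: pair[0], reverse=True)

    mid = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    pairs = []
    for lam in (mid + radius, mid - radius):
        # (b, lam - a) solves (a - lam) * vx + b * vy = 0.
        vx, vy = b, lam - a
        norm = math.hypot(vx, vy)
        pairs.append((lam, (vx / norm, vy / norm)))
    return pairs


# Hypothetical covariance matrix, chosen so the result is easy to check.
cov = [[2.0, 1.0], [1.0, 2.0]]
pairs = eig_symmetric_2x2(cov)
for value, vector in pairs:
    print(value, vector)
```

Because a covariance matrix is symmetric, its eigenvectors are perpendicular to each other, and each is returned here with unit length.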

  10. Tutorial by Example

  11. Tutorial by Example • Step5: Choosing components and forming a feature vector • The eigenvector with the highest eigenvalue is the principal component of the data set • The principal component from the example • You can decide to ignore the components of lesser significance, but you do lose some information • If their eigenvalues are small, you don’t lose much • If you leave out some components, the final data set will have fewer dimensions (features) than the original

  12. Tutorial by Example • After ordering the eigenvectors by eigenvalue (highest to lowest), they can form a feature vector FeatureVector = (eig1 eig2 eig3 … eign) • In this example, we have two eigenvectors • So we have two choices • Form a feature vector with both of the eigenvectors • Leave out the smaller, less significant component and keep only a single column
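
The ordering and selection described above can be sketched as follows. The eigenpairs are hypothetical values assumed to have been computed already, and `feature_vector` is an illustrative helper name.

```python
import math

s = 1 / math.sqrt(2)

# Hypothetical (eigenvalue, eigenvector) pairs in arbitrary order.
eigenpairs = [(1.0, (s, -s)), (3.0, (s, s))]

# Order the eigenvectors by eigenvalue, highest to lowest.
ordered = sorted(eigenpairs, key=lambda pair: pair[0], reverse=True)


def feature_vector(ordered_pairs, k):
    """Matrix (list of rows) whose columns are the top-k eigenvectors."""
    vectors = [vec for _, vec in ordered_pairs[:k]]
    # Transpose so that each eigenvector becomes one column.
    return [list(row) for row in zip(*vectors)]


both = feature_vector(ordered, 2)    # choice 1: keep both eigenvectors
single = feature_vector(ordered, 1)  # choice 2: keep only the largest

# Each eigenvalue of a covariance matrix is the variance along its
# eigenvector, so this ratio is the share of total variance kept.
retained = ordered[0][0] / sum(value for value, _ in ordered)

print(both)
print(single)
print(retained)
```

The `retained` ratio gives one way to judge how much information is lost when the smaller component is left out.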

  13. Tutorial by Example • Step6: Deriving the new data set
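
One common way to derive the new data set is to project each mean-adjusted point onto the chosen eigenvectors; the sketch below assumes that approach. The points, the eigenvectors, and the helper `project` are all hypothetical and chosen so the arithmetic is easy to follow.

```python
import math

# Hypothetical mean-adjusted 2-D points (each dimension has mean zero).
centered = [(-2.0, -1.0), (-1.0, -2.0), (1.0, 2.0), (2.0, 1.0)]

# Unit eigenvectors assumed precomputed and ordered by eigenvalue
# (highest first) for this particular data set.
s = 1 / math.sqrt(2)
components = [(s, s), (s, -s)]


def project(points, vectors):
    """Express each point by its dot product with each chosen eigenvector."""
    return [
        tuple(sum(p * v for p, v in zip(point, vec)) for vec in vectors)
        for point in points
    ]


new_data_2d = project(centered, components)      # keep both components
new_data_1d = project(centered, components[:1])  # keep the principal one

print(new_data_2d)
print(new_data_1d)
```

Keeping both components only rotates the data onto the new axes; keeping one reduces each point to a single coordinate along the principal component.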

  14. Tutorial by Example

  15. Tutorial by Example
