Data Mining: Data

Presentation Transcript

  1. Data Mining: Data • Lecture Notes for Chapter 2 • Introduction to PCA (Principal Component Analysis)

  2. What is PCA? • Stands for “Principal Component Analysis” • A useful technique in many applications, such as face recognition, image compression, and finding patterns in high-dimensional data • Before introducing this topic, you should know the background material on • Standard deviation • Covariance • Eigenvectors • Eigenvalues (elementary linear algebra)
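
All four prerequisites are one-liners in NumPy. A minimal sketch with made-up numbers (none of these values come from the slides):

    import numpy as np

    sample = np.array([2.5, 0.5, 2.2, 1.9, 3.1])    # illustrative 1-D sample
    std = sample.std(ddof=1)                         # sample standard deviation

    pair = np.stack([sample, 0.8 * sample + 0.3])    # two correlated variables (rows)
    cov = np.cov(pair)                               # 2x2 covariance matrix

    eigvals, eigvecs = np.linalg.eigh(cov)           # eigen-decomposition of a symmetric matrix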

  3. What is PCA? • “It is a way of identifying patterns in data and expressing the data in such a way as to highlight their similarities and differences” • PCA is a powerful tool for analyzing data • Finding the patterns in the data (feature extraction): the name “Principal Component” refers to the directions carrying the major, or maximum, information • Reducing the number of dimensions without much loss of information (data reduction, noise rejection, visualization, data compression, etc.)
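
For orientation before the step-by-step tutorial, here is the whole pipeline in scikit-learn form. This is a sketch, with X as arbitrary random data rather than anything from the slides:

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    X = rng.random((100, 10))                 # 100 samples, 10 features (illustrative)
    pca = PCA(n_components=2)                 # keep only the 2 strongest components
    X_reduced = pca.fit_transform(X)          # shape (100, 2): reduced data
    print(pca.explained_variance_ratio_)      # fraction of information kept per component

The tutorial below reproduces each of these internal steps by hand.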

  4. Application of PCA • A bivariate data set [figure not preserved]

  5. Tutorial by Example • Step 1: Get some data
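
In code, step 1 is just a small two-dimensional sample. The values below are placeholders in the spirit of the cited tutorial, not necessarily the slides' exact numbers:

    import numpy as np

    x = np.array([2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1])
    y = np.array([2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9])
    data = np.stack([x, y])        # shape (2, 10): one row per dimension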

  6. Tutorial by Example • Step 2: Make the data set zero-mean • Compute the mean (and standard deviation), then subtract the mean from each of the data dimensions
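
Continuing the sketch, mean adjustment is a single broadcast subtraction:

    mean = data.mean(axis=1, keepdims=True)   # (2, 1): per-dimension mean
    data_adjusted = data - mean               # every dimension now has mean zero
    # The standard deviation can be computed here as well, but plain PCA
    # only needs the mean removed.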

  7. Tutorial by Example [figure not preserved]

  8. Tutorial by Example • Step 3: Calculate the covariance matrix (see PCATutorial.pdf) • Since the data is 2-dimensional, the covariance matrix will be 2x2 • What to notice?
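
In the sketch, the covariance matrix of the mean-adjusted data is:

    cov = np.cov(data_adjusted)   # 2x2, because the data is 2-dimensional
    # What to notice: non-zero off-diagonal entries mean the two dimensions
    # vary together; positive values mean they increase together.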

  9. Tutorial by Example • Step 4: Calculate the eigenvectors and eigenvalues of the covariance matrix
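
Step 4 in the sketch; numpy.linalg.eigh is the appropriate routine because a covariance matrix is always symmetric:

    eigvals, eigvecs = np.linalg.eigh(cov)
    # Columns of eigvecs are unit-length eigenvectors; eigvals come back
    # in ascending order, so the last column is the principal component.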

  10. Tutorial by Example [figure not preserved]

  11. Tutorial by Example • Step 5: Choose components and form a feature vector • The eigenvector with the highest eigenvalue is the principal component of the data set • The principal component from the example • You can decide to ignore the components of lesser significance; you do lose some information • If the eigenvalues are small, you don't lose much • If you leave out some components, the final data set will have fewer dimensions (features) than the original
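
In the sketch, sorting by eigenvalue and checking how much variance each component carries makes the "you don't lose much" claim concrete:

    order = np.argsort(eigvals)[::-1]        # highest eigenvalue first
    eigvals = eigvals[order]
    eigvecs = eigvecs[:, order]
    explained = eigvals / eigvals.sum()      # fraction of total variance
    # A small trailing fraction means that component can be dropped with
    # little loss of information.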

  12. Tutorial by Example • After ordering the eigenvectors by eigenvalue (highest to lowest), they form a feature vector FeatureVector = (eig1 eig2 eig3 … eign) • From this example, we have two eigenvectors • So we have two choices • Form a feature vector with both of the eigenvectors • Leave out the smaller, less significant component and keep only a single column
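
The slide's two choices, in the sketch's terms:

    # Choice 1: keep both eigenvectors (a pure rotation, no reduction)
    feature_vector = eigvecs                 # 2x2, columns ordered by eigenvalue

    # Choice 2: keep only the principal component (reduce 2-D to 1-D)
    feature_vector = eigvecs[:, :1]          # 2x1, the single strongest column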

  13. Tutorial by Example • Step 6: Derive the new data set
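
One standard way to derive the new data set, and the one the sketch follows, is to project the mean-adjusted data onto the chosen eigenvectors:

    final_data = feature_vector.T @ data_adjusted
    # With both eigenvectors kept, final_data is 2x10: the same data
    # expressed in the eigenvector basis. With only the principal
    # component, it is 1x10: a one-dimensional summary of the 2-D data.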

  14. Tutorial by Example [figure not preserved]

  15. Tutorial by Example [figure not preserved]