
Machine Learning for Signal Processing Principal Component Analysis


Presentation Transcript


  1. Machine Learning for Signal Processing: Principal Component Analysis. Class 7, 12 Feb 2015. Instructor: Bhiksha Raj 11-755/18-797

  2. Recall: Representing images • The most common element in the image: background • Or rather large regions of relatively featureless shading • Uniform sequences of numbers 11-755/18-797

  3. Adding more bases • Checkerboard bases with different variations (B1–B6 shown) • Getting closer at 625 bases! 11-755/18-797

  4. “Bases” • “Bases” are the “standard” units such that all instances can be expressed as weighted combinations of these units • Ideal requirement: bases must be orthogonal • Checkerboards (B1–B6) are one choice of bases • Orthogonal • But not “smooth” • Other choices of bases: complex exponentials, wavelets, etc. 11-755/18-797

  5. Data-specific bases? • Issue: All the bases we have considered so far are data agnostic • Checkerboards, complex exponentials, wavelets.. • We use the same bases regardless of the data we analyze • Image of a face vs. image of a forest • Segment of speech vs. seismic rumble • How about data-specific bases? • Bases that consider the underlying data • E.g. is there something better than checkerboards to describe faces? • Something better than complex exponentials to describe music? 11-755/18-797

  6. The Energy Compaction Property • Define “better”? • The description: a weighted combination of bases, x ≈ w_1 B_1 + w_2 B_2 + … • The ideal: if the description is terminated at any point, we should still get most of the information about the data • The error should be small 11-755/18-797
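To make “energy compaction” concrete, here is a brief sketch in assumed notation (B_k and w_k as on the earlier “bases” slides, not symbols taken from this slide): truncate the weighted-basis description after K terms and look at the residual.

    \[
      \hat{x}_K = \sum_{k=1}^{K} w_k B_k ,
      \qquad
      E(K) = \bigl\| x - \hat{x}_K \bigr\|^{2} .
    \]

Energy compaction means E(K) falls off quickly as K grows, so stopping the description at any point still captures most of x.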

  7. Data-specific description of faces • A collection of images • All normalized to 100x100 pixels • What is common among all of them? • Do we have a common descriptor? 11-755/18-797

  8. A typical face • Assumption: There is a “typical” face that captures most of what is common to all faces • Every face can be represented by a scaled version of this typical face • We will denote this face as V • Approximate every face f as f ≈ w_f V • Estimate V to minimize the squared error • How? What is V? 11-755/18-797
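As a hedged sketch of the least-squares problem behind this slide (notation assumed): with a single typical face V, the best weight for each face f has a closed form, while finding the best V itself is the subject of the next slides.

    \[
      \min_{V,\{w_f\}} \sum_{f} \| f - w_f V \|^{2} ,
      \qquad
      w_f^{*} = \frac{V^{T} f}{V^{T} V}
      \;=\; V^{T} f \quad \text{if } \|V\| = 1 .
    \]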

  9. A collection of least squares typical faces • Assumption: there is a set of K “typical” faces that captures most of what is common to all faces • Approximate every face f as f ≈ w_f,1 V_1 + w_f,2 V_2 + w_f,3 V_3 + .. + w_f,K V_K • V_2 is used to “correct” errors resulting from using only V_1 • V_3 corrects errors remaining after correction with V_2 • And so on.. • V = [V_1 V_2 V_3 …] • Estimate V to minimize the squared error • How? What is V? 11-755/18-797

  10. Abstracting the problem: Finding the FIRST typical face • Each “point” in the scatter (axes: Pixel 1, Pixel 2) represents a face in “pixel space” 11-755/18-797

  11. Abstracting the problem: Finding the FIRST typical face • The problem of finding the first typical face V_1: find the V for which the total projection error is minimum! • This “minimum squared error” V is our “best” first typical face • It is also the first Eigen face • Not really, but we’ll get to the correction later 11-755/18-797

  12. Formalizing the Problem: Error from approximating a single vector • Approximating: x ≈ w v • Projection of a vector x onto a vector v (assuming v is of unit length): v v^T x • Error vector: x − v v^T x • Squared error length: ||x − v v^T x||^2 11-755/18-797

  13. Error for many vectors • Total error: E = Σ_i ||x_i − v v^T x_i||^2 • Constrained objective: minimize E subject to v^T v = 1 11-755/18-797

  14. Minimizing error • Differentiating w.r.t. v and equating to 0 gives R v = λ v, where R = Σ_i x_i x_i^T 11-755/18-797
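A compact restatement of the derivation on slides 13–14, in assumed notation: for unit-length v the per-vector error is x_i^T x_i − v^T x_i x_i^T v, so minimizing the total error is equivalent to maximizing v^T R v, and the stationary points are eigenvectors of R.

    \[
      E(v) = \sum_i \| x_i - v v^{T} x_i \|^{2}
           = \sum_i x_i^{T} x_i \;-\; v^{T} \Bigl( \sum_i x_i x_i^{T} \Bigr) v ,
    \]
    \[
      \frac{\partial}{\partial v} \Bigl[ v^{T} R v + \lambda \bigl( 1 - v^{T} v \bigr) \Bigr] = 0
      \;\Rightarrow\;
      R v = \lambda v ,
      \qquad
      R = \sum_i x_i x_i^{T} .
    \]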

  15. The correlation matrix • Arrange the data as columns of a data matrix X • Then Σ_i x_i x_i^T = X X^T, where X^T is the transposed data matrix • This term, R = X X^T, is the correlation matrix 11-755/18-797
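A minimal NumPy sketch of this step; the array names and placeholder data are assumptions, not from the slides (the lecture uses 300 faces of 100x100 pixels, while a smaller size is used here only so the sketch runs quickly).

    import numpy as np

    # Placeholder data standing in for the 300 normalized face images.
    faces = [np.random.rand(32, 32) for _ in range(300)]

    # Each image becomes one column of the data matrix X (n_pixels x n_faces).
    X = np.stack([f.reshape(-1) for f in faces], axis=1)

    # Correlation matrix R = X X^T (n_pixels x n_pixels).
    R = X @ X.T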

  16. The best “basis” • The minimum-error basis is found by solving R v = λ v • v is an Eigen vector of the correlation matrix R • λ is the corresponding Eigen value 11-755/18-797

  17. Abstracting the problem: Finding the SECOND typical face • Scatter of error faces (axes: Pixel 1, Pixel 2) • The second “typical” face is the first Eigen vector of the error faces • It is also the second Eigen vector of the original correlation matrix 11-755/18-797

  18. The typical face • Compute the correlation matrix for your data: arrange the faces in a matrix X and compute R = X X^T • Compute the principal Eigen vector of R • The Eigen vector with the largest Eigen value • This is the typical face 11-755/18-797
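Continuing the sketch above (same assumed names): R is symmetric, so np.linalg.eigh applies and returns eigenvalues in ascending order, making the principal eigenvector the last column.

    # Eigen decomposition of the symmetric correlation matrix.
    eigvals, eigvecs = np.linalg.eigh(R)

    # The eigenvector with the largest eigenvalue is the "typical face".
    typical_face = eigvecs[:, -1]
    typical_face_image = typical_face.reshape(32, 32)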

  19. The approximation with the first typical face • The first typical face models some of the characteristics of the faces • Simply by scaling its grey level • But the approximation has error • The second typical face must explain some of this error 11-755/18-797

  20. The second typical face • Approximation with only the first typical face has error • The second face must explain this error • How do we find this face? 11-755/18-797

  21. Solution: Iterate • Get the “error” faces by subtracting the first-level approximation from the original images • Repeat the estimation on the “error” images 11-755/18-797

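A sketch of this iteration with the same assumed names as before: project every face onto the first typical face, subtract the scaled approximation to obtain the error faces, then rerun the eigenvector estimation on those errors.

    # v1: unit-length first typical face; X: (n_pixels x n_faces) data matrix.
    v1 = typical_face / np.linalg.norm(typical_face)

    weights1 = v1 @ X                    # w_f = v1^T f, one weight per face
    approx1  = np.outer(v1, weights1)    # first-level approximation of every face
    errors1  = X - approx1               # the "error" faces

    # Repeat the estimation on the error faces.
    R_e = errors1 @ errors1.T
    e_vals, e_vecs = np.linalg.eigh(R_e)
    v2 = e_vecs[:, -1]                   # the second typical face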

  23. Abstracting the problem: Finding the second typical face • Each “point” in the scatter (axes: Pixel 1, Pixel 2) now represents an error face in “pixel space” • Find the vector V_2 such that the projection of these error faces on V_2 results in the least error 11-755/18-797

  24. Minimizing error • The same math applies, but now to the set of error data points • Differentiating w.r.t. v and equating to 0 11-755/18-797

  25. Minimizing error • The same math applies, but now to the set of error data points • The minimum-error basis is found by solving R_e v_2 = λ v_2 • v_2 is an Eigen vector of the error correlation matrix R_e corresponding to the largest Eigen value λ of R_e 11-755/18-797

  26. Which gives us our second typical face • But approximation with the two faces will still result in error • So we need more typical faces to explain this error • We can do this by subtracting the appropriately scaled version of the second “typical” face from the error images and repeating the process 11-755/18-797

  27. Solution: Iterate • Get the second-level “error” faces by subtracting the scaled second typical face from the first-level error faces • Repeat the estimation on the second-level “error” images 11-755/18-797

  28. An interesting property • Each “typical face” will be orthogonal to all other typical faces • Because each of them is learned to explain what the rest could not • None of these faces can explain one another! 11-755/18-797

  29. To add more faces • We can continue the process, refining the error each time • This is an instance of the procedure called “Gram-Schmidt” orthogonalization • OR… we can do it all at once 11-755/18-797

  30. With many typical faces • In matrix form: the data matrix M is approximated by U = V W, where the columns of V are the typical faces and W holds the weights • Approximate every face f as f ≈ w_f,1 V_1 + w_f,2 V_2 + ... + w_f,K V_K • Here W, V and U are ALL unknown and must be determined • Such that the squared error between U and M is minimum 11-755/18-797

  31. With multiple bases • Assumption: all bases v_1, v_2, v_3, .. are unit length • Assumption: all bases are orthogonal to one another: v_i^T v_j = 0 if i ≠ j • We are trying to find the optimal K-dimensional subspace onto which to project the data • Any set of vectors spanning this subspace will define the subspace • Constraining them to be orthogonal does not change this • I.e. if V = [v_1 v_2 v_3 …], then V^T V = I • Pinv(V) = V^T • Projection matrix for V = V Pinv(V) = V V^T 11-755/18-797
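A small, self-contained numerical check of these claims (the dimensions are chosen arbitrarily for illustration): for a V with orthonormal columns, V^T V = I, the pseudo-inverse equals V^T, and V V^T projects onto the subspace spanned by the columns.

    import numpy as np

    # Build a V with K = 3 orthonormal columns in a 10-dimensional space via QR.
    rng = np.random.default_rng(0)
    V, _ = np.linalg.qr(rng.standard_normal((10, 3)))

    assert np.allclose(V.T @ V, np.eye(3))        # V^T V = I
    assert np.allclose(np.linalg.pinv(V), V.T)    # Pinv(V) = V^T

    x = rng.standard_normal(10)
    projection = V @ V.T @ x                      # projection of x onto span(V)
    error = x - projection
    assert np.allclose(V.T @ error, 0)            # the error is orthogonal to the subspace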

  32. With multiple bases • V represents a K-dimensional subspace • Projection of a vector x: V V^T x • Error vector: x − V V^T x • Error length: ||x − V V^T x||^2 11-755/18-797

  33. With multiple bases • Error for one vector: ||x − V V^T x||^2 • Error for many vectors: E = Σ_i ||x_i − V V^T x_i||^2 • Goal: Estimate V to minimize this error! 11-755/18-797

  34. Minimizing error • With the constraint V^T V = I, the objective to minimize is E = Σ_i ||x_i − V V^T x_i||^2 • Note: now Λ is a diagonal matrix of Lagrange multipliers • The constraint simply ensures that v^T v = 1 for every basis • Differentiating w.r.t. V and equating to 0 gives R V = V Λ 11-755/18-797
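A sketch of the multi-basis derivation in assumed notation (the trace form of the Lagrangian is an assumption about how the constraint is imposed): minimizing the error is equivalent to maximizing tr(V^T R V) subject to V^T V = I, and the stationary condition is the matrix eigenvalue equation used on the next slides.

    \[
      E(V) = \sum_i \| x_i - V V^{T} x_i \|^{2}
           = \sum_i x_i^{T} x_i \;-\; \operatorname{tr}\bigl( V^{T} R V \bigr) ,
    \]
    \[
      \frac{\partial}{\partial V} \Bigl[ \operatorname{tr}\bigl( V^{T} R V \bigr)
        + \operatorname{tr}\bigl( \Lambda ( I - V^{T} V ) \bigr) \Bigr] = 0
      \;\Rightarrow\;
      R V = V \Lambda ,
      \qquad \Lambda \ \text{diagonal} .
    \]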

  35. Finding the optimal K bases • Compute the Eigen decomposition of the correlation matrix: R V = V Λ • Select K Eigen vectors • But which K? • Total error = the sum of the Eigen values of the discarded Eigen vectors • So select the K Eigen vectors corresponding to the K largest Eigen values 11-755/18-797
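Because the total error equals the sum of the discarded eigenvalues, one common heuristic (an addition here, not from the slides) is to pick the smallest K that retains a chosen fraction of the total eigenvalue energy; a sketch, continuing the earlier assumed names:

    # eigvals: eigenvalues of R in ascending order, from np.linalg.eigh above.
    vals = eigvals[::-1]                          # sort descending
    energy = np.cumsum(vals) / np.sum(vals)       # fraction retained by the top-k bases
    K = int(np.searchsorted(energy, 0.95)) + 1    # smallest K keeping 95% of the energy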

  36. Eigen Faces! • Arrange your input data into a matrix X • Compute the correlation R = X X^T • Solve the Eigen decomposition: R V = V Λ • The Eigen vectors corresponding to the K largest Eigen values are our optimal bases • We will refer to these as Eigen faces. 11-755/18-797
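The whole recipe on this slide, as one self-contained NumPy sketch (the function name, placeholder data, and the choice K = 60 are assumptions):

    import numpy as np

    def eigenfaces(X, K):
        """X: (n_pixels, n_faces) data matrix, one vectorized face per column.
        Returns the K eigenvectors of R = X X^T with the largest eigenvalues."""
        R = X @ X.T                              # correlation matrix
        eigvals, eigvecs = np.linalg.eigh(R)     # ascending eigenvalues
        order = np.argsort(eigvals)[::-1][:K]    # indices of the K largest
        return eigvecs[:, order], eigvals[order]

    # Usage with placeholder data (the lecture uses 100x100 images and 300 faces).
    X = np.random.rand(32 * 32, 300)
    V, lam = eigenfaces(X, K=60)                 # V: (1024, 60)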

  37. How many Eigen faces? • How to choose “K” (the number of Eigen faces) • Lay all faces side by side in vector form to form a data matrix M • In my example: 300 faces, so the matrix M is 10000 x 300 • Multiply the matrix by its transpose: Correlation = M M^T, with M^T of size 300 x 10000 • The correlation matrix is 10000 x 10000 11-755/18-797

  38. Eigen faces • [U,S] = eig(correlation) • Compute the Eigen vectors • Only 300 of the 10000 Eigen values are non-zero • Why? Because R = X X^T has rank at most 300, the number of training faces • Retain Eigen vectors with high Eigen values (> 0) • Could use a higher threshold 11-755/18-797

  39. Eigen Faces • The Eigen vector with the highest Eigen value is the first typical face • The vector with the second highest Eigen value is the second typical face • Etc. 11-755/18-797

  40. Representing a face • Face ≈ w_1 × eigenface1 + w_2 × eigenface2 + w_3 × eigenface3 + … • Representation = [w_1 w_2 w_3 ….]^T • The weights with which the Eigen faces must be combined to compose the face are used to represent the face! 11-755/18-797
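A short sketch of this representation, continuing the assumed names from the previous snippets: the weights are the projections of the face onto the eigenfaces, and keeping only the first k weighted eigenfaces yields the progressively better approximations shown on the following slides.

    # V: (n_pixels, K) eigenfaces with orthonormal columns.
    face = X[:, 0]                     # an example face, as a column vector
    w = V.T @ face                     # representation [w1 w2 ... wK]^T

    def reconstruct(V, w, k):
        """Approximate the face using only the first k eigenfaces."""
        return V[:, :k] @ w[:k]

    approx_1  = reconstruct(V, w, 1)   # one eigenface
    approx_10 = reconstruct(V, w, 10)  # ten eigenfaces, and so on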

  41. Energy Compaction Example One outcome of the “energy compaction principle”: the approximations are recognizable Approximating a face with one basis: 11-755/18-797

  42. Energy Compaction Example One outcome of the “energy compaction principle”: the approximations are recognizable Approximating a face with one Eigenface: 11-755/18-797

  43. Energy Compaction Example One outcome of the “energy compaction principle”: the approximations are recognizable Approximating a face with 10 eigenfaces: 11-755/18-797

  44. Energy Compaction Example One outcome of the “energy compaction principle”: the approximations are recognizable Approximating a face with 30 eigenfaces: 11-755/18-797

  45. Energy Compaction Example One outcome of the “energy compaction principle”: the approximations are recognizable Approximating a face with 60 eigenfaces: 11-755/18-797

  46. How did I do this? Hint: only changing weights assigned to Eigen faces.. 11-755/18-797

  47. Class specificity • The Eigenimages (bases) are very specific to the class of data they are trained on • Faces, here • They will not be useful for other classes 11-755/18-797

  48. Class specificity • Eigen bases are class specific • Composing a fishbowl from Eigenfaces 11-755/18-797

  49. Class specificity • Eigen bases are class specific • Composing a fishbowl from Eigenfaces • With 1 basis 11-755/18-797

  50. Class specificity • Eigen bases are class specific • Composing a fishbowl from Eigenfaces • With 10 bases 11-755/18-797
