
Principal Components Analysis





Presentation Transcript


  1. Principal Components Analysis BMTRY 726 3/27/14

  2. Uses
  Goal: Explain the variability of a set of variables using a “small” set of linear combinations of those variables.
  Why: There are several reasons we may want to do this:
  (1) Dimension reduction (use k of p components). Note that total variability still requires all p components.
  (2) Identify “hidden” underlying relationships (i.e., patterns in the data) and use these relationships in further analyses.
  (3) Select subsets of variables.

  3. “Exact” Principal Components
  Consider p random measurements taken on each of j = 1, 2, …, n subjects, collected in the random vector X' = [X1, X2, …, Xp]. We can represent the data as linear combinations of those p measurements:
  Yi = ai'X = ai1X1 + ai2X2 + … + aipXp, i = 1, 2, …, p

  4. “Exact” Principal Components
  Principal components are those combinations Y1, Y2, …, Yp that are:
  (1) Uncorrelated: Cov(Yi, Yk) = 0 for i ≠ k
  (2) Of variance as large as possible
  (3) Subject to the normalization ai'ai = 1

  5. Finding PCs Under Constraints
  • So how do we find PCs that meet the constraints we just discussed?
  • We want to maximize Var(Yi) = ai'Σai subject to the constraint that ai'ai = 1
  • This constrained maximization problem can be done using the method of Lagrange multipliers
  • Thus we want to maximize the function ai'Σai − λ(ai'ai − 1)

  6. Finding PCs Under Constraints
  • Differentiate w.r.t. ai:
  ∂/∂ai [ai'Σai − λ(ai'ai − 1)] = 2Σai − 2λai = 0
  • So (Σ − λI)ai = 0, i.e., ai must be an eigenvector of Σ with eigenvalue λ

  7. Finding PCs Under Constraints
  • But how do we choose our eigenvector (i.e., which eigenvector corresponds to which PC)?
  • We can see that what we want to maximize is Var(Yi) = ai'Σai = λai'ai = λ
  • So we choose λ to be as large as possible
  • If λ1 is our largest eigenvalue with corresponding eigenvector e1, then the solution for our max is a1 = e1, with Var(Y1) = λ1
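As a quick numerical check of this result (a minimal numpy sketch, not from the slides; the covariance matrix Sigma below is made up for illustration), the quadratic form a'Σa evaluated at the leading eigenvector equals λ1, and no random unit vector does better:

    import numpy as np

    # A made-up symmetric, positive-definite covariance matrix
    Sigma = np.array([[4.0, 2.0, 1.0],
                      [2.0, 3.0, 0.5],
                      [1.0, 0.5, 2.0]])

    # eigh returns eigenvalues of a symmetric matrix in ascending order
    eigvals, eigvecs = np.linalg.eigh(Sigma)
    e1 = eigvecs[:, -1]                         # eigenvector of the largest eigenvalue
    print("lambda_1     =", eigvals[-1])
    print("e1' Sigma e1 =", e1 @ Sigma @ e1)    # equals lambda_1

    # No unit vector a (a'a = 1) should beat the leading eigenvector
    rng = np.random.default_rng(0)
    a = rng.normal(size=(1000, 3))
    a /= np.linalg.norm(a, axis=1, keepdims=True)
    print("best random a'Sigma a:", np.einsum('ij,jk,ik->i', a, Sigma, a).max())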

  8. Finding PCs Under Constraints
  • Recall we had a second constraint: Cov(Y2, Y1) = a2'Σa1 = 0 (with a2'a2 = 1)
  • We could conduct a second Lagrangian maximization to find our second PC
  • However, we already know that the eigenvectors of the symmetric matrix Σ are orthogonal, so this constraint is met
  • We choose the order of the PCs by the magnitude of the eigenvalues: λ1 ≥ λ2 ≥ … ≥ λp

  9. “Exact” Principal Components
  So we can compute the PCs from the covariance matrix of X, Σ. If Σ has eigenvalue-eigenvector pairs (λ1, e1), (λ2, e2), …, (λp, ep) with λ1 ≥ λ2 ≥ … ≥ λp, then the ith principal component is
  Yi = ei'X = ei1X1 + ei2X2 + … + eipXp, i = 1, 2, …, p
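A minimal numpy sketch of this computation (the covariance matrix is hypothetical): eigendecompose Σ, order the eigenpairs by decreasing eigenvalue, and read off each PC's coefficient vector and variance.

    import numpy as np

    Sigma = np.array([[4.0, 2.0, 1.0],
                      [2.0, 3.0, 0.5],
                      [1.0, 0.5, 2.0]])     # hypothetical covariance matrix of X

    eigvals, eigvecs = np.linalg.eigh(Sigma)
    order = np.argsort(eigvals)[::-1]        # sort eigenvalues in decreasing order
    lam, E = eigvals[order], eigvecs[:, order]

    # i-th PC: Y_i = e_i'X with Var(Y_i) = lambda_i and Cov(Y_i, Y_k) = 0 for i != k
    for i in range(len(lam)):
        print(f"PC{i+1}: e = {np.round(E[:, i], 3)}, variance = {lam[i]:.3f}")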

  10. Properties
  We can also find the moments of our PCs. The mean and variance of the ith component are
  E(Yi) = ei'μ and Var(Yi) = ei'Σei = λi

  11. Properties
  We can also find the cross-moments of our PCs. For i ≠ k the components are uncorrelated:
  Cov(Yi, Yk) = ei'Σek = λk ei'ek = 0

  12. Properties
  A normality assumption is not required to find PCs. If Xj ~ Np(μ, Σ), then the PCs are themselves normally distributed, Yi ~ N(ei'μ, λi). Total variance:
  σ11 + σ22 + … + σpp = tr(Σ) = λ1 + λ2 + … + λp = Var(Y1) + Var(Y2) + … + Var(Yp)
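The total-variance identity is easy to confirm numerically (same kind of hypothetical Σ as before): the trace of the covariance matrix equals the sum of its eigenvalues.

    import numpy as np

    Sigma = np.array([[4.0, 2.0, 1.0],
                      [2.0, 3.0, 0.5],
                      [1.0, 0.5, 2.0]])

    lam = np.linalg.eigvalsh(Sigma)
    print(np.trace(Sigma), lam.sum())   # both 9.0: total variance is preserved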

  13. Principal Components
  Consider data with p random measures on j = 1, 2, …, n subjects. For the jth subject we then have the random vector Xj' = [Xj1, Xj2, …, Xjp].
  [Figure: scatter of X2 versus X1, centered at the means (μ1, μ2)]

  14. Graphic Representation

  15. Graphic Representation
  Now suppose X1, X2 ~ N2(μ, Σ). The Y1 axis is selected to maximize the variation in the scores; the Y2 axis must be orthogonal to Y1 and maximize the remaining variation in the scores.
  [Figure: bivariate normal ellipse for (X1, X2) with rotated axes Y1 and Y2]

  16. Dimension Reduction
  The proportion of total variance accounted for by the first k components is
  (λ1 + λ2 + … + λk) / (λ1 + λ2 + … + λp)
  If the proportion of variance accounted for by the first k principal components is large, we might want to restrict our attention to only these first k components. Keep in mind, components are simply linear combinations of the original p measurements. Ideally we look for meaningful interpretations of our chosen k components.
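A short numpy sketch of this calculation, with illustrative eigenvalues (not the turtle data): compute each PC's share of the total variance, accumulate, and pick the smallest k that reaches a target such as 90%.

    import numpy as np

    lam = np.array([5.2, 2.1, 1.0, 0.4, 0.3])   # illustrative eigenvalues
    prop = lam / lam.sum()                      # share of total variance per PC
    cumprop = np.cumsum(prop)                   # share of the first k PCs
    for k, (p, c) in enumerate(zip(prop, cumprop), start=1):
        print(f"PC{k}: {p:.3f} of total variance; first {k} PCs: {c:.3f}")

    # smallest k whose cumulative proportion reaches 90%
    k = int(np.searchsorted(cumprop, 0.90)) + 1
    print("keep k =", k)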

  17. PCs from Standardized Variables
  We may want to standardize our variables before finding PCs:
  Zi = (Xi − μi)/√σii, i = 1, 2, …, p, or in matrix form Z = (V^1/2)^−1(X − μ)
  where V^1/2 is the diagonal matrix with the standard deviations √σii on its diagonal.

  18. PCs from Standardized Variables
  So the covariance matrix of Z equals the correlation matrix of X:
  Cov(Z) = (V^1/2)^−1 Σ (V^1/2)^−1 = ρ
  We can define our PCs for Z the same way as before, using the eigenvalue-eigenvector pairs of ρ.
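Computationally, this just means eigendecomposing the correlation matrix of X instead of its covariance matrix. A minimal sketch with simulated data on wildly different scales:

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=(100, 3)) * [1.0, 10.0, 100.0]   # very different scales

    R = np.corrcoef(X, rowvar=False)      # correlation matrix of X = Cov(Z)
    lam, E = np.linalg.eigh(R)
    lam, E = lam[::-1], E[:, ::-1]        # decreasing order

    # PC scores from the standardized variables Z = (X - mean) / sd
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    Y = Z @ E
    print("PC variances:", Y.var(axis=0, ddof=1))   # equal the eigenvalues of R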

  19. Compare Standardized/Non-standardized PCs
  In general the two sets of PCs are not the same, and one set is not a simple function of the other, so the choice of scale matters.

  20. Estimation
  In general we do not know what Σ is; we must estimate it from the sample using the sample covariance matrix S. So what are our estimated principal components? Using the eigenvalue-eigenvector pairs (λ̂1, ê1), (λ̂2, ê2), …, (λ̂p, êp) of S, with λ̂1 ≥ λ̂2 ≥ … ≥ λ̂p, the ith estimated PC is
  ŷi = êi'x

  21. Sample Properties
  The estimated PCs have sample properties analogous to the population results: the sample variance of ŷi is λ̂i, the sample covariance between distinct estimated PCs is 0, and the total sample variance is tr(S) = λ̂1 + λ̂2 + … + λ̂p.

  22. Centering
  We often center our observations before defining our PCs. The centered PCs are found according to:
  ŷji = êi'(xj − x̄), j = 1, 2, …, n
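A numpy sketch of the centered sample PCs; the data below are simulated, standing in for an observed n x p data matrix.

    import numpy as np

    rng = np.random.default_rng(2)
    X = rng.multivariate_normal([0.0, 0.0, 0.0],
                                [[4.0, 2.0, 1.0],
                                 [2.0, 3.0, 0.5],
                                 [1.0, 0.5, 2.0]], size=200)

    S = np.cov(X, rowvar=False)           # sample covariance matrix
    lam, E = np.linalg.eigh(S)
    lam, E = lam[::-1], E[:, ::-1]        # decreasing order

    Xc = X - X.mean(axis=0)               # center the observations
    Y = Xc @ E                            # y_ji = e_i'(x_j - xbar)
    print("score means:    ", np.round(Y.mean(axis=0), 10))  # ~0 after centering
    print("score variances:", Y.var(axis=0, ddof=1))         # ~ estimated eigenvalues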

  23. Example
  Jolicoeur and Mosimann (1960) conducted a study looking at the relationship between the size and shape of painted turtle carapaces. We can develop PCs for the natural log of the length, width, and height of the female turtles' carapaces.

  24. Example
  The first PC might be interpreted as an overall size component: small values of y1 correspond to shells with small dimensions, and large values of y1 to shells with large dimensions.

  25. Example
  The second PC emphasizes a contrast between the length and height of the shell; shells with small and large values of y2 sit at opposite ends of this contrast.

  26. Example
  The third PC emphasizes a contrast between the width and length of the shell; shells with small and large values of y3 sit at opposite ends of this contrast.

  27. Example Consider the proportion of variability accounted for by each PC

  28. Example
  How are the PCs correlated with each of the x's? Using the general result
  corr(ŷi, xk) = êik √λ̂i / √skk
  we can then compute the correlation of each PC with each of the original (log) shell measurements.
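A sketch of this correlation computation in numpy, using a hypothetical covariance matrix rather than the turtle data:

    import numpy as np

    Sigma = np.array([[4.0, 2.0, 1.0],
                      [2.0, 3.0, 0.5],
                      [1.0, 0.5, 2.0]])   # hypothetical covariance matrix

    lam, E = np.linalg.eigh(Sigma)
    lam, E = lam[::-1], E[:, ::-1]

    # corr(Y_i, X_k) = e_ik * sqrt(lambda_i) / sqrt(sigma_kk)
    corr = E * np.sqrt(lam) / np.sqrt(np.diag(Sigma))[:, None]
    print(np.round(corr, 3))   # rows: variables X_k; columns: PCs Y_i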

  29. Interpretation of PCs
  Consider data x1, x2, …, xp. PCs are actually projections onto the estimated eigenvectors:
  - The 1st PC is the projection with the largest variance
  - For data reduction, PCA is only useful if the eigenvalues differ in magnitude
  - If the x's are uncorrelated, we can't really do data reduction

  30. Choosing Number of PCs
  Often the goal of PCA is dimension reduction of the data: select a limited number of PCs that capture the majority of the variability in the data. How do we decide how many PCs to include?
  1. Scree plot: plot of λ̂i versus i, looking for an "elbow"
  2. Select all PCs with λ̂i > 1 (for standardized observations)
  3. Choose some proportion of the variance you want to account for
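All three rules are easy to apply once the eigenvalues are estimated; a sketch with illustrative values for standardized data:

    import numpy as np

    lam = np.array([2.9, 1.2, 0.45, 0.3, 0.15])   # illustrative eigenvalues

    # Rule 1: scree plot data -- look for the "elbow" in the plot of lambda_i vs i
    for i, l in enumerate(lam, start=1):
        print(i, l)

    # Rule 2: keep PCs with lambda_i > 1 (standardized observations)
    print("rule 2:", int(np.sum(lam > 1)), "components")

    # Rule 3: smallest k accounting for at least 80% of the total variance
    cumprop = np.cumsum(lam) / lam.sum()
    print("rule 3:", int(np.searchsorted(cumprop, 0.80)) + 1, "components")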

  31. Scree Plots

  32. Choosing Number of PCs
  Should principal components that only account for a small proportion of variance always be ignored? Not necessarily; they may indicate near-perfect collinearities among traits. In the turtle example this is true: very little of the variation in the shell measurements can be attributed to the 2nd and 3rd components.

  33. Large Sample Properties
  If n is large, there are nice properties we can use. The estimated eigenvalues are approximately normal,
  √n(λ̂i − λi) → N(0, 2λi²) as n → ∞,
  which gives the approximate 100(1 − α)% confidence interval
  λ̂i / (1 + z(α/2)√(2/n)) ≤ λi ≤ λ̂i / (1 − z(α/2)√(2/n))
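A sketch of the resulting interval estimate, assuming the normal-theory interval stated above (the eigenvalue estimates and sample size are illustrative):

    import numpy as np

    n = 200                                  # sample size (illustrative)
    lam_hat = np.array([5.2, 2.1, 0.7])      # illustrative estimated eigenvalues
    z = 1.96                                 # z_(alpha/2) for a 95% interval

    # large-sample CI: lam_hat/(1 + z*sqrt(2/n)) <= lambda_i <= lam_hat/(1 - z*sqrt(2/n))
    half = z * np.sqrt(2.0 / n)
    lower, upper = lam_hat / (1 + half), lam_hat / (1 - half)
    for lh, lo, hi in zip(lam_hat, lower, upper):
        print(f"lambda-hat = {lh:.2f}: 95% CI ({lo:.2f}, {hi:.2f})")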

  34. Large Sample Properties
  Also, for our estimated eigenvectors,
  √n(êi − ei) → Np(0, Ei), where Ei = λi Σk≠i [λk / (λk − λi)²] ek ek'
  These results assume that X1, X2, …, Xn are Np(μ, Σ).

  35. Summary
  Principal component analysis is most useful for dimensionality reduction. It can also be used for identifying collinear variables; note that use of PCA in a regression setting is therefore one way to handle multicollinearity. A caveat: principal components can be difficult to interpret and should therefore be used with caution.
