
Factor Analysis and Principal Components



Presentation Transcript


  1. Factor Analysis and Principal Components Removing Redundancies and Finding Hidden Variables

  2. Two Goals • Measurements are not independent of one another, so we need a way to reduce dimensionality and remove collinearity – Principal components • Measurements are affected by unobserved, latent factors, and we want to estimate those factors – Factor analysis

  3. Principal Components • Qualities we are interested in studying can be measured indirectly • Measurements have redundancy – e.g. multiple measurements reflect size • Measurements reflect more than one property – e.g. size and shape

  4. Steps • Select variables – generally interval or ratio scale variables – dichotomies can also be used • Analysis usually begins with a covariance or correlation matrix of the variables • Principal components are extracted that reflect correlations between variables
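The steps above can be sketched numerically. A minimal illustration in Python with NumPy (hypothetical random data, not the handaxe measurements): standardize the variables, form the correlation matrix, and extract its eigenvectors as the components.

```python
import numpy as np

# Hypothetical data: 10 cases x 3 variables
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))
X[:, 1] += X[:, 0]                    # build in some collinearity

# Step 1: standardize, so the analysis runs on the correlation matrix
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = np.corrcoef(X, rowvar=False)

# Step 2: the principal components are the eigenvectors of R
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]     # largest variance first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Step 3: component scores are projections of the standardized data
scores = Z @ eigvecs
print(eigvals)
```

The score columns come out mutually uncorrelated, which is the "remove collinearity" goal from slide 2.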

  5. Terminology • Eigenvalues – a measure of the variance “explained” by a component • Eigenvectors – dimensions that have been extracted from the correlation matrix – the principal components • Communality – amount of variance for a variable “explained” by a subset of the components

  6. Issues • Need more cases than variables • Sum of the eigenvalues = number of variables or number of cases – 1, whichever is smaller • Principal components are often standardized to a variance of 1 • Each component is independent of the others
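The eigenvalue bookkeeping can be checked directly: for a correlation matrix the eigenvalues sum to its trace, which is the number of variables. A small NumPy sketch with simulated data (seven variables, as in the handaxe example, but random values):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 7))       # 50 cases, 7 variables: more cases than variables
R = np.corrcoef(X, rowvar=False)   # 7x7 correlation matrix, ones on the diagonal
eigvals = np.linalg.eigvalsh(R)

# Sum of the eigenvalues = trace of R = number of variables
print(round(eigvals.sum(), 6))   # -> 7.0
```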

  7. Results • Eigenvalues for extracted components and proportion of variance “explained” • Loadings (correlations) between variables and components • Scores for the components for each case

  8. Number of Components • Principal components can be used simply to produce k independent components for k inter-related variables • More commonly, the number of components extracted is limited to a smaller number, e.g. those with eigenvalues>1
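The eigenvalue > 1 rule is a one-liner; a hypothetical five-variable example:

```python
import numpy as np

# Hypothetical eigenvalues from a five-variable analysis
eigvals = np.array([2.8, 1.3, 0.5, 0.3, 0.1])

# Kaiser criterion: keep components that explain at least as much
# variance as one standardized variable (eigenvalue > 1)
n_keep = int(np.sum(eigvals > 1))
print(n_keep)   # -> 2
```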

  9. Example • In Rcmdr use Statistics | Dimensional analysis | Principal-components • princomp() and prcomp() in R compute principal components – prcomp() is numerically more stable because it uses singular value decomposition rather than eigendecomposition • Packages psych and ade4 also provide principal component functions

  10. Handaxes • Collection of 600 handaxes from Furze Platt, Maidenhead, England at the Royal Ontario Museum • Seven dimensional measurements capture both size and shape

  11. > .PC <- princomp(~L+L1+T+T1+W+W1+W2, cor=TRUE, data=HandAxes)
      > unclass(loadings(.PC))  # component loadings
             Comp.1      Comp.2     Comp.3      Comp.4     Comp.5
      L  -0.3920231 -0.32304228  0.3538015 -0.33843343 -0.4808967
      L1 -0.3315569  0.53860582  0.2426709 -0.47116870 -0.1314518
      T  -0.3634691 -0.05646815  0.6714878  0.49319567  0.4072064
      T1 -0.3630703  0.28215177 -0.2868995  0.62010656 -0.5665372
      W  -0.4413891 -0.23565830 -0.2611803 -0.15214093  0.1240457
      W1 -0.3839257  0.42527974 -0.3223177 -0.11082837  0.4786462
      W2 -0.3608806 -0.53511821 -0.3326024 -0.01609245  0.1420839
              Comp.6       Comp.7
      L   0.50165401  0.139034239
      L1 -0.54544728  0.065516026
      T  -0.06346368 -0.026815813
      T1 -0.02654645  0.007598814
      W  -0.07495719 -0.798293730
      W1  0.51900435  0.238953875
      W2 -0.41365937  0.530309784

  12. > .PC$sd^2  # component variances
          Comp.1     Comp.2     Comp.3     Comp.4     Comp.5     Comp.6     Comp.7
      4.24372416 1.18476216 0.56766626 0.54851088 0.27589883 0.09909247 0.08034523
      > summary(.PC)  # proportions of variance
      Importance of components:
                                Comp.1    Comp.2     Comp.3    Comp.4     Comp.5     Comp.6     Comp.7
      Standard deviation     2.0600301 1.0884678 0.75343630 0.7406152 0.52526073 0.31478956 0.28345235
      Proportion of Variance 0.6062463 0.1692517 0.08109518 0.0783587 0.03941412 0.01415607 0.01147789
      Cumulative Proportion  0.6062463 0.7754980 0.85659323 0.9349519 0.97436604 0.98852211 1.00000000
      > biplot(.PC, cex=c(.5, 1))
      > scatterplotMatrix(~PC1+PC2+PC3+PC4, reg.line=FALSE,
      +   smooth=FALSE, spread=FALSE, span=0.5, diagonal='density',
      +   data=HandAxes, pch=20)
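The proportions printed by summary(.PC) follow directly from the component variances: with cor=TRUE the variances sum to the number of variables (7), and each proportion is simply eigenvalue / 7. A quick check in Python:

```python
import numpy as np

# Component variances from .PC$sd^2 above
variances = np.array([4.24372416, 1.18476216, 0.56766626, 0.54851088,
                      0.27589883, 0.09909247, 0.08034523])

prop = variances / variances.sum()
print(round(variances.sum(), 4))   # -> 7.0
print(round(prop[0], 7))           # -> 0.6062463
print(round(prop.cumsum()[1], 7))  # -> 0.775498
```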

  13. Factor Analysis • We are interested in studying something that cannot be directly observed • We can, however, observe variables which are affected by the unobserved factors • Correlations between observed variables are assumed to reflect the unobserved factors

  14. Steps • Select variables as with principal components • Analysis usually begins with a correlation matrix of the variables • Communality estimates defined • Extract one or more factors • Rotate factors for interpretability

  15. Terminology • Eigenvalues, Eigenvectors, and Communality are defined as for principal components • Communality is the common variance in a variable, as opposed to its unique variance: for variable i with loadings λᵢⱼ on the factors, communality hᵢ² = Σⱼ λᵢⱼ² and uniqueness uᵢ² = 1 − hᵢ²
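For a concrete check, take the loadings reported for variable L in the factanal() output later in the presentation (0.707 and 0.362 on the two factors): the communality is the sum of squared loadings and the uniqueness is its complement.

```python
import numpy as np

loadings_L = np.array([0.707, 0.362])   # loadings of L on Factor1, Factor2

communality = float(np.sum(loadings_L**2))   # common variance
uniqueness = 1 - communality                 # unique variance
print(round(communality, 3), round(uniqueness, 3))   # -> 0.631 0.369
```

The 0.369 matches the uniqueness factanal() reports for L.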

  16. Issues • Need more cases than variables • Sum of the eigenvalues = number of variables or number of cases – 1, whichever is smaller • Factors are often standardized to a variance of 1 • Each factor is independent if no rotation or an orthogonal rotation is used

  17. Results • Eigenvalues for extracted components and proportion of variance “explained” • Loadings (correlations) between variables and factors • Factor rotation results • Factor scores for each case

  18. Number of Factors • Default choice is usually to select the factors with eigenvalues > 1 – these factors explain the equivalent variance of at least one original variable • Scree plots can be used to select more or fewer factors

  19. Rotation • Rotation is used to make the factors more interpretable • Rotation tries to create variables with very high or very low loadings • Orthogonal rotation preserves the independence of the factors • Oblique rotation produces correlated factors
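Rotation can be made concrete: varimax searches for an orthogonal rotation matrix that pushes squared loadings toward 0 or 1. Below is one common NumPy sketch of the iterative SVD algorithm, applied to hypothetical loadings (factanal() performs this rotation internally, so this is for illustration only):

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """A common sketch of varimax rotation (gamma = 1)."""
    p, k = loadings.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        Lr = loadings @ R
        u, s, vt = np.linalg.svd(
            loadings.T @ (Lr**3 - Lr @ np.diag((Lr**2).sum(axis=0)) / p))
        R = u @ vt                      # R stays orthogonal (product of SVD bases)
        d_old, d = d, s.sum()
        if d_old != 0 and d / d_old < 1 + tol:
            break
    return loadings @ R

# Hypothetical unrotated loadings: 7 variables, 2 factors
rng = np.random.default_rng(2)
unrotated = rng.uniform(-0.7, 0.7, size=(7, 2))
rotated = varimax(unrotated)

# An orthogonal rotation leaves each variable's communality unchanged
print(np.allclose((unrotated**2).sum(axis=1), (rotated**2).sum(axis=1)))  # -> True
```

The final check shows why orthogonal rotation preserves independence and communalities: only the axes move, not the variance each variable shares with the factor space.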

  20. Interpretation • Interpretability is not a test that the factors are “real” • Factors are interpreted using information about the variables that load highly on them • Interpretations should be evaluated against other information

  21. Example • In Rcmdr use Statistics | Dimensional analysis | Factor Analysis • factanal() in base R or fa() in the psych package

  22. > .FA <- factanal(~L+L1+T+T1+W+W1+W2, factors=2,
      +   rotation="varimax", scores="regression", data=HandAxes)
      > .FA

      Call:
      factanal(x = ~L + L1 + T + T1 + W + W1 + W2, factors = 2,
          data = HandAxes, scores = "regression", rotation = "varimax")

      Uniquenesses:
          L    L1     T    T1     W    W1    W2
      0.369 0.191 0.594 0.515 0.081 0.240 0.064

      Loadings:
         Factor1 Factor2
      L   0.707   0.362
      L1          0.895
      T   0.470   0.430
      T1  0.374   0.587
      W   0.850   0.444
      W1  0.337   0.804
      W2  0.965

  23.                Factor1 Factor2
      SS loadings      2.637   2.309
      Proportion Var   0.377   0.330
      Cumulative Var   0.377   0.707

      Test of the hypothesis that 2 factors are sufficient.
      The chi square statistic is 371.35 on 8 degrees of freedom.
      The p-value is 2.5e-75
      > scatterplot(F2~F1, reg.line=FALSE, smooth=FALSE, spread=FALSE,
      +   boxplots=FALSE, span=0.5, data=HandAxes)
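The "Proportion Var" row can be recovered from the SS loadings: with seven standardized variables contributing total variance 7, each factor's proportion is its SS loadings divided by 7. Checking the printed values:

```python
# SS loadings reported by factanal() above; 7 observed variables in the model
ss1, ss2 = 2.637, 2.309

print(round(ss1 / 7, 3))           # -> 0.377
print(round(ss2 / 7, 3))           # -> 0.33
print(round((ss1 + ss2) / 7, 3))   # -> 0.707
```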
