1 / 29

Principal Components Analysis

Principal Components Analysis. Hal Whitehead BIOL4062/5062. Principal Components Analysis {PCA}. A. K. A.: latent vectors latent variates principal axes principal factors etc. Principal Components Analysis: Principal purpose:. Reducing dimensionality:

lydia
Download Presentation

Principal Components Analysis

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Principal Components Analysis Hal Whitehead BIOL4062/5062

  2. Principal Components Analysis{PCA} • A. K. A.: • latent vectors • latent variates • principal axes • principal factors • etc

  3. Principal Components Analysis:Principal purpose: Reducing dimensionality: large body of data to manageable set

  4. PCA: General methodology From k original variables: x1,x2,...,xk: Produce k new variables: y1,y2,...,yk: y1 = a11x1 + a12x2 + ... + a1kxk y2 = a21x1 + a22x2 + ... + a2kxk ... yk = ak1x1 + ak2x2 + ... + akkxk

  5. PCA: General methodology From k original variables: x1,x2,...,xk: Produce k new variables: y1,y2,...,yk: y1 = a11x1 + a12x2 + ... + a1kxk y2 = a21x1 + a22x2 + ... + a2kxk ... yk = ak1x1 + ak2x2 + ... + akkxk such that: yk's are uncorrelated (orthogonal) y1 explains as much as possible of original variance in data set y2 explains as much as possible of remaining variance etc.

  6. 2nd Principal Component, y2 1st Principal Component, y1 Principal Components Analysis

  7. PCA: General methodology From k original variables: x1,x2,...,xk: Produce k new variables: y1,y2,...,yk: y1 = a11x1 + a12x2 + ... + a1kxk y2 = a21x1 + a22x2 + ... + a2kxk ... yk = ak1x1 + ak2x2 + ... + akkxk yk's are Principal Components such that: yk's are uncorrelated (orthogonal) y1 explains as much as possible of original variance in data set y2 explains as much as possible of remaining variance etc.

  8. Principal Components Analysis • Rotates multivariate dataset into a new configuration which is easier to interpret • Purposes • simplify data • look at relationships between variables • look at patterns of units

  9. Principal Components Analysis • Uses: • Correlation matrix, or • Covariance matrix when variables in same units (morphometrics, etc.)

  10. Principal Components Analysis {a11,a12,...,a1k} is 1st Eigenvector of correlation/covariance matrix, and coefficients of first principal component {a21,a22,...,a2k} is 2nd Eigenvector of correlation/covariance matrix, and coefficients of 2nd principal component … {ak1,ak2,...,akk} is kth Eigenvector of correlation/covariance matrix, and coefficients of kth principal component

  11. Principal Components Analysis So, principal components are given by: y1 = a11x1 + a12x2 + ... + a1kxk y2 = a21x1 + a22x2 + ... + a2kxk ... yk = ak1x1 + ak2x2 + ... + akkxk xj’s are standardized if correlation matrix is used (mean 0.0, SD 1.0)

  12. Principal Components Analysis Score of ith unit on jth principal component yi,j = aj1xi1 + aj2xi2 + ... + ajkxik

  13. xi2 yi,1 yi,2 xi1 PCA Scores

  14. Principal Components Analysis Amount of variance accounted for by: 1st principal component, λ1, 1st eigenvalue 2nd principal component, λ2, 2nd eigenvalue ... λ1>λ2>λ3> λ4> ... Average λj = 1 (correlation matrix)

  15. λ2 λ1 Principal Components Analysis:Eigenvalues

  16. PCA: Terminology • jth principal component is jth eigenvector of correlation/covariance matrix • coefficients, ajk, are elements of eigenvectors and relate original variables (standardized if using correlation matrix) to components • scores are values of units on components (produced using coefficients) • amount of variance accounted for by component is given by eigenvalue, λj • proportion of variance accounted for by component is given by λj / Σ λj • loading of kth original variable on jth component is given by ajk√λj --correlation between variable and component

  17. How many components to use? • If λj < 1 then component explains less variance than original variable (correlation matrix) • Use 2 components (or 3) for visual ease • Scree diagram:

  18. Principal Components Analysis on: • Covariance Matrix: • Variables must be in same units • Emphasizes variables with most variance • Mean eigenvalue ≠1.0 • Useful in morphometrics, a few other cases • Correlation Matrix: • Variables are standardized (mean 0.0, SD 1.0) • Variables can be in different units • All variables have same impact on analysis • Mean eigenvalue = 1.0

  19. PCA: Potential Problems • Lack of Independence • NO PROBLEM • Lack of Normality • Normality desirable but not essential • Lack of Precision • Precision desirable but not essential • Many Zeroes in Data Matrix • Problem (use Correspondence Analysis)

  20. Variables: Mean cluster size Max. cluster size Mean speed Heading consistency Fluke-up rate Breach rate Lobtail rate Spyhop rate Sidefluke rate Coda rate Creak rate High click rate Data collected: Off Galapagos Islands 1985 and 1987 Units: hours spent following sperm whales 440 hours Hourly records of sperm whale behaviour

  21. Scores plots

  22. Rotations of Principal Components(Exploratory Factor Analysis) • Factors are rotated components • (just rotate a few principal components) • Varimax: tries to maximize variance of squared loadings for each factor (orthogonal): • lines up factors with original variables • improves interpretability of factors • Quartimax: tries to minimize sums of squares of products of loadings (orthogonal)

  23. Variables Murder Rape Robbery Assault Burglary Larceny Autotheft Units: States US Crime Statistics

  24. Crime Statistics Component loadings 1 2 MURDER 0.557 -0.771 RAPE 0.851 -0.139 ROBBERY 0.782 0.055 ASSAULT 0.784 -0.546 BURGLARY 0.881 0.308 LARCENY 0.728 0.480 AUTOTHFT 0.714 0.438

  25. After Varimax Rotation: Crimes against people Crimes against property Crime Statistics: Component Loadings

  26. Crime Statistics: Scores Plot Crimes against people Crimes against property

  27. Procedure for principal components analysis 1. Decide whether to use correlation or covariance matrix 2. Find eigenvectors (components) and eigenvalues (variance accounted for) 3. Decide how many components to use by examining eigenvalues (perhaps using scree diagram) 4. Examine loadings (perhaps vector loading plot) 5. Plot scores 6. Try rotation--go to step 4

More Related