Object Oriented Data Analysis, Last Time • Finished NCI 60 Data • Linear Algebra Review • Multivariate Probability Review • PCA as an Optimization Problem (Eigen-decomposition gives rotation, easy solution) • Connected Mathematics & Graphics
Class Listserv Tested on Thursday Evening, 9/8/05 If you did not get the email: • Please add yourself to the list • Use Instructions at bottom of Class Web Page: http://www.unc.edu/~marron/UNCstat322-2005/HomePage.html
PCA Redistribution of Energy Convenient summary of amount of structure: Total Sum of Squares Physical Interpretation: Total Energy in Data Insight comes from decomposition Statistical Terminology: ANalysis of VAriance (ANOVA)
PCA Redist’n of Energy (Cont.) ANOVA mean decomposition: Total Variation = Mean Variation + Mean Residual Variation Mathematics: Pythagorean Theorem Intuition Quantified via Sums of Squares
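The ANOVA mean decomposition can be checked numerically. The following is a minimal sketch, assuming a data matrix whose columns are the data objects; the sizes, random seed and variable names are illustrative and not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 2, 50                                   # hypothetical sizes; columns are data objects
X = rng.normal(loc=[[3.0], [1.0]], scale=0.5, size=(d, n))

mean = X.mean(axis=1, keepdims=True)           # mean vector (d x 1)
total_ss = np.sum(X ** 2)                      # total sum of squares ("total energy")
mean_ss = n * np.sum(mean ** 2)                # SS carried by the mean
resid_ss = np.sum((X - mean) ** 2)             # SS of residuals from the mean

# Pythagorean Theorem: total SS = mean SS + mean residual SS
print(f"mean share  = {mean_ss / total_ss:.1%}")
print(f"resid share = {resid_ss / total_ss:.1%}")
```

The two printed shares always add to 100%, which is the sense in which a statement such as "most of the variation is mean variation" is quantified by sums of squares.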
Connect Math to Graphics (Cont.) 2-d Toy Example Feature Space Object Space Residuals from Mean = Data – Mean Most of Variation = 92% is Mean Variation SS Remaining Variation = 8% is Resid. Var. SS
PCA Redist’n of Energy (Cont.) Now decompose SS about the mean where: Energy is expressed in trace of covar’ce matrix
PCA Redist’n of Energy (Cont.) • Eigenvalues provide atoms of SS decomposition • Useful Plots are: • “Power Spectrum”: $\lambda_j$ vs. $j$ • “log Power Spectrum”: $\log \lambda_j$ vs. $j$ • “Cumulative Power Spectrum”: $\sum_{j' \le j} \lambda_{j'}$ vs. $j$ • Note PCA gives SS’s for free (as eigenvalues), • but watch factors of $n - 1$
PCA Redist’n of Energy (Cont.) • Note, have already considered some of these Useful Plots: • Power Spectrum • Cumulative Power Spectrum
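A small sketch of how the eigenvalues carry the sum of squares about the mean, and of the quantities shown in the three plots. It assumes the covariance matrix is normalized by n - 1 (an assumption, as are the sizes, seed and variable names).

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 5, 30                                        # hypothetical sizes
X = rng.normal(size=(d, n)) * np.arange(1, d + 1)[:, None]

Xc = X - X.mean(axis=1, keepdims=True)              # residuals from the mean
cov = Xc @ Xc.T / (n - 1)                           # sample covariance (d x d)
eigvals = np.linalg.eigvalsh(cov)[::-1]             # eigenvalues, decreasing order

ss_about_mean = np.sum(Xc ** 2)                     # SS about the mean
power = eigvals                                     # "power spectrum"
log_power = np.log(eigvals)                         # "log power spectrum"
cum_power = np.cumsum(eigvals) / eigvals.sum()      # "cumulative power spectrum" (as a fraction)

# Energy is the trace of the covariance matrix, up to the factor n - 1
print(ss_about_mean, (n - 1) * np.trace(cov), (n - 1) * eigvals.sum())
```

The three printed numbers agree up to rounding, which illustrates why the factor n - 1 has to be tracked when reading sums of squares off the eigenvalues.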
Connect Math to Graphics (Cont.) 2-d Toy Example Feature Space Object Space Revisit SS Decomposition for PC1: PC1 has “most of var’n” = 93% Reflected by good approximation in Object Space
Connect Math to Graphics (Cont.) 2-d Toy Example Feature Space Object Space Revisit SS Decomposition for PC2: PC2 has “only a little var’n” = 7% Reflected by poor approximation in Object Space
Different Views of PCA • Solves several optimization problems: • Direction to maximize SS of 1-d proj’d data • Direction to minimize SS of residuals • (same, by Pythagorean Theorem) • “Best fit line” to data in “orthogonal sense” • (vs. regression of Y on X = vertical sense • & regression of X on Y = horizontal sense) • Use one that makes sense…
Different Views of PCA 2-d Toy Example Feature Space Object Space • Max SS of Projected Data • Min SS of Residuals • Best Fit Line
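The equivalence of the first two views (maximize SS of projections, minimize SS of residuals) can be illustrated with a brute-force scan over directions. This is a sketch on a made-up 2-d cloud; the mixing matrix, seed and helper name `proj_and_resid_ss` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
# Hypothetical 2-d toy cloud, stretched along one direction, then mean-centered.
Xc = np.array([[2.0, 0.0], [1.2, 0.4]]) @ rng.normal(size=(2, n))
Xc = Xc - Xc.mean(axis=1, keepdims=True)
total_ss = np.sum(Xc ** 2)

def proj_and_resid_ss(u):
    """SS of the 1-d projections onto direction u, and SS of the residuals."""
    u = u / np.linalg.norm(u)
    scores = u @ Xc                                  # 1-d projected data
    proj_ss = np.sum(scores ** 2)
    resid_ss = np.sum((Xc - np.outer(u, scores)) ** 2)
    return proj_ss, resid_ss

# PC1 = eigenvector with the largest eigenvalue (eigh returns ascending order).
_, eigvecs = np.linalg.eigh(Xc @ Xc.T)
pc1 = eigvecs[:, -1]

# Brute-force scan over directions in the plane, for comparison.
angles = np.linspace(0.0, np.pi, 181)
scan = np.array([proj_and_resid_ss(np.array([np.cos(a), np.sin(a)])) for a in angles])

pc1_proj, pc1_resid = proj_and_resid_ss(pc1)
print(f"PC1 share of SS about the mean: {pc1_proj / total_ss:.1%}")
```

For every scanned direction the projected SS and residual SS add up to the same total, so the direction that maximizes one necessarily minimizes the other.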
PCA Data Representation Idea: Expand Data Matrix in terms of inner prod’ts & eigenvectors Recall notation: Eigenvalue expansion (centered data):
PCA Data Represent’n (Cont.) • Now using: • Eigenvalue expansion (raw data): • Where: • Entries of are loadings • Entries of are scores
PCA Data Represent’n (Cont.) Can focus on individual data vectors: (part of above full matrix rep’n)
PCA Data Represent’n (Cont.) • Reduced Rank Representation: • Reconstruct using only $k$ terms • (assuming decreasing eigenvalues) • Gives: rank $k$ approximation of data • Key to PCA data reduction • And PCA for data compression (~ .zip)
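A sketch of the reduced rank representation: keep the mean plus the leading terms of the expansion (loadings times scores) and measure what is lost. The toy data, sizes and the helper name `reconstruct` are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(3)
d, n, k = 40, 20, 3                                   # hypothetical sizes and rank
X = rng.normal(size=(d, k)) @ rng.normal(size=(k, n)) + 0.01 * rng.normal(size=(d, n))

mean = X.mean(axis=1, keepdims=True)
Xc = X - mean
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)     # columns of U: eigenvectors (loadings)

def reconstruct(rank):
    """Mean plus the first `rank` terms of the eigenvector expansion."""
    loadings = U[:, :rank]                            # d x rank
    scores = loadings.T @ Xc                          # rank x n projections
    return mean + loadings @ scores

X_k = reconstruct(k)
rel_err = np.sum((X - X_k) ** 2) / np.sum(Xc ** 2)    # share of SS about the mean that is lost
print(f"rank {k} approximation loses {rel_err:.4%} of the SS about the mean")
```

Storing only the mean, `k` loading vectors and the `k x n` scores instead of the full `d x n` matrix is the sense in which this acts as data compression.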
PCA Data Represent’n (Cont.) • Choice of $k$ in Reduced Rank Represent’n: • Generally very slippery problem • SCREE plot (Kruskal 1964): • Find knee in power spectrum
PCA Data Represent’n (Cont.) • SCREE plot drawbacks: • What is a knee? • What if there are several? • Knees depend on scaling (power? log?) • Personal suggestion: • Find auxiliary cutoffs (inter-rater variation) • Use the full range (à la scale space)
PCA Simulation • Idea: given • Mean Vector • Eigenvectors • Eigenvalues • Simulate data from corresponding Normal Distribution • Approach: Invert PCA Data Represent’n • where
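One way to read "invert the PCA data representation" is: draw independent standard normal values, scale them by the square roots of the eigenvalues to obtain scores, rotate by the eigenvectors, and add the mean. The sketch below follows that reading; the specific mean, eigenvalues and randomly generated orthonormal eigenvectors are hypothetical inputs.

```python
import numpy as np

rng = np.random.default_rng(4)
d, n = 3, 20000
mean = np.array([1.0, -2.0, 0.5])                       # hypothetical mean vector
eigvals = np.array([4.0, 1.0, 0.25])                    # hypothetical eigenvalues (decreasing)
eigvecs, _ = np.linalg.qr(rng.normal(size=(d, d)))      # hypothetical orthonormal eigenvectors

Z = rng.normal(size=(d, n))                             # i.i.d. N(0, 1) draws
scores = np.sqrt(eigvals)[:, None] * Z                  # score j has variance eigvals[j]
X = mean[:, None] + eigvecs @ scores                    # columns are simulated data vectors

target_cov = eigvecs @ np.diag(eigvals) @ eigvecs.T     # covariance implied by the inputs
sample_cov = np.cov(X)                                  # rows are treated as variables

print(np.round(target_cov, 2))
print(np.round(sample_cov, 2))
```

With a large simulated sample the two printed matrices should be close, and running PCA on `X` should approximately recover the input eigenvalues.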
Alternate PCA Computation Issue: for HDLSS data (recall $d > n$) • the $d \times d$ covariance matrix may be quite large, • Thus slow to work with, and to compute • What about a shortcut? Approach: Singular Value Decomposition (of Data Matrix $X$)
Alternate PCA Computation Singular Value Decomposition: $X = U S V^t$ Where: $U$ is unitary $V$ is unitary $S$ is diag’l matrix of singular val’s Assume: decreasing singular values
Alternate PCA Computation Singular Value Decomposition: $X = U S V^t$ Recall Relation to Eigen-analysis of $X X^t = U S^2 U^t$ Thus have same eigenvector matrix $U$ And eigenval’s are squares of singular val’s
Alternate PCA Computation Singular Value Decomposition, Computational advantage: Use compact form, only need to find e-vec’s e-val’s scores Other components not useful So can be much faster for $d \gg n$
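The relation between the two computations can be sketched as follows, assuming the data matrix is mean-centered and scaled by the square root of n - 1 before the SVD (a convention chosen here for illustration; sizes and names are also assumptions).

```python
import numpy as np

rng = np.random.default_rng(5)
d, n = 500, 10                                          # hypothetical HDLSS sizes, d much larger than n
X = rng.normal(size=(d, n))
Xc = (X - X.mean(axis=1, keepdims=True)) / np.sqrt(n - 1)

# Eigen route: builds and decomposes a d x d matrix.
cov = Xc @ Xc.T
eigvals = np.linalg.eigvalsh(cov)[::-1][:n]             # n largest eigenvalues, decreasing

# SVD route, compact form: U is only d x n, s has n entries, Vt is n x n.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

print(U.shape, s.shape, Vt.shape)
print(np.max(np.abs(eigvals - s ** 2)))                 # eigenvalues vs. squared singular values
```

The compact form never forms the `d x d` matrix: the columns of `U` play the role of eigenvectors, `s ** 2` of eigenvalues, and the rows of `Vt` (scaled by `s`) of scores.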
Alternate PCA Computation Another Variation: Dual PCA Motivation: Recall for demography data, Useful to view as both Rows as Data & Columns as Data
Alternate PCA Computation Useful terminology (from optimization): Primal PCA problem: Columns as Data Dual PCA problem: Rows as Data
Alternate PCA Computation Dual PCA Computation: Same as above, but replace with So can almost replace with Then use SVD, , to get:
Alternate PCA Computation Appears to be cool symmetry: Primal Dual Loadings Scores But, there is a problem with the means…
Primal - Dual PCA Note different “mean vectors”: Primal Mean = Mean of Col. Vec’s: Dual Mean = Mean of Row Vec’s:
Primal - Dual PCA Primal PCA, based on SVD of Primal Data: Dual PCA, based on SVD of Dual Data: Very similar, except: • Different centerings • Different row – column interpretation
Primal - Dual PCA Toy Example 1: Random Curves, all in Primal Space: • * Constant Shift • * Linear • * Quadratic • Cubic (chosen to be orthonormal) • Plus (small) i.i.d. Gaussian noise • d = 40, n = 20
Primal - Dual PCA Toy Example 1: Raw Data
Primal - Dual PCA Toy Example 1: Raw Data • Primal (Col.) curves similar to before • Data mat’x asymmetric (but same curves) • Dual (Row) curves much rougher (showing Gaussian randomness) • How data were generated • Color map useful? (same as mesh view) • See richer structure than before • Is it useful?
Primal - Dual PCA Toy Example 1: Primal PCA Column Curves as Data
Primal - Dual PCA Toy Example 1: Primal PCA • Expected to recover increasing poly’s • But didn’t happen • Although can see the poly’s (order???) • Mean has quad’ic (since only n = 20???) • Scores (proj’ns) very random • Power Spectrum shows 4 components (not affected by subtracting Primal Mean)
Primal - Dual PCA Toy Example 1: Dual PCA Row Curves as Data
Primal - Dual PCA Toy Example 1: Dual PCA • Curves all very wiggly (random noise) • Mean much bigger, 54% of Total Var! • Scores have strong smooth structure (reflecting ordered primal e.v.’s) (recall primal e.v. dual scores) • Power Spectrum shows 3 components (Driven by subtraction Dual Mean) • Primal – Dual mean difference is critical
Primal - Dual PCA Toy Example 1: Dual PCA – Scatterplot
Primal - Dual PCA Toy Example 1: Dual PCA - Scatterplot • Smooth Curve Structure • But not usual curves (Since 1-d curves not quite poly’s) • And only in 1st 3 components • Recall only 3 non-noise components • Since constant curve went into mean (dual) • Remainder is pure noise • Suggests wrong rotation of axes???
Primal - Dual PCA A 3rd Type of Analysis: • Called “SVD decomposition” • Main point: subtract neither mean • Viewed as a serious competitor • Advantage: gives best Mean Square Approximation of Data Matrix • Vs. Primal PCA: best about col. Mean • Vs. Dual PCA: best about row Mean Difference in means is critical!
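The three analyses differ only in what is subtracted before the SVD. The sketch below, on made-up data with hypothetical sizes and names, computes all three and illustrates the stated advantage of the uncentered SVD: its rank 1 term approximates the raw data matrix at least as well as either mean alone, since each mean is itself a rank 1 matrix.

```python
import numpy as np

rng = np.random.default_rng(6)
d, n = 40, 20                                           # hypothetical sizes
X = 2.0 + rng.normal(size=(d, n))                       # made-up data with a nonzero overall level

primal_mean = X.mean(axis=1, keepdims=True)             # mean of column vectors (d x 1)
dual_mean = X.mean(axis=0, keepdims=True)               # mean of row vectors (1 x n)

s_primal = np.linalg.svd(X - primal_mean, compute_uv=False)   # Primal PCA spectrum
s_dual = np.linalg.svd(X - dual_mean, compute_uv=False)       # Dual PCA spectrum
U, s_raw, Vt = np.linalg.svd(X, full_matrices=False)          # SVD: subtract neither mean

rank1 = s_raw[0] * np.outer(U[:, 0], Vt[0])             # first SVD component
err_svd = np.sum((X - rank1) ** 2)
err_primal_mean = np.sum((X - primal_mean) ** 2)        # approximate every column by the primal mean
err_dual_mean = np.sum((X - dual_mean) ** 2)            # approximate every row by the dual mean

print(err_svd, err_primal_mean, err_dual_mean)
```

In data like this, where the mean dominates, the first SVD component tends to look like the mean, which is one way to see why indices can shift by one between the PCA and SVD views.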
Primal - Dual PCA Toy Example 1: SVD – Curves view
Primal - Dual PCA Toy Example 1: SVD Curves View • Col. Curves view similar to Primal PCA • Row Curves quite different (from dual): • Former mean, now SV1 • Former PC1, now SV2 • i.e. very similar shapes with shifted indices • Again mean centering is crucial • Main difference between PCAs and SVD
Primal - Dual PCA Toy Example 1: SVD – Mesh-Image View
Primal - Dual PCA Toy Example 1: SVD Mesh-Image View • Think about decomposition into modes of variation • Constant x Gaussian • Linear x Gaussian • Cubic by Gaussian • Quadratic • Shows up best in image view? • Why is ordering “wrong”???
Primal - Dual PCA Toy Example 1: All Primal • Why is SVD mode ordering “wrong”??? • Well, not expected… • Key is need orthogonality • Present in space of column curves • But only approximate in row Gaussians • The implicit orthogonalization of SVD (both rows and columns) gave mixture of the poly’s.
Primal - Dual PCA Toy Example 2: All Primal, GS Noise • Started with same column space • Generated i.i.d. Gaussians for row cols • Then did Gram-Schmidt Ortho-normalization (in row space) Visual impression: Amazingly similar to original data (used same seeds of random # generators)
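A sketch of the idea behind this second toy example, under simplifying assumptions: the additive noise is omitted, the mode weights are hypothetical (the slides do not give them here), and the helper names `gram_schmidt` and `alignment` are made up. It builds orthonormal polynomial column curves, multiplies them by Gaussian row vectors, and checks how well the SVD recovers the polynomials once the row vectors are orthonormalized.

```python
import numpy as np

def gram_schmidt(A):
    """Orthonormalize the columns of A (modified Gram-Schmidt)."""
    Q = np.zeros_like(A, dtype=float)
    for j in range(A.shape[1]):
        v = A[:, j].astype(float)
        for i in range(j):
            v = v - (Q[:, i] @ v) * Q[:, i]       # remove component along earlier columns
        Q[:, j] = v / np.linalg.norm(v)
    return Q

rng = np.random.default_rng(7)
d, n = 40, 20
t = np.linspace(-1.0, 1.0, d)
polys = gram_schmidt(np.vander(t, 4, increasing=True))   # constant, linear, quadratic, cubic
weights = np.array([4.0, 3.0, 2.0, 1.0])                 # hypothetical, distinct mode weights

G_raw = rng.normal(size=(n, 4))                          # i.i.d. Gaussian row-space vectors
G_gs = gram_schmidt(G_raw)                               # orthonormalized in row space

X1 = polys @ np.diag(weights) @ G_raw.T                  # Example 1 style: only approx. orthogonal
X2 = polys @ np.diag(weights) @ G_gs.T                   # Example 2 style: exactly orthogonal

def alignment(X):
    """|inner products| between the polynomials and the first 4 left singular vectors."""
    U = np.linalg.svd(X, full_matrices=False)[0][:, :4]
    return np.abs(polys.T @ U)

print(np.round(alignment(X1), 2))   # generally a mixture of the polynomials
print(np.round(alignment(X2), 2))   # identity: each SVD mode is one polynomial
```

With exactly orthonormal row vectors the product above is already a singular value decomposition, so each SVD mode matches one polynomial; with raw Gaussian rows the implicit orthogonalization of the SVD generally mixes them.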
Primal - Dual PCA Toy Example 2: Raw Data
Primal - Dual PCA Compare with Earlier Toy Example 1
Primal - Dual PCA Toy Example 2: Primal PCA Column Curves as Data Shows Explanation (of wrong components) was correct
Primal - Dual PCA Toy Example 2: Dual PCA Row Curves as Data Still have big mean But Scores look much better
Primal - Dual PCA Toy Example 2: Dual PCA – Scatterplot