
Stat240: Principal Component Analysis (PCA)


megara




Presentation Transcript


  1. Stat240: Principal Component Analysis (PCA)

  2. Open/closed book examination data
     > scores = as.matrix(read.table("http://www1.maths.leeds.ac.uk/~charles/mva-data/openclosedbook.dat", header = TRUE))
     > colnames(scores)
     > pairs(scores)    # scatterplot matrix of every pair of subjects
     MC VC LO NO SO
     77 82 67 67 81
     63 78 80 70 81
     75 73 71 66 81
     55 72 63 70 68
     63 63 65 70 63
     53 61 72 64 73
     51 67 65 65 68
     ...

  3. Sample Variance-Covariance
     > cov.scores = cov(scores)
     > round(cov.scores, 2)
            MC     VC     LO     NO     SO
     MC 305.77 127.22 101.58 106.27 117.40
     VC 127.22 172.84  85.16  94.67  99.01
     LO 101.58  85.16 112.89 112.11 121.87
     NO 106.27  94.67 112.11 220.38 155.54
     SO 117.40  99.01 121.87 155.54 297.76
     > eigen.value = eigen(cov.scores)$values    # variances of the PCs
     > round(eigen.value, 2)
     [1] 686.99 202.11 103.75  84.63  32.15
     > eigen.vec = eigen(cov.scores)$vectors     # loadings (one column per PC)
     > round(eigen.vec, 2)
           [,1]  [,2]  [,3]  [,4]  [,5]
     [1,] -0.51  0.75 -0.30  0.30 -0.08
     [2,] -0.37  0.21  0.42 -0.78 -0.19
     [3,] -0.35 -0.08  0.15  0.00  0.92
     [4,] -0.45 -0.30  0.60  0.52 -0.29
     [5,] -0.53 -0.55 -0.60 -0.18 -0.15
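
The slide's output can be sanity-checked: the eigenvalues of a covariance matrix add up to its trace (here 305.77 + 172.84 + 112.89 + 220.38 + 297.76 = 1109.64, against 1109.63 for the rounded eigenvalues), and the eigenvectors are orthonormal, so S = E Λ E'. A minimal sketch of these checks, assuming simulated data (the matrix `X` below is hypothetical, not the exam scores, so it runs without the URL):

```r
# Hypothetical stand-in for `scores`: 88 rows, 5 positively correlated columns
set.seed(240)
n <- 88
common <- rnorm(n)
X <- sapply(1:5, function(j) 50 + 10 * common + 8 * rnorm(n))
colnames(X) <- c("MC", "VC", "LO", "NO", "SO")

S <- cov(X)                      # sample covariance matrix (divisor n - 1)
e <- eigen(S, symmetric = TRUE)  # eigenvalues come back in decreasing order
lambda <- e$values               # variances of the principal components
E <- e$vectors                   # loadings: column k defines PC k

# Total variance is preserved: sum of eigenvalues equals trace(S)
total_ok <- isTRUE(all.equal(sum(lambda), sum(diag(S))))
# Loadings are orthonormal: t(E) %*% E is the identity
ortho_ok <- isTRUE(all.equal(crossprod(E), diag(5)))
# Spectral decomposition reconstructs S
recon_ok <- isTRUE(all.equal(E %*% diag(lambda) %*% t(E), S, check.attributes = FALSE))
```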

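  4. Principal Components (weights are the columns of eigen.vec; the variance of PC k is the k-th eigenvalue)
     PC1 (variance 686.99): $Y_1 = -0.51\,\mathrm{MC} - 0.37\,\mathrm{VC} - 0.35\,\mathrm{LO} - 0.45\,\mathrm{NO} - 0.53\,\mathrm{SO}$
     PC2 (variance 202.11): $Y_2 = 0.75\,\mathrm{MC} + 0.21\,\mathrm{VC} - 0.08\,\mathrm{LO} - 0.30\,\mathrm{NO} - 0.55\,\mathrm{SO}$
     PC3 (variance 103.75): $Y_3 = -0.30\,\mathrm{MC} + 0.42\,\mathrm{VC} + 0.15\,\mathrm{LO} + 0.60\,\mathrm{NO} - 0.60\,\mathrm{SO}$
     PC4 (variance 84.63): $Y_4 = 0.30\,\mathrm{MC} - 0.78\,\mathrm{VC} + 0.00\,\mathrm{LO} + 0.52\,\mathrm{NO} - 0.18\,\mathrm{SO}$
     PC5 (variance 32.15): $Y_5 = -0.08\,\mathrm{MC} - 0.19\,\mathrm{VC} + 0.92\,\mathrm{LO} - 0.29\,\mathrm{NO} - 0.15\,\mathrm{SO}$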
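
Each PC is a weighted sum of the five scores, with the weights taken from one column of `eigen.vec`. A minimal sketch, again on hypothetical simulated data, that computes the PC scores for every student (after centring each column, which does not change variances) and confirms that their sample variances equal the eigenvalues and that different PCs are uncorrelated:

```r
# Hypothetical data standing in for `scores`
set.seed(240)
n <- 88
common <- rnorm(n)
X <- sapply(1:5, function(j) 50 + 10 * common + 8 * rnorm(n))

e <- eigen(cov(X), symmetric = TRUE)
# Centre each column, then project onto the loadings: column k of Y holds PC k
Xc <- scale(X, center = TRUE, scale = FALSE)
Y <- Xc %*% e$vectors

pc_var <- apply(Y, 2, var)   # sample variance of each PC
pc_cor <- cor(Y)             # correlation matrix of the PC scores
```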

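  5. Scree plot
     > plot(1:5, eigen.value, xlab = "i", ylab = "variance", main = "scree plot", type = "b")
     > round(cumsum(eigen.value)/sum(eigen.value), 3)    # cumulative proportion of variance
     [1] 0.619 0.801 0.895 0.971 1.000
     • The first two PCs account for about 80% of the total variance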
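
The cumulative proportions above can be turned into a rule for how many PCs to keep. A minimal sketch using the eigenvalues printed on slide 3; the 80% cutoff and the "above-average eigenvalue" rule are common rules of thumb chosen here for illustration, not something the slides prescribe:

```r
# Eigenvalues copied from the slide's output (rounded to 2 decimals)
eigen.value <- c(686.99, 202.11, 103.75, 84.63, 32.15)

prop <- eigen.value / sum(eigen.value)   # proportion of total variance per PC
cum_prop <- cumsum(prop)                  # cumulative proportion
round(cum_prop, 3)
# [1] 0.619 0.801 0.895 0.971 1.000

# Smallest number of PCs whose cumulative proportion reaches 80%
k <- which(cum_prop >= 0.80)[1]
k
# [1] 2

# Alternative rule: keep PCs whose variance exceeds the average eigenvalue
k_avg <- sum(eigen.value > mean(eigen.value))
k_avg
# [1] 1
```

The two rules disagree here (2 versus 1), which is one reason the scree plot is usually inspected as well.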

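  6. “princomp”
     • R has a built-in function, princomp, to conduct PCA
     • By default princomp uses divisor n (not n - 1) for the covariance matrix, so its variances are (n - 1)/n times eigen.value
     > help(princomp)
     > obj = princomp(scores)
     > plot(obj, type = "lines")    # scree plot
     > biplot(obj)                  # scores and loadings on the first two PCs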
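
A minimal sketch comparing princomp with the hand computation via eigen(cov(...)), and with prcomp (an SVD-based alternative in base R that uses divisor n - 1). The data are hypothetical and simulated:

```r
# Hypothetical data standing in for `scores`
set.seed(240)
n <- 88
common <- rnorm(n)
X <- sapply(1:5, function(j) 50 + 10 * common + 8 * rnorm(n))

obj <- princomp(X)   # eigen-decomposition of the covariance with divisor n
pr  <- prcomp(X)     # SVD of the centred data; variances use divisor n - 1
ev  <- eigen(cov(X), symmetric = TRUE)$values

# princomp's variances, rescaled by n / (n - 1), match eigen(cov(X))
princomp_ok <- isTRUE(all.equal(unname(obj$sdev^2) * n / (n - 1), ev))
# prcomp's variances match eigen(cov(X)) directly
prcomp_ok <- isTRUE(all.equal(pr$sdev^2, ev))
# Loadings agree up to the sign of each column
sign_ok <- isTRUE(all.equal(abs(unclass(obj$loadings)), abs(pr$rotation),
                            check.attributes = FALSE))
summary(obj)         # standard deviations and proportions of variance
```

The sign of each loading column is arbitrary, so comparisons across functions (or against the slide) should be made up to sign.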

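  7. PCA in checking the MVN assumption
     • If the data are multivariate normal, every PC (a linear combination of the variables) is normal, so examine the normality of the PCs, especially the first two
     • Histograms, Q-Q plots
     • Bivariate plots
     • Checking outliers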
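
A minimal sketch of these checks on hypothetical simulated data with one deliberately planted unusual row. The Q-Q correlation and the Shapiro-Wilk test are two of several possible normality checks, and the |z| > 3.5 outlier cutoff is an arbitrary illustrative choice:

```r
# Hypothetical multivariate normal data, plus one planted outlier in row 1
set.seed(240)
n <- 200
common <- rnorm(n)
X <- sapply(1:5, function(j) 10 * common + 8 * rnorm(n))
X[1, ] <- X[1, ] + c(60, -60, 60, -60, 60)

Y <- prcomp(X)$x[, 1:2]    # scores on the first two PCs

# Normal Q-Q coordinates without drawing; a correlation near 1 supports normality
qq1 <- qqnorm(Y[, 1], plot.it = FALSE)
r_q1 <- cor(qq1$x, qq1$y)
sw <- shapiro.test(Y[, 2]) # formal normality test on PC2

# Flag candidate outliers: any standardised PC score beyond +/- 3.5
z <- scale(Y)
flagged <- which(apply(abs(z), 1, max) > 3.5)
flagged
```

For visual checks, drop `plot.it = FALSE` and add `plot(Y)` for the bivariate scatter of PC1 against PC2.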

  8. PCA in regression
     • Data: Y (n × 1), X (n × p)
     • PCA is useful when we want to regress Y on a large number of independent variables (X)
     • Reduce dimension
     • Handle collinearity
     • Transform X to its principal components and regress Y on (some of) them
     • How should the principal components be chosen?
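
A minimal sketch of principal component regression on hypothetical simulated data with two nearly collinear predictors. It shows that the PC scores are uncorrelated, that using all p PCs reproduces ordinary least squares on X, and how a fit on the first k PCs maps back to coefficients on the standardised X (k = 2 is an arbitrary choice here):

```r
# Hypothetical data: x2 is almost a copy of x1
set.seed(240)
n <- 100
x1 <- rnorm(n)
X <- cbind(x1 = x1,
           x2 = x1 + 0.1 * rnorm(n),
           x3 = rnorm(n),
           x4 = rnorm(n))
y <- drop(X %*% c(1, 1, 0.5, 0)) + rnorm(n)   # hypothetical response

pca <- prcomp(X, center = TRUE, scale. = TRUE)  # PCs of the standardised X
Z <- pca$x                                       # n x p matrix of PC scores

# PC scores are mutually uncorrelated, so the collinearity disappears
cz <- cor(Z)
max_offdiag <- max(abs(cz[upper.tri(cz)]))

# Keeping all p PCs gives the same fitted values as OLS on X
same_fit <- isTRUE(all.equal(fitted(lm(y ~ Z)), fitted(lm(y ~ X))))

# Dimension reduction: regress y on the first k PCs only
k <- 2
fit_k <- lm(y ~ Z[, 1:k])
# Coefficients on the standardised X variables implied by the k-PC fit
beta_std <- pca$rotation[, 1:k] %*% coef(fit_k)[-1]
beta_std
```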

  9. PCA in regression
     • A common misconception: retain the PCs with the largest variances
     • PCs with large variances do tend to explain the dependent variable better
     • But PCs with small variances can also have predictive value
     • Consider instead the PCs with the largest correlation with Y
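
A minimal sketch of why variance alone can mislead, on hypothetical simulated data: two nearly identical predictors whose difference (a low-variance direction) drives the response. Ranking PCs by their absolute correlation with y puts the small-variance PC2 first:

```r
# Hypothetical data: x1 and x2 share a common component z
set.seed(240)
n <- 200
z  <- rnorm(n)
x1 <- z + 0.1 * rnorm(n)
x2 <- z + 0.1 * rnorm(n)
X  <- cbind(x1, x2)
# Hypothetical response driven by x1 - x2, which has very small variance
y  <- (x1 - x2) + 0.01 * rnorm(n)

pca <- prcomp(X)
round(pca$sdev^2, 3)          # PC1 carries almost all of the variance in X

# Rank PCs by |correlation with y| instead of by variance
r <- abs(cor(pca$x, y))[, 1]
r
order(r, decreasing = TRUE)   # PC2 comes first here
```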

  10. Factor Analysis (FA)

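  11. PCA vs FA
     • Both attempt data reduction
     • PCA leads to principal components: linear combinations of the observed variables
     • FA leads to factors: latent variables, with each observed variable modelled as a linear combination of the factors plus an error term
     [Diagram: PCA with variables X1, X2, X3, X4 and components PC1, …, PC4; FA with variables X1, X2, X3, X4 and factors F1, F2, F3]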

  12. FA in R
     • The function is “factanal”
     • Example:
     v1 <- c(1,1,1,1,1,1,1,1,1,1,3,3,3,3,3,4,5,6)
     v2 <- c(1,2,1,1,1,1,2,1,2,1,3,4,3,3,3,4,6,5)
     v3 <- c(3,3,3,3,3,1,1,1,1,1,1,1,1,1,1,5,4,6)
     v4 <- c(3,3,4,3,3,1,1,2,1,1,1,1,2,1,1,5,6,4)
     v5 <- c(1,1,1,1,1,3,3,3,3,3,1,1,1,1,1,6,4,5)
     v6 <- c(1,1,1,2,1,3,3,3,4,3,1,1,1,2,1,6,5,4)
     m1 <- cbind(v1,v2,v3,v4,v5,v6)
     obj = factanal(m1, factors = 2)                                  # from the raw data
     obj = factanal(covmat = cov(m1), factors = 2, n.obs = nrow(m1))  # from a covariance matrix; n.obs enables the fit test
     plot(obj$loadings, type = "n")    # empty plot: Factor1 vs Factor2 loadings
     text(obj$loadings, labels = c("v1", "v2", "v3", "v4", "v5", "v6"))
     • factanal fits the model by maximum likelihood (MLE)
     • The default rotation method used by factanal is varimax
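
Both factanal calls above work from the same correlation matrix, so they should give the same loadings. A minimal sketch checking this on the slide's own data (the comparison tolerance of 1e-4 is a loose choice to allow for optimiser round-off):

```r
v1 <- c(1,1,1,1,1,1,1,1,1,1,3,3,3,3,3,4,5,6)
v2 <- c(1,2,1,1,1,1,2,1,2,1,3,4,3,3,3,4,6,5)
v3 <- c(3,3,3,3,3,1,1,1,1,1,1,1,1,1,1,5,4,6)
v4 <- c(3,3,4,3,3,1,1,2,1,1,1,1,2,1,1,5,6,4)
v5 <- c(1,1,1,1,1,3,3,3,3,3,1,1,1,1,1,6,4,5)
v6 <- c(1,1,1,2,1,3,3,3,4,3,1,1,1,2,1,6,5,4)
m1 <- cbind(v1, v2, v3, v4, v5, v6)

fit_x   <- factanal(m1, factors = 2)
fit_cov <- factanal(covmat = cov(m1), factors = 2, n.obs = nrow(m1))

# Same correlation matrix in both cases, so the loadings should coincide
same_loadings <- isTRUE(all.equal(unclass(fit_x$loadings),
                                  unclass(fit_cov$loadings), tolerance = 1e-4))
print(fit_x$loadings, cutoff = 0.3)  # hide small loadings for readability
fit_cov$STATISTIC                    # likelihood-ratio statistic (needs n.obs)
```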

  13. Example: Examination Scores
     • p = 6: Gaelic, English, History, Arithmetic, Algebra, Geometry
     • n = 220 male students
     • R = (6 × 6 sample correlation matrix)

  14. Factor Rotation
     • Motivation: get better insight by making the loadings easier to interpret
     • Varimax criterion
     • The rotation that maximizes the total variance of the squares of the (scaled) loadings, summed over factors
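
Written out, with scaled loadings $\tilde\ell_{ij} = \ell_{ij}/h_i$, where $h_i^2 = \sum_{j=1}^{m} \ell_{ij}^2$ is the communality of variable $i$, the varimax criterion for $p$ variables and $m$ factors is

$$V = \frac{1}{p}\sum_{j=1}^{m}\left[\sum_{i=1}^{p}\tilde\ell_{ij}^{4} - \frac{1}{p}\Big(\sum_{i=1}^{p}\tilde\ell_{ij}^{2}\Big)^{2}\right],$$

which is the sum over factors of the (divisor-$p$) variance of the squared scaled loadings. A minimal sketch that evaluates $V$ before and after base R's `varimax()`, using hypothetical loadings built by rotating a clean two-factor structure by 30 degrees; `varimax_V` is a helper name introduced here for illustration:

```r
# Varimax criterion V: sum over factors of the variance of squared scaled loadings
varimax_V <- function(L) {
  L  <- unclass(L)
  Lt <- L / sqrt(rowSums(L^2))   # divide row i by h_i (Kaiser normalisation)
  sum(apply(Lt^2, 2, function(a) mean(a^2) - mean(a)^2))
}

# Hypothetical loadings with a clean two-factor ("simple") structure
L0 <- cbind(c(0.8, 0.7, 0.75, 0, 0, 0),
            c(0, 0, 0, 0.8, 0.7, 0.6))
theta <- pi / 6                  # rotate by 30 degrees to blur the structure
R <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), 2, 2)
L <- L0 %*% R

V_before <- varimax_V(L)         # 0.125 for this construction
rot <- varimax(L)                # stats::varimax, normalize = TRUE by default
V_after <- varimax_V(rot$loadings)
round(unclass(rot$loadings), 2)  # close to L0, up to column order and sign
```

Because the rotation is orthogonal, each variable's communality is unchanged; only how it is split across factors moves toward simple structure.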
