# Principal components

## PowerPoint Slideshow about ' Principal components' - elvis-cline

Presentation Transcript
Principal components
• Model and concept
• No Y’s, no model.
• Partition, with maximum separation, of total variance into orthogonal components.
• Assumptions and screening
• Geometry of concept
• Procedure and analysis
• Potential problems

AGR206

Assumptions for PCA (1)
• Normality.
• Not required, but enhances results.
• Multivariate normality can be tested by squared Mahalanobis distance.
• $D^2 \sim \chi^2(p)$, with df $p$ = number of variables.
• Extremely sensitive; use a low $\alpha$.
• Linearity.
• Assumed.
• Can be inspected by scatterplots.
• If violated, use transformation.
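
The Mahalanobis screening described above can be sketched in plain Python. This is only an illustration under stated assumptions: the data are hypothetical, only two variables are used (so the $\chi^2$ critical value with 2 df has the closed form $-2\ln\alpha$), and $\alpha = 0.001$ is an assumed value, since the slide only says "low".

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def cov2(x, y):
    """Sample variances and covariance (n - 1 denominator) of two variables."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x) / (n - 1)
    syy = sum((b - my) ** 2 for b in y) / (n - 1)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    return sxx, syy, sxy

def squared_mahalanobis(x, y):
    """Squared Mahalanobis distance of every case from the centroid (p = 2)."""
    mx, my = mean(x), mean(y)
    sxx, syy, sxy = cov2(x, y)
    det = sxx * syy - sxy ** 2
    d2 = []
    for a, b in zip(x, y):
        dx, dy = a - mx, b - my
        # Quadratic form (dx, dy) S^-1 (dx, dy)' with the 2 x 2 inverse written out
        d2.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return d2

def chi2_cutoff_2df(alpha):
    """Upper-alpha critical value of chi-square with 2 df: -2 ln(alpha)."""
    return -2.0 * math.log(alpha)

# Hypothetical data (not from the slides)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 3.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 6.0]
alpha = 0.001  # assumed value; the slide only says "low"

d2 = squared_mahalanobis(x, y)
cutoff = chi2_cutoff_2df(alpha)
flagged = [i for i, d in enumerate(d2) if d > cutoff]
print("D2:", [round(d, 2) for d in d2])
print("cutoff:", round(cutoff, 2), "flagged cases:", flagged)
```

With so few cases nothing exceeds the cutoff here; the example only shows the mechanics. For more than two variables a general matrix inverse and a $\chi^2$ quantile function would be needed.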

Assumptions for PCA (2)
• Minimum number of cases.
• Must be very large for fuzzy variables and for results to generalize.
• T&F say 300; agricultural and biological papers have been published with 30 or more.
• Outliers.
• Very important to fix!
• Outliers violate MV normality.
• Identify by jackknifed squared Mahalanobis distance.
• Transform or delete; fully document.
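
A jackknifed distance can be sketched as follows: each case is measured against the centroid and covariance matrix computed without that case, so an outlier cannot mask itself. The data are hypothetical, the example is limited to two variables, and the helper names are illustrative rather than taken from any package.

```python
def squared_distance(point, x, y):
    """Squared Mahalanobis distance of `point` from the cases in x, y (p = 2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / (n - 1)
    syy = sum((b - my) ** 2 for b in y) / (n - 1)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - mx, point[1] - my
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det

def jackknifed_d2(x, y):
    """Distance of each case from the centroid/covariance of all OTHER cases."""
    out = []
    for i in range(len(x)):
        rest_x = x[:i] + x[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        out.append(squared_distance((x[i], y[i]), rest_x, rest_y))
    return out

# Hypothetical data: six cases close to the line y = x plus one case off it
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 3.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 6.0]

ordinary = [squared_distance((a, b), x, y) for a, b in zip(x, y)]
jack = jackknifed_d2(x, y)
for i, (d_all, d_jack) in enumerate(zip(ordinary, jack)):
    print(i, round(d_all, 2), round(d_jack, 2))
```

In this toy data set the last case stands out far more clearly in the jackknifed column than in the ordinary one, which is the reason for preferring the jackknifed version when hunting outliers.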

Analysis-SAS
• proc princomp data=spartina out=spartpc;
    var h2s sal eh7 ph acid p k ca mg na mn zn cu nh4;
  run;
• Output:
• simple statistics by variable
• Correlation matrix
• Eigenvalues
• Eigenvectors
• Output data set (spartpc):
• The original variables plus the scores for all PCs (PRIN1 to PRIN14).

Eigenvalues
• Each eigenvalue is the variance of its PC.
• They add up to the total variance.
• When based on the correlation matrix (i.e., standardized variables), their sum equals the number of variables.
• Their values can be used to identify the degree of collinearity and where it comes from.
• Condition index or number (CN).
• $CN_j = \sqrt{\lambda_{max} / \lambda_j}$: square root of the largest $\lambda$ divided by the $\lambda$ of the PC being considered.
• CN > 30 indicates a problem for multiple linear regression (MLR), not for PCA.

Spartina example

Eigenvalues

Eigenvalues of the Correlation Matrix

| Component | Eigenvalue | Difference | Proportion | Cumulative |
|-----------|------------|------------|------------|------------|
| PRIN1     | 4.92391    | 1.22868    | 0.351708   | 0.35171    |
| PRIN2     | 3.69523    | 2.08810    | 0.263945   | 0.61565    |
| PRIN3     | 1.60713    | 0.27222    | 0.114795   | 0.73045    |
| PRIN4     | 1.33490    | 0.64330    | 0.095350   | 0.82580    |
| PRIN5     | 0.69160    | 0.19103    | 0.049400   | 0.87520    |
| PRIN6     | 0.50057    | 0.11513    | 0.035755   | 0.91095    |
| PRIN7     | 0.38544    | 0.00467    | 0.027531   | 0.93848    |
| PRIN8     | 0.38077    | 0.21480    | 0.027198   | 0.96568    |
| PRIN9     | 0.16597    | 0.02298    | 0.011855   | 0.97754    |
| PRIN10    | 0.14299    | 0.05613    | 0.010214   | 0.98775    |
| PRIN11    | 0.08687    | 0.04158    | 0.006205   | 0.99395    |
| PRIN12    | 0.04529    | 0.01544    | 0.003235   | 0.99719    |
| PRIN13    | 0.02985    | 0.02036    | 0.002132   | 0.99932    |
| PRIN14    | 0.00949    | .          | 0.000678   | 1.00000    |
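
As a rough check on the Spartina eigenvalue table and on the condition index from the Eigenvalues slide, the sketch below recomputes the Proportion and Cumulative columns and the CN for each PC. The eigenvalues are copied from the SAS output; the variable names are illustrative.

```python
import math

# Eigenvalues of the correlation matrix, copied from the SAS output (PRIN1..PRIN14)
eigenvalues = [4.92391, 3.69523, 1.60713, 1.33490, 0.69160, 0.50057, 0.38544,
               0.38077, 0.16597, 0.14299, 0.08687, 0.04529, 0.02985, 0.00949]

total = sum(eigenvalues)  # about 14, the number of standardized variables
proportion = [ev / total for ev in eigenvalues]

cumulative = []
running = 0.0
for p in proportion:
    running += p
    cumulative.append(running)

# Condition index: square root of (largest eigenvalue / eigenvalue of this PC)
largest = max(eigenvalues)
condition_index = [math.sqrt(largest / ev) for ev in eigenvalues]

for j, (ev, p, c, cn) in enumerate(
        zip(eigenvalues, proportion, cumulative, condition_index), start=1):
    print(f"PRIN{j:<3d} {ev:8.5f} {p:9.6f} {c:8.5f} {cn:7.2f}")
```

For these data the largest condition index (PRIN14) comes out near 22.8, below the CN > 30 threshold quoted for MLR.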

Eigenvectors
• Coefficients that give the scores for each PC.
• [PC] = Z V
• Where:
• [PC] is the matrix of PC scores;
• Z is the matrix of standardized variables;
• V is the matrix of eigenvectors.
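
The product [PC] = Z V can be illustrated with a deliberately small case. The sketch assumes two hypothetical, positively correlated variables; for a 2 x 2 correlation matrix the eigenvectors are always (1/√2, 1/√2) and (1/√2, -1/√2), with eigenvalues 1 + r and 1 - r, so no eigen-solver is needed. All names and data are illustrative.

```python
import math

def standardize(xs):
    """Center and scale to unit sample standard deviation (n - 1 denominator)."""
    n = len(xs)
    m = sum(xs) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in xs) / (n - 1))
    return [(v - m) / sd for v in xs]

def variance(xs):
    n = len(xs)
    m = sum(xs) / n
    return sum((v - m) ** 2 for v in xs) / (n - 1)

# Hypothetical raw data for two positively correlated variables
x1 = [2.0, 4.0, 4.5, 6.0, 7.5, 9.0]
x2 = [1.0, 2.5, 2.0, 4.0, 4.5, 6.5]

z1, z2 = standardize(x1), standardize(x2)
Z = list(zip(z1, z2))  # n x 2 matrix of standardized variables
r = sum(a * b for a, b in zip(z1, z2)) / (len(z1) - 1)  # correlation

c = 1.0 / math.sqrt(2.0)
V = [[c, c],    # row = variable 1; columns = PC1, PC2
     [c, -c]]   # row = variable 2

# [PC] = Z V: each score is the sum over variables of z_i * v_ij
scores = [[sum(z[i] * V[i][j] for i in range(2)) for j in range(2)] for z in Z]
pc1 = [s[0] for s in scores]
pc2 = [s[1] for s in scores]

print("r =", round(r, 4))
print("var(PC1) =", round(variance(pc1), 4), " var(PC2) =", round(variance(pc2), 4))
```

The variances of the two score columns reproduce the eigenvalues 1 + r and 1 - r, and they sum to 2, the number of standardized variables.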

Spartina [PC] = Z V

Eigenvectors

| Variable | PRIN1    | PRIN2    | PRIN3    | PRIN4    | PRIN5    | PRIN6    | PRIN7    |
|----------|----------|----------|----------|----------|----------|----------|----------|
| H2S      | -.163637 | 0.009086 | 0.231669 | 0.689722 | 0.014386 | -.419348 | 0.300094 |
| SAL      | -.107894 | 0.017324 | 0.605727 | -.270389 | 0.508742 | 0.010076 | 0.383770 |
| EH7      | -.123813 | 0.225247 | 0.458251 | 0.301313 | -.166758 | 0.596651 | -.296867 |
| PH       | -.408217 | -.027467 | -.282670 | 0.081726 | 0.091618 | 0.191256 | 0.056897 |
| ACID     | 0.411680 | -.000362 | 0.204919 | -.165831 | -.162713 | -.024061 | 0.117085 |
| P        | 0.273196 | -.111277 | -.160543 | 0.199965 | 0.747115 | -.017903 | -.336928 |
| K        | -.033446 | 0.487887 | -.022907 | 0.043000 | -.061998 | -.016587 | -.067421 |
| CA       | -.358562 | -.180445 | -.206595 | -.054385 | 0.206152 | 0.427579 | 0.104949 |
| MG       | 0.079033 | 0.498653 | -.049515 | -.036561 | 0.103793 | 0.034182 | -.044195 |
| NA       | -.017130 | 0.470439 | 0.050575 | -.054358 | 0.239519 | -.060440 | -.181661 |
| MN       | 0.277082 | -.182164 | 0.019849 | 0.483078 | 0.038899 | 0.299511 | 0.124567 |
| ZN       | 0.404195 | 0.088823 | -.176373 | 0.150047 | -.007768 | 0.034351 | -.072907 |
| CU       | -.010788 | 0.391707 | -.376740 | 0.102023 | 0.063434 | 0.077993 | 0.562581 |
| NH4      | 0.398754 | -.025968 | -.010607 | -.104087 | -.005857 | 0.381686 | 0.395252 |

| Variable | PRIN8    | PRIN9    | PRIN10   | PRIN11   | PRIN12   | PRIN13   | PRIN14   |
|----------|----------|----------|----------|----------|----------|----------|----------|
| H2S      | -.073755 | 0.168302 | 0.295840 | 0.222927 | -.015407 | 0.006864 | -.079812 |
| SAL      | 0.100873 | -.175066 | -.227621 | 0.088425 | -.156210 | -.094878 | 0.089376 |
| EH7      | -.312742 | -.226136 | 0.083754 | -.023086 | 0.055421 | -.033492 | -.023123 |
| PH       | -.029538 | 0.023918 | 0.146959 | 0.041662 | -.331152 | 0.025938 | 0.750134 |
| ACID     | -.152610 | 0.095416 | 0.101118 | 0.344782 | 0.455459 | 0.351392 | 0.477337 |
| P        | -.398662 | 0.077828 | -.017685 | -.034542 | 0.064822 | 0.065467 | 0.014741 |
| K        | -.115096 | 0.559085 | -.555004 | 0.217893 | -.030301 | -.249524 | 0.072785 |
| CA       | 0.185889 | 0.186412 | 0.073763 | 0.511310 | 0.346574 | 0.079545 | -.307040 |
| MG       | 0.170996 | -.011293 | 0.111582 | 0.118799 | -.397791 | 0.690127 | -.192283 |
| NA       | 0.449939 | 0.088170 | 0.439200 | -.216233 | 0.363391 | -.276211 | 0.143663 |
| MN       | 0.531706 | 0.086117 | -.361647 | -.269913 | 0.077826 | 0.172893 | 0.140813 |
| ZN       | 0.208525 | -.439455 | 0.014406 | 0.568635 | -.222750 | -.396331 | 0.041311 |
| CU       | -.277074 | -.376706 | -.129195 | -.192872 | 0.305087 | -.000372 | -.043094 |
| NH4      | -.145025 | 0.420100 | 0.393717 | -.130247 | -.301510 | -.230796 | -.117317 |

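Loadings
• Correlation between each variable and each PC.
• For variable $Z_i$ (standardized) and $PC_j$:
• $r_{ij} = \lambda_j^{0.5} v_{ij}$
• If based on covariances, use $r_{ij} = v_{ij} (s_j / s_i)$, where $s_j = \lambda_j^{0.5}$ is the standard deviation of $PC_j$ and $s_i$ is the standard deviation of variable $i$.
• Loadings represent the degree of association between the variable and the PC.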
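
As an illustration of the correlation-matrix formula, the sketch below converts the PRIN1 eigenvector of the Spartina example into loadings by multiplying each coefficient by the square root of the first eigenvalue. The numbers are copied from the SAS output shown earlier; the dictionary layout is just one convenient choice.

```python
import math

names = ["H2S", "SAL", "EH7", "PH", "ACID", "P", "K",
         "CA", "MG", "NA", "MN", "ZN", "CU", "NH4"]

# PRIN1 eigenvector coefficients and eigenvalue, copied from the SAS output
prin1 = [-0.163637, -0.107894, -0.123813, -0.408217, 0.411680, 0.273196,
         -0.033446, -0.358562, 0.079033, -0.017130, 0.277082, 0.404195,
         -0.010788, 0.398754]
lambda1 = 4.92391

# Loading of each standardized variable on PC1: r = sqrt(lambda_1) * v_i1
loadings = {name: math.sqrt(lambda1) * v for name, v in zip(names, prin1)}

# Print from strongest to weakest association with PC1
for name, r in sorted(loadings.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:5s} {r:8.4f}")

# Because the eigenvector has unit length, squared loadings sum to lambda_1
print("sum of squared loadings:", round(sum(r * r for r in loadings.values()), 4))
```

ACID, PH, ZN and NH4 come out with the largest absolute loadings on PRIN1 (about 0.9 in absolute value), matching the largest coefficients in the eigenvector.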

Potential problems
• Depend on goal.
• Detection of collinearity.
• Reduction of dimensions.
• Correlation or covariance matrix?
• PC’s hard to interpret.
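• All $\lambda$'s about the same size.
• How many PC’s should be retained?
• Scree plot.
• Retain if $\lambda$ > average (the average $\lambda$ is 1 when based on the correlation matrix).
• Retain as many as necessary to account for 80% of the total variance.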
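
The two numeric retention rules can be tried on the Spartina eigenvalues. This is a minimal sketch: the eigenvalues are copied from the SAS output, the 80% threshold is the one named on the slide, and the variable names are illustrative.

```python
# Eigenvalues of the correlation matrix, copied from the SAS output (PRIN1..PRIN14)
eigenvalues = [4.92391, 3.69523, 1.60713, 1.33490, 0.69160, 0.50057, 0.38544,
               0.38077, 0.16597, 0.14299, 0.08687, 0.04529, 0.02985, 0.00949]

total = sum(eigenvalues)
average = total / len(eigenvalues)  # about 1 for a correlation matrix

# Rule 1: retain PCs whose eigenvalue exceeds the average eigenvalue
keep_above_average = sum(1 for ev in eigenvalues if ev > average)

# Rule 2: retain the fewest leading PCs that reach 80% of total variance
threshold = 0.80
running = 0.0
keep_for_threshold = 0
for ev in eigenvalues:
    running += ev
    keep_for_threshold += 1
    if running / total >= threshold:
        break

print("lambda > average:", keep_above_average)
print("PCs needed for 80%:", keep_for_threshold)
```

For this example both rules point to four components. A scree plot would still be worth inspecting, since the rules need not agree on other data sets.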
