Principal Components Analysis with SPSS
Karl L. Wuensch
Dept of Psychology
East Carolina University
When to Use PCA
You have a set of p continuous variables.
You want to repackage their variance into m components.
You will usually want m to be < p, but not always.
Discriminant function analysis.
Problem with multicollinearity.
Used PCA to extract eight orthogonal components.
Predicted recommended verdict from these 8 components.
Transformed results back to the original scales.
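The steps above (extract orthogonal components, predict from them, transform back) can be sketched as follows. This is only an illustration under stated assumptions: the data are synthetic, there are four predictors rather than eight, and ordinary linear regression stands in for the discriminant function analysis. All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, deliberately collinear predictors (illustrative only, not the real data).
n = 200
base = rng.normal(size=(n, 1))
X = base + 0.3 * rng.normal(size=(n, 4))
y = X @ np.array([1.0, -0.5, 0.25, 0.0]) + rng.normal(size=n)

# Standardize the predictors and center the outcome.
sd = X.std(axis=0, ddof=1)
Z = (X - X.mean(axis=0)) / sd
yc = y - y.mean()

# PCA of the correlation matrix: columns of V are the component weights.
Rx = np.corrcoef(Z, rowvar=False)
eigvals, V = np.linalg.eigh(Rx)
scores = Z @ V  # orthogonal component scores

# Predict the outcome from the orthogonal components.
gamma, *_ = np.linalg.lstsq(scores, yc, rcond=None)

# Transform the component coefficients back to the original variables.
beta_std = V @ gamma        # coefficients for standardized predictors
beta_raw = beta_std / sd    # coefficients in the original scales

# With all components retained, this matches least squares on Z directly.
beta_ols, *_ = np.linalg.lstsq(Z, yc, rcond=None)
```

Because the component scores are uncorrelated, the coefficients in gamma do not suffer from the multicollinearity among the original predictors.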
FACTBEER.SAV at http://core.ecu.edu/psyc/wuenschk/SPSS/SPSS-Data.htm .
Analyze, Data Reduction, Factor.
Scoot beer variables into box.
Click Descriptives and then check Initial Solution, Coefficients, KMO and Bartlett’s Test of Sphericity, and Anti-image. Click Continue.
Click Extraction and then select Principal Components, Correlation Matrix, Unrotated Factor Solution, Scree Plot, and Eigenvalues Over 1. Click Continue.
Click Options. Select Exclude Cases Listwise and Sorted By Size. Click Continue.
Click OK, and SPSS completes the Principal Components Analysis.
          cost    size    alcohol  reputat  color   aroma   taste
cost      1.00    .832    .767     -.406    .018    -.046   -.064
size      .832    1.00    .904     -.392    .179    .098    .026
alcohol   .767    .904    1.00     -.463    .072    .044    .012
reputat   -.406   -.392   -.463    1.00     -.372   -.443   -.443
color     .018    .179    .072     -.372    1.00    .909    .903
aroma     -.046   .098    .044     -.443    .909    1.00    .870
taste     -.064   .026    .012     -.443    .903    .870    1.00
a. Measures of Sampling Adequacy (MSA) are on the main diagonal. Off-diagonal elements are partial correlations multiplied by -1.
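A minimal sketch of how the anti-image correlation matrix, the per-variable MSA, and the overall KMO can be computed from the correlation matrix. It assumes the standard KMO formula and uses the rounded correlations shown earlier, so values may differ slightly from SPSS output computed from the raw data.

```python
import numpy as np

# Correlation matrix copied from the beer data table.
# Variable order: cost, size, alcohol, reputat, color, aroma, taste.
R = np.array([
    [1.000,  .832,  .767, -.406,  .018, -.046, -.064],
    [ .832, 1.000,  .904, -.392,  .179,  .098,  .026],
    [ .767,  .904, 1.000, -.463,  .072,  .044,  .012],
    [-.406, -.392, -.463, 1.000, -.372, -.443, -.443],
    [ .018,  .179,  .072, -.372, 1.000,  .909,  .903],
    [-.046,  .098,  .044, -.443,  .909, 1.000,  .870],
    [-.064,  .026,  .012, -.443,  .903,  .870, 1.000],
])

# Partial correlation of each pair, controlling for all other variables.
inv = np.linalg.inv(R)
d = np.sqrt(np.diag(inv))
partial = -inv / np.outer(d, d)
np.fill_diagonal(partial, 0.0)

# Squared simple and partial correlations, off-diagonal only.
r_off = R.copy()
np.fill_diagonal(r_off, 0.0)
r2 = r_off ** 2
q2 = partial ** 2

# MSA per variable and overall KMO.
msa = r2.sum(axis=0) / (r2.sum(axis=0) + q2.sum(axis=0))
kmo = r2.sum() / (r2.sum() + q2.sum())

# Anti-image correlation matrix: MSA on the diagonal,
# partial correlations multiplied by -1 off the diagonal.
anti_image = -partial
np.fill_diagonal(anti_image, msa)
```

Small partial correlations relative to the simple correlations push MSA and KMO toward 1, which is what you want before extracting components.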
Visual Aid: Use a Scree Plot
Scree is rubble at base of cliff.
For our beer data, there is a big drop in eigenvalue between component 2 and component 3.
Components 3-7 are scree.
Try a 2 component solution.
Should also look at solution with one fewer and with one more component.
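The eigenvalues behind the scree plot can be obtained directly from the correlation matrix. This is a sketch using the rounded correlations shown earlier, so the eigenvalues may differ in the third decimal from SPSS output.

```python
import numpy as np

# Correlation matrix copied from the beer data table.
# Variable order: cost, size, alcohol, reputat, color, aroma, taste.
R = np.array([
    [1.000,  .832,  .767, -.406,  .018, -.046, -.064],
    [ .832, 1.000,  .904, -.392,  .179,  .098,  .026],
    [ .767,  .904, 1.000, -.463,  .072,  .044,  .012],
    [-.406, -.392, -.463, 1.000, -.372, -.443, -.443],
    [ .018,  .179,  .072, -.372, 1.000,  .909,  .903],
    [-.046,  .098,  .044, -.443,  .909, 1.000,  .870],
    [-.064,  .026,  .012, -.443,  .903,  .870, 1.000],
])

# Eigenvalues in descending order, as plotted on a scree plot.
eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]

# Proportion of total variance per component (total = number of variables).
prop_var = eigvals / eigvals.sum()

# Eigenvalues Over 1 rule.
n_over_1 = int((eigvals > 1.0).sum())

# Drop between successive eigenvalues; the largest drop marks the cliff edge.
drops = eigvals[:-1] - eigvals[1:]
biggest_drop_after = int(np.argmax(drops)) + 1  # 1-based component number

for k, (ev, pv) in enumerate(zip(eigvals, prop_var), start=1):
    print(f"component {k}: eigenvalue {ev:.3f}, proportion {pv:.3f}")
```

For these correlations both the eigenvalues-over-1 rule and the largest drop point to two components.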
Random Data Eigenvalues
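Random data eigenvalues come from parallel analysis: compare the observed eigenvalues with those from random, uncorrelated data of the same dimensions. The sketch below is an assumption-laden illustration. The sample size n_cases is hypothetical (the number of cases is not stated here), and the 95th percentile criterion and 500 simulations are choices made for this example.

```python
import numpy as np

# Correlation matrix copied from the beer data table.
R = np.array([
    [1.000,  .832,  .767, -.406,  .018, -.046, -.064],
    [ .832, 1.000,  .904, -.392,  .179,  .098,  .026],
    [ .767,  .904, 1.000, -.463,  .072,  .044,  .012],
    [-.406, -.392, -.463, 1.000, -.372, -.443, -.443],
    [ .018,  .179,  .072, -.372, 1.000,  .909,  .903],
    [-.046,  .098,  .044, -.443,  .909, 1.000,  .870],
    [-.064,  .026,  .012, -.443,  .903,  .870, 1.000],
])

rng = np.random.default_rng(1)
n_cases = 200   # HYPOTHETICAL sample size; substitute the real number of cases
n_sims = 500    # number of random data sets (assumed)
p = R.shape[0]

observed = np.sort(np.linalg.eigvalsh(R))[::-1]

# Eigenvalues of correlation matrices from random normal data.
random_eigs = np.empty((n_sims, p))
for i in range(n_sims):
    data = rng.normal(size=(n_cases, p))
    rc = np.corrcoef(data, rowvar=False)
    random_eigs[i] = np.sort(np.linalg.eigvalsh(rc))[::-1]

mean_random = random_eigs.mean(axis=0)
pct95_random = np.percentile(random_eigs, 95, axis=0)

# Retain components until an observed eigenvalue fails to beat the random one.
retained = 0
for obs, thr in zip(observed, pct95_random):
    if obs <= thr:
        break
    retained += 1
```

A component is worth keeping only if its eigenvalue exceeds what random data of the same size would produce.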
Velicer's Minimum Average Partial (MAP) Test:
Velicer's Average Squared Correlations
The smallest average squared correlation is
The number of components is 2
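A minimal sketch of the original (squared correlation) MAP test, assuming it is applied to the rounded correlation matrix shown earlier. The function name map_test is hypothetical. The output above reports two components for the beer data; this sketch has not been checked against that SPSS output.

```python
import numpy as np

# Correlation matrix copied from the beer data table.
R = np.array([
    [1.000,  .832,  .767, -.406,  .018, -.046, -.064],
    [ .832, 1.000,  .904, -.392,  .179,  .098,  .026],
    [ .767,  .904, 1.000, -.463,  .072,  .044,  .012],
    [-.406, -.392, -.463, 1.000, -.372, -.443, -.443],
    [ .018,  .179,  .072, -.372, 1.000,  .909,  .903],
    [-.046,  .098,  .044, -.443,  .909, 1.000,  .870],
    [-.064,  .026,  .012, -.443,  .903,  .870, 1.000],
])

def map_test(R):
    """Return (number of components, average squared partial correlations)."""
    p = R.shape[0]
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs * np.sqrt(eigvals)

    avg_sq = []
    for m in range(p - 1):          # m = 0 means nothing partialled out
        A = loadings[:, :m]
        C = R - A @ A.T             # residual covariance after m components
        d = np.sqrt(np.diag(C))
        partial = C / np.outer(d, d)
        # Average of the squared off-diagonal partial correlations.
        avg_sq.append((np.sum(partial ** 2) - p) / (p * (p - 1)))
    return int(np.argmin(avg_sq)), avg_sq

n_components, avg_sq = map_test(R)
for m, value in enumerate(avg_sq):
    print(f"{m} components partialled out: {value:.4f}")
```

The number of components is the m at which the average squared partial correlation is smallest.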
All variables load well on first component, economy and quality vs. reputation.
Second component is more interesting, economy versus quality.
Rotate these axes so that the two dimensions pass more nearly through the two major clusters (COST, SIZE, ALCH and COLOR, AROMA, TASTE).
The number of degrees by which I rotate the axes is the angle PSI. For these data, rotating the axes -40.63 degrees has the desired effect.
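Rotating the axes by PSI amounts to multiplying the loading matrix by a 2 x 2 rotation matrix. The sketch below assumes loadings computed from the rounded correlation matrix shown earlier. Eigenvector signs and the direction convention for the angle are arbitrary here, so the rotated loadings may be reflected relative to SPSS output.

```python
import numpy as np

# Correlation matrix copied from the beer data table.
R = np.array([
    [1.000,  .832,  .767, -.406,  .018, -.046, -.064],
    [ .832, 1.000,  .904, -.392,  .179,  .098,  .026],
    [ .767,  .904, 1.000, -.463,  .072,  .044,  .012],
    [-.406, -.392, -.463, 1.000, -.372, -.443, -.443],
    [ .018,  .179,  .072, -.372, 1.000,  .909,  .903],
    [-.046,  .098,  .044, -.443,  .909, 1.000,  .870],
    [-.064,  .026,  .012, -.443,  .903,  .870, 1.000],
])

# Unrotated loadings for the first two components.
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
loadings = eigvecs[:, :2] * np.sqrt(eigvals[:2])

# Orthogonal rotation by the angle PSI given in the text.
psi = np.deg2rad(-40.63)
T = np.array([[np.cos(psi), -np.sin(psi)],
              [np.sin(psi),  np.cos(psi)]])
rotated = loadings @ T

# Rotation moves variance between components but not between variables.
communal_before = (loadings ** 2).sum(axis=1)
communal_after = (rotated ** 2).sum(axis=1)
```

The communalities are unchanged by the rotation; only the way each variable's variance is split across the two components changes.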
Component 1 = Quality versus reputation.
Component 2 = Economy (or cheap drunk) versus reputation.
In this case, the first unrotated factor is close to the true factor.
But rotation splits the factor, producing an imaginary second factor and corrupting the first.
Can avoid this problem by including a garbage variable that will be removed prior to the final solution.
If SSL (the sum of squared loadings) = 1, the component has extracted one variable’s worth of variance.
If only one variable loads well on a component, the component is not well defined.
If only two load well, it may be reliable if the two variables are highly correlated with one another but not with other variables.
Here are the communalities for our beer data. “Initial” is with all 7 components, “Extraction” is for our 2 component solution.
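Communalities are row sums of squared loadings, and SSL values are column sums. A sketch using the rounded correlation matrix shown earlier, so the values may differ slightly from the SPSS table:

```python
import numpy as np

# Correlation matrix copied from the beer data table.
# Variable order: cost, size, alcohol, reputat, color, aroma, taste.
R = np.array([
    [1.000,  .832,  .767, -.406,  .018, -.046, -.064],
    [ .832, 1.000,  .904, -.392,  .179,  .098,  .026],
    [ .767,  .904, 1.000, -.463,  .072,  .044,  .012],
    [-.406, -.392, -.463, 1.000, -.372, -.443, -.443],
    [ .018,  .179,  .072, -.372, 1.000,  .909,  .903],
    [-.046,  .098,  .044, -.443,  .909, 1.000,  .870],
    [-.064,  .026,  .012, -.443,  .903,  .870, 1.000],
])
names = ["cost", "size", "alcohol", "reputat", "color", "aroma", "taste"]

eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
loadings_all = eigvecs * np.sqrt(eigvals)   # loadings on all 7 components

# "Initial": every variable is fully reproduced by all 7 components.
initial = (loadings_all ** 2).sum(axis=1)

# "Extraction": variance in each variable captured by the first 2 components.
extraction = (loadings_all[:, :2] ** 2).sum(axis=1)

# SSL for each retained component (equals its eigenvalue before rotation).
ssl = (loadings_all[:, :2] ** 2).sum(axis=0)

for name, ini, ext in zip(names, initial, extraction):
    print(f"{name:8s} initial {ini:.3f}  extraction {ext:.3f}")
```

A low extraction communality flags a variable that the retained components do not account for well.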
May better fit the data with axes that are not perpendicular, but at the cost of having components that are correlated with one another.