Principal Components Analysis with SPSS
Karl L. Wuensch
Dept of Psychology
East Carolina University
When to Use PCA
You have a set of p continuous variables.
You want to repackage their variance into m components.
You will usually want m to be < p, but not always.
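To make "repackaging variance" concrete, here is a minimal NumPy sketch. It assumes PCA of the correlation matrix (standardized variables), and `pca_correlation` is a hypothetical helper, not an SPSS or NumPy function. With standardized variables the total variance equals p, the eigenvalues sum to p, and the loadings reproduce the correlation matrix when all p components are kept.

```python
import numpy as np


def pca_correlation(data):
    """PCA of the correlation matrix (hypothetical helper).

    Returns eigenvalues (largest first), loadings, and the proportion
    of total variance each component accounts for.
    """
    R = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)       # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]          # put the largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    proportion = eigvals / R.shape[0]          # eigenvalues sum to p
    return eigvals, loadings, proportion


rng = np.random.default_rng(1)
data = rng.standard_normal((100, 4))
data[:, 1] += data[:, 0]                       # make two variables correlate
eigvals, loadings, proportion = pca_correlation(data)
# eigvals.sum() should equal 4, the number of variables (up to rounding)
```

Keeping only the first m columns of `loadings` is what it means to retain m < p components.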
Predictor variables = jurors’ scores on 8 scales.
Discriminant function analysis.
Problem: the predictors were multicollinear.
Used PCA to extract eight orthogonal components.
Predicted recommended verdict from these 8 components.
Transformed results back to the original scales.
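The strategy above resembles principal components regression. Below is a minimal sketch under stated assumptions: ordinary least squares stands in for the discriminant function analysis the study actually used, and `pc_regression` is a hypothetical helper. Because all components are kept, mapping the component weights back through the eigenvector matrix reproduces the weights from regressing on the standardized original predictors.

```python
import numpy as np


def pc_regression(X, y):
    """Regress y on all principal components of X, then map back.

    Sketch only: OLS stands in for discriminant function analysis.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardized predictors
    R = np.corrcoef(X, rowvar=False)
    eigvals, W = np.linalg.eigh(R)
    W = W[:, np.argsort(eigvals)[::-1]]                # largest component first
    scores = Z @ W                                     # mutually uncorrelated
    design = np.column_stack([np.ones(len(y)), scores])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    gamma = coef[1:]                                   # weights on components
    beta = W @ gamma                                   # weights on standardized X
    return scores, gamma, beta
```

The orthogonal component scores sidestep the multicollinearity during estimation, while `beta` expresses the result on the (standardized) original scales.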
Data file: FACTBEER.SAV, available at http://core.ecu.edu/psyc/wuenschk/SPSS/SPSS-Data.htm.
Analyze, Data Reduction, Factor.
Move the beer variables into the Variables box.
Click Descriptives and then check Initial Solution, Coefficients, KMO and Bartlett’s Test of Sphericity, and Anti-image. Click Continue.
Click Extraction and then select Principal Components, Correlation Matrix, Unrotated Factor Solution, Scree Plot, and Eigenvalues Over 1. Click Continue.
Click Rotation. Select Varimax and Rotated Solution. Click Continue.
Click Options. Select Exclude Cases Listwise and Sorted By Size. Click Continue.
Click OK, and SPSS completes the Principal Components Analysis.
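Bartlett's Test of Sphericity, requested in the Descriptives dialog, checks whether the correlation matrix differs from an identity matrix; if it does not, there is nothing for PCA to repackage. A minimal sketch of the usual statistic, chi-square = -(n - 1 - (2p + 5)/6) ln|R| with p(p - 1)/2 degrees of freedom; `bartlett_sphericity` is a hypothetical name and R would come from `np.corrcoef(data, rowvar=False)`.

```python
import numpy as np


def bartlett_sphericity(R, n):
    """Bartlett's test that the population correlation matrix is an identity.

    R: p x p correlation matrix; n: number of cases.
    Returns the chi-square statistic and its degrees of freedom.
    """
    R = np.asarray(R, dtype=float)
    p = R.shape[0]
    _, logdet = np.linalg.slogdet(R)                  # ln|R|, numerically stable
    stat = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    df = p * (p - 1) // 2
    return stat, df
```

If SciPy is installed, `scipy.stats.chi2.sf(stat, df)` gives the p-value.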
In the anti-image correlation matrix, the Measures of Sampling Adequacy (MSA) are on the main diagonal; the off-diagonal elements are the partial correlations multiplied by -1.
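A sketch of how that matrix can be built from the inverse of the correlation matrix, assuming R is invertible. Each variable's MSA compares its squared correlations with its squared partial correlations, and the overall KMO does the same pooled over all pairs. `anti_image_and_kmo` is a hypothetical helper; placing the MSAs on the diagonal mimics the layout described above.

```python
import numpy as np


def anti_image_and_kmo(R):
    """Anti-image correlation matrix (MSA on diagonal), per-variable MSA, and KMO."""
    R = np.asarray(R, dtype=float)
    R_inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(R_inv))
    anti_image = R_inv / np.outer(d, d)    # off-diagonal = -(partial correlation)
    partial = -anti_image
    np.fill_diagonal(partial, 0.0)
    r_off = R.copy()
    np.fill_diagonal(r_off, 0.0)
    r2, q2 = r_off ** 2, partial ** 2
    msa = r2.sum(axis=0) / (r2.sum(axis=0) + q2.sum(axis=0))
    kmo = r2.sum() / (r2.sum() + q2.sum())
    np.fill_diagonal(anti_image, msa)      # put each variable's MSA on the diagonal
    return anti_image, msa, kmo
```

With only two variables there is nothing to partial out, so the partial correlation equals the correlation and every MSA is 0.5.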
Visual Aid: Use a Scree Plot
Scree is the rubble at the base of a cliff.
For our beer data,
Only the first two components have eigenvalues greater than 1.
Big drop in eigenvalue between component 2 and component 3.
Components 3-7 are scree.
Try a 2 component solution.
We should also look at the solutions with one fewer and with one more component.
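The two rules of thumb above (eigenvalues over 1 and the big drop before the scree) can be sketched in plain Python. The helper names are hypothetical, and the eigenvalues are made-up illustrative values for seven variables, not the beer data output.

```python
def kaiser_count(eigvals):
    """Number of components with an eigenvalue greater than 1."""
    return sum(1 for ev in eigvals if ev > 1.0)


def components_before_largest_drop(eigvals):
    """Number of components that precede the largest drop between neighbours."""
    ev = sorted(eigvals, reverse=True)
    drops = [ev[i] - ev[i + 1] for i in range(len(ev) - 1)]
    return drops.index(max(drops)) + 1


# Hypothetical eigenvalues for seven variables (not the beer data output).
eigs = [3.0, 2.5, 0.5, 0.4, 0.3, 0.2, 0.1]
k = components_before_largest_drop(eigs)
candidates = [k - 1, k, k + 1]   # also inspect one fewer and one more
```

The largest-drop rule is only a mechanical stand-in for eyeballing a scree plot; the visual judgment remains the point.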
Parallel Analysis: Compare Observed Eigenvalues with Random Data Eigenvalues
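Parallel analysis retains a component only while its observed eigenvalue exceeds the eigenvalue expected from uncorrelated random data of the same size. A minimal sketch, assuming normal random data and the mean over replications (some programs use the 95th percentile instead); `parallel_analysis` and its defaults are hypothetical.

```python
import numpy as np


def parallel_analysis(data, n_reps=100, seed=0):
    """Compare observed eigenvalues with mean eigenvalues of random normal data."""
    data = np.asarray(data, dtype=float)
    n, p = data.shape
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]
    rng = np.random.default_rng(seed)
    random_eigs = np.empty((n_reps, p))
    for i in range(n_reps):
        sim = rng.standard_normal((n, p))          # same n and p, no structure
        random_eigs[i] = np.sort(np.linalg.eigvalsh(np.corrcoef(sim, rowvar=False)))[::-1]
    mean_random = random_eigs.mean(axis=0)
    k = 0
    while k < p and observed[k] > mean_random[k]:  # stop at the first failure
        k += 1
    return observed, mean_random, k
```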
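Velicer's Minimum Average Partial (MAP) Test:
Velicer's average squared partial correlations are computed after partialling out 0, 1, 2, ... components.
Retain the number of components that yields the smallest average squared correlation.
For the beer data, the number of components is 2.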
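A minimal sketch of the MAP logic, assuming a positive definite correlation matrix and the original squared (not fourth-power) variant; `velicer_map` is a hypothetical helper. For each m it removes the first m components, rescales the residual matrix to partial correlations, and averages the squared off-diagonal values.

```python
import numpy as np


def velicer_map(R):
    """Velicer's MAP test on a correlation matrix R (assumed positive definite).

    Returns the average squared partial correlation after partialling out
    0, 1, ..., p-1 components, and the number of components to retain.
    """
    R = np.asarray(R, dtype=float)
    p = R.shape[0]
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    off_diag = ~np.eye(p, dtype=bool)
    avg_sq = []
    for m in range(p):
        # Residual matrix after removing the first m components, built from
        # the remaining components to avoid subtractive cancellation.
        V = eigvecs[:, m:]
        C = (V * eigvals[m:]) @ V.T
        d = np.sqrt(np.diag(C))
        partial = C / np.outer(d, d)
        avg_sq.append(float(np.mean(partial[off_diag] ** 2)))
    return avg_sq, int(np.argmin(avg_sq))
```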
All variables load well on the first component: economy and quality versus reputation.
The second component is more interesting: economy versus quality.
Rotate these axes so that the two dimensions pass more nearly through the two major clusters (COST, SIZE, ALCH and COLOR, AROMA, TASTE).
The angle by which I rotate the axes is called PSI. For these data, rotating the axes by -40.63 degrees has the desired effect.
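With two components, any orthogonal rotation without reflection is a single angle, and varimax simply picks the angle that maximizes the variance of the squared loadings. The sketch below uses Kaiser's closed-form solution for one pair of columns, with optional Kaiser row normalization. The helper names and the sign convention for the angle are assumptions, so the sign of the angle may differ from what SPSS reports, and this is not claimed to reproduce the -40.63 figure.

```python
import numpy as np


def rotate_by_angle(loadings, psi_degrees):
    """Rotate a two-column loading matrix by psi degrees.

    Convention assumed here: new = old @ [[cos, -sin], [sin, cos]].
    Other programs may define the sign of psi the other way round.
    """
    psi = np.radians(psi_degrees)
    T = np.array([[np.cos(psi), -np.sin(psi)],
                  [np.sin(psi), np.cos(psi)]])
    return np.asarray(loadings, dtype=float) @ T


def varimax_two(loadings, kaiser_normalize=True):
    """Varimax for exactly two components via Kaiser's closed-form angle.

    Assumes no row of loadings is all zeros. Returns the rotated
    loadings and the rotation angle in degrees.
    """
    A = np.asarray(loadings, dtype=float)
    n = A.shape[0]
    h = np.sqrt((A ** 2).sum(axis=1)) if kaiser_normalize else np.ones(n)
    x, y = A[:, 0] / h, A[:, 1] / h
    u, v = x ** 2 - y ** 2, 2 * x * y
    a, b = u.sum(), v.sum()
    c, d = (u ** 2 - v ** 2).sum(), 2 * (u * v).sum()
    # tan(4*psi) = (d - 2ab/n) / (c - (a^2 - b^2)/n); atan2 selects the maximum
    psi = np.degrees(np.arctan2(d - 2 * a * b / n, c - (a ** 2 - b ** 2) / n) / 4)
    return rotate_by_angle(A, psi), psi
```

Row normalization only rescales rows, so applying the angle to the unnormalized loadings gives the same result as normalize, rotate, and rescale. Orthogonal rotation leaves each variable's communality unchanged.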
Component 1 = Quality versus reputation.
Component 2 = Economy (or cheap drunk) versus reputation.
In this case, the first unrotated factor is the true factor.
But rotation splits it, producing an imaginary second factor and corrupting the first.
This problem can be avoided by including a garbage variable that is removed before the final solution.
If the last component has a small sum of squared loadings (SSL), consider dropping it.
If SSL = 1, the component has extracted one variable’s worth of variance (each standardized variable has a variance of 1).
If only one variable loads well on a component, the component is not well defined.
If only two variables load well, the component may be reliable if those two variables are highly correlated with one another but not with the other variables.
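These checks can be tabulated directly from a loading matrix. A small sketch; `component_summary` is a hypothetical helper, and the 0.4 cutoff for a loading that "loads well" is an arbitrary illustrative choice, not a value from the source.

```python
import numpy as np


def component_summary(loadings, salient=0.4):
    """SSL and number of salient loadings for each component (column).

    The default 0.4 cutoff is an arbitrary illustrative threshold.
    """
    A = np.asarray(loadings, dtype=float)
    ssl = (A ** 2).sum(axis=0)        # SSL of 1 = one variable's worth of variance
    n_salient = (np.abs(A) >= salient).sum(axis=0)
    return ssl, n_salient
```

For example, a second component with an SSL of 0.14 and no salient loadings would be a candidate for dropping.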
Here are the communalities for our beer data. “Initial” is for all 7 components; “Extraction” is for our 2-component solution.
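A communality is a row sum of squared loadings: the share of a variable's variance reproduced by the retained components. With all p components retained, every communality is 1, which is why the "Initial" column is all ones in a PCA. A sketch assuming a positive definite correlation matrix; `communalities` is a hypothetical helper.

```python
import numpy as np


def communalities(R, m):
    """Communalities from a PCA of correlation matrix R, keeping m components."""
    eigvals, eigvecs = np.linalg.eigh(np.asarray(R, dtype=float))
    order = np.argsort(eigvals)[::-1]                    # largest first
    loadings = eigvecs[:, order] * np.sqrt(eigvals[order])
    return (loadings[:, :m] ** 2).sum(axis=1)            # row sums of squared loadings
```

For two variables correlated 0.5, keeping one component gives communalities of 0.75, and keeping both gives 1.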
We may fit the data better with axes that are not perpendicular (an oblique rotation), but at the cost of having components that are correlated with one another.