
Introduction to Factor Analysis Bonnie Halpern-Felsher, Ph.D. Megie Okumura, MD, MAS
Road Map • Definition and purpose of factor analysis (example) • Types of factor analysis • Considerations when conducting an Exploratory Factor Analysis (EFA) • Beyond EFA
What is Factor Analysis? A statistical technique used to analyze interrelationships among many variables
Example: Adolescent Invulnerability Scale “A felt sense of invulnerability to injury, harm, and danger.” (Lapsley & Hill, 2010)
Psychological Invulnerability “One’s felt invulnerability to personal or psychological distress.”
Danger Invulnerability “A sense of indestructibility and propensity to take physical risks.”
[Diagram: two circles labeled Psychological Invulnerability and Danger Invulnerability] Circles = factors, also called constructs or latent variables
[Diagram: the two factors, Psychological Invulnerability and Danger Invulnerability, each pointing to a set of item boxes (Item 2, Item 4, Item 7, Item 19; Item 1, Item 3, Item 5, Item 20; …)] Boxes = measured variables, also called observed or manifest variables
[Diagram: the same path diagram, with one factor-to-item path labeled 0.50] Factor Loading (λ): A measure of the influence of a factor on an observed variable; it gives the strength and direction of that influence
λ² = proportion of variance in the measured variable that is accounted for by the factor. Example: 0.696² = 0.48 = 48%. Interpretation: 48% of the variance in the item is accounted for by Factor 1.
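The arithmetic above can be checked in a few lines. This is a minimal Python sketch, assuming a standardized loading; the function name `variance_explained` is hypothetical and chosen only for illustration.

```python
def variance_explained(loading: float) -> float:
    """Proportion of an item's variance accounted for by one factor.

    Assumes a standardized loading, so the proportion is the loading squared.
    """
    return loading ** 2


loading = 0.696  # the example loading from the slide
share = variance_explained(loading)
print(f"{share:.2f}")  # prints 0.48
```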
Principles Behind Factor Analytic Theory • Interrelationships between all possible observed variables may be explained by a small number of factors • Given a set of data, we want to determine the number and nature of underlying factors and the pattern of influence those factors have on the observed variables
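In symbols, the principle above is usually written as the common factor model. The notation is a conventional choice and an assumption here, not taken from the slides: x is the vector of p standardized observed variables, f the vector of m < p factors, Λ the p × m matrix of factor loadings, Φ the factor correlation matrix, ε the unique parts of the items (assumed uncorrelated with f and with each other), and Ψ the diagonal matrix of unique variances.

```latex
\begin{aligned}
x &= \Lambda f + \varepsilon, \\
\Sigma &= \Lambda \Phi \Lambda^{\top} + \Psi .
\end{aligned}
```

When the factors are uncorrelated, Φ is the identity matrix and the model-implied correlation matrix reduces to ΛΛᵀ + Ψ.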
Types of Factor Analysis • Exploratory Factor Analysis (EFA) • You do not know what factors you will find (although you may have some idea) • Often used in scale development • Confirmatory Factor Analysis (CFA) • You specify which measured variables will load on which factors • This is a special case of Structural Equation Modeling (SEM)
Item-Level Factor Analysis • Often used to analyze many items that comprise a self-report measurement scale • Is the scale unidimensional or multidimensional? • Does it measure one underlying construct (factor), or… • Does it measure several underlying constructs?
Things to Consider • What statistical software should I use? • Do I use Factor Analysis or Principal Components Analysis? • Have I created my scale items responsibly? • Do my data meet the assumptions of FA? • Which estimation method should I use? • Should I use a rotated solution? What type? • How do I decide how many factors to extract? • Do I have an adequate sample size?
Statistical Software • SPSS • STATA • SAS • AMOS • EQS • LISREL • Different statistical programs label their output differently • You will need to find out how your program labels its output
Comparing FA and PCA (Preacher & MacCallum, 2003)
Issues with Principal Components Analysis • Many people compute a PCA and say it is a FA • This is wrong because FA and PCA are not the same thing • The two methods may give similar results, but not always • Warning: Some programs carry out PCA as the default (e.g., SPSS)
Developing Scale Items • Make sure you have enough items • You may have to delete some later • Make sure items are at least face valid, and are based on theory/previous research • Choose your response scale carefully • Ordinal response scales (e.g., Likert scales) can introduce additional analytic concerns
Testing Assumptions of Factor Analysis • No outliers • Interval data • Linearity • Multivariate normality • Homoscedasticity • No perfect multicollinearity
What is Estimation? • Process of using a set of mathematical procedures to estimate a statistical model (“find the solution”) • Also called extraction
Choosing an Estimation Method • There are a variety of available methods • Limited information on relative strengths and weaknesses • Inconsistent names • Vary by statistical software • General recommendation • Maximum Likelihood Estimation for normally distributed data • Principal Factors Method if data are non-normal (Costello & Osborne, 2005)
Rotating Your Solution • Rotation is used to find the most easily interpretable solution • Orthogonal rotation • Forces your factors to be uncorrelated • Several types (e.g., Varimax rotation) • Oblique rotation • Allows your factors to be correlated • Several types (e.g., Promax rotation)
Orthogonal vs. Oblique Rotation • How often are constructs in the behavioral sciences completely unrelated in practice?
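[Diagram: path diagram in which the two factors, Psychological Invulnerability and Danger Invulnerability, are correlated at r = .50, each pointing to a set of item boxes (Item 2, Item 4, Item 7, Item 19; Item 1, Item 3, Item 5, Item 20; …)]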
Which Type of Rotation? • Oblique rotation is often the safest bet • If your factors are actually uncorrelated, you will get roughly the same solution as if you used an orthogonal rotation • Rotations are mathematically equivalent and do not affect how well the model fits the data
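The claim that rotation does not change model fit can be illustrated numerically. The sketch below uses NumPy and covers only the orthogonal case: the loading values and the 30-degree angle are hypothetical, and an oblique rotation would additionally require a factor correlation matrix, which is not shown.

```python
import numpy as np

# Hypothetical unrotated loadings: 4 items (rows) on 2 factors (columns).
loadings = np.array([
    [0.70, 0.30],
    [0.65, 0.25],
    [0.30, 0.70],
    [0.20, 0.60],
])


def rotate(loadings: np.ndarray, degrees: float) -> np.ndarray:
    """Apply an orthogonal (rigid) rotation to a two-factor loading matrix."""
    theta = np.deg2rad(degrees)
    rotation = np.array([
        [np.cos(theta), -np.sin(theta)],
        [np.sin(theta), np.cos(theta)],
    ])
    return loadings @ rotation


rotated = rotate(loadings, 30.0)

# The common part of the model-implied correlations is loadings @ loadings.T.
same_fit = np.allclose(loadings @ loadings.T, rotated @ rotated.T)

# Communalities (row sums of squared loadings) are also unchanged.
same_communalities = np.allclose(
    (loadings ** 2).sum(axis=1), (rotated ** 2).sum(axis=1)
)

print(same_fit, same_communalities)  # prints True True
```

The individual loadings differ before and after rotation, which is why interpretability can change while fit does not.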
Which Rotated Solution is Best? • Look at simple structure • Each item loads heavily on one and only one factor
Choosing Number of Factors • Two common methods • K1 Rule • Cattell’s Scree Test • Much more accurate method • Parallel Analysis
K1 Rule: Number of factors = number of eigenvalues greater than 1 (an eigenvalue represents the total amount of variance explained by a factor) *Not very accurate*
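A minimal NumPy sketch of the K1 rule follows. The function name `k1_rule` and the simulated two-factor data are assumptions made for illustration, and the eigenvalues are taken from the full item correlation matrix.

```python
import numpy as np


def k1_rule(data: np.ndarray):
    """Return (number of eigenvalues > 1, eigenvalues sorted largest first)."""
    corr = np.corrcoef(data, rowvar=False)        # item-by-item correlations
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]  # eigvalsh sorts ascending
    return int(np.sum(eigenvalues > 1.0)), eigenvalues


# Simulated data (illustration only): 500 respondents, 6 items.
# Items 0-2 are driven by one factor, items 3-5 by another.
rng = np.random.default_rng(0)
factors = rng.normal(size=(500, 2))
pattern = np.array([[0.8, 0.0]] * 3 + [[0.0, 0.8]] * 3)
data = factors @ pattern.T + 0.6 * rng.normal(size=(500, 6))

n_factors, eigenvalues = k1_rule(data)
print(n_factors)
print(np.round(eigenvalues, 2))
```

The rule recovers the structure in this clean simulation, but with many items or weak loadings it tends to behave less well, which is the caution on the slide.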
Cattell’s Scree Test: Choose the number of factors that precedes the last big drop on the scree plot. Can be subjective. *Not very accurate*
Parallel Analysis: Number of Factors = Number of points on the Factor Analysis line that are above the Parallel Analysis line *Accurate, but rarely used* STATA, SPSS, SAS
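The comparison described above can be sketched as follows. This is one common variant of parallel analysis, not necessarily the one behind the slide: it assumes eigenvalues of the full correlation matrix, random normal comparison data, and a 95th-percentile threshold. The function name, its parameters, and the simulated data are hypothetical.

```python
import numpy as np


def parallel_analysis(data, n_sims=100, percentile=95, seed=0):
    """Count leading observed eigenvalues that exceed random-data eigenvalues."""
    rng = np.random.default_rng(seed)
    n_obs, n_items = data.shape
    observed = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]

    # Eigenvalues from random data with the same number of rows and columns.
    simulated = np.empty((n_sims, n_items))
    for i in range(n_sims):
        noise = rng.normal(size=(n_obs, n_items))
        simulated[i] = np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False))[::-1]
    threshold = np.percentile(simulated, percentile, axis=0)

    # Keep factors up to the first observed eigenvalue below the threshold.
    above = observed > threshold
    n_factors = n_items if above.all() else int(np.argmin(above))
    return n_factors, observed, threshold


# Simulated data (illustration only): 6 items driven by 2 factors.
rng = np.random.default_rng(1)
factors = rng.normal(size=(500, 2))
pattern = np.array([[0.8, 0.0]] * 3 + [[0.0, 0.8]] * 3)
data = factors @ pattern.T + 0.6 * rng.normal(size=(500, 6))

n_factors, observed, threshold = parallel_analysis(data)
print(n_factors)
```

Plotting `observed` and `threshold` against factor number gives the two lines referred to on the slide.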
Procedural Options • Determine number of factors based on parallel analysis • Re-run factor analysis with a few more or a few fewer factors • Compare results of different factor analyses with regard to interpretability, residuals, communalities
Interpretability • Are the factors even interpretable? • Which variables load on which factors? • Do the loadings make sense according to previous research, theory, and common sense?
Looking at Communalities • Communality (h²): proportion of variance in a given measured variable that is explained by all of the factors jointly • Implications of low communality • For one item: the factor model is not working well for that item; consider deleting the item • For several items: the items are not very related to each other
Example: Communalities • Get table of communalities in computer output • For “Nothing can harm me” • h² = .59 • The extracted factors explain 59% of the variation in this item
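For readers who want to see where a communality comes from, here is a small sketch. It assumes uncorrelated (orthogonal) factors, in which case h² is the sum of an item's squared loadings; with an oblique rotation the factor correlations also enter the calculation. The loadings are hypothetical and are not the values behind the .59 above, and the .40 cutoff is used only as an illustrative flag.

```python
import numpy as np

# Hypothetical loadings: 3 items (rows) on 2 uncorrelated factors (columns).
loadings = np.array([
    [0.70, 0.30],
    [0.10, 0.80],
    [0.30, 0.20],
])

communalities = np.sum(loadings ** 2, axis=1)  # h2 for each item
uniquenesses = 1.0 - communalities             # variance not explained
low = communalities < 0.40                     # illustrative cutoff

for i, (h2, is_low) in enumerate(zip(communalities, low), start=1):
    flag = "  <- low communality" if is_low else ""
    print(f"Item {i}: h2 = {h2:.2f}{flag}")
```

Here the third item has h² = .13, so the two factors explain little of its variance and it would be a candidate for deletion.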
Evaluating Residual Correlations • Two sets of correlations among items • Correlations predicted by your factor model (reproduced correlations) • Observed correlations • Residual correlations • (Reproduced) – (Observed) • Should be close to zero if your model fits your data well
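The residual check can be sketched with NumPy. The loadings and the observed correlation matrix below are hypothetical, the factors are assumed uncorrelated, and the residual follows the definition above (reproduced minus observed); only the off-diagonal entries are summarized.

```python
import numpy as np

# Hypothetical loadings: 4 items on 2 uncorrelated factors.
loadings = np.array([
    [0.8, 0.0],
    [0.7, 0.0],
    [0.0, 0.6],
    [0.0, 0.9],
])

# Hypothetical observed correlations among the same 4 items.
observed = np.array([
    [1.00, 0.58, 0.05, 0.02],
    [0.58, 1.00, 0.01, 0.03],
    [0.05, 0.01, 1.00, 0.50],
    [0.02, 0.03, 0.50, 1.00],
])

# Correlations implied by the factor model; the diagonal is set to 1
# because each item correlates perfectly with itself.
reproduced = loadings @ loadings.T
np.fill_diagonal(reproduced, 1.0)

residuals = reproduced - observed
off_diagonal = residuals[np.triu_indices_from(residuals, k=1)]
max_abs = float(np.max(np.abs(off_diagonal)))
print(round(max_abs, 2))  # prints 0.05
```

A largest absolute residual this small would suggest the two-factor model reproduces the observed correlations well.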
Sample Size for Factor Analysis • EFA is a large-sample procedure • Old rule of thumb: Ratio of cases to items should be at least 10:1 • FA is still prone to error at a ratio of 20:1 (Costello & Osborne, 2005)
Sample Size Cont’d • If you have “strong” data, you may get by with a smaller sample size • How to define strong data? • High communalities (ideally > .80; > .40 is more typical in practice) • No cross-loadings (an item cross-loads when λ ≥ .32 on 2 or more factors) • Several items loading on each factor (no fewer than 3 items; preferably more than 5) (Costello & Osborne, 2005)
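The cross-loading and items-per-factor checks above can be screened mechanically. In this sketch the function name `screen_loadings` and the loading values are hypothetical; the .32 cutoff is the one quoted on the slide.

```python
import numpy as np


def screen_loadings(loadings: np.ndarray, cutoff: float = 0.32):
    """Return (indices of cross-loading items, salient item count per factor)."""
    salient = np.abs(loadings) >= cutoff
    cross_loading_items = np.flatnonzero(salient.sum(axis=1) >= 2)
    items_per_factor = salient.sum(axis=0)
    return cross_loading_items, items_per_factor


# Hypothetical loadings: 4 items (rows) on 2 factors (columns).
loadings = np.array([
    [0.70, 0.10],
    [0.55, 0.40],
    [0.05, 0.80],
    [0.31, 0.66],
])

cross, per_factor = screen_loadings(loadings)
print(cross.tolist(), per_factor.tolist())  # prints [1] [2, 3]
```

Item index 1 loads at or above .32 on both factors, so it is flagged. The first factor has only two salient items, fewer than the three the slide treats as a minimum.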
Naming Your Factors • We often assign a “meaningful” label to each factor • e.g., Danger Invulnerability • Beware the naming fallacy! • Just because a factor is named does not mean that the hypothetical construct is understood or even correctly labeled • Beware reification! • The belief that a hypothetical construct must correspond to a real thing (Kline, 2005)
Extensions • Confirmatory Factor Analysis (CFA) • Perhaps you have done an EFA, and now you want to replicate the results of your EFA in a new sample • Structural Equation Modeling (SEM) • You can look at predictive relationships among latent constructs
Confirmatory Factor Analysis [Diagram: two latent factors, Nicotine Dependence and Depression, with measured indicators MNWS, HONC, FTND, Hamilton, CES-D, BDI-II]
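Structural Equation Modeling [Diagram: the same two latent factors, Nicotine Dependence and Depression, with measured indicators MNWS, HONC, FTND, Hamilton, CES-D, BDI-II, and a path marked “?” between the two factors]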
In Summary… • Definition and purpose of factor analysis (example) • Types of factor analysis • Considerations when conducting an Exploratory Factor Analysis (EFA) • Beyond EFA (CFA/SEM)