Factor analysis
Download
1 / 50

Factor Analysis - PowerPoint PPT Presentation


  • 86 Views
  • Uploaded on

Factor Analysis. Elizabeth Garrett-Mayer, PhD Georgiana Onicescu, ScM Cancer Prevention and Control Statistics Tutorial July 9, 2009. Motivating Example: Cohesion in Dragon Boat paddler cancer survivors.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about ' Factor Analysis' - nasia


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Factor analysis

Factor Analysis

Elizabeth Garrett-Mayer, PhD

Georgiana Onicescu, ScM

Cancer Prevention and Control Statistics Tutorial

July 9, 2009


Motivating example cohesion in dragon boat paddler cancer survivors
Motivating Example: Cohesion in Dragon Boat paddler cancer survivors

  • Dragon boat paddling is an ancient Chinese sport that offers a unique blend of factors that could potentially enhance the quality of the lives of cancer survivor participants.

  • Evaluating the efficacy of dragon boating to improve the overall quality of life among cancer survivors has the potential to advance our understanding of factors that influence quality-of-life among cancer survivors.

  • We hypothesize that physical activity conducted within the context of the social support of a dragon boat team contributes significantly to improved overall quality of life above and beyond a standard physical activity program because the collective experience of dragon boating is likely enhanced by team sport factors such as cohesion, teamwork, and the goal of competition.

  • Methods: 134 cancer survivors self-selected to an 8-week dragon boat paddling intervention group or to an organized walking program. Each study arm was comprised of a series of 3 groups of approximately 20-25 participants, with pre- and post-testing to compare quality of life and physical performance outcomes between study arms.


Motivating example cohesion
Motivating Example: Cohesion survivors

  • We have a concept of what “cohesion” is, but we can’t measure it directly.

  • Merriam-Webster:

    • the act or state of sticking together tightly

    • the quality or state of being made one

  • How do we measure it?

  • We cannot simply say “how cohesive is your team?” or “on a scale from 1-10, how do you rate your team cohesion?”

  • We think it combines several elements of “unity” and “team spirit” and perhaps other “factors”


Factor analysis1
Factor Analysis survivors

  • Data reduction tool

  • Removes redundancy or duplication from a set of correlated variables

  • Represents correlated variables with a smaller set of “derived” variables.

  • Factors are formed that are relatively independent of one another.

  • Two types of “variables”:

    • latent variables: factors

    • observed variables



Other examples
Other examples survivors

  • Diet

  • Air pollution

  • Personality

  • Customer satisfaction

  • Depression

  • Quality of Life


Some applications of factor analysis
Some Applications of Factor Analysis survivors

1. Identification of Underlying Factors:

  • clusters variables into homogeneous sets

  • creates new variables (i.e. factors)

  • allows us to gain insight to categories

    2. Screening of Variables:

  • identifies groupings to allow us to select one variable to represent many

  • useful in regression (recall collinearity)

    3. Summary:

  • Allows us to describe many variables using a few factors

    4. Clustering of objects:

  • Helps us to put objects (people) into categories depending on their factor scores


“Perhaps the most widely used (and misused) multivariate survivors

[technique] is factor analysis. Few statisticians are neutral about

this technique. Proponents feel that factor analysis is the

greatest invention since the double bed, while its detractors feel

it is a useless procedure that can be used to support nearly any

desired interpretation of the data. The truth, as is usually the case,

lies somewhere in between. Used properly, factor analysis can

yield much useful information; when applied blindly, without

regard for its limitations, it is about as useful and informative as

Tarot cards. In particular, factor analysis can be used to explore

the data for patterns, confirm our hypotheses, or reduce the

Many variables to a more manageable number.

-- Norman Streiner, PDQ Statistics


Let s work backwards
Let’s work backwards survivors

  • One of the primary goals of factor analysis is often to identify a measurement model for a latent variable

  • This includes

    • identifying the items to include in the model

    • identifying how many ‘factors’ there are in the latent variable

    • identifying which items are “associated” with which factors


Standard result
Standard Result survivors

------------------------------------

Variable | Factor1 Factor2 |

-------------+--------------------+

notenjoy | -0.3118 0.5870 |

notmiss | -0.3498 0.6155 |

desireexceed | -0.1919 0.8381 |

personalpe~m | -0.2269 0.7345 |

importants~l | 0.5682 -0.1748 |

groupunited | 0.8184 -0.1212 |

responsibi~y | 0.9233 -0.1968 |

interact | 0.6238 -0.2227 |

problemshelp | 0.8817 -0.2060 |

notdiscuss | -0.0308 0.4165 |

workharder | -0.1872 0.5647 |

-----------------------------------


How to interpret

------------------------------------ survivors

Variable | Factor1 Factor2 |

-------------+--------------------+

notenjoy | -0.3118 0.5870 |

notmiss | -0.3498 0.6155 |

desireexceed | -0.1919 0.8381 |

personalpe~m | -0.2269 0.7345 |

importants~l | 0.5682 -0.1748 |

groupunited | 0.8184 -0.1212 |

responsibi~y | 0.9233 -0.1968 |

interact | 0.6238 -0.2227 |

problemshelp | 0.8817 -0.2060 |

notdiscuss | -0.0308 0.4165 |

workharder | -0.1872 0.5647 |

-----------------------------------

Loadings: represent correlations between item and factor

High loadings: define a factor

Low loadings: item does not “load” on factor

Easy to skim the loadings

This example:

factor 1 is defined by G5, G6,

G7, G8 G9

factor 2 is defined by G1, G2,

G3, G4, G10, G11

Other things to note:

factors are ‘independent’ (usually)

we need to ‘name’ factors

important to check their face validity.

These factors can now be ‘calculated’ using this model

Each person is assigned a factor score for each factor

Range between -1 to 1

How to interpret?

High loadings are highlighted

in yellow.


How to interpret1

------------------------------------ survivors

Variable | Factor1 Factor2 |

-------------+--------------------+

notenjoy | -0.3118 0.5870 |

notmiss | -0.3498 0.6155 |

desireexceed | -0.1919 0.8381 |

personalpe~m | -0.2269 0.7345 |

importants~l | 0.5682 -0.1748 |

groupunited | 0.8184 -0.1212 |

responsibi~y | 0.9233 -0.1968 |

interact | 0.6238 -0.2227 |

problemshelp | 0.8817 -0.2060 |

notdiscuss | -0.0308 0.4165 |

workharder | -0.1872 0.5647 |

-----------------------------------

Authors may conclude something like:

“We were able to derive two factors from the 11 items. The first factor is defined as “teamwork.” The second factor is defined as “personal competitive nature .” These two factors describe 72% of the variance among the items.”

How to interpret?

High loadings are highlighted

in yellow.


Where did the results come from
Where did the results come from? survivors

Based on the basic “Classical Test Theory Idea”:

For a case with just one factor:

Ideal: X1 = F + e1 var(ej) = var(ek) , j ≠ k

X2 = F + e2

Xm = F + em

Reality: X1 = λ1F + e1 var(ej) ≠ var(ek) , j ≠ k

X2 = λ2F + e2

Xm = λmF + em

(unequal “sensitivity” to change in factor)

(Related to Item Response Theory (IRT))


Multi factor models
Multi-Factor Models survivors

  • Two factor orthogonal model

  • ORTHOGONAL = INDEPENDENT

  • Example: cohesion has two domains

    X1 = λ11F1 + λ12F2 + e1

    X2 = λ21F1 + λ22F2 + e2

    …….

    X11 = λ111F1 + λ112F2 + e11

  • More generally, m factors, n observed variables

    X1 = λ11F1 + λ12F2 +…+ λ1mFm + e1

    X2 = λ21F1 + λ22F2 +…+ λ2mFm + e2

    …….

    Xn = λn1F1 + λn2F2 +…+ λnmFm + en



The factor analysis process
The factor analysis process survivors

  • Multiple steps

  • “Stepwise optimal”

    • many choices to be made!

    • a choice at one step may impact the remaining decisions

    • considerable subjectivity

  • Data exploration is key

  • Strong theoretical model is critical


Steps in exploratory factor analysis
Steps in Exploratory Factor Analysis survivors

(1) Collect and explore data: choose relevant variables.

(2) Determine the number of factors

(3) Estimate the model using predefined number of factors

(4) Rotate and interpret

(5) (a) Decide if changes need to be made (e.g. drop item(s), include item(s))

(b) repeat (3)-(4)

(6) Construct scales and use in further analysis


Data exploration
Data Exploration survivors

  • Histograms

    • normality

    • discreteness

    • outliers

  • Covariance and correlations between variables

    • very high or low correlations?

  • Same scale

  • high = good, low = bad?


Data exploration1
Data exploration survivors


Correlation matrix
Correlation Matrix survivors

. pwcorr notenjoy-workharder

| notenjoy notmiss desire~d person~m import~l groupu~d respon~y

-------------+---------------------------------------------------------------

notenjoy | 1.0000

notmiss | 0.3705 1.0000

desireexceed | 0.2609 0.3987 1.0000

personalpe~m | 0.2552 0.3472 0.5946 1.0000

importants~l | -0.2514 -0.3357 -0.1384 -0.3123 1.0000

groupunited | -0.1732 -0.2460 -0.2384 -0.1359 0.4364 1.0000

responsibi~y | -0.2554 -0.3663 -0.2908 -0.2507 0.4399 0.8016 1.0000

interact | -0.1847 -0.2966 -0.2162 -0.2294 0.4415 0.4251 0.5174

problemshelp | -0.2561 -0.2865 -0.2567 -0.1940 0.4159 0.6498 0.7748

notdiscuss | 0.1610 0.0763 0.2253 0.2193 -0.0242 0.0027 -0.0598

workharder | 0.3482 0.1606 0.3794 0.3848 -0.0010 -0.2765 -0.3083

| interact proble~p notdis~s workha~r

-------------+------------------------------------

interact | 1.0000

problemshelp | 0.5446 1.0000

notdiscuss | -0.0346 -0.0699 1.0000

workharder | -0.1063 -0.2358 0.2660 1.0000



Data matrix
Data Matrix survivors

  • Factor analysis is totally dependent on correlations between variables.

  • Factor analysis summarizes correlation structure

v1……...vk

F1…..Fj

v1……...vk

v1

.

.

.

vk

O1

.

.

.

.

.

.

.

.

On

v1

.

.

.

vk

Factor

Matrix

Correlation

Matrix

Implications for assumptions about X’s?

Data Matrix


Important implications
Important implications survivors

  • Correlation matrix must be valid measure of association

  • Likert scale? i.e. “on a scale of 1 to K?”

  • Consider previous set of plots

  • Is Pearson (linear) correlation a reasonable measure of association?


Correlation for categorical items
Correlation for categorical items survivors

  • Odds ratios? Nope. on the wrong scale.

  • Need measures on scale of -1 to 1, with zero meaning no association

  • Solutions:

    • tetrachoric correlation: for binary items

    • polychoric correlation: for ordinal items

  • -’choric corelations

    • assume that variables are truncated versions of continuous variables

    • only appropriate if ‘continuous underlying’ assumption makes sense

  • Not available in many software packages for factor analysis!


Polychoric correlation matrix
Polychoric Correlation Matrix survivors

Polychoric correlation matrix

notenjoy notmiss desireexceed

notenjoy 1

notmiss .64411349 1

desireexceed .44814752 .60971951 1

personalperform .37687346 .49572253 .74640077

importantsocial -.33466689 -.35262233 -.18773414

groupunited -.26640575 -.25987331 -.32414348

responsibility -.38218019 -.43174724 -.34289848

interact -.31300025 -.41147172 -.28711931

problemshelp -.40864072 -.44688816 -.34338549

notdiscuss .28367782 .2071563 .33714715

workharder .49864257 .26866894 .50117974

personalperform importantsocial groupunited

personalperform 1

importantsocial -.42902852 1

groupunited -.22011768 .47698468 1

responsibility -.32272048 .49187407 .85603168

interact -.37003374 .51150655 .46469124

problemshelp -.31435615 .51458893 .75552992

notdiscuss .28191066 -.07289447 -.0934676

workharder .4766736 .02547056 -.35603256

responsibility interact problemshelp

responsibility 1

interact .59252523 1

problemshelp .84727982 .60910395 1

notdiscuss -.11548039 -.09653691 -.11580359

workharder -.37311526 -.13316066 -.30122735

notdiscuss workharder

notdiscuss 1

workharder .3471915 1

.


Polychoric correlation in stata
Polychoric Correlation in Stata survivors

. findit polychoric

. polychoric notenjoy-workharder

. matrix R = r(R)


Choosing number of factors
Choosing Number of Factors survivors

  • Intuitively: The number of uncorrelated constructs that are jointly measured by the X’s.

  • Only useful if number of factors is less than number of X’s (recall “data reduction”).

    Use “principal components” to help decide

    • type of factor analysis

    • number of factors is equivalent to number of variables

    • each factor is a weighted combination of the input variables:

      F1 = a11X1 + a12X2 + ….

    • Recall: For a factor analysis, generally,

      X1 = a11F1 + a12F2 +...


Eigenvalues
Eigenvalues survivors

  • To select how many factors to use, consider eigenvalues from a principal components analysis

  • Two interpretations:

    • eigenvalue  equivalent number of variables which the factor represents

    • eigenvalue  amount of variance in the data described by the factor.

  • Rules to go by:

    • number of eigenvalues > 1

    • scree plot

    • % variance explained

    • comprehensibility

  • Note: sum of eigenvalues is equal to the number of items


Cohesion example
Cohesion Example survivors

. factormat R, pcf n(134)

(obs=134)

Factor analysis/correlation Number of obs = 134

Method: principal-component factors Retained factors = 3

Rotation: (unrotated) Number of params = 30

--------------------------------------------------------------------------

Factor | Eigenvalue Difference Proportion Cumulative

-------------+------------------------------------------------------------

Factor1 | 4.96356 3.14606 0.4512 0.4512

Factor2 | 1.81751 0.76378 0.1652 0.6165

Factor3 | 1.05373 0.27749 0.0958 0.7123

Factor4 | 0.77624 0.02065 0.0706 0.7828

Factor5 | 0.75559 0.22587 0.0687 0.8515

Factor6 | 0.52972 0.05654 0.0482 0.8997

Factor7 | 0.47318 0.24670 0.0430 0.9427

Factor8 | 0.22647 0.02484 0.0206 0.9633

Factor9 | 0.20163 0.07341 0.0183 0.9816

Factor10 | 0.12822 0.05407 0.0117 0.9933

Factor11 | 0.07415 . 0.0067 1.0000

--------------------------------------------------------------------------


Scree plot for cohesion example
Scree Plot for Cohesion Example survivors

. screeplot


Choose two factors now fit the model
Choose two factors: Now fit the model survivors

. factormat R, n(134) ipffactor(2)

(obs=134)

Factor analysis/correlation Number of obs = 134

Method: iterated principal factors Retained factors = 2

Rotation: (unrotated) Number of params = 21

.........

Factor loadings (pattern matrix) and unique variances

-------------------------------------------------

Variable | Factor1 Factor2 | Uniqueness

-------------+--------------------+--------------

notenjoy | -0.6091 0.2661 | 0.5582

notmiss | -0.6566 0.2648 | 0.4988

desireexceed | -0.6712 0.5373 | 0.2608

personalpe~m | -0.6342 0.4344 | 0.4091

importants~l | 0.5538 0.2162 | 0.6466

groupunited | 0.7164 0.4137 | 0.3156

responsibi~y | 0.8456 0.4197 | 0.1088

interact | 0.6271 0.2132 | 0.5613

problemshelp | 0.8187 0.3866 | 0.1802

notdiscuss | -0.2830 0.3072 | 0.8256

workharder | -0.4977 0.3260 | 0.6461

-------------------------------------------------


Interpretability
Interpretability? survivors

  • Not interpretable at this stage

  • In an unrotated solution, the first factor describes most of variability.

  • Ideally we want to

    • spread variability more evenly among factors.

    • make factors interpretable

  • To do this we “rotate” factors:

    • redefine factors such that loadings on various factors tend to be very high (-1 or 1) or very low (0)

    • intuitively, it makes sharper distinctions in the meanings of the factors

    • We use “factor analysis” for rotation NOT principal components!

  • Rotation does NOT improve fit!


Rotating factors intuitively
Rotating Factors (Intuitively) survivors

F2

F2

2

3

3

2

1

1

F1

4

4

F1

  • Factor 1 Factor 2

  • x1 0.5 0.5

  • x2 0.8 0.8

  • x3 -0.7 0.7

  • x4 -0.5 -0.5

    • Factor 1 Factor 2

  • x1 0 0.6

  • x2 0 0.9

  • x3 -0.9 0

  • x4 0 -0.9


  • Rotated solution
    Rotated Solution survivors

    . rotate

    Factor analysis/correlation Number of obs = 134

    Method: iterated principal factors Retained factors = 2

    Rotation: orthogonal varimax (Kaiser off) Number of params = 21

    --------------------------------------------------------------------------

    Factor | Variance Difference Proportion Cumulative

    -------------+------------------------------------------------------------

    Factor1 | 3.35544 0.72180 0.5603 0.5603

    Factor2 | 2.63364 . 0.4397 1.0000

    --------------------------------------------------------------------------

    LR test: independent vs. saturated: chi2(55) = 959.26 Prob>chi2 = 0.0000

    Rotated factor loadings (pattern matrix) and unique variances

    -------------------------------------------------

    Variable | Factor1 Factor2 | Uniqueness

    -------------+--------------------+--------------

    notenjoy | -0.3118 0.5870 | 0.5582

    notmiss | -0.3498 0.6155 | 0.4988

    desireexceed | -0.1919 0.8381 | 0.2608

    personalpe~m | -0.2269 0.7345 | 0.4091

    importants~l | 0.5682 -0.1748 | 0.6466

    groupunited | 0.8184 -0.1212 | 0.3156

    responsibi~y | 0.9233 -0.1968 | 0.1088

    interact | 0.6238 -0.2227 | 0.5613

    problemshelp | 0.8817 -0.2060 | 0.1802

    notdiscuss | -0.0308 0.4165 | 0.8256

    workharder | -0.1872 0.5647 | 0.6461

    -------------------------------------------------


    Rotation options
    Rotation options survivors

    • “Orthogonal”

      • maintains independence of factors

      • more commonly seen

      • usually at least one option

      • Stata: varimax, quartimax, equamax, parsimax, etc.

    • “Oblique”

      • allows dependence of factors

      • make distinctions sharper (loadings closer to 0’s and 1’s

      • can be harder to interpret once you lose independence of factors


    Uniqueness
    Uniqueness survivors

    • Should all items be retained?

    • Uniquess for each item describes the proportion of the item described by the factor model

    • Recall an R-squared:

      • proportion of variance in Y explained by X

    • 1-Uniqueness:

      • proportion of the variance in Xk explained by F1, F2, etc.

    • Uniqueness:

      • represents what is left over that is not explained by factors

      • “error” that remainese

    • A GOOD item has a LOW uniqueness


    Our current model
    Our current model? survivors

    Rotated factor loadings (pattern matrix) and unique variances

    -------------------------------------------------

    Variable | Factor1 Factor2 | Uniqueness

    -------------+--------------------+--------------

    notenjoy | -0.3118 0.5870 | 0.5582

    notmiss | -0.3498 0.6155 | 0.4988

    desireexceed | -0.1919 0.8381 | 0.2608

    personalpe~m | -0.2269 0.7345 | 0.4091

    importants~l | 0.5682 -0.1748 | 0.6466

    groupunited | 0.8184 -0.1212 | 0.3156

    responsibi~y | 0.9233 -0.1968 | 0.1088

    interact | 0.6238 -0.2227 | 0.5613

    problemshelp | 0.8817 -0.2060 | 0.1802

    notdiscuss | -0.0308 0.4165 | 0.8256

    workharder | -0.1872 0.5647 | 0.6461

    -------------------------------------------------


    Revised without notdiscuss
    Revised without “ survivorsnotdiscuss”

    Rotated factor loadings (pattern matrix) and unique variances

    -------------------------------------------------

    Variable | Factor1 Factor2 | Uniqueness

    -------------+--------------------+--------------

    notenjoy | -0.3093 0.5811 | 0.5667

    notmiss | -0.3345 0.6455 | 0.4715

    desireexceed | -0.1783 0.8483 | 0.2486

    personalpe~m | -0.2119 0.7551 | 0.3849

    importants~l | 0.5618 -0.2057 | 0.6420

    groupunited | 0.8265 -0.1271 | 0.3008

    responsibi~y | 0.9247 -0.2089 | 0.1012

    interact | 0.6160 -0.2469 | 0.5596

    problemshelp | 0.8784 -0.2224 | 0.1789

    workharder | -0.2023 0.5271 | 0.6813

    -------------------------------------------------


    Methods for estimating model
    Methods for Estimating Model survivors

    • Principal Components (already discussed)

    • Principal Factor Method

    • Iterated Principal Factor / Least Squares

    • Maximum Likelihood (ML)

    • Most common(?): ML and Least Squares

    • Unfortunately, default is often not the best approach!

    • Caution! ipf and ml may not converge to the right answer! Look for uniqueness of 0 or 1. Problem of “identifiability” or getting “stuck.”


    Interpretation
    Interpretation survivors

    • Naming of Factors

    • Wrong Interpretation: Factors represent separate groups of people.

    • Right Interpretation: Each factor represents a continuum along which people vary (and dimensions are orthogonal if orthogonal)


    Factor scores and scales
    Factor Scores and Scales survivors

    • Each object (e.g. each cancer survivor) gets a factor score for each factor.

    • Old data vs. New data

    • The factors themselves are variables

    • An individual’s score is weighted combination of scores on input variables

    • These weights are NOT the factor loadings!

    • Loadings and weights determined simultaneously so that there is no correlation between resulting factors.


    Factor scoring
    Factor Scoring survivors

    . predict f1 f2

    (regression scoring assumed)

    Scoring coefficients

    (method = regression; based on varimax rotated factors)

    ----------------------------------

    Variable | Factor1 Factor2

    -------------+--------------------

    notenjoy | -0.03322 0.19223

    notmiss | 0.04725 0.13279

    desireexceed | 0.15817 0.54996

    personalpe~m | -0.04037 0.21452

    importants~l | 0.02971 -0.02168

    groupunited | 0.12273 0.12938

    responsibi~y | 0.60379 0.07719

    interact | 0.04594 -0.00870

    problemshelp | 0.31516 0.06376

    workharder | 0.11750 0.10810

    ----------------------------------

    Why different than loadings?

    Factors are generally

    scaled to have

    variance 1.

    Mean is arbitrary.

    * If based on Pearson correlation

    mean will be zero.



    Teamwork factor 1 by program
    Teamwork (Factor 1) by Program survivors

    Dragon Boat

    Walking



    Criticisms of factor analysis
    Criticisms of Factor Analysis survivors

    • Labels of factors can be arbitrary or lack scientific basis

    • Derived factors often very obvious

      • defense: but we get a quantification

    • “Garbage in, garbage out”

      • really a criticism of input variables

      • factor analysis reorganizes input matrix

    • Too many steps that could affect results

    • Too complicated

    • Correlation matrix is often poor measure of association of input variables.


    Our example
    Our example? survivors

    • Preliminary analysis of pilot data!

    • Concern: negative items “hang together”, positive items “hang together:

    • Is separation into two factors:

      • based on two different factors (teamwork, pers. comp. nature)

      • based on negative versus positive items?

    • Recall: the computer will always give you something!

    • Validity?

      • boxplots of factor 1 suggest something

      • additional reliability and validity needs to be considered


    Stata code
    Stata Code survivors

    pwcorr notenjoy-workharder

    polychoric notenjoy-workharder

    matrix R = r(R)

    factormat R, pcf n(134)

    screeplot

    factormat R, n(134) ipf factor(2)

    rotate

    polychoric notenjoy notmiss desire personal important group respon interact problem workharder

    matrix R = r(R)

    factormat R, n(134) ipf factor(2)

    rotate

    predict f1 f2

    scatter f1 f2

    graph box f1, by(progrm)

    graph box f2, by(progrm)


    Stata code for pearson correlation
    Stata Code for Pearson Correlation survivors

    factor notenjoy-workharder, pcf

    screeplot

    factor notenjoy-workharder, ipf factor(2)

    rotate

    factor notenjoy notmiss desire personal important group respon interact problem workharder, ipf factor(2)

    rotate

    predict f1 f2

    scatter f1 f2

    graph box f1, by(progrm)

    graph box f2, by(progrm)


    Stata options
    Stata Options survivors

    • Pearson correlation

      • Use factor for principal components and factor analysis

        • choose estimation approach: ipf, pcf, ml, pf

        • choose to retain n factors: factor(n)

    • Polychoric correlation

      • Use factormat for principal components and factor analysis

        • choose estimation approach: ipf, pcf, ml, pf

        • choose to retain n factors: factor(n)

        • include n(xxx) to describe the sample size

    • Scree Plot: screeplot

    • Rotate: choose rotation type: varimax (default), promax, etc.

    • Create factor variables

      • predict: list as many new variable names as there are retained factors.

      • Example: for 3 retained factors,

        factor teamwork competition hardworks


    ad