
# Statistics for International Relations Research I

IHEID - The Graduate Institute. Academic year 2010-2011. Statistics for International Relations Research I. Dr. NAI Alessandro, visiting professor. Dec 10, 2010. Lecture 9: Factor analysis.


## Statistics for International Relations Research I

Dr. NAI Alessandro, visiting professor

Dec 10, 2010

Lecture 9:

Factor analysis

Lecture content

• Feedback on Assignment VIII

• Main logic of factor analysis

• Procedure and interpretation of results

• A final example

Main logic [i / xiii]

A few steps back…

… to the issue of computing scales

In survey data, variables are measured through precise and clear questions

Ideal for knowing and exploring behavior, decisions, ideas

Problematic for knowing and exploring structural phenomena, such as attitudes

Main logic [ii / xiii]

• Attitude

• Characteristic strongly anchored in individuals

• Stable predisposition

• Examples:

• social alienation

• materialism

• intelligence

• Very difficult to grasp directly through questions!

Main logic [iii / xiii]

Solution:

Measure attitudes through different indicators that are put into perspective

Starting indicators have to be simple to measure

The process of moving from simple indicators to the measure of a structural phenomenon is called scale computing

Main logic [iv / xiii]

Scale computing: indicators i1, i2, i3, …, in → attitude
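As a rough illustration of scale computing, the sketch below averages z-standardized indicators into one score per respondent. The lecture does not specify a formula, so the standardize-and-average rule, the function name `compute_scale` and the data are assumptions made for this example.

```python
import numpy as np

def compute_scale(items):
    """Average z-standardized indicators (columns) into one score per respondent (row)."""
    items = np.asarray(items, dtype=float)
    z = (items - items.mean(axis=0)) / items.std(axis=0, ddof=1)
    return z.mean(axis=1)

# Hypothetical data: 5 respondents, 3 indicators (i1, i2, i3)
indicators = np.array([
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 5],
    [5, 5, 5],
])
scale = compute_scale(indicators)
print(np.round(scale, 2))
```

Standardizing first keeps an indicator with a wider range from dominating the scale. A plain sum is the simpler alternative when all items share the same range.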

Main logic [v / xiii]

Examples of scale computing

Through the positioning on the left-right scale and the party voted for, we may measure…

… ideology?

Through the results obtained on a series of attitudinal tests, the factual knowledge of some phenomena and the capacity to draw conclusions about situations, we may measure…

… intelligence?

Main logic [vi / xiii]

Main logic [vii / xiii]

What are the main postulates behind scale computing?

The items on which the new variable is built share a common theoretical ground (“conceptual glue”)

The theoretical concept measured through the new variable (“the scale”) is known!

Main logic [viii / xiii]

What if the concept measured is not known?

In other terms, what if we sense that a set of items share a common “conceptual glue”…

… but we cannot conceptually define the phenomenon behind them?

Main logic [ix / xiii]

Consider the following variables, measuring individuals’ agreement with a series of statements

(“1, agree strongly” to “5, disagree strongly”)

The less government intervenes in economy, the better for the country

Government should reduce differences in income levels

Employees need strong trade unions to protect work conditions/wages

Gays and lesbians should be free to live as they wish

The law should always be obeyed

Ban political parties that wish to overthrow democracy

Economic growth always ends up harming environment

Modern science can be relied on to solve environmental problems

Main logic [x / xiii]

What’s the theoretical concept behind the items?

Difficult to grasp directly, no strong theoretical assumptions

An inductive approach is needed:

Factor analysis

Main logic [xi / xiii]

Factor analysis

"Factor analysis is not designed to test hypothesis or tell you whether one group is significantly different from another. It takes a large set of variables and looks for a way that the data may be 'reduced' or summarized using a smaller set of factors or components" (Pallant 2005)

"Statistical technique for analysing the correlation between a number of variables in order to reduce them to a smaller number of underlying dimensions, called factors, and to determine the correlation of each of the original variables with each factor" (Colman and Pulford 2006)

Main logic [xii / xiii]

In a nutshell

Factor analysis makes it possible to reduce a series of indicators/items to a smaller number of variables (factors)

Underlying dimensions are extracted

Attention: all original items should take the very same form (e.g., ordinal variables with a 0-10 range)

Main logic [xiii / xiii]

Procedure and interpretation of results [i / xxix]

Given the following list of indicators…

The less government intervenes in economy, the better for the country

Government should reduce differences in income levels

Employees need strong trade unions to protect work conditions/wages

Gays and lesbians should be free to live as they wish

The law should always be obeyed

Ban political parties that wish to overthrow democracy

Economic growth always ends up harming environment

Modern science can be relied on to solve environmental problems

…what are the underlying dimensions?

Procedure and interpretation of results [ii / xxix]

SPSS procedure: Analyze / Dimension reduction / Factor

Procedure and interpretation of results [iii / xxix]

Procedure and interpretation of results [iv / xxix]

Procedure and interpretation of results [v / xxix]

Procedure and interpretation of results [vi / xxix]

Procedure and interpretation of results [vii / xxix]

Varimax (variance maximizing) rotation

Rotates the extracted factors so that each item loads strongly on as few factors as possible (it maximizes the variance of the squared loadings within each factor)

It simplifies the interpretation of results
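The rotation idea can be sketched with a common SVD-based varimax iteration. This is an illustrative implementation, not the routine SPSS runs. The loading matrix `simple`, the 30-degree mixing angle and the helper `varimax_criterion` are made up for the example.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Return (rotated loadings, rotation matrix) for an items x factors array."""
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    R = np.eye(k)
    objective = 0.0
    for _ in range(max_iter):
        rotated = L @ R
        col_ss = np.diag((rotated ** 2).sum(axis=0))
        # SVD of the criterion's gradient gives the next orthogonal rotation
        u, s, vt = np.linalg.svd(L.T @ (rotated ** 3 - rotated @ col_ss / p))
        R = u @ vt
        if s.sum() < objective * (1 + tol):
            break
        objective = s.sum()
    return L @ R, R

def varimax_criterion(loadings):
    """Sum over factors of the variance of the squared loadings."""
    return (np.asarray(loadings) ** 2).var(axis=0).sum()

# Hypothetical loadings: a clean two-factor structure, deliberately blurred
simple = np.array([[0.8, 0.0], [0.7, 0.0], [0.0, 0.6], [0.0, 0.9]])
angle = np.deg2rad(30)
mix = np.array([[np.cos(angle), -np.sin(angle)],
                [np.sin(angle),  np.cos(angle)]])
unrotated = simple @ mix

rotated, R = varimax(unrotated)
print(np.round(rotated, 2))
print(varimax_criterion(unrotated), varimax_criterion(rotated))
```

The rotation leaves each item's communality (the row sum of squared loadings) unchanged. It only redistributes the loadings across factors so that they are easier to read.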

Procedure and interpretation of results [viii / xxix]

Procedure and interpretation of results [ix / xxix]

Procedure and interpretation of results [x / xxix]

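Diagnostic test 1: Overall quality of the procedure (Kaiser-Meyer-Olkin measure, KMO)

(i.e. the items share enough common variance for a factor reduction to be meaningful)

If KMO score > .5, acceptable overall quality (the closer to 1.0, the better)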

Here, no particular problems
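For reference, the KMO score compares the squared correlations between items with their squared partial correlations. The sketch below uses the standard textbook formula. The function name `kmo` and the simulated data are assumptions for illustration, not the lecture's dataset.

```python
import numpy as np

def kmo(data):
    """Kaiser-Meyer-Olkin measure for a respondents x items array."""
    corr = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    inv = np.linalg.inv(corr)
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)           # partial correlations between items
    off = ~np.eye(corr.shape[0], dtype=bool)  # off-diagonal entries only
    r2 = (corr[off] ** 2).sum()
    a2 = (partial[off] ** 2).sum()
    return r2 / (r2 + a2)

# Hypothetical data: 200 respondents, 4 items driven by one common factor
rng = np.random.default_rng(0)
factor = rng.normal(size=(200, 1))
data = factor + rng.normal(size=(200, 4))

score = kmo(data)
print(round(score, 3))
```

When the items share a common factor, the partial correlations are small relative to the raw correlations and the score moves towards 1.0.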

Procedure and interpretation of results [xi / xxix]

Diagnostic test 2: General correlation between variables (Bartlett's test of sphericity)

(i.e. a “theoretical glue” exists between the items)

If p<.05, the general correlation is significant

Here, no particular problems
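This second diagnostic is usually reported as Bartlett's test of sphericity. The sketch below computes its chi-square statistic and degrees of freedom from the correlation matrix. The function name and the simulated data are assumptions. The p-value is then read from a chi-square distribution with `df` degrees of freedom, which is left out here to keep the example NumPy-only.

```python
import numpy as np

def bartlett_sphericity(data):
    """Return (chi-square statistic, degrees of freedom) for a respondents x items array."""
    data = np.asarray(data, dtype=float)
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    _, logdet = np.linalg.slogdet(corr)   # log of the determinant of the correlation matrix
    chi2 = -(n - 1 - (2 * p + 5) / 6) * logdet
    df = p * (p - 1) // 2
    return chi2, df

# Hypothetical data: 200 respondents, 4 items driven by one common factor
rng = np.random.default_rng(0)
factor = rng.normal(size=(200, 1))
data = factor + rng.normal(size=(200, 4))

chi2, df = bartlett_sphericity(data)
print(round(chi2, 1), df)
# With df = 6 the .05 critical value is about 12.59; a larger statistic means p < .05
```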

Procedure and interpretation of results [xii / xxix]

Dimensions (components) extracted through the procedure

Only dimensions with an Initial Eigenvalue > 1.0 are considered

Here, 3 dimensions extracted

Procedure and interpretation of results [xiii / xxix]

Relative importance of the dimensions extracted

(through % of explained variance)

Here, the first two dimensions are clearly more important than the third one
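The eigenvalue rule and the share of explained variance can be illustrated on a small correlation matrix. The matrix below is hypothetical: six items in two blocks, correlated .6 within a block and .1 across blocks. It is not the lecture's data.

```python
import numpy as np

# Hypothetical correlation matrix for six items in two blocks of three
within = np.full((3, 3), 0.6)
between = np.full((3, 3), 0.1)
corr = np.block([[within, between], [between, within]])
np.fill_diagonal(corr, 1.0)

eigenvalues = np.linalg.eigvalsh(corr)[::-1]        # sorted from largest to smallest
retained = eigenvalues > 1.0                        # dimensions with eigenvalue > 1.0
explained = 100 * eigenvalues / eigenvalues.sum()   # % of variance per dimension

print(np.round(eigenvalues, 2))
print(int(retained.sum()), "dimensions retained")
print(np.round(explained, 1))
```

The eigenvalues of a correlation matrix sum to the number of items, so an eigenvalue above 1.0 means the dimension accounts for more variance than a single original item.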

Procedure and interpretation of results [xiv / xxix]

Visualisation of components extracted

Look for the elbow on the graph…

Here, 3 dimensions extracted

Procedure and interpretation of results [xv / xxix]

Loading scores (standardized -1.0 to 1.0; scores presented in the table only if their absolute value is >.3)

Useful to assess the contribution of each initial item to the new dimensions

Needed to provide a theoretical interpretation of the dimensions extracted
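As a sketch of where loading scores come from: in a principal component extraction, each loading is an eigenvector entry scaled by the square root of its eigenvalue. The loadings below are unrotated and the correlation matrix is hypothetical, so the numbers do not correspond to the SPSS table discussed in the lecture.

```python
import numpy as np

# Hypothetical correlation matrix for six items in two blocks of three
within = np.full((3, 3), 0.6)
between = np.full((3, 3), 0.1)
corr = np.block([[within, between], [between, within]])
np.fill_diagonal(corr, 1.0)

values, vectors = np.linalg.eigh(corr)
order = np.argsort(values)[::-1]          # largest eigenvalue first
values, vectors = values[order], vectors[:, order]
k = int((values > 1.0).sum())             # keep dimensions with eigenvalue > 1.0

# Unrotated loadings: correlation of each item with each retained dimension
loadings = vectors[:, :k] * np.sqrt(values[:k])

# Mimic the table display: hide loadings whose absolute value is .3 or less
shown = np.where(np.abs(loadings) > 0.3, np.round(loadings, 2), np.nan)
print(shown)
```

The sign of a whole column is arbitrary, so only the relative signs of items within a dimension carry meaning.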

Procedure and interpretation of results [xvi / xxix]

Consider, for each dimension, the strength and direction of the loading score

The higher the absolute value of the score, the higher the (empirical and theoretical) relevance of the item for the dimension

Procedure and interpretation of results [xvii / xxix]

First dimension extracted

ginveco has a negative loading score (but less strong)

The first dimension (component 1) measures…

Support for Government intervention in economy

Procedure and interpretation of results [xvii / xxix]

Second dimension extracted

The second dimension (component 2) measures…

Support for authoritarian and technocratic governance

Procedure and interpretation of results [xviii / xxix]

Third dimension extracted

The third dimension (component 3) measures…

Support for a more liberal and socially aware society (?)

Procedure and interpretation of results [xix / xxix]

Contribution of items to dimensions: visualization

Procedure and interpretation of results [xx / xxix]

Procedure and interpretation of results [xxi / xxix]

Procedure and interpretation of results [xxii / xxix]

In a nutshell

Factor analysis has extracted 3 dimensions

Their theoretical interpretation is based on the loading scores for each initial item on each dimension

Dimension 1: Support for Government intervention in economy

Dimension 2: Support for authoritarian and technocratic governance

Dimension 3: Support for a more liberal and socially aware society

What are those dimensions?

Simply, variables that can be saved (and used!)

Procedure and interpretation of results [xxiii / xxix]

Procedure and interpretation of results [xxiv / xxix]

New variables are created in the database, one for each dimension extracted through factor analysis

Procedure and interpretation of results [xxv / xxix]

In our example, 3 new variables have been created in the SPSS database

FAC1_1, corresponding to Dimension 1 (Support for Government intervention in economy)

FAC2_1, corresponding to Dimension 2 (Support for authoritarian and technocratic governance)

FAC3_1, corresponding to Dimension 3 (Support for a more liberal and socially aware society)

Attention! Given the coding of the original items (“1, agree strongly” to “5, disagree strongly”), higher scores on a variable (e.g. FAC1_1) signal a lower level of support!

Procedure and interpretation of results [xxvi / xxix]

Properties of the new variables (FAC1_1 to FAC3_1)

Scale variables

Centred on 0.0 (mean=0.0)

Perfectly uncorrelated with one another (which means that the dimensions extracted are orthogonal, i.e. linearly unrelated to each other)
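These properties can be checked on simulated data. The sketch below computes standardized, unrotated principal component scores, which is a simplification of the scores SPSS saves. The data and variable names are assumptions for illustration.

```python
import numpy as np

# Hypothetical data: 300 respondents, 6 items driven by two unrelated latent factors
rng = np.random.default_rng(3)
f1 = rng.normal(size=(300, 1))
f2 = rng.normal(size=(300, 1))
data = np.hstack([f1, f1, f1, f2, f2, f2]) + rng.normal(scale=0.6, size=(300, 6))

z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)   # standardized items
corr = np.corrcoef(data, rowvar=False)
values, vectors = np.linalg.eigh(corr)
order = np.argsort(values)[::-1]
values, vectors = values[order], vectors[:, order]
k = int((values > 1.0).sum())

# One standardized score per respondent and retained dimension
scores = z @ vectors[:, :k] / np.sqrt(values[:k])

print(np.round(scores.mean(axis=0), 6))                  # centred on 0
print(np.round(np.corrcoef(scores, rowvar=False), 6))    # identity matrix: uncorrelated
```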

Procedure and interpretation of results [xxvii / xxix]

Procedure and interpretation of results [xxviii / xxix]

Procedure and interpretation of results [xxix / xxix]

The extracted dimensions (new variables FAC1_1 to FAC3_1) may be used in further analyses

Eta = .325***

Eta square = .106
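Eta square is the share of a variable's total variation that lies between groups, and Eta is its square root (here .325 squared is about .106). The sketch below computes both for hypothetical factor scores split into three made-up groups. The grouping variable used on the slide is not known from the text.

```python
import numpy as np

def eta_squared(values, groups):
    """Between-group sum of squares divided by total sum of squares."""
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    grand_mean = values.mean()
    ss_total = ((values - grand_mean) ** 2).sum()
    ss_between = sum(
        (groups == g).sum() * (values[groups == g].mean() - grand_mean) ** 2
        for g in np.unique(groups)
    )
    return ss_between / ss_total

# Hypothetical factor scores for nine respondents in three groups
scores = np.array([-1.0, -0.5, 0.0, 0.0, 0.5, 1.0, 0.5, 1.0, 1.5])
group = np.array(["a", "a", "a", "b", "b", "b", "c", "c", "c"])

eta2 = eta_squared(scores, group)
eta = np.sqrt(eta2)
print(round(eta2, 3), round(eta, 3))
```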

A final example [i / vi]

A final example

Consider the following variables

(“0, no time at all” to “7, more than 3 hours”)

TV watching, total time on average weekday

TV watching, news/politics/current affairs on average weekday

Radio listening, total time on average weekday

Radio listening, news/politics/current affairs on average weekday

Newspaper reading, total time on average weekday

Newspaper reading, politics/current affairs on average weekday

A final example [ii / vi]

What are the underlying dimensions for those items?

Information use?

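```spss
* Additive index: sum of the six media-use items.
compute info = sum(tvtot, tvpol, rdtot, rdpol, nwsptot, nwsppol).
variable labels info "use of different information channels".

* Histogram of the index with a normal curve overlaid.
GRAPH
  /HISTOGRAM(NORMAL)=info.
```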

A final example [iii / vi]

A final example [iv / vi]

But what if there is something more?

Let’s try a Factor analysis

The overall quality seems ok

The KMO score is (just) below .5, but the test of sphericity shows a significant result. We may proceed

A final example [v / vi]

Three dimensions extracted, almost 80% of (cumulative) variance explained!

What are those dimensions?

A final example [vi / vi]

The extracted dimensions clearly show three very distinct types of media use

The “Newspapers users”

The “TV users”