Statistics for International Relations Research I


IHEID - The Graduate Institute, academic year 2010-2011



Dr. NAI Alessandro, visiting professor

Dec 03, 2010

Lecture 8: Regression analysis III

Lecture content

• Feedback on Assignment VII
• Logistic regression models: binary
• Predicted probabilities
• Logistic regression models: multinomial

Binary logistic regression models [i / xxxviii]

Correlation

Statistical relationship between two scale variables

(see lecture 5)

Regression

Method for modelling the effect of one or more independent scale variables on a dependent scale variable

Binary logistic regression models [ii / xxxviii]

Two major uses for regression models

Prediction analysis:

Develop a formula for predicting the dependent variable from observed values of the independent variables

Ex: predict GNP for next year

Causal analysis:

Independent variables are regarded as causes of the dependent variable

Ex: uncover the causes of a higher crime rate

Binary logistic regression models [iii / xxxviii]

Two main types of regression

OLS (Ordinary Least Squares): linear relationship between variables, scale dependent variable

(see lectures 6 and 7)

Logistic regression: curvilinear (S-shaped) relationship between variables, with a dummy (binary logistic regression) or nominal (multinomial logistic regression) dependent variable

All regression models may be bi- or multivariate

Binary logistic regression models [iv / xxxviii]

Independent variables in (all) regression models may take the following form:

- Scale (optimal measurement level in regressions)

- Ordinal (metric, or close to metric)

- Binary (0,1)

Nominal variables can be entered as such (almost) only in logistic regressions; in OLS they must first be recoded into binary (0,1) dummies
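Recoding a nominal variable into dummies means creating one binary (0,1) column per category, except a reference category that serves as the all-zero baseline; this is the decomposition into modalities that appears in the logistic output later in the lecture. A minimal Python sketch of that recoding; the `dummy_code` helper and the party labels are hypothetical:

```python
def dummy_code(values, reference):
    """Recode a nominal variable into 0/1 dummies.

    Returns (categories, rows): one column per category except `reference`,
    and for each case a list of 0/1 indicators. A case in the reference
    category gets all zeros.
    """
    categories = sorted(set(values) - {reference})
    rows = [[1 if value == cat else 0 for cat in categories] for value in values]
    return categories, rows


# Hypothetical party labels (illustration only).
party = ["SD", "Greens", "other", "SD", "other"]
names, rows = dummy_code(party, reference="other")
print(names)  # ['Greens', 'SD']
for label, row in zip(party, rows):
    print(label, row)
```

A variable with k categories thus yields k - 1 dummies; adding a k-th one would make the columns perfectly collinear with the constant.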

Binary logistic regression models [v / xxxviii]

Why is an OLS regression not efficient with qualitative dependent variables?

Binary logistic regression models [vi / xxxviii]

For scale (and, sometimes, ordinal) dependent variables, OLS estimations are applied

(see lectures 6 and 7)

What if the dependent variable is qualitative?

- For binary DVs: binary logistic regressions

- For nominal DVs: multinomial logistic regressions

Binary logistic regression models [vii / xxxviii]

Let’s take a (bivariate) example:

Does the likelihood of participating in illegal protest activities depend on citizens’ positioning on the left-right scale?

Working hypothesis: citizens on the left side of the scale are more likely to participate in illegal protest activities

Binary logistic regression models [x / xxxviii]

In our example, the dependent variable is binary, and the independent variable is ordinal (almost scale)

How can the working hypothesis be tested?

Through a crosstab?

Through an OLS regression?

Binary logistic regression models [xii / xxxviii]

The crosstab shows a significant and strong relationship

However:

- The dispersion of individual observations is too high

- The dependent variable is highly skewed

Therefore, the results of the bivariate analysis (and especially the gamma score) are not robust enough
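The gamma score is Goodman and Kruskal's gamma, computed from concordant and discordant pairs of cases; pairs tied on either variable are left out. One reason it can look strong when the DV is highly skewed is that most pairs are then ties, so gamma rests on a small share of the data. A minimal Python sketch, assuming a crosstab stored as a list of lists with ordered rows and columns; the table counts are hypothetical:

```python
def goodman_kruskal_gamma(table):
    """Goodman and Kruskal's gamma for a crosstab with ordered rows and columns.

    gamma = (C - D) / (C + D), where C and D are the numbers of concordant
    and discordant pairs of cases. Pairs tied on either variable are ignored.
    """
    n_rows, n_cols = len(table), len(table[0])
    concordant = discordant = 0
    for r in range(n_rows):
        for c in range(n_cols):
            for r2 in range(r + 1, n_rows):      # cases in a higher row
                for c2 in range(n_cols):
                    if c2 > c:                   # also a higher column: concordant
                        concordant += table[r][c] * table[r2][c2]
                    elif c2 < c:                 # a lower column: discordant
                        discordant += table[r][c] * table[r2][c2]
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical 2x2 table with a skewed DV (columns: 0 / 1), 100 cases in total.
skewed = [[5, 45],
          [1, 49]]
print(round(goodman_kruskal_gamma(skewed), 3))
# Only 5*49 + 45*1 = 290 of the 100*99/2 = 4950 possible pairs are untied.
```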

Binary logistic regression models [xv / xxxviii]

The OLS regression shows a significant but quite weak relationship

Furthermore:

- The relationship is clearly not linear

- The postulates of OLS regressions are not met (with a 0/1 DV, residuals cannot be normally distributed and their variance is not constant)

Therefore, the results of the bivariate analysis are not robust enough
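A concrete symptom of fitting OLS to a binary DV (a "linear probability model"): the fitted straight line is unbounded, so predicted values can fall outside the 0-1 range of a probability. A minimal Python sketch with made-up data; the `ols_fit` helper and the values are illustrative assumptions, not the lecture's data:

```python
def ols_fit(x, y):
    """Bivariate OLS: returns (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


# Hypothetical data: a 0-10 left-right position and a binary DV (1 = "No").
x = list(range(11))
y = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]
b0, b1 = ols_fit(x, y)
for xi in (0, 5, 10):
    # Fitted "probability" of a 1 at this position
    print(xi, round(b0 + b1 * xi, 3))
```

With these values the line predicts about 1.27 at position 10, which is impossible for a probability; the logistic transformation avoids this by construction.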

Binary logistic regression models [xvi / xxxviii]

Solution?

Binary logistic regression

Regression model that estimates the likelihood that a situation occurs (binary dependent variable) from one or more independent variables

Independent variables may take any level of measurement (nominal, ordinal, scale)

Binary logistic regression models [xviii / xxxviii]

A logistic transformation is applied

The (transformed) regression equation may be written as:

$y = f(z) = \frac{e^{z}}{1 + e^{z}}$

Where:

z = exposure to a set of explanatory variables

f(z) = probability of a given outcome (y), given that set of explanatory variables

e = constant, base of the natural logarithm = 2.71828...
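The logistic transformation can be checked numerically: whatever value z takes, f(z) stays strictly between 0 and 1, with f(0) = 0.5, and its inverse (the logit) gives back z as the log odds ln(p / (1 - p)). A minimal Python sketch; the function names `logistic` and `logit` are illustrative, not SPSS output:

```python
import math


def logistic(z):
    """f(z) = e^z / (1 + e^z): maps any real z to a probability."""
    # Algebraically equal to 1 / (1 + e^-z); the branch avoids overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def logit(p):
    """Inverse of the logistic function: the log odds ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))


for z in (-5, -1, 0, 1, 5):
    print(z, round(logistic(z), 4))
```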

Binary logistic regression models [xix / xxxviii]

The regression equation becomes:

$z = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k$

Where:

β0 = constant (intercept)

β1 = regression coefficient (“slope”) for the variable x1

β2 = regression coefficient (“slope”) for the variable x2

βk = regression coefficient (“slope”) for the variable xk

Binary logistic regression models [xxi / xxxviii]

SPSS procedure: Analyze / Regression / Binary logistic

Binary logistic regression models [xxii / xxxviii]

Binary logistic regressions always estimate the likelihood of the presence of a phenomenon (coded 1) rather than its absence (coded 0)

Here, “No” (i.e., not likely to take part in illegal protest activities) is coded 1, so the models estimate the likelihood of a “No”

Binary logistic regression models [xxiii / xxxviii]

As with OLS regressions, SPSS provides measures of the overall quality of logistic regression models

Nagelkerke’s R square: a pseudo-R² (from 0 to 1), interpreted similarly to the R² of OLS regressions

Here, the model accounts for about 11% of the DV variance, which is quite good with only one independent variable
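Nagelkerke's R² is a rescaled Cox and Snell R², both computed from the log-likelihood (LL) of the intercept-only model and of the fitted model; SPSS reports -2 log-likelihood, so LL = -(-2LL)/2. A minimal Python sketch; the log-likelihood values and sample size below are hypothetical:

```python
import math


def pseudo_r2(ll_null, ll_model, n):
    """Cox & Snell and Nagelkerke pseudo-R2.

    ll_null  : log-likelihood of the intercept-only model
    ll_model : log-likelihood of the fitted model
    n        : number of cases used in the estimation
    """
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    # Cox & Snell cannot reach 1; Nagelkerke divides by its maximum.
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell, cox_snell / max_cox_snell


# Hypothetical values, e.g. -2LL = 1200 (null) and 1100 (model), n = 1000.
cs, nk = pseudo_r2(ll_null=-600.0, ll_model=-550.0, n=1000)
print(round(cs, 3), round(nk, 3))
```

Because of this rescaling, Nagelkerke's value is always at least as large as Cox and Snell's.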

Binary logistic regression models [xxiv / xxxviii]

Unstandardized coefficients (Bs)

Needed to build the regression equation

Here:

$y = f(z) = \frac{e^{z}}{1 + e^{z}}$

$z = 1.918 + 0.54 \cdot \mathrm{lrscale}$
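Plugging the estimated equation into the logistic function gives the predicted probability of a “No” at any point of the scale. A minimal Python sketch using the coefficients shown above; it assumes, for illustration only, that lrscale runs from 0 (left) to 10 (right):

```python
import math

B0 = 1.918  # constant, from the estimated equation
B1 = 0.54   # coefficient of lrscale, from the estimated equation


def p_no_illegal_protest(lrscale):
    """Predicted P(y = 1), i.e. P("No"), for a given left-right position."""
    z = B0 + B1 * lrscale
    return math.exp(z) / (1.0 + math.exp(z))


for position in (0, 5, 10):  # assumed 0-10 range of lrscale
    print(position, round(p_no_illegal_protest(position), 3))
```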

Binary logistic regression models [xxv / xxxviii]

Significance test

Based on Wald scores

Interpretation similar to Chi-square and Fisher tests

If p<.05, the effect is statistically significant
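For a single coefficient, the Wald statistic is (B / SE)², compared with a chi-square distribution with 1 degree of freedom. A minimal Python sketch using only the standard library; the standard error below is hypothetical, since it is not reported in these slides:

```python
import math


def wald_test(b, se):
    """Wald statistic (B / SE)^2 and its p-value (chi-square, 1 df)."""
    wald = (b / se) ** 2
    # For 1 df, P(chi2 > w) = P(|Z| > sqrt(w)) = erfc(sqrt(w / 2)).
    p = math.erfc(math.sqrt(wald / 2.0))
    return wald, p


# Hypothetical coefficient and standard error, for illustration only.
w, p = wald_test(b=0.54, se=0.20)
print(round(w, 2), round(p, 4), "significant" if p < .05 else "not significant")
```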

Binary logistic regression models [xxvi / xxxviii]

Logistic regression coefficients: B (the log odds) and Exp(B) (the odds ratio)

Provide information on the direction and the strength of the IV effect on the probability that the outcome y occurs

Binary logistic regression models [xxvii / xxxviii]

Interpretation of the odds ratios, Exp(B)

Based on the distance from 1.0

If Exp(B) = 1.0, no relationship

If 0.0<Exp(B)<1.0, negative relationship

If 1.0<Exp(B), positive relationship

Rule: the closer to 1.0, the weaker the relationship

[Figure: relationship strength (min to max) plotted against the Exp(B) value, with 0 and 1 marked on the Exp(B) axis]
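The reading rules above can be written as a small function. One addition, not on the slide: since Exp(B) is bounded by 0 below but unbounded above, distances from 1.0 are best compared on a log scale, where Exp(B) = 0.5 and Exp(B) = 2.0 are equally strong (|ln Exp(B)| = |B|). A minimal Python sketch; `describe_odds_ratio` is a hypothetical helper:

```python
import math


def describe_odds_ratio(exp_b):
    """Direction and a symmetric strength measure for an odds ratio Exp(B)."""
    if exp_b <= 0:
        raise ValueError("Exp(B) is always strictly positive")
    if exp_b == 1.0:
        direction = "no relationship"
    elif exp_b < 1.0:
        direction = "negative"
    else:
        direction = "positive"
    # |ln Exp(B)| = |B|: distance from 1.0 on the log scale, so that
    # Exp(B) = 0.5 and Exp(B) = 2.0 count as equally strong effects.
    strength = abs(math.log(exp_b))
    return direction, strength


for value in (0.5, 1.0, 1.72, 2.0):
    print(value, describe_odds_ratio(value))
```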

Binary logistic regression models [xxviii / xxxviii]

Binary logistic regression models [xxix / xxxviii]

In our example

Exp(B) = 1.72*** [*p<.05, **p<.01, ***p<.001]

Therefore, the relationship is significant, positive, and quite strong

Being further to the right on the left-right scale increases the likelihood of not taking part in illegal protest activities

Working hypothesis confirmed statistically
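Exp(B) is e raised to B: with B = .54, e^0.54 ≈ 1.72, so each one-point move to the right multiplies the odds of a “No” by about 1.72 (about 72% higher odds). This multiplication of the odds is the same at every point of the scale, whereas the change in probability is not. A minimal Python sketch using the bivariate equation z = 1.918 + .54*lrscale; the helper names are illustrative:

```python
import math

B0, B1 = 1.918, 0.54  # bivariate model: z = 1.918 + .54*lrscale


def probability(x):
    """Predicted P("No") at left-right position x."""
    z = B0 + B1 * x
    return math.exp(z) / (1.0 + math.exp(z))


def odds(x):
    """Odds of a "No": p / (1 - p)."""
    p = probability(x)
    return p / (1.0 - p)


print("Exp(B) =", round(math.exp(B1), 2))
for x in (0, 4, 8):
    odds_ratio = odds(x + 1) / odds(x)               # constant: e^B1 at every x
    p_change = probability(x + 1) - probability(x)   # shrinks as p approaches 1
    print(x, round(odds_ratio, 3), round(p_change, 4))
```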

Binary logistic regression models [xxx / xxxviii]

A multivariate example:

Does the likelihood of participating in illegal protest activities depend on citizens’ positioning on the left-right scale and on the party voted for in the last national election?

Binary logistic regression models [xxxi / xxxviii]

Qualitative (categorical) variable!

Binary logistic regression models [xxxvi / xxxviii]

Overall quality of the model

The model explains about 17% of the DV variance, against about 11% for the previous bivariate model

Binary logistic regression models [xxxvii / xxxviii]

Variables’ effects

- “lrscale” has a significant, positive, and strong effect

- “partyvoted” has a significant effect, but only at the p<.1 level

What are the strength and direction of the effects of party voted?

Binary logistic regression models [xxxviii / xxxviii]

The effects of qualitative variables are decomposed into their modalities (one coefficient per modality, except the reference category)

Only “partyvoted(3)” (social-democrats!) has a significant effect

Interpretation is always relative to the reference category (here: “other parties”)

Having voted SD (instead of “other”) strongly decreases the likelihood of participating in illegal protest activities

Predicted probabilities [i / vii]

Interpreting odds ratios in logistic regression is easy (distance from 1.0), but not always intuitive

A complement: predicted probabilities

Since logistic models estimate the likelihood that an event occurs, thinking in terms of “probabilities” makes sense

Predicted probabilities [ii / vii]

Predicted probabilities are calculated through the following formula:

$P(Y=1) = \frac{e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k}}{1 + e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k}}$

They can also be computed in SPSS

Predicted probabilities [iii / vii]

SPSS procedure to compute predicted probabilities

Predicted probabilities [v / vii]

A new variable (PRE_1) is created in the SPSS database

For each individual, the variable gives the predicted probability (between 0 and 1, i.e. 0-100%) of having 1 on the dependent variable (in our example, not participating in illegal protest activities), given his or her values on the IVs of the model (lrscale and partyvoted)
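The values stored in PRE_1 follow from applying the predicted-probability formula to each respondent's values. A minimal Python sketch that mimics such a column; the coefficients and the two-category party variable are hypothetical stand-ins, not the estimates of the lecture's model:

```python
import math

# Hypothetical coefficients (NOT the lecture's estimates):
# z = B0 + B_LRSCALE * lrscale + B_PARTY[party]
B0 = 1.5
B_LRSCALE = 0.5
B_PARTY = {"other": 0.0, "SD": 1.2}  # "other" is the reference category


def predicted_probability(lrscale, party):
    """P(y = 1) for one respondent: one value of a PRE_1-style column."""
    z = B0 + B_LRSCALE * lrscale + B_PARTY[party]
    return math.exp(z) / (1.0 + math.exp(z))


respondents = [(2, "SD"), (2, "other"), (8, "other")]
pre_1 = [predicted_probability(lr, party) for lr, party in respondents]
for (lr, party), p in zip(respondents, pre_1):
    print(lr, party, f"{p:.1%}")
```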

Multinomial logistic regression models [i / x]

If the dependent variable is binary, regression models will assume the binary logistic form

What if the dependent variable is qualitative but not binary?

What about nominal dependent variables?

Multinomial logistic regressions

Multinomial logistic regression models [ii / x]

Main logic:

A binary logistic equation is computed for each modality of the dependent variable (one modality being the reference category); in practice, all equations are estimated jointly

Results represent the individuals’ likelihood of being in that modality rather than in the reference modality

Coefficients are to be interpreted as in binary logistic regressions
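The probabilities behind this logic can be sketched with one linear predictor z_j per non-reference modality (z = 0 for the reference): P(j) = e^(z_j) / (sum over all modalities k of e^(z_k)). Each equation is then the log odds of modality j against the reference, which is why the coefficients read as in binary logistic regression. A minimal Python sketch; category names and z values are hypothetical:

```python
import math


def multinomial_probabilities(z_by_category, reference):
    """Probabilities for each modality of a nominal DV.

    z_by_category: linear predictor for each NON-reference modality;
    the reference modality has z = 0 by construction.
    """
    zs = dict(z_by_category)
    zs[reference] = 0.0
    top = max(zs.values())  # subtracting the max avoids overflow; ratios unchanged
    expz = {cat: math.exp(z - top) for cat, z in zs.items()}
    total = sum(expz.values())
    return {cat: value / total for cat, value in expz.items()}


# Hypothetical linear predictors for one respondent.
probs = multinomial_probabilities({"party_B": 0.4, "party_C": -0.2}, reference="party_A")
print({cat: round(p, 3) for cat, p in probs.items()})
# ln(P(party_B) / P(party_A)) equals z for party_B, i.e. 0.4
print(round(math.log(probs["party_B"] / probs["party_A"]), 6))
```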

Multinomial logistic regression models [iii / x]

Example:

Explain the party voted for in the last national elections

Four independent variables:

- Education level

- Trust in institutions

- Positioning on left-right scale

- Domicile (big city, town, village, …)

Multinomial logistic regression models [v / x]

SPSS procedure: Analyze / Regression / Multinomial logistic

Factors: categorical (qualitative) independent variables (nominal, binary, ordinal non-metrical)

Covariates: quantitative independent variables (scale or ordinal metrical)

Multinomial logistic regression models [vi / x]

Choose a reference category

Here we choose the first modality (“Swiss People’s Party”) as reference category.

All results will be interpreted relative to this category

Note: always choose a reference category that makes sense conceptually/theoretically
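Since every coefficient is a contrast with the reference category, coefficients against another reference can be derived from the same output: B(j vs k) = B(j vs reference) - B(k vs reference), and the corresponding Exp(B) is the ratio of the two Exp(B) values. A minimal Python sketch; the category labels follow the example, but the coefficient values are hypothetical:

```python
import math


def change_reference(coefs, new_reference):
    """Re-express one IV's multinomial coefficients against a new reference.

    coefs: B for each modality relative to the current reference (whose B is 0).
    B(j vs new) = B(j vs old) - B(new vs old).
    """
    shift = coefs[new_reference]
    return {cat: b - shift for cat, b in coefs.items()}


# Hypothetical B values for "trust in institutions"; SPP is the current reference.
b_trust = {"SPP": 0.0, "Chris-dem": 0.25, "Soc-dem": -0.10}
b_vs_socdem = change_reference(b_trust, "Soc-dem")
for cat, b in b_vs_socdem.items():
    print(cat, round(b, 2), "Exp(B) =", round(math.exp(b), 3))
```

Point estimates carry over this way, but their significance tests do not (they need the full covariance matrix), so in practice the model is re-run with the new reference category.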

Multinomial logistic regression models [vii / x]

Overall quality of the model

As for binary logistic regression, SPSS computes a measure of the overall quality of the model in terms of explained variance

Here, 42.3% of the variance is explained: a very good model

Multinomial logistic regression models [viii / x]

Significance levels: Wald test, check if p<.05

Regression coefficients: Exp(B) (odds ratios), as in binary logistic regression

Here, higher trust in institutions has a significant, positive but quite weak effect on the Chris-dem vote (Swiss People’s Party being the reference category)

Multinomial logistic regression models [ix / x]

Here, all components of the model have a significant effect

Being more educated has a strong effect on the Soc-dem vote (SPP reference!), whereas being more on the right of the left-right scale has a negative effect

Concerning the domicile, living in a big city (domicil=1) has a very strong effect on the Soc-dem vote

Any questions?