Statistics for International Relations Research I

IHEID - The Graduate Institute

Academic year 2010-2011

Statistics for International Relations Research I

Dr. NAI Alessandro, visiting professor

Nov. 19, 2010

Lecture 6:

Regression analysis I

slide2

Lecture content

  • Feedback on Assignment V
  • Introduction to OLS regression analysis
  • The regression line
  • The multivariate regression model
  • Model adequacy for specific cases
slide3

Introduction to OLS regression analysis [i/x]

Inferential statistics

Main goal: draw conclusions about the existence (likelihood) of relationships among variables based on data subject to random variation

Does A affect B?

Is an individual's position on B affected by their position on A?

slide4

Introduction to OLS regression analysis [ii/x]

Different statistical tools exist to uncover the existence of a causal relationship

slide5

Introduction to OLS regression analysis [iii/x]

Correlation

Statistical relationship between two scale variables

(see lecture 5)

Regression

Method for modeling the effect of one or more independent scale variables on a dependent scale variable

slide6

Introduction to OLS regression analysis [iv/x]

Two major uses for regression models

Prediction analysis:

Develop a formula for making predictions about the dependent variable based on observed values

Ex: predict GNP for next year

Causal analysis:

Independent variables are regarded as causes of the dependent variable

Ex: uncover the causes of a higher crime rate

slide7

Introduction to OLS regression analysis [v/x]

Two main types of regression

OLS (Ordinary Least Squares): linear relationship between variables, scale dependent variable

Logistic regression: curvilinear relationship between variables, dummy (binomial logistic regression) or nominal dependent variable (multinomial logistic regression)

(see lecture 8)

All regression models may be bi- or multivariate

slide8

Introduction to OLS regression analysis [vi/x]

Independent variables in (all) regression models may take the following forms:

- Scale (optimal measurement level in regressions)

- Ordinal (metric, or close to it)

- Binary (0,1)

Nominal variables are allowed (almost) only in logistic regressions

slide9

Introduction to OLS regression analysis [vii/x]

Why is a regression not efficient with qualitative variables?

slide10

Introduction to OLS regression analysis [viii/x]

OLS regressions

Dependent variable is scale

Independent variable(s) may be scale, ordinal (metric), or binary

Estimation is based on Ordinary Least Squares

slide11

Introduction to OLS regression analysis [ix/x]

Ordinary Least Squares (OLS)

Method used to get values for the regression coefficients: slope(s) and intercept

Based on the difference between observed and predicted values

Observed values: values in the database for each unit

Predicted values: for the same units, values predicted by the regression model

slide12

Introduction to OLS regression analysis [x/x]

Prediction error

For each unit of observation:

Error = Observed value – Predicted value

The OLS method (on which the regression line is based) proposes a model that makes the sum of the squared prediction errors as small as possible
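
In other terms (a sketch of the OLS criterion, written with the intercept a and slope b that are introduced later in this lecture):

Minimize, over a and b:  Σi (yi – (a + b*xi))²  — the sum, over all observations i, of the squared prediction errors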

slide13

The regression line [i/xiv]

The regression line

Summarizes the relationship between two (scale) variables as linear

Based on OLS estimation (the model that makes the sum of the squared prediction errors as small as possible)

In other terms: the overall (squared) distance between the line and the observed values is minimized

slide14

The regression line [ii/xiv]

An intuitive example

Consider a square wooden board (the Cartesian plane) on which we randomly place some rocks

The wooden board has no weight

The rocks all have exactly the same shape and weight

The regression line will be the line of equilibrium of the board

slide15

The regression line [iii/xiv]

[Diagram: rocks scattered on a wooden board; the equilibrium line of the board is the regression line]

slide16

The regression line [iv/xiv]

Bivariate OLS regression: an example

Relationship between the % of female politicians in Parliament and the number of years since women had the right to vote

Null hypothesis: no relationship between the two variables

Working hypothesis: the longer women have had the right to vote in a given country, the higher the % of female politicians in its Parliament

slide17

The regression line [v/xiv]

Expected distribution if the working hypothesis holds

[Scatterplot: X axis = years since women obtained the right to vote; Y axis = % of women in Parliament]

slide19

The regression line [vii/xiv]

Regression line

Working hypothesis confirmed by observation?

More or less

slide20

The regression line [viii/xiv]

  • The regression line summarizes the relationship between two or more (scale) variables
  • In a bivariate relationship, by convention we place
  • - the independent variable on the X axis (horizontal)
  • - the dependent variable on the Y axis (vertical)
  • We therefore say that y is a function of x
  • y=f(x)
  • If we have two independent variables (x and z), we say that
  • y=f(x,z)
slide21

The regression line [ix/xiv]

The regression line can always be written in the following algebraic form (regression equation):

y = a + b*x + e

y: dependent variable

x: independent variable

a: intercept (value of y when x = 0)

b: slope for x

e: residual (not explained linearly)
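
For the bivariate case, the OLS estimates of the slope and intercept have a standard closed form (added here as a reference, not shown on the original slide):

b = Σi (xi – x̄)(yi – ȳ) / Σi (xi – x̄)²      a = ȳ – b*x̄

where x̄ and ȳ are the means of x and y over all observations i.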

slide22

The regression line [x/xiv]

y = a + b*x

[Graph: regression line with intercept a on the Y axis; the slope is b = Δy / Δx, the change in y for a given change in x]

slide23

The regression line [ix/xiv]

The slope (b)

Coefficient that links the two variables

Effect of x on y, given that y=f(x)

Change in y for each unit change in x

Ex: if b=2, when x increases by 10 units, y increases by 20 units (10*2)

Look particularly at:

- The direction

- The strength

slide24

The regression line [x/xiv]

Direction of the slope

(interpretation similar to that of distributions in crosstabs built on ordinal variables)

Positive relationship

If x increases, so does y

Negative relationship

If x increases, y decreases

slide25

The regression line [xi/xiv]

Strength of the slope

Slope=1

(if x increases by 1 unit, y increases by 1 unit)

Slope=2

(if x increases by 1 unit, y increases by 2 units)

Slope=0.5

(if x increases by 1 unit, y increases by 0.5 units)

slide26

The regression line [xii/xiv]

SPSS procedure: Analyze / Regression / Linear
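
For readers working outside SPSS, a minimal Python sketch of the same bivariate fit; the file name and the column names pct_women_parliament and years_since_vote are illustrative assumptions, not part of the original course material:

# Bivariate OLS regression with statsmodels (illustrative sketch)
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("women_parliament.csv")   # hypothetical dataset, one row per country
model = smf.ols("pct_women_parliament ~ years_since_vote", data=df).fit()
print(model.params)     # a (intercept) and b (slope)
print(model.rsquared)   # R square
print(model.summary())  # full output, comparable to the SPSS coefficient tables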

slide27

The regression line [xiii/xiv]

Back to our example

(H: the longer women have had the right to vote in a given country, the higher the % of female politicians in its Parliament)

y = a + b*x

y = 3.58 + 0.17*x

slide28

The regression line [xiv/xiv]

General quality of the model

R: strength of the relationship (Pearson’s r)

R square: explanatory power of the model (% of explained variance, here 15.3%)

Standard error of the estimate (Se): mean prediction error
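
For reference (standard definitions, not spelled out on the slide), with SSE the sum of squared prediction errors and SST the total sum of squares of y:

R square = 1 – SSE/SST      Se = √( SSE / (n – 2) )

where n is the number of observations; the n – 2 reflects the two estimated coefficients (intercept and slope) of the bivariate model.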

slide29

The multivariate regression model [i/xiii]

Bivariate linear regression

Method for modeling the effect of one independent scale variable (x) on a dependent scale variable (y)

y = f(x)

Multivariate linear regression

Method for modeling the effect of two or more independent scale variables (x, z, …) on a dependent scale variable (y)

y = f(x,z,…)

“Explanatory model”

slide30

The multivariate regression model [ii/xiii]

Example: Explain life expectancy for a country

Working hypothesis: life expectancy for a country is positively influenced by the daily supply of proteins and negatively influenced by the illiteracy rate

The model: Protein supply (+) → Life expectancy; Illiteracy rate (−) → Life expectancy

slide34

The multivariate regression model [vi/xiii]

As with bivariate models, multivariate models can be summarized through a regression equation

y = a + b1*x + b2*z + … + e

y: dependent variable

x,z: independent variables

a: intercept (value of y when x and z equal 0)

b1: slope for x

b2: slope for z

e: residual (not explained linearly)

slide35

The multivariate regression model [vii/xiii]

SPSS procedure: Analyze / Regression / Linear
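
Again for readers without SPSS, a minimal sketch of the two-predictor model; the file and column names (life_expectancy, illiteracy_rate, protein_supply) are illustrative assumptions:

# Multivariate OLS regression with statsmodels (illustrative sketch)
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("life_expectancy.csv")    # hypothetical dataset, one row per country
model = smf.ols("life_expectancy ~ illiteracy_rate + protein_supply", data=df).fit()
print(model.params)  # a (intercept), b1 (illiteracy_rate), b2 (protein_supply)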

slide36

The multivariate regression model [viii/xiii]

The equation is: y = 50.59 - 0.27*x + 0.29*z

If the adult illiteracy rate (x) increases by 1 percentage point, life expectancy decreases by 0.27 years

If the daily per capita supply of proteins (z) increases by 1 gram, life expectancy increases by 0.29 years

slide37

The multivariate regression model [ix/xiii]

Standardized coefficients (betas)

Useful to assess the contribution of each independent variable to the dependent variable

Can be compared with each other, unlike the unstandardized coefficients (Bs).

Here, x is more important than z in explaining y
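
For reference (standard formula, not shown on the slide): the standardized coefficient of a predictor is its unstandardized coefficient rescaled by the standard deviations of that predictor (sx) and of the dependent variable (sy):

beta = b * (sx / sy)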

slide38

The multivariate regression model [x/xiii]

Overall quality of the model

R square (% of explained variance) and the standard error of the estimate have to be interpreted as in bivariate models.

In multivariate models, R is almost never taken into account.

Here, very good model!

slide39

The multivariate regression model [xi/xiii]

Problem in multivariate regression models

The logic "observe first, analyze afterwards" (as in analyses through crosstabs, ANOVA, and bivariate regressions) becomes complicated

Multivariate models are hard to represent graphically

slide40

The multivariate regression model [xii/xiii]

Example with two scale independent variables

slide42

Model adequacy [i/vii]

Model adequacy for a specific case

Main idea: uncover whether or not a general regression model is optimal to explain the situation in a given unit of observation

Works only with aggregate data!

With individual observations, it simply does not make sense

slide43

Model adequacy [ii/vii]

Main logic

Compare, for a specific case, the observed value (in the database) with the value predicted by the model (regression equation)

If the two values are close, the model adequacy for that specific case is high

If the values are not close, the model is not optimal for explaining the situation in that specific case

slide44

Model adequacy [iii/vii]

[Diagram: observed vs. predicted values around the regression line; a large gap (Δ) between the observed and predicted value means low adequacy, a small gap means high adequacy]

slide45

Model adequacy [iv/vii]

Example of model adequacy:

Relationship between the % of female politicians in Parliament and the number of years since women had the right to vote

Is the model adequacy high for the Swiss case?

slide46

Model adequacy [v/vii]

Regression equation (overall model):

y = 3.58 + 0.17*x

For Switzerland, the observed value for x is 25 (25 years since the right to vote was granted to women)

Predicted value for y (% of female politicians in Parliament):

y = 3.58 + 0.17*x = 3.58 + 0.17*25 = 7.85%

The observed value for y in the Swiss case (actual % of female politicians in Parliament) is 20.3%

slide47

Model adequacy [vi/vii]

Are the two values close?

In order to decide, compare with the Standard Error of Estimation (Se) for the overall model

Main logic:

If the difference between observed and predicted values for y is lower than the Se, the model adequacy is high

slide48

Model adequacy [vii/vii]

For the Swiss case:

Predicted value for y = 7.85%

Observed value for y = 20.3%

Difference (Δ) = 20.3 – 7.85 = 12.45

Se for the overall model = 7.6

In this case, Δ > Se (12.45 > 7.6); therefore the model adequacy for Switzerland is low.
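
A minimal sketch of this check as code, using the numbers from the example above:

# Model adequacy check for a single case (values taken from the Swiss example)
predicted = 7.85   # value predicted by y = 3.58 + 0.17*25 (from the slide)
observed = 20.3    # observed % of female politicians in the Swiss Parliament
se = 7.6           # standard error of the estimate for the overall model
delta = abs(observed - predicted)   # 12.45
print("high adequacy" if delta < se else "low adequacy")  # -> low adequacy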

slide49

Any questions?

Thank you for your attention!