multiple regression

Multiple Regression - PowerPoint PPT Presentation



PowerPoint Slideshow about 'Multiple Regression' - jacob


multiple regression1
Multiple Regression

The test you choose depends on level of measurement:

Independent Variable Dependent Variable Test

Dichotomous Continuous Independent Samples t-test

Dichotomous

Nominal Nominal Cross Tabs

Dichotomous Dichotomous

Nominal Continuous ANOVA

Dichotomous Dichotomous

Continuous Continuous Bivariate Regression/Correlation

Dichotomous

Two or More…

Continuous or Dichotomous Continuous Multiple Regression

multiple regression2
Multiple Regression
  • Multiple Regression is very popular among sociologists.
    • Most social phenomena have more than one cause.
    • It is very difficult to manipulate just one social variable through experimentation.
    • Sociologists must attempt to model complex social realities to explain them.
multiple regression3
Multiple Regression
  • Multiple Regression allows us to:
    • Use several variables at once to explain the variation in a continuous dependent variable.
    • Isolate the unique effect of one variable on the continuous dependent variable while taking into consideration that other variables are affecting it too.
    • Write a mathematical equation that tells us the overall effects of several variables together and the unique effects of each on a continuous dependent variable.
    • Control for other variables to demonstrate whether bivariate relationships are spurious
multiple regression4
Multiple Regression
  • For example:

A sociologist may be interested in how Education and Family Income relate to the Number of Children in a family.

Independent Variables

Education

Family Income

Dependent Variable

Number of Children

multiple regression5
Multiple Regression
  • For example:
    • Research Hypothesis: As education of respondents increases, the number of children in families will decline (negative relationship).
    • Research Hypothesis: As family income of respondents increases, the number of children in families will decline (negative relationship).

Independent Variables

Education

Family Income

Dependent Variable

Number of Children

multiple regression6
Multiple Regression
  • For example:
    • Null Hypothesis: There is no relationship between education of respondents and the number of children in families.
    • Null Hypothesis: There is no relationship between family income and the number of children in families.

Independent Variables

Education

Family Income

Dependent Variable

Number of Children

multiple regression7
Multiple Regression
  • Bivariate regression is based on fitting a line as close as possible to the plotted coordinates of your data on a two-dimensional graph.
  • Trivariate regression is based on fitting a plane as close as possible to the plotted coordinates of your data on a three-dimensional graph.

Case: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

Children (Y): 2 5 1 9 6 3 0 3 7 7 2 5 1 9 6 3 0 3 7 14 2 5 1 9 6

Education (X1): 12 16 20 12 9 18 16 14 9 12 12 10 20 11 9 18 16 14 9 8 12 10 20 11 9

Income 1=$10K (X2): 3 4 9 5 4 12 10 1 4 3 10 4 9 4 4 12 10 6 4 1 10 3 9 2 4

multiple regression8
Multiple Regression

[Figure: three-dimensional plot with axes X1, X2, and Y. Plotted coordinates (1 – 10) for Education, Income and Number of Children]

Case: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

Children (Y): 2 5 1 9 6 3 0 3 7 7 2 5 1 9 6 3 0 3 7 14 2 5 1 9 6

Education (X1): 12 16 20 12 9 18 16 14 9 12 12 10 20 11 9 18 16 14 9 8 12 10 20 11 9

Income 1=$10K (X2): 3 4 9 5 4 12 10 1 4 3 10 4 9 4 4 12 10 6 4 1 10 3 9 2 4

multiple regression9
Multiple Regression

What multiple regression does is fit a plane to these coordinates.

[Figure: three-dimensional plot with axes X1, X2, and Y, showing a plane fitted to the plotted coordinates]

Case: 1 2 3 4 5 6 7 8 9 10

Children (Y): 2 5 1 9 6 3 0 3 7 7

Education (X1): 12 16 20 12 9 18 16 14 9 12

Income 1=$10K (X2): 3 4 9 5 4 12 10 1 4 3

multiple regression10
Multiple Regression
  • Mathematically, that plane is:

Y = a + b1X1 + b2X2

a = y-intercept, where X’s equal zero

b=coefficient or slope for each variable

For our problem, SPSS says the equation is:

Y = 11.8 - .36X1 - .40X2

Expected # of Children = 11.8 - .36*Educ - .40*Income
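The fit reported by SPSS can be checked with a minimal least-squares sketch. It assumes NumPy is available and that the education row holds 25 values (20 followed by 12 in the third and fourth positions); the variable names are illustrative, not from the slides.

```python
import numpy as np

# The 25 cases listed on the data slide.
children = np.array([2, 5, 1, 9, 6, 3, 0, 3, 7, 7, 2, 5, 1, 9, 6,
                     3, 0, 3, 7, 14, 2, 5, 1, 9, 6], dtype=float)
education = np.array([12, 16, 20, 12, 9, 18, 16, 14, 9, 12, 12, 10, 20,
                      11, 9, 18, 16, 14, 9, 8, 12, 10, 20, 11, 9], dtype=float)
income = np.array([3, 4, 9, 5, 4, 12, 10, 1, 4, 3, 10, 4, 9, 4, 4,
                   12, 10, 6, 4, 1, 10, 3, 9, 2, 4], dtype=float)

# Design matrix: a column of ones for the intercept a, then X1 and X2.
X = np.column_stack([np.ones(len(children)), education, income])

# Ordinary least squares: the plane that minimizes the sum of squared errors.
a, b1, b2 = np.linalg.lstsq(X, children, rcond=None)[0]

print(f"Y = {a:.1f} + ({b1:.2f})X1 + ({b2:.2f})X2")
```

Rounded to the precision used on the slide, the coefficients agree with 11.8, -.36, and -.40.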

multiple regression11
Multiple Regression

57% of the variation in number of children is explained by education and income!

Y = 11.8 - .36X1 - .40X2

multiple regression12
Multiple Regression

$$R^2 = \frac{\sum (Y - \bar{Y})^2 - \sum (Y - \hat{Y})^2}{\sum (Y - \bar{Y})^2}$$

Y = 11.8 - .36X1 - .40X2

161.518 ÷ 281.76 = .573
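As a rough check on this arithmetic, the sketch below recomputes the total and error sums of squares from the 25 listed cases (assuming NumPy, and assuming the education row reads 20 then 12 in its third and fourth positions). For these values the total sum of squares works out to 281.76 and the explained sum of squares to about 161.5.

```python
import numpy as np

children = np.array([2, 5, 1, 9, 6, 3, 0, 3, 7, 7, 2, 5, 1, 9, 6,
                     3, 0, 3, 7, 14, 2, 5, 1, 9, 6], dtype=float)
education = np.array([12, 16, 20, 12, 9, 18, 16, 14, 9, 12, 12, 10, 20,
                      11, 9, 18, 16, 14, 9, 8, 12, 10, 20, 11, 9], dtype=float)
income = np.array([3, 4, 9, 5, 4, 12, 10, 1, 4, 3, 10, 4, 9, 4, 4,
                   12, 10, 6, 4, 1, 10, 3, 9, 2, 4], dtype=float)

X = np.column_stack([np.ones(len(children)), education, income])
coef = np.linalg.lstsq(X, children, rcond=None)[0]
predicted = X @ coef

# TSS: squared distances from the mean of Y; SSE: squared distances from the plane.
tss = np.sum((children - children.mean()) ** 2)
sse = np.sum((children - predicted) ** 2)
r2 = (tss - sse) / tss

print(f"TSS = {tss:.2f}, explained = {tss - sse:.3f}, R2 = {r2:.3f}")
```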

multiple regression13
Multiple Regression

So what does our equation tell us?

Y = 11.8 - .36X1 - .40X2

Expected # of Children = 11.8 - .36*Educ - .40*Income

Try “plugging in” some values for your variables.

multiple regression14
Multiple Regression

So what does our equation tell us?

Y = 11.8 - .36X1 - .40X2

Expected # of Children = 11.8 - .36*Educ - .40*Income

If Education equals:   If Income equals:   Then predicted children (Ŷ) equals:
0                      0                   11.8
10                     0                   8.2
10                     10                  4.2
20                     10                  0.6
20                     11                  0.2

multiple regression15
Multiple Regression

So what does our equation tell us?

Y = 11.8 - .36X1 - .40X2

Expected # of Children = 11.8 - .36*Educ - .40*Income

If Education equals:   If Income equals:   Then predicted children (Ŷ) equals:
1                      0                   11.44
1                      1                   11.04
1                      5                   9.44
1                      10                  7.44
1                      15                  5.44

multiple regression16
Multiple Regression

So what does our equation tell us?

Y = 11.8 - .36X1 - .40X2

Expected # of Children = 11.8 - .36*Educ - .40*Income

If Education equals:   If Income equals:   Then predicted children (Ŷ) equals:
0                      1                   11.40
1                      1                   11.04
5                      1                   9.60
10                     1                   7.80
15                     1                   6.00
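The "plugging in" exercise can be sketched as a small function. The function name `expected_children` is illustrative; the coefficients are the rounded ones from the equation above.

```python
def expected_children(educ: float, income: float) -> float:
    """Predicted number of children from Y = 11.8 - .36*Educ - .40*Income."""
    return 11.8 - 0.36 * educ - 0.40 * income

# Hold education at 1 and vary income: each extra unit of income lowers
# the prediction by .40.
for inc in (0, 1, 5, 10, 15):
    print(1, inc, round(expected_children(1, inc), 2))

# Hold income at 1 and vary education: each extra year lowers it by .36.
for educ in (0, 1, 5, 10, 15):
    print(educ, 1, round(expected_children(educ, 1), 2))
```

Holding one input fixed while stepping the other is what "holding the other variable constant" means in practice.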

multiple regression17
Multiple Regression

If graphed, holding one variable constant produces a two-dimensional graph for the other variable.

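[Figure: two line graphs of predicted children (Y). X2 = Income, from 0 to 15 with Education held at 1: the line falls from 11.44 to 5.44 (b = -.4). X1 = Education, from 0 to 15 with Income held at 1: the line falls from 11.40 to 6.00 (b = -.36).]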

multiple regression18
Multiple Regression
  • An interesting effect of controlling for other variables is “Simpson’s Paradox.”
  • The direction of relationship between two variables can change when you control for another variable.
  • Read A&F pp. 383 – 386

Bivariate regression: Education has a positive (+) relationship with Crime Rate.

Y = -51.3 + 1.5X

multiple regression19
Multiple Regression
  • “Simpson’s Paradox”

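Bivariate regression: Education has a positive (+) relationship with Crime Rate.

Y = -51.3 + 1.5X1

Urbanization is related to both: it has a positive (+) relationship with Education and a positive (+) relationship with Crime Rate.

Regression Controlling for Urbanization: Education now has a negative (-) relationship with Crime Rate, while Urbanization has a positive (+) relationship with Crime Rate.

Y = 58.9 - .6X1 + .7X2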

multiple regression20
Multiple Regression

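[Figure: Crime (vertical axis) plotted against Education (horizontal axis). The original regression line is shown together with separate lines looking at each level of urbanization: Rural, Small town, Suburban, City.]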
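The sign reversal can be reproduced with a toy data set. The numbers below are invented purely for illustration (they are not the data behind the slides): crime is built to fall with education inside every urbanization level, while more urban places have both more education and more crime. NumPy is assumed.

```python
import numpy as np

# Hypothetical toy data: four urbanization levels (0 = rural ... 3 = city),
# three cases per level.
urbanization = np.repeat(np.arange(4), 3).astype(float)
education = 10 + 2 * urbanization + np.tile([-1.0, 0.0, 1.0], 4)
crime = 20 * urbanization - education  # within a level, more education = less crime

ones = np.ones(len(crime))

# Bivariate regression of crime on education only.
bivariate = np.linalg.lstsq(np.column_stack([ones, education]), crime, rcond=None)[0]

# Multiple regression controlling for urbanization.
controlled = np.linalg.lstsq(
    np.column_stack([ones, education, urbanization]), crime, rcond=None)[0]

print("education slope, bivariate: ", round(bivariate[1], 2))
print("education slope, controlled:", round(controlled[1], 2))
```

The bivariate slope comes out positive and the controlled slope negative, which is the pattern the slides call Simpson's Paradox.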

multiple regression21
Multiple Regression
  • What happens when you have even more variables?
  • The social world is very complex.
  • For example:

A sociologist may be interested in the effects of Education, Income, Sex, and Gender Attitudes on Number of Children in a family.

Independent Variables

Education

Family Income

Sex

Gender Attitudes

Dependent Variable

Number of Children

multiple regression22
Multiple Regression
  • Research Hypotheses:
    • As education of respondents increases, the number of children in families will decline (negative relationship).
    • As family income of respondents increases, the number of children in families will decline (negative relationship).
    • As one moves from male to female, the number of children in families will increase (positive relationship).
    • As gender attitudes get more conservative, the number of children in families will increase (positive relationship).

Independent Variables

Education

Family Income

Sex

Gender Attitudes

Dependent Variable

Number of Children

multiple regression23
Multiple Regression
  • Null Hypotheses:
    • There will be no relationship between education of respondents and the number of children in families.
    • There will be no relationship between family income and the number of children in families.
    • There will be no relationship between sex and number of children.
    • There will be no relationship between gender attitudes and number of children.

Independent Variables

Education

Family Income

Sex

Gender Attitudes

Dependent Variable

Number of Children

multiple regression24
Multiple Regression
  • Bivariate regression is based on fitting a line as close as possible to the plotted coordinates of your data on a two-dimensional graph.
  • Trivariate regression is based on fitting a plane as close as possible to the plotted coordinates of your data on a three-dimensional graph.
  • Regression with more than two independent variables is based on fitting a shape to your constellation of data on a multi-dimensional graph.
multiple regression25
Multiple Regression
  • Regression with more than two independent variables is based on fitting a shape to your constellation of data on a multi-dimensional graph.
  • The shape will be placed so that it minimizes the distance (sum of squared errors) from the shape to every data point.
  • The shape is no longer a line, but if you hold all other variables constant, it is linear for each independent variable.
multiple regression26
Multiple Regression

Imagining a graph with four dimensions!

[Figure: axes X1, X2, and Y]

multiple regression27
Multiple Regression

For our problem, our equation could be:

Y = 7.5 - .30X1 - .40X2 + 0.5X3 + 0.25X4

E(Children) =

7.5 - .30*Educ - .40*Income + 0.5*Sex + 0.25*Gender Att.

multiple regression28
Multiple Regression

So what does our equation tell us?

Y = 7.5 - .30X1 - .40X2 + 0.5X3 + 0.25X4

E(Children) =

7.5 - .30*Educ - .40*Income + 0.5*Sex + 0.25*Gender Att.

Education:   Income:   Sex:   Gender Att:   Predicted Children (Ŷ):
10           5         0      0             2.5
10           5         0      5             3.75
10           10        0      5             1.75
10           5         1      0             3.0
10           5         1      5             4.25

multiple regression29
Multiple Regression

Each variable, holding the other variables constant, has a linear, two-dimensional graph of its relationship with the dependent variable.

Here we hold every other variable constant at “zero.”

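[Figure: two line graphs of predicted children (Y), each over values 0 to 10. X1 = Education: the line falls from 7.5 to 4.5 (b = -.3). X2 = Income: the line falls from 7.5 to 3.5 (b = -.4).]

Y = 7.5 - .30X1 - .40X2 + 0.5X3 + 0.25X4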

multiple regression30
Multiple Regression

Each variable, holding the other variables constant, has a linear, two-dimensional graph of its relationship with the dependent variable.

Here we hold every other variable constant at “zero.”

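[Figure: two line graphs of predicted children (Y). X3 = Sex, from 0 to 1: the line rises from 7.5 to 8 (b = .5). X4 = Gender Attitudes, from 0 to 5: the line rises from 7.5 to 8.75 (b = .25).]

Y = 7.5 - .30X1 - .40X2 + 0.5X3 + 0.25X4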

multiple regression31
Multiple Regression
  • R2
    • R2 = (TSS – SSE) / TSS
      • TSS = Sum of squared distances from the mean to the value on Y for each case
      • SSE = Sum of squared distances from the shape to the value on Y for each case
    • Can be interpreted the same for multiple regression—joint explanatory value of all of your variables (or “your model”)
    • Can request a change in R2 test from SPSS to see if adding new variables improves the fit of your model
multiple regression32
Multiple Regression
  • R
    • The correlation of your actual Y value and the predicted Y value using your model for each person
  • Adjusted R2
    • Explained variation can never go down when new variables are added to a model.
    • Because R2 can never go down, some statisticians figured out a way to adjust R2 by the number of variables in your model.
    • This is a way of ensuring that your explanatory power is not just a product of throwing in a lot of variables.
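The slides do not give the adjustment itself. A minimal sketch, assuming the commonly used formula 1 - (1 - R2)(n - 1)/(n - k - 1), where n is the number of cases and k the number of independent variables (the function name is illustrative):

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R2 for a model with k independent variables fit to n cases."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# The children example: R2 = .573 with 25 cases and 2 independent variables.
print(round(adjusted_r2(0.573, n=25, k=2), 3))
```

Because the penalty grows with k, adding a variable that explains almost nothing can lower the adjusted value even though R2 itself cannot fall.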

Average deviation from the regression shape.

multiple regression33
Multiple Regression

Controlling for other variables means finding how one variable affects the dependent variable at each level of the other variables.

So what if two of your independent variables were highly correlated with each other???

[Diagram: Crime, Education, and Urbanization]

multiple regression34
Multiple Regression

So what if two of your independent variables were highly correlated with each other???

(this is the problem called multicollinearity)

How would one have a relationship independent of the other?

As you hold one constant, you in effect hold the other constant!

Each variable would have the same value for the dependent variable at each level, so the partial effect on the dependent variable for each may be 0.

[Diagram: Crime, Education, and Years Studying Math]
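One simple way to screen for this problem is to look at the correlation between the independent variables before running the model. The sketch below does that for the education and income values from the children example (assuming NumPy, and assuming the education row reads 20 then 12 in its third and fourth positions); the slides themselves do not prescribe a cutoff, so none is applied here.

```python
import numpy as np

education = np.array([12, 16, 20, 12, 9, 18, 16, 14, 9, 12, 12, 10, 20,
                      11, 9, 18, 16, 14, 9, 8, 12, 10, 20, 11, 9], dtype=float)
income = np.array([3, 4, 9, 5, 4, 12, 10, 1, 4, 3, 10, 4, 9, 4, 4,
                   12, 10, 6, 4, 1, 10, 3, 9, 2, 4], dtype=float)

# Pearson correlation between the two independent variables.
r = np.corrcoef(education, income)[0, 1]
print(f"r(education, income) = {r:.3f}")
```

A correlation close to 1 (or -1) would mean that holding one variable constant leaves almost no variation in the other.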

multiple regression35
Multiple Regression

Some solutions for multicollinearity:

Remove one of the variables

Create a scale out of the two variables (making one variable out of two)

Run separate models with each independent variable

[Diagram: Crime, Education, and Years Studying Math]

multiple regression36
Multiple Regression
  • Dummy Variables
  • They are simply dichotomous variables that are entered into regression. They have 0 – 1 coding where 0 = absence of something and 1 = presence of something. E.g., Female (0=M; 1=F) or Southern (0=Non-Southern; 1=Southern).

What are dummy variables?!

multiple regression37
Multiple Regression

Dummy Variables are especially nice because they allow us to use nominal variables in regression.

But YOU said we CAN’T do that!

A nominal variable has no rank or order, rendering the numerical coding scheme useless for regression.

multiple regression38
Multiple Regression
  • The way you use nominal variables in regression is by converting them to a series of dummy variables.

Recode into different

Nominal Variable: Race
  1 = White
  2 = Black
  3 = Other

Dummy Variables:
  1. White: 0 = Not White; 1 = White
  2. Black: 0 = Not Black; 1 = Black
  3. Other: 0 = Not Other; 1 = Other

multiple regression39
Multiple Regression
  • The way you use nominal variables in regression is by converting them to a series of dummy variables.

Recode into different

Nominal Variable: Religion
  1 = Catholic
  2 = Protestant
  3 = Jewish
  4 = Muslim
  5 = Other Religions

Dummy Variables:
  1. Catholic: 0 = Not Catholic; 1 = Catholic
  2. Protestant: 0 = Not Prot.; 1 = Protestant
  3. Jewish: 0 = Not Jewish; 1 = Jewish
  4. Muslim: 0 = Not Muslim; 1 = Muslim
  5. Other Religions: 0 = Not Other; 1 = Other Relig.

multiple regression40
Multiple Regression
  • When you need to use a nominal variable in regression (like race), just convert it to a series of dummy variables.
  • When you enter the variables into your model, you MUST LEAVE OUT ONE OF THE DUMMIES.

Leave Out One: White

Enter Rest into Regression: Black, Other

multiple regression41
Multiple Regression
  • The reason you MUST LEAVE OUT ONE OF THE DUMMIES is that regression is mathematically impossible without an excluded group.
  • If all were in, holding one of them constant would prohibit variation in all the rest.

Leave Out One: Catholic

Enter Rest into Regression: Protestant, Jewish, Muslim, Other Religion
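Why one dummy has to be left out can be seen from the design matrix. In this sketch the race codes are hypothetical cases invented for illustration, and NumPy is assumed. With an intercept plus all three dummies, the dummy columns always add up to the intercept column, so the matrix has fewer independent columns than coefficients.

```python
import numpy as np

# Hypothetical cases coded 1 = White, 2 = Black, 3 = Other.
race = np.array([1, 2, 3, 1, 2, 3, 1, 1])

# Recode the nominal variable into a series of 0/1 dummies.
white = (race == 1).astype(float)
black = (race == 2).astype(float)
other = (race == 3).astype(float)
intercept = np.ones(len(race))

# All three dummies entered: white + black + other equals the intercept column.
all_in = np.column_stack([intercept, white, black, other])
# White left out as the excluded (reference) group.
one_out = np.column_stack([intercept, black, other])

print(all_in.shape[1], np.linalg.matrix_rank(all_in))    # 4 columns, rank 3
print(one_out.shape[1], np.linalg.matrix_rank(one_out))  # 3 columns, rank 3
```

With the excluded group, every column carries separate information and each slope is read as a difference from that group.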

multiple regression42
Multiple Regression
  • The regression equations for dummies will look the same.

For Race, with 3 dummies, predicting self-esteem:

Y = a + b1X1 + b2X2

a = the y-intercept, which in this case is the predicted value of self-esteem for the excluded group, white.

b1 = the slope for variable X1, black

b2 = the slope for variable X2, other

multiple regression43
Multiple Regression
  • If our equation were:

For Race, with 3 dummies, predicting self-esteem:

Y = 28 + 5X1 – 2X2

Plugging in values for the dummies tells you each group’s self-esteem average:

White = 28

Black = 33

Other = 26

a = the y-intercept, which in this case is the predicted value of self-esteem for the excluded group, white.

5 = the slope for variable X1, black

-2 = the slope for variable X2, other

When cases’ values for X1 = 0 and X2 = 0, they are white;

when X1 = 1 and X2 = 0, they are black;

when X1 = 0 and X2 = 1, they are other.

multiple regression44
Multiple Regression
  • Dummy variables can be entered into multiple regression along with other dichotomous and continuous variables.
  • For example, you could regress self-esteem on sex, race, and education:

Y = a + b1X1 + b2X2 + b3X3 + b4X4

How would you interpret this?

Y = 30 – 4X1 + 5X2 – 2X3 + 0.3X4

X1 = Female

X2 = Black

X3 = Other

X4 = Education

multiple regression45
Multiple Regression

How would you interpret this?

Y = 30 – 4X1 + 5X2 – 2X3 + 0.3X4

  • Women’s self-esteem is 4 points lower than men’s.
  • Blacks’ self-esteem is 5 points higher than whites’.
  • Others’ self-esteem is 2 points lower than whites’ and consequently 7 points lower than blacks’.
  • Each year of education improves self-esteem by 0.3 units.

X1 = Female

X2 = Black

X3 = Other

X4 = Education

multiple regression46
Multiple Regression

How would you interpret this?

Y = 30 – 4X1 + 5X2 – 2X3 + 0.3X4

Plugging in some select values, we’d get self-esteem for select groups:

  • White males with 10 years of education = 33
  • Black males with 10 years of education = 38
  • Other females with 10 years of education = 27
  • Other females with 16 years of education = 28.8

X1 = Female

X2 = Black

X3 = Other

X4 = Education
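The group values listed on this slide can be reproduced by plugging 0s and 1s into the equation. A minimal sketch (the function name is illustrative):

```python
def self_esteem(female: int, black: int, other: int, education: float) -> float:
    """Predicted self-esteem from Y = 30 - 4*Female + 5*Black - 2*Other + 0.3*Education."""
    return 30 - 4 * female + 5 * black - 2 * other + 0.3 * education

# White is the excluded group, so white cases have black = 0 and other = 0.
print(round(self_esteem(0, 0, 0, 10), 1))  # white males, 10 years of education
print(round(self_esteem(0, 1, 0, 10), 1))  # black males, 10 years of education
print(round(self_esteem(1, 0, 1, 10), 1))  # other females, 10 years of education
print(round(self_esteem(1, 0, 1, 16), 1))  # other females, 16 years of education
```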

multiple regression47
Multiple Regression

How would you interpret this?

Y = 30 – 4X1 + 5X2 – 2X3 + 0.3X4

The same regression rules apply. The slopes represent the linear relationship of each independent variable in relation to the dependent while holding all other variables constant.

X1 = Female

X2 = Black

X3 = Other

X4 = Education

Make sure you get into the habit of saying the slope is the effect of an independent variable “while holding everything else constant.”

multiple regression48
Multiple Regression

How would you interpret this?

Y = 30 – 4X1 + 5X2 – 2X3 + 0.3X4

The same regression rules apply…

R2 tells you the proportion of variation in your dependent variable that is explained by your independent variables.

The significance tests tell you whether your null hypotheses are to be rejected or not. If they are rejected, you have a low probability that your sample could have come from a population where the slope equals zero.

X1 = Female

X2 = Black

X3 = Other

X4 = Education

multiple regression49
Multiple Regression

Interactions

Another very important concept in multiple regression is “interaction,” where two variables have a joint effect on the dependent variable. The relationship between X1 and Y is affected by the value each person has on X2.

For example:

Wages (Y) are decreased by being black (X1), and wages (Y) are decreased by being female (X2). However, being a black woman (X1* X2) increases wages relative to being a black man.

multiple regression50
Multiple Regression
  • One models interactions by creating a new variable that is the cross product of the two variables that may be interacting, and placing this variable into the equation with the original two.
  • Without interaction, male and female slopes create parallel lines, as do black and white.
  • Wages = 28k - 3k*Black - 1k*Female

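[Figure: two panels of predicted wages. By Black (0, 1): the line for men falls from 28k to 25k and the line for women from 27k to 24k, in parallel. By Female (0, 1): the line for whites falls from 28k to 27k and the line for blacks from 25k to 24k, in parallel.]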

multiple regression51
Multiple Regression
  • One models interactions by creating a new variable that is the cross product of the two variables that may be interacting, and placing this variable into the equation with the original two.
  • With interaction, male and female slopes do not have to be parallel, nor do black and white slopes.
  • Wages = 28k - 3k*Black - 1k*Female + 2k*Black*Female

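[Figure: two panels of predicted wages. By Black (0, 1): the line for men falls from 28k to 25k and the line for women from 27k to 26k. By Female (0, 1): the line for whites falls from 28k to 27k while the line for blacks rises from 25k to 26k.]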

multiple regression52
Multiple Regression
  • Let’s look at another example…
  • Sex and Education may affect Wages as such:

Wages = 20k - 1k*Female + .3k*Education

But there is reason to think that men get a higher payout for education than women.

With the interaction, the equation may be:

Wages = 19k - 1k*F + .4k*Educ - .2k*F*Educ


multiple regression53
Multiple Regression

With the interaction, the equation may be:

Wages = 19k - 1k*F + .4k*Educ - .2k*F*Educ

[Figure: Wages (axis marked at 20k and 30k) plotted against Education (0, 10, 20), with separate lines for men and women.]

The results show different slopes for the increase in wages for women and men as education increases.
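The different slopes follow directly from the interaction equation. A short sketch, working in thousands of dollars (the function name is illustrative):

```python
def wages(female: int, educ: float) -> float:
    """Predicted wages (in $1,000s): 19 - 1*F + .4*Educ - .2*F*Educ."""
    return 19 - 1 * female + 0.4 * educ - 0.2 * female * educ

# The payout for one more year of education, by sex.
slope_men = wages(0, 11) - wages(0, 10)     # .4 - .2*0
slope_women = wages(1, 11) - wages(1, 10)   # .4 - .2*1

print(round(slope_men, 2), round(slope_women, 2))
print(round(wages(0, 20), 1), round(wages(1, 20), 1))  # predictions at 20 years
```

The interaction coefficient (-.2k) is the difference between the women's and men's education slopes.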

multiple regression54
Multiple Regression
  • When one suspects that interactions may be occurring in the social world, it is appropriate to test for them.
  • To test for an interaction, enter an “interaction term” into the regression along with the original two variables.
  • If the interaction slope is significant, you have interaction in the population. Report that!
  • If the slope is not significant, remove the interaction term from your model.
multiple regression55
Multiple Regression

Standardized Coefficients

  • Sometimes you want to know whether one variable has a larger impact on your dependent variable than another.
  • If your variables have different units of measure, it is hard to compare their effects.
  • For example, if wages go up one thousand dollars for each year of education, is that a greater effect than if wages go up five hundred dollars for each year increase in age?
multiple regression56
Multiple Regression

Standardized Coefficients

  • So which is better for increasing wages, education or aging?
  • One thing you can do is “standardize” your slopes so that you can compare the standard deviation increase in your dependent variable for each standard deviation increase in your independent variables.
  • You might find that Wages go up 0.3 standard deviations for each standard deviation increase in education, but 0.4 standard deviations for each standard deviation increase in age.
multiple regression57
Multiple Regression

Standardized Coefficients

  • Recall that standardizing regression coefficients is accomplished by the formula: b(Sx/Sy)
  • In the example above, education and income have very comparable effects on number of children.
  • Each lowers the number of children by .4 standard deviations for a standard deviation increase in each, controlling for the other.
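The b(Sx/Sy) formula can be applied to the children example as a rough check. This sketch assumes NumPy and the 25 cases from the data slide (with the education row read as 20 then 12 in its third and fourth positions); variable names are illustrative.

```python
import numpy as np

children = np.array([2, 5, 1, 9, 6, 3, 0, 3, 7, 7, 2, 5, 1, 9, 6,
                     3, 0, 3, 7, 14, 2, 5, 1, 9, 6], dtype=float)
education = np.array([12, 16, 20, 12, 9, 18, 16, 14, 9, 12, 12, 10, 20,
                      11, 9, 18, 16, 14, 9, 8, 12, 10, 20, 11, 9], dtype=float)
income = np.array([3, 4, 9, 5, 4, 12, 10, 1, 4, 3, 10, 4, 9, 4, 4,
                   12, 10, 6, 4, 1, 10, 3, 9, 2, 4], dtype=float)

X = np.column_stack([np.ones(len(children)), education, income])
a, b1, b2 = np.linalg.lstsq(X, children, rcond=None)[0]

# Standardized slope = b * (Sx / Sy), using sample standard deviations.
sy = children.std(ddof=1)
beta_education = b1 * education.std(ddof=1) / sy
beta_income = b2 * income.std(ddof=1) / sy

print(round(beta_education, 2), round(beta_income, 2))
```

Both standardized slopes come out at roughly -.4, in line with the statement that the two effects are comparable.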
multiple regression58
Multiple Regression

Standardized Coefficients

  • One last note of caution...
    • It does not make sense to standardize slopes for dichotomous variables.
    • It makes no sense to refer to standard deviation increases in sex, or in race--these are either 0 or they are 1 only.
multiple regression59
Multiple Regression

Okay, we’re almost

through with regression!

multiple regression60
Multiple Regression

Nested Models

  • “Nested models” refers to starting with a smaller set of independent variables and adding sets of variables in stages.
  • Keeping the models smaller achieves parsimony, simplest explanation.
  • Sometimes it makes sense to see whether adding a new set of variables improves your model’s explanatory power (increases R2).
  • For example, you know that sex, race, education and age affect wages. Would adding self-esteem and self-efficacy help explain wages even better?
multiple regression61
Multiple Regression

Nested Models

Y = a + b1X1 + b2X2 + b3X3 Reduced Model

Y = a + b1X1 + b2X2 + b3X3 + b4X4 + b5X5 Complete Model

  • You should start by seeing whether the coefficients are significant.
  • Another test, to see if they jointly improve your model, is the change in R2 test (which you can request from SPSS)

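$$F = \frac{(R^2_c - R^2_r) / df_1}{(1 - R^2_c) / df_2}$$

where R2c and R2r are the R2 values of the complete and reduced models, df1 = the number of extra slopes in the complete model, and df2 = N - (the number of slopes in the complete model + 1).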
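A minimal sketch of the change in R2 test statistic, assuming the usual degrees of freedom (numerator: number of extra slopes; denominator: N minus the number of slopes in the complete model, minus one). The function name and the example numbers are hypothetical, chosen only so the arithmetic is easy to follow.

```python
def r2_change_f(r2_complete: float, r2_reduced: float,
                extra_slopes: int, n: int, slopes_complete: int) -> float:
    """F statistic for whether the extra variables jointly improve the model."""
    df1 = extra_slopes
    df2 = n - (slopes_complete + 1)
    return ((r2_complete - r2_reduced) / df1) / ((1 - r2_complete) / df2)

# Hypothetical example: R2 rises from .40 to .50 when 2 variables are added
# to a model that then has 2 slopes in total, with N = 103 cases (df2 = 100).
print(round(r2_change_f(0.50, 0.40, extra_slopes=2, n=103, slopes_complete=2), 2))
```

The resulting F would then be compared with the critical value for (df1, df2) degrees of freedom, which SPSS reports as a significance level.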

multiple regression62
Multiple Regression

Nested Models with Change in R2

Dependent Variable: How often does S attend religious services. Higher values equal more often.

Model 1        Model 2
Female         Female
White (W=1)    White
Black (B=1)    Black
Age            Age
               Education

multiple regression65
Multiple Regression
  • Females attend services more often than males.
  • Blacks attend services more often than whites and others.
  • Older persons attend services more often than younger persons.
  • The more educated a person is, the more often he or she attends religious services.
  • Education adds to the explanatory power of the model.
  • Only five to six percent of the variation in religious service attendance is explained by our models.
multiple regression66
Multiple Regression

Give yourself a hand…

You now understand more statistics than 99% of the population!

You are well-qualified for understanding most sociological research papers.

multiple regression67
Multiple Regression
  • A final note about your papers…
    • You must include sex, race, age, income, and education in your regressions--these are standard sociological controls.
    • Recall that race will need to be recoded into a series of dichotomous variables.
    • Your final paper must include 1) a descriptive statistics table, 2) a correlation table, and 3) a regression table. Some better papers may have other analyses such as independent samples t-tests or ANOVA’s or Chi-squared tests.
    • I encourage you to use nested regression models.
multiple regression68
Multiple Regression
  • A final note about your papers…
    • For income, you should use the variable, “income98.” It is the best measure in the data set for income.
    • You will need to recode income98 to make the categories of income more even (so that your badly coded variable does not affect your interpretation of the relationship between income and your dependent variable).
    • One way to recode is:

1 = $0 - 9,999
2 = $10,000 - 19,999
3 = $20,000 - 29,999
4 = $30,000 - 39,999
5 = $40,000 - 49,999
6 = $50,000 - 59,999
7 = $60,000 - 74,999
8 = $75,000 - 89,999
9 = $90,000 - 109,999
10 = $110,000 or over
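The ten categories can be expressed as a lookup on their lower bounds. This is only an illustrative sketch: the helper below is hypothetical and takes a dollar amount, whereas income98 in the data set is already stored as category codes, so in practice the recode would map the old codes onto these ten.

```python
from bisect import bisect_right

# Lower bounds (in dollars) of categories 2 through 10.
LOWER_BOUNDS = [10_000, 20_000, 30_000, 40_000, 50_000,
                60_000, 75_000, 90_000, 110_000]

def income_category(dollars: float) -> int:
    """Return the recoded income category (1-10) for a dollar amount."""
    return bisect_right(LOWER_BOUNDS, dollars) + 1

for amount in (0, 9_999, 10_000, 74_999, 75_000, 110_000):
    print(amount, income_category(amount))
```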
