Multiple Regression
The test you choose depends on the level of measurement:
Independent Variable                     Dependent Variable           Test
Dichotomous                              Continuous or Dichotomous    Independent Samples t-test
Nominal or Dichotomous                   Nominal or Dichotomous       Cross Tabs
Nominal or Dichotomous                   Continuous or Dichotomous    ANOVA
Continuous or Dichotomous                Continuous                   Bivariate Regression/Correlation
Two or More Continuous or Dichotomous    Continuous                   Multiple Regression
A sociologist may be interested in the relationship between Education and Income and Number of Children in a family.
Independent Variables: Education, Family Income
Dependent Variable: Number of Children
Case: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
Children (Y): 2 5 1 9 6 3 0 3 7 7 2 5 1 9 6 3 0 3 7 14 2 5 1 9 6
Education (X1): 12 16 20 12 9 18 16 14 9 12 12 10 20 11 9 18 16 14 9 8 12 10 20 11 9
Income, 1 = $10K (X2): 3 4 9 5 4 12 10 1 4 3 10 4 9 4 4 12 10 6 4 1 10 3 9 2 4
[Figure: Plotted coordinates (1 – 10) for Education (X1), Income (X2) and Number of Children (Y).]
What multiple regression does is fit a plane to these coordinates.
Case: 1 2 3 4 5 6 7 8 9 10
Children (Y): 2 5 1 9 6 3 0 3 7 7
Education (X1): 12 16 20 12 9 18 16 14 9 12
Income, 1 = $10K (X2): 3 4 9 5 4 12 10 1 4 3
Y = a + b1X1 + b2X2
a = y-intercept, the value of Y where the X’s equal zero
b = coefficient, or slope, for each variable
For our problem, SPSS says the equation is:
Y = 11.8 – .36X1 – .40X2
Expected # of Children = 11.8 – .36*Educ – .40*Income
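As a cross-check on the SPSS output, here is a minimal sketch that fits the same plane by ordinary least squares with NumPy. It assumes the 25 cases are exactly the values in the data table (reading the fused "2012" as 20 and 12); the variable names are only illustrative.

```python
import numpy as np

# The 25 cases from the data table (Education case 3 = 20, case 4 = 12).
children = np.array([2, 5, 1, 9, 6, 3, 0, 3, 7, 7, 2, 5, 1, 9, 6,
                     3, 0, 3, 7, 14, 2, 5, 1, 9, 6], dtype=float)
education = np.array([12, 16, 20, 12, 9, 18, 16, 14, 9, 12, 12, 10, 20, 11, 9,
                      18, 16, 14, 9, 8, 12, 10, 20, 11, 9], dtype=float)
income = np.array([3, 4, 9, 5, 4, 12, 10, 1, 4, 3, 10, 4, 9, 4, 4,
                   12, 10, 6, 4, 1, 10, 3, 9, 2, 4], dtype=float)

# Design matrix: a column of ones for the intercept a, then X1 and X2.
X = np.column_stack([np.ones_like(children), education, income])

# Least squares picks the a, b1, b2 that minimize the squared vertical
# distances between the observed Y values and the plane.
coef, *_ = np.linalg.lstsq(X, children, rcond=None)
a, b1, b2 = coef
print(f"a = {a:.2f}, b1 = {b1:.3f}, b2 = {b2:.3f}")
```

The three estimates should round to the 11.8, –.36 and –.40 reported on the slide.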
57% of the variation in number of children is explained by education and income!
R² = [Σ(Y – Ȳ)² – Σ(Y – Ŷ)²] / Σ(Y – Ȳ)²
R² = 161.518 ÷ 281.76 = .573
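The R² arithmetic can be retraced from the raw data. This is a sketch under the same assumption as before (the 25 cases in the data table, with "2012" read as 20 and 12); it refits the plane and splits the variation in Y into total and unexplained parts.

```python
import numpy as np

children = np.array([2, 5, 1, 9, 6, 3, 0, 3, 7, 7, 2, 5, 1, 9, 6,
                     3, 0, 3, 7, 14, 2, 5, 1, 9, 6], dtype=float)
education = np.array([12, 16, 20, 12, 9, 18, 16, 14, 9, 12, 12, 10, 20, 11, 9,
                      18, 16, 14, 9, 8, 12, 10, 20, 11, 9], dtype=float)
income = np.array([3, 4, 9, 5, 4, 12, 10, 1, 4, 3, 10, 4, 9, 4, 4,
                   12, 10, 6, 4, 1, 10, 3, 9, 2, 4], dtype=float)

X = np.column_stack([np.ones_like(children), education, income])
coef, *_ = np.linalg.lstsq(X, children, rcond=None)
predicted = X @ coef                                    # Y-hat for each case

total_ss = np.sum((children - children.mean()) ** 2)    # sum of (Y - Y-bar)^2
error_ss = np.sum((children - predicted) ** 2)          # sum of (Y - Y-hat)^2
explained_ss = total_ss - error_ss
r_squared = explained_ss / total_ss

print(f"total = {total_ss:.2f}, explained = {explained_ss:.2f}, R2 = {r_squared:.3f}")
```

The total sum of squares comes out at 281.76 and R² at about .573, in line with the 57% quoted above.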
So what does our equation tell us?
Try “plugging in” some values for your variables.
If Education equals:   If Income equals:   Then, children equals:
0                      0                   11.8
10                     0                   8.2
10                     10                  4.2
20                     10                  0.6
20                     11                  0.2
Holding Education constant at 1 while Income varies:
If Education equals:   If Income equals:   Then, children equals:
1                      0                   11.44
1                      1                   11.04
1                      5                   9.44
1                      10                  7.44
1                      15                  5.44
Holding Income constant at 1 while Education varies:
If Education equals:   If Income equals:   Then, children equals:
0                      1                   11.40
1                      1                   11.04
5                      1                   9.60
10                     1                   7.80
15                     1                   6.00
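The plugging-in exercise is easy to automate. The sketch below uses the slide's rounded equation, with both slopes negative; the function name expected_children is only illustrative.

```python
def expected_children(education, income):
    """Expected number of children from the rounded equation on the slide."""
    return 11.8 - 0.36 * education - 0.40 * income

# (Education, Income) pairs taken from the three plug-in tables.
tables = {
    "both vary": [(0, 0), (10, 0), (10, 10), (20, 10), (20, 11)],
    "Education held at 1": [(1, 0), (1, 1), (1, 5), (1, 10), (1, 15)],
    "Income held at 1": [(0, 1), (1, 1), (5, 1), (10, 1), (15, 1)],
}

for label, rows in tables.items():
    print(label)
    for educ, inc in rows:
        y_hat = expected_children(educ, inc)
        print(f"  Educ = {educ:>2}  Income = {inc:>2}  children = {y_hat:.2f}")
```

Each extra unit of Income lowers the prediction by .40 whatever the Education value, and each extra year of Education lowers it by .36 whatever the Income value.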
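If graphed, holding one variable constant produces a two-dimensional graph for the other variable.
[Figure, two panels. X2 = Income, with Education held at 1: Y falls from 11.44 at Income = 0 to 5.44 at Income = 15 (b = –.40). X1 = Education, with Income held at 1: Y falls from 11.40 at Education = 0 to 6.00 at Education = 15 (b = –.36).]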
Education → Crime Rate (+)
Y = 51.3 + 1.5X1
Urbanization is related to both: Urbanization → Education (+) and Urbanization → Crime Rate (+).
Regression Controlling for Urbanization
Education → Crime Rate (–); Urbanization → Crime Rate (+)
Y = 58.9 – .6X1 + .7X2, where X1 = Education and X2 = Urbanization
[Figure: Crime against Education, showing the original regression line and a separate line looking at each level of urbanization: Rural, Small town, Suburban, City.]
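To see how a slope can change sign once a third variable is controlled, here is a small simulation. The data are synthetic and are NOT the crime data behind the slide: by construction, urbanization pushes both education and crime up, while education itself pushes crime down. The helper name ols and all the numbers are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Synthetic data (an assumption for illustration, not the slide's data).
urbanization = rng.normal(size=n)
education = urbanization + 0.5 * rng.normal(size=n)
crime = 1.5 * urbanization - 0.6 * education + 0.3 * rng.normal(size=n)

def ols(y, *predictors):
    """Return [intercept, slope1, slope2, ...] from ordinary least squares."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

bivariate = ols(crime, education)                  # crime on education alone
controlled = ols(crime, education, urbanization)   # adding urbanization

print("education slope, alone:", round(bivariate[1], 2))
print("education slope, controlling for urbanization:", round(controlled[1], 2))
print("urbanization slope:", round(controlled[2], 2))
```

With this setup the education slope is positive on its own and negative once urbanization enters the model, the same pattern as the two crime equations.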
A sociologist may be interested in the effects of Education, Income, Sex, and Gender Attitudes on Number of Children in a family.
Independent Variables: Education, Family Income, Sex, Gender Attitudes
Dependent Variable: Number of Children
For our problem, our equation could be:
Y = 7.5 – .30X1 – .40X2 + 0.5X3 + 0.25X4
E(Children) = 7.5 – .30*Educ – .40*Income + 0.5*Sex + 0.25*Gender Att.
So what does our equation tell us?
Education:   Income:   Sex:   Gender Att:   Children:
10           5         0      0             2.5
10           5         0      5             3.75
10           10        0      5             1.75
10           5         1      0             3.0
10           5         1      5             4.25
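The same plug-in check works with four independent variables. This sketch keeps the slopes in a dictionary so each one is visible; the key names (educ, income, sex, gender_att) are illustrative labels, not names from the original analysis.

```python
INTERCEPT = 7.5
SLOPES = {"educ": -0.30, "income": -0.40, "sex": 0.5, "gender_att": 0.25}

def expected_children(**values):
    """Intercept plus each slope times the value plugged in for its variable."""
    return INTERCEPT + sum(SLOPES[name] * value for name, value in values.items())

# The five rows of the plug-in table.
rows = [
    dict(educ=10, income=5, sex=0, gender_att=0),
    dict(educ=10, income=5, sex=0, gender_att=5),
    dict(educ=10, income=10, sex=0, gender_att=5),
    dict(educ=10, income=5, sex=1, gender_att=0),
    dict(educ=10, income=5, sex=1, gender_att=5),
]
for row in rows:
    print(row, "->", round(expected_children(**row), 2))
```

Moving Gender Attitudes from 0 to 5 adds 1.25 children in both the Sex = 0 rows and the Sex = 1 rows, which is what "holding the other variables constant" means for a slope of 0.25.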
Each variable, holding the other variables constant, has a linear, two-dimensional graph of its relationship with the dependent variable.
Here we hold every other variable constant at “zero.”
[Figure, two panels. X1 = Education: Y falls from 7.5 at Education = 0 to 4.5 at Education = 10 (b = –.30). X2 = Income: Y falls from 7.5 at Income = 0 to 3.5 at Income = 10 (b = –.40).]
[Figure, two panels, again holding every other variable constant at “zero.” X3 = Sex: Y rises from 7.5 at Sex = 0 to 8 at Sex = 1 (b = .5). X4 = Gender Attitudes: Y rises from 7.5 at Gender Attitudes = 0 to 8.75 at Gender Attitudes = 5 (b = .25).]
Average deviation from the regression shape.
Controlling for other variables means finding how one variable affects the dependent variable at each level of the other variables.
So what if two of your independent variables were highly correlated with each other?
(This is the problem called multicollinearity.)
[Figure: Crime, Education, and Urbanization, with Education = Urbanization.]
How would one have a relationship independent of the other?
As you hold one constant, you in effect hold the other constant!
Within each level of one variable, the other barely varies, so it has almost no variation left with which to predict the dependent variable, and the partial effect of each on the dependent variable may be 0.
[Figure: Crime, Education, and Years Studying Math, with Education = Years Studying Math.]
Some solutions for multicollinearity:
Remove one of the variables
Create a scale out of the two variables (making one variable out of two)
Run separate models with each independent variable
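The extreme case, two independent variables that are perfectly correlated, can be checked numerically. In this sketch the data are hypothetical: every person's years studying math simply equals their years of education. The design matrix then has three columns but only two independent ones, so no unique pair of partial slopes exists.

```python
import numpy as np

# Hypothetical values: Years Studying Math is identical to Education.
education = np.array([8.0, 10, 12, 12, 14, 16, 18, 20])
years_math = education.copy()

# Intercept column plus the two (identical) predictors.
X = np.column_stack([np.ones_like(education), education, years_math])

correlation = np.corrcoef(education, years_math)[0, 1]
rank = np.linalg.matrix_rank(X)

print("correlation between the two predictors:", round(correlation, 3))
print("columns in X:", X.shape[1], "| independent columns:", rank)
```

Removing one of the two variables, or combining them into a single scale, brings the number of independent columns back in line with the number of slopes being estimated.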
What are dummy variables?!
Dummy Variables are especially nice because they allow us to use nominal variables in regression.
But YOU said we CAN’T do that!
A nominal variable has no rank or order, rendering the numerical coding scheme useless for regression.
Recode a Nominal Variable into different Dummy Variables
Race (1 = White, 2 = Black, 3 = Other) becomes:
1. White: 0 = Not White; 1 = White
2. Black: 0 = Not Black; 1 = Black
3. Other: 0 = Not Other; 1 = Other
The same recoding for Religion (1 = Catholic, 2 = Protestant, 3 = Jewish, 4 = Muslim, 5 = Other Religions):
1. Catholic: 0 = Not Catholic; 1 = Catholic
2. Protestant: 0 = Not Prot.; 1 = Protestant
3. Jewish: 0 = Not Jewish; 1 = Jewish
4. Muslim: 0 = Not Muslim; 1 = Muslim
5. Other Religions: 0 = Not Other; 1 = Other Relig.
Leave Out One, Enter Rest into Regression
Race: leave out White; enter Black and Other.
Religion: leave out Catholic; enter Protestant, Jewish, Muslim, and Other Religion.
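The recode-and-leave-one-out step can be sketched in a few lines. The six race codes below are a hypothetical sample, using the slide's coding of 1 = White, 2 = Black, 3 = Other.

```python
# Hypothetical sample of race codes: 1 = White, 2 = Black, 3 = Other.
race = [1, 2, 3, 1, 2, 1]

# One dummy per category: 1 if the case is in the category, 0 if not.
white = [1 if code == 1 else 0 for code in race]
black = [1 if code == 2 else 0 for code in race]
other = [1 if code == 3 else 0 for code in race]

# Leave out one dummy (here White) and enter the rest into the regression.
predictors = list(zip(black, other))
print(predictors)
# [(0, 0), (1, 0), (0, 1), (0, 0), (1, 0), (0, 0)]
```

Every case has at most one 1 across the entered dummies, and the left-out group is the one with all zeros.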
For Race, with 3 dummies, predicting self-esteem:
Y = a + b1X1 + b2X2
a = the y-intercept, which in this case is the predicted value of self-esteem for the excluded group, white.
b1 = the slope for variable X1, black
b2 = the slope for variable X2, other
Y = 28 + 5X1 – 2X2
28 = the y-intercept, which in this case is the predicted value of self-esteem for the excluded group, white.
5 = the slope for variable X1, black
–2 = the slope for variable X2, other
When cases’ values for X1 = 0 and X2 = 0, they are white;
when X1 = 1 and X2 = 0, they are black;
when X1 = 0 and X2 = 1, they are other.
Plugging in values for the dummies tells you each group’s self-esteem average:
White = 28
Black = 33
Other = 26
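A regression on dummies reproduces group averages, and that can be checked directly. The nine self-esteem scores below are hypothetical, chosen so the group means are 28 (white), 33 (black) and 26 (other), matching the slide's example.

```python
import numpy as np

# Hypothetical scores: three white, three black, three other respondents.
self_esteem = np.array([27, 28, 29, 32, 33, 34, 25, 26, 27], dtype=float)
black = np.array([0, 0, 0, 1, 1, 1, 0, 0, 0], dtype=float)   # X1
other = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1], dtype=float)   # X2

# White is the excluded group, so it is represented by the intercept alone.
X = np.column_stack([np.ones_like(self_esteem), black, other])
coef, *_ = np.linalg.lstsq(X, self_esteem, rcond=None)
a, b1, b2 = coef

print("White mean:", round(a, 2))
print("Black mean:", round(a + b1, 2))
print("Other mean:", round(a + b2, 2))
```

The intercept is the excluded group's average, and each slope is the difference between a group's average and the excluded group's average.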
Y = a + b1X1 + b2X2 + b3X3 + b4X4
How would you interpret this?
Y = 30 – 4X1 + 5X2 – 2X3 + 0.3X4
X1 = Female
X2 = Black
X3 = Other
X4 = Education
Plugging in some select values, we’d get self-esteem for select groups.
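As an illustration of plugging in select values, the sketch below evaluates the equation for a few profiles. The profiles, and the choice of 12 years of education for all of them, are hypothetical and do not come from the original slide.

```python
def predicted_self_esteem(female, black, other, education):
    """Y = 30 - 4*Female + 5*Black - 2*Other + 0.3*Education."""
    return 30 - 4 * female + 5 * black - 2 * other + 0.3 * education

# Hypothetical profiles: (Female, Black, Other, Education).
profiles = {
    "white man, 12 years": (0, 0, 0, 12),
    "white woman, 12 years": (1, 0, 0, 12),
    "black man, 12 years": (0, 1, 0, 12),
    "black woman, 12 years": (1, 1, 0, 12),
    "other man, 12 years": (0, 0, 1, 12),
}
for label, values in profiles.items():
    print(label, "->", round(predicted_self_esteem(*values), 1))
```

With education fixed, women score 4 points below men of the same race, which is the Female slope read "while holding everything else constant."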
The same regression rules apply. Each slope represents the linear relationship between one independent variable and the dependent variable while holding all other variables constant.
Make sure you get into the habit of saying the slope is the effect of an independent variable “while holding everything else constant.”
The same regression rules apply…
R² tells you the proportion of variation in your dependent variable that is explained by your independent variables.
The significance tests tell you whether your null hypotheses are to be rejected or not. If they are rejected, you have a low probability that your sample could have come from a population where the slope equals zero.
Interactions
Another very important concept in multiple regression is “interaction,” where two variables have a joint effect on the dependent variable. The relationship between X1 and Y is affected by the value each person has on X2.
For example:
Wages (Y) are decreased by being black (X1), and wages (Y) are decreased by being female (X2). However, being a black woman (X1* X2) increases wages relative to being a black man.
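[Figure, without the interaction, by Black (0, 1) and Female (0, 1): white men 28k, white women 27k, black men 25k, black women 24k.]
[Figure, with the interaction, by Black (0, 1) and Female (0, 1): white men 28k, white women 27k, black men 25k, black women 26k.]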
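One way to write the black-by-female example as an equation is sketched below. The coefficients are an assumption: they are chosen to reproduce the four wage values that survive from the figure (28k, 27k, 25k, 26k), on the reading that 25k belongs to black men and 27k to white women. They are not taken from the original slides.

```python
def wages_k(black, female):
    """Wages in $1,000s with a Black*Female interaction (assumed coefficients)."""
    return 28 - 3 * black - 1 * female + 2 * black * female

for black in (0, 1):
    for female in (0, 1):
        race = "black" if black else "white"
        sex = "women" if female else "men"
        print(f"{race} {sex}: {wages_k(black, female)}k")
```

Being black lowers wages and being female lowers wages, but the positive interaction term means the female penalty disappears, and even reverses, among black workers.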
Wages = 20k – 1k*Female + .3k*Education
But there is reason to think that men get a higher payout for education than women.
With the interaction, the equation may be:
Wages = 19k – 1k*F + .4k*Educ – .2k*F*Educ
[Figure: Wages (20k to 30k) against Education (0 to 20), with separate lines for men and women.]
The results show different slopes for the increase in wages for women and men as education increases.
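The two slopes can be read off by evaluating the interaction equation separately for men (F = 0) and women (F = 1). A minimal sketch, with wages in $1,000s and an illustrative function name:

```python
def wages_k(female, educ):
    """Wages in $1,000s: 19 - 1*F + .4*Educ - .2*F*Educ."""
    return 19 - 1 * female + 0.4 * educ - 0.2 * female * educ

for educ in (0, 10, 20):
    print(f"Education = {educ:>2}: men {wages_k(0, educ):.1f}k, women {wages_k(1, educ):.1f}k")

# Payout for one more year of education, by sex.
men_slope = wages_k(0, 1) - wages_k(0, 0)
women_slope = wages_k(1, 1) - wages_k(1, 0)
print(f"men: {men_slope:.1f}k per year, women: {women_slope:.1f}k per year")
```

For men the education slope is .4k; for women it is .4k – .2k = .2k, so the gap between the two lines widens as education increases.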
Standardized Coefficients
Nested Models
Reduced Model: Y = a + b1X1 + b2X2 + b3X3
Complete Model: Y = a + b1X1 + b2X2 + b3X3 + b4X4 + b5X5
F = [(R²c – R²r) / df1] / [(1 – R²c) / df2]
df1 = number of extra slopes in the complete model
df2 = n – (number of slopes in the complete model + 1)
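The F statistic for comparing nested models is simple arithmetic once the two R² values are known. In this sketch the R² values and the sample size are hypothetical, and the function and argument names are illustrative; the denominator degrees of freedom use the standard n – (number of slopes in the complete model + 1).

```python
def nested_f(r2_complete, r2_reduced, n, slopes_complete, extra_slopes):
    """F test of whether the extra slopes in the complete model add to R2."""
    df1 = extra_slopes                 # number of extra slopes in the complete model
    df2 = n - (slopes_complete + 1)    # cases minus all estimated coefficients
    f = ((r2_complete - r2_reduced) / df1) / ((1 - r2_complete) / df2)
    return f, df1, df2

# Hypothetical example: reduced model with 3 slopes, complete model with 5.
f, df1, df2 = nested_f(r2_complete=0.50, r2_reduced=0.40,
                       n=103, slopes_complete=5, extra_slopes=2)
print(f"F = {f:.2f} with df1 = {df1} and df2 = {df2}")
```

A large F relative to the F distribution with (df1, df2) degrees of freedom suggests the extra variables improve the model beyond what chance would produce.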
Nested Models with Change in R²
Dependent Variable: How often does S attend religious services. Higher values equal more often.
Model 1: Female, White (W=1), Black (B=1), Age
Model 2: Female, White, Black, Age, Education
Give yourself a hand…
You now understand more statistics than 99% of the population!
You are well-qualified for understanding most sociological research papers.
1 = $0 – 9,999
2 = $10,000 – 19,999
3 = $20,000 – 29,999
4 = $30,000 – 39,999
5 = $40,000 – 49,999
6 = $50,000 – 59,999
7 = $60,000 – 74,999
8 = $75,000 – 89,999
9 = $90,000 – 109,999
10 = $110,000 or over