Statistics for Business and Economics
Chapter 11 Multiple Regression and Model Building
Learning Objectives
Explain the Linear Multiple Regression Model
Describe Inference About Individual Parameters
Test Overall Significance
Explain Estimation and Prediction
Types of Regression Models
[Diagram: regression models with 1 explanatory variable are simple; models with 2+ explanatory variables are multiple. Each kind may be linear or non-linear.]
Types of Regression Models
[Diagram: models by explanatory variables. 1 quantitative variable: 1st-order, 2nd-order, and 3rd-order models. 2 or more quantitative variables: 1st-order, interaction, and 2nd-order models. 1 qualitative variable: dummy-variable model.]
Linear Multiple Regression Model
Relationship between 1 dependent and 2 or more independent variables is a linear function:
y = β0 + β1x1 + β2x2 + … + βkxk + ε
y: dependent (response) variable
x1, …, xk: independent (explanatory) variables
β0: population Y-intercept
β1, …, βk: population slopes
ε: random error
No Interaction
E(y) = 1 + 2x1 + 3x2
x2 = 3: E(y) = 1 + 2x1 + 3(3) = 10 + 2x1
x2 = 2: E(y) = 1 + 2x1 + 3(2) = 7 + 2x1
x2 = 1: E(y) = 1 + 2x1 + 3(1) = 4 + 2x1
x2 = 0: E(y) = 1 + 2x1 + 3(0) = 1 + 2x1
[Graph: E(y) (0 to 12) against x1 (0 to 1.5), one line for each value of x2; the four lines are parallel.]
Effect (slope) of x1 on E(y) does not depend on x2 value.
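The no-interaction claim can be checked numerically. The sketch below evaluates the slide's example E(y) = 1 + 2x1 + 3x2 at each x2 level; the function names are illustrative and not part of the original slides.

```python
def expected_y(x1, x2):
    """E(y) = 1 + 2*x1 + 3*x2, the no-interaction example from the slide."""
    return 1 + 2 * x1 + 3 * x2


def slope_in_x1(x2, x1_low=0.0, x1_high=1.0):
    """Change in E(y) per unit change in x1, holding x2 fixed."""
    return (expected_y(x1_high, x2) - expected_y(x1_low, x2)) / (x1_high - x1_low)


x2_levels = (0, 1, 2, 3)
slopes = {x2: slope_in_x1(x2) for x2 in x2_levels}
intercepts = {x2: expected_y(0, x2) for x2 in x2_levels}

for x2 in x2_levels:
    print(f"x2 = {x2}: intercept = {intercepts[x2]}, slope in x1 = {slopes[x2]}")
```

Only the intercept moves with x2; the slope in x1 stays the same, which is why the lines are parallel.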
You work in advertising for the New York Times. You want to find the effect of ad size (sq. in.) and newspaper circulation (000) on the number of ad responses (00). Estimate the unknown parameters.
You’ve collected the following data:
Resp (y)   Size (x1)   Circ (x2)
   1           1           2
   4           8           8
   1           3           1
   3           5           7
   2           6           4
   4          10           6
Parameter Estimation Computer Output
                  Parameter   Standard   T for H0:
Variable   DF     Estimate    Error      Param=0    Prob>|T|
INTERCEP    1      0.0640      0.2599     0.246      0.8214
ADSIZE      1      0.2049      0.0588     3.484      0.0399
CIRC        1      0.2805      0.0686     4.089      0.0264
The Parameter Estimate column gives β̂0 = 0.0640 (INTERCEP), β̂1 = 0.2049 (ADSIZE), and β̂2 = 0.2805 (CIRC).
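The parameter estimates can be reproduced from the six data points with ordinary least squares. The following is a minimal sketch using only the Python standard library; it solves the normal equations directly, which is fine for a small teaching example but is not necessarily how the statistical package computed its output. The helper names are illustrative.

```python
def solve_linear_system(a, b):
    """Solve a*z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_least_squares(rows, y):
    """Least-squares estimates from the normal equations (X'X) b = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


resp = [1, 4, 1, 3, 2, 4]      # y: ad responses (00)
size = [1, 8, 3, 5, 6, 10]     # x1: ad size (sq. in.)
circ = [2, 8, 1, 7, 4, 6]      # x2: circulation (000)

# Each design row is [1, x1, x2]; the leading 1 carries the intercept.
design = [[1.0, s, c] for s, c in zip(size, circ)]
b0, b1, b2 = fit_least_squares(design, resp)
print(round(b0, 4), round(b1, 4), round(b2, 4))
```

The three printed values should agree with the INTERCEP, ADSIZE, and CIRC estimates in the output, up to rounding.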
For a model with k independent variables, the estimator of σ² is s² = SSE / (n − (k + 1)), and s = √s² estimates σ.
You work in advertising for the New York Times. You want to find the effect of ad size (sq. in.), x1, and newspaper circulation (000), x2, on the number of ad responses (00), y. Find SSE, s², and s.
Analysis of Variance Computer Output
Source            DF   SS          MS          F       P
Regression         2   9.249736    4.624868    55.44   .0043
Residual Error     3    .250264     .083421
Total              5   9.5
SSE = .250264 (Residual Error SS), s² = .083421 (Residual Error MS), and s = √.083421 ≈ .2888.
You work in advertising for the New York Times. You want to find the effect of ad size (sq. in.), x1, and newspaper circulation (000), x2, on the number of ad responses (00), y. Find R² and Ra².
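As a rough check, s², s, R², and Ra² can all be computed from the analysis of variance output. The sketch below assumes the standard definitions R² = 1 − SSE/SS(Total) and Ra² = 1 − [(n − 1)/(n − (k + 1))](1 − R²); the variable names are illustrative.

```python
import math

n, k = 6, 2              # 6 observations, 2 independent variables
ss_total = 9.5           # Total SS from the ANOVA output
sse = 0.250264           # Residual Error SS from the ANOVA output

s2 = sse / (n - (k + 1))                            # estimator of sigma^2
s = math.sqrt(s2)                                   # estimator of sigma
r2 = 1 - sse / ss_total                             # coefficient of determination
r2_adj = 1 - (n - 1) / (n - (k + 1)) * (1 - r2)     # adjusted for model size

print(f"s2 = {s2:.6f}, s = {s:.4f}")
print(f"R2 = {r2:.4f}, Ra2 = {r2_adj:.4f}")
```

Ra² is smaller than R² because it penalizes the two parameters spent on only six observations.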
You work in advertising for the New York Times. You want to find the effect of ad size (sq. in.), x1, and newspaper circulation (000), x2, on the number of ad responses (00), y. Find a 95% confidence interval for β1.
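One way to work this interval is β̂1 ± t(.025, 3 df) × (standard error of β̂1), using the ADSIZE row of the parameter estimates output. In the sketch below the critical value 3.182 is taken from a standard t table rather than from the slides, and the variable names are illustrative.

```python
b1_hat = 0.2049      # ADSIZE parameter estimate from the computer output
se_b1 = 0.0588       # ADSIZE standard error from the computer output
t_crit = 3.182       # t_.025 with n - (k + 1) = 6 - 3 = 3 df (from a t table)

margin = t_crit * se_b1
lower, upper = b1_hat - margin, b1_hat + margin
print(f"95% CI for beta1: ({lower:.4f}, {upper:.4f})")
```

The interval lies entirely above zero, which is consistent with the two-tailed p-value of 0.0399 reported for ADSIZE.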
You work in advertising for the New York Times. You want to find the effect of ad size (sq. in.), x1, and newspaper circulation (000), x2, on the number of ad responses (00), y. Test the hypothesis that the mean ad response increases as circulation increases (ad size constant). Use α = .05.
Hypothesis Test Solution
H0: β2 = 0
Ha: β2 > 0
α = .05
df = 6 − 3 = 3
Critical value: t = 2.353 (reject H0 if t > 2.353)
Test Statistic: t = β̂2 / s(β̂2) = 0.2805 / 0.0686 = 4.089
Decision: Reject at α = .05
Conclusion: There is evidence the mean ad response increases as circulation increases
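The test statistic can be recomputed from the CIRC row of the parameter estimates output and compared with the one-tailed critical value shown on the slide. This is a small sketch with illustrative variable names.

```python
b2_hat = 0.2805      # CIRC parameter estimate from the computer output
se_b2 = 0.0686       # CIRC standard error from the computer output
t_crit = 2.353       # upper-tail critical value for alpha = .05, 3 df (from the slide)

t_stat = b2_hat / se_b2
reject_h0 = t_stat > t_crit    # one-tailed test of Ha: beta2 > 0

print(f"t = {t_stat:.3f}, reject H0: {reject_h0}")
```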
You work in advertising for the New York Times. You want to find the effect of ad size (sq. in.), x1, and newspaper circulation (000), x2, on the number of ad responses (00), y. Conduct the global F-test of model usefulness. Use α = .05.
Testing Overall Significance Solution
H0: β1 = β2 = 0
Ha: At least 1 not zero
α = .05
ν1 = 2, ν2 = 3
Critical value: F = 9.55 (reject H0 if F > 9.55)
Test Statistic: F = MS(Model) / MS(Error) = 55.44
Decision: Reject at α = .05
Conclusion: There is evidence at least 1 of the coefficients is not zero
Testing Overall Significance Computer Output
                  Sum of     Mean
Source     DF     Squares    Square     F Value    Prob>F
Model       2     9.2497     4.6249     55.440     0.0043
Error       3     0.2503     0.0834
C Total     5     9.5000
Model DF = k = 2; Error DF = n − (k + 1) = 3; F Value = MS(Model) / MS(Error); Prob>F is the P-value.
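The F statistic is the ratio of the two mean squares, each a sum of squares divided by its degrees of freedom. The sketch below recomputes it from the rounded values in the output, so it agrees with the printed 55.440 only up to rounding; the variable names are illustrative.

```python
n, k = 6, 2
ss_model, ss_error = 9.2497, 0.2503    # sums of squares from the computer output
f_crit = 9.55                          # critical value for alpha = .05, df (2, 3), from the slide

ms_model = ss_model / k                # MS(Model) = SS(Model) / k
ms_error = ss_error / (n - (k + 1))    # MS(Error) = SSE / (n - (k + 1))
f_stat = ms_model / ms_error
reject_h0 = f_stat > f_crit

print(f"F = {f_stat:.2f}, reject H0: {reject_h0}")
```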
Interaction Model With 2 Independent Variables
Contains two-way cross product terms:
E(y) = β0 + β1x1 + β2x2 + β3x1x2
Interaction Model Relationships
Given: E(y) = 1 + 2x1 + 3x2 + 4x1x2
x2 = 1: E(y) = 1 + 2x1 + 3(1) + 4x1(1) = 4 + 6x1
x2 = 0: E(y) = 1 + 2x1 + 3(0) + 4x1(0) = 1 + 2x1
[Graph: E(y) (0 to 12) against x1 (0 to 1.5), one line for each value of x2; the lines have different slopes.]
Effect (slope) of x1 on E(y) depends on x2 value.
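With the cross-product term present, the slope in x1 becomes 2 + 4x2. The sketch below evaluates the slide's example E(y) = 1 + 2x1 + 3x2 + 4x1x2 at the two x2 levels; the function names are illustrative.

```python
def expected_y(x1, x2):
    """E(y) = 1 + 2*x1 + 3*x2 + 4*x1*x2, the interaction example from the slide."""
    return 1 + 2 * x1 + 3 * x2 + 4 * x1 * x2


def slope_in_x1(x2, x1_low=0.0, x1_high=1.0):
    """Change in E(y) per unit change in x1, holding x2 fixed."""
    return (expected_y(x1_high, x2) - expected_y(x1_low, x2)) / (x1_high - x1_low)


slopes = {x2: slope_in_x1(x2) for x2 in (0, 1)}
intercepts = {x2: expected_y(0, x2) for x2 in (0, 1)}

for x2 in (0, 1):
    print(f"x2 = {x2}: E(y) = {intercepts[x2]} + {slopes[x2]:g}*x1")
```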
Case, i   yi   x1i   x2i   x1i x2i
   1       1    1     3       3
   2       4    8     5      40
   3       1    3     2       6
   4       3    5     6      30
   :       :    :     :       :
Multiply x1 by x2 to get x1x2. Run regression with y, x1, x2, x1x2.
You work in advertising for the New York Times. You want to find the effect of ad size (sq. in.), x1, and newspaper circulation (000), x2, on the number of ad responses (00), y. Conduct a test for interaction. Use α = .05.
yi   x1i   x2i   x1i x2i
 1    1     2       2
 4    8     8      64
 1    3     1       3
 3    5     7      35
 2    6     4      24
 4   10     6      60
Multiply x1 by x2 to get x1x2. Run regression with y, x1, x2, x1x2.
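The worksheet step can be sketched in a few lines: build the cross-product column, then assemble one design row per observation for the regression on x1, x2, and x1x2. The variable names are illustrative, and the regression itself is left to whatever package is used.

```python
resp = [1, 4, 1, 3, 2, 4]      # y: ad responses (00)
size = [1, 8, 3, 5, 6, 10]     # x1: ad size (sq. in.)
circ = [2, 8, 1, 7, 4, 6]      # x2: circulation (000)

# Interaction column: element-wise product of x1 and x2.
size_x_circ = [s * c for s, c in zip(size, circ)]

# Design rows [1, x1, x2, x1*x2]; the leading 1 carries the intercept.
design_rows = [[1, s, c, sc] for s, c, sc in zip(size, circ, size_x_circ)]

for y, row in zip(resp, design_rows):
    print(y, row)
```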
Interaction Test Solution
H0: β3 = 0
Ha: β3 ≠ 0
α = .05
df = n − (k + 1) = 6 − 4 = 2
Critical values: t = ±4.303 (α/2 = .025 in each tail)
Test Statistic: t = 1.8528
Decision: Do not reject at α = .05
Conclusion: There is no evidence of interaction
Second-Order Model With 1 Independent Variable
E(y) = β0 + β1x + β2x²
β1x: linear effect
β2x²: curvilinear effect
Case, i   yi   xi   xi²
   1       1    1     1
   2       4    8    64
   3       1    3     9
   4       3    5    25
   :       :    :     :
Create x² column. Run regression with y, x, x².
The data show the number of weeks employed and the number of errors made per day for a sample of assembly line workers. Find a 2nd-order model, conduct the global F-test, and test if β2 ≠ 0. Use α = .05 for all tests.
Errors (y)   Weeks (x)
    20           1
    18           1
    16           2
    10           4
     8           4
     4           5
     3           6
     1           8
     2          10
     1          11
     0          12
     1          12
yi   xi   xi²
20    1     1
18    1     1
16    2     4
10    4    16
 :    :     :
Create x² column. Run regression with y, x, x².
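The second-order fit for the errors and weeks data can be sketched with the standard library alone: create the x² column, fit by least squares, and compare the SSE with that of a straight-line fit. Solving the normal equations directly is an illustrative choice, not necessarily the method behind any package output, and the helper names are assumptions.

```python
def solve_linear_system(a, b):
    """Solve a*z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_least_squares(rows, y):
    """Least-squares estimates from the normal equations (X'X) b = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def sum_squared_errors(rows, y, coef):
    """SSE: sum of squared differences between observed and fitted y."""
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


weeks = [1, 1, 2, 4, 4, 5, 6, 8, 10, 11, 12, 12]     # x
errors = [20, 18, 16, 10, 8, 4, 3, 1, 2, 1, 0, 1]    # y
weeks_sq = [x ** 2 for x in weeks]                   # the created x^2 column

linear_rows = [[1.0, x] for x in weeks]
quad_rows = [[1.0, x, x2] for x, x2 in zip(weeks, weeks_sq)]

linear_coef = fit_least_squares(linear_rows, errors)
quad_coef = fit_least_squares(quad_rows, errors)
sse_linear = sum_squared_errors(linear_rows, errors, linear_coef)
sse_quad = sum_squared_errors(quad_rows, errors, quad_coef)

print("2nd-order estimates (b0, b1, b2):", [round(c, 4) for c in quad_coef])
print("SSE, 1st-order:", round(sse_linear, 3), " SSE, 2nd-order:", round(sse_quad, 3))
```

The global F-test and the t-test for β2 then follow the same steps as in the ad-response example, with n = 12 and k = 2.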
Second-order model with 2 independent variables: E(y) = β0 + β1x1 + β2x2 + β3x1x2 + β4x1² + β5x2²
Case, i   yi   x1i   x2i   x1i x2i   x1i²   x2i²
   1       1    1     3       3        1      9
   2       4    8     5      40       64     25
   3       1    3     2       6        9      4
   4       3    5     6      30       25     36
   :       :    :     :       :        :      :
Multiply x1 by x2 to get x1x2; then create x1², x2². Run regression with y, x1, x2, x1x2, x1², x2².
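The column construction for the second-order worksheet can be sketched as a small helper that expands each (x1, x2) pair into the full set of terms. The function name is illustrative; the four cases are the ones listed in the worksheet.

```python
def second_order_row(x1, x2):
    """Design row [1, x1, x2, x1*x2, x1^2, x2^2] for a complete second-order model."""
    return [1, x1, x2, x1 * x2, x1 ** 2, x2 ** 2]


# (x1, x2) for the four cases shown in the worksheet.
cases = [(1, 3), (8, 5), (3, 2), (5, 6)]
rows = [second_order_row(x1, x2) for x1, x2 in cases]

for case_number, row in enumerate(rows, start=1):
    print(case_number, row)
```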
Case, i   yi   x1i   x2i
   1       1    1     1
   2       4    8     0
   3       1    3     1
   4       3    5     1
   :       :    :     :
x2 levels: 0 = Group 1; 1 = Group 2. Run regression with y, x1, x2.
Interpreting Dummy-Variable Model Equation
Given: E(y) = β0 + β1x1 + β2x2
y = Starting salary of college graduates
x1 = GPA
x2 = 0 if Male, 1 if Female
Male (x2 = 0): E(y) = β0 + β1x1 + β2(0) = β0 + β1x1
Female (x2 = 1): E(y) = β0 + β1x1 + β2(1) = (β0 + β2) + β1x1
Dummy-Variable Model Example
Computer Output:
x2 = 0 if Male, 1 if Female
Male (x2 = 0):
Female (x2 = 1):
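In a dummy-variable model without interaction, the two groups share the same slope in GPA and differ only in intercept, by the coefficient on x2. The sketch below illustrates this with coefficient values that are hypothetical and chosen only for illustration; they are not the fitted values from the slides, which did not survive in this transcript.

```python
def expected_salary(gpa, is_female, b0, b1, b2):
    """E(y) = b0 + b1*x1 + b2*x2, with x2 = 0 for male and 1 for female."""
    x2 = 1 if is_female else 0
    return b0 + b1 * gpa + b2 * x2


# Hypothetical coefficients, for illustration only (not from the source).
b0, b1, b2 = 20.0, 5.0, 2.0

gaps = []
for gpa in (2.0, 3.0, 4.0):
    male = expected_salary(gpa, False, b0, b1, b2)
    female = expected_salary(gpa, True, b0, b1, b2)
    gaps.append(female - male)
    print(f"GPA {gpa}: male = {male}, female = {female}, difference = {female - male}")
```

The difference equals b2 at every GPA, so the two fitted lines are parallel.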
A butterfly flaps its wings in Japan, which causes it to rain in Nebraska. -- Anonymous
Use Theory Only!
Use Computer Search!
Residual Plot for Equal Variance
[Plots of residuals ê against x. Unequal variance: fan-shaped pattern. Correct specification: no pattern.]
Standardized residuals are typically used.
Residual Plot for Independence
[Plots of residuals ê against x. Left panel: not independent. Right panel: correct specification.]
Plots reflect the sequence in which the data were collected.
Residual Analysis Computer Output
                  Predicted               Student
Obs    SALES      Value       Residual    Residual    -2-1-0 1 2
  1    1.0000     0.6000       0.4000       1.044     |    |**  |
  2    1.0000     1.3000      -0.3000      -0.592     |   *|    |
  3    2.0000     2.0000       0            0.000     |    |    |
  4    2.0000     2.7000      -0.7000      -1.382     |  **|    |
  5    4.0000     3.4000       0.6000       1.567     |    |*** |
The right-hand column is a plot of the standardized (student) residuals.
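The Residual column is the observed value minus the predicted value, which can be checked directly. The sketch below also screens the student residuals against a cutoff of 2 in absolute value; that cutoff is a common rule of thumb assumed here for illustration, not a rule stated in the slides.

```python
sales = [1.0, 1.0, 2.0, 2.0, 4.0]                      # observed SALES
predicted = [0.6, 1.3, 2.0, 2.7, 3.4]                  # predicted values from the output
student = [1.044, -0.592, 0.000, -1.382, 1.567]        # student residuals from the output

# Residual = observed - predicted, rounded to the output's 4 decimals.
residuals = [round(obs - fit, 4) for obs, fit in zip(sales, predicted)]

# Assumed screening rule: flag observations with |student residual| > 2.
flagged = [i + 1 for i, r in enumerate(student) if abs(r) > 2]

print("residuals:", residuals)
print("sum of residuals:", round(sum(residuals), 4))
print("flagged observations:", flagged)
```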