Statistics for Business and Economics. Chapter 11 Multiple Regression and Model Building. Learning Objectives. Explain the Linear Multiple Regression Model Describe Inference About Individual Parameters Test Overall Significance Explain Estimation and Prediction
Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.
Statistics for Business and Economics
Chapter 11 Multiple Regression and Model Building
RegressionModels
1 Explanatory
2+ Explanatory
Variable
Variables
Multiple
Simple
Non-
Non-
Linear
Linear
Linear
Linear
Models With Two or More Quantitative Variables
RegressionModels
1 Explanatory
2+ Explanatory
Variable
Variables
Multiple
Simple
Non-
Non-
Linear
Linear
Linear
Linear
Probability Distribution of Random Error
Linear Multiple Regression Model
Explanatory
Variable
1
2 or More
1
Quantitative
Quantitative
Qualitative
Variable
Variables
Variable
1st
2nd
3rd
1st
Inter-
2nd
Dummy
Order
Order
Order
Order
Action
Order
Variable
Model
Model
Model
Model
Model
Model
Model
Relationship between 1 dependent and 2 or more independent variables is a linear function
Population Y-intercept
Population slopes
Random error
Dependent (response) variable
Independent (explanatory) variables
Bivariate model:
y
(Observed y)
b
Response
e
0
i
Plane
x2
x1
(x1i , x2i)
Bivariate model:
y
(Observed y)
^
b
Response
0
^
e
Plane
i
x2
x1
(x1i , x2i)
E(y) = 1 + 2x1 + 3(3) = 10 + 2x1
E(y) = 1 + 2x1 + 3(2) = 7 + 2x1
E(y) = 1 + 2x1 + 3(1) = 4 + 2x1
E(y) = 1 + 2x1 + 3(0) = 1 + 2x1
E(y) = 1 + 2x1 + 3x2
E(y)
12
8
4
0
x1
0
0.5
1
1.5
Effect (slope) of x1 on E(y) does not depend on x2 value
Parameter Estimation
Case, i
yi
x1i
x2i
1
1
1
3
2
4
8
5
3
1
3
2
4
3
5
6
:
:
:
:
Run regression with y, x1, x2
Too complicated by hand!
Ouch!
^
^
^
^
You work in advertising for the New York Times. You want to find the effect of ad size(sq. in.) and newspaper circulation (000) on the number of ad responses (00). Estimate the unknown parameters.
You’ve collected the following data:
(y) (x1) (x2)RespSizeCirc
1124881313572644106
Parameter Estimates
Parameter Standard T for H0:
Variable DF Estimate Error Param=0 Prob>|T|
INTERCEP 1 0.0640 0.2599 0.246 0.8214
ADSIZE 1 0.2049 0.0588 3.656 0.0399
CIRC 1 0.2805 0.0686 4.089 0.0264
^
0
^
^
1
2
^
^
Estimation of σ2
For a model with k independent variables
You work in advertising for the New York Times. You want to find the effect of ad size(sq. in.), x1, and newspaper circulation (000), x2, on the number of ad responses (00), y. Find SSE, s2, and s.
Analysis of Variance
Source DF SS MS F PRegression 2 9.249736 4.624868 55.44 .0043 Residual Error 3 .250264 .083421Total 5 9.5
SSE
S2
Evaluating the Model
Variation Measures
You work in advertising for the New York Times. You want to find the effect of ad size(sq. in.), x1, and newspaper circulation (000), x2, on the number of ad responses (00), y. Find R2 and Ra2.
R2
Ra2
Testing Parameters
df = n – (k + 1)
You work in advertising for the New York Times. You want to find the effect of ad size(sq. in.), x1, and newspaper circulation (000), x2, on the number of ad responses (00), y. Find a 95% confidence interval for β1.
You work in advertising for the New York Times. You want to find the effect of ad size(sq. in.), x1, and newspaper circulation (000), x2, on the number of ad responses (00), y. Test the hypothesis that the mean ad response increases as circulation increases (ad size constant). Use α = .05.
2 = 0
2> 0
.05
6 - 3 = 3
Reject H0
.05
t
0
2.353
Test Statistic:
Decision:
Conclusion:
2 = 0
2> 0
.05
6 - 3 = 3
Reject H0
.05
t
0
2.353
Test Statistic:
Decision:
Conclusion:
Reject at = .05
There is evidence the mean ad response increases as circulation increases
P–Value
You work in advertising for the New York Times. You want to find the effect of ad size(sq. in.), x1, and newspaper circulation (000), x2, on the number of ad responses (00), y. Conduct the global F–test of model usefulness. Use α = .05.
β1 = β2 = 0
At least 1 not zero
.05
23
= .05
F
0
9.55
Test Statistic:
Decision:
Conclusion:
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Prob>F
Model 2 9.2497 4.6249 55.440 0.0043
Error 3 0.2503 0.0834
C Total 5 9.5000
k
MS(Model)
n – (k + 1)
MS(Error)
β1 = β2 = 0
At least 1 not zero
.05
23
= .05
F
0
9.55
Test Statistic:
Decision:
Conclusion:
Reject at = .05
There is evidence at least 1 of the coefficients is not zero
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Prob>F
Model 2 9.2497 4.6249 55.440 0.0043
Error 3 0.2503 0.0834
C Total 5 9.5000
P-Value
MS(Model) MS(Error)
Interaction Models
Explanatory
Variable
1
2 or More
1
Quantitative
Quantitative
Qualitative
Variable
Variables
Variable
1st
2nd
3rd
1st
Inter-
2nd
Dummy
Order
Order
Order
Order
Action
Order
Variable
Model
Model
Model
Model
Model
Model
Model
Given:
E(y) = 1 + 2x1 + 3(1) + 4x1(1) = 4 + 6x1
E(y) = 1 + 2x1 + 3(0) + 4x1(0) = 1 + 2x1
E(y) = 1 + 2x1 + 3x2 + 4x1x2
E(y)
12
8
4
x1
0
0
0.5
1
1.5
Effect (slope) of x1 on E(y) depends on x2 value
Case, i
yi
x1i
x2i
x1i x2i
1
1
1
3
3
2
4
8
5
40
3
1
3
2
6
4
3
5
6
30
:
:
:
:
:
Multiply x1by x2 to get x1x2. Run regression with y, x1, x2 , x1x2
You work in advertising for the New York Times. You want to find the effect of ad size(sq. in.), x1, and newspaper circulation (000), x2, on the number of ad responses (00), y. Conduct a test for interaction. Use α = .05.
yi
x1i
x2i
x1i x2i
1
1
2
2
4
8
8
64
1
3
1
3
3
5
7
35
2
6
4
24
4
10
6
60
Multiply x1by x2 to get x1x2. Run regression with y, x1, x2 , x1x2
Global F–test indicates at least one parameter is not zero
F
P-Value
3 = 0
3 ≠ 0
.05
6 - 2 = 4
Reject H0
Reject H0
.025
.025
t
-2.776
0
2.776
Test Statistic:
Decision:
Conclusion:
3 = 0
3 ≠ 0
.05
6 - 2 = 4
Reject H0
Reject H0
.025
.025
t
-2.776
0
2.776
Test Statistic:
Decision:
Conclusion:
t = 1.8528
Do no reject at = .05
There is no evidence of interaction
Second–Order Models
Explanatory
Variable
1
2 or More
1
Quantitative
Quantitative
Qualitative
Variable
Variables
Variable
1st
2nd
3rd
1st
Inter-
2nd
Dummy
Order
Order
Order
Order
Action
Order
Variable
Model
Model
Model
Model
Model
Model
Model
Curvilinear effect
Linear effect
2 > 0
2 > 0
y
y
x1
x1
2 < 0
2 < 0
y
y
x1
x1
2
Case, i
yi
xi
xi
1
1
1
1
2
4
8
64
3
1
3
9
4
3
5
25
:
:
:
:
Create x2 column. Run regression with y, x, x2.
Errors (y) Weeks (x)201 181 16210484453618210111012112
The data shows the number of weeks employed and the number of errors made per day for a sample of assembly line workers. Find a 2nd order model, conduct the global F–test, and test if β2 ≠ 0. Use α = .05 for all tests.
2
yi
xi
xi
1
1
20
1
1
18
2
4
16
4
16
10
:
:
:
Create x2 column. Run regression with y, x, x2.
Global F–test indicates at least one parameter is not zero
F
P-Value
P-Value
β2 test indicates curvilinear relationship exists
t
Explanatory
Variable
1
2 or More
1
Quantitative
Quantitative
Qualitative
Variable
Variables
Variable
1st
2nd
3rd
1st
Inter-
2nd
Dummy
Order
Order
Order
Order
Action
Order
Variable
Model
Model
Model
Model
Model
Model
Model
4 + 5 > 0
4 + 5 < 0
y
y
x2
x1
y
32 > 4 45
x2
x1
x2
x1
2
2
Case, i
yi
x1i
x2i
x1ix2i
x1i
x2i
1
1
1
3
3
1
9
2
4
8
5
40
64
25
3
1
3
2
6
9
4
4
3
5
6
30
25
36
:
:
:
:
:
:
:
Multiply x1by x2 to get x1x2; then create x12, x22. Run regression with y, x1, x2 , x1x2, x12, x22.
Models With One Qualitative Independent Variable
Explanatory
Variable
1
2 or More
1
Quantitative
Quantitative
Qualitative
Variable
Variables
Variable
1st
2nd
3rd
1st
Inter-
2nd
Dummy
Order
Order
Order
Order
Action
Order
Variable
Model
Model
Model
Model
Model
Model
Model
Case, i
yi
x1i
x2i
1
1
1
1
2
4
8
0
3
1
3
1
4
3
5
1
:
:
:
:
x2 levels: 0 = Group 1; 1 = Group 2. Run regression with y, x1, x2
Same slopes
Male ( x2 = 0 ):
Female ( x2 = 1 ):
Given:
y = Starting salary of college graduates
x1 = GPA
0 if Male
x2 =
1 if Female
Same slopes
Male ( x2 = 0 ):
Female ( x2 = 1 ):
Computer Output:
0 if Male
x2 =
1 if Female
^
y
Same Slopes 1
Female
^
^
0 + 2
Male
^
0
x1
0
0
Nested Models
Selecting Variables in Model Building
A butterfly flaps its wings in Japan, which causes it to rain in Nebraska. -- Anonymous
Use Theory Only!
Use Computer Search!
Residual Analysis
^
^
e
e
x
x
Add x2 Term
Correct Specification
^
^
e
e
x
x
Unequal Variance
Correct Specification
Fan-shaped.Standardized residuals used typically.
^
e
x
Not Independent
Correct Specification
^
e
x
Plots reflect sequence data were collected.
Dep Var Predict Student
Obs SALES Value Residual Residual -2-1-0 1 2
1 1.0000 0.6000 0.4000 1.044 | |** |
2 1.0000 1.3000 -0.3000 -0.592 | *| |
3 2.0000 2.0000 0 0.000 | | |
4 2.0000 2.7000 -0.7000 -1.382 | **| |
5 4.0000 3.4000 0.6000 1.567 | |*** |
Plot of standardized (student) residuals
Regression Pitfalls
y
Interpolation
Extrapolation
Extrapolation
x
Sampled Range