Statistics for business and economics
This presentation is the property of its rightful owner.
Sponsored Links
1 / 111

Statistics for Business and Economics PowerPoint PPT Presentation


  • 76 Views
  • Uploaded on
  • Presentation posted in: General

Statistics for Business and Economics. Chapter 11 Multiple Regression and Model Building. Learning Objectives. Explain the Linear Multiple Regression Model Describe Inference About Individual Parameters Test Overall Significance Explain Estimation and Prediction

Download Presentation

Statistics for Business and Economics

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Statistics for business and economics

Statistics for Business and Economics

Chapter 11 Multiple Regression and Model Building


Learning objectives

Learning Objectives

  • Explain the Linear Multiple Regression Model

  • Describe Inference About Individual Parameters

  • Test Overall Significance

  • Explain Estimation and Prediction

  • Describe Various Types of Models

  • Describe Model Building

  • Explain Residual Analysis

  • Describe Regression Pitfalls


Types of regression models

RegressionModels

1 Explanatory

2+ Explanatory

Variable

Variables

Multiple

Simple

Non-

Non-

Linear

Linear

Linear

Linear

Types of Regression Models


Models with two or more quantitative variables

Models With Two or More Quantitative Variables


Types of regression models1

RegressionModels

1 Explanatory

2+ Explanatory

Variable

Variables

Multiple

Simple

Non-

Non-

Linear

Linear

Linear

Linear

Types of Regression Models


Multiple regression model

Multiple Regression Model

  • General form:

  • k independent variables

  • x1, x2, …, xk may be functions of variables

    • e.g. x2 = (x1)2


Regression modeling steps

Regression Modeling Steps

  • Hypothesize deterministic component

  • Estimate unknown model parameters

  • Specify probability distribution of random error term

    • Estimate standard deviation of error

  • Evaluate model

  • Use model for prediction and estimation


Probability distribution of random error

Probability Distribution of Random Error


Regression modeling steps1

Regression Modeling Steps

  • Hypothesize deterministic component

  • Estimate unknown model parameters

  • Specify probability distribution of random error term

    • Estimate standard deviation of error

  • Evaluate model

  • Use model for prediction and estimation


Assumptions for probability distribution of

Assumptions for Probability Distribution of ε

  • Mean is 0

  • Constant variance, σ2

  • Normally Distributed

  • Errors are independent


Linear multiple regression model

Linear Multiple Regression Model


Types of regression models2

Explanatory

Variable

1

2 or More

1

Quantitative

Quantitative

Qualitative

Variable

Variables

Variable

1st

2nd

3rd

1st

Inter-

2nd

Dummy

Order

Order

Order

Order

Action

Order

Variable

Model

Model

Model

Model

Model

Model

Model

Types of Regression Models


Regression modeling steps2

Regression Modeling Steps

  • Hypothesize deterministic component

  • Estimate unknown model parameters

  • Specify probability distribution of random error term

    • Estimate standard deviation of error

  • Evaluate model

  • Use model for prediction and estimation


First order multiple regression model

First–Order Multiple Regression Model

Relationship between 1 dependent and 2 or more independent variables is a linear function

Population Y-intercept

Population slopes

Random error

Dependent (response) variable

Independent (explanatory) variables


First order model with 2 independent variables

First-Order Model With 2 Independent Variables

  • Relationship between 1 dependent and 2 independent variables is a linear function

  • Model

  • Assumes no interaction between x1 and x2

    • Effect of x1 on E(y) is the same regardless of x2 values


Population multiple regression model

Population Multiple Regression Model

Bivariate model:

y

(Observed y)

b

Response

e

0

i

Plane

x2

x1

(x1i , x2i)


Sample multiple regression model

Sample Multiple Regression Model

Bivariate model:

y

(Observed y)

^

b

Response

0

^

e

Plane

i

x2

x1

(x1i , x2i)


No interaction

E(y) = 1 + 2x1 + 3(3) = 10 + 2x1

E(y) = 1 + 2x1 + 3(2) = 7 + 2x1

E(y) = 1 + 2x1 + 3(1) = 4 + 2x1

E(y) = 1 + 2x1 + 3(0) = 1 + 2x1

No Interaction

E(y) = 1 + 2x1 + 3x2

E(y)

12

8

4

0

x1

0

0.5

1

1.5

Effect (slope) of x1 on E(y) does not depend on x2 value


Parameter estimation

Parameter Estimation


Regression modeling steps3

Regression Modeling Steps

  • Hypothesize Deterministic Component

  • Estimate Unknown Model Parameters

  • Specify Probability Distribution of Random Error Term

    • Estimate Standard Deviation of Error

  • Evaluate Model

  • Use Model for Prediction & Estimation


First order model worksheet

First-Order Model Worksheet

Case, i

yi

x1i

x2i

1

1

1

3

2

4

8

5

3

1

3

2

4

3

5

6

:

:

:

:

Run regression with y, x1, x2


Multiple linear regression equations

Multiple Linear Regression Equations

Too complicated by hand!

Ouch!


Interpretation of estimated coefficients

^

  • Slope (k)

    • Estimated y changes by k for each 1 unit increase in xkholding all other variables constant

      • Example: if 1 = 2, then sales (y) is expected to increase by 2 for each 1 unit increase in advertising (x1) given the number of sales rep’s (x2)

^

^

^

  • Y-Intercept (0)

    • Average value of y when xk = 0

Interpretation of Estimated Coefficients


1 st order model example

1st Order Model Example

You work in advertising for the New York Times. You want to find the effect of ad size(sq. in.) and newspaper circulation (000) on the number of ad responses (00). Estimate the unknown parameters.

You’ve collected the following data:

(y) (x1) (x2)RespSizeCirc

1124881313572644106


Parameter estimation computer output

Parameter Estimates

Parameter Standard T for H0:

Variable DF Estimate Error Param=0 Prob>|T|

INTERCEP 1 0.0640 0.2599 0.246 0.8214

ADSIZE 1 0.2049 0.0588 3.656 0.0399

CIRC 1 0.2805 0.0686 4.089 0.0264

^

0

^

^

1

2

Parameter Estimation Computer Output


Interpretation of coefficients solution

^

  • Slope (1)

    • Number of responses to ad is expected to increase by .2049 (20.49) for each 1 sq. in. increase in ad size holding circulation constant

^

  • Slope (2)

    • Number of responses to ad is expected to increase by .2805 (28.05) for each 1 unit (1,000) increase in circulationholding adsize constant

Interpretation of Coefficients Solution


Estimation of 2

Estimation of σ2


Regression modeling steps4

Regression Modeling Steps

  • Hypothesize Deterministic Component

  • Estimate Unknown Model Parameters

  • Specify Probability Distribution of Random Error Term

    • Estimate Standard Deviation of Error

  • Evaluate Model

  • Use Model for Prediction & Estimation


Estimation of 21

Estimation of σ2

For a model with k independent variables


Calculating s 2 and s example

Calculating s2 and s Example

You work in advertising for the New York Times. You want to find the effect of ad size(sq. in.), x1, and newspaper circulation (000), x2, on the number of ad responses (00), y. Find SSE, s2, and s.


Analysis of variance computer output

Analysis of Variance

Source DF SS MS F PRegression 2 9.249736 4.624868 55.44 .0043 Residual Error 3 .250264 .083421Total 5 9.5

SSE

S2

Analysis of Variance Computer Output


Evaluating the model

Evaluating the Model


Regression modeling steps5

Regression Modeling Steps

  • Hypothesize Deterministic Component

  • Estimate Unknown Model Parameters

  • Specify Probability Distribution of Random Error Term

    • Estimate Standard Deviation of Error

  • Evaluate Model

  • Use Model for Prediction & Estimation


Evaluating multiple regression model steps

Evaluating Multiple Regression Model Steps

  • Examine variation measures

  • Test parameter significance

    • Individual coefficients

    • Overall model

  • Do residual analysis


Variation measures

Variation Measures


Evaluating multiple regression model steps1

Evaluating Multiple Regression Model Steps

  • Examine variation measures

  • Test parameter significance

    • Individual coefficients

    • Overall model

  • Do residual analysis


Multiple coefficient of determination

Multiple Coefficient of Determination

  • Proportion of variation in y ‘explained’ by all x variables taken together

  • Never decreases when new x variable is added to model

    • Only y values determine SSyy

    • Disadvantage when comparing models


Adjusted multiple coefficient of determination

Adjusted Multiple Coefficient of Determination

  • Takes into account n and number of parameters

  • Similar interpretation to R2


Estimation of r 2 and r a 2 example

Estimation of R2 and Ra2 Example

You work in advertising for the New York Times. You want to find the effect of ad size(sq. in.), x1, and newspaper circulation (000), x2, on the number of ad responses (00), y. Find R2 and Ra2.


Excel computer output solution

R2

Ra2

Excel Computer OutputSolution


Testing parameters

Testing Parameters


Evaluating multiple regression model steps2

Evaluating Multiple Regression Model Steps

  • Examine variation measures

  • Test parameter significance

    • Individual coefficients

    • Overall model

  • Do residual analysis


Inference for an individual parameter

df = n – (k + 1)

Inference for an Individual β Parameter

  • Confidence Interval

  • Hypothesis TestHo: βi= 0Ha: βi≠ 0 (or < or > )

  • Test Statistic


Confidence interval example

Confidence Interval Example

You work in advertising for the New York Times. You want to find the effect of ad size(sq. in.), x1, and newspaper circulation (000), x2, on the number of ad responses (00), y. Find a 95% confidence interval for β1.


Excel computer output solution1

Excel Computer OutputSolution


Confidence interval solution

Confidence IntervalSolution


Hypothesis test example

Hypothesis Test Example

You work in advertising for the New York Times. You want to find the effect of ad size(sq. in.), x1, and newspaper circulation (000), x2, on the number of ad responses (00), y. Test the hypothesis that the mean ad response increases as circulation increases (ad size constant). Use α = .05.


Hypothesis test solution

2 = 0

2> 0

.05

6 - 3 = 3

Reject H0

.05

t

0

2.353

Hypothesis Test Solution

  • H0:

  • Ha:

  • 

  • df 

  • Critical Value(s):

Test Statistic:

Decision:

Conclusion:


Excel computer output solution2

Excel Computer OutputSolution


Hypothesis test solution1

2 = 0

2> 0

.05

6 - 3 = 3

Reject H0

.05

t

0

2.353

Hypothesis Test Solution

  • H0:

  • Ha:

  • 

  • df 

  • Critical Value(s):

Test Statistic:

Decision:

Conclusion:

Reject at  = .05

There is evidence the mean ad response increases as circulation increases


Excel computer output solution3

Excel Computer OutputSolution

P–Value


Evaluating multiple regression model steps3

Evaluating Multiple Regression Model Steps

  • Examine variation measures

  • Test parameter significance

    • Individual coefficients

    • Overall model

  • Do residual analysis


Testing overall significance

Testing Overall Significance

  • Shows if there is a linear relationship between allx variables together and y

  • Hypotheses

    • H0: 1 = 2 = ... = k = 0

      • No linear relationship

    • Ha: At least one coefficient is not 0

      • At least one x variable affects y


Testing overall significance1

Testing Overall Significance

  • Test Statistic

  • Degrees of Freedom1 = k2 = n – (k + 1)

    • k = Number of independent variables

    • n = Sample size


Testing overall significance example

Testing Overall Significance Example

You work in advertising for the New York Times. You want to find the effect of ad size(sq. in.), x1, and newspaper circulation (000), x2, on the number of ad responses (00), y. Conduct the global F–test of model usefulness. Use α = .05.


Testing overall significance solution

β1 = β2 = 0

At least 1 not zero

.05

23

 = .05

F

0

9.55

Testing Overall Significance Solution

  • H0:

  • Ha:

  •  =

  • 1 = 2 =

  • Critical Value(s):

Test Statistic:

Decision:

Conclusion:


Testing overall significance computer output

Analysis of Variance

Sum of Mean

Source DF Squares Square F Value Prob>F

Model 2 9.2497 4.6249 55.440 0.0043

Error 3 0.2503 0.0834

C Total 5 9.5000

k

Testing Overall SignificanceComputer Output

MS(Model)

n – (k + 1)

MS(Error)


Testing overall significance solution1

β1 = β2 = 0

At least 1 not zero

.05

23

 = .05

F

0

9.55

Testing Overall Significance Solution

  • H0:

  • Ha:

  •  =

  • 1 = 2 =

  • Critical Value(s):

Test Statistic:

Decision:

Conclusion:

Reject at  = .05

There is evidence at least 1 of the coefficients is not zero


Testing overall significance computer output solution

Analysis of Variance

Sum of Mean

Source DF Squares Square F Value Prob>F

Model 2 9.2497 4.6249 55.440 0.0043

Error 3 0.2503 0.0834

C Total 5 9.5000

P-Value

Testing Overall SignificanceComputer Output Solution

MS(Model) MS(Error)


Interaction models

Interaction Models


Types of regression models3

Explanatory

Variable

1

2 or More

1

Quantitative

Quantitative

Qualitative

Variable

Variables

Variable

1st

2nd

3rd

1st

Inter-

2nd

Dummy

Order

Order

Order

Order

Action

Order

Variable

Model

Model

Model

Model

Model

Model

Model

Types of Regression Models


Interaction model with 2 independent variables

  • Contains two-way cross product terms

Interaction Model With 2 Independent Variables

  • Hypothesizes interaction between pairs of x variables

    • Response to one x variable varies at different levels of another x variable

  • Can be combined with other models

    • Example: dummy-variable model


Effect of interaction

Effect of Interaction

Given:

  • Withoutinteraction term, effect of x1 on y is measured by 1

  • With interaction term, effect of x1 on y is measured by 1 + 3x2

    • Effect increases as x2 increases


Interaction model relationships

E(y) = 1 + 2x1 + 3(1) + 4x1(1) = 4 + 6x1

E(y) = 1 + 2x1 + 3(0) + 4x1(0) = 1 + 2x1

Interaction Model Relationships

E(y) = 1 + 2x1 + 3x2 + 4x1x2

E(y)

12

8

4

x1

0

0

0.5

1

1.5

Effect (slope) of x1 on E(y) depends on x2 value


Interaction model worksheet

Interaction Model Worksheet

Case, i

yi

x1i

x2i

x1i x2i

1

1

1

3

3

2

4

8

5

40

3

1

3

2

6

4

3

5

6

30

:

:

:

:

:

Multiply x1by x2 to get x1x2. Run regression with y, x1, x2 , x1x2


Interaction example

Interaction Example

You work in advertising for the New York Times. You want to find the effect of ad size(sq. in.), x1, and newspaper circulation (000), x2, on the number of ad responses (00), y. Conduct a test for interaction. Use α = .05.


Interaction model worksheet1

Interaction Model Worksheet

yi

x1i

x2i

x1i x2i

1

1

2

2

4

8

8

64

1

3

1

3

3

5

7

35

2

6

4

24

4

10

6

60

Multiply x1by x2 to get x1x2. Run regression with y, x1, x2 , x1x2


Excel computer output solution4

Excel Computer OutputSolution

Global F–test indicates at least one parameter is not zero

F

P-Value


Interaction test solution

3 = 0

3 ≠ 0

.05

6 - 2 = 4

Reject H0

Reject H0

.025

.025

t

-2.776

0

2.776

Interaction Test Solution

  • H0:

  • Ha:

  • 

  • df 

  • Critical Value(s):

Test Statistic:

Decision:

Conclusion:


Excel computer output solution5

Excel Computer OutputSolution


Interaction test solution1

3 = 0

3 ≠ 0

.05

6 - 2 = 4

Reject H0

Reject H0

.025

.025

t

-2.776

0

2.776

Interaction Test Solution

  • H0:

  • Ha:

  • 

  • df 

  • Critical Value(s):

Test Statistic:

Decision:

Conclusion:

t = 1.8528

Do no reject at  = .05

There is no evidence of interaction


Second order models

Second–Order Models


Types of regression models4

Explanatory

Variable

1

2 or More

1

Quantitative

Quantitative

Qualitative

Variable

Variables

Variable

1st

2nd

3rd

1st

Inter-

2nd

Dummy

Order

Order

Order

Order

Action

Order

Variable

Model

Model

Model

Model

Model

Model

Model

Types of Regression Models


Second order model with 1 independent variable

Curvilinear effect

Linear effect

Second-Order Model With 1 Independent Variable

  • Relationship between 1 dependent and 1 independent variable is a quadratic function

  • Useful 1st model if non-linear relationship suspected

  • Model


Second order model relationships

Second-Order Model Relationships

2 > 0

2 > 0

y

y

x1

x1

2 < 0

2 < 0

y

y

x1

x1


Second order model worksheet

Second-Order Model Worksheet

2

Case, i

yi

xi

xi

1

1

1

1

2

4

8

64

3

1

3

9

4

3

5

25

:

:

:

:

Create x2 column. Run regression with y, x, x2.


2 nd order model example

2nd Order Model Example

Errors (y) Weeks (x)201 181 16210484453618210111012112

The data shows the number of weeks employed and the number of errors made per day for a sample of assembly line workers. Find a 2nd order model, conduct the global F–test, and test if β2 ≠ 0. Use α = .05 for all tests.


Second order model worksheet1

Second-Order Model Worksheet

2

yi

xi

xi

1

1

20

1

1

18

2

4

16

4

16

10

:

:

:

Create x2 column. Run regression with y, x, x2.


Excel computer output solution6

Excel Computer Output Solution


Overall model test solution

Overall Model Test Solution

Global F–test indicates at least one parameter is not zero

F

P-Value


2 parameter test solution

P-Value

β2 Parameter Test Solution

β2 test indicates curvilinear relationship exists

t


Types of regression models5

Explanatory

Variable

1

2 or More

1

Quantitative

Quantitative

Qualitative

Variable

Variables

Variable

1st

2nd

3rd

1st

Inter-

2nd

Dummy

Order

Order

Order

Order

Action

Order

Variable

Model

Model

Model

Model

Model

Model

Model

Types of Regression Models


Second order model with 2 independent variables

Second-Order Model With 2 Independent Variables

  • Relationship between 1 dependent and 2 independent variables is a quadratic function

  • Useful 1st model if non-linear relationship suspected

  • Model


Second order model relationships1

4 + 5 > 0

4 + 5 < 0

y

y

x2

x1

y

32 > 4 45

Second-Order Model Relationships

x2

x1

x2

x1


Second order model worksheet2

Second-Order Model Worksheet

2

2

Case, i

yi

x1i

x2i

x1ix2i

x1i

x2i

1

1

1

3

3

1

9

2

4

8

5

40

64

25

3

1

3

2

6

9

4

4

3

5

6

30

25

36

:

:

:

:

:

:

:

Multiply x1by x2 to get x1x2; then create x12, x22. Run regression with y, x1, x2 , x1x2, x12, x22.


Models with one qualitative independent variable

Models With One Qualitative Independent Variable


Types of regression models6

Explanatory

Variable

1

2 or More

1

Quantitative

Quantitative

Qualitative

Variable

Variables

Variable

1st

2nd

3rd

1st

Inter-

2nd

Dummy

Order

Order

Order

Order

Action

Order

Variable

Model

Model

Model

Model

Model

Model

Model

Types of Regression Models


Dummy variable model

Dummy-Variable Model

  • Involves categorical x variable with 2 levels

    • e.g., male-female; college-no college

  • Variable levels coded 0 and 1

  • Number of dummy variables is 1 less than number of levels of variable

  • May be combined with quantitative variable (1st order or 2nd order model)


Dummy variable model worksheet

Dummy-Variable Model Worksheet

Case, i

yi

x1i

x2i

1

1

1

1

2

4

8

0

3

1

3

1

4

3

5

1

:

:

:

:

x2 levels: 0 = Group 1; 1 = Group 2. Run regression with y, x1, x2


Interpreting dummy variable model equation

Same slopes

Male ( x2 = 0 ):

Female ( x2 = 1 ):

Interpreting Dummy-Variable Model Equation

Given:

y = Starting salary of college graduates

x1 = GPA

0 if Male

x2 =

1 if Female


Dummy variable model example

Same slopes

Male ( x2 = 0 ):

Female ( x2 = 1 ):

Dummy-Variable Model Example

Computer Output:

0 if Male

x2 =

1 if Female


Dummy variable model relationships

Dummy-Variable Model Relationships

^

y

Same Slopes 1

Female

^

^

0 + 2

Male

^

0

x1

0

0


Nested models

Nested Models


Comparing nested models

Comparing Nested Models

  • Contains a subset of terms in the complete (full) model

  • Tests the contribution of a set of x variables to the relationship with y

  • Null hypothesis H0: g+1 = ... = k = 0

    • Variables in set do not improve significantly the model when all other variables are included

  • Used in selecting x variables or models

  • Part of most computer programs


Selecting variables in model building

Selecting Variables in Model Building


Selecting variables in model building1

Selecting Variables in Model Building

A butterfly flaps its wings in Japan, which causes it to rain in Nebraska. -- Anonymous

Use Theory Only!

Use Computer Search!


Model building with computer searches

Model Building with Computer Searches

  • Rule: Use as few x variables as possible

  • Stepwise Regression

    • Computer selects x variable most highly correlatedwith y

    • Continues to add or remove variables depending on SSE

  • Best subset approach

    • Computer examines all possible sets


Residual analysis

Residual Analysis


Evaluating multiple regression model steps4

Evaluating Multiple Regression Model Steps

  • Examine variation measures

  • Test parameter significance

    • Individual coefficients

    • Overall model

  • Do residual analysis


Residual analysis1

Residual Analysis

  • Graphical analysis of residuals

    • Plot estimated errors versus xi values

      • Difference between actual yiand predicted yi

      • Estimated errors are called residuals

    • Plot histogram or stem-&-leaf of residuals

  • Purposes

    • Examine functional form (linear v. non-linear model)

    • Evaluate violations of assumptions


Residual plot for functional form

^

^

e

e

x

x

Residual Plot for Functional Form

Add x2 Term

Correct Specification


Residual plot for equal variance

^

^

e

e

x

x

Residual Plot for Equal Variance

Unequal Variance

Correct Specification

Fan-shaped.Standardized residuals used typically.


Residual plot for independence

^

e

x

Residual Plot for Independence

Not Independent

Correct Specification

^

e

x

Plots reflect sequence data were collected.


Residual analysis computer output

Dep Var Predict Student

Obs SALES Value Residual Residual -2-1-0 1 2

1 1.0000 0.6000 0.4000 1.044 | |** |

2 1.0000 1.3000 -0.3000 -0.592 | *| |

3 2.0000 2.0000 0 0.000 | | |

4 2.0000 2.7000 -0.7000 -1.382 | **| |

5 4.0000 3.4000 0.6000 1.567 | |*** |

Residual Analysis Computer Output

Plot of standardized (student) residuals


Regression pitfalls

Regression Pitfalls


Regression pitfalls1

Regression Pitfalls

  • Parameter Estimability

    • Number of different x–values must be at least one more than order of model

  • Multicollinearity

    • Two or more x–variables in the model are correlated

  • Extrapolation

    • Predicting y–values outside sampled range

  • Correlated Errors


Multicollinearity

Multicollinearity

  • High correlation between x variables

  • Coefficients measure combined effect

  • Leads to unstable coefficients depending on x variables in model

  • Always exists – matter of degree

  • Example: using both age and height as explanatory variables in same model


Detecting multicollinearity

Detecting Multicollinearity

  • Significant correlations between pairs of x variables are more than with y variable

  • Non–significant t–tests for most of the individual parameters, but overall model test is significant

  • Estimated parameters have wrong sign


Solutions to multicollinearity

Solutions to Multicollinearity

  • Eliminate one or more of the correlated x variables

  • Avoid inference on individual parameters

  • Do not extrapolate


Extrapolation

Extrapolation

y

Interpolation

Extrapolation

Extrapolation

x

Sampled Range


Conclusion

Conclusion

  • Explained the Linear Multiple Regression Model

  • Described Inference About Individual Parameters

  • Tested Overall Significance

  • Explained Estimation and Prediction

  • Described Various Types of Models

  • Described Model Building

  • Explained Residual Analysis

  • Described Regression Pitfalls


  • Login