Chapter 12: Multiple Regression and Model Building



Where We’ve Been

  • Introduced the straight-line model relating a dependent variable y to an independent variable x

  • Estimated the parameters of the straight-line model using least squares

  • Assessed the model estimates

  • Used the model to estimate a value of y given x

McClave: Statistics, 11th ed. Chapter 12: Multiple Regression and Model Building



Where We’re Going

  • Introduce a multiple-regression model to relate a variable y to two or more x variables

  • Present multiple regression models with both quantitative and qualitative independent variables

  • Assess how well the multiple regression model fits the sample data

  • Show how analyzing the model residuals can help detect problems with the model and suggest the necessary modifications


12.1: Multiple Regression Models


  • Analyzing a Multiple-Regression Model

    Step 1: Hypothesize the deterministic portion of the model by choosing the independent variables x1, x2, … , xk.

    Step 2: Estimate the unknown parameters β0, β1, β2, … , βk.

    Step 3: Specify the probability distribution of the random error ε and estimate the standard deviation σ of this distribution.


    Step 4: Check that the assumptions about ε are satisfied; if not, make the required modifications to the model.

    Step 5: Statistically evaluate the usefulness of the model.

    Step 6: If the model is useful, use it for prediction, estimation and other purposes.
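The full cycle of hypothesizing a model, estimating it, and using it for prediction can be sketched on a small hypothetical data set (all numbers below are invented for illustration; this is not the text's data):

```python
import numpy as np

# Hypothetical data: a response y and two quantitative predictors x1, x2.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
y = np.array([5.1, 6.9, 11.2, 12.8, 17.0])

# Step 1: hypothesize E(y) = b0 + b1*x1 + b2*x2 (design matrix with intercept).
X = np.column_stack([np.ones_like(x1), x1, x2])

# Step 2: the least squares estimates minimize SSE = sum((y - X @ b)**2).
b, *_ = np.linalg.lstsq(X, y, rcond=None)

# Step 3 (in part): estimate sigma with s = sqrt(SSE / (n - (k + 1))).
resid = y - X @ b
n, k = len(y), 2
s = np.sqrt(resid @ resid / (n - (k + 1)))

# Step 6: use the fitted equation for prediction at x1 = 1.5, x2 = 2.5.
y_new = np.array([1.0, 1.5, 2.5]) @ b
print(b, s, y_new)
```

Steps 4 and 5 (residual checks and formal tests) are covered later in the chapter.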


  • Assumptions about the Random Error ε

  • The mean is equal to 0.

  • The variance is equal to σ².

  • The probability distribution is a normal distribution.

  • Random errors are independent of one another.


12.2: The First-Order Model: Estimating and Making Inferences about the β Parameters

A First-Order Model in Five Quantitative Independent Variables

E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5

where x1, x2, … , x5 are all quantitative variables that are not functions of other independent variables.


The parameters are estimated by finding the values for the β̂'s that minimize SSE = Σ(yi − ŷi)².


Only a truly talented mathematician (or geek) would choose to solve the necessary system of simultaneous linear equations by hand. In practice, computers are left to do the complicated calculations required by multiple regression models.


  • A collector of antique clocks hypothesizes that the auction price (y) can be modeled as a function of the age of the clock (x1) and the number of bidders (x2): y = β0 + β1x1 + β2x2 + ε


  • Based on the data in Table 12.1, the least squares prediction equation (the equation that minimizes SSE) is reported in the text.


The estimate for β1 is interpreted as the expected change in y given a one-unit change in x1, holding x2 constant.

The estimate for β2 is interpreted as the expected change in y given a one-unit change in x2, holding x1 constant.
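This "holding the other variable constant" reading can be checked mechanically: in a first-order model, raising x1 by one unit while fixing x2 moves the fitted value by exactly the coefficient on x1. The coefficients below are invented for illustration, not the Table 12.1 estimates:

```python
# Hypothetical fitted coefficients (illustrative only).
b0, b1, b2 = 100.0, 12.0, 85.0

def y_hat(x1, x2):
    """First-order prediction equation: y_hat = b0 + b1*x1 + b2*x2."""
    return b0 + b1 * x1 + b2 * x2

# A one-unit change in x1 with x2 held constant changes y_hat by exactly b1.
delta = y_hat(151, 10) - y_hat(150, 10)
print(delta)  # 12.0
```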


Since it makes no sense to sell a clock of age 0 at an auction with no bidders, the intercept term has no meaningful interpretation in this example.


Test of an Individual Parameter Coefficient in the Multiple Regression Model

One-Tailed Test: H0: βi = 0 vs. Ha: βi < 0 (or βi > 0); reject H0 if t < −tα (or t > tα).

Two-Tailed Test: H0: βi = 0 vs. Ha: βi ≠ 0; reject H0 if |t| > tα/2.

In both cases the test statistic is t = β̂i / s_β̂i, based on n − (k + 1) degrees of freedom.

Test of the Parameter Coefficient on the Number of Bidders


Since the computed test statistic exceeds the critical value (t* > tα), reject the null hypothesis.


A 100(1 − α)% Confidence Interval for a β Parameter: β̂i ± tα/2 · s_β̂i, where tα/2 is based on n − (k + 1) degrees of freedom.


A 100(1 − α)% Confidence Interval for β1


Holding the number of bidders constant, the result above tells us that we can be 90% sure that the auction price will rise between $11.20 and $14.28 for each 1-year increase in age.
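The interval arithmetic itself is simple. A sketch with invented numbers (the estimate, its standard error, and the t critical value below are all hypothetical, not the clock results):

```python
# Hypothetical estimate and standard error for some beta parameter.
b_hat = 10.0
se_b = 1.2
t_crit = 1.70   # approximate t_{.05} critical value for about 28 df

# 90% CI: b_hat +/- t_{alpha/2} * s_b
lower, upper = b_hat - t_crit * se_b, b_hat + t_crit * se_b
print(lower, upper)
```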


12.3: Evaluating Overall Model Utility

Do Not Reject H0 for βi: there may be no relationship between y and xi; a Type II error may have occurred; or the relationship between y and xi may be more complex than a straight-line relationship.

Reject H0 for βi: evidence of a linear relationship between y and xi.


  • The multiple coefficient of determination, R², measures how much of the overall variation in y is explained by the least squares prediction equation: R² = 1 − SSE/SSyy.


  • High values of R² suggest a good model, but the usefulness of R² falls as the number of observations becomes close to the number of parameters estimated.


The adjusted coefficient of determination, Ra², adjusts for the number of observations and the number of parameter estimates: Ra² = 1 − [(n − 1)/(n − (k + 1))](1 − R²). It will always have a value no greater than R².
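The adjustment can be written as a one-line function (a sketch; the values passed in are illustrative):

```python
def adjusted_r2(r2, n, k):
    """Ra^2 = 1 - [(n - 1) / (n - (k + 1))] * (1 - R^2),
    for n observations and k independent variables."""
    return 1 - (n - 1) / (n - (k + 1)) * (1 - r2)

# The penalty grows as k approaches n: same R^2, very different usefulness.
print(adjusted_r2(0.90, n=30, k=2))    # slightly below 0.90
print(adjusted_r2(0.90, n=30, k=20))   # far below 0.90
```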


Rejecting the null hypothesis in the global F-test (H0: β1 = β2 = … = βk = 0) means that something in your model helps explain variations in y, but it may be that another model provides more reliable estimates and predictions.
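The global F statistic for overall model utility can be computed from R² alone (a sketch; the inputs are illustrative):

```python
def global_f(r2, n, k):
    """Global utility test statistic:
    F = (R^2 / k) / ((1 - R^2) / (n - (k + 1))),
    with k numerator and n - (k + 1) denominator degrees of freedom."""
    return (r2 / k) / ((1 - r2) / (n - (k + 1)))

# A large F leads to rejecting H0: beta_1 = ... = beta_k = 0.
print(global_f(0.90, n=30, k=2))
```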


A collector of antique clocks hypothesizes that the auction price can be modeled as


Something in the model is useful, but the F-test can’t tell us which x-variables are individually useful.


  • Checking the Utility of a Multiple-Regression Model

    • Use the F-test to conduct a test of the adequacy of the overall model.

    • Conduct t-tests on the “most important” β parameters.

    • Examine Ra² and 2s to evaluate how well the model fits the data.


12.4: Using the Model for Estimation and Prediction

  • The model of antique clock prices can be used to predict sale prices for clocks of a certain age with a particular number of bidders.

  • What is the mean sale price for all 150-year-old clocks with 10 bidders?


The average value of all clocks with these characteristics can be found by using statistical software to generate a confidence interval (see Figure 12.7).

In this case, the confidence interval indicates that we can be 95% sure that the mean price of 150-year-old clocks sold at auction with 10 bidders will be between $1,154.10 and $1,709.30.

  • What is the sale price for a single 150-year-old clock with 10 bidders?


  • What is the mean sale price for a single 50-year-old clock with 2 bidders?


Since an age of 50 years and 2 bidders are both outside the range of values in our data set, any prediction using these values would be unreliable.


12.5: Model Building: Interaction Models

  • In some cases, the impact of an independent variable xi on y will depend on the value of some other independent variable xk.

  • Interaction models include the cross-products of independent variables as well as the first-order values.


  • In the antique clock auction example, assume the collector has reason to believe that the impact of age (x1) on price (y) varies with the number of bidders (x2).

  • The model is now

    y = β0 + β1x1 + β2x2 + β3x1x2 + ε.
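The defining feature of the interaction model is that the slope on x1 equals b1 + b3*x2, so it changes with the level of x2. A quick check with invented coefficients (illustrative only, not the fitted clock values):

```python
# Hypothetical interaction-model coefficients (illustrative only).
b0, b1, b2, b3 = 200.0, 1.0, -90.0, 1.5

def mean_y(x1, x2):
    """E(y) = b0 + b1*x1 + b2*x2 + b3*x1*x2."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2

# The slope on x1 depends on x2: it equals b1 + b3*x2.
slope_x2_5 = mean_y(101, 5) - mean_y(100, 5)
slope_x2_10 = mean_y(101, 10) - mean_y(100, 10)
print(slope_x2_5, slope_x2_10)  # 8.5 and 16.0
```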



The MINITAB results are reported in Figure 12.11 in the text.




Once the interaction term has passed the t-test, it is unnecessary to test the individual independent variables.


12.6: Model Building: Quadratic and Other Higher Order Models

  • A quadratic (second-order) model includes the square of an independent variable:

    y = β0 + β1x + β2x² + ε.

    This allows more complex relationships to be modeled.
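A quadratic fit is still linear least squares: the x² values are simply a third column of the design matrix. A sketch on exactly quadratic, hypothetical data:

```python
import numpy as np

# Hypothetical data lying exactly on y = 1 + 2x - 0.5x^2.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = 1 + 2 * x - 0.5 * x ** 2

# Regress y on [1, x, x^2]; the quadratic term is just another column.
X = np.column_stack([np.ones_like(x), x, x ** 2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b)  # recovers approximately [1.0, 2.0, -0.5]
```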


    β1 is the shift parameter and β2 is the rate of curvature.


  • Example 12.7 considers whether home size (x) impacts electrical usage (y) in a positive but decreasing way.

  • The MINITAB results are shown in Figure 12.13.


  • The least squares prediction equation that minimizes SSE for the 10 observations is reported in the MINITAB output (Figure 12.13).


  • Since 0 is not in the range of the independent variable (a house of 0 ft²?), the estimated intercept is not meaningful.

  • The positive estimate on β1 indicates a positive relationship, although the slope is not constant (we’ve estimated a curve, not a straight line).

  • The negative value on β2 indicates the rate of increase in power usage declines for larger homes.


  • The Global F-Test

    • H0: β1 = β2 = 0

    • Ha: At least one of the coefficients ≠ 0

      • The test statistic is F = 189.71, p-value near 0.

      • Reject H0.


  • t-Test of β2

    • H0: β2 = 0

      Ha: β2 < 0

      • The test statistic is t = −7.62, with a two-tailed p-value of .0001.

      • The one-tailed p-value is .0001/2 = .00005.

      • Reject the null hypothesis.


Complete Second-Order Model with Two Quantitative Independent Variables

E(y) = β0 + β1x1 + β2x2 + β3x1x2 + β4x1² + β5x2²

β0 is the y-intercept; changing β1 and β2 causes the surface to shift along the x1 and x2 axes; β3 controls the rotation of the surface; and the signs and values of β4 and β5 control the type of surface and the rates of curvature.


12.7: Model Building: Qualitative (Dummy) Variable Models

  • Qualitative variables can be included in regression models through the use of dummy variables.

  • A qualitative variable with k levels is represented by k − 1 dummy (0/1) variables; the level that receives no dummy variable serves as the base level.
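Constructing the 0/1 dummy columns (one per non-base level) is mechanical; a sketch using the golf-ball brands, with an invented list of observations:

```python
import numpy as np

brands = ["A", "B", "B", "C", "D", "A", "C"]   # hypothetical observations
dummy_levels = ["B", "C", "D"]                 # base level "A" gets no column

# x_i = 1 if the observation is at that level, 0 otherwise (k - 1 = 3 columns).
dummies = np.array([[1.0 if b == lvl else 0.0 for lvl in dummy_levels]
                    for b in brands])
print(dummies)  # a Brand A row is all zeros
```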


A Qualitative Independent Variable with k Levels

E(y) = β0 + β1x1 + β2x2 + … + βk−1xk−1

where xi is the dummy variable for level i + 1, with xi = 1 if y is observed at level i + 1 and xi = 0 otherwise.


  • For the golf ball example from Chapter 10, there were four levels (the brands). Testing differences in brands can be done with the model E(y) = β0 + β1x1 + β2x2 + β3x3, where x1, x2, x3 are the dummy variables for Brands B, C, and D.


  • Brand A is the base level, so β0 represents the mean distance (μA) for Brand A, and

    β1 = μB − μA

    β2 = μC − μA

    β3 = μD − μA


  • Testing that the four means are equal is equivalent to testing the significance of the βs:

    H0: β1 = β2 = β3 = 0

    Ha: At least one of the βs ≠ 0


The test statistic is the F-statistic. Here F = 43.99, with a p-value near 0, so we reject the null hypothesis that the golf balls all have the same mean driving distance.


Remember that the maximum number of dummy variables is one less than the number of levels for the qualitative variable.


12.8: Model Building: Models with Both Quantitative and Qualitative Variables

  • Suppose a first-order model is used to evaluate the impact on mean monthly sales of expenditures in three advertising media: television, radio and newspaper.

    • Expenditure, x1, is a quantitative variable

    • Type of medium is a qualitative variable with k = 3 levels, represented by k − 1 = 2 dummy variables, x2 and x3


  • Suppose now a second-order model is used to evaluate the impact of expenditures in the three advertising media on sales.

  • The relationship between expenditures, x1, and sales, y, is assumed to be curvilinear.


In this model, each medium is assumed to have the same impact on sales.


In this model, the intercepts differ but the shapes of the curves are the same.


In this model, the response curve for each media type is different; that is, advertising expenditure and media type interact.


12.9: Model Building: Comparing Nested Models

  • Two models are nested if one model contains all the terms of the second model and at least one additional term. The more complex of the two models is called the complete model and the simpler of the two is called the reduced model.


  • Recall the interaction model relating the auction price (y) of antique clocks to age (x1) and bidders (x2): y = β0 + β1x1 + β2x2 + β3x1x2 + ε


  • If the relationship is not constant, a complete second-order model should be considered: y = β0 + β1x1 + β2x2 + β3x1x2 + β4x1² + β5x2² + ε


  • If the complete model produces a better fit, then the βs on the quadratic terms should be significant:

    H0: β4 = β5 = 0

    Ha: At least one of β4 and β5 is non-zero


F-Test for Comparing Nested Models

F = [(SSE_R − SSE_C) / (k − g)] / MSE_C

where

SSE_R = sum of squared errors for the reduced model

SSE_C = sum of squared errors for the complete model

MSE_C = mean square error (s²) for the complete model

k − g = number of β parameters specified in H0

k + 1 = number of β parameters in the complete model

n = sample size

Rejection region: F > Fα, with k − g numerator and n − (k + 1) denominator degrees of freedom.
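Given the two SSEs, the nested-model F statistic is a few lines of arithmetic (all numbers in the call below are invented for illustration):

```python
def nested_f(sse_r, sse_c, n, k, g):
    """F = [(SSE_R - SSE_C) / (k - g)] / MSE_C,
    where MSE_C = SSE_C / (n - (k + 1))."""
    mse_c = sse_c / (n - (k + 1))
    return ((sse_r - sse_c) / (k - g)) / mse_c

# Complete model with k = 5 predictors vs. reduced model with g = 2.
print(nested_f(sse_r=1500.0, sse_c=1000.0, n=30, k=5, g=2))  # 4.0
```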


  • The growth of carnations (y) is assumed to be a function of the temperature (x1) and the amount of fertilizer (x2).

  • The data are shown in Table 12.6 in the text.


The complete second-order model is

E(y) = β0 + β1x1 + β2x2 + β3x1x2 + β4x1² + β5x2²

The least squares prediction equation is estimated from the data in Table 12.6.
McClave: Statistics, 11th ed. Chapter 12: Multiple Regression and Model Building


12 9 model building comparing nested models8

12.9: Model Building: Comparing Nested Models

The growth of carnations (y) is assumed to be a function of the temperature (x1) and the amount of fertilizer (x2).

To test the significance of the contribution of the interaction and second-order terms, use

H0: β3 = β4 = β5 = 0

Ha: At least one of β3, β4, or β5 ≠ 0

This requires also estimating the reduced model, dropping the parameters specified in the null hypothesis. Results are given in Figure 12.31.


Reject the null hypothesis: the complete model seems to provide better predictions than the reduced model.


  • A parsimonious model is a general linear model with a small number of β parameters. In situations where two competing models have essentially the same predictive power (as determined by an F-test), choose the more parsimonious of the two.


If the models are not nested, the choice is more subjective, based on Ra², s, and an understanding of the theory behind the model.


12.10: Model Building: Stepwise Regression

  • It is often unclear which independent variables have a significant impact on y.

  • Screening variables in an attempt to identify the most important ones is known as stepwise regression.
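One common flavor is forward selection: start with the intercept only and repeatedly add the candidate variable that most reduces SSE. A minimal sketch on simulated data (this is a simplification; stepwise software such as MINITAB also applies entry and removal t-test thresholds):

```python
import numpy as np

def sse_of(X, y):
    """SSE of the least squares fit of y on the columns of X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ b) ** 2))

# Simulated candidates: only columns 1 and 3 actually drive y.
rng = np.random.default_rng(1)
n = 40
cand = rng.normal(size=(n, 4))
y = 3 * cand[:, 1] - 2 * cand[:, 3] + rng.normal(scale=0.1, size=n)

chosen, X = [], np.ones((n, 1))
for _ in range(2):  # greedily add two variables
    rest = [j for j in range(4) if j not in chosen]
    best = min(rest, key=lambda j: sse_of(np.column_stack([X, cand[:, j]]), y))
    chosen.append(best)
    X = np.column_stack([X, cand[:, best]])
print(chosen)  # the informative columns are picked first
```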


  • Stepwise regression must be used with caution:

    • Many t-tests are conducted, leading to high probabilities of Type I or Type II errors.

    • Usually, no interaction or higher-order terms are considered – and reality may not be that simple.


12.11: Residual Analysis: Checking the Regression Assumptions

  • Regression analysis is based on the four assumptions about the random error ε considered earlier.

  • The mean is equal to 0.

  • The variance is equal to σ².

  • The probability distribution is a normal distribution.

  • Random errors are independent of one another.

McClave: Statistics, 11th ed. Chapter 12: Multiple Regression and Model Building


12.11: Residual Analysis: Checking the Regression Assumptions

  • If these assumptions are not valid, the results of the regression estimation are called into question.

  • Checking the validity of the assumptions involves analyzing the residuals of the regression.


12.11: Residual Analysis: Checking the Regression Assumptions

  • A regression residual is defined as the difference between an observed y-value and its corresponding predicted value: ε̂ = y − ŷ.


12.11: Residual Analysis: Checking the Regression Assumptions

Properties of the Regression Residuals

  • The mean of the residuals is equal to 0.

  • The standard deviation of the residuals is equal to the estimated standard deviation s of the fitted regression model.
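Both properties above can be verified directly. A small sketch on simulated data (all numbers hypothetical): fit a straight line by least squares, then check that the residuals average to 0 and that their standard deviation matches s = √(SSE/(n − 2)).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
x = rng.uniform(0, 10, n)
y = 4 + 0.8 * x + rng.normal(0, 2, n)

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta                 # regression residuals

sse = resid @ resid
s = np.sqrt(sse / (n - 2))           # estimated std. dev. of the model

print(resid.mean())                  # 0 up to floating-point rounding
print(resid.std(ddof=2), s)          # these two agree
```

The zero mean is a mechanical consequence of including an intercept in the least-squares fit, which is why residual plots are always centered on 0.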


12.11: Residual Analysis: Checking the Regression Assumptions

  • If the model is misspecified, the mean of ε will not equal 0.

    • Residual analysis may reveal this problem.

    • The home-size electricity usage example illustrates this.


12.11: Residual Analysis: Checking the Regression Assumptions

  • The plot of the first-order model shows a curvilinear residual pattern, while the quadratic model shows a more random pattern.


12.11: Residual Analysis: Checking the Regression Assumptions

A pattern in the residual plot may indicate a problem with the model.


12.11: Residual Analysis: Checking the Regression Assumptions

  • A residual larger than 3s (in absolute value) is considered an outlier.

    • Outliers will have an undue influence on the estimates. An outlier may result from:

      1. Mistakenly recorded data

      2. An observation that is for some reason truly different from the others

      3. Random chance
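The 3s rule above is easy to apply in code. A sketch on simulated data with one planted outlier (the data and the planted index are hypothetical): flag any observation whose residual exceeds 3s in absolute value.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 40
x = rng.uniform(0, 10, n)
y = 5 + 2 * x + rng.normal(0, 1, n)
y[10] += 12                          # plant one gross outlier

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
s = np.sqrt(resid @ resid / (n - 2))

# Flag residuals larger than 3s in absolute value
outliers = np.flatnonzero(np.abs(resid) > 3 * s)
print(outliers)                      # index 10 should be flagged
```

Note that the outlier itself inflates s, so a gross outlier can partially mask itself; this is one reason the next slide's caution about removing or retaining outliers matters.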


12.11: Residual Analysis: Checking the Regression Assumptions

  • A residual larger than 3s (in absolute value) is considered an outlier.

    • Leaving in the data set an outlier that should be removed will produce misleading estimates and predictions (#1 & #2 above).

    • So will removing an outlier that actually belongs in the data set (#3 above).


12.11: Residual Analysis: Checking the Regression Assumptions

  • Residual plots should be centered on 0 and within ±3s of 0.

  • Residual histograms should be relatively bell-shaped.

  • Residual normal probability plots should display straight lines.


12.11: Residual Analysis: Checking the Regression Assumptions

Regression analysis is robust with respect to (small) nonnormal errors.

  • Slight departures from normality will not seriously harm the validity of the estimates, but as the departure from normality grows, the validity falls.


12.11: Residual Analysis: Checking the Regression Assumptions

  • If the variance of ε changes as y changes, the constant variance assumption is violated.


12.11: Residual Analysis: Checking the Regression Assumptions

  • A first-order model is used to relate the salaries (y) of social workers to years of experience (x).




12.11: Residual Analysis: Checking the Regression Assumptions

  • The model seems to provide good predictions, but the residual plot reveals a non-random pattern:

  • The spread of the residuals increases as the estimated mean salary increases, violating the constant variance assumption.


12.11: Residual Analysis: Checking the Regression Assumptions

  • Transforming the dependent variable often stabilizes the residual variance.

    • Possible transformations of y:

      • Natural logarithm: ln(y)

      • Square root: √y

      • Arcsine square root: sin⁻¹(√y)
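The log transformation can be illustrated on salary-style simulated data (all numbers hypothetical). When errors are multiplicative, the spread of y grows with its mean; fitting on the raw scale shows heteroscedastic residuals, while fitting on ln(y) stabilizes them. Comparing residual spread in the lower vs. upper half of the x range makes the effect visible without a plot:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 80
x = rng.uniform(1, 20, n)                 # e.g., years of experience
# Multiplicative errors: the variance of y grows with its mean
y = np.exp(1.0 + 0.08 * x + rng.normal(0, 0.15, n))

X = np.column_stack([np.ones(n), x])

# Fit on the raw scale: residual spread grows with the fitted values
b_raw, *_ = np.linalg.lstsq(X, y, rcond=None)
r_raw = y - X @ b_raw

# Fit on ln(y): the transformation stabilizes the variance
b_log, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
r_log = np.log(y) - X @ b_log

# Residual spread in the lower vs. upper half of the x range
lo, hi = x < np.median(x), x >= np.median(x)
print(r_raw[hi].std() / r_raw[lo].std())  # noticeably above 1
print(r_log[hi].std() / r_log[lo].std())  # much closer to 1
```

After fitting on the log scale, predictions must be transformed back (ŷ = e^fitted), which is a routine extra step when interpreting the model.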






12.12: Some Pitfalls: Estimability, Multicollinearity and Extrapolation

Problem 1: Parameter Estimability

If x does not take on a sufficient number of different values, no single unique line can be estimated.


12.12: Some Pitfalls: Estimability, Multicollinearity and Extrapolation

Problem 2: Multicollinearity

Multicollinearity exists when two or more of the independent variables in a regression are correlated.

If xi and xj move together in some way, finding the impact on y of a one-unit change in either of them holding the other constant will be difficult or impossible.


12.12: Some Pitfalls: Estimability, Multicollinearity and Extrapolation

Problem 2: Multicollinearity

Multicollinearity can be detected in various ways. A simple check is to calculate the correlation coefficients (rij) for each pair of independent variables in the model. Any significant rij may indicate a multicollinearity problem.

  • If severe multicollinearity exists, the result may be

    • Significant F-values but insignificant t-values

    • Signs on the β estimates opposite to those expected

    • Errors in the β estimates, standard errors, etc.
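The pairwise-correlation check above, plus one common complementary diagnostic not shown on the slide, the variance inflation factor VIF_j = 1/(1 − R_j²), can be sketched on simulated data (all variables hypothetical; here x2 is built to be nearly collinear with x1):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 30
x1 = rng.uniform(0, 10, n)
x2 = 0.9 * x1 + rng.normal(0, 0.5, n)  # nearly collinear with x1
x3 = rng.uniform(0, 10, n)             # independent of the others
X = np.column_stack([x1, x2, x3])

# Pairwise correlation matrix of the predictors (the slide's simple check)
print(np.corrcoef(X, rowvar=False).round(3))

def vif(X, j):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
    x_j on the remaining predictors (with an intercept)."""
    others = np.delete(X, j, axis=1)
    M = np.column_stack([np.ones(len(X)), others])
    beta, *_ = np.linalg.lstsq(M, X[:, j], rcond=None)
    resid = X[:, j] - M @ beta
    r2 = 1 - (resid @ resid) / np.sum((X[:, j] - X[:, j].mean()) ** 2)
    return 1 / (1 - r2)

print([round(vif(X, j), 1) for j in range(3)])
# Large VIFs for x1 and x2 flag the collinear pair; x3 stays near 1.
```

A common rule of thumb treats VIF values above about 10 as a sign of serious multicollinearity, which matches the large pairwise correlation between x1 and x2 here.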


12.12: Some Pitfalls: Estimability, Multicollinearity and Extrapolation

  • The Federal Trade Commission (FTC) ranks cigarettes according to their tar (x1), nicotine (x2), weight in grams (x3) and carbon monoxide (y) content.

  • 25 data points (see Table 12.11) are used to estimate the model E(y) = β0 + β1x1 + β2x2 + β3x3.




12.12: Some Pitfalls: Estimability, Multicollinearity and Extrapolation

  • F = 78.98, p-value < .0001

    • t1 = 3.97, p-value = .0007

    • t2 = -0.67, p-value = .5072

    • t3 = -0.03, p-value = .9735


12.12: Some Pitfalls: Estimability, Multicollinearity and Extrapolation

  • F = 78.98, p-value < .0001

    • t1 = 3.97, p-value = .0007

    • t2 = -0.67, p-value = .5072

    • t3 = -0.03, p-value = .9735

The negative signs on two variables and the insignificant t-values are suggestive of multicollinearity.


12.12: Some Pitfalls: Estimability, Multicollinearity and Extrapolation

  • The coefficients of correlation, rij, provide further evidence:

    • r(tar, nicotine) = .9766

    • r(tar, weight) = .4908

    • r(weight, nicotine) = .5002

  • Each rij is significantly different from 0 at the α = .05 level.


12.12: Some Pitfalls: Estimability, Multicollinearity and Extrapolation

  • Possible Responses to Problems Created by Multicollinearity in Regression

    • Drop one or more correlated independent variables from the model.

    • If all the xs are retained,

      • Avoid making inferences about the individual β parameters from the t-tests.

      • Restrict inferences about E(y) and future y values to values of the xs that fall within the range of the sample data.


12.12: Some Pitfalls: Estimability, Multicollinearity and Extrapolation

Problem 3: Extrapolation

The data used to estimate the model provide information only on the range of values in the data set. There is no reason to assume that the dependent variable’s response will be the same over a different range of values.




12.12: Some Pitfalls: Estimability, Multicollinearity and Extrapolation

Problem 4: Correlated Errors

If the error terms are not independent (a frequent problem in time series), the model tests and prediction intervals are invalid. Special techniques are used to deal with time series models.
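One standard check for correlated errors in time-series data is the Durbin–Watson statistic, d = Σ(e_t − e_{t−1})² / Σe_t², which is near 2 for independent errors and near 0 under strong positive autocorrelation. A sketch on simulated AR(1) errors (the data and the 0.8 carry-over coefficient are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100
t = np.arange(n)

# AR(1) errors: each error carries over 80% of the previous one
e = np.zeros(n)
for i in range(1, n):
    e[i] = 0.8 * e[i - 1] + rng.normal(0, 1)
y = 10 + 0.5 * t + e

X = np.column_stack([np.ones(n), t])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
r = y - X @ beta

# Durbin-Watson statistic: near 2 for independent errors,
# near 0 for strong positive autocorrelation
dw = np.sum(np.diff(r) ** 2) / np.sum(r ** 2)
print(dw)   # well below 2 here
```

A value far below 2, as here, signals positive autocorrelation and motivates the special time-series techniques the slide refers to.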

