Business Statistics: Communicating with Numbers By Sanjiv Jaggia and Alison Kelly

Business Statistics: Communicating with Numbers By Sanjiv Jaggia and Alison Kelly

Chapter 15 Learning Objectives (LOs) LO 15.1:Conduct tests of individual significance. LO 15.2:Conduct a test of joint significance. LO 15.3:Conduct a general test of linear restrictions. LO 15.4:Calculate and interpret interval estimates for predictions. LO 15.5:Explain the role of the assumptions on the OLS estimators. LO 15.6:Describe common violations of the assumptions and offer remedies.

Analyzing the Winning Percentage in Baseball • Sports analysts frequently quarrel over what statistics separate winning teams from the losers. • Is a high batting average (BA) the best predictor, or is it a low earned run average (ERA)? Or both? • We will fit three regression models and use the statistical significance of the predictors to help decide.

15.1Tests of Significance LO 15.1 Conduct tests of individual significance.

Tests of Individual Significance LO 15.1

The Test Statistic LO 15.1

LO 15.1

Computer-Generated Output LO 15.1 • Virtually all statistical software will automatically report a test statistic and a p-value with each coefficient estimate. • These values can be used to test whether the regression coefficient differs from zero. • To perform a one-sided test where the hypothesized value is zero, divide the computer-reported p-value in half. • If we wish to test whether the coefficient differs from a nonzero value, we need to compute a new test statistic.

Example 15.1 LO 15.1

Intervals for the Parameters LO 15.1

A Test for a Non-Zero Slope LO 15.1

Example 15.3 LO 15.1

Test of Joint Significance LO 15.2 Conduct a test of joint significance.

Example 15.4 LO 15.2

Synopsis of the Introductory Case • Goodness-of-fit measures indicated that including both batting average and ERA is most appropriate. • In Model 3, explanatory variables are individually significant and the regression is jointly significant. • We can conclude that both batting average and earned run average are good predictors of overall winning percentage.

15.2: A General Test of Linear Restrictions LO 15.3 Conduct a general test of linear restrictions.

Restricted and Unrestricted Models LO 15.3 • To conduct the partial F test, we estimate the model with and without the restrictions. • In the restricted model we do not estimate the coefficients that are restricted under the null hypothesis. • The unrestricted model is a complete model that imposes no restrictions on the coefficients. • If restrictions are valid, then the restricted model’s error sum of squares, SSER, will not be significantly larger than the unrestricted model’s error sum of squares, SSEU.

Example 15.5 LO 15.3 • A manager at a car wash company wants to determine which promotions improve sales. • He has information on sales, price discounts, and advertising expenditures on Radio and Newspaper in 40 Missouri counties.

Testing the Effects of Advertising LO 15.3

Estimate the Two Models LO 15.3 The table on the right displays the estimates and SSE for each model. The p-values are in parentheses. We can see from the table that the SSEU = 1208.1348, while the SSER = 2182.5649. We can now proceed to computing the value of the test statistic.

Example 15.5 LO 15.3

Are Advertising Returns Equal? LO 15.3

Example 15.6 LO 15.3

Conducting the Test LO 15.3

15.3: Interval Estimates for Predictions LO 15.4 Calculateand interpret interval estimates for predictions.

Two Types of Predictions LO 15.4

The Confidence Interval LO 15.4

Modified Regression LO 15.4

Confidence Interval for Winning Percentage LO 15.4 • In the baseball example, we first shift the data by our hypothesized values: • Estimating the modified regression now reveals the confidence interval:

Confidence Interval for Winning Percentage LO 15.4

The Prediction Interval LO 15.4

Prediction Interval for Winning Percentage LO 15.4

15.4: Model Assumptions and Common Violations LO 15.5 Explain the role of the assumptions on the OLS estimators.

Assumptions (continued) LO 15.5

Checking the Assumptions LO 15.5

Common Violation 1: The Model Suffers from Multicollinearity LO 15.6 Describe common violations of the assumptions and offer remedies.

Remedying Multicollinearity LO 15.6 • A good remedy may be to simply drop one of the collinear variables if we can justify it as redundant. • Alternatively, we could try to increase our sample size. • Another option would be to try to transform our variables so that they are no longer collinear. • Last, especially if we are interested only in maintaining a high predictive power, it may make sense to do nothing.

Common Violation 2: The Error Term Is Heteroskedastic LO 15.6 • The variance of the error term changes for different values of at least one explanatory variable. • Informal residual plots can gauge heteroskedasticity. In Example 15.10, we try to predict sales by the square footage of convenience stores. The residuals display a marked pattern:

Remedying Heteroskedasticity LO 15.6 • Heteroskedasticity results in inefficient estimators and the hypothesis tests for significance are no longer valid. • To get around the second problem, some researchers use OLS estimates along with corrected standard errors, called White’s standard errors. Many statistical packages have this option available, unfortunately the current version of Excel does not.

Common Violation 3: The Error Term Is Serially Correlated LO 15.6 • We assume that the error term is uncorrelated across observations when obtaining OLS estimates. • But this often breaks down in time series data. In Example 15.11, we predict sales at a sushi restaurant over a period of time. A plot of the residuals against time shows: • Remedies are not easily accessible using Excel.

Common Violation 4: The Explanatory Variable is Endogenous LO 15.6 • Endogeneity in the regression model refers to the error term being correlated with the explanatory variables. • This commonly occurs due to an omitted explanatory variable. • For example, a person’s salary may be highly correlated with that person’s innate ability. But since we cannot include it, ability gets incorporated in the error term. If we try to predict salary by years of education, which may also be correlated with innate ability, then we have an endogeneity problem.

Common Violation 4: The Explanatory Variable is Endogenous LO 15.6 • Endogeneity will result in biased estimators, and so is quite a serious problem. • Unfortunately, endogeneity is difficult to fix. Most commonly, we would like to find an instrumental variable, one that is correlated with the endogenous explanatory variable but uncorrelated with the error term. But it may be difficult to find such a variable. • Further discussion of the instrumental variable approach is beyond the scope of the text.

Business Statistics: Communicating with Numbers By Sanjiv Jaggia and Alison Kelly