
# Multiple Regression - PowerPoint PPT Presentation



## PowerPoint Slideshow about ' Multiple Regression' - ulf


### Multiple Regression

Chapter 17

• In this chapter we extend the simple linear regression model, and allow for any number of independent variables.

• We expect to build a model that fits the data better than the simple linear regression model.

• We all believe that weight is affected by the amount of calories consumed. Yet, the actual effect is different from one individual to another.

• Therefore, a simple linear relationship leaves much unexplained error.

[Figure: scatter plot of Weight against Calories consumed]

In an attempt to reduce the unexplained errors, we’ll add a second explanatory (independent) variable.

[Figure: Weight plotted against Calories consumed and Height]

• If we believe a person’s height explains his/her weight too, we can add this variable to our model.

• The resulting multiple regression model is:

$$\text{Weight} = \beta_0 + \beta_1\,\text{Calories} + \beta_2\,\text{Height} + \varepsilon$$

• We shall use computer printout to
  • Assess the model
    • How well does it fit the data?
    • Is it useful?
    • Are any required conditions violated?
  • Employ the model
    • Interpreting the coefficients
    • Making predictions using the prediction equation
    • Estimating the expected value of the dependent variable

17.1 Model and Required Conditions

• We allow k independent variables to potentially explain the dependent variable:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon$$

Here y is the dependent variable, $x_1, \dots, x_k$ are the independent variables, and $\varepsilon$ is the random error variable.

• The error $\varepsilon$ is normally distributed.

• The mean is equal to zero and the standard deviation is constant ($\sigma_\varepsilon$) for all values of y.

• The errors are independent.

17.2 Estimating the Coefficients and Assessing the Model

• The procedure used to perform regression analysis:

• Obtain the model coefficients and statistics using statistical software.

• Diagnose violations of required conditions. Try to remedy problems when identified.

• Assess the model fit using statistics obtained from the sample.

• If the model assessment indicates good fit to the data, use it to interpret the coefficients and generate predictions.
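The first step of this procedure, obtaining the coefficients, can be sketched in Python. This is only an illustration under stated assumptions: the helper name `fit_ols` and the small data set are hypothetical (not from the La Quinta example), and NumPy's least-squares routine stands in for whatever statistical package is actually used.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bk*xk.

    X is an (n, k) array of independent variables; y has length n.
    Returns the k+1 coefficients, intercept b0 first.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(y)), X])
    coef, _, _, _ = np.linalg.lstsq(design, y, rcond=None)
    return coef

# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2 (no error term),
# so the fit should recover those three coefficients.
X_demo = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 1], [1, 3]])
y_demo = 1 + 2 * X_demo[:, 0] + 3 * X_demo[:, 1]
coef_demo = fit_ols(X_demo, y_demo)

# Residuals (observed minus fitted) are the raw material for diagnosing
# violations of the required conditions.
fitted_demo = np.column_stack([np.ones(len(y_demo)), X_demo]) @ coef_demo
residuals_demo = y_demo - fitted_demo
```

With real data the residuals would not be zero; plotting them against the fitted values is one common way to look for non-constant variance.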

• Example 1 Where to locate a new motor inn?

• La Quinta Motor Inns is planning an expansion.

• Management wishes to predict which sites are likely to be profitable.

• Several areas where predictors of profitability can be identified are:

• Competition

• Market awareness

• Demand generators

• Demographics

• Physical quality

Estimating the Coefficients and Assessing the Model, Example

[Figure: profitability, measured by operating margin, related to predictors grouped under market awareness, competition, customers, and community.]

• x1, Rooms: number of hotel/motel rooms within 3 miles of the site.

• x2, Nearest: distance to the nearest La Quinta inn.

• x3, Office: office space.

• x4, College: college enrollment.

• x5, Income: median household income.

• x6, Disttwn: distance to downtown.

• Data were collected from 100 randomly selected La Quinta inns, and the following suggested model was run:

$$\text{Margin} = \beta_0 + \beta_1\,\text{Rooms} + \beta_2\,\text{Nearest} + \beta_3\,\text{Office} + \beta_4\,\text{College} + \beta_5\,\text{Income} + \beta_6\,\text{Disttwn} + \varepsilon$$

This is the sample regression equation (sometimes called the prediction equation):

$$\text{MARGIN} = 38.14 - 0.0076\,\text{ROOMS} + 1.65\,\text{NEAREST} + 0.02\,\text{OFFICE} + 0.21\,\text{COLLEGE} + 0.41\,\text{INCOME} - 0.23\,\text{DISTTWN}$$

Model Assessment - Standard Error of Estimate

• A small value of $\sigma_\varepsilon$ indicates (by definition) a small variation of the errors around their mean.

• Since the mean is zero, small variation of the errors means the errors are close to zero.

• So we would prefer a model with a small standard deviation of the error rather than a large one.

• How can we determine whether the standard deviation of the error is small or large?

• The standard deviation of the error $\sigma_\varepsilon$ is estimated by the standard error of estimate $s_\varepsilon$:

$$s_\varepsilon = \sqrt{\frac{SSE}{n-k-1}}$$

• The magnitude of $s_\varepsilon$ is judged by comparing it to the mean value of y, $\bar{y}$.

• From the printout, $s_\varepsilon = 5.5121$.

Model Assessment – Coefficient of Determination

• In our example it seems $s_\varepsilon$ is not particularly small, or is it?

• If $s_\varepsilon$ is small, the model fits the data well and is considered useful. The usefulness of the model is evaluated by the amount of variability in the y values explained by the model. This is measured by the coefficient of determination.

• The coefficient of determination is calculated by

$$R^2 = 1 - \frac{SSE}{\sum (y_i - \bar{y})^2}$$

As you can see, SSE (and thus $s_\varepsilon$) affects the value of $R^2$.

• From the printout, $R^2 = 0.5251$; that is, 52.51% of the variability in the margin values is explained by this model.
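The two fit measures discussed so far can be computed directly from observed and fitted values. The sketch below assumes the usual definitions, a standard error of estimate of sqrt(SSE / (n - k - 1)) and a coefficient of determination of 1 - SSE / SS(Total). The function name and the five data points are hypothetical and are not taken from the La Quinta printout.

```python
import math

def fit_statistics(y, y_hat, k):
    """Return (SSE, standard error of estimate, R^2) for a model with k predictors."""
    n = len(y)
    y_bar = sum(y) / n
    # Unexplained variation: squared gaps between observed and fitted values.
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    # Total variation of y around its mean.
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    s_e = math.sqrt(sse / (n - k - 1))
    r_squared = 1 - sse / ss_total
    return sse, s_e, r_squared

# Hypothetical observed values and fitted values from a one-predictor model.
y_obs = [1.0, 2.0, 3.0, 4.0, 5.0]
y_fit = [1.1, 1.9, 3.2, 3.8, 5.0]
sse, s_e, r2 = fit_statistics(y_obs, y_fit, k=1)
```

Because both measures are driven by SSE, a smaller SSE lowers the standard error of estimate and raises the coefficient of determination at the same time.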

• We pose the question:

Is there at least one independent variable linearly related to the dependent variable?

• To answer the question we test the hypothesis

• $H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$

• $H_1$: At least one $\beta_i$ is not equal to zero.

• If at least one $\beta_i$ is not equal to zero, the model has some validity.

The total variation in y (SS(Total)) can be explained in part by the regression (SSR) while the rest remains unexplained (SSE): SS(Total) = SSR + SSE, or

$$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$$

Note that if all the data points satisfy the linear equation without errors, $y_i$ and $\hat{y}_i$ coincide, and thus SSE = 0. In this case all the variation in y is explained by the regression (SS(Total) = SSR).

If errors exist in small amounts, SSR will be close to SS(Total) and the ratio SSR/SSE will be large. This leads to the F ratio test presented next.

Define the Mean of the Sum of Squares-Regression (MSR): $MSR = SSR/k$

Define the Mean of the Sum of Squares-Error (MSE): $MSE = SSE/(n-k-1)$

The ratio $F = MSR/MSE$ is F-distributed with k and n-k-1 degrees of freedom.

Note. A large F results from a large SSR, which indicates that much of the variation in y is explained by the regression model; this is when the model is useful. Hence, the null hypothesis (which states that the model is not useful) should be rejected when F is sufficiently large. Therefore, the rejection region has the form

$$F > F_{\alpha,\,k,\,n-k-1}$$

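Testing the Model Validity of the La Quinta Inns Regression Model

The F ratio test is performed using the ANOVA portion of the regression output, which is laid out as follows:

| Source | d.f. | SS | MS | F |
|---|---|---|---|---|
| Regression | k | SSR | MSR = SSR/k | MSR/MSE |
| Residual | n–k–1 | SSE | MSE = SSE/(n-k-1) | |
| Total | n–1 | SS(Total) | | |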

If $\alpha = .05$, the critical F is $F_{\alpha,\,k,\,n-k-1} = F_{0.05,\,6,\,100-6-1} = 2.17$.

F = 17.14 > 2.17

Also, the p-value is $3.033 \times 10^{-13}$. Clearly, p-value $= 3.033 \times 10^{-13} < 0.05 = \alpha$.

Conclusion: There is sufficient evidence to reject the null hypothesis in favor of the alternative hypothesis. At least one of the $\beta_i$ is not equal to zero; thus, the independent variable associated with it has a linear relationship to y. This linear regression model is useful.
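As a rough cross-check of the reported F statistic, the ratio MSR/MSE can be rewritten in terms of the coefficient of determination, because SSR is the explained share of SS(Total) and SSE is the remaining share. The sketch below uses only numbers quoted in this example (a coefficient of determination of 0.5251, n = 100, k = 6, critical value 2.17); the function name is hypothetical.

```python
def f_from_r_squared(r_squared, n, k):
    """F = MSR / MSE expressed through R^2.

    SSR = R^2 * SS(Total) and SSE = (1 - R^2) * SS(Total),
    so SS(Total) cancels in the ratio.
    """
    msr_share = r_squared / k
    mse_share = (1 - r_squared) / (n - k - 1)
    return msr_share / mse_share

f_stat = f_from_r_squared(0.5251, n=100, k=6)

# Critical value as quoted in the text for alpha = .05 with 6 and 93 d.f.
critical_f = 2.17
reject_h0 = f_stat > critical_f
```

The result agrees with the printed F of 17.14 to two decimals, which suggests the reported fit statistics are mutually consistent.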

• $b_0 = 38.14$. This is the y-intercept, the value of y when all the variables take the value zero. Since the data range of the independent variables does not cover the value zero, do not interpret the intercept.

• Interpreting the coefficients $b_1$ through $b_k$: start from $y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_k x_k$ and increase $x_1$ by one unit while holding the other variables constant:

$$y = b_0 + b_1 (x_1 + 1) + b_2 x_2 + \dots + b_k x_k = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_k x_k + b_1$$

So y changes by $b_1$ for each one-unit increase in $x_1$.

• $b_1 = -0.0076$. In this model, for each additional room within 3 miles of the La Quinta inn, the operating margin decreases on average by .0076% (assuming the other variables are held constant).

• $b_2 = 1.65$. In this model, for each additional mile that the nearest competitor is to a La Quinta inn, the average operating margin increases by 1.65% when the other variables are held constant.

• $b_3 = 0.02$. For each additional 1000 sq-ft of office space, the average operating margin increases by .02% when the other variables are held constant.

• $b_4 = 0.21$. For each additional thousand students, the average operating margin increases by .21% when the other variables remain constant.

• $b_5 = 0.41$. For each additional \$1000 of median household income, the average operating margin increases by .41% when the other variables remain constant.

• $b_6 = -0.23$. For each additional mile to the downtown center, the average operating margin decreases by .23% when the other variables remain constant.

Testing the Coefficients

• The hypotheses for each $\beta_i$ are

$H_0: \beta_i = 0$

$H_1: \beta_i \neq 0$

• Test statistic:

$$t = \frac{b_i - \beta_i}{s_{b_i}}, \quad \text{d.f.} = n - k - 1$$

• For example, a test for $\beta_1$ using the Excel printout: $t = (-.007618 - 0)/.001255 = -6.068$. Suppose $\alpha = .01$; the critical value is $t_{.005,\,100-6-1} \approx 2.63$. Since $|t| = 6.068 > 2.63$, there is sufficient evidence to reject $H_0$ at the 1% significance level.

• Moreover, the p-value of the test is $2.77 \times 10^{-8}$. Clearly $H_0$ is strongly rejected. The number of rooms is linearly related to the margin.

• The interpretation of the p-value results for all the coefficients follows.
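The t-test for a single coefficient can be sketched as follows. The estimate and its standard error are the rounded values quoted for the Rooms coefficient, so the ratio comes out at about -6.07; the printed -6.068 presumably reflects unrounded inputs. The critical value of 2.63 is an assumption taken from standard t tables for a two-tailed test at the 1% level with 93 degrees of freedom, not something computed here, and the function names are hypothetical.

```python
def t_statistic(estimate, std_error, hypothesized=0.0):
    """t = (b_i - beta_i) / s_bi for testing H0: beta_i = hypothesized."""
    return (estimate - hypothesized) / std_error

def reject_two_tailed(t_value, critical_value):
    """Two-tailed rejection rule: reject H0 when |t| exceeds the critical value."""
    return abs(t_value) > critical_value

# Rounded values quoted in the text for the Rooms coefficient.
t_b1 = t_statistic(-0.007618, 0.001255)

# Assumed table value for alpha = .01 (two-tailed) with 93 degrees of freedom.
reject = reject_two_tailed(t_b1, critical_value=2.63)
```

The same two functions apply to every coefficient in the printout; only the estimate and its standard error change.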

• Interpretation of the regression results for this model

• The number of hotel and motel rooms, distance to the nearest motel, the amount of office space, and the median household income are linearly related to the operating margin

• Student enrollment and distance from downtown are not linearly related to the margin.

• Preferable locations have few other motels nearby and much office space, and the surrounding households are affluent.

• The model can be used for making predictions by

• Producing a prediction interval estimate of the particular value of y, for given values of xi.

• Producing a confidence interval estimate for the expected value of y, for given values of xi.

• The model can be used to learn about relationships between the independent variables xi, and the dependent variable y, by interpreting the coefficients bi

• Predict the average operating margin of an inn at a site with the following characteristics:

• 3815 rooms within 3 miles,

• Closest competitor 0.9 miles away,

• 476,000 sq-ft of office space,

• 24,500 college students,

• \$35,000 median household income,

• 11.2 miles distance to downtown center.

$$\text{MARGIN} = 38.14 - 0.0076(3815) + 1.65(0.9) + 0.02(476) + 0.21(24.5) + 0.41(35) - 0.23(11.2) = 37.1\%$$
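The point prediction can be reproduced by plugging site values into the sample regression equation. In this sketch the coefficients and signs are those of the prediction equation stated earlier (38.14 - 0.0076 ROOMS + 1.65 NEAREST + 0.02 OFFICE + 0.21 COLLEGE + 0.41 INCOME - 0.23 DISTTWN), and the site values are the ones that appear in the worked calculation (0.9, 476, 24.5, 35, 11.2). Office space, enrollment, and income are assumed to be measured in thousands, as the coefficient interpretations imply. The function and variable names are hypothetical.

```python
INTERCEPT = 38.14
COEFFICIENTS = {
    "ROOMS": -0.0076,   # hotel/motel rooms within 3 miles
    "NEAREST": 1.65,    # miles to the nearest competitor
    "OFFICE": 0.02,     # office space, thousands of sq-ft
    "COLLEGE": 0.21,    # college enrollment, thousands of students
    "INCOME": 0.41,     # median household income, thousands of dollars
    "DISTTWN": -0.23,   # miles to the downtown center
}

def predict_margin(site):
    """Evaluate the prediction equation for one site (result in percent)."""
    return INTERCEPT + sum(COEFFICIENTS[name] * site[name] for name in COEFFICIENTS)

site = {
    "ROOMS": 3815,
    "NEAREST": 0.9,
    "OFFICE": 476,
    "COLLEGE": 24.5,
    "INCOME": 35,
    "DISTTWN": 11.2,
}
margin = predict_margin(site)
```

With these inputs the predicted margin rounds to 37.1%, the value the interval estimates are centered on.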

• Interval estimates by Excel (Data analysis plus)

It is predicted that the operating margin of this inn will lie between 25.4% and 48.8%, with 95% confidence.

It is expected that the average operating margin of all sites that fit this category falls between 33% and 41.2%, with 95% confidence.

The average inn would not be profitable (less than 50%).

• In many real-life situations one or more independent variables are qualitative.

• Including qualitative variables in a regression analysis model is done via indicator variables.

• An indicator variable (I) can assume one of two values, “zero” or “one”. For example:

• I = 1 if a degree earned is in Finance; I = 0 if a degree earned is not in Finance.

• I = 1 if the temperature was below 50°; I = 0 if the temperature was 50° or more.

• I = 1 if a first condition out of two is met; I = 0 if a second condition out of two is met.

• I = 1 if data were collected before 1980; I = 0 if data were collected after 1980.

• Example 2 - continued

• Recall: A car dealer wants to predict the auction price of a car.

• The dealer believes now that both odometer reading and car color are variables that affect a car’s price.

• Three color categories are considered:

• White

• Silver

• Other colors

• Note: “Color” is a qualitative variable.

• $I_1$ = 1 if the color is white; 0 if the color is not white.

• $I_2$ = 1 if the color is silver; 0 if the color is not silver.

• The category “Other colors” is defined by $I_1 = 0$; $I_2 = 0$.

How Many Indicator Variables?

• Note: To represent the situation of three possible colors we need only two indicator variables.

• Generally to represent a nominal variable with m possible values, we must create m-1 indicator variables.
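The m-1 rule can be illustrated with a small encoder. This is a minimal sketch, not a library routine: the function name, the choice of "other" as the baseline category, and the five-car sample are all hypothetical.

```python
def indicator_columns(values, baseline):
    """Encode a nominal variable with m categories as m-1 indicator (0/1) columns.

    The baseline category gets no column of its own; it is represented
    by zeros in every indicator column.
    """
    categories = sorted(set(values) - {baseline})
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}

# Three colors, so two indicator columns are produced.
colors = ["white", "silver", "other", "white", "other"]
indicators = indicator_columns(colors, baseline="other")
```

A third column for the baseline would add no information, since it would always equal one minus the sum of the other two, and it would make the columns perfectly collinear with the intercept.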

• Solution

• The proposed model is $y = \beta_0 + \beta_1(\text{Odometer}) + \beta_2 I_1 + \beta_3 I_2 + \varepsilon$

• The data: enter the data in Excel as usual.

[Table: auction data, with rows marked as white color, other color, and silver color cars]

Example: Auction Car Price (II): The Regression Equation

From Excel we get the regression equation

$$\text{PRICE} = 16.837 - .0591(\text{Odometer}) + .0911\,I_1 + .3304\,I_2$$

• The equation for a white color car: Price = 16.837 - .0591(Odometer) + .0911(1) + .3304(0) = 16.928 - .0591(Odometer)

• The equation for a silver color car: Price = 16.837 - .0591(Odometer) + .0911(0) + .3304(1) = 17.167 - .0591(Odometer)

• The equation for an “other color” car: Price = 16.837 - .0591(Odometer) + .0911(0) + .3304(0) = 16.837 - .0591(Odometer)

[Figure: Price against Odometer, showing three parallel lines with intercepts 17.167 (silver), 16.928 (white), and 16.837 (other colors)]

Example: Auction Car Price (II): The Regression Equation

Interpreting the equation

From Excel we get the regression equation

$$\text{PRICE} = 16.837 - .0591(\text{Odometer}) + .0911\,I_1 + .3304\,I_2$$

• For one additional mile the auction price decreases by 5.91 cents on average.

• A white car sells, on average, for \$91.1 more than a car of the “Other color” category.

• A silver color car sells, on average, for \$330.4 more than a car of the “Other color” category.
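The color premiums can be read off by evaluating the regression equation for each category. The sketch below uses the equation as first stated (intercept 16.837) and assumes, from the cents-per-mile and dollar interpretations, that price is in thousands of dollars and the odometer reading in thousands of miles. The function name and the odometer value of 40 are hypothetical.

```python
def auction_price(odometer, color):
    """Predicted auction price (assumed thousands of dollars).

    odometer is assumed to be in thousands of miles.
    I1 flags white cars and I2 flags silver cars; any other color is the baseline.
    """
    i1 = 1 if color == "white" else 0
    i2 = 1 if color == "silver" else 0
    return 16.837 - 0.0591 * odometer + 0.0911 * i1 + 0.3304 * i2

# Compare cars with the same odometer reading so only the color differs.
premium_white = auction_price(40, "white") - auction_price(40, "other")
premium_silver = auction_price(40, "silver") - auction_price(40, "other")

# Intercepts of the three parallel lines (odometer = 0).
intercepts = {c: round(auction_price(0, c), 3) for c in ("white", "silver", "other")}
```

The premiums do not depend on the odometer value chosen, because the indicator variables shift the intercept and leave the slope unchanged.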

• There is insufficient evidence to infer that a white color car and a car of “other color” sell for a different auction price.

• There is sufficient evidence to infer that a silver color car sells for a larger price than a car of the “other color” category.

Qualitative Independent Variables; Example: MBA Program Admission (II)

• Recall: The Dean wanted to evaluate applications for the MBA program by predicting future performance of the applicants.

• The following three predictors were suggested:

• GMAT score

• Years of work experience

• It is now believed that the type of undergraduate degree should be included in the model.

Note: The undergraduate degree is qualitative.

• $I_1$ = 1 if B.A.; 0 otherwise.

• $I_2$ = 1 if B.B.A.; 0 otherwise.

• $I_3$ = 1 if B.Sc. or B.Eng.; 0 otherwise.

• The category “Other group” is defined by $I_1 = 0$; $I_2 = 0$; $I_3 = 0$.

Human Resources Management: Pay-Equity

• Pay-equity can be handled in two different forms:
  • Equal pay for equal work
  • Equal pay for work of equal value

• Regression analysis is extensively employed in cases of equal pay for equal work.

• Example 3

• Is there sex discrimination against female managers in a large firm?

• A random sample of 100 managers was selected and data were collected as follows:

• Annual salary

• Years of education

• Years of experience

• Gender


• Solution

• Construct the following multiple regression model: $y = \beta_0 + \beta_1\,\text{Education} + \beta_2\,\text{Experience} + \beta_3\,\text{Gender} + \varepsilon$

• Note the nature of the variables:

• Education – quantitative

• Experience – quantitative

• Gender – qualitative (Gender = 1 if male; =0 otherwise).

• Solution – Continued

• Analysis and Interpretation

• The model fits the data quite well.

• The model is very useful.

• Experience is a variable strongly related to salary.

• There is no evidence of sex discrimination.

• Further studying the data we find:
  • Average experience for women is 12 years; average experience for men is 17 years.
  • Average salary for female managers is \$76,189; average salary for male managers is \$97,832.

Review problems