# Statistics for Business and Economics - PowerPoint PPT Presentation

Statistics for Business and Economics. Chapter 10: Simple Linear Regression. Learning objectives: describe the linear regression model; state the regression modeling steps; explain least squares; compute regression coefficients; explain correlation; predict the response variable.

## PowerPoint Slideshow about ' Statistics for Business and Economics' - rinah-hicks

### Statistics for Business and Economics

Chapter 10

Simple Linear Regression

• Describe the Linear Regression Model

• State the Regression Modeling Steps

• Explain Least Squares

• Compute Regression Coefficients

• Explain Correlation

• Predict Response Variable

• 1. Method of modeling relationships between a response variable Y and one or more predictors X

• 2. A way of “fitting a line through data”

Example 1: Store Site Selection

Model sales Y at existing sites as a function of demographic variables:

X1 = Population in store vicinity

X2 = Income in area

X3 = Age of houses in area

X4 =

X5 =

From equation, predict sales at new sites

Example 2: Marketing Research

Model consumer response to a product on basis of product characteristics:

Y = Taste score on soft drink

X1 = Sugar level

X2 = Carbonation level

X3 =

X4 =

Example 3: Operations/Quality

Model product quality in plant as a function of manufacturing process characteristics:

Y = Quality score of sheet metal

X1 = Raw material purity score

X2 = Molten aluminum temp

X3 = Line speed

X4 =

Example 4: Real Estate Pricing

Y = Selling price of houses

X1 = Square feet

X2 = Taxes

X3 = Lot acreage

X4 =

X5 =

X6 =

25 Homes Sold in Essex County, New Jersey, 1996

Regression Equation: Y= -39001 + 85.0 X

• Answers ‘What is the relationship between the variables?’

• Equation used

• One numerical dependent (response) variable

• What is to be predicted

• One or more numerical or categorical independent (explanatory) variables

• Used mainly for prediction and estimation

Y= -39001 + 85.0(4000)= \$300,999
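The prediction above can be reproduced in a few lines of Python. This is a minimal sketch: the function name `predict_price` is hypothetical, and it assumes X is the home's size in square feet and Y its selling price in dollars, which the slide does not state explicitly.

```python
# Fitted line from the slide: Y = -39001 + 85.0 X.
# Assumption: X is square feet and Y is selling price in dollars.
INTERCEPT = -39001.0
SLOPE = 85.0


def predict_price(x):
    """Return the predicted Y for a given X using the fitted line."""
    return INTERCEPT + SLOPE * x


predicted = predict_price(4000)
print(f"Predicted price for X = 4000: ${predicted:,.0f}")
```

Plugging in any other X works the same way, although predictions far outside the range of the 25 observed homes should be treated with caution.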

• Hypothesize deterministic component

• Estimate unknown model parameters

• Specify probability distribution of random error term

• Estimate standard deviation of error

• Evaluate model

• Use model for prediction and estimation

• Define variables

• Empirical (e.g., List price, regular price)

• Measurement (e.g., \$, Units)

• Hypothesize nature of relationship

• Expected effects (i.e., Coefficients’ signs)

• Functional form (linear or non-linear)

• Interactions

Model Specification Is Based on Theory

• Theory of field (e.g., Sociology)

• Mathematical theory

• Previous research

• ‘Common sense’

Thinking Challenge: Which Is More Logical?

[Figure: four candidate plots, each with Sales on the vertical axis]

Types of Regression Models

• Simple (1 explanatory variable): linear or non-linear

• Multiple (2+ explanatory variables): linear or non-linear

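Relationship between variables is a linear function

$$y = \beta_0 + \beta_1 x + \varepsilon$$

where $y$ is the dependent (response) variable, $x$ is the independent (explanatory) variable, $\beta_0$ is the population y-intercept, $\beta_1$ is the population slope, and $\varepsilon$ is the random error.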

[Figure: the line of means E(y) = β0 + β1x plotted against x, with β0 = y-intercept and β1 = slope = (change in y) / (change in x)]

High school teacher

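• 1. Relationship between variables is a linear function

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

where $Y_i$ is the dependent (response) variable, $X_i$ is the independent (explanatory) variable, $\beta_0$ is the population Y-intercept, $\beta_1$ is the population slope, and $\varepsilon_i$ is the random error.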

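[Figure: Y against X with observed values scattered around the regression line; $\varepsilon_i$ = random error is the vertical distance from an observed value to the line]

Sample Linear Regression Model

[Figure: Y against X showing observed values and the true unknown line]

[Figure: Y against X showing observed values and the fitted line; $\hat{\varepsilon}_i$ = random error is the vertical distance from an observed value to the fitted line]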

Scattergram

[Figure: scattergram with x and y axes each running from 0 to 60]

• Plot of all (xi, yi) pairs

• Suggests how well model will fit

Thinking Challenge

[Figure: scattergram with x and y axes each running from 0 to 60]

• How would you draw a line through the points?

• How do you determine which line ‘fits best’?

• ‘Best fit’ means the differences between actual y values and predicted y values are a minimum

• But positive differences offset negative ones, so the differences cannot simply be summed

• Least squares minimizes the sum of the squared differences (SSE)
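The offsetting problem is easy to see numerically. The sketch below borrows the five Hasbro data points used later in the deck and compares two candidate lines; the helper names `errors` and `sse` are hypothetical, and the flat line at y = 2 is chosen only for illustration.

```python
# Hasbro example data (Ad $, Sales in units), taken from later in the deck.
ADS = [1, 2, 3, 4, 5]
SALES = [1, 1, 2, 2, 4]


def errors(b0, b1, xs, ys):
    """Differences between actual y and predicted y for the line b0 + b1*x."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


def sse(b0, b1, xs, ys):
    """Sum of squared differences (SSE) for the line b0 + b1*x."""
    return sum(e ** 2 for e in errors(b0, b1, xs, ys))


# Candidate 1: a flat line at y = 2 (illustrative assumption).
# Candidate 2: the least squares line reported in the deck.
for b0, b1 in [(2.0, 0.0), (-0.1, 0.7)]:
    total = sum(errors(b0, b1, ADS, SALES))
    print(f"b0={b0:+.1f}, b1={b1:+.1f}: "
          f"sum of errors={total:+.2f}, SSE={sse(b0, b1, ADS, SALES):.2f}")
```

Both lines have errors that sum to roughly zero, so the plain sum cannot tell them apart. The SSE does: it is much smaller for the second line.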

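[Figure: four points scattered around the fitted line in the x-y plane, with residuals $\hat{\varepsilon}_1$, $\hat{\varepsilon}_2$, $\hat{\varepsilon}_3$, $\hat{\varepsilon}_4$ drawn as vertical distances from each point to the line]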

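Prediction Equation

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$$

Slope

$$\hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{\sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}}{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}$$

y-intercept

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$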

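| $x_i$ | $y_i$ | $x_i^2$ | $y_i^2$ | $x_i y_i$ |
|---|---|---|---|---|
| $x_1$ | $y_1$ | $x_1^2$ | $y_1^2$ | $x_1 y_1$ |
| $x_2$ | $y_2$ | $x_2^2$ | $y_2^2$ | $x_2 y_2$ |
| : | : | : | : | : |
| $x_n$ | $y_n$ | $x_n^2$ | $y_n^2$ | $x_n y_n$ |
| $\sum x_i$ | $\sum y_i$ | $\sum x_i^2$ | $\sum y_i^2$ | $\sum x_i y_i$ |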

Interpretation of Coefficients

• Slope ($\hat{\beta}_1$)

• Estimated y changes by $\hat{\beta}_1$ for each 1 unit increase in x

• If $\hat{\beta}_1$ = 2, then Sales (y) is expected to increase by 2 for each 1 unit increase in Advertising (x)

• Y-Intercept ($\hat{\beta}_0$)

• Average value of y when x = 0

• If $\hat{\beta}_0$ = 4, then Average Sales (y) is expected to be 4 when Advertising (x) is 0

You’re a marketing analyst for Hasbro Toys. You gather the following data:

| Ad \$ | Sales (Units) |
|---|---|
| 1 | 1 |
| 2 | 1 |
| 3 | 2 |
| 4 | 2 |
| 5 | 4 |

Find the least squares line relating sales and advertising.

[Figure: scattergram of Sales (vertical axis, 0 to 4) against advertising (horizontal axis, 0 to 5)]

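| $x_i$ | $y_i$ | $x_i^2$ | $y_i^2$ | $x_i y_i$ |
|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 1 |
| 2 | 1 | 4 | 1 | 2 |
| 3 | 2 | 9 | 4 | 6 |
| 4 | 2 | 16 | 4 | 8 |
| 5 | 4 | 25 | 16 | 20 |
| $\sum x_i = 15$ | $\sum y_i = 10$ | $\sum x_i^2 = 55$ | $\sum y_i^2 = 26$ | $\sum x_i y_i = 37$ |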
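The column totals are all that is needed to get the coefficients. The sketch below assumes the usual least squares shortcut formulas, slope = SSxy / SSxx and intercept = ȳ − slope · x̄, where `ss_xy` and `ss_xx` are shorthand names introduced here for the corrected sums.

```python
# Column totals from the computation table for the Hasbro data.
n = 5
sum_x, sum_y = 15, 10
sum_x2, sum_xy = 55, 37

# Corrected sums of cross-products and squares.
ss_xy = sum_xy - sum_x * sum_y / n   # 37 - 30 = 7
ss_xx = sum_x2 - sum_x ** 2 / n      # 55 - 45 = 10

slope = ss_xy / ss_xx                        # estimate of beta_1
intercept = sum_y / n - slope * sum_x / n    # estimate of beta_0

print(f"slope = {slope:.4f}, intercept = {intercept:.4f}")
```

The result agrees with the estimates reported in the computer output for this example (0.7000 and -0.1000).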

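Parameter Estimation Computer Output

| Variable | DF | Parameter Estimate | Standard Error | T for H0: Param=0 | Prob>\|T\| |
|---|---|---|---|---|---|
| INTERCEP | 1 | -0.1000 | 0.6350 | -0.157 | 0.8849 |
| ADVERT | 1 | 0.7000 | 0.1914 | 3.656 | 0.0354 |

The INTERCEP estimate is $\hat{\beta}_0$ and the ADVERT estimate is $\hat{\beta}_1$.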

Coefficient Interpretation Solution

• Slope ($\hat{\beta}_1$)

• Sales Volume (y) is expected to increase by .7 units for each \$1 increase in Advertising (x)

• Y-Intercept ($\hat{\beta}_0$)

• Average value of Sales Volume (y) is -.10 units when Advertising (x) is 0

• Difficult to explain to marketing manager

• Expect some sales without advertising

• Mean of probability distribution of error, ε, is 0

• Probability distribution of error has constant variance

• Probability distribution of error, ε, is normal

• Errors are independent

Error Probability Distribution

[Figure: the probability distribution f(ε) of the error drawn at X1 and X2 along the X axis, in the X-Y plane]

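Recall that:

$$s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}$$

where $\bar{X}$ is our estimator of the mean.

Now substitute (1) $Y_i$ for $X_i$, (2) $\hat{Y}_i$ for $\bar{X}$, and n-2 df for n-1:

$$s^2 = \frac{\sum (Y_i - \hat{Y}_i)^2}{n - 2} = \frac{SSE}{n - 2}$$

1. Two statistics are estimated to compute the regression line, $\hat{\beta}_0$ and $\hat{\beta}_1$

2. As a result, if I know any n-2 residuals, the other two can be computed from $\sum \hat{\varepsilon}_i = 0$ and $\sum \hat{\varepsilon}_i X_i = 0$.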
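As a check on the n-2 degrees of freedom argument, the sketch below computes the residuals for the Hasbro example, confirms the two constraints they satisfy, and estimates the standard deviation of the error. It assumes s² = SSE / (n − 2); the variable names are illustrative.

```python
import math

# Hasbro example data and the least squares estimates from the deck.
ADS = [1, 2, 3, 4, 5]
SALES = [1, 1, 2, 2, 4]
B0, B1 = -0.1, 0.7

# Residuals: observed y minus fitted y.
residuals = [y - (B0 + B1 * x) for x, y in zip(ADS, SALES)]

# The two constraints that cost two degrees of freedom.
sum_e = sum(residuals)
sum_ex = sum(e * x for e, x in zip(residuals, ADS))

# Estimated variance and standard deviation of the error.
n = len(ADS)
sse = sum(e ** 2 for e in residuals)
s_squared = sse / (n - 2)
s = math.sqrt(s_squared)

print(f"sum e = {sum_e:.6f}, sum e*x = {sum_ex:.6f}")
print(f"SSE = {sse:.4f}, s^2 = {s_squared:.4f}, s = {s:.4f}")
```

The value of s matches the .6055 quoted in the later Hasbro slides.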

• Shows if there is a linear relationship between x and y

• Involves population slope β1

• Hypotheses

• H0: β1 = 0 (No Linear Relationship)

• Ha: β1 ≠ 0 (Linear Relationship)

• Theoretical basis is sampling distribution of slope

Sampling Distribution of Sample Slopes

[Figure: Sample 1 line, Sample 2 line, and the population line plotted in the x-y plane]

All Possible Sample Slopes

Sample 1: 2.5

Sample 2: 1.6

Sample 3: 1.8

Sample 4: 2.1

: (very large number of sample slopes)

[Figure: sampling distribution of $\hat{\beta}_1$, centered at $\beta_1$ with standard error $S_{\hat{\beta}_1}$]

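Slope Coefficient Test Statistic

$$t = \frac{\hat{\beta}_1}{S_{\hat{\beta}_1}}, \quad S_{\hat{\beta}_1} = \frac{s}{\sqrt{SS_{xx}}}, \quad df = n - 2$$

where $SS_{xx} = \sum x_i^2 - \frac{(\sum x_i)^2}{n}$.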

You’re a marketing analyst for Hasbro Toys.

You find $\hat{\beta}_0$ = –.1, $\hat{\beta}_1$ = .7 and s = .6055.

| Ad \$ | Sales (Units) |
|---|---|
| 1 | 1 |
| 2 | 1 |
| 3 | 2 |
| 4 | 2 |
| 5 | 4 |

Is the relationship significant at the .05 level of significance?

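JMP Fit Model Results

[Figure: JMP Fit Model output annotated with the standard error $S_{\hat{\beta}_1}$, the test statistic $t = \hat{\beta}_1 / S_{\hat{\beta}_1}$, and the P-value]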
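The test can also be worked by hand. The sketch below assumes the standard error of the slope is s / sqrt(SSxx) and uses the two-tailed critical value 3.182 from a t table (α = .05, df = 3) instead of computing a p-value; the variable names are illustrative.

```python
import math

# Hasbro example inputs from the slides.
ADS = [1, 2, 3, 4, 5]
B1_HAT = 0.7     # estimated slope
S = 0.6055       # estimated standard deviation of error
T_CRIT = 3.182   # t-table value for alpha/2 = .025 with df = n - 2 = 3

n = len(ADS)
x_bar = sum(ADS) / n
ss_xx = sum((x - x_bar) ** 2 for x in ADS)

se_b1 = S / math.sqrt(ss_xx)   # standard error of the slope
t_stat = B1_HAT / se_b1        # test statistic for H0: beta_1 = 0
reject_h0 = abs(t_stat) > T_CRIT

print(f"SE(slope) = {se_b1:.4f}, t = {t_stat:.3f}, reject H0: {reject_h0}")
```

The computed t agrees with the 3.656 shown in the parameter estimation output, and it exceeds the critical value, so H0 is rejected at the .05 level. This is consistent with the reported p-value of 0.0354.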

A (1 − α)100% confidence interval for the true slope parameter β1 is given by:

$$\hat{\beta}_1 \pm t_{\alpha/2}\, S_{\hat{\beta}_1}, \quad df = n - 2$$

This assumes that all of the standard regression assumptions are met!

Example from previous slide:
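For the Hasbro data, a 95% interval can be sketched as follows. This assumes the usual form, estimate ± t × standard error with n − 2 = 3 degrees of freedom, takes the standard error 0.1914 from the computer output, and uses the t-table value 3.182; it is an illustration, not the slide's own worked solution.

```python
# Inputs from the Hasbro example.
B1_HAT = 0.7     # estimated slope
SE_B1 = 0.1914   # standard error of the slope (computer output)
T_CRIT = 3.182   # t-table value for alpha/2 = .025, df = 3

half_width = T_CRIT * SE_B1
lower = B1_HAT - half_width
upper = B1_HAT + half_width

print(f"95% CI for the slope: ({lower:.3f}, {upper:.3f})")
```

The interval lies entirely above zero, which is consistent with rejecting H0: β1 = 0 at the .05 level.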

Proportion of variation ‘explained’ by relationship between x and y

0 ≤ r² ≤ 1

r² = (coefficient of correlation)²

[Figure: four example scattergrams with r² = 1, r² = 1, r² = .8, and r² = 0]

You’re a marketing analyst for Hasbro Toys. You know r = .904.

| Ad \$ | Sales (Units) |
|---|---|
| 1 | 1 |
| 2 | 1 |
| 3 | 2 |
| 4 | 2 |
| 5 | 4 |

Calculate and interpret the coefficient of determination.

r² = (coefficient of correlation)²

r² = (.904)²

r² = .817

Interpretation: About 81.7% of the sample variation in Sales (y) can be explained by using Ad \$ (x) to predict Sales (y) in the linear model.
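The given value of r can be recovered from the raw data. The sketch below assumes the Pearson formula r = SSxy / sqrt(SSxx · SSyy), with `ss_xy`, `ss_xx`, and `ss_yy` as shorthand names for the corrected sums.

```python
import math

# Hasbro example data.
ADS = [1, 2, 3, 4, 5]
SALES = [1, 1, 2, 2, 4]

n = len(ADS)
x_bar = sum(ADS) / n
y_bar = sum(SALES) / n

# Corrected sums of squares and cross-products.
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ADS, SALES))
ss_xx = sum((x - x_bar) ** 2 for x in ADS)
ss_yy = sum((y - y_bar) ** 2 for y in SALES)

r = ss_xy / math.sqrt(ss_xx * ss_yy)   # coefficient of correlation
r_squared = r ** 2                     # coefficient of determination

print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")
```

Rounded to three decimals, this reproduces r = .904 and r² = .817.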

r² Computer Output

Dep Mean 2.00000 Adj R-sq 0.7556

C.V. 30.27650

[Annotations on the output: r²; r² adjusted for number of explanatory variables & sample size (Adj R-sq)]

JMP Fit Model Results

[Figure: JMP Fit Model output with the same annotations: r²; r² adjusted for number of explanatory variables & sample size]

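1. Answer ‘How strong is the linear relationship between 2 variables?’

2. Coefficient of correlation used

Population coefficient denoted ρ (rho)

Values range from -1 to +1

Measures degree of association

3. Used mainly for understanding

Sample Coefficient of Correlation

1. Pearson product moment coefficient of correlation, r

$$r = \frac{SS_{xy}}{\sqrt{SS_{xx}\, SS_{yy}}}$$

where $SS_{xy} = \sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}$, $SS_{xx} = \sum x_i^2 - \frac{(\sum x_i)^2}{n}$, and $SS_{yy} = \sum y_i^2 - \frac{(\sum y_i)^2}{n}$.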

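[Figure: scale of r running from -1.0 (perfect negative correlation) through -.5, 0 (no correlation), and +.5 to +1.0 (perfect positive correlation)]

Coefficient of Correlation Examples

[Figure: four example scattergrams with r = 1, r = -1, r = .89, and r = 0]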

• Types of predictions

• Point estimates

• Interval estimates

• What is predicted

• Population mean response E(y) for given x

• Point on population regression line

• Individual response (yi) for given x

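[Figure: at x = xp, the fitted line $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x$ gives the prediction $\hat{y}$, shown alongside the mean response on the population line E(y) = β0 + β1x and an individual y]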

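A (1 − α)100% confidence interval for the mean value of y at x = xp:

$$\hat{y} \pm t_{\alpha/2}\, s \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}}, \quad df = n - 2$$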

Factors Affecting Interval Width

• Level of confidence (1 – α)

• Width increases as confidence increases

• Data dispersion (s)

• Width increases as variation increases

• Sample size

• Width decreases as sample size increases

• Distance of xp from mean $\bar{x}$

• Width increases as distance increases
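The last factor can be checked numerically with the Hasbro data. The sketch below assumes the standard error of the estimated mean response is s · sqrt(1/n + (xp − x̄)² / SSxx); `std_err_mean` is a hypothetical helper name.

```python
import math

# Hasbro example inputs.
ADS = [1, 2, 3, 4, 5]
S = 0.6055  # estimated standard deviation of error


def std_err_mean(xp, xs, s):
    """Standard error of the estimated mean response at x = xp."""
    n = len(xs)
    x_bar = sum(xs) / n
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    return s * math.sqrt(1 / n + (xp - x_bar) ** 2 / ss_xx)


# The mean of x is 3, so the standard error is smallest at xp = 3.
for xp in (3, 4, 5):
    print(f"xp = {xp}: standard error = {std_err_mean(xp, ADS, S):.3f}")
```

Since the interval half-width is the t value times this standard error, the interval widens as xp moves away from the mean of x.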

You’re a marketing analyst for Hasbro Toys.

You find $\hat{\beta}_0$ = -.1, $\hat{\beta}_1$ = .7 and s = .6055.

| Ad \$ | Sales (Units) |
|---|---|
| 1 | 1 |
| 2 | 1 |
| 3 | 2 |
| 4 | 2 |
| 5 | 4 |

Find a 95% confidence interval for the mean sales when advertising is \$4.
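One way to work this exercise is sketched below. It assumes the interval ŷ ± t · s · sqrt(1/n + (xp − x̄)² / SSxx) with df = n − 2 = 3 and the t-table value 3.182; the variable names are illustrative, and this is not presented as the slide's own solution.

```python
import math

# Hasbro example inputs.
ADS = [1, 2, 3, 4, 5]
B0_HAT, B1_HAT = -0.1, 0.7
S = 0.6055
T_CRIT = 3.182   # t-table value for alpha/2 = .025, df = 3
XP = 4           # advertising level of interest

n = len(ADS)
x_bar = sum(ADS) / n
ss_xx = sum((x - x_bar) ** 2 for x in ADS)

y_hat = B0_HAT + B1_HAT * XP   # point estimate of mean sales
se_mean = S * math.sqrt(1 / n + (XP - x_bar) ** 2 / ss_xx)

ci_lower = y_hat - T_CRIT * se_mean
ci_upper = y_hat + T_CRIT * se_mean

print(f"y_hat = {y_hat:.3f}, 95% CI for mean sales: "
      f"({ci_lower:.3f}, {ci_upper:.3f})")
```

The bounds agree, up to rounding, with the row for observation 4 in the interval estimate computer output (1.644 to 3.755).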

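Prediction Interval of Individual Value of y at x = xp

$$\hat{y} \pm t_{\alpha/2}\, s \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}}, \quad df = n - 2$$

where $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_p$.

Note! The leading 1 under the square root is the extra term that the confidence interval for the mean does not have.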
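For comparison with the confidence interval for the mean, the sketch below computes a 95% prediction interval for an individual sales value at advertising of \$4. It assumes the interval ŷ ± t · s · sqrt(1 + 1/n + (xp − x̄)² / SSxx), with the same t-table value 3.182 for df = 3; the variable names are illustrative.

```python
import math

# Hasbro example inputs.
ADS = [1, 2, 3, 4, 5]
B0_HAT, B1_HAT = -0.1, 0.7
S = 0.6055
T_CRIT = 3.182   # t-table value for alpha/2 = .025, df = 3
XP = 4

n = len(ADS)
x_bar = sum(ADS) / n
ss_xx = sum((x - x_bar) ** 2 for x in ADS)

y_hat = B0_HAT + B1_HAT * XP
# The extra 1 accounts for the random error of the individual observation.
se_pred = S * math.sqrt(1 + 1 / n + (XP - x_bar) ** 2 / ss_xx)

pi_lower = y_hat - T_CRIT * se_pred
pi_upper = y_hat + T_CRIT * se_pred

print(f"95% prediction interval at x = {XP}: ({pi_lower:.3f}, {pi_upper:.3f})")
```

The interval is noticeably wider than the one for the mean response at the same x, and it agrees, up to rounding, with the prediction bounds listed for observation 4 in the computer output (0.502 to 4.897).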

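Why the Extra ‘S’?

[Figure: at x = xp, the y we're trying to predict differs by the error ε from the expected (mean) y on the line E(y) = β0 + β1x, which in turn differs from the prediction $\hat{y}$ on the fitted line]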

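Interval Estimate Computer Output

| Obs | SALES | Predicted Value | Std Err of Predicted Mean ($S_{\hat{y}}$) | Lower 95% Mean | Upper 95% Mean | Lower 95% Predict | Upper 95% Predict |
|---|---|---|---|---|---|---|---|
| 1 | 1.000 | 0.600 | 0.469 | -0.892 | 2.092 | -1.837 | 3.037 |
| 2 | 1.000 | 1.300 | 0.332 | 0.244 | 2.355 | -0.897 | 3.497 |
| 3 | 2.000 | 2.000 | 0.271 | 1.138 | 2.861 | -0.111 | 4.111 |
| 4 | 2.000 | 2.700 | 0.332 | 1.644 | 3.755 | 0.502 | 4.897 |
| 5 | 4.000 | 3.400 | 0.469 | 1.907 | 4.892 | 0.962 | 5.837 |

Row 4 is the predicted y when x = 4: the Mean columns give the confidence interval and the Predict columns give the prediction interval.

Confidence Intervals v. Prediction Intervals

[Figure: confidence interval and prediction interval bands around the fitted line $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$, plotted against x]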

• Described the Linear Regression Model

• Stated the Regression Modeling Steps

• Explained Least Squares

• Computed Regression Coefficients

• Used the model for prediction and estimation

• Evaluated model on the basis of significance, R-square, and size of prediction intervals

• Discussed R-square and correlation coefficient