Statistics for business and economics

Statistics for Business and Economics - PowerPoint PPT Presentation



Statistics for Business and Economics. Chapter 10 Simple Linear Regression. Learning Objectives. Describe the Linear Regression Model State the Regression Modeling Steps Explain Least Squares Compute Regression Coefficients Explain Correlation Predict Response Variable.

Presentation Transcript
Statistics for business and economics

Statistics for Business and Economics

Chapter 10

Simple Linear Regression


Learning objectives
Learning Objectives

  • Describe the Linear Regression Model

  • State the Regression Modeling Steps

  • Explain Least Squares

  • Compute Regression Coefficients

  • Explain Correlation

  • Predict Response Variable


What is regression
What is Regression?

  • 1. Method of modeling relationships between a response variable Y and one or more predictors X

  • 2. A way of “fitting a line through data”


Example 1 store site selection
Example 1: Store Site Selection

Model sales Y at existing sites as a function of demographic variables:

X1 = Population in store vicinity

X2 = Income in area

X3 = Age of houses in area

X4 =

X5 =

From equation, predict sales at new sites


Example 2 marketing research
Example 2: Marketing Research

Model consumer response to a product on basis of product characteristics:

Y = Taste score on soft drink

X1 = Sugar level

X2 = Carbonation level

X3 =

X4 =


Example 3 operations quality
Example 3: Operations/Quality

Model product quality in plant as a function of manufacturing process characteristics:

Y = Quality score of sheet metal

X1 = Raw material purity score

X2 = Molten aluminum temp

X3 = Line speed

X4 =


Example 4 real estate pricing
Example 4: Real Estate Pricing

Y = Selling price of houses

X1 = Square feet

X2 = Taxes

X3 = Lot acreage

X4 =

X5 =

X6 =


One predictor regression
One-Predictor Regression

25 Homes Sold in Essex County, New Jersey, 1996


One predictor regression1
One-Predictor Regression

Regression Equation: Y= -39001 + 85.0 X


Regression models
Regression Models

  • Answers ‘What is the relationship between the variables?’

  • Equation used

    • One numerical dependent (response) variable

      • What is to be predicted

    • One or more numerical or categorical independent (explanatory) variables

  • Used mainly for prediction and estimation


Prediction using regression
Prediction Using Regression

Y= -39001 + 85.0(4000)= $300,999
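
As a minimal sketch, the point prediction above can be reproduced in a few lines of Python. The helper name `predict_price` is hypothetical, and reading X as square footage is an assumption (the deck lists square feet as X1 in Example 4 but does not label X here).

```python
# Fitted one-predictor equation from the slide: Y = -39001 + 85.0 X
INTERCEPT = -39001.0
SLOPE = 85.0


def predict_price(x):
    """Return the fitted Y for a given X (hypothetical helper name)."""
    return INTERCEPT + SLOPE * x


# X = 4000, assumed here to mean 4000 square feet
print(f"${predict_price(4000):,.0f}")  # $300,999
```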


Regression modeling steps
Regression Modeling Steps

  • Hypothesize deterministic component

  • Estimate unknown model parameters

  • Specify probability distribution of random error term

    • Estimate standard deviation of error

  • Evaluate model

  • Use model for prediction and estimation


Specifying the model
Specifying the Model

  • Define variables

    • Conceptual (e.g., Advertising, price)

    • Empirical (e.g., List price, regular price)

    • Measurement (e.g., $, Units)

  • Hypothesize nature of relationship

    • Expected effects (i.e., Coefficients’ signs)

    • Functional form (linear or non-linear)

    • Interactions


Model specification is based on theory
Model Specification Is Based on Theory

  • Theory of field (e.g., Sociology)

  • Mathematical theory

  • Previous research

  • ‘Common sense’


Thinking challenge which is more logical
Thinking Challenge: Which Is More Logical?

[Figure: four candidate plots, each with Sales on the vertical axis and Advertising on the horizontal axis]


Types of regression models
Types of Regression Models

  • Regression Models

    • Simple (1 Explanatory Variable)

      • Linear

      • Non-Linear

    • Multiple (2+ Explanatory Variables)

      • Linear

      • Non-Linear


Linear regression model
Linear Regression Model

Relationship between variables is a linear function

$y = \beta_0 + \beta_1 x + \varepsilon$

  • y = Dependent (Response) Variable

  • x = Independent (Explanatory) Variable

  • $\beta_0$ = Population y-intercept

  • $\beta_1$ = Population Slope

  • $\varepsilon$ = Random Error


Line of means
Line of Means

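[Figure: line of means plotted with y on the vertical axis and x on the horizontal axis]

$E(y) = \beta_0 + \beta_1 x$ (line of means)

$\beta_1$ = Slope = (Change in y) / (Change in x)

$\beta_0$ = y-intercept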

High school teacher


Linear regression model1
Linear Regression Model

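  • 1. Relationship between variables is a linear function

$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

  • $Y_i$ = Dependent (response) variable

  • $X_i$ = Independent (explanatory) variable

  • $\beta_0$ = Population Y-intercept

  • $\beta_1$ = Population slope

  • $\varepsilon_i$ = Random error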


Population linear regression model
Population Linear Regression Model

[Figure: population regression line with observed values marked; $\varepsilon_i$ = Random error]


Sample linear regression model
Sample Linear Regression Model

[Figure: observed values plotted on X and Y axes around the true unknown line]


Sample linear regression model1
Sample Linear Regression Model

[Figure: observed values plotted on X and Y axes; $\hat{\varepsilon}_i$ = Random error]


Scattergram
Scattergram

[Figure: scattergram with x and y axes each running from 0 to 60]

  • Plot of all (xi, yi) pairs

  • Suggests how well model will fit


Thinking challenge
Thinking Challenge

[Figure: scattergram with x and y axes each running from 0 to 60]

  • How would you draw a line through the points?

  • How do you determine which line ‘fits best’?


Least squares
Least Squares

  • ‘Best fit’ means the differences between actual y values and predicted y values are a minimum

    • But positive differences offset negative ones

  • Least Squares minimizes the Sum of the Squared Differences (SSE)
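
The point about offsetting differences can be sketched in Python. The five (x, y) pairs are the ones used in the deck's later advertising example; the function names `residuals` and `sse` are illustrative, not from the slides.

```python
# Five (x, y) pairs taken from the deck's later advertising example
xs = [1, 2, 3, 4, 5]
ys = [1, 1, 2, 2, 4]


def residuals(b0, b1):
    """Differences between actual y and the line b0 + b1 * x."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


def sse(b0, b1):
    """Sum of the squared differences for a candidate line."""
    return sum(e ** 2 for e in residuals(b0, b1))


# A flat line at y = 2 fits poorly, yet its raw differences cancel out
print(sum(residuals(2.0, 0.0)))   # 0.0
print(sse(2.0, 0.0))              # 6.0

# The line y = -0.1 + 0.7x has a much smaller SSE
print(round(sse(-0.1, 0.7), 2))   # 1.1
```

Squaring removes the sign, so a poor line can no longer hide behind cancelling errors.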


Least squares graphically
Least Squares Graphically

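[Figure: fitted line on x and y axes, with residuals $\hat{\varepsilon}_1$, $\hat{\varepsilon}_2$, $\hat{\varepsilon}_3$, $\hat{\varepsilon}_4$ marked for four data points]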


Coefficient equations
Coefficient Equations

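Prediction Equation: $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$

Slope: $\hat{\beta}_1 = \dfrac{\sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}}{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}$

y-intercept: $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$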


Computation table
Computation Table

  xi      yi      xi²      yi²      xiyi
  x1      y1      x1²      y1²      x1y1
  x2      y2      x2²      y2²      x2y2
  :       :       :        :        :
  xn      yn      xn²      yn²      xnyn
  Σxi     Σyi     Σxi²     Σyi²     Σxiyi


Interpretation of coefficients
Interpretation of Coefficients

  • Slope ($\hat{\beta}_1$)

    • Estimated y changes by $\hat{\beta}_1$ for each 1-unit increase in x

      • If $\hat{\beta}_1$ = 2, then Sales (y) is expected to increase by 2 for each 1-unit increase in Advertising (x)

  • Y-Intercept ($\hat{\beta}_0$)

    • Average value of y when x = 0

      • If $\hat{\beta}_0$ = 4, then Average Sales (y) is expected to be 4 when Advertising (x) is 0


Least squares example
Least Squares Example

You’re a marketing analyst for Hasbro Toys. You gather the following data:

  Ad $    Sales (Units)
  1       1
  2       1
  3       2
  4       2
  5       4

Find the least squares line relating sales and advertising.


Scattergram sales vs advertising
Scattergram Sales vs. Advertising

[Figure: scattergram of Sales (vertical axis, 0 to 4) against Advertising (horizontal axis, 0 to 5)]


Parameter estimation solution table
Parameter Estimation Solution Table

  xi    yi    xi²    yi²    xiyi
   1     1      1      1       1
   2     1      4      1       2
   3     2      9      4       6
   4     2     16      4       8
   5     4     25     16      20
  15    10     55     26      37   (column totals)
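
As a rough sketch, the coefficient estimates can be computed from the column totals in the table. The intermediate names `ss_xy` and `ss_xx` are shorthand introduced here, and the computational formulas are the standard least squares ones, assumed to be what the slide's missing equations showed.

```python
# Column totals from the solution table
n = 5
sum_x, sum_y = 15, 10
sum_x2, sum_xy = 55, 37

# Corrected sums of cross products and squares (names are assumptions)
ss_xy = sum_xy - sum_x * sum_y / n   # 37 - 30 = 7
ss_xx = sum_x2 - sum_x ** 2 / n      # 55 - 45 = 10

b1 = ss_xy / ss_xx                   # slope estimate
b0 = sum_y / n - b1 * sum_x / n      # intercept estimate: y-bar - b1 * x-bar

print(round(b1, 4), round(b0, 4))    # 0.7 -0.1
```

These values agree with the ADVERT and INTERCEP estimates in the computer output.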



Parameter estimation computer output
Parameter Estimation Computer Output

Parameter Estimates

                    Parameter    Standard    T for H0:
Variable    DF      Estimate     Error       Param=0      Prob>|T|
INTERCEP    1       -0.1000      0.6350      -0.157       0.8849
ADVERT      1        0.7000      0.1914       3.656       0.0354

The INTERCEP estimate is $\hat{\beta}_0$; the ADVERT estimate is $\hat{\beta}_1$.



Coefficient interpretation solution
Coefficient Interpretation Solution

  • Slope ($\hat{\beta}_1$)

    • Sales Volume (y) is expected to increase by .7 units for each $1 increase in Advertising (x)

  • Y-Intercept ($\hat{\beta}_0$)

    • Average value of Sales Volume (y) is -.10 units when Advertising (x) is 0

      • Difficult to explain to marketing manager

      • Expect some sales without advertising


Linear regression assumptions
Linear Regression Assumptions

  • Mean of probability distribution of error, ε, is 0

  • Probability distribution of error has constant variance

  • Probability distribution of error, ε, is normal

  • Errors are independent


Error probability distribution
Error Probability Distribution

[Figure: probability distribution of the error, f($\varepsilon$), drawn around the regression line at x = X1 and x = X2; axes X and Y]


Estimating s 2
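Estimating $\sigma^2$

Recall that:

$s^2 = \dfrac{\sum (X_i - \bar{X})^2}{n - 1}$

where $\bar{X}$ is our estimator of the mean.

Now substitute (1) $Y_i$ for $X_i$, (2) $\hat{Y}_i$ for $\bar{X}$, and n - 2 df for n - 1:

$s^2 = \dfrac{\sum (Y_i - \hat{Y}_i)^2}{n - 2}$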
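
A short sketch of this estimate for the advertising data, assuming the fitted line y = -0.1 + 0.7x reported in the deck. Variable names are illustrative.

```python
import math

xs = [1, 2, 3, 4, 5]
ys = [1, 1, 2, 2, 4]
n = len(xs)
b0, b1 = -0.1, 0.7  # least squares estimates reported in the deck

# Residuals: actual y minus fitted y
res = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

sse = sum(e ** 2 for e in res)  # sum of squared residuals
s2 = sse / (n - 2)              # divide by n - 2 degrees of freedom
s = math.sqrt(s2)               # estimated standard deviation of the error

print(round(sse, 2), round(s2, 4), round(s, 4))  # 1.1 0.3667 0.6055
```

The result matches the s = .6055 quoted in the later examples and the Root MSE in the computer output.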


Why are df n 2
Why are df = n - 2?

1. Two statistics are estimated to compute the regression line, $\hat{\beta}_0$ and $\hat{\beta}_1$

2. As a result, if I know any n-2 residuals, the other two can be computed from $\sum \hat{\varepsilon}_i = 0$ and $\sum \hat{\varepsilon}_i X_i = 0$.
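
The degrees-of-freedom argument can be checked numerically. This sketch uses the advertising data and the deck's fitted line, treats the first two residuals as unknown, and recovers them from the remaining n - 2 by solving the two constraints. All variable names are illustrative.

```python
xs = [1, 2, 3, 4, 5]
ys = [1, 1, 2, 2, 4]
b0, b1 = -0.1, 0.7  # least squares estimates reported in the deck

res = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# The two constraints hold (up to floating-point rounding)
print(abs(sum(res)) < 1e-9)                             # True
print(abs(sum(e * x for e, x in zip(res, xs))) < 1e-9)  # True

# Pretend residuals 1 and 2 are unknown; use only residuals 3..n
a = -sum(res[2:])                                  # e1 + e2 must equal a
b = -sum(e * x for e, x in zip(res[2:], xs[2:]))   # x1*e1 + x2*e2 must equal b
x1, x2 = xs[0], xs[1]

# Solve the 2x2 linear system for e1 and e2
e1 = (a * x2 - b) / (x2 - x1)
e2 = (b - a * x1) / (x2 - x1)
print(round(e1, 4), round(e2, 4))  # 0.4 -0.3
```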


Test of slope coefficient
Test of Slope Coefficient

  • Shows if there is a linear relationship between x and y

  • Involves population slope $\beta_1$

  • Hypotheses

    • H0: $\beta_1 = 0$ (No Linear Relationship)

    • Ha: $\beta_1 \neq 0$ (Linear Relationship)

  • Theoretical basis is sampling distribution of slope


Sampling distribution of sample slopes
Sampling Distribution of Sample Slopes

All Possible Sample Slopes

  • Sample 1: 2.5

  • Sample 2: 1.6

  • Sample 3: 1.8

  • Sample 4: 2.1

  • ... (very large number of sample slopes)

[Figure: Sample 1 line, Sample 2 line, and population line plotted on x and y axes, alongside the sampling distribution of $\hat{\beta}_1$, labeled with $\beta_1$ and $S_{\hat{\beta}_1}$]


Slope coefficient test statistic
Slope Coefficient Test Statistic

$t = \dfrac{\hat{\beta}_1}{S_{\hat{\beta}_1}}$, with df = n - 2


Test of slope coefficient example
Test of Slope Coefficient Example

You’re a marketing analyst for Hasbro Toys.

You find $\hat{\beta}_0$ = -.1, $\hat{\beta}_1$ = .7 and s = .6055.

  Ad $    Sales (Units)
  1       1
  2       1
  3       2
  4       2
  5       4

Is the relationship significant at the .05 level of significance?



Jmp fit model results1
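JMP Fit Model Results

[Figure: JMP Fit Model output annotated with $\hat{\beta}_1$, $S_{\hat{\beta}_1}$, the test statistic $t = \hat{\beta}_1 / S_{\hat{\beta}_1}$, and the P-Value]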
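
The test for the advertising example can be sketched by hand in Python. The standard error formula $S_{\hat{\beta}_1} = s / \sqrt{\sum (x_i - \bar{x})^2}$ is the usual one and is assumed here because the transcript does not show it. The critical value 3.182 is the t-table value for a two-sided .05 test with 3 df.

```python
import math

xs = [1, 2, 3, 4, 5]
n = len(xs)
b1, s = 0.7, 0.6055  # slope estimate and s from the example

x_bar = sum(xs) / n
ss_xx = sum((x - x_bar) ** 2 for x in xs)  # sum of squared deviations of x

se_b1 = s / math.sqrt(ss_xx)  # standard error of the slope (assumed formula)
t = b1 / se_b1                # test statistic for H0: slope = 0

T_CRIT = 3.182  # t-table value: upper .025 tail, n - 2 = 3 df

print(round(se_b1, 4), round(t, 3))  # 0.1915 3.656
print(abs(t) > T_CRIT)               # True, so reject H0 at the .05 level
```

The computed t agrees with the 3.656 shown for ADVERT in the parameter estimates output, whose p-value of 0.0354 is likewise below .05.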


Computing a ci for a slope
Computing a CI for a Slope

A (1 - α)100% confidence interval for the true slope parameter $\beta_1$ is given by:

$\hat{\beta}_1 \pm t_{\alpha/2}\, S_{\hat{\beta}_1}$, with df = n - 2

This assumes that all of the standard regression assumptions are met!

Example from previous slide:
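
A minimal sketch of a 95% interval for the slope in the advertising example, under the same assumptions as the test: the usual standard error formula (not shown in the transcript) and the t-table value 3.182 for 3 df.

```python
import math

xs = [1, 2, 3, 4, 5]
n = len(xs)
b1, s = 0.7, 0.6055  # slope estimate and s from the example

x_bar = sum(xs) / n
ss_xx = sum((x - x_bar) ** 2 for x in xs)
se_b1 = s / math.sqrt(ss_xx)  # standard error of the slope (assumed formula)

T_CRIT = 3.182  # t-table value: upper .025 tail, n - 2 = 3 df

low = b1 - T_CRIT * se_b1
high = b1 + T_CRIT * se_b1
print(round(low, 3), round(high, 3))  # 0.091 1.309
```

The interval excludes 0, which is consistent with rejecting H0 at the .05 level.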


Coefficient of determination
Coefficient of Determination

Proportion of variation ‘explained’ by relationship between x and y

0 r2 1

r2 = (coefficient of correlation)2


Coefficient of determination examples
Coefficient of Determination Examples

[Figure: four scatterplots labeled $r^2 = 1$, $r^2 = 1$, $r^2 = .8$, and $r^2 = 0$]


Coefficient of determination example
Coefficient of Determination Example

You’re a marketing analyst for Hasbro Toys. You know r = .904.

  Ad $    Sales (Units)
  1       1
  2       1
  3       2
  4       2
  5       4

Calculate and interpret the coefficient of determination.


Coefficient of determination solution
Coefficient of Determination Solution

$r^2 = (\text{coefficient of correlation})^2$

$r^2 = (.904)^2$

$r^2 = .817$

Interpretation: About 81.7% of the sample variation in Sales (y) can be explained by using Ad $ (x) to predict Sales (y) in the linear model.


R 2 computer output
$r^2$ Computer Output

Root MSE    0.60553     R-square    0.8167
Dep Mean    2.00000     Adj R-sq    0.7556
C.V.        30.27650

  • R-square is $r^2$

  • Adj R-sq is $r^2$ adjusted for number of explanatory variables & sample size


Jmp fit model results2
JMP Fit Model Results

[Figure: JMP Fit Model output annotated with $r^2$ and with $r^2$ adjusted for number of explanatory variables & sample size]


Correlation models
Correlation Models

1. Answer ‘How strong is the linear relationship between 2 variables?’

2. Coefficient of correlation used

    • Population coefficient denoted ρ (rho)

    • Values range from -1 to +1

    • Measures degree of association

3. Used mainly for understanding


Sample coefficient of correlation
Sample Coefficient of Correlation

1. Pearson product moment coefficient of correlation, r
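
The transcript does not preserve the formula for r. As a sketch, the standard Pearson definition $r = SS_{xy} / \sqrt{SS_{xx}\, SS_{yy}}$ is assumed below and applied to the advertising data, where the `ss_*` names denote corrected sums of squares and cross products.

```python
import math

xs = [1, 2, 3, 4, 5]
ys = [1, 1, 2, 2, 4]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Corrected sums of squares and cross products (notation is an assumption)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))  # 7
ss_xx = sum((x - x_bar) ** 2 for x in xs)                       # 10
ss_yy = sum((y - y_bar) ** 2 for y in ys)                       # 6

r = ss_xy / math.sqrt(ss_xx * ss_yy)
print(round(r, 3), round(r ** 2, 3))  # 0.904 0.817
```

Both values agree with the r = .904 and r2 = .817 used in the coefficient of determination example.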


Coefficient of correlation values
Coefficient of Correlation Values

[Figure: scale of r running from -1.0 (Perfect Negative Correlation) through -.5, 0 (No Correlation), and +.5 to +1.0 (Perfect Positive Correlation)]


Coefficient of correlation examples
Coefficient of Correlation Examples

[Figure: four scatterplots labeled r = 1, r = -1, r = .89, and r = 0]


Prediction with regression models
Prediction With Regression Models

  • Types of predictions

    • Point estimates

    • Interval estimates

  • What is predicted

    • Population mean response E(y) for given x

      • Point on population regression line

    • Individual response (yi) for given x


What is predicted
What Is Predicted

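[Figure: y plotted against x, showing the individual prediction and the mean of y at x = xP]

  • Individual: $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x$ (Prediction, $\hat{y}$)

  • Mean y, E(y): $E(y) = \beta_0 + \beta_1 x$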



Factors affecting interval width
Factors Affecting Interval Width

  • Level of confidence (1 – α)

    • Width increases as confidence increases

  • Data dispersion (s)

    • Width increases as variation increases

  • Sample size

    • Width decreases as sample size increases

  • Distance of xp from mean $\bar{x}$

    • Width increases as distance increases


Confidence interval estimate example
Confidence Interval Estimate Example

You’re a marketing analyst for Hasbro Toys.

You find $\hat{\beta}_0$ = -.1, $\hat{\beta}_1$ = .7 and s = .6055.

  Ad $    Sales (Units)
  1       1
  2       1
  3       2
  4       2
  5       4

Find a 95% confidence interval for the mean sales when advertising is $4.
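
The transcript does not include the solution slide for this example. The sketch below assumes the standard interval for a mean response, $\hat{y} \pm t_{\alpha/2}\, s \sqrt{1/n + (x_p - \bar{x})^2 / \sum (x_i - \bar{x})^2}$, with the t-table value 3.182 for 3 df. The function name `mean_ci` is hypothetical.

```python
import math

xs = [1, 2, 3, 4, 5]
n = len(xs)
b0, b1, s = -0.1, 0.7, 0.6055  # estimates given in the example
T_CRIT = 3.182                 # t-table value: upper .025 tail, n - 2 = 3 df

x_bar = sum(xs) / n
ss_xx = sum((x - x_bar) ** 2 for x in xs)


def mean_ci(x_p):
    """95% confidence interval for mean y at x_p (assumed standard formula)."""
    y_hat = b0 + b1 * x_p
    se = s * math.sqrt(1 / n + (x_p - x_bar) ** 2 / ss_xx)
    return y_hat - T_CRIT * se, y_hat + T_CRIT * se


low, high = mean_ci(4)
print(round(low, 2), round(high, 2))  # 1.64 3.76
```

To two decimals this agrees with the Low95% Mean and Upp95% Mean values for observation 4 in the interval estimate computer output (1.644 and 3.755).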



Prediction interval of individual value of y at x x p
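Prediction Interval of Individual Value of y at x = xp

$\hat{y} \pm t_{\alpha/2}\, s \sqrt{1 + \dfrac{1}{n} + \dfrac{(x_p - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$

Note!

df = n – 2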


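Why the extra s
Why the Extra ‘S’?

[Figure: y plotted against x. At x = xp, the fitted line $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$ gives the prediction $\hat{y}$, the line $E(y) = \beta_0 + \beta_1 x$ gives the expected (mean) y, and the y we're trying to predict differs from the expected y by the error $\varepsilon$]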


Prediction interval solution
Prediction Interval Solution

x to be predicted
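
As a sketch of the prediction interval at x = 4 for the advertising example, the code below assumes the standard formula with the extra 1 under the square root and the t-table value 3.182 for 3 df. The function name `prediction_interval` is hypothetical.

```python
import math

xs = [1, 2, 3, 4, 5]
n = len(xs)
b0, b1, s = -0.1, 0.7, 0.6055  # estimates given in the example
T_CRIT = 3.182                 # t-table value: upper .025 tail, n - 2 = 3 df

x_bar = sum(xs) / n
ss_xx = sum((x - x_bar) ** 2 for x in xs)


def prediction_interval(x_p):
    """95% interval for an individual y at x_p (assumed standard formula)."""
    y_hat = b0 + b1 * x_p
    # The leading 1 adds the variance of an individual y around its mean
    se = s * math.sqrt(1 + 1 / n + (x_p - x_bar) ** 2 / ss_xx)
    return y_hat - T_CRIT * se, y_hat + T_CRIT * se


low, high = prediction_interval(4)
print(round(low, 2), round(high, 2))  # 0.5 4.9
```

This is close to the Low95% Predict and Upp95% Predict values for observation 4 in the computer output (0.502 and 4.897); the small difference comes from rounding s and the t value.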


Interval estimate computer output
Interval Estimate Computer Output

       Dep Var   Pred     Std Err   Low95%   Upp95%   Low95%    Upp95%
Obs    SALES     Value    Predict   Mean     Mean     Predict   Predict
1      1.000     0.600    0.469     -0.892   2.092    -1.837    3.037
2      1.000     1.300    0.332      0.244   2.355    -0.897    3.497
3      2.000     2.000    0.271      1.138   2.861    -0.111    4.111
4      2.000     2.700    0.332      1.644   3.755     0.502    4.897
5      4.000     3.400    0.469      1.907   4.892     0.962    5.837

  • Obs 4 gives the predicted y when x = 4

  • Std Err Predict is $S_{\hat{Y}}$

  • Low95% Mean and Upp95% Mean give the Confidence Interval

  • Low95% Predict and Upp95% Predict give the Prediction Interval


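Confidence intervals v prediction intervals
Confidence Intervals v. Prediction Intervals

[Figure: fitted line $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$ with confidence interval and prediction interval bands, y plotted against x]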



Conclusion
Conclusion

  • Described the Linear Regression Model

  • Stated the Regression Modeling Steps

  • Explained Least Squares

  • Computed Regression Coefficients

  • Used the model for prediction and estimation

  • Evaluated model on the basis of significance, R-square, and size of prediction intervals

  • Discussed R-square and correlation coefficient

