Loading in 5 sec....

Statistics for Business and EconomicsPowerPoint Presentation

Statistics for Business and Economics

- 197 Views
- Uploaded on

Download Presentation
## PowerPoint Slideshow about ' Statistics for Business and Economics' - rinah-hicks

**An Image/Link below is provided (as is) to download presentation**

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript

Regression Modeling Steps

Regression Modeling Steps

Learning Objectives

- Describe the Linear Regression Model
- State the Regression Modeling Steps
- Explain Least Squares
- Compute Regression Coefficients
- Explain Correlation
- Predict Response Variable

What is Regression?

- 1. Method of modeling relationships
- between a response variable Y and one or more predictors X
- 2. A way of “fitting line through data”

Example 1: Store Site Selection

Model sales Y at existing sites as a function of demographic variables:

X1 = Population in store vicinity

X2 = Income in area

X3 = Age of houses in area

X4 =

X5 =

From equation, predict sales at new sites

Example 2:Marketing Research

Model consumer response to a product on basis of product characteristics:

Y = Taste score on soft drink

X1 = Sugar level

X2 = Carbonation level

X3 =

X4 =

Example 3:Operations/Quality

Model product quality in plant as a function of manufacturing process characteristics:

Y = Quality score of sheet metal

X1 = Raw material purity score

X2 = Molten aluminum temp

X3 = Line speed

X4 =

Example 4:Real Estate Pricing

Y = Selling price of houses

X1 = Square feet

X2 = Taxes

X3 = Lot acreage

X4 =

X5 =

X6 =

One-Predictor Regression

25 Homes Sold in Essex County, New Jersey, 1996

One-Predictor Regression

Regression Equation: Y= -39001 + 85.0 X

Regression Models

- Answers ‘What is the relationship between the variables?’
- Equation used
- One numerical dependent (response) variable
- What is to be predicted

- One or more numerical or categorical independent (explanatory) variables

- One numerical dependent (response) variable
- Used mainly for prediction and estimation

Prediction Using Regression

Y= -39001 + 85.0(4000)= $300,999

Regression Modeling Steps

- Hypothesize deterministic component
- Estimate unknown model parameters
- Specify probability distribution of random error term
- Estimate standard deviation of error

- Evaluate model
- Use model for prediction and estimation

Specifying the Model

- Define variables
- Conceptual (e.g., Advertising, price)
- Empirical (e.g., List price, regular price)
- Measurement (e.g., $, Units)

- Hypothesize nature of relationship
- Expected effects (i.e., Coefficients’ signs)
- Functional form (linear or non-linear)
- Interactions

Model Specification Is Based on Theory

- Theory of field (e.g., Sociology)
- Mathematical theory
- Previous research
- ‘Common sense’

Thinking Challenge: Which Is More Logical?

Sales

Sales

Advertising

Advertising

Sales

Sales

Advertising

Advertising

2+ Explanatory

Variable

Variables

Multiple

Simple

Non-

Non-

Linear

Linear

Linear

Linear

Types of Regression ModelsRegressionModels

Linear Regression Model

Relationship between variables is a linear function

Population y-intercept

Population Slope

Random Error

y

x

0

1

Dependent (Response) Variable

Independent (Explanatory) Variable

Line of Means

y

E(y) = β0 + β1x (line of means)

Change in y

β1 = Slope

Change in x

β0 =y-intercept

x

High school teacher

Linear Regression Model

- 1. Relationship between variables is a linear function

Population Y-intercept

Population slope

Independent (explanatory) variable

Y

X

i

0

1

i

i

Dependent (response) variable

Random error

Regression Modeling Steps

- Hypothesize deterministic component
- Estimate unknown model parameters
- Specify probability distribution of random error term
- Estimate standard deviation of error

- Evaluate model
- Use model for prediction and estimation

60

40

20

0

x

0

20

40

60

Thinking Challenge- How would you draw a line through the points?
- How do you determine which line ‘fits best’?

Least Squares

- ‘Best fit’ means difference between actual y values and predicted y values are a minimum
- But positive differences off-set negative

- Least Squares minimizes the Sum of the Squared Differences (SSE)

Computation Table

2

2

xi

yi

xi

xiyi

yi

2

2

x1

y1

x1

x1y1

y1

2

2

x2

y2

x2

y2

x2y2

:

:

:

:

:

2

2

xn

yn

xn

yn

xnyn

2

2

Σxi

Σyi

Σxi

Σxiyi

Σyi

- Slope (1)
- Estimated y changes by 1 for each 1unit increase in x
- If 1 = 2, then Sales (y) is expected to increase by 2 for each 1 unit increase in Advertising (x)

- Estimated y changes by 1 for each 1unit increase in x

^

^

^

- Y-Intercept (0)
- Average value of y when x = 0
- If 0 = 4, then Average Sales (y) is expected to be 4 when Advertising (x) is 0

- Average value of y when x = 0

^

Interpretation of CoefficientsLeast Squares Example

You’re a marketing analyst for Hasbro Toys. You gather the following data:

Ad $Sales (Units)1 1 2 1 3 2 4 2 5 4

Find the least squares line relatingsales and advertising.

Parameter Estimation Solution Table

2

2

xi

yi

xi

xiyi

yi

1

1

1

1

1

2

1

4

1

2

3

2

9

4

6

4

2

16

4

8

5

4

25

16

20

15

10

55

26

37

Parameter Standard T for H0:

Variable DF Estimate Error Param=0 Prob>|T|

INTERCEP 1 -0.1000 0.6350 -0.157 0.8849

ADVERT 1 0.7000 0.1914 3.656 0.0354

Parameter Estimation Computer Output^

0

^

1

- Slope (1)
- Sales Volume (y) is expected to increase by .7 units for each $1 increase in Advertising (x)

^

- Y-Intercept (0)
- Average value of Sales Volume (y) is -.10 units when Advertising (x) is 0
- Difficult to explain to marketing manager
- Expect some sales without advertising

- Average value of Sales Volume (y) is -.10 units when Advertising (x) is 0

Regression Modeling Steps

- Hypothesize deterministic component
- Estimate unknown model parameters
- Specify probability distribution of random error term
- Estimate standard deviation of error

- Evaluate model
- Use model for prediction and estimation

Linear Regression Assumptions

- Mean of probability distribution of error, ε, is 0
- Probability distribution of error has constant variance
- Probability distribution of error, ε, is normal
- Errors are independent

Estimating s2

Recall that:

where is our estimator of the mean.

Now substitute (1) Yi for Xi , (2) ,

and n-2 df for n-1:

Why are df = n - 2?

1. Two statistics are estimated to compute the regression line, b0 and b1

2. As a result, if I know any n-2 residuals, the other two can be computed from ei = 0 and SeiXi = 0.

^

^

^

^

- Hypothesize deterministic component
- Estimate unknown model parameters
- Specify probability distribution of random error term
- Estimate standard deviation of error

- Evaluate model
- Use model for prediction and estimation

Test of Slope Coefficient

- Shows if there is a linear relationship between x and y
- Involves population slope 1
- Hypotheses
- H0: 1 = 0 (No Linear Relationship)
- Ha: 1 0 (Linear Relationship)

- Theoretical basis is sampling distribution of slope

All Possible Sample Slopes

Sample 1: 2.5

Sample 2: 1.6

Sample 3: 1.8

Sample 4: 2.1 : :Very large number of sample slopes

Sampling Distribution

^

S

1

^

b

1

1

Sampling Distribution of Sample SlopesSample 1 Line

y

Sample 2 Line

Population Line

x

Slope Coefficient Test Statistic

Test of Slope Coefficient Example

You’re a marketing analyst for Hasbro Toys.

You find β0 = –.1,β1 = .7and s= .6055.

Ad $Sales (Units)1 1 2 1 3 2 4 2 5 4

Is the relationship significantat the .05 level of significance?

^

^

Test StatisticSolution

Computing a CI for a Slope

A (1 - a)100% confidence interval for the true slope parameter b1 is given by:

This assumes that all of the standard regression assumptions are met!

Example from previous slide:

Coefficient of Determination

Proportion of variation ‘explained’ by relationship between x and y

0 r2 1

r2 = (coefficient of correlation)2

Coefficient of Determination Example

You’re a marketing analyst for Hasbro Toys. You know r = .904.

Ad $Sales (Units) 1 1 2 1 3 2 4 2 5 4

Calculate and interpret thecoefficient of determination.

Coefficient of Determination Solution

r2 = (coefficient of correlation)2

r2 = (.904)2

r2 = .817

Interpretation: About 81.7% of the sample variation in Sales (y) can be explained by using Ad $ (x) to predict Sales (y) in the linear model.

Root MSE 0.60553 R-square 0.8167

Dep Mean 2.00000 Adj R-sq 0.7556

C.V. 30.27650

r2 Computer Outputr2

r2 adjusted for number of explanatory variables & sample size

Correlation Models

1. Answer ‘How strongis the linear relationship between 2 variables?’

2. Coefficient of correlation used

Population coefficient denoted (rho)

Values range from -1 to +1

Measures degree of association

3. Used mainly for understanding

Sample Coefficient of Correlation

1. Pearson product moment coefficient of correlation, r

Coefficient of Correlation Values

Perfect Negative Correlation

Perfect Positive Correlation

No Correlation

-1.0

-.5

0

+.5

+1.0

- Hypothesize deterministic component
- Estimate unknown model parameters
- Specify probability distribution of random error term
- Estimate standard deviation of error

- Evaluate model
- Use model for prediction and estimation

Prediction With Regression Models

- Types of predictions
- Point estimates
- Interval estimates

- What is predicted
- Population mean response E(y) for given x
- Point on population regression line

- Individual response (yi) for given x

- Population mean response E(y) for given x

What Is Predicted

y

^

y

^

Individual

yi = b0 +b1x

^

Mean y, E(y)

E(y) = b0 +b1x

^

Prediction, y

x

xP

Confidence Interval Estimate for Mean Value of y at x = xp

df = n – 2

Factors Affecting Interval Width

- Level of confidence (1 – )
- Width increases as confidence increases

- Data dispersion (s)
- Width increases as variation increases

- Sample size
- Width decreases as sample size increases

- Distance of xp from meanx
- Width increases as distance increases

Confidence Interval Estimate Example

You’re a marketing analyst for Hasbro Toys.

You find β0 = -.1,β1 = .7and s= .6055.

Ad $Sales (Units)1 1 2 1 3 2 4 2 5 4

Find a 95% confidence interval forthe mean sales when advertising is $4.

^

^

Confidence Interval Estimate Solution

x to be predicted

^

yi = b0 +b1xi

^

Why the Extra ‘S’?y

y we're trying topredict

e

Expected

(Mean) y

E(y) = b0 +b1x

^

Prediction, y

x

xp

Prediction Interval Solution

x to be predicted

Dep Var Pred Std Err Low95% Upp95% Low95% Upp95%

Obs SALES Value Predict Mean Mean Predict Predict

1 1.000 0.600 0.469 -0.892 2.092 -1.837 3.037

2 1.000 1.300 0.332 0.244 2.355 -0.897 3.497

3 2.000 2.000 0.271 1.138 2.861 -0.111 4.111

4 2.000 2.700 0.332 1.644 3.755 0.502 4.897

5 4.000 3.400 0.469 1.907 4.892 0.962 5.837

Interval Estimate Computer OutputPredictionInterval

Predicted y when x = 4

Confidence Interval

SY

^

Conclusion

- Described the Linear Regression Model
- Stated the Regression Modeling Steps
- Explained Least Squares
- Computed Regression Coefficients
- Used the model for prediction and estimation
- Evaluated model on the basis of significance, Rsquare, size of prediction intervals
- Discussed Rsquare and correlation coefficient

Download Presentation

Connecting to Server..