Determining Factors of Market Success

# Determining Factors of Market Success - PowerPoint PPT Presentation

Determining Factors of Market Success. DMD #4. David Kopcso and Richard Cleary, Babson College, F. W. Olin Graduate School of Business.

Presentation Transcript

### Determining Factors of Market Success

DMD #4

David Kopcso and Richard Cleary

Babson College

Learning Objectives
• Determine the strength of (linear) relationships
• Describe a regression model with one or more explanatory variables
• Interpret regression coefficients
• Evaluate the model in a business context
Modeling Relationships
• If we believe that two (or more) random variables are related, then we would like to model and exploit the relationship.
• Hopefully, this helps to:
• Make more accurate predictions
• Show the direction and strength of relationship
• Reduce the amount of uncertainty.
Approach

Investigate variables individually and jointly.

| | Individually | Jointly |
|---|---|---|
| Numerically | Standard Stats | Correlation |
| Graphically | Histogram, Box Plot | Scatter Plot |

Scatter Plots
• Positive Linear Relationship
• Nonlinear Relationship
• Negative Linear Relationship
• No Relationship
Correlation Coefficient
• Unit-free: ranges between -1 and 1
• The closer to -1, the stronger the negative linear relationship
• The closer to 1, the stronger the positive linear relationship
• The closer to 0, the weaker any linear relationship
• Note: Correlation does not deal with cause and effect; it only measures strength of linear dependence
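To make the definition concrete, here is a minimal Python sketch of the Pearson correlation coefficient, r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²). The function name `pearson_r` and the toy data are illustrative assumptions, not part of the slides; Python 3.10+ also provides `statistics.correlation`, which computes the same quantity.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient r of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Cross-products and sums of squares about the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2 * v + 1 for v in x]))    # 1.0  (perfect positive line)
print(pearson_r(x, [10 - 3 * v for v in x]))   # -1.0 (perfect negative line)
```

Because each variable's deviations are scaled by that variable's own spread, rescaling X or Y (for example, dollars to thousands of dollars) leaves r unchanged, which is what "unit-free" means.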
Correlation Coefficient r
[Figure: example scatter plots of Y against X for r = +1, r = -1, r = +0.9, and r = 0]
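Correlation Measures Only Linear Dependence!

| X | Y = exp(X), rounded |
|---|---|
| 1 | 3 |
| 2 | 7 |
| 3 | 20 |
| 4 | 55 |
| 5 | 148 |
| 6 | 403 |
| 7 | 1097 |
| 8 | 2981 |
| 9 | 8103 |
| 10 | 22026 |
| 11 | 59874 |
| 12 | 162755 |
| 13 | 442413 |
| 14 | 1202604 |
| 15 | 3269017 |
| 16 | 8886111 |
| 17 | 24154953 |
| 18 | 65659969 |
| 19 | 178482301 |
| 20 | 485165195 |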

• X and Y are perfectly related: each value of Y is exactly determined by X. However, the correlation of X and Y (where Y = exp(X)) is only 0.539, because the relationship is not linear.
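The 0.539 figure can be checked with a short Python sketch, assuming X runs over the integers 1 to 20 as in the table. The helper `pearson_r` is a hand-written (hypothetical) implementation of the usual formula. Taking logarithms turns the curve back into a straight line, which shows that the low r reflects curvature rather than a weak relationship.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient r of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = list(range(1, 21))
ys = [math.exp(v) for v in xs]            # Y = exp(X): exact but curved
print(round(pearson_r(xs, ys), 3))        # 0.539
# On the log scale the same relationship is a straight line
print(round(pearson_r(xs, [math.log(y) for y in ys]), 6))  # 1.0
```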
Linear Regression Model
• Assume that the relationship between the variables is linear:

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

• $\beta_0$ = Y-intercept, $\beta_1$ = slope, $\varepsilon_i$ = error
• $Y$ = dependent (response) variable, $X$ = independent (explanatory) variable
Model

Do you think knowing the size of a house helps “explain” the variation in house prices?

Population Model:

$\text{Price} = \beta_0 + \beta_1 \, \text{Sq. Footage} + \varepsilon$

Estimated Equation:

Est. Price = b0 + b1 Sq. Footage, also written $\widehat{\text{Price}} = b_0 + b_1 \, \text{Sq. Footage}$

$$Y_i = b_0 + b_1 X_i + e_i, \qquad \hat{Y}_i = b_0 + b_1 X_i$$

where $e_i = Y_i - \hat{Y}_i$ is the residual.
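The slides do not show how b0 and b1 are computed. Below is a minimal sketch of the standard least-squares estimates, b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b0 = ȳ − b1·x̄. The function name and the five-house data set are hypothetical, chosen only so the arithmetic is easy to follow.

```python
def fit_simple_ols(xs, ys):
    """Least-squares intercept b0 and slope b1 for Y = b0 + b1 * X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x          # the line passes through (mean_x, mean_y)
    return b0, b1


# Hypothetical toy data (not the house-price data from the slides)
sqft = [1000, 1500, 2000, 2500, 3000]
price = [300_000, 370_000, 460_000, 520_000, 600_000]
b0, b1 = fit_simple_ols(sqft, price)
fitted = [b0 + b1 * x for x in sqft]                       # Y-hat for each house
residuals = [y - yhat for y, yhat in zip(price, fitted)]   # e_i = Y_i - Y-hat_i
print(b0, b1)      # 150000.0 150.0
print(residuals)   # [0.0, -5000.0, 10000.0, -5000.0, 0.0]
```

With an intercept in the model, least-squares residuals always sum to zero, so the fitted line passes through the point (x̄, ȳ).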
Linear Regression Model
[Figure: Y plotted against X, with an unsampled observation marked]
Estimated Model

Est. Price = b0 + b1 Sq. Footage

Est. Price = 117,663 + 173 Sq. Footage

Model Interpretation
• b1: The average marginal increase/decrease in Price for a unit increase in Sq. Footage.
• Price will increase by \$173 on average for each additional square foot.
• b0: The average Price when Square Footage equals zero.
• Average value of Price is \$117,663 when there is no Square Footage.
• Does this statement make sense? Does this result have managerial significance?
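The interpretations above can be read directly off the estimated equation. In this short Python sketch (the function name `est_price` is hypothetical), the slope shows up as the change in the estimate for one extra square foot, and the intercept as the estimate at zero square feet, an extrapolation that is unlikely to be meaningful for real houses.

```python
def est_price(sq_footage):
    """Estimated price from the slides: Est. Price = 117,663 + 173 * Sq. Footage."""
    return 117_663 + 173 * sq_footage


print(est_price(2000))                    # 463663
print(est_price(2001) - est_price(2000))  # 173, the slope b1
print(est_price(0))                       # 117663, the intercept b0 (an extrapolation)
```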
Hypothesis Test: No Linear Relationship
• Tests whether there is a (linear) relationship between X & Y
• Hypotheses
• H0: 1 = 0 (No Linear Relationship)
• H1: 1 0 (Linear Relationship)
• Compare p-value to a
• Interpretation
• If p-value is less than a, we have enough information to conclude that Square Footage is linearly related to Price and we can interpret the slope.
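Regression software reports this p-value directly, but the calculation behind it can be sketched: t = b1 / SE(b1), with SE(b1) = S / √Σ(x − x̄)², compared against a t distribution with n − 2 degrees of freedom. The Python below is a minimal, hypothetical sketch that integrates the t density numerically with Simpson's rule, which is adequate for illustration; in practice `scipy.stats.t.sf` or the regression output itself would be used. The data and α = 0.05 are assumptions for the example.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p(t, df):
    """P(|T| >= |t|), integrating the density over [0, |t|] with Simpson's rule."""
    a = abs(t)
    steps = 2 * max(1000, int(100 * a))          # Simpson's rule needs an even count
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    area = total * h / 3                          # P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * area)


def slope_test(xs, ys):
    """t statistic and two-sided p-value for H0: beta1 = 0 in simple regression."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    b0 = mean_y - b1 * mean_x
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))                  # standard error of the estimate
    t = b1 / (s / math.sqrt(sxx))                 # t = b1 / SE(b1)
    return t, two_sided_p(t, n - 2)


# Hypothetical toy data (not the house-price data from the slides)
sqft = [1000, 1500, 2000, 2500, 3000]
price = [300_000, 370_000, 460_000, 520_000, 600_000]
t_stat, p_value = slope_test(sqft, price)
alpha = 0.05
print(round(t_stat, 2), p_value < alpha)          # 33.54 True
```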
Quality of Model
• We would like to know how well our model fits the facts (data). The better the fit, the more we believe in the model’s accuracy.
• We have two measures of fit: R-squared and S (aka SEE).

The Famed R²
• Coefficient of determination (R²):

$$R^2 = \frac{\text{Explained Variance}}{\text{Total Variance}}$$

• The closer R² is to 1, the better the “fit”

R² is the percentage of variation of the Y variable that is explained by (accounted for by, or reduced by) knowing the X variable, i.e., by using the regression to predict the response rather than the average response value.

• Square Footage explains 92% of the variation in Price.
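A short Python sketch of the definition, assuming a hypothetical five-house data set whose least-squares line is Est. Price = 150,000 + 150·SqFt. It computes total, explained, and unexplained (residual) variation; for a least-squares fit with an intercept, explained/total and 1 − unexplained/total give the same R². The function name is illustrative.

```python
def variation_parts(ys, fitted):
    """Return (SST, SSR, SSE): total, explained, and unexplained variation."""
    mean_y = sum(ys) / len(ys)
    sst = sum((y - mean_y) ** 2 for y in ys)               # total variation
    ssr = sum((f - mean_y) ** 2 for f in fitted)           # explained by the regression
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))    # residual (unexplained)
    return sst, ssr, sse


# Hypothetical toy data and its least-squares fit (not the slides' data)
price = [300_000, 370_000, 460_000, 520_000, 600_000]
fitted = [150_000 + 150 * x for x in [1000, 1500, 2000, 2500, 3000]]
sst, ssr, sse = variation_parts(price, fitted)
print(round(ssr / sst, 4))        # 0.9973  (explained / total)
print(round(1 - sse / sst, 4))    # 0.9973  (same value via the residuals)
```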
Accuracy: Standard Error of Estimate
• Standard error of the estimate: S (or SEE)
• The smaller the S, the better the “fit”
• The units of S are the same as the units of the Y variable.
• When using our regression model for predicting home prices, we would be off on average plus/minus \$46,631.
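S is computed from the residuals as S = √(SSE / (n − k − 1)), where n is the number of observations and k the number of explanatory variables (k = 1 for simple regression). A minimal Python sketch, using a hypothetical five-house data set because the slides' data are not shown:

```python
import math


def standard_error_of_estimate(ys, fitted, k):
    """S (SEE) = sqrt(SSE / (n - k - 1)) for a regression with k explanatory variables."""
    n = len(ys)
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    return math.sqrt(sse / (n - k - 1))


# Hypothetical toy data with its least-squares fit Est. Price = 150,000 + 150 * SqFt
price = [300_000, 370_000, 460_000, 520_000, 600_000]
fitted = [150_000 + 150 * x for x in [1000, 1500, 2000, 2500, 3000]]
print(round(standard_error_of_estimate(price, fitted, k=1)))   # 7071 (dollars, like Price)
```

The same function applies unchanged to multiple regression by passing the number of explanatory variables as k.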
In-Class Activity:
• Investigate models of Salary from the file Salary_handout.xls using only one variable as the explanatory (independent) variable.
• Interpret the following in the context of the model:
• Slope and intercept
• Strength of linear relationship (R)
• The usefulness of the slope (p-value)
• Graph of relationship
• Evaluation of the model, i.e., R² and S.
• Use the model to predict Salary for a fictitious employee.
Multiple Linear Regression (MLR)
• We assume that the relationship between the variables is linear:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \varepsilon$$
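The coefficients of a multiple regression come from the same least-squares idea, solved as the normal equations (X′X)b = X′y, where X has a leading column of 1s for the intercept. The hand-rolled Python below is a teaching sketch under that assumption (function name and data are hypothetical); for real work a library routine such as `numpy.linalg.lstsq` is the usual, more numerically stable choice. The check uses data generated exactly from a known equation, so the fit should recover it.

```python
def fit_mlr(rows, ys):
    """Least-squares [b0, b1, ..., bk] from the normal equations (X'X) b = X'y.

    rows holds one list [x1, ..., xk] of explanatory values per observation.
    """
    X = [[1.0] + [float(v) for v in row] for row in rows]   # column of 1s for the intercept
    n, p = len(X), len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(X[i][j] * X[i][m] for i in range(n)) for m in range(p)]
         + [sum(X[i][j] * ys[i] for i in range(n))]
         for j in range(p)]
    # Gaussian elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution
    coefs = [0.0] * p
    for r in range(p - 1, -1, -1):
        coefs[r] = (A[r][p] - sum(A[r][c] * coefs[c] for c in range(r + 1, p))) / A[r][r]
    return coefs


# Hypothetical data generated exactly from Y = 5 + 2*X1 - 3*X2 + 0.5*X3
rows = [[1, 0, 2], [2, 1, 0], [3, 5, 1], [4, 2, 3], [5, 3, 7], [6, 8, 2]]
ys = [5 + 2 * x1 - 3 * x2 + 0.5 * x3 for x1, x2, x3 in rows]
print([round(b, 6) for b in fit_mlr(rows, ys)])   # [5.0, 2.0, -3.0, 0.5]
```

Solving the normal equations directly becomes unreliable when explanatory variables are highly correlated, which is one reason for the correlation check on the next slide.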
Model Building
• Before running any regressions or even any data analysis, determine which of your variables you believe are good predictors.
• Generally, you want at least 10 observations per variable selected if possible.
Variable Investigation
• Next, investigate the relationship between the response or dependent (Y) variable and each of the explanatory or independent (X) variables. Use the correlation matrix and scatter plots.
• To avoid multicollinearity problems (unstable slopes that are hard to interpret), also make sure that the correlation among the explanatory (independent) variables is not too high. As a rule of thumb, anything above 0.90 in absolute value can cause trouble.
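The 0.90 rule of thumb is easy to automate. This Python sketch computes the correlation for every pair of explanatory variables and flags pairs whose |r| exceeds a threshold; the helper names and the three-column data set are hypothetical, not taken from the slides.

```python
import math
from itertools import combinations


def pearson_r(xs, ys):
    """Pearson correlation coefficient r of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def flag_collinear(columns, threshold=0.90):
    """List (name_a, name_b, r) for explanatory-variable pairs with |r| above threshold."""
    flagged = []
    for a, b in combinations(columns, 2):      # every unordered pair of column names
        r = pearson_r(columns[a], columns[b])
        if abs(r) > threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged


# Hypothetical explanatory variables (not the slides' data)
columns = {
    "SqFt": [1200, 1500, 1800, 2100, 2600, 3000],
    "Bedrooms": [2, 3, 3, 4, 4, 5],
    "Age": [30, 5, 22, 12, 40, 8],
}
print(flag_collinear(columns))   # [('SqFt', 'Bedrooms', 0.957)]
```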
Multiple Regression Output
• Est Price = 44,392 + 111*SqFt + 85,345*Bedrooms + 572*Bathrooms
Slopes in MLR
• Be careful when interpreting the slopes in a multiple linear regression as it is necessary to hold all other variables constant.
• Price will increase on average by \$111 for each additional square foot when holding all other explanatory variables constant. If the p-value is not less than alpha, then you cannot interpret the slope.
p-values in MLR
• Each explanatory (independent) variable has its own p-value.
• When looked at individually, is the variable’s slope statistically different from zero?
• If yes (p-value < α), then that variable is a good predictor within the context of the model and the slope can be interpreted.
• If no (p-value > α), then that variable is not a good predictor within the context of the model and the slope cannot be interpreted. Some variables are confirmatory and may remain in the model even though their p-value > α.
Coefficient of Determination R²
• R² is still the percentage of variation of the Y variable explained by knowing all the X variables. The focus is on explaining the variation in Price, not on explaining the data.
Output
• Knowing the square footage, the number of bedrooms, and the number of bathrooms of a house explains 97% of the variation in house prices.
Standard Error of Estimate in MLR
• The interpretation of S (aka SEE) is the same in multiple regression as it is in simple.
• Thus, we expect to be off on average plus or minus \$28,765 when predicting house prices using the square footage, the number of bedrooms and the number of bathrooms in the house.
Variation Reduction
• How do we know whether S (SEE) is low or high?

Is it small enough to make the predictions from the regression useful?

Compare it to the standard deviation of the response (dependent) variable.

| S (SEE) | St Dev(Price) |
|---|---|
| \$28,765 | \$161,666 |
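The comparison can be made concrete with the two numbers from the slide. The ratio S / St Dev(Price) shows how large the typical prediction error is relative to the spread of prices when the regression is not used, and 1 − (S / St Dev)² gives the share of that variance removed by the regression, close to the 97% R² reported earlier. The variable names are illustrative.

```python
see = 28_765        # S (SEE) from the multiple regression output
sd_price = 161_666  # standard deviation of Price
ratio = see / sd_price
print(f"S / SD = {ratio:.3f}")                    # 0.178
print(f"1 - (S / SD)^2 = {1 - ratio ** 2:.3f}")   # 0.968
```

In words, the regression's typical error is under a fifth of the typical deviation of prices from their average.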

Predicting Using Regression
• Recall we have: Est Price = 44,392 + 111*SqFt + 85,345*Bedrooms + 572*Bathrooms
• Assume this is a good equation.
• Use it to predict the expected selling price of a home with 2000 sq. ft. of living space, 4 bedrooms, and 2 baths.

If the residuals are roughly normally distributed, about two-thirds (68%) of the data should fall within +/- SEE of the value determined by the regression equation, and about 95% should fall within +/- 2*SEE. Therefore, a 95% interval for the price of a specific house at 533 Main St., which has 2000 sq. ft., 4 bedrooms, and 2 baths, can be computed as Est Price +/- 2*SEE. That is, we are 95% confident that this specific house’s price is between these two values. Since this is about a specific house, the interval is called a prediction interval, not a confidence interval.
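Plugging the house at 533 Main St. into the slides' equation and applying the Est Price ± 2·SEE rule can be sketched in Python as follows (function and variable names are illustrative).

```python
def est_price(sqft, bedrooms, bathrooms):
    """Fitted equation from the slides' multiple regression output."""
    return 44_392 + 111 * sqft + 85_345 * bedrooms + 572 * bathrooms


SEE = 28_765                        # standard error of the estimate from the output
point = est_price(2000, 4, 2)       # 533 Main St.: 2000 sq. ft., 4 bedrooms, 2 baths
low, high = point - 2 * SEE, point + 2 * SEE
print(point)        # 608916
print(low, high)    # 551386 666446
```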

• A 95% confidence interval for the average price of a 2000 sq. ft., 4 bed, 2 bath house can be computed as:
• Est Price +/- 2*SEE/SQRT(n), where n is the number of observations used to fit the regression.
• In words, based on our regression, we are 95% confident that the average 2000 sq. ft., 4 bed, 2 bath house price is between these two values. Since this is about an average of all such houses, the interval is called a confidence interval not a prediction interval.
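The confidence interval for the average such house uses the same point estimate with a half-width of 2·SEE/√n. The slides do not report the sample size, so n = 100 below is a purely hypothetical value; with a real data set, substitute its actual n. This is the slides' shortcut: the exact formulas in regression software also widen the interval as the house's characteristics move away from the averages in the data.

```python
import math


def approx_mean_ci(point, see, n):
    """Slides' shortcut 95% CI for the average response: point +/- 2 * SEE / sqrt(n)."""
    half_width = 2 * see / math.sqrt(n)
    return point - half_width, point + half_width


point = 44_392 + 111 * 2000 + 85_345 * 4 + 572 * 2   # Est Price for 2000 sq. ft., 4 bed, 2 bath
n = 100                                              # HYPOTHETICAL sample size; not given in the slides
print(approx_mean_ci(point, 28_765, n))              # (603163.0, 614669.0)
```

Even with this hypothetical n, the interval for the average house is far narrower than the prediction interval for one specific house, because averaging over many houses cancels much of the house-to-house variation.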
In-Class Activity:
• Investigate models of Salary from the file Salary_handout.xls using any set of variables you wish as the explanatory (independent) variables.
• Interpret the following in the context of the model:
• Slopes and intercept.
• Strength of linear relationship (R)
• The usefulness of the slope (p-value).
• Graphs of relationship.
• Evaluation of the model, i.e., R² and SEE.
• Use the model to predict Salary for a fictitious employee and build Prediction and Confidence intervals for this prediction.