# Determining Factors of Market Success


Determining Factors of Market Success. DMD #4 David Kopcso and Richard Cleary Babson College F. W. Olin Graduate School of Business. Learning Objectives. Determine the strength of (linear) relationships Describe a regression model with one or more explanatory variables


## Determining Factors of Market Success

DMD #4

David Kopcso and Richard Cleary

Babson College

F. W. Olin Graduate School of Business

### Learning Objectives

• Determine the strength of (linear) relationships

• Describe a regression model with one or more explanatory variables

• Interpret regression coefficients

• Evaluate the model in a business context

### Modeling Relationships

• If we believe that two (or more) random variables are related, then we would like to model and exploit the relationship.

• Hopefully, this helps to:

• Make more accurate predictions

• Show the direction and strength of relationship

• Reduce the amount of uncertainty.

### Approach

Investigate variables individually and jointly.

| | Individually | Jointly |
|---|---|---|
| Numerically: | Standard Stats | Correlation |
| Graphically: | Histogram, Box Plot | Scatter Plot |

### Scatter Plots

• Positive Linear Relationship

• Nonlinear Relationship

• Negative Linear Relationship

• No Relationship

### Correlation Coefficient

• Unit free: ranges between -1 and 1

• The closer to –1, the stronger the negative linear relationship

• The closer to 1, the stronger the positive linear relationship

• The closer to 0, the weaker any linear relationship

• Note: Correlation does not deal with cause and effect; it only measures strength of linear dependence

• Figure: four scatter plots of Y against X, illustrating r = +1, r = -1, r = +0.9, and r = 0.

### Correlation Measures Only Linear Dependence!

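| X | Y = exp(X) |
|---|---|
| 1 | 3 |
| 2 | 7 |
| 3 | 20 |
| 4 | 55 |
| 5 | 148 |
| 6 | 403 |
| 7 | 1097 |
| 8 | 2981 |
| 9 | 8103 |
| 10 | 22026 |
| 11 | 59874 |
| 12 | 162755 |
| 13 | 442413 |
| 14 | 1202604 |
| 15 | 3269017 |
| 16 | 8886111 |
| 17 | 24154953 |
| 18 | 65659969 |
| 19 | 178482301 |
| 20 | 485165195 |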

• X and Y are perfectly related, yet the correlation of X and Y (where Y = exp(X)) is only 0.539, because the relationship is not linear.
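
The figure quoted on this slide can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the helper name `pearson_r` is ours, not from the presentation. The log transform at the end is an added illustration: since ln(Y) = X is a linear relationship, its correlation with X should be 1 up to rounding.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = list(range(1, 21))         # X = 1, 2, ..., 20, as in the table
ys = [math.exp(x) for x in xs]  # Y = exp(X): exact, but nonlinear

r_raw = pearson_r(xs, ys)                         # about 0.539
r_log = pearson_r(xs, [math.log(y) for y in ys])  # linear after the transform

print(round(r_raw, 3), round(r_log, 3))
```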

### Linear Regression Model

• Assume that the relationship between the variables is linear:

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

• $Y_i$: Dependent (Response) Variable

• $X_i$: Independent (Explanatory) Variable

• $\beta_0$: Y-Intercept

• $\beta_1$: Slope

• $\varepsilon_i$: Error

### Model

Do you think knowing the size of a house helps “explain” the variation in house prices?

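Population Model:

$$\text{Price} = \beta_0 + \beta_1 \, \text{Sq. Footage} + \varepsilon$$

Estimated Equation:

Est. Price = b0 + b1 Sq. Footage, or equivalently $\widehat{\text{Price}} = b_0 + b_1 \, \text{Sq. Footage}$

• For each sampled observation: $Y_i = b_0 + b_1 X_i + e_i$, where $e_i$ = Residual

• Fitted value: $\hat{Y}_i = b_0 + b_1 X_i$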

### Linear Regression Model

• Figure: plot of Y against X, with a point labeled “Unsampled Observation”.

### Estimated Model

Est. Price = b0 + b1 Sq. Footage

Est. Price = 117,663 + 173 Sq. Footage
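
The slides do not show how b0 and b1 are obtained. Regression software typically uses ordinary least squares, and the sketch below shows that calculation for a single explanatory variable. The data are hypothetical and chosen only to keep the arithmetic easy to follow; they are not the housing data behind the equation above, and the function name `fit_line` is ours.

```python
def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx               # slope
    b0 = mean_y - b1 * mean_x    # line passes through (mean_x, mean_y)
    return b0, b1

# Hypothetical data (NOT from the presentation):
sq_footage_thousands = [1, 2, 3, 4, 5]     # size in 1,000s of sq. ft.
price_hundred_thousands = [2, 4, 5, 4, 5]  # price in $100,000s

b0, b1 = fit_line(sq_footage_thousands, price_hundred_thousands)
print(round(b0, 3), round(b1, 3))
```

With these made-up numbers the fitted line is roughly 2.2 + 0.6 x.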

### Model Interpretation

• b1: The average marginal increase/decrease in Price for a unit increase in Sq. Footage.

• Price will increase by \$173 on average for each additional square foot.

• b0: The average Price when Square Footage equals zero.

• Average value of Price is \$117,663 when there is no Square Footage.

• Does this statement make sense? Does this result have managerial significance?
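
To make the two interpretations concrete, the estimated equation can be evaluated directly. This is a small sketch; the 2000 sq. ft. input is an illustrative value of ours for the simple model, and the function name is hypothetical.

```python
B0 = 117_663  # intercept from the estimated equation, in dollars
B1 = 173      # slope: dollars per additional square foot

def est_price(sq_footage):
    """Estimated price from Est. Price = 117,663 + 173 * Sq. Footage."""
    return B0 + B1 * sq_footage

at_zero = est_price(0)                            # the intercept
per_sq_ft = est_price(2001) - est_price(2000)     # the slope
example = est_price(2000)                         # 117,663 + 346,000

print(at_zero, per_sq_ft, example)
```

The value at zero square feet is an extrapolation far outside any plausible data range, which is why the slide questions whether the intercept has managerial significance.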

### Hypothesis Test: No Linear Relationship

• Tests whether there is a (linear) relationship between X & Y

• Hypotheses

• H0: 1 = 0 (No Linear Relationship)

• H1: 1 0 (Linear Relationship)

• Compare p-value to a

• Interpretation

• If p-value is less than a, we have enough information to conclude that Square Footage is linearly related to Price and we can interpret the slope.
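
The p-value for this test is normally read from regression output. As a rough sketch of what sits behind it, the slope's t statistic is the estimated slope divided by its standard error; the p-value is then the two-sided tail probability of a t distribution with n - 2 degrees of freedom, which is left to statistical software here. The data are hypothetical and the function name is ours.

```python
import math

def slope_t_statistic(xs, ys):
    """t statistic for H0: slope = 0 in a simple linear regression."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))   # standard error of the estimate
    se_b1 = s / math.sqrt(sxx)     # standard error of the slope
    return b1 / se_b1

# Hypothetical data (NOT from the presentation):
t_stat = slope_t_statistic([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(t_stat, 4))
```

The larger the absolute t statistic, the smaller the p-value, and the stronger the evidence against "no linear relationship".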

### Quality of Model

• We would like to know how well our model fits the facts (data). The better the fit, the more we believe in the model’s accuracy.

• We have two measures of fit: R-squared and S (aka SEE).


### The Famed $R^2$

• Coefficient of determination ($R^2$):

$$R^2 = \frac{\text{Explained Variance}}{\text{Total Variance}}$$

• The closer $R^2$ is to 1, the better the “fit”

• $R^2$ is the percentage of variation of the Y variable that is explained by (accounted for by, or reduced by) knowing the X variable (i.e., by using the regression to predict the response rather than the average response value).

• Square Footage explains 92% of the variation in Price.

### Accuracy: Standard Error of Estimate

• Standard error of the estimate: S (or SEE)

• The smaller the S, the better the “fit”

• The units of S are the same as the units of the Y variable.

• When using our regression model for predicting home prices, we would be off on average plus/minus \$46,631.
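
Both fit measures can be computed from the residuals of a fitted line. The sketch below assumes a simple regression (one explanatory variable), so S divides the squared residuals by n - 2 degrees of freedom. The data are hypothetical, not the housing data, and the function name `fit_quality` is ours.

```python
import math

def fit_quality(xs, ys):
    """Return (R-squared, S) for a simple least-squares regression."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    # Unexplained variation: squared residuals around the fitted line
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    # Total variation: squared deviations around the average response
    sst = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - sse / sst        # share of variation explained
    s = math.sqrt(sse / (n - 2))     # same units as the Y variable
    return r_squared, s

# Hypothetical data (NOT from the presentation):
r_squared, s = fit_quality([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r_squared, 3), round(s, 3))
```

For these made-up numbers the line explains 60% of the variation, and predictions are off by roughly 0.89 units of Y on average.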

### In-Class Activity:

• Investigate models of Salary from the file Salary_handout.xls using only one variable as the explanatory (independent) variable.

• Interpret the following in the context of the model:

• Slope and intercept

• Strength of linear relationship (R)

• The usefulness of the slope (p-value)

• Graph of relationship

• Evaluation of the model, i.e., $R^2$ and S.

• Use the model to predict Salary for a fictitious employee.

### Multiple Linear Regression (MLR)

• We assume that the relationship between variables is linear:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \varepsilon$$

### Model Building

• Before running any regressions or even any data analysis, determine which of your variables you believe are good predictors.

• Generally, you want at least 10 observations per variable selected if possible.

### Variable Investigation

• Next, investigate the relationship between the response or dependent (Y) variable and each of the explanatory or independent (X) variables. Use the correlation matrix and scatter plots.

• To avoid ‘problems’, also make sure that the correlation among the explanatory (independent) variables is not too high. As a rule of thumb, anything above 0.90 in absolute value can cause trouble.

• Correlation Matrix
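
The 0.90 rule of thumb can be applied by scanning every pair of explanatory variables. This is a minimal sketch under stated assumptions: the five houses below are invented for illustration, and the names `pearson_r`, `explanatory`, and `THRESHOLD` are ours.

```python
import math
from itertools import combinations

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical explanatory variables for five houses (NOT from the slides)
explanatory = {
    "SqFt": [1000, 1500, 2000, 2500, 3000],
    "Bedrooms": [2, 3, 3, 4, 5],
    "Bathrooms": [1, 3, 2, 1, 3],
}
THRESHOLD = 0.90  # rule of thumb from the slide

# Collect every pair whose correlation exceeds the threshold in absolute value
flagged = []
for name_a, name_b in combinations(explanatory, 2):
    r = pearson_r(explanatory[name_a], explanatory[name_b])
    if abs(r) > THRESHOLD:
        flagged.append((name_a, name_b, round(r, 3)))

print(flagged)
```

In this invented sample only the SqFt and Bedrooms pair is flagged, which would suggest keeping just one of the two in the model or interpreting their slopes with caution.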

### Multiple Regression Output

• Est Price = 44,392 + 111*SqFt + 85,345*Bedrooms + 572*Bathrooms

### Slopes in MLR

• Be careful when interpreting the slopes in a multiple linear regression as it is necessary to hold all other variables constant.

• Price will increase on average by \$111 for each additional square foot when holding all other explanatory variables constant. If the p-value is not less than alpha, then you cannot interpret the slope.
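
"Holding all other variables constant" can be seen by changing only one input of the estimated equation. In the sketch below the starting houses are arbitrary illustrations of ours; whichever one is used, adding one square foot changes the estimate by the SqFt slope.

```python
def est_price(sq_ft, bedrooms, bathrooms):
    """Estimated price from the multiple regression output above."""
    return 44_392 + 111 * sq_ft + 85_345 * bedrooms + 572 * bathrooms

# Add one square foot while bedrooms and bathrooms stay fixed.
# The starting values are arbitrary examples, not from the presentation.
starting_houses = [(1500, 3, 1), (2000, 4, 2), (3200, 5, 3)]
changes = [
    est_price(sq_ft + 1, beds, baths) - est_price(sq_ft, beds, baths)
    for sq_ft, beds, baths in starting_houses
]

print(changes)  # [111, 111, 111]
```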

### p-values in MLR

• Each explanatory (independent) variable has its own p-value.

• When looked at individually, is the variable’s slope statistically different from zero?

• If yes (p-value < $\alpha$), then that variable is a good predictor within the context of the model and the slope can be interpreted.

• If no (p-value > $\alpha$), then that variable is not a good predictor within the context of the model and the slope cannot be interpreted. Some variables are confirmatory and may remain in the model even though their p-value > $\alpha$.

### Coefficient of Determination $R^2$

• $R^2$ is still the percentage of variation of the Y variable explained by knowing all the X variables. The focus is on explaining the variation in Price, not on explaining the data.

### Output

• Knowing the square footage, the number of bedrooms, and the number of bathrooms of a house explains 97% of the variation in house prices.

### Standard Error of Estimate in MLR

• The interpretation of S (aka SEE) is the same in multiple regression as it is in simple.

• Thus, we expect to be off on average plus or minus \$28,765 when predicting house prices using the square footage, the number of bedrooms and the number of bathrooms in the house.

### Variation Reduction

• How do we know if the S (SEE) is low or high?

Is it small enough to make the predictions from the regressions useful?

Compare it to the standard deviation of the response (dependent) variable.

| S (SEE) | St Dev(Price) |
|---|---|
| \$28,765 | \$161,666 |
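
One way to express this comparison as a single number is the ratio of S to the standard deviation of Price. The ratio is our illustration, not a measure named on the slide.

```python
see = 28_765            # S (SEE) from the multiple regression
st_dev_price = 161_666  # standard deviation of Price

ratio = see / st_dev_price
print(f"{ratio:.1%}")   # 17.8%
```

On this reading, the typical prediction error with the regression is under a fifth of the typical deviation from the average price alone.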

### Predicting Using Regression

• Recall we have:

• Est Price = 44,392 + 111*SqFt + 85,345*Bedrooms + 572*Bathrooms

• Assume this is a good equation.

• Use it to predict the expected selling price of a home with 2000 sq. ft. of living space, 4 bedrooms, and 2 baths.
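
The prediction asked for here is a direct substitution into the equation. The sketch below breaks it down term by term; the variable names are ours.

```python
# Coefficients from the multiple regression output
intercept = 44_392
coefficients = {"SqFt": 111, "Bedrooms": 85_345, "Bathrooms": 572}

# The house described on the slide
house = {"SqFt": 2000, "Bedrooms": 4, "Bathrooms": 2}

# Contribution of each explanatory variable to the estimate
terms = {name: coefficients[name] * house[name] for name in coefficients}
prediction = intercept + sum(terms.values())

print(terms)
print(f"Est Price = {prediction:,}")  # Est Price = 608,916
```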

### How Confident Should You Be about Your Estimate?

About two-thirds (68%) of the data should fall within +/- SEE of the value determined by the regression equation. Similarly, about 95% should fall within +/- 2*SEE. Therefore, an approximate 95% interval for the prediction of a specific house at 533 Main St., which has 2000 sq. ft., 4 bedrooms, and 2 baths, can be computed as Est Price +/- 2*SEE. That is, we are 95% confident that this specific house’s price is between these two values. Since this is about a specific house, the interval is called a prediction interval, not a confidence interval.

### How Confident Should You Be about the Average Price of Such a House?

• A 95% confidence interval for the average price of a 2000 sq. ft., 4 bed, 2 bath house can be computed as:

• Est Price +/- 2 *SEE/SQRT(n).

• In words, based on our regression, we are 95% confident that the average 2000 sq. ft., 4 bed, 2 bath house price is between these two values. Since this is about an average of all such houses, the interval is called a confidence interval not a prediction interval.

### In-Class Activity:

• Investigate models of Salary from the file Salary_handout.xls using any set of variables you wish as the explanatory (independent) variables.

• Interpret the following in the context of the model:

• Slopes and intercept.

• Strength of linear relationship (R)

• The usefulness of the slope (p-value).

• Graphs of relationship.

• Evaluation of the model, i.e., $R^2$ and SEE.

• Use the model to predict Salary for a fictitious employee and build Prediction and Confidence intervals for this prediction.