Determining Factors of Market Success

DMD #4

David Kopcso and Richard Cleary

Babson College

F. W. Olin Graduate School of Business


Learning Objectives

  • Determine the strength of (linear) relationships

  • Describe a regression model with one or more explanatory variables

  • Interpret regression coefficients

  • Evaluate the model in a business context


Modeling Relationships

  • If we believe that two (or more) random variables are related, then we would like to model and exploit the relationship.

  • Hopefully, this helps to:

    • Make more accurate predictions

    • Show the direction and strength of relationship

    • Reduce the amount of uncertainty.


Approach

Investigate variables individually and jointly.

  • Individually: numerically with standard statistics; graphically with histograms and box plots.

  • Jointly: numerically with correlation; graphically with scatter plots.


Scatter Plots

[Figure: example scatter plots showing a positive linear relationship, a nonlinear relationship, a negative linear relationship, and no relationship]


Correlation Coefficient

  • Unit free: ranges between -1 and 1

    • The closer to -1, the stronger the negative linear relationship

    • The closer to 1, the stronger the positive linear relationship

    • The closer to 0, the weaker any linear relationship

  • Note: Correlation does not deal with cause and effect; it measures only the strength of linear dependence.
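The Pearson formula behind r can be sketched in a few lines of plain Python. The helper name `pearson_r` and the sample data are illustrative assumptions; Python 3.10+ also ships `statistics.correlation`, which computes the same quantity.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data for illustration only
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0  (perfect positive line)
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0 (perfect negative line)
```

Because r is built from deviations scaled by their own spread, rescaling either variable (e.g., dollars to thousands of dollars) leaves it unchanged, which is why it is unit free.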


Correlation Coefficient r

[Figure: scatter plots of Y against X illustrating r = +1, r = -1, r = +0.9, and r = 0]


Correlation Measures Only Linear Dependence!

X     Y = exp(X) (rounded)
1     3
2     7
3     20
4     55
5     148
6     403
7     1097
8     2981
9     8103
10    22026
11    59874
12    162755
13    442413
14    1202604
15    3269017
16    8886111
17    24154953
18    65659969
19    178482301
20    485165195

  • X and Y are perfectly (though not linearly) related. However, the correlation of X and Y (where Y = exp(X)) is only 0.539.


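Linear Regression Model

  • Assume that the relationship between the variables is linear:

    $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

  • Here $\beta_0$ is the Y-intercept, $\beta_1$ is the slope, and $\varepsilon_i$ is the error; $Y$ is the dependent (response) variable and $X$ is the independent (explanatory) variable.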


Model

Do you think knowing the size of a house helps “explain” the variation in house prices?

Population Model:

$\text{Price} = \beta_0 + \beta_1 \cdot \text{Sq. Footage} + \varepsilon$

Estimated Equation:

Est. Price = b0 + b1 Sq. Footage, or $\widehat{\text{Price}} = b_0 + b_1 \cdot \text{Sq. Footage}$

  • In general notation, $Y_i = b_0 + b_1 X_i + e_i$ and $\hat{Y}_i = b_0 + b_1 X_i$, where $e_i = Y_i - \hat{Y}_i$ is the residual.

Linear Regression Model

[Figure: Y plotted against X, with an unsampled observation marked]


Estimated Model

Est. Price = b0 + b1 Sq. Footage

Est. Price = 117,663 + 173 Sq. Footage
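The intercept and slope on this slide come from least-squares estimation. A minimal sketch of the simple-regression formulas, b1 = Sxy / Sxx and b0 = ȳ - b1·x̄, is below; the function name and the four data points are hypothetical and are not the presentation's house data.

```python
def fit_simple_regression(xs, ys):
    """Least-squares intercept b0 and slope b1 for Y = b0 + b1*X + e."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx              # slope
    b0 = mean_y - b1 * mean_x   # intercept: the fitted line passes through (x-bar, y-bar)
    return b0, b1

# Hypothetical (sq. ft., price) pairs, NOT the presentation's data
sqft = [1000, 1500, 2000, 2500]
price = [290_000, 380_000, 460_000, 560_000]
b0, b1 = fit_simple_regression(sqft, price)
print(f"Est. Price = {b0:,.0f} + {b1:,.0f} Sq. Footage")  # Est. Price = 111,000 + 178 Sq. Footage
```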


Model Interpretation

  • b1: The average marginal increase/decrease in Price for a unit increase in Sq. Footage.

    • Price will increase by $173 on average for each additional square foot.

  • b0: The average Price when Square Footage equals zero.

    • Average value of Price is $117,663 when there is no Square Footage.

  • Does this statement make sense? Does this result have managerial significance?


Hypothesis Test: No Linear Relationship

  • Tests whether there is a (linear) relationship between X & Y

  • Hypotheses

    • H0: 1 = 0 (No Linear Relationship)

    • H1: 1 0 (Linear Relationship)

  • Compare p-value to a

  • Interpretation

    • If p-value is less than a, we have enough information to conclude that Square Footage is linearly related to Price and we can interpret the slope.
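As a rough sketch of what regression software computes behind this p-value, the function below forms the slope's t statistic, t = b1 / SE(b1). The function name and data are hypothetical, and the p-value uses a normal approximation from Python's `statistics.NormalDist`. The exact test uses a t distribution with n - 2 degrees of freedom (available, for example, as `scipy.stats.t`), so treat the approximation as reasonable only for large samples.

```python
import math
from statistics import NormalDist

def slope_t_test(xs, ys):
    """Test H0: beta1 = 0 in simple linear regression.

    Returns (b1, t, p_approx). t = b1 / SE(b1) has n - 2 degrees of freedom;
    p_approx uses a normal approximation, reasonable only for large n.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    b0 = mean_y - b1 * mean_x
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))   # standard error of the estimate
    se_b1 = s / math.sqrt(sxx)     # standard error of the slope
    t = b1 / se_b1
    # Two-sided p-value under a standard normal approximation to the t distribution
    p_approx = 2 * (1 - NormalDist().cdf(abs(t)))
    return b1, t, p_approx

alpha = 0.05
xs = list(range(1, 51))                          # hypothetical data
ys = [3 + 0.5 * x + (-1) ** x for x in xs]
b1, t, p = slope_t_test(xs, ys)
print("reject H0" if p < alpha else "fail to reject H0")  # reject H0
```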


Quality of Model

  • We would like to know how well our model fits the facts (data). The better the fit, the more we believe in the model’s accuracy.

  • We have two measures of fit: R-squared and S (aka SEE).


The Famed R²

  • Coefficient of determination (R²):

    $R^2 = \dfrac{\text{Explained Variation}}{\text{Total Variation}}$

    • The closer R² is to 1, the better the “fit”.

      R² is the percentage of the variation in the Y variable that is explained by (accounted for by, or reduced by) knowing the X variable, i.e., by using the regression to predict the response rather than the average response value.

    • Square Footage explains 92% of the variation in Price.
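A minimal sketch of the R² calculation, using R² = 1 - SSE/SST. For a least-squares fit with an intercept, this equals explained variation over total variation. The function name and the five data points are hypothetical. In simple regression R² also equals r², and here r = 0.7746, so r² ≈ 0.6.

```python
def r_squared(ys, fitted):
    """R^2 = 1 - SSE/SST = explained variation / total variation."""
    mean_y = sum(ys) / len(ys)
    sst = sum((y - mean_y) ** 2 for y in ys)             # total variation
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # unexplained (residual) variation
    return 1 - sse / sst

# Hypothetical data; 2.2 + 0.6x is the least-squares line for these points
ys = [2, 4, 5, 4, 5]
fitted = [2.2 + 0.6 * x for x in [1, 2, 3, 4, 5]]
print(round(r_squared(ys, fitted), 3))  # 0.6
```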


Accuracy: Standard Error of Estimate

  • Standard error of the estimate: S (or SEE)

    • The smaller the S, the better the “fit”

    • The units of S are the same as the units of the Y variable.

    • When using our regression model to predict home prices, we would typically be off by about plus or minus $46,631.
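S can be sketched as the square root of the residual sum of squares divided by its degrees of freedom, n - k - 1, where k is the number of explanatory variables (so n - 2 in simple regression). The function name and data are hypothetical.

```python
import math

def standard_error_of_estimate(ys, fitted, k):
    """S (SEE) = sqrt(SSE / (n - k - 1)) for a model with k explanatory variables."""
    n = len(ys)
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    return math.sqrt(sse / (n - k - 1))

ys = [2, 4, 5, 4, 5]                               # hypothetical data
fitted = [2.2 + 0.6 * x for x in [1, 2, 3, 4, 5]]  # least-squares line for these data
print(round(standard_error_of_estimate(ys, fitted, k=1), 3))  # 0.894
```

Because SSE is in squared Y units, taking the square root puts S back in the units of Y (dollars, for Price).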


In-Class Activity:

  • Investigate models of Salary from the file Salary_handout.xls using only one variable as the explanatory (independent) variable.

  • Interpret the following in the context of the model:

    • Slope and intercept

    • Strength of linear relationship (R)

    • The usefulness of the slope (p-value)

    • Graph of relationship

    • Evaluation of the model, i.e., R² and S.

    • Use the model to predict Salary for a fictitious employee.


Multiple Linear Regression (MLR)

  • We assume that the relationship between the variables is linear:

    $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \varepsilon$


Model Building

  • Before running any regressions or even any data analysis, determine which of your variables you believe are good predictors.

  • Generally, you want at least 10 observations per variable selected if possible.


Variable Investigation

  • Next, investigate the relationship between the response or dependent (Y) variable and each of the explanatory or independent (X) variables. Use the correlation matrix and scatter plots.

  • To avoid multicollinearity problems (unstable, hard-to-interpret slope estimates), also make sure that the correlation among the explanatory (independent) variables is not too high. As a rule of thumb, anything above 0.90 in absolute value can cause trouble.


[Table: correlation matrix]


Multiple Regression Output

  • Est Price = 44,392 + 111*SqFt + 85,345*Bedrooms + 572*Bathrooms
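Regression software finds these coefficients by least squares. As a teaching sketch only, the function below solves the normal equations (X'X)b = X'y with Gauss-Jordan elimination in plain Python. The function name and test data are hypothetical, and library routines typically use more numerically stable factorizations such as QR.

```python
def fit_linear_regression(rows, ys):
    """Least-squares coefficients [b0, b1, ..., bk] via the normal equations (X'X)b = X'y.

    rows: list of observations, each a list of k explanatory values.
    """
    X = [[1.0] + list(r) for r in rows]  # prepend a column of 1s for the intercept
    p = len(X[0])
    # Build the augmented system [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * y for row, y in zip(X, ys))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        if abs(A[col][col]) < 1e-12:
            raise ValueError("X'X is singular: check for perfectly collinear variables")
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    # The left block is now diagonal, so each coefficient is a simple ratio
    return [A[i][p] / A[i][i] for i in range(p)]

# Hypothetical data generated exactly from Y = 1 + 2*X1 + 3*X2
coef = fit_linear_regression([[1, 0], [0, 1], [1, 1], [2, 1], [3, 5]], [3, 4, 6, 8, 22])
print([round(b, 6) for b in coef])  # [1.0, 2.0, 3.0]
```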


Slopes in MLR

  • Be careful when interpreting the slopes in a multiple linear regression as it is necessary to hold all other variables constant.

  • Price will increase on average by $111 for each additional square foot when holding all other explanatory variables constant. If the p-value is not less than α, then you cannot interpret the slope.


p-values in MLR

  • Each explanatory (independent) variable has its own p-value.

  • When looked at individually, is the variable’s slope statistically different from zero?

    • If yes (p-value < α), then that variable is a good predictor within the context of the model and the slope can be interpreted.

    • If no (p-value > α), then that variable is not a good predictor within the context of the model and the slope cannot be interpreted. Some variables are confirmatory and may remain in the model even though their p-value > α.


Coefficient of Determination R²

  • R² is still the percentage of variation of the Y variable explained by knowing all the X variables. The focus is on explaining the variation in Price, not on explaining the data.


Output

  • Knowing the square footage, the number of bedrooms, and the number of bathrooms of a house explains 97% of the variation in house prices.


Standard Error of Estimate in MLR

  • The interpretation of S (aka SEE) is the same in multiple regression as it is in simple.

  • Thus, when predicting house prices from the square footage, the number of bedrooms, and the number of bathrooms, we expect to be off by roughly plus or minus $28,765.


Variation Reduction

  • How do we know if the S (SEE) is low or high?

    Is it small enough to make the predictions from the regressions useful?

    Compare it to the standard deviation of the response (dependent) variable.

    S (SEE) = $28,765 vs. St Dev(Price) = $161,666
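The comparison can be made concrete with the two figures on this slide. Assuming St Dev(Price) is the usual sample standard deviation (divisor n - 1), the quantity 1 - (S / St Dev)² is the model's adjusted R², which lines up with the 97% reported earlier.

```python
see = 28_765        # S (SEE) from the multiple regression output
sd_price = 161_666  # standard deviation of Price

ratio = see / sd_price
print(f"S is {ratio:.1%} of St Dev(Price)")   # S is 17.8% of St Dev(Price)
# Assuming sd_price uses the n - 1 divisor, this is the adjusted R^2
print(f"1 - ratio^2 = {1 - ratio ** 2:.3f}")  # 1 - ratio^2 = 0.968
```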


Predicting Using Regression

  • Recall we have:

    Est Price = 44,392 + 111*SqFt + 85,345*Bedrooms + 572*Bathrooms

  • Assume this is a good equation.

  • Use it to predict the expected selling price of a home with 2,000 sq. ft. of living space, 4 bedrooms, and 2 baths.
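Plugging the house's characteristics into the fitted equation is simple arithmetic. The helper name `est_price` is illustrative.

```python
def est_price(sqft, bedrooms, bathrooms):
    """Estimated price from the fitted multiple regression equation."""
    return 44_392 + 111 * sqft + 85_345 * bedrooms + 572 * bathrooms

print(f"${est_price(2000, 4, 2):,}")  # $608,916
```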


How Confident Should You Be about Your Estimate?

About two-thirds (68%) of the data should fall within +/- SEE of the value given by the regression equation, and about 95% should fall within +/- 2*SEE (assuming the residuals are roughly normally distributed). Therefore, an approximate 95% interval for the price of a specific house at 533 Main St., which has 2,000 sq. ft., 4 bedrooms, and 2 baths, is Est Price +/- 2*SEE. That is, we are about 95% confident that this specific house’s price lies between these two values. Because the interval concerns a specific house, it is called a prediction interval, not a confidence interval.
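Applying the +/- 2*SEE rule to the estimate for this house gives the following interval. This is arithmetic only, and it relies on the same approximate-normality assumption.

```python
est = 44_392 + 111 * 2000 + 85_345 * 4 + 572 * 2  # Est Price for 2,000 sq. ft., 4 bed, 2 bath
see = 28_765                                       # S (SEE) from the regression output

low, high = est - 2 * see, est + 2 * see
print(f"approx. 95% prediction interval: ${low:,} to ${high:,}")
# approx. 95% prediction interval: $551,386 to $666,446
```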


How Confident Should You Be about the Average Price of Such a House?

  • A 95% confidence interval for the average price of a 2000 sq. ft., 4 bed, 2 bath house can be computed as:

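  • Est Price +/- 2*SEE/SQRT(n). This is a rough approximation that is most accurate when the house’s characteristics are near the sample averages; for houses far from the averages it understates the uncertainty.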

  • In words, based on our regression, we are 95% confident that the average 2000 sq. ft., 4 bed, 2 bath house price is between these two values. Since this is about an average of all such houses, the interval is called a confidence interval not a prediction interval.
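The same rough calculation for the average price is sketched below. The slides do not report the sample size, so n = 100 is purely a hypothetical value, and the function name is illustrative. The point is that this interval is narrower than the prediction interval by a factor of sqrt(n).

```python
import math

def approx_mean_ci(est, see, n, z=2):
    """Rough 95% CI for the average response: est +/- z * SEE / sqrt(n)."""
    half_width = z * see / math.sqrt(n)
    return est - half_width, est + half_width

# n = 100 is a HYPOTHETICAL sample size; the slides do not report n
low, high = approx_mean_ci(608_916, 28_765, n=100)
print(f"${low:,.0f} to ${high:,.0f}")  # $603,163 to $614,669
```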


In-Class Activity:

  • Investigate models of Salary from the file Salary_handout.xls using any set of variables you wish as the explanatory (independent) variables.

  • Interpret the following in the context of the model:

    • Slopes and intercept.

    • Strength of linear relationship (R)

    • The usefulness of the slope (p-value).

    • Graphs of relationship.

    • Evaluation of the model, i.e., R² and SEE.

    • Use the model to predict Salary for a fictitious employee and build Prediction and Confidence intervals for this prediction.

