Determining factors of market success


Determining Factors of Market Success. DMD #4 David Kopcso and Richard Cleary Babson College F. W. Olin Graduate School of Business. Learning Objectives. Determine the strength of (linear) relationships Describe a regression model with one or more explanatory variables


Presentation Transcript


Determining factors of market success

Determining Factors of Market Success

DMD #4

David Kopcso and Richard Cleary

Babson College

F. W. Olin Graduate School of Business


Learning objectives

Learning Objectives

  • Determine the strength of (linear) relationships

  • Describe a regression model with one or more explanatory variables

  • Interpret regression coefficients

  • Evaluate the model in a business context


Modeling relationships

Modeling Relationships

  • If we believe that two (or more) random variables are related, then we would like to model and exploit the relationship.

  • Hopefully, this helps to:

    • Make more accurate predictions

    • Show the direction and strength of relationship

    • Reduce the amount of uncertainty.


Approach

Approach

Investigate variables individually and jointly.

               Individually           Jointly

Numerically:   Standard Stats         Correlation

Graphically:   Histogram, Box Plot    Scatter Plot


Scatter plots

Scatter Plots

  • Positive Linear Relationship

  • Nonlinear Relationship

  • Negative Linear Relationship

  • No Relationship


Correlation coefficient

Correlation Coefficient

  • Unit free: ranges between -1 and 1

    • The closer to –1, the stronger the negative linear relationship

    • The closer to 1, the stronger the positive linear relationship

    • The closer to 0, the weaker any linear relationship

  • Note: Correlation does not deal with cause and effect; it only measures strength of linear dependence


Correlation coefficient r

Correlation Coefficient r

  • [Figure: four scatter plots of Y against X, illustrating r = +1, r = -1, r = +0.9, and r = 0.]


Correlation measures only linear dependence

Correlation Measures Only Linear Dependence!

X     Y = exp(X)
1     3
2     7
3     20
4     55
5     148
6     403
7     1097
8     2981
9     8103
10    22026
11    59874
12    162755
13    442413
14    1202604
15    3269017
16    8886111
17    24154953
18    65659969
19    178482301
20    485165195

  • X and Y are perfectly related. However, the correlation of X and Y (where Y=exp(X) ) is 0.539
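
The 0.539 figure can be checked directly. The following is a minimal sketch, assuming the X values 1 through 20 and Y = exp(X) rounded to whole numbers as in the table; the helper name `pearson_r` is illustrative and does not come from the slides.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations about the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = list(range(1, 21))                 # X = 1, 2, ..., 20
ys = [round(math.exp(x)) for x in xs]   # Y = exp(X), rounded as in the table
r = pearson_r(xs, ys)
print(f"r = {r:.3f}")
```

Although Y is an exact function of X, r stays far from 1 because the relationship is strongly curved rather than linear.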


Linear regression model

Linear Regression Model

  • Assume that the relationship between the variables is linear:

    $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

  • $Y$: Dependent (Response) Variable

  • $X$: Independent (Explanatory) Variable

  • $\beta_0$: Y-Intercept

  • $\beta_1$: Slope

  • $\varepsilon_i$: Error


Model

Model

Do you think knowing the size of a house helps “explain” the variation in house prices?

Population Model:

Price = $\beta_0 + \beta_1$ Sq. Footage + $\varepsilon$

Estimated Equation:

Est. Price = b0 + b1 Sq. Footage, or $\widehat{\text{Price}}$ = b0 + b1 Sq. Footage


Linear regression model1

Linear Regression Model

  • Sample regression: $Y_i = b_0 + b_1 X_i + e_i$, where $e_i$ = Residual

  • Estimated equation: $\hat{Y}_i = b_0 + b_1 X_i$

  • [Figure: plot of Y against X showing the estimated line, a residual, and an unsampled observation.]


Estimated model

Estimated Model

Est. Price = b0 + b1 Sq. Footage

Est. Price = 117,663 + 173 Sq. Footage


Model interpretation

Model Interpretation

  • b1: The average marginal increase/decrease in Price for a unit increase in Sq. Footage.

    • Price will increase by $173 on average for each additional square foot.

  • b0: The average Price when Square Footage equals zero.

    • Average value of Price is $117,663 when there is no Square Footage.

  • Does this statement make sense? Does this result have managerial significance?
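
The interpretation of b0 and b1 can be made concrete with a short sketch that evaluates the estimated equation from the slide. The function name `estimated_price` and the 2,000 sq. ft. input are illustrative assumptions, not values from the deck.

```python
INTERCEPT = 117_663   # b0 from the estimated model, in dollars
SLOPE = 173           # b1 from the estimated model, dollars per square foot

def estimated_price(sq_footage):
    """Evaluate Est. Price = b0 + b1 * Sq. Footage."""
    return INTERCEPT + SLOPE * sq_footage

# Hypothetical house size chosen only for illustration
print(estimated_price(2000))                          # 463663
# One extra square foot changes the estimate by exactly the slope
print(estimated_price(2001) - estimated_price(2000))  # 173
# At zero square feet the estimate is just the intercept
print(estimated_price(0))                             # 117663
```

The last line shows why the slide questions the intercept: it is the model's estimate for a house with no living space.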


Hypothesis test no linear relationship

Hypothesis Test: No Linear Relationship

  • Tests whether there is a (linear) relationship between X & Y

  • Hypotheses

    • H0: 1 = 0 (No Linear Relationship)

    • H1: 1 0 (Linear Relationship)

  • Compare p-value to a

  • Interpretation

    • If p-value is less than a, we have enough information to conclude that Square Footage is linearly related to Price and we can interpret the slope.


Quality of model

Quality of Model

  • We would like to know how well our model fits the facts (data). The better the fit, the more we believe in the model’s accuracy.

  • We have two measures of fit: R-squared and S (aka SEE).


The famed r 2

The Famed R2

  • Coefficient of determination (R2):

    $R^2 = \dfrac{\text{Explained Variance}}{\text{Total Variance}}$

    • The closer the R2 to 1, the better the “fit”

      R2 is the percentage of variation of the Y variable that is explained by (accounted for by or reduced by) knowing the X variable (i.e., by using the regression to predict the response rather than the average response value).

    • Square Footage explains 92% of the variation in Price.


Accuracy standard error of estimate

Accuracy: Standard Error of Estimate

  • Standard error of the estimate: S (or SEE)

    • The smaller the S, the better the “fit”

    • The units of S are the same as the units of the Y variable.

    • When using our regression model for predicting home prices, we would be off on average plus/minus $46,631.
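
To show where the two measures of fit come from, here is a minimal least-squares sketch for a model with one explanatory variable. The data are made up for illustration and are not the house data behind the slides. The function name `fit_simple_regression` is likewise an assumption. S is computed with the usual n - 2 divisor for simple regression.

```python
import math

def fit_simple_regression(xs, ys):
    """Return (b0, b1, r_squared, see) for the least-squares line y = b0 + b1*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx                 # slope
    b0 = mean_y - b1 * mean_x      # intercept
    # Residuals: observed value minus value on the fitted line
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    sse = sum(e ** 2 for e in residuals)        # unexplained variation
    sst = sum((y - mean_y) ** 2 for y in ys)    # total variation
    r_squared = 1 - sse / sst
    see = math.sqrt(sse / (n - 2))              # standard error of estimate, in units of y
    return b0, b1, r_squared, see

# Hypothetical square footage and prices, for illustration only
sq_footage = [1000, 1500, 2000, 2500, 3000]
price = [290_000, 380_000, 460_000, 560_000, 630_000]

b0, b1, r_squared, see = fit_simple_regression(sq_footage, price)
print(f"Est. Price = {b0:,.0f} + {b1:,.1f} * Sq. Footage")
print(f"R2 = {r_squared:.3f}, S = ${see:,.0f}")
```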


In class activity

In-Class Activity:

  • Investigate models of Salary from the file Salary_handout.xls using only one variable as the explanatory (independent) variable.

  • Interpret the following in the context of the model:

    • Slope and intercept

    • Strength of linear relationship (R)

    • The usefulness of the slope (p-value)

    • Graph of relationship

    • Evaluation of the model, i.e., R2 and S.

    • Use the model to predict Salary for a fictitious employee.


Multiple linear regression mlr

Multiple Linear Regression (MLR)

  • We assume that the relationship between variables is linear:

    $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \varepsilon$


Model building

Model Building

  • Before running any regressions or even any data analysis, determine which of your variables you believe are good predictors.

  • Generally, you want at least 10 observations per variable selected if possible.


Variable investigation

Variable Investigation

  • Next, investigate the relationship between the response or dependent (Y) variable and each of the explanatory or independent (X) variables. Use the correlation matrix and scatter plots.

  • To avoid ‘problems’, also make sure that the correlation among the explanatory (independent) variables is not too high. As a rule of thumb, anything above 0.90 in absolute value can cause trouble.
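
The 0.90 rule of thumb can be applied mechanically by computing the correlation for every pair of explanatory variables. The sketch below assumes made-up columns and hypothetical helper names (`pearson_r`, `high_correlations`); it is one way to run the check, not the procedure used in the deck.

```python
import math
from itertools import combinations

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def high_correlations(columns, threshold=0.90):
    """Return (name_a, name_b, r) for every pair of columns with |r| above threshold."""
    flagged = []
    for (name_a, col_a), (name_b, col_b) in combinations(columns.items(), 2):
        r = pearson_r(col_a, col_b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, r))
    return flagged

# Hypothetical explanatory variables, for illustration only
explanatory = {
    "SqFt": [1000, 1500, 2000, 2500, 3000, 3500],
    "Bedrooms": [2, 3, 3, 4, 4, 5],
    "Bathrooms": [1, 2, 1, 3, 2, 2],
}

for name_a, name_b, r in high_correlations(explanatory):
    print(f"{name_a} vs {name_b}: r = {r:.2f} (above the 0.90 rule of thumb)")
```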


Determining factors of market success

  • Correlation Matrix


Multiple regression output

Multiple Regression Output

  • Est Price = 44,392 + 111*SqFt + 85,345*Bedrooms + 572*Bathrooms


Slopes in mlr

Slopes in MLR

  • Be careful when interpreting the slopes in a multiple linear regression as it is necessary to hold all other variables constant.

  • Price will increase on average by $111 for each additional square foot when holding all other explanatory variables constant. If the p-value is not less than alpha, then you cannot interpret the slope.


P values in mlr

p-values in MLR

  • Each explanatory (independent) variable has its own p-value.

  • When looked at individually, is the variable’s slope statistically different from zero?

    • If yes (p-value < $\alpha$), then that variable is a good predictor within the context of the model and the slope can be interpreted.

    • If no (p-value > $\alpha$), then that variable is not a good predictor within the context of the model and the slope cannot be interpreted. Some variables are confirmatory and may remain in the model even though their p-value > $\alpha$.


Coefficient of determination r 2

Coefficient of Determination R2

  • R2 is still the percentage of variation of the Y variable explained by knowing all the X variables. The focus is on explaining the variation in Price, not on explaining the data.


Output

Output

  • Knowing the square footage, the number of bedrooms, and the number of bathrooms of a house explains 97% of the variation in house prices.


Standard error of estimate in mlr

Standard Error of Estimate in MLR

  • The interpretation of S (aka SEE) is the same in multiple regression as it is in simple.

  • Thus, we expect to be off on average plus or minus $28,765 when predicting house prices using the square footage, the number of bedrooms and the number of bathrooms in the house.


Variation reduction

Variation Reduction

  • How do we know if the S (SEE) is low or high?

    Is it small enough to make the predictions from the regressions useful?

    Compare it to the standard deviation of the response (dependent) variable.

    S (SEE): $28,765  vs.  St Dev(Price): $161,666
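
The comparison can be expressed as a ratio. This small sketch uses only the two figures from the slide; the variable names are illustrative.

```python
see = 28_765            # S (SEE) of the three-variable model, in dollars
st_dev_price = 161_666  # standard deviation of Price, in dollars

ratio = see / st_dev_price   # typical prediction error relative to the raw spread
reduction = 1 - ratio        # share of the spread removed by using the regression

print(f"SEE is {ratio:.1%} of St Dev(Price)")   # SEE is 17.8% of St Dev(Price)
print(f"Reduction in spread: {reduction:.1%}")  # Reduction in spread: 82.2%
```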


Predicting using regression

Predicting Using Regression

  • Recall we have:

  • Est Price = 44,392 + 111*SqFt + 85,345*Bedrooms + 572*Bathrooms

  • Assume this is a good equation.

  • Use it to predict the expected selling price of a home with 2000 sq. ft. of living space, 4 bedrooms, and 2 baths.
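
The requested prediction is a direct substitution into the estimated equation. A minimal sketch follows; the function name `estimated_price` is an illustrative assumption.

```python
def estimated_price(sq_ft, bedrooms, bathrooms):
    """Est Price = 44,392 + 111*SqFt + 85,345*Bedrooms + 572*Bathrooms."""
    return 44_392 + 111 * sq_ft + 85_345 * bedrooms + 572 * bathrooms

price = estimated_price(sq_ft=2000, bedrooms=4, bathrooms=2)
print(f"${price:,}")  # $608,916
```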


How confident should you be about your estimate

How Confident Should You Be about Your Estimate?

About two-thirds (68%) of the data should fall within +/- SEE of the value determined by the regression equation. Similarly, about 95% should fall within +/- 2*SEE. Therefore, a 95% interval for the prediction of a specific house at 533 Main St., which has 2000 sq. ft., 4 bedrooms, and 2 baths, can be computed as Est Price +/- 2*SEE. That is, we are 95% confident that this specific house’s price is between these two values. Since this is about a specific house, the interval is called a prediction interval, not a confidence interval.
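
As a worked sketch of the Est Price +/- 2*SEE rule, the code below plugs in the three-variable equation and the SEE of $28,765 quoted earlier in the deck. The function name `prediction_interval` is an illustrative assumption.

```python
def prediction_interval(est_price, see, multiplier=2):
    """Approximate 95% prediction interval: Est Price +/- multiplier * SEE."""
    margin = multiplier * see
    return est_price - margin, est_price + margin

# Estimated price for 2000 sq. ft., 4 bedrooms, 2 baths
est_price = 44_392 + 111 * 2000 + 85_345 * 4 + 572 * 2

low, high = prediction_interval(est_price, see=28_765)
print(f"${low:,} to ${high:,}")  # $551,386 to $666,446
```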


How confident should you be about the average price of such a house

How Confident Should You Be about the Average Price of Such a House?

  • A 95% confidence interval for the average price of a 2000 sq. ft., 4 bed, 2 bath house can be computed as:

  • Est Price +/- 2*SEE/SQRT(n).

  • In words, based on our regression, we are 95% confident that the average 2000 sq. ft., 4 bed, 2 bath house price is between these two values. Since this is about an average of all such houses, the interval is called a confidence interval not a prediction interval.
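
The same numbers give a much narrower interval for the average price. The slides do not state the sample size, so the n = 100 used below is purely an assumption for illustration. The function name `confidence_interval` is also hypothetical.

```python
import math

def confidence_interval(est_price, see, n, multiplier=2):
    """Approximate 95% confidence interval: Est Price +/- multiplier * SEE / sqrt(n)."""
    margin = multiplier * see / math.sqrt(n)
    return est_price - margin, est_price + margin

est_price = 44_392 + 111 * 2000 + 85_345 * 4 + 572 * 2   # 608,916
n = 100   # ASSUMED sample size; the actual n is not given in the slides

low, high = confidence_interval(est_price, see=28_765, n=n)
print(f"${low:,.0f} to ${high:,.0f}")  # $603,163 to $614,669
```

Dividing by SQRT(n) is what separates the two intervals: uncertainty about an average shrinks as the sample grows, while uncertainty about one specific house does not.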


In class activity1

In-Class Activity:

  • Investigate models of Salary from the file Salary_handout.xls using any set of variables you wish as the explanatory (independent) variables.

  • Interpret the following in the context of the model:

    • Slopes and intercept.

    • Strength of linear relationship (R)

    • The usefulness of the slope (p-value).

    • Graphs of relationship.

    • Evaluation of the model, i.e., R2 and SEE.

    • Use the model to predict Salary for a fictitious employee and build Prediction and Confidence intervals for this prediction.

