Determining factors of market success

Determining Factors of Market Success

DMD #4

David Kopcso and Richard Cleary

Babson College

F. W. Olin Graduate School of Business


Learning Objectives

  • Determine the strength of (linear) relationships

  • Describe a regression model with one or more explanatory variables

  • Interpret regression coefficients

  • Evaluate the model in a business context


Modeling Relationships

  • If we believe that two (or more) random variables are related, then we would like to model and exploit the relationship.

  • Hopefully, this helps to:

    • Make more accurate predictions

    • Show the direction and strength of relationship

    • Reduce the amount of uncertainty.


Approach

Investigate variables individually and jointly.

                 Individually                Jointly
Numerically:     Standard Stats              Correlation
Graphically:     Histogram, Box Plot         Scatter Plot


Scatter Plots

[Figure: example scatter plots showing a positive linear relationship, a nonlinear relationship, a negative linear relationship, and no relationship.]


Correlation Coefficient

  • Unit free: ranges between -1 and 1

    • The closer to –1, the stronger the negative linear relationship

    • The closer to 1, the stronger the positive linear relationship

    • The closer to 0, the weaker any linear relationship

  • Note: Correlation does not deal with cause and effect; it measures only the strength of linear dependence.
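The sketch below computes r directly from its definition in plain Python, to make the formula concrete. The helper name `pearson_r` and the sample numbers are hypothetical and do not come from the slides. The last line shows what "unit free" means in practice: if X is rescaled, for example from feet to inches, r does not change.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient r of two equal-length sequences (hypothetical helper)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values each")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How X and Y move together around their means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Spread of each variable around its own mean
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    # Dividing by both spreads removes the units and bounds r between -1 and 1
    return sxy / math.sqrt(sxx * syy)

# Illustrative data only
xs = [1, 2, 3, 4, 5]
print(pearson_r(xs, [2 * x + 1 for x in xs]))   # perfect positive line: r = 1
print(pearson_r(xs, [10 - 3 * x for x in xs]))  # perfect negative line: r = -1

ys = [2.0, 2.9, 4.2, 4.8, 6.1]
inches = [12 * x for x in xs]                    # same X measured in different units
print(pearson_r(xs, ys), pearson_r(inches, ys))  # same r, up to floating-point rounding
```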


Correlation Coefficient r

[Figure: four scatter plots of Y against X illustrating r = +1, r = -1, r = +0.9, and r = 0.]


Correlation Measures Only Linear Dependence!

X     Y = exp(X) (rounded)
1     3
2     7
3     20
4     55
5     148
6     403
7     1097
8     2981
9     8103
10    22026
11    59874
12    162755
13    442413
14    1202604
15    3269017
16    8886111
17    24154953
18    65659969
19    178482301
20    485165195

  • X and Y are perfectly related, since Y = exp(X). However, because the relationship is not linear, the correlation of X and Y is only 0.539.
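The 0.539 figure can be reproduced with a short plain-Python sketch. It rebuilds the X, exp(X) table and applies the correlation formula. The helper `pearson_r` is hypothetical and is redefined here so the block stands alone. As a contrast, the last line correlates X with log(Y). Because log(exp(X)) = X is linear, that correlation is 1.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient r (hypothetical helper)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = list(range(1, 21))          # X = 1, 2, ..., 20, as in the table
ys = [math.exp(x) for x in xs]   # Y = exp(X): exact, but curved
r = pearson_r(xs, ys)
print(round(r, 3))               # about 0.539, far from 1

# After a log transform the relationship is a straight line, so r is 1
print(pearson_r(xs, [math.log(y) for y in ys]))
```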


Linear Regression Model

  • Assume that the relationship between the variables is linear:

    $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

  • $Y$: dependent (response) variable

  • $X$: independent (explanatory) variable

  • $\beta_0$: Y-intercept

  • $\beta_1$: slope

  • $\varepsilon_i$: random error


Model

Do you think knowing the size of a house helps “explain” the variation in house prices?

Population Model:

$\text{Price} = \beta_0 + \beta_1 \, \text{Sq. Footage} + \varepsilon$

Estimated Equation:

$\widehat{\text{Price}} = b_0 + b_1 \, \text{Sq. Footage}$, also written Est. Price = b0 + b1 Sq. Footage


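Linear Regression Model

  • Fitted (estimated) regression line: $\hat{Y}_i = b_0 + b_1 X_i$

  • Each observation: $Y_i = b_0 + b_1 X_i + e_i$, where $e_i = Y_i - \hat{Y}_i$ is the residual

[Figure: scatter plot of Y against X showing the fitted regression line and an unsampled observation.]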
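The slides do not show how b0 and b1 are computed from data. The sketch below assumes ordinary least squares, the standard method, which chooses the line that minimizes the sum of squared residuals. The function name `fit_simple_ols` and the data are hypothetical. The sketch also checks one property of the least-squares residuals e_i: they sum to zero.

```python
def fit_simple_ols(xs, ys):
    """Ordinary least-squares estimates (b0, b1) for Y-hat = b0 + b1*X (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx             # slope
    b0 = mean_y - b1 * mean_x  # intercept: the fitted line passes through (mean_x, mean_y)
    return b0, b1

# Hypothetical data (not the house-price data from the slides)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = fit_simple_ols(xs, ys)
fitted = [b0 + b1 * x for x in xs]               # Y-hat_i
residuals = [y - f for y, f in zip(ys, fitted)]  # e_i = Y_i - Y-hat_i
print(b0, b1)          # roughly 0.05 and 1.99
print(sum(residuals))  # essentially zero, up to floating-point rounding
```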


Estimated Model

Est. Price = b0 + b1 Sq. Footage

Est. Price = 117,663 + 173 Sq. Footage


Model Interpretation

  • b1: The average marginal increase/decrease in Price for a unit increase in Sq. Footage.

    • Price will increase by $173 on average for each additional square foot.

  • b0: The average Price when Square Footage equals zero.

    • The average Price is $117,663 when Square Footage is zero.

  • Does this statement make sense? Does this result have managerial significance?
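The two interpretations can be checked directly with the estimated equation Est. Price = 117,663 + 173 Sq. Footage. In this sketch, 2000 sq. ft. is an arbitrary example size and `est_price` is a hypothetical helper name. Adding one square foot changes the estimate by exactly b1. Setting Sq. Footage to zero returns b0, which is the price of a "house" with no floor space. That case is probably far outside the range of the data, which is why the intercept has little managerial meaning here.

```python
# Coefficients from the estimated model on the slides
B0 = 117_663  # intercept (dollars)
B1 = 173      # slope (dollars per square foot)

def est_price(sq_footage):
    """Est. Price = 117,663 + 173 * Sq. Footage."""
    return B0 + B1 * sq_footage

print(est_price(2000))                    # 463663
print(est_price(2001) - est_price(2000))  # 173: one more square foot adds b1 dollars
print(est_price(0))                       # 117663: just the intercept b0
```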


Hypothesis Test: No Linear Relationship

  • Tests whether there is a (linear) relationship between X & Y

  • Hypotheses

    • H0: β1 = 0 (No Linear Relationship)

    • H1: β1 ≠ 0 (Linear Relationship)

  • Compare the p-value to α (the significance level)

  • Interpretation

    • If the p-value is less than α, we have enough evidence to conclude that Square Footage is linearly related to Price, and we can interpret the slope.
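Software reports the p-value for this test. Under the usual least-squares assumptions, it comes from the statistic t = b1 / SE(b1), compared against a t distribution with n - 2 degrees of freedom. The plain-Python sketch below computes only t. The helper name and data are hypothetical. If SciPy is installed, `scipy.stats.linregress(xs, ys).pvalue` gives the two-sided p-value for H0: β1 = 0 directly. With the 5 points used here (3 degrees of freedom), the 5% two-sided cutoff for |t| is about 3.18. As n grows, the cutoff approaches about 2.

```python
import math

def slope_t_stat(xs, ys):
    """t statistic for H0: beta1 = 0 in simple linear regression (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    b0 = mean_y - b1 * mean_x
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    see = math.sqrt(sse / (n - 2))  # standard error of the estimate, S
    se_b1 = see / math.sqrt(sxx)    # standard error of the slope
    return b1 / se_b1

xs = [1, 2, 3, 4, 5]
print(slope_t_stat(xs, [2.1, 3.9, 6.2, 7.8, 10.1]))  # large |t|: strong evidence of a linear relationship
print(slope_t_stat(xs, [5, 1, 4, 2, 3]))             # small |t|: no evidence of one
```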


Quality of Model

  • We would like to know how well our model fits the facts (data). The better the fit, the more we believe in the model’s accuracy.

  • We have two measures of fit: R-squared and S (aka SEE).


The Famed R2

  • Coefficient of determination: $R^2 = \dfrac{\text{Explained Variance}}{\text{Total Variance}}$

    • The closer R2 is to 1, the better the “fit”.

    • R2 is the percentage of variation of the Y variable that is explained by (accounted for by, or reduced by) knowing the X variable, i.e., by using the regression to predict the response rather than the average response value.

    • Square Footage explains 92% of the variation in Price.
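The sketch below computes R2 for a least-squares line. It uses the equivalent form 1 - SSE/SST, where SSE is the unexplained (residual) variation and SST is the total variation of Y around its mean. This form equals explained/total for least-squares fits that include an intercept. The function name and data are hypothetical. For simple regression, R2 also equals r squared: the example data have r = -0.3, so R2 = 0.09.

```python
def r_squared(xs, ys):
    """R2 of the simple least-squares line (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    b0 = mean_y - b1 * mean_x
    sst = sum((y - mean_y) ** 2 for y in ys)                     # total variation
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
    return 1 - sse / sst  # share of the variation explained by X

print(r_squared([1, 2, 3, 4, 5], [5, 1, 4, 2, 3]))  # about 0.09: X explains roughly 9%
print(r_squared([1, 2, 3, 4], [5, 7, 9, 11]))       # 1.0: a perfect straight-line fit
```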


Accuracy: Standard Error of Estimate

  • Standard error of the estimate: S (or SEE)

    • The smaller the S, the better the “fit”

    • The units of S are the same as the units of the Y variable.

    • When using our regression model for predicting home prices, we would be off on average plus/minus $46,631.
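The sketch below computes S (SEE) for a simple least-squares line using the usual formula sqrt(SSE / (n - 2)). The divisor is n - 2 because two parameters, b0 and b1, were estimated from the data. The function name and data are hypothetical. The last line illustrates the point about units: multiplying Y by 1000 multiplies S by 1000.

```python
import math

def standard_error_of_estimate(xs, ys):
    """S (SEE) of the simple least-squares line: sqrt(SSE / (n - 2)) (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    b0 = mean_y - b1 * mean_x
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    # n - 2 because b0 and b1 were both estimated from the data
    return math.sqrt(sse / (n - 2))

xs = [1, 2, 3, 4, 5]
ys = [5, 1, 4, 2, 3]
s = standard_error_of_estimate(xs, ys)
s_thousands = standard_error_of_estimate(xs, [1000 * y for y in ys])
print(s, s_thousands / s)  # the ratio is 1000 (up to rounding): S is in Y's units
```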


In-Class Activity:

  • Investigate models of Salary from the file Salary_handout.xls using only one variable as the explanatory (independent) variable.

  • Interpret the following in the context of the model:

    • Slope and intercept

    • Strength of linear relationship (R)

    • The usefulness of the slope (p-value)

    • Graph of relationship

    • Evaluation of the model, i.e., R2 and S.

    • Use the model to predict Salary for a fictitious employee.


Multiple Linear Regression (MLR)

  • We assume that the relationship between variables is linear:

    $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \varepsilon$
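In the slides, software fits this model. The sketch below shows the computation in plain Python, assuming ordinary least squares. It solves the normal equations (X^T X) b = X^T y by Gaussian elimination. The function name `fit_mlr` and the test data are hypothetical. Statistical packages usually use more numerically stable methods, such as a QR decomposition, so treat this only as an illustration. If two explanatory variables are perfectly collinear, X^T X is singular and the function raises an error. This is the extreme case of the "too highly correlated" problem discussed below.

```python
def fit_mlr(rows, ys):
    """Least-squares coefficients [b0, b1, ..., bk] for Y-hat = b0 + b1*X1 + ... + bk*Xk.

    rows: one list [x1, ..., xk] per observation; ys: the matching responses.
    """
    X = [[1.0] + [float(v) for v in row] for row in rows]  # leading 1.0 gives the intercept b0
    p = len(X[0])
    # Normal equations (X^T X) b = X^T y, stored as an augmented p x (p + 1) matrix
    a = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * y for r, y in zip(X, ys))] for i in range(p)]
    # Gaussian elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(a[k][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular; are some explanatory variables perfectly collinear?")
        a[col], a[pivot] = a[pivot], a[col]
        for k in range(col + 1, p):
            factor = a[k][col] / a[col][col]
            for c in range(col, p + 1):
                a[k][c] -= factor * a[col][c]
    # Back substitution
    b = [0.0] * p
    for i in range(p - 1, -1, -1):
        b[i] = (a[i][p] - sum(a[i][j] * b[j] for j in range(i + 1, p))) / a[i][i]
    return b

# Hypothetical noise-free data generated from Y = 5 + 2*X1 - 3*X2 + 0.5*X3
rows = [[1, 2, 3], [2, 1, 0], [3, 5, 1], [4, 2, 2], [5, 3, 7], [6, 8, 1]]
ys = [5 + 2 * x1 - 3 * x2 + 0.5 * x3 for x1, x2, x3 in rows]
print(fit_mlr(rows, ys))  # approximately [5.0, 2.0, -3.0, 0.5]
```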


Model Building

  • Before running any regressions or even any data analysis, determine which of your variables you believe are good predictors.

  • Generally, you want at least 10 observations per variable selected if possible.


Variable Investigation

  • Next, investigate the relationship between the response or dependent (Y) variable and each of the explanatory or independent (X) variables. Use the correlation matrix and scatter plots.

  • To avoid ‘problems’ (multicollinearity), also make sure that the correlation among the explanatory (independent) variables is not too high. As a rule of thumb, a correlation above 0.90 in absolute value can cause trouble.
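The check among explanatory variables can be automated with a short sketch. It computes the correlation for every pair of explanatory variables and flags any pair whose |r| exceeds the 0.90 rule of thumb. The helper names and the five-house data set are hypothetical and are not the course data. In this made-up example, SqFt and Bedrooms move together closely enough to be flagged.

```python
import math
from itertools import combinations

def pearson_r(xs, ys):
    """Pearson correlation coefficient r (hypothetical helper)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def flag_high_correlations(predictors, threshold=0.90):
    """Return (name_a, name_b, r) for each pair of explanatory variables with |r| > threshold."""
    flagged = []
    for a, b in combinations(predictors, 2):  # every distinct pair, each checked once
        r = pearson_r(predictors[a], predictors[b])
        if abs(r) > threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged

# Hypothetical explanatory variables for five houses (illustrative numbers only)
predictors = {
    "SqFt": [1500, 1800, 2100, 2400, 3000],
    "Bedrooms": [3, 3, 4, 4, 5],
    "Bathrooms": [2, 1, 2, 3, 2],
}
print(flag_high_correlations(predictors))  # [('SqFt', 'Bedrooms', 0.963)]
```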



Multiple Regression Output

  • Est Price = 44,392 + 111*SqFt + 85,345*Bedrooms + 572*Bathrooms


Slopes in MLR

  • Be careful when interpreting the slopes in a multiple linear regression: each slope describes the effect of its variable while holding all other explanatory variables constant.

  • Price will increase on average by $111 for each additional square foot when holding all other explanatory variables constant. If the p-value is not less than alpha, then you cannot interpret the slope.


p-values in MLR

  • Each explanatory (independent) variable has its own p-value.

  • When looked at individually, is the variable’s slope statistically different from zero?

    • If yes (p-value < α), then that variable is a good predictor within the context of the model, and the slope can be interpreted.

    • If no (p-value > α), then that variable is not a good predictor within the context of the model, and the slope cannot be interpreted. Some variables are confirmatory and may remain in the model even though their p-value > α.


Coefficient of Determination R2

  • R2 is still the percentage of variation of the Y variable explained by knowing all the X variables. The focus is on explaining the variation in Price, not on explaining the data.


Output

  • Knowing the square footage, the number of bedrooms, and the number of bathrooms of a house explains 97% of the variation in house prices.


Standard Error of Estimate in MLR

  • The interpretation of S (aka SEE) is the same in multiple regression as it is in simple.

  • Thus, we expect to be off on average plus or minus $28,765 when predicting house prices using the square footage, the number of bedrooms and the number of bathrooms in the house.


Variation Reduction

  • How do we know if S (SEE) is low or high? Is it small enough to make the predictions from the regression useful?

  • Compare it to the standard deviation of the response (dependent) variable:

    S (SEE) vs. St Dev(Price): $28,765 vs. $161,666
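One simple way to summarize this comparison is the ratio S / St Dev(Price). The slides compare the two numbers side by side, so this ratio is only one reading of that comparison, not necessarily the authors' own measure. The sketch uses the two figures from the slide.

```python
see = 28_765        # S (SEE) of the SqFt + Bedrooms + Bathrooms model
sd_price = 161_666  # standard deviation of Price
ratio = see / sd_price
print(f"S is {ratio:.1%} of the standard deviation of Price")  # S is 17.8% of ...
```

In other words, the typical prediction error is less than a fifth of the spread we would face by predicting every house at the average price.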


Predicting Using Regression

  • Recall we have:

    Est Price = 44,392 + 111*SqFt + 85,345*Bedrooms + 572*Bathrooms

  • Assume this is a good equation.

  • Use it to predict the expected selling price of a home with 2000 sq. ft. of living space, 4 bedrooms, and 2 baths.
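Plugging the example house into the equation is simple arithmetic, shown below with a hypothetical helper `est_price`. The second line also illustrates the "holding all other variables constant" reading of a slope. Adding one square foot while keeping bedrooms and baths fixed raises the estimate by exactly $111.

```python
def est_price(sqft, bedrooms, bathrooms):
    """Est Price = 44,392 + 111*SqFt + 85,345*Bedrooms + 572*Bathrooms (equation from the slides)."""
    return 44_392 + 111 * sqft + 85_345 * bedrooms + 572 * bathrooms

print(est_price(2000, 4, 2))                          # 608916
print(est_price(2001, 4, 2) - est_price(2000, 4, 2))  # 111: one more sq. ft., others held constant
```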


How Confident Should You Be about Your Estimate?

If the residuals are roughly normally distributed, about two-thirds (68%) of the data should fall within +/- SEE of the value given by the regression equation, and about 95% should fall within +/- 2*SEE. Therefore, an approximate 95% interval for the price of a specific house at 533 Main St., which has 2000 sq. ft., 4 bedrooms, and 2 baths, can be computed as Est Price +/- 2*SEE. That is, we are 95% confident that this specific house’s price lies between these two values. Because this interval is about a specific house, it is called a prediction interval, not a confidence interval.
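The sketch below applies the +/- 2*SEE rule of thumb to the 533 Main St. example, using the coefficients and S from the slides. The rule is a rough approximation. It assumes roughly normal residuals and ignores the extra uncertainty in the estimated coefficients, so software that reports exact prediction intervals will give slightly wider bounds.

```python
point = 44_392 + 111 * 2000 + 85_345 * 4 + 572 * 2  # Est Price for 2000 sq. ft., 4 bed, 2 bath
see = 28_765                                        # S (SEE) of the model
low, high = point - 2 * see, point + 2 * see
print(f"Approximate 95% prediction interval: ${low:,} to ${high:,}")
# Approximate 95% prediction interval: $551,386 to $666,446
```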


How Confident Should You Be about the Average Price of Such a House?

  • A 95% confidence interval for the average price of a 2000 sq. ft., 4 bed, 2 bath house can be approximated as:

  • Est Price +/- 2 *SEE/SQRT(n).

  • In words, based on our regression, we are 95% confident that the average 2000 sq. ft., 4 bed, 2 bath house price is between these two values. Since this is about an average of all such houses, the interval is called a confidence interval not a prediction interval.
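The slides do not state the sample size n, so the sketch below uses a hypothetical n = 100 only to show how the formula Est Price +/- 2*SEE/SQRT(n) behaves. With that assumed n, the interval for the average price is ten times narrower than the +/- 2*SEE interval for a single house. This formula is the slides' shortcut. An exact confidence interval for the mean response also depends on how far this house's characteristics are from the averages in the data.

```python
import math

point = 44_392 + 111 * 2000 + 85_345 * 4 + 572 * 2  # Est Price = 608,916
see = 28_765                                        # S (SEE) of the model
n = 100  # HYPOTHETICAL sample size; the slides do not give n

half_width = 2 * see / math.sqrt(n)
print(f"Approximate 95% confidence interval: ${point - half_width:,.0f} to ${point + half_width:,.0f}")
```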


In-Class Activity:

  • Investigate models of Salary from the file Salary_handout.xls using any set of variables you wish as the explanatory (independent) variables.

  • Interpret the following in the context of the model:

    • Slopes and intercept.

    • Strength of linear relationship (R)

    • The usefulness of the slope (p-value).

    • Graphs of relationship.

    • Evaluation of the model, i.e., R2 and SEE.

    • Use the model to predict Salary for a fictitious employee and build Prediction and Confidence intervals for this prediction.

