Lecture 9 diagnostics review

Lecture 9: Diagnostics & Review



Lecture 9: Diagnostics & Review. February 10, 2014. Question. A least squares regression line is determined from a sample of values for variables x and y where x = size of a listed home (in sq feet) y = selling price of the home (in $)


Presentation Transcript


Lecture 9: Diagnostics & Review

February 10, 2014


Question

A least squares regression line is determined from a sample of values for variables x and y where

x = size of a listed home (in sq feet)

y = selling price of the home (in $)

Which of the following is true about the model b0 + b1x?

  • If there is positive correlation r between x and y, then b1 must be positive

  • The units of the intercept and slope will be the same as the response variable, y.

  • If r² = 0.85, then it is appropriate to conclude that a change in x will cause a change in y

  • None of the above, more than one of the above, or not enough information to tell.


Question

A least squares regression line is determined from a sample of values for variables x and y where

x = size of a listed home (in sq feet)

y = selling price of the home (in $)

Which of the following is true about the model b0 + b1x?

  • If there is positive correlation r between x and y, then b1 must be positive

    b1 = r * sy / sx

    So if r > 0, then b1 is positive because sy and sx are both > 0
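The identity b1 = r * sy / sx can be checked numerically. The following is a minimal sketch using only the Python standard library; the square footage and price values are made up for illustration and are not data from the lecture, and the helper names are hypothetical.

```python
from statistics import mean, stdev

def least_squares_slope(x, y):
    """Slope b1 of the least squares line y = b0 + b1*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

def correlation(x, y):
    """Sample correlation r between x and y."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / ((len(x) - 1) * stdev(x) * stdev(y))

# Made-up illustrative values, NOT data from the lecture.
sqft = [1100, 1400, 1600, 1900, 2300, 2800]
price = [152000, 189000, 205000, 251000, 280000, 345000]

b1 = least_squares_slope(sqft, price)
r = correlation(sqft, price)
b1_from_r = r * stdev(price) / stdev(sqft)

# The two numbers agree up to floating-point rounding.
print(f"b1 = {b1:.4f}, r * sy / sx = {b1_from_r:.4f}")
```

Because sy and sx are standard deviations, they are positive whenever the data vary, so the sign of b1 always matches the sign of r.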


Administrative

  • Problem set 4 due (9am)

    • How was it?

  • Next week: Multiple Regression

  • Exam Wednesday

    • Sample question

      • Taken from Exam 1 - #37 last year


Last time

  • What did we talk about?

  • Outliers

    • Sensitivity analysis

  • Heteroscedasticity


Common problems and fixes:

Say we’re estimating the price of a lease from the size of the house:

Price = β0 + β1 * SqFt + ε

Interpretation of the estimates?

  • β0 would be fixed costs and

  • β1 would be marginal costs


Common Problems: Heteroscedasticity

Heteroscedasticity: What does that mean for your analysis?

  • Point estimates for β’s?

    • Still OK. No bias.

  • Prediction and Confidence intervals?

    • Not reliable; too narrow or too wide.

    • Hypothesis tests regarding β0 and β1 are not reliable.


Common Problems: Heteroscedasticity

Fixing the problem:

  • Revise the model: how will depend on the substance.

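    • Try revising the model to estimate Price/SqFt by dividing the original equation by SqFt:

      Price/SqFt = β1 + β0 * (1/SqFt) + ε/SqFt

  • Notice the change in the intercept and slope: the intercept is now β1 (marginal cost) and the slope on 1/SqFt is β0 (fixed cost).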

  • Don’t be locked into thinking the intercept is fixed cost

  • How to interpret them depends

    • Think about the data!


Common Problems: Heteroscedasticity

Fixing the problem:

Price/SqFt = M + F * (1/SqFt) + ε

  • Revise by thinking about the substance

    • Here, that meant predicting price per sq ft directly.

  • Don’t revise by doing weird things

    • Use theory!

  • After revising, check whether the residuals have similar variances.

    • Sometimes they won’t.

    • In this case they do:
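One way to run this kind of check is to compare the spread of the residuals for small and large homes, before and after revising the model. The sketch below is only an illustration under stated assumptions: the data are simulated (not the lecture's data) with an error whose standard deviation grows in proportion to SqFt, the values of F and M are invented, and `spread_ratio` is a hypothetical helper, not a standard diagnostic.

```python
import random
from statistics import mean, stdev

def least_squares(x, y):
    """Return (b0, b1) for the least squares line y = b0 + b1*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def residuals(x, y):
    """Residuals from the least squares fit of y on x."""
    b0, b1 = least_squares(x, y)
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

def spread_ratio(sizes, resid):
    """SD of residuals for the larger half of homes divided by the SD for the smaller half."""
    order = sorted(range(len(sizes)), key=lambda i: sizes[i])
    half = len(order) // 2
    small = [resid[i] for i in order[:half]]
    large = [resid[i] for i in order[half:]]
    return stdev(large) / stdev(small)

# Simulated data (an assumption for illustration): fixed cost F, marginal cost M,
# and an error whose standard deviation is proportional to SqFt.
random.seed(0)
F, M = 50_000, 120
sqft = [random.uniform(800, 4000) for _ in range(400)]
price = [F + M * s + s * random.gauss(0, 15) for s in sqft]

# Original model: Price on SqFt. Revised model: Price/SqFt on 1/SqFt.
original = residuals(sqft, price)
revised = residuals([1 / s for s in sqft], [p / s for p, s in zip(price, sqft)])

print("original model spread ratio:", round(spread_ratio(sqft, original), 2))
print("revised model spread ratio: ", round(spread_ratio(sqft, revised), 2))
```

A ratio well above 1 means the residuals fan out for large homes; a ratio near 1 is what similar variances look like. A plot of residuals against SqFt shows the same thing visually.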


Common Problems: Heteroscedasticity

Comparing the revised and original model:

  • Revised model may have different (and smaller) R².

    • Again, so? R² is great but it’s only one notion of fit.

  • In the example, the revised model provides a narrower (hence better) confidence interval for fixed and variable costs:

[Output omitted: confidence intervals for the original model and the revised model]


Common Problems: Heteroscedasticity

Comparing the revised and original model:

  • It also provides a more sensible prediction interval

    • The data originally indicated that large homes varied in price more:


Common Problems: Heteroscedasticity

How do you know how to remodel the problem?

  • Practice

  • Creativity; try different things.

  • There is no magic bullet; sometimes you can’t.


Common Problems: Correlated Errors

Problem: Dependence between residuals (autocorrelation)

  • The size of the error (measured by the residual) you make at observation x + 1 is related to the size of the error you make at observation x.

  • Why is this a problem?

    • SRM assumes that the errors, ε, are independent.

    • Common problem for time series data, but not just a time series problem.

      • Recall the u-shaped pattern in one of the residual plots before


Common Problems: Correlated Errors

Detecting the problem:

  • Easier with time series data:

    • Plot the residuals versus time and look for a pattern (is the residual at t+1 related to the residual at t?). This is not guaranteed to reveal the problem, but it is often helpful.

  • Use the Durbin-Watson statistic to test for correlation between adjacent residuals (aka serial- or auto-correlation)

    • With time series data adjacency is temporal.

    • In non time series data, we’re still talking about errors next to one another being related.

    • For things like spatial autocorrelation, there are more advanced tools, such as mapping the residuals and formal tests.


Durbin-Watson Statistic

  • Tests to see if the correlation between the residuals is 0

    • Null hypothesis: H0: ρε = 0

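  • It’s calculated from the residuals e_1, …, e_n as:

    D = Σ (e_t − e_{t−1})² / Σ e_t²   (numerator summed over t = 2, …, n; denominator over t = 1, …, n)

  • From the Durbin-Watson statistic, D, and the sample size you can calculate the p-value for the hypothesis test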

    • You’ll see this more in multiple regression and forecasting
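The statistic itself is easy to compute once the residuals are listed in order (time order for a time series). A minimal sketch, assuming the standard definition of D; the function name and the two short residual lists are made up for illustration, and the p-value calculation is not shown.

```python
def durbin_watson(residuals):
    """Durbin-Watson D: squared differences of adjacent residuals over squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Made-up residuals for illustration only.
drifting = [-2, -1, 0, 1, 2]     # neighbours are similar: positive autocorrelation
alternating = [1, -1, 1, -1]     # neighbours flip sign: negative autocorrelation

print(durbin_watson(drifting))
print(durbin_watson(alternating))
```

D always lies between 0 and 4. Values near 2 are consistent with uncorrelated adjacent residuals, values toward 0 point to positive autocorrelation, and values toward 4 point to negative autocorrelation.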


Common Problems: Correlated Errors

Consequences of Dependence:

  • With autocorrelation in the errors, the estimated standard errors are too small

    • The estimated slope and intercept are less precise than the output indicates


Common Problems: Correlated Errors

How do you fix it?

  • Try to model it directly or transform the data.

  • Example: number of mobile phone users:

    • Growth rate isn’t linear; try different transformations
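To see why a transformation can help with growth data, here is a minimal sketch. It assumes a made-up series that grows by exactly 30% per period with no noise; it is not the mobile phone data from the lecture, and the log transformation is just one of the transformations one might try.

```python
import math
from statistics import mean

def least_squares(x, y):
    """Return (b0, b1) for the least squares line y = b0 + b1*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

# Made-up series (an assumption): 10 users at t = 0, growing 30% per period.
periods = list(range(12))
users = [10 * 1.3 ** t for t in periods]

# Straight line on the raw scale: the residuals run positive, negative, positive,
# so neighbouring residuals are clearly related to one another.
b0, b1 = least_squares(periods, users)
raw_resid = [u - (b0 + b1 * t) for t, u in zip(periods, users)]

# Straight line on the log scale: ln(users) = a0 + a1 * t.
a0, a1 = least_squares(periods, [math.log(u) for u in users])
growth = math.exp(a1) - 1  # back-transform the slope to a per-period growth rate

print([round(e, 1) for e in raw_resid])
print(f"estimated growth rate per period: {growth:.3f}")
```

With real data the log-scale residuals still need to be examined; the transformation straightens the trend but does not guarantee independent errors.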

[Figures: original data and transformed data]


Common Problems: Correlated Errors

Does this fix the problem?

  • Linear pattern looks better

  • You still need to check the other SRM conditions!!

    • Omitted variables?

    • Analysis of residuals. Might still be a problem.

[Figures: original data and transformed data]


Exam Review

  • Download diamonds.xlsx

  • Regress price on weight

  • Are the residuals distributed Normal?

  • Yes

  • No

  • Maybe?

  • I have no idea how to verify that


Exam Review

  • Using your regression model from the last slide, predict the price of a diamond that weighs 0.44 carats

  • What is the approximate 95% confidence interval?

  • [$877.75, $1558.61]

  • [$2324.80, $3014.69]

  • [$-97.97, $184.95]

  • [$2330.41, $3009.09]

  • I have no idea


Exam Review

  • Using your regression model from the last slide, predict the price of a diamond that weighs 0.28 carats

  • What is the prediction interval?

  • [$877.75, $1558.61]

  • [$452.57, $1129.46]

  • [$764.38, $1058.25]

  • [$345.61, $678.34]

  • I have no idea
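For practice with this kind of question, the sketch below computes an approximate 95% prediction interval as ŷ ± 2 s_e, where s_e is the standard deviation of the residuals. That rule of thumb is an assumption here: it ignores the uncertainty in the estimated intercept and slope, so it is only reasonable inside the range of the data with a decent sample size. The weights and prices are hypothetical and are not the contents of diamonds.xlsx, so the printed interval does not answer the questions above.

```python
from math import sqrt
from statistics import mean

def fit_line(x, y):
    """Return (b0, b1, s_e): intercept, slope, and residual standard deviation."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return b0, b1, sqrt(sse / (n - 2))

def approx_prediction_interval(x, y, x_new):
    """Rough 95% prediction interval: y_hat plus or minus 2 * s_e."""
    b0, b1, s_e = fit_line(x, y)
    y_hat = b0 + b1 * x_new
    return y_hat - 2 * s_e, y_hat + 2 * s_e

# Hypothetical weights (carats) and prices ($); NOT the contents of diamonds.xlsx.
weight = [0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50]
price = [420, 610, 790, 930, 1150, 1290, 1500]

low, high = approx_prediction_interval(weight, price, 0.33)
print(f"approximate 95% prediction interval: [${low:.2f}, ${high:.2f}]")
```

Statistical software reports the exact interval, which is slightly wider because it also accounts for the estimated coefficients.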


Exam Review

  • Question about transformations:

    • Again, no magic bullet. Try different ones.

    • How do you decide if you transform the X or Y?

      • Often depends on the substance.


Exam Review

  • Transformations

    • A common mistake is to forget to convert back to the appropriate units.

      • Say your data, and your interest, are in km/l and you transform the response to liters/100 km. Don’t forget to transform predictions back to km/l. The same goes for ln(x): undo it with e^x [in Excel, =EXP()].
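A minimal sketch of converting back, assuming hypothetical predicted values; the function names are made up for illustration.

```python
import math

def l_per_100km_to_km_per_l(l_per_100km):
    """Convert fuel consumption (liters per 100 km) back to fuel economy (km per liter)."""
    return 100.0 / l_per_100km

def km_per_l_to_l_per_100km(km_per_l):
    """Convert fuel economy (km per liter) to fuel consumption (liters per 100 km)."""
    return 100.0 / km_per_l

# Hypothetical prediction made on the transformed scale.
predicted_l_per_100km = 8.0
print(l_per_100km_to_km_per_l(predicted_l_per_100km))  # 12.5 km/l

# A prediction made on the natural-log scale is undone with exp().
predicted_ln_price = math.log(1500.0)  # stands in for a model's output
print(math.exp(predicted_ln_price))
```

The same idea applies to interval endpoints: convert each endpoint, and for a reciprocal transformation remember that the lower and upper ends swap.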


Exam Review

  • Conditions for the SRM

    • Know them.

    • Don’t be hesitant to try to fit a model if they are violated; just be cautious.

    • Some of you might think a regression model is inappropriate if you don’t see a pattern in the data:

      • Totally fine to try to fit a model

      • The estimated slope will probably be close to 0.


Exam Review

Check list:

  • Is the association between y and x linear?

    • Maybe one could exist but you don’t obviously see it (much more common in multiple regression)

  • Have omitted/lurking variables been ruled out?

    • In the exam, I’ll try to give you the necessary info.

  • Are the errors evidently independent?

    • How do you verify this?

  • Are the variances of the residuals similar?

    • How do you verify this?

  • Are the residuals nearly normal?

    • How do you verify this?
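For the last item on the check list, one common approach is a normal quantile plot: sort the residuals and plot them against the quantiles expected from a normal distribution, then see whether the points fall near a straight line. The sketch below builds the plot coordinates and summarizes straightness with a correlation. This is one possible approach, not necessarily the one expected on the exam; the helper names and both residual lists are made up for illustration.

```python
import math
from statistics import NormalDist, mean

def normal_quantile_points(residuals):
    """Pair each sorted residual with the matching standard normal quantile."""
    n = len(residuals)
    z = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(z, sorted(residuals)))

def straightness(points):
    """Correlation of the plot coordinates; values near 1 mean the points lie close to a line."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up residuals for illustration only.
bell_shaped = [NormalDist(0, 3).inv_cdf((i + 0.5) / 50) for i in range(50)]
right_skewed = [math.exp(e / 3) for e in bell_shaped]

print(round(straightness(normal_quantile_points(bell_shaped)), 3))
print(round(straightness(normal_quantile_points(right_skewed)), 3))
```

Looking at the plot itself is still worthwhile, since a single correlation can hide where the departure from normality occurs (the tails, usually).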


Exam Review

  • What do you need to know?

    • Everything from chapters 19 through 22…

    • No CAPM; we’ll come back to it.

  • What do you need to know from last semester?

    • Statistics builds on itself. I’ll assume you’re comfortable with some basic concepts (confidence intervals, hypothesis tests, z-scores, means, etc., etc.)

    • Will there be decision problems like those on Quiz 1? Maybe, but probably not. I want this to be more applied data analysis.


Exam Review

  • Types of Questions?

    • Possibly homework-like.

    • Some business-related decision making

    • Some non-business-related analysis

  • Best way to study?

    • Do the problems. Then do more.

