Lecture 9: Diagnostics & Review

February 10, 2014

Question

A least squares regression line is determined from a sample of values for variables x and y where

x = size of a listed home (in sq feet)

y = selling price of the home (in $)

Which of the following is true about the model b0 + b1x?

  • If there is positive correlation r between x and y, then b1 must be positive
  • The units of the intercept and slope will be the same as the response variable, y.
  • If r² = 0.85, then it is appropriate to conclude that a change in x will cause a change in y
  • None of the above, more than one of the above, or not enough information to tell.
Question

A least squares regression line is determined from a sample of values for variables x and y where

x = size of a listed home (in sq feet)

y = selling price of the home (in $)

Which of the following is true about the model b0 + b1x?

  • If there is positive correlation r between x and y, then b1 must be positive

b1 = r × sy / sx

So if r > 0, then b1 is positive, because the standard deviations sy and sx are both > 0.
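To see the identity b1 = r × sy / sx numerically, here is a minimal Python sketch using NumPy. The home sizes and prices below are made-up numbers for illustration only, not data from the course.

```python
import numpy as np

# Made-up illustrative data: home size (sq ft) and selling price ($)
x = np.array([1100, 1400, 1650, 2000, 2300, 2750, 3100], dtype=float)
y = np.array([189000, 221000, 248000, 299000, 320000, 371000, 410000], dtype=float)

r = np.corrcoef(x, y)[0, 1]              # sample correlation between x and y
s_x, s_y = x.std(ddof=1), y.std(ddof=1)  # sample standard deviations
b1_from_r = r * s_y / s_x                # slope via b1 = r * sy / sx

b1_ls, b0_ls = np.polyfit(x, y, 1)       # least squares fit returns [slope, intercept]

print(f"r = {r:.4f}")
print(f"b1 from r*sy/sx:       {b1_from_r:.2f} $/sq ft")
print(f"b1 from least squares: {b1_ls:.2f} $/sq ft")
```

Both routes give the same slope, and because sy / sx is always positive, the sign of b1 always matches the sign of r.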

Administrative
  • Problem set 4 due (9am)
    • How was it?
  • Next week: Multiple Regression
  • Exam Wednesday
    • Sample question
      • Taken from last year's Exam 1, question #37
Last time
  • What did we talk about?
  • Outliers
    • Sensitivity analysis
  • Heteroscedasticity
Common problems and fixes:

Say we’re estimating the price of a lease from the size of the house:

Price = β0 + β1 * SqFt + ε

Interpretation of the estimates?

  • β0 would be the fixed cost, and
  • β1 would be the marginal cost (per sq ft)
Common Problems:Heteroscedasticity

Heteroscedasticity: What does that mean for your analysis?

  • Point estimates for β’s?
    • Still OK. No bias.
  • Prediction and Confidence intervals?
    • Not reliable; too narrow or too wide.
    • Hypothesis tests regarding β0 and β1 are not reliable.
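A small Monte Carlo simulation can illustrate both points. This is a sketch under assumed conditions: a hypothetical true line y = 10 + 2x, and errors whose standard deviation grows with x² (our choice, not from the slides). Across repeated samples the least squares slope averages out near the true value (no bias). In this particular setup, however, the standard error that the regression output reports understates how much the slope actually varies, so the intervals built from it are too narrow. Other patterns of unequal variance can push the error in the other direction.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 50, 2000
beta0, beta1 = 10.0, 2.0  # hypothetical true intercept and slope

slopes, nominal_se = [], []
for _ in range(reps):
    x = rng.uniform(0, 10, size=n)
    # Heteroscedastic errors: SD grows with x**2 (an assumed pattern)
    y = beta0 + beta1 * x + rng.normal(0, 0.5 * x**2)
    b1, b0 = np.polyfit(x, y, 1)
    resid = y - (b0 + b1 * x)
    s_e = np.sqrt(np.sum(resid**2) / (n - 2))  # residual SD, computed as if variances were equal
    sxx = np.sum((x - x.mean()) ** 2)
    slopes.append(b1)
    nominal_se.append(s_e / np.sqrt(sxx))      # the slope SE that regression output reports

slopes = np.array(slopes)
mean_slope = slopes.mean()
se_ratio = slopes.std(ddof=1) / np.mean(nominal_se)
print(f"average slope estimate: {mean_slope:.3f} (true value {beta1})")
print(f"actual SD of slope / reported SE: {se_ratio:.2f}")
```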
Common Problems:Heteroscedasticity

Fixing the problem:

  • Revise the model; how to do so depends on the substance.
    • Try revising the model to estimate Price/SqFt, by dividing the original equation by SqFt.
  • Notice the change in the intercept and slope: in the revised model, the intercept is the marginal cost and the slope is the fixed cost.
  • Don’t be locked into thinking the intercept is the fixed cost.
  • How to interpret them depends on the model.
    • Think about the data!
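Dividing each term of the original model by SqFt shows where the new intercept and slope come from (a worked derivation using the slides' symbols):

```latex
\begin{aligned}
\text{Price} &= \beta_0 + \beta_1\,\text{SqFt} + \varepsilon \\
\frac{\text{Price}}{\text{SqFt}} &= \underbrace{\beta_1}_{M} + \underbrace{\beta_0}_{F}\cdot\frac{1}{\text{SqFt}} + \frac{\varepsilon}{\text{SqFt}}
\end{aligned}
```

So M (the new intercept) is the marginal cost and F (the new slope) is the fixed cost; the ε in the revised equation on the slides stands for the rescaled error ε/SqFt. If the standard deviation of ε is roughly proportional to SqFt (an assumption, matching the pattern of larger homes varying more), then ε/SqFt has roughly the same standard deviation for every home.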
Common Problems:Heteroscedasticity

Fixing the problem:

Price/SqFt = M + F * (1/SqFt) + ε

  • Revise by thinking about the substance.
    • Here, that meant predicting price per sq ft directly.
  • Don’t revise by doing weird things.
    • Use theory!
  • After revising, check: do the residuals have similar variances?
    • Sometimes they won’t.
    • In this case they do:
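Here is a minimal Python sketch of the revised fit and the residual check, on simulated lease data rather than the data from the slides. The "true" fixed cost (300), marginal cost (1.2 $/sq ft), and error SD of 0.15 × SqFt are assumptions made up for illustration. `spread_ratio` is a hypothetical helper that divides the residual SD of the larger homes by that of the smaller homes; a value near 1 suggests similar variances.

```python
import numpy as np

# Simulated lease data (all "true" values are assumptions for illustration):
# Price = F + M * SqFt + error, with the error SD growing in proportion to SqFt.
rng = np.random.default_rng(0)
sqft = rng.uniform(800, 4000, size=300)
F_true, M_true = 300.0, 1.2  # hypothetical fixed cost ($) and marginal cost ($/sq ft)
price = F_true + M_true * sqft + rng.normal(0, 0.15 * sqft)

def spread_ratio(resid, size):
    """Residual SD for the larger half of homes divided by that for the smaller half."""
    big = size > np.median(size)
    return resid[big].std(ddof=1) / resid[~big].std(ddof=1)

# Original model: Price = b0 + b1 * SqFt
b1, b0 = np.polyfit(sqft, price, 1)
resid_orig = price - (b0 + b1 * sqft)

# Revised model: Price/SqFt = M + F * (1/SqFt); polyfit returns [slope, intercept] = [F, M]
F_hat, M_hat = np.polyfit(1 / sqft, price / sqft, 1)
resid_rev = price / sqft - (M_hat + F_hat / sqft)

print(f"original: spread ratio = {spread_ratio(resid_orig, sqft):.2f}")
print(f"revised:  F = {F_hat:.1f}, M = {M_hat:.3f}, spread ratio = {spread_ratio(resid_rev, sqft):.2f}")
```

Note the swap: in the revised fit, the slope that `np.polyfit` returns estimates F (the fixed cost), and its intercept estimates M (the marginal cost).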
Common Problems:Heteroscedasticity

Comparing the revised and original model:

  • The revised model may have a different (and smaller) R².
    • Again, so what? R² is great, but it’s only one notion of fit.
  • In the example, the revised model provides narrower (hence better) confidence intervals for the fixed and variable costs:

[Figures: Original Model, Revised Model]

Common Problems:Heteroscedasticity

Comparing the revised and original model:

  • It also provides a more sensible prediction interval
    • The data originally indicated that large homes varied in price more:
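Why the revised model's prediction intervals widen for larger homes: a rough sketch, assuming the approximate 95% prediction interval "prediction ± 2 s_e" (which ignores uncertainty in the estimated coefficients) and hypothetical fitted values for M, F, and s_e. Multiplying an interval for Price/SqFt by SqFt converts it back to dollars, so its width is proportional to SqFt.

```python
def approx_price_interval(sqft, M, F, s_e):
    """Approximate 95% prediction interval for Price from the revised model.

    The revised model predicts Price/SqFt = M + F/SqFt with residual SD s_e.
    Using the rough rule prediction +/- 2*s_e and multiplying by sqft
    converts the interval from $/sq ft back to $.
    """
    per_sqft = M + F / sqft
    return ((per_sqft - 2 * s_e) * sqft, (per_sqft + 2 * s_e) * sqft)

# Hypothetical fitted values, for illustration only
M, F, s_e = 1.2, 300.0, 0.15

for size in (1000, 2000, 4000):
    lo, hi = approx_price_interval(size, M, F, s_e)
    print(f"{size} sq ft: [{lo:,.0f}, {hi:,.0f}] (width {hi - lo:,.0f})")
```

By contrast, the rough interval from the original model, ŷ ± 2 s_e, has the same width for every home size.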
Common Problems:Heteroscedasticity

How do you know how to remodel the problem?

  • Practice
  • Creativity; try different things.
  • There is no magic bullet; sometimes you can’t.
Common Problems:Correlated Errors

Problem: Dependence between residuals (autocorrelation)

  • The error you make at observation t + 1 (measured by the size of its residual) is related to the error you make at observation t.
  • Why is this a problem?
    • The SRM assumes that the errors, ε, are independent.
    • It is a common problem for time series data, but not just a time series problem.
      • Recall the U-shaped pattern in one of the earlier residual plots.
Common Problems:Correlated Errors

Detecting the problem:

  • Easier with time series data:
    • Plot the residuals versus time and look for a pattern (is the residual at t + 1 related to the one at t?). This isn’t guaranteed to find it, but it often helps.
  • Use the Durbin-Watson statistic to test for correlation between adjacent residuals (aka serial correlation or autocorrelation).
    • With time series data, adjacency is temporal.
    • In non-time-series data, we’re still asking whether errors next to one another are related.
    • For problems like spatial autocorrelation, there are more advanced tools, such as mapping the residuals and specialized tests.
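A simple numerical companion to the residual-versus-time plot is the correlation between each residual and the next one (the lag-1 autocorrelation). A minimal NumPy sketch; `lag1_autocorrelation` is a hypothetical helper name, and the residuals are assumed to already be in time (or adjacency) order.

```python
import numpy as np

def lag1_autocorrelation(residuals):
    """Correlation between e_t and e_{t+1} for residuals in time/adjacency order."""
    e = np.asarray(residuals, dtype=float)
    return np.corrcoef(e[:-1], e[1:])[0, 1]

# Residuals that drift steadily upward (positive autocorrelation)
print(lag1_autocorrelation([-3, -2, -1, 0, 1, 2, 3]))
# Residuals that flip sign at every step (negative autocorrelation)
print(lag1_autocorrelation([1, -1, 1, -1, 1, -1]))
```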
Durbin-Watson Statistic
  • Tests to see if the correlation between the residuals is 0
    • Null hypothesis: H0: ρε = 0
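  • It’s calculated as $D = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}$, where $e_t$ is the residual for observation $t$.
  • From the Durbin-Watson statistic D and the sample size, you can calculate the p-value for the hypothesis test.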
    • You’ll see this more in multiple regression and forecasting
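The statistic is easy to compute from the residuals. Below is a minimal NumPy sketch (the function name is ours); if you use statsmodels, `statsmodels.stats.stattools.durbin_watson` computes the same quantity. As a rule of thumb, D ≈ 2(1 − r₁), where r₁ is the lag-1 autocorrelation of the residuals. So D near 2 is consistent with H0, values well below 2 point to positive autocorrelation, and values well above 2 point to negative autocorrelation.

```python
import numpy as np

def durbin_watson(residuals):
    """D = sum_{t=2..n} (e_t - e_{t-1})^2 / sum_{t=1..n} e_t^2."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

print(durbin_watson([1, -1, 1, -1]))  # 3.0: alternating signs push D above 2
print(durbin_watson([1, 1, 1, 1]))    # 0.0: identical neighbors push D toward 0
```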
Common Problems:Correlated Errors

Consequences of Dependence:

  • With (positive) autocorrelation in the errors, the estimated standard errors are too small.
    • The estimated slope and intercept are less precise than the output indicates.
Common Problems:Correlated Errors

How do you fix it?

  • Try to model it directly or transform the data.
  • Example: number of mobile phone users:
    • Growth rate isn’t linear; try different transformations

[Figures: Original data, Transformed data]
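A sketch of the idea on simulated data (not the mobile-phone data from the slide): a series that grows by an assumed 10% per period with multiplicative noise. A straight line fit to the raw series leaves residuals that run in long streaks (Durbin-Watson far below 2), while a line fit to ln(y) does not. The `durbin_watson` helper is our own and repeats the formula from the Durbin-Watson slide.

```python
import numpy as np

def durbin_watson(e):
    """Durbin-Watson statistic for residuals in time order."""
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

rng = np.random.default_rng(2)
t = np.arange(60, dtype=float)
# Simulated growth: about 10% per period with multiplicative noise (an assumption)
users = 100 * np.exp(0.1 * t + rng.normal(0, 0.05, size=t.size))

# Straight-line fit to the raw data
b1, b0 = np.polyfit(t, users, 1)
dw_raw = durbin_watson(users - (b0 + b1 * t))

# Straight-line fit to the transformed data: ln(users) = c0 + c1 * t
c1, c0 = np.polyfit(t, np.log(users), 1)
dw_log = durbin_watson(np.log(users) - (c0 + c1 * t))

print(f"D for raw fit: {dw_raw:.2f}, D for log fit: {dw_log:.2f}")
print(f"estimated growth per period: {np.exp(c1) - 1:.1%}")
```

As the next slide stresses, a better-looking fit on the transformed scale still has to pass the other SRM checks.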

Common Problems:Correlated Errors

Does this fix the problem?

  • Linear pattern looks better
  • You still need to check the other SRM conditions!!
    • Omitted variables?
    • Analyze the residuals; there might still be a problem.

[Figures: Original data, Transformed data]

Exam Review
  • Download diamonds.xlsx
  • Regress price on weight
  • Are the residuals normally distributed?
  • Yes
  • No
  • Maybe?
  • I have no idea how to verify that
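One way to check is to look at a normal quantile plot of the residuals (points close to a straight line suggest near-normality), optionally backed by a formal test. Below is a minimal sketch with NumPy and SciPy. Since diamonds.xlsx isn't reproduced here, the example runs on simulated weights and prices with arbitrary coefficients; you would swap in the weight and price columns from the file. `scipy.stats.probplot` returns the quantile-plot coordinates plus the correlation r of those points with a straight line, and `scipy.stats.shapiro` runs the Shapiro-Wilk test. The helper name `residual_normality` is ours.

```python
import numpy as np
from scipy import stats

def residual_normality(x, y):
    """Fit y = b0 + b1*x by least squares and summarize how normal the residuals look."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    b1, b0 = np.polyfit(x, y, 1)
    resid = y - (b0 + b1 * x)
    # probplot returns (quantile-plot coordinates), (slope, intercept, r) of the fitted line
    _, (_, _, r) = stats.probplot(resid, dist="norm")
    shapiro_p = stats.shapiro(resid).pvalue
    return resid, r, shapiro_p

# Simulated stand-in data (arbitrary coefficients, not estimates from diamonds.xlsx)
rng = np.random.default_rng(3)
weight = rng.uniform(0.15, 0.5, size=100)
price = 100 + 2500 * weight + rng.normal(0, 150, size=100)

resid, r, p = residual_normality(weight, price)
print(f"quantile-plot correlation r = {r:.3f}, Shapiro-Wilk p-value = {p:.3f}")
```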
Exam Review
  • Using your regression model from the last slide, predict the price of a diamond that weighs 0.44 carats
  • What is the approximate 95% confidence interval?
  • [$877.75, $1558.61]
  • [$2324.80, $3014.69]
  • [$-97.97, $184.95]
  • [$2330.41, $3009.09]
  • I have no idea
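For reference, the interval in this question is a confidence interval for the mean response at x0. Below is a minimal sketch of the usual SRM formula, ŷ0 ± t × s_e × sqrt(1/n + (x0 − x̄)² / Σ(x − x̄)²), with t taken from the t distribution with n − 2 degrees of freedom (t ≈ 2 for moderate n, hence "approximate 95%"). The function name is ours and the example data are made up; to answer the slide's question you would pass in the weights and prices from diamonds.xlsx with x0 = 0.44.

```python
import numpy as np
from scipy import stats

def mean_response_ci(x, y, x0, level=0.95):
    """Confidence interval for the mean of y at x = x0 under the SRM."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = x.size
    b1, b0 = np.polyfit(x, y, 1)
    resid = y - (b0 + b1 * x)
    s_e = np.sqrt(np.sum(resid ** 2) / (n - 2))  # residual standard deviation
    sxx = np.sum((x - x.mean()) ** 2)
    se_fit = s_e * np.sqrt(1 / n + (x0 - x.mean()) ** 2 / sxx)
    t_crit = stats.t.ppf(0.5 + level / 2, df=n - 2)
    y_hat = b0 + b1 * x0
    return y_hat, (y_hat - t_crit * se_fit, y_hat + t_crit * se_fit)

# Made-up example data (not from diamonds.xlsx)
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.3, 4.1, 6.2, 7.9, 10.2, 11.8, 14.1, 16.0]
y_hat, (lo, hi) = mean_response_ci(x, y, x0=4.5)
print(f"predicted mean: {y_hat:.2f}, 95% CI: [{lo:.2f}, {hi:.2f}]")
```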
Exam Review
  • Using your regression model from the last slide, predict the price of a diamond that weighs 0.28 carats
  • What is the prediction interval?
  • [$877.75, $1558.61]
  • [$452.57, $1129.46]
  • [$764.38, $1058.25]
  • [$345.61, $678.34]
  • I have no idea
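A prediction interval for a single new observation adds the error variance to the uncertainty in the fitted line. Its standard error is s_e × sqrt(1 + 1/n + (x0 − x̄)² / Σ(x − x̄)²), so it is always wider than the confidence interval for the mean at the same x0. Below is a minimal sketch, again with our own function name and made-up data; for the slide's question, you would use the diamonds data with x0 = 0.28.

```python
import numpy as np
from scipy import stats

def prediction_interval(x, y, x0, level=0.95):
    """Prediction interval for one new y at x = x0 under the SRM."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = x.size
    b1, b0 = np.polyfit(x, y, 1)
    resid = y - (b0 + b1 * x)
    s_e = np.sqrt(np.sum(resid ** 2) / (n - 2))  # residual standard deviation
    sxx = np.sum((x - x.mean()) ** 2)
    # The leading 1 adds the variance of a single new error to the fit uncertainty
    se_pred = s_e * np.sqrt(1 + 1 / n + (x0 - x.mean()) ** 2 / sxx)
    t_crit = stats.t.ppf(0.5 + level / 2, df=n - 2)
    y_hat = b0 + b1 * x0
    return y_hat, (y_hat - t_crit * se_pred, y_hat + t_crit * se_pred)

# Made-up example data (not from diamonds.xlsx)
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.3, 4.1, 6.2, 7.9, 10.2, 11.8, 14.1, 16.0]
y_hat, (lo, hi) = prediction_interval(x, y, x0=4.5)
print(f"prediction: {y_hat:.2f}, 95% PI: [{lo:.2f}, {hi:.2f}]")
```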
Exam Review
  • Question about transformations:
    • Again, no magic bullet. Try different ones.
    • How do you decide whether to transform X or Y?
      • Often depends on the substance.
Exam Review
  • Transformations
    • A common mistake is forgetting to convert back to the appropriate units.
      • Say your data (and your interest) are in km/L, and you transform the response to liters per 100 km. Don’t forget to transform predictions back to km/L. The same goes for a log transformation: undo ln with the exponential function (in Excel, =EXP()).
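A small sketch of these back-transformations (the function names are ours). Liters per 100 km and km per liter are reciprocals up to a factor of 100, so the conversion uses the same formula in both directions. A prediction made on the ln scale is undone with exp, which is what Excel's =EXP() does.

```python
import math

def l_per_100km_to_km_per_l(l_per_100km):
    """Convert fuel consumption (L/100 km) to fuel economy (km/L)."""
    return 100.0 / l_per_100km

def km_per_l_to_l_per_100km(km_per_l):
    """Convert fuel economy (km/L) to fuel consumption (L/100 km)."""
    return 100.0 / km_per_l

# A model fit on the L/100 km scale predicts 8 L/100 km; report it in km/L
print(l_per_100km_to_km_per_l(8.0))  # 12.5

# A model fit to ln(y) predicts 6.9 on the log scale; back-transform to the original units
print(math.exp(6.9))
```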
Exam Review
  • Conditions for the SRM
    • Know them.
    • Don’t hesitate to try fitting a model if they are violated; just be cautious.
    • Some of you might think a regression model is inappropriate if you don’t see a pattern in the data:
      • It’s totally fine to try to fit a model.
      • The estimated slope will probably be close to 0 (not significantly different from 0).
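To see what "the slope will probably be 0" looks like in practice, here is a sketch on simulated data in which y is generated independently of x (an assumption of the example). `scipy.stats.linregress` reports the slope along with the p-value for H0: slope = 0. Expect a slope near 0, a small r², and usually a large p-value.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = rng.normal(size=200)
y = rng.normal(size=200)  # generated independently of x: no real association

fit = stats.linregress(x, y)
print(f"slope = {fit.slope:.3f}, p-value = {fit.pvalue:.3f}, r^2 = {fit.rvalue**2:.3f}")
```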
Exam Review

Check list:

  • Is the association between y and x linear?
    • Maybe one could exist but you don’t obviously see it (much more common in multiple regression)
  • Have omitted/lurking variables been ruled out?
    • In the exam, I’ll try to give you the necessary info.
  • Are the errors evidently independent?
    • How do you verify this?
  • Are the variances of the residuals similar?
    • How do you verify this?
  • Are the residuals nearly normal?
    • How do you verify this?
Exam Review
  • What do you need to know?
    • Everything from chapters 19 through 22…
    • No CAPM; we’ll come back to it.
  • What do you need to know from last semester?
    • Statistics builds on itself. I’ll assume you’re comfortable with some basic concepts (confidence intervals, hypothesis tests, z-scores, means, etc., etc.)
    • Will there be decision problems like those on Quiz 1? Maybe, but probably not. I want this to be more applied data analysis.
Exam Review
  • Types of Questions?
    • Possibly homework like.
    • Some business related decision making
    • Some non-business related analysis
  • Best way to study?
    • Do the problems. Then do more.