By **elden**


### Lecture 9: Diagnostics & Review

February 10, 2014

Question

A least squares regression line is determined from a sample of values for variables x and y where

x = size of a listed home (in sq feet)

y = selling price of the home (in $)

Which of the following is true about the fitted model ŷ = b₀ + b₁x?

- If there is a positive correlation r between x and y, then b₁ must be positive.
- The units of the intercept and slope will be the same as those of the response variable, y.
- If r² = 0.85, then it is appropriate to conclude that a change in x will cause a change in y.
- None of the above, more than one of the above, or not enough information to tell.

Answer

- If there is a positive correlation r between x and y, then b₁ must be positive:

$$b_1 = r \cdot \frac{s_y}{s_x}$$

So if r > 0, then b₁ > 0, because both standard deviations (of y and of x) are positive.

Administrative

- Problem set 4 due (9am)
  - How was it?
- Next week: Multiple Regression
- Exam Wednesday
- Sample question
  - Taken from Exam 1, #37, last year

Last time

- What did we talk about?
  - Outliers
  - Sensitivity analysis
  - Heteroscedasticity

Common problems and fixes:

Say we’re estimating the price of a lease from the size of the house:

$$\text{Price} = \beta_0 + \beta_1 \cdot \text{SqFt} + \varepsilon$$

Interpretation of the estimates?

- β₀ would be the fixed cost, and
- β₁ would be the marginal cost (per square foot).

Common Problems: Heteroscedasticity

Heteroscedasticity: What does it mean for your analysis?

- Point estimates for the β’s?
  - Still OK. No bias.
- Prediction and confidence intervals?
  - Not reliable; they may be too narrow or too wide.
- Hypothesis tests regarding β₀ and β₁ are not reliable.

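Common Problems: Heteroscedasticity

Fixing the problem:

- Revise the model: how to do so depends on the substance of the problem.
- Try revising the model to estimate Price/SqFt by dividing the original equation by SqFt:

$$\frac{\text{Price}}{\text{SqFt}} = \beta_1 + \beta_0 \cdot \frac{1}{\text{SqFt}} + \frac{\varepsilon}{\text{SqFt}}$$

- Notice the change in the intercept and slope: the marginal cost β₁ is now the intercept, and the fixed cost β₀ is now the slope on 1/SqFt.
  - Don’t be locked into thinking the intercept is always the fixed cost.
  - How to interpret them depends on the model.
  - Think about the data!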

Common Problems: Heteroscedasticity

Fixing the problem:

$$\frac{\text{Price}}{\text{SqFt}} = M + F \cdot \frac{1}{\text{SqFt}} + \varepsilon$$

Here M plays the role of the marginal cost per square foot and F the fixed cost.

- Revise by thinking about the substance.
  - Here, that meant predicting the price per square foot directly.
- Don’t revise by doing weird things.
  - Use theory!
- After revising, check whether the residuals have similar variances.
  - Sometimes they won’t.
  - In this case they do.

Common Problems: Heteroscedasticity

Comparing the revised and original model:

- The revised model may have a different (and smaller) R².
  - So what? R² is useful, but it’s only one notion of fit.
- In the example, the revised model provides narrower (hence better) confidence intervals for the fixed and variable costs.

[Figure: Original Model vs. Revised Model]

Common Problems: Heteroscedasticity

Comparing the revised and original model:

- It also provides a more sensible prediction interval.
  - The original data showed that larger homes varied more in price.

Common Problems: Heteroscedasticity

How do you know how to remodel the problem?

- Practice
- Creativity; try different things.
- There is no magic bullet; sometimes you can’t.

Common Problems: Correlated Errors

Problem: Dependence between residuals (autocorrelation)

- The amount of error (measured by the size of the residual) you make at observation t + 1 is related to the amount of error you make at observation t.
- Why is this a problem?
  - The SRM (simple regression model) assumes that the errors, ε, are independent.
- It’s a common problem for time series data, but not just a time series problem.
  - Recall the U-shaped pattern in one of the earlier residual plots.

Common Problems: Correlated Errors

Detecting the problem:

- It’s easier with time series data:
  - Plot the residuals versus time and look for a pattern (is the residual at t + 1 related to the one at t?). This is not guaranteed to reveal the problem, but it is often helpful.
- Use the Durbin-Watson statistic to test for correlation between adjacent residuals (aka serial correlation or autocorrelation).
  - With time series data, adjacency is temporal.
  - In non-time-series data, we’re still asking whether errors next to one another are related.
  - For things like spatial autocorrelation, there are more advanced tools, such as mapping the residuals, and dedicated tests.

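Durbin-Watson Statistic

- Tests whether the correlation between adjacent residuals is 0
- Null hypothesis: H₀: ρ = 0, where ρ is the correlation between adjacent errors ε
- It’s calculated from the residuals e₁, …, eₙ, in order, as:

$$D = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}$$

- From the Durbin-Watson statistic D and the sample size, you can calculate the p-value for the hypothesis test.
- You’ll see this more in multiple regression and forecasting.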
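A minimal sketch of computing the Durbin-Watson statistic from residuals that are already in order (for example, by time); the residual sequences below are made up. A standard rule of thumb is D ≈ 2(1 − r₁), where r₁ is the correlation between adjacent residuals, so D near 2 is consistent with no autocorrelation, D well below 2 suggests positive autocorrelation, and D above 2 suggests negative autocorrelation. The p-value itself needs tables or software and is not computed here.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic D for residuals listed in order (e.g., by time)."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    # Sum of squared differences between each residual and the one before it
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    # Sum of squared residuals
    den = sum(e ** 2 for e in residuals)
    return num / den

# Made-up residual sequences, for illustration only
smooth = [1, 2, 1, -1, -2, -1]   # neighbors tend to share a sign
alternating = [1, -1, 1, -1]     # neighbors tend to flip sign

print(round(durbin_watson(smooth), 3))   # 0.667: well below 2
print(durbin_watson(alternating))        # 3.0: above 2
```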

Common Problems: Correlated Errors

Consequences of Dependence:

- With autocorrelation in the errors, the estimated standard errors are too small.
  - The estimated slope and intercept are less precise than the output indicates.

Common Problems: Correlated Errors

How do you fix it?

- Try to model it directly or transform the data.
- Example: number of mobile phone users:
  - Growth isn’t linear; try different transformations.

[Figure: Original data vs. Transformed data]

Common Problems: Correlated Errors

Does this fix the problem?

- The linear pattern looks better.
- You still need to check the other SRM conditions!
  - Omitted variables?
  - Analysis of residuals: dependence might still be a problem.


Exam Review

- Download diamonds.xlsx
- Regress price on weight
- Are the residuals normally distributed?
  - Yes
  - No
  - Maybe?
  - I have no idea how to verify that

Exam Review

- Using your regression model from the last slide, predict the price of a diamond that weighs 0.44 carats.
- What is the approximate 95% confidence interval?
  - [$877.75, $1558.61]
  - [$2324.80, $3014.69]
  - [-$97.97, $184.95]
  - [$2330.41, $3009.09]
  - I have no idea

Exam Review

- Using your regression model from the last slide, predict the price of a diamond that weighs 0.28 carats.
- What is the prediction interval?
  - [$877.75, $1558.61]
  - [$452.57, $1129.46]
  - [$764.38, $1058.25]
  - [$345.61, $678.34]
  - I have no idea

Exam Review

- Question about transformations:
  - Again, no magic bullet. Try different ones.
- How do you decide whether to transform x or y?
  - It often depends on the substance.

Exam Review

- Transformations
  - A common mistake is to forget to convert back to the appropriate units.
  - Say your data, and your interest, are in km/l, and you transform the response to liters per 100 km. Don’t forget to transform predictions back to km/l. The same goes for ln(x): undo it with the exponential function (in Excel, eˣ is =EXP(x)).
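A minimal sketch of converting results back to the units of interest, using the km/l example: since km/l = 100 / (L/100 km), a fitted value converts directly. Because that conversion is decreasing, the endpoints of an interval swap places. A response modeled on the ln scale goes back with exp. The function names and the ln-scale value are hypothetical.

```python
import math

def l_per_100km_to_km_per_l(value):
    """Convert fuel use in liters per 100 km to km per liter."""
    return 100 / value

def interval_back_to_km_per_l(lo, hi):
    """Map an interval in L/100 km back to km/l.

    The conversion is decreasing, so the endpoints swap places.
    """
    return (l_per_100km_to_km_per_l(hi), l_per_100km_to_km_per_l(lo))

print(l_per_100km_to_km_per_l(8.0))          # 12.5 km/l
print(interval_back_to_km_per_l(6.0, 8.0))   # about (12.5, 16.67) km/l

# If the response was modeled as ln(y), undo it with exp (Excel: =EXP())
log_prediction = 3.0                   # hypothetical prediction on the ln scale
print(math.exp(log_prediction))        # back on the original scale of y
```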

Exam Review

- Conditions for the SRM
  - Know them.
  - Don’t hesitate to try to fit a model if they are violated; just be cautious.
- Some of you might think a regression model is inappropriate if you don’t see a pattern in the data:
  - It’s totally fine to try to fit a model.
  - The estimated slope will probably be close to 0.

Exam Review

Checklist:

- Is the association between y and x linear?
  - An association might exist even if you don’t obviously see it (much more common in multiple regression).
- Have omitted/lurking variables been ruled out?
  - On the exam, I’ll try to give you the necessary info.
- Are the errors evidently independent?
  - How do you verify this?
- Are the variances of the residuals similar?
  - How do you verify this?
- Are the residuals nearly normal?
  - How do you verify this?

Exam Review

- What do you need to know?
  - Everything from chapters 19 through 22…
  - No CAPM; we’ll come back to it.
- What do you need to know from last semester?
  - Statistics builds on itself. I’ll assume you’re comfortable with basic concepts (confidence intervals, hypothesis tests, z-scores, means, etc.).
- Will there be decision problems like those on Quiz 1? Maybe, but probably not. I want this exam to be more about applied data analysis.

Exam Review

- Types of questions?
  - Possibly homework-like
  - Some business-related decision making
  - Some non-business-related analysis
- Best way to study?
  - Do the problems. Then do more.
