
ASSUMPTION CHECKING


- In regression analysis with Stata
- In multi-level analysis with Stata (not much extra)
- In logistic regression analysis with Stata
NOTE: THIS WILL BE EASIER IN STATA THAN IT WAS IN SPSS

Assumption checking in “normal” multiple regression with Stata

- No multi-collinearity
- All relevant predictor variables included
- Homoscedasticity: all residuals are from a distribution with the same variance
- Linearity: the “true” model should be linear
- Independent errors: having information about the value of a residual should not give you information about the value of other residuals
- Errors are normally distributed

FIRST THE ONE THAT LEADS TO NOTHING NEW IN STATA

(NOTE: SLIDE TAKEN LITERALLY FROM MMBR)

Independent errors: having information about the value of a residual should not give you information about the value of other residuals

Detect: ask yourself whether it is likely that knowledge about one residual would tell you something about the value of another residual.

Typical cases:

- repeated measures
- clustered observations (people within firms / pupils within schools)

Consequences: as for heteroscedasticity

Usually, your confidence intervals are estimated too small (think about why that is!).

Cure: use multi-level analyses
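A minimal sketch of the cure, using Stata's example data set pig.dta (weekly weights of 48 pigs measured over 9 weeks, a typical repeated-measures case; webuse needs an internet connection). The data set and the model are assumptions chosen for illustration. Compare the standard error of week across the three fits.

```stata
* Repeated measures: 9 weekly weights per pig, so errors within a pig are correlated
webuse pig, clear

* Naive OLS: treats all observations as independent
regress weight week

* Same OLS coefficients, but standard errors that allow correlation within each pig
regress weight week, vce(cluster id)

* Multi-level (random-intercept) model with pigs as the level-2 units
mixed weight week || id:
```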

Example:

the Stata “auto.dta” data set

sysuse auto

corr (correlation)

vif (variance inflation factors)

ovtest (omitted variable test)

hettest (heteroscedasticity test)

predict e, resid

swilk (test for normality)

Look at “help regress” and “help regress postestimation” and you will find most of them (and more) there.
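Put together, the commands above form a short do-file. A sketch, assuming a model of price on mpg, weight and foreign (the choice of variables is ours, for illustration); current Stata documents the tests under their estat names (estat vif, estat ovtest, estat hettest).

```stata
* Example data shipped with Stata: 74 cars
sysuse auto, clear

* Correlations among the predictors (multi-collinearity)
correlate mpg weight foreign

* Fit the model that will be checked
regress price mpg weight foreign

* Variance inflation factors (multi-collinearity)
estat vif

* Ramsey RESET test (omitted variables)
estat ovtest

* Breusch-Pagan / Cook-Weisberg test (heteroscedasticity)
estat hettest

* Residuals, then Shapiro-Wilk test (normality of the errors)
predict e, residuals
swilk e
```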

Multi-collinearity

A strong correlation between two or more of your predictor variables

You don’t want it, because:

- It is more difficult to get higher R’s
- The importance of predictors can be difficult to establish (b-hats tend to go to zero)
- The estimates for the b-hats are unstable under slightly different regression attempts (“bouncing betas”)
Detect:

- Look at the correlation matrix of the predictor variables
- Calculate VIFs (variance inflation factors) after running the regression
Cure:

Delete variables so that the multi-collinearity disappears, for instance by combining them into a single variable
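One way to combine strongly correlated predictors into a single variable: standardize them and take their mean. A sketch on the auto data, assuming weight, length and displacement as the collinear “size” predictors; the index name size and the z_ variables are hypothetical. Compare the VIFs before and after.

```stata
sysuse auto, clear

* weight, length and displacement all measure the size of a car
correlate weight length displacement
regress price mpg weight length displacement
estat vif

* Standardize each predictor (mean 0, sd 1) so they can be averaged
egen z_weight = std(weight)
egen z_length = std(length)
egen z_displ  = std(displacement)

* Combine them into one hypothetical "size" index and refit
egen size = rowmean(z_weight z_length z_displ)
regress price mpg size
estat vif
```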

Homoscedasticity: all residuals are from a distribution with the same variance

Consequences: Heteroscedasticity does not necessarily lead to biases in your estimated coefficients (b-hat), but it does lead to biases in the estimate of the width of the confidence interval, and the estimation procedure itself is not efficient.

- Your residuals should have the same variance for all predicted values of Y: hettest
- Your residuals should have the same variance for all values of X: hettest, rhs
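Both variants of the test in a sketch, plus the residual-versus-fitted plot that shows the same thing visually. The model on the auto data is an assumption for illustration.

```stata
sysuse auto, clear
regress price mpg weight foreign

* Breusch-Pagan / Cook-Weisberg test against the fitted values
estat hettest

* Same test against all right-hand-side variables
estat hettest, rhs

* Visual check: the spread of the residuals should not change with the fitted values
rvfplot, yline(0)
```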

Errors are normally distributed

(just the errors, not the variables themselves!)

Detect: look at the residual plots, test for normality

Consequences: rule of thumb: if n > 600, no problem. Otherwise confidence intervals are wrong.

Cure: try to fit a better model, or use more difficult ways of modeling instead (ask an expert).

First calculate the errors:

predict e, resid

Then test for normality

swilk e
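The “look at the residual plots” part can be done with a normal quantile plot and a kernel density with a normal curve overlaid. A sketch on the auto data, with an illustrative model of price (our assumption).

```stata
sysuse auto, clear
regress price mpg weight foreign

* Residuals (the errors), not the variables themselves
predict e, residuals

* Quantile plot against the normal: points should lie close to the line
qnorm e

* Density of the residuals with a normal curve overlaid
kdensity e, normal

* Formal test: a small p-value signals non-normal errors
swilk e
```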

Assumption checking in multi-level multiple regression with Stata

- Test all that you would test for multiple regression – poor man’s test: do this using multiple regression! (e.g. “hettest”)
Add:

- xttest0 (see last week)
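xttest0 is the Breusch-Pagan Lagrange multiplier test for random effects: it tests whether the variance of the level-2 intercepts is zero, i.e. whether a multi-level model is needed at all. It must follow xtreg with the re option. A sketch using Stata's pig.dta example data (an assumption; webuse needs an internet connection).

```stata
webuse pig, clear

* Declare the panel structure: pigs (id) measured repeatedly over weeks
xtset id week

* Random-intercept model; xttest0 requires the re option
xtreg weight week, re

* H0: the variance of the pig-level random effect is zero
xttest0
```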
Add (extra):

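Test visually whether the normality assumption holds, but do this for the random effects, approximated here by the coefficients of the school dummies:

* one dummy per school: sch_1, sch_2, ..., sch_28
tab school, gen(sch_)

* school 1 is the reference category
regress y sch_2-sch_28

* store each school's coefficient in its own observation
gen coefs = .
forvalues i = 2/28 {
    replace coefs = _b[sch_`i'] if _n == `i'
}

swilk coefs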
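As an alternative to the dummy-variable coefficients, current Stata can predict the level-2 random effects directly from a random-intercept model. A sketch using pig.dta, with pigs playing the role of schools; the data set and the variable names u and first are assumptions. Each pig's random effect is repeated on all of its rows, so only one row per pig is kept before testing.

```stata
webuse pig, clear

* Random-intercept model: pigs are the level-2 units
mixed weight week || id:

* Empirical Bayes predictions of the pig-level random intercepts
predict u, reffects

* Tag one row per pig, so every random effect is counted once
egen first = tag(id)

* Visual and formal normality checks on the 48 random effects
qnorm u if first
swilk u if first
```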

Assumption checking in logistic regression with Stata

- Y is 0/1
- Ratio of cases to variables should be “reasonable”
- No cases where you have complete separation (Stata will remove these cases automatically)
- Linearity in the logit (comparable to “the true model should be linear” in multiple regression)
- Independence of errors (as in multiple regression)

- Check goodness of fit and prediction for different groups (as done in the do-file you have)
- Check the correlation matrix for strong correlations between predictors (corr)
- Check for outliers using regress and diag (but don’t tell anyone I suggested this)
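A sketch of these checks on the auto data, assuming foreign (0/1) as the outcome and mpg and weight as predictors (our choice, for illustration). estat gof with group() gives the Hosmer-Lemeshow test, and its table option shows observed and expected counts per group; estat classification shows how well the model predicts. Instead of the regress-based outlier check, the last lines use logit's own influence statistic (Pregibon's delta-beta).

```stata
sysuse auto, clear

* Strong correlations between predictors?
correlate mpg weight

* Logistic regression with a 0/1 outcome
logit foreign mpg weight

* Hosmer-Lemeshow goodness of fit: observed vs. expected in 10 groups
estat gof, group(10) table

* Classification table: share of correct predictions for foreign == 1 and == 0
estat classification

* Pregibon's delta-beta: large values flag influential observations
predict dbeta, dbeta
summarize dbeta
```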