## PowerPoint Slideshow about 'ASSUMPTION CHECKING' - lethia
ASSUMPTION CHECKING

- In regression analysis with Stata
- In multi-level analysis with Stata (not much extra)
- In logistic regression analysis with Stata

NOTE: THIS WILL BE EASIER IN STATA THAN IT WAS IN SPSS

Assumptions in regression analysis

- No multi-collinearity
- All relevant predictor variables included
- Homoscedasticity: all residuals are from a distribution with the same variance
- Linearity: the “true” model should be linear
- Independent errors: having information about the value of a residual should not give you information about the value of other residuals
- Errors are distributed normally

FIRST THE ONE THAT LEADS TO NOTHING NEW IN STATA

(NOTE: SLIDE TAKEN LITERALLY FROM MMBR)

Independent errors: having information about the value of a residual should not give you information about the value of other residuals

Detect: ask yourself whether it is likely that knowledge about one residual would tell you something about the value of another residual.

Typical cases:

- repeated measures
- clustered observations (people within firms / pupils within schools)

Consequences: as for heteroscedasticity

Usually, your confidence intervals are estimated too small (think about why that is!).

Cure: use multi-level analyses

In Stata:

Example:

the Stata “auto.dta” data set

sysuse auto

corr (correlation)

vif (variance inflation factors)

ovtest (omitted variable test)

hettest (heteroscedasticity test)

predict e, resid

swilk (test for normality)
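The commands listed above only work after a regression has been fitted. A minimal sketch of the full sequence follows; the model `price` on `mpg`, `weight` and `foreign` is an assumption chosen for illustration, since the slides do not name one. The `estat` forms are the current spellings of the shorter commands shown on the slide.

```stata
* Load the example data shipped with Stata
sysuse auto, clear

* Illustrative model (an assumption; the slides do not specify one)
regress price mpg weight foreign

* Correlations among the predictors
corr mpg weight foreign

* Variance inflation factors (short form on the slide: vif)
estat vif

* Ramsey RESET test for omitted variables (short form: ovtest)
estat ovtest

* Breusch-Pagan / Cook-Weisberg test (short form: hettest)
estat hettest

* Store the residuals, then test them for normality
predict e, resid
swilk e
```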

Finding the commands

- “help regress”
- “regress postestimation”

and you will find most of them (and more) there

Multi-collinearity

A strong correlation between two or more of your predictor variables

You don’t want it, because:

- It is more difficult to get higher R’s
- The importance of predictors can be difficult to establish (b-hats tend to go to zero)
- The estimates for b-hats are unstable under slightly different regression attempts (“bouncing betas”)

Detect:

- Look at correlation matrix of predictor variables
- Calculate VIF factors while running the regression

Cure:

Delete variables so that multi-collinearity disappears, for instance by combining them into a single variable

Stata: calculating the correlation matrix (“corr”) and VIF statistics (“vif”)
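A hedged sketch of the "combine them into a single variable" cure, using `weight` and `length` from auto.dta as the correlated pair. The choice of variables, the averaging of standardized scores, and the name `carsize` are all assumptions made for illustration, not something the slides prescribe.

```stata
sysuse auto, clear

* Detect: weight and length are strongly correlated in this data set
corr weight length
regress price mpg weight length
estat vif

* Cure: combine the two into one variable.
* Standardize first so that neither scale dominates the average.
egen z_weight = std(weight)
egen z_length = std(length)
gen carsize = (z_weight + z_length) / 2   // hypothetical variable name

* Refit with the combined variable and compare the VIFs
regress price mpg carsize
estat vif
```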

Misspecification tests (replaces: all relevant predictor variables included)

Homoscedasticity: all residuals are from a distribution with the same variance

Consequences: Heteroscedasticity does not necessarily lead to biases in your estimated coefficients (b-hat), but it does lead to biases in the estimate of the width of the confidence interval, and the estimation procedure itself is not efficient.

Testing for heteroscedasticity in Stata

- Your residuals should have the same variance for all (fitted) values of Y: hettest
- Your residuals should have the same variance for all values of X: hettest, rhs
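The two variants can be run back to back after any regression. The sketch below again assumes an illustrative model on auto.dta; the residual-versus-fitted plot at the end is an extra visual check that the slides do not mention.

```stata
sysuse auto, clear
regress price mpg weight foreign   // illustrative model (assumption)

* Variance as a function of the fitted values of Y (the default)
estat hettest

* Variance as a function of the right-hand-side variables
estat hettest, rhs

* Visual check: residuals plotted against fitted values
rvfplot, yline(0)
```

A small p-value in either test is evidence against constant variance.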

Errors distributed normally

Errors are distributed normally

(just the errors, not the variables themselves!)

Detect: look at the residual plots, test for normality

Consequences: rule of thumb: if n>600, no problem. Otherwise confidence intervals are wrong.

Cure: try to fit a better model, or use more difficult ways of modeling instead (ask an expert).
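For the "look at the residual plots" step, one possible sequence is sketched here. The model is an assumption for illustration; `qnorm` and `kdensity` are standard Stata plotting commands, though the slides themselves only name `swilk`.

```stata
sysuse auto, clear
regress price mpg weight foreign   // illustrative model (assumption)
predict e, resid

* Quantile-normal plot: points should lie close to the diagonal
qnorm e

* Kernel density of the residuals with a normal curve overlaid
kdensity e, normal

* Formal test (Shapiro-Wilk)
swilk e
```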

In multi-level

- Test all that you would test for multiple regression – poor man’s test: do this using multiple regression! (e.g. “hettest”)

Add:

- xttest0 (see last week)
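As a reminder of how `xttest0` is called, here is a minimal sketch. It assumes the `nlswork` example data from the Stata website (needs an internet connection) and an arbitrary pair of predictors; neither comes from the slides.

```stata
* Example panel data from the Stata website (assumption: not the course data)
webuse nlswork, clear
xtset idcode

* Random-intercept model; predictors chosen only for illustration
xtreg ln_wage age tenure, re

* Breusch-Pagan Lagrange multiplier test for random effects
xttest0
```

`xttest0` has to follow `xtreg, re`; it tests whether the variance of the group-level intercepts is zero.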

Add (extra):

Test visually whether the normality assumption holds, but do this for the random

Note: extra material (= not on the exam, bonus points if you know how to use it)

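* One dummy per school (sch_1 ... sch_28); school 1 is the reference
tab school, gen(sch_)

reg y sch_2-sch_28

* Store each school's coefficient in its own row, then test for normality
gen coefs = .

forvalues i = 2/28 {
    replace coefs = _b[sch_`i'] if _n == `i'
}

swilk coefs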

Assumptions in logistic regression

- Y is 0/1
- Ratio of cases to variables should be “reasonable”
- No cases where you have complete separation (Stata will remove these cases automatically)
- Linearity in the logit (comparable to “the true model should be linear” in multiple regression)
- Independence of errors (as in multiple regression)

Further things to do:

- Check goodness of fit and prediction for different groups (as done in the do-file you have)
- Check the correlation matrix for strong correlations between predictors (corr)
- Check for outliers using regress and diag (but don’t tell anyone I suggested this)
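Some of these checks can be sketched on auto.dta. The model below (`foreign` on `mpg` and `weight`) is an assumption for illustration, and this is not the course do-file mentioned above; `linktest`, `estat gof` and `estat classification` are standard postestimation commands after `logit`.

```stata
sysuse auto, clear

* Y must be 0/1: foreign is coded that way in auto.dta
logit foreign mpg weight

* Specification check related to linearity in the logit:
* a significant _hatsq term suggests misspecification
linktest

* Hosmer-Lemeshow goodness-of-fit test with 10 groups
estat gof, group(10)

* How well the model classifies the observed outcomes
estat classification

* Strong correlations between predictors
corr mpg weight
```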
