
# ASSUMPTION CHECKING

### ASSUMPTION CHECKING

• In regression analysis with Stata

• In multi-level analysis with Stata (not much extra)

• In logistic regression analysis with Stata

NOTE: THIS WILL BE EASIER IN STATA THAN IT WAS IN SPSS

### Assumption checking in “normal” multiple regression with Stata

### Assumptions in regression analysis

• No multi-collinearity

• All relevant predictor variables included

• Homoscedasticity: all residuals are from a distribution with the same variance

• Linearity: the “true” model should be linear

• Independent errors: having information about the value of a residual should not give you information about the value of other residuals

• Errors are distributed normally

FIRST THE ONE THAT LEADS TO NOTHING NEW IN STATA

(NOTE: SLIDE TAKEN LITERALLY FROM MMBR)

Independent errors: having information about the value of a residual should not give you information about the value of other residuals

Typical cases:

- repeated measures

- clustered observations (people within firms / pupils within schools)

Consequences: as for heteroscedasticity. Usually, your confidence intervals are estimated too small (think about why that is!).

Cure: use multi-level analyses

### In Stata:

Example:

the Stata “auto.dta” data set

sysuse auto

corr (correlation)

vif (variance inflation factors)

ovtest (omitted variable test)

hettest (heteroscedasticity test)

predict e, resid

swilk (test for normality)
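The commands listed on this slide can be strung together as follows. This is a minimal sketch, not the course's own do-file: the choice of `price` as outcome and `mpg`, `weight`, `length` as predictors is an assumption made only for illustration. Recent Stata versions document the tests as `estat` subcommands; the short forms on the slide should still be accepted.

```stata
* Load the example data set that ships with Stata
sysuse auto, clear

* Correlations among the (illustrative) predictors
correlate mpg weight length

* Fit the regression first: the tests below are postestimation commands
regress price mpg weight length

* Variance inflation factors
estat vif

* Omitted variable (Ramsey RESET) test
estat ovtest

* Heteroscedasticity (Breusch-Pagan / Cook-Weisberg) test
estat hettest

* Store the residuals and test them for normality
predict e, resid
swilk e
```

Each test reports a p-value; a small p-value is evidence against the corresponding assumption.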

### Finding the commands

• “help regress”

•  “regress postestimation”

and you will find most of them (and more) there

### Multi-collinearity

A strong correlation between two or more of your predictor variables

You don’t want it, because:

• It is more difficult to get higher R’s

• The importance of predictors can be difficult to establish (b-hats tend to go to zero)

• The estimates for b-hats are unstable under slightly different regression attempts (“bouncing betas”)

Detect:

• Look at correlation matrix of predictor variables

• Calculate VIF factors while running regression

Cure:

Delete variables so that multi-collinearity disappears, for instance by combining them into a single variable
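To make the two detection steps concrete, here is a small sketch on the auto data. The model (`price` on `mpg`, `weight`, `length`) is an assumption for illustration only. The last two lines reproduce one VIF by hand: the VIF of a predictor is 1/(1 - R²), where R² comes from regressing that predictor on the other predictors.

```stata
sysuse auto, clear

* Step 1: look at the correlation matrix of the predictors
correlate mpg weight length

* Step 2: VIFs after running the regression
regress price mpg weight length
estat vif

* The VIF for weight, computed by hand from the auxiliary regression
quietly regress weight mpg length
display "VIF for weight: " 1/(1 - e(r2))
```

The hand-computed value should match the `weight` row of the `estat vif` table. Note that after the auxiliary regression the active estimates are those of the auxiliary model, so rerun the main regression before any further postestimation commands.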

### Misspecification tests (replaces: all relevant predictor variables included)
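As a rough illustration of such a test, the sketch below runs Stata's omitted variable test (`ovtest`, documented as `estat ovtest` in recent versions) on the auto data. The model itself is an assumption chosen only for the example.

```stata
sysuse auto, clear
regress price mpg weight

* Ramsey RESET test based on powers of the fitted values
estat ovtest

* Variant based on powers of the right-hand-side variables
estat ovtest, rhs
```

The null hypothesis is that the model has no omitted variables, so a small p-value suggests misspecification. It does not tell you which variable or functional form is missing.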

Homoscedasticity: all residuals are from a distribution with the same variance

Consequences: Heteroscedasticity does not necessarily lead to biases in your estimated coefficients (b-hat), but it does lead to biases in the estimate of the width of the confidence interval, and the estimation procedure itself is not efficient.

### Testing for heteroscedasticity in Stata

• Your residuals should have the same variance for all (fitted) values of Y: hettest

• Your residuals should have the same variance for all values of X: hettest, rhs
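A minimal sketch of both variants, again on the auto data with an illustrative model (the variable choice is an assumption). The residual-versus-fitted plot at the end is an addition for visual inspection and is not on the slide.

```stata
sysuse auto, clear
regress price mpg weight

* Does the residual variance depend on the fitted values of Y?
estat hettest

* Does the residual variance depend on the predictors (right-hand side)?
estat hettest, rhs

* Visual check: residuals against fitted values; a funnel shape is a warning sign
rvfplot, yline(0)
```

In both tests the null hypothesis is constant variance, so a small p-value points to heteroscedasticity.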

### Errors distributed normally

Errors are distributed normally (just the errors, not the variables themselves!)

Detect: look at the residual plots, test for normality

Consequences: rule of thumb: if n > 600, no problem. Otherwise confidence intervals are wrong.

Cure: try to fit a better model, or use more difficult ways of modeling instead (ask an expert).

### Errors distributed normally

First calculate the errors:

predict e, resid

Then test for normality:

swilk e
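Put together with a regression, the two steps might look like the sketch below. The model is an assumption for illustration, and the two plots are one possible way to "look at the residual plots" rather than the slides' prescribed method.

```stata
sysuse auto, clear
regress price mpg weight

* Residuals from the most recent regression
predict e, resid

* Shapiro-Wilk test: the null hypothesis is that e is normally distributed
swilk e

* Visual checks: histogram with a normal curve overlaid, and a normal quantile plot
histogram e, normal
qnorm e
```

If `e` already exists from an earlier run, drop it first (`drop e`) or choose another variable name, since `predict` will not overwrite an existing variable.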

### Assumption checking in multi-level multiple regression with Stata

### In multi-level

• Test all that you would test for multiple regression – poor man’s test: do this using multiple regression! (e.g. “hettest”)

• xttest0 (see last week)

Test visually whether the normality assumption holds, but do this for the random 

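### Note: extra material (= not on the exam, bonus points if you know how to use it)

* one dummy per school: sch_1 up to sch_28
tab school, gen(sch_)

* school 1 is the reference category
reg y sch_2-sch_28

gen coefs = .

forvalues i = 2/28 {
    replace coefs = _b[sch_`i'] if _n == `i'
}

* are the school coefficients normally distributed?
swilk coefs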

### Assumption checking in logistic regression with Stata

### Assumptions

• Y is 0/1

• Ratio of cases to variables should be “reasonable”

• No cases where you have complete separation (Stata will remove these cases automatically)

• Linearity in the logit (comparable to “the true model should be linear” in multiple regression)

• Independence of errors (as in multiple regression)
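One rough way to probe the linearity-in-the-logit assumption is Stata's `linktest`, sketched below. This is a suggestion rather than something taken from the slides, and the model (`foreign` on `mpg` and `weight` in the auto data) is an assumption for illustration.

```stata
sysuse auto, clear

* Y must be 0/1: foreign is coded 0 (domestic) / 1 (foreign)
tabulate foreign

logit foreign mpg weight

* Refits the model on the linear prediction (_hat) and its square (_hatsq)
linktest
```

A significant `_hatsq` term suggests the model is misspecified, for example because a predictor is not linear in the logit. A non-significant term does not prove the assumption holds.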

### Further things to do:

• Check goodness of fit and prediction for different groups (as done in the do-file you have)

• Check the correlation matrix for strong correlations between predictors (corr)

• Check for outliers using regress and diag (but don’t tell anyone I suggested this)
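The first two checks could look roughly like this. The sketch does not reproduce the do-file mentioned above; the model and the number of groups are assumptions for illustration.

```stata
sysuse auto, clear
logit foreign mpg weight

* Hosmer-Lemeshow goodness-of-fit test, comparing observed and predicted
* outcomes across 10 groups of predicted probability
estat gof, group(10)

* Classification table: how well does the model predict 0s and 1s?
estat classification

* Strong correlations between predictors?
correlate mpg weight
```

For the goodness-of-fit test, a small p-value indicates that predicted and observed outcomes differ more than chance would explain.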