ASSUMPTION CHECKING

• In regression analysis with Stata
• In multi-level analysis with Stata (not much extra)
• In logistic regression analysis with Stata

NOTE: THIS WILL BE EASIER IN STATA THAN IT WAS IN SPSS

Assumption checking in “normal” multiple regression with Stata

Assumptions in regression analysis
• No multi-collinearity
• All relevant predictor variables included
• Homoscedasticity: all residuals are from a distribution with the same variance
• Linearity: the “true” model should be linear
• Independent errors: having information about the value of a residual should not give you information about the value of other residuals
• Errors are distributed normally
FIRST THE ONE THAT LEADS TO NOTHING NEW IN STATA

(NOTE: SLIDE TAKEN LITERALLY FROM MMBR)

Independent errors: having information about the value of a residual should not give you information about the value of other residuals

Typical cases:

- repeated measures

- clustered observations (people within firms / pupils within schools)

Consequences: as for heteroscedasticity

Usually, your confidence intervals are estimated too small (think about why that is!).

Cure: use multi-level analyses

In Stata:

Example: the Stata “auto.dta” data set

sysuse auto

corr (correlation)

vif (variance inflation factors)

ovtest (omitted variable test)

hettest (heteroscedasticity test)

predict e, resid

swilk (test for normality)
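The tests listed above are post-estimation commands, so they need a fitted regression to work on. A minimal sketch of the whole sequence follows, assuming a regression of price on mpg, weight and foreign; that model is an illustrative choice and not taken from the slides. Current Stata documentation lists the tests as `estat vif`, `estat ovtest` and `estat hettest`; the shorter names on the slide are the older forms of the same commands.

```stata
* Load the example data shipped with Stata
sysuse auto, clear

* Correlations between the predictors (illustrative choice of variables)
corr mpg weight foreign

* Fit the regression first: the tests below are post-estimation commands
regress price mpg weight foreign

* Variance inflation factors (multi-collinearity)
estat vif

* Ramsey RESET test for omitted variables
estat ovtest

* Breusch-Pagan test for heteroscedasticity
estat hettest

* Store the residuals and test them for normality
predict e, resid
swilk e
```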

Finding the commands
• “help regress”
• “regress postestimation”

and you will find most of them (and more) there

Multi-collinearity

A strong correlation between two or more of your predictor variables

You don’t want it, because:

• It is more difficult to get higher R’s
• The importance of predictors can be difficult to establish (b-hats tend to go to zero)
• The estimates for b-hats are unstable under slightly different regression attempts (“bouncing beta’s”)

Detect:

• Look at the correlation matrix of the predictor variables
• Calculate VIF-factors while running the regression

Cure:

Delete variables so that the multi-collinearity disappears, for instance by combining them into a single variable
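As a sketch of detection and cure on the auto data: weight and length are strongly correlated there, so they serve as the example pair. The choice of variables, the helper names `z_weight`, `z_length` and `size`, and the averaging of standardized scores are assumptions made for illustration; other ways of combining predictors are equally possible.

```stata
sysuse auto, clear

* Detect: correlation matrix of the predictors
corr weight length mpg

* Detect: variance inflation factors after the regression
regress price weight length mpg
estat vif

* One possible cure: combine the two strongly correlated predictors
* into a single standardized score (hypothetical variable names)
egen z_weight = std(weight)
egen z_length = std(length)
gen size = (z_weight + z_length) / 2

* Refit with the combined variable and compare the VIFs
regress price size mpg
estat vif
```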

Homoscedasticity: all residuals are from a distribution with the same variance

Consequences: Heteroscedasticity does not necessarily lead to biases in your estimated coefficients (b-hat), but it does lead to biases in the estimate of the width of the confidence interval, and the estimation procedure itself is not efficient.

Testing for heteroscedasticity in Stata
• Your residuals should have the same variance for all values of Y: hettest
• Your residuals should have the same variance for all values of X: hettest, rhs
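The two variants can be run one after the other on the same fitted model. The sketch below assumes the same illustrative regression on the auto data (not a model from the slides) and adds a residual-versus-fitted plot as a visual check. In both tests the null hypothesis is constant variance, so a small p-value points to heteroscedasticity.

```stata
sysuse auto, clear
regress price mpg weight foreign

* Is the residual variance related to the fitted values of Y?
estat hettest

* Is the residual variance related to the predictors (right-hand side)?
estat hettest, rhs

* Visual check: residuals plotted against fitted values
rvfplot, yline(0)
```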
Errors are distributed normally

(just the errors, not the variables themselves!)

Detect: look at the residual plots, test for normality

Consequences: rule of thumb: if n > 600, no problem. Otherwise confidence intervals are wrong.

Cure: try to fit a better model, or use more difficult ways of modeling instead (ask an expert).

Errors distributed normally

First calculate the errors:

predict e, resid

Then test for normality

swilk e
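A short sketch that pairs the test with the residual plots mentioned earlier, again assuming an illustrative regression on the auto data. The histogram and the normal quantile plot are one possible choice of "residual plots"; the slides do not say which plots are meant.

```stata
sysuse auto, clear
regress price mpg weight foreign

* Residuals of the fitted model
predict e, resid

* Visual checks: histogram with a normal curve, and a normal quantile plot
histogram e, normal
qnorm e

* Shapiro-Wilk test: a small p-value points to non-normal errors
swilk e
```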

Assumption checking in multi-level multiple regression with Stata

In multi-level
• Test all that you would test for multiple regression – poor man’s test: do this using multiple regression! (e.g. “hettest”)

• xttest0 (see last week)

Test visually whether the normality assumption holds, but do this for the random 

Note: extra material (= not on the exam, bonus points if you know how to use it)

tab school, gen(sch_)

reg y sch_2-sch_28

gen coefs = .

forvalues i = 2/28 {
    replace coefs = _b[sch_`i'] if _n == `i'
}

swilk coefs
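An alternative to the dummy-variable route above is to fit the multi-level model itself and inspect the predicted school-level effects. The sketch below is only an illustration under stated assumptions: the data are simulated as a stand-in for the course data (28 schools, 20 pupils each), and the names `u_true`, `u_hat` and `pick` are hypothetical. It uses `mixed` with `predict, reffects`, which is not the approach shown on the slide.

```stata
* Simulated stand-in data: 28 schools with 20 pupils each
clear
set seed 12345
set obs 28
gen school = _n
gen u_true = rnormal(0, 2)
expand 20
gen y = 10 + u_true + rnormal(0, 5)

* Random-intercept model
mixed y || school:

* Predicted random intercepts (one value per school, repeated per pupil)
predict u_hat, reffects

* Use one row per school when checking normality
egen pick = tag(school)
qnorm u_hat if pick
swilk u_hat if pick
```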

Assumption checking in logistic regression with Stata

Assumptions
• Y is 0/1
• Ratio of cases to variables should be “reasonable”
• No cases where you have complete separation (Stata will remove these cases automatically)
• Linearity in the logit (comparable to “the true model should be linear” in multiple regression)
• Independence of errors (as in multiple regression)
Further things to do:
• Check goodness of fit and prediction for different groups (as done in the do-file you have)
• Check the correlation matrix for strong correlations between predictors (corr)
• Check for outliers using regress and diag (but don’t tell anyone I suggested this)
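Some of these checks can be sketched on the auto data, assuming an illustrative logistic regression of foreign (0/1) on mpg and weight; this is not the model from the course do-file. `linktest` is used here as a general specification check that relates to linearity in the logit: a significant `_hatsq` term suggests the linear predictor is misspecified. It is one possible check, not necessarily the one intended on the slide.

```stata
sysuse auto, clear

* Strong correlations between predictors?
corr mpg weight

* Logistic regression with a 0/1 outcome
logit foreign mpg weight

* Hosmer-Lemeshow goodness of fit with 10 groups
estat gof, group(10)

* Classification table: how well does the model predict?
estat classification

* Specification check: look at the p-value of _hatsq
linktest
```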