ASSUMPTION CHECKING
In regression analysis with Stata
In multi-level analysis with Stata (not much extra)
In logistic regression analysis with Stata
NOTE: THIS WILL BE EASIER IN STATA THAN IT WAS IN SPSS
Assumption checking in “normal” multiple regression with Stata
NOTHING NEW IN STATA
(NOTE: SLIDE TAKEN LITERALLY FROM MMBR)
Independent errors: having information about the value of a residual should not give you information about the value of other residuals
Detect: ask yourself whether it is likely that knowledge about one residual would tell you something about the value of another residual.
Consequences: as for heteroscedasticity
Usually, your confidence intervals are estimated too small (think about why that is!).
Cure: use multi-level analyses
the Stata “auto.dta” data set
vif (variance inflation factors)
ovtest (omitted variable test)
hettest (heteroscedasticity test)
predict e, resid
swilk (test for normality)
and you will find most of them (and more) there
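The commands listed above can be run in one go on the auto data set. This is a minimal sketch: the regression model (price on mpg, weight and foreign) is an assumption chosen only for illustration, not a model from the slides. In current Stata versions the diagnostics are documented as estat subcommands, and the older names are given in the comments.

```stata
* Sketch only: the model below is an assumed example, not taken from the slides
sysuse auto, clear
regress price mpg weight foreign

estat vif         // variance inflation factors (older name: vif)
estat ovtest      // Ramsey RESET omitted variable test (older name: ovtest)
estat hettest     // Breusch-Pagan / Cook-Weisberg test (older name: hettest)

predict e, resid  // store the residuals of the regression above in e
swilk e           // Shapiro-Wilk test for normality of the residuals
```

All three estat commands must be run directly after the regression they refer to, because they use the stored estimation results.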
A strong correlation between two or more of your predictor variables
You don’t want it, because:
Delete variables so that multi-collinearity disappears, for instance by combining them into a single variable
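One possible way to detect multi-collinearity and to combine two predictors is sketched below, again on the auto data set. The choice of weight and length as the collinear pair, the averaging of standardized values, and the names z_weight, z_length and size are all assumptions made for illustration. Other ways of combining variables may suit your data better.

```stata
* Sketch only: variable choice and the combined index "size" are hypothetical
sysuse auto, clear
correlate weight length            // the two predictors are strongly correlated
regress price mpg weight length
estat vif                          // inspect the variance inflation factors

* Combine the two predictors: average of their standardized values
egen z_weight = std(weight)
egen z_length = std(length)
generate size = (z_weight + z_length) / 2

regress price mpg size
estat vif                          // check the VIFs again for the new model
```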
Consequences: Heteroscedasticity does not necessarily lead to biases in your estimated coefficients (b-hat), but it does lead to biases in the estimate of the width of the confidence interval, and the estimation procedure itself is not efficient.
Errors are distributed normally
(just the errors, not the variables themselves!)
Detect: look at the residual plots, test for normality
Consequences: rule of thumb: if n > 600, no problem. Otherwise confidence intervals are wrong.
Cure: try to fit a better model, or use more difficult ways of modeling instead (ask an expert).
First calculate the errors:
predict e, resid
Then test for normality
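The two steps can be sketched as follows, with graphical checks added next to the formal test. The regression model is an assumed example on the auto data set, and the choice of plots is one option among several.

```stata
* Sketch only: the model below is an assumed example, not taken from the slides
sysuse auto, clear
regress price mpg weight foreign

predict e, resid       // first calculate the errors (residuals)
histogram e, normal    // histogram of the residuals with a normal density overlaid
qnorm e                // quantile-normal plot: points should follow the diagonal
swilk e                // Shapiro-Wilk test: a small p-value suggests non-normality
```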
Test visually whether the normality assumption holds, but do this for the random
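tab school, gen(sch)        // one dummy per school: sch1 ... sch28
reg y sch2-sch28            // school 1 is the reference category
gen coefs = .
forvalues i = 2/28 {        // store the coefficient of school i in observation i
    replace coefs = _b[sch`i'] if _n == `i'
}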