Assumption checking
ASSUMPTION CHECKING

  • In regression analysis with Stata

  • In multi-level analysis with Stata (not much extra)

  • In logistic regression analysis with Stata

    NOTE: THIS WILL BE EASIER IN STATA THAN IT WAS IN SPSS


Assumption checking in “normal” multiple regression with Stata


Assumptions in regression analysis

  • No multi-collinearity

  • All relevant predictor variables included

  • Homoscedasticity: all residuals are from a distribution with the same variance

  • Linearity: the “true” model should be linear

  • Independent errors: having information about the value of a residual should not give you information about the value of other residuals

  • Errors are distributed normally


FIRST THE ONE THAT LEADS TO NOTHING NEW IN STATA

(NOTE: SLIDE TAKEN LITERALLY FROM MMBR)

Independent errors: having information about the value of a residual should not give you information about the value of other residuals

Detect: ask yourself whether it is likely that knowledge about one residual would tell you something about the value of another residual.

Typical cases:

  • repeated measures

  • clustered observations (people within firms / pupils within schools)

Consequences: as for heteroscedasticity. Usually, your confidence intervals are estimated too small (think about why that is!).

Cure: use multi-level analyses


In Stata:

Example: the Stata “auto.dta” data set

sysuse auto

corr (correlation matrix)

vif (variance inflation factors)

ovtest (omitted variable test)

hettest (test for heteroscedasticity)

predict e, resid

swilk e (Shapiro-Wilk test for normality)
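
A minimal worked sketch of this sequence (the slide does not specify a regression model; regressing price on mpg and weight is an illustrative choice, not part of the original):

sysuse auto, clear
regress price mpg weight      // illustrative model on the auto data
corr mpg weight               // correlation matrix of the predictors
vif                           // variance inflation factors (after regress)
ovtest                        // Ramsey RESET test for omitted variables
hettest                       // Breusch-Pagan test for heteroscedasticity
predict e, resid              // store the residuals
swilk e                       // Shapiro-Wilk test for normality of the residuals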


Finding the commands

  • “help regress”

  • then follow the link to “regress postestimation”

    and you will find most of them (and more) there


Multi-collinearity

A strong correlation between two or more of your predictor variables

You don’t want it, because:

  • It is more difficult to get higher R’s

  • The importance of predictors can be difficult to establish (b-hats tend to go to zero)

  • The estimates for b-hats are unstable under slightly different regression attempts (“bouncing betas”)

    Detect:

  • Look at the correlation matrix of the predictor variables

  • Calculate VIF factors while running the regression

    Cure:

    Delete variables so that multi-collinearity disappears, for instance by combining them into a single variable


Stata: calculating the correlation matrix (“corr”) and VIF statistics (“vif”)
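
A minimal sketch, again with an illustrative model (the choice of price, mpg, weight and length is mine, not the slide’s):

sysuse auto, clear
corr mpg weight length        // pairwise correlations between the predictors
regress price mpg weight length
vif                           // a common rule of thumb flags VIF values above 10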


Misspecification tests (replaces: all relevant predictor variables included)
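
A sketch of what this can look like in practice (same illustrative model as above; linktest is my addition, not mentioned on the slide):

sysuse auto, clear
regress price mpg weight
ovtest                        // Ramsey RESET: H0 = no omitted higher-order terms
linktest                      // specification link test: _hatsq should not be significant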


Homoscedasticity: all residuals are from a distribution with the same variance

Consequences: Heteroscedasticity does not necessarily lead to biases in your estimated coefficients (b-hat), but it does lead to biases in the estimate of the width of the confidence interval, and the estimation procedure itself is not efficient.


Testing for heteroscedasticity in Stata

  • Your residuals should have the same variance for all values of Y: hettest

  • Your residuals should have the same variance for all values of X: hettest, rhs
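
A sketch with the same illustrative model as before:

sysuse auto, clear
regress price mpg weight
hettest                       // constant variance across fitted values of Y
hettest, rhs                  // constant variance across the right-hand-side (X) variables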


Errors distributed normally

Errors are distributed normally (just the errors, not the variables themselves!)

Detect: look at the residual plots, test for normality

Consequences: rule of thumb: if n > 600, no problem. Otherwise confidence intervals are wrong.

Cure: try to fit a better model, or use more difficult ways of modeling instead (ask an expert).


Errors distributed normally

First calculate the errors:

predict e, resid

Then test for normality:

swilk e
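
A sketch combining the test with the visual check mentioned earlier (qnorm and histogram are my additions, not on the slide):

sysuse auto, clear
regress price mpg weight      // illustrative model
predict e, resid              // store the residuals
swilk e                       // Shapiro-Wilk test: a low p-value rejects normality
qnorm e                       // normal quantile plot of the residuals
histogram e, normal           // histogram with an overlaid normal density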


Assumption checking in multi-level multiple regression with Stata


In multi-level

  • Test all that you would test for multiple regression – poor man’s test: do this using multiple regression! (e.g. “hettest”)

    Add:

  • xttest0 (see last week; a sketch follows below)

    Add (extra):

  • Test visually whether the normality assumption holds, but do this for the random effects as well
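
A sketch with hypothetical variable names (score, ses and school are placeholders, not from the slides):

xtset school                  // declare the grouping (level-2) variable
xtreg score ses, re           // random-effects (multi-level) model
xttest0                       // Breusch-Pagan LM test for the school-level variance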


Note: extra material (= not on the exam, bonus points if you know how to use it)

tab school, gen(sch_)

reg y sch_2-sch_28

gen coefs = .

for num 2/28: replace coefs = _b[sch_X] if _n==X    (old-style “for” loop: X is the placeholder)

swilk coefs
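
A possible modern equivalent of the old “for” loop above, using forvalues (a sketch; y and the 28 schools are taken from the slide):

tab school, gen(sch_)
reg y sch_2-sch_28
gen coefs = .
forvalues i = 2/28 {
    replace coefs = _b[sch_`i'] if _n == `i'
}
swilk coefs                   // test whether the school effects look normal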


Assumption checking in logistic regression with Stata


Assumptions

  • Y is 0/1

  • Ratio of cases to variables should be “reasonable”

  • No cases where you have complete separation (Stata will remove these cases automatically)

  • Linearity in the logit (comparable to “the true model should be linear” in multiple regression)

  • Independence of errors (as in multiple regression)
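
A minimal sketch on the auto data (foreign as the 0/1 outcome is an illustrative choice; linktest and the Box-Tidwell-style term are my additions, not named on the slide):

sysuse auto, clear
logit foreign mpg weight      // illustrative logistic regression
linktest                      // _hatsq should not be significant if the model is well specified

* one common informal check of linearity in the logit:
gen mpg_ln = mpg * ln(mpg)
logit foreign mpg mpg_ln weight   // a significant mpg_ln term suggests non-linearity in mpg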


Further things to do:

  • Check goodness of fit and prediction for different groups (as done in the do-file you have)

  • Check the correlation matrix for strong correlations between predictors (corr)

  • Check for outliers using regress and diag (but don’t tell anyone I suggested this)
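
A sketch of the goodness-of-fit and prediction checks (same illustrative logit model as above):

sysuse auto, clear
logit foreign mpg weight
estat gof, group(10)          // Hosmer-Lemeshow goodness-of-fit test
estat classification          // classification table: predicted vs observed outcomes
corr mpg weight               // correlations between the predictors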

