
HGLM



  1. HGLM

  2. HGLM • It really is little more than the combination of GLM and HLM • Example • Estimating the probability of voting • Data: Cumulative NES file (24 NESs)

  3. Data • Micro level variables: • Partisan strength • Education • Age • White • Income

  4. Data • Macro level • Random term • Presidential election

  5. R command • The call is a hybrid of glm() and lmer() syntax:
M1 <- lmer(y ~ x1 + x2 + x3 + x4 + x5 + (1 | year), family = binomial(link = "logit"))
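Note: in current versions of the lme4 package the family argument belongs to glmer() rather than lmer(). A minimal sketch of the equivalent call, assuming the variables live in a hypothetical data frame nes:

    library(lme4)
    # Random-intercept logit: intercepts vary by survey year
    M1 <- glmer(y ~ x1 + x2 + x3 + x4 + x5 + (1 | year),
                data = nes, family = binomial(link = "logit"))
    summary(M1)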

  6. Results
Generalized linear mixed model fit using Laplace
Formula: y ~ x1 + x2 + x3 + x4 + x5 + (1 | year)
Family: binomial(logit link)
   AIC   BIC logLik deviance
 40427 40486 -20206    40413
Random effects:
 Groups Name        Variance Std.Dev.
 year   (Intercept) 0.243    0.493
number of obs: 36752, groups: year, 24
Estimated scale (compare to 1) 1
Fixed effects:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  3.582592   0.122241    29.3  < 2e-16 ***
x1          -0.363597   0.012498   -29.1  < 2e-16 ***
x2          -0.294791   0.008584   -34.3  < 2e-16 ***
x3          -0.027922   0.000802   -34.8  < 2e-16 ***
x4          -0.206562   0.032863    -6.3  3.3e-10 ***
x5          -0.317875   0.012053   -26.4  < 2e-16 ***

  7. Add a year level variable • Note the standard deviation of the random term is 0.493
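A sketch of the updated call with the year-level indicator z added (same hypothetical data frame nes, glmer syntax):

    # Add the macro-level predictor z (presidential election year) to the
    # fixed part; the random intercept for year stays in the model.
    M2 <- glmer(y ~ x1 + x2 + x3 + x4 + x5 + z + (1 | year),
                data = nes, family = binomial(link = "logit"))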

  8. Results
Generalized linear mixed model fit using Laplace
Formula: y ~ x1 + x2 + x3 + x4 + x5 + z + (1 | year)
Family: binomial(logit link)
   AIC   BIC logLik deviance
 40391 40459 -20187    40375
Random effects:
 Groups Name        Variance Std.Dev.
 year   (Intercept) 0.0451   0.212
number of obs: 36752, groups: year, 24
Estimated scale (compare to 1) 1
Fixed effects:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  4.062479   0.096465    42.1  < 2e-16 ***
x1          -0.364464   0.012496   -29.2  < 2e-16 ***
x2          -0.291768   0.008535   -34.2  < 2e-16 ***
x3          -0.027870   0.000802   -34.8  < 2e-16 ***
x4          -0.213119   0.032828    -6.5  8.5e-11 ***
x5          -0.319210   0.012050   -26.5  < 2e-16 ***
z           -0.885312   0.090501    -9.8  < 2e-16 ***

  9. Results • Adding the indicator of whether or not it is a presidential election year soaks up a lot of the mean level variance • Do we still need the random term? • Remember—this is a nuisance term. It is there to account for what we do not specify in the intercept equation. • Test is a deviance test • Difference in deviance is 183—yes we need it.
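One way to carry out the deviance test in R (a sketch; the comparison model here is a pooled glm() with no random intercept, and the object names are hypothetical):

    # Pooled logit with no random term, for comparison
    M2.pooled <- glm(y ~ x1 + x2 + x3 + x4 + x5 + z,
                     data = nes, family = binomial(link = "logit"))

    # Difference in deviance, referred to a chi-squared distribution;
    # testing a variance on its boundary makes this p-value conservative.
    dev.diff <- deviance(M2.pooled) - deviance(M2)
    pchisq(dev.diff, df = 1, lower.tail = FALSE)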

  10. Add a random slope • We want to see if we need random slopes • Start with a random slope for x1:
M3 <- lmer(y ~ x1 + x2 + x3 + x4 + x5 + z + (1 + x1 | year), family = binomial(link = "logit"))

  11. Results
Generalized linear mixed model fit using Laplace
Formula: y ~ x1 + x2 + x3 + x4 + x5 + z + (1 + x1 | year)
Family: binomial(logit link)
   AIC   BIC logLik deviance
 40023 40108 -20002    40003
Random effects:
 Groups Name        Variance Std.Dev. Corr
 year   (Intercept) 0.4595   0.678
        x1          0.0543   0.233    -0.951
number of obs: 36752, groups: year, 24
Estimated scale (compare to 1) 1
Fixed effects:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  4.16969    0.16339    25.5  < 2e-16 ***
x1          -0.38179    0.04930    -7.7  9.7e-15 ***
x2          -0.29784    0.00861   -34.6  < 2e-16 ***
x3          -0.02845    0.00081   -35.1  < 2e-16 ***
x4          -0.21587    0.03314    -6.5  7.3e-11 ***
x5          -0.32396    0.01213   -26.7  < 2e-16 ***
z           -0.89406    0.08948   -10.0  < 2e-16 ***

  12. Results • OK, first, that correlation is really high • Why? What is the intercept? The expected value when all the x's equal zero • But none of the x's are ever zero • The data are not centered • So, subtract off the median from each predictor (a sketch of this step follows)
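A sketch of the median-centering step (variable names follow the slides' x1b, ..., x5b convention; the data frame nes is hypothetical):

    # Center each micro-level predictor at its median so the intercept
    # describes a typical respondent rather than an impossible zero point.
    nes$x1b <- nes$x1 - median(nes$x1, na.rm = TRUE)
    nes$x2b <- nes$x2 - median(nes$x2, na.rm = TRUE)
    nes$x3b <- nes$x3 - median(nes$x3, na.rm = TRUE)
    nes$x4b <- nes$x4 - median(nes$x4, na.rm = TRUE)
    nes$x5b <- nes$x5 - median(nes$x5, na.rm = TRUE)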

  13. New Results
Generalized linear mixed model fit using Laplace
Formula: y ~ x1b + x2b + x3b + x4b + x5b + z + (1 + x1b | year)
Family: binomial(logit link)
   AIC   BIC logLik deviance
 40023 40108 -20002    40003
Random effects:
 Groups Name        Variance Std.Dev. Corr
 year   (Intercept) 0.0468   0.216
        x1b         0.0543   0.233    0.252
number of obs: 36752, groups: year, 24
Estimated scale (compare to 1) 1
Fixed effects:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.06417    0.07202    -0.9     0.37
x1b         -0.38132    0.04929    -7.7  1.0e-14 ***
x2b         -0.29784    0.00861   -34.6  < 2e-16 ***
x3b         -0.02844    0.00081   -35.1  < 2e-16 ***
x4b         -0.21587    0.03314    -6.5  7.4e-11 ***
x5b         -0.32394    0.01213   -26.7  < 2e-16 ***
z           -0.89418    0.08939   -10.0  < 2e-16 ***

  14. Results • The correlation is now moderate • The coefficient on x1 changed only slightly from the model without the random slope • The deviance test says we need the random slope term • But what if the slope varies as a function of presidential election? • Add the interaction term

  15. Generalized linear mixed model fit using Laplace
Formula: y ~ x1b + x2b + x3b + x4b + x5b + z + x1b:z + (1 | year)
Family: binomial(logit link)
   AIC   BIC logLik deviance
 40365 40442 -20174    40347
Random effects:
 Groups Name        Variance Std.Dev.
 year   (Intercept) 0.0443   0.210
number of obs: 36752, groups: year, 24
Estimated scale (compare to 1) 1
Fixed effects:
              Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.051128   0.071366    -0.7     0.47
x1b         -0.299611   0.017549   -17.1  < 2e-16 ***
x2b         -0.291545   0.008530   -34.2  < 2e-16 ***
x3b         -0.027875   0.000802   -34.8  < 2e-16 ***
x4b         -0.212297   0.032868    -6.5  1.1e-10 ***
x5b         -0.319416   0.012048   -26.5  < 2e-16 ***
z           -0.916731   0.089988   -10.2  < 2e-16 ***
x1b:z       -0.128740   0.024592    -5.2  1.7e-07 ***
• The interaction term adds to the model now, but what if we add the random slope?

  16. Generalized linear mixed model fit using Laplace
Formula: y ~ x1b + x2b + x3b + x4b + x5b + z + x1b:z + (1 + x1b | year)
Family: binomial(logit link)
   AIC   BIC logLik deviance
 40024 40118 -20001    40002
Random effects:
 Groups Name        Variance Std.Dev. Corr
 year   (Intercept) 0.0467   0.216
        x1b         0.0513   0.226    0.248
number of obs: 36752, groups: year, 24
Estimated scale (compare to 1) 1
Fixed effects:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.05055    0.07299    -0.7     0.49
x1b         -0.32210    0.07065    -4.6  5.1e-06 ***
x2b         -0.29783    0.00861   -34.6  < 2e-16 ***
x3b         -0.02845    0.00081   -35.1  < 2e-16 ***
x4b         -0.21610    0.03314    -6.5  7.0e-11 ***
x5b         -0.32389    0.01213   -26.7  < 2e-16 ***
z           -0.91938    0.09224   -10.0  < 2e-16 ***
x1b:z       -0.11019    0.09619    -1.1     0.25
• Now the interaction term is insignificant!

  17. The interaction does not add to the model • If we run the paired model comparisons (sketched below) we find: • Including the interaction term is better than omitting it if there is no random slope • Including the random slope is better than omitting it • The model with both the random slope and the interaction is not superior to the model with only the random slope • The interaction is insignificant • The deviance test tells us to reject including it • So? The best model omits it. We don't need it for specification (though we might for theory).
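A sketch of the paired comparisons as likelihood-ratio (deviance) tests; the four model names are hypothetical:

    # M.base:  fixed effects + random intercept only
    # M.int:   adds the x1b:z interaction, no random slope
    # M.slope: adds a random slope for x1b, no interaction
    # M.both:  random slope and interaction together
    anova(M.base, M.int)    # interaction helps when there is no random slope
    anova(M.base, M.slope)  # random slope helps
    anova(M.slope, M.both)  # interaction adds little once the slope is random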

  18. Other x’s • Long story short, both the random term and the interaction with presidential election improve the model for x2, x4, & x5 • Only the random term improves fit for x3

  19. Multiple random slopes • If we add a random slope on x2, we improve the model
Formula: y ~ x1b + x2b + x3b + x4b + x5b + z + x2b:z + x4b:z + x5b:z + (1 + x1b + x2b | year)
Family: binomial(logit link)
   AIC   BIC logLik deviance
 39425 39561 -19697    39393
Random effects:
 Groups Name        Variance Std.Dev. Corr
 year   (Intercept) 0.0789   0.281
        x1b         0.0709   0.266    0.328
        x2b         0.0228   0.151    0.383 0.988
number of obs: 36752, groups: year, 24
Estimated scale (compare to 1) 1

  20. Fixed effects:
              Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.142094   0.093006    -1.5  0.12657
x1b         -0.295615   0.055877    -5.3  1.2e-07 ***
x2b         -0.246501   0.034072    -7.2  4.7e-13 ***
x3b         -0.029652   0.000818   -36.3  < 2e-16 ***
x4b         -0.148631   0.047011    -3.2  0.00157 **
x5b         -0.243784   0.016835   -14.5  < 2e-16 ***
z           -0.462598   0.124808    -3.7  0.00021 ***
x2b:z       -0.045045   0.023232    -1.9  0.05251 .
x4b:z       -0.251922   0.065749    -3.8  0.00013 ***
x5b:z       -0.177566   0.024155    -7.4  2.0e-13 ***

  21. The correlation between the random slopes is really high • Adding a random slope for x3 is intractable—you get negative estimates of the standard deviations. • Serious problem

  22. Item response • Basic idea • Each person has multiple indicators that tap the underlying concept of interest • Usually, not everyone gets the same indicators, and no indicator is used for every person • Indicators differ in their difficulty • So, the DV is the probability that a specific person's answer to a specific question is a success (a 1)

  23. IRT • where αj is person j's latent ability • and βk is question k's difficulty • If people get different questions, we need to add a subscript i to denote which response we are speaking of
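In the standard one-parameter (Rasch) logit form, consistent with these definitions, the model can be written as

$$\Pr(y_{jk} = 1) = \operatorname{logit}^{-1}(\alpha_j - \beta_k)$$

and, once people answer different questions, with the response index i:

$$\Pr(y_i = 1) = \operatorname{logit}^{-1}\!\left(\alpha_{j[i]} - \beta_{k[i]}\right)$$

where j[i] and k[i] index the person and question behind response i.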

  24. IRT • Examples: • Roll call votes • Supreme Court decisions • Test scores (SAT, GRE) • Democracy • Knowledge • Inherent problem is that we need to estimate a person’s ability at the same time as we estimate the question’s difficulty

  25. Identifiability • The other problem is that the model is not identified • Add a constant to all of the abilities and all of the difficulties and you get the same answer • Just need to constrain the problem somehow: fix one question's difficulty or one person's ability • If 0 and 1 aren't natural anchors, you have a reflection (sign-flip) problem too • Easy to handle

  26. IRT • How about we make this a multilevel problem: • Solve identification by fixing one of the means to zero.
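One standard way to write the multilevel version (a sketch, not necessarily the exact notation on the slide):

$$\Pr(y_i = 1) = \operatorname{logit}^{-1}\!\left(\alpha_{j[i]} - \beta_{k[i]}\right), \qquad \alpha_j \sim N(0,\ \sigma^2_\alpha), \qquad \beta_k \sim N(\mu_\beta,\ \sigma^2_\beta)$$

Fixing the mean of the abilities at zero, rather than estimating it, pins down the location of the scale.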

  27. IRT • We can easily add group level predictors: • What are these? • The X’s in the first equation are things that we think predict the person’s ability (gender, race, party,…) • In the second equation they are whatever information we have about the difficulty of the question that is separate from what the data have to tell us.
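A sketch of the group-level equations with predictors added (X and W stand for whatever person-level and question-level information is available; this is a reconstruction, not the slide's exact notation):

$$\alpha_j \sim N\!\left(X_j \gamma^{\alpha},\ \sigma^2_\alpha\right), \qquad \beta_k \sim N\!\left(W_k \gamma^{\beta},\ \sigma^2_\beta\right)$$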

  28. IRT • Hold on a minute • What are these? • The basic idea is that everyone has a probability of getting a question right. • That probability is based on two things: • Your ability • How hard the question is • Given these two things, the probability can be defined for each person

  29. IRT

  30. IRT

  31. IRT • The basic prediction is easy: Person j will be a success on question k if his or her ability is greater than the difficulty of the question. • So you want to find the set of difficulty and ability parameters that best fit the data

  32. IRT-discrimination • The graph two slides ago (parallel lines) assumed that the questions were equally good at discriminating based on ability • More specifically, that the effect of ability on each of the questions was the same: a fixed slope • We can allow that to vary: • Gamma defines the question's ability to discriminate. Higher values mean that the question is a better predictor of ability, which also means a sharper curve
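The usual two-parameter logistic form with a discrimination parameter (a sketch consistent with the description above):

$$\Pr(y_i = 1) = \operatorname{logit}^{-1}\!\left(\gamma_{k[i]}\left(\alpha_{j[i]} - \beta_{k[i]}\right)\right)$$

A larger γk makes the curve for question k steeper, so the item separates low-ability from high-ability respondents more sharply.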

  33. IRT

  34. IRT • We won't estimate this yet • It is really hard to estimate and needs Bayesian methods • It is, however, pretty cool. Lots of applications • This is a fundamental measurement issue • It can improve on classic factor analysis or standard scaling techniques

  35. Other HGLM models • The R code is basically the same as the logit: specify a different link and family (see the sketch below) • The problem is that likelihood-based estimation gets harder. My experience is that the binary choice model is the easiest. • More parameters to estimate • Wait until we can do this with Bayesian methods
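For instance, a count outcome needs only a different family and link (a sketch; the outcome count.y and the data frame nes are hypothetical):

    # Poisson HGLM: same multilevel structure, different family and link
    M.count <- glmer(count.y ~ x1 + x2 + x3 + x4 + x5 + (1 | year),
                     data = nes, family = poisson(link = "log"))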

  36. Next time, MCMC • This gets into detailed probability theory • Gelman and Hill, Chapter 18 • Jackman, Simon. 2000. "Estimation and Inference via Bayesian Simulation." American Journal of Political Science: 375-404. • Casella and George. 1992. "Explaining the Gibbs Sampler." The American Statistician: 167-174. • "Markov Chain Monte Carlo in Practice: A Roundtable Discussion." 1998. The American Statistician: 93-100.
