Categorical Data Analysis


Week 2

Binary Response Models
  • binary and binomial responses
    • binary: y assumes values of 0 or 1
    • binomial: y is number of “successes” in n “trials”
  • distributions
    • Bernoulli:
    • Binomial:
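The formulas on this slide were lost in extraction. Assuming the usual notation, with p the success probability (a symbol not shown in the surviving text), the two distributions are presumably the standard ones:

```latex
\begin{aligned}
\text{Bernoulli:}\quad & \Pr(y) = p^{y}(1-p)^{1-y}, \qquad y \in \{0,1\} \\
\text{Binomial:}\quad & \Pr(y) = \binom{n}{y}\,p^{y}(1-p)^{n-y}, \qquad y = 0,1,\dots,n
\end{aligned}
```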
Transformational Approach
  • linear probability model
    • use grouped data (events/trials):
    • “identity” link:
    • linear predictor:
    • problems of prediction outside [0,1]
The Logit Model
  • logit transformation:
  • inverse logit:
  • ensures that p is in [0,1] for all values of x and β
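The logit and inverse logit formulas did not survive on this slide. A minimal Python sketch of the standard definitions follows; the function names `logit` and `inv_logit` are chosen here for illustration and are not from the slides, which use Stata.

```python
import math

def logit(p):
    """Log odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def inv_logit(eta):
    """Map a real-valued linear predictor eta back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

# The logit stretches probabilities onto the real line ...
for p in (0.01, 0.5, 0.99):
    print(p, round(logit(p), 3))

# ... and the inverse logit always returns a value inside (0, 1).
for eta in (-10.0, 0.0, 10.0):
    print(eta, inv_logit(eta))
```

Whatever value the linear predictor takes, the fitted probability stays inside the unit interval, which is the property the linear probability model lacks.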
The Logit Model
  • odds and odds ratios are the key to understanding and interpreting this model
  • the log odds transformation is a “stretching” transformation to map probabilities to the real line
The Logit Transformation
  • properties of logit


Odds, Odds Ratios, and Relative Risk
  • odds of “success” is the ratio:
  • consider two groups with success probabilities:
  • odds ratio (OR) is a measure of the odds of success in group 1 relative to group 2
Odds Ratio
  • 2 X 2 table:
  • OR is the cross-product ratio (compare x = 1 group to x = 0 group)
  • odds of y = 1 are 4 times as high when x = 1 as when x = 0




Odds Ratio
  • equivalent interpretation
  • the odds of y = 1 when x = 0 are 0.225 times the odds when x = 1
  • the proportional reduction in the odds is 1 − 0.225 = 0.775
  • equivalently, the odds of y = 1 are 77.5% lower when x = 0 than when x = 1
Log Odds Ratios
  • Consider the model:
  • D is a dummy variable coded 1 if group 1 and 0 otherwise.
  • group 1:
  • group 2:
  • LOR: OR:
Relative Risk
  • similar to OR, but works with rates
  • relative risk or rate ratio (RR) is the rate in group 1 relative to group 2
  • OR → RR as p → 0
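The relationship between the two measures can be checked numerically. The sketch below uses hypothetical probabilities (not data from the slides) with a fixed relative risk of 2 and shows that the odds ratio approaches the relative risk as the outcome becomes rare.

```python
def odds(p):
    return p / (1.0 - p)

def odds_ratio(p1, p2):
    return odds(p1) / odds(p2)

def relative_risk(p1, p2):
    return p1 / p2

# Hypothetical pairs of probabilities; the relative risk is 2 in each pair
for p1, p2 in [(0.4, 0.2), (0.1, 0.05), (0.02, 0.01)]:
    print(f"p1={p1:<5} p2={p2:<5} "
          f"RR={relative_risk(p1, p2):.3f} OR={odds_ratio(p1, p2):.3f}")
```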
Tutorial: odds and odds ratios
  • consider the following data
Tutorial: odds and odds ratios
  • read table:


input educ psex f
0 0 873
0 1 1190
1 0 533
1 1 1208
end

label define edlev 0 "HS or less" 1 "Col or more"

label val educ edlev

label var educ education

Tutorial: odds and odds ratios
  • compute odds:
  • verify by hand

tabodds psex educ [fw=f]

Tutorial: odds and odds ratios
  • compute odds ratios:
  • verify by hand

tabodds psex educ [fw=f], or
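The "verify by hand" step can be sketched outside Stata as well. The Python below uses the four frequencies from the tutorial table and takes the odds of psex = 1 within each education group; variable names are chosen here for illustration.

```python
# Frequencies from the tutorial table: (educ, psex) -> f
freq = {(0, 0): 873, (0, 1): 1190, (1, 0): 533, (1, 1): 1208}

def odds_psex(educ):
    """Odds that psex = 1 within one education group."""
    return freq[(educ, 1)] / freq[(educ, 0)]

odds_low = odds_psex(0)             # HS or less
odds_high = odds_psex(1)            # Col or more
odds_ratio = odds_high / odds_low   # cross-product ratio

print(f"odds, HS or less:  {odds_low:.4f}")
print(f"odds, Col or more: {odds_high:.4f}")
print(f"odds ratio:        {odds_ratio:.4f}")
```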

Tutorial: odds and odds ratios
  • stat facts:
    • variances of functions
      • use in statistical significance tests and forming confidence intervals
    • basic rule for variances of linear transformations
      • g(x) = a + bx is a linear function of x, then
      • this is a trivial case of the delta method applied to a single variable
      • the delta method for the variance of a nonlinear function g(x) of a single variable is
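The two variance rules referred to above lost their formulas in extraction. In their usual textbook form, with g' denoting the first derivative of g evaluated at the estimate, they are presumably:

```latex
\begin{aligned}
\operatorname{Var}(a + bx) &= b^{2}\,\operatorname{Var}(x) \\
\operatorname{Var}\!\left[g(x)\right] &\approx \left[g'(x)\right]^{2}\operatorname{Var}(x)
\end{aligned}
```

The first line is the special case of the second with g'(x) = b, where the approximation is exact.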
Tutorial: odds and odds ratios
  • stat facts:
    • variances of odds and odds ratios
      • we can use the delta method to find the variance in the odds and the odds ratios
      • from the asymptotic (large sample theory) perspective it is best to work with log odds and log odds ratios
      • the log odds ratio converges to normality at a faster rate than the odds ratio, so statistical tests may be more appropriate on log odds ratios (nonlinear functions of p)
Tutorial: odds and odds ratios
  • stat facts:
      • the log odds ratio is the difference in the log odds for two groups
      • groups are independent
      • variance of a difference is the sum of the variances
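A sketch of how these facts combine, using the tutorial frequencies. It assumes the standard delta-method result that the variance of an estimated log odds is approximately 1/successes + 1/failures, and it uses a normal-approximation 95% interval; neither formula is shown in the surviving slide text.

```python
import math

# Cell counts from the tutorial table: (educ, psex) -> f
freq = {(0, 0): 873, (0, 1): 1190, (1, 0): 533, (1, 1): 1208}

def log_odds(educ):
    return math.log(freq[(educ, 1)] / freq[(educ, 0)])

def var_log_odds(educ):
    # Delta-method variance of a log odds: 1/successes + 1/failures
    return 1.0 / freq[(educ, 1)] + 1.0 / freq[(educ, 0)]

# Log odds ratio is a difference of log odds for two independent groups,
# so its variance is the sum of the two variances.
lor = log_odds(1) - log_odds(0)
se_lor = math.sqrt(var_log_odds(1) + var_log_odds(0))

# Approximate 95% interval on the log scale, then exponentiate
lo, hi = (math.exp(lor + z * se_lor) for z in (-1.96, 1.96))
print(f"LOR = {lor:.4f}, SE = {se_lor:.4f}")
print(f"OR  = {math.exp(lor):.4f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```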
Tutorial: odds and odds ratios
  • data structures: grouped or individual level
    • note:
      • use frequency weights to handle grouped data
      • or we could “expand” this data by the frequency weights resulting in individual-level data
      • model results from either data structure are the same
    • expand the data and verify the following results

expand f

Tutorial: odds and odds ratios
  • statistical modeling
    • logit model (glm):
    • logit model (logit):

glm psex educ [fw=f], f(b) eform

logit psex educ [fw=f], or

Tutorial: odds and odds ratios
  • statistical modeling (#1)
    • logit model (glm):
Tutorial: odds and odds ratios
  • statistical modeling (#2)
    • some ideas from alternative normalizations
      • what parameters will this model produce?
      • what is the interpretation of the “constant”

gen cons = 1

glm psex cons educ [fw=f], nocons f(b) eform

Tutorial: odds and odds ratios
  • statistical modeling (#2)
Tutorial: odds and odds ratios
  • statistical modeling (#3)
      • what parameters does this model produce?
      • how do you interpret them?

gen lowed = educ == 0

gen hied = educ == 1

glm psex lowed hied [fw=f], nocons f(b) eform

Tutorial: odds and odds ratios
  • statistical modeling (#3)

are these odds ratios?

Tutorial: prediction
  • fitted probabilities (after most recent model)

predict p, mu

tab educ [fw=f], sum(p) nostandard nofreq

Probit Model
  • inverse probit is the CDF for a standard normal variable:
  • link function:
  • probit coefficients
    • interpreted on the scale of a standard normal variable (no log odds-ratio interpretation)
    • “scaled” versions of logit coefficients
  • probit models
    • more common in certain disciplines (economics)
    • analogy with linear regression (normal latent variable)
    • more easily extended to multivariate distributions
Example: Grouped Data
  • Swedish mortality data revisited

logit model

probit model

  • Stata: generalized linear model (glm)

glm y A2 A3 P2, family(b n) link(probit)

glm y A2 A3 P2, family(b n) link(logit)

  • idea of glm is to make model linear in the link.
    • old days: Iteratively Reweighted Least Squares
    • now: Fisher scoring, Newton-Raphson
    • both approaches yield MLEs
Generalized Linear Models
  • applies to a broad class of models
    • iterative fitting (repeated updating) except for the linear model
    • update parameters, weights W, and predicted values μ
    • models differ in terms of W and μ and assumptions about the distribution of y
    • common distributions for y include: normal, binomial, and Poisson
    • common links include: identity, logit, probit, and log
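The updating cycle described above can be sketched for the binomial/logit case. This is a bare-bones illustration of iteratively reweighted least squares, not Stata's implementation; it reuses the grouped education tutorial data (the Swedish mortality data are not reproduced in the transcript) and solves the two-parameter weighted least-squares step by hand.

```python
import math

# Grouped tutorial data: (educ, number with psex = 1, group size)
groups = [(0, 1190, 873 + 1190), (1, 1208, 533 + 1208)]

def irls_logit(groups, iterations=25):
    """Fit logit(p) = b0 + b1*x to grouped data by IRLS."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        s_w = s_wx = s_wxx = s_wz = s_wxz = 0.0
        for x, y, n in groups:
            eta = b0 + b1 * x                            # linear predictor
            mu = 1.0 / (1.0 + math.exp(-eta))            # fitted probability
            w = n * mu * (1.0 - mu)                      # working weight W
            z = eta + (y / n - mu) / (mu * (1.0 - mu))   # working response
            s_w += w
            s_wx += w * x
            s_wxx += w * x * x
            s_wz += w * z
            s_wxz += w * x * z
        # Solve the 2x2 weighted least-squares normal equations
        det = s_w * s_wxx - s_wx ** 2
        b0, b1 = ((s_wxx * s_wz - s_wx * s_wxz) / det,
                  (s_w * s_wxz - s_wx * s_wz) / det)
    return b0, b1

b0, b1 = irls_logit(groups)
print(f"baseline odds = {math.exp(b0):.4f}, odds ratio = {math.exp(b1):.4f}")
```

Because this model is saturated for a 2 x 2 table, the fitted values reproduce the hand-computed baseline odds and odds ratio, which gives a simple check on the algorithm.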
Latent Variable Approach
  • example: insect mortality
    • suppose a researcher exposes insects to dosage levels (u) of an insecticide and observes whether the “subject” lives or dies at that dosage.
    • the response is expected to depend on the insect’s tolerance (c) to that dosage level.
    • the insect dies if u > c and survives if u < c
    • tolerance is not observed (survival is observed)
Latent Variables
  • u and c are continuous latent variables
    • examples:
      • women’s employment: u is the market wage and c is the reservation wage
      • migration: u is the benefit of moving and c is the cost of moving.
    • observed outcome y =1 or y = 0 reveals the individual’s preference, which is assumed to maximize a rational individual’s utility function.
Latent Variables
  • Assume linear utility and criterion functions
  • over-parameterization = identification problem
    • we can identify differences in components but not the separate components
Latent Variables
  • constraints:
    • Then:
    • where F(.) is the CDF of ε
Latent Variables and Standardization
  • Need to standardize the mean and variance of ε
    • binary dependent variables lack inherent scales
    • magnitude of β is only in reference to the mean and variance of ε, which are unknown
    • redefine ε to a common standard
    • where a and b are two chosen constants.
Standardization for Logit and Probit Models
  • standardization implies
  • F*() is the cdf of ε*
  • location a and scale b need to be fixed
    • setting
    • and
Standardization for Logit and Probit Models
  • distribution of ε is standardized
    • standard normal → probit
    • standard logistic → logit
  • both distributions have a mean of 0
  • variances differ (1 for the standard normal, π²/3 for the standard logistic)
Extending the Latent Variable Approach
  • observed y is a dichotomous (binary) 0/1 variable
    • continuous latent variable:
      • linear predictor + residual
    • observed outcome
  • conditional means of latent variables obtained from index function:
  • obtain probabilities from inverse link functions

logit model:

probit model:

  • likelihood function
    • where if data are binary
  • log-likelihood function
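The likelihood expressions on this slide did not survive extraction. Under the usual binomial setup, and assuming the notation p_i for the fitted probability, y_i for the number of events, and n_i for the number of trials (symbols chosen here, not taken from the slide), they would presumably read:

```latex
\begin{aligned}
L &= \prod_{i} p_i^{\,y_i}\,(1-p_i)^{\,n_i-y_i}, \qquad n_i = 1 \text{ if data are binary} \\
\log L &= \sum_{i}\left[\, y_i \log p_i + (n_i - y_i)\log(1-p_i) \,\right]
\end{aligned}
```

The binomial coefficient is omitted because it does not depend on the model parameters.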
Assessing Models
  • definitions:
    • L null model (intercept only):
    • L saturated model (a parameter for each cell):
    • L current model:
  • grouped data (events/trials)
    • deviance (likelihood ratio statistic)
  • grouped data:
    • if cell sizes are reasonably large deviance is distributed as chi-square
  • individual-level data: Lf=1 and log Lf=0
    • deviance is not a “fit” statistic
  • deviance is like a residual sum of squares
    • larger values indicate poorer models
    • larger models have smaller deviance
  • deviance for the more constrained model (Model 1)
  • deviance for the less constrained model (Model 2)
  • assume that Model 1 is a constrained version of Model 2.
Difference in Deviance
  • evaluate competing “nested” models using a likelihood ratio statistic
  • model chi-square is a special case
  • SAS, Stata, R, etc. report different statistics
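The deviance formulas were lost from these slides. Writing L_0, L_c, and L_f for the likelihoods of the null, current, and saturated models, and D_1 and D_2 for the deviances of the more and less constrained models (subscripts assumed here), the standard definitions are presumably:

```latex
\begin{aligned}
D &= -2\left[\log L_c - \log L_f\right] \\
G^{2} &= D_1 - D_2 = -2\left[\log L_1 - \log L_2\right] \;\sim\; \chi^{2}_{\,df_2 - df_1} \\
\text{model } \chi^{2} &= -2\left[\log L_0 - \log L_c\right]
\end{aligned}
```

Here df_2 − df_1 is the difference in the number of estimated parameters. The saturated-model term cancels in the difference, which is why nested models can be compared even with individual-level data.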
Other Fit Statistics
  • BIC & AIC (useful for non-nested models)
    • basic idea of IC: penalize log L for the number of parameters (AIC/BIC) and/or the size of the sample (BIC)
    • AIC: s = 1
    • BIC: s = ½ log n (n is the sample size)
    • df_m is the number of model parameters
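The slide's general formula is missing, but the values of s are consistent with a criterion of the form −2 log L + 2·s·df_m. A small Python sketch under that assumption, with hypothetical inputs:

```python
import math

def info_criterion(loglik, n_params, s):
    """Penalized fit: -2 log L + 2 * s * (number of parameters)."""
    return -2.0 * loglik + 2.0 * s * n_params

def aic(loglik, n_params):
    return info_criterion(loglik, n_params, s=1.0)

def bic(loglik, n_params, n_obs):
    return info_criterion(loglik, n_params, s=0.5 * math.log(n_obs))

# Hypothetical values, for illustration only
ll, k, n = -1200.0, 5, 2000
print(f"AIC = {aic(ll, k):.2f}")
print(f"BIC = {bic(ll, k, n):.2f}")
```

Smaller values indicate a better trade-off between fit and complexity; the BIC penalty grows with the sample size while the AIC penalty does not.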
Hypothesis Tests/Inference
  • single parameter:
    • MLEs are asymptotically normal → Z-test
  • multi-parameter:
    • likelihood ratio tests (after fitting)
    • Wald tests (test constraints from current model)
Hypothesis Tests/Inference
  • Wald test (tests a vector of restrictions)
    • a set of r parameters are all equal to 0
    • a set of r parameters are linearly restricted

restriction matrix

constraint vector

parameter subset
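The labels above annotated a formula that did not survive extraction. Assuming R denotes the restriction matrix, q the constraint vector, and V̂ the estimated covariance matrix of the parameter estimates (symbols chosen here), the Wald statistic in its usual form is:

```latex
W = \left(R\hat{\beta} - q\right)^{\prime}
    \left[R\,\hat{V}\,R^{\prime}\right]^{-1}
    \left(R\hat{\beta} - q\right) \;\sim\; \chi^{2}_{r}
```

Testing that r parameters are all equal to 0 corresponds to q = 0 with R selecting the parameter subset.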

Interpreting Parameters
  • odds ratios: consider the model where x is a continuous predictor and d is a dummy variable
  • suppose that d denotes sex and x denotes income and the problem concerns voting, where y* is the propensity to vote
  • results: logit(pi) = -1.92 + 0.012xi + 0.67di
Interpreting Parameters
  • for d (dummy variable coded 1 for female) the odds ratio is straightforward
    • holding income constant, women’s odds of voting are nearly twice those of men
Interpreting Parameters
  • for x (continuous variable for income in thousands of dollars) the odds ratio is a multiplicative effect
    • suppose we increase income by 1 unit ($1,000)
    • suppose we increase income by c units (c × $1,000)
Interpreting Parameters
  • if income is increased by $10,000, this increases the odds of voting by about 13%
    • a note on percent change in odds:
    • if estimate of β > 0 then percent increase in odds for a unit change in x is
    • if estimate of β < 0 then percent decrease in odds for a unit change in x is
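The quoted interpretations follow directly from exponentiating the reported coefficients. A short Python check, using the estimates 0.012 (income) and 0.67 (female) from the voting example; the helper name `pct_change_in_odds` is illustrative.

```python
import math

b_income, b_female = 0.012, 0.67   # coefficients reported on the slide

or_female = math.exp(b_female)           # women vs. men, income held constant
or_income_1 = math.exp(b_income)         # +1 unit  (+$1,000)
or_income_10 = math.exp(10 * b_income)   # +10 units (+$10,000)

def pct_change_in_odds(beta, c=1.0):
    """Percent change in the odds for a c-unit change in x."""
    return 100.0 * (math.exp(c * beta) - 1.0)

print(f"OR (female)         = {or_female:.3f}")
print(f"OR (+$1,000)        = {or_income_1:.4f}")
print(f"OR (+$10,000)       = {or_income_10:.4f}")
print(f"% change (+$10,000) = {pct_change_in_odds(b_income, 10):.1f}%")
```

A negative coefficient gives a negative percent change, which is read as a percent decrease in the odds.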
Marginal Effects
  • marginal effect:
    • effect of change in x on change in probability
    • pdf cdf
    • often we evaluate f(.) at the mean of x.
Marginal Effect of a Change in a Dummy Variable
  • if x is a continuous variable and z is a dummy variable
    • marginal effect of change in z from 0 to 1 is the difference
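Both kinds of marginal effect can be illustrated with the voting model reported earlier, logit(p) = -1.92 + 0.012x + 0.67d. The sketch assumes the standard logit results (density p(1 − p) times the coefficient for a continuous predictor, a difference in fitted probabilities for a dummy) and evaluates them at a hypothetical income of 50, since the sample mean is not given.

```python
import math

B0, B_INC, B_FEM = -1.92, 0.012, 0.67   # estimates reported on the slide

def prob_vote(income, female):
    eta = B0 + B_INC * income + B_FEM * female
    return 1.0 / (1.0 + math.exp(-eta))

income = 50.0   # hypothetical evaluation point (income in $1,000s)

# Continuous predictor: logistic density p(1 - p) times the coefficient
p = prob_vote(income, 0)
me_income = p * (1.0 - p) * B_INC

# Dummy predictor: discrete difference in fitted probabilities
me_female = prob_vote(income, 1) - prob_vote(income, 0)

print(f"Pr(vote | men, income=50)    = {p:.4f}")
print(f"marginal effect of income    = {me_income:.5f}")
print(f"effect of female (0 -> 1)    = {me_female:.4f}")
```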
  • logit models for high school graduation
          • odds ratios (constant is baseline odds)
LR Test
  • Model 3 vs. 2
Wald Test
  • Test equality of parental education effects

logit hsg blk hsp female nonint inc nsibs mhs mcol fhs fcol wtest

test mhs=fhs

test mcol=fcol

cannot reject H0 of equal parental education effects on HS graduation

Basic Estimation Commands (Stata)

estimation commands

model tests

* model 0 - null model

qui logit hsg

est store m0

* model 1 - race, sex, family structure

qui logit hsg blk hsp female nonint

est store m1

* model 1a - race X family structure interactions

qui xi: logit hsg blk hsp female nonint i.nonint*i.blk i.nonint*i.hsp

est store m1a

lrtest m1 m1a

* model 2 - SES

qui xi: logit hsg blk hsp female nonint inc nsibs mhs mcol fhs fcol

est store m2

* model 3 - Indiv

qui xi: logit hsg blk hsp female nonint inc nsibs mhs mcol fhs fcol wtest

est store m3

lrtest m2 m3

Fit Statistics etc.

* some 'hand' calculations with saved results

scalar ll = e(ll)

scalar npar = e(df_m)+1

scalar nobs = e(N)

scalar AIC = -2*ll + 2*npar

scalar BIC = -2*ll + log(nobs)*npar

scalar list AIC

scalar list BIC

* or use automated fitstat routine


*output as a table

estout1 m0 m1 m2 m3 using modF07, replace star stfmt(%9.2f %9.0f %9.0f) ///

stats(ll N df_m) eform


Generate Income Quartiles

qui sum adjinc, det

* quartiles for income distribution

gen incQ1 = adjinc < r(p25)

gen incQ2 = adjinc >= r(p25) & adjinc < r(p50)

gen incQ3 = adjinc >= r(p50) & adjinc < r(p75)

gen incQ4 = adjinc >= r(p75)

gen incQ = 1 if incQ1==1

replace incQ = 2 if incQ2==1

replace incQ = 3 if incQ3==1

replace incQ = 4 if incQ4==1

tab incQ


Fit Model for Each Quartile

calculate predictions

* look at marginal effects of test score on graduation by selected groups

* (1) model (income quartiles)

local i = 1

while `i' < 5 {

logit hsg blk female mhs nonint nsibs urban so wtest if incQ ==`i'


cap drop wm*

cap drop bm*

prgen wtest, x(blk=0 female=0 mhs=1 nonint=0) gen(wmi) from(-3) to(3)

prgen wtest, x(blk=0 female=0 mhs=1 nonint=1) gen(wmn) from(-3) to(3)

label var wmip1 "white/intact"

label var wmnp1 "white/nonintact"

prgen wtest, x(blk=1 female=0 mhs=1 nonint=0) gen(bmi) from(-3) to(3)

prgen wtest, x(blk=1 female=0 mhs=1 nonint=1) gen(bmn) from(-3) to(3)

label var bmip1 "black/intact"

label var bmnp1 "black/nonintact"



set scheme s2mono

twoway (line wmip1 wmix, sort xtitle("Test Score") ytitle("Pr(y=1)")) ///

(line wmnp1 wmix, sort) (line bmip1 wmix, sort) (line bmnp1 wmix, sort), ///

subtitle("Marginal Effect of Test Score on High School Graduation" ///

"Income Quartile `i'" ) saving(wtgrph`i', replace)

graph export wtgrph`i'.eps, as(eps) replace

local i = `i' + 1

}


Fitted Probabilities

logit hsg blk female mhs nonint inc nsibs urban so wtest

prtab nonint blk female

Fitted Probabilities
  • predicted values
    • evaluate fitted probabilities at the sample mean values of x (or other fixed quantities)
    • averaging fitted probabilities over subgroup-specific models will produce marginal probabilities
Alternative Probability Model
  • complementary log-log (cloglog or CLL)
    • standard extreme-value distribution for u:
    • cloglog model:
    • cloglog link function:
Extreme-Value Distribution
  • properties
    • mean of u (Euler’s constant):
    • variance of u:
    • difference in two independent extreme value variables yields a logistic variable
CLL Model
  • no “practical” differences from logit and probit models
    • often suited for survival data and other applications
    • interpretation of coefficients:
      • exp(β) is a relative risk or hazard ratio not an OR
      • glm: binomial distribution for y with a cloglog link
      • cloglog: use the cloglog command directly
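The cloglog link and its inverse were lost from the earlier slide. A Python sketch of the standard definitions, log(−log(1 − p)) and 1 − exp(−exp(η)), compared with the logit; function names are illustrative.

```python
import math

def cloglog(p):
    """Complementary log-log link: log(-log(1 - p))."""
    return math.log(-math.log(1.0 - p))

def inv_cloglog(eta):
    """Inverse link: 1 - exp(-exp(eta))."""
    return 1.0 - math.exp(-math.exp(eta))

def logit(p):
    return math.log(p / (1.0 - p))

# The two links nearly coincide for rare events and diverge otherwise
for p in (0.001, 0.01, 0.1, 0.5, 0.9):
    print(f"p={p:<6} cloglog={cloglog(p):8.4f}  logit={logit(p):8.4f}")
```

Unlike the logit, the cloglog curve is asymmetric around p = 0.5, so the two models can differ noticeably when events are common.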
Cloglog and Logit Model Compared



more agreement when modeling rare events

Extensions: Multilevel Data
  • what is multilevel data?
    • individuals are “nested” in a larger context:
      • children in families, kids in schools etc.

context 1

context 2

context 3

Multilevel Data
  • i.i.d. assumptions?
    • the outcomes for units in a given context could be associated
    • standard model would treat all outcomes (regardless of context) as independent
    • multilevel methods account for the within-cluster dependence
    • a general problem with binomial responses
      • we assume that trials are independent
      • this might not be realistic
      • non-independence will inflate the variance (overdispersion)
Multilevel Data
  • example (in book):
    • 40 universities as units of analysis
      • for each university we observe the number of graduates (n) and the number receiving post-doctoral fellowships (y)
      • we could compute proportions (MLEs)
      • some proportions would be “better” estimates as they would have higher precision or lower variance
      • example: the data y1/n1 = 2/5 and y2/n2 = 20/50 give identical estimates of p but variances of 0.048 and 0.0048 respectively
      • the 2nd estimate is more precise than the 1st
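The two variances quoted in the example follow from the binomial variance of a proportion, p(1 − p)/n. A quick Python check:

```python
def var_of_proportion(y, n):
    """Estimated variance of the MLE p = y/n under binomial sampling."""
    p = y / n
    return p * (1.0 - p) / n

v1 = var_of_proportion(2, 5)     # small unit
v2 = var_of_proportion(20, 50)   # unit ten times larger, same proportion

print(f"2/5:   p = {2 / 5:.2f}, variance = {v1:.4f}")
print(f"20/50: p = {20 / 50:.2f}, variance = {v2:.4f}")
```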
Multilevel Data
  • multilevel models allow for improved predictions of individual probabilities
    • MLE estimate is unaltered if it is precise
    • MLE estimate moved toward average if it is imprecise (shrinkage)
      • multilevel estimate of p would be a weighted average of the MLE and the average over all MLEs (weight (w) is based on the variance of each MLE and the variance over all the MLEs)
      • we are generally less interested in the p’s and more interested in the model parameters and variance components
Shrinkage Estimation
  • primitive approach
      • assume we have a set of estimates (MLEs)
    • our best estimate of the variance of each MLE is
      • this is the within variance (no pooling)
      • if this is large, then the MLE is a poor estimate
        • a better estimate might be the average of the MLEs in this case (pooling the estimates)
      • we can average the MLEs and estimate the between variance as
Shrinkage Estimation
  • primitive approach
    • we can then estimate a weight wi
    • a revised estimate of pi would take account of the precision to form a precision-weighted average
      • precision is a function of ni
      • more weight is given to more precise MLEs
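The slide's formulas for the variances and the weight are missing, so the following is only one plausible reading of the "primitive approach": the within variance is p(1 − p)/n, the between variance is the sample variance of the MLEs, and the weight is between / (between + within). The data are hypothetical and are not the university example from the book.

```python
# Hypothetical (events, trials) for four units; not data from the slides
data = [(2, 5), (20, 50), (30, 40), (5, 50)]

mles = [y / n for y, n in data]                               # unit-specific MLEs
within = [p * (1 - p) / n for p, (_, n) in zip(mles, data)]   # variance of each MLE

p_bar = sum(mles) / len(mles)                                 # pooled average
between = sum((p - p_bar) ** 2 for p in mles) / (len(mles) - 1)

# Precision weight: near 1 when the MLE is precise, smaller when it is noisy
weights = [between / (between + v) for v in within]

# Shrinkage estimate: weighted average of the MLE and the overall average
shrunk = [w * p + (1 - w) * p_bar for w, p in zip(weights, mles)]

for (y, n), p, w, s in zip(data, mles, weights, shrunk):
    print(f"{y:>2}/{n:<2}  mle={p:.3f}  w={w:.3f}  shrunk={s:.3f}")
```

The unit with 5 trials receives a smaller weight than the unit with 50 trials and the same proportion, so its estimate is pulled further toward the overall average.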

results from full Bayesian (multilevel) Analysis

Extension: Multilevel Models
  • assumptions
    • within-context and between-context variation in outcomes
    • individuals within the same context share the same “random error” specific to that context
    • models are hierarchical
      • individuals (level-1)
      • contexts (level-2)
Multilevel Models: Background
  • linear mixed model for continuous y

(multilevel, random coefficients, etc.)

    • level-1 model and level-2 sub-models (hierarchical)
Multilevel Models: Background
  • linear mixed model assumptions
    • level-1 and level-2 residuals
Multilevel Models: Background

composite residual

  • composite form

fixed effects

cross-level interaction

random effects (level-2)

Multilevel Models: Background
  • general form (linear mixed model)

variables associated with fixed coefficients

variables associated with random coefficients

Multilevel Models: Logit Models
  • binomial model (random effect)
  • assumptions
  • u increases or decreases the expected response for individual j in context i independently of x
  • all individuals in context i share the same value of u
  • also called a random intercept model
Multilevel Models
  • a hierarchical model:
      • z is a level-1 variable; x is a level-2 variable
      • random intercept varies among level-2 units
      • note: level-1 residual variance is fixed (why?)
Multilevel Models
  • a general expression
  • x are variables associated with “fixed” coefficients
  • z are variables associated with “random” coefficients
  • u is multivariate normal vector of level-2 residuals
  • mean of u is 0; covariance of u is
Multilevel Models
  • random effects vs. random coefficients
    • random effects u
    • random coefficients β + u
  • variance components
    • interested in level-2 variation in u
  • prediction
    • E(y) is not equal to E(y|u)
    • model based predictions need to consider random effects
Multilevel Models: Generalized Linear Mixed Models (GLMM)

Conditional Expectation

Marginal Expectation

requires numerical integration or simulation

Data Structure
  • multilevel data structure
    • requires a “context” id to identify individuals belonging to the same context
    • NLSY sibling data contains a “family id” (constructed by researcher)
    • data are unbalanced (we do not require clusters to be the same size)
    • small clusters will contribute less information to the estimation of variance components than larger clusters
    • it is OK to have clusters of size 1

(i.e., an individual is a context unto themselves)

    • clusters of size 1 contribute to the estimation of fixed effects but not to the estimation of variance components
Example: clustered data
  • siblings nested in families
    • y is 1st premarital birth for NLSY women
    • select sib-ships of size > 2
    • null model (random intercept):

xtlogit fpmbir, i(famid)


xtmelogit fpmbir || famid:

Example: clustered data

random intercept: xtlogit

Example: clustered data

random intercept: xtmelogit

Variance Component
  • add predictors (mostly level-2)
Variance Component
  • conditional variance in u is 2.107
  • proportionate reduction in error (PRE)
  • a 31% reduction in level-2 variance when level-2 predictors are accounted for
Random Effects
  • we can examine the distribution of random effects
Random Effects
  • we can examine the distribution of random effects
Random Effects Distribution
  • 90th percentile u90 = 1.338
  • 10th percentile u10 = 0.388
    • the risk for family at 90th percentile is

exp(1.338 – 0.388) = 2.586

times higher than for a family at the 10th percentile

    • even if families are compositionally identical on covariates, we can assess the hypothetical differential in risks
Growth Curve Models
  • growth models
    • individuals are level-2 units
    • repeated measures over time on individuals (level-1)
    • models imply that logits vary across individuals
      • intercept (conditional average logit) varies
      • slope (conditional average effect of time) varies
      • change is usually assumed to be linear
    • use GLMM
      • complications due to dimensionality
      • intercept and slope may co-vary (necessitating a more complex model) and more
Growth Curve Models
  • multilevel logit model for change over time
  • T is time (strictly increasing)
  • fixed and random coefficients (with covariates)

assume that u0 and u1 are bivariate normal

Multilevel Logit Models for Change
  • Example: Log odds of employment of black men in the U.S. 1982-1988 (NLSY)

(consider 5 years in this period)

    • time is coded 0, 1, 3, 4, 6
    • dependent variable is: not-working, not-in-school
    • unconditional growth (no covariates except T)
    • conditional growth (add covariates)
    • note: cross-level interactions implied by composite model
Fitting Multilevel Model for Change
  • programming
    • Stata (unconditional growth)
    • Stata (conditional growth)

xtmelogit y year || id: year, var cov(un)

xtmelogit y year south unem unemyr inc hs ||id: year, var cov(un)

Logits: Observed, Conditional, and Marginal

the log odds of idleness decreases with time and shows variation in level and change

Composite Residuals in a Growth Model
  • composite residual
  • composite residual variance
  • covariance of composite residual
  • covariance term is 0 (from either model)
    • results in simplified interpretation
    • easier estimation via variance components (default option)
  • significant variation in slopes and initial levels
  • other results:
      • log odds of idleness decrease over time (negative slope)
      • other covariates except county unemployment have significant effects on the odds of idleness
      • the main effects are interpreted as effects on the initial logits (at time 1, i.e. t = 0, the 1982 baseline)
      • interaction of time and unemployment rate captures the effect of county unemployment rate in 1982 on the change in the log odds of idleness
      • the positive effect implies that higher county unemployment tends to dampen change in odds
IRT Models
  • IRT models
    • Item Response Theory
      • models account for an individual-level random effect on a set of items (i.e., ability)
      • items are assumed to tap a single latent construct (aptitude on a specific subject)
      • item difficulty
        • test items are assumed to be ordered on a difficulty scale
          • easier → harder
          • expected patterns emerge whereby if a more difficult item is answered correctly the easier items are likely to have been answered correctly
IRT Models
  • IRT models
    • 1-parameter logistic (Rasch) model
      • pij individual i’s probability of a correct response on the jth item
      • θ individual i’s ability
      • b item j’s difficulty
    • properties
      • an individual’s ability parameter is invariant with respect to the item
      • the difficulty parameter is invariant with respect to individual’s ability
      • higher ability or lower item difficulty lead to a higher probability of a correct response
      • both ability and difficulty are measured on the same scale
  • item characteristics curve (item response curve)
    • depicts the probability of a correct response as a function of an examinee’s ability or trait level
    • curves are shifted rightward with increasing item difficulty
    • assume that item 3 is more difficult than item 2 and item 2 is more difficult than item 1
    • probability of a correct response decreases as the threshold θ = bj is crossed, reflecting increasing item difficulty
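The item characteristic curves described above can be sketched from the usual 1PL form, where the probability of a correct response is the inverse logit of θ − b. The formula itself is missing from the slide, and the item difficulties below are hypothetical.

```python
import math

def rasch_prob(theta, b):
    """Probability of a correct response given ability theta and difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# Hypothetical difficulties, ordered easier -> harder
difficulties = {"item 1": -1.0, "item 2": 0.0, "item 3": 1.0}

for theta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    row = "  ".join(f"{name}: {rasch_prob(theta, b):.3f}"
                    for name, b in difficulties.items())
    print(f"theta={theta:+.1f}  {row}")
```

At any fixed ability the probability falls as items get harder, and each curve passes through 0.5 where ability equals the item's difficulty.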
IRT Models: ICC (3 Items)

slopes of item characteristics curves are equal when ability = item difficulty

Estimation as GLMM
  • specification:
      • set up a person-item data structure
      • define x as a set of dummy variables
      • change signs on β to reflect “difficulty”
      • fit model without intercept to estimate all item difficulties
      • normalization is common
PL1 Estimation
  • Stata (data set up )


set memory 128m

infile junk y1-y5 f using LSAT.dat

drop if junk==11 | junk==13

expand f

drop f junk

gen cons = 1

collapse (sum) wt2=cons, by(y1-y5)

gen id = _n

sort id

reshape long y, i(id) j(item)

PL1 Estimation
  • Stata (model set up )

gen i1 = 0

gen i2 = 0

gen i3 = 0

gen i4 = 0

gen i5 = 0

replace i1 = 1 if item == 1

replace i2 = 1 if item == 2

replace i3 = 1 if item == 3

replace i4 = 1 if item == 4

replace i5 = 1 if item == 5


* 1PL

* constrain sd=1

cons 1 [id1]_cons = 1

gllamm y i1-i5, i(id) weight(wt) nocons family(binom) cons(1) link(logit) adapt

PL1 Estimation
  • Stata (output )
PL1 Estimation
  • Stata (parameter normalization)

* normalized solution

*[1 -- standard 1PL]

*[2 -- coefs sum to 0] [var = 1]


mata

bALL = st_matrix("e(b)")

b = -bALL[1,1..5]

mb = mean(b')

bs = b:-mb

("MML Estimates", "IRT parameters", "B-A Normalization")

(-b', b', bs')

end


PL1 Estimation
  • Stata (normalized solution)
IRT: Extensions

item discrimination parameters

  • 2-parameter logistic (2PL) model
IRT: Extensions
  • 2-parameter logistic (2PL) model
    • item discrimination parameters
      • reveal differences in item’s utility to distinguish different ability levels among examinees
        • high values denote items that are more useful in terms of separating examinees into different ability levels
        • low values denote items that are less useful in distinguishing examinees in terms of ability
        • ICCs corresponding to this model can intersect as they differ in location and slope
          • steeper slope of the ICC is associated with a better discriminating item
IRT: Extensions
  • 2-parameter logistic (2PL) model
IRT: Extensions
  • 2-parameter logistic (2PL) model
    • Stata (estimation)

eq id: i1 i2 i3 i4 i5

cons 1 [id1_1]i1 = 1

gllamm y i1-i5, i(id) weight(wt) nocons family(binom) link(logit) frload(1) eqs(id) cons(1) adapt

matrix list e(b)

*normalized solutions

*[1 -- standard 2PL]

mata

bALL = st_matrix("e(b)")

b = bALL[1,1..5]

c = bALL[1,6..10]

a = -b:/c

("MML Estimates-Dif", "IRT Parameters")

(b', a')

("MML Discrimination Parameters")

end



IRT: Extensions
  • 2-parameter logistic (2PL) model
    • Stata (estimation)

* Bock and Aitkin Normalization (p. 164 corrected)


mata

bALL = st_matrix("e(b)")

b = -bALL[1,1..5]

c = bALL[1,6..10]

lc = ln(c)

mb = mean(b')

mc = mean(lc')

bs = b:-mb

cs = exp(lc:-mc)

("B-A Normalization DIFFICULTY", "B-A Normalization DISCRIMINATION")

(bs', cs')

end


IRT: 2PL (2) Bock-Aitkin Normalization

item 3 has highest difficulty and greatest discrimination

Binary Response Models for Event Occurrence
  • discrete-time event-history models
    • purpose:
      • model the probability of an event occurring at some point in time
      • Pr(event at t | event has not yet occurred by t)
    • life table
      • events & trials
      • observe the number of events occurring to those who remain at risk as time passes
        • takes account of the changing composition of the sample as time passes
Life Table
  • observe
      • Rj number at risk in time interval j (R0 = n), where the number at risk in interval j is adjusted over time
      • Dj events in time interval j (D0 = 0)
      • Wj removed from risk (censored) in time interval j (W0 = 0)

(removed from risk due to other unrelated causes)

Life Table
  • other key quantities
      • discrete-time hazard (event probability in interval j)
      • surviving fraction (survivor function in interval j)
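The formulas for these quantities are missing from the slide. The sketch below assumes the simplest life-table conventions: the hazard in interval j is D_j / R_j, the risk set for the next interval is R_j − D_j − W_j, and the survivor function is the running product of (1 − hazard). The slide may use a different adjustment for censoring, and the counts are hypothetical.

```python
def life_table(n, events, censored):
    """Return (at_risk, hazard, survival) lists, one entry per interval."""
    at_risk, hazard, survival = [], [], []
    r, s = n, 1.0
    for d, w in zip(events, censored):
        h = d / r        # discrete-time hazard in this interval
        s *= 1.0 - h     # fraction surviving past this interval
        at_risk.append(r)
        hazard.append(h)
        survival.append(s)
        r -= d + w       # events and censored cases leave the risk set
    return at_risk, hazard, survival

# Hypothetical counts, for illustration only
at_risk, hazard, survival = life_table(100, events=[10, 9, 8], censored=[5, 4, 3])

for j, (r, h, s) in enumerate(zip(at_risk, hazard, survival), start=1):
    print(f"interval {j}: at risk={r:3d}  hazard={h:.4f}  survival={s:.4f}")
```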
Discrete-Time Hazard Models
  • statistical concepts
    • discrete random variable Ti (individual’s event or censoring time)
    • pdf of T (probability that individual i experiences event in period j)
    • cdf of T (probability that individual i experiences event in period j or earlier)
    • survivor function (probability that individual i survives past period j)
Discrete-Time Hazard Models
  • statistical concepts
    • discrete hazard
    • the conditional probability of event occurrence in interval j for individual i given that an event has not already occurred to that individual by interval j
Discrete-Time Hazard Models
  • equivalent expression using binary data
    • binary data dij = 1 if individual i experiences an event in interval j, 0 otherwise
    • use the sequence of binary values at each interval to form a history of the process for individual i up to the time the event occurs
  • discrete hazard
Discrete-Time Hazard Models
  • modeling (logit link)
  • modeling (complementary log-log link)
  • non-proportional effects
Data Structure
  • person-level data → person-period form
Data Structure
  • binary sequences
  • contributions to likelihood
      • contribution to log L for individual with event in period j
      • contribution to log L for individual censored in period j
      • combine
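The binary sequences and their likelihood contributions can be sketched as follows. The expressions on the slide are lost; this assumes the standard result that a person observed for j periods contributes the sum over those periods of d·log(h) + (1 − d)·log(1 − h), with d = 1 only in a final period where the event occurs. The hazards are hypothetical.

```python
import math

def person_period(last_period, event_observed):
    """Binary sequence for one person: 0 in every period except the last,
    which is 1 if the event occurred there and 0 if the person was censored."""
    d = [0] * last_period
    if event_observed:
        d[-1] = 1
    return d

def log_lik_contribution(d, hazards):
    """Sum over observed periods of d*log(h) + (1 - d)*log(1 - h)."""
    return sum(dj * math.log(h) + (1 - dj) * math.log(1.0 - h)
               for dj, h in zip(d, hazards))

hazards = [0.10, 0.20, 0.30]   # hypothetical period-specific hazards

event_at_3 = person_period(3, True)       # event in period 3
censored_at_2 = person_period(2, False)   # censored in period 2

print(event_at_3, round(log_lik_contribution(event_at_3, hazards), 4))
print(censored_at_2, round(log_lik_contribution(censored_at_2, hazards), 4))
```

Stacking these sequences across individuals gives the person-period file, on which an ordinary binary logit or cloglog model can be fit.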
  • dropping out of Ph.D. programs (large US university)
    • data: 6,964 individual histories spanning 20 years
      • dropout cannot be distinguished from other types of leaving (transfer to other program etc.)
      • model the logit hazard of leaving the originally-entered program as a function of the following:
        • time in program (the time-dependent baseline hazard)
        • female and percent female in program
        • race/ethnicity (black, Hispanic, Asian)
        • marital status
        • GRE score
      • also add a program-specific random effect (multilevel)


set memory 512m

infile CID devnt I1-I5 female pctfem black hisp asian married gre using DT28432.dat

logit devnt I1-I5, nocons or

est store m1

logit devnt I1-I5 female pctfem, nocons or

est store m2

logit devnt I1-I5 female pctfem black hisp asian , nocons or

est store m3

logit devnt I1-I5 female pctfem black hisp asian married, nocons or

est store m4

logit devnt I1-I5 female pctfem black hisp asian married gre , nocons or