# Categorical Data Analysis

### Categorical Data Analysis

Week 2

Binary Response Models
• binary and binomial responses
• binary: y assumes values of 0 or 1
• binomial: y is the number of “successes” in n “trials”
• distributions
• Bernoulli: $f(y) = p^y (1-p)^{1-y}$, $y \in \{0, 1\}$
• Binomial: $f(y) = \binom{n}{y} p^y (1-p)^{n-y}$, $y = 0, 1, \ldots, n$
Transformational Approach
• linear probability model: $p_i = \mathbf{x}_i'\boldsymbol\beta$
• use grouped data (events/trials): $\hat{p}_i = y_i / n_i$
• linear predictor: $\eta_i = \mathbf{x}_i'\boldsymbol\beta$
• problem: predictions can fall outside [0, 1]
The Logit Model
• logit transformation: $\operatorname{logit}(p) = \log\!\left(\frac{p}{1-p}\right) = \mathbf{x}'\boldsymbol\beta$
• inverse logit: $p = \frac{\exp(\mathbf{x}'\boldsymbol\beta)}{1 + \exp(\mathbf{x}'\boldsymbol\beta)}$
• ensures that p is in [0, 1] for all values of x and β
The Logit Model
• odds and odds ratios are the key to understanding and interpreting this model
• the log odds transformation is a “stretching” transformation to map probabilities to the real line
The Logit Transformation
• properties of logit: it stretches probabilities in (0, 1) onto the whole real line, and the model is linear in the parameters on the logit scale
Odds, Odds Ratios, and Relative Risk
• the odds of “success” is the ratio: $\text{odds} = \dfrac{p}{1-p}$
• consider two groups with success probabilities $p_1$ and $p_2$
• the odds ratio (OR) is a measure of the odds of success in group 1 relative to group 2: $\text{OR} = \dfrac{p_1/(1-p_1)}{p_2/(1-p_2)}$
Odds Ratio
• 2 × 2 table of x (rows: 0, 1) by y (columns: 0, 1) with cell counts $n_{xy}$
• the OR is the cross-product ratio (comparing the x = 1 group to the x = 0 group): $\text{OR} = \dfrac{n_{11}\,n_{00}}{n_{01}\,n_{10}}$
• example: the odds of y = 1 are 4 times higher when x = 1 than when x = 0

Odds Ratio
• equivalent interpretation
• the odds of y = 1 are 0.225 times as large when x = 0 as when x = 1
• equivalently, the odds are 1 − 0.225 = 0.775 (77.5%) lower when x = 0 than when x = 1
Log Odds Ratios
• consider the model: $\operatorname{logit}(p) = \beta_0 + \beta_1 D$
• D is a dummy variable coded 1 for group 1 and 0 otherwise
• group 1: $\operatorname{logit}(p_1) = \beta_0 + \beta_1$
• group 2: $\operatorname{logit}(p_2) = \beta_0$
• LOR: $\log(\text{OR}) = \beta_1$; OR: $\exp(\beta_1)$
Relative Risk
• similar to the OR, but works with rates (probabilities) rather than odds
• the relative risk or rate ratio (RR) is the rate in group 1 relative to group 2: $\text{RR} = p_1 / p_2$
• $\text{OR} \to \text{RR}$ as $p_1, p_2 \to 0$, i.e., when the outcome is rare (see the numeric check below)
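A quick numeric check (an illustration of the limit, not from the slides): with $p_1 = 0.01$ and $p_2 = 0.005$,

$$\text{RR} = \frac{0.01}{0.005} = 2, \qquad \text{OR} = \frac{0.01/0.99}{0.005/0.995} \approx 2.01,$$

so for rare outcomes the two measures nearly coincide.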
Tutorial: odds and odds ratios
• consider the following grouped data (educ × psex, with cell frequency f)

```stata
clear
input educ psex f
0 0 873
0 1 1190
1 0 533
1 1 1208
end
label define edlev 0 "HS or less" 1 "Col or more"
label val educ edlev
label var educ education
```

Tutorial: odds and odds ratios
• compute the odds with `tabodds`; verify by hand (worked check below)

```stata
tabodds psex educ [fw=f]
```
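Hand check from the frequencies entered above: the odds of psex = 1 are 1190/873 ≈ 1.363 for “HS or less” and 1208/533 ≈ 2.266 for “Col or more”.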

Tutorial: odds and odds ratios
• compute odds ratios; verify by hand (worked check below)

```stata
tabodds psex educ [fw=f], or
```
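By hand: OR = 2.266 / 1.363 ≈ 1.66, equivalently the cross-product ratio (1208 × 873) / (1190 × 533) ≈ 1.66, so the odds of psex = 1 are about 66% higher in the college group.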

Tutorial: odds and odds ratios
• stat facts: variances of functions
• used in statistical significance tests and in forming confidence intervals
• basic rule for variances of linear transformations: if $g(x) = a + bx$ is a linear function of x, then $\operatorname{Var}[g(x)] = b^2 \operatorname{Var}(x)$
• this is a trivial case of the delta method applied to a single variable
• the delta method for the variance of a nonlinear function g(x) of a single variable is $\operatorname{Var}[g(x)] \approx [g'(x)]^2 \operatorname{Var}(x)$
Tutorial: odds and odds ratios
• stat facts:
• variances of odds and odds ratios
• we can use the delta method to find the variance in the odds and the odds ratios
• from the asymptotic (large sample theory) perspective it is best to work with log odds and log odds ratios
• the log odds ratio converges to normality at a faster rate than the odds ratio, so statistical tests may be more appropriate on log odds ratios (nonlinear functions of p)
Tutorial: odds and odds ratios
• stat facts: the log odds ratio is the difference in the log odds for the two groups
• the groups are independent, and the variance of a difference of independent quantities is the sum of the variances (expressions below)
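Putting these facts together gives the standard large-sample variance estimates (a textbook result consistent with the slides):

$$\widehat{\operatorname{Var}}(\log \text{odds}) = \frac{1}{y} + \frac{1}{n - y}, \qquad \widehat{\operatorname{Var}}(\log \text{OR}) = \frac{1}{n_{11}} + \frac{1}{n_{10}} + \frac{1}{n_{01}} + \frac{1}{n_{00}}.$$

For the tutorial data, $\widehat{\operatorname{Var}}(\log \text{OR}) = 1/1208 + 1/533 + 1/1190 + 1/873 \approx 0.0047$, so $\text{SE} \approx 0.068$ and $z = \log(1.66)/0.068 \approx 7.4$.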
Tutorial: odds and odds ratios
• data structures: grouped or individual level
• note:
• use frequency weights to handle grouped data
• or “expand” the data by the frequency weights, producing individual-level data
• model results from either data structure are the same
• expand the data and verify (sketch below)

```stata
expand f
```
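After expanding, the unweighted fit should reproduce the frequency-weighted one; a minimal check (the `, or` option reports odds ratios):

```stata
logit psex educ, or    // same estimates as: logit psex educ [fw=f], or
```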

Tutorial: odds and odds ratios
• statistical modeling
• logit model via `glm` and via `logit` (equivalent fits):

```stata
glm psex educ [fw=f], f(b) eform
logit psex educ [fw=f], or
```

Tutorial: odds and odds ratios
• statistical modeling (#1)
• logit model (glm):
Tutorial: odds and odds ratios
• statistical modeling (#2)
• some ideas from alternative normalizations
• what parameters will this model produce?
• what is the interpretation of the “constant”?

```stata
gen cons = 1
glm psex cons educ [fw=f], nocons f(b) eform
```

Tutorial: odds and odds ratios
• statistical modeling (#2)
Tutorial: odds and odds ratios
• statistical modeling (#3)
• what parameters does this model produce?
• how do you interpret them?

```stata
gen lowed = educ == 0
gen hied = educ == 1
glm psex lowed hied [fw=f], nocons f(b) eform
```

Tutorial: odds and odds ratios
• statistical modeling (#3)

• are these odds ratios?

Tutorial: prediction
• fitted probabilities (after the most recent model)

```stata
predict p, mu
tab educ [fw=f], sum(p) nostandard nofreq
```

Probit Model
• the inverse probit is the CDF of a standard normal variable: $p = \Phi(\mathbf{x}'\boldsymbol\beta)$, so $\Phi^{-1}(p) = \mathbf{x}'\boldsymbol\beta$
Interpretation
• probit coefficients
• interpreted on the scale of a standard normal latent variable (no log odds-ratio interpretation)
• “scaled” versions of logit coefficients
• probit models
• more common in certain disciplines (economics)
• analogy with linear regression (normal latent variable)
• more easily extended to multivariate distributions
Example: Grouped Data
• Swedish mortality data revisited

logit model

probit model

Programming
• Stata: generalized linear model (glm)

```stata
glm y A2 A3 P2, family(binomial n) link(probit)
glm y A2 A3 P2, family(binomial n) link(logit)
```

• the idea of glm is to make the model linear in the link
• old days: iteratively reweighted least squares (IRLS)
• now: Fisher scoring, Newton-Raphson
• both approaches yield MLEs
Generalized Linear Models
• applies to a broad class of models
• iterative fitting (repeated updating), except for the linear model
• update parameters, weights W, and predicted values m
• models differ in terms of W and m and in assumptions about the distribution of y
• common distributions for y include: normal, binomial, and Poisson
• common links include: identity, logit, probit, and log
Latent Variable Approach
• example: insect mortality
• suppose a researcher exposes insects to dosage levels (u) of an insecticide and observes whether the “subject” lives or dies at that dosage.
• the response is expected to depend on the insect’s tolerance (c) to that dosage level.
• the insect dies if u > c and survives if u < c
• tolerance is not observed (survival is observed)
Latent Variables
• u and c are continuous latent variables
• examples:
• women’s employment: u is the market wage and c is the reservation wage
• migration: u is the benefit of moving and c is the cost of moving
• the observed outcome y = 1 or y = 0 reveals the individual’s preference, which is assumed to maximize a rational individual’s utility function
Latent Variables
• assume linear utility and criterion functions, e.g. $u_i = \mathbf{x}_i'\boldsymbol\beta_u + \varepsilon_{ui}$ and $c_i = \mathbf{x}_i'\boldsymbol\beta_c + \varepsilon_{ci}$
• over-parameterization = identification problem
• we can identify differences in components but not the separate components
Latent Variables
• constraints: model the difference $y_i^* = u_i - c_i = \mathbf{x}_i'\boldsymbol\beta + \varepsilon_i$, with $\boldsymbol\beta = \boldsymbol\beta_u - \boldsymbol\beta_c$ and $\varepsilon_i = \varepsilon_{ui} - \varepsilon_{ci}$
• then: $\Pr(y_i = 1) = \Pr(y_i^* > 0) = F(\mathbf{x}_i'\boldsymbol\beta)$
• where F(·) is the CDF of ε
Latent Variables and Standardization
• need to standardize the mean and variance of ε
• binary dependent variables lack an inherent scale
• the magnitude of β is defined only relative to the mean and variance of ε, which are unknown
• redefine ε to a common standard: $\varepsilon^* = (\varepsilon - a)/b$
• where a and b are two chosen constants
Standardization for Logit and Probit Models
• standardization implies $\Pr(y = 1) = F^*\!\left(\dfrac{\mathbf{x}'\boldsymbol\beta - a}{b}\right)$
• F*(·) is the CDF of ε*
• location a and scale b need to be fixed
• setting a and b so that ε* follows a standard distribution (next slide)
Standardization for Logit and Probit Models
• the distribution of ε is standardized
• standard normal → probit
• standard logistic → logit
• both distributions have a mean of 0
• variances differ: 1 for the standard normal, $\pi^2/3$ for the standard logistic
Extending the Latent Variable Approach
• observed y is a dichotomous (binary) 0/1 variable
• continuous latent variable: linear predictor plus residual, $y_i^* = \mathbf{x}_i'\boldsymbol\beta + \varepsilon_i$
• observed outcome: $y_i = 1$ if $y_i^* > 0$ and $y_i = 0$ otherwise
Notation
• conditional means of the latent variable are obtained from the index function: $\eta_i = \mathbf{x}_i'\boldsymbol\beta$
• obtain probabilities from the inverse link functions
• logit model: $p_i = \dfrac{\exp(\eta_i)}{1 + \exp(\eta_i)}$
• probit model: $p_i = \Phi(\eta_i)$

ML
• likelihood function: $L(\boldsymbol\beta) = \prod_i p_i^{y_i} (1 - p_i)^{n_i - y_i}$, where $n_i = 1$ if the data are binary
• log-likelihood function: $\log L(\boldsymbol\beta) = \sum_i \left[\, y_i \log p_i + (n_i - y_i) \log(1 - p_i) \,\right]$
Assessing Models
• definitions:
• $L_0$: null model (intercept only)
• $L_f$: saturated model (a parameter for each cell)
• $L_c$: current model
• grouped data (events/trials)
• deviance (likelihood ratio statistic): $D = 2\,(\log L_f - \log L_c)$
Deviance
• grouped data: the deviance compares the current model to the saturated model
• if cell sizes are reasonably large, the deviance is distributed as chi-square
• individual-level data: $L_f = 1$ and $\log L_f = 0$, so $D = -2 \log L_c$
• in the individual-level case the deviance is not a “fit” statistic
Deviance
• the deviance is like a residual sum of squares
• larger values indicate poorer models
• larger models have smaller deviance
• let $D_1$ be the deviance of the more constrained model (Model 1) and $D_2$ the deviance of the less constrained model (Model 2)
• assume that Model 1 is a constrained version of Model 2, so $D_1 \ge D_2$
Difference in Deviance
• evaluate competing “nested” models using the likelihood ratio statistic $D_1 - D_2$, distributed chi-square with df equal to the difference in the number of parameters
• the model chi-square is a special case
• SAS, Stata, R, etc. report different statistics
Other Fit Statistics
• BIC & AIC (useful for non-nested models)
• basic idea of an information criterion (IC): penalize log L for the number of parameters and/or the size of the sample: $\text{IC} = -2 \log L + 2\, s\, df_m$
• AIC: s = 1
• BIC: s = ½ log n (n = sample size)
• $df_m$ is the number of model parameters
Hypothesis Tests/Inference
• single parameter: MLEs are asymptotically normal → z-test, $z = \hat\beta / \widehat{\text{SE}}(\hat\beta)$
• multi-parameter:
• likelihood ratio tests (after fitting)
• Wald tests (test constraints from the current model)
Hypothesis Tests/Inference
• Wald test (tests a vector of restrictions): e.g., that a set of r parameters are all equal to 0, or that they satisfy linear restrictions $\mathbf{R}\boldsymbol\beta = \mathbf{q}$, where R is the restriction matrix, q the constraint vector, and β the parameter subset
• the statistic $W = (\mathbf{R}\hat{\boldsymbol\beta} - \mathbf{q})'\,[\mathbf{R}\,\widehat{\operatorname{Var}}(\hat{\boldsymbol\beta})\,\mathbf{R}']^{-1}\,(\mathbf{R}\hat{\boldsymbol\beta} - \mathbf{q})$ is distributed chi-square with r df

Interpreting Parameters
• odds ratios: consider the model $\operatorname{logit}(p_i) = \beta_0 + \beta_1 x_i + \beta_2 d_i$, where x is a continuous predictor and d is a dummy variable
• suppose that d denotes sex and x denotes income, and the problem concerns voting, where y* is the propensity to vote
• results: $\operatorname{logit}(p_i) = -1.92 + 0.012 x_i + 0.67 d_i$
Interpreting Parameters
• for d (a dummy variable coded 1 for female) the odds ratio is straightforward: $\exp(0.67) \approx 1.95$
• holding income constant, women’s odds of voting are nearly twice those of men
Interpreting Parameters
• for x (a continuous variable for income in thousands of dollars) the odds ratio is a multiplicative effect
• increasing income by 1 unit (\$1,000) multiplies the odds by $\exp(0.012) \approx 1.012$
• increasing income by c units (c × \$1,000) multiplies the odds by $\exp(0.012\,c)$
Interpreting Parameters
• if income is increased by \$10,000, the odds of voting increase by about 13%: $\exp(0.012 \times 10) \approx 1.13$
• a note on percent change in odds:
• if the estimate of β > 0, the percent increase in odds for a unit change in x is $100\,(e^{\beta} - 1)$
• if the estimate of β < 0, the percent decrease in odds for a unit change in x is $100\,(1 - e^{\beta})$
Marginal Effects
• marginal effect: the effect of a change in x on the change in probability: $\dfrac{\partial p}{\partial x} = f(\mathbf{x}'\boldsymbol\beta)\,\beta$
• f(·) is the pdf corresponding to the CDF F(·)
• often we evaluate f(·) at the mean of x (a Stata sketch follows below)
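A minimal sketch of computing marginal effects with Stata’s built-in `margins` command (the variable names follow the voting example and are assumptions; `margeff`, used later in the deck, is an older user-written alternative):

```stata
* marginal effect of income on Pr(vote), holding covariates at their means
logit vote income female
margins, dydx(income) atmeans

* or the average marginal effect, averaged over the sample
margins, dydx(income)
```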
Marginal Effect of a Change in a Dummy Variable
• if x is a continuous variable and z is a dummy variable with coefficient γ, the marginal effect of a change in z from 0 to 1 is the difference $F(\mathbf{x}'\boldsymbol\beta + \gamma) - F(\mathbf{x}'\boldsymbol\beta)$
Example
• logit models for high school graduation
• odds ratios (constant is baseline odds)
LR Test
• Model 3 vs. 2
Wald Test
• test equality of the parental education effects

```stata
logit hsg blk hsp female nonint inc nsibs mhs mcol fhs fcol wtest
test mhs=fhs
test mcol=fcol
```

• cannot reject H₀ of equal parental education effects on HS graduation

Basic Estimation Commands (Stata)

```stata
* model 0 - null model
qui logit hsg
est store m0
* model 1 - race, sex, family structure
qui logit hsg blk hsp female nonint
est store m1
* model 1a - race X family structure interactions
qui xi: logit hsg blk hsp female nonint i.nonint*i.blk i.nonint*i.hsp
est store m1a
lrtest m1 m1a
* model 2 - SES
qui xi: logit hsg blk hsp female nonint inc nsibs mhs mcol fhs fcol
est store m2
* model 3 - Indiv
qui xi: logit hsg blk hsp female nonint inc nsibs mhs mcol fhs fcol wtest
est store m3
lrtest m2 m3
```

Fit Statistics etc.

```stata
* some 'hand' calculations with saved results
scalar ll = e(ll)
scalar npar = e(df_m)+1
scalar nobs = e(N)
scalar AIC = -2*ll + 2*npar
scalar BIC = -2*ll + log(nobs)*npar
scalar list AIC
scalar list BIC
* or use the automated fitstat routine (user-written)
fitstat
* output as a table (estout1 is user-written)
estout1 m0 m1 m2 m3 using modF07, replace star stfmt(%9.2f %9.0f %9.0f) ///
    stats(ll N df_m) eform
```

Generate Income Quartiles

```stata
* quartiles for income distribution
summarize adjinc, detail                      // leaves r(p25), r(p50), r(p75)
gen incQ1 = adjinc <  r(p25)
gen incQ2 = adjinc >= r(p25) & adjinc < r(p50)
gen incQ3 = adjinc >= r(p50) & adjinc < r(p75)
gen incQ4 = adjinc >= r(p75)
gen incQ = 1 if incQ1==1
replace incQ = 2 if incQ2==1
replace incQ = 3 if incQ3==1
replace incQ = 4 if incQ4==1
tab incQ
```

Fit Model for Each Quartile
• calculate predictions

```stata
* look at marginal effects of test score on graduation by selected groups
* (1) model (income quartiles)
* margeff and prgen are user-written (SPost) commands
local i = 1
while `i' < 5 {
    logit hsg blk female mhs nonint nsibs urban so wtest if incQ ==`i'
    margeff
    cap drop wm*
    cap drop bm*
    prgen wtest, x(blk=0 female=0 mhs=1 nonint=0) gen(wmi) from(-3) to(3)
    prgen wtest, x(blk=0 female=0 mhs=1 nonint=1) gen(wmn) from(-3) to(3)
    label var wmip1 "white/intact"
    label var wmnp1 "white/nonintact"
    prgen wtest, x(blk=1 female=0 mhs=1 nonint=0) gen(bmi) from(-3) to(3)
    prgen wtest, x(blk=1 female=0 mhs=1 nonint=1) gen(bmn) from(-3) to(3)
    label var bmip1 "black/intact"
    label var bmnp1 "black/nonintact"
    * graph the four predicted-probability curves for this quartile
    set scheme s2mono
    twoway (line wmip1 wmix, sort xtitle("Test Score") ytitle("Pr(y=1)")) ///
        (line wmnp1 wmix, sort) (line bmip1 wmix, sort) (line bmnp1 wmix, sort), ///
        subtitle("Marginal Effect of Test Score on High School Graduation" ///
        "Income Quartile `i'" ) saving(wtgrph`i', replace)
    graph export wtgrph`i'.eps, as(eps) replace
    local i = `i' + 1
}
```

Fitted Probabilities

```stata
logit hsg blk female mhs nonint inc nsibs urban so wtest
prtab nonint blk female    // prtab is a user-written (SPost) command
```

Fitted Probabilities
• predicted values
• evaluate fitted probabilities at the sample mean values of x (or other fixed quantities)
• averaging fitted probabilities over subgroup-specific models will produce marginal probabilities
Alternative Probability Model
• complementary log-log (cloglog or CLL)
• standard extreme-value distribution for u: $F(u) = 1 - \exp[-\exp(u)]$
• cloglog model: $\log[-\log(1 - p)] = \mathbf{x}'\boldsymbol\beta$, equivalently $p = 1 - \exp[-\exp(\mathbf{x}'\boldsymbol\beta)]$
Extreme-Value Distribution
• properties
• mean of u involves Euler’s constant: $\gamma \approx 0.5772$
• variance of u: $\pi^2 / 6$
• the difference of two independent extreme-value variables yields a logistic variable
CLL Model
• no “practical” differences from the logit and probit models in many applications
• often suited to survival data and similar applications
• interpretation of coefficients: exp(β) is a relative risk or hazard ratio, not an OR
• glm: binomial distribution for y with a cloglog link
• cloglog: use the cloglog command directly (sketch below)
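A minimal sketch of the two equivalent routes (the generic names y, x1, x2 are placeholders):

```stata
* cloglog fit via glm and via the dedicated command; eform reports exp(b)
glm y x1 x2, family(binomial) link(cloglog) eform
cloglog y x1 x2, eform
```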
Cloglog and Logit Model Compared
• logit vs. cloglog estimates: more agreement when modeling rare events

Extensions: Multilevel Data
• what is multilevel data?
• individuals are “nested” in a larger context:
• children in families, kids in schools, etc.
• [diagram: individuals grouped into context 1, context 2, context 3]

Multilevel Data
• i.i.d. assumptions?
• the outcomes for units in a given context could be associated
• standard model would treat all outcomes (regardless of context) as independent
• multilevel methods account for the within-cluster dependence
• a general problem with binomial responses
• we assume that trials are independent
• this might not be realistic
• non-independence will inflate the variance (overdispersion)
Multilevel Data
• example (in book):
• 40 universities as units of analysis
• for each university we observe the number of graduates (n) and the number receiving post-doctoral fellowships (y)
• we could compute proportions (MLEs): $\hat{p} = y/n$, with variance $\hat{p}(1 - \hat{p})/n$
• some proportions would be “better” estimates, as they would have higher precision or lower variance
• example: the data $y_1/n_1 = 2/5$ and $y_2/n_2 = 20/50$ give identical estimates of p (0.4) but variances of 0.048 and 0.0048, respectively
• the 2nd estimate is more precise than the 1st
Multilevel Data
• multilevel models allow for improved predictions of individual probabilities
• MLE estimate is unaltered if it is precise
• MLE estimate moved toward average if it is imprecise (shrinkage)
• multilevel estimate of p would be a weighted average of the MLE and the average over all MLEs (weight (w) is based on the variance of each MLE and the variance over all the MLEs)
• we are generally less interested in the p’s and more interested in the model parameters and variance components
Shrinkage Estimation
• primitive approach
• assume we have a set of estimates (MLEs) $\hat{p}_i = y_i / n_i$
• our best estimate of the variance of each MLE is $\hat{p}_i (1 - \hat{p}_i) / n_i$
• this is the within variance (no pooling)
• if this is large, then the MLE is a poor estimate
• a better estimate might be the average of the MLEs in this case (pooling the estimates)
• we can average the MLEs to get $\bar{p}$ and estimate the between variance as the variance of the $\hat{p}_i$ around $\bar{p}$
Shrinkage Estimation
• primitive approach
• we can then estimate a weight $w_i = \dfrac{\text{between}}{\text{between} + \text{within}_i}$
• a revised estimate of $p_i$ takes account of the precision to form a precision-weighted average: $\tilde{p}_i = w_i\, \hat{p}_i + (1 - w_i)\, \bar{p}$
• precision is a function of $n_i$: more weight is given to more precise MLEs (a Stata sketch follows)
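A rough Stata sketch of this primitive shrinkage calculation (assumes grouped data in memory with an event count y and a trial count n; the variable names are mine):

```stata
* precision-weighted average of each MLE and the grand mean
gen phat  = y/n
gen vwith = phat*(1-phat)/n         // within variance of each MLE
quietly summarize phat
scalar pbar  = r(mean)              // average of the MLEs
scalar vbetw = r(Var)               // crude between variance
gen w = vbetw/(vbetw + vwith)       // precise MLEs get w near 1
gen pshrunk = w*phat + (1-w)*pbar   // shrunken (revised) estimate
```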
Shrinkage
• [figure: MLEs vs. shrunken estimates, results from a full Bayesian (multilevel) analysis]

Extension: Multilevel Models
• assumptions
• within-context and between-context variation in outcomes
• individuals within the same context share the same “random error” specific to that context
• models are hierarchical
• individuals (level-1)
• contexts (level-2)
Multilevel Models: Background
• linear mixed model for continuous y (multilevel, random coefficients, etc.)
• level-1 model and level-2 sub-models (hierarchical); a standard formulation follows below
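A standard two-level formulation with one level-1 predictor x and one level-2 predictor w (the notation is mine, chosen to match the composite form on the next slide):

$$\begin{aligned} \text{level 1:}\quad & y_{ij} = \beta_{0j} + \beta_{1j} x_{ij} + \varepsilon_{ij} \\ \text{level 2:}\quad & \beta_{0j} = \gamma_{00} + \gamma_{01} w_j + u_{0j} \\ & \beta_{1j} = \gamma_{10} + \gamma_{11} w_j + u_{1j} \end{aligned}$$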
Multilevel Models: Background
• linear mixed model assumptions
• level-1 residuals $\varepsilon_{ij} \sim N(0, \sigma^2)$; level-2 residuals $(u_{0j}, u_{1j})$ multivariate normal with mean 0, independent of $\varepsilon_{ij}$
Multilevel Models: Background
• composite form (substitute the level-2 sub-models into level 1):

$$y_{ij} = \underbrace{\gamma_{00} + \gamma_{10} x_{ij} + \gamma_{01} w_j}_{\text{fixed effects}} + \underbrace{\gamma_{11} w_j x_{ij}}_{\text{cross-level interaction}} + \underbrace{u_{0j} + u_{1j} x_{ij}}_{\text{random effects (level-2)}} + \varepsilon_{ij}$$

• the random effects plus the level-1 error form the composite residual
Multilevel Models: Background
• general form (linear mixed model): $\mathbf{y} = \mathbf{X}\boldsymbol\beta + \mathbf{Z}\mathbf{u} + \boldsymbol\varepsilon$
• X: variables associated with fixed coefficients
• Z: variables associated with random coefficients

Multilevel Models: Logit Models
• binomial model with a context-level random effect (a standard specification follows below)
• assumptions
• u increases or decreases the expected response for individual j in context i independently of x
• all individuals in context i share the same value of u
• also called a random intercept model
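A standard random-intercept logit specification consistent with these assumptions (notation mine; i indexes contexts, j individuals):

$$\operatorname{logit}(p_{ij}) = \mathbf{x}_{ij}'\boldsymbol\beta + u_i, \qquad u_i \sim N(0, \sigma_u^2).$$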
Multilevel Models
• a hierarchical model:
• z is a level-1 variable; x is a level-2 variable
• the random intercept varies among level-2 units
• note: the level-1 residual variance is fixed (why? the latent logistic error variance is set to $\pi^2/3$ by the scale normalization)
Multilevel Models
• a general expression: $\operatorname{logit}(\mathbf{p}) = \mathbf{X}\boldsymbol\beta + \mathbf{Z}\mathbf{u}$
• X: variables associated with “fixed” coefficients
• Z: variables associated with “random” coefficients
• u is a multivariate normal vector of level-2 residuals
• mean of u is 0; covariance of u is $\boldsymbol\Sigma_u$
Multilevel Models
• random effects vs. random coefficients
• random effects u
• random coefficients β + u
• variance components
• interested in level-2 variation in u
• prediction
• E(y) is not equal to E(y|u)
• model based predictions need to consider random effects
Multilevel Models: Generalized Linear Mixed Models (GLMM)
• conditional expectation: $E(y_{ij} \mid u_i) = g^{-1}(\mathbf{x}_{ij}'\boldsymbol\beta + u_i)$
• marginal expectation: $E(y_{ij}) = \int g^{-1}(\mathbf{x}_{ij}'\boldsymbol\beta + u)\, f(u)\, du$
• the marginal expectation requires numerical integration or simulation

Data Structure
• multilevel data structure
• requires a “context” id to identify individuals belonging to the same context
• NLSY sibling data contains a “family id” (constructed by researcher)
• data are unbalanced (we do not require clusters to be the same size)
• small clusters will contribute less information to the estimation of variance components than larger clusters
• it is OK to have clusters of size 1

(i.e., an individual is a context unto themselves)

• clusters of size 1 contribute to the estimation of fixed effects but not to the estimation of variance components
Example: clustered data
• siblings nested in families
• y is 1st premarital birth for NLSY women
• select sib-ships of size > 2
• null model (random intercept):

```stata
xtlogit fpmbir, i(famid)
```

or

```stata
xtmelogit fpmbir || famid:
```

Example: clustered data

random intercept: xtlogit

Example: clustered data

random intercept: xtmelogit

Variance Component
• the conditional variance of u is 2.107
• proportionate reduction in error (PRE): $\text{PRE} = \dfrac{\sigma^2_{u,\text{null}} - \sigma^2_{u,\text{cond}}}{\sigma^2_{u,\text{null}}}$
• a 31% reduction in level-2 variance when level-2 predictors are accounted for
Random Effects
• we can examine the distribution of random effects
Random Effects Distribution
• 90th percentile: $u_{90} = 1.338$
• 10th percentile: $u_{10} = 0.388$
• the risk for a family at the 90th percentile is exp(1.338 − 0.388) = 2.586 times higher than for a family at the 10th percentile
• even if families are compositionally identical on the covariates, we can assess this hypothetical differential in risk
Growth Curve Models
• growth models
• individuals are level-2 units
• repeated measures over time on individuals (level-1)
• models imply that logits vary across individuals
• intercept (conditional average logit) varies
• slope (conditional average effect of time) varies
• change is usually assumed to be linear
• use GLMM
• complications due to dimensionality
• intercept and slope may co-vary (necessitating a more complex model) and more
Growth Curve Models
• multilevel logit model for change over time (a sketch follows below)
• T is time (strictly increasing)
• fixed and random coefficients (with covariates)
• assume that $u_0$ and $u_1$ are bivariate normal
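A standard unconditional growth specification consistent with these assumptions (notation mine; t indexes occasions within individual i):

$$\operatorname{logit}(p_{ti}) = (\beta_0 + u_{0i}) + (\beta_1 + u_{1i})\, T_{ti}, \qquad (u_{0i}, u_{1i}) \sim N(\mathbf{0}, \boldsymbol\Sigma_u).$$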

Multilevel Logit Models for Change
• example: log odds of employment of black men in the U.S., 1982-1988 (NLSY), considering 5 years in this period
• time is coded 0, 1, 3, 4, 6
• the dependent variable is idleness: not working, not in school
• unconditional growth (no covariates except T)
• note: cross-level interactions are implied by the composite model
Fitting Multilevel Model for Change
• programming in Stata:

```stata
* unconditional growth
xtmelogit y year || id: year, var cov(un)
* conditional growth
xtmelogit y year south unem unemyr inc hs || id: year, var cov(un)
```

Logits: Observed, Conditional, and Marginal

the log odds of idleness decreases with time and shows variation in level and change

Composite Residuals in a Growth Model
• composite residual: $u_{0i} + u_{1i} T_{ti}$
• composite residual variance: $\sigma_0^2 + 2\sigma_{01} T + \sigma_1^2 T^2$
• covariance of composite residuals at times T and T′: $\sigma_0^2 + \sigma_{01}(T + T') + \sigma_1^2\, T\, T'$
Model
• the covariance term is 0 (from either model)
• results in a simplified interpretation
• easier estimation via variance components (the default option)
• significant variation in slopes and initial levels
• other results:
• the log odds of idleness decrease over time (negative slope)
• the other covariates, except county unemployment, have significant effects on the odds of idleness
• the main effects are interpreted as effects on the initial logit (t = 0, the 1982 baseline)
• the interaction of time and unemployment rate captures the effect of the 1982 county unemployment rate on the change in the log odds of idleness
• the positive effect implies that higher county unemployment tends to dampen the change in odds
IRT Models
• IRT models
• Item Response Theory
• models account for an individual-level random effect on a set of items (i.e., ability)
• items are assumed to tap a single latent construct (aptitude on a specific subject)
• item difficulty
• test items are assumed to be ordered on a difficulty scale (easier → harder)
• expected patterns emerge: if a more difficult item is answered correctly, the easier items are likely to have been answered correctly as well
IRT Models
• IRT models
• 1-parameter logistic (Rasch) model: $\operatorname{logit}(p_{ij}) = \theta_i - b_j$
• $p_{ij}$: individual i’s probability of a correct response on the jth item
• $\theta_i$: individual i’s ability
• $b_j$: item j’s difficulty
• properties
• an individual’s ability parameter is invariant with respect to the item
• the difficulty parameter is invariant with respect to the individual’s ability
• higher ability or lower item difficulty leads to a higher probability of a correct response
• both ability and difficulty are measured on the same scale
ICC
• item characteristic curve (item response curve)
• depicts the probability of a correct response as a function of an examinee’s ability or trait level
• curves shift rightward with increasing item difficulty
• assume that item 3 is more difficult than item 2, and item 2 is more difficult than item 1
• at a given ability θ, the probability of a correct response falls as the item difficulty $b_j$ increases past θ (a plotting sketch follows below)
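A quick way to draw ICCs in Stata with the built-in `twoway function` and `invlogit()` (the difficulties −1, 0, 1 are illustrative, not the deck’s items):

```stata
* item characteristic curves for three hypothetical Rasch items
twoway (function y = invlogit(x + 1), range(-4 4))   ///
       (function y = invlogit(x),     range(-4 4))   ///
       (function y = invlogit(x - 1), range(-4 4)),  ///
    xtitle("Ability") ytitle("Pr(correct)")          ///
    legend(order(1 "b = -1" 2 "b = 0" 3 "b = 1"))
```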
IRT Models: ICC (3 Items)
• [figure: the slopes of the item characteristic curves are equal where ability = item difficulty]

Estimation as GLMM
• specification:
• set up a person-item data structure
• define x as a set of dummy variables
• change signs on β to reflect “difficulty”
• fit model without intercept to estimate all item difficulties
• normalization is common
PL1 Estimation
• Stata (data set-up)

```stata
clear
set memory 128m
infile junk y1-y5 f using LSAT.dat
drop if junk==11 | junk==13
expand f
drop f junk
gen cons = 1
collapse (sum) wt2=cons, by(y1-y5)
gen id = _n
sort id
reshape long y, i(id) j(item)
```

PL1 Estimation
• Stata (model set-up)

```stata
gen i1 = 0
gen i2 = 0
gen i3 = 0
gen i4 = 0
gen i5 = 0
replace i1 = 1 if item == 1
replace i2 = 1 if item == 2
replace i3 = 1 if item == 3
replace i4 = 1 if item == 4
replace i5 = 1 if item == 5
*
* 1PL
* constrain sd=1
cons 1 [id1]_cons = 1
```

PL1 Estimation
• Stata (output)
PL1 Estimation
• Stata (parameter normalization)

```stata
* normalized solution
* [1 -- standard 1PL]
* [2 -- coefs sum to 0] [var = 1]
mata
bALL = st_matrix("e(b)")
b = -bALL[1,1..5]
mb = mean(b')
bs = b:-mb
("MML Estimates", "IRT parameters", "B-A Normalization")
(-b', b', bs')
end
```

PL1 Estimation
• Stata (normalized solution)
IRT: Extensions
• 2-parameter logistic (2PL) model: adds item discrimination parameters $a_j$, giving $\operatorname{logit}(p_{ij}) = a_j(\theta_i - b_j)$
IRT: Extensions
• 2-parameter logistic (2PL) model
• item discrimination parameters
• reveal differences in an item’s utility for distinguishing different ability levels among examinees
• high values denote items that are more useful in separating examinees into different ability levels
• low values denote items that are less useful in distinguishing examinees in terms of ability
• the ICCs corresponding to this model can intersect, as they differ in both location and slope
• a steeper ICC slope is associated with a better discriminating item
IRT: Extensions
• 2-parameter logistic (2PL) model
• Stata (estimation)

```stata
eq id: i1 i2 i3 i4 i5
cons 1 [id1_1]i1 = 1
matrix list e(b)
* normalized solutions
* [1 -- standard 2PL]
mata
bALL = st_matrix("e(b)")
b = bALL[1,1..5]
c = bALL[1,6..10]
a = -b:/c
("MML Estimates-Dif", "IRT Parameters")
(b', a')
("MML Discrimination Parameters")
(c')
end
```

IRT: Extensions
• 2-parameter logistic (2PL) model
• Stata (estimation)

```stata
* Bock and Aitkin Normalization (p. 164 corrected)
mata
bALL = st_matrix("e(b)")
b = -bALL[1,1..5]
c = bALL[1,6..10]
lc = ln(c)
mb = mean(b')
mc = mean(lc')
bs = b:-mb
cs = exp(lc:-mc)
("B-A Normalization DIFFICULTY", "B-A Normalization DISCRIMINATION")
(bs', cs')
end
```

IRT: 2PL (2) Bock-Aitkin Normalization
• item 3 has the highest difficulty and the greatest discrimination

Binary Response Models for Event Occurrence
• discrete-time event-history models
• purpose:
• model the probability of an event occurring at some point in time: Pr(event at t | event has not yet occurred by t)
• life table
• events & trials
• observe the number of events occurring to those who remain at risk as time passes
• takes account of the changing composition of the sample as time passes
Life Table
• observe
• $R_j$: number at risk in time interval j ($R_0 = n$), where the number at risk is adjusted over time
• $D_j$: events in time interval j ($D_0 = 0$)
• $W_j$: removed from risk (censored) in time interval j ($W_0 = 0$), i.e., removed from risk due to other, unrelated causes

Life Table
• other key quantities
• discrete-time hazard (event probability in interval j): $\hat{h}_j = D_j / R_j$
• surviving fraction (survivor function in interval j): $\hat{S}_j = \prod_{k \le j} (1 - \hat{h}_k)$
Discrete-Time Hazard Models
• statistical concepts
• discrete random variable $T_i$ (individual i’s event or censoring time)
• pdf of T: $f(j) = \Pr(T_i = j)$, the probability that individual i experiences the event in period j
• cdf of T: $F(j) = \Pr(T_i \le j)$, the probability that individual i experiences the event in period j or earlier
• survivor function: $S(j) = \Pr(T_i > j) = 1 - F(j)$, the probability that individual i survives past period j
Discrete-Time Hazard Models
• statistical concepts
• discrete hazard: $h_{ij} = \Pr(T_i = j \mid T_i \ge j)$
• the conditional probability of event occurrence in interval j for individual i, given that the event has not already occurred to that individual before interval j
Discrete-Time Hazard Models
• equivalent expression using binary data
• binary data: $d_{ij} = 1$ if individual i experiences an event in interval j, 0 otherwise
• the sequence of binary values in each interval forms a history of the process for individual i up to the time the event occurs
• discrete hazard: $h_{ij} = \Pr(d_{ij} = 1 \mid d_{ik} = 0 \text{ for all } k < j)$
Discrete-Time Hazard Models
• modeling (complementary log-log link): $\log[-\log(1 - h_{ij})] = \alpha_j + \mathbf{x}_i'\boldsymbol\beta$, where the $\alpha_j$ give the baseline hazard
• non-proportional effects: let the covariate effects vary with time, $\alpha_j + \mathbf{x}_i'\boldsymbol\beta_j$
Data Structure
• person-level data are converted to person-period form (one record per individual per interval at risk)
• the response in each person-period record is the binary sequence $d_{ij}$
Estimation
• contributions to the likelihood (expressions below)
• an individual with an event in period j and an individual censored in period j contribute differently to log L
• the two cases combine through the binary indicators $d_{ij}$
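The standard discrete-time contributions, which follow from the definitions above:

$$\log L_i = \begin{cases} \log h_{ij} + \sum_{k<j} \log(1 - h_{ik}) & \text{event in period } j \\[4pt] \sum_{k \le j} \log(1 - h_{ik}) & \text{censored in period } j \end{cases}$$

Both cases combine into $\log L = \sum_i \sum_j \left[ d_{ij} \log h_{ij} + (1 - d_{ij}) \log(1 - h_{ij}) \right]$, the log-likelihood of a binary-response model fit to the person-period data.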
Example:
• dropping out of Ph.D. programs (large US university)
• data: 6,964 individual histories spanning 20 years
• dropout cannot be distinguished from other types of leaving (transfer to another program, etc.)
• model the logit hazard of leaving the originally-entered program as a function of the following:
• time in program (the time-dependent baseline hazard)
• female and percent female in program
• race/ethnicity (black, Hispanic, Asian)
• marital status
• GRE score
• also add a program-specific random effect (multilevel)
Example:

```stata
clear
set memory 512m
infile CID devnt I1-I5 female pctfem black hisp asian married gre using DT28432.dat
logit devnt I1-I5, nocons or
est store m1
logit devnt I1-I5 female pctfem, nocons or
est store m2
logit devnt I1-I5 female pctfem black hisp asian, nocons or
est store m3
logit devnt I1-I5 female pctfem black hisp asian married, nocons or
est store m4
logit devnt I1-I5 female pctfem black hisp asian married gre, nocons or
```