## Categorical Data Analysis

By **corin**


Binary Response Models

- binary and binomial responses
- binary: y assumes values of 0 or 1
- binomial: y is number of “successes” in n “trials”
- distributions
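- Bernoulli: $\Pr(y) = p^{y}(1-p)^{1-y}$, for $y \in \{0, 1\}$
- Binomial: $\Pr(y) = \binom{n}{y} p^{y}(1-p)^{n-y}$, for $y = 0, 1, \ldots, n$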
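The two probability functions can be checked numerically. Below is a minimal sketch that uses only the Python standard library. The function names are illustrative and do not come from any package.

```python
from math import comb

def bernoulli_pmf(y: int, p: float) -> float:
    """Pr(Y = y) for y in {0, 1}: p^y (1 - p)^(1 - y)."""
    return p ** y * (1 - p) ** (1 - y)

def binomial_pmf(y: int, n: int, p: float) -> float:
    """Pr(Y = y successes in n independent trials)."""
    return comb(n, y) * p ** y * (1 - p) ** (n - y)

p = 0.3
# The Bernoulli is the special case n = 1 of the binomial.
for y in (0, 1):
    assert abs(bernoulli_pmf(y, p) - binomial_pmf(y, 1, p)) < 1e-12

# Binomial probabilities over y = 0..n sum to one.
total = sum(binomial_pmf(y, 10, p) for y in range(11))
print(round(total, 10))  # 1.0
```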

Transformational Approach

- linear probability model
- use grouped data (events/trials): $\hat p_i = y_i / n_i$
- “identity” link: $g(p_i) = p_i$
- linear predictor: $p_i = \eta_i = \beta_0 + \beta_1 x_i$
- problem: predicted values can fall outside [0, 1]

The Logit Model

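- logit transformation: $\operatorname{logit}(p) = \log\frac{p}{1-p} = \beta_0 + \beta_1 x$
- inverse logit: $p = \frac{\exp(\beta_0 + \beta_1 x)}{1 + \exp(\beta_0 + \beta_1 x)}$
- ensures that p lies in (0, 1) for all values of x and β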
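Below is a minimal sketch of the two transformations in Python, using only the standard library. The function names are illustrative. The inverse logit branches on the sign of η so that `math.exp` is only ever called on a non-positive argument and cannot overflow.

```python
import math

def logit(p: float) -> float:
    """Log odds: log(p / (1 - p)), defined for 0 < p < 1."""
    return math.log(p / (1 - p))

def inv_logit(eta: float) -> float:
    """Inverse logit exp(eta) / (1 + exp(eta)), in a numerically stable form."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)

# Any real-valued linear predictor maps into (0, 1) ...
for eta in (-10.0, -2.0, 0.0, 2.0, 10.0):
    p = inv_logit(eta)
    assert 0.0 < p < 1.0
    # ... and the logit undoes the mapping.
    assert abs(logit(p) - eta) < 1e-6
```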

The Logit Model

- odds and odds ratios are the key to understanding and interpreting this model
- the log odds transformation is a “stretching” transformation to map probabilities to the real line

Odds, Odds Ratios, and Relative Risk

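- odds of “success” is the ratio: $\Omega = \frac{p}{1-p}$
- consider two groups with success probabilities $p_1$ and $p_2$
- odds ratio (OR) is a measure of the odds of success in group 1 relative to group 2: $OR = \frac{\Omega_1}{\Omega_2} = \frac{p_1/(1-p_1)}{p_2/(1-p_2)}$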

Odds Ratio

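- 2 × 2 table of X (rows: 0, 1) by Y (columns: 0, 1); cell counts not reproduced here
- OR is the cross-product ratio (compare the x = 1 group to the x = 0 group): $OR = \frac{n_{11}\,n_{00}}{n_{10}\,n_{01}}$, where $n_{xy}$ is the count with X = x and Y = y
- odds of y = 1 are roughly 4 times higher when x = 1 than when x = 0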

Odds Ratio

- equivalent interpretation (reverse the comparison)
- odds of y = 1 when x = 0 are 0.225 times the odds when x = 1
- since 1 − 0.225 = 0.775, the odds of y = 1 are 77.5% lower when x = 0 than when x = 1

Log Odds Ratios

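- Consider the model: $\operatorname{logit}(p) = \beta_0 + \beta_1 D$
- D is a dummy variable coded 1 if group 1 and 0 otherwise.
- group 1: $\operatorname{logit}(p_1) = \beta_0 + \beta_1$
- group 2: $\operatorname{logit}(p_2) = \beta_0$
- LOR: $\log OR = \operatorname{logit}(p_1) - \operatorname{logit}(p_2) = \beta_1$; OR: $\exp(\beta_1)$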

Relative Risk

- similar to OR, but works with rates rather than odds
- relative risk or rate ratio (RR) is the rate in group 1 relative to group 2: $RR = p_1 / p_2$
- OR → RR as $p_1, p_2 \to 0$ (rare outcomes)
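The convergence of OR to RR for rare outcomes follows from $OR = RR \cdot \frac{1-p_2}{1-p_1}$. The sketch below checks it numerically. The probabilities are arbitrary illustration values chosen so that RR is held fixed at 2.

```python
def odds(p: float) -> float:
    return p / (1 - p)

def odds_ratio(p1: float, p2: float) -> float:
    return odds(p1) / odds(p2)

def relative_risk(p1: float, p2: float) -> float:
    return p1 / p2

# Hold the relative risk at 2 and let the baseline probability shrink.
for p2 in (0.30, 0.10, 0.01, 0.001):
    p1 = 2 * p2
    print(f"p2={p2:<6} RR={relative_risk(p1, p2):.3f} OR={odds_ratio(p1, p2):.3f}")
```

With a common outcome (p2 = 0.3), the OR (3.5) is well above the RR (2). At p2 = 0.001 the two are nearly identical.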

Tutorial: odds and odds ratios

- consider the following data

Tutorial: odds and odds ratios

- read table:

```stata
clear
input educ psex f
0 0 873
0 1 1190
1 0 533
1 1 1208
end
label define edlev 0 "HS or less" 1 "Col or more"
label val educ edlev
label var educ education
```
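Because the table is small, the odds and the odds ratio that the models below should reproduce can be computed by hand. The sketch below does this in Python with the tutorial's cell counts. It assumes psex is the 0/1 outcome and educ is the grouping variable, as in the Stata code.

```python
# Cell counts from the tutorial: (educ, psex) -> frequency f
counts = {(0, 0): 873, (0, 1): 1190, (1, 0): 533, (1, 1): 1208}

def group_odds(educ: int) -> float:
    """Odds of psex = 1 within an education group: n(psex=1) / n(psex=0)."""
    return counts[(educ, 1)] / counts[(educ, 0)]

odds_low = group_odds(0)   # HS or less
odds_high = group_odds(1)  # Col or more
odds_ratio = odds_high / odds_low  # compare educ = 1 to educ = 0

print(f"odds (HS or less)   = {odds_low:.4f}")
print(f"odds (Col or more)  = {odds_high:.4f}")
print(f"odds ratio          = {odds_ratio:.4f}")
```

The odds ratio equals the cross-product ratio (1208 × 873) / (533 × 1190) ≈ 1.66.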

Tutorial: odds and odds ratios

- stat facts:
- variances of functions
- use in statistical significance tests and forming confidence intervals
- basic rule for variances of linear transformations
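- if g(x) = a + bx is a linear function of x, then $\operatorname{Var}[g(x)] = b^2 \operatorname{Var}(x)$
- this is a trivial case of the delta method applied to a single variable
- the delta method for the variance of a nonlinear function g(x) of a single variable is $\operatorname{Var}[g(x)] \approx [g'(\mu)]^2 \operatorname{Var}(x)$, where $\mu = E(x)$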

Tutorial: odds and odds ratios

- stat facts:
- variances of odds and odds ratios
- we can use the delta method to find the variance in the odds and the odds ratios
- from the asymptotic (large sample theory) perspective it is best to work with log odds and log odds ratios
- the log odds ratio converges to normality at a faster rate than the odds ratio, so statistical tests may be more appropriate on log odds ratios (nonlinear functions of p)
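Applying the delta method to the log odds gives a variance that depends only on the counts. The short derivation below assumes $\hat p = y/n$ is a binomial proportion:

$$
\operatorname{Var}(\hat p) = \frac{p(1-p)}{n}, \qquad
g(p) = \log\frac{p}{1-p}, \qquad g'(p) = \frac{1}{p(1-p)}
$$

$$
\operatorname{Var}\!\left[\log\frac{\hat p}{1-\hat p}\right] \approx [g'(p)]^2 \operatorname{Var}(\hat p)
= \frac{1}{n\,p(1-p)} = \frac{1}{np} + \frac{1}{n(1-p)} \approx \frac{1}{y} + \frac{1}{n-y}
$$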

Tutorial: odds and odds ratios

- stat facts:
- the log odds ratio is the difference in the log odds for two groups
- groups are independent
- variance of a difference is the sum of the variances
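Combining these facts gives a standard error and confidence interval for the tutorial odds ratio. The sketch below (standard library only) applies the delta-method variance 1/y + 1/(n − y) to each group's log odds, adds the two variances, builds the interval on the log scale, and then exponentiates.

```python
import math

# Tutorial counts by education group: (psex = 0, psex = 1)
low = (873, 1190)    # HS or less
high = (533, 1208)   # Col or more

def log_odds(cells):
    n0, n1 = cells
    return math.log(n1 / n0)

def var_log_odds(cells):
    """Delta-method variance of a log odds: 1/y + 1/(n - y)."""
    n0, n1 = cells
    return 1 / n0 + 1 / n1

# Log odds ratio = difference in log odds; independent groups => variances add.
lor = log_odds(high) - log_odds(low)
se = math.sqrt(var_log_odds(high) + var_log_odds(low))

z = 1.959964  # 97.5th percentile of the standard normal
ci = (math.exp(lor - z * se), math.exp(lor + z * se))
print(f"LOR = {lor:.4f}, SE = {se:.4f}")
print(f"OR = {math.exp(lor):.4f}, 95% CI = ({ci[0]:.4f}, {ci[1]:.4f})")
```

Because the interval is symmetric on the log scale, it is asymmetric around the OR itself.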

Tutorial: odds and odds ratios

- data structures: grouped or individual level
- note:
- use frequency weights to handle grouped data
- or we could “expand” this data by the frequency weights resulting in individual-level data
- model results from either data structure are the same
- expand the data and verify the following results

```stata
expand f
```
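Why the two data structures agree: a frequency-weighted Bernoulli log-likelihood is term-by-term the same sum as the log-likelihood of the expanded records. The sketch below checks this at arbitrary trial probabilities. The `p` values are illustrative and are not estimates. Because the likelihoods coincide everywhere, they share the same maximizer and curvature.

```python
import math

# Grouped tutorial data: (educ, psex, frequency)
grouped = [(0, 0, 873), (0, 1, 1190), (1, 0, 533), (1, 1, 1208)]

def loglik_weighted(p_by_educ):
    """Bernoulli log-likelihood using frequency weights (like [fw=f])."""
    return sum(f * math.log(p_by_educ[e] if y == 1 else 1 - p_by_educ[e])
               for e, y, f in grouped)

def loglik_expanded(p_by_educ):
    """Same log-likelihood after 'expand f': one record per individual."""
    records = [(e, y) for e, y, f in grouped for _ in range(f)]
    return sum(math.log(p_by_educ[e] if y == 1 else 1 - p_by_educ[e])
               for e, y in records)

p = {0: 0.55, 1: 0.70}  # arbitrary trial values of Pr(psex = 1 | educ)
n_individuals = sum(f for _, _, f in grouped)
print(n_individuals)  # 3804
print(loglik_weighted(p), loglik_expanded(p))
```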

Tutorial: odds and odds ratios

- statistical modeling
- logit model (glm):
- logit model (logit):

```stata
glm psex educ [fw=f], f(b) eform
logit psex educ [fw=f], or
```

Tutorial: odds and odds ratios

- statistical modeling (#1)
- logit model (glm):

Tutorial: odds and odds ratios

- statistical modeling (#2)
- some ideas from alternative normalizations
- what parameters will this model produce?
- what is the interpretation of the “constant”?

```stata
gen cons = 1
glm psex cons educ [fw=f], nocons f(b) eform
```

Tutorial: odds and odds ratios

- statistical modeling (#2)

Tutorial: odds and odds ratios

- statistical modeling (#3)
- what parameters does this model produce?
- how do you interpret them?

```stata
gen lowed = educ == 0
gen hied = educ == 1
glm psex lowed hied [fw=f], nocons f(b) eform
```

Tutorial: prediction

- fitted probabilities (after most recent model)

```stata
predict p, mu
tab educ [fw=f], sum(p) nostandard nofreq
```
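Each of the three parameterizations has one parameter per education group, so each is saturated and its MLEs have closed forms. The sketch below computes what the models should report when fitted to the tutorial counts: exponentiated coefficients and fitted probabilities. It is a hand calculation, not Stata output.

```python
import math

counts = {(0, 0): 873, (0, 1): 1190, (1, 0): 533, (1, 1): 1208}
odds = {e: counts[(e, 1)] / counts[(e, 0)] for e in (0, 1)}

# Models #1/#2 (constant + educ): eform reports baseline odds and the odds ratio.
b0, b1 = math.log(odds[0]), math.log(odds[1] / odds[0])
# Model #3 (lowed, hied, no constant): each coefficient is a group's log odds.
b_low, b_high = math.log(odds[0]), math.log(odds[1])

# Fitted probabilities equal the observed group proportions (saturated model).
p_hat = {e: counts[(e, 1)] / (counts[(e, 0)] + counts[(e, 1)]) for e in (0, 1)}

print(f"#1: exp(b0) = {math.exp(b0):.4f}, exp(b1) = {math.exp(b1):.4f}")
print(f"#3: exp(b_low) = {math.exp(b_low):.4f}, exp(b_high) = {math.exp(b_high):.4f}")
print(f"fitted p: HS or less = {p_hat[0]:.4f}, Col or more = {p_hat[1]:.4f}")
```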

Probit Model

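- inverse probit is the CDF for a standard normal variable: $p = \Phi(\eta) = \int_{-\infty}^{\eta} \frac{1}{\sqrt{2\pi}} e^{-t^2/2}\,dt$
- link function: $\Phi^{-1}(p) = \eta = \beta_0 + \beta_1 x$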

Interpretation

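- probit coefficients
- interpreted on the scale of a standard normal latent variable (no log odds-ratio interpretation)
- approximately “scaled” versions of logit coefficients (logit estimates are typically about 1.6 to 1.8 times the probit estimates)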
- probit models
- more common in certain disciplines (economics)
- analogy with linear regression (normal latent variable)
- more easily extended to multivariate distributions
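The “scaled” relationship can be checked numerically. The sketch below uses only the standard library. It rescales the probit index by the standard deviation of the standard logistic distribution, π/√3 ≈ 1.81, and measures how far the two CDFs then differ over a grid.

```python
import math

def normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def logistic_cdf(x: float) -> float:
    return 1 / (1 + math.exp(-x))

# The standard logistic has variance pi^2/3, so its SD is pi/sqrt(3).
scale = math.pi / math.sqrt(3)

# After putting both on a common variance, the CDFs differ by only about 0.02.
grid = [i / 10 for i in range(-40, 41)]
max_gap = max(abs(logistic_cdf(scale * x) - normal_cdf(x)) for x in grid)
print(f"scale = {scale:.4f}, max |difference| = {max_gap:.4f}")
```

Because the curves nearly coincide after rescaling, the two models usually give similar fitted probabilities. Their coefficients differ mainly by the scale factor.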

Swedish Historical Mortality Data

- predictions

Programming

- Stata: generalized linear model (glm)

```stata
* family(binomial n): y events out of n trials (grouped data)
glm y A2 A3 P2, family(b n) link(probit)
glm y A2 A3 P2, family(b n) link(logit)
```

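- the idea of a GLM is to make the model linear on the scale of the link function
- old days: Iteratively Reweighted Least Squares (IRLS)
- now: Fisher scoring, Newton-Raphson
- both approaches yield MLEs (for canonical links such as the logit, IRLS/Fisher scoring and Newton-Raphson give identical updates)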
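Below is a minimal sketch of IRLS for a one-predictor logit, written in pure Python so that each step is visible. It assumes the tutorial coding: psex is the outcome, educ is the predictor, and frequency weights are used. At each iteration it forms the working response on the link scale and solves a weighted least-squares problem. Because the logit is the canonical link, these are also the Newton-Raphson steps.

```python
import math

# Tutorial data as frequency-weighted binary records: (educ, psex, f)
data = [(0, 0, 873), (0, 1, 1190), (1, 0, 533), (1, 1, 1208)]

def irls_logit(rows, iterations=25, tol=1e-10):
    """Fit logit(p) = b0 + b1*x by iteratively reweighted least squares."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Accumulate X'WX (2x2, symmetric) and X'Wz for the working response z.
        s00 = s01 = s11 = t0 = t1 = 0.0
        for x, y, f in rows:
            eta = b0 + b1 * x
            mu = 1 / (1 + math.exp(-eta))
            w = f * mu * (1 - mu)                 # IRLS weight times frequency
            z = eta + (y - mu) / (mu * (1 - mu))  # working response
            s00 += w
            s01 += w * x
            s11 += w * x * x
            t0 += w * z
            t1 += w * x * z
        # Solve the 2x2 weighted least-squares normal equations.
        det = s00 * s11 - s01 * s01
        new0 = (s11 * t0 - s01 * t1) / det
        new1 = (s00 * t1 - s01 * t0) / det
        converged = abs(new0 - b0) + abs(new1 - b1) < tol
        b0, b1 = new0, new1
        if converged:
            break
    return b0, b1

b0, b1 = irls_logit(data)
print(f"exp(b0) = {math.exp(b0):.4f} (baseline odds), exp(b1) = {math.exp(b1):.4f} (odds ratio)")
```

The exponentiated estimates should match the baseline odds and the cross-product odds ratio of the tutorial table, which is what `glm ..., eform` reports.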

Generalized Linear Models

- applies to a broad class of models
- iterative fitting (repeated updating), except for the linear model
- update parameters, weights W, and predicted values μ
- models differ in terms of W and μ and assumptions about the distribution of y
- common distributions for y include: normal, binomial, and Poisson
- common links include: identity, logit, probit, and log

Latent Variable Approach

- example: insect mortality
- suppose a researcher exposes insects to dosage levels (u) of an insecticide and observes whether the “subject” lives or dies at that dosage.
- the response is expected to depend on the insect’s tolerance (c) to that dosage level.
- the insect dies if u > c and survives if u < c
- tolerance is not observed (survival is observed)
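A small worked case shows how the threshold rule produces a binary response model. Assume, for illustration only, that tolerance c is logistic across insects with hypothetical location `MU` and scale `S`. Then Pr(death at dose u) = Pr(c < u) = F((u − μ)/s). The logit of that probability is exactly linear in dose, with intercept −μ/s and slope 1/s. A normal tolerance distribution would give a probit model instead.

```python
import math

MU, S = 5.0, 1.5  # hypothetical location and scale of the tolerance distribution

def logistic_cdf(x: float) -> float:
    return 1 / (1 + math.exp(-x))

def pr_death(u: float) -> float:
    """Pr(c < u): the insect dies when the dose exceeds its tolerance."""
    return logistic_cdf((u - MU) / S)

def logit(p: float) -> float:
    return math.log(p / (1 - p))

# The logit of the death probability is linear in dose: -MU/S + u/S.
for u in (2.0, 4.0, 5.0, 7.5, 10.0):
    print(u, round(pr_death(u), 4), round(logit(pr_death(u)), 4))
```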

Latent Variables

- u and c are continuous latent variables
- examples:
- women’s employment: u is the market wage and c is the reservation wage
- migration: u is the benefit of moving and c is the cost of moving.
- the observed outcome (y = 1 or y = 0) reveals the individual’s preference, which is assumed to maximize a rational individual’s utility function.

Latent Variables

- Assume linear utility and criterion functions
- over-parameterization = identification problem
- we can identify differences in components but not the separate components

Latent Variables

- constraints:
- Then:
- where F(.) is the CDF of ε

Latent Variables and Standardization

- Need to standardize the mean and variance of ε
- binary dependent variables lack inherent scales
- the magnitude of β is meaningful only relative to the mean and variance of ε, which are unknown.
- redefine ε to a common standard: $\varepsilon^* = (\varepsilon - a)/b$
- where a and b are two chosen constants.

Standardization for Logit and Probit Models

- standardization implies
- F*(·) is the CDF of ε*
- location a and scale b need to be fixed
- setting
- and

Standardization for Logit and Probit Models

- distribution of ε is standardized
- standard normal probit
- standard logistic logit
- both distributions have a mean of 0
- variances differ: 1 for the standard normal, π²/3 ≈ 3.29 for the standard logistic

Extending the Latent Variable Approach

- observed y is a dichotomous (binary) 0/1 variable
- continuous latent variable: $y^* = x\beta + \varepsilon$
- linear predictor + residual
- observed outcome: $y = 1$ if $y^* > 0$, and $y = 0$ otherwise

Notation

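- conditional means of latent variables obtained from index function: $E(y^* \mid x) = x\beta$
- obtain probabilities from inverse link functions: $p = F(x\beta)$

logit model: $p = \frac{\exp(x\beta)}{1 + \exp(x\beta)}$

probit model: $p = \Phi(x\beta)$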

ML

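- likelihood function: $L = \prod_{i} \binom{n_i}{y_i} p_i^{y_i} (1 - p_i)^{n_i - y_i}$
- where $n_i = 1$ if data are binary
- log-likelihood function: $\log L = \sum_{i} \left[ \log\binom{n_i}{y_i} + y_i \log p_i + (n_i - y_i) \log(1 - p_i) \right]$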

Assessing Models

- definitions:
- L null model (intercept only): $L_0$
- L saturated model (a parameter for each cell): $L_f$
- L current model: $L_c$
- grouped data (events/trials)
- deviance (likelihood ratio statistic): $D = -2(\log L_c - \log L_f)$

Deviance

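- grouped data:
- if cell sizes are reasonably large, the deviance is approximately chi-square distributed (df = number of cells − number of parameters)
- individual-level data: $L_f = 1$ and $\log L_f = 0$, so $D = -2\log L_c$
- in this case the deviance is not a goodness-of-fit statistic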

Deviance

- deviance is like a residual sum of squares
- larger values indicate poorer models
- larger models have smaller deviance
- deviance for the more constrained model (Model 1): $D_1$
- deviance for the less constrained model (Model 2): $D_2$
- assume that Model 1 is a constrained version of Model 2.

Difference in Deviance

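- evaluate competing “nested” models using a likelihood ratio statistic: $G^2 = D_1 - D_2 = -2(\log L_1 - \log L_2)$, where $D_k$ and $L_k$ are the deviance and likelihood of Model k; $G^2$ is approximately chi-square with df equal to the difference in the number of parameters
- model chi-square is a special case (Model 1 is the null model)
- SAS, Stata, R, etc. report different statistics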
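The saturated log-likelihood cancels in the difference of deviances, so the LR statistic needs only the two fitted log-likelihoods. The sketch below computes it for the tutorial data, comparing the null model with the educ model. The chi-square(1) upper tail is obtained from `math.erfc`, because P(χ²₁ > x) = P(|Z| > √x).

```python
import math

counts = {(0, 0): 873, (0, 1): 1190, (1, 0): 533, (1, 1): 1208}

def bernoulli_ll(n1: int, n0: int, p: float) -> float:
    """Log-likelihood of n1 successes and n0 failures at probability p."""
    return n1 * math.log(p) + n0 * math.log(1 - p)

# Model 1 (constrained, null): one common probability.
n1_all = counts[(0, 1)] + counts[(1, 1)]
n0_all = counts[(0, 0)] + counts[(1, 0)]
ll_null = bernoulli_ll(n1_all, n0_all, n1_all / (n1_all + n0_all))

# Model 2 (less constrained): a separate probability for each educ group.
ll_educ = sum(
    bernoulli_ll(counts[(e, 1)], counts[(e, 0)],
                 counts[(e, 1)] / (counts[(e, 1)] + counts[(e, 0)]))
    for e in (0, 1)
)

lr = 2 * (ll_educ - ll_null)            # = D1 - D2
p_value = math.erfc(math.sqrt(lr / 2))  # chi-square upper tail, 1 df
print(f"LR chi2(1) = {lr:.2f}, p = {p_value:.2e}")
```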

Other Fit Statistics

- BIC & AIC (useful for non-nested models)
- basic idea of IC: penalize log L for the number of parameters (AIC/BIC) and/or the size of the sample (BIC): $IC = -2\log L + 2s \cdot df_m$
- AIC: s = 1
- BIC: s = ½ log n (n = sample size)
- $df_m$ is the number of model parameters
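Below is a minimal sketch of the two criteria, mirroring the Stata “hand” calculation later in these slides (−2 log L plus a penalty). It applies them to the tutorial's null and educ models at the individual level, with n = 3804. The parameter counts include the constant, as `e(df_m)+1` does.

```python
import math

def information_criteria(ll, n_params, n_obs):
    """IC = -2 log L + 2 s df_m, with s = 1 (AIC) or s = (1/2) log n (BIC)."""
    aic = -2 * ll + 2 * 1 * n_params
    bic = -2 * ll + 2 * (0.5 * math.log(n_obs)) * n_params
    return aic, bic

def ll_binary(n1, n0):
    """Maximized Bernoulli log-likelihood for n1 ones and n0 zeros."""
    p = n1 / (n1 + n0)
    return n1 * math.log(p) + n0 * math.log(1 - p)

# Tutorial data: (psex = 1, psex = 0) counts by education group
low, high = (1190, 873), (1208, 533)
n = sum(low) + sum(high)

ll_null = ll_binary(low[0] + high[0], low[1] + high[1])   # 1 parameter
ll_educ = ll_binary(*low) + ll_binary(*high)              # 2 parameters

for name, ll, k in (("null", ll_null, 1), ("educ", ll_educ, 2)):
    aic, bic = information_criteria(ll, k, n)
    print(f"{name}: AIC = {aic:.1f}, BIC = {bic:.1f}")
```

Smaller values are preferred. Here the educ model wins on both criteria, despite BIC's heavier penalty.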

Hypothesis Tests/Inference

- single parameter:
- MLEs are asymptotically normal, so use a Z-test: $z = \hat\beta / \widehat{SE}(\hat\beta)$
- multi-parameter:
- likelihood ratio tests (compare fitted models)
- Wald tests (test constraints using only the current model)

Hypothesis Tests/Inference

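- Wald test (tests a vector of restrictions)
- a set of r parameters are all equal to 0: $H_0: \beta_r = 0$, where $\beta_r$ is the parameter subset
- a set of r parameters are linearly restricted: $H_0: R\beta = q$
- Wald statistic: $W = (R\hat\beta - q)' \left[ R\, \hat V(\hat\beta)\, R' \right]^{-1} (R\hat\beta - q)$, approximately $\chi^2_r$ under $H_0$, where R is the restriction matrix, q is the constraint vector, and β is the parameter subset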
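Below is a minimal sketch of the one-restriction case (r = 1), where the matrix inverse reduces to a division. It uses the tutorial's saturated-model estimates. The covariance matrix comes from the delta method: since b1 = (log odds high) − b0, Cov(b0, b1) = −Var(b0). Testing β₁ = 0 this way gives z², which should be close to the LR statistic for the same hypothesis.

```python
import math

# Tutorial counts by education group: (psex = 0, psex = 1)
low, high = (873, 1190), (533, 1208)

# Estimates b = (b0, b1) = (log odds for low educ, log odds ratio)
b = [math.log(low[1] / low[0]),
     math.log(high[1] / high[0]) - math.log(low[1] / low[0])]
v_low = 1 / low[0] + 1 / low[1]
v_high = 1 / high[0] + 1 / high[1]
V = [[v_low, -v_low], [-v_low, v_low + v_high]]

def wald_single(R, q, b, V):
    """Wald statistic for one linear restriction R'b = q: (R'b - q)^2 / (R'VR)."""
    diff = sum(r * bi for r, bi in zip(R, b)) - q
    var = sum(R[i] * V[i][j] * R[j] for i in range(2) for j in range(2))
    return diff ** 2 / var

W = wald_single(R=[0, 1], q=0.0, b=b, V=V)   # H0: beta_1 = 0
p_value = math.erfc(math.sqrt(W / 2))        # chi-square(1) upper tail
print(f"W = {W:.2f}, z = {math.sqrt(W):.2f}, p = {p_value:.2e}")
```

With r > 1 restrictions, the division becomes a solve against the r × r matrix R V R′.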

Interpreting Parameters

- odds ratios: consider the model where x is a continuous predictor and d is a dummy variable
- suppose that d denotes sex and x denotes income and the problem concerns voting, where y* is the propensity to vote
- results: $\operatorname{logit}(p_i) = -1.92 + 0.012 x_i + 0.67 d_i$

Interpreting Parameters

- for d (dummy variable coded 1 for female) the odds ratio is straightforward: OR = exp(0.67) ≈ 1.95
- holding income constant, women’s odds of voting are nearly twice those of men

Interpreting Parameters

- for x (continuous variable for income in thousands of dollars) the odds ratio is a multiplicative effect
- suppose we increase income by 1 unit ($1,000): OR = exp(0.012) ≈ 1.012
- suppose we increase income by c units (c × $1,000): OR = exp(0.012c)

Interpreting Parameters

- if income is increased by $10,000 (c = 10), the odds of voting are multiplied by exp(0.12) ≈ 1.13, an increase of about 13%
- a note on percent change in odds:
- if the estimate of β > 0, the percent increase in odds for a unit change in x is 100[exp(β) − 1]
- if the estimate of β < 0, the percent decrease in odds for a unit change in x is 100[1 − exp(β)]
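The arithmetic for the voting example can be reproduced directly from the reported coefficients. In the sketch below, a single helper returns a signed percent change: positive values are increases and negative values are decreases of 100[1 − exp(cβ)].

```python
import math

b_income, b_female = 0.012, 0.67   # from logit(p) = -1.92 + 0.012x + 0.67d

or_female = math.exp(b_female)     # women vs. men, income held constant
or_1k = math.exp(b_income)         # +1 unit = +$1,000 of income
or_10k = math.exp(10 * b_income)   # +c units: exp(c * beta), here c = 10

def pct_change_in_odds(beta: float, c: float = 1.0) -> float:
    """Signed percent change in the odds for a c-unit increase: 100 * (exp(c*beta) - 1)."""
    return 100 * (math.exp(c * beta) - 1)

print(f"OR female = {or_female:.3f}")
print(f"OR per $1,000 = {or_1k:.4f}; per $10,000 = {or_10k:.4f}")
print(f"% change per $10,000 = {pct_change_in_odds(b_income, 10):.1f}%")
```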

Marginal Effects

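- marginal effect: $\frac{\partial p}{\partial x} = f(x\beta)\,\beta$, where f(·) is the pdf corresponding to the cdf F(·)
- effect of change in x on change in probability
- for the logit, f(xβ) = p(1 − p)
- often we evaluate f(·) at the mean of x.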

Marginal Effect of a Change in a Dummy Variable

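- if x is a continuous variable and z is a dummy variable with coefficient γ
- marginal effect of change in z from 0 to 1 is the difference: $\Pr(y = 1 \mid x, z = 1) - \Pr(y = 1 \mid x, z = 0) = F(x\beta + \gamma) - F(x\beta)$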
Example

- logit models for high school graduation
- odds ratios (constant is baseline odds)

LR Test

- Model 3 vs. 2

Wald Test

- Test equality of parental education effects

```stata
logit hsg blk hsp female nonint inc nsibs mhs mcol fhs fcol wtest
test mhs=fhs
test mcol=fcol
```

cannot reject H₀ of equal parental education effects on HS graduation

Basic Estimation Commands (Stata)

```stata
* estimation commands and model tests
* model 0 - null model
qui logit hsg
est store m0
* model 1 - race, sex, family structure
qui logit hsg blk hsp female nonint
est store m1
* model 1a - race X family structure interactions
qui xi: logit hsg blk hsp female nonint i.nonint*i.blk i.nonint*i.hsp
est store m1a
lrtest m1 m1a
* model 2 - SES
qui xi: logit hsg blk hsp female nonint inc nsibs mhs mcol fhs fcol
est store m2
* model 3 - Indiv
qui xi: logit hsg blk hsp female nonint inc nsibs mhs mcol fhs fcol wtest
est store m3
lrtest m2 m3
```

Fit Statistics etc.

```stata
* some 'hand' calculations with saved results
scalar ll = e(ll)
scalar npar = e(df_m)+1
scalar nobs = e(N)
scalar AIC = -2*ll + 2*npar
scalar BIC = -2*ll + log(nobs)*npar
scalar list AIC
scalar list BIC
* or use automated fitstat routine
fitstat
* output as a table
estout1 m0 m1 m2 m3 using modF07, replace star stfmt(%9.2f %9.0f %9.0f) ///
    stats(ll N df_m) eform
qui sum adjinc, det
* quartiles for income distribution
gen incQ1 = adjinc < r(p25)
gen incQ2 = adjinc >= r(p25) & adjinc < r(p50)
gen incQ3 = adjinc >= r(p50) & adjinc < r(p75)
* exclude missing income (Stata sorts missing values above every number)
gen incQ4 = adjinc >= r(p75) & !missing(adjinc)
gen incQ = 1 if incQ1==1
replace incQ = 2 if incQ2==1
replace incQ = 3 if incQ3==1
replace incQ = 4 if incQ4==1
tab incQ
```

```stata
* calculate predictions
* look at marginal effects of test score on graduation by selected groups
* (1) model (income quartiles)
local i = 1
while `i' < 5 {
    logit hsg blk female mhs nonint nsibs urban so wtest if incQ ==`i'
    margeff
    cap drop wm*
    cap drop bm*
    prgen wtest, x(blk=0 female=0 mhs=1 nonint=0) gen(wmi) from(-3) to(3)
    prgen wtest, x(blk=0 female=0 mhs=1 nonint=1) gen(wmn) from(-3) to(3)
    label var wmip1 "white/intact"
    label var wmnp1 "white/nonintact"
    prgen wtest, x(blk=1 female=0 mhs=1 nonint=0) gen(bmi) from(-3) to(3)
    prgen wtest, x(blk=1 female=0 mhs=1 nonint=1) gen(bmn) from(-3) to(3)
    label var bmip1 "black/intact"
    label var bmnp1 "black/nonintact"
    set scheme s2mono
    twoway (line wmip1 wmix, sort xtitle("Test Score") ytitle("Pr(y=1)")) ///
        (line wmnp1 wmix, sort) (line bmip1 wmix, sort) (line bmnp1 wmix, sort), ///
        subtitle("Marginal Effect of Test Score on High School Graduation" ///
        "Income Quartile `i'" ) saving(wtgrph`i', replace)
    graph export wtgrph`i'.eps, as(eps) replace
    local i = `i' + 1
}
```

Fitted Probabilities

- predicted values
- evaluate fitted probabilities at the sample mean values of x (or other fixed quantities)
- averaging fitted probabilities over subgroup-specific models will produce marginal probabilities

Alternative Probability Model

- complementary log-log (cloglog or CLL)
- standard extreme-value distribution for u:
- cloglog model: $p = 1 - \exp[-\exp(\eta)]$
- cloglog link function: $\log[-\log(1 - p)] = \eta$

Extreme-Value Distribution

- properties
- mean of u (Euler’s constant):
- variance of u: π²/6
- difference in two independent extreme value variables yields a logistic variable

CLL Model

- no “practical” differences from logit and probit models
- often suited for survival data and other applications
- interpretation of coefficients:
- exp(β) is a relative risk or hazard ratio, not an OR
- glm: binomial distribution for y with a cloglog link
- cloglog: use the cloglog command directly
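The hazard-ratio reading of exp(β) comes from the link itself. Under the cloglog model, −log(1 − p) = exp(η). In the usual grouped-time interpretation this quantity is the cumulative hazard over the interval, so a one-unit change in x multiplies it by exp(β). The values of `eta0` and `beta` below are illustrative.

```python
import math

def inv_cloglog(eta: float) -> float:
    """p = 1 - exp(-exp(eta))."""
    return 1 - math.exp(-math.exp(eta))

def cloglog(p: float) -> float:
    """Link: log(-log(1 - p))."""
    return math.log(-math.log(1 - p))

def cum_hazard(p: float) -> float:
    """-log(1 - p): the integrated hazard implied by an event probability p."""
    return -math.log(1 - p)

eta0, beta = -2.0, 0.5
p0, p1 = inv_cloglog(eta0), inv_cloglog(eta0 + beta)

# A one-unit increase in x multiplies -log(1 - p) by exp(beta).
ratio = cum_hazard(p1) / cum_hazard(p0)
print(ratio, math.exp(beta))
```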

Extensions: Multilevel Data

- what is multilevel data?
- individuals are “nested” in a larger context:
- children in families, kids in schools etc.

[Figure: individuals nested within context 1, context 2, and context 3]

Multilevel Data

- i.i.d. assumptions?
- the outcomes for units in a given context could be associated
- standard model would treat all outcomes (regardless of context) as independent
- multilevel methods account for the within-cluster dependence
- a general problem with binomial responses
- we assume that trials are independent
- this might not be realistic
- non-independence will inflate the variance (overdispersion)

Multilevel Data

- example (in book):
- 40 universities as units of analysis
- for each university we observe the number of graduates (n) and the number receiving post-doctoral fellowships (y)
- we could compute proportions (MLEs)
- some proportions would be “better” estimates as they would have higher precision or lower variance
- example: the data y1/n1 = 2/5 and y2/n2 = 20/50 give identical estimates of p but variances of 0.048 and 0.0048 respectively
- the 2nd estimate is more precise than the 1st

Multilevel Data

- multilevel models allow for improved predictions of individual probabilities
- MLE estimate is unaltered if it is precise
- MLE estimate moved toward average if it is imprecise (shrinkage)
- multilevel estimate of p would be a weighted average of the MLE and the average over all MLEs (weight (w) is based on the variance of each MLE and the variance over all the MLEs)
- we are generally less interested in the p’s and more interested in the model parameters and variance components

Shrinkage Estimation

- primitive approach
- assume we have a set of estimates (MLEs)
- our best estimate of the variance of each MLE is $\hat p_i(1 - \hat p_i)/n_i$
- this is the within variance (no pooling)
- if this is large, then the MLE is a poor estimate
- a better estimate might be the average of the MLEs in this case (pooling the estimates)
- we can average the MLEs and estimate the between variance as

Shrinkage Estimation

- primitive approach
- we can then estimate a weight $w_i$
- a revised estimate of $p_i$ would take account of the precision to form a precision-weighted average
- precision is a function of $n_i$
- more weight is given to more precise MLEs
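The primitive shrinkage steps can be sketched in a few lines of Python under explicit assumptions. Only the first two (y, n) pairs below come from the slides' example. The remaining pairs are made-up illustration values. The between variance is taken as the plain sample variance of the MLEs, and the weight as w = between / (between + within). These are both assumptions, and neither is necessarily the estimator used in the book.

```python
# Hypothetical universities: (fellowships y, graduates n). Only the first two
# pairs are from the slides; the rest are made-up illustration values.
data = [(2, 5), (20, 50), (3, 30), (9, 20), (1, 10), (12, 40)]

mles = [y / n for y, n in data]
k = len(mles)
p_bar = sum(mles) / k

# Within variance of each MLE (no pooling): p(1 - p) / n.
within = [p * (1 - p) / n for p, (_, n) in zip(mles, data)]
# Between variance: spread of the MLEs around their average (assumed estimator).
between = sum((p - p_bar) ** 2 for p in mles) / (k - 1)

# Weight on each MLE: large when its own variance is small relative to the between variance.
weights = [between / (between + v) for v in within]
shrunk = [w * p + (1 - w) * p_bar for w, p in zip(weights, mles)]

for (y, n), p, w, s in zip(data, mles, weights, shrunk):
    print(f"{y:>3}/{n:<3} MLE = {p:.3f}  w = {w:.3f}  shrunk = {s:.3f}")
```

Both 2/5 and 20/50 have an MLE of 0.4. The imprecise 2/5 estimate receives a much smaller weight, so it is pulled further toward the overall average.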

Shrinkage

results from full Bayesian (multilevel) Analysis

Extension: Multilevel Models

- assumptions
- within-context and between-context variation in outcomes
- individuals within the same context share the same “random error” specific to that context
- models are hierarchical
- individuals (level-1)
- contexts (level-2)

Multilevel Models: Background

- linear mixed model for continuous y (multilevel, random coefficients, etc.)
- level-1 model and level-2 sub-models (hierarchical)

Multilevel Models: Background

- linear mixed model assumptions
- level-1 and level-2 residuals

Multilevel Models: Background

- composite form: fixed effects + cross-level interaction + random effects (level-2); the level-2 random effects and the level-1 residual together make up the composite residual

Multilevel Models: Background

- variance components

Multilevel Models: Background

- general form (linear mixed model): $y = X\beta + Zu + e$, where X contains the variables associated with fixed coefficients and Z contains the variables associated with random coefficients

Multilevel Models: Logit Models

- binomial model (random effect): $\operatorname{logit}(p_{ij}) = x_{ij}\beta + u_i$, with $u_i \sim N(0, \sigma_u^2)$
- assumptions
- u increases or decreases the expected response for individual j in context i independently of x
- all individuals in context i share the same value of u
- also called a random intercept model

Multilevel Models

- a hierarchical model:
- z is a level-1 variable; x is a level-2 variable
- random intercept varies among level-2 units
- note: level-1 residual variance is fixed (why?)

Multilevel Models

- a general expression
- x are variables associated with “fixed” coefficients
- z are variables associated with “random” coefficients
- u is a multivariate normal vector of level-2 residuals
- mean of u is 0; covariance of u is Σ

Multilevel Models

- random effects vs. random coefficients
- random effects u
- random coefficients β + u
- variance components
- interested in level-2 variation in u
- prediction
- E(y) is not equal to E(y|u)
- model based predictions need to consider random effects

Multilevel Models: Generalized Linear Mixed Models (GLMM)

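- conditional expectation: $E(y \mid u) = F(x\beta + zu)$
- marginal expectation: $E(y) = \int F(x\beta + zu)\, \phi(u)\, du$, where φ(·) is the density of u; this requires numerical integration or simulation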
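A small numerical illustration of why E(y) ≠ E(y | u) is given below. For a random-intercept logit, the sketch integrates the inverse logit against a normal density using the midpoint rule on ±8 SD. Here `xb` and `sigma_u` are illustrative values, and simple quadrature stands in for the adaptive methods used by the estimation commands. The marginal probability is pulled toward 0.5 relative to the conditional probability at u = 0.

```python
import math

def inv_logit(eta: float) -> float:
    return 1 / (1 + math.exp(-eta))

def marginal_prob(xb: float, sigma_u: float, steps: int = 4000) -> float:
    """E(y) = integral of inv_logit(xb + u) * phi(u; 0, sigma_u) du (midpoint rule)."""
    lo, hi = -8 * sigma_u, 8 * sigma_u
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        u = lo + (i + 0.5) * width
        density = math.exp(-0.5 * (u / sigma_u) ** 2) / (sigma_u * math.sqrt(2 * math.pi))
        total += inv_logit(xb + u) * density * width
    return total

xb, sigma_u = 1.0, 2.0        # illustrative linear predictor and random-effect SD
conditional = inv_logit(xb)   # E(y | u = 0)
marginal = marginal_prob(xb, sigma_u)
print(f"E(y | u = 0) = {conditional:.4f}, E(y) = {marginal:.4f}")
```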

Data Structure

- multilevel data structure
- requires a “context” id to identify individuals belonging to the same context
- NLSY sibling data contains a “family id” (constructed by researcher)
- data are unbalanced (we do not require clusters to be the same size)
- small clusters will contribute less information to the estimation of variance components than larger clusters
- it is OK to have clusters of size 1 (i.e., an individual is a context unto themselves)

- clusters of size 1 contribute to the estimation of fixed effects but not to the estimation of variance components

Example: clustered data

- siblings nested in families
- y is 1st premarital birth for NLSY women
- select sib-ships of size > 2
- null model (random intercept):

```stata
xtlogit fpmbir, i(famid)
* or
xtmelogit fpmbir || famid:
```

Example: clustered data

random intercept: xtlogit

Example: clustered data

random intercept: xtmelogit

Variance Component

- add predictors (mostly level-2)

Variance Component

- conditional variance in u is 2.107
- proportionate reduction in error (PRE): $(\hat\sigma^2_{u,\text{null}} - \hat\sigma^2_{u,\text{cond}}) / \hat\sigma^2_{u,\text{null}}$
- a 31% reduction in level-2 variance when level-2 predictors are accounted for

Random Effects

- we can examine the distribution of random effects

Random Effects

- we can examine the distribution of random effects

Random Effects Distribution

- 90th percentile u90 = 1.338
- 10th percentile u10 = 0.388
- the odds for a family at the 90th percentile are

exp(1.338 − 0.388) = exp(0.95) ≈ 2.586

times those of a family at the 10th percentile

- even if families are compositionally identical on covariates, we can assess the hypothetical differential in risks

Growth Curve Models

- growth models
- individuals are level-2 units
- repeated measures over time on individuals (level-1)
- models imply that logits vary across individuals
- intercept (conditional average logit) varies
- slope (conditional average effect of time) varies
- change is usually assumed to be linear
- use GLMM
- complications due to dimensionality
- intercept and slope may co-vary (necessitating a more complex model) and more

Growth Curve Models

- multilevel logit model for change over time
- T is time (strictly increasing)
- fixed and random coefficients (with covariates)

assume that u0 and u1 are bivariate normal

Multilevel Logit Models for Change

- Example: Log odds of employment of black men in the U.S. 1982-1988 (NLSY)

(consider 5 years in this period)

- time is coded 0, 1, 3, 4, 6
- dependent variable: idleness (not working and not in school)
- unconditional growth (no covariates except T)
- conditional growth (add covariates)
- note: cross-level interactions implied by composite model

Fitting Multilevel Model for Change

- programming
- Stata (unconditional growth)
- Stata (conditional growth)

```stata
* unconditional growth
xtmelogit y year || id: year, var cov(un)
* conditional growth
xtmelogit y year south unem unemyr inc hs ||id: year, var cov(un)
```

Logits: Observed, Conditional, and Marginal

the log odds of idleness decline over time and vary across individuals in both level and rate of change

Composite Residuals in a Growth Model

- composite residual
- composite residual variance
- covariance of composite residual

Model

- covariance term is 0 (from either model)
- results in simplified interpretation
- easier estimation via variance components (default option)
- significant variation in slopes and initial levels
- other results:
- log odds of idleness decrease over time (negative slope)
- other covariates except county unemployment have significant effects on the odds of idleness
- the main effects are interpreted as effects on the initial logits (at time 1, i.e., t = 0, the 1982 baseline)
- the interaction of time and unemployment rate captures the effect of the county unemployment rate in 1982 on the rate of change in the log odds of idleness
- the positive effect implies that higher county unemployment tends to dampen the decline in the odds of idleness

IRT Models

- IRT models
- Item Response Theory
- models account for an individual-level random effect on a set of items (i.e., ability)
- items are assumed to tap a single latent construct (aptitude on a specific subject)
- item difficulty
- test items are assumed to be ordered on a difficulty scale
- easier → harder
- expected patterns emerge whereby if a more difficult item is answered correctly the easier items are likely to have been answered correctly

IRT Models

- IRT models
- 1-parameter logistic (Rasch) model: $p_{ij} = \frac{\exp(\theta_i - b_j)}{1 + \exp(\theta_i - b_j)}$
- $p_{ij}$: individual i’s probability of a correct response on the jth item
- $\theta_i$: individual i’s ability
- $b_j$: item j’s difficulty
- properties
- an individual’s ability parameter is invariant with respect to the item
- the difficulty parameter is invariant with respect to individuals’ ability
- higher ability or lower item difficulty leads to a higher probability of a correct response
- both ability and difficulty are measured on the same scale

ICC

- item characteristics curve (item response curve)
- depicts the probability of a correct response as a function of an examinee’s ability or trait level
- curves are shifted rightward with increasing item difficulty
- assume that item 3 is more difficult than item 2 and item 2 is more difficult than item 1
- each curve crosses probability 0.5 at its threshold θ = b_j; at any fixed ability, the probability of a correct response is lower for more difficult items

IRT Models: ICC (3 Items)

slopes of the item characteristic curves are equal (1/4) where ability equals item difficulty
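The properties of the 1PL curves can be verified numerically. The sketch below uses three hypothetical item difficulties ordered as in the figure (item 1 easiest, item 3 hardest). Each curve equals 0.5 at θ = b_j, with slope p(1 − p) = 1/4 there. At a fixed ability, the probability of a correct response falls as difficulty rises.

```python
import math

def icc_1pl(theta: float, b: float) -> float:
    """Rasch/1PL: Pr(correct) = exp(theta - b) / (1 + exp(theta - b))."""
    return 1 / (1 + math.exp(-(theta - b)))

def icc_slope(theta: float, b: float) -> float:
    """dP/dtheta = P(1 - P) for the 1PL."""
    p = icc_1pl(theta, b)
    return p * (1 - p)

difficulties = [-1.0, 0.0, 1.5]   # hypothetical items 1 < 2 < 3 in difficulty

for j, b in enumerate(difficulties, start=1):
    print(f"item {j}: P(theta=0) = {icc_1pl(0.0, b):.3f}, "
          f"P(theta=b) = {icc_1pl(b, b):.2f}, slope at theta=b = {icc_slope(b, b):.2f}")
```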

Estimation as GLMM

- specification:
- set up a person-item data structure
- define x as a set of dummy variables
- change signs on β to reflect “difficulty”
- fit model without intercept to estimate all item difficulties
- normalization is common

PL1 Estimation

- Stata (data set up )

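```stata
clear
set memory 128m   // needed only in Stata 11 and earlier; ignored by later versions
infile junk y1-y5 f using LSAT.dat
drop if junk==11 | junk==13
expand f
drop f junk
gen cons = 1
* one record per response pattern; wt2 is the level-2 frequency weight
* read by gllamm's weight(wt) option (which looks for wt1, wt2, ...)
collapse (sum) wt2=cons, by(y1-y5)
gen id = _n
sort id
reshape long y, i(id) j(item)
```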

PL1 Estimation

- Stata (model set up )

```stata
* item dummies i1-i5
forvalues k = 1/5 {
    gen i`k' = (item == `k')
}
*
* 1PL
* constrain sd=1
cons 1 [id1]_cons = 1
gllamm y i1-i5, i(id) weight(wt) nocons family(binom) cons(1) link(logit) adapt
```

PL1 Estimation

- Stata (output )

PL1 Estimation

- Stata (parameter normalization)

```stata
* normalized solution
*[1 -- standard 1PL]
*[2 -- coefs sum to 0] [var = 1]
mata
bALL = st_matrix("e(b)")
b = -bALL[1,1..5]
mb = mean(b')
bs = b:-mb
("MML Estimates", "IRT parameters", "B-A Normalization")
(-b', b', bs')
end
```

PL1 Estimation

- Stata (normalized solution)

IRT: Extensions

- 2-parameter logistic (2PL) model
- item discrimination parameters
- reveal differences in items’ usefulness for distinguishing ability levels among examinees
- high values denote items that are more useful in terms of separating examinees into different ability levels
- low values denote items that are less useful in distinguishing examinees in terms of ability
- ICCs corresponding to this model can intersect as they differ in location and slope
- steeper slope of the ICC is associated with a better discriminating item
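Crossing ICCs are easy to demonstrate. The sketch below uses the common 2PL form Pr = 1 / (1 + exp[−a(θ − b)]), with a as discrimination and b as difficulty. These symbols belong to this sketch and need not match the slides' Mata variable names. The two items are hypothetical: item A is easier but flatter, and item B is harder but steeper. Their curves cross where a_A(θ − b_A) = a_B(θ − b_B).

```python
import math

def icc_2pl(theta: float, a: float, b: float) -> float:
    """2PL: Pr(correct) = 1 / (1 + exp(-a * (theta - b)))."""
    return 1 / (1 + math.exp(-a * (theta - b)))

# Two hypothetical items: A is easier but less discriminating than B.
a_A, b_A = 0.8, -0.5
a_B, b_B = 2.0, 0.5

# The curves cross where the two logits are equal.
theta_star = (a_A * b_A - a_B * b_B) / (a_A - a_B)
print(f"curves cross at theta = {theta_star:.3f}")

# Below the crossing the flatter item A is more likely answered correctly; above it, item B.
for theta in (theta_star - 1, theta_star + 1):
    print(round(theta, 3), round(icc_2pl(theta, a_A, b_A), 3), round(icc_2pl(theta, a_B, b_B), 3))
```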

IRT: Extensions

- 2-parameter logistic (2PL) model

IRT: Extensions

- 2-parameter logistic (2PL) model
- Stata (estimation)

```stata
eq id: i1 i2 i3 i4 i5
cons 1 [id1_1]i1 = 1
gllamm y i1-i5, i(id) weight(wt) nocons family(binom) link(logit) frload(1) eqs(id) cons(1) adapt
matrix list e(b)
* normalized solutions
* (1) standard 2PL
mata
bALL = st_matrix("e(b)")
b = bALL[1,1..5]
c = bALL[1,6..10]
a = -b:/c
("MML Estimates-Dif", "IRT Parameters")
(b', a')
("MML Discrimination Parameters")
(c')
end
```

IRT: Extensions

- 2-parameter logistic (2PL) model
- Stata (estimation)

```stata
* Bock and Aitkin Normalization (p. 164 corrected)
mata
bALL = st_matrix("e(b)")
b = -bALL[1,1..5]
c = bALL[1,6..10]
lc = ln(c)
mb = mean(b')
mc = mean(lc')
bs = b:-mb
cs = exp(lc:-mc)
("B-A Normalization DIFFICULTY", "B-A Normalization DISCRIMINATION")
(bs', cs')
end
```

IRT: 2PL (2) Bock-Aitkin Normalization

item 3 has the highest difficulty and the greatest discrimination

Binary Response Models for Event Occurrence

- discrete-time event-history models
- purpose:
- model the probability of an event occurring at some point in time
- Pr(event at t | event has not yet occurred by t)
- life table
- events & trials
- observe the number of events occurring to those who remain at risk as time passes
- takes account of the changing composition of the sample as time passes

Life Table

- observe
- $R_j$: number at risk in time interval j ($R_0 = n$), where the number at risk in interval j is adjusted over time
- $D_j$: events in time interval j ($D_0 = 0$)
- $W_j$: removed from risk (censored) in time interval j ($W_0 = 0$), i.e., removed from risk due to other unrelated causes

Life Table

- other key quantities
- discrete-time hazard (event probability in interval j): $h_j = D_j / R_j$
- surviving fraction (survivor function in interval j): $S_j = \prod_{k \le j} (1 - h_k)$
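Below is a minimal life-table sketch with hypothetical counts. It assumes that cases censored in interval j count as at risk for the whole interval, with no actuarial half-interval adjustment. Events and censored cases then leave the risk set before the next interval, so R_{j+1} = R_j − D_j − W_j. Starting from R_0 = n with D_0 = W_0 = 0 gives R_1 = n.

```python
# Hypothetical life table: (events D_j, censored W_j) in intervals j = 1..4
intervals = [(10, 5), (8, 4), (6, 6), (3, 2)]
n = 100  # R_0 = n and D_0 = W_0 = 0, so R_1 = n

at_risk = n
survival = 1.0
for j, (d, w) in enumerate(intervals, start=1):
    hazard = d / at_risk              # h_j = D_j / R_j
    survival *= 1 - hazard            # S_j = product of (1 - h_k), k <= j
    print(f"j={j}: R={at_risk:3d} D={d:2d} W={w:2d} h={hazard:.4f} S={survival:.4f}")
    at_risk -= d + w                  # remove events and censored cases
```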

Discrete-Time Hazard Models

- statistical concepts
- discrete random variable $T_i$ (individual’s event or censoring time)
- pdf of T (probability that individual i experiences event in period j): $f_{ij} = \Pr(T_i = j)$
- cdf of T (probability that individual i experiences event in period j or earlier): $F_{ij} = \Pr(T_i \le j)$
- survivor function (probability that individual i survives past period j): $S_{ij} = \Pr(T_i > j) = 1 - F_{ij}$

Discrete-Time Hazard Models

- statistical concepts
- discrete hazard: $h_{ij} = \Pr(T_i = j \mid T_i \ge j) = f_{ij} / S_{i,j-1}$
- the conditional probability of event occurrence in interval j for individual i given that an event has not already occurred to that individual by interval j

Discrete-Time Hazard Models

- equivalent expression using binary data
- binary data: $d_{ij} = 1$ if individual i experiences an event in interval j, 0 otherwise
- use the sequence of binary values at each interval to form a history of the process for individual i up to the time the event occurs
- discrete hazard: $h_{ij} = \Pr(d_{ij} = 1 \mid d_{i1} = \cdots = d_{i,j-1} = 0)$

Discrete-Time Hazard Models

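- modeling (logit link): $\operatorname{logit}(h_{ij}) = \alpha_j + x_{ij}\beta$
- modeling (complementary log-log link): $\log[-\log(1 - h_{ij})] = \alpha_j + x_{ij}\beta$
- non-proportional effects: allow the effect of x to vary with j (e.g., interact x with the period dummies)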

Data Structure

- person-level data → person-period form

Data Structure

- binary sequences
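The conversion from person-level records to person-period binary sequences can be sketched as follows. The records are hypothetical (id, time, event) triples, where event = 1 means the event occurred in period `time` and event = 0 means the person was censored then. Each person contributes one row per period at risk. The indicator d is 1 only in the final period, and only when the event occurred.

```python
# Hypothetical person-level records: (id, time, event)
people = [(1, 3, 1), (2, 2, 0), (3, 1, 1)]

def to_person_period(records):
    """Expand each person into one row per period at risk: (id, period, d)."""
    rows = []
    for pid, time, event in records:
        for j in range(1, time + 1):
            d = 1 if (j == time and event == 1) else 0
            rows.append((pid, j, d))
    return rows

for row in to_person_period(people):
    print(row)
```

Period dummies such as the I1–I5 used in the Ph.D. example below would then be built from the period column.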

Estimation

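- contributions to likelihood
- contribution to log L for individual with event in period j: $\log h_{ij} + \sum_{k<j} \log(1 - h_{ik})$
- contribution to log L for individual censored in period j: $\sum_{k \le j} \log(1 - h_{ik})$
- combine: $\log L = \sum_i \sum_{k \le j_i} \left[ d_{ik} \log h_{ik} + (1 - d_{ik}) \log(1 - h_{ik}) \right]$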
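The combined form is a sum of Bernoulli log-likelihood terms over person-period records. This is why an ordinary logit (or cloglog) fitted to person-period data yields the discrete-time hazard MLEs. The sketch below checks that the person-level contributions equal the binary-sequence contributions, for hypothetical baseline hazards and (time, event) pairs.

```python
import math

# Hypothetical discrete hazards by period, h_1..h_4 (a baseline-only model).
h = [0.10, 0.15, 0.20, 0.25]

def ll_person(time: int, event: int) -> float:
    """Event in period j: log h_j + sum_{k<j} log(1 - h_k);
    censored in period j: sum_{k<=j} log(1 - h_k)."""
    survived = sum(math.log(1 - h[k]) for k in range(time - 1))
    last = math.log(h[time - 1]) if event else math.log(1 - h[time - 1])
    return survived + last

def ll_binary(time: int, event: int) -> float:
    """Same contribution from the person-period binary sequence d_i1, ..., d_ij."""
    total = 0.0
    for j in range(1, time + 1):
        d = 1 if (j == time and event) else 0
        total += d * math.log(h[j - 1]) + (1 - d) * math.log(1 - h[j - 1])
    return total

people = [(3, 1), (2, 0), (4, 1), (1, 1)]  # hypothetical (time, event) pairs
print(sum(ll_person(t, e) for t, e in people), sum(ll_binary(t, e) for t, e in people))
```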

Example:

- dropping out of Ph.D. programs (large US university)
- data: 6,964 individual histories spanning 20 years
- dropout cannot be distinguished from other types of leaving (transfer to other program etc.)
- model the logit hazard of leaving the originally-entered program as a function of the following:
- time in program (the time-dependent baseline hazard)
- female and percent female in program
- race/ethnicity (black, Hispanic, Asian)
- marital status
- GRE score
- also add a program-specific random effect (multilevel)

Example:

```stata
clear
set memory 512m   // needed only in Stata 11 and earlier; ignored by later versions
infile CID devnt I1-I5 female pctfem black hisp asian married gre using DT28432.dat
logit devnt I1-I5, nocons or
est store m1
logit devnt I1-I5 female pctfem, nocons or
est store m2
logit devnt I1-I5 female pctfem black hisp asian , nocons or
est store m3
logit devnt I1-I5 female pctfem black hisp asian married, nocons or
est store m4
logit devnt I1-I5 female pctfem black hisp asian married gre , nocons or
```
