
CIQLE Workshop: Introduction to longitudinal data analysis with stata: panel models and event history analysis

Silke Aisenbrey, Yale University

Goals for the workshop:

-Intro to stata

-Modeling Change over time:

Panel Regression Models

(fixed, between and random)

-Modeling whether and/or when events occur:

Event History Analysis

(Data management for event history data,

kaplan-meier, cox, piecewise constant)

open stata:

VARIABLES window: variables of the open file

RESULTS window: results and syntax

REVIEW window

COMMAND window

data browser: to see real data

data editor: to make changes directly in data (erase variables, cases, make single changes in cases)

basic descriptive commands

• relational and logical operators in stata:

== is equal to

~= is not equal (also !=)

> greater than

< less than

>= greater than or equal

<= less than or equal

& and

| or

~ not (also !)

basic descriptive commands

• sum var
• tab var1 var2
• tab var1 var2, col
• combine with: …… if var1==2 & var3>0
• by var1: ……………
• sort …………
• exercise:
• e.g.:
• tab abitur sex, col
• tab abitur sex if cohort==1930, col
• sort cohort
• by cohort: tab abitur sex
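The commands above can also be tried without the workshop file. A minimal sketch, assuming the example dataset auto.dta that ships with Stata; the variables price, mpg, rep78 and foreign belong to that dataset, not to the workshop data:

```stata
* load the example dataset shipped with Stata (not the workshop data)
sysuse auto, clear

* summary statistics for one variable
summarize price

* two-way table with column percentages
tabulate rep78 foreign, column

* restrict a command with if, combining conditions with & (and)
tabulate rep78 foreign if price > 5000 & mpg >= 20, column

* repeat a command within groups: the data must be sorted first
sort foreign
by foreign: summarize mpg
```

sort followed by by can be shortened to bysort foreign: summarize mpg.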

basic commands for data management

help “command”

gen var1 = var2

recode var1 (0=.) (1/8=2) (9=3)

rename var1 var100

**use the following variables:

cohort (indicator of cohort membership)

sex (1=male, 2=female)

agemaryc (age @ first marriage)

exercise:

e.g.:

sum agemaryc

recode age @ married in groups

-generate a new variable

-recode new variable into groups

-recode if marcens==0
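One way the exercise could look, as a sketch only. The five cases below are made up for illustration, the cut points of the age groups are arbitrary, and using code 0 for cases with marcens==0 is an assumption about what the exercise intends:

```stata
* made-up toy cases using the workshop's variable names
clear
input caseid agemaryc marcens
1 19 1
2 23 1
3 27 1
4 34 1
5 40 0
end

* look at the distribution first
summarize agemaryc

* generate a new variable so the original stays untouched
generate agemargrp = agemaryc

* recode the copy into groups (cut points chosen arbitrarily)
recode agemargrp (min/20=1) (21/25=2) (26/30=3) (31/max=4)

* give cases with marcens==0 their own code (assumed here: 0)
recode agemargrp (1/4=0) if marcens==0

tabulate agemargrp
```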

Intro to panel regression with stata:

-panel data

-fixed effects

-between effects

-random effects

-fixed or random?

Panel data:

Panel data, also called cross-sectional time series data, are data where multiple cases (people, firms, countries etc) were observed at two or more time periods.

Cross-sectional data: only information about variance between subjects

Panel data: two kinds of information: between and within subjects

--> two sources of variance

cross sectional vs. panel analyses

open panelex1.dta

ignore the fact that we have repeated measures:

regress income childrn

conclusion: more children --> higher income

Fixed effects model

Answers the question: What is the effect of x when x changes within persons over time? E.g.:

Person A has two children at the first point in time and three children at the second. What effect does this change have on income?

Information used: fixed effects estimates use the time-series information in the data

Variance analyzed: within

Problem: only time variant variables can be included

Fixed effects exercise: separate regression for each unit and then average it:

regress income childrn if id==1

regress income childrn if id==2

average of the two slopes:

(b1 + b2) / 2 = -2.5

where b1 and b2 are the childrn coefficients from the two regressions above

conclusion: more children --> lower income

exercise: generate dummy variable for person and regress with dummy variable

tab id, g(iddum)

reg income childrn iddum1 iddum2

Fixed effects

-define data set as panel data

tsset id t

-regression with fixed effects command

xtreg income childrn, fe
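The three routes to the fixed effects estimate can be compared on a tiny example. A sketch with made-up numbers (this is not panelex1.dta): two persons, two time points each, with values chosen so that the per-person slopes are -2 and -3.

```stata
* made-up toy panel, not the workshop file
clear
input id t childrn income
1 1 1 10
1 2 2 8
2 1 3 20
2 2 4 17
end

* pooled regression ignoring the repeated measures: positive slope
regress income childrn

* one regression per person: slopes of -2 and -3, average -2.5
regress income childrn if id==1
regress income childrn if id==2

* person dummies; only one of the two is needed, the other is the reference
tabulate id, generate(iddum)
regress income childrn iddum2

* declare the panel structure and use the fixed effects estimator
xtset id t
xtreg income childrn, fe
```

In this balanced toy example the dummy variable regression and xtreg, fe both give a childrn coefficient of -2.5, the same as the average of the two separate slopes. xtset is the newer name for declaring panel data; tsset id t works as well.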

Between effects model

Answers the question: What is the effect of x when x differs between persons? Person A has “on the average” three children and Person B has “on the average” five children. What effect does this difference have on their income?

In the between effects model we model the mean response, where the means are calculated for each of the units.

Information used: cross-sectional information (between subjects)

Variance analyzed: between variance

Variables: time variant and time invariant

Between effects

average the data for each person --->

regress income childrn

conclusion: more children --> more income

define data as panel data

xtreg dependent independent, be

Random effects model:

Assumption: no difference between the answers to the two questions:

1) What is the effect of x when x changes within the person? Person A has two children at the first point in time and three children at the second. What effect does this change have on their income?

2) What is the effect of x when x differs between persons? Person A has two children and Person B has three children. What effect does this difference have on their income?

Information used: panel and cross-sectional (between and within subjects)

Variance analyzed: between variance and within variance

Variables: time variant and time invariant

Random effects model:

-matrix-weighted average of the fixed and the between estimates

-assumes b1 has the same effect in the cross section as in the time-series

-requires that the individual error terms are treated as random variables and follow the normal distribution

use:

xtreg dependent independent if var==x, re
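The relation between the three estimators can be written out in standard textbook notation. The symbols below (u_i for the person-specific term, theta for the weight) are assumptions for illustration and do not appear in the slides:

```latex
\begin{align*}
y_{it} &= \alpha + \beta x_{it} + u_i + \varepsilon_{it}
  && \text{(panel model)} \\
y_{it} - \bar{y}_i &= \beta_{w}\,(x_{it} - \bar{x}_i) + (\varepsilon_{it} - \bar{\varepsilon}_i)
  && \text{(fixed effects: within variance)} \\
\bar{y}_i &= \alpha + \beta_{b}\,\bar{x}_i + u_i + \bar{\varepsilon}_i
  && \text{(between effects: between variance)} \\
y_{it} - \theta\bar{y}_i &= (1-\theta)\,\alpha + \beta\,(x_{it} - \theta\bar{x}_i)
  + \bigl[(1-\theta)\,u_i + (\varepsilon_{it} - \theta\bar{\varepsilon}_i)\bigr]
  && \text{(random effects, } 0 \le \theta \le 1\text{)}
\end{align*}
```

With theta = 1 the random effects equation reduces to the fixed effects equation, and with theta = 0 it reduces to a pooled regression. Random effects only gives a meaningful single coefficient if the within and between effects are the same.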

tell stata the structure of the data:

tsset X Y

X= caseid

Y=time/wave

summary statistics:

xtdes

xtsum

use the effects

xtreg dependent independent if sex==1, fe

xtreg dependent independent if sex==1, be

xtreg dependent independent if sex==1, re

exercise: compare/discuss models

e.g.: xtreg indvar1 indvar2 … if sex==1, fe

try to include time invariant variables

try to make theoretical/empirical argument why you use which model
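A sketch of the whole sequence on a public example file, assuming the nlswork dataset from the Stata website (needs an internet connection). The variables idcode, year, ln_wage, age and tenure belong to that dataset, not to the workshop data:

```stata
* example panel from the Stata website (not the workshop data)
webuse nlswork, clear

* tell stata the structure: case id first, then time
xtset idcode year

* pattern of the panel, and between/within variation of the variables
xtdescribe
xtsum ln_wage age tenure

* the same model with the three estimators
xtreg ln_wage age tenure, fe
xtreg ln_wage age tenure, be
xtreg ln_wage age tenure, re
```

Comparing the age and tenure coefficients across the three outputs shows how much the within and between answers differ in these data.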

Problems/Tests/Solutions:

What’s the right model: fixed or random effects?

Test: Hausman Test

Null hypothesis:

Coefficients estimated by the efficient random effects estimator are the same as those estimated by the consistent fixed effects estimator.

If same (insignificant P-value, Prob>chi2 larger than .05) --> safe to use random effects.

If significant P-value --> use fixed effects.

xtreg y x1 x2 x3 ... , fe

estimates store fixed

xtreg y x1 x2 x3 ... , re

estimates store random

hausman fixed random
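The same steps as a runnable sketch, again assuming the nlswork example dataset from the Stata website rather than the workshop data:

```stata
* example panel from the Stata website (not the workshop data)
webuse nlswork, clear
xtset idcode year

* consistent estimator: fixed effects
xtreg ln_wage age tenure, fe
estimates store fixed

* efficient estimator: random effects
xtreg ln_wage age tenure, re
estimates store random

* order matters: consistent model first, efficient model second
hausman fixed random
```

Only coefficients that appear in both models enter the test, so time invariant variables do not contribute to it.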

Problems/Tests/Solutions:

Autocorrelation?

What is autocorrelation:

Last time period’s values affect current values

test: xtserial

Install user-written program, type

findit xtserial or net search xtserial

xtserial depvar indepvars

Solution: use model correcting for autocorrelation
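One possible model of this kind is xtregar, which fits fixed or random effects models with first-order autoregressive errors. Whether this is the model the workshop had in mind is an assumption. A sketch using the grunfeld example dataset from the Stata website (variables company, year, invest, mvalue, kstock):

```stata
* example panel from the Stata website (not the workshop data)
webuse grunfeld, clear
xtset company year

* test for serial correlation; user-written, so it is commented out here
* until it has been installed (findit xtserial)
* xtserial invest mvalue kstock

* fixed effects model with AR(1) disturbances
xtregar invest mvalue kstock, fe
```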

different data structure

panel

-waves

-number of children @ wave1 / 2/ 3/ 4

-employed @ wave1 / 2/ 3/ 4

-income @ wave1 / 2/ 3/ 4

regression models: dependent variable continuous

event

-dates of events

-birth of first child @ 1963

-birth of second child @ 1966…

-start of first employment @…

-start of unemployment @…

-start of second employment @…

time information in event data more precise: dependent variable event happens 0/1

Types of censoring
• Subject does not experience event of interest
• Incomplete follow-up
• Lost to follow-up
• Withdraws from study
• Left or right censored

tell stata that our data is “survival data”

• stset

stset X, failure(Y) id(Z)

X= time at which event happens or right censored, this is always needed

Y= 0 or missing means censored, all other values are interpreted as representing an event taking place/ failure

• Z= id
• three examples:
• stset ageendsch
• event: end of school
• time: age @ end of school
• stset agemaryc, failure (marcens) id (caseid)

event: marriage

• stset agestjob, failure (stjob) id (caseid)

event: first job
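What stset does can be inspected on a few cases. A sketch with made-up values for the workshop's marriage variables; case 5 is assumed to be unmarried at age 40 and therefore censored:

```stata
* made-up toy cases using the workshop's variable names
clear
input caseid agemaryc marcens
1 19 1
2 23 1
3 27 1
4 34 1
5 40 0
end

* time = age at marriage (or at censoring), event indicator = marcens
stset agemaryc, failure(marcens) id(caseid)

* stset adds _t0 (start), _t (end), _d (event 0/1) and _st (record is used)
list caseid agemaryc marcens _t0 _t _d _st

* overview of the survival-time data
stdescribe
```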

Survivor function, S(t) defines the probability of surviving longer than time t

Survivor and hazard functions can be converted into each other

Hazard (instantaneous hazard, force of mortality), is the risk that an event will occur during a time interval (Δ(t)) at time t, given that the subject did not experience the event before that time
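For a continuous event time T, the two functions and the conversion between them can be written as follows. The notation (T, f, H) is standard textbook notation and an addition to the slides:

```latex
\begin{align*}
S(t) &= P(T > t) \\
h(t) &= \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}
      = \frac{f(t)}{S(t)} = -\frac{d}{dt}\,\ln S(t) \\
H(t) &= \int_0^t h(u)\,du \\
S(t) &= \exp\bigl(-H(t)\bigr)
\end{align*}
```

Here f(t) is the density of T and H(t) is the cumulative hazard, the quantity estimated by the Nelson-Aalen estimator.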

survivor function and hazard function

non-parametric: kaplan-meier

List the Kaplan-Meier survivor function

. sts list

. sts list, by(sex) compare

Graph the Kaplan-Meier survivor function

. sts graph

. sts graph, by(sex)
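A sketch of these commands on the example dataset cancer.dta that ships with Stata (variables studytime, died and drug), so it runs without the workshop data:

```stata
* example survival data shipped with Stata (not the workshop data)
sysuse cancer, clear

* time = months until death or end of study, event = died
stset studytime, failure(died)

* Kaplan-Meier survivor function, overall and by group
sts list
sts list, by(drug) compare

* the same as graphs
sts graph
sts graph, by(drug)
```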

non-parametric: kaplan-meier

exercise:

stset your data for marriage, endschool or first job

e.g.:

1) sts list

2) sts graph

3) sts list, by (…) compare

4) sts graph, by (..)

non-parametric: Nelson-Aalen

List the Nelson-Aalen cumulative hazard function

. sts list, na

. sts list, na by(sex) compare

Graph the Nelson-Aalen cumulative hazard function

. sts graph, na

. sts graph, na by(sex)

non-parametric: Nelson-Aalen

exercise:

stset your data for marriage, endschool or first job

1) sts list, na

2) sts graph, na

3) sts list, na by (…) compare

4) sts graph, na by (..)

non-parametric: kaplan-meier

• Comparing Kaplan-Meier curves
• Log-rank test can be used to compare survival curves

Hypothesis test (test of significance)

• H0: the curves are statistically the same
• H1: the curves are statistically different

Compares observed to expected cell counts

for age@marr:

non-parametric: kaplan-meier

Comparing Kaplan-Meier curves

exercise:

Test equality of survivor functions

e.g.: sts test abitur
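The test can be tried on the example dataset cancer.dta shipped with Stata. A sketch, with drug as the grouping variable of that dataset:

```stata
* example survival data shipped with Stata (not the workshop data)
sysuse cancer, clear
stset studytime, failure(died)

* log-rank test of equality of survivor functions (the default)
sts test drug

* alternative weighting of early and late events
sts test drug, wilcoxon
```

A small p-value rejects H0 that the survivor functions of the groups are the same.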

What happens when you have several covariates that you believe contribute to survival?

Example

Education, marital status, children, gender contribute to job change

Can use K-M curves – for 2 or maybe 3 covariates

Need another approach – multivariate Cox proportional hazards model is most common -- for many covariates

Limit of Kaplan-Meier curves

non-parametric: kaplan-meier

Without knowing the baseline hazard h0(t), we can still calculate coefficients for each covariate, and therefore hazard ratios

Assumes multiplicative risk

--> proportional hazard assumption

Cox proportional hazards model
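Written out, the model and the hazard ratio look as follows. The covariate and coefficient symbols are standard notation added for illustration:

```latex
\begin{align*}
h(t \mid x_1, \dots, x_k) &= h_0(t)\,\exp(\beta_1 x_1 + \dots + \beta_k x_k) \\
\frac{h(t \mid x_j = a + 1)}{h(t \mid x_j = a)} &= \exp(\beta_j)
  \qquad \text{(hazard ratio, other covariates held fixed)}
\end{align*}
```

The baseline hazard h_0(t) cancels in the ratio, which is why the hazard ratio can be estimated without specifying h_0(t) and why it does not depend on t under the proportional hazard assumption.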

semi-parametric models: cox

semi-parametric models: cox

example: age at first marriage

stcox sex

Interpretation:

because the cox model does not estimate a baseline, there is no intercept in the output.

sex (male=1) (female=2)

whatever the hazard rate at a particular time is for men, it is 1.5 times as high for women

what does this mean in our case?

women get married younger than men do.

An estimated hazard rate ratio greater than 1 indicates the covariate is associated with an increased hazard of experiencing the event of interest

An estimated hazard rate ratio less than 1 indicates the covariate is associated with a decreased hazard of experiencing the event of interest

Estimated hazard rate ratio of 1 indicates no association between covariate and hazard.
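Hazard ratios and coefficients can be compared in the output. A sketch on the example dataset cancer.dta shipped with Stata; the 0/1 variable treated is made up here and assumes that drug==1 is the placebo group, as the dataset's labels indicate:

```stata
* example survival data shipped with Stata (not the workshop data)
sysuse cancer, clear
stset studytime, failure(died)

* hypothetical 0/1 covariate: 0 = placebo (drug==1), 1 = any active drug
generate byte treated = (drug > 1)

* default output: hazard ratios (no intercept, no baseline)
stcox treated age

* redisplay the same model as coefficients; hazard ratio = exp(coefficient)
stcox, nohr
```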

Interpretation of the regression coefficients

semi-parametric models: cox

Graphically: estimates for functions:

stcox sex, basehc (H0)

stcurve, hazard at1(sex=1) at2(sex=2)

stcox sex, basesurv (S0)

stcurve, surviv at1(sex=1) at2(sex=2)
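In Stata 11 and later the baseline functions are obtained with predict after stcox rather than with stcox options. A sketch under that assumption, using the example dataset cancer.dta and a made-up 0/1 covariate treated:

```stata
* example survival data shipped with Stata (not the workshop data)
sysuse cancer, clear
stset studytime, failure(died)

* hypothetical 0/1 covariate: 0 = placebo (drug==1), 1 = any active drug
generate byte treated = (drug > 1)

stcox treated

* baseline functions refer to cases with all covariates equal to 0
predict H0, basechazard
predict S0, basesurv

* estimated curves for chosen covariate values
stcurve, survival at1(treated=0) at2(treated=1)
stcurve, hazard at1(treated=0) at2(treated=1)
```

The values given in at1() and at2() must be values the covariate actually takes.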

exercise:

and estimate the hazard and survival

Proportional hazards assumption: the effect of each covariate does not depend on time, so hazard ratios are constant over time

Three general ways to examine model adequacy

Graphically: Do survival curves intersect?

Mathematically: Schoenfeld test

Computationally: Time-dependent variables (extended model)

compare with Kaplan-Meier:

stcoxkm, by (sex)

exercise: do this with one of your estimates

"log-log" plots

stphplot, by (sex)

exercise: do this with one of your estimates, stphplot can be adjusted

--> look in stphplot help

Mathematically: Schoenfeld Test

tests whether the log hazard ratio is constant over time; a rejection of the null hypothesis therefore indicates a deviation from the proportional hazard assumption

stcox sex, schoenfeld(sch*) scaledsch(sca*)

estat phtest (with more than one variable: estat phtest, detail)
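The graphical and the test-based checks can be run together. A sketch on the example dataset cancer.dta with a made-up 0/1 covariate treated; it assumes Stata 11 or later, where estat phtest computes the Schoenfeld residuals itself:

```stata
* example survival data shipped with Stata (not the workshop data)
sysuse cancer, clear
stset studytime, failure(died)

* hypothetical 0/1 covariate: 0 = placebo (drug==1), 1 = any active drug
generate byte treated = (drug > 1)

* graphically: Kaplan-Meier curves against Cox predicted curves
stcoxkm, by(treated)

* graphically: log-log plot, roughly parallel lines support the assumption
stphplot, by(treated)

* test based on Schoenfeld residuals, one row per covariate plus a global test
stcox treated age
estat phtest, detail
```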

exercise: do this with your model, try to find a model which fits

Handles censored data well

Survival and hazard can be mathematically converted to each other

Kaplan-Meier survival curves can be compared graphically

Cox proportional hazards models help distinguish individual contributions of covariates to survival, provided certain assumptions are met.

Summary
The proportional hazards model as shown only works when the time to event data is relatively simple

Complications

non proportional hazard rates

time dependent covariates

competing risks

multiple failures

non-absorbing events

etc.

Extensive literature for these situations and software is available to handle them.

It can get a lot more complicated than this
Semi-parametric models: Piecewise constant

-transition rate assumed to be not constant over observed time

-splits data in user defined time pieces,

-transition rates constant in each “time piece”

-but: transition rates change between time pieces

Semi-parametric models: piecewise constant

in STATA a user written command, an “ado file” by J. Sorensen: stpiece

net search stpiece

install file

stpiece abitur, tp(20 30 40) tv(sex)

tp: time pieces, intervals

tv: covariates whose influence might vary over time pieces
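The idea behind a piecewise constant model can also be sketched with built-in commands: split the episodes at the chosen time points and fit an exponential model with one dummy per time piece. This is a swapped-in alternative for illustration, not the stpiece command itself. It uses the example dataset cancer.dta; the cut points 10 and 20 and the covariate treated are made up:

```stata
* example survival data shipped with Stata (not the workshop data)
sysuse cancer, clear

* stsplit needs an id variable in stset
generate id = _n
stset studytime, failure(died) id(id)

* hypothetical 0/1 covariate: 0 = placebo (drug==1), 1 = any active drug
generate byte treated = (drug > 1)

* split each record at 10 and 20 months; period takes the values 0, 10, 20
stsplit period, at(10 20)

* constant rate within each piece, different rates between pieces
streg i.period treated, distribution(exponential)
```

The period dummies play the role of tp() in stpiece. Interacting a covariate with the period dummies would let its effect vary over time pieces, similar to tv().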