CIQLE Workshop: Introduction to longitudinal data analysis with Stata: panel models and event history analysis
Silke Aisenbrey, Yale University

Goals for the workshop:
- Intro to Stata
- Modeling change over time: panel regression models (fixed, between and random effects)
- Modeling whether and/or when events occur: event history analysis (data management for event history data, Kaplan-Meier, Cox, piecewise constant)

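Open Stata. The main windows are:
- VARIABLES: variables of the open file
- RESULTS: results and syntax
- REVIEW: history of syntax, whether issued as commands or through the menu
- COMMAND: where commands are typed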

Open data with the menu (Stata data --> eventex.dta), or type: use eventex.dta

To see the actual data and make changes directly in it (delete variables or cases, change single values in a case), open the Data Editor: type edit (browse opens it for viewing only).

Basic descriptive commands: relational and logical operators in Stata
• == is equal to
• ~= is not equal to (also !=)
• > greater than
• < less than
• >= greater than or equal to
• <= less than or equal to
• & and
• | or
• ~ not (also !)

Basic descriptive commands
• sum var
• tab var1 var2
• tab var1 var2, col (adds column percentages)
• combine with: … if var1==2 & var3>0
• by var1: … (the data must first be sorted by var1; bysort var1: sorts and runs in one step)
• sort …
Exercise, e.g.:
• tab abitur sex, col
• tab abitur sex if cohort==1930, col
• sort cohort
• by cohort: tab abitur sex
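To make concrete what the col option reports, here is a minimal Python sketch (not Stata code) of column percentages in a two-way table: each cell count is divided by its column total, so every column sums to 100. The values are hypothetical, not taken from eventex.dta, and the if qualifier is mimicked by filtering rows before tabulating.

```python
from collections import Counter


def col_percentages(row_var, col_var):
    """Column percentages of a two-way table, like `tab row col, col`."""
    cell_counts = Counter(zip(row_var, col_var))
    col_totals = Counter(col_var)
    return {(r, c): 100 * n / col_totals[c] for (r, c), n in cell_counts.items()}


# Hypothetical records: abitur (0/1), sex (1=male, 2=female), cohort
abitur = [1, 1, 0, 1, 0, 0]
sex = [1, 1, 1, 2, 2, 2]
cohort = [1930, 1930, 1950, 1930, 1950, 1930]

# Analogue of: tab abitur sex, col
table = col_percentages(abitur, sex)

# Analogue of: tab abitur sex if cohort==1930, col  (filter rows first)
keep = [i for i, c in enumerate(cohort) if c == 1930]
table_1930 = col_percentages([abitur[i] for i in keep], [sex[i] for i in keep])
print(table_1930)  # {(1, 1): 100.0, (1, 2): 50.0, (0, 2): 50.0}
```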

Basic commands for data management
• help command
• gen var1 = var2
• recode var1 (0=.) (1/8=2) (9=3)
• rename var1 var100
Use the following variables:
• cohort (indicator of cohort membership)
• sex (1=male, 2=female)
• agemaryc (age @ first marriage)
Exercise, e.g.:
• sum agemaryc
• recode age @ marriage into groups:
- generate a new variable
- recode the new variable into groups
- recode if marcens==0 (no marriage observed, i.e. censored cases)

possible break

Intro to panel regression with Stata:
- panel data
- fixed effects
- between effects
- random effects
- fixed or random?

panel data (panelex1.dta)

Panel data: Panel data, also called cross-sectional time-series data, are data in which multiple cases (people, firms, countries, etc.) are observed at two or more time periods.
• Cross-sectional data: only information about variance between subjects
• Panel data: two kinds of information, between and within subjects --> two sources of variance

Janet: Basics of panel regression models

Cross-sectional vs. panel analyses
open panelex1.dta
ignore the fact that we have repeated measures: regress income childrn
conclusion: more children --> higher income
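The pooled regression can be reproduced by hand on a tiny hypothetical panel (made-up numbers, not panelex1.dta: two persons, two waves each), chosen so that the pooled slope is positive even though income falls within each person when childrn rises. This Python sketch computes only the OLS slope, not Stata's full output.

```python
def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x with an intercept."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


# Hypothetical toy panel: rows 1-2 are person 1, rows 3-4 are person 2
childrn = [1, 2, 3, 4]
income = [10, 9, 20, 17]

# Pooled regression: treats the four rows as independent observations
pooled = ols_slope(childrn, income)
print(round(pooled, 2))  # 3.2
```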

Fixed effects model
Answers the question: What is the effect of x when x changes within persons over time? E.g., Person A has two children at the first point in time and three children at the second; what effect does this change have on income?
Information used: fixed effects estimates use the time-series information in the data
Variance analyzed: within
Problems: only time-variant variables can be estimated (time-invariant variables are absorbed by the person-specific fixed effects)

Fixed effects exercise: run a separate regression for each unit and then average the coefficients:
regress income childrn if id==1
regress income childrn if id==2

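$(\hat{\beta}_{1} + \hat{\beta}_{2}) / 2 = -2.5$, where $\hat{\beta}_{i}$ is the childrn coefficient from the regression for id==i
conclusion: more children --> lower income
exercise: generate a dummy variable for each person and regress with the dummy variables:
tab id, gen(iddum)
reg income childrn iddum1 iddum2
(Stata omits one of the dummies, because together with the constant they are collinear)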
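A Python sketch of the "separate regression per person, then average" exercise on a hypothetical two-person panel (not panelex1.dta). It also computes a weighted average: the within (fixed effects) slope equals each person's slope weighted by that person's within sum of squares in childrn, so the simple average coincides with it only when, as in this toy data, both persons have the same spread in childrn.

```python
def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x with an intercept."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


# Hypothetical toy panel: 2 persons x 2 waves
person = [1, 1, 2, 2]
childrn = [1, 2, 3, 4]
income = [10, 9, 20, 17]

slopes, weights = [], []
for pid in sorted(set(person)):
    # analogue of: regress income childrn if id==pid
    x = [c for p, c in zip(person, childrn) if p == pid]
    y = [i for p, i in zip(person, income) if p == pid]
    slopes.append(ols_slope(x, y))
    mean_x = sum(x) / len(x)
    weights.append(sum((a - mean_x) ** 2 for a in x))  # within variation of x

simple_average = sum(slopes) / len(slopes)
weighted_average = sum(w * b for w, b in zip(weights, slopes)) / sum(weights)
print(slopes, simple_average, weighted_average)  # [-1.0, -3.0] -2.0 -2.0
```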

Fixed effects
- define the data set as panel data: tsset id t
- regression with the fixed effects command: xtreg income childrn, fe
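The slope from xtreg, fe can be sketched as the within transformation: subtract each person's mean from income and from childrn, then regress the deviations. The Python sketch below uses a hypothetical toy panel; it also shows why time-invariant variables cannot be estimated, since a variable that is constant within a person (the hypothetical female) demeans to zero. Standard errors and the constant Stata reports are not reproduced.

```python
def demean_by(ids, values):
    """Subtract each unit's mean from its own values."""
    sums, counts = {}, {}
    for i, v in zip(ids, values):
        sums[i] = sums.get(i, 0) + v
        counts[i] = counts.get(i, 0) + 1
    return [v - sums[i] / counts[i] for i, v in zip(ids, values)]


def within_slope(ids, x, y):
    """Fixed effects (within) slope: OLS on person-demeaned x and y."""
    x_dm = demean_by(ids, x)
    y_dm = demean_by(ids, y)
    # demeaned variables have mean zero, so no intercept is needed
    return sum(a * b for a, b in zip(x_dm, y_dm)) / sum(a * a for a in x_dm)


# Hypothetical toy panel: 2 persons x 2 waves
person = [1, 1, 2, 2]
childrn = [1, 2, 3, 4]
income = [10, 9, 20, 17]
female = [0, 0, 1, 1]  # time-invariant

print(within_slope(person, childrn, income))  # -2.0
print(demean_by(person, female))  # [0.0, 0.0, 0.0, 0.0]
```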

Between effects model
Answers the question: What is the effect of x when x differs between persons? Person A has "on average" three children and Person B has "on average" five children; what effect does this difference have on their income?
In the between effects model we model the mean response, where the means are calculated for each unit.
Information used: cross-sectional information (between subjects)
Variance analyzed: between variance
Time-variant and time-invariant variables

Between effects
average the data for each person ---> regress income childrn
conclusion: more children --> more income
define the data as panel data, then: xtreg dependent independent, be
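The between estimator amounts to collapsing the data to one mean per person and regressing those means. A Python sketch on a hypothetical toy panel (not panelex1.dta):

```python
def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x with an intercept."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def unit_means(ids, values):
    """Mean of `values` for each unit, in order of first appearance."""
    sums, counts = {}, {}
    for i, v in zip(ids, values):
        sums[i] = sums.get(i, 0) + v
        counts[i] = counts.get(i, 0) + 1
    return [sums[i] / counts[i] for i in sums]


# Hypothetical toy panel: 2 persons x 2 waves
person = [1, 1, 2, 2]
childrn = [1, 2, 3, 4]
income = [10, 9, 20, 17]

# Regress person-level mean income on person-level mean childrn
between = ols_slope(unit_means(person, childrn), unit_means(person, income))
print(between)  # 4.5
```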

Random effects model:
Assumption: there is no difference between the answers to these two questions:
1) What is the effect of x when x changes within the person? Person A has two children at the first point in time and three children at the second; what effect does this change have on their income?
2) What is the effect of x when x differs between persons? Person A has two children and Person B has three children; what effect does this difference have on their income?
Information used: panel and cross-sectional (between and within subjects)
Variance analyzed: between variance and within variance
Time-variant and time-invariant variables

Random effects model:
- a matrix-weighted average of the fixed and the between estimates
- assumes that x has the same effect (b1) in the cross section as in the time series
- requires that the individual-specific error terms be treated as random variables that follow a normal distribution
use: xtreg dependent independent if var==x, re
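One common way to see the "weighted average" idea is the quasi-demeaning form of the random effects GLS estimator for a balanced panel: subtract theta times each person's mean, with theta = 1 - sqrt(sigma_e^2 / (sigma_e^2 + T * sigma_u^2)). Then theta = 0 gives the pooled OLS slope and theta = 1 gives the within slope. The toy panel and the variance components below are hypothetical; Stata estimates sigma_u and sigma_e from the data.

```python
import math


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x with an intercept."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def group_means(ids, values):
    sums, counts = {}, {}
    for i, v in zip(ids, values):
        sums[i] = sums.get(i, 0) + v
        counts[i] = counts.get(i, 0) + 1
    return {i: sums[i] / counts[i] for i in sums}


def re_slope(ids, x, y, theta):
    """Slope after quasi-demeaning x and y by theta times the unit means.
    The intercept in ols_slope absorbs the (1 - theta) constant term."""
    mx, my = group_means(ids, x), group_means(ids, y)
    x_star = [xi - theta * mx[i] for i, xi in zip(ids, x)]
    y_star = [yi - theta * my[i] for i, yi in zip(ids, y)]
    return ols_slope(x_star, y_star)


# Hypothetical toy panel: 2 persons x 2 waves
person = [1, 1, 2, 2]
childrn = [1, 2, 3, 4]
income = [10, 9, 20, 17]

# Hypothetical variance components (not estimates from any workshop data)
sigma_u2, sigma_e2, T = 4.0, 1.0, 2
theta = 1 - math.sqrt(sigma_e2 / (sigma_e2 + T * sigma_u2))  # about 2/3

print(re_slope(person, childrn, income, 0.0))  # 3.2 (pooled OLS)
print(re_slope(person, childrn, income, 1.0))  # -2.0 (within)
print(re_slope(person, childrn, income, theta))  # random effects
```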

possible break

open data: panelex2.dta
varlist: [variable list not reproduced]

Tell Stata the structure of the data: tsset X Y
X = caseid, Y = time/wave
summary statistics: xtdes, xtsum
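xtsum splits a variable's variation into overall, between, and within parts, the two sources of variance in panel data. A Python sketch of that decomposition on hypothetical data, assuming (as we understand Stata's documentation) that the within part is the standard deviation of x_it - mean_i + grand mean, i.e. deviations from each person's mean with the grand mean added back:

```python
import statistics


def xtsum_like(ids, values):
    """Overall, between and within standard deviations of one variable."""
    grand_mean = statistics.mean(values)
    groups = {}
    for i, v in zip(ids, values):
        groups.setdefault(i, []).append(v)
    unit_mean = {i: statistics.mean(vs) for i, vs in groups.items()}

    overall_sd = statistics.stdev(values)
    between_sd = statistics.stdev(list(unit_mean.values()))
    within = [v - unit_mean[i] + grand_mean for i, v in zip(ids, values)]
    within_sd = statistics.stdev(within)
    return overall_sd, between_sd, within_sd


# Hypothetical toy panel: 2 persons x 2 waves
person = [1, 1, 2, 2]
income = [10, 9, 20, 17]
print(xtsum_like(person, income))
```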

Use the effects:
xtreg dependent independent if sex==1, fe
xtreg dependent independent if sex==1, be
xtreg dependent independent if sex==1, re
Exercise: compare/discuss the models, e.g.: xtreg depvar indvar1 indvar2 … if sex==1, fe
- try to include time-invariant variables
- try to make a theoretical/empirical argument for why you use which model

Problems/Tests/Solutions: What's the right model, fixed or random effects?
Test: Hausman test
Null hypothesis: the coefficients estimated by the efficient random effects estimator are the same as those estimated by the consistent fixed effects estimator.
If they do not differ significantly (Prob>chi2 larger than .05) --> it is safe to use random effects.
If the p-value is significant --> use fixed effects.
xtreg y x1 x2 x3 ..., fe
estimates store fixed
xtreg y x1 x2 x3 ..., re
estimates store random
hausman fixed random
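For a single coefficient the Hausman statistic reduces to H = (b_fe - b_re)^2 / (Var(b_fe) - Var(b_re)), compared with a chi-squared distribution with 1 degree of freedom. This Python sketch uses made-up estimates, not output from the workshop data; the hausman command handles the general multi-coefficient case with matrices. The name hausman_scalar is hypothetical.

```python
import math


def hausman_scalar(b_fe, se_fe, b_re, se_re):
    """One-coefficient Hausman statistic and p-value (chi2, 1 df)."""
    var_diff = se_fe ** 2 - se_re ** 2
    if var_diff <= 0:
        raise ValueError("Var(b_fe) - Var(b_re) is not positive; test not well defined")
    h = (b_fe - b_re) ** 2 / var_diff
    p = math.erfc(math.sqrt(h / 2))  # P(chi2 with 1 df > h)
    return h, p


# Hypothetical estimates
h, p = hausman_scalar(b_fe=-2.0, se_fe=0.8, b_re=-0.5, se_re=0.5)
print(f"H = {h:.2f}, p = {p:.3f}")  # p < .05 here --> use fixed effects
```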

Problems/Tests/Solutions: Autocorrelation?
What is autocorrelation? Last time period's values affect current values.
Test: xtserial (user-written program; to install, type findit xtserial or net search xtserial)
xtserial depvar indepvars

A significant test statistic indicates the presence of serial correlation.
Solution: use a model that corrects for autocorrelation, xtregar instead of xtreg
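The idea behind the Wooldridge test implemented by xtserial, as we understand it: if the idiosyncratic errors are serially uncorrelated, their first differences have a correlation of -0.5 with their own lag. This Python simulation uses true errors rather than estimated residuals, with hypothetical sample sizes and seed. It shows the -0.5 benchmark and how AR(1) errors with rho = 0.5 move the lag slope to -(1 - rho)/2 = -0.25. It is not the full test, which also needs a significance test of the difference from -0.5.

```python
import random


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x with an intercept."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def fd_error_lag_slope(rho, n_units=20000, n_periods=5, seed=1):
    """Simulate AR(1) errors (parameter rho) per unit, first-difference
    them within each unit, and regress the differenced error on its lag."""
    rng = random.Random(seed)
    current, lagged = [], []
    for _ in range(n_units):
        e = [rng.gauss(0, 1) / (1 - rho ** 2) ** 0.5]  # stationary start
        for _ in range(n_periods - 1):
            e.append(rho * e[-1] + rng.gauss(0, 1))
        d = [e[t] - e[t - 1] for t in range(1, n_periods)]
        for t in range(1, len(d)):
            current.append(d[t])
            lagged.append(d[t - 1])
    return ols_slope(lagged, current)


no_ac = fd_error_lag_slope(0.0)    # close to -0.5: no serial correlation
with_ac = fd_error_lag_slope(0.5)  # close to -0.25: serial correlation
print(no_ac, with_ac)
```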

possible break

Different data structures
Panel:
- waves
- number of children @ wave 1 / 2 / 3 / 4
- employed @ wave 1 / 2 / 3 / 4
- income @ wave 1 / 2 / 3 / 4
- regression models: dependent variable continuous
Event:
- dates of events
- birth of first child @ 1963
- birth of second child @ 1966 …
- start of first employment @ …
- start of unemployment @ …
- start of second employment @ …
- time information in event data is more precise: dependent variable is whether the event happens (0/1)
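One way to see how event dates become a 0/1 dependent variable is to expand a spell into person-year records. This is a discrete-time view for illustration only (stset itself works with the exact times), and the function name person_years and the observation window are hypothetical.

```python
def person_years(start, end, event_year=None):
    """Expand one spell into (year, event) rows with event coded 0/1.

    The person is at risk from `start` until the event or until `end`
    (right-censoring). No rows follow the event year, because the person
    is no longer at risk of a first event.
    """
    rows = []
    for year in range(start, end + 1):
        happened = event_year is not None and year == event_year
        rows.append((year, 1 if happened else 0))
        if happened:
            break
    return rows


# birth of first child @ 1963, observed from 1960
print(person_years(1960, 1965, event_year=1963))
# [(1960, 0), (1961, 0), (1962, 0), (1963, 1)]
print(person_years(1960, 1962))  # censored: [(1960, 0), (1961, 0), (1962, 0)]
```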

Different Faces of Event History Data

Types of censoring
• Subject does not experience the event of interest
• Incomplete follow-up
• Lost to follow-up
• Withdraws from study
• Left or right censored

open data eventex.dta

Tell Stata that our data are "survival data": stset
stset X, failure(Y) id(Z)
• X = time at which the event happens or the case is right-censored; this is always needed
• Y = failure variable: 0 or missing means censored; all other values are interpreted as an event taking place (failure). Without failure(), every record is treated as ending in the event.
• Z = id
Three examples:
• stset ageendsch (event: end of school; time: age @ end of school)
• stset agemaryc, failure(marcens) id(caseid) (event: marriage)
• stset agestjob, failure(stjob) id(caseid) (event: first job)
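The failure() coding rule can be stated as a tiny Python function. The name is_failure is hypothetical, and None and NaN stand in for Stata's missing values.

```python
import math


def is_failure(value):
    """Mimic how stset ..., failure(var) reads var: 0 or missing means
    censored; any other value means the event (failure) occurred."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return False
    return value != 0


# marcens for four hypothetical respondents
marcens = [1, 0, None, 2]
print([is_failure(v) for v in marcens])  # [True, False, False, True]
```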

DATA MANAGEMENT (HANNAH)

Different Models of Event History

Survivor function and hazard function
The survivor function, S(t), gives the probability of surviving (not yet experiencing the event) longer than time t.
The hazard (instantaneous hazard, force of mortality) is the risk that an event occurs in a short time interval Δt at time t, given that the subject has not experienced the event before that time; it is a rate: this probability divided by Δt as Δt shrinks toward zero.
Survivor and hazard functions can be converted into each other.
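For a continuous survival time T with density f(t), the standard relations between the survivor function, the hazard, and the cumulative hazard H(t) are:

```latex
S(t) = \Pr(T > t), \qquad
h(t) = \lim_{\Delta t \to 0} \frac{\Pr(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} = \frac{f(t)}{S(t)}

H(t) = \int_0^t h(u)\,du, \qquad
S(t) = \exp\{-H(t)\}, \qquad
h(t) = -\frac{d}{dt} \ln S(t)
```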

Non-parametric: Kaplan-Meier
List the Kaplan-Meier survivor function:
. sts list
. sts list, by(sex) compare
Graph the Kaplan-Meier survivor function:
. sts graph
. sts graph, by(sex)
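The survivor function that sts list reports is the Kaplan-Meier product-limit estimate, S(t) = product over event times t_j <= t of (1 - d_j / n_j), with d_j events and n_j subjects at risk at t_j. A Python sketch on hypothetical ages at marriage (not eventex.dta). Censored records tied with an event time are counted as still at risk at that time, the usual convention.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.
    events: 1 = event, 0 = censored."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    s = 1.0
    out = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = leaving = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            s *= 1 - deaths / n_at_risk
            out.append((t, s))
        n_at_risk -= leaving  # events and censored cases leave the risk set
    return out


# Hypothetical ages at first marriage and event indicator (1 = married)
ages = [20, 22, 22, 25, 27, 30]
married = [1, 1, 0, 1, 0, 1]
km = kaplan_meier(ages, married)
print([(t, round(s, 3)) for t, s in km])
# [(20, 0.833), (22, 0.667), (25, 0.444), (30, 0.0)]
```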

Non-parametric: Kaplan-Meier exercise: stset your data for marriage, end of school or first job, e.g.:
1) sts list
2) sts graph
3) sts list, by(…) compare
4) sts graph, by(…)

Non-parametric: Nelson-Aalen
List the Nelson-Aalen cumulative hazard function:
. sts list, na
. sts list, na by(sex) compare
Graph the Nelson-Aalen cumulative hazard function:
. sts graph, na
. sts graph, na by(sex)
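The Nelson-Aalen estimate sums the same ratios instead of multiplying them: H(t) = sum over event times t_j <= t of d_j / n_j. A Python sketch with hypothetical data; exp(-H(t)) gives an alternative estimate of the survivor function.

```python
def nelson_aalen(times, events):
    """Return [(t, H(t))] at each distinct event time.
    events: 1 = event, 0 = censored."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    cum_haz = 0.0
    out = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = leaving = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            cum_haz += deaths / n_at_risk
            out.append((t, cum_haz))
        n_at_risk -= leaving
    return out


# Hypothetical ages at first marriage and event indicator (1 = married)
ages = [20, 22, 22, 25, 27, 30]
married = [1, 1, 0, 1, 0, 1]
na = nelson_aalen(ages, married)
print([(t, round(h, 3)) for t, h in na])
# [(20, 0.167), (22, 0.367), (25, 0.7), (30, 1.7)]
```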

Non-parametric: Nelson-Aalen exercise: stset your data for marriage, end of school or first job:
1) sts list, na
2) sts graph, na
3) sts list, na by(…) compare
4) sts graph, na by(…)

Non-parametric: Kaplan-Meier
• Comparing Kaplan-Meier curves
• The log-rank test can be used to compare survival curves
Hypothesis test (test of significance):
• H0: the curves are statistically the same
• H1: the curves are statistically different
The test compares observed with expected numbers of events in each group, here for age @ marriage.
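A Python sketch of the two-group log-rank statistic. At each event time it compares group a's observed events with the number expected if both groups shared one hazard, d_j * n_aj / n_j, sums the differences, and divides the squared sum by its hypergeometric variance; the result is referred to a chi-squared distribution with 1 degree of freedom. The data and the name logrank are hypothetical; sts test performs the log-rank test by default.

```python
import math


def logrank(times_a, events_a, times_b, events_b):
    """Two-group log-rank chi-squared statistic (1 df) and p-value."""
    records = ([(t, e, "a") for t, e in zip(times_a, events_a)]
               + [(t, e, "b") for t, e in zip(times_b, events_b)])
    event_times = sorted({t for t, e, _ in records if e})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        at_risk = [g for tt, _, g in records if tt >= t]
        n = len(at_risk)
        n_a = at_risk.count("a")
        d = sum(1 for tt, e, _ in records if tt == t and e)
        d_a = sum(1 for tt, e, g in records if tt == t and e and g == "a")
        obs_minus_exp += d_a - d * n_a / n  # observed minus expected in group a
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / variance if variance > 0 else 0.0
    p_value = math.erfc(math.sqrt(chi2 / 2))  # P(chi2 with 1 df > chi2)
    return chi2, p_value


# Identical groups: no difference at all
ages = [20, 22, 22, 25, 27, 30]
married = [1, 1, 0, 1, 0, 1]
chi2_same, p_same = logrank(ages, married, ages, married)

# Hypothetical early vs. late marriers (all events observed)
chi2_diff, p_diff = logrank([19, 20, 21, 22], [1, 1, 1, 1],
                            [25, 27, 29, 31], [1, 1, 1, 1])
print(chi2_same, p_same, chi2_diff, p_diff)
```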

Non-parametric: Kaplan-Meier
Comparing Kaplan-Meier curves
Exercise: test the equality of survivor functions, e.g.: sts test abitur
