Dynamic analysis of binary longitudinal data

Ørnulf Borgan, Department of Mathematics, University of Oslo
Based on joint work with Rosemeire L. Fiaccone, Robin Henderson and Mauricio L. Barreto

Outline:
- An example of binary longitudinal data: The Blue Bay project
- Modelling missingness for longitudinal binary data (including the relation to independent censoring in event history analysis)
- An additive model for longitudinal binary data
- Dynamic covariates
- Martingale residual processes
- Concluding comments

Blue Bay project:
- Bahia State, Brazil (size of France)
- State capital Salvador (pop: 2.5 mill.)

Public works and education in the areas of sanitation and environment, executed by the Bahia State Government since 1997
Cost: more than $1 billion
Photos: Belgica 1996, Belgica 2002

Data:
- Daily data on diarrhoea for almost a thousand children (one per family)
- Collected at home visits, Oct 2000 to Jan 2002
- Children less than 3 years of age at entry
- Diarrhoea: three or more fluid motions a day
- Episode of diarrhoea: sequence of days with diarrhoea until at least two consecutive clear days
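The episode definition above can be turned into a new-episode (incidence) indicator. The following is a minimal sketch only: the function name `new_episode_indicator` is hypothetical, and it assumes a complete daily 0/1 series with no missing days and a child who is not in an episode when the series starts.

```python
def new_episode_indicator(diarrhoea_days, clear_days_needed=2):
    """Return a 0/1 list that is 1 on the first day of each episode.

    diarrhoea_days: daily 0/1 values for one child (1 = diarrhoea that day).
    Assumption: no missing days, and the child is not in an episode at the start.
    An episode ends once `clear_days_needed` consecutive clear days have passed.
    """
    incidence = []
    in_episode = False
    clear_run = 0
    for day in diarrhoea_days:
        if day == 1:
            # A diarrhoea day starts a new episode only if none is ongoing.
            incidence.append(0 if in_episode else 1)
            in_episode = True
            clear_run = 0
        else:
            incidence.append(0)
            if in_episode:
                clear_run += 1
                if clear_run >= clear_days_needed:
                    in_episode = False
    return incidence
```

A single clear day between two diarrhoea days does not end the episode, so only the first and the last day of the series `[1, 1, 0, 1, 0, 0, 1]` count as new episodes.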

The reduced prevalence/incidence over time may reflect improved health over the study period, or may be an artefact due to ageing of the cohort

Social, demographic and economic characteristics collected at entry to the study:

Follow-up information on 10 children: Under observation: New episode: X Ongoing episode: X Drop-out: O

Pattern of missing observations for all 926 children (figure annotations: non-available data collector, police strike, Carnival, St. John's Day, Christmas Day)

Three types of missingness:
- Late entries (16% of children)
- Drop-outs (21% of children)
- Intermittent missingness (20% of observations)
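As an illustration, the three types of missingness can be read off a child's daily observation indicator. This is a sketch under assumed definitions, not the study's own coding rules: the function name and the dictionary keys are hypothetical, a late entry is taken to be a first observed day after the first study day, a drop-out a last observed day before the final study day, and intermittent missingness the unobserved days in between.

```python
def classify_missingness(observed):
    """Summarise the missingness pattern for one child.

    observed: list of 0/1 values, 1 = child under observation that day.
    The definitions used here are assumptions made for illustration.
    """
    days = [t for t, r in enumerate(observed) if r == 1]
    if not days:
        # Child never observed: nothing to classify.
        return {"late_entry": False, "drop_out": False, "intermittent_days": 0}
    first, last = days[0], days[-1]
    return {
        "late_entry": first > 0,
        "drop_out": last < len(observed) - 1,
        # Unobserved days between the first and the last observed day.
        "intermittent_days": (last - first + 1) - len(days),
    }
```

For the pattern `[0, 1, 1, 0, 1, 0, 0]` the child is a late entry and a drop-out with one intermittently missing day.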

Features of the data:
- Longitudinal binary data
- Four time scales: calendar, age, study, episode
- Calendar time used as basic time scale

Aims:
- Study factors of importance for incidence and prevalence of diarrhoea
- Study how diarrhoea incidence and prevalence vary over calendar time

Ignored (for this talk):
- Spatial associations
- Other non-independence

Modelling missingness:
- Joint model for binary data and missingness: conditions on the missingness are defined for this model
- Model for binary data without missingness: parameters of interest are defined for this model
- Model for observed data: statistical methods are derived and studied for this model

We need to relate the models for the three situations (starting with models for one individual)

Model without missingness
The observations for child i form a binary time series Y_{it}. Here Y_{it} = 1 if the child starts a new episode of diarrhoea at day t (incidence) or has diarrhoea at day t (prevalence).
The fixed and external time-varying covariates for child i generate a σ-algebra. Together with the binary time series, this gives the information that would have been available on child i by day t had there been no missingness.

Introduce the conditional probabilities that Y_{it} = 1 given the information on child i before day t.
The aim of our analysis is to study how these probabilities vary over time and how they depend on covariates, including dynamic covariates that are functions of Y_{is} for s < t.
This differs from the common approach in longitudinal data analysis, where the focus is on the marginal probabilities.

Joint model for binary longitudinal data and missingness
Introduce the observation process for individual i. We need to consider a larger filtration, generated by the information on child i without missingness and by external aspects of the observation process for child i.

We make two assumptions on the missingness. These assumptions correspond to:
- sequential MAR in longitudinal data analysis
- independent censoring in event history analysis

Modelling the observable data Binary observations for individual i : Observed filtration: (Note that we for convenience have included in the definition of )

Then: We will assume that is predictable, implying that the time-dependent dynamic covariates used for regression modelling depend only on observables Thus:

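Introduce the differences between the observed binary outcomes and their conditional expectations given the past. These are martingale differences, and their cumulative sum over days is a discrete-time martingale. Predictable variation process: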

Modelling the relation between individuals
Denote by F_t the information available to the researcher on all children by day t. We impose the following assumptions: (i) (ii)
The assumptions are weaker than independence. Nevertheless, they are debatable [(i) in particular] for the diarrhoea data.
Note that (ii) implies that the martingales for different children are orthogonal.

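An additive model for longitudinal binary data
Have the decomposition of each observation into its conditional expectation given the past plus a martingale difference. Let x_{i1t}, …, x_{ipt} be predictable covariates for child i at day t. Consider the model in which the conditional probability for child i at day t is a linear function of these covariates, with regression coefficients that may change from day to day.

Conditional on "the past" F_{t-1}, the observations at day t then follow a linear regression model.
We may estimate the regression coefficients by ordinary least squares at each day t (quick!).
The estimates for each day will be quite unstable, but they may be accumulated over time to get stable estimates for the cumulative regression coefficients.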
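The day-by-day least squares fit and its accumulation can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function name, the array layout (`Y` and `R` of shape children by days, `X` of shape children by days by covariates with a leading column of ones for the baseline) and the convention of adding a zero increment on days where the design matrix lacks full rank are all choices made here.

```python
import numpy as np

def cumulative_regression_coefficients(Y, X, R):
    """Accumulate daily ordinary least squares estimates over time.

    Y: (n, T) outcomes, X: (n, T, p) predictable covariates,
    R: (n, T) observation indicators (1 = child observed that day).
    Returns a (T, p) array of cumulative regression coefficients.
    """
    n, T, p = X.shape
    increments = np.zeros((T, p))
    for t in range(T):
        observed = R[:, t] == 1
        Xt = X[observed, t, :].astype(float)
        Yt = Y[observed, t].astype(float)
        # Assumed convention: skip days where the design is not of full rank.
        if Xt.shape[0] >= p and np.linalg.matrix_rank(Xt) == p:
            beta_t, *_ = np.linalg.lstsq(Xt, Yt, rcond=None)
            increments[t] = beta_t
    return np.cumsum(increments, axis=0)
```

Plotting each column of the returned array against day gives curves whose slopes indicate the size of the daily regression coefficients.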

Some estimated cumulative regression coefficients for a model for incidence with fixed covariates (may be interpreted as expected numbers)

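We have (using "obvious" matrix notation) a representation of the estimation error of the cumulative regression coefficients as a martingale transformation.
Properties may be derived using martingale methods, as for Aalen's additive hazards model for time-continuous event history data.
In particular, the estimator of the cumulative regression coefficients is approximately multivariate normal, with a covariance matrix that may be estimated from the data.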

Dynamic covariates How can past episodes of diarrhoea be used to predict future episodes?

Consider dynamic covariates of the form: with Y_{is} incidence (prevalence) of diarrhoea
Use τ = 30 days and ρ = 0.01 below

A dynamic covariate may be on the causal pathway between a fixed covariate and the event process.
Including dynamic covariates in the analysis may therefore distort the estimated effects of the fixed covariates.
To avoid such distortion, at each time t we regress the dynamic covariates on the fixed covariates and use the residuals from these fits as new covariates.
This procedure keeps the effects of the fixed covariates the same as in the model without the dynamic covariates.
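The residual step for a single day can be sketched as below. The function name `residualize` and the argument layout are hypothetical; the sketch assumes that `fixed` contains a column of ones and that both arrays hold only the children observed that day.

```python
import numpy as np

def residualize(dynamic, fixed):
    """Replace dynamic covariates by residuals from a regression on fixed ones.

    dynamic: (n, q) dynamic covariates at one day.
    fixed:   (n, p) fixed covariates at the same day, including a column of ones.
    Returns an (n, q) array that is orthogonal to every column of `fixed`.
    """
    coef, *_ = np.linalg.lstsq(fixed, dynamic, rcond=None)
    return dynamic - fixed @ coef
```

Because the residuals are orthogonal to the fixed covariates, a least squares fit on the fixed covariates together with the residuals returns the same coefficients for the fixed covariates as a fit on the fixed covariates alone.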

Cumulative regression coefficients for incidence (panels: average number of days with diarrhoea; average number of diarrhoea episodes)
Also: male, 3 or more per bedroom, contaminated water source, open sewerage, rain affected accommodation, young mother

Martingale residual processes
Under the model, the martingale residual processes are martingale transformations.
Examples of standardized martingale residual processes (standardized by model-based SDs)
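A rough sketch of how such residual processes might be computed is given below. It rests on assumptions that are not taken from the slides: `fitted` is a hypothetical array of fitted probabilities from the daily regressions, the residual process for a child is the running sum of observed minus fitted values over the days the child is observed, and the standardization uses the accumulated Bernoulli variance of the fitted values, ignoring the extra variability from estimating the coefficients.

```python
import numpy as np

def martingale_residual_processes(Y, fitted, R):
    """Running sums of observed minus fitted values, one row per child.

    Y, fitted, R: (n, T) arrays; R is 1 on days the child is observed.
    """
    return np.cumsum(R * (Y - fitted), axis=1)

def standardized_residual_processes(Y, fitted, R):
    """Divide each residual process by an assumed model-based SD."""
    resid = martingale_residual_processes(Y, fitted, R)
    # Fitted values from an additive model may fall outside [0, 1]; clip them
    # before forming the Bernoulli variance (an assumption made here).
    prob = np.clip(fitted, 0.0, 1.0)
    sd = np.sqrt(np.cumsum(R * prob * (1.0 - prob), axis=1))
    out = np.zeros_like(resid, dtype=float)
    return np.divide(resid, sd, out=out, where=sd > 0)
```

Taking `np.std(..., axis=0)` of the standardized processes across children gives an empirical standard deviation for each day, which can be compared with the value 1 expected if the model-based SDs are adequate.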

Empirical standard deviations of the martingale residual processes:

Cumulative regression coefficients for prevalence (panels: baseline; diarrhoea previous day (lag 1); lag 2 (residual effect); lag 3 (residual effect); lag 4 (residual effect); average number of days with diarrhoea)
Also: male, age, 3 or more per bedroom, poor street, contaminated water storage and source, standing water, open sewerage, rain affected accommodation, young mother

Prevalence: empirical standard deviations of the martingale residual processes

Not Markovian!

Concluding comments:
- A dynamic additive model provides a flexible framework for analyzing longitudinal binary data
- The method illustrates how ideas and approaches from event history analysis may be useful for the analysis of longitudinal data
- Advantage: the method is computationally very quick
- Drawback: incidence and prevalence are not restricted to the range 0 to 1
- Methodological work is needed, in particular on methods for model selection and goodness-of-fit
