Using martingale residuals to assess goodness of fit for sampled risk set data

Ørnulf Borgan, Department of Mathematics, University of Oslo
Based on joint work with Bryan Langholz

Outline:
• Example: Uranium miners cohort
• Cohort model, data and martingale residuals
• Risk set sampling
• Martingale residuals and goodness-of-fit tests for sampled risk set data
• Concluding remarks

Uranium miners cohort (e.g. Langholz & Goldstein, 1996):
• 3347 uranium miners from the Colorado Plateau were included in the study cohort 1950-60
• Followed up until the end of 1982
• 258 lung cancer deaths
• Interest is in the effect of radon and smoking exposure on the risk of lung cancer death
• Exposure information is available for the full cohort; we sample from the risk sets for illustration

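Relative risk regression models
Hazard rate for individual i:
$\alpha_i(t) = r(\beta, x_i(t))\,\alpha_0(t)$
where $r(\beta, x_i(t))$ is the relative risk and $\alpha_0(t)$ is the baseline hazard.
The relative risk for individual i depends on the covariates $x_{i1}, x_{i2}, \ldots, x_{ip}$ (possibly time-dependent).
Cox: $r(\beta, x_i(t)) = \exp\{\beta_1 x_{i1}(t) + \cdots + \beta_p x_{ip}(t)\}$
Excess relative risk: $r(\beta, x_i(t)) = \prod_{k=1}^{p} \{1 + \beta_k x_{ik}(t)\}$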
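To make the two relative risk forms concrete, here is a minimal Python sketch. It assumes the Cox form is the exponential of a linear predictor and the excess relative risk form is a product of terms 1 + beta_k x_k; the function names and the numerical values are hypothetical and chosen only for illustration.

```python
import math

def cox_relative_risk(beta, x):
    """Cox form: exp(beta_1*x_1 + ... + beta_p*x_p)."""
    return math.exp(sum(b * xk for b, xk in zip(beta, x)))

def excess_relative_risk(beta, x):
    """Excess relative risk form (assumed product form): prod_k (1 + beta_k*x_k)."""
    result = 1.0
    for b, xk in zip(beta, x):
        result *= 1.0 + b * xk
    return result

# Hypothetical coefficients and covariate values, for illustration only.
beta = [0.4, 0.2]
x = [2.0, 5.0]

print(cox_relative_risk(beta, x))     # exp(0.4*2.0 + 0.2*5.0)
print(excess_relative_risk(beta, x))  # (1 + 0.4*2.0) * (1 + 0.2*5.0)
```

Both forms equal 1 when all covariates are zero, so the baseline hazard is the hazard of an unexposed individual in either model.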

Cohort data
[Figure: individuals at risk plotted against study time; arrows are censored observations]

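$t_1 < t_2 < t_3 < \cdots$ are the times of failures
$i_j$ is the individual failing at $t_j$ (the "case")
Counting process for individual i: $N_i(t)$, the number of failures observed for individual i in $[0, t]$
The intensity process $\lambda_i(t)$ is given by
$\lambda_i(t) = Y_i(t)\,\alpha_i(t)$
where $Y_i(t)$ is the at-risk indicator and $\alpha_i(t)$ is the hazard rate

Cumulative intensity processes: $\Lambda_i(t) = \int_0^t \lambda_i(u)\,du$
Martingales: $M_i(t) = N_i(t) - \Lambda_i(t)$
Martingale residual processes: $\hat M_i(t) = N_i(t) - \int_0^t Y_i(u)\, r(\hat\beta, x_i(u))\, d\hat A_0(u)$, where $r$ is the relative risk, $\hat\beta$ the maximum partial likelihood estimate and $\hat A_0$ the Breslow estimate of the cumulative baseline hazard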
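The cohort martingale residuals can be sketched in a few lines of Python. This is an illustration under simplifying assumptions that are not in the slides: distinct failure times, time-fixed relative risks that have already been evaluated at the fitted coefficients, and a Breslow-type estimate of the cumulative baseline hazard. The function name and the toy data are hypothetical.

```python
def martingale_residuals(times, events, rel_risks):
    """Residuals at end of follow-up: observed failures minus estimated
    cumulative intensity (assumes distinct failure times)."""
    n = len(times)
    cum_intensity = [0.0] * n
    for j in range(n):
        if not events[j]:
            continue  # censored observations add no baseline hazard increment
        # Individuals still at risk at the failure time of individual j.
        at_risk = [l for l in range(n) if times[l] >= times[j]]
        denom = sum(rel_risks[l] for l in at_risk)
        for l in at_risk:
            # Breslow increment 1/denom, scaled by the individual's relative risk.
            cum_intensity[l] += rel_risks[l] / denom
    return [events[i] - cum_intensity[i] for i in range(n)]

# Hypothetical toy data: exit times, failure indicators, fitted relative risks.
times = [2.0, 3.0, 5.0, 7.0]
events = [1, 0, 1, 1]
rel_risks = [1.0, 2.0, 1.0, 0.5]

residuals = martingale_residuals(times, events, rel_risks)
print(residuals)
```

With this estimator each failure contributes one observed event and exactly one unit of expected events spread over the risk set, so the residuals sum to zero over the cohort.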

Martingale residual processes may be used to assess goodness of fit:
• Plot individual martingale residuals versus covariates (Therneau, Grambsch & Fleming, 1990)
• Plot grouped martingale residual processes versus time (Aalen, 1993; Grønnesby & Borgan, 1996)
The latter may be extended to sampled risk set data

Risk set sampling
• Cohort studies need information on covariates for all individuals at risk
• It is expensive to collect and check (!) this information for all individuals in large cohorts
• For risk set sampling designs one only needs to collect covariate information for the cases and a few controls sampled at the failure times

If a case occurs at time t, select m – 1 controls among the n(t) – 1 non-failures at risk, i.e. match on study time
[Figure: illustration for m = 2, showing the case and its sampled control]

A sampled risk set $\tilde R_j$ consists of the case $i_j$ and its controls
A sampling design for the controls is described by its sampling distribution $\pi_t(r \mid i)$
A number of sampling designs are available
The classical nested case-control design: if individual i fails at time t, the probability of selecting the set r as the sampled risk set is
$\pi_t(r \mid i) = \binom{n(t)-1}{m-1}^{-1}$
(we assume that r is a subset of the risk set, that r is of size m and that i is in r)
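The classical nested case-control design can be sketched as simple random sampling without replacement. The Python snippet below is an illustration only; the function names and the toy risk set are hypothetical, and the selection probability assumes every set of m – 1 controls among the n(t) – 1 non-failures is equally likely.

```python
import math
import random

def sample_risk_set(case, risk_set, m, rng):
    """Return the case together with m - 1 controls drawn at random,
    without replacement, from the non-failures at risk."""
    non_failures = [i for i in risk_set if i != case]
    controls = rng.sample(non_failures, m - 1)
    return frozenset([case] + controls)

def selection_probability(n_at_risk, m):
    """Probability of one particular sampled risk set of size m that
    contains the case, given n_at_risk individuals at risk."""
    return 1.0 / math.comb(n_at_risk - 1, m - 1)

# Hypothetical risk set of 10 individuals; individual 3 fails.
rng = random.Random(2024)
risk_set = list(range(10))
sampled = sample_risk_set(case=3, risk_set=risk_set, m=3, rng=rng)
print(sorted(sampled), selection_probability(len(risk_set), 3))
```

Because the probability does not depend on which member of the sampled risk set is the case, it cancels from the partial likelihood for this design.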

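Inference on the regression coefficients can be based on the partial likelihood
$L(\beta) = \prod_{t_j} \frac{r(\beta, x_{i_j}(t_j))\,\pi_{t_j}(\tilde R_j \mid i_j)}{\sum_{l \in \tilde R_j} r(\beta, x_l(t_j))\,\pi_{t_j}(\tilde R_j \mid l)}$
where $r$ is the relative risk, $\tilde R_j$ the sampled risk set at $t_j$ and $\pi_t(r \mid i)$ the sampling distribution
The partial likelihood enjoys the usual likelihood properties (Borgan, Goldstein & Langholz, 1995)
For the classical nested case-control design, the sampling probabilities cancel and the partial likelihood simplifies to
$L(\beta) = \prod_{t_j} \frac{r(\beta, x_{i_j}(t_j))}{\sum_{l \in \tilde R_j} r(\beta, x_l(t_j))}$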
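For the classical nested case-control design, the log partial likelihood is a sum over sampled risk sets of the log relative risk of the case minus the log of the summed relative risks in the set. The sketch below assumes that simplified form; the data layout, the function names and the toy covariate values are hypothetical, and the relative risk function is passed in so that either model form can be used.

```python
import math

def excess_relative_risk(beta, x):
    """Assumed product form: prod_k (1 + beta_k * x_k)."""
    result = 1.0
    for b, xk in zip(beta, x):
        result *= 1.0 + b * xk
    return result

def log_partial_likelihood(beta, sampled_risk_sets, rel_risk):
    """sampled_risk_sets: list of (case_covariates, member_covariates),
    where member_covariates lists every member of the set, case included."""
    total = 0.0
    for case_x, member_xs in sampled_risk_sets:
        denom = sum(rel_risk(beta, x) for x in member_xs)
        total += math.log(rel_risk(beta, case_x)) - math.log(denom)
    return total

# Hypothetical data: two sampled risk sets, each with one case and one control.
data = [
    ([3.0, 1.0], [[3.0, 1.0], [1.0, 0.5]]),
    ([2.0, 4.0], [[2.0, 4.0], [0.5, 2.0]]),
]
print(log_partial_likelihood([0.1, 0.2], data, excess_relative_risk))
```

At beta = 0 every relative risk equals 1, so each sampled risk set of size m contributes -log(m); this gives a quick sanity check before handing the function to a numerical optimiser.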

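Martingale residuals and goodness-of-fit tests for sampled risk set data
Introduce the counting processes
$N_{(i,r)}(t) = \sum_{j \ge 1} I\{t_j \le t,\ (i_j, \tilde R_j) = (i, r)\}$
counting the failures of individual i in $[0, t]$ with r as the sampled risk set
Intensity processes take the form:
$\lambda_{(i,r)}(t) = Y_i(t)\,\alpha_i(t)\,\pi_t(r \mid i)$
where $Y_i(t)$ is the at-risk indicator, $\alpha_i(t)$ the hazard rate and $\pi_t(r \mid i)$ the sampling distribution

Corresponding martingales: $M_{(i,r)}(t) = N_{(i,r)}(t) - \int_0^t \lambda_{(i,r)}(u)\,du$
Martingale residual processes: $\hat M_{(i,r)}(t) = N_{(i,r)}(t) - \sum_{t_j \le t,\ \tilde R_j = r} \frac{r(\hat\beta, x_i(t_j))\,\pi_{t_j}(r \mid i)}{\sum_{l \in r} r(\hat\beta, x_l(t_j))\,\pi_{t_j}(r \mid l)}$
These are of little practical use on their own, but they may be aggregated over groups of individuals to produce useful plots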

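For group g, the grouped martingale residual process is
$\hat M_g(t) = \sum_{t_j \le t} \Big[ I(i_j \in g) - \frac{\sum_{l \in \tilde R_j \cap g} r(\hat\beta, x_l(t_j))\,\pi_{t_j}(\tilde R_j \mid l)}{\sum_{l \in \tilde R_j} r(\hat\beta, x_l(t_j))\,\pi_{t_j}(\tilde R_j \mid l)} \Big]$
where $\tilde R_j$ is the sampled risk set at $t_j$, $r$ the relative risk and $\pi_t(r \mid i)$ the sampling distribution
It may be interpreted as the "observed – expected" number of failures in group g
For classical nested case-control the sampling probabilities cancel and it simplifies to
$\hat M_g(t) = \sum_{t_j \le t} \Big[ I(i_j \in g) - \frac{\sum_{l \in \tilde R_j \cap g} r(\hat\beta, x_l(t_j))}{\sum_{l \in \tilde R_j} r(\hat\beta, x_l(t_j))} \Big]$
The asymptotic distribution may be derived using counting process methods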
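The "observed minus expected" reading of the grouped residual process can be sketched for the classical nested case-control design as follows. The snippet assumes the simplified form in which the sampling probabilities cancel, takes group membership as fixed over time (in practice it may change with time, e.g. for groups defined by cumulative exposure), and uses hypothetical identifiers, relative risks and function names.

```python
def grouped_residual_process(sampled_risk_sets, group):
    """sampled_risk_sets: list ordered by failure time of
    (t_j, case_id, {member_id: fitted relative risk at t_j}).
    Returns [(t_j, cumulative observed - expected failures in group)]."""
    path = []
    running = 0.0
    for t, case, risks in sampled_risk_sets:
        observed = 1.0 if case in group else 0.0
        # Expected share of the failure attributed to the group's members.
        in_group = sum(r for i, r in risks.items() if i in group)
        expected = in_group / sum(risks.values())
        running += observed - expected
        path.append((t, running))
    return path

# Hypothetical sampled risk sets with two controls per case.
sets = [
    (1.0, "a", {"a": 2.0, "b": 1.0, "c": 1.0}),
    (2.5, "d", {"d": 1.0, "b": 3.0, "e": 1.0}),
]
group_one = {"a", "b"}
group_two = {"c", "d", "e"}
print(grouped_residual_process(sets, group_one))
print(grouped_residual_process(sets, group_two))
```

When the groups partition the individuals, the processes sum to zero at every failure time, so a group running above zero is always balanced by another running below it.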

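Illustration: uranium miners cohort
Fit the excess relative risk model $r(\beta, x_i(t)) = \{1 + \beta_1 x_{i1}(t)\}\{1 + \beta_2 x_{i2}(t)\}$ with
$x_{i1}$ = cumulative radon exposure (100 WLMs)
$x_{i2}$ = cumulative smoking (1000 packs)
For classical nested case-control with three controls per case:
[Table of estimates not reproduced in this transcript]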

Aggregate martingale residual processes in three groups according to cumulative radon exposure:
Groups:
I: < 500 WLMs
II: 500-1500 WLMs
III: > 1500 WLMs
There are indications of an interaction between cumulative radon exposure and age

Observed and expected numbers of failures in the groups for ages below and above 60 years:
[Table not reproduced in this transcript]
The chi-squared statistic with 2(3 – 1) = 4 df takes the value 10.5 (P-value 3.2%)
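The degrees of freedom and the P-value can be checked with a short calculation. The sketch below uses the closed-form upper tail of the chi-squared distribution for an even number of degrees of freedom, so no statistics library is needed; the function name is hypothetical. Plugging in the rounded value 10.5 gives a tail probability of roughly 3.3%, close to the reported 3.2%; the small difference presumably reflects rounding of the test statistic.

```python
import math

def chi2_upper_tail_even_df(x, df):
    """P(X > x) for a chi-squared variable with an even number of df:
    exp(-x/2) * sum_{k < df/2} (x/2)^k / k!"""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    return math.exp(-half) * sum(half**k / math.factorial(k) for k in range(df // 2))

groups = 3      # radon exposure groups I, II, III
intervals = 2   # ages below and above 60 years
df = intervals * (groups - 1)

p_value = chi2_upper_tail_even_df(10.5, df)
print(df, p_value)
```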

Concluding remarks
The counting process formulation of nested case-control studies:
• Introduces a time aspect that is usually disregarded for sampled risk set data
• Gives a model formulation similar to that for cohort data and thereby opens the way for methodological developments similar to those for cohort studies
• Grouped martingale residual processes are one example of this. They allow one to check for time-dependent effects and other deviations from the model

Questions and further developments of grouped martingale residual plots and related goodness-of-fit methods
• How should the grouping be performed?
• How do specific deviations from the model turn up in the plots?
• Kolmogorov-Smirnov and Cramér-von Mises type tests? (Durbin's approximation, Lin et al.'s simulation trick)
