
Designing longitudinal studies in epidemiology



Donna Spiegelman

Professor of Epidemiologic Methods

Departments of Epidemiology and Biostatistics

Xavier Basagaña

Doctoral Student

Department of Biostatistics,

Harvard School of Public Health

• We develop methods for the design of longitudinal studies for the most common scenarios in epidemiology

• There already exist some formulas for power and sample size calculations in this context.

• All prior work has been developed for clinical trials applications

Limitations of formulas based on clinical trials:

• Some are based on test statistics (e.g. ANCOVA) that are not valid, or are less efficient, in an observational context.

• In clinical trials:

• The time measure of interest is time from randomization, so everyone starts at the same time. We consider situations where, for example, age is the time variable of interest, and subjects do not start at the same age.

• Exposures are time-invariant.

• Exposure (treatment) prevalence is 50% by design.

Xavier Basagaña’s Thesis

• Derive study design formulas based on tests that are valid and efficient for observational studies, for two reasonable alternative hypotheses.

• Comprehensively assess the effect of all parameters on power and sample size.

• Extend the formulas to a context where not all subjects enter the study at the same time.

• Extend the formulas to the case of time-varying covariates, and compare them with the time-invariant case.

Xavier Basagaña’s Thesis

• Derive the optimal combination of number of subjects (n) and number of repeated measures (r+1) when subject to a cost constraint.

• Create a computer program to perform design computations, with an intuitive parameterization and an easy-to-use interface.

Notation and Preliminary Results

• We study two alternative hypotheses:

• Constant mean difference (CMD)

• Linearly divergent difference (LDD)

• μ00: the mean response at baseline (or at the mean initial time) in the unexposed group

• p1: the percent difference between exposed and unexposed groups at baseline (or at the mean initial time)

• p2: the percent change from baseline (or from the mean initial time) to end of follow-up (or to the mean final time) in the unexposed group. When τ is not fixed, p2 is defined at time s instead of at time τ.

• p3: the percent difference between the change from baseline (or from the mean initial time) to end of follow-up (or to the mean final time) in the exposed group and in the unexposed group. When p2 = 0, p3 is defined as the percent change from baseline (or from the mean initial time) to the end of follow-up (or to the mean final time) in the exposed group.
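The four quantities defined verbally above (in the order listed: baseline mean, baseline percent difference, percent change in the unexposed, percent difference in change) can be written out as a sketch. The symbol names p2 and p3 follow the prompts of the program session shown later; μ00, p1 and the conditional-expectation notation are assumptions made here for illustration, with Y_ij the response of subject i at measurement j = 0, ..., r and k_i a binary exposure indicator.

```latex
\begin{align*}
\mu_{00} &= E(Y_{i0} \mid k_i = 0) \\
p_1 &= \frac{E(Y_{i0} \mid k_i = 1) - E(Y_{i0} \mid k_i = 0)}{E(Y_{i0} \mid k_i = 0)} \\
p_2 &= \frac{E(Y_{ir} \mid k_i = 0) - E(Y_{i0} \mid k_i = 0)}{E(Y_{i0} \mid k_i = 0)} \\
p_3 &= \frac{E(Y_{ir} - Y_{i0} \mid k_i = 1) - E(Y_{ir} - Y_{i0} \mid k_i = 0)}{E(Y_{ir} - Y_{i0} \mid k_i = 0)}
\end{align*}
```

Under a linear time trend, these definitions imply a yearly slope of p2 μ00 / τ in the unexposed group and a slope difference of p3 times that value between exposure groups.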

• We consider studies where the interval between visits (s) is fixed but the duration of the study is free (e.g. participants may respond to questionnaires every two years)

• Increasing r involves increasing the duration of the study

• We also consider studies where the duration of the study, τ, is fixed, but the interval between visits is free (e.g. the study is 5 years long)

• Increasing r involves increasing the frequency of the measurements, i.e. decreasing the interval s

• τ = s r.

• Model

• The generalized least squares (GLS) estimator of B is

• Power formula

• Let lm be the (l,m)th element of -1

• Assuming that the time distribution is independent of exposure group.

• Then, under CMD

• Under LDD
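For reference, the standard forms of the quantities named in this list can be sketched as follows. The notation is an assumption, not taken from the slides: X_i is the design matrix of subject i, Σ the (r+1)×(r+1) residual covariance matrix with inverse elements σ^{lm}, p_e the exposure prevalence, u_l = s·l the time since baseline of measurement l, and V(t_0) the variance of the time variable at baseline. The coefficient names β_2 (exposure effect under CMD) and γ_3 (exposure-by-time interaction under LDD) are also hypothetical. The two variance expressions were derived here for a time-invariant binary exposure that is independent of the time distribution, and may differ in form from the ones on the original slide.

```latex
\begin{align*}
\hat{B} &= \Big(\sum_{i=1}^{N} X_i' \Sigma^{-1} X_i\Big)^{-1} \sum_{i=1}^{N} X_i' \Sigma^{-1} Y_i \\
\text{power} &\approx \Phi\!\left(\frac{|\beta|}{\sqrt{\operatorname{Var}(\hat\beta)}} - z_{1-\alpha/2}\right) \\
\text{CMD:}\quad \operatorname{Var}(\hat\beta_2) &= \frac{1}{N\,p_e(1-p_e)\,a},
  \qquad a = \sum_{l}\sum_{m} \sigma^{lm} \\
\text{LDD:}\quad \operatorname{Var}(\hat\gamma_3) &= \frac{1}{N\,p_e(1-p_e)\,\big[V(t_0)\,a + c - b^2/a\big]},
  \qquad b = \sum_{l}\sum_{m} \sigma^{lm} u_m,\quad c = \sum_{l}\sum_{m} \sigma^{lm} u_l u_m
\end{align*}
```

In the LDD expression, the term V(t_0) a is the information contributed by between-subject variation in starting time, and c − b²/a is the within-subject contribution.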

• We consider three common correlation structures:

• Compound symmetry (CS).

• Damped Exponential (DEX)

θ = 0: CS

θ = 0.3: intermediate between CS and AR(1)

θ = 1: AR(1)

• Random intercepts and slopes (RS).

• Reparameterizing:

• the reliability coefficient at baseline

• the slope reliability at the end of follow-up (a value of 0 gives CS; a value of 1 means all variation in slopes is between subjects).

• With this correlation structure, the variance of the response changes with time, i.e. this correlation structure gives a heteroscedastic model.
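The three structures can be sketched in a few lines of Python. This is an illustration under assumptions, not the authors' code: the DEX correlation is taken to be ρ raised to the power |t_l − t_m|^θ, the RS covariance is built from a random intercept and random slope plus independent residual error, and all function and parameter names are hypothetical.

```python
def cs_corr(times, rho):
    """Compound symmetry: the same correlation rho for every pair of visits."""
    n = len(times)
    return [[1.0 if i == j else rho for j in range(n)] for i in range(n)]


def dex_corr(times, rho, theta):
    """Damped exponential (assumed form): corr = rho ** (|t_i - t_j| ** theta)."""
    n = len(times)
    return [[1.0 if i == j else rho ** (abs(times[i] - times[j]) ** theta)
             for j in range(n)] for i in range(n)]


def rs_cov(times, var_int, var_slope, cov_int_slope, var_resid):
    """Random intercepts and slopes: Z D Z' + var_resid * I, with Z = [1, t]."""
    n = len(times)
    return [[var_int
             + (times[i] + times[j]) * cov_int_slope
             + times[i] * times[j] * var_slope
             + (var_resid if i == j else 0.0)
             for j in range(n)] for i in range(n)]


times = [0, 2, 4]
cs = cs_corr(times, 0.3)
dex_theta0 = dex_corr(times, 0.3, 0)    # theta = 0 reduces to CS
dex_theta1 = dex_corr(times, 0.3, 1)    # theta = 1 gives AR(1): rho ** lag
rs = rs_cov(times, var_int=8.0, var_slope=0.05, cov_int_slope=0.0, var_resid=4.0)
rs_variances = [rs[i][i] for i in range(len(times))]  # variance grows with time
```

The diagonal of the RS matrix changes with time, which is the heteroscedasticity noted in the last bullet.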

• Goal is to investigate the effect of indicators of socioeconomic status and post-menopausal hormone use on cognitive function (CMD) and cognitive decline (LDD)

• “Pilot study” by Lee S, Kawachi I, Berkman LF, Grodstein F (“Education, other socioeconomic indicators, and cognitive function.” Am J Epidemiol 2003; 157: 712-720). We will refer to this study as Grodstein.

• Design questions include power of the published study to detect effects of specified magnitude, the number and timing of additional tests in order to obtain a study with the desired power to detect effects of specified magnitude, and the optimal number of participants and measurements needed in a de novo study of these issues

• At baseline and at one time subsequently, six cognitive tests were administered to 15,654 participants in the Nurses’ Health Study

• Outcome: Telephone Interview for Cognitive Status (TICS)

• 00=32.7 (4);

• Implies model

• = 1 point/10 years of age

• points

• Exposure: Post-menopausal hormone use (CURRHORM)

• Corr(CURRHORM, age)=-0.06

• points

• Time: age (years) is the best choice, not questionnaire cycle or calendar year of test

• The mean age was 74 and V(t0) ≈ 4.

• The estimated covariance parameters were

• SAS code to fit the LDD model with CS covariance

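proc mixed;

class id;

/* hypothetical variable names: tics = response, age = time, currhorm = exposure */

model tics = age currhorm age*currhorm / solution;

random id;

run;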

• SAS code to fit the LDD model with RS covariance

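proc mixed;

class id;

/* hypothetical variable names: tics = response, age = time, currhorm = exposure */

model tics = age currhorm age*currhorm / solution;

random intercept age / type=un subject=id;

run;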

Program optitxs.r makes it all possible

Illustration of use of software optitxs.r

• We’ll calculate the power of Grodstein’s published study to detect the observed 70% difference in rates of decline between those with more than high school vs. others

• Recall that 6.2% of NHS participants had more than high school; cognitive function changed by –0.3% per year

Press <Esc> to quit

Constant mean difference (CMD) or Linearly divergent difference (LDD)? ldd

The alternative is LDD.

Enter the total sample size (N): 15000

Enter the number of post-baseline measures (r>0): 1

Enter the time between repeated measures (s): 2

Enter the exposure prevalence (pe) (0<=pe<=1): 0.062

Enter the variance of the time variable at baseline, V(t0)

(enter 0 if all participants begin at the same time): 4

Enter the correlation between the time variable at baseline and exposure, rho[e,t0]

(enter 0 if all participants begin at the same time): -0.01

Will you specify the alternative hypothesis on the absolute (beta coefficient) scale (1)

or the relative (percent) scale (2)? 2

The alternative hypothesis will be specified on the relative (percent) change scale.

Enter the percent change from baseline to end of follow-up among unexposed (p2)

(e.g. enter 0.10 for a 10% change): -0.006

Enter the percent difference between the change from baseline to

end of follow-up in the exposed group and the unexposed group (p3) (e.g. enter 0.10 for a 10% difference): 0.7

Which covariance matrix are you assuming: compound symmetry (1),

damped exponential (2) or random slopes (3)? 2

You are assuming DEX covariance

Enter the residual variance of the response given the assumed model covariates (sigma2): 12

Enter the correlation between two measures of the same subject separated by one unit (rho): 0.3

Enter the damping coefficient (theta): 0.10

Power = 0.4206059
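The calculation behind this session can be approximated with a short Python sketch. It is not the optitxs.r code: it assumes a time-invariant binary exposure that is independent of baseline time (so the entered correlation of -0.01 is ignored), a DEX correlation of the form ρ ** (|lag| ** θ), the baseline mean of 32.7 from the example, and a one-term normal approximation to two-sided power. All function names are hypothetical.

```python
from statistics import NormalDist


def dex_cov(times, sigma2, rho, theta):
    """Assumed DEX covariance: sigma2 * rho ** (|lag| ** theta)."""
    n = len(times)
    return [[sigma2 * (1.0 if i == j else rho ** (abs(times[i] - times[j]) ** theta))
             for j in range(n)] for i in range(n)]


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (small matrices only)."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for k in range(n):
            if k != col:
                f = aug[k][col]
                aug[k] = [x - f * y for x, y in zip(aug[k], aug[col])]
    return [row[n:] for row in aug]


def ldd_power(n, r, s, pe, v_t0, mu00, p2, p3, cov, alpha=0.05):
    """Approximate power to detect the exposure-by-time interaction under LDD."""
    u = [s * j for j in range(r + 1)]            # time since baseline
    w = invert(cov(u))                           # inverse covariance matrix
    idx = range(r + 1)
    a = sum(w[l][m] for l in idx for m in idx)
    b = sum(w[l][m] * u[m] for l in idx for m in idx)
    c = sum(w[l][m] * u[l] * u[m] for l in idx for m in idx)
    info = n * pe * (1 - pe) * (v_t0 * a + c - b * b / a)
    gamma3 = p3 * p2 * mu00 / (s * r)            # slope difference per time unit
    nd = NormalDist()
    return nd.cdf(abs(gamma3) * info ** 0.5 - nd.inv_cdf(1 - alpha / 2))


power = ldd_power(n=15000, r=1, s=2, pe=0.062, v_t0=4, mu00=32.7,
                  p2=-0.006, p3=0.7,
                  cov=lambda t: dex_cov(t, sigma2=12, rho=0.3, theta=0.10))
print(round(power, 3))
```

Under these simplifying assumptions the result should land close to the 0.42 reported by the program; small differences are expected because the exposure-time correlation is left out.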

• To detect the observed 70% difference in cognitive decline by GRAD

• CS: 44%

• RS: 35%

• DEX : 42%

• To detect a hypothesized ±10% difference in cognitive decline by current hormone use

• CS & DEX: 7%

• RS: 6%

How many additional measurements are needed when tests are administered every 2 years, i.e. how many more years of follow-up are needed...

• To detect the observed 70% difference in cognitive decline by GRAD with 90% power?

• CS, DEX, RS: 3 post-baseline measurements (τ = 6)

• one more 5-year grant cycle

• To detect a hypothesized ± 20% difference in cognitive decline by current hormone use with 90% power?

• CS, DEX: 6 post-baseline measurements (τ = 12)

• More than two 5-year grant cycles

N=15,000 for these calculations
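The search for the number of post-baseline measures can be sketched for the GRAD question under compound symmetry, where the information about the slope difference has a simple closed form. The inputs are assumptions: σ² = 12 and ρ = 0.3 are borrowed from the program session, the slope difference is 0.7 × (-0.003 × 32.7) points per year, exposure is taken to be independent of baseline age, and the function names are hypothetical.

```python
from statistics import NormalDist


def ldd_power_cs(n, r, s, pe, v_t0, gamma3, sigma2, rho, alpha=0.05):
    """Approximate LDD power under CS with visits every s time units."""
    # Information from between-subject variation in baseline time.
    between = (r + 1) * v_t0 / (sigma2 * (1 + r * rho))
    # Information from within-subject follow-up: sum of squared centred times.
    within = s ** 2 * r * (r + 1) * (r + 2) / (12 * sigma2 * (1 - rho))
    info = n * pe * (1 - pe) * (between + within)
    nd = NormalDist()
    return nd.cdf(abs(gamma3) * info ** 0.5 - nd.inv_cdf(1 - alpha / 2))


def min_measures(target, max_r=50, **design):
    """Smallest number of post-baseline measures r reaching the target power."""
    for r in range(1, max_r + 1):
        if ldd_power_cs(r=r, **design) >= target:
            return r
    raise ValueError("target power not reached within max_r measures")


gamma3 = 0.7 * (-0.003 * 32.7)   # 70% difference in a -0.3% per year decline
r_needed = min_measures(0.90, n=15000, s=2, pe=0.062, v_t0=4,
                        gamma3=gamma3, sigma2=12, rho=0.3)
print(r_needed, "post-baseline measures, tau =", 2 * r_needed)
```

With these inputs the search stops at r = 3 (τ = 6), in line with the GRAD result above.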

How many more measurements should be taken in four (1 NIH grant cycle) and eight years of follow-up (two NIH grant cycles)...

• To detect the observed 70% difference in cognitive decline by GRAD with 90% power?

• To detect a hypothesized ± 20% difference in cognitive decline by current hormone use with 90% power?

Optimize (N,r) in a new study of cognitive decline

• Assume

• 4 years of follow-up (1 NIH grant cycle);

• the cost of recruitment and baseline measurements is twice that of subsequent measurements

• GRAD:

• (N,r) = (26,795; 1) CS

• (N,r) = (26,930; 1) DEX

• (N,r) = (28,945; 1) RS

• CURRHORM:

• (N,r) = (97,662; 1) CS

• (N,r) = (98,155; 1) DEX

• (N,r) = (105,470; 1) RS
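A cost-based choice of (N,r) of this kind can be sketched as follows, assuming compound symmetry, a fixed follow-up of τ = 4 years with r equally spaced post-baseline measures, and a cost per participant of κ + r in units of one follow-up measurement (κ = 2 here). The target power of 90%, the covariance parameters (σ² = 12, ρ = 0.3) and the function names are assumptions, so the output is not expected to reproduce the values above exactly.

```python
from math import ceil
from statistics import NormalDist


def required_n(r, tau, pe, v_t0, gamma3, sigma2, rho, power=0.90, alpha=0.05):
    """Sample size for the LDD test under CS with follow-up fixed at tau."""
    between = (r + 1) * v_t0 / (sigma2 * (1 + r * rho))
    within = tau ** 2 * (r + 1) * (r + 2) / (12 * r * sigma2 * (1 - rho))
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)
    return ceil(z ** 2 / (gamma3 ** 2 * pe * (1 - pe) * (between + within)))


def cheapest_design(kappa, max_r, **design):
    """Return (N, r) minimising total cost N * (kappa + r) at the target power."""
    best = None
    for r in range(1, max_r + 1):
        n = required_n(r, **design)
        cost = n * (kappa + r)
        if best is None or cost < best[0]:
            best = (cost, n, r)
    return best[1], best[2]


gamma3 = 0.7 * (-0.003 * 32.7)   # GRAD: 70% difference in yearly decline
n_opt, r_opt = cheapest_design(kappa=2, max_r=8, tau=4, pe=0.062, v_t0=4,
                               gamma3=gamma3, sigma2=12, rho=0.3)
print(n_opt, r_opt)
```

With a cost ratio of 2, the sketch selects a single post-baseline measure (r = 1), the same qualitative answer as the slide: extra measures within a fixed follow-up add less information than the additional participants the same money would buy.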

Conclusions

• CMD:

• If all observations have the same cost, one would not take repeated measures.

• If subsequent measures are cheaper, one would take no repeated measures or just a small number if the correlation between measures is large.

• If deviations from CS exist, it is advisable to take more repeated measures.

• Power increases as and as

• Power increases as Var(t0) goes to 0

Conclusions

• LDD:

• If the follow-up period is not fixed, choose the maximum length of follow-up possible (except when RS is assumed).

• If the follow-up period is fixed, one would take more than one repeated measure only when the subsequent measures are more than five times cheaper. When there are departures from CS, cost ratios of around 10 or 20 are needed to justify taking 3 or 4 measures.

• Power increases as , as , as slope reliability goes to 0, as Var(t0) increases, and as the correlation between t0 and exposure goes to 0

Conclusions

• LDD:

• The optimal (N,r) and the resulting power can strongly depend on the correlation structure. Combinations that are optimal for one correlation may be bad for another.

• All these decisions are based on power considerations alone. There might be other reasons to take repeated measures.

• Perform a sensitivity analysis over the design parameters; our program can be used for this.

Future work

• Develop formulas for time-varying exposure.

• Include dropout

• For sample size calculations, simply inflate the sample size by a factor of 1/(1-f), where f is the expected dropout fraction.

• However, dropout can alter the relationship between N and r.
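The simple inflation rule mentioned above amounts to one line of arithmetic. The sketch below assumes f is the expected fraction of participants lost to follow-up; the function name is hypothetical, and the rule does not address the caveat that dropout can change the best trade-off between N and r.

```python
from math import ceil


def inflate_for_dropout(n, f):
    """Inflate a sample size n by 1 / (1 - f) for an expected dropout fraction f."""
    if not 0 <= f < 1:
        raise ValueError("dropout fraction f must satisfy 0 <= f < 1")
    return ceil(n / (1 - f))


n_enrolled = inflate_for_dropout(15000, 0.25)   # enrol enough to retain 15,000
print(n_enrolled)
```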