Designing longitudinal studies in epidemiology


Designing longitudinal studies in epidemiology. Donna Spiegelman Professor of Epidemiologic Methods Departments of Epidemiology and Biostatistics stdls@channing.harvard.edu Xavier Basagana Doctoral Student Department of Biostatistics, Harvard School of Public Health . Background.


### Designing longitudinal studies in epidemiology

Donna Spiegelman

Professor of Epidemiologic Methods

Departments of Epidemiology and Biostatistics

stdls@channing.harvard.edu

Xavier Basagana

Doctoral Student, Department of Biostatistics,

Harvard School of Public Health

Background
• We develop methods for the design of longitudinal studies for the most common scenarios in epidemiology
• There already exist some formulas for power and sample size calculations in this context.
• All prior work has been developed for clinical trials applications
Background

Based on clinical trials:

• Some are based on test statistics (e.g. ANCOVA) that are not valid, or are less efficient, in an observational context.

Background

• Based on clinical trials:
• In clinical trials:
• The time measure of interest is time from randomization, so everyone starts at the same time. We consider situations where, for example, age is the time variable of interest, and subjects do not start at the same age.
• Time-invariant exposures
• Exposure (treatment) prevalence is 50% by design

Xavier Basagaña’s Thesis

• Derive study design formulas based on tests that are valid and efficient for observational studies, for two reasonable alternative hypotheses.
• Comprehensively assess the effect of all parameters on power and sample size.
• Extend the formulas to a context where not all subjects enter the study at the same time.
• Extend the formulas to the case of time-varying covariates, and compare the results with the time-invariant covariates case.

Xavier Basagaña’s Thesis

• Derive the optimal combination of number of subjects (n) and number of repeated measures (r+1) when subject to a cost constraint.
• Create an easy-to-use computer program, with an intuitive parameterization, to perform design computations.

### Notation and Preliminary Results

• We study two alternative hypotheses: Constant Mean Difference (CMD) and Linearly Divergent Difference (LDD).
Intuitive parameterization of the alternative hypothesis
• μ00: the mean response at baseline (or at the mean initial time) in the unexposed group
• p1: the percent difference between exposed and unexposed groups at baseline (or at the mean initial time)
Intuitive parameterization of the alternative hypothesis (2)
• p2: the percent change from baseline (or from the mean initial time) to end of follow-up (or to the mean final time) in the unexposed group

When the study duration τ is not fixed, p2 is defined at time s (the interval between visits) instead of at time τ.

• p3: the percent difference between the change from baseline (or from the mean initial time) to end of follow-up (or mean final time) in the exposed group and the unexposed group

When p2 = 0, p3 is defined as the percent change from baseline (or from the mean initial time) to the end of follow-up (or to the mean final time) in the exposed group.

Notation & Preliminary Results

• We consider studies where the interval between visits (s) is fixed but the duration of the study is free (e.g. participants may respond to questionnaires every two years)
• Increasing r involves increasing the duration of the study
• We also consider studies where the duration of the study, τ, is fixed, but the interval between visits is free (e.g. the study is 5 years long)
• Increasing r involves increasing the frequency of the measurements, i.e. decreasing the interval s
• τ = s r.
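The two design modes differ only in which quantity is held fixed while r grows. A minimal sketch, assuming visit times are measured from each subject's baseline; the function names are illustrative and not part of the authors' software:

```python
def visit_times_fixed_interval(r, s):
    """Interval s is fixed: the duration tau = s * r grows with r."""
    return [s * j for j in range(r + 1)]


def visit_times_fixed_duration(r, tau):
    """Duration tau is fixed: the interval s = tau / r shrinks as r grows."""
    s = tau / r
    return [s * j for j in range(r + 1)]


# Questionnaires every two years, three post-baseline visits: tau = 6.
biennial = visit_times_fixed_interval(r=3, s=2)

# A four-year study with two post-baseline visits: s = 2.
four_year = visit_times_fixed_duration(r=2, tau=4)
```

In both cases there are r + 1 measurements per subject, counting the baseline visit.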
Notation & Preliminary Results
• Model
• The generalized least squares (GLS) estimator of B is
• Power formula
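A sketch of these three ingredients in standard notation. It assumes a linear model in time t_ij and a binary exposure indicator k_i, a two-sided Wald test at level α, and the usual approximation that ignores the far tail; the symbols are illustrative choices rather than a transcription of the slide:

```latex
\[
E(Y_{ij} \mid k_i, t_{ij}) =
\begin{cases}
\beta_0 + \beta_1 t_{ij} + \beta_2 k_i & \text{(CMD, test } \beta_2 = 0\text{)} \\
\beta_0 + \beta_1 t_{ij} + \beta_2 k_i + \beta_3 t_{ij} k_i & \text{(LDD, test } \beta_3 = 0\text{)}
\end{cases}
\]
\[
\hat{B} = \Big( \sum_{i=1}^{N} X_i' \Sigma_i^{-1} X_i \Big)^{-1} \sum_{i=1}^{N} X_i' \Sigma_i^{-1} Y_i
\]
\[
\text{Power} \approx \Phi\!\left( \frac{|\beta|}{\sqrt{\operatorname{Var}(\hat{\beta})}} - z_{1-\alpha/2} \right)
\]
```

Here β stands for the coefficient under test (β2 under CMD, β3 under LDD), and Var(β̂) is the corresponding diagonal element of the inverse of the summed information matrix.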

Notation & Preliminary Results

• Let σ^lm be the (l,m)th element of Σ⁻¹
• Assuming that the time distribution is independent of exposure group.
• Then, under CMD
• Under LDD
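One way to write the two variances, derived here from the expected information matrix rather than copied from the slide. It assumes a binary exposure with prevalence p_e, a covariance matrix Σ that is the same for every subject and does not depend on the starting time t_0, and a time distribution that is independent of exposure. The vector d = (0, s, ..., rs)' holds the visit times since baseline, and 1'Σ⁻¹1 is the sum of all elements of Σ⁻¹:

```latex
\[
\operatorname{Var}(\hat{\beta}_2) =
\frac{1}{N \, p_e (1 - p_e) \, \mathbf{1}' \Sigma^{-1} \mathbf{1}}
\qquad \text{(CMD)}
\]
\[
\operatorname{Var}(\hat{\beta}_3) =
\frac{1}{N \, p_e (1 - p_e) \left[ V(t_0) \, \mathbf{1}' \Sigma^{-1} \mathbf{1}
+ \mathbf{d}' \Sigma^{-1} \mathbf{d}
- \dfrac{(\mathbf{1}' \Sigma^{-1} \mathbf{d})^2}{\mathbf{1}' \Sigma^{-1} \mathbf{1}} \right]}
\qquad \text{(LDD)}
\]
```

Under LDD the bracket splits into a between-subject part, which grows with V(t_0), and a within-subject part, which grows with the spread of the visit times.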

Correlation structures

• We consider three common correlation structures:
• Compound symmetry (CS).

Correlation structures

• Damped Exponential (DEX)

θ = 0: CS

θ = 0.3: intermediate between CS and AR(1)

θ = 1: AR(1)

Correlation structures

• Random intercepts and slopes (RS).
• Reparameterizing:
• is the reliability coefficient at baseline
• is the slope reliability at the end of follow-up ( =0 is CS; =1 all variation in slopes is between subjects).
• With this correlation structure, the variance of the response changes with time, i.e. this correlation structure gives a heteroscedastic model.
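The three structures can be written down compactly. This is a sketch under stated assumptions: the DEX correlation is taken to be rho ** (lag ** theta), the usual damped exponential form, and the RS covariance is built from raw variance components (var_b0, var_b1, cov_b01, var_within are illustrative names) rather than from the reliability reparameterization described above.

```python
def cs_corr(times, rho):
    """Compound symmetry: every pair of visits has correlation rho."""
    n = len(times)
    return [[1.0 if j == k else rho for k in range(n)] for j in range(n)]


def dex_corr(times, rho, theta):
    """Damped exponential (assumed form): rho ** (|t_j - t_k| ** theta)."""
    n = len(times)
    return [[1.0 if j == k else rho ** (abs(times[j] - times[k]) ** theta)
             for k in range(n)] for j in range(n)]


def rs_cov(times, var_b0, var_b1, cov_b01, var_within):
    """Random intercepts and slopes: Z D Z' + var_within * I with Z = [1, t]."""
    n = len(times)
    return [[var_b0 + (times[j] + times[k]) * cov_b01
             + times[j] * times[k] * var_b1
             + (var_within if j == k else 0.0)
             for k in range(n)] for j in range(n)]


visits = [0, 2, 4]
dex_cs = dex_corr(visits, rho=0.3, theta=0)       # theta = 0 reduces to CS
dex_ar1 = dex_corr(visits, rho=0.3, theta=1)      # theta = 1 reduces to AR(1)
rs = rs_cov(visits, var_b0=4.0, var_b1=0.5, cov_b01=0.0, var_within=8.0)
rs_variances = [rs[j][j] for j in range(len(visits))]  # grows with time
```

The diagonal of the RS matrix changes with time, which is the heteroscedasticity noted above; setting var_b1 and cov_b01 to zero gives back a compound symmetry structure.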

Example

• Goal is to investigate the effect of indicators of socioeconomic status and post-menopausal hormone use on cognitive function (CMD) and cognitive decline (LDD)
• “Pilot study” by Lee S, Kawachi I, Berkman LF, Grodstein F (“Education, other socioeconomic indicators, and cognitive function.” Am J Epidemiol 2003; 157: 712-720). Will denote as Grodstein.
• Design questions include power of the published study to detect effects of specified magnitude, the number and timing of additional tests in order to obtain a study with the desired power to detect effects of specified magnitude, and the optimal number of participants and measurements needed in a de novo study of these issues

Example

• At baseline and at one time subsequently, six cognitive tests were administered to 15,654 participants in the Nurses’ Health Study
• Outcome: Telephone Interview for Cognitive Status (TICS)
• μ00 = 32.7 (4);
• Implies model
• = 1 point/10 years of age

Example

• points
• Exposure: Post-menopausal hormone use (CURRHORM)
• Corr(CURRHORM, age)=-0.06
• points
• Time: age (years) is the best choice, not questionnaire cycle or calendar year of test
• The mean age was 74 and V(t0) ≈ 4.

Example

• The estimated covariance parameters were
• SAS code to fit the LDD model with CS covariance

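proc mixed;

class id;

/* assumed variable names: tics (outcome), age (time), grad (exposure) */

model tics = age grad age*grad / solution;

random id;

run;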

• SAS code to fit the LDD model with RS covariance

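proc mixed;

class id;

/* assumed variable names: tics (outcome), age (time), grad (exposure) */

model tics = age grad age*grad / solution;

random intercept age / type=un subject=id;

run;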

Illustration of use of software optitxs.r
• We’ll calculate the power of Grodstein’s published study to detect the observed 70% difference in rates of decline between those with more than high school vs. others
• Recall that 6.2% of NHS participants had more than high school; cognitive function changed by –0.3% per year

> long.power()

Press <Esc> to quit

Constant mean difference (CMD) or Linearly divergent difference (LDD)? ldd

The alternative is LDD.

Enter the total sample size (N): 15000

Enter the number of post-baseline measures (r>0): 1

Enter the time between repeated measures (s): 2

Enter the exposure prevalence (pe) (0<=pe<=1): 0.062

Enter the variance of the time variable at baseline, V(t0)

(enter 0 if all participants begin at the same time): 4

Enter the correlation between the time variable at baseline and exposure, rho[e,t0]

(enter 0 if all participants begin at the same time): -0.01

Will you specify the alternative hypothesis on the absolute (beta coefficient) scale (1)

or the relative (percent) scale (2)? 2

The alternative hypothesis will be specified on the relative (percent) change scale.

Enter mean response at baseline among unexposed (mu00): 32.7

Enter the percent change from baseline to end of follow-up among unexposed (p2)

(e.g. enter 0.10 for a 10% change): -0.006

Enter the percent difference between the change from baseline to

end of follow-up in the exposed group and the unexposed group (p3) (e.g. enter 0.10 for a 10% difference): 0.7

Which covariance matrix are you assuming: compound symmetry (1),

damped exponential (2) or random slopes (3)? 2

You are assuming DEX covariance

Enter the residual variance of the response given the assumed model covariates (sigma2): 12

Enter the correlation between two measures of the same subject separated by one unit (rho): 0.3

Enter the damping coefficient (theta): 0.10

Power = 0.4206059
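The session's result can be approximated by hand for this two-visit case. The sketch below is not the optitxs.r implementation. It assumes the DEX correlation at lag s is rho ** (s ** theta), uses a large-sample variance for the time-by-exposure coefficient derived under independence of baseline time and exposure (so the entered rho[e,t0] of -0.01 is ignored), and keeps only the dominant tail of the two-sided test. The function name is illustrative.

```python
from statistics import NormalDist


def ldd_power_two_visits(n, s, pe, v_t0, mu00, p2, p3,
                         sigma2, rho, theta, alpha=0.05):
    """Approximate LDD power with one post-baseline measure (r = 1)."""
    beta1 = p2 * mu00 / s          # change per unit time among the unexposed
    beta3 = p3 * beta1             # difference in slopes, exposed vs unexposed
    rho_s = rho ** (s ** theta)    # assumed DEX correlation between the two visits
    # Per-subject information on the slope difference:
    between = 2.0 * v_t0 / (sigma2 * (1.0 + rho_s))      # from differing start times
    within = (s ** 2 / 2.0) / (sigma2 * (1.0 - rho_s))   # from the repeated measure
    var_beta3 = 1.0 / (n * pe * (1.0 - pe) * (between + within))
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return NormalDist().cdf(abs(beta3) / var_beta3 ** 0.5 - z_crit)


# Inputs copied from the session above.
power = ldd_power_two_visits(n=15000, s=2, pe=0.062, v_t0=4, mu00=32.7,
                             p2=-0.006, p3=0.7, sigma2=12, rho=0.3, theta=0.10)
```

With these inputs the approximation comes out at about 0.42, close to the value printed by the program. Most of the information comes from the spread in starting ages rather than from the single repeated measure.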

Power of current study
• To detect the observed 70% difference in cognitive decline by GRAD
• CS: 44%
• RS: 35%
• DEX : 42%
• To detect a hypothesized ±10% difference in cognitive decline by current hormone use
• CS & DEX: 7%
• RS: 6%
How many additional measurements are needed when tests are administered every 2 years, i.e. how many more years of follow-up are needed...
• To detect the observed 70% difference in cognitive decline by GRAD with 90% power?
• CS, DEX, RS: 3 post-baseline measurements (τ = 6)
• one more 5-year grant cycle
• To detect a hypothesized ± 20% difference in cognitive decline by current hormone use with 90% power?
• CS, DEX: 6 post-baseline measurements (τ = 12)
• More than two 5-year grant cycles

N=15,000 for these calculations
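The search for the smallest r can be sketched as a loop over designs with a fixed 2-year interval. This is an illustration, not the authors' program: it assumes the DEX form rho ** (lag ** theta), a large-sample variance derived under independence of baseline age and exposure, and the DEX inputs from the software session (GRAD exposure). All function names are hypothetical.

```python
from statistics import NormalDist


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(m[i][c]))
        m[c], m[p] = m[p], m[c]
        for i in range(c + 1, n):
            f = m[i][c] / m[c][c]
            for j in range(c, n + 1):
                m[i][j] -= f * m[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def slope_info(times, sigma2, rho, theta, v_t0):
    """Per-subject information on the time-by-exposure coefficient."""
    n = len(times)
    cov = [[sigma2 * (1.0 if j == k else rho ** (abs(times[j] - times[k]) ** theta))
            for k in range(n)] for j in range(n)]
    w1 = solve(cov, [1.0] * n)                  # Sigma^-1 1
    wd = solve(cov, list(times))                # Sigma^-1 d
    a = sum(w1)                                 # 1' Sigma^-1 1
    b = sum(wd)                                 # 1' Sigma^-1 d
    c = sum(t * w for t, w in zip(times, wd))   # d' Sigma^-1 d
    return v_t0 * a + c - b * b / a


def ldd_power(n, r, s, pe, v_t0, beta3, sigma2, rho, theta, alpha=0.05):
    """Approximate power to detect beta3 with r post-baseline visits."""
    times = [s * j for j in range(r + 1)]
    var = 1.0 / (n * pe * (1 - pe) * slope_info(times, sigma2, rho, theta, v_t0))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(beta3) / var ** 0.5 - z_crit)


def min_followups(target, **design):
    """Smallest r whose approximate power reaches the target."""
    r = 1
    while ldd_power(r=r, **design) < target:
        r += 1
    return r


# p2 is defined at time s when the duration is free, so beta1 = p2 * mu00 / s.
beta3 = 0.7 * (-0.006 * 32.7 / 2)
design = dict(n=15000, s=2, pe=0.062, v_t0=4, beta3=beta3,
              sigma2=12, rho=0.3, theta=0.10)
r_needed = min_followups(0.90, **design)
```

Under these assumptions the loop stops at r = 3, i.e. six years of follow-up, in line with the DEX entry on the slide.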

How many more measurements should be taken in four (1 NIH grant cycle) and eight years of follow-up (two NIH grant cycles)...
• To detect the observed 70% difference in cognitive decline by GRAD with 90% power?
• To detect a hypothesized ± 20% difference in cognitive decline by current hormone use with 90% power?

Optimize (N,r) in a new study of cognitive decline
• Assume
• 4 years of follow-up (1 NIH grant cycle);
• the cost of recruitment and baseline measurements is twice that of subsequent measurements
• GRAD:
• (N,r)=(26,795; 1) CS
• (N,r)=(26,930; 1) DEX
• (N,r)=(28,945; 1) RS
• CURRHORM:
• (N,r)=(97,662; 1) CS
• (N,r)=(98,155; 1) DEX
• (N,r)=(105,470; 1) RS
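A sketch of how such an optimum can be found for the education (GRAD) exposure under DEX. It is not the authors' program, and it rests on several assumptions: a 90% power target, a DEX correlation of rho ** (lag ** theta), a large-sample variance derived under independence of baseline age and exposure, a per-subject cost of kappa + r in units of one follow-up measurement, and the DEX inputs from the software session. All names are illustrative.

```python
from math import ceil
from statistics import NormalDist


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(m[i][c]))
        m[c], m[p] = m[p], m[c]
        for i in range(c + 1, n):
            f = m[i][c] / m[c][c]
            for j in range(c, n + 1):
                m[i][j] -= f * m[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def slope_info(times, sigma2, rho, theta, v_t0):
    """Per-subject information on the time-by-exposure coefficient."""
    n = len(times)
    cov = [[sigma2 * (1.0 if j == k else rho ** (abs(times[j] - times[k]) ** theta))
            for k in range(n)] for j in range(n)]
    w1 = solve(cov, [1.0] * n)
    wd = solve(cov, list(times))
    a, b = sum(w1), sum(wd)
    c = sum(t * w for t, w in zip(times, wd))
    return v_t0 * a + c - b * b / a


def n_for_power(r, tau, pe, v_t0, beta3, sigma2, rho, theta,
                power=0.90, alpha=0.05):
    """Subjects needed with r post-baseline visits spread over a fixed tau."""
    times = [tau * j / r for j in range(r + 1)]
    info = slope_info(times, sigma2, rho, theta, v_t0)
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return ceil(z * z / (beta3 ** 2 * pe * (1 - pe) * info))


def cheapest_design(kappa, max_r=6, **design):
    """Return (N, r, cost) minimizing N * (kappa + r) over r = 1..max_r."""
    best = None
    for r in range(1, max_r + 1):
        n = n_for_power(r=r, **design)
        cost = n * (kappa + r)
        if best is None or cost < best[2]:
            best = (n, r, cost)
    return best


# Slope among the unexposed: -0.3% of 32.7 points per year; 70% difference for GRAD.
beta3 = 0.7 * (-0.003 * 32.7)
n_opt, r_opt, cost_opt = cheapest_design(kappa=2, tau=4, pe=0.062, v_t0=4,
                                         beta3=beta3, sigma2=12, rho=0.3, theta=0.10)
```

Under these assumptions the cheapest design has a single post-baseline visit, and the required N lands near the DEX entry for the education exposure. Extra visits within the four years add little information relative to their cost.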
Conclusions

• Constant Mean Difference (CMD):
• If all observations have the same cost, one would not take repeated measures.
• If subsequent measures are cheaper, one would take no repeated measures or just a small number if the correlation between measures is large.
• If deviations from CS exist, it is advisable to take more repeated measures.
• Power increases as and as
• Power increases as Var( ) goes to 0

Conclusions

• LDD:
• If the follow-up period is not fixed, choose the maximum length of follow-up possible (except when RS is assumed).
• If the follow-up period is fixed, one would take more than one repeated measure only when the subsequent measures are more than five times cheaper. When there are departures from CS, cost ratios of around 10 or 20 are needed to justify taking 3 or 4 measures.
• Power increases as , as , as slope reliability goes to 0, as Var(t0) increases, and as the correlation between t0 and exposure goes to 0

Conclusions

• LDD:
• The optimal (N,r) and the resulting power can strongly depend on the correlation structure. Combinations that are optimal for one correlation may be bad for another.
• All these decisions are based on power considerations alone. There might be other reasons to take repeated measures.
• Sensitivity analysis. Our program.

Future work

• Develop formulas for time-varying exposure.
• Include dropout
• For sample size calculations, simply inflate the sample size by a factor of 1/(1-f), where f is the expected dropout fraction.
• However, dropout can alter the relationship between N and r.
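The inflation rule is simple arithmetic. A minimal sketch, assuming f denotes the fraction of subjects expected to drop out (the slides do not define it) and rounding up to a whole subject; the function name is illustrative:

```python
from math import ceil


def inflate_for_dropout(n, f):
    """Inflate a required sample size n by 1 / (1 - f) for dropout fraction f."""
    if not 0 <= f < 1:
        raise ValueError("f must satisfy 0 <= f < 1")
    return ceil(n / (1 - f))


# 1,000 completers needed and 25% expected dropout: recruit 1,334.
n_recruit = inflate_for_dropout(1000, 0.25)
```

This keeps the number of completers at the target but, as noted above, it does not account for the way dropout changes the trade-off between N and r.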