Designing longitudinal studies in epidemiology
Donna Spiegelman, Professor of Epidemiologic Methods, Departments of Epidemiology and Biostatistics, [email protected]; Xavier Basagaña, Doctoral Student, Department of Biostatistics, Harvard School of Public Health.

Background

  • We develop methods for the design of longitudinal studies for the most common scenarios in epidemiology

  • There already exist some formulas for power and sample size calculations in this context.

  • All prior work has been developed for clinical trials applications


Background

Based on clinical trials:

  • Some are based on test statistics (e.g. ANCOVA) that are not valid, or are less efficient, in an observational context.


Background

  • Based on clinical trials:

  • In clinical trials:

    • The time measure of interest is time from randomization, so everyone starts at the same time. We consider situations where, for example, age is the time variable of interest, and subjects do not start at the same age.

    • Time-invariant exposures

    • Exposure (treatment) prevalence is 50% by design


Xavier Basagaña’s Thesis

  • Derive study design formulas based on tests that are valid and efficient for observational studies, for two reasonable alternative hypotheses.

  • Comprehensively assess the effect of all parameters on power and sample size.

  • Extend the formulas to a context where not all subjects enter the study at the same time.

  • Extend the formulas to the case of time-varying covariates, and compare them with the time-invariant covariates case.


Xavier Basagaña’s Thesis

  • Derive the optimal combination of number of subjects (n) and number of repeated measures (r+1) when subject to a cost constraint.

  • Create a computer program to perform design computations, with an intuitive parameterization and easy to use.



  • We study two alternative hypotheses: constant mean difference (CMD) and linearly divergent difference (LDD).
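Written as regression models, the two alternatives can be sketched as below. The notation is an assumption, since the slide's own equations are not in the transcript: Y_ij is the response of subject i at visit j, t_ij is the time variable, and k_i is a binary exposure indicator. The LDD form agrees with the SAS model statement `tics=grad age gradage` used in the example slides.

```latex
\begin{align*}
\text{CMD:}\quad & E(Y_{ij} \mid k_i) = \beta_0 + \beta_1 t_{ij} + \beta_2 k_i,
  & H_A &: \beta_2 \neq 0 \\
\text{LDD:}\quad & E(Y_{ij} \mid k_i) = \beta_0 + \beta_1 t_{ij} + \beta_2 k_i + \beta_3\, t_{ij} k_i,
  & H_A &: \beta_3 \neq 0
\end{align*}
```

Under CMD the exposed and unexposed trajectories are parallel; under LDD they diverge linearly over time.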



Intuitive parameterization of the alternative hypothesis

  • μ00: the mean response at baseline (or at the mean initial time) in the unexposed group

  • p1: the percent difference between exposed and unexposed groups at baseline (or at the mean initial time)


Intuitive parameterization of the alternative hypothesis (2)

  • p2: the percent change from baseline (or from the mean initial time) to end of follow-up (or to the mean final time) in the unexposed group

    When τ is not fixed, p2 is defined at time s instead of at time τ

  • p3: the percent difference between the change from baseline (or from the mean initial time) to end of follow-up (or mean final time) in the exposed group and the unexposed group

    When p2 = 0, p3 will be defined as the percent change from baseline (or from the mean initial time) to the end of follow-up (or to the mean final time) in the exposed group
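The defining equations did not survive in the transcript. One way to write these quantities, assuming a linear LDD mean model with coefficients β0 to β3, time 0 at baseline and time τ at the last visit r, is sketched below. The labels mu00, p2 and p3 follow the program prompts shown later; p1 is an assumed label by analogy.

```latex
\begin{align*}
\mu_{00} &= E(Y_{i0} \mid k_i = 0) = \beta_0 \\
p_1 &= \frac{E(Y_{i0} \mid k_i = 1) - E(Y_{i0} \mid k_i = 0)}{E(Y_{i0} \mid k_i = 0)} = \frac{\beta_2}{\beta_0} \\
p_2 &= \frac{E(Y_{ir} \mid k_i = 0) - E(Y_{i0} \mid k_i = 0)}{E(Y_{i0} \mid k_i = 0)} = \frac{\beta_1 \tau}{\beta_0} \\
p_3 &= \frac{\beta_3 \tau}{\beta_1 \tau} = \frac{\beta_3}{\beta_1}
\end{align*}
```

The last line makes clear why p3 needs a separate definition when the unexposed group does not change: the denominator β1 is then zero.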


Notation & Preliminary Results

  • We consider studies where the interval between visits (s) is fixed but the duration of the study is free (e.g. participants may respond to questionnaires every two years)

    • Increasing r involves increasing the duration of the study

  • We also consider studies where the duration of the study, τ, is fixed, but the interval between visits is free (e.g. the study is 5 years long)

    • Increasing r involves increasing the frequency of the measurements, i.e. decreasing s

  • τ = s r.


Notation & Preliminary Results

  • Model

  • The generalized least squares (GLS) estimator of B is

  • Power formula
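The three formulas on this slide are missing from the transcript. The standard textbook forms are sketched below as a stand-in, not as a reconstruction of the slide. Assumed notation: X_i is the design matrix of subject i, Σ the within-subject covariance matrix, c a contrast vector that picks out the coefficient under test, and α the two-sided significance level.

```latex
\begin{align*}
E(Y_i \mid X_i) &= X_i B, \qquad \operatorname{Var}(Y_i \mid X_i) = \Sigma \\
\hat{B} &= \Bigl(\sum_{i=1}^{N} X_i^{\top} \Sigma^{-1} X_i\Bigr)^{-1}
           \sum_{i=1}^{N} X_i^{\top} \Sigma^{-1} Y_i \\
\text{Power} &\approx \Phi\!\left(\frac{\lvert c^{\top} B \rvert}
           {\sqrt{\operatorname{Var}(c^{\top}\hat{B})}} - z_{1-\alpha/2}\right)
\end{align*}
```

The power expression is the usual Wald-test approximation and ignores the negligible probability of rejecting in the wrong direction.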


Notation & Preliminary Results

  • Let σ^{lm} be the (l,m)th element of Σ^{-1}

  • Assume that the time distribution is independent of exposure group.

  • Then, under CMD

  • Under LDD
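The closed-form variances under CMD and LDD are not in the transcript. As a hedged illustration only, the sketch below works out the simplest case: a time-invariant exposure with prevalence pe and every participant measured at the same times, i.e. V(t0) = 0. In that case GLS gives Var(β̂2) = 1 / (N pe (1 - pe) a) under CMD and Var(β̂3) = a / (N pe (1 - pe) (a c - b²)) under LDD, where a, b and c are the sums of σ^{lm}, t_l σ^{lm} and t_l t_m σ^{lm}. All function names are hypothetical and are not part of optitxs.r.

```python
from statistics import NormalDist


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for k in range(n):
            if k != col:
                f = aug[k][col]
                aug[k] = [vk - f * vc for vk, vc in zip(aug[k], aug[col])]
    return [row[n:] for row in aug]


def cs_covariance(r, sigma2, rho):
    """Compound-symmetry covariance matrix for r + 1 measurements."""
    return [[sigma2 if l == m else sigma2 * rho for m in range(r + 1)]
            for l in range(r + 1)]


def wald_power(effect, variance, alpha=0.05):
    """Approximate power of a two-sided Wald test (both rejection tails)."""
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2)
    shift = abs(effect) / variance ** 0.5
    return nd.cdf(shift - z) + nd.cdf(-shift - z)


def power_cmd(beta2, n, pe, sigma):
    """Power to detect beta2 under CMD, assuming V(t0) = 0."""
    inv = invert(sigma)
    a = sum(sum(row) for row in inv)  # sum of all elements of the inverse
    return wald_power(beta2, 1.0 / (n * pe * (1 - pe) * a))


def power_ldd(beta3, n, pe, sigma, times):
    """Power to detect beta3 under LDD, assuming V(t0) = 0."""
    inv = invert(sigma)
    idx = range(len(times))
    a = sum(inv[l][m] for l in idx for m in idx)
    b = sum(times[l] * inv[l][m] for l in idx for m in idx)
    c = sum(times[l] * times[m] * inv[l][m] for l in idx for m in idx)
    return wald_power(beta3, a / (n * pe * (1 - pe) * (a * c - b * b)))
```

Because the sketch sets V(t0) = 0, it should not be expected to reproduce the output of the authors' program when participants enter at different ages.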


Correlation structures

  • We consider three common correlation structures:

    • Compound symmetry (CS).


Correlation structures

  • Damped Exponential (DEX)

    • θ = 0: CS

    • θ = 0.3: between CS and AR(1)

    • θ = 1: AR(1)
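A minimal sketch of the damped exponential structure, assuming the usual form Corr(Y_l, Y_m) = ρ^(|t_l - t_m|^θ), where ρ is the correlation between two measures one time unit apart and θ is the damping coefficient. The function name is hypothetical.

```python
def dex_correlation(times, rho, theta):
    """Damped exponential correlation matrix for the given measurement times."""
    return [[1.0 if l == m else rho ** (abs(tl - tm) ** theta)
             for m, tm in enumerate(times)]
            for l, tl in enumerate(times)]


# theta = 0 gives the same correlation at every separation (CS);
# theta = 1 gives rho ** |t_l - t_m| (AR(1)).
cs_like = dex_correlation([0, 2, 4], rho=0.3, theta=0.0)
ar1_like = dex_correlation([0, 1, 2], rho=0.3, theta=1.0)
```

Values of θ between 0 and 1 make the correlation decay with separation more slowly than AR(1).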


Correlation structures

  • Random intercepts and slopes (RS).

  • Reparameterizing:

    • The first parameter is the reliability coefficient at baseline

    • The second parameter is the slope reliability at the end of follow-up (a value of 0 gives CS; a value of 1 means that all variation in slopes is between subjects).

    • With this correlation structure, the variance of the response changes with time, i.e. this correlation structure gives a heteroscedastic model.
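The heteroscedasticity can be seen from the covariance implied by a random intercept and slope model. The sketch below assumes the standard form Cov(Y_l, Y_m) = var_b0 + (t_l + t_m) cov_b01 + t_l t_m var_b1, plus var_w on the diagonal. The parameter names and numeric values are illustrative assumptions, not the authors' notation or estimates.

```python
def rs_covariance(times, var_b0, var_b1, cov_b01, var_w):
    """Covariance matrix implied by random intercepts and slopes.

    var_b0: between-subject intercept variance
    var_b1: between-subject slope variance
    cov_b01: intercept-slope covariance
    var_w: within-subject (residual) variance
    """
    return [[var_b0 + (tl + tm) * cov_b01 + tl * tm * var_b1
             + (var_w if l == m else 0.0)
             for m, tm in enumerate(times)]
            for l, tl in enumerate(times)]


def baseline_reliability(var_b0, var_w):
    """Share of the baseline variance (time 0) that is between subjects."""
    return var_b0 / (var_b0 + var_w)


cov = rs_covariance([0, 2, 4], var_b0=4.0, var_b1=0.25, cov_b01=0.0, var_w=8.0)
variances = [cov[j][j] for j in range(3)]  # grows with time when var_b1 > 0
```

Setting var_b1 and cov_b01 to zero makes every off-diagonal element equal to var_b0, which is the CS structure.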


Example

  • Goal is to investigate the effect of indicators of socioeconomic status and post-menopausal hormone use on cognitive function (CMD) and cognitive decline (LDD)

  • “Pilot study” by Lee S, Kawachi I, Berkman LF, Grodstein F (“Education, other socioeconomic indicators, and cognitive function.” Am J Epidemiol 2003; 157: 712-720). We will refer to it as Grodstein.

  • Design questions include power of the published study to detect effects of specified magnitude, the number and timing of additional tests in order to obtain a study with the desired power to detect effects of specified magnitude, and the optimal number of participants and measurements needed in a de novo study of these issues


Example

  • At baseline and at one time subsequently, six cognitive tests were administered to 15,654 participants in the Nurses’ Health Study

  • Outcome: Telephone Interview for Cognitive Status (TICS)

    • μ00 = 32.7 (4);

      • Implies model

    • Decline with age: 1 point/10 years of age


Example

  • Exposure: Graduate school degree vs. not (GRAD)

    • Corr(GRAD, age)=-0.01

    • points

  • Exposure: Post-menopausal hormone use (CURRHORM)

    • Corr(CURRHORM, age)=-0.06

    • points

  • Time: age (years) is the best choice, not questionnaire cycle or calendar year of test

    • The mean age was 74 and V(t0) = 4.


Example

  • The estimated covariance parameters were

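  • SAS code to fit the LDD model with CS covariance

    /* gradage is presumably the grad-by-age product, computed beforehand */
    proc mixed;
      class id;
      model tics = grad age gradage / s;
      random id;  /* random intercept per subject gives CS */
    run;

  • SAS code to fit the LDD model with RS covariance

    proc mixed;
      class id;
      model tics = grad age gradage / s ddfm=bw;
      random intercept age / type=un subject=id;  /* random intercepts and slopes */
    run;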


Program optitxs.r makes it all possible




Illustration of use of software optitxs.r

  • We’ll calculate the power of Grodstein’s published study to detect the observed 70% difference in rates of decline between those with more than high school vs. others

  • Recall that 6.2% of NHS participants had more than high school; cognitive function changed by -0.3% per year


> long.power()

Press <Esc> to quit

Constant mean difference (CMD) or Linearly divergent difference (LDD)? ldd

The alternative is LDD.

Enter the total sample size (N): 15000

Enter the number of post-baseline measures (r>0): 1

Enter the time between repeated measures (s): 2

Enter the exposure prevalence (pe) (0<=pe<=1): 0.062

Enter the variance of the time variable at baseline, V(t0)

(enter 0 if all participants begin at the same time): 4

Enter the correlation between the time variable at baseline and exposure, rho[e,t0]

(enter 0 if all participants begin at the same time): -0.01

Will you specify the alternative hypothesis on the absolute (beta coefficient) scale (1)

or the relative (percent) scale (2)? 2

The alternative hypothesis will be specified on the relative (percent) change scale.


Enter mean response at baseline among unexposed (mu00): 32.7

Enter the percent change from baseline to end of follow-up among unexposed (p2)

(e.g. enter 0.10 for a 10% change): -0.006

Enter the percent difference between the change from baseline to

end of follow-up in the exposed group and the unexposed group (p3) (e.g. enter 0.10 for a 10% difference): 0.7

Which covariance matrix are you assuming: compound symmetry (1),

damped exponential (2) or random slopes (3)? 2

You are assuming DEX covariance

Enter the residual variance of the response given the assumed model covariates (sigma2): 12

Enter the correlation between two measures of the same subject separated by one unit (rho): 0.3

Enter the damping coefficient (theta): 0.10

Power = 0.4206059
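The relative-scale inputs in this session can be translated to the absolute (beta coefficient) scale. The sketch below assumes a linear LDD mean model in which p2 = β1 τ / μ00 and p3 = β3 / β1, with τ = s r. The program's internal conversion is not shown in the transcript and may differ, and the function name is hypothetical.

```python
def relative_to_beta(mu00, p2, p3, tau):
    """Convert (mu00, p2, p3) to the time slope and the exposure-by-time slope.

    Assumes a linear LDD mean model with follow-up length tau.
    """
    beta1 = p2 * mu00 / tau  # change per unit time in the unexposed
    beta3 = p3 * beta1       # additional change per unit time in the exposed
    return beta1, beta3


# Inputs from the session above: s = 2 and r = 1, so tau = 2.
beta1, beta3 = relative_to_beta(mu00=32.7, p2=-0.006, p3=0.7, tau=2 * 1)
```

With these inputs the unexposed slope is about -0.098 points per year, in line with the roughly 1 point per 10 years of age quoted in the example.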


Power of current study

  • To detect the observed 70% difference in cognitive decline by GRAD

    • CS: 44%

    • RS: 35%

    • DEX : 42%

  • To detect a hypothesized ±10% difference in cognitive decline by current hormone use

    • CS & DEX: 7%

    • RS: 6%


How many additional measurements are needed when tests are administered every 2 years, i.e. how many more years of follow-up are needed...

  • To detect the observed 70% difference in cognitive decline by GRAD with 90% power?

    • CS, DEX, RS: 3 post-baseline measurements, τ = 6

      • one more 5-year grant cycle

  • To detect a hypothesized ±20% difference in cognitive decline by current hormone use with 90% power?

    • CS, DEX: 6 post-baseline measurements, τ = 12

      • More than two 5-year grant cycles

        N=15,000 for these calculations


How many more measurements should be taken in four (one NIH grant cycle) and eight years of follow-up (two NIH grant cycles)...

  • To detect the observed 70% difference in cognitive decline by GRAD with 90% power?

  • To detect a hypothesized ±20% difference in cognitive decline by current hormone use with 90% power?


Optimize (N,r) in a new study of cognitive decline

  • Assume

    • 4 years of follow-up (1 NIH grant cycle);

    • the cost of recruitment and baseline measurements is twice that of subsequent measurements

  • GRAD:

    • CS: (N,r) = (26,795; 1)

    • DEX: (N,r) = (26,930; 1)

    • RS: (N,r) = (28,945; 1)

  • CURRHORM:

    • CS: (N,r) = (97,662; 1)

    • DEX: (N,r) = (98,155; 1)

    • RS: (N,r) = (105,470; 1)
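A rough sketch of the trade-off between N and r under a cost constraint, for the LDD case only. It assumes CS covariance, everyone starting at the same time, equally spaced visits over a fixed follow-up τ, and a budget expressed in units of one follow-up measurement, with the baseline visit costing kappa units. Under CS the variance of the estimated slope difference is σ²(1 - ρ) / (Sxx N pe (1 - pe)), where Sxx is the sum of squared deviations of the visit times. The names, budget and covariance values are illustrative assumptions and do not reproduce the figures above.

```python
def slope_diff_variance_cs(n, r, tau, pe, sigma2, rho):
    """Variance of the estimated exposure-by-time coefficient under CS."""
    s = tau / r
    times = [j * s for j in range(r + 1)]
    mean_t = sum(times) / (r + 1)
    sxx = sum((t - mean_t) ** 2 for t in times)
    return sigma2 * (1 - rho) / (sxx * n * pe * (1 - pe))


def best_design(budget, kappa, tau, pe, sigma2, rho, max_r=10):
    """Search r = 1..max_r for the (N, r) with the smallest variance.

    Each participant costs kappa + r units: kappa for recruitment and the
    baseline visit, 1 for each of the r later visits.
    """
    best = None
    for r in range(1, max_r + 1):
        n = int(budget // (kappa + r))
        if n < 2:
            continue
        var = slope_diff_variance_cs(n, r, tau, pe, sigma2, rho)
        if best is None or var < best[2]:
            best = (n, r, var)
    return best


# Four years of follow-up, baseline visit twice as costly as later visits.
n_opt, r_opt, var_opt = best_design(budget=30000, kappa=2, tau=4,
                                    pe=0.062, sigma2=12.0, rho=0.3)
```

With a cost ratio of 2 this simplified search settles on a single post-baseline measurement, which agrees with r = 1 in every design listed above.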


Conclusions

  • Re: Constant Mean Difference (CMD)


Conclusions

  • CMD:

    • If all observations have the same cost, one would not take repeated measures.

    • If subsequent measures are cheaper, one would take no repeated measures or just a small number if the correlation between measures is large.

    • If deviations from CS exist, it is advisable to take more repeated measures.

    • Power increases as and as

    • Power increases as Var(t0) goes to 0


Conclusions

  • LDD:

    • If the follow-up period is not fixed, choose the maximum length of follow-up possible (except when RS is assumed).

    • If the follow-up period is fixed, one would take more than one repeated measure only when the subsequent measures are more than five times cheaper. When there are departures from CS, cost ratios of around 10 or 20 are needed to justify taking 3 or 4 measures.

    • Power increases as , as , as slope reliability goes to 0, as Var(t0) increases, and as the correlation between t0 and exposure goes to 0


Conclusions

  • LDD:

    • The optimal (N,r) and the resulting power can strongly depend on the correlation structure. Combinations that are optimal for one correlation may be bad for another.

  • All these decisions are based on power considerations alone. There might be other reasons to take repeated measures.

  • Sensitivity analysis. Our program.


Future work

  • Develop formulas for time-varying exposure.

  • Include dropout

    • For sample size calculations, simply inflate the sample size by a factor of 1/(1-f).

    • However, dropout can alter the relationship between N and r.
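The inflation rule can be sketched as follows, taking f to be the expected proportion of participants lost to follow-up (an assumption, since the slide does not define f). The function name is hypothetical.

```python
import math


def inflate_for_dropout(n, f):
    """Inflate a required sample size n by 1 / (1 - f), rounding up."""
    if not 0 <= f < 1:
        raise ValueError("f must be in [0, 1)")
    return math.ceil(n / (1 - f))


# 15,000 required participants with 25% expected dropout
n_recruit = inflate_for_dropout(15000, 0.25)
```

This adjusts N only; it does not capture how dropout changes the trade-off between N and r, which is the open issue noted above.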

