Innovative statistical approaches in health services research: multiple informant analyses

Nicholas Horton, Department of Mathematics, Smith College, Northampton MA. nhorton@email.smith.edu http://www.biostat.harvard.edu/multinform

Presentation Transcript


  1. Innovative statistical approaches in health services research: multiple informant analyses • Nicholas Horton, Department of Mathematics, Smith College, Northampton MA • nhorton@email.smith.edu • http://www.biostat.harvard.edu/multinform

  2. Acknowledgements • Joint work with Garrett Fitzmaurice and Nan Laird, Harvard School of Public Health • Jane Murphy and the Stirling County Study for use of their example dataset • Supported by NIH grant R01-MH54693

  3. Outline • Motivation for multiple source data • Examples of multiple sources/informants • Models for correlated multiple source data • Accounting for complex survey design • Accounting for incomplete/missing data • Example (Stirling County Study) • Conclusions

  4. Why multiple source data? • to provide better measures of some underlying construct that is difficult to measure or likely to be missing • also known as multiple informant reports, proxy reports, co-informants, etc. • discordance is expected, otherwise there is no need to collect multiple reports

  5. Definition of multiple source data • data obtained from multiple informants or raters (e.g., self-reports, family members, health care providers, teachers) • or via different/parallel instruments or methods (e.g., symptom rating scales, standardized diagnostic interviews, or clinical diagnoses) • None of the reports is a “gold standard” • We consider multiple source data that are commensurate (multiple measures of the same underlying variable on a similar scale)

  6. Examples of multiple source data • child psychopathology (ask parents, teachers and children about underlying psychological state) • service utilization studies (collect information from subjects and databases) • medical comorbidity (query providers and charts to assess medical problems)

  7. Examples of multiple source data (cont.) • adherence studies (collect self-report of adherence, electronic pill caps [MEMS] plus pharmacy records) • nutritional epidemiology (utilize multiple dietary instruments such as food frequency questionnaires, 24-hour recalls, food diaries)

  8. Incomplete/missing reports • Multiple source reports are commonly incomplete since, by definition, they are collected from sources other than the primary subject of the study • This missingness may be by design or happenstance (or both!)

  9. Example: missing source reports • Consider service utilization studies that collect information from subjects and databases • Subjects may be lost to follow-up (or only contacted periodically) • Databases may be incomplete (lack of consent, lack of appropriate coverage)

  10. Analytic approach • Multiple sources can provide information on outcomes or predictors (risk factors) • Multiple source outcome: what is the prevalence of child psychopathology? (measured using parallel parent and teacher reports) • Fitzmaurice et al (AJE, 1995), Horton et al (HSOR, 2002), Horton and Fitzmaurice (SIM tutorial, in press)

  11. Analytic approach (cont.) • Multiple source predictor: what are the odds of developing depression in adulthood, conditional on parallel reports of anxiety (collected from a child and a parent)? • Examples: Horton et al (AJE, 2001), Lash et al (AJE, 2003), Liddicoat et al (JGIM, 2004), Horton and Fitzmaurice (SIM tutorial, in press) • We will focus on an example using multiple source predictors

  12. Notation • Let Y denote a univariate outcome for a given subject • Let X_l denote the l-th multiple source predictor (l = 1, ..., L) • Let Z denote a vector of other covariates for the subject • To simplify exposition, we consider two sources with dichotomous reports (L=2)

  13. Questions to consider • Are the sources reporting on the same underlying construct (are they commensurate or interchangeable?) • Is it possible to combine the reports in some fashion? • How to handle missing reports?

  14. Analytic approaches • Reviewed in Horton, Laird and Zahner (IJMPR, 1999) • Use only one source • Fit separate models

  15. Analytic approaches (cont.) • Combine (pool) the reports in some fashion • Include both reports in the model
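
As an illustration of what "combining (pooling) the reports" can mean for two dichotomous sources, here is a minimal sketch of the "or" and "and" rules. The function name `pool_reports`, the use of `None` for a missing report, and the choice of these two rules are assumptions made for illustration; the slides do not specify a pooling rule.

```python
from typing import Optional


def pool_reports(x1: Optional[int], x2: Optional[int], rule: str = "or") -> Optional[int]:
    """Combine two dichotomous (0/1) source reports into one value.

    None marks a missing report (hypothetical convention for this sketch).
    """
    if rule not in ("or", "and"):
        raise ValueError("rule must be 'or' or 'and'")

    if x1 is None or x2 is None:
        observed = x1 if x1 is not None else x2  # still None if both are missing
        if observed is None:
            return None
        # With one report missing, the pooled value is determined only when
        # the observed report already settles the rule.
        if rule == "or" and observed == 1:
            return 1
        if rule == "and" and observed == 0:
            return 0
        return None

    return int(x1 or x2) if rule == "or" else int(x1 and x2)
```

Note how a missing report often leaves the pooled value undefined, which is one reason pooling can be awkward when source reports are incomplete.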

  16. Analytic approaches (cont.) • We considered simultaneous estimation of the marginal models: • Non-standard application of GEE • Method independently suggested by Pepe et al (SIM, 1999)
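
One way to picture the simultaneous estimation is as a regression on a "stacked" dataset in which each subject contributes one row per source report, with the outcome repeated and a source indicator added. The sketch below shows only that data layout; the field names (`id`, `y`, `z`, `reports`) are hypothetical, and it is not taken from the authors' code.

```python
def stack_sources(subjects):
    """Return one row per (subject, source) pair with an observed report.

    Each input subject is a dict with keys 'id', 'y' (outcome), 'z' (other
    covariates) and 'reports' (list of 0/1 source reports, None if missing).
    """
    rows = []
    for subj in subjects:
        for source_index, report in enumerate(subj["reports"]):
            if report is None:
                continue  # a missing report contributes no row in this sketch
            rows.append(
                {
                    "id": subj["id"],        # cluster identifier (same subject)
                    "y": subj["y"],          # outcome repeated for each source
                    "source": source_index,  # 0 = first source, 1 = second source
                    "x": report,             # that source's report
                    "z": subj["z"],
                }
            )
    return rows
```

The stacked rows could then be passed to a GEE routine that clusters on `id`, so that the repeated outcome within a subject is handled in the variance estimate.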

  17. Advantages of new approach • can be used to test for source differences in association with the outcome • can test if the effects of other risk factors on the outcome differ by source

  18. Advantages of new approach (cont.) • allows different source effects where necessary • a pooled model can be fit if there are no significant source effects (potentially more efficient) • can be fit using general purpose statistical software
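
The relationship between the separate-parameter and pooled models can be sketched as follows. The symbols are assumptions for illustration: X_l is the report from source l, s_l is a source indicator (s_1 = 0, s_2 = 1), and a logit link is used for a dichotomous outcome; the slides do not give this parameterization.

```latex
% Separate-parameter model: source terms \delta allow effects to differ by source
\operatorname{logit} E[Y \mid X_l, Z]
  = \beta_0 + \beta_1 X_l + \beta_2^{\top} Z
  + s_l \left( \delta_0 + \delta_1 X_l + \delta_2^{\top} Z \right),
  \qquad l = 1, 2

% Test for source differences
H_0 : \delta_0 = 0,\ \delta_1 = 0,\ \delta_2 = 0

% Pooled (shared-parameter) model if H_0 is not rejected
\operatorname{logit} E[Y \mid X_l, Z] = \beta_0 + \beta_1 X_l + \beta_2^{\top} Z
```

Under this sketch, δ_1 captures a source difference in the association of the report with the outcome, and δ_2 captures source differences in the effects of the other risk factors.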

  19. Accounting for survey design • Many health services or epidemiologic studies arise from complex survey samples • Need to address stratification, multi-stage clustering and unequal sampling weights • Failing to properly account for survey design may lead to bias and incorrect estimation of variability

  20. Accounting for survey design (cont.) • Estimation proceeds using the approximate (quasi) log-likelihood (weighted version of the usual score equations for a GLM, accounting for the multi-stage clustering, including multiple source reports) • Can be fit using general purpose statistical software (e.g. Stata)
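
For a canonical-link GLM fit to stacked source reports, the weighted score equations mentioned above take roughly the following form. This is a sketch under assumptions: w_i is the sampling weight for subject i, d_{il} is the design vector for subject i and source l, and μ_{il} is the fitted mean; none of these symbols appear in the slides.

```latex
U(\beta) \;=\; \sum_{i=1}^{n} w_i \sum_{l=1}^{L} d_{il} \left( Y_i - \mu_{il}(\beta) \right) \;=\; 0,
\qquad
\mu_{il}(\beta) = g^{-1}\!\left( d_{il}^{\top} \beta \right)
```

The point estimates come from solving U(β) = 0, while the variance estimate must additionally reflect the stratification and the clustering of subjects, and of source reports within subjects.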

  21. Accounting for incomplete source reports • Missing source reports in this setting are missing predictors • Account for MAR missingness by weighted estimating equation methodology of Robins et al (JASA, 1994) and Xie and Paik (Biometrics, 1997) • Adds an additional “missingness weight” • Complications to variance estimation
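
A minimal sketch of how a "missingness weight" might be combined with the sampling weight: an observed report is weighted by the sampling weight divided by its estimated probability of being observed, and a missing report gets weight zero. The function name and arguments are hypothetical, and the probability of observation is assumed to have been estimated already (for example, from a model using fully observed variables).

```python
def analysis_weight(sampling_weight: float, prob_observed: float, observed: bool) -> float:
    """Combine a survey sampling weight with an inverse-probability missingness weight."""
    if not observed:
        return 0.0  # missing reports drop out of the weighted estimating equations
    if not 0.0 < prob_observed <= 1.0:
        raise ValueError("prob_observed must be in (0, 1]")
    return sampling_weight / prob_observed
```

Because the observation probabilities are themselves estimated, standard errors that treat these weights as fixed may be inaccurate, which relates to the variance complication noted on the slide.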

  22. Example: Stirling County • Outcome: time to event (death) over a 16-year follow-up period (1952-1968) (n=1079) • multiple source predictors: partially observed dichotomous physician report or self-report of psychiatric disorder • other predictors: age (3 categories), gender • statistical model: piecewise exponential survival with 4 intervals, each of 4 years' duration (subjects contribute time at risk in each interval)
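
To make "subjects contribute time at risk in each interval" concrete, the sketch below splits one subject's follow-up into the 4 four-year intervals. The function name and the tuple layout are assumptions for illustration. One common way to fit a piecewise exponential model is then a Poisson regression on these rows with the log of the exposure as an offset; whether the authors fit it that way is not stated in the slides.

```python
def split_follow_up(time, died, n_intervals=4, width=4.0):
    """Split follow-up into intervals for a piecewise exponential model.

    Returns a list of (interval_index, years_at_risk, event) tuples, where
    event is 1 only in the interval in which the death occurred.
    """
    rows = []
    for k in range(n_intervals):
        start = k * width
        if time <= start:
            break  # follow-up ended before this interval began
        end = start + width
        exposure = min(time, end) - start   # years at risk within interval k
        event = int(died and time <= end)   # death falls in this interval
        rows.append((k, exposure, event))
    return rows
```

For example, a subject who died 6 years into follow-up contributes 4 years without an event to the first interval and 2 years with an event to the second.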

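  23. Stirling County survey design • [Figure: sampling diagram showing strata (Stratum 1, ..., Stratum k, ..., Stratum K), primary sampling units (PSU 1, ..., PSU j, ..., PSU J), and the two source reports (self-report, physician report)]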

  24. Stirling County missingness • Complete data on mortality • Relatively few reports of diagnosis missing (5% physician, 7% self) • For missing physician reports, MCAR is plausible • Missing self-reports associated with demographics and physician report • Accounting for missingness did not affect results (Horton et al, AJE, 2001)

  25. Results (separate parameters) • Initially fit model with separate parameters • No evidence for any non-zero source terms • Implies that the association between risk factors and mortality did not differ by source • Dropped these terms from the model, yielding parsimonious shared parameter model with smaller standard errors

  26. Results (shared parameters)

  27. Interpretation of results (annual mortality rate)

  28. Conclusions • new methods of analysis of multiple source data are available • can be implemented using existing software • methods allow the assessment of the relative association of each source • each source yielded similar conclusions: association between psychiatric disorder and mortality is stronger for younger subjects • unified model has less variability, pools information after testing for systematic differences

  29. Conclusions (cont.) • methods account for complex survey designs • methods allow partially observed subjects to contribute, under MAR assumptions • multiple source reports arise in many settings (not just for children anymore!)

  30. Innovative statistical approaches in health services research: multiple informant analyses • Nicholas Horton, Department of Mathematics, Smith College, Northampton MA • nhorton@email.smith.edu • http://www.biostat.harvard.edu/multinform

  31. Future work • Maximum-likelihood estimation instead of GEE approach • May yield efficiency gains • Particularly useful for missing reports • Non-commensurate reports • Different scales • Different underlying constructs • Consider latent variable models (e.g. work of Landrum, Normand)
