Using secondary data analysis for outcomes research

Using Secondary Data Analysis for Outcomes Research

Epi 211

April 2010

Michael Steinman, MD


Disclosures and acknowledgements

Disclosures:

None

Acknowledgements:

J. Michael McWilliams

Ann Nattinger

SGIM Research Committee


Question:

  • You are a fellow / junior faculty member interested in studying...

    • Impact of nurse-led HTN clinics on clinical outcomes in patients with HTN

    • Impact of implementing EMRs on appropriate prescribing in ambulatory surgical patients

    • Whether quality measures of asthma control in children correlate with actual clinical outcomes in this population


Question:

  • Here’s your choice:

    • A. Get a multimillion dollar grant to conduct a multi-center, multi-year RCT

    • B. Analyze existing data


Learning objectives

  • Appreciate key conceptual and methodologic issues involved in outcomes research employing secondary data analysis

  • Identify and use online tools for locating and learning about datasets relevant to your research

  • Understand the range of resources and support required to successfully complete a secondary data analysis


Overview

  • Working with secondary data

    • Conceptual and methodologic issues

  • Overview of high-value datasets and web-based resources

  • Planning your dataset project

    • Practical advice

  • Q&A


Working with Secondary Data


(My) Definition of Secondary Data

Data that have been collected

but not for you


Types of Secondary Data

  • Survey

  • Administrative (claims)

  • Discharge

  • Medical chart / EMR

  • Disease registries

  • Aggregate (ARF, US Census)

  • Combinations and linkages


What kinds of research can be conducted with secondary data?

Anything but randomized trials

  • By discipline:

    • Outcomes research

    • Epidemiology

    • Health services research

  • By question:

    • Descriptive

    • Comparative

    • Causal


Conceiving a Project

  • Which comes first: question or dataset?

    • Research question first

    • Dataset first

    • Trick question: both

  • Hybrid approach

    1. Identify research focus, broad question

    2. Consider candidate datasets

    3. Hone question

    4. Iterate between 2 and 3


What makes a good research question? (FINER)

  • Feasible—data, variables, & resources accessible & available

  • Interesting—to researcher and audience

  • Novel—extends what is already known

  • Ethical—upholds standards

  • Relevant—to patient care, clinical outcomes, policy, etc.

Cummings et al. Conceiving the research question. In: Hulley SB, Cummings SR, Browner WS, et al, eds. Designing clinical research, 3rd ed. Philadelphia: Lippincott Williams & Wilkins, 2007:17-26


Selecting a Database

  • Compatibility with research question(s)

  • Availability and expense

  • Sample: representativeness, power

  • Measures of interest present and valid

  • Messiness and missingness

  • Local expertise

  • Linkages


Key elements for outcomes research

  • Measure of intervention

  • Measure of outcome(s)

    • Intermediate outcomes

      • % of patients receiving treatment, measures of glycemic or lipid control (A1c, serum LDL)

    • Clinical outcomes

      • Death, hospitalization, satisfaction, etc.

  • Measures of important confounders


Advantages of Secondary Data

  • They are not primary data!

    • Efficiency: fast and cheap

    • No regrets

  • Scale and scope

    • Size and detail not otherwise feasible for individual research team

    • Generalizable

  • Novel and creative research questions

  • Often easier IRB review process


Challenges and Pitfalls

  • Data mining/overfitting

    • When the analysis precedes the question

    • Does urine cortisol predict Catholicism?

  • Causal inference

    • Inherently limited with observational data

    • But does not preclude quasi-experimental designs to recover causal effects
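Why analysis-before-question is risky can be shown with simple arithmetic: screen enough unrelated variables against an outcome and some will look "significant" by chance. A minimal sketch, assuming independent tests and a conventional 0.05 threshold (both are assumptions of this example, not part of the slides):

```python
def prob_any_false_positive(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one spurious 'significant' result among
    n_tests independent tests when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** n_tests


def expected_false_positives(n_tests: int, alpha: float = 0.05) -> float:
    """Expected number of spurious 'significant' results under the same assumptions."""
    return n_tests * alpha


# Illustrative screening sizes (hypothetical).
for n in (1, 10, 100):
    print(n, round(prob_any_false_positive(n), 3), expected_false_positives(n))
```

With 100 unplanned comparisons, a chance finding is close to certain, which is the argument for fixing the question and hypothesis before opening the data.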


Challenges and Pitfalls

  • Validity of measures

    • Beware of assumptions

    • Problems: coding, reporting, recall biases

    • Solutions: direct validation in subgroup or another data source, literature review, sensitivity analyses

  • Complexity of file structure

    • Row in dataset may not be unit of analysis

    • Skip patterns, proxy respondents
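The "row may not be unit of analysis" point can be sketched as follows. The visit-level records and field names (`patient_id`, `sbp`) are hypothetical; the idea is only that rows must be collapsed to the unit you intend to analyze before summarizing.

```python
from collections import defaultdict

# Hypothetical visit-level extract: one row per clinic visit, not per patient.
visits = [
    {"patient_id": "A", "sbp": 150},
    {"patient_id": "A", "sbp": 140},
    {"patient_id": "A", "sbp": 130},
    {"patient_id": "B", "sbp": 120},
]


def collapse_to_patients(rows):
    """Return one record per patient with visit count and mean systolic BP."""
    by_patient = defaultdict(list)
    for row in rows:
        by_patient[row["patient_id"]].append(row["sbp"])
    return {
        pid: {"n_visits": len(vals), "mean_sbp": sum(vals) / len(vals)}
        for pid, vals in by_patient.items()
    }


patients = collapse_to_patients(visits)

# Averaging rows lets frequent visitors dominate; averaging patients does not.
row_level_mean = sum(r["sbp"] for r in visits) / len(visits)
patient_level_mean = sum(p["mean_sbp"] for p in patients.values()) / len(patients)
print(row_level_mean, patient_level_mean)
```

The two means differ because patient A contributes three rows. Which one is right depends on whether the question is about visits or about patients.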


What You Want and What You Have

  • Want to measure time preferences

    • Behavioral economics: people tend to overvalue the present

    • Explanation for unhealthy habits, underuse of cancer screening?

  • Have measures on financial planning horizons

  • Are the two equivalent?

  • Might financial planning also depend on:

    • Income

    • Source of income, employment status

    • Dependents

    • Inheritance


A Simple Question?

Ask: IF ((piRTab1X007AFinFam = FAMILYR) OR (piRTab1X007AFinFam = FINANCIAL_FAMILYR)) AND ((ACTIVELANGUAGE <> EXTENG) AND (ACTIVELANGUAGE <> EXTSPN)) AND (piInitA106_NumContactKids > 0) AND (piInitA100_NumNRKids > 0)

JE012 CHILDREN LIVE WITHIN 10 MILES

Section: E Level: Household Type: Numeric Width: 1 Decimals: 0

CAI Reference: SecE.KidStatus.E012_

2000 Link: G1980    2002 Link: HE012

IF {R DOES NOT HAVE SPOUSE/PARTNER and DOES NOT STILL HAVE HOME OUTSIDE NURSING HOME {(CS11/A028=1) and (CS26/A070 NOT 1)}} or {R & SPOUSE/PARTNER} LIVE IN SAME NURSING HOME (CS11/A028=1 and CS12/A030=1):

[Do any of your children who do not live with you/Does CHILD NAME] live within 10 miles of you (in R's NURSING HOME CITY, STATE (CS25b/A067))?

OTHERWISE:

[Do any of your children (who do not live with you)/Does CHILD NAME] live within 10 miles of you (in MAIN RESIDENCE [CITY/CITY, STATE STATE])?

6802 1. YES

4720 5. NO

32 8. DK (Don't Know); NA (Not Ascertained)

4 9. RF (Refused)

2087 Blank. INAP (Inapplicable)

* From the Health and Retirement Study
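To illustrate why the codebook matters, here is a rough sketch that recodes JE012 using the frequencies shown above. The code values (1, 5, 8, 9, blank) and counts come from the excerpt; the function names and the choice to keep "missing" separate from "inapplicable" are assumptions of this example.

```python
# Frequencies for JE012 copied from the codebook excerpt; "" stands for Blank (INAP).
JE012_COUNTS = {"1": 6802, "5": 4720, "8": 32, "9": 4, "": 2087}


def classify(code: str) -> str:
    """Translate a raw JE012 code into an analysis category."""
    if code == "":
        return "inapplicable"   # question skipped by the ask-rule
    if code == "1":
        return "yes"
    if code == "5":
        return "no"             # note: NO is coded 5, not 0
    if code in ("8", "9"):
        return "missing"        # don't know / not ascertained / refused
    raise ValueError(f"unexpected code: {code!r}")


def summarize(counts):
    """Total the codebook frequencies within each analysis category."""
    tally = {"yes": 0, "no": 0, "missing": 0, "inapplicable": 0}
    for code, n in counts.items():
        tally[classify(code)] += n
    return tally


tally = summarize(JE012_COUNTS)
share_yes_all_rows = tally["yes"] / sum(tally.values())         # wrong denominator
share_yes_valid = tally["yes"] / (tally["yes"] + tally["no"])   # asked and answered
print(tally, round(share_yes_all_rows, 3), round(share_yes_valid, 3))
```

The two shares differ by roughly nine percentage points, purely from how the skip pattern and nonresponse codes are handled.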


Challenges and Pitfalls

  • Representativeness of Sample

    • External validity (generalizability)

    • Internal validity (selection bias)

    • Example: comparing outcomes for insured and uninsured patients using hospital discharge data

      • Must be hospitalized to enter sample

      • Not only limits generalizability (to outpatients)

      • But inferences about the sample may be wrong

        • Sample would need to include uninsured who would have been hospitalized if insured
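The discharge-data example can be made concrete with a small simulation. Everything here is invented for illustration (admission thresholds, mortality rates, the seed). By construction insurance has no effect on death, yet the hospitalized sample suggests otherwise because the uninsured are admitted only when sicker.

```python
import random


def simulate(n: int = 200_000, seed: int = 0):
    """Return death rates by insurance status, for everyone and for the
    hospitalized only. Insurance does not affect mortality in this model."""
    rng = random.Random(seed)
    tallies = {}  # (scope, insured) -> [deaths, people]
    for _ in range(n):
        insured = rng.random() < 0.5
        severity = rng.random()                  # illness severity, 0 to 1
        died = rng.random() < 0.2 * severity     # depends on severity only
        # Hypothetical admission rule: uninsured are admitted only when sicker.
        hospitalized = severity > (0.3 if insured else 0.7)
        scopes = ("all", "hosp") if hospitalized else ("all",)
        for scope in scopes:
            cell = tallies.setdefault((scope, insured), [0, 0])
            cell[0] += died
            cell[1] += 1
    return {key: deaths / people for key, (deaths, people) in tallies.items()}


rates = simulate()
print("whole population:", rates[("all", True)], rates[("all", False)])
print("hospitalized only:", rates[("hosp", True)], rates[("hosp", False)])
```

In the full population the two death rates are nearly equal; among discharges the uninsured appear to do worse, an artifact of who enters the sample.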


Statistical Considerations: Missing Data

  • Sources

    • Non-response: unit and item

    • Variability in data collection (e.g. across states or over time, collected on subset due to expense)

    • Incomplete linkages

  • Language

    • MCAR: M ⊥ Y (missingness M independent of outcome Y) → strong assumption, can ignore

    • MAR: M ⊥ Y | X (independent given observed covariates X) → weaker assumption, can fix

    • Non-ignorable, informative: M predicts Y → can’t fix
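What "can fix" means under MAR can be sketched with a toy example. The records are hypothetical: the outcome is missing more often in one age group (X), so the complete-case mean is biased, but reweighting observed rows by the inverse of the response rate within each group recovers the full-sample mean.

```python
# Hypothetical records: (age_group, outcome or None if missing).
# Every "young" outcome is observed; half of the "old" outcomes are missing.
records = [("young", 10.0)] * 4 + [("old", 20.0)] * 2 + [("old", None)] * 2


def complete_case_mean(rows):
    """Mean of the observed outcomes, ignoring why others are missing."""
    observed = [y for _, y in rows if y is not None]
    return sum(observed) / len(observed)


def reweighted_mean(rows):
    """Weight each observed row by 1 / (response rate within its X group)."""
    counts = {}  # group -> (number observed, number total)
    for x, y in rows:
        n_obs, n_all = counts.get(x, (0, 0))
        counts[x] = (n_obs + (y is not None), n_all + 1)
    numerator = denominator = 0.0
    for x, y in rows:
        if y is None:
            continue
        n_obs, n_all = counts[x]
        weight = n_all / n_obs
        numerator += weight * y
        denominator += weight
    return numerator / denominator


print(complete_case_mean(records), reweighted_mean(records))
```

If all eight outcomes were observed the mean would be 15; the complete-case mean understates it, and the reweighted mean does not. This only works because missingness depends on the observed X. If it depended on the unobserved outcome itself, no such adjustment would be available.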


Statistical Considerations: Missing Data

  • Approaches

    • Listwise deletion, complete case (ok if MCAR)

    • Imputation

      • Mean imputation (biased standard errors)

      • Multiple imputation (MAR)

    • Weighting techniques (MAR)

    • Random effects models (MAR)
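The "biased standard errors" note on mean imputation can be seen in a few lines. The values are made up; the point is that filling gaps with the mean adds observations without adding spread, so the variance and the naive standard error shrink.

```python
import math
import statistics

observed = [2.0, 4.0, 8.0, 10.0]   # hypothetical non-missing values
n_missing = 2                       # hypothetical number of missing values

fill = statistics.mean(observed)
imputed = observed + [fill] * n_missing

# Sample variance before and after mean imputation.
var_complete = statistics.variance(observed)
var_imputed = statistics.variance(imputed)

# Naive standard error of the mean, treating imputed values as real data.
se_complete = math.sqrt(var_complete / len(observed))
se_imputed = math.sqrt(var_imputed / len(imputed))

print(var_complete, var_imputed, se_complete, se_imputed)
```

The mean is unchanged, but the imputed data look more precise than they are. Multiple imputation addresses this by carrying the uncertainty of the imputed values into the final standard errors.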


Statistical Considerations: Analyzing Survey Data

  • Complex survey designs

  • Example: multistage probability sample (NAMCS):

    US divided into PSUs (counties / MSAs) → sample of PSUs selected → within each PSU, stratify MDs by specialty → sample of MDs within each stratum → quasi-random sample of patients seen by each MD

  • Survey design

    • Clustering: convenience, ↓precision

    • Stratification: ↑precision , ↑representativeness (protects against a bad sample)

    • Oversampling: ↑representation and precision for subgroup of interest
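How much precision clustering costs is often approximated with the design effect, 1 + (m − 1)ρ, where m is the average cluster size and ρ the intraclass correlation. This is the standard textbook approximation for roughly equal cluster sizes, not something stated in the slides, and the numbers below are hypothetical.

```python
def design_effect(avg_cluster_size: float, icc: float) -> float:
    """Approximate variance inflation from cluster sampling."""
    return 1.0 + (avg_cluster_size - 1.0) * icc


def effective_sample_size(n: int, avg_cluster_size: float, icc: float) -> float:
    """Size of a simple random sample with roughly the same precision."""
    return n / design_effect(avg_cluster_size, icc)


# Hypothetical: 2,000 patient visits, 20 per physician, ICC of 0.05.
deff = design_effect(20, 0.05)
n_eff = effective_sample_size(2000, 20, 0.05)
print(round(deff, 2), round(n_eff))
```

Even a small within-physician correlation roughly halves the effective sample size here, which is why power calculations based on the raw row count can mislead.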


Statistical Considerations: Analyzing Survey Data

  • Survey weights: affect point estimates

    • Individuals may have unequal selection probabilities

    • Need to apply weights to recover representativeness

    • W = 1/p(selection) = # people represented

    • W’s reflect sampling design, adjustments to match to census totals, non-response

  • Survey strata, clusters: affect se’s

    • Need variance estimators that account for correlated data

  • Most statistical packages able to handle
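The W = 1/p(selection) rule can be sketched directly. The sample below is hypothetical: a subgroup sampled at 10% (oversampled) and everyone else at 1%. The sketch covers the point estimate only; standard errors still need a design-aware variance estimator from a survey package.

```python
# Hypothetical sample: (selection probability, outcome coded 1/0).
sample = [
    (0.10, 1), (0.10, 1), (0.10, 1), (0.10, 0),   # oversampled subgroup
    (0.01, 0), (0.01, 0), (0.01, 1), (0.01, 0),   # rest of the population
]


def unweighted_proportion(rows):
    """Proportion with the outcome, treating every respondent equally."""
    return sum(y for _, y in rows) / len(rows)


def weighted_proportion(rows):
    """Proportion with the outcome, weighting each respondent by 1/p."""
    weights = [1.0 / p for p, _ in rows]
    weighted_sum = sum(w * y for w, (_, y) in zip(weights, rows))
    return weighted_sum / sum(weights)


print(unweighted_proportion(sample), round(weighted_proportion(sample), 3))
```

The unweighted figure overstates the outcome because the oversampled subgroup has a higher rate; the weighted figure restores each respondent to the number of people they represent.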


Finding the Right Dataset


Finding the Right Dataset

  • Contain variables of interest

    • predictor, outcome, confounders

  • Relevant time frame

    • Cross-sectional, longitudinal

  • Feasible

    • Access: time, bureaucracy, cost

    • Usable

  • No perfect datasets -> hybrid approach of developing research question


Administrative Data (VA)

  • VA has multiple high-value administrative databases

    • Outpatient visit information

      • Visit date, type of clinic, provider, ICD9 diagnoses

    • Inpatient information

      • Admitting dx(s), discharge dx(s), CPT codes, bed section, meds administered

    • Lab data

      • >40 labs

    • Pharmacy data

      • All inpatient and outpatient fills

    • Academic affiliation

    • etc


Administrative Data (VA)

  • Huge bureaucracy and paperwork


Administrative Data (VA)

  • Messy data

  • Huge size

    • 2 TB server

  • Data analyst


Survey Data (NHANES)

  • National Health and Nutrition Examination Survey (NHANES)

    • Nationally representative sample of >10K patients every 2 years

    • Extensive interview data on clinical history (including diseases, behaviors, psychosocial parameters, etc.)

    • Physical exam information (e.g. VS)

    • Labs, biomarkers


Survey Data (NHANES)

  • Free and easy to download

  • (Relatively) easy to use

    • Although requires careful reading of documentation

  • Serial cross-sectional

  • Disease data self-report

  • Very limited information about providers and systems of care


Survey Data (NAMCS)

  • National Ambulatory Medical Care Survey (NAMCS) and National Hospital Ambulatory Medical Care Survey (NHAMCS)

  • Nationally representative sample of ~70K outpatient and ED visits per year

  • Physician-completed form about office visit


Survey Data (NAMCS)

  • Data more from physician perspective (diagnoses, treatments Rx’ed, etc) and some info on providers (e.g., clinic organization, use of EMRs, etc)

  • Serial cross-sectional

    • Visit-focused

    • Not comprehensive, questionable value for chronic diseases


Discharge Data (NIS)

  • National Inpatient Sample (NIS)

    • Database of inpatient hospital stays collected from ~20% of US community hospitals by AHRQ

    • Diagnoses and procedures, severity adjustment elements, payment source, hospital organizational characteristics

    • Hospital and county identifiers that allow linkage to the American Hospital Association Annual Survey and Area Resource File
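Linkage by hospital identifier can be sketched as below. The field names (`hosp_id`, `los_days`, `teaching`) and records are hypothetical and are not actual NIS or AHA variable names. The sketch also tracks unmatched records, since incomplete linkages are one source of missing data.

```python
# Hypothetical discharge records and hospital-level characteristics.
discharges = [
    {"hosp_id": "H1", "los_days": 3},
    {"hosp_id": "H2", "los_days": 7},
    {"hosp_id": "H9", "los_days": 2},   # no matching hospital record
]
hospitals = {
    "H1": {"teaching": True},
    "H2": {"teaching": False},
}


def link(discharge_rows, hospital_lookup):
    """Attach hospital characteristics to each discharge; keep non-matches apart."""
    linked, unmatched = [], []
    for row in discharge_rows:
        hospital = hospital_lookup.get(row["hosp_id"])
        if hospital is None:
            unmatched.append(row)
        else:
            linked.append({**row, **hospital})
    return linked, unmatched


linked, unmatched = link(discharges, hospitals)
print(len(linked), "linked;", len(unmatched), "unmatched")
```

Reporting the unmatched count, and checking whether unmatched records differ systematically from matched ones, is worth doing before any analysis of the linked file.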


Discharge Data (NIS)

  • Relatively easy to access (DUA, $200/yr)

  • Relatively easy to use

    • Though need close attention to documentation

  • Limited data elements

  • Huge data files


Web-Based Resources

  • Society of General Internal Medicine (SGIM) Research Dataset Compendium

    • www.sgim.org/go/datasets

  • UCSF K-12 Data Resource Center

    • http://www.epibiostat.ucsf.edu/courses/RoadmapK12/PublicDataSetResources/

  • Partners in Information Access for the Public Health Workforce

    • http://phpartners.org/health_stats.html


Finding Additional Resources

  • National Information Center on Health Services Research and Health Care Technology (NICHSR)

  • Inter-University Consortium for Political and Social Research (ICPSR)

  • Partners in Information Access for the Public Health Workforce

  • Roadmap K-12 Data Resource Center (UCSF)

  • List of datasets from the American Sociological Association

  • Canadian Research Data Centers – Data Sets and Research Tools (Canada)

  • Directory of Health and Human Services Data Resources

  • Publicly Available Databases from National Institute on Aging (NIA)

  • Publicly Available Databases from National Heart, Lung, & Blood Institute (NHLBI)

  • National Center for Health Statistics (NCHS) Data Warehouse

  • Medicare Research Data Assistance Center (RESDAC); and Centers for Medicare and Medicaid Services (CMS) Research, Statistics, Data & Systems

  • Veterans Affairs (VA) data

(all available at www.sgim.org/go/datasets)


National Information Center on Health Services Research and Health Care Technology (NICHSR)

  • Databases, data repositories, health statistics

  • Fellowship and funding opportunities

  • Glossaries, research and clinical guidelines

  • Evidence-based practice and health technology assessment

  • Specialized PubMed searches on healthcare quality and costs

http://www.nlm.nih.gov/hsrinfo/datasites.html


Inter-University Consortium for Political and Social Research (ICPSR)

  • World’s largest archive of social science data

  • Searchable

  • Many sub-archives relevant to HSR

    • Health and Medical Care Archive

    • National Archive of Computerized Data on Aging

http://www.icpsr.umich.edu/icpsrweb/ICPSR/partners/archives.jsp


Planning Your Dataset Project


Resources Needed

  • Your effort

  • Computer resources and security

  • Programmer and/or statistician effort

  • PhD statistical support – complex sampling or analyses

  • Time (realistic timeline)


IRB Issues

  • Exempt if no identifiers

    • Still need IRB determination of exempt status

  • Limited data set – DUA

    • Usually expedited for IRB

  • Full data use agreement

    • Still may be expedited for IRB

    • May need to explain lack of consent to funders/IRB


Summary: 10 Tips for Success in Secondary Data Analyses

  • Start with a clear research question and hypothesis

  • Get to know your data source:

    • Why does the database exist?

    • Who reports the data?

    • What are the incentives for accurate reporting?

    • How are the data audited, if at all?

    • Can you link the data to other large databases?

  • Get good documentation of the cohort, variables, and data layout, then read the fine print

  • Consult or collaborate with researchers who have used the database

Provided by John Ayanian, MD, MPP, Ellen McCarthy, PhD, “Research with Large Databases”, Harvard School of Public Health


Summary: 10 Tips for Success in Secondary Data Analyses

  • Line up computing resources before data arrive

  • Allow time to receive data if not publicly available

  • Learn SAS, Stata, or other statistical software so you can analyze data yourself (or collaborate)

  • Assess data quality (e.g., outliers & missing data) with plots or frequency tables

  • Consult or collaborate with a statistician on your analysis plan, especially for complex surveys with sampling weights

  • Use clinical intuition to interpret results and consult experts as needed

Provided by John Ayanian, MD, MPP, Ellen McCarthy, PhD, “Research with Large Databases”, Harvard School of Public Health
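The data-quality tip (frequency tables for outliers and missing data) can be sketched with the standard library. The variables, values, and the plausible range for systolic blood pressure are all hypothetical choices for this example.

```python
from collections import Counter


def frequency_table(values):
    """Count each distinct value, tallying missing values (None) as 'MISSING'."""
    return Counter("MISSING" if v is None else v for v in values)


def out_of_range(values, low, high):
    """Return non-missing values that fall outside the plausible range."""
    return [v for v in values if v is not None and not (low <= v <= high)]


# Hypothetical raw columns.
sex = ["F", "M", "F", None, "f", "M"]
sbp = [128, 142, None, 999, 117, 12]

print(frequency_table(sex))           # reveals the stray lowercase "f" and one missing
print(out_of_range(sbp, 60, 260))     # flags the 999 (possible missing code) and 12
```

Values such as 999 are often sentinel codes for missing data rather than true measurements, so flagged values should be checked against the documentation before being dropped or kept.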

