Using Secondary Data Analysis for Outcomes Research

Epi 211

April 2010

Michael Steinman, MD

Disclosures and acknowledgements

Disclosures: None

Acknowledgements:

J. Michael McWilliams

Ann Nattinger

SGIM Research Committee


Question:

  • You are a fellow / junior faculty member interested in studying...

    • Impact of nurse-led HTN clinics on clinical outcomes in patients with HTN

    • Impact of implementing EMRs on appropriate prescribing in ambulatory surgical patients

    • Whether quality measures of asthma control in children correlate with actual clinical outcomes in this population


  • Here’s your choice:

    • A. Get a multimillion dollar grant to conduct a multi-center, multi-year RCT

    • B. Analyze existing data

Learning objectives

  • Appreciate key conceptual and methodologic issues involved in outcomes research employing secondary data analysis

  • Identify and use online tools for locating and learning about datasets relevant to your research

  • Understand the range of resources and support required to successfully complete a secondary data analysis


  • Working with secondary data

    • Conceptual and methodologic issues

  • Overview of high-value datasets and web-based resources

  • Planning your dataset project

    • Practical advice

  • Q&A

(My) Definition of Secondary Data

Data that have been collected

but not for you

Types of Secondary Data

  • Survey

  • Administrative (claims)

  • Discharge

  • Medical chart / EMR

  • Disease registries

  • Aggregate (ARF, US Census)

  • Combinations and linkages

What kinds of research can be conducted with secondary data?

Anything but randomized trials

  • By discipline:

    • Outcomes research

    • Epidemiology

    • Health services research

  • By question:

    • Descriptive

    • Comparative

    • Causal

Conceiving a Project

  • Which comes first: question or dataset?

    • Research question first

    • Dataset first

    • Trick question: both

  • Hybrid approach

    • Identify research focus, broad question

    • Consider candidate datasets

    • Hone question

    • Iterate between the previous two steps (considering candidate datasets and honing the question)

What makes a good research question? (FINER)

  • Feasible—data, variables, & resources accessible & available

  • Interesting—to researcher and audience

  • Novel—extends what is already known

  • Ethical—upholds standards

  • Relevant—to patient care, clinical outcomes, policy, etc.

Cummings et al. Conceiving the research question. In: Hulley SB, Cummings SR, Browner WS, et al, eds. Designing clinical research, 3rd ed. Philadelphia: Lippincott Williams & Wilkins, 2007:17-26

Selecting a Database

  • Compatibility with research question(s)

  • Availability and expense

  • Sample: representativeness, power

  • Measures of interest present and valid

  • Messiness and missingness

  • Local expertise

  • Linkages

Key elements for outcomes research

  • Measure of intervention

  • Measure of outcome(s)

    • Intermediate outcomes

      • % of patients receiving treatment, measures of glycemic or lipid control (A1c, serum LDL)

    • Clinical outcomes

      • Death, hospitalization, satisfaction, etc.

  • Measures of important confounders

Advantages of Secondary Data

  • They are not primary data!

    • Efficiency: fast and cheap

    • No regrets

  • Scale and scope

    • Size and detail not otherwise feasible for individual research team

    • Generalizable

  • Novel and creative research questions

  • Often easier IRB review process

Challenges and Pitfalls

  • Data mining/overfitting

    • When the analysis precedes the question

    • Does urine cortisol predict Catholicism?

  • Causal inference

    • Inherently limited with observational data

    • But does not preclude quasi-experimental designs to recover causal effects

Challenges and Pitfalls

  • Validity of measures

    • Beware of assumptions

    • Problems: coding, reporting, recall biases

    • Solutions: direct validation in subgroup or another data source, literature review, sensitivity analyses

  • Complexity of file structure

    • Row in dataset may not be unit of analysis

    • Skip patterns, proxy respondents
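The point that a row may not be the unit of analysis can be illustrated with a small sketch. It assumes a hypothetical visit-level extract (one row per clinic visit; the patient IDs and blood pressure values are invented) and collapses it to one record per patient before summarizing.

```python
from collections import defaultdict

# Hypothetical visit-level extract: one row per clinic visit, not per patient.
visits = [
    {"patient_id": "A", "sbp": 150},
    {"patient_id": "A", "sbp": 142},
    {"patient_id": "A", "sbp": 138},
    {"patient_id": "B", "sbp": 128},
    {"patient_id": "C", "sbp": 160},
    {"patient_id": "C", "sbp": 156},
]


def collapse_to_patients(rows):
    """Return one record per patient with the visit count and mean SBP."""
    by_patient = defaultdict(list)
    for row in rows:
        by_patient[row["patient_id"]].append(row["sbp"])
    return {
        pid: {"n_visits": len(vals), "mean_sbp": sum(vals) / len(vals)}
        for pid, vals in by_patient.items()
    }


patients = collapse_to_patients(visits)

# Averaging rows weights frequent visitors more heavily than averaging patients.
visit_level_mean = sum(r["sbp"] for r in visits) / len(visits)
patient_level_mean = sum(p["mean_sbp"] for p in patients.values()) / len(patients)

print(f"rows: {len(visits)}, patients: {len(patients)}")
print(f"visit-level mean SBP:   {visit_level_mean:.1f}")
print(f"patient-level mean SBP: {patient_level_mean:.1f}")
```

The two means differ because patients with more visits contribute more rows; which one is appropriate depends on whether the research question is about visits or about patients.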

What You Want and What You Have

  • Want to measure time preferences

    • Behavioral economics: people tend to overvalue the present

    • Explanation for unhealthy habits, underuse of cancer screening?

  • Have measures on financial planning horizons

  • Are the two equivalent?

  • Might financial planning also depend on:

    • Income

    • Source of income, employment status

    • Dependents

    • Inheritance

A Simple Question?

Ask: IF ((piRTab1X007AFinFam = FAMILYR) OR (piRTab1X007AFinFam = FINANCIAL_FAMILYR)) AND ((ACTIVELANGUAGE <> EXTENG) AND (ACTIVELANGUAGE <> EXTSPN)) AND (piInitA106_NumContactKids > 0) AND (piInitA100_NumNRKids > 0)


Section: E Level: Household Type: Numeric Width: 1 Decimals: 0

CAI Reference: SecE.KidStatus.E012_

2000 Link: G1980    2002 Link: HE012


[Do any of your children who do not live with you/Does CHILD NAME] live within 10 miles of you (in R's NURSING HOME CITY, STATE (CS25b/A067))?


[Do any of your children (who do not live with you)/Does CHILD NAME] live within 10 miles of you (in MAIN RESIDENCE [CITY/CITY, STATE STATE])?

6802 1. YES

4720 5. NO

32 8. DK (Don't Know); NA (Not Ascertained)

4 9. RF (Refused)

2087 Blank. INAP (Inapplicable)

* From the Health and Retirement Study
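The frequencies in this codebook excerpt show why skip patterns matter when choosing a denominator. The sketch below uses the counts printed above. It assumes, following the excerpt's own labels, that blank means the question was skipped by design and that codes 8 and 9 are item non-response; the variable names are illustrative.

```python
# Frequencies copied from the codebook excerpt above (code -> count).
counts = {"1": 6802, "5": 4720, "8": 32, "9": 4, "": 2087}

INAPPLICABLE = {""}        # blank: question skipped by design (INAP)
ITEM_MISSING = {"8", "9"}  # asked, but don't know / not ascertained / refused

total_rows = sum(counts.values())
asked = total_rows - sum(counts[c] for c in INAPPLICABLE)
answered = asked - sum(counts[c] for c in ITEM_MISSING)

# Dividing by all rows treats skipped households as if they had answered "no".
naive_share_yes = counts["1"] / total_rows
share_yes_among_answered = counts["1"] / answered
item_nonresponse_rate = (asked - answered) / asked

print(f"rows: {total_rows}, asked: {asked}, answered: {answered}")
print(f"share 'yes' over all rows:       {naive_share_yes:.3f}")
print(f"share 'yes' among those answering: {share_yes_among_answered:.3f}")
print(f"item non-response among asked:   {item_nonresponse_rate:.4f}")
```

Whether the inapplicable households belong in the denominator depends on the research question, which is why the "Ask: IF ..." condition has to be read before the variable is used.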

Challenges and Pitfalls

  • Representativeness of Sample

    • External validity (generalizability)

    • Internal validity (selection bias)

    • Example: comparing outcomes for insured and uninsured patients using hospital discharge data

      • Must be hospitalized to enter sample

      • Not only limits generalizability (to outpatients)

      • But inferences about the sample may be wrong

        • Sample would need to include uninsured who would have been hospitalized if insured

Statistical Considerations: Missing Data

  • Sources

    • Non-response: unit and item

    • Variability in data collection (e.g. across states or over time, collected on subset due to expense)

    • Incomplete linkages

  • Language

    • MCAR (missing completely at random): M ⊥ Y → strong assumption, can ignore

    • MAR (missing at random): M ⊥ Y | X → weaker assumption, can fix

    • Non-ignorable, informative: M predicts Y → can’t fix

Statistical Considerations: Missing Data

  • Approaches

    • Listwise deletion, complete case (ok if MCAR)

    • Imputation

      • Mean imputation (biased standard errors)

      • Multiple imputation (MAR)

    • Weighting techniques (MAR)

    • Random effects models (MAR)
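The note that mean imputation gives biased standard errors can be seen in a small simulation. This is a minimal sketch under stated assumptions: the data are simulated (not from any dataset in this talk), values are deleted completely at random, and the standard error is the simple sd / sqrt(n) formula.

```python
import math
import random
import statistics

random.seed(0)

# Simulated outcome (e.g. a blood pressure reading), then ~40% deleted at random (MCAR).
full = [random.gauss(120, 15) for _ in range(2000)]
observed = [y if random.random() > 0.4 else None for y in full]

# Complete-case analysis: drop the missing values.
complete = [y for y in observed if y is not None]
mean_cc = statistics.mean(complete)
se_cc = statistics.stdev(complete) / math.sqrt(len(complete))

# Mean imputation: fill every gap with the observed mean.
imputed = [y if y is not None else mean_cc for y in observed]
mean_imp = statistics.mean(imputed)
se_imp = statistics.stdev(imputed) / math.sqrt(len(imputed))

print(f"complete case: n={len(complete)}, mean={mean_cc:.2f}, SE={se_cc:.3f}")
print(f"mean imputed:  n={len(imputed)}, mean={mean_imp:.2f}, SE={se_imp:.3f}")
```

The point estimate is unchanged, but the imputed values add no real information while shrinking the apparent variance and inflating n, so the standard error looks smaller than it should. Multiple imputation is designed to avoid this by carrying the imputation uncertainty into the variance.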

Statistical Considerations: Analyzing Survey Data

  • Complex survey designs

  • Example: multistage probability sample (NAMCS):

    US divided into PSUs (counties / MSAs) → sample of PSUs selected → within each PSU, stratify MDs by specialty → sample of MDs within each stratum → quasi-random sample of patients seen by each MD

  • Survey design

    • Clustering: convenience, ↓precision

    • Stratification: ↑precision, ↑representativeness (protects against a bad sample)

    • Oversampling: ↑representation and precision for subgroup of interest
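One common way to quantify the precision lost to clustering is the design effect. The sketch below assumes equal cluster sizes and uses the standard approximation deff = 1 + (m - 1) * rho, where m is the cluster size and rho the intracluster correlation. The numbers are hypothetical, not NAMCS values.

```python
def design_effect(cluster_size, icc):
    """Approximate design effect for equal-sized clusters: 1 + (m - 1) * rho."""
    return 1 + (cluster_size - 1) * icc


def effective_sample_size(n, cluster_size, icc):
    """Number of independent observations the clustered sample is 'worth'."""
    return n / design_effect(cluster_size, icc)


# Hypothetical survey: 2000 patients sampled as 20 per physician, rho = 0.05.
deff = design_effect(cluster_size=20, icc=0.05)
n_eff = effective_sample_size(n=2000, cluster_size=20, icc=0.05)

print(f"design effect = {deff:.2f}, effective n = {n_eff:.0f}")
```

Even a modest within-cluster correlation can cut the effective sample size substantially, which is why variance estimates that ignore clustering tend to be too optimistic.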

Statistical Considerations: Analyzing Survey Data

  • Survey weights: affect point estimates

    • Individuals may have unequal selection probabilities

    • Need to apply weights to recover representativeness

    • W = 1/p(selection) = # people represented

    • W’s reflect sampling design, adjustments to match to census totals, non-response

  • Survey strata, clusters: affect standard errors

    • Need variance estimators that account for correlated data

  • Most statistical packages can handle these design features
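The relation W = 1/p(selection) can be worked through with a toy example. This sketch assumes a hypothetical population of 9,000 younger and 1,000 older adults, with 100 people sampled from each group (so older adults are oversampled). All counts are invented. It covers point estimates only; standard errors would need a design-based variance estimator from a survey package.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w * y) / sum(w)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical design: (population size, sample size, sampled cases with the condition).
strata = {
    "younger": {"N": 9000, "n": 100, "cases": 10},
    "older": {"N": 1000, "n": 100, "cases": 40},
}

values, weights = [], []
for s in strata.values():
    p_selection = s["n"] / s["N"]
    w = 1 / p_selection  # each respondent represents w people
    values += [1] * s["cases"] + [0] * (s["n"] - s["cases"])
    weights += [w] * s["n"]

unweighted = sum(values) / len(values)
weighted = weighted_mean(values, weights)

print(f"people represented: {sum(weights):.0f}")
print(f"unweighted prevalence: {unweighted:.3f}")
print(f"weighted prevalence:   {weighted:.3f}")
```

Because the older group is oversampled and has a higher prevalence, the unweighted estimate overstates prevalence in the population; applying the weights restores each group to its population share.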

Finding the Right Dataset

  • Contain variables of interest

    • predictor, outcome, confounders

  • Relevant time frame

    • Cross-sectional, longitudinal

  • Feasible

    • Access: time, bureaucracy, cost

    • Usable

  • No perfect datasets → use the hybrid approach to developing the research question

Administrative Data (VA)

  • VA has multiple high-value administrative databases

    • Outpatient visit information

      • Visit date, type of clinic, provider, ICD9 diagnoses

    • Inpatient information

      • Admitting dx(s), discharge dx(s), CPT codes, bed section, meds administered

    • Lab data

      • >40 labs

    • Pharmacy data

      • All inpatient and outpatient fills

    • Academic affiliation

    • etc

Administrative Data (VA)

  • Huge bureaucracy and paperwork

Administrative Data (VA)

  • Messy data

  • Huge size

    • 2 TB server

  • Data analyst

Survey Data (NHANES)

  • National Health and Nutrition Examination Survey (NHANES)

    • Nationally representative sample of >10K patients every 2 years

    • Extensive interview data on clinical history (including diseases, behaviors, psychosocial parameters, etc.)

    • Physical exam information (e.g. VS)

    • Labs, biomarkers

Survey Data (NHANES)

  • Free and easy to download

  • (Relatively) easy to use

    • Although requires careful reading of documentation

  • Serial cross-sectional

  • Disease data self-report

  • Very limited information about providers and systems of care

Survey Data (NAMCS)

  • National Ambulatory Medical Care Survey (NAMCS) and National Hospital Ambulatory Medical Care Survey (NHAMCS)

  • Nationally representative sample of ~70K outpatient and ED visits per year

  • Physician-completed form about office visit

Survey Data (NAMCS)

  • Data more from physician perspective (diagnoses, treatments Rx’ed, etc) and some info on providers (e.g., clinic organization, use of EMRs, etc)

  • Serial cross-sectional

    • Visit-focused

    • Not comprehensive; questionable value for chronic diseases

Discharge Data (NIS)

  • National Inpatient Sample (NIS)

    • Database of inpatient hospital stays collected from ~20% of US community hospitals by AHRQ

    • Diagnoses and procedures, severity adjustment elements, payment source, hospital organizational characteristics

    • Hospital and county identifiers that allow linkage to the American Hospital Association Annual Survey and Area Resource File

Discharge Data (NIS)

  • Relatively easy to access (DUA, $200/yr)

  • Relatively easy to use

    • Though need close attention to documentation

  • Limited data elements

  • Huge data files

Web-Based Resources

  • Society of General Internal Medicine (SGIM) Research Dataset Compendium


  • UCSF K-12 Data Resource Center


  • Partners in Information Access for the Public Health Workforce


Finding Additional Resources

  • National Information Center on Health Services Research and Health Care Technology (NICHSR)

  • Inter-University Consortium for Political and Social Research (ICPSR)

  • Partners in Information Access for the Public Health Workforce

  • Roadmap K-12 Data Resource Center (UCSF)

  • List of datasets from the American Sociological Association

  • Canadian Research Data Centers – Data Sets and Research Tools (Canada)

  • Directory of Health and Human Services Data Resources

  • Publicly Available Databases from National Institute on Aging (NIA)

  • Publicly Available Databases from National Heart, Lung, & Blood Institute (NHLBI)

  • National Center for Health Statistics (NCHS) Data Warehouse

  • Medicare Research Data Assistance Center (RESDAC); and Centers for Medicare and Medicaid Services (CMS) Research, Statistics, Data & Systems

  • Veterans Affairs (VA) data

(all available at

National Information Center on Health Services Research and Health Care Technology (NICHSR)

  • Databases, data repositories, health statistics

  • Fellowship and funding opportunities

  • Glossaries, research and clinical guidelines

  • Evidence-based practice and health technology assessment

  • Specialized PubMed searches on healthcare quality and costs

Inter-University Consortium for Political and Social Research (ICPSR)

  • World’s largest archive of social science data

  • Searchable

  • Many sub-archives relevant to HSR

    • Health and Medical Care Archive

    • National Archive of Computerized Data on Aging

Resources Needed

  • Your effort

  • Computer resources and security

  • Programmer and/or statistician effort

  • PhD statistical support – complex sampling or analyses

  • Time / timeline

IRB Issues

  • Exempt if no identifiers

    - Still need IRB determination of exempt status

  • Limited data set – DUA

    - Usually expedited for IRB

  • Full data use agreement

    - Still may be expedited for IRB

    - May need to explain lack of consent to funders/IRB

Summary: 10 Tips for Success in Secondary Data Analyses

  • Start with a clear research question and hypothesis

  • Get to know your data source:

    • Why does the database exist?

    • Who reports the data?

    • What are the incentives for accurate reporting?

    • How are the data audited, if at all?

    • Can you link the data to other large databases?

  • Get good documentation of the cohort, variables, and data layout, then read the fine print

  • Consult or collaborate with researchers who have used the database

Provided by John Ayanian, MD, MPP, Ellen McCarthy, PhD, “Research with Large Databases”, Harvard School of Public Health

Summary: 10 Tips for Success in Secondary Data Analyses

  • Line up computing resources before data arrive

  • Allow time to receive data if not publicly available

  • Learn SAS, Stata, or other statistical software so you can analyze data yourself (or collaborate)

  • Assess data quality (e.g., outliers & missing data) with plots or frequency tables

  • Consult or collaborate with a statistician on your analysis plan, especially for complex surveys with sampling weights

  • Use clinical intuition to interpret results and consult experts as needed
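The tip about assessing data quality with frequency tables can be sketched in a few lines. The example assumes two hypothetical variables (a categorical sex code and a numeric A1c value) and an illustrative plausible range for A1c. Neither the values nor the range come from a real dataset.

```python
from collections import Counter

# Hypothetical raw values; None marks a missing entry.
sex = ["M", "F", "F", None, "M", "U", "F", "f", None, "M"]
a1c = [6.1, 7.4, None, 8.0, 65.0, 5.9, None, 7.2, 6.8, 0.0]


def frequency_table(values):
    """Count each distinct value, reporting missing entries explicitly."""
    counts = Counter("<missing>" if v is None else v for v in values)
    n = len(values)
    return {k: (c, c / n) for k, c in counts.most_common()}


def out_of_range(values, low, high):
    """Return non-missing values outside an assumed plausible range."""
    return [v for v in values if v is not None and not (low <= v <= high)]


sex_table = frequency_table(sex)
for code, (count, share) in sex_table.items():
    print(f"{code:>10}  {count:3d}  {share:5.1%}")

# Assumed plausible range for A1c (%); adjust to the variable's documentation.
suspect_a1c = out_of_range(a1c, low=3.0, high=20.0)
print("A1c values to check:", suspect_a1c)
```

A table like this surfaces unexpected codes (here a lowercase "f" and an undocumented "U") and implausible values before they reach the analysis; what to do with them still requires the documentation and clinical judgment.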

Provided by John Ayanian, MD, MPP, Ellen McCarthy, PhD, “Research with Large Databases”, Harvard School of Public Health