K. Arnold Chan, MD, ScD. Harvard School of Public Health; Channing Laboratory, Brigham & Women’s Hospital and Harvard Medical School. HIPAA and its Implications on Epidemiological Research Using Large Databases.

Presentation Transcript
K. Arnold Chan, MD, ScD

Harvard School of Public Health

Channing Laboratory,

Brigham & Women’s Hospital

and Harvard Medical School

HIPAA and its Implications on Epidemiological Research Using Large Databases


Brief outline of this presentation
  • Using large linked automated data for public health research
  • Data development processes to ensure HIPAA-compliance
  • Examples
  • Some thoughts
Two types of data for public health research
  • Primary data
    • Prospectively collected
    • Well-designed data collection tool
    • Informed consent
  • Secondary data
    • Data originally collected for other purposes
    • May be proprietary
    • Privacy and confidentiality (particularly important if no prior authorization)
    • Different data systems
Large linked healthcare databases
  • Health insurance claims data
    • Medicaid
    • Medicare
    • Managed Care Organizations (MCO)
  • Automated medical records
  • Hospital / Clinic IT systems
  • Availability of written records
  • Need to contact patients / individuals?
Public health research within MCOs
  • Harvard Community Health Plan (subsequently became Harvard Pilgrim HealthCare)
  • Kaiser Permanente (several states)
  • Group Health Cooperative (Seattle area)
  • Others
  • HMO Research Network
    • 10+ MCOs across the U.S.
Public health research within MCOs
  • Different types of MCOs
    • Group model
    • Staff model
    • Different relationship with hospitals
    • Implications for data access
  • MCOs with research programs
    • Separate research departments
    • Full-time investigators and support staff
Data elements in the MCO data
  • Demographic information
  • Membership
    • Start date, termination date, benefit plan, ...
  • Office visits
    • Type of visit, diagnosis(es), special procedures
  • Special examinations
    • Radiology, Laboratory examinations
  • Hospitalizations
  • Drug dispensings
  • Linkable by a unique ID
HIPAA and Research with Databases
  • Authorization from individual research subjects not feasible
  • Individual authorization may be waived by Institutional Review Board or Privacy Board
    • Minimal Risk
    • Data reported in aggregate fashion
      • No single-case report
    • “Minimum necessary” principle
    • De-identification
HIPAA and Research with Databases
  • Single MCO studies
    • Investigators and research staff are MCO employees
  • Multiple-MCO studies
    • May involve transfer of data across MCOs or to a Data Center
  • Other types of studies not covered in this presentation
    • e.g. Generate a de-identified dataset for public or commercial use
HIPAA and data development
  • Do not move individual level data unless absolutely necessary
    • Generate summary tables at each study site
    • Combine the tables for final report
    • Smalley et al. Contraindicated use of cisapride: the impact of an FDA regulatory action. JAMA 2000; 284: 3036-9.
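The "summarize at each site, then combine" approach can be illustrated with a minimal Python sketch. The function names, field names, and sample rows are hypothetical and are not taken from the cited study; the point is only that counts, not individual-level rows, leave each site.

```python
from collections import Counter

def summarize_site(records):
    """Run at each study site: reduce individual-level rows to counts per stratum."""
    table = Counter()
    for rec in records:
        table[(rec["age_group"], rec["sex"])] += 1
    return table

def combine_tables(site_tables):
    """Run at the coordinating center: add up the site-level counts."""
    total = Counter()
    for table in site_tables:
        total.update(table)  # Counter.update adds counts rather than replacing them
    return total

# Hypothetical individual-level rows that never leave their site.
site_a = summarize_site([
    {"age_group": "65-70", "sex": "M"},
    {"age_group": "65-70", "sex": "M"},
    {"age_group": "71-75", "sex": "F"},
])
site_b = summarize_site([
    {"age_group": "65-70", "sex": "M"},
])

combined = combine_tables([site_a, site_b])
```

Only `site_a` and `site_b` (the summary tables) would be transmitted in this sketch.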
HIPAA and data development
  • Randomly generated Study ID to replace True ID
    • Crosswalk between the two stored at secured location
    • Destroy the crosswalk after successful linkage of data and quality check
    • Implications for storage and back-up
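A minimal sketch of the Study ID replacement, assuming records are simple dictionaries keyed by a hypothetical `member_id` field. The crosswalk here is an in-memory dictionary; storing it at a secured location and destroying it afterwards are operational steps that the code does not cover.

```python
import secrets

def build_crosswalk(true_ids):
    """Map each true ID to a random study ID that carries no information about it."""
    crosswalk = {}
    used = set()
    for true_id in true_ids:
        study_id = secrets.token_hex(8)
        while study_id in used:  # regenerate on the (unlikely) collision
            study_id = secrets.token_hex(8)
        used.add(study_id)
        crosswalk[true_id] = study_id
    return crosswalk

def replace_ids(records, crosswalk):
    """Return copies of the records with the true ID swapped for the study ID."""
    return [
        {"study_id": crosswalk[rec["member_id"]],
         **{k: v for k, v in rec.items() if k != "member_id"}}
        for rec in records
    ]

# Hypothetical records; "member_id" stands in for the True ID.
records = [
    {"member_id": "A001", "drug": "digoxin"},
    {"member_id": "A002", "drug": "atenolol"},
    {"member_id": "A001", "drug": "lorazepam"},
]
crosswalk = build_crosswalk({rec["member_id"] for rec in records})
deidentified = replace_ids(records, crosswalk)
```

Because the study IDs are random rather than derived from the true IDs, re-identification requires the crosswalk itself, which is why its storage, back-up copies, and destruction matter.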
HIPAA and data development
  • Roll-up / transform variables
    • Age --> Age groups
    • National Drug Code --> Drug or Group of drugs
    • ICD-9 diagnosis code --> Disease

  • e.g. A man born on Dec 10, 1934 with diagnosis code xxx.yy received drug 55555-333-22
    • --> 65-70 y/o male with heart failure received digoxin
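The roll-up can be sketched in Python as follows. The lookup tables are hypothetical stand-ins that contain only the placeholder codes from the example above; real National Drug Code and ICD-9 mappings would come from reference files. The reference date and the non-overlapping 5-year bands (so "65-69" rather than "65-70") are also assumptions of this sketch.

```python
from datetime import date

def age_group(birth_date, as_of, width=5):
    """Return a coarse age band such as '65-69' instead of the exact birth date."""
    # Subtract one year if the birthday has not yet occurred in the as_of year.
    age = as_of.year - birth_date.year - (
        (as_of.month, as_of.day) < (birth_date.month, birth_date.day)
    )
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

# Hypothetical lookup tables holding only the placeholder codes from the example.
NDC_TO_DRUG = {"55555-333-22": "digoxin"}
DX_TO_DISEASE = {"xxx.yy": "heart failure"}

def roll_up(record, as_of):
    """Replace exact values with coarser categories; unknown codes become 'other'."""
    return {
        "age_group": age_group(record["birth_date"], as_of),
        "sex": record["sex"],
        "disease": DX_TO_DISEASE.get(record["dx_code"], "other"),
        "drug": NDC_TO_DRUG.get(record["ndc"], "other"),
    }

raw = {
    "birth_date": date(1934, 12, 10),
    "sex": "M",
    "dx_code": "xxx.yy",
    "ndc": "55555-333-22",
}
rolled = roll_up(raw, as_of=date(2003, 1, 1))  # as_of date is an assumption
```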
HIPAA and data development
  • Preserve temporal sequence of events but disguise the real dates
  • e.g. Drug use during pregnancy study
    • 29 year-old received 55555-333-22 on Nov 25, 1999 and delivered a baby on Dec 10, 1999
    • --> 26-30 year-old mother delivered in 1999, baby exposed to amoxicillin at -16 days
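One way to keep the order and spacing of events while dropping calendar dates is to express every event as a day offset from an anchor event such as delivery. The sketch below is an illustration under that assumption; the event names and dates are hypothetical, and the offset is plain calendar subtraction, which may differ from the counting convention used in a given study.

```python
from datetime import date

def to_relative_days(events, anchor_name):
    """Replace calendar dates with day offsets from an anchor event."""
    anchor = next(d for name, d in events if name == anchor_name)
    offsets = [(name, (d - anchor).days) for name, d in events]
    return sorted(offsets, key=lambda item: item[1])

# Hypothetical events for one study subject.
events = [
    ("delivery", date(2001, 3, 20)),
    ("dispensing", date(2001, 3, 10)),
]
relative = to_relative_days(events, "delivery")
```

The output keeps the fact that the dispensing preceded delivery, and by how many days, without carrying either real date.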
HIPAA and data development
  • Only extract information relevant to the study
    • e.g. A study of osteoporosis does not require information on subjects' mental health status
  • Co-morbid conditions may be relevant
    • Use proxy measures to describe level of comorbidity
      • Charlson's Index (based on concomitant diagnoses)
      • Chronic Disease Score (based on co-medications)
HIPAA and data development
  • Geocoding
    • Describe socioeconomic status of study subjects based on census tract data
    • Send out (Study ID, address) to a geocoding firm
    • (Study ID, X1, X2, X3) returned
      • X1 : education level
      • X2 : income level
      • X3 : race/ethnicity information
An example

Finkelstein et al. Decreasing Antibiotic Use Among US Children: The Impact of Changing Diagnosis Patterns. Pediatrics 2003; 112: 620-7.

  • Data elements involved
    • Date of birth, gender
    • Membership
    • Drug dispensings
    • Diagnoses in close proximity to antibiotics dispensings
  • Data from nine MCOs
Finkelstein et al. Pediatric antibiotics use study
  • Data development at each MCO
    • Extract antibiotics use information
    • Extract diagnosis of interest (infections)
    • Use date of birth, gender, and membership data to calculate person-time of interest
  • Refined, aggregate data forwarded to the Data Center
    • Rate of antibiotic use = number of antibiotic uses per 1,000 person-years, for each age-gender group
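The rate calculation can be sketched as below. The counts are invented for illustration and are not results from the study; treating a year as 365.25 days and counting membership days with an exclusive end date are assumptions of this sketch.

```python
from datetime import date

def person_days(start, end):
    """Days of membership contributed between start (inclusive) and end (exclusive)."""
    return (end - start).days

def rate_per_1000_person_years(events, days_at_risk):
    """Events per 1,000 person-years, taking a year as 365.25 days."""
    person_years = days_at_risk / 365.25
    return 1000 * events / person_years

# Hypothetical aggregate counts for two age-gender groups at one site.
strata = {
    ("0-2", "F"): {"dispensings": 120, "person_days": 36525},
    ("3-5", "M"): {"dispensings": 45, "person_days": 73050},
}
rates = {
    group: rate_per_1000_person_years(c["dispensings"], c["person_days"])
    for group, c in strata.items()
}

# Person-time for a single hypothetical member enrolled for all of 1999.
one_member_days = person_days(date(1999, 1, 1), date(2000, 1, 1))
```

Only the per-group numerators and denominators (or the resulting rates) would need to be forwarded, which matches the aggregate-data approach described above.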

HIPAA and data development
  • Individual identification is needed for certain types of research
    • Obtain medical records
    • Contact patient to conduct interview and/or request specimen
    • Linkage with external data
      • Cancer registry
      • National Death Index
HIPAA and data development
  • The process
    • Data extraction, transformation, reduction, and de-identification carried out at each MCO
    • Governed by State laws and local HIPAA-compliant Standard Operating Procedures
    • Principle of Limited Dataset / Minimum necessary
  • The goal
    • Highly processed and de-identified data available for concatenation across study sites and complex analyses
k-anonymity and large datasets
  • The goal
    • A de-identified dataset at a certain level of individual anonymity

    • e.g. A 43 year-old man with hypertension, diabetes, and anxiety, taking atenolol, rosiglitazone, and lorazepam
      • --> A man 40-45 taking a beta-blocker and a thiazolidinedione
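A dataset is k-anonymous when every combination of quasi-identifier values is shared by at least k records. The sketch below measures k for a small, hypothetical generalized dataset; the field names and rows are assumptions made for illustration.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Smallest group size over all observed combinations of quasi-identifier values."""
    groups = Counter(
        tuple(rec[q] for q in quasi_identifiers) for rec in records
    )
    return min(groups.values())

# Hypothetical generalized records (age band, sex, drug classes).
records = [
    {"age_group": "40-45", "sex": "M", "drugs": "beta-blocker, thiazolidinedione"},
    {"age_group": "40-45", "sex": "M", "drugs": "beta-blocker, thiazolidinedione"},
    {"age_group": "40-45", "sex": "M", "drugs": "beta-blocker, thiazolidinedione"},
    {"age_group": "46-50", "sex": "F", "drugs": "beta-blocker"},
    {"age_group": "46-50", "sex": "F", "drugs": "beta-blocker"},
]
k = k_anonymity(records, ["age_group", "sex", "drugs"])
```

If `k` falls below the level agreed for the study, the usual remedies are further generalization (wider age bands, broader drug groups) or suppression of the rare rows.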

HIPAA, Data Storage and Access
  • Implications on Data Backup Plans
    • Data need to be destroyed after the report is published
  • Data only used to support pre-defined analyses
  • Ancillary analyses are possible after IRB review and approval
Epidemiology studies using large databases
  • In the old days ...
    • Give me all the data, do what I say ...
    • What if the investigator / reviewer wants to do THIS analysis?
    • Use existing datasets to test new hypotheses
  • Good research practice
    • Define necessary data elements according to research protocol
    • Pre-defined analytic plan
Epidemiology studies using large databases
  • Keys to protection of human subjects
    • Competent, responsible investigators and staff
    • IRB review and oversight
    • Data development guidelines
      • e.g. Good Epidemiology Practice
    • Information technology
  • Some reasonable rules/guidelines are better than no guidelines