Strategic Health IT Advanced Research Projects (SHARP)

Strategic Health IT Advanced Research Projects (SHARP) Area 4: Secondary Use of EHR Data Project 3: High-Throughput Phenotyping. June 30, 2011 Jyoti Pathak, PhD Assistant Professor of Biomedical Informatics Department of Health Sciences Research.


Strategic Health IT Advanced Research Projects (SHARP) Area 4: Secondary Use of EHR Data Project 3: High-Throughput Phenotyping

June 30, 2011

Jyoti Pathak, PhD

Assistant Professor of Biomedical Informatics

Department of Health Sciences Research


Project 3: Collaborators and Acknowledgments

  • CDISC (Clinical Data Interchange Standards Consortium)

    • Rebecca Kush, Landen Bain, Mark Arratoon

  • Centerphase Solutions

    • Gary Lubin, Jeff Tarlowe

  • Harvard University/MIT

    • Guergana Savova, Margarita Sordo, Peter Szolovits

  • IBM T.J. Watson Research Labs

    • Marshall Schor

  • Intermountain Healthcare/University of Utah

    • Susan Welch, Herman Post, Darin Wilcox, Peter Haug

  • Mayo Clinic

    • Cui Tao, Lacey Hart, Erin Martin, Sridhar Dwarkanath, Calvin Beebe, Kent Bailey, Kevin Bruce, Mike Conway (UCSD)


Outline

  • Background

  • On-going projects and updates

  • Proposed project ideas for Year 2

  • Productivity till date

  • Q & A


The Big Question…

  • The era of Genome-Wide Association Studies (GWAS) has arrived

    • Genotyping cost is asymptoting to free [Altman et al.]

    • Most (all?) published GWAS are done on carefully selected and uniformly characterized patient populations

    • Time consuming

  • Clinical Phenotyping, on the other hand, is lacking

    • Slow-throughput

    • Costly and time consuming

  • How “good” are EMRs (with inconsistencies and biases) as a source for phenotypes?


Why is this important now?

  • Bio-repositories are becoming popular

    • Linking biospecimens to personal health data

  • Population-based studies for genetic and environmental conditions and contributions to disease etiology

    • Often limited in scope or population diversity

  • Clinical trials eligibility

    • Cohort identification is always a bottleneck

  • Quality metrics and HITECH Act

  • Large-scale prospective cohort studies could be facilitated by availability of complete, standardized, and unbiased data from EMRs


Pros and Cons of EMR Data for Phenotyping

  • We have a LOT of information about subjects

    • Demographics, labs, meds, procedures…

    • Team diagnoses as opposed to a diagnosis based on a single person’s opinion

    • Potential for more reliable diagnoses

    • Identification of otherwise latent population differences

  • Possible issues with using EMR data for phenotyping

    • Non-standardized, heterogeneous, unstructured data

    • Measured (e.g., demographics) vs. un-measured (e.g., socio-economic status) population differences

    • Hospital specialization and coding practices

    • Population/regional market landscape


But…the challenges can be addressed…if we

  • Develop techniques for standardization and normalization of clinical data

  • Develop techniques for transforming and managing unstructured clinical text into structured representations

  • Develop techniques for resolving missing and inconsistent data

  • Develop a scalable, robust and flexible framework for demonstrating all of the above in a “real-world setting”

SHARP Area 4 Project!


EMR-derived Phenotyping

  • Overarching goal

    • To develop techniques and algorithms that operate on normalized EMR data to identify cohorts of potentially eligible subjects on the basis of disease, symptoms, or related findings

  • Phenotyping (from our perspective)

    • Inclusion and exclusion criteria for cohort identification

    • Numerator and denominator criteria for clinical quality metrics

    • Trigger criteria for clinical decision support


EMR-based Phenotype Algorithms

  • Typical components

    • Billing and diagnoses codes

    • Procedure codes

    • Labs

    • Medications

    • Phenotype-specific co-variates (e.g., Demographics, Vitals, Smoking Status, CASI scores)

    • Pathology

    • Imaging?

  • Organized into inclusion and exclusion criteria

  • Experience from eMERGE (http://www.gwas.net)

    • Electronic Medical Records and Genomics Network
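
The components listed above, organized into inclusion and exclusion criteria, can be sketched as predicates over a patient record. This is a minimal illustration, not an eMERGE algorithm: the `Patient` fields, the criterion functions, and the code names such as `DX_TARGET` are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Patient:
    """Hypothetical, heavily simplified patient record."""
    patient_id: str
    diagnosis_codes: Set[str] = field(default_factory=set)
    medications: Set[str] = field(default_factory=set)
    labs: Dict[str, float] = field(default_factory=dict)


# A criterion is any predicate over a patient record.
Criterion = Callable[[Patient], bool]


def select_cohort(patients: List[Patient],
                  inclusion: List[Criterion],
                  exclusion: List[Criterion]) -> List[Patient]:
    """Keep patients who meet every inclusion and no exclusion criterion."""
    return [
        p for p in patients
        if all(rule(p) for rule in inclusion)
        and not any(rule(p) for rule in exclusion)
    ]


# Placeholder criteria; the code and drug names are invented for illustration.
def has_target_diagnosis(p: Patient) -> bool:
    return "DX_TARGET" in p.diagnosis_codes


def on_target_medication(p: Patient) -> bool:
    return "MED_TARGET" in p.medications


def has_overlapping_condition(p: Patient) -> bool:
    return "DX_OVERLAP" in p.diagnosis_codes


patients = [
    Patient("p1", {"DX_TARGET"}, {"MED_TARGET"}),
    Patient("p2", {"DX_TARGET", "DX_OVERLAP"}, {"MED_TARGET"}),
    Patient("p3", {"DX_TARGET"}),
]
cohort = select_cohort(
    patients,
    inclusion=[has_target_diagnosis, on_target_medication],
    exclusion=[has_overlapping_condition],
)
print([p.patient_id for p in cohort])  # ['p1']
```

Keeping each criterion as a separate function makes it possible to review, reuse, or swap individual criteria without touching the selection logic.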


EMR-based Phenotype Algorithms

  • Iteratively refine case definitions through partial manual review (to achieve ~PPV ≥ 95%)

  • For controls, exclude all potentially overlapping syndromes and possible matches; iteratively refine such that ~NPV ≥ 98%
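
The two targets above are ratios over chart-review counts: PPV = TP / (TP + FP) for cases and NPV = TN / (TN + FN) for controls. The sketch below computes both and checks them against the stated thresholds; the review counts are invented for illustration and the function names are assumptions.

```python
def ppv(true_positives: int, false_positives: int) -> float:
    """Positive predictive value: share of algorithm-flagged cases confirmed by review."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("no flagged cases were reviewed")
    return true_positives / flagged


def npv(true_negatives: int, false_negatives: int) -> float:
    """Negative predictive value: share of algorithm-selected controls confirmed by review."""
    cleared = true_negatives + false_negatives
    if cleared == 0:
        raise ValueError("no controls were reviewed")
    return true_negatives / cleared


def meets_targets(case_ppv: float, control_npv: float,
                  ppv_target: float = 0.95, npv_target: float = 0.98) -> bool:
    """True when both thresholds from the slide are reached."""
    return case_ppv >= ppv_target and control_npv >= npv_target


# Hypothetical chart-review counts for one refinement iteration.
case_ppv = ppv(true_positives=96, false_positives=4)
control_npv = npv(true_negatives=99, false_negatives=1)
print(meets_targets(case_ppv, control_npv))  # True
```

In an iterative workflow, a `False` result would send the case or control definition back for another round of refinement and partial review.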



Challenges

  • Algorithm design

    • Non-trivial; requires significant expert involvement

    • Highly iterative process

    • Time-consuming manual chart reviews

    • Representation of “phenotypic logic”

  • Data access and representation

    • Lack of unified vocabularies, data elements, and value sets

    • Questionable reliability of ICD & CPT codes (e.g., omitting codes that don’t pay well, or billing the wrong code because it is easier to find)

    • Natural Language Processing needs

  • And many more…


Outline

  • Background

  • On-going projects and updates

  • Proposed projects for Year 2

  • Productivity till date

  • Q & A


Current HTP Project Themes

  • Identification of Clinical Element Models

  • Phenotyping Execution Logic

  • Data Quality, Validation and Cost Effectiveness


Project Overview

  • Three eMERGE phenotyping algorithms as initial Use Cases

    • Type 2 Diabetes Mellitus (T2DM)

    • Peripheral Arterial Disease (PAD)

    • Hypothyroidism

  • Specified computable mappings between CEMs and algorithms

  • Classified phenotyping input specifications into two categories:

    • General EHR data requirements (Examples: demographics, diagnoses)

    • Phenotype-specific EHR data (Example: Ankle-brachial index for PAD)

  • Proposed semantic types of the input specifications


Semantic Classification Types

Demographic data (e.g., Gender, Race, Age, etc.)

Physical measurements (e.g., Weight, Height, BMI, etc.)

Diagnosis (ICD codes, SNOMED CT annotations from problem list, administrative coding workflows, clinical notes, etc.)

Procedure (CPT codes, ICD procedure codes)

Medication

Laboratory


General Models for Scalability

  • Diagnosis

    • AdministrativeDiagnosisCode: billing purposes

    • ClinicalAssertedDiagnosisCode: problem list, clinical notes, etc.

  • Medication

    • Prescribed/Ordered

    • Dispensed

    • Administered

  • Procedure

    • AdministrativeProcedureCode: CPT code, ICD-9 code for inpatient

  • Laboratory
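
The diagnosis and medication categories on this slide can be sketched as typed records. Only the category names come from the slide; the class layout, field names, and helper function are assumptions made for illustration and are not the actual Clinical Element Model definitions.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Iterable, List


class DiagnosisKind(Enum):
    ADMINISTRATIVE = "AdministrativeDiagnosisCode"       # billing purposes
    CLINICAL_ASSERTED = "ClinicalAssertedDiagnosisCode"  # problem list, clinical notes


class MedicationStage(Enum):
    PRESCRIBED = "Prescribed/Ordered"
    DISPENSED = "Dispensed"
    ADMINISTERED = "Administered"


@dataclass(frozen=True)
class Diagnosis:
    patient_id: str
    code: str            # placeholder code value
    kind: DiagnosisKind
    recorded_on: date


@dataclass(frozen=True)
class Medication:
    patient_id: str
    drug: str            # placeholder drug name
    stage: MedicationStage
    recorded_on: date


def clinically_asserted(diagnoses: Iterable[Diagnosis]) -> List[Diagnosis]:
    """Keep only diagnoses asserted in clinical documentation, not billing."""
    return [d for d in diagnoses if d.kind is DiagnosisKind.CLINICAL_ASSERTED]


diagnoses = [
    Diagnosis("p1", "DX_X", DiagnosisKind.ADMINISTRATIVE, date(2010, 5, 1)),
    Diagnosis("p1", "DX_X", DiagnosisKind.CLINICAL_ASSERTED, date(2010, 5, 3)),
]
print([d.kind.value for d in clinically_asserted(diagnoses)])
# ['ClinicalAssertedDiagnosisCode']
```

Separating the administrative and clinically asserted kinds lets an algorithm state explicitly which source of diagnosis evidence it accepts.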


Mapping Issues

  • Secondary use versus patient care meanings

    • History of X meaning “evidence of X prior to date Y” versus history of X statement in text documents

    • Diagnosis inputs often validated on ICD-9-CM codes

  • Non-standard aggregations

    • Fasting glucose test

  • Availability of data in EHR

    • Age at onset of X

    • Medical specialty (ankle brachial index)

    • Smoking history/family history (NLP/structured solutions)
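
Two of the issues above lend themselves to a small sketch: the secondary-use reading of “history of X” as evidence prior to a date, and “age at onset of X”, which is rarely stored and has to be approximated. The version below assumes evidence is available as (code, date) pairs and uses the earliest recorded evidence as a proxy for onset; both the data layout and that proxy are assumptions, and the codes are placeholders.

```python
from datetime import date
from typing import List, Optional, Tuple

# Each piece of evidence is (code, date recorded); codes are placeholders.
Evidence = Tuple[str, date]


def history_of(evidence: List[Evidence], code: str, index_date: date) -> bool:
    """Secondary-use reading: any evidence of `code` strictly before `index_date`."""
    return any(c == code and d < index_date for c, d in evidence)


def age_at_onset(evidence: List[Evidence], code: str,
                 birth_date: date) -> Optional[int]:
    """Age in whole years at the earliest recorded evidence; None if never recorded."""
    dates = [d for c, d in evidence if c == code]
    if not dates:
        return None
    onset = min(dates)
    had_birthday = (onset.month, onset.day) >= (birth_date.month, birth_date.day)
    return onset.year - birth_date.year - (0 if had_birthday else 1)


evidence = [
    ("DX_X", date(2005, 3, 1)),
    ("DX_X", date(2009, 7, 15)),
    ("DX_Y", date(2010, 1, 5)),
]
print(history_of(evidence, "DX_X", date(2008, 1, 1)))     # True
print(history_of(evidence, "DX_Y", date(2008, 1, 1)))     # False
print(age_at_onset(evidence, "DX_X", date(1960, 6, 30)))  # 44
```

The earliest record in one EHR can postdate the true onset, which is exactly the data-availability limitation the slide raises.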


Mapping Considerations

  • Algorithm inputs are abstractions of EHR content

    • Native content

    • Generalized content

    • Computed

    • Selected content

  • Common constraints of EHR content

    • Source of data, i.e., EHR application used, encounter type

    • Allowable codes

    • Temporal bounds

    • Relationships among separate observations
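
Three of the constraints above (source of data, allowable codes, temporal bounds) can be expressed as a single filter over observations. This is a sketch under assumed names: the `Observation` record, its fields, and the source labels are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class Observation:
    patient_id: str
    code: str          # placeholder code
    source: str        # e.g. EHR application used or encounter type
    observed_on: date


def apply_constraints(observations: Iterable[Observation],
                      allowable_codes: Set[str],
                      start: date,
                      end: date,
                      sources: Optional[Set[str]] = None) -> List[Observation]:
    """Keep observations with an allowable code, inside [start, end],
    and (when `sources` is given) from an accepted source."""
    return [
        o for o in observations
        if o.code in allowable_codes
        and start <= o.observed_on <= end
        and (sources is None or o.source in sources)
    ]


observations = [
    Observation("p1", "LAB_A", "outpatient", date(2010, 2, 1)),
    Observation("p1", "LAB_A", "inpatient", date(2010, 3, 1)),
    Observation("p2", "LAB_B", "outpatient", date(2010, 2, 10)),
    Observation("p3", "LAB_A", "outpatient", date(2008, 1, 1)),
]
kept = apply_constraints(
    observations,
    allowable_codes={"LAB_A"},
    start=date(2010, 1, 1),
    end=date(2010, 12, 31),
    sources={"outpatient"},
)
print([(o.patient_id, o.observed_on.isoformat()) for o in kept])
# [('p1', '2010-02-01')]
```

Relationships among separate observations, the fourth constraint on the slide, would need logic that looks at pairs or sequences of records and is not covered here.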




Current HTP Project Themes

  • Identification of Clinical Element Models

  • Phenotyping Execution Logic

  • Data Quality, Validation and Cost Effectiveness


Drools-based Phenotyping Architecture

  • Clinical Element Database

  • Drools (along with other technologies)

  • List of Patients for Specific Cases

  • Workflow authoring by domain experts (clinicians)

  • Rule accessibility by clinicians – BPMN, decision tables, DSL; collaborative authoring

  • Domain Expert ~ Analyst ~ Developer


Drools-based Phenotyping Architecture

  • Clinical Element Database

  • Data Access Layer

  • Business Logic

  • Transformation Layer

    • Transform physical representation → Normalized logical representation (Fact Model)

  • Inference Engine (Drools)

  • Service for Creating Output (File, Database, etc.)

  • List of Diabetic Patients
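
The layering on this slide (data access, transformation into a fact model, rule-based inference, output) can be sketched end to end in a few functions. This is a plain-Python stand-in, not Drools: real Drools rules are written in its own rule language and run by its engine. The row layout, the `PatientFacts` class, and the rule conditions (including the lab threshold) are placeholders rather than the project's diabetes rules.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Set

# Data access layer stand-in: rows in a made-up physical layout.
RAW_ROWS = [
    {"pid": "p1", "kind": "dx", "value": "DX_DIABETES"},
    {"pid": "p1", "kind": "lab", "name": "glucose", "value": "140"},
    {"pid": "p2", "kind": "lab", "name": "glucose", "value": "95"},
    {"pid": "p3", "kind": "dx", "value": "DX_DIABETES"},
]


@dataclass
class PatientFacts:
    """Normalized logical representation (the fact model) for one patient."""
    patient_id: str
    diagnoses: Set[str] = field(default_factory=set)
    labs: Dict[str, List[float]] = field(default_factory=dict)


def transform(rows: Iterable[dict]) -> Dict[str, PatientFacts]:
    """Transformation layer: physical rows -> per-patient facts."""
    facts: Dict[str, PatientFacts] = {}
    for row in rows:
        patient = facts.setdefault(row["pid"], PatientFacts(row["pid"]))
        if row["kind"] == "dx":
            patient.diagnoses.add(row["value"])
        elif row["kind"] == "lab":
            patient.labs.setdefault(row["name"], []).append(float(row["value"]))
    return facts


Rule = Callable[[PatientFacts], bool]


def infer_cases(facts: Dict[str, PatientFacts], rules: List[Rule]) -> List[str]:
    """Inference step: a patient is a case when every rule condition holds."""
    return sorted(pid for pid, f in facts.items() if all(rule(f) for rule in rules))


# Illustrative conditions only; the threshold is a placeholder value.
rules: List[Rule] = [
    lambda f: "DX_DIABETES" in f.diagnoses,
    lambda f: any(v >= 126.0 for v in f.labs.get("glucose", [])),
]

# Output service stand-in: print the resulting patient list.
print(infer_cases(transform(RAW_ROWS), rules))  # ['p1']
```

Because the rules only see the fact model, the physical storage layout can change without rewriting the phenotyping logic.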



Diabetes Project Status

  • Diabetes Rules are Completed

  • Demonstrated the Workflow/Rules for Feedback

  • Make Rules “Shareable”

  • Performance Validation

  • More details in the later session!




Current HTP Project Themes

  • Identification of Clinical Element Models

  • Phenotyping Execution Logic

  • Data Quality, Validation and Cost Effectiveness


Data Quality: Objectives

  • Assess Data variability within and across institutions

  • Assess impact of this variability on Secondary Use of EMR

  • Generate specifications for Widgets

    • “Warning Label” for suspect data categories

    • Data quality audits with logs

    • Batch data correction / removal

  • More details during the later session!
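
One way to picture a “warning label” for suspect data categories is a per-site missing-data check. The sketch below is an assumption about what such a widget might compute, not its specification: the record layout, category names, and the 25% threshold are all invented for illustration.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def missing_rate(records: List[Record], category: str) -> float:
    """Fraction of records where the category is absent or empty."""
    if not records:
        return 0.0
    missing = sum(1 for r in records if not r.get(category))
    return missing / len(records)


def warning_labels(records_by_site: Dict[str, List[Record]],
                   categories: List[str],
                   threshold: float = 0.25) -> Dict[str, Dict[str, float]]:
    """Return {site: {category: rate}} for categories whose missing rate
    exceeds the threshold; sites with nothing to flag are omitted."""
    labels: Dict[str, Dict[str, float]] = {}
    for site, records in records_by_site.items():
        rates = {c: missing_rate(records, c) for c in categories}
        flagged = {c: r for c, r in rates.items() if r > threshold}
        if flagged:
            labels[site] = flagged
    return labels


records_by_site = {
    "site_a": [
        {"smoking_status": "never", "weight": "70"},
        {"smoking_status": None, "weight": "82"},
        {"smoking_status": None, "weight": "65"},
        {"smoking_status": "former", "weight": "90"},
    ],
    "site_b": [
        {"smoking_status": "never", "weight": "77"},
        {"smoking_status": "current", "weight": None},
        {"smoking_status": "never", "weight": "68"},
        {"smoking_status": "former", "weight": "81"},
    ],
}
print(warning_labels(records_by_site, ["smoking_status", "weight"]))
# {'site_a': {'smoking_status': 0.5}}
```

Comparing the flagged categories across sites gives a rough view of data variability within and across institutions.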


Centerphase Project: Research Design

  • Randomly generate ONE sample set of patient records from database

    • Based on T2DM ICD-9 codes from at least 2 visits during measurement period

  • Manual process

    • Study coordinator (SC) conducts manual review of patient charts, and monitors activity time

    • Screens 1-3 → Patient Result Set

  • Algorithm-driven process

    • Programmer develops and runs algorithm to query records, and monitors development and run time

    • Screens 1-3 → Patient Result Set

  • Compare time, cost and accuracy of results
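
The sampling rule and the final comparison in this design can be sketched as follows. The sketch assumes a `Visit` record with a set of ICD-9 codes per visit, treats the two listed codes as an illustrative T2DM code set rather than the study's actual list, and uses a fixed random seed; all names are hypothetical.

```python
import random
from dataclasses import dataclass
from datetime import date
from typing import Dict, FrozenSet, Iterable, List, Set


@dataclass(frozen=True)
class Visit:
    patient_id: str
    visit_id: str
    visit_date: date
    icd9_codes: FrozenSet[str]


def sampling_frame(visits: Iterable[Visit], target_codes: Set[str],
                   start: date, end: date, min_visits: int = 2) -> List[str]:
    """Patients with a target code on at least `min_visits` distinct visits
    inside the measurement period [start, end]."""
    qualifying: Dict[str, Set[str]] = {}
    for v in visits:
        if start <= v.visit_date <= end and v.icd9_codes & target_codes:
            qualifying.setdefault(v.patient_id, set()).add(v.visit_id)
    return sorted(pid for pid, ids in qualifying.items() if len(ids) >= min_visits)


def compare_result_sets(manual: Set[str], algorithm: Set[str]) -> Dict[str, Set[str]]:
    """Split patients by which process selected them."""
    return {
        "both": manual & algorithm,
        "manual_only": manual - algorithm,
        "algorithm_only": algorithm - manual,
    }


T2DM_CODES = {"250.00", "250.02"}  # illustrative subset only
visits = [
    Visit("p1", "v1", date(2010, 1, 5), frozenset({"250.00"})),
    Visit("p1", "v2", date(2010, 6, 9), frozenset({"250.02"})),
    Visit("p2", "v3", date(2010, 2, 1), frozenset({"250.00"})),
    Visit("p3", "v4", date(2010, 3, 3), frozenset({"250.00"})),
    Visit("p3", "v5", date(2011, 3, 3), frozenset({"250.00"})),
]
frame = sampling_frame(visits, T2DM_CODES, date(2010, 1, 1), date(2010, 12, 31))
print(frame)  # ['p1']

# One random sample set drawn from the frame (seeded for repeatability).
sample = random.Random(0).sample(frame, k=min(1, len(frame)))

print(compare_result_sets(manual={"p1", "p4"}, algorithm={"p1", "p5"}))
```

Time and cost would be recorded alongside each process; only the result-set comparison is shown here.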


Outline

  • Background

  • On-going projects and updates

  • Proposed projects for Year 2

  • Productivity till date

  • Q & A


Project 1: National Library for Clinical Phenotyping Algorithms

  • Current state of the art

    • MS Word files: do not scale

    • An FTP server: will not work either

    • We need…programmatic access, querying, navigation

    • Promote re-use (where applicable)

  • Research Question: To develop an implementation-independent phenotyping logic representation template for algorithm design

    • Existing work on Drools, GELLO and NQF

    • Leverage CEMs for algorithm design and representation

    • Publicly accessible Web-based environment for phenotyping algorithms

    • Validate algorithm deployment in multiple EMR settings


Project 2: Machine Learning and Phenotyping

  • EMR-derived phenotyping algorithm development is tedious and time-consuming

    • Based on our own experience!

  • Research Question: To leverage machine learning methods for rule/algorithm development, and validate against expert-developed ones

    • Use eMERGE library of phenotype algorithms for validation

    • Asthma and Diabetes as initial use-cases

  • Preliminary work by Susan

    • Work with data normalization and NLP teams


Project 3: Just-in-Time Phenotyping

  • The current pipeline prototype is based on a relational persistence layer

    • Access to historical, retrospective data

    • Offline processing of data and phenotyping algorithms

  • Research Question: To apply phenotyping algorithms as “data sniffers” that can be plugged within a UIMA pipeline

    • Online, real-time phenotyping (e.g., for clinical decision support)

    • How much data is “necessary”? How much data is “necessary and sufficient”?

    • More active role of NLP techniques
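
The contrast between offline processing and a “data sniffer” can be illustrated with a generator that watches an event stream and flags a patient as soon as enough evidence has arrived. This is a plain-Python stand-in that does not use any UIMA API; the event layout (patient id, code) and the rule “all required codes seen” are assumptions chosen to keep the sketch short.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, Iterator, Set, Tuple

Event = Tuple[str, str]  # (patient_id, code); codes are placeholders


def sniff(events: Iterable[Event],
          required: FrozenSet[str]) -> Iterator[Tuple[str, int]]:
    """Yield (patient_id, events_seen_so_far) the moment a patient has
    accumulated every required code; each patient is flagged once."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    flagged: Set[str] = set()
    for position, (patient_id, code) in enumerate(events, start=1):
        if patient_id in flagged:
            continue
        if code in required:
            seen[patient_id].add(code)
        if seen[patient_id] >= required:
            flagged.add(patient_id)
            yield patient_id, position


stream = [("p1", "A"), ("p2", "A"), ("p1", "B"), ("p2", "C"), ("p2", "B")]
for patient_id, position in sniff(stream, frozenset({"A", "B"})):
    print(patient_id, position)
# p1 3
# p2 5
```

The reported position is a crude measure of how much data was needed before the criteria were met, which relates to the “necessary and sufficient” question above.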


Project 4: Phenotyping Workbench

  • EMR-based phenotyping algorithms are hard to design, and even harder to implement

    • Access to domain experts—often a resource issue

    • Access to IT/informatics experts—also, a resource issue

    • Lots of moving components

  • Research Question: To develop a phenotyping “plug & play” workbench for algorithm design and evaluation

    • Visual and graphical algorithm editing (jPBMN)

    • Configurable algorithms (Drools code snippets)

    • User workspace management (who are these “users”?)

    • File-based or database access layer (CEM-based)

    • Leverage i2b2 workbench where applicable

    • “Plug & Play” is still a big challenge…


Outline

  • Background

  • On-going projects and updates

  • Proposed projects for Year 2

  • Productivity till date

  • Q & A


Productivity till date

  • Manuscripts/Abstracts/Posters

    • Conway MA, Berg RL, Carrell D, Denny JC, Kho AN, Kullo IJ, Linneman JG, Pacheco JA, Peissig PL, Rasmussen L, Weston N, Chute CG, Pathak J. Analyzing Heterogeneity and Complexity of Electronic Health Record Oriented Phenotyping Algorithms. AMIA 2011 (paper).

    • Tao C, Parker CG, Oniki TA, Pathak J, Huff SM, Chute CG. An OWL Meta-Ontology for Representing the Clinical Element Model. AMIA 2011 (paper).

    • Chute CG, Pathak J, Savova GK, Bailey KR, Schor MI, Hart LA, Beebe CE, Huff SM. The SHARPn Project on Secondary Use of Electronic Medical Record Data: Progress, Plans and Possibilities. AMIA 2011 (paper).

    • Conway MA, Pathak J. Analyzing the Prevalence of Hedges in Electronic Health Record Oriented Phenotyping Algorithms. AMIA 2011 (poster).

    • Tao C, Welch SR, Wei WQ, Oniki TA, Parker CG, Pathak J, Huff SM, Chute CG. Normalized Representation of Data Elements for Phenotype Cohort Identification in Electronic Health Record. AMIA 2011 (poster).

  • Prototype software

    • Drools-based implementation of the diabetes algorithm


