Using natural language processing to enhance EMRs for healthcare quality research Brian Hazlehurst, PhD Center for Health Research Kaiser Permanente Northwest
EMR promises to improve healthcare quality • Providing distributed access to patient data • Supporting real-time clinical decision-making • Enabling proactive outreach to chronically ill patients • Detecting and preventing adverse events caused by medical care • Assessing the care actually delivered to guide improvement of delivery processes
Creating comprehensive Quality of Care Measures • Comprehensive care quality means simultaneous application of evidence-based care guidelines across multiple, disparate care processes. • The EMR could make possible routine, comprehensive assessment of care, eliminating sampling, surveying, and manual review of charts.
McGlynn et al. (RAND) study, NEJM, June 2003 • Developed and applied 439 quality measures to comprehensively “score” care quality from paper records. • Condition (30 conditions) • Type of care (Acute, Chronic, Prev) • Function of care (Dx, Tx, Screen, F/U) • Mode of care (Hx, Counsel, Lab, Med, Immun., Surg., Physical exam) • Manually reviewed medical records for ~7,000 participants recruited in 12 metropolitan regions of the US. • 50 min per participant; 20 nurses, ~2 months
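The review workload quoted above can be checked with a little arithmetic. This is only a rough sketch: the participant count, minutes per record, and number of nurses come from the slide, while the 40-hour work week is an assumption added for illustration.

```python
# Figures quoted on the slide; the 40-hour work week is an assumption.
participants = 7000          # approximate number of records reviewed
minutes_per_record = 50
nurses = 20
hours_per_week = 40          # assumed full-time schedule, not from the source

total_hours = participants * minutes_per_record / 60
hours_per_nurse = total_hours / nurses
weeks_per_nurse = hours_per_nurse / hours_per_week

print(f"total review time: {total_hours:.0f} hours")
print(f"per nurse: {hours_per_nurse:.0f} hours, about {weeks_per_nurse:.1f} weeks")
```

Under that assumption the estimate comes out at roughly seven weeks per nurse, which is consistent with the "~2 months" figure on the slide.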
McGlynn et al Conclusions • On average, Americans receive about half of recommended medical care processes. • A key component of any solution to this state of affairs is the routine availability of information on care delivery performance at all levels.
Where will these data come from? • Jennifer Hicks dissertation: analysis of the electronic data needed to construct the RAND QA measures • Using electronic claims data alone, only 34% of the measures can be obtained. Claims data are codes for billable services (including diagnosis codes, procedure/lab orders, and basic demographic information).
What’s missing? Clinically detailed information • Severity of a condition • Timing or results of procedure or lab • History • Counseling/education • Signs/symptoms • Physical examination
What does additional standard coded clinical data provide? • Four additional types of coded information considered by Hicks as possible “add-on” to claims data. • Lab results • Procedure results • Vital signs • Signs/symptoms • Computed additional coverage of the RAND QA measures using “optimistic estimates” (i.e., a necessary code may not actually exist or may be unreliably captured).
Additional coverage (upper bound) of standardized coded data • Coverage goes from about 34% to about 47% • The remainder is found in local codes or the templated- or free-text clinical notes of the EMR!
MediClass (Medical Classifier) • Potential to process records of any EMR • Use a standard representation for electronic medical record data (Clinical Document Architecture, CDA) • Potential to utilize any type of data captured in the EMR • Process both text and coded data in the EMR • Potential to implement any specific care quality measure • Allow for modular definition of quality measures (classifications determined by plug-in “knowledge modules”)
MediClass Architecture • Input: EMR data • 1) EMR Integration: produces a CDA representation of the clinical encounter • 2) Concept Identification, using the Unified Medical Language System (UMLS): marks up the CDA document with identified clinical concepts • 3) Event Classification, using the classification rules of a Knowledge Module (KM): marks up the CDA document with classifications or identified clinical events • Output: classification results
Developing a “5 A’s” Knowledge Module: Identifying delivery of the 5 A’s of smoking cessation
Concept Identification • Input text: “patient continues to smoke 1/2ppd. not ready to quit.” • Lexical Parsing: patient (n), patient (adj), patience, patiently, …; continue, continuous, continually, continuity, …; smoke (v), smoky (adj), smoking, smoker, … • Segmenting Window produces Candidate Strings: smoking patiently; patient continues smoke; smoke patiently continually; continue smoking; … • Matched UMLS Strings: smoking (MeSH), smoking (SNOMED), … • Matched UMLS Concept: Smoking (C0037369), with Location-Info and Modifier-Info
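The concept identification steps on this slide (lexical normalization, a segmenting window that proposes candidate strings, and lookup against a terminology) can be sketched roughly as follows. This is not MediClass code: the tiny `BASE_FORMS` and `CONCEPTS` tables, the stop-word list, and all function names are hypothetical stand-ins for a real lexicon and the UMLS, and the window here only produces contiguous token runs. The two concept identifiers are the ones shown on the slides.

```python
import re

# Hypothetical toy lexicon: surface form -> base form (stands in for lexical parsing).
BASE_FORMS = {
    "continues": "continue", "continually": "continue",
    "smoke": "smoking", "smoker": "smoking", "smoking": "smoking",
}
# Hypothetical toy terminology: normalized string -> (concept name, identifier).
# A real system would query the UMLS instead.
CONCEPTS = {
    "smoking": ("Smoking", "C0037369"),
    "continue": ("Continue", "C0750536"),
}
STOP_WORDS = {"to", "the", "a", "of"}  # assumed, for illustration only


def normalize(text):
    """Lowercase, keep alphabetic tokens, drop stop words, map to base forms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [BASE_FORMS.get(t, t) for t in tokens if t not in STOP_WORDS]


def candidate_strings(tokens, window=3):
    """Yield every contiguous run of 1..window tokens as a candidate string."""
    for start in range(len(tokens)):
        for size in range(1, window + 1):
            if start + size <= len(tokens):
                yield " ".join(tokens[start:start + size])


def identify_concepts(text, window=3):
    """Return the distinct concepts whose strings match a candidate, in order found."""
    found = []
    for candidate in candidate_strings(normalize(text), window):
        concept = CONCEPTS.get(candidate)
        if concept is not None and concept not in found:
            found.append(concept)
    return found


concepts = identify_concepts("patient continues to smoke 1/2ppd. not ready to quit.")
print(concepts)
```

The design point the sketch tries to show is that matching happens on normalized candidate strings rather than on the raw note text, so "continues to smoke" can still reach the Smoking concept.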
Classification Process • Beginning: concept instances created by the concept identification layer. Working memory: Smoking, C0037369; Continue, C0750536 • Apply rules: SmokingIndicator rule fires. Add “SmokingIndicator” intermediate concept to working memory. Working memory: Smoking, C0037369; Continue, C0750536; SmokingIndicator, IC1 • Apply rules: SmokingStatus rule fires. Add “SmokingStatus” intermediate concept to working memory. Working memory: Smoking, C0037369; Continue, C0750536; SmokingIndicator, IC1; SmokingStatus, IC2 • Apply rules: SmokingASK rule fires. “SmokingASK” is a terminal state (one of the 5 A’s) so procedure halts. Result: SmokingASK
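The classification process above behaves like a small forward-chaining rule engine: rules fire against working memory, add intermediate concepts, and the procedure halts when a terminal classification is reached. A minimal sketch follows. The rule names come from the slide, but the conditions attached to each rule are assumptions made for illustration, since the slide does not show what each rule actually tests.

```python
# Each hypothetical rule: (concept it adds, concepts that must already be in memory).
# The preconditions are illustrative guesses, not the actual MediClass rules.
RULES = [
    ("SmokingIndicator", {"Smoking"}),
    ("SmokingStatus", {"SmokingIndicator", "Continue"}),
    ("SmokingASK", {"SmokingStatus"}),
]
TERMINAL = {"SmokingASK"}  # terminal states, e.g. one of the 5 A's


def classify(initial_concepts):
    """Fire one applicable rule per cycle until a terminal state or no rule applies."""
    memory = set(initial_concepts)
    trace = []
    while True:
        fired = next(
            (name for name, needs in RULES if needs <= memory and name not in memory),
            None,
        )
        if fired is None:          # nothing left to fire: no classification
            return None, trace
        memory.add(fired)          # add the new concept to working memory
        trace.append(fired)
        if fired in TERMINAL:      # terminal state reached: halt
            return fired, trace


result, trace = classify({"Smoking", "Continue"})
print(result, trace)
```

Starting from the two concepts produced by concept identification, the trace reproduces the three firing steps shown on the slide. With only `Continue` in memory, no rule applies and no classification is produced.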
How well does it work? • Sample of primary care encounter records for smokers at each of 4 HMOs (n = 4 × 125 = 500). • Gold Standard created by 4 trained abstractors (one abstractor from each study site, majority wins, ties broken by the trainer).
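The gold-standard rule described above (majority wins, ties broken by the trainer) can be written down in a few lines. This is a sketch only: the function name and the 0/1 label encoding are hypothetical, and the source does not say how the study actually recorded the votes.

```python
from collections import Counter


def gold_label(abstractor_labels, trainer_label):
    """Return the majority label; if the top labels tie, use the trainer's label."""
    counts = Counter(abstractor_labels).most_common()
    top_count = counts[0][1]
    winners = [label for label, n in counts if n == top_count]
    return winners[0] if len(winners) == 1 else trainer_label


# Hypothetical votes: 1 = the "A" was delivered in the encounter, 0 = it was not.
print(gold_label([1, 1, 1, 0], trainer_label=0))  # clear majority
print(gold_label([1, 1, 0, 0], trainer_label=0))  # 2-2 tie, trainer decides
```

With four abstractors and a binary label, a tie can only be a 2-2 split, which is why a single tie-breaker is enough.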
Another way to evaluate • Compare the mean inter-rater agreement for all coders including MediClass (“+MC Agreement”) with that for human coders alone (“-MC Agreement”).
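One way to make this comparison concrete is to average the agreement over every pair of coders, once with MediClass in the pool and once without. The sketch below uses simple percent agreement, which is an assumption: the slide does not name the agreement statistic (a chance-corrected measure such as kappa may have been used). The labels are made up for illustration and are not study data.

```python
from itertools import combinations
from statistics import mean


def pairwise_agreement(a, b):
    """Fraction of records on which two coders assigned the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def mean_agreement(ratings):
    """Mean pairwise agreement over all pairs of coders in the dict."""
    return mean(
        pairwise_agreement(ratings[r1], ratings[r2])
        for r1, r2 in combinations(ratings, 2)
    )


# Made-up labels for six records (illustrative only, not from the study).
ratings = {
    "coder1": [1, 1, 0, 1, 0, 1],
    "coder2": [1, 0, 0, 1, 0, 1],
    "coder3": [1, 1, 0, 1, 1, 1],
    "MediClass": [1, 1, 0, 1, 0, 1],
}
humans = {name: labels for name, labels in ratings.items() if name != "MediClass"}

plus_mc = mean_agreement(ratings)    # "+MC Agreement"
minus_mc = mean_agreement(humans)    # "-MC Agreement"
print(f"+MC: {plus_mc:.3f}  -MC: {minus_mc:.3f}")
```

If adding MediClass to the pool does not lower the mean agreement, the system is disagreeing with the human coders no more than they disagree with each other.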
What is the relative coverage of EMR data types for addressing the quality of smoking cessation care? • [Figure: chart with categories In-office discussion, Lit/info assist., Referral counsel, Pharmaco assist.]
We have developed MediClass KMs in several domains • Measuring delivery of the 5 A’s of smoking cessation in primary care • Detecting adverse vaccination reactions • Identifying severity of retinopathy in diabetic patients • Identifying family history of cancer • Assessing delivery of counseling and services to obese and overweight patients
Summary • The EMR holds promise for accelerating quality improvement, but standard coding schemes and practices alone do not afford comprehensive and routine quality assessment. • Even if structured entry forms/codes are developed for capturing specific data elements, these forms/codes are often: • underutilized (they create extra steps in clinical workflow) • insufficient for research purposes (they don’t represent the latest knowledge or research interests). • NLP technologies can be used to automatically classify the contents of medical records and may be able to overcome these challenges.
Collaborators • Victor Stevens, KP Northwest • Dean Sittig, KP Northwest • Jack Hollis, KP Northwest • Russ Glasgow, KP Colorado • Ted Palen, KP Colorado • Tom Vogt, KP Hawaii • Nancy Rigotti, Harvard Pilgrim • Jonathan Winickoff, Harvard Pilgrim