
Agenda



Presentation Transcript


  1. Agenda • Introduce institution, research group, and project • Outline automated information extraction problem • Solution • Using open frameworks • Open ontology resources • Current Status • Domain experts engagement • Conclusions

  2. University of California, Irvine Medical Center University of California, Irvine Medical Center is a 422-bed tertiary teaching hospital with a commitment to education, research and quality patient care. UCI Medical Center is a Magnet Designated facility with a Level 1 Trauma Center, Burn Center and Level II Neonatal Care Center. • Not-for-Profit • # Employees • # ER Visits • # Admissions

  3. Data Warehouse Profile • UCI Clinical Informatics Team • Director • Informatics Solutions Architect • Principal Statistician/Advisor • Informatics Outreach Architect (future) • Clinical Practice Engineer • Clinical Research Informatics Lead • Business Intelligence Developer (2) • Clinical Informatics Specialist • NLP Specialist

  4. Project Team • Supported by UCI Medical Center Clinical Informatics Department • Collaboration between UCI Medical Center (Clinical Informatics) and Calit2/Computer Science • Members • Naveen Ashish (NLP and CS Researcher) • Lisa Dahm (Director, Biomedical Informatics) • Charles Boicey (Informatics Architect)

  5. Vision [diagram labels: Text Reports, UCI-QUP, Quest, Analysis]

  6. (UCI) Pathology Report

  7. Reports • Pathology reports • Free text but “semi-structured” as well • Nuggets of information in the text

  8. What do we want to ask? • Sample (retrieval) “questions” • Surgical Pathology • Patients with a surgical pathology report containing undifferentiated lymphoepithelioma-like gastric carcinoma. • Patients with a surgical pathology report containing spindle cell carcinoma of the breast, grade 3, margin(s) positive, node(s) positive. • Discharge Note • Patients with a discharge note containing a diagnosis of cerebrovascular accident and diabetes mellitus type II, discharged in stable condition to home. • Female patients with a discharge diagnosis of Ewing sarcoma, hypertension and obesity, discharged in stable condition to home.

  9. In Text • Thus we need • Sections and sub-sections • Associations • Terms • Dimensions • …
     Example report text 1: FINAL DIAGNOSIS AFTER MICROSCOPY: LUNG, LEFT LOWER LOBE, WEDGE RESECTION: POORLY DIFFERENTIATED ADENOCARCINOMA OF PULMONARY ORIGIN SIZE: 1.5 CM STAPLED RESECTION MARGIN: NEGATIVE 5 NECROSIS EXTENSIVE FIBROSIS IS NOT PRESENT PLEASE SEE COMMENT
     Example report text 2: FINAL DIAGNOSIS AFTER MICROSCOPY: A. DEEP TRICEPS MARGIN, EXCISION: POSITIVE FOR SARCOMA B. LATERAL SUPERIOR MARGIN, EXCISION: POSITIVE FOR SARCOMA
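As a rough illustration of pulling "dimensions" and margin findings out of such text, the sketch below applies two regular expressions to the lung example from the slide. The patterns, the unit list, and the function name `extract_findings` are assumptions made for illustration; they are not taken from the UCI-QUP code.

```python
import re

# Report text copied from the slide's first example (truncated after the margin).
REPORT = (
    "FINAL DIAGNOSIS AFTER MICROSCOPY: LUNG, LEFT LOWER LOBE, WEDGE RESECTION: "
    "POORLY DIFFERENTIATED ADENOCARCINOMA OF PULMONARY ORIGIN "
    "SIZE: 1.5 CM STAPLED RESECTION MARGIN: NEGATIVE"
)

# Assumption: a dimension is "SIZE:" followed by a number and a unit.
DIMENSION = re.compile(r"SIZE:\s*(\d+(?:\.\d+)?)\s*(CM|MM)")
# Assumption: a margin finding is the single word after "MARGIN:".
MARGIN = re.compile(r"MARGIN:\s*(POSITIVE|NEGATIVE)")


def extract_findings(text):
    """Return a dict with the size and margin status found in the text, if any."""
    findings = {}
    size = DIMENSION.search(text)
    if size:
        findings["size"] = (float(size.group(1)), size.group(2).lower())
    margin = MARGIN.search(text)
    if margin:
        findings["margin"] = margin.group(1).lower()
    return findings


print(extract_findings(REPORT))
```

Real reports vary far more than this single example, so patterns like these would need to be reviewed with domain experts before being relied on.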

  10. System [diagram labels: OHNLP; UCI-QUP Application (Rules, Code); Database (warehouse); GUI, Tableau, i2b2; Unstructured; Structured; Analysis]

  11. Related Work
      • Computerized Extraction of Information on the Quality of Diabetes Care from Free Text in Electronic Patient Records of General Practitioners. Jaco Voorham, Petra Denig. JAMIA 2007;14:349-354 doi:10.1197/jamia.M2128
      • Application of information technology: MedEx: a medication information extraction system for clinical narratives. Hua Xu, Shane P Stenner, Son Doan, Kevin B Johnson, Lemuel R Waitman, Joshua C Denny. JAMIA 2010;17:19-24 doi:10.1197/jamia.M3378
      • Identifying Smokers with a Medical Extraction System. Cheryl Clark, Kathleen Good, Lesley Jezierny, Melissa Macpherson, Brian Wilson, Urszula Chajewska. JAMIA 2008;15:36-39 doi:10.1197/jamia.M2442
      • Automated evaluation of electronic discharge notes to assess quality of care for cardiovascular diseases using Medical Language Extraction and Encoding System (MedLEE). Jung-Hsien Chiang, Jou-Wei Lin, Chen-Wei Yang. JAMIA 2010;17:245-252 doi:10.1136/jamia.2009.000182
      • Using Regular Expressions to Abstract Blood Pressure and Treatment Intensification Information from the Text of Physician Notes. Alexander Turchin, Nikheel S Kolatkar, Richard W Grant, Eric C Makhni, Merri L Pendergrass, Jonathan S Einbinder. JAMIA 2006;13:691-695 doi:10.1197/jamia.M2078
      • Natural Language Processing Framework to Assess Clinical Conditions. Henry Ware, Charles J Mullett, V Jagannathan. JAMIA 2009;16:585-589 doi:10.1197/jamia.M3091
      • A General Natural-language Text Processor for Clinical Radiology. Carol Friedman, Philip O Alderson, John H M Austin, James J Cimino, Stephen B Johnson. JAMIA 1994;1:161-174 doi:10.1136/jamia.1994.95236146
      • Improved Identification of Noun Phrases in Clinical Radiology Reports Using a High-Performance Statistical Natural Language Parser Augmented with the UMLS Specialist Lexicon. Yang Huang, Henry J Lowe, Dan Klein, Russell J Cucina. JAMIA 2005;12:275-285 doi:10.1197/jamia.M1695
      • Automated Encoding of Clinical Documents Based on Natural Language Processing. Carol Friedman, Lyudmila Shagina, Yves Lussier, George Hripcsak. JAMIA 2004;11:392-402 doi:10.1197/jamia.M1552
      • Description of a Rule-based System for the i2b2 Challenge in Natural Language Processing for Clinical Data. Lois C Childs, Robert Enelow, Lone Simonsen, Norris H Heintzelman, Kimberly M Kowalski, Robert J Taylor. JAMIA 2009;16:571-575 doi:10.1197/jamia.M3083
      • Automated Detection of Adverse Events Using Natural Language Processing of Discharge Summaries. Genevieve B Melton, George Hripcsak. JAMIA 2005;12:448-457 doi:10.1197/jamia.M1794

  12. Related Work • Analysis • Quality of care, smoker status, adverse events, other diagnoses … • Processing • Extract blood pressure, medications, … • Identify noun phrases, section (headings), numerical values, negations, … • Documents • Discharge summaries, patient notes, EMR sections, path or radiology reports …

  13. Systems • http://zellig.cpmc.columbia.edu/medlee/ • http://incubator.apache.org/uima/ • http://gate.ac.uk/ • http://nlp.stanford.edu/software/lex-parser.shtml • Columbia • Carol Friedman (et al.) • MedLEE • “Black art” • Systems from defense, intelligence, and similar companies • Open Software and Tools • Medical Informatics • OHNLP • Open Health Natural Language Processing • IBM, Mayo Clinic, (NCI) • General • UIMA, GATE • Variety of lexical tools, named-entity recognizers, parsers, etc. • XAR

  14. Extraction Techniques • What do we employ to achieve automated extraction? • Broad paradigms • Rule driven (expert) • Machine-learning based (trained) • Combined (most recent systems) • Multiple levels • Semi-structured data extraction • Named entity extraction • POS tagging, NE identification • Ontology driven (domain terms) • “Deep” relation level extraction • Associations • Natural Language Parsing

  15. NL Parse Illustration • “Tissue between the two surgical clips contains foci of ductal carcinoma in situ within a papilloma, as well as within ducts.”
      Constituency parse: (ROOT (S (NP (NP (NNP Tissue)) (PP (IN between) (NP (DT the) (CD two) (JJ surgical) (NNS clips)))) (VP (VBZ contains) (NP (NP (NNS foci)) (PP (IN of) (NP (NP (JJ ductal) (NN carcinoma)) (ADJP (FW in) (FW situ))))) (PP (PP (IN within) (NP (DT a) (NN papilloma))) (, ,) (CONJP (RB as) (RB well) (IN as)) (PP (IN within) (NP (NNS ducts))))) (. .)))
      Typed dependencies: nsubj(contains-7, Tissue-1) det(clips-6, the-3) num(clips-6, two-4) amod(clips-6, surgical-5) prep_between(Tissue-1, clips-6) dobj(contains-7, foci-8) amod(carcinoma-11, ductal-10) prep_of(foci-8, carcinoma-11) amod(carcinoma-11, in-12) dep(in-12, situ-13) det(papilloma-16, a-15) prep_within(contains-7, papilloma-16) prep_within(contains-7, ducts-22) conj_and(papilloma-16, ducts-22)
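To suggest how such parser output could feed "deep" relation extraction, the sketch below turns typed-dependency strings of the form `relation(head-i, dependent-j)` into tuples and looks up the direct object of a verb. It only post-processes the text format shown on the slide; it does not call the parser itself, and the names `Dependency`, `parse_dependencies`, and `objects_of` are hypothetical.

```python
import re
from typing import NamedTuple


class Dependency(NamedTuple):
    relation: str
    head: str
    head_index: int
    dependent: str
    dependent_index: int


# Matches one dependency such as "dobj(contains-7, foci-8)".
DEP = re.compile(r"(\w+)\((\S+)-(\d+), (\S+)-(\d+)\)")


def parse_dependencies(text):
    """Parse a whitespace-separated run of typed dependencies into tuples."""
    return [
        Dependency(rel, head, int(hi), dep, int(di))
        for rel, head, hi, dep, di in DEP.findall(text)
    ]


def objects_of(deps, verb):
    """Return the direct objects (dobj) attached to the given verb."""
    return [d.dependent for d in deps if d.relation == "dobj" and d.head == verb]


# A subset of the dependencies from the slide.
SAMPLE = (
    "nsubj(contains-7, Tissue-1) dobj(contains-7, foci-8) "
    "prep_of(foci-8, carcinoma-11) amod(carcinoma-11, ductal-10)"
)

deps = parse_dependencies(SAMPLE)
print(objects_of(deps, "contains"))
```

Following `prep_of` from the object would then link "foci" to "carcinoma", which is the kind of association the extraction pipeline is after.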

  16. MedLEE Illustration

  17. OHNLP • OHNLP • Open Health Natural Language Processing Consortium • IBM and Mayo Clinic are founding partners • caBIG/NCI supported • Open-source consortium promoting the use of UIMA • Features • Built upon Apache UIMA • Annotators, Pipelines • Medical domain • MedKAT/P (IBM) • Pathology reports extraction • cTAKES (Mayo Clinic) • Clinical data

  18. Rationale for OHNLP • Based on UIMA • Open source • Community of developers • OHNLP itself • NCI • IBM, Mayo • MedKAT/P and cTAKES • Two-way benefits • Adopt • Contribute back

  19. MedKAT Annotations

  20. “Programming” UIMA • OHNLP based on UIMA • UIMA composed of “Analysis Engines” • Primitive • Aggregate [diagram labels: Primitive Engine (section headings), Primitive Engine (numerical), Primitive Engine (dict terms)]
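The primitive/aggregate idea can be sketched in a few lines of Python. This is only a conceptual analogue: UIMA itself is a Java framework configured through descriptors, and none of the names below (`Annotation`, `heading_engine`, `number_engine`, `aggregate`) belong to its API. The heading rule is likewise an assumption.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    kind: str
    begin: int
    end: int
    text: str


def heading_engine(text):
    """Primitive engine: an upper-case phrase ending in a colon is a heading (assumed rule)."""
    return [
        Annotation("heading", m.start(), m.end(), m.group(0))
        for m in re.finditer(r"[A-Z][A-Z ]+:", text)
    ]


def number_engine(text):
    """Primitive engine: mark integers and decimals."""
    return [
        Annotation("number", m.start(), m.end(), m.group(0))
        for m in re.finditer(r"\d+(?:\.\d+)?", text)
    ]


def aggregate(*engines):
    """Aggregate engine: run each primitive engine and merge annotations by offset."""
    def run(text):
        annotations = []
        for engine in engines:
            annotations.extend(engine(text))
        return sorted(annotations, key=lambda a: a.begin)
    return run


pipeline = aggregate(heading_engine, number_engine)
for annotation in pipeline("SIZE: 1.5 CM"):
    print(annotation)
```

Each primitive stays small and testable on its own, and the aggregate only decides which primitives run and how their results are combined.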

  21. Descriptors, Resources

  22. Analysis Engines • Developing “UCI-QUP” • UCIQuestUimaPipeline • Analysis Engines • Recognize sections and sub-sections • Regular expressions • Significant terms • Medical terms • Existing dictionary in MedKAT/P • Useful, not complete • Integrate additional terminology
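A minimal sketch of regular-expression based section and sub-section recognition follows, using the sarcoma example that appears earlier in the deck. The two patterns and the function name `segment` are assumptions for illustration; the actual UCI-QUP rules are not shown in the slides.

```python
import re

REPORT = (
    "FINAL DIAGNOSIS AFTER MICROSCOPY: "
    "A. DEEP TRICEPS MARGIN, EXCISION: POSITIVE FOR SARCOMA "
    "B. LATERAL SUPERIOR MARGIN, EXCISION: POSITIVE FOR SARCOMA"
)

# Assumption: a section starts with an upper-case heading followed by a colon.
HEADING = re.compile(r"^(?P<heading>[A-Z ]+):\s*(?P<body>.*)$", re.DOTALL)
# Assumption: a sub-section is "<letter>. <specimen>: <finding>", ending at the
# next lettered item or at the end of the text.
SUBSECTION = re.compile(r"\b([A-Z])\.\s+(.*?):\s+(.*?)(?=\s+[A-Z]\.\s|$)")


def segment(report):
    """Split a report into its heading and a list of lettered sub-sections."""
    match = HEADING.match(report)
    if match is None:
        return None
    subsections = [
        {"label": label, "specimen": specimen, "finding": finding}
        for label, specimen, finding in SUBSECTION.findall(match.group("body"))
    ]
    return {"heading": match.group("heading"), "subsections": subsections}


result = segment(REPORT)
print(result["heading"])
for sub in result["subsections"]:
    print(sub)
```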

  23. AE Terms • Good resources • NCI Thesaurus • Cancer related • > 500,000 terms/concepts • NCI Metathesaurus • Several million concepts • Developed • Converter • NCI Thesaurus → UIMA Dictionary Resource • Application • Database • MySQL
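Once terminology is available as a dictionary, term spotting can be approximated with a longest-match lookup. The sketch below assumes a tiny in-memory dictionary with placeholder concept identifiers; the entries, the identifiers, and the function names are hypothetical and do not reflect the NCI Thesaurus format or the UIMA dictionary resource format.

```python
import re

# Hypothetical mini-dictionary: surface form -> placeholder concept id.
DICTIONARY = {
    "adenocarcinoma": "CONCEPT-1",
    "sarcoma": "CONCEPT-2",
    "ductal carcinoma in situ": "CONCEPT-3",
    "carcinoma": "CONCEPT-4",
}


def build_matcher(dictionary):
    """Compile one pattern, longest terms first so multi-word terms win."""
    terms = sorted(dictionary, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in terms)
    return re.compile(r"\b(" + alternation + r")\b", re.IGNORECASE)


def spot_terms(text, dictionary):
    """Return (term, concept id) pairs in the order they occur in the text."""
    matcher = build_matcher(dictionary)
    return [
        (m.group(1).lower(), dictionary[m.group(1).lower()])
        for m in matcher.finditer(text)
    ]


print(spot_terms(
    "Tissue contains foci of ductal carcinoma in situ within a papilloma.",
    DICTIONARY,
))
```

A single compiled alternation is fine for a few thousand terms; a thesaurus with hundreds of thousands of entries would more likely call for a trie or a token-indexed lookup.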

  24. Architecture [diagram labels: Pathology Reports (Unstructured); UCI Quest Uima Pipeline; OHNLP; Knowledge Sources; Extracted Data (Structured)]

  25. UMLS and Metathesaurus

  26. UMLS • UMLS • Obtained system from NLM • Installed successfully on informatics-nlp • Features • Browse concepts and relationships • Flat files • DB import • Being integrated into UCI-QUP

  27. Our Contribution to OHNLP • We indeed adopted • Framework • Relevant “resources” • Contribute to overall OHNLP effort • Specific analysis engines • Sections and sub-sections in pathology reports • Significant items • Dictionary terms (UMLS integration) • … • Contribute as a project back to OHNLP

  28. Database Schema

  29. Demo

  30. SQL Queries • Example (possible) queries • SELECT reportid FROM collection WHERE (sectioncontent LIKE '%carcinoma%') AND (heading LIKE '%tumor%')
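The example query can be tried against a toy table. The sketch below uses Python's built-in `sqlite3` module purely for convenience, whereas the slides name MySQL as the database. The table and column names come from the slide's query; the column types and the three sample rows are invented for illustration.

```python
import sqlite3

# In-memory stand-in for the warehouse table; rows are made-up examples.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE collection (reportid INTEGER, heading TEXT, sectioncontent TEXT)"
)
conn.executemany(
    "INSERT INTO collection VALUES (?, ?, ?)",
    [
        (1, "TUMOR HISTOLOGY", "invasive ductal carcinoma"),
        (2, "TUMOR SIZE", "1.5 cm"),
        (3, "MARGINS", "negative for carcinoma"),
    ],
)


def reports_matching(connection, content_term, heading_term):
    """Return ids of reports whose section content and heading contain the terms."""
    cursor = connection.execute(
        "SELECT DISTINCT reportid FROM collection "
        "WHERE sectioncontent LIKE ? AND heading LIKE ? "
        "ORDER BY reportid",
        (f"%{content_term}%", f"%{heading_term}%"),
    )
    return [row[0] for row in cursor]


print(reports_matching(conn, "carcinoma", "tumor"))
```

Passing the search terms as bound parameters, rather than concatenating them into the SQL string, keeps user-supplied text from being interpreted as SQL.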

  31. Interfaces i2b2 Tableau

  32. Guide For Fields • College of American Pathologists (CAP) • Detailed protocols
      Specimen (Note A): ___ Partial breast ___ Total breast (including nipple and skin) ___ Other (specify): ____________________________ ___ Not specified
      Procedure (Note A): ___ Excision without wire-guided localization ___ Excision with wire-guided localization ___ Total mastectomy (including nipple and skin) ___ Other (specify): ____________________________ ___ Not specified
      Lymph Node Sampling (select all that apply) (Note B): ___ No lymph nodes present ___ Sentinel lymph node(s) ___ Axillary dissection (partial or complete dissection) ___ Lymph nodes present within the breast specimen (ie, intramammary lymph nodes) ___ Other lymph nodes (eg, supraclavicular or location not identified) Specify location, if provided: _________________________
      Specimen Integrity: ___ Single intact specimen (margins can be evaluated) ___ Multiple designated specimens (eg, main excisions and identified margins) ___ Fragmented (margins cannot be evaluated with certainty) ___ Other (specify): __________________________________
      Specimen Size (for excisions less than total mastectomy)

  33. Implications • Multiple specific extraction and distillation techniques • Section, sub-section segmentation • Term spotting • Associations • Negation (Absence) and Assertion (Presence) • Dimensions • Expressions • Full NL Parse where required
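Negation and assertion handling can be hinted at with a handful of trigger patterns around a term. The sketch below is a deliberately small, rule-based illustration built from phrases that occur in the example reports ("IS NOT PRESENT", "POSITIVE FOR"); the trigger lists and the function name `assertion_status` are assumptions, and this is not an implementation of any published negation algorithm.

```python
import re

# Assumed trigger patterns; "{term}" is replaced by the escaped term.
NEGATED = [
    r"{term}\s+IS\s+NOT\s+PRESENT",
    r"NEGATIVE\s+FOR\s+{term}",
    r"\bNO\s+{term}",
]
ASSERTED = [
    r"POSITIVE\s+FOR\s+{term}",
    r"{term}\s+IS\s+PRESENT",
]


def assertion_status(statement, term):
    """Classify a term in a statement as 'absent', 'present', or 'unknown'."""
    escaped = re.escape(term)
    # Negation is checked first so that it takes precedence over assertion.
    for pattern in NEGATED:
        if re.search(pattern.format(term=escaped), statement, re.IGNORECASE):
            return "absent"
    for pattern in ASSERTED:
        if re.search(pattern.format(term=escaped), statement, re.IGNORECASE):
            return "present"
    return "unknown"


print(assertion_status("EXTENSIVE FIBROSIS IS NOT PRESENT", "fibrosis"))
print(assertion_status("DEEP TRICEPS MARGIN, EXCISION: POSITIVE FOR SARCOMA", "sarcoma"))
```

Patterns this simple miss negation scope, hedging, and historical mentions, which is one reason a full NL parse may be needed for some statements.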

  34. Current Status • System first version • Creation of database for data warehouse • QUEST “compliant” • Meta-thesaurus integration • Retrieval • SQL and UI • Tableau • i2b2 • Star schema

  35. Presentation Content Continued • Direction • Demonstrate value to researchers • CTSA Investigators • Lessons learned • Open source frameworks very useful! • Reuse external solutions, resources • Our solutions can be adopted • Approach appears scalable • Domain expert engagement essential

  36. Content • What went well • UIMA and MedKAT choice • UMLS integration • What would you do differently • Project is in early stage • Technical and framework choices seem right • Will learn more as we engage domain experts • What will provide value to investigators?

  37. Summary • Comprehensive approach to detailed information extraction from pathology reports • Exploiting open source and programmable frameworks (UIMA) • Integration of UMLS • Contribution of pipeline • Engagement of domain experts

  38. Presenter(s) Contact Information • Contact information • Naveen Ashish • ashish@ics.uci.edu • http://www.ics.uci.edu/~ashish
