
Extraction and Analysis of Information from Structured and Unstructured Clinical Records

Extraction and Analysis of Information from Structured and Unstructured Clinical Records. Richard Power, Open University. Henk Harkema, Andrea Setzer, Ian Roberts, Rob Gaizauskas, Mark Hepple, University of Sheffield. Jeremy Rogers, University of Manchester. AHM 2005 Text Mining Workshop, 29/9/2005.


Presentation Transcript


  1. Extraction and Analysis of Information from Structured and Unstructured Clinical Records Richard Power, Open University Henk Harkema, Andrea Setzer, Ian Roberts, Rob Gaizauskas, Mark Hepple, University of Sheffield Jeremy Rogers, University of Manchester AHM 2005 Text Mining Workshop 29/9/2005

  2. Overview • Background • Information Extraction • Information Integration

  3. Background: CLEF • Clinical e-Science Framework • Objective: • To develop a high quality, secure and interoperable information repository, derived from operational electronic patient records to enable ethical and user-friendly access to patient information in support of clinical care and biomedical research • Duration, funding, participants: • 2003 – 2005 (CLEF), 2005 – 2007 (CLEF-Services) • Funded by Medical Research Council (MRC) • Six universities, Royal Marsden Hospital, industrial partners engaged through CLEF Industrial Forum Meetings

  4. Sheffield NLP & CLEF • Information Extraction • Analyzing clinical narratives to extract medically relevant entities and events, and their properties and relationships • Information Importation • Importing extracted information into the CLEF repository • Information Integration • Combining extracted information with structured information (i.e., non-narrative data) already in repository in order to build summary of patient’s conditions and treatment over time

  5. Medical IE • Standard Information Extraction tasks: • Entity/event extraction & relationship extraction • Additional challenges: • Cross-document event co-reference • Same event mentioned in multiple documents; many documents provide only partial descriptions of events • Modality of Information • Negation: “I cannot feel any lump in her right supraclavicular fossa” • Uncertainty: “I just wonder if there is an outside possibility that she might have mediastinal fibrosis to account for her symptomology” • Temporality of Information

  6. Entities, Events & Relationships • Entities, events: • Problem: melanoma, swelling, … • Present/absent • Clinical course: getting worse, getting better, no change • Intervention: amputation, chemotherapy, … • Status: planned, booked, started, completed, … • Investigation: CT scan, ultrasound, … • Status: planned, booked, started, completed, … • Goal: treat, cure, palliate • Drug: Atenolol, antibiotics, … • Locus: abdomen, blood, … • Laterality: left, right

  7. Entities, Events & Relationships • Relationships: • Location of problem: problem → locus • hip pain • lesions in her liver • Finding of investigation: investigation → problem • An ECG examination revealed atrial fibrillation • CT scan of her thorax and abdomen shows progressive disease • Target of intervention: intervention → locus • radiotherapy to back • breast radiotherapy • Further relationships

  8. IE Approach • Pipeline of processing modules • Pre-processing: • Tokenization, sentence splitting • Lexical & terminological processing: • Morphological analysis, term look-up, term parsing • Syntactic & semantic processing: • Sentence-based syntactic, semantic analysis • Discourse processing & IE pattern application: • Integration of semantic representations into discourse model • Application of patterns to collect information to be extracted
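The staged pipeline above can be sketched as a chain of simple functions. This is a minimal illustration only, not the CLEF implementation: the module functions, the toy terminology, and the example sentence are all invented stand-ins for the real components.

```python
import re

def tokenize(text):
    # Pre-processing: naive word/punctuation tokenization
    # (the real pipeline also performs sentence splitting)
    return re.findall(r"\w+|[^\w\s]", text)

def term_lookup(tokens, terminology):
    # Lexical & terminological processing: tag tokens found in a term database
    return [(t, terminology.get(t.lower())) for t in tokens]

def extract(tagged):
    # Discourse/IE stage: collect the recognised entities
    return [t for t, label in tagged if label is not None]

# Tiny invented terminology standing in for Termino
terminology = {"melanoma": "problem", "chemotherapy": "intervention"}

tokens = tokenize("Chemotherapy was started for the melanoma.")
entities = extract(term_lookup(tokens, terminology))
print(entities)  # ['Chemotherapy', 'melanoma']
```

Each stage consumes the previous stage's output, mirroring the pipeline architecture described on the slide.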

  9. Terminology Processing • Termino: a large-scale terminological resource to support term processing for information extraction, retrieval, and navigation • Termino contains a database holding large numbers of terms imported from various existing terminological resources, including UMLS • Efficient recognition of terms in text is achieved through use of finite state recognizers compiled from contents of database • The results of lexical look-up in Termino can feed into further term processing components, e.g., term parser
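The finite-state recognition idea above can be sketched with a trie compiled from a term list, matched longest-first against a token stream. The terms and sentence here are invented examples; Termino's actual compiler and data are far larger.

```python
def compile_trie(terms):
    # Compile a term list into a trie, a simple finite-state recogniser
    root = {}
    for term in terms:
        node = root
        for tok in term.lower().split():
            node = node.setdefault(tok, {})
        node["$end"] = term  # marks a complete term
    return root

def recognise(tokens, trie):
    # Scan left to right, taking the longest term match at each position
    matches, i = [], 0
    while i < len(tokens):
        node, j, best = trie, i, None
        while j < len(tokens) and tokens[j].lower() in node:
            node = node[tokens[j].lower()]
            j += 1
            if "$end" in node:
                best = (i, j, node["$end"])
        if best:
            matches.append(best)
            i = best[1]
        else:
            i += 1
    return matches

trie = compile_trie(["CT scan", "atrial fibrillation", "thorax"])
toks = "CT scan of her thorax".split()
print(recognise(toks, trie))  # [(0, 2, 'CT scan'), (4, 5, 'thorax')]
```

Compiling the database once and reusing the recogniser is what makes large-scale look-up efficient.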

  10. Terminology Processing • Termino for CLEF • Imported 160,000 terms from UMLS drawn from semantic types such as pharmacologic substances, anatomical structures, therapeutic procedures, diagnostic procedures, … • Term grammars • Rules for combining terms identified by term look-up in Termino into longer terms • Example: locations in the lung: location_np → latitude_adj area_noun, where latitude_adj is one of {upper, middle, lower, mid, basal} and area_noun is one of {zone, region, area, field, lung, lobe}
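The lung-location rule above (location_np → latitude_adj area_noun) can be sketched directly; the function name and the input sentence are illustrative assumptions.

```python
# Word classes from the term grammar rule on the slide
LATITUDE_ADJ = {"upper", "middle", "lower", "mid", "basal"}
AREA_NOUN = {"zone", "region", "area", "field", "lung", "lobe"}

def parse_location_nps(tokens):
    # Combine each adjacent latitude_adj + area_noun pair into a location_np
    spans = []
    for i in range(len(tokens) - 1):
        if tokens[i].lower() in LATITUDE_ADJ and tokens[i + 1].lower() in AREA_NOUN:
            spans.append(" ".join(tokens[i:i + 2]))
    return spans

found = parse_location_nps("shadowing in the upper lobe and lower zone".split())
print(found)  # ['upper lobe', 'lower zone']
```

A real term parser would operate over the output of Termino look-up rather than raw tokens, but the combination step is the same.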

  11. Information Extraction Patterns • IE patterns inspect syntactic and semantic analyses and assert properties of entities and relationships between entities • Example: finding of investigation • “CT scan of her thorax shows progressive disease” • IE pattern: invest_finding(I, P) if investigation(I), problem(P), show_event(S), lsubj(S, I), lobj(S, P).
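A rough Python analogue of the invest_finding pattern above can be run against a toy discourse model of unary and binary facts. The fact representation and identifiers (e1, e2, e3) are assumptions for illustration; the original pattern is stated in a Prolog-like rule language.

```python
# Toy discourse model for "CT scan of her thorax shows progressive disease"
facts = {
    ("investigation", "e1"),   # "CT scan"
    ("problem", "e2"),         # "progressive disease"
    ("show_event", "e3"),      # "shows"
    ("lsubj", "e3", "e1"),     # logical subject of the show-event
    ("lobj", "e3", "e2"),      # logical object of the show-event
}

def invest_finding(facts):
    # Find (I, P) pairs where a show-event links an investigation to a problem
    results = []
    for (_, s) in [f for f in facts if f[0] == "show_event"]:
        for (_, s2, i) in [f for f in facts if f[0] == "lsubj"]:
            for (_, s3, p) in [f for f in facts if f[0] == "lobj"]:
                if s == s2 == s3 and ("investigation", i) in facts \
                        and ("problem", p) in facts:
                    results.append((i, p))
    return results

print(invest_finding(facts))  # [('e1', 'e2')]
```

The nested search plays the role of Prolog's unification over the rule body.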

  12. Information Extraction Patterns • Finding patterns • Hand-crafted patterns • “Redundancy” approach: • given a patient for whom a relationship between two particular entities is known to exist (e.g., we know patient has a tumor in his lung), … • find all sentences in all notes of this patient that contain these two entities, … • and assume these sentences express the same relationship
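The "redundancy" approach above reduces to a simple search over a patient's notes; the notes and entity strings below are invented examples, and real matching would use the recognised terms rather than raw substrings.

```python
def redundancy_sentences(notes, entity_a, entity_b):
    # Collect every sentence in the patient's notes mentioning both entities,
    # on the assumption that each such sentence expresses the known relation
    hits = []
    for note in notes:
        for sentence in note.split("."):
            low = sentence.lower()
            if entity_a.lower() in low and entity_b.lower() in low:
                hits.append(sentence.strip())
    return hits

notes = [
    "There is a tumour in his lung. He is otherwise well.",
    "The lung tumour has not grown. Bloods were normal.",
]
print(redundancy_sentences(notes, "tumour", "lung"))
# ['There is a tumour in his lung', 'The lung tumour has not grown']
```

The harvested sentences can then serve as training examples for learning new extraction patterns.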

  13. Information Integration • Combining structured information in repository with information extracted from narratives into coherent overview of patient’s condition and treatment over time • Issues in Information Integration: • Ambiguity: given an event extracted from a narrative, to which event in the structured data does it correspond? • Fragmentation & duplication: Information Extraction over narrative data produces collection of potentially fragmented and duplicated descriptions of medical events which need to be sorted out • Investigation of contribution of temporal information found within narratives to Information Integration

  14. Linking extracted and structured events • Reduce ambiguity through use of: • Medical information: type of event, relationships, … • Temporal information: time stamps, temporal expressions, verbal tense & aspect, … • Example: events in narratives: (1) “Chest X-RAY arranged for next week.” (document dated 2000-05-16); (2) “The chest X-RAY performed …” (document dated 2000-05-24) • Events in structured data: (1) Type: MRI, Location: abdomen, Date: 2000-05-23; (2) Type: X-RAY, Location: chest, Date: 2000-05-23; (3) Type: X-RAY, Location: chest, Date: 2000-05-26; (4) Type: X-RAY, Location: chest, Date: 2000-07-19
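Using the example data from the slide, the linking step can be sketched as a filter: first by medical constraints (event type, location), then by the narrative event's possible date range. The date window chosen for "next week" is an assumption for illustration.

```python
from datetime import date

# Structured events from the slide
structured = [
    {"id": 1, "type": "MRI",   "location": "abdomen", "date": date(2000, 5, 23)},
    {"id": 2, "type": "X-RAY", "location": "chest",   "date": date(2000, 5, 23)},
    {"id": 3, "type": "X-RAY", "location": "chest",   "date": date(2000, 5, 26)},
    {"id": 4, "type": "X-RAY", "location": "chest",   "date": date(2000, 7, 19)},
]

def candidates(narrative, structured):
    # Keep structured events matching in type and location (medical
    # constraints) whose date falls in the narrative event's time domain
    return [s["id"] for s in structured
            if s["type"] == narrative["type"]
            and s["location"] == narrative["location"]
            and narrative["earliest"] <= s["date"] <= narrative["latest"]]

# "Chest X-RAY arranged for next week", in a letter dated 2000-05-16
# (the window below is an assumed reading of "next week")
narrative = {"type": "X-RAY", "location": "chest",
             "earliest": date(2000, 5, 17), "latest": date(2000, 5, 30)}
print(candidates(narrative, structured))  # [2, 3]
```

Medical constraints alone leave three chest X-rays as candidates; the temporal window eliminates the July one.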

  15. Constraint Satisfaction • Ambiguity reduction as a Constraint Satisfaction problem • Each narrative event is associated with a time domain, i.e., set of possible dates on which event could have taken place • Temporal and medical information extracted from narratives is formulated as set of constraints on time domain of narrative event • Use Constraint Logic Programming tools to resolve time domains of narrative events • If resolved time domain of narrative event contains date of structured event, link narrative event to structured event
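The time-domain narrowing described above can be illustrated with plain set intersection; the CLEF work used Constraint Logic Programming tools, for which the intersections below are only a stand-in, and the particular dates are assumed.

```python
from datetime import date, timedelta

def date_range(start, end):
    # The set of all dates from start to end inclusive
    return {start + timedelta(days=d) for d in range((end - start).days + 1)}

# Initial domain: the X-ray could have happened any time in May 2000
domain = date_range(date(2000, 5, 1), date(2000, 5, 31))

# Constraint from "arranged for next week" in a letter dated 2000-05-16
domain &= date_range(date(2000, 5, 17), date(2000, 5, 30))

# Constraint from a later letter (2000-05-24) reporting the X-ray performed
domain &= date_range(date(2000, 5, 1), date(2000, 5, 24))

print(min(domain), max(domain))  # 2000-05-17 2000-05-24
```

Any structured event whose date falls inside the resolved domain remains a candidate link; all others are ruled out.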

  16. Evaluation • Evaluation of effectiveness of temporal constraints in Information Integration • Link each narrative event to set of potentially matching events of same type in structured data according to medical constraints • Measure how well application of temporal constraints narrows down this initial set of “structured” candidates • We used a semi-automated pipeline to produce an idealised version of what a fully automatic system would provide as the input to the CSP component • Results must be viewed in the light of the idealised input

  17. Data and Gold Standard • Confined to investigation events • Patient notes of 5 patients analysed and annotated (large overhead of manual annotation) • 446 documents, of which 94 contain 152 investigation events • Manually created Gold Standard linking each narrative event to the structured events of the same type and marking the correct targets

  18. Annotating Temporal Information • We annotate times, events (i.e., investigations) and temporal relations holding between these • The annotation scheme used is a subset of the TimeML annotation scheme • Example: in “We have arranged an MRI scan for next week.”, the MRI scan event is related to the time expression “next week” by the temporal relation during
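The annotation of the example sentence can be sketched as plain records; the actual scheme uses TimeML-style XML tags (EVENT, TIMEX3, TLINK), so the field names below are an illustrative simplification, not the project's annotation format.

```python
# Simplified, TimeML-inspired annotation of the example sentence
annotation = {
    "text": "We have arranged an MRI scan for next week.",
    "events": [{"id": "e1", "span": "MRI scan"}],
    "times":  [{"id": "t1", "span": "next week"}],
    # Temporal relation: the scan is to take place DURING "next week"
    "tlinks": [{"event": "e1", "relatedToTime": "t1", "relType": "DURING"}],
}

link = annotation["tlinks"][0]
print(link["event"], link["relType"], link["relatedToTime"])  # e1 DURING t1
```

Relations like this one supply the temporal constraints consumed by the CSP stage.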

  19. Evaluation: Recall & Precision • We want to quantify the impact of using temporal constraints to reduce the ambiguity of mapping narrative events to structured events • Ideally, temporal constraints should greatly reduce ambiguity by eliminating incorrect candidates from the set of possible targets in structured data – but not eliminate the true target • Global evaluation measures: • Recall: proportion of correct targets recognised as possible targets • Precision: proportion of recognised possible targets that are correct • We applied both metrics before and after application of temporal constraints in CSP and compared the results
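The global recall and precision measures above can be computed over toy data; the event ids and candidate sets below are invented, chosen only to show recall being preserved while precision rises after constraint application.

```python
def recall_precision(gold, candidates):
    # gold, candidates: dicts mapping narrative event id -> set of target ids
    true_found = sum(len(gold[e] & candidates.get(e, set())) for e in gold)
    total_gold = sum(len(gold[e]) for e in gold)
    total_cand = sum(len(c) for c in candidates.values())
    # Recall: correct targets recognised / all correct targets
    # Precision: correct targets recognised / all recognised targets
    return true_found / total_gold, true_found / total_cand

gold = {"n1": {2}, "n2": {4}}
before = {"n1": {2, 3, 4}, "n2": {2, 3, 4}}  # medical constraints only
after = {"n1": {2, 3}, "n2": {4}}            # after temporal constraints

print(recall_precision(gold, before))  # recall 1.0, precision 1/3
print(recall_precision(gold, after))   # recall 1.0, precision 2/3
```

Ideally the temporal constraints shrink the candidate sets (raising precision) without ever dropping a true target (keeping recall at 1.0).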

  20. Evaluation: Strict & Liberal Accuracy • The limitation of the Recall and Precision metrics is that they score for the overall data set – i.e. over all events for all 5 patients • If even only a small number of events retain a large number of possible targets, the overall precision score will be low even though most events are close to being correctly resolved • Consequently, we developed two “accuracy” based scores (liberal and strict), which quantify for each narrative event the extent to which it is correctly resolved, and then average across all narrative events • Liberal score for single event: 1 if at least one true target is correctly preserved, 0 otherwise • Strict score for single event: proportion of recognised possible targets that are correct
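The per-event liberal and strict scores defined above can be sketched as follows; the toy gold and candidate sets are invented for illustration.

```python
def liberal_strict(gold, candidates):
    # Score each narrative event, then average across events
    liberal, strict = [], []
    for event, true_targets in gold.items():
        cands = candidates.get(event, set())
        correct = true_targets & cands
        # Liberal: 1 if at least one true target survives, else 0
        liberal.append(1.0 if correct else 0.0)
        # Strict: proportion of surviving candidates that are correct
        strict.append(len(correct) / len(cands) if cands else 0.0)
    n = len(gold)
    return sum(liberal) / n, sum(strict) / n

gold = {"n1": {2}, "n2": {4}}
candidates = {"n1": {2, 3}, "n2": {4}}
lib, st = liberal_strict(gold, candidates)
print(lib, st)  # 1.0 0.75
```

Here event n1 keeps one spurious candidate, so the strict score is penalised while the liberal score is not; averaging per event avoids the global precision measure being dominated by a few badly resolved events.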

  21. Results

  22. Discussion • The results show that there is a substantial amount of ambiguity at the start, which is reduced by application of temporal constraints, as best shown by the strict accuracy score • A large degree of ambiguity remains, but … • Use of temporal information is conservative • E.g., a “past” narrative event is linked to all structured events dated before the date of the letter, but could heuristically be linked to the one structured event dated immediately before the date of the letter • We have not yet exploited additional medical information, e.g., the locus of an investigation, nor additional temporal information, e.g., temporal relationships between events

  23. Conclusions & Future Work • Information Extraction • Essential functionality implemented • Extending coverage of system • Evaluating performance • Information Integration • Initial assessment of approach • Automating processing pipeline • Extending method to other events
