
Scott Duvall, Brett South, Stéphane Meystre

A Hands-on Introduction to Natural Language Processing in Healthcare: Annotation as a Central Task for Development of NLP Systems. MedInfo 2010 Congress, September 11, 2010.





Presentation Transcript


  1. A Hands-on Introduction to Natural Language Processing in Healthcare: Annotation as a Central Task for Development of NLP Systems. MedInfo 2010 Congress, September 11, 2010. Scott Duvall, Brett South, Stéphane Meystre

  2. NLP applied to Clinical Documents • Detailed clinical information is often locked in free-text clinical note documents. • Applying Natural Language Processing (NLP) methods to clinical free text allows a more detailed document-level review. • Clinical information can be used for DSS, Q/A, research, performance improvement, surveillance, etc. • Annotation is a central task for the evaluation of NLP systems used for Information Retrieval (IR) or Information Extraction (IE) tasks.

  3. Why and When to Annotate? • Annotation as a central task: • Manually annotated corpora focus and clarify NLP system requirements. • Establish reference standard(s) to train and evaluate NLP tools applied to clinical texts for various tasks. • Provide data for NLP system development (supervised learning): • Extraction rules may be created automatically or by hand. • Statistical models of text documents are built by machine learning algorithms.
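To make the supervised-learning point above concrete, here is a minimal sketch (not part of the original workshop materials) of how annotated examples, assumed here to be exported as simple (text, label) pairs, could train a statistical text model with scikit-learn:

```python
# Minimal sketch: training a statistical model from manually annotated
# snippets. The snippets, labels, and pipeline below are illustrative only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

snippets = ["no evidence of pneumonia",
            "acute pneumonia on chest x-ray",
            "denies chest pain",
            "chest pain radiating to the left arm"]
labels = ["negated", "affirmed", "negated", "affirmed"]  # from human annotators

model = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(snippets, labels)
print(model.predict(["no evidence of chest pain"]))
```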

  4. What to annotate and at what level? • Meta information: document type, author, annotator, institution, clinic. • Document level: sections, headers, paragraphs, templates, or other document-level assessments. • Lexical: semantic categories of words. • Syntax: structures combined to produce sentences • Words combine in well-defined structures: part of speech (POS), syntactic parse trees, grammatical level. • Semantics: meaning/interpretation • Individual word senses are combined to form meaningful sentences.

  5. What to annotate and at what level? • Pragmatics: relies on clinical inference; context affects the interpretation of meaning • Domain, report section, location within the section of a report, or other implicit information. • Discourse: links between annotated instances, or across sentences • Previous information affects the interpretation of the current information. • Includes referents (pronouns, definite or bridging clauses, time of events, coherence of sentences). • World knowledge: facts about the world at large and/or common sense. • These levels may differ depending on the specific use case, the clinical question, and the goals of the NLP application.

  6. Semantic annotation • Concepts (“markables”): types of information defined at the annotated instance level. • Use case dependent • Focus on noun phrases only? • Focus on specific semantic types (diagnoses, findings, treatments, procedures, etc.)? • Modifiers (“attributes”): features of the annotated information • Negation, experiencer, temporality, certainty, change over time, severity, numeric values, anatomic locations, note section, information quality, etc.?

  7. Some jargon • Annotation guideline: • Defines what qualifies as a “markable” for a given use case, how annotated instances should be identified, and which attributes are associated with annotated instances. • In other words, the rules of the game: it defines what information will be used to train and evaluate the performance of the NLP system. • Annotation schema: • Provides a logical representation of the annotation guideline.
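As an illustration only (the class and attribute names below are hypothetical, not the actual workshop schema), a schema and one annotated instance might be represented like this:

```python
# Sketch of a schema ("markables" with their "attributes") and one
# annotated instance; names and offsets are made up for illustration.
from dataclasses import dataclass, field

SCHEMA = {
    "Diagnosis": {                                   # a markable (class)
        "negation": ["affirmed", "negated"],          # attribute (slot) values
        "temporality": ["current", "historical"],
    }
}

@dataclass
class Annotation:
    markable: str           # schema class, e.g. "Diagnosis"
    start: int              # character offset where the span starts
    end: int                # character offset where the span ends
    text: str               # covered text
    attributes: dict = field(default_factory=dict)

ann = Annotation("Diagnosis", 15, 43, "peripheral arterial disease",
                 {"negation": "negated"})
print(ann)
```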

  8. Common annotation tasks (diagram: Task 1, Task 2, Task 3)

  9. What is measured at the task level? • Estimate of reliability (task consistency): • IAA = matches / (matches + non-matches) • Partial (spans of annotated instances overlap) • Exact (spans of annotated instances match exactly) • Measurement of validity (task accuracy): • Recall = TP / (TP + FN), Precision = TP / (TP + FP) • F-measure = ((1 + β²) · P · R) / (β² · P + R) • These metrics will be discussed in more depth in Part 2
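The formulas above translate directly into code; the following sketch (with made-up counts) shows how they would be computed:

```python
# Agreement and accuracy metrics as defined on the slide above.

def iaa(matches, nonmatches):
    """Inter-annotator agreement = matches / (matches + non-matches)."""
    return matches / (matches + nonmatches)

def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def f_measure(p, r, beta=1.0):
    """F-measure = (1 + beta^2) * P * R / (beta^2 * P + R)."""
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)

# Hypothetical counts: 80 true positives, 10 false positives, 20 false negatives
p, r = precision(80, 10), recall(80, 20)
print(iaa(80, 30), p, r, f_measure(p, r))   # ≈ 0.727, 0.889, 0.800, 0.842
```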

  10. Who should do annotation tasks? • Who: depends on the use case and the annotation goals • Some use cases may need many annotators • Level of domain expertise (physicians, nurses, nurse practitioners, pharmacists, physician assistants, coders… and yes, even graduate students) • Depends on the level of clinical inference required.

  11. A commonly used approach

  12. The annotation task • Use case: focus on extracting as many explicitly mentioned diagnoses as possible from a collection of 75 discharge summaries selected from one of the i2b2 Challenge tasks. • Goals: • Illustrate the level of difficulty involved with annotation tasks. • Demonstrate the use of an annotation guideline and schema to develop a reference standard. • Demonstrate the calculation of evaluation metrics in terms of task consistency and accuracy (i.e., IAA, precision, recall, F-measure).

  13. Workshop annotation task • The good news (things we built for you): • We don’t expect you to infer clinical diagnoses (no discourse or linking of concepts across sentences). • We have already developed an annotation guideline and schema for this task. • Diagnoses are loosely based on semantic types from the UMLS.

  14. Workshop annotation task • The bad news (or the challenge): • One of the attributes we will identify is negation status. • e.g., “No evidence of peripheral arterial disease”. • This task does have a certain level of difficulty, but it will be a good demonstration of building a reference standard and of a practical application of NLP.
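Negation status is annotated manually in this workshop, but as a rough illustration of how an NLP system might later detect it automatically, here is a NegEx-style trigger-window heuristic; the trigger list and window size below are arbitrary illustrative choices, not the workshop's method:

```python
# Rough NegEx-style sketch: a diagnosis mention is flagged as negated when a
# negation trigger phrase appears shortly before it in the same sentence.
import re

NEGATION_TRIGGERS = ["no evidence of", "no", "denies", "without", "negative for"]

def is_negated(sentence: str, concept: str, window: int = 40) -> bool:
    pos = sentence.lower().find(concept.lower())
    if pos == -1:
        return False
    preceding = sentence.lower()[max(0, pos - window):pos]
    return any(re.search(r"\b" + re.escape(t) + r"\b", preceding)
               for t in NEGATION_TRIGGERS)

print(is_negated("No evidence of peripheral arterial disease.",
                 "peripheral arterial disease"))   # True
print(is_negated("History of peripheral arterial disease.",
                 "peripheral arterial disease"))   # False
```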

  15. Protégé/Knowtator • What tool will be used? • For annotation tasks we will use the Knowtator plugin written for the Protégé knowledge representation system. • Knowtator facilitates annotation and adjudication tasks. • A final reference standard has been created and will be available to participants. • Concepts (“markables”) are called “classes”. • Modifiers (“attributes”) are called “slots”.

  16. Hands-on component • Install Protégé 3.3.1 and Knowtator 1.9, available from: Protégé: http://protege.cim3.net/download/old-releases/3.3.1/basic Knowtator: http://knowtator.sourceforge.net • Review the annotation guideline and try using the Knowtator schema. • Annotate the first 5 documents. • Don’t panic! Ask any of the instructors for help; this is a hands-on exercise.

  17. Thank you for your attention! • For more information: • Brett.South@hsc.utah.edu • Shuying.Shen@hsc.utah.edu • Scott.Duvall@hsc.utah.edu • Stephane.Meystre@hsc.utah.edu • TA: Chris.Leng@utah.edu
