
School of Computing

FACULTY OF ENGINEERING

NLP for Health Informatics: text-mining patient records

SNOMED CT based semantic tagging of medical narratives

Verbal Autopsy corpus for Machine Learning of Cause of Death

E-Health GATEway to the Clouds

Saman Hina, Sammy Danso, Eric Atwell, Owen Johnson

Natural Language Processing Group



SNOMED CT based semantic tagging of medical narratives

---------------------------------------------------------------------

  • Research Objective

  • Key Challenges

  • Resources

  • Methods

    1. Baseline Application

    2. SNOMED CT Rule-based semantic tagger

  • Results

  • Conclusion and Future Work


PhD Annual Symposium-2011

Sample Text

Output



Research Objective

---------------------------------------------------------------------

To design a novel approach for extracting semantic information from unstructured medical narratives.

The underlying research hypothesis is that natural language medical narratives can be annotated with high accuracy using the SNOMED CT healthcare data standard.

Healthcare Data standards

  • Secure

  • Consistent

  • Authentic sharing among healthcare users with codes.



Key Challenges

---------------------------------------------------------------------

  • Clinicians express a single medical term in different ways and do not follow the language of healthcare data standards, which makes extracting domain knowledge a challenge.

  • No domain expert is available.

  • The use of synonyms, abbreviations, paraphrased concepts and different preferred names for a concept increases the complexity of the research challenge.

  • Different patterns of section headers, capitalization of words and content.



Corpus

-----------------------------------------------------------------------

  • Corpus from the fourth i2b2/VA 2010 challenge.

  • Contains discharge summaries and progress notes from four healthcare partners.



Data Standard

------------------------------------------------------------------

SNOMED CT (Systematized Nomenclature of Medicine Clinical Terms) is a comprehensive international data standard for clinical terminology.

  • Number of concepts from SNOMED CT: 356,432

  • 16 out of 31 semantic types from SNOMED CT have been used to develop the SNOMED CT semantic tagger.

    1. Attribute

    2. Body Structure

    3. Disorder

    4. Environment

    5. Findings

    6. Observable Entity

    7. Occupation

    8. Organism

    9. Person

    10. Physical Object

    11. Procedure

    12. Product Or Substance

    13. Qualifier Value

    14. Record Artifact

    15. Regime/Therapy

    16. Situation



Annotation scheme for Gold Standard Corpus

--------------------------------------------------------------------

  • Pre-annotation of the corpus using the SNOMED CT dictionary application (baseline system).

  • Manually review the corpus and mark the remaining concepts, considering the following language issues: synonyms, abbreviations, incomplete concepts, paraphrased concepts and concepts under section headings.

  • Concepts that were not identified correctly should be removed.

  • For a non-domain user, the NCBO BioPortal Annotator will be used to annotate the gold standard corpus by searching for the keywords and bigrams of possible concepts.



  • Concepts should be marked up to three levels of granularity.

  • Agreement on the gold standard is more than 90%.
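An agreement figure like the one above is often computed as the share of items on which two annotation passes assign the same label. The sketch below assumes token-level labels and simple observed agreement; the slides do not say which agreement measure was used, and the function name and sample labels are illustrative only.

```python
def observed_agreement(labels_a, labels_b):
    """Fraction of positions where two annotation passes give the same label."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both annotation passes must cover the same tokens")
    if not labels_a:
        raise ValueError("cannot compute agreement on an empty annotation")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


# Hypothetical token-level tags from two annotation passes ("O" = no concept).
pass_1 = ["Disorder", "O", "Body Structure", "O", "Procedure"]
pass_2 = ["Disorder", "O", "Body Structure", "O", "Findings"]
print(observed_agreement(pass_1, pass_2))
```

Observed agreement does not correct for chance; a chance-corrected measure may be preferable when most tokens carry no concept.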



Baseline System - SNOMED CT Dictionary Application

-----------------------------------------------------------------------

  • Basic language processing (tokenization, sentence splitting, POS tagging).

  • Concepts have been tagged automatically (dictionary lookup).

  • The SNOMED CT knowledge base was developed by constructing separate dictionaries for the 16 semantic types.

  • 6 out of 16 tags performed well with the dictionary application.

    1. Disorder

    2. Observable Entity

    3. Person

    4. Record Artifact

    5. Regime/Therapy

    6. Situation
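As a rough illustration of dictionary lookup over per-type dictionaries, the sketch below tags text by greedy longest match. It is a minimal stand-in, not the baseline application itself: the dictionary entries, the tokenizer and the function names are assumptions made for the example.

```python
import re

# Toy stand-in for the per-type SNOMED CT dictionaries; entries are illustrative only.
DICTIONARIES = {
    "Disorder": {"pneumonia", "chest pain"},
    "Body Structure": {"lung", "chest"},
    "Procedure": {"radiography"},
}

# Flatten to term -> semantic type and record the longest term length in tokens.
TERM_TO_TYPE = {
    term: sem_type for sem_type, terms in DICTIONARIES.items() for term in terms
}
MAX_TERM_TOKENS = max(len(term.split()) for term in TERM_TO_TYPE)


def tokenize(text):
    """Lower-case the text and keep alphabetic tokens only."""
    return re.findall(r"[A-Za-z]+", text.lower())


def tag(text):
    """Greedy longest-match lookup; returns (matched term, semantic type) pairs."""
    tokens = tokenize(text)
    tags = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first so "chest pain" wins over "chest".
        for n in range(min(MAX_TERM_TOKENS, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in TERM_TO_TYPE:
                tags.append((candidate, TERM_TO_TYPE[candidate]))
                i += n
                break
        else:
            i += 1
    return tags


print(tag("Patient admitted with chest pain; radiography showed pneumonia."))
```

Exact lookup of this kind misses synonyms, abbreviations and paraphrases, which is consistent with only some tags performing well in the baseline.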



Optimization of SNOMED CT Knowledgebase

---------------------------------------------------------------

  • Optimizing the concepts in the SNOMED CT semantic types to write general rules for the semantic tagger.

  • The optimization process reduces the size of the knowledge base by removing unnecessary and ambiguous information.

    Entire lung -> Lung

    Ear NOS -> Ear

  • Long multiword concepts have been transformed into individual concepts to solve the paraphrasing problem.

    Radiography of chest and lung -> 1. Radiography 2. Chest 3. Lung
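The three examples above can be reproduced with two small string transformations. This is a sketch under assumptions: only "Entire" and "NOS" come from the slide, while the list of connecting words and all function names are invented for illustration and are not the authors' actual rules.

```python
# Qualifiers dropped from concept names; both come from the slide examples.
LEADING_QUALIFIERS = {"entire"}
TRAILING_QUALIFIERS = {"nos"}
# Connecting words used to split long multiword concepts (assumed list).
CONNECTORS = {"of", "and"}


def upper_first(text):
    """Upper-case the first character and leave the rest untouched."""
    return text[:1].upper() + text[1:]


def strip_qualifiers(term):
    """Remove a leading or trailing qualifier such as "Entire" or "NOS"."""
    words = term.split()
    if words and words[0].lower() in LEADING_QUALIFIERS:
        words = words[1:]
    if words and words[-1].lower() in TRAILING_QUALIFIERS:
        words = words[:-1]
    return upper_first(" ".join(words))


def split_multiword(term):
    """Split a long concept on connecting words into individual concepts."""
    parts, current = [], []
    for word in term.split():
        if word.lower() in CONNECTORS:
            if current:
                parts.append(" ".join(current))
            current = []
        else:
            current.append(word)
    if current:
        parts.append(" ".join(current))
    return [upper_first(part) for part in parts]


print(strip_qualifiers("Entire lung"))
print(strip_qualifiers("Ear NOS"))
print(split_multiword("Radiography of chest and lung"))
```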



Rule-based SNOMED CT Semantic tagger

---------------------------------------------------------------

This application uses the optimized SNOMED CT dictionary as its knowledge base.

Components shown in the slide diagram:

  • Input: documents containing narratives

  • Processing: Tokenizer, Sentence Splitter, Part Of Speech Tagger, Morphological Analyzer

  • Resources: SNOMED CT knowledge base; rules for extracting concepts and plural concepts

  • Output: colour-coded SNOMED CT semantic types
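The role of morphological analysis in matching plural concepts can be sketched as follows. The suffix rules below are a deliberately crude stand-in for a real morphological analyzer, and the knowledge-base entries and function names are assumptions made for the example.

```python
import re

# Toy knowledge base: singular concept -> semantic type (illustrative entries).
KNOWLEDGE_BASE = {
    "lung": "Body Structure",
    "artery": "Body Structure",
    "biopsy": "Procedure",
}


def singularize(token):
    """Very rough suffix rules; a real morphological analyzer covers far more cases."""
    if token.endswith("ies") and len(token) > 4:
        return token[:-3] + "y"
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        return token[:-1]
    return token


def tag_with_plurals(text):
    """Return (surface token, base form, semantic type) for every known concept."""
    tags = []
    for token in re.findall(r"[A-Za-z]+", text.lower()):
        base = token if token in KNOWLEDGE_BASE else singularize(token)
        if base in KNOWLEDGE_BASE:
            tags.append((token, base, KNOWLEDGE_BASE[base]))
    return tags


print(tag_with_plurals("Biopsies of both lungs and the pulmonary arteries"))
```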



Conclusion and Future Work

---------------------------------------------------------------

  • A corpus containing long multiword concepts has been automatically extracted and tagged with 10 out of 16 SNOMED CT semantic types.

  • Annotation of the unseen test corpus will be completed by domain users to test the SNOMED CT semantic tagger.

  • Optimization of the remaining SNOMED CT semantic types to construct general rules.

  • Corpus annotations will be made available to users through the i2b2 organizers.



Samuel Danso1,3, Eric Atwell1, Owen Johnson1, Guus ten Asbroek2, Seyi Soromekun2, Karen Edmond2, Chris Hurt4, Lisa Hurt2, Charles Zandoh3, Charlotte Tawiah3, Zelee Hill2, Justin Fenty2, Seeba Amenga Etego3, Seth Owusu Agyei2,3, and Betty R Kirkwood2.

1 University of Leeds

2 London School of Hygiene and Tropical Medicine

3 Kintampo Health Research Centre, Ghana

4 University of Cardiff

Presented By

Samuel Danso

PhD Student - NLP Research Group, University of Leeds


21st July 2011



Causes of Death Information – The global picture

About 57 million people die each year

Cause of Death Information is vitally important to health planners and policy makers at all levels.

How do we find out the 67%?

Verbal Autopsy



Use of CoD Information

Source: Byass et al., 2007


  • A verbal autopsy is basically a narrative account of the incident that led to the death of a person.

  • An idea from the 17th century, used in the UK and other developed countries. Now recommended by the WHO as the standard approach in developing countries.



Sample of VA Data for Infant Death I

Coded part



Sample of VA Data for Infant Death II

free text part



Characteristics of corpus



  • Data sparseness and imbalance - 46 categories



Characteristics of corpus: free text

Some Statistics



Characteristics of corpus: free text

  • Spelling and grammatical mistakes posing parsing problems

Misspellings

Grammatical error

Inappropriate use of punctuation marks

“WHEN THE CHILD WAS SIXTEEN (16) DAYS OLD SHE FELL SICK WHICH LAUTED FOR THREE (3) DAYS BEFORE SHE DIED. THE CHILD HAVING DIFFICULT BREATING. ANY TIME, SHE BREATHS, YOU SEE A HOLE IN THE CHEST, AND ALSO MAKING NOISE IN THE CHEST. SHE HAD CONVULSION WHEN SHE WAS SEVERTEEN (17) DAYS OLD BEFORE SHE DIED THE FOLLOWING DAY. SHE ALSO HAD A BULGING FONTENED AND SEVERE HOT BODY WHICH LASTED FOR TWO (2) DAYS BEFORE SHE DIED. THE CHILD ALSO HAD A FIT WHICH SHE COULD NOT OPEN HER MOUTH.”



  • Different ways of expressing the same concept, e.g. delivery:

    • Baby came out

    • Baby landed

    • Gave birth

  • Local words

    • ‘afam’

    • ‘bentoa’

  • Abbreviations

    • ANC = Antenatal Clinic

    • TBA = Traditional Birth Attendant

  • Fuzzy expression of clinical concepts. Sometimes no clinical concept is expressed at all. (“... I visited Kintampo hospital on Tuesday and was given one drip of water ...”)
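One way to reduce this variation before classification is to map known variants and abbreviations to a canonical form. The sketch below uses the phrases and abbreviations listed above; the normalization step itself, the canonical label "delivery" as a replacement string, and the function name are assumptions rather than something described in the slides.

```python
import re

# Surface variants mapped to one canonical concept; phrases are from the slide.
VARIANTS = {
    "baby came out": "delivery",
    "baby landed": "delivery",
    "gave birth": "delivery",
}
# Abbreviation expansions from the slide.
ABBREVIATIONS = {
    "ANC": "antenatal clinic",
    "TBA": "traditional birth attendant",
}


def normalize(text):
    """Expand abbreviations, lower-case, then replace known variant phrases."""
    # Abbreviations are matched as whole upper-case tokens before lower-casing.
    for abbr, expansion in ABBREVIATIONS.items():
        text = re.sub(rf"\b{re.escape(abbr)}\b", expansion, text)
    text = text.lower()
    for phrase, concept in VARIANTS.items():
        text = re.sub(rf"\b{re.escape(phrase)}\b", concept, text)
    return text


print(normalize("The baby landed at home with a TBA"))
```

Local words such as ‘afam’ would need a glossary supplied by people who know the language; none is assumed here.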



  • Missing values (-)

  • 215 variables

  • Entries are coded

    • sex = 1, 2, 8 or 9

    • weight = 1.45, 9.99 or 8.88

  • Continuous revision of the questionnaire, resulting in blank values for some variables
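Coded entries like these usually need a cleaning pass before machine learning. The sketch below treats "-" and blanks as missing and, as an explicit assumption, also treats 8 and 9 for sex and 8.88 and 9.99 for weight as sentinel codes; the slides list these values but do not state their meaning. The variable name age_days is hypothetical.

```python
# Assumed sentinel codes per variable; the slides list the values but not their meaning.
SENTINELS = {
    "sex": {"8", "9"},
    "weight": {"8.88", "9.99"},
}
# Markers treated as missing: the "-" from the slide and blank entries.
MISSING_MARKERS = {"-", ""}


def clean_record(record):
    """Return a copy of the record with missing or sentinel entries set to None."""
    cleaned = {}
    for name, raw in record.items():
        value = raw.strip()
        if value in MISSING_MARKERS or value in SENTINELS.get(name, set()):
            cleaned[name] = None
        else:
            cleaned[name] = value
    return cleaned


print(clean_record({"sex": "2", "weight": "9.99", "age_days": "-"}))
```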


Results: 46 categories – combined dataset

% Accuracy of correctly classified instances


Results: 6 categories – combined dataset


Results: 46 categories – time of death


Results: 6 categories – time of death


Discussion and Conclusion

  • Key lessons

    • CRISP-DM is the appropriate methodology for this project

    • It is feasible to use machine learning techniques to classify CoD in verbal autopsies

    • Splitting the dataset by clinical definitions into homogeneous sets improves classifier performance

    • Classification at the top level of the CoD hierarchy could increase performance across classifiers, because there are fewer classes (46 to 6) and more instances per class.
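The effect of moving to the top level of the hierarchy can be seen by relabelling instances and counting them per class. The mapping and labels below are hypothetical; the actual CoD hierarchy and its 46 and 6 categories are not given in the slides.

```python
from collections import Counter

# Hypothetical fine-grained -> top-level mapping, for illustration only.
TOP_LEVEL = {
    "pneumonia": "infection",
    "sepsis": "infection",
    "meningitis": "infection",
    "birth asphyxia": "intrapartum",
    "prematurity": "preterm",
}


def collapse(labels):
    """Replace each fine-grained label with its top-level category."""
    return [TOP_LEVEL[label] for label in labels]


# Hypothetical labels for six records.
fine = ["pneumonia", "sepsis", "birth asphyxia", "meningitis", "prematurity", "sepsis"]
coarse = collapse(fine)

print(Counter(fine))    # more classes, few instances in each
print(Counter(coarse))  # fewer classes, more instances in each
```

With fewer classes and more training instances per class, a classifier has an easier task, at the cost of a less specific cause of death.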


Discussion and Conclusion

Other uses of the corpus?


e-Health GATEway to the Clouds

http://www.comp.leeds.ac.uk/nlp/e-health

  • WP1: Clouds on the White Rose Grid VRE

    • Deliverables: A secure cloud-based VRE on the White Rose Grid (Month 2), e-health records from TPP stored (Month 3), access and research support tools (Month 3). Iterative refinement (Months 3-5).

  • WP2: GATEway component

    • Deliverables: A GATE plug-in module capable of securely pseudonymising the free text elements of the example e-health records (Month 3). Iterative refinement (Months 3-5).

  • WP3: Evaluation and Sustainability

    • Deliverables: Evaluation of WP1 and WP2 combined into a cohesive e-health VRE (Month 5), sustainability plan (Month 4), dissemination as a case study (mid Month 5), hand-over to ongoing support by YCHI (Month 6)
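To give a feel for what pseudonymising free text involves (the WP2 deliverable), the sketch below replaces matched identifiers with keyed-hash tokens so that the same value always maps to the same pseudonym. It is not the GATE plug-in and does not use GATE; the patterns, the placeholder key and the function names are all assumptions, and real records would need far broader coverage (names, addresses and so on).

```python
import hashlib
import hmac
import re

# Placeholder key (assumption); a real deployment would load a protected secret.
SECRET_KEY = b"replace-with-a-real-secret"

# Illustrative patterns only: a 10-digit identifier and dd/mm/yyyy dates.
PATTERNS = {
    "ID": re.compile(r"\b\d{10}\b"),
    "DATE": re.compile(r"\b\d{2}/\d{2}/\d{4}\b"),
}


def pseudonym(kind, value):
    """Derive a stable token from the value with a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"<{kind}_{digest[:8]}>"


def pseudonymise(text):
    """Replace every pattern match with its pseudonym token."""
    for kind, pattern in PATTERNS.items():
        text = pattern.sub(lambda m, k=kind: pseudonym(k, m.group()), text)
    return text


print(pseudonymise("Patient 1234567890 seen on 21/07/2011, reviewed 21/07/2011."))
```

A keyed hash keeps records linkable across documents without storing a lookup table, but anyone holding the key can test guesses, so the key has to be protected as carefully as the data.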


We welcome e-Health MSc / PhD Projects