
Amit Satsangi amit@cs.ualberta

Amit Satsangi (amit@cs.ualberta.ca). Towards Applying Text Mining and Natural Language Processing for Biomedical Ontology Acquisition. Inniss T., Light M., Thomas G., Lee J., Grassi M., Williams A. TMBIO (2006). Focus: ontology for describing age-related macular degeneration (AMD).


Presentation Transcript


  1. Amit Satsangi (amit@cs.ualberta.ca). Towards Applying Text Mining and Natural Language Processing for Biomedical Ontology Acquisition. Inniss T., Light M., Thomas G., Lee J., Grassi M., Williams A. TMBIO (2006). CMPUT 605

  2. Focus
     • Ontology for describing age-related macular degeneration (AMD)
     • Comparison of the accuracy of three methods for ontology acquisition:
       – Natural Language Processing (NLP)
       – Text Mining (SAS Text Miner)
       – Human expert
     • Manual and ad hoc knowledge acquisition
     • IDOCS (Intelligent Distributed Ontology Consensus System)

  3. Introduction
     • No common, standardized vocabulary exists for classifying disease types for certain eye diseases
     • Clinicians, dispersed geographically, may use different terms to describe the same condition
     • The research aims to extract feature and attribute descriptions for the vocabulary of AMD and to build an ontology from them

  4. Related Work
     • Much research since the 1990s on applying NLP techniques in medicine, biomedicine, etc.
     • NLP and text data mining are recognized as playing an important role in this endeavor
     • Research has focused on online repositories such as Medline and PubMed
     • NLP systems developed: MedLEE, UMLS, GENIES, etc.

  5. IDOCS

  6. Methodology
     • Four clinical experts in retinal diseases were enlisted to view 100 sample eye images of AMD
     • The experts were in different geographic locations
     • They described their observations using digital voice recorders
       – No artificially imposed vocabulary constraints
     • Another retinal expert manually parsed the transcribed text
       – Extracting keywords, organizing keywords into categories, etc.

  7. Results: Human Experts

  8. Methodology: NLP
     • NLP is used for information extraction and automatic summarization
     • Goal: identify collocations, i.e. short sequences of words whose meaning goes beyond the meaning composed directly from their parts (e.g. “extreme programming”)
     • The Ngram Statistics Package (NSP) was used for collocation discovery over bigrams
     • Word-pair associations were measured by pointwise mutual information (PMI)

  9. Methodology: NLP
     • A larger PMI indicates a stronger association between the two words
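
The formula shown on this slide is not preserved in the transcript. As a minimal sketch, assuming the standard definition PMI(x, y) = log2( P(x, y) / (P(x) P(y)) ) estimated from adjacent word pairs, bigram scoring might look like the following. The function name `bigram_pmi` and the toy sentence are illustrative assumptions; this is not the NSP implementation.

```python
import math
from collections import Counter


def bigram_pmi(tokens):
    """Return {(w1, w2): PMI} for every adjacent word pair in `tokens`.

    Assumes the standard definition PMI = log2(P(w1, w2) / (P(w1) * P(w2))),
    with probabilities estimated from bigram counts.
    """
    bigrams = list(zip(tokens, tokens[1:]))
    n = len(bigrams)
    pair_counts = Counter(bigrams)
    left_counts = Counter(w1 for w1, _ in bigrams)    # w1 seen as first word
    right_counts = Counter(w2 for _, w2 in bigrams)   # w2 seen as second word

    scores = {}
    for (w1, w2), observed in pair_counts.items():
        # Count expected if the two words occurred independently of each other.
        expected = left_counts[w1] * right_counts[w2] / n
        scores[(w1, w2)] = math.log2(observed / expected)
    return scores


# Toy input (hypothetical, not from the study's transcripts).
toy_tokens = "a b a b c d".split()
toy_scores = bigram_pmi(toy_tokens)
for pair, score in sorted(toy_scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

Pairs whose words almost always appear together score highest. PMI is known to favor rare pairs, so a minimum frequency cutoff is a common addition in practice.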

  10. Results: NLP

  11. Methodology: Text Mining (SAS Text Miner)
      • A collection of documents (the corpus) is the input to any text mining algorithm
      • The corpus is broken into tokens, or terms (tokens in a particular language)
      • Term weighting measures: Entropy, Inverse Document Frequency (IDF), Global Frequency-IDF (GF-IDF), None (global weight of 1), and Normal
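
The slides name the term weights but do not define them. The sketch below tokenizes a small corpus and computes three of the weights using commonly cited forms: IDF as log2(n/d) + 1, entropy as 1 + Σ p·log2(p) / log2(n), and GF-IDF as g/d, where n is the number of documents, d the number of documents containing the term, g the term's total count, and p the share of that count in each document. These formulas, the regex tokenizer, and the names `tokenize` and `term_weights` are assumptions made for illustration and may differ from what SAS Text Miner computes.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a document and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_weights(docs):
    """Return {term: {"idf": ..., "entropy": ..., "gf_idf": ...}}.

    Formulas are assumed textbook forms, not taken from the slides.
    """
    counts = [Counter(tokenize(doc)) for doc in docs]
    n = len(counts)
    weights = {}
    for term in set().union(*counts):
        freqs = [c[term] for c in counts if term in c]
        g = sum(freqs)   # global frequency of the term in the corpus
        d = len(freqs)   # number of documents containing the term
        idf = math.log2(n / d) + 1
        if n > 1:
            # 1.0 when the term is confined to one document,
            # 0.0 when it is spread evenly over all documents.
            entropy = 1 + sum((f / g) * math.log2(f / g) for f in freqs) / math.log2(n)
        else:
            entropy = 1.0
        weights[term] = {"idf": idf, "entropy": entropy, "gf_idf": g / d}
    return weights


# Toy corpus (hypothetical placeholder documents).
toy_docs = ["a b", "a c", "a a"]
toy_weights = term_weights(toy_docs)
for term in sorted(toy_weights):
    print(term, toy_weights[term])
```

Under these definitions, a term confined to few documents gets a high weight, while a term spread evenly across the whole corpus gets a low one.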

  12. Results: Text Miner
      • Frequency weighting: None
      • Term weighting: Normal

  13. Common Terms

  14. Comparison
      • Text mining is a viable and effective method for determining the vocabulary that describes a particular disease
      • Text mining found many of the terms that NLP found
      • The human expert is the best ground truth
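
One way to make such a comparison concrete is to treat each method's output as a set of terms and measure its overlap with the human expert's list. The sketch below does that with simple set operations. The precision and recall figures are an assumed way of summarizing the overlap, not a measure reported in the slides, and the term lists are made-up placeholders rather than the study's vocabulary.

```python
def compare_to_ground_truth(found, truth):
    """Summarize how a method's term list overlaps with the expert's list."""
    found, truth = set(found), set(truth)
    shared = found & truth
    return {
        "shared": shared,            # terms both the method and the expert produced
        "missed": truth - found,     # expert terms the method did not find
        "extra": found - truth,      # method terms the expert did not list
        "precision": len(shared) / len(found) if found else 0.0,
        "recall": len(shared) / len(truth) if truth else 0.0,
    }


# Placeholder term lists (hypothetical, for illustration only).
expert_terms = {"term_b", "term_c", "term_d", "term_e"}
text_mining_terms = {"term_a", "term_b", "term_c"}

report = compare_to_ground_truth(text_mining_terms, expert_terms)
print(report)
```

The "extra" set is worth reviewing by hand rather than discarding, since it may contain descriptors that the expert overlooked.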

  15. Ontology Generation

  16. Conclusion and Future Work
      • Human experts are the best, but they did miss some key descriptors
      • Text mining and NLP can enhance the generation of feature descriptions by catching descriptors the experts miss
      • As a consequence, a more robust vocabulary can be generated
      • Extension: evaluate the effectiveness of the automated tools, text mining and NLP
      • Different weighting schemes to be tried in the future

  17. Thank You For Your Attention!
