
Determining the Syntactic Structure of Medical Terms in Clinical Notes

Bridget T. McInnes, Ted Pedersen, Serguei V. Pakhomov (bthomson@cs.umn.edu)


Presentation Transcript


  1. Determining the Syntactic Structure of Medical Terms in Clinical Notes. Bridget T. McInnes, Ted Pedersen, Serguei V. Pakhomov (bthomson@cs.umn.edu)

  2. Goal: The goal of this presentation is to present a simple but effective approach to identifying the syntactic structure of three-word terms.

  3. Importance • Potentially improve the analysis of unrestricted medical text • Mapping of medical text to standardized terminologies • Unsupervised syntactic parsing

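  4. Syntactic Structure of Terms • Monolithic: w1 w2 w3 form a single unit • Non-branching: w1, w2 and w3 are independent of one another • Left-branching: [w1 w2] w3 • Right-branching: w1 [w2 w3] (In the slide's diagrams, blue marks independence and green marks dependence.)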

  5. Example: small bowel obstruction

  6. Syntactic Structure of Example • Monolithic: small bowel obstruction as a single unit • Non-branching: small, bowel and obstruction independent of one another • Left-branching: [small bowel] obstruction • Right-branching: small [bowel obstruction]

  7. Method Used to Determine the Structure of a Term
      The Log Likelihood Ratio is based on the ratio between the observed probability of a term and the probability with which it would be expected to occur:
      $$\frac{\text{probability of term occurring}}{\text{expected probability of term}}$$

  8. Log Likelihood Ratio
      The expected probability of a term is often based on the non-branching (independence) model:
      $$\frac{P(\text{small bowel obstruction})}{P(\text{small})\,P(\text{bowel})\,P(\text{obstruction})}$$
      The numerator is the observed probability and the denominator the expected probability.

  9. Extended Log Likelihood Ratio
      The expected probability can also be calculated under two other hypotheses (models), giving three in total: • Non-branching: P(small) P(bowel) P(obstruction) • Left-branching: P(small bowel) P(obstruction) • Right-branching: P(small) P(bowel obstruction)

  10. Three Log Likelihood Ratio Equations
      Non-branching: $$\frac{P(\text{small bowel obstruction})}{P(\text{small})\,P(\text{bowel})\,P(\text{obstruction})}$$
      Left-branching: $$\frac{P(\text{small bowel obstruction})}{P(\text{small bowel})\,P(\text{obstruction})}$$
      Right-branching: $$\frac{P(\text{small bowel obstruction})}{P(\text{small})\,P(\text{bowel obstruction})}$$
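To make the three ratios concrete, here is a minimal Python sketch. It assumes every probability is a relative frequency taken from the same corpus of three-word sequences (so each count is divided by the same total); the class name `TrigramCounts` and all counts are hypothetical, invented for illustration. The ratio computed here is only the quantity inside the log-likelihood statistic, not the Log Likelihood Ratio score used for model fitting.

```python
from dataclasses import dataclass


@dataclass
class TrigramCounts:
    """Hypothetical corpus counts for a term w1 w2 w3 (illustrative, not real data)."""
    term: int    # w1 w2 w3 as a whole
    w1: int      # w1 in first position
    w2: int      # w2 in second position
    w3: int      # w3 in third position
    w1_w2: int   # w1 w2 in first and second positions
    w2_w3: int   # w2 w3 in second and third positions
    total: int   # total number of three-word sequences in the corpus


def observed_over_expected(c: TrigramCounts) -> dict[str, float]:
    """Observed probability of the term divided by its expected probability under each model."""
    def p(n: int) -> float:
        return n / c.total  # relative-frequency estimate

    expected = {
        "non-branching": p(c.w1) * p(c.w2) * p(c.w3),
        "left-branching": p(c.w1_w2) * p(c.w3),
        "right-branching": p(c.w1) * p(c.w2_w3),
    }
    return {model: p(c.term) / e for model, e in expected.items()}


counts = TrigramCounts(term=50, w1=500, w2=400, w3=300, w1_w2=200, w2_w3=60, total=100_000)
for model, ratio in observed_over_expected(counts).items():
    print(f"{model:16s} {ratio:10.2f}")
```

With these made-up counts the left-branching ratio is the smallest: most of the term's frequency is already accounted for by how often "w1 w2" occurs together.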

  11. The expected probability of a term differs under each model, and so does the Log Likelihood Ratio
      Expected probability: • Non-branching: P(small) P(bowel) P(obstruction) • Left-branching: P(small bowel) P(obstruction) • Right-branching: P(small) P(bowel obstruction)
      LL values shown on the slide: 5,169.81; 8,532.90; 11,635.45

  12. Model Fitting: The model with the lowest Log Likelihood Ratio best describes the underlying structure of the term (models and LL values as on slide 11).

  13. Recap • The Log Likelihood Ratio is calculated for each possible model: non-branching, left-branching and right-branching • The probabilities for each model are obtained from a corpus • The term is assigned the structure whose model has the lowest Log Likelihood Ratio

  14. Test Set: 708 three-word terms from SNOMED-CT • Monolithic: 73 terms • Non-branching: 6 terms • Left-branching: 251 terms • Right-branching: 378 terms

  15. Test Set (cont.) • The syntactic structure of each term was determined by the consensus of two medical text indexing experts (kappa = 0.704) • The probabilities were obtained from over 10,000 Mayo Clinic clinical notes

  16. Results (including monolithic terms): chart of percentage agreement with human experts by technique; the three bars show 74.8, 53.4 and 35.5.

  17. Results without Monolithic Terms: chart of percentage agreement with human experts by technique; the three bars show 83.5, 59.5 and 39.5.
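The lower two values on each results slide can be checked against the class sizes from the test set slide. The sketch below assumes that the counts on that slide pair with the labels in the order listed; under that assumption, always answering right-branching would agree with the experts on 53.4% of all terms (59.5% without monolithic terms), and always answering left-branching on 35.5% (39.5%). Reading those bars as single-structure baselines is an inference from the arithmetic, not something the transcript states; the function name `always_assign` is hypothetical.

```python
# Class sizes from the test set slide, assuming the counts follow the label order shown there.
CLASS_SIZES = {"monolithic": 73, "non-branching": 6, "left-branching": 251, "right-branching": 378}


def always_assign(label: str, include_monolithic: bool = True) -> float:
    """Percent agreement if every term were assigned `label`, rounded to one decimal place."""
    pool = {k: v for k, v in CLASS_SIZES.items() if include_monolithic or k != "monolithic"}
    return round(100 * pool[label] / sum(pool.values()), 1)


for include in (True, False):
    scores = {label: always_assign(label, include) for label in ("right-branching", "left-branching")}
    print("with monolithic" if include else "without monolithic", scores)
```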

  18. Limitations • Monolithic structures: these could possibly be identified through collocation extraction or dictionary lookup • As the number of words in a term grows, so does the number of hypotheses (models) to be evaluated; possible remedies are to consider only adjacent models or to limit term length to 5 or 6 words
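To illustrate how quickly the hypothesis space grows, the sketch below enumerates candidate structures under an assumption made here for illustration, not necessarily the authors' model space: a hypothesis is any bracketing in which each bracket groups two or more adjacent units, so a flat bracket corresponds to the non-branching model and monolithic terms are not counted. For three words this yields exactly the non-branching, left-branching and right-branching models; for four, five and six words it yields 11, 45 and 197 candidates. The helper names `splits` and `structures` are hypothetical.

```python
from itertools import combinations, product


def splits(words: tuple[str, ...]):
    """Yield every way to cut `words` into two or more contiguous, non-empty blocks."""
    n = len(words)
    for r in range(1, n):  # r = number of cut points
        for cuts in combinations(range(1, n), r):
            bounds = (0, *cuts, n)
            yield [words[a:b] for a, b in zip(bounds, bounds[1:])]


def structures(words: tuple[str, ...]) -> list[str]:
    """All bracketings of `words` in which every bracket groups two or more units."""
    if len(words) == 1:
        return [words[0]]
    result = []
    for blocks in splits(words):
        # combine every structure of each block with every structure of the others
        for parts in product(*(structures(block) for block in blocks)):
            result.append("[" + " ".join(parts) + "]")
    return result


print(structures(("small", "bowel", "obstruction")))
for n in range(3, 7):
    print(n, "words:", len(structures(tuple(f"w{i}" for i in range(1, n + 1)))), "structures")
```

The first line prints the three structures of "small bowel obstruction" in the order right-branching, left-branching, non-branching; the counts grow as the little Schröder numbers.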

  19. Conclusions • We present a simple but effective method to identify the structure of three-word terms • The method uses the Log Likelihood Ratio • It could be extended to identify the structure of four-, five- and six-word terms

  20. Future Work • Improve the accuracy of the method: explore other measures of association (chi-squared, phi, Dice coefficient, ...) and combine multiple measures • Extend the method to four- and five-word terms; the main difficulty is finding a test set

  21. Software • Ngram Statistics Package (NSP): www.d.umn.edu/~tpederse/nsp.html • Log Likelihood Ratio models: www.cs.umn.edu/~bthomson/mti.html. Thank you.

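  22. Log Likelihood Equation
      $$LL = 2 \sum_{x,y,z} n_{xyz} \log\frac{n_{xyz}}{m_{xyz}}$$
      where $n_{xyz}$ is the observed count in cell $(x, y, z)$ of the term's 2×2×2 contingency table and $m_{xyz}$ is the count expected under the model being tested.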

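  23. Expected Values
      $$LL = 2 \sum_{x,y,z} n_{xyz} \log\frac{n_{xyz}}{m_{xyz}}$$
      Non-branching: $$m_{xyz} = \frac{n_{x++}\, n_{+y+}\, n_{++z}}{n_{+++}^{2}}$$
      Left-branching: $$m_{xyz} = \frac{n_{xy+}\, n_{++z}}{n_{+++}}$$
      Right-branching: $$m_{xyz} = \frac{n_{x++}\, n_{+yz}}{n_{+++}}$$
      A "+" in a subscript denotes summation over that index, so $n_{+++}$ is the total count.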
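The two formula slides translate directly into code. The following is a minimal Python sketch, not NSP itself (NSP is a Perl package whose interface is not reproduced here). It assumes the 2×2×2 contingency table is stored as `n[x][y][z]`, where index 1 means the corresponding word occurs in that position and 0 means it does not, and it uses natural logarithms with 0 · log 0 taken as 0. Note that the non-branching expected count divides by the square of the total. The function names `log_likelihood` and `best_structure` and the toy table are hypothetical.

```python
import math
from itertools import product

Table = list[list[list[int]]]  # n[x][y][z]; 1 = word present in that position, 0 = absent
MODELS = ("non-branching", "left-branching", "right-branching")
BITS = (0, 1)


def log_likelihood(n: Table, model: str) -> float:
    """LL = 2 * sum over cells of n_xyz * log(n_xyz / m_xyz) for the given model."""
    total = sum(n[x][y][z] for x, y, z in product(BITS, repeat=3))
    # one-way marginals n_x++, n_+y+, n_++z
    n_x = [sum(n[x][y][z] for y, z in product(BITS, repeat=2)) for x in BITS]
    n_y = [sum(n[x][y][z] for x, z in product(BITS, repeat=2)) for y in BITS]
    n_z = [sum(n[x][y][z] for x, y in product(BITS, repeat=2)) for z in BITS]
    # two-way marginals n_xy+ and n_+yz
    n_xy = [[sum(n[x][y][z] for z in BITS) for y in BITS] for x in BITS]
    n_yz = [[sum(n[x][y][z] for x in BITS) for z in BITS] for y in BITS]

    ll = 0.0
    for x, y, z in product(BITS, repeat=3):
        if model == "non-branching":
            m = n_x[x] * n_y[y] * n_z[z] / total**2
        elif model == "left-branching":
            m = n_xy[x][y] * n_z[z] / total
        elif model == "right-branching":
            m = n_x[x] * n_yz[y][z] / total
        else:
            raise ValueError(f"unknown model: {model}")
        observed = n[x][y][z]
        if observed > 0:  # empty cells contribute 0 (0 * log 0 = 0)
            ll += observed * math.log(observed / m)
    return 2 * ll


def best_structure(n: Table) -> str:
    """Assign the structure whose model has the lowest log-likelihood score."""
    return min(MODELS, key=lambda model: log_likelihood(n, model))


# Toy table built so that (w1 w2) and w3 are exactly independent: a left-branching term.
toy: Table = [[[560, 140], [80, 20]],
              [[40, 10], [120, 30]]]
for model in MODELS:
    print(f"{model:16s} LL = {log_likelihood(toy, model):8.2f}")
print("assigned structure:", best_structure(toy))
```

Because the toy counts factor exactly as P(w1 w2) P(w3), the left-branching model reproduces them and scores 0, while the other two models score higher. The non-branching model is nested inside both other models, so on any table its score is never lower than theirs.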
