
Semantic Interpretation of Medical Text

Barbara Rosario (SIMS) and Steve Tu (UC Berkeley); advisor: Marti Hearst (SIMS)


Presentation Transcript


  1. Semantic Interpretation of Medical Text • Barbara Rosario, SIMS • Steve Tu, UC Berkeley • Advisor: Marti Hearst, SIMS

  2. Semantic Interpretation of Medical Text • More accurate representation of the content of the input text • Enhance text with information (concepts, relationships) drawn from a medical knowledge source • Determine the semantic meaning of words (and larger constructs) and the relationships between them

  3. Combine Statistical and Symbolic Methods • Use of knowledge bases, semantic hierarchies, medical knowledge, rules • Use of statistical methods and machine learning techniques

  4. Statistical methods • Disambiguation • Detection of semantic patterns • Classification of semantically related constructs • Degrees (weights, probabilities)

  5. First Experiment: Noun Compounds and MeSH • Interpretation of noun compounds is crucially semantic • Noun compounds extracted from a collection of titles and abstracts of medical journal articles found in Medline • MeSH (Medical Subject Headings) concepts used as the semantic labels

  6. Input: Medline text file → preprocessing → tagger → noun compound extraction → MeSH semantic labeling → Output: semantically labeled noun compounds
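The noun compound extraction step can be sketched as below. This is a minimal sketch under two assumptions the slides do not state: the tagger emits Penn Treebank tags (NN, NNS, NNP, NNPS for nouns), and a noun compound is any maximal run of two or more consecutive nouns. The function name `extract_noun_compounds` is hypothetical.

```python
from typing import List, Tuple

# Assumption: Penn Treebank noun tags; the slides do not name the tagset.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def extract_noun_compounds(tagged: List[Tuple[str, str]], min_len: int = 2) -> List[List[str]]:
    """Return maximal runs of consecutive noun tokens containing at least min_len nouns."""
    compounds: List[List[str]] = []
    run: List[str] = []
    for word, tag in tagged:
        if tag in NOUN_TAGS:
            run.append(word)
        else:
            # A non-noun ends the current run; keep it only if it is long enough.
            if len(run) >= min_len:
                compounds.append(run)
            run = []
    # Flush a run that reaches the end of the sentence.
    if len(run) >= min_len:
        compounds.append(run)
    return compounds


tagged = [("prevention", "NN"), ("of", "IN"), ("migraine", "NN"), ("headache", "NN"),
          ("recurrence", "NN"), ("in", "IN"), ("children", "NNS")]
print(extract_noun_compounds(tagged))  # [['migraine', 'headache', 'recurrence']]
```

Single nouns such as "prevention" and "children" are skipped because they do not form a run of at least two nouns.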

  7. MeSH Tree Structures (main) 1. Anatomy [A] 2. Organisms [B] 3. Diseases [C] 4. Chemicals and Drugs [D] 5. Analytical, Diagnostic and Therapeutic Techniques and Equipment [E] 6. Psychiatry and Psychology [F] 7. Biological Sciences [G] 8. Physical Sciences [H] 9. Anthropology, Education, Sociology and Social Phenomena [I] 10. Technology and Food and Beverages [J] 11. Humanities [K] 12. Information Science [L] 13. Persons [M] 14. Health Care [N] 15. Geographic Locations [Z]
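Every MeSH tree number begins with the letter of its top-level category, so a descriptor's category can be read off its first character. A minimal sketch using the fifteen categories listed on this slide; the names `MESH_CATEGORIES` and `top_level_category` are illustrative, not part of any MeSH tool.

```python
# Top-level MeSH categories as listed on the slide (letter -> name).
MESH_CATEGORIES = {
    "A": "Anatomy",
    "B": "Organisms",
    "C": "Diseases",
    "D": "Chemicals and Drugs",
    "E": "Analytical, Diagnostic and Therapeutic Techniques and Equipment",
    "F": "Psychiatry and Psychology",
    "G": "Biological Sciences",
    "H": "Physical Sciences",
    "I": "Anthropology, Education, Sociology and Social Phenomena",
    "J": "Technology and Food and Beverages",
    "K": "Humanities",
    "L": "Information Science",
    "M": "Persons",
    "N": "Health Care",
    "Z": "Geographic Locations",
}


def top_level_category(tree_number: str) -> str:
    """Return the top-level category name for a tree number such as 'C10.228'."""
    letter = tree_number[:1].upper()
    if letter not in MESH_CATEGORIES:
        raise ValueError(f"unknown MeSH category letter in {tree_number!r}")
    return MESH_CATEGORIES[letter]


print(top_level_category("C10.228.140.546.800.525"))  # Diseases
```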

  8. MeSH Tree Structures (node A expanded) • 1. Anatomy [A]: Body Regions [A01] +; Musculoskeletal System [A02] +; Digestive System [A03] +; Respiratory System [A04] +; Urogenital System [A05] +; Endocrine System [A06] +; Cardiovascular System [A07] +; Nervous System [A08] +; Sense Organs [A09] +; Tissues [A10] +; Cells [A11] +; Fluids and Secretions [A12] +; Animal Structures [A13] +; Stomatognathic System [A14] +; Hemic and Immune Systems [A15] +; Embryonic Structures [A16] + • Body Regions [A01]: Abdomen [A01.047] (Groin [A01.047.365]; Inguinal Canal [A01.047.412]; Peritoneum [A01.047.596] +; Retroperitoneal Space [A01.047.681]; Umbilicus [A01.047.849]); Axilla [A01.133]; Back [A01.176] +; Breast [A01.236] +; Buttocks [A01.258]; Extremities [A01.378] +; Head [A01.456] +; Neck [A01.598]; Pelvis [A01.673] +; Perineum [A01.719]; Skin [A01.835] +; Thorax [A01.911] +; Viscera [A01.960]
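Tree numbers encode the hierarchy directly: each dot-separated segment goes one level deeper, so Groin [A01.047.365] sits under Abdomen [A01.047], which sits under Body Regions [A01]. The sketch below derives parents and ancestors from that prefix rule alone. The helper names are hypothetical, and the sketch ignores the fact that a MeSH descriptor can appear at several places in the tree.

```python
from typing import List, Optional


def parent(tree_number: str) -> Optional[str]:
    """Return the parent tree number, or None for a top-level node such as 'A01'."""
    head, sep, _ = tree_number.rpartition(".")
    return head if sep else None


def ancestors(tree_number: str) -> List[str]:
    """Return all ancestors, from the top-level node down to the direct parent."""
    parts = tree_number.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts))]


def depth(tree_number: str) -> int:
    """Number of segments below the category letter: 'A01' -> 1, 'A01.047.365' -> 3."""
    return tree_number.count(".") + 1


print(ancestors("A01.047.365"))  # ['A01', 'A01.047']
```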

  9. Mapping Nouns to MeSH Concepts • Ex: migraine headache recurrence

  10. More Noun Compounds • migraine headache recurrence: C10.228.140.546.800.525, C23.888.592.612.441, C23.550.291.937 • blood plasma perfusion: A12.207.152, A15.145.693, E05.680 • migraine headache pain: C10.228.140.546.800.525, C23.888.592.612.441, G11.561.796.444 • brain stem neurons: A08.186.211, E05.595.402.541.250, A08.663 • rat liver mitochondria: B02.649.865.635.560, A03.620, A11.368.702.564 • plasma arginine vasopressin: A15.145.693, D12.125.095.104, D06.472.734.692.781 • rat thyroid cells: B02.649.865.635.560, A06.407.900, A11 • growth hormone secretion: G07.553.481, D27.505.440.472, A12.200 • blood urea nitrogen: A12.207.152, D02.948, D01.362.625 • breast cancer cells: A01.236, C04, A11 • cancer cell lines: C04, A11, G05.331.599.110.708.330.800.400
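The labeling in this table amounts to a lookup from each noun to a tree number. A minimal sketch that hard-codes a few mappings copied from the slide; it assumes one tree number per noun, as shown here, whereas a real lookup would have to choose among several tree numbers and handle nouns missing from MeSH. `LEXICON`, `label_compound` and `category_pattern` are hypothetical names.

```python
from typing import Dict, List, Optional

# Noun -> tree number, copied from the slide (assumption: one number per noun).
LEXICON: Dict[str, str] = {
    "migraine": "C10.228.140.546.800.525",
    "headache": "C23.888.592.612.441",
    "recurrence": "C23.550.291.937",
    "blood": "A12.207.152",
    "plasma": "A15.145.693",
    "perfusion": "E05.680",
    "breast": "A01.236",
    "cancer": "C04",
    "cells": "A11",
}


def label_compound(compound: str) -> List[Optional[str]]:
    """Map each noun of a compound to its tree number (None if the noun is unknown)."""
    return [LEXICON.get(noun.lower()) for noun in compound.split()]


def category_pattern(compound: str) -> str:
    """Concatenate top-level category letters; '?' marks an unknown noun."""
    return "".join(number[0] if number else "?" for number in label_compound(compound))


print(label_compound("blood plasma perfusion"))  # ['A12.207.152', 'A15.145.693', 'E05.680']
print(category_pattern("breast cancer cells"))   # ACA
```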

  11. Attachment and Semantic Interpretation • Attachment classification • “acute migraine treatment” [[N N] N] (LA, left attachment) • “intra-nasal migraine treatment” [N [N N]] (RA, right attachment) • Used to bootstrap semantic interpretation • Decision tree classifier (Quinlan)
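For a three-word compound the two labels correspond to the only two possible bracketings, so once LA or RA is predicted the structure follows mechanically. A short sketch with a hypothetical `bracket` helper, using the slide's two examples:

```python
from typing import Sequence


def bracket(words: Sequence[str], attachment: str) -> str:
    """Bracket a three-word compound according to its attachment label.

    'LA' (left):  [[w1 w2] w3]
    'RA' (right): [w1 [w2 w3]]
    """
    if len(words) != 3:
        raise ValueError("expected exactly three words")
    w1, w2, w3 = words
    if attachment == "LA":
        return f"[[{w1} {w2}] {w3}]"
    if attachment == "RA":
        return f"[{w1} [{w2} {w3}]]"
    raise ValueError(f"unknown attachment label: {attachment!r}")


print(bracket(["acute", "migraine", "treatment"], "LA"))        # [[acute migraine] treatment]
print(bracket(["intra-nasal", "migraine", "treatment"], "RA"))  # [intra-nasal [migraine treatment]]
```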

  12. Levels of Description • migraine headache recurrence (LA) • C10.228.140.546.800.525 C23.888.592.612.441 C23.550.291.937
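One way to read "levels of description" is to cut each tree number at a chosen depth: level 0 keeps only the category letter, and each further level keeps one more segment of the hierarchy. The sketch below assumes that depth convention; it illustrates the idea and is not necessarily the representation used in the experiments. `describe_at_level` is a hypothetical name.

```python
from typing import List


def describe_at_level(tree_numbers: List[str], level: int) -> List[str]:
    """Truncate each tree number to `level` dot-separated segments.

    Level 0 returns only the category letter; numbers with fewer than
    `level` segments are returned unchanged.
    """
    if level < 0:
        raise ValueError("level must be non-negative")
    result = []
    for number in tree_numbers:
        if level == 0:
            result.append(number[0])
        else:
            result.append(".".join(number.split(".")[:level]))
    return result


# Tree numbers for "migraine headache recurrence" from the slide.
codes = ["C10.228.140.546.800.525", "C23.888.592.612.441", "C23.550.291.937"]
print(describe_at_level(codes, 0))  # ['C', 'C', 'C']
print(describe_at_level(codes, 2))  # ['C10.228', 'C23.888', 'C23.550']
```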

  13. Decision Tree Classification

  14. Expressiveness of Decision Trees
      first noun tree = B: ra (33.0/3.7)
      first noun tree = E: ra (2.0/1.6)
      first noun tree = F: la (0.0)
      first noun tree = G: la (4.0/0.3)
      first noun tree = A:
      |   second noun tree = B: la (0.0)
      |   second noun tree = D: la (4.0/0.3)
      |   second noun tree = E: la (10.0/0.4)
      |   second noun tree = F: la (0.0)
      |   second noun tree = G: la (6.0/1.6)
      |   second noun tree = A:
      |   |   first tree position <= 4 : ra (7.0/1.6)
      |   |   first tree position > 4 : la (36.0/5.8)
      |   second noun tree = C:
      |   |   third noun tree = A: ra (9.0/0.3)
      |   |   third noun tree = B: la (0.0)
      |   |   third noun tree = D: la (1.0/0.3)
      |   |   third noun tree = E: la (5.0/0.3)
      |   |   third noun tree = F: la (0.0)
      |   |   third noun tree = G: ra (2.0/1.6)
      |   |   third noun tree = C:
      |   |   |   third tree position <= 21 : ra (5.0/2.6)
      |   |   |   third tree position > 21 : la (5.0/0.3)
      first noun tree = C:
      …..
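The printed tree is equivalent to nested conditionals. The sketch below transcribes only the branches shown on the slide and returns None for the elided "first noun tree = C" branch and for any value the slide does not list. It assumes the noun-tree features are top-level MeSH letters; the slides do not define the "tree position" features, so they are passed in as opaque integers. The leaf counts in parentheses are not used, and `classify_attachment` is a hypothetical name.

```python
from typing import Optional


def classify_attachment(first: str, second: str, third: str,
                        first_pos: int = 0, third_pos: int = 0) -> Optional[str]:
    """Return 'la' or 'ra' by following the branches printed on the slide.

    first/second/third: top-level MeSH letters of the three nouns (assumed).
    first_pos/third_pos: the slide's "tree position" features, treated as opaque ints.
    Returns None for branches the slide elides.
    """
    if first in ("B", "E"):
        return "ra"
    if first in ("F", "G"):
        return "la"
    if first == "A":
        if second in ("B", "D", "E", "F", "G"):
            return "la"
        if second == "A":
            return "ra" if first_pos <= 4 else "la"
        if second == "C":
            if third in ("A", "G"):
                return "ra"
            if third in ("B", "D", "E", "F"):
                return "la"
            if third == "C":
                return "ra" if third_pos <= 21 else "la"
    # The "first noun tree = C" subtree (and any unlisted value) is elided on the slide.
    return None


print(classify_attachment("A", "A", "C", first_pos=6))  # la
```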

  15. Semantic Interpretation • Use decision tree paths for the detection of clusters of noun compounds with the same semantic interpretation

  16. Ex: ACA: <anatomy> <disease> <anatomy>

  17. Ex: ACE: <anatomy> <disease> <Analytical, Diagnostic and Therapeutic Techniques and Equipment>
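A simplified stand-in for clustering by tree paths is to group compounds whose nouns fall into the same sequence of top-level categories, which yields keys like ACA and ACE. Real tree paths may also test tree positions and may not test every noun, so this is an approximation, not the method used in the study. The sketch uses compounds and tree numbers from the "More Noun Compounds" slide; `cluster_by_categories` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List

# Compound -> tree numbers of its three nouns, taken from the "More Noun Compounds" slide.
COMPOUNDS: Dict[str, List[str]] = {
    "rat liver mitochondria": ["B02.649.865.635.560", "A03.620", "A11.368.702.564"],
    "rat thyroid cells": ["B02.649.865.635.560", "A06.407.900", "A11"],
    "plasma arginine vasopressin": ["A15.145.693", "D12.125.095.104", "D06.472.734.692.781"],
    "blood urea nitrogen": ["A12.207.152", "D02.948", "D01.362.625"],
    "breast cancer cells": ["A01.236", "C04", "A11"],
}


def cluster_by_categories(compounds: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Group compounds whose nouns fall in the same top-level categories, in order."""
    clusters: Dict[str, List[str]] = defaultdict(list)
    for compound, numbers in compounds.items():
        key = "".join(number[0] for number in numbers)
        clusters[key].append(compound)
    return dict(clusters)


for key, members in cluster_by_categories(COMPOUNDS).items():
    print(key, members)
# BAA ['rat liver mitochondria', 'rat thyroid cells']
# ADD ['plasma arginine vasopressin', 'blood urea nitrogen']
# ACA ['breast cancer cells']
```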

  18. From MeSH to UMLS • Unified Medical Language System (UMLS), a project of the U.S. National Library of Medicine • Three UMLS Knowledge Sources: • Metathesaurus • Semantic Network • SPECIALIST lexicon and programs

  19. Metathesaurus • Most extensive of the UMLS sources • 730,000 concepts representing more than 1,500,000 strings in over 60 vocabularies and classifications • Organized by concept or meaning • In essence, its purpose is to link alternative names and views of the same concept and to identify useful relationships between different concepts • Relationships in the Metathesaurus come from the sources themselves or are created by the Metathesaurus editors
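The Metathesaurus idea (many strings from different vocabularies linked to one concept, plus relationships between concepts) can be pictured as a pair of maps. This toy sketch uses made-up concept identifiers (`CONCEPT_1`, ...), source names and relationship labels; it does not reproduce the Metathesaurus file format or its real identifiers, and it simplifies by letting each string name only one concept.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class ConceptIndex:
    """Toy concept index: names from several vocabularies point to one concept."""

    def __init__(self) -> None:
        # Simplification: each (lower-cased) string names exactly one concept.
        self.string_to_concept: Dict[str, str] = {}
        # concept id -> set of (name, source vocabulary) pairs
        self.concept_to_names: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
        # (head concept, relationship label, tail concept) triples
        self.relations: List[Tuple[str, str, str]] = []

    def add_name(self, concept_id: str, name: str, source: str) -> None:
        """Register `name`, taken from vocabulary `source`, as a name of `concept_id`."""
        self.string_to_concept[name.lower()] = concept_id
        self.concept_to_names[concept_id].add((name, source))

    def add_relation(self, head: str, label: str, tail: str) -> None:
        """Record a relationship between two concepts."""
        self.relations.append((head, label, tail))

    def same_concept(self, a: str, b: str) -> bool:
        """True if both strings are registered names of the same concept."""
        concept_a = self.string_to_concept.get(a.lower())
        return concept_a is not None and concept_a == self.string_to_concept.get(b.lower())

    def related(self, concept_id: str) -> List[Tuple[str, str]]:
        """Relationships leaving `concept_id`, as (label, tail concept) pairs."""
        return [(label, tail) for head, label, tail in self.relations if head == concept_id]


# Usage with hypothetical identifiers, sources and relationship label.
index = ConceptIndex()
index.add_name("CONCEPT_1", "Migraine", "SOURCE_A")
index.add_name("CONCEPT_1", "Migraine Headache", "SOURCE_B")
index.add_name("CONCEPT_2", "Headache", "SOURCE_A")
index.add_relation("CONCEPT_1", "RELATED_TO", "CONCEPT_2")
print(index.same_concept("migraine", "MIGRAINE HEADACHE"))  # True
```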

  20. Semantic Network • Consistent categorization of all concepts represented in the UMLS Metathesaurus and of the important relationships between them • Every concept has been assigned a semantic type • The semantic types (134) are the nodes of the network, and the relationships between them (54) are the links • High-level semantic structure
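Because every concept carries a semantic type and the network links types, a relationship between two concepts can be proposed by looking up the links between their types. The sketch below uses a hypothetical three-link excerpt and hypothetical concept-to-type assignments: the type and relation names follow UMLS style, but the triples were not checked against the actual Semantic Network, and a real concept may be assigned more than one type. `links_between` is a hypothetical name.

```python
from typing import Dict, List, Tuple

# Hypothetical excerpt of (type, relation, type) links, assumed for illustration.
LINKS: List[Tuple[str, str, str]] = [
    ("Pharmacologic Substance", "treats", "Disease or Syndrome"),
    ("Therapeutic or Preventive Procedure", "treats", "Disease or Syndrome"),
    ("Body Part, Organ, or Organ Component", "location_of", "Disease or Syndrome"),
]

# Hypothetical concept -> semantic type assignments (one type per concept here).
SEMANTIC_TYPE: Dict[str, str] = {
    "migraine": "Disease or Syndrome",
    "breast": "Body Part, Organ, or Organ Component",
    "aspirin": "Pharmacologic Substance",
}


def links_between(concept_a: str, concept_b: str) -> List[str]:
    """Relation labels leading from the type of concept_a to the type of concept_b."""
    type_a = SEMANTIC_TYPE.get(concept_a)
    type_b = SEMANTIC_TYPE.get(concept_b)
    return [label for head, label, tail in LINKS if head == type_a and tail == type_b]


print(links_between("aspirin", "migraine"))  # ['treats']
print(links_between("migraine", "aspirin"))  # [] (links are directed)
```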

  21. "Biologic Function" Hierarchy

  22. Noun Compounds, again • Very preliminary studies… • Can we use the information in the Semantic Network for the semantic interpretation of noun compounds? • Are semantic types and relationships good descriptors? Are they useful for disambiguation and classification?

  23. Mapping of Noun Compounds

  24. Mapping of Noun Compounds

  25. Mapping of Noun Compounds

  26. Mapping Words to Semantic Types and Semantic Relationships • Semantic types correctly assigned (246 noun compounds, 738 nouns): 59% • Semantic types disambiguated by the relationships: • Does not disambiguate: 42.7% • Disambiguates incorrectly: 17.3% • Disambiguates correctly: 40%
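The slide does not spell out how the relationships disambiguate semantic types. One plausible reading, sketched below for a pair of nouns: keep only the type combinations that some Semantic Network link connects, and call the result disambiguated when exactly one combination survives. The linked pairs and example types are hypothetical, and `disambiguate` and `outcome` are illustrative names. The three outcomes mirror the categories above, whose rates sum to 100% (42.7 + 17.3 + 40).

```python
from itertools import product
from typing import List, Optional, Set, Tuple

# Hypothetical: unordered pairs of semantic types joined by some Semantic Network link.
LINKED: Set[Tuple[str, str]] = {
    ("Body Part, Organ, or Organ Component", "Disease or Syndrome"),
    ("Pharmacologic Substance", "Disease or Syndrome"),
}


def is_linked(type_a: str, type_b: str) -> bool:
    """Check the pair in either order, since LINKED stores each pair once."""
    return (type_a, type_b) in LINKED or (type_b, type_a) in LINKED


def disambiguate(candidates_1: List[str],
                 candidates_2: List[str]) -> Optional[Tuple[str, str]]:
    """Return the only linked combination of candidate types, else None."""
    surviving = [pair for pair in product(candidates_1, candidates_2) if is_linked(*pair)]
    return surviving[0] if len(surviving) == 1 else None


def outcome(predicted: Optional[Tuple[str, str]], gold: Tuple[str, str]) -> str:
    """Classify a result into the slide's three categories."""
    if predicted is None:
        return "does not disambiguate"
    return "correct" if predicted == gold else "incorrect"


BODY = "Body Part, Organ, or Organ Component"
DISEASE = "Disease or Syndrome"
print(disambiguate([BODY], [DISEASE, "Finding"]))  # ('Body Part, Organ, or Organ Component', 'Disease or Syndrome')
```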

  27. (Some of the) Future Work • Explore the UMLS sources in more depth • What forms the best basis for automatic semantic interpretation of noun phrases? • Semantic types? • Metathesaurus concepts (and which parts of them)? • Just MeSH concepts? • Machine learning algorithms to help choose a good representation of medical terms

  28. Future Work • Machine learning algorithms for classification • Can we generalize the patterns found for noun compounds to other syntactic structures, and how? • How can we best represent semantics formally? • How can we combine symbolic rules with statistical methods? • How can we deal with non-medical words? • Can the system help us disambiguate them? • Should we use other ontologies (e.g., WordNet)?
