
Classification of Semantic Relations in Noun Compounds via a Domain-Specific Lexical Hierarchy

Classification of Semantic Relations in Noun Compounds via a Domain-Specific Lexical Hierarchy. Barbara Rosario, Marti Hearst SIMS, UC Berkeley. LINDI Project. Goal: Extract semantics from text Method: statistical corpus analysis Focus: Biomedical text Rich lexical resources


Presentation Transcript


  1. Classification of Semantic Relations in Noun Compounds via a Domain-Specific Lexical Hierarchy Barbara Rosario, Marti Hearst SIMS, UC Berkeley

  2. LINDI Project • Goal: Extract semantics from text • Method: statistical corpus analysis • Focus: Biomedical text • Rich lexical resources • Semantic NLP problems • Noun Compounds

  3. Noun Compounds (NCs) • Any sequence of nouns that itself functions as a noun • asthma hospitalizations • asthma hospitalization rates • health care personnel hand wash • Technical text is rich with NCs • Example: “Open-labeled long-term study of the subcutaneous sumatriptan efficacy and tolerability in acute migraine treatment.”

  4. NCs: 3 computational tasks (Lauer & Dras ’94) • Identification • Syntactic analysis (attachments) • [Baseline [headache frequency]] • [[Tension headache] patient] • Semantic analysis • Headache treatment → treatment for headache • Corticosteroid treatment → treatment that uses corticosteroid

  5. Outline • Classification schema for NC relations in the biomedical domain • Experiments • Supervised learning for classification of NC relations • Examine generalization over lexical items using a lexical hierarchy • Related work • Conclusions

  6. NC Semantic relations • 38 Relations found by iterative refinement based on 2245 NCs • Goals: • More specific than case roles • Allow for domain-specific relations

  7. Semantic relations • Frequency/time of • influenza season, headache interval • Measure of • relief rate, asthma mortality, hospital survival • Instrument • aciclovir therapy, laser irradiation, aerosol treatment • “Purpose” • headache drugs, hiv medications, influenza treatment • Defect • hormone deficiency, csf fistulas, gene mutation • Inhibitor • Adrenoreceptor blockers, influenza prevention

  8. Semantic relations • Cause • Asthma hospitalization, aids death • Change • Papilloma growth, disease development • Activity/Physical Process • Bile delivery, virus reproduction • Person Afflicted • Aids patients, headache group • ….

  9. Multi-class Assignment • Some NCs can be described by more than one semantic relation • eyelid abnormalities: location and defect • food allergy: cause and activator • cell growth: change and activity

  10. NC Semantic Relations • Linguistic theories regarding the nature of the relations between constituents in NCs all conflict • J. Levi ’78 • P. Downing ’77 • B. Warren ’78

  11. Extraction of NCs • Titles and abstracts from Medline (medical bibliographic database) • Part-of-Speech Tagger • Extraction of sequences of units tagged as nouns • Collection of 2245 NCs with 2 nouns
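The extraction step on slide 11 can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the tagger emits Penn Treebank noun tags (`NN`, `NNS`, `NNP`, `NNPS`), and the function name `extract_noun_compounds` and the sample sentence are hypothetical.

```python
# Assumption: the POS tagger uses Penn Treebank noun tags.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def extract_noun_compounds(tagged, length=2):
    """Return maximal runs of noun-tagged tokens containing exactly `length` nouns."""
    compounds = []
    run = []
    # A sentinel non-noun token flushes a run that ends the sentence.
    for word, tag in list(tagged) + [("", "END")]:
        if tag in NOUN_TAGS:
            run.append(word.lower())
        else:
            if len(run) == length:
                compounds.append(tuple(run))
            run = []
    return compounds


# Hypothetical tagged sentence built from NCs that appear on the slides.
tagged = [
    ("Asthma", "NN"), ("hospitalizations", "NNS"), ("increased", "VBD"),
    ("during", "IN"), ("the", "DT"), ("influenza", "NN"), ("season", "NN"),
    (".", "."),
]
print(extract_noun_compounds(tagged))
```

Only runs of exactly two nouns are kept, mirroring the slide's restriction to NCs with 2 nouns; longer runs are discarded rather than split.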

  12. Models • Lexical (words) • Class based model using MeSH descriptors
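One plausible way to turn either model into classifier input is to one-hot encode each noun's feature and concatenate the two halves. The slides do not describe the encoding, so this is an assumption; `one_hot_pair` and the tiny vocabularies are hypothetical. The MeSH codes are top-level prefixes of the tree numbers shown on slide 15.

```python
def one_hot_pair(features, vocab_first, vocab_second):
    """Concatenate a one-hot vector for the first noun with one for the second."""
    first, second = features
    vec = [0] * (len(vocab_first) + len(vocab_second))
    vec[vocab_first.index(first)] = 1
    vec[len(vocab_first) + vocab_second.index(second)] = 1
    return vec


# Lexical model: each feature is the word itself.
words_first = ["headache", "influenza"]
words_second = ["pain", "season"]
print(one_hot_pair(("headache", "pain"), words_first, words_second))

# Class-based model: each feature is a MeSH descriptor prefix
# (headache -> C23, recurrence -> C23, pain -> G11).
mesh_first = ["C23"]
mesh_second = ["C23", "G11"]
print(one_hot_pair(("C23", "G11"), mesh_first, mesh_second))  # headache pain
print(one_hot_pair(("C23", "C23"), mesh_first, mesh_second))  # headache recurrence
```

Under the class-based encoding, different words that share a descriptor produce the same feature, which is what allows generalization to words absent from training.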

  13. MeSH Tree Structures 1. Anatomy [A] 2. Organisms [B] 3. Diseases [C] 4. Chemicals and Drugs [D] 5. Analytical, Diagnostic and Therapeutic Techniques and Equipment [E] 6. Psychiatry and Psychology [F] 7. Biological Sciences [G] 8. Physical Sciences [H] 9. Anthropology, Education, Sociology and Social Phenomena [I] 10. Technology and Food and Beverages [J] 11. Humanities [K] 12. Information Science [L] 13. Persons [M] 14. Health Care [N] 15. Geographic Locations [Z]

  14. 1. Anatomy [A] Body Regions [A01] + Musculoskeletal System [A02] Digestive System [A03] + Respiratory System [A04] + Urogenital System [A05] + Endocrine System [A06] + Cardiovascular System [A07] + Nervous System [A08] + Sense Organs [A09] + Tissues [A10] + Cells [A11] + Fluids and Secretions [A12] + Animal Structures [A13] + Stomatognathic System [A14] (…..) Body Regions [A01] Abdomen [A01.047] Groin [A01.047.365] Inguinal Canal [A01.047.412] Peritoneum [A01.047.596] + Umbilicus [A01.047.849] Axilla [A01.133] Back [A01.176] + Breast [A01.236] + Buttocks [A01.258] Extremities [A01.378] + Head [A01.456] + Neck [A01.598] (….) MeSH Tree Structures
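Because each MeSH tree number spells out the path from the root, ancestry can be tested with a prefix check. The sketch below uses tree numbers from slide 14; the helper name `is_under` is hypothetical.

```python
def is_under(tree_number, ancestor):
    """True if `tree_number` equals `ancestor` or lies below it in the tree."""
    # Appending "." avoids false matches such as A01.04 vs. A01.047.
    return tree_number == ancestor or tree_number.startswith(ancestor + ".")


print(is_under("A01.047.365", "A01.047"))  # Groin under Abdomen
print(is_under("A01.133", "A01.047"))      # Axilla under Abdomen
print(is_under("A01.047", "A01"))          # Abdomen under Body Regions
```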

  15. Mapping Nouns to MeSH Concepts • headache recurrence • C23.888.592.612.441 C23.550.291.937 • headache pain • C23.888.592.612.441 G11.561.796.444 • breast cancer cells • A01.236 C04 A11
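The mapping on slide 15 amounts to a per-word lookup. The sketch below hard-codes only the tree numbers shown on the slide; `MESH_TABLE` and `map_compound` are hypothetical names, and a real lookup would need to handle terms with several tree numbers and words missing from MeSH.

```python
# Tree numbers copied from slide 15; nothing else is assumed about MeSH.
MESH_TABLE = {
    "headache": "C23.888.592.612.441",
    "recurrence": "C23.550.291.937",
    "pain": "G11.561.796.444",
    "breast": "A01.236",
    "cancer": "C04",
    "cells": "A11",
}


def map_compound(nc):
    """Map each noun of a compound to its tree number (None if unmapped)."""
    return [MESH_TABLE.get(word) for word in nc.lower().split()]


print(map_compound("headache recurrence"))
print(map_compound("breast cancer cells"))
```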

  16. Levels of Description headache pain • MeSH 2: C.23 G.11 • MeSH 3: C23.888 G11.561 • MeSH 4: C23.888.592 G11.561.796 • MeSH 5: C23.888.592 G11.561.796 • MeSH 6: C23.888.592.612 G11.561.796.444
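Coarser descriptions can be obtained by truncating a tree number. The sketch below assumes that "MeSH k" keeps the first k-1 dot-separated components, which is consistent with the MeSH 3 and MeSH 4 rows on slide 16; the function name `truncate` is hypothetical.

```python
def truncate(tree_number, level):
    """Keep the first (level - 1) dot-separated components of a tree number.

    Assumption: the category letter counts as level 1, so level 2 keeps
    only the first component (e.g. "C23").
    """
    parts = tree_number.split(".")
    return ".".join(parts[: max(level - 1, 1)])


for level in (2, 3, 4):
    print(level, truncate("C23.888.592.612.441", level),
          truncate("G11.561.796.444", level))
```

A tree number shorter than the requested level is returned unchanged, so shallow concepts keep their full description at deeper levels.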

  17. Classification Task & Method • Multi-class (18) classification problem • Multi-layer neural networks to classify across all relations simultaneously • Evaluation: distinguish between • Seen: NCs where 1 or 2 words appeared in the training set • Unseen: NCs in which neither word appeared in the training set
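The seen/unseen distinction used for evaluation can be sketched as a vocabulary check. The function name `split_seen_unseen` and the toy training and test sets are hypothetical; the NCs themselves come from the slides.

```python
def split_seen_unseen(train, test):
    """Partition test NCs by whether any of their words occurred in training."""
    vocab = {word for nc in train for word in nc}
    seen = [nc for nc in test if any(word in vocab for word in nc)]
    unseen = [nc for nc in test if not any(word in vocab for word in nc)]
    return seen, unseen


train = [("influenza", "season"), ("headache", "recurrence")]
test = [("pollen", "season"), ("drive", "time"), ("headache", "pain")]

seen, unseen = split_seen_unseen(train, test)
print("seen:", seen)
print("unseen:", unseen)
```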

  18. Accuracy for 18-way Classification (Lexical and MeSH models) • Correct answer in first three (76%-78%) • Correct answer in first two (71%-73%) • Correct answer ranked first (61%-62%) • Logistic Regression (31%) • Guessing (1/18 = 5%)
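The "first", "first two", and "first three" figures are top-k accuracies: a prediction counts as correct if the gold relation is among the k highest-scoring classes. A minimal sketch follows, with a hypothetical function name and made-up scores for three classes rather than the 18 relations.

```python
def top_k_accuracy(scores, gold, k):
    """Fraction of items whose gold label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, gold):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(gold)


# Illustrative scores only; these are not results from the presentation.
scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
gold = [1, 1, 0]

for k in (1, 2, 3):
    print(k, top_k_accuracy(scores, gold, k))
```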

  19. Accuracies for 18-way classification: generalization on unseen NCs (chart series: MeSH, MeSH on unseen, Lexical, Lexical on unseen)

  20. Accuracy for each relation

  21. Accuracy for sample relations • Frequency/time of • Test Set: disease recurrence, headache recurrence, enterovirus season, influenza season, mosquito season, pollen season, disease stage, transcription stage, drive time, injection time, ischemia time, travel time

  22. Accuracy for sample relations • Produces (genetic) • Ex. Test Set: thymidine allele, tumor dna, csf mrna, acetylase gene, virion rna (…)

  23. Accuracy for sample relations • Purpose • Test Set: varicella vaccine, influenza vaccination, influenza immunization, abscess drainage, disease treatment, asthma therapy • Training Set: Instrument: antigen vaccine; Object: vaccine development; Subtype-of: opv vaccine

  24. Related work (Noun Compound Relations) • Finin (1980) • Detailed AI analysis, hand-coded • Rindflesch et al. (2000) • Hand-coded rule base to extract certain types of assertions

  25. Related work (Noun Compound Relations) • Vanderwende (1994) • automatically extracts semantic information from an on-line dictionary • manipulates a set of handwritten rules • 13 classes • 52% accuracy • Lapata (2000) • classifies nominalizations into subject/object • binary distinction • 80% accuracy • Lauer (1995) • probabilistic model • 8 classes • 47% accuracy

  26. Related work (Lexical Hierarchies) • Prepositional Phrase Attachment • Attachment, not semantics • Binary choice • Approaches • Word occurrences (Hindle & Rooth ’93) • Using a lexical hierarchy • Conceptual association using a lexical hierarchy (Resnik ’93, Resnik & Hearst ’93) • Transformation-based incorporating counts from a lexical hierarchy (Brill & Resnik ’94) • MDL to find optimal tree cut (Li & Abe ’98) finds improvements over lexical

  27. Conclusions • A simple method for assigning semantic relations to noun compounds • Does not require complex hand-coded rules • Does make use of existing lexical resources • Off-the-shelf ML algorithms • High accuracy levels for an 18-way class assignment • ~60% accuracy on mixed seen and unseen words • ~40% accuracy on entirely unseen words on a tiny training set (73 NCs)

  28. Future work • Analysis of erroneous cases • Other statistical models • Bootstrapping & Active learning for labeling • NCs with > 2 terms • [[growth hormone] deficiency] • (purpose + defect) • Other syntactic structures • Non-biomedical words • Other ontologies (e.g., WordNet)?

  29. Relations

  30. Accuracies by Unseen Noun
