Classification of Semantic Relations in Noun Compounds via a Domain-Specific Lexical Hierarchy


Barbara Rosario, Marti Hearst

SIMS, UC Berkeley

LINDI Project
  • Goal: Extract semantics from text
  • Method: statistical corpus analysis
  • Focus: Biomedical text
    • Rich lexical resources
    • Semantic NLP problems
      • Noun Compounds
Noun Compounds (NCs)
  • Any sequence of nouns that itself functions as a noun
    • asthma hospitalizations
    • asthma hospitalization rates
    • health care personnel hand wash
  • Technical text is rich with NCs

Open-labeled long-term study of the subcutaneous sumatriptan efficacy and tolerability in acute migraine treatment.

NCs: 3 computational tasks (Lauer & Dras ’94)
  • Identification
  • Syntactic analysis (attachments)
      • [Baseline [headache frequency]]
      • [[Tension headache] patient]
  • Semantic analysis
      • headache treatment → treatment for headache
      • corticosteroid treatment → treatment that uses corticosteroid
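
The two attachment patterns for a three-noun compound can be written down as nested tuples. This is only an illustrative sketch: the helper names `bracketings` and `show` are made up here and do not come from the presentation.

```python
def bracketings(nouns):
    """Return both binary bracketings of a three-noun compound."""
    if len(nouns) != 3:
        raise ValueError("this sketch only handles three-noun compounds")
    n1, n2, n3 = nouns
    left = ((n1, n2), n3)    # [[n1 n2] n3], as in [[tension headache] patient]
    right = (n1, (n2, n3))   # [n1 [n2 n3]], as in [baseline [headache frequency]]
    return left, right


def show(tree):
    """Render a nested tuple in square-bracket notation."""
    if isinstance(tree, str):
        return tree
    return "[" + " ".join(show(child) for child in tree) + "]"


left, right = bracketings(["tension", "headache", "patient"])
print(show(left))   # [[tension headache] patient]
print(show(right))  # [tension [headache patient]]
```

Syntactic analysis amounts to choosing between these candidates; the code only enumerates them and makes no attempt to pick the right one.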
Outline
  • Classification schema for NC relations in the biomedical domain
  • Experiments
    • Supervised learning for classification of NC relations
    • Examine generalization over lexical items using a lexical hierarchy
  • Related work
  • Conclusions
NC Semantic relations
  • 38 Relations found by iterative refinement based on 2245 NCs
  • Goals:
    • More specific than case roles
    • Allow for domain-specific relations
Semantic relations
  • Frequency/time of
    • influenza season, headache interval
  • Measure of
    • relief rate, asthma mortality, hospital survival
  • Instrument
    • aciclovir therapy, laser irradiation, aerosol treatment
  • “Purpose”
    • headache drugs, hiv medications, influenza treatment
  • Defect
    • hormone deficiency, csf fistulas, gene mutation
  • Inhibitor
    • Adrenoreceptor blockers, influenza prevention
Semantic relations
  • Cause
    • Asthma hospitalization, aids death
  • Change
    • Papilloma growth, disease development
  • Activity/Physical Process
    • Bile delivery, virus reproduction
  • Person Afflicted
    • Aids patients, headache group
  • ….
Multi-class Assignment
  • Some NCs can be described by more than one semantic relation
      • eyelid abnormalities : location and defect
      • food allergy: cause and activator
      • cell growth: change and activity
NC Semantic Relations
  • Linguistic theories regarding the nature of the relations between constituents in NCs all conflict.
    • J. Levi ’78
    • P. Downing ’77
    • B. Warren ’78
Extraction of NCs
  • Titles and abstracts from Medline (medical bibliographic database)
  • Part-of-Speech Tagger
  • Extraction of sequences of units tagged as nouns
  • Collection of 2245 NCs with 2 nouns
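
The extraction step above can be sketched as a scan for runs of noun-tagged tokens. The slides do not name the tagger or its tagset, so the Penn Treebank style tags (`NN`, `NNS`, ...), the function name `extract_noun_compounds`, and the choice to keep only maximal runs of exactly two nouns are all assumptions of this sketch.

```python
def extract_noun_compounds(tagged_tokens, length=2):
    """Collect maximal runs of noun-tagged tokens that contain exactly `length` nouns."""
    compounds, run = [], []
    # A sentinel non-noun token flushes the final run.
    for word, tag in list(tagged_tokens) + [("", "END")]:
        if tag.startswith("NN"):  # assumed Penn Treebank style noun tags
            run.append(word.lower())
        else:
            if len(run) == length:
                compounds.append(tuple(run))
            run = []
    return compounds


# Hand-tagged toy sentence; a real pipeline would consume tagger output instead.
sentence = [
    ("Headache", "NN"), ("recurrence", "NN"), ("was", "VBD"),
    ("reduced", "VBN"), ("after", "IN"), ("asthma", "NN"),
    ("hospitalization", "NN"), ("rates", "NNS"), ("fell", "VBD"),
]
print(extract_noun_compounds(sentence))  # [('headache', 'recurrence')]
```

The three-noun run "asthma hospitalization rates" is skipped because only two-noun compounds are collected.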
Models
  • Lexical (words)
  • Class based model using MeSH descriptors
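
One way to picture the difference between the two models is through their input vectors. The slides do not spell out the encoding, so the one-hot scheme below, the helper names, and the tiny vocabularies are assumptions made for illustration only.

```python
def one_hot(item, vocabulary):
    """Return a 0/1 vector with a single 1 at the position of `item`."""
    return [1 if entry == item else 0 for entry in vocabulary]


def encode_compound(units, vocabulary):
    """Concatenate one one-hot vector per noun position."""
    vector = []
    for unit in units:
        vector.extend(one_hot(unit, vocabulary))
    return vector


# Lexical model: the units are the words themselves.
words = ["headache", "pain", "recurrence"]
lexical = encode_compound(["headache", "pain"], words)

# Class-based model: the units are MeSH descriptors (cut to a short prefix here),
# so different words under the same descriptor share an input.
classes = ["C23", "G11"]
class_based = encode_compound(["C23", "G11"], classes)

print(lexical)      # [1, 0, 0, 0, 1, 0]
print(class_based)  # [1, 0, 0, 1]
```

With class-based inputs the vector size depends on the number of descriptors rather than the number of distinct words.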
MeSH Tree Structures

1. Anatomy [A]

2. Organisms [B]

3. Diseases [C]

4. Chemicals and Drugs [D]

5. Analytical, Diagnostic and Therapeutic Techniques and Equipment [E]

6. Psychiatry and Psychology [F]

7. Biological Sciences [G]

8. Physical Sciences [H]

9. Anthropology, Education, Sociology and Social Phenomena [I]

10. Technology and Food and Beverages [J]

11. Humanities [K]

12. Information Science [L]

13. Persons [M]

14. Health Care [N]

15. Geographic Locations [Z]

MeSH Tree Structures

1. Anatomy [A]

Body Regions [A01] +
Musculoskeletal System [A02]
Digestive System [A03] +
Respiratory System [A04] +
Urogenital System [A05] +
Endocrine System [A06] +
Cardiovascular System [A07] +
Nervous System [A08] +
Sense Organs [A09] +
Tissues [A10] +
Cells [A11] +
Fluids and Secretions [A12] +
Animal Structures [A13] +
Stomatognathic System [A14]
(…..)

Body Regions [A01]
    Abdomen [A01.047]
        Groin [A01.047.365]
        Inguinal Canal [A01.047.412]
        Peritoneum [A01.047.596] +
        Umbilicus [A01.047.849]
    Axilla [A01.133]
    Back [A01.176] +
    Breast [A01.236] +
    Buttocks [A01.258]
    Extremities [A01.378] +
    Head [A01.456] +
    Neck [A01.598]
    (….)
Mapping Nouns to MeSH Concepts
  • headache recurrence
    • C23.888.592.612.441 C23.550.291.937
  • headache pain
    • C23.888.592.612.441 G11.561.796.444
  • breast cancer cells
    • A01.236 C04 A11
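
The mapping can be pictured as a table lookup from each noun to a MeSH tree number. The sketch below uses only the tree numbers shown in the examples above; the table name `MESH_TREE_NUMBERS`, the function `map_compound`, and the simplification that each noun has exactly one tree number are assumptions.

```python
# Tree numbers copied from the examples above; a real system would load the
# full MeSH vocabulary instead of this three-entry table.
MESH_TREE_NUMBERS = {
    "headache": "C23.888.592.612.441",
    "recurrence": "C23.550.291.937",
    "pain": "G11.561.796.444",
}


def map_compound(compound, table=MESH_TREE_NUMBERS):
    """Replace each noun with its MeSH tree number, or None if it is unknown."""
    return [table.get(noun) for noun in compound.lower().split()]


print(map_compound("headache recurrence"))
# ['C23.888.592.612.441', 'C23.550.291.937']
print(map_compound("headache season"))
# ['C23.888.592.612.441', None]
```

Nouns missing from the table come back as `None`, which leaves the caller free to decide how to handle them.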
Levels of Description

headache pain

  • MeSH 2: C23 G11
  • MeSH 3: C23.888 G11.561
  • MeSH 4: C23.888.592 G11.561.796
  • MeSH 5: C23.888.592.612 G11.561.796.444
  • MeSH 6: C23.888.592.612.441 G11.561.796.444
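
A level of description can be produced by cutting a tree number after a fixed number of dotted components. This is a minimal sketch that assumes the category letter counts as level 1, so the letter plus its first number is level 2; the function name `truncate` is hypothetical.

```python
def truncate(tree_number, level):
    """Keep the top `level - 1` dotted components of a MeSH tree number.

    Assumed convention: the letter is level 1, so "C23" is level 2,
    "C23.888" is level 3, and so on.
    """
    if level < 2:
        raise ValueError("level must be 2 or greater in this sketch")
    return ".".join(tree_number.split(".")[: level - 1])


headache, pain = "C23.888.592.612.441", "G11.561.796.444"
for level in range(2, 5):
    print(f"MeSH {level}:", truncate(headache, level), truncate(pain, level))
```

Deeper levels work the same way, and a tree number with fewer components than the requested level is returned unchanged.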
Classification Task & Method
  • Multi-class (18) classification problem
  • Multi-layer neural networks to classify across all relations simultaneously
  • Evaluation: distinguish between
    • Seen: NCs where 1 or 2 words appeared in the training set
    • Unseen: NCs in which neither word appeared in the training set
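
The seen/unseen distinction used in the evaluation can be sketched as a partition of the test compounds by training vocabulary. The function name `split_seen_unseen` and the toy compounds are illustrative assumptions.

```python
def split_seen_unseen(train_ncs, test_ncs):
    """Partition test NCs by whether any of their words occurs in training."""
    train_words = {word for nc in train_ncs for word in nc}
    seen, unseen = [], []
    for nc in test_ncs:
        # "Seen" means at least one of the nouns appeared in the training set.
        (seen if any(word in train_words for word in nc) else unseen).append(nc)
    return seen, unseen


train = [("headache", "recurrence"), ("influenza", "season")]
test = [("headache", "pain"), ("pollen", "season"), ("drive", "time")]
seen, unseen = split_seen_unseen(train, test)
print(seen)    # [('headache', 'pain'), ('pollen', 'season')]
print(unseen)  # [('drive', 'time')]
```

Accuracy on the unseen group is the stricter test of generalization, since the classifier has no word-level evidence for either noun.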
Accuracy for 18-way Classification

[Figure: accuracy of the Lexical and MeSH models]

  • Correct answer in first three (76%-78%)
  • Correct answer in first two (71%-73%)
  • Correct answer ranked first (61%-62%)
  • Logistic Regression (31%)
  • Guessing (1/18 ≈ 5.6%)
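
The "correct answer in first k" figures correspond to a top-k accuracy over ranked classifier outputs. The sketch below shows how such a number could be computed; the rankings and gold labels are made up and are not the presentation's data.

```python
def top_k_accuracy(ranked_predictions, gold_labels, k):
    """Fraction of items whose gold label is among the k highest-ranked guesses."""
    hits = sum(
        gold in ranked[:k]
        for ranked, gold in zip(ranked_predictions, gold_labels)
    )
    return hits / len(gold_labels)


# Made-up rankings for four compounds, best guess first.
ranked = [
    ["purpose", "instrument", "cause"],
    ["cause", "purpose", "defect"],
    ["defect", "change", "location"],
    ["instrument", "cause", "purpose"],
]
gold = ["purpose", "purpose", "location", "measure of"]

print(top_k_accuracy(ranked, gold, 1))  # 0.25
print(top_k_accuracy(ranked, gold, 2))  # 0.5
print(top_k_accuracy(ranked, gold, 3))  # 0.75
print(round(1 / 18, 3))                 # 0.056, chance level for 18 classes
```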

Accuracies for 18-way classification: generalization on unseen NCs

[Figure: series for MeSH, MeSH on unseen, Lexical, and Lexical on unseen]

Accuracy for sample relations
  • Relation: Frequency/time of
  • Test set:
    • disease recurrence
    • headache recurrence
    • enterovirus season
    • influenza season
    • mosquito season
    • pollen season
    • disease stage
    • transcription stage
    • drive time
    • injection time
    • ischemia time
    • travel time

Accuracy for sample relations
  • Relation: Produces (genetic)
  • Example test set:
    • thymidine allele
    • tumor dna
    • csf mrna
    • acetylase gene
    • virion rna
    • (…)

Accuracy for sample relations
  • Relation: Purpose
  • Purpose test set:
    • varicella vaccine
    • influenza vaccination
    • influenza immunization
    • abscess drainage
    • disease treatment
    • asthma therapy
  • Training set:
    • Instrument: antigen vaccine
    • Object: vaccine development
    • Subtype-of: opv vaccine

Related work (Noun Compound Relations)
  • Finin (1980)
    • Detailed AI analysis, hand-coded
  • Rindflesch et al. (2000)
    • Hand-coded rule base to extract certain types of assertions
Related work (Noun Compound Relations)
  • Vanderwende (1994)
    • automatically extracts semantic information from an on-line dictionary
    • manipulates a set of handwritten rules
    • 13 classes
    • 52% accuracy
  • Lapata (2000)
    • classifies nominalizations into subject/object
    • binary distinction
    • 80% accuracy
  • Lauer (1995)
    • probabilistic model
    • 8 classes
    • 47% accuracy
Related work (Lexical Hierarchies)
  • Prepositional Phrase Attachment
    • Attachment, not semantics
    • Binary choice
  • Approaches
    • Word occurrences (Hindle & Rooth ’93)
    • Using a lexical hierarchy
      • Conceptual association using a lexical hierarchy (Resnik ’93, Resnik & Hearst ’93)
      • Transformation-based incorporating counts from a lexical hierarchy (Brill & Resnik ’94)
      • MDL to find optimal tree cut (Li & Abe ’98) finds improvements over lexical
Conclusions
  • A simple method for assigning semantic relations to noun compounds
    • Does not require complex hand-coded rules
    • Does make use of existing lexical resources
    • Off-the-shelf ML algorithms
  • High accuracy levels for an 18-way class assignment
    • ~60% accuracy on mixed seen and unseen words
    • ~40% accuracy on entirely unseen words on a tiny training set (73 NCs)
Future work
  • Analysis of erroneous cases
  • Other statistical models
    • Bootstrapping & Active learning for labeling
  • NCs with > 2 terms
    • [[growth hormone] deficiency]
    • (purpose + defect)
  • Other syntactic structures
  • Non-biomedical words
    • Other ontologies (e.g., WordNet)?