Automatic labeling of semantic roles

By Daniel Gildea and Daniel Jurafsky

Presented By Kino Coursey

Outline


  • Their Goals

  • Semantic Roles

  • Related Work

  • Methodology

  • Results

  • Their Conclusions

Their Goals

  • To create a system that can identify the semantic relationships, or semantic roles, filled by the syntactic constituents of a sentence and place them into a semantic frame.

  • Lexical and syntactic features are derived from parse trees and used to train statistical classifiers on hand-annotated training data.

Potential users

  • Shallow semantic analysis would be useful in a number of NLP tasks

  • Domain-independent starting point for information extraction

  • Word-sense disambiguation based on the current semantic role

  • Intermediate representation for translation and summarization

  • Adding semantic roles could improve parser and speech recognition accuracy

Their Approach

  • Treat the role assignment problem as being like other tagging problems

  • Use recent successful methods in probabilistic parsing and statistical classification

  • Use the hand-labeled FrameNet database to provide training data: over 50,000 sentences from the British National Corpus (BNC)

  • The FrameNet roles define the tag set

Semantic Roles

  • Historically, two types of roles:

    • Very abstract, like AGENT and PATIENT

    • Verb-specific, like EATER and EATEN for “eat”

  • FrameNet defines an intermediate, schematic representation of situations, with participants, props, and conceptual roles.

  • A frame, being a situation description, can be evoked by multiple verbs or other constituents

Frame Advantages

  • Avoids the difficulty of trying to find a small set of universal, abstract thematic roles

  • Has as many roles as necessary to describe the situation, minimizing information loss while still discriminating between participants

  • Abstract roles can be defined as high-level roles of abstract frames such as “action” or “motion” at the top of the hierarchy

Example FrameNet Markup

<CORPUS CORPNAME="bnc" DOMAIN="motion" FRAME="removing" LEMMA="take.v">
<S TPOS="80499932">
<T TYPE="sense1"></T>
<C FE="Agt" PT="NP" GF="Ext">Pioneer/VVB European/AJ0</C>
<C TARGET="y">take/VVI</C>
<C FE="Thm" PT="NP" GF="Obj">land/NN1</C>
<C FE="Src" PT="PP" GF="Comp">from/PRP indigenous/AJ0 people/NN0</C>
</S>
</CORPUS>
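Read as data, markup like the above can be pulled apart with a few lines of standard-library code. A sketch, assuming a small well-formed reconstruction of the excerpt (the `snippet` string and the `frame_elements` helper are mine, not FrameNet tooling):

```python
# Sketch: extract (frame-element, phrase-type, grammatical-function, words)
# tuples from FrameNet-style markup. The snippet is a minimal, well-formed
# reconstruction of the example sentence above.
import xml.etree.ElementTree as ET

snippet = """
<S TPOS="80499932">
  <C FE="Agt" PT="NP" GF="Ext">Pioneer/VVB European/AJ0</C>
  <C TARGET="y">take/VVI</C>
  <C FE="Thm" PT="NP" GF="Obj">land/NN1</C>
  <C FE="Src" PT="PP" GF="Comp">from/PRP indigenous/AJ0 people/NN0</C>
</S>
"""

def frame_elements(xml_text):
    root = ET.fromstring(xml_text)
    out = []
    for c in root.iter("C"):
        if "FE" in c.attrib:  # skip the TARGET constituent
            # strip the /POS suffixes to recover the plain words
            words = " ".join(tok.split("/")[0] for tok in c.text.split())
            out.append((c.attrib["FE"], c.attrib["PT"], c.attrib["GF"], words))
    return out

for fe in frame_elements(snippet):
    print(fe)
```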



Related Work

  • Traditional parsing and understanding systems rely on hand-developed grammars

    • Must anticipate the way semantic roles are realized through syntax

    • Time consuming to develop

    • Limited coverage (recall is bounded by the constructions the grammar writer anticipated)

Related Work

  • Others have used data-driven approaches for template-based semantic analysis in “shallow” systems

  • Miller (1996), Air Travel Information System: probability of a constituent filling slots in frames; each node could have both semantic and syntactic elements

  • Data-driven information extraction by Riloff: automatically derived case frames for words in a domain

Related Work

  • Blaheta and Charniak used a statistical algorithm for assigning Penn Treebank function tags, achieving an F-measure of 87%, or 99% when ‘no tag’ is counted as a valid choice

Methodology


  • Two-part strategy:

    • Identify the boundaries of the frame elements in the sentence

    • Given the boundaries label each with the correct role

  • Statistics-based: train a classifier on a labeled training set, then test on an unlabeled test set


  • Training

    • Trained using the Collins parser on 37,000 sentences

    • Match annotated frame elements to parse constituents

    • Extract various features from string of words and parse tree

  • Testing

    • Run parser on test sentences and extract same features

    • Probability for each semantic role r is computed from features

Features used

  • Phrase Type: standard syntactic category (NP, VP, S)

  • Grammatical Function

    • Relation to rest of sentence (subject of verb, object of verb…)

    • Limited to NPs

  • Position

    • Before or after predicate defining the frame

    • Correlated to Grammatical functions

    • Redundant backup information

  • Voice: Used 10 passive-identifying patterns for active/passive classification

  • Head Word: head words of each constituent
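
The feature set above can be sketched as a single extraction function. A toy illustration, not the authors' code: the `Constituent` record, the word-index scheme, and the single passive regex (standing in for their 10 patterns) are all my own simplifications.

```python
# Illustrative per-constituent feature extraction: phrase type, head word,
# position relative to the target predicate, and a crude voice detector.
import re
from dataclasses import dataclass

@dataclass
class Constituent:
    phrase_type: str   # e.g. "NP", "PP"
    head_word: str
    start: int         # word index of first word
    end: int           # word index past the last word

def extract_features(c: Constituent, target_index: int, sentence: str):
    # Position: does the constituent fall before or after the target verb?
    position = "before" if c.end <= target_index else "after"
    # Voice: a single "be + past participle" regex standing in for the
    # paper's 10 passive-identifying patterns.
    voice = "passive" if re.search(
        r"\b(is|was|were|been|being|are|be)\s+\w+(ed|en)\b", sentence
    ) else "active"
    return {"pt": c.phrase_type, "head": c.head_word,
            "position": position, "voice": voice}

feats = extract_features(Constituent("NP", "people", 3, 6), 2,
                         "the land was taken from indigenous people")
print(feats)
```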


  • FrameNet corpus test set

    • 10% of the sentences for each target word -> test set

    • 10% of the sentences for each target word -> tuning set

    • Target words with fewer than 10 sentences were ignored

    • Average number of sentences per target word = 34 [Too SPARSE !!!]

    • Average number of sentences per frame = 732

Sparseness Problem

  • Problem: Data is too sparse to directly calculate probabilities on the full set of features

  • Approach: Build classifiers by combining probabilities from distributions conditioned on combinations of features

  • Additional problem: FrameNet data was selected to show prototypical examples of semantic frames, not as a random sample for each frame

  • Approach : Collect more data in the future

Results: Probability Distributions

  • Coverage = % of test data whose feature values were seen in training

  • Accuracy = % of covered test data correctly predicted (similar to precision)

  • Performance = overall % of test data for which the correct role is predicted (similar to recall; performance is the product of coverage and accuracy)
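
These three metrics can be made concrete with a toy scorer, where a prediction of `None` stands for a feature combination unseen in training (my own convention for illustration):

```python
# Coverage, accuracy, and performance for a role classifier whose
# predictions are None when the conditioning features were never seen.
def metrics(predictions, gold):
    seen = [(p, g) for p, g in zip(predictions, gold) if p is not None]
    coverage = len(seen) / len(gold)
    accuracy = sum(p == g for p, g in seen) / len(seen) if seen else 0.0
    # performance counts a miss both for wrong labels and for no prediction
    performance = sum(p == g for p, g in zip(predictions, gold)) / len(gold)
    return coverage, accuracy, performance

cov, acc, perf = metrics(["Agt", None, "Thm", "Src"],
                         ["Agt", "Thm", "Thm", "Goal"])
print(cov, acc, perf)   # note performance == coverage * accuracy
```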

Results: Simple Probabilities

Used simple empirical distributions

Results: Combining data

  • Schemes of giving more weight to distributions with more data did not have a significant effect

  • Role assignments depend only on the relative ranking of probabilities, so fine-tuning the weights makes little difference

Backoff combination: use less-specific distributions only when the more-specific ones are unavailable
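
The backoff idea can be sketched as a classifier that stores role counts at several levels of specificity and answers from the most specific level that has seen the context. The three levels and toy counts below are illustrative, not the paper's exact lattice of feature distributions:

```python
# Sketch of backoff: try the most specific distribution first, and fall
# back to less specific ones only when that context was never seen.
from collections import Counter, defaultdict

class BackoffClassifier:
    def __init__(self):
        # one Counter of role counts per context, per specificity level
        self.levels = [defaultdict(Counter) for _ in range(3)]

    def train(self, head, pt, target, role):
        self.levels[0][(head, target)][role] += 1   # most specific
        self.levels[1][(pt, target)][role] += 1
        self.levels[2][(target,)][role] += 1        # least specific

    def predict(self, head, pt, target):
        for level, ctx in zip(self.levels,
                              [(head, target), (pt, target), (target,)]):
            if ctx in level:            # back off past unseen contexts
                return level[ctx].most_common(1)[0][0]
        return None

clf = BackoffClassifier()
clf.train("land", "NP", "take", "Thm")
clf.train("people", "PP", "take", "Src")
print(clf.predict("land", "NP", "take"))   # head seen in training
print(clf.predict("money", "NP", "take"))  # unseen head: backs off to PT level
```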

Results: Linear Backoff was the best

  • Final system performance: 80.4%, up from the 40.9% baseline

  • Linear backoff achieved 80.4% on the development set and 76.9% on the test set

  • The baseline achieved 40.9% on the development set and 40.6% on the test set

Results: Their Discussions

  • Constituent position relative to the target word plus active/passive info (78.8%) performed as well as reading grammatical functions off the parse tree (79.2%)

  • Adding active/passive info improved performance from 78.8% to 80.5%; 5% of the examples were passives

  • Lexicalization via head words, when available, works well:

    • P(role|head,target) is available for only 56.0% of the data

    • P(role|head,target) is 86.7% correct without using any syntactic features

Results: Lexical Clustering

  • Since head words performed so well but are so sparse, try to use clustering to improve coverage

  • Compute soft clusters for nouns using only frame elements with noun head words from the BNC

    P(r | h, nt, t) = Σ_c P(r | c, nt, t) · P(c | h), summing over the clusters c to which head word h belongs

  • Unclustered head-word data is 87.6% correct but covers only 43.7% of the data

  • Clustered head words are 79.9% correct for the 97.9% of nominal head words in the vocabulary

  • Adding clustering of NP constituents improved performance from 80.4% to 81.2%

  • (Question: Would other lexical semantic resources help?)
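
The soft-clustering estimate above is a direct mixture computation. A sketch with invented probabilities (the cluster names and all numbers are made up for illustration):

```python
# Role probability for a head word as a mixture over the soft clusters it
# belongs to: P(r|h) = sum_c P(r|c) * P(c|h). The fixed (nt, t) context is
# folded into the per-cluster tables for brevity.
def role_prob(role, head, p_role_given_cluster, p_cluster_given_head):
    return sum(p_role_given_cluster[c].get(role, 0.0) * p_c
               for c, p_c in p_cluster_given_head[head].items())

# toy distributions: two noun clusters and one head word unseen in training
p_role_given_cluster = {"c_land":   {"Thm": 0.9, "Src": 0.1},
                        "c_people": {"Src": 0.8, "Thm": 0.2}}
p_cluster_given_head = {"territory": {"c_land": 0.7, "c_people": 0.3}}

print(role_prob("Thm", "territory", p_role_given_cluster, p_cluster_given_head))
```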

Automatic Identification of Frame Element Boundaries

  • Original experiments used hand annotated frame element boundaries

  • Used features in a sentence parse tree likely to be a frame element

  • System given human annotated target word and frame

  • Main feature used: the path from the target word through the parse tree to the constituent, using upward and downward links

  • Used P(fe|path), P(fe|path,target) and P(fe|head,target)
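
The path feature can be sketched over a toy tree. The `(label, children)` tuple representation and the `^`/`!` link markers are my own choices; the paper draws the upward and downward links as arrows:

```python
# Path feature: the chain of category labels from the target word up to the
# lowest common ancestor and back down to the constituent, e.g. "VB^VP^S!NP".
def find_path(tree, label):
    """Return the list of node labels from the root down to `label`."""
    node_label, children = tree
    if node_label == label:
        return [node_label]
    for child in children:
        sub = find_path(child, label)
        if sub:
            return [node_label] + sub
    return None

def path_feature(tree, target, constituent):
    up = find_path(tree, target)
    down = find_path(tree, constituent)
    # drop the shared prefix: the path down to the lowest common ancestor
    i = 0
    while i < min(len(up), len(down)) and up[i] == down[i]:
        i += 1
    lca = up[i - 1]
    return "^".join(reversed(up[i:])) + "^" + lca + "!" + "!".join(down[i:])

tree = ("S", [("NP", []), ("VP", [("VB", []), ("NP-obj", [])])])
print(path_feature(tree, "VB", "NP"))   # "VB^VP^S!NP"
```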

Automatic Identification of Frame Element Boundaries

  • P(fe|path,target) performs relatively poorly, since there are only about 30 sentences for each target word

  • P(fe|head,target) alone not a useful classifier, but helps with linear interpolation

  • Can only ID frame elements that have a constituent in the parse tree, but can be helped with partial matching

  • With relaxed matching, 86% agreement with hand annotations

  • When correctly ID’ed FE’s are fed into the previous role labeler, 79.6% are correct, in the same range as with human data

  • (Question: If it is correctly ID’ed, shouldn’t this be the case?)

Their Conclusions

  • Their system can label roles with some accuracy

  • Lexical statistics on constituent head words were the most important feature used

  • Problem: while very accurate, they are very sparse

  • Key to high overall performance was combining features

  • The combined system was more accurate than any single feature alone; the specific combination method mattered less