Automatic labeling of semantic roles

By Daniel Gildea and Daniel Jurafsky

Presented By Kino Coursey

Outline


  • Their Goals

  • Semantic Roles

  • Related Work

  • Methodology

  • Results

  • Their Conclusions

Their Goals

  • To create a system that can identify the semantic relationships, or semantic roles, filled by the syntactic constituents of a sentence and place them into a semantic frame.

  • Lexical and syntactic features are derived from parse trees and used to train statistical classifiers on hand-annotated training data.

Potential users

  • Shallow semantic analysis would be useful in a number of NLP tasks

  • Domain-independent starting point for information extraction

  • Word-sense disambiguation based on the current semantic role

  • Intermediate representation for translation and summarization

  • Adding semantic roles could improve parser and speech recognition accuracy

Their Approach

  • Treat the role assignment problem as being like other tagging problems

  • Use recent successful methods in probabilistic parsing and statistical classification

  • Use the hand-labeled FrameNet database to provide training data: over 50,000 sentences from the British National Corpus (BNC)

  • The FrameNet roles define the tag set

Semantic Roles

  • Historically, two types of roles:

    • Very abstract, like AGENT and PATIENT

    • Verb-specific, like EATER and EATEN for “eat”

  • FrameNet defines an intermediate, schematic representation of situations, with participants, props, and conceptual roles.

  • A frame, being a situation description, can be evoked by multiple verbs or other constituents

Frame Advantages

  • Avoids the difficulty of trying to find a small set of universal, abstract thematic roles

  • Has as many roles as necessary to describe the situation, minimizing information loss while still discriminating between participants

  • Abstract roles can be defined as high-level roles of abstract frames such as “action” or “motion” at the top of the hierarchy

Example FrameNet Markup

<CORPUS CORPNAME="bnc" DOMAIN="motion" FRAME="removing" LEMMA="take.v">
<S TPOS="80499932">
<T TYPE="sense1"></T>
<C FE="Agt" PT="NP" GF="Ext">Pioneer/VVB European/AJ0</C>
<C TARGET="y">take/VVI</C>
<C FE="Thm" PT="NP" GF="Obj">land/NN1</C>
<C FE="Src" PT="PP" GF="Comp">from/PRP indigenous/AJ0 people/NN0</C>
</S>
</CORPUS>
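Read as data, markup like the above can be pulled apart with a few lines of standard-library code. A sketch, assuming a small well-formed reconstruction of the excerpt (the `snippet` string and the `frame_elements` helper are mine, not FrameNet tooling):

```python
# Sketch: extract (frame-element, phrase-type, grammatical-function, words)
# tuples from FrameNet-style markup. The snippet is a minimal, well-formed
# reconstruction of the example sentence above.
import xml.etree.ElementTree as ET

snippet = """
<S TPOS="80499932">
  <C FE="Agt" PT="NP" GF="Ext">Pioneer/VVB European/AJ0</C>
  <C TARGET="y">take/VVI</C>
  <C FE="Thm" PT="NP" GF="Obj">land/NN1</C>
  <C FE="Src" PT="PP" GF="Comp">from/PRP indigenous/AJ0 people/NN0</C>
</S>
"""

def frame_elements(xml_text):
    root = ET.fromstring(xml_text)
    out = []
    for c in root.iter("C"):
        if "FE" in c.attrib:  # skip the TARGET constituent
            # strip the /POS suffixes to recover the plain words
            words = " ".join(tok.split("/")[0] for tok in c.text.split())
            out.append((c.attrib["FE"], c.attrib["PT"], c.attrib["GF"], words))
    return out

for fe in frame_elements(snippet):
    print(fe)
```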



Related Work

  • Traditional parsing and understanding systems rely on hand-developed grammars

    • Must anticipate the way semantic roles are realized through syntax

    • Time consuming to develop

    • Limited coverage (recall is bounded by the constructions the grammar writer anticipated)

Related Work

  • Others have used data-driven approaches for template-based semantic analysis in “shallow” systems

  • Miller (1996), Air Travel Information System: probability of a constituent filling slots in frames; each node could have both semantic and syntactic elements

  • Data-driven information extraction by Riloff: automatically derived case frames for words in a domain

Related Work

  • Blaheta and Charniak used a statistical algorithm for assigning Penn Treebank function tags, achieving an F-measure of 87%, or 99% when ‘no tag’ is counted as a valid choice

Methodology


  • Two-part strategy:

    • Identify the boundaries of the frame elements in the sentence

    • Given the boundaries label each with the correct role

  • Statistics-based: train a classifier on a labeled training set, then test on an unlabeled test set


  • Training

    • Trained using the Collins parser on 37,000 sentences

    • Match annotated frame elements to parse constituents

    • Extract various features from string of words and parse tree

  • Testing

    • Run parser on test sentences and extract same features

    • Probability for each semantic role r is computed from features

Features used

  • Phrase Type: standard syntactic category (NP, VP, S)

  • Grammatical Function

    • Relation to rest of sentence (subject of verb, object of verb…)

    • Limited to NPs

  • Position

    • Before or after predicate defining the frame

    • Correlated to Grammatical functions

    • Redundant backup information

  • Voice: Used 10 passive-identifying patterns for active/passive classification

  • Head Word: head words of each constituent
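
The feature set above can be sketched as a single extraction function. A toy illustration, not the authors' code: the `Constituent` record, the word-index scheme, and the single passive regex (standing in for their 10 patterns) are all my own simplifications.

```python
# Illustrative per-constituent feature extraction: phrase type, head word,
# position relative to the target predicate, and a crude voice detector.
import re
from dataclasses import dataclass

@dataclass
class Constituent:
    phrase_type: str   # e.g. "NP", "PP"
    head_word: str
    start: int         # word index of first word
    end: int           # word index past the last word

def extract_features(c: Constituent, target_index: int, sentence: str):
    # Position: does the constituent fall before or after the target verb?
    position = "before" if c.end <= target_index else "after"
    # Voice: a single "be + past participle" regex standing in for the
    # paper's 10 passive-identifying patterns.
    voice = "passive" if re.search(
        r"\b(is|was|were|been|being|are|be)\s+\w+(ed|en)\b", sentence
    ) else "active"
    return {"pt": c.phrase_type, "head": c.head_word,
            "position": position, "voice": voice}

feats = extract_features(Constituent("NP", "people", 3, 6), 2,
                         "the land was taken from indigenous people")
print(feats)
```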


  • FrameNet corpus test set

    • 10% of the sentences for each target word -> test set

    • 10% of the sentences for each target word -> tuning set

    • Target words with fewer than 10 sentences were ignored

    • Average number of sentences per target word = 34 [Too SPARSE !!!]

    • Average number of sentences per frame = 732

Sparseness Problem

  • Problem: Data is too sparse to directly calculate probabilities on the full set of features

  • Approach: Build classifiers by combining probabilities from distributions conditioned on combinations of features

  • Additional problem: FrameNet data was selected to show prototypical examples of semantic frames, not as a random sample for each frame

  • Approach : Collect more data in the future

Results: Probability Distributions

  • Coverage = % of test data whose feature values were seen in training

  • Accuracy = % of covered test data correctly predicted (similar to precision)

  • Performance = overall % of test data for which the correct role is predicted (similar to recall; performance is the product of coverage and accuracy)
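
These three metrics can be made concrete with a toy scorer, where a prediction of `None` stands for a feature combination unseen in training (my own convention for illustration):

```python
# Coverage, accuracy, and performance for a role classifier whose
# predictions are None when the conditioning features were never seen.
def metrics(predictions, gold):
    seen = [(p, g) for p, g in zip(predictions, gold) if p is not None]
    coverage = len(seen) / len(gold)
    accuracy = sum(p == g for p, g in seen) / len(seen) if seen else 0.0
    # performance counts a miss both for wrong labels and for no prediction
    performance = sum(p == g for p, g in zip(predictions, gold)) / len(gold)
    return coverage, accuracy, performance

cov, acc, perf = metrics(["Agt", None, "Thm", "Src"],
                         ["Agt", "Thm", "Thm", "Goal"])
print(cov, acc, perf)   # note performance == coverage * accuracy
```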

Results: Simple Probabilities

Used simple empirical distributions

Results: Combining data

  • Schemes of giving more weight to distributions with more data did not have a significant effect

  • Role assignments depend only on the relative ranking of probabilities, so fine-tuning the weights makes little difference

Backoff combination: use less-specific distributions only when the more-specific ones are unavailable
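
The backoff idea can be sketched as a classifier that stores role counts at several levels of specificity and answers from the most specific level that has seen the context. The three levels and toy counts below are illustrative, not the paper's exact lattice of feature distributions:

```python
# Sketch of backoff: try the most specific distribution first, and fall
# back to less specific ones only when that context was never seen.
from collections import Counter, defaultdict

class BackoffClassifier:
    def __init__(self):
        # one Counter of role counts per context, per specificity level
        self.levels = [defaultdict(Counter) for _ in range(3)]

    def train(self, head, pt, target, role):
        self.levels[0][(head, target)][role] += 1   # most specific
        self.levels[1][(pt, target)][role] += 1
        self.levels[2][(target,)][role] += 1        # least specific

    def predict(self, head, pt, target):
        for level, ctx in zip(self.levels,
                              [(head, target), (pt, target), (target,)]):
            if ctx in level:            # back off past unseen contexts
                return level[ctx].most_common(1)[0][0]
        return None

clf = BackoffClassifier()
clf.train("land", "NP", "take", "Thm")
clf.train("people", "PP", "take", "Src")
print(clf.predict("land", "NP", "take"))   # head seen in training
print(clf.predict("money", "NP", "take"))  # unseen head: backs off to PT level
```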

Results: Linear Backoff was the best

  • Final system performance: 80.4%, up from the 40.9% baseline

  • Linear backoff achieved 80.4% on the development set and 76.9% on the test set

  • The baseline achieved 40.9% on the development set and 40.6% on the test set

Results: Their Discussions

  • Constituent position relative to the target word plus active/passive info (78.8%) performed as well as reading grammatical functions off the parse tree (79.2%)

  • Adding active/passive info improved performance from 78.8% to 80.5%; 5% of the examples were passives

  • Lexicalization via head words, when available, works well:

    • P(role|head,target) is available for only 56.0% of the data

    • P(role|head,target) is 86.7% correct without using any syntactic features

Results: Lexical Clustering

  • Since head words performed so well but are so sparse, try to use clustering to improve coverage

  • Compute soft clusters for nouns using only frame elements with noun head words from the BNC

    P(r | h, nt, t) = Σ_c P(r | c, nt, t) · P(c | h), summing over the clusters c to which head word h belongs

  • Unclustered head-word data is 87.6% correct but covers only 43.7% of the data

  • Clustered head words are 79.9% correct for the 97.9% of nominal head words in the vocabulary

  • Adding clustering of NP constituents improved performance from 80.4% to 81.2%

  • (Question: Would other lexical semantic resources help?)
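
The soft-clustering estimate above is a direct mixture computation. A sketch with invented probabilities (the cluster names and all numbers are made up for illustration):

```python
# Role probability for a head word as a mixture over the soft clusters it
# belongs to: P(r|h) = sum_c P(r|c) * P(c|h). The fixed (nt, t) context is
# folded into the per-cluster tables for brevity.
def role_prob(role, head, p_role_given_cluster, p_cluster_given_head):
    return sum(p_role_given_cluster[c].get(role, 0.0) * p_c
               for c, p_c in p_cluster_given_head[head].items())

# toy distributions: two noun clusters and one head word unseen in training
p_role_given_cluster = {"c_land":   {"Thm": 0.9, "Src": 0.1},
                        "c_people": {"Src": 0.8, "Thm": 0.2}}
p_cluster_given_head = {"territory": {"c_land": 0.7, "c_people": 0.3}}

print(role_prob("Thm", "territory", p_role_given_cluster, p_cluster_given_head))
```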

Automatic Identification of Frame Element Boundaries

  • Original experiments used hand annotated frame element boundaries

  • Used features in a sentence parse tree likely to be a frame element

  • System given human annotated target word and frame

  • Main feature used: the path from the target word through the parse tree to the constituent, using upward and downward links

  • Used P(fe|path), P(fe|path,target) and P(fe|head,target)
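
The path feature can be sketched over a toy tree. The `(label, children)` tuple representation and the `^`/`!` link markers are my own choices; the paper draws the upward and downward links as arrows:

```python
# Path feature: the chain of category labels from the target word up to the
# lowest common ancestor and back down to the constituent, e.g. "VB^VP^S!NP".
def find_path(tree, label):
    """Return the list of node labels from the root down to `label`."""
    node_label, children = tree
    if node_label == label:
        return [node_label]
    for child in children:
        sub = find_path(child, label)
        if sub:
            return [node_label] + sub
    return None

def path_feature(tree, target, constituent):
    up = find_path(tree, target)
    down = find_path(tree, constituent)
    # drop the shared prefix: the path down to the lowest common ancestor
    i = 0
    while i < min(len(up), len(down)) and up[i] == down[i]:
        i += 1
    lca = up[i - 1]
    return "^".join(reversed(up[i:])) + "^" + lca + "!" + "!".join(down[i:])

tree = ("S", [("NP", []), ("VP", [("VB", []), ("NP-obj", [])])])
print(path_feature(tree, "VB", "NP"))   # "VB^VP^S!NP"
```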

Automatic Identification of Frame Element Boundaries

  • P(fe|path,target) performs relatively poorly, since there are only about 30 sentences for each target word

  • P(fe|head,target) alone not a useful classifier, but helps with linear interpolation

  • Can only ID frame elements that have a constituent in the parse tree, but can be helped with partial matching

  • With relaxed matching, 86% agreement with hand annotations

  • When correctly ID’ed FE’s are fed into the previous role labeler, 79.6% are correct, in the same range as with human data

  • (Question: If it is correctly ID’ed, shouldn’t this be the case?)

Their Conclusions

  • Their system can label roles with some accuracy

  • Lexical statistics on constituent head words were the most important feature used

  • Problem: while very accurate, they are very sparse

  • Key to high overall performance was combining features

  • The combined system was more accurate than any single feature alone; the specific combination method mattered less