
Proposition Bank: a resource of predicate-argument relations




  1. Proposition Bank: a resource of predicate-argument relations • Martha Palmer, Dan Gildea, Paul Kingsbury • University of Pennsylvania • February 26, 2002 • ACE PI Meeting, Fairfield Inn, MD

  2. Outline • Overview • Status Report • Outstanding Issues • Automatic Tagging – Dan Gildea • Details – Paul Kingsbury • Frames files • Annotator issues • Demo

  3. Proposition Bank: Generalizing from Sentences to Propositions • Powell met Zhu Rongji • Powell and Zhu Rongji met • Powell met with Zhu Rongji • Powell and Zhu Rongji had a meeting • All map to the proposition: meet(Powell, Zhu Rongji) • Related verbs: battle, wrestle, join, debate, consult • Generalized frame: meet(Somebody1, Somebody2) • When Powell met Zhu Rongji on Thursday they discussed the return of the spy plane. → meet(Powell, Zhu); discuss([Powell, Zhu], return(X, plane))
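A minimal sketch of what these propositions might look like as a data structure (my own notation, not a PropBank file format); arguments can themselves be propositions, and X stands for an underspecified participant:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Prop:
        predicate: str          # e.g. "meet"
        args: tuple             # participants, or nested Props

    meet = Prop("meet", ("Powell", "Zhu"))
    ret = Prop("return", ("X", "plane"))              # X is underspecified
    discuss = Prop("discuss", (("Powell", "Zhu"), ret))
    print(discuss)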

  4. Penn English Treebank • 1.3 million words • Wall Street Journal and other sources • Tagged with Part-of-Speech • Syntactically Parsed • Widely used in NLP community • Available from Linguistic Data Consortium

  5. A TreeBanked Sentence • Analysts have been expecting a GM-Jaguar pact that would give the U.S. car maker an eventual 30% stake in the British company. • (S (NP-SBJ Analysts) (VP have (VP been (VP expecting (NP (NP a GM-Jaguar pact) (SBAR (WHNP-1 that) (S (NP-SBJ *T*-1) (VP would (VP give (NP the U.S. car maker) (NP (NP an eventual (ADJP 30 %) stake) (PP-LOC in (NP the British company))))))))))))
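Bracketings like this one can be loaded directly with standard tools; a minimal sketch using NLTK (an assumption on my part, NLTK is not part of the toolchain described in these slides):

    from nltk import Tree

    s = ("(S (NP-SBJ Analysts) (VP have (VP been (VP expecting "
         "(NP (NP a GM-Jaguar pact) (SBAR (WHNP-1 that) (S (NP-SBJ *T*-1) "
         "(VP would (VP give (NP the U.S. car maker) "
         "(NP (NP an eventual (ADJP 30 %) stake) "
         "(PP-LOC in (NP the British company))))))))))))")

    tree = Tree.fromstring(s)
    print(tree[0].label())   # NP-SBJ, the subject "Analysts"
    print(tree.leaves())     # the tokens, including the trace *T*-1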

  6. The same sentence, PropBanked • (S Arg0 (NP-SBJ Analysts) (VP have (VP been (VP expecting Arg1 (NP (NP a GM-Jaguar pact) (SBAR (WHNP-1 that) (S Arg0 (NP-SBJ *T*-1) (VP would (VP give Arg2 (NP the U.S. car maker) Arg1 (NP (NP an eventual (ADJP 30 %) stake) (PP-LOC in (NP the British company)))))))))))) • Arg0 of give is the trace *T*-1, which resolves to "a GM-Jaguar pact" • Resulting propositions: expect(Analysts, GM-J pact); give(GM-J pact, US car maker, 30% stake)
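One convenient machine-readable rendering of these two annotations records the predicate and a label-to-token-span map; the field names here are illustrative, not the official PropBank format:

    from dataclasses import dataclass

    @dataclass
    class PropBankInstance:
        lemma: str        # predicate lemma, e.g. "expect"
        rel_index: int    # token index of the predicate
        args: dict        # label -> (start, end) token span, end exclusive

    tokens = ("Analysts have been expecting a GM-Jaguar pact that *T*-1 "
              "would give the U.S. car maker an eventual 30 % stake "
              "in the British company").split()

    expecting = PropBankInstance("expect", 3, {"Arg0": (0, 1), "Arg1": (4, 24)})
    giving = PropBankInstance("give", 10, {"Arg0": (8, 9),     # the trace *T*-1
                                           "Arg2": (11, 15),   # the U.S. car maker
                                           "Arg1": (15, 24)})  # an eventual 30% stake ...

    for inst in (expecting, giving):
        print(inst.lemma,
              {lab: " ".join(tokens[s:e]) for lab, (s, e) in inst.args.items()})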

  7. English PropBank • 1M words of Treebank over 2 years, May ’01–May ’03 • New semantic augmentations • Predicate-argument relations for verbs • Labeled arguments: Arg0, Arg1, Arg2, … • First subtask: 300K-word financial subcorpus (12K sentences, 29K+ predicates) • Spin-off: annotation guidelines (necessary for annotators) • English lexical resource – FRAMES FILES • 3500+ verbs with labeled examples, rich semantics • http://www.cis.upenn.edu/~ace/

  8. English PropBank – Current Status • Frames files • 742 verb lemmas (932 including phrasal variants) • 363/899 VerbNet semi-automatic expansions (subtask/PB) • First subtask: 300K financial subcorpus • 22,595 unique predicates annotated out of 29K (80%) • 6K+ remaining (7 weeks @ 1,000/week, first pass) • 1005 verb lemmas out of 1700+ (59%) • 700 remaining (3.5 months @ 200/month) • PropBank (including some of Brown?) • 34,437 predicates annotated out of 118K (29%) • 1904 (1005 + 899) verb lemmas out of 3500 (54%)

  9. Projected delivery dates • Financial subcorpus • alpha release – December 2001 • beta release – June 2002 • adjudicated release – December 2002 • PropBank • alpha release – December 2002 • beta release – Spring 2003

  10. English PropBank – Status • Sense tagging • 200+ verbs with multiple rolesets • sense-tag this summer with undergrads using NSF funds • Still need to address • 3 usages of "have": imperative, possessive, auxiliary • be, become: predicate adjectives, predicate nominals

  11. Automatic Labeling of Semantic Relations Features: • Predicate • Phrase Type • Parse Tree Path • Position (Before/after predicate) • Voice (active/passive) • Head Word
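To make the parse tree path feature concrete, here is a rough sketch (my own illustration, not the system described on this slide) that computes the label path from a candidate constituent up to the lowest common ancestor and back down to the predicate, using NLTK tree positions:

    from nltk import Tree

    def tree_path(tree, arg_pos, pred_pos):
        """Path feature, e.g. NP^S!VP!VP!VP!VBG (^ = up, ! = down)."""
        i = 0   # depth of the lowest common ancestor
        while i < min(len(arg_pos), len(pred_pos)) and arg_pos[i] == pred_pos[i]:
            i += 1
        up = [tree[arg_pos[:j]].label() for j in range(len(arg_pos), i - 1, -1)]
        down = [tree[pred_pos[:j]].label() for j in range(i + 1, len(pred_pos) + 1)]
        return "^".join(up) + "!" + "!".join(down)

    t = Tree.fromstring(
        "(S (NP (NNS Analysts)) (VP (VBP have) (VP (VBN been)"
        " (VP (VBG expecting) (NP (DT a) (NN pact))))))")

    # subject NP is child 0; the predicate's POS node sits at (1, 1, 1, 0)
    print(tree_path(t, (0,), (1, 1, 1, 0)))   # NP^S!VP!VP!VP!VBG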

  12. Example with Features

  13. Labelling Accuracy – Known Boundaries • Accuracy of semantic role prediction with known boundaries: the system is given the constituents to classify. FrameNet examples (training/test) are hand-picked to be unambiguous.

  Parses          FrameNet   PropBank   PropBank, >10 instances
  Gold Standard   —          77.0       83.1
  Automatic       82.0       73.6       79.6

  14. Labelling Accuracy – Unknown Boundaries • Accuracy of semantic role prediction with unknown boundaries: the system must both identify the constituents that are arguments and assign them the correct roles.

  Parses          FrameNet (Prec / Rec)   PropBank (Prec / Rec)
  Gold Standard   —                       71.1 / 64.4
  Automatic       64.6 / 61.2             57.7 / 50.0
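For unknown boundaries the scores are precision/recall over predicted (span, role) pairs; a minimal sketch of that metric (illustrative, not the evaluation code behind these tables):

    def precision_recall(gold, predicted):
        gold, predicted = set(gold), set(predicted)
        correct = len(gold & predicted)
        precision = correct / len(predicted) if predicted else 0.0
        recall = correct / len(gold) if gold else 0.0
        return precision, recall

    gold = {((0, 1), "Arg0"), ((4, 24), "Arg1")}
    pred = {((0, 1), "Arg0"), ((4, 7), "Arg1")}   # wrong span for Arg1
    print(precision_recall(gold, pred))           # (0.5, 0.5)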

  15. Complete Sentence • Analysts have been expecting a GM-Jaguar pact that *T*-1 would give the U.S. car maker an eventual 30% stake in the British company and create joint ventures that *T*-2 would produce an executive-model range of cars. • expect(analysts, pact) • give(pact, car_maker, stake) • create(pact, joint_ventures) • produce(joint_ventures, range_of_cars)

  16. Guidelines: Frames Files • Created manually – Paul Kingsbury • New framer: Olga Babko-Malaya (Ph.D., Rutgers, Linguistics) • Refer to VerbNet, WordNet and FrameNet • Currently in place for 787/986 verbs • Use "semantic role glosses" unique to each verb (mapped to the Arg0, Arg1, … labels appropriate to the class)

  17. Frames Example: expect • Roles: Arg0: expecter; Arg1: thing expected • Example (transitive, active): Portfolio managers expect further declines in interest rates. • Arg0: Portfolio managers • REL: expect • Arg1: further declines in interest rates

  18. Frames Example: give • Roles: Arg0: giver; Arg1: thing given; Arg2: entity given to • Example (double object): The executives gave the chefs a standing ovation. • Arg0: The executives • REL: gave • Arg2: the chefs • Arg1: a standing ovation
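The two rolesets above can be pictured as a small lexicon mapping the numbered labels onto verb-specific glosses; a sketch (the lemma.01 roleset IDs follow PropBank convention, but the data layout here is my own, not the frames file format):

    FRAMES = {
        "expect.01": {"Arg0": "expecter", "Arg1": "thing expected"},
        "give.01": {"Arg0": "giver", "Arg1": "thing given",
                    "Arg2": "entity given to"},
    }

    def gloss(roleset, label):
        """Map a numbered argument label to its verb-specific gloss."""
        return FRAMES[roleset].get(label, "adjunct (ArgM)")

    print(gloss("give.01", "Arg2"))   # entity given to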

  19. How are arguments numbered? • Examination of example sentences • Determination of required / highly preferred elements • Sequential numbering, Arg0 is typical first argument, except • ergative/unaccusative verbs (shake example) • Arguments mapped for "synonymous" verbs

  20. Additional tags (arguments or adjuncts?) • Variety of ArgMs (Arg# > 4): • TMP – when? • LOC – where at? • DIR – where to? • MNR – how? • PRP – why? • REC – himself, themselves, each other • PRD – this argument refers to or modifies another • ADV – others

  21. Ergative/Unaccusative Verbs: rise • Roles: Arg1 = logical subject, patient, thing rising; Arg2 = EXT, amount risen; Arg3* = start point; Arg4 = end point • Sales rose 4% to $3.28 billion from $3.16 billion. • *Note: have to mention the preposition explicitly (Arg3-from, Arg4-to), or could have used ArgM-Source, ArgM-Goal – an arbitrary distinction.

  22. Synonymous Verbs: add in the sense of rise • Roles: Arg1 = logical subject, patient, thing rising/gaining/being added to; Arg2 = EXT, amount risen; Arg4 = end point • The Nasdaq composite index added 1.01 to 456.6 on paltry volume.

  23. Phrasal Verbs • Put together • Put in • Put off • Put on • Put out • Put up • ... • Accounts for an additional 200 "verbs"

  24. Frames: Multiple Rolesets • Rolesets are not necessarily consistent between different senses of the same verb • A verb with multiple senses can have multiple frames, but not necessarily • Roles and their mappings onto argument labels are consistent between different verbs that share similar argument structures (similar to FrameNet) • Levin / VerbNet classes • http://www.cis.upenn.edu/~dgildea/Verbs/ • Of the 787 most frequent verbs: • 1 roleset – 521 • 2 rolesets – 169 • 3+ rolesets – 97 (includes light verbs)

  25. Semi-automatic expansion of Frames • Experimenting with semi-automatic expansion • Find unframed members of a Levin class in VerbNet and "inherit" frames from other members (sketched in code after the next slide) • 787 verbs manually framed • Can expand to 1200+ using VerbNet • Will need hand correction • In a first experiment, automatic expansion provided 90% coverage of the data

  26. More on Automatic Expansion • Destroy: Arg0: destroyer; Arg1: thing destroyed; Arg2: instrument of destruction • VerbNet class Destroy-44: annihilate, blitz, decimate, demolish, destroy, devastate, exterminate, extirpate, obliterate, ravage, raze, ruin, waste, wreck
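A sketch of the inheritance step from slide 25, using the Destroy-44 class above (helper names are mine; the real process works over the VerbNet database and is followed by hand correction):

    DESTROY_44 = ["annihilate", "blitz", "decimate", "demolish", "destroy",
                  "devastate", "exterminate", "extirpate", "obliterate",
                  "ravage", "raze", "ruin", "waste", "wreck"]

    frames = {"destroy": {"Arg0": "destroyer",
                          "Arg1": "thing destroyed",
                          "Arg2": "instrument of destruction"}}

    def expand(verb_class, frames):
        """Copy an existing frame to every unframed member of the class."""
        donor = next(v for v in verb_class if v in frames)
        for verb in verb_class:
            frames.setdefault(verb, dict(frames[donor]))

    expand(DESTROY_44, frames)
    print(frames["waste"])   # inherited frame – see the caveat on slide 27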

  27. What a Waste • Waste: Arg0: destroyer; Arg1: thing destroyed; Arg2: instrument of destruction • He didn’t waste any time distancing himself from his former boss • Arg0: He • Arg1: any time • Arg2 = ? distancing himself... (the inherited "instrument" role does not fit this sense – hand correction needed)

  28. Trends in Argument Numbering • Arg0 = agent • Arg1 = direct object / theme / patient • Arg2 = indirect object / benefactive / instrument / attribute / end state • Arg3 = start point / benefactive / instrument / attribute • Arg4 = end point

  29. Morphology • Verbs also marked for tense/aspect/voice • Passive/Active • Perfect/Progressive • Third singular (is, has, does, was) • Present/Past/Future • Infinitives/Participles/Gerunds/Finites • Modals and negation marked as ArgMs
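A rough sketch of how such flags can be read off Penn POS tags and the auxiliary chain (the heuristics are mine, not the project's annotation rules, and they ignore many corner cases):

    BE = {"be", "is", "are", "was", "were", "been", "being"}
    HAVE = {"have", "has", "had"}

    def verb_features(aux_tokens, pos):
        feats = set()
        if pos == "VBD":
            feats.add("past")
        elif pos == "VBZ":
            feats.update({"present", "third-singular"})
        elif pos == "VBP":
            feats.add("present")
        elif pos == "VBG" and BE & set(aux_tokens):
            feats.add("progressive")
        elif pos == "VBN":
            feats.add("passive" if BE & set(aux_tokens) else "perfect")
        if {"will", "shall"} & set(aux_tokens):
            feats.add("future")
        if HAVE & set(aux_tokens):
            feats.add("perfect")
        return feats

    print(verb_features(["have", "been"], "VBG"))   # {'progressive', 'perfect'}
    print(verb_features(["was"], "VBN"))            # {'passive'}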

  30. Annotation procedure • Extraction of all sentences with a given verb • First pass: automatic tagging (Joseph Rosenzweig) • http://www.cis.upenn.edu/~josephr/TIDES/index.html#lexicon • Second pass: double-blind hand correction • Annotators have a variety of backgrounds • Less syntactic training than for treebanking • Tagging tool highlights discrepancies (sketched below) • Third pass: Solomonization (adjudication)
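A minimal sketch of the discrepancy highlighting (illustrative, not the project's tagging tool): compare the two annotators' label-to-string maps and flag any label where they disagree, here using the slide 36 example:

    def discrepancies(ann_a, ann_b):
        """Return labels (with both values) on which two annotators differ."""
        return [(label, ann_a.get(label), ann_b.get(label))
                for label in sorted(set(ann_a) | set(ann_b))
                if ann_a.get(label) != ann_b.get(label)]

    kate = {"Arg0": "Intel", "Arg2": "analysts",
            "Arg1": "the company will resume shipments of the chips ..."}
    erwin = {"Arg0": "Intel", "Arg2": "analysts",
             "Arg1": "that the company will resume shipments of the chips ..."}

    print(discrepancies(kate, erwin))   # Arg1 spans differ -> adjudicate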

  31. Inter-Annotator Agreement

  32. Annotator vs. Gold Standard

  33. Financial Subcorpus Status • 1005 verbs framed (700+ to go) • (742 + 363 VerbNet siblings) • 535 verbs first-passed • 22,595 unique tokens • Does not include ~3000 tokens tagged for Senseval • 89 verbs second-passed • 7600+ tokens • 42 verbs solomonized • 2890 tokens

  34. Throughput • Framing: approximately 25 verbs/week • Olga will also start framing; jointly up to 50 verbs/week • Annotation: approximately 50 predicates/hour • 20 hours of annotation a week → 1000 predicates/week • Solomonization: approximately 1 hour per verb, but will speed up with lower-frequency verbs

  35. Summary • Predicate-argument structure labels are arbitrary to a certain degree, but still consistent, and generic enough to be mappable to particular theoretical frameworks • Automatic tagging as a first pass makes the task feasible • Agreement and accuracy figures are reassuring • Financial subcorpus is 80% complete; beta release in June

  36. Solomonization • Source tree: Intel told analysts that the company will resume shipments of the chips within two to three weeks.
  *** Kate said: arg0: Intel • arg1: the company will resume shipments of the chips within two to three weeks • arg2: analysts
  *** Erwin said: arg0: Intel • arg1: that the company will resume shipments of the chips within two to three weeks • arg2: analysts

  37. Solomonization • Source tree: Such loans to Argentina also remain classified as non-accruing, *TRACE*-1 costing the bank $10 million *TRACE*-*U* of interest income in the third period.
  *** Kate said: arg1: *TRACE*-1 • arg2: $10 million *TRACE*-*U* of interest income • arg3: the bank • argM-TMP: in the third period
  *** Erwin said: arg1: *TRACE*-1 → Such loans to Argentina • arg2: $10 million *TRACE*-*U* of interest income • arg3: the bank • argM-TMP: in the third period

  38. Solomonization • Source tree: Also, substantially lower Dutch corporate tax rates helped the company keep its tax outlay flat relative to earnings growth.
  *** Kate said: arg0: the company • arg1: its tax outlay • arg3-PRD: flat • argM-MNR: relative to earnings growth
  *** Katherine said: arg0: the company • arg1: its tax outlay • arg3-PRD: flat • argM-ADV: relative to earnings growth
