
Lexical Semantics and Semantic Annotation



  1. Lexical Semantics and Semantic Annotation James Pustejovsky (with additional slides from: Martha Palmer, Nianwen Xue, Olga Babko-Malaya, Ben Snyder) CLSW 2011 NTU, Taipei May 4, 2011

  2. Examples of Semantic Annotations • Predicators and their named arguments • [The man]agent painted [the wall]patient. • Anaphors and their antecedents • [The protein] inhibits growth in yeast. [It] blocks production… • Acronyms and their long forms • [Platelet-derived growth factor] (known as [pdgf]) impacts … • Semantic Typing of entities • [The man]human fired [the gun]firearm
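Annotations like these can be represented programmatically as typed spans over the raw text. The sketch below is illustrative only — the `Span` class, field names, and offsets are invented, not from any particular annotation toolkit:

```python
from dataclasses import dataclass

@dataclass
class Span:
    """A labeled span of text: one annotated constituent (illustrative)."""
    start: int   # character offset where the span begins
    end: int     # character offset where the span ends (exclusive)
    label: str   # semantic label, e.g. "agent", "patient", "antecedent"

text = "The man painted the wall."
annotations = [
    Span(0, 7, "agent"),     # [The man]agent
    Span(16, 24, "patient"), # [the wall]patient
]

for s in annotations:
    print(text[s.start:s.end], "->", s.label)
```

Standoff annotation of this kind (offsets into an untouched source text) is what lets several phenomena — roles, anaphora, temporal links — be layered independently over the same corpus, as the following slides advocate.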

  3. Linguistic Phenomena • Syntactic Structure • Describes grammatical arrangements of words into hierarchical structure • Predicate Argument Structure • Who did what to whom: Subject, object, predicate • Temporal Structure • Temporal ordering and anchoring of events in a text • Emotive and Discourse Structure • How language is used across sentences, and how content is expressed emotionally. Annotated corpora allow us to evaluate and train systems to be able to make these distinctions

  4. Motivation of Annotation • Semantic annotation is critical for robust language understanding • Question answering, summarization, inference, reading, … • Annotation schemata should focus on a single coherent theme: • Different linguistic phenomena should be annotated separately over the same corpus • The Annotate, Train, and Test Model advances linguistic theory: • Theories need testing to evaluate coverage and predictive force. • Semantic theories are too complex to develop without this model.

  5. Methodological Assumption • Annotation scheme: • assumes a given feature set • Feature set: • encodes specific structural descriptions and properties of the input data • Structural descriptions: • theoretically-informed attributes derived from empirical observations over the data Theory → Description → Features → Annotation

  6. Linguistic Annotation Schemes • PropBank • Palmer, Gildea, and Kingsbury (2005) • NomBank • Meyers, Reeves, Macleod, Szekely, Zielinska, Young, and Grishman (2004) • TimeBank • Pustejovsky, Littman, Knippen, and Sauri (2005) • Opinion Corpus • Wiebe, Wilson, and Cardie (2005) • Penn Discourse TreeBank • Miltsakaki, Prasad, Joshi, and Webber (2004)

  7. PropBank • Corpus annotated with semantic roles for arguments and adjuncts of verbs • 1M word Penn Treebank II WSJ corpus. • Coarse-grained sense tags, based on grouping of WordNet senses

  8. Proposition Bank: From Sentences to Propositions meet(Somebody1, Somebody2) . . . • Powell met Zhu Rongji • Powell and Zhu Rongji met • Powell met with Zhu Rongji • Powell and Zhu Rongji had a meeting • Proposition: meet(Powell, Zhu Rongji)
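The normalization this slide illustrates — many surface forms, one proposition — can be sketched in a few lines. The `to_proposition` function is a toy stand-in for what a real system derives from a syntactic parse plus semantic role labeling:

```python
# The four surface variants from the slide all express one proposition.
surface_variants = [
    "Powell met Zhu Rongji",
    "Powell and Zhu Rongji met",
    "Powell met with Zhu Rongji",
    "Powell and Zhu Rongji had a meeting",
]

def to_proposition(sentence):
    """Toy normalizer: spots the participants and names the frame.
    A real pipeline would parse the sentence and label its roles."""
    participants = tuple(sorted(
        n for n in ("Powell", "Zhu Rongji") if n in sentence))
    return ("meet",) + participants

propositions = {to_proposition(s) for s in surface_variants}
# Every variant collapses to the same predicate-argument tuple.
assert propositions == {("meet", "Powell", "Zhu Rongji")}
```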

  9. PropBank Annotation Example • [ArgM-ADV According to reports], [Arg1 sea trials for [Arg1 a patrol boat] [Rel_develop.02 developed] [Arg0 by Kazakhstan]] are being [Rel_conduct.01 conducted] and [Arg1 the formal launch] is [Rel_plan.01 planned] [ArgM-TMP for the beginning of April this year].

  12. What is a PropBank? • A PropBank is a corpus annotated with the predicate-argument structure of the verbs: • English Propbank: www.cis.upenn.edu/~ace3/’04 LDC Kingsbury and Palmer 2002, Palmer, Gildea, Kingsbury, 2005 • Wall Street Journal, 1M words, 120K+ predicate instances • Brown, 14K predicate instances • Chinese Propbank: www.cis.upenn.edu/~chinese/cpb Xue and Palmer 2003, Xue 2004 • Xinhua (250K words – almost done), • Sinorama (250K words – estimated 2007) • Nominalized verbs for English = NomBank/NYU • Chinese NomBank?

  13. Capturing “neutral” semantic roles • Boyan broke [Arg1 the LCD-projector]. break(agent(Boyan), patient(LCD-projector)) • [Arg1 The windows] were broken by the hurricane. • [Arg1 The vase] broke into pieces when it toppled over.

  14. Frames File example: give (< 4000 frames for PropBank) Roles: Arg0: giver Arg1: thing given Arg2: entity given to Example: double object • The executives gave the chefs a standing ovation. Arg0: The executives REL: gave Arg2: the chefs Arg1: a standing ovation
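A frames file entry like this one can be mirrored as a simple lookup table from roleset to role glosses. Real PropBank frames files are XML; the dict below is just an illustrative sketch of their content:

```python
# Sketch of the give.01 roleset from the slide (real frames files are XML).
frames = {
    "give.01": {
        "Arg0": "giver",
        "Arg1": "thing given",
        "Arg2": "entity given to",
    }
}

# The double-object example from the slide, labeled:
labeled = {
    "Arg0": "The executives",
    "REL":  "gave",
    "Arg2": "the chefs",
    "Arg1": "a standing ovation",
}

# Gloss each numbered argument using the frames file entry.
gloss = {arg: frames["give.01"][arg]
         for arg in labeled if arg in frames["give.01"]}
assert gloss["Arg2"] == "entity given to"
```

This separation is the point of the frames files: the annotation stores only verb-specific Arg numbers, and the roleset supplies their interpretation.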

  15. Frames File example: give w/ Thematic Role Labels Roles: Arg0: giver Arg1: thing given Arg2: entity given to Example: double object • The executives gave the chefs a standing ovation. Arg0 (Agent): The executives REL: gave Arg2 (Recipient): the chefs Arg1 (Theme): a standing ovation VerbNet – based on Levin classes

  16. PropBank Exercise • [He]-Arg1/Theme [will]-MOD [probably]-MOD be [extradited]-rel [to the U.S.]-DIR [for trial under an extradition treaty President Virgilio Barco has revived]-PRP. • He will probably be extradited to the U.S. for trial under [an extradition treaty]-Arg1/Theme [President Virgilio Barco]-Arg0/Agent has [revived]-rel.

  17. A Chinese Treebank Sentence 国会/Congress 最近/recently 通过/pass 了/ASP 银行法/banking law “The Congress passed the banking law recently.” (IP (NP-SBJ (NN 国会/Congress)) (VP (ADVP (ADV 最近/recently)) (VP (VV 通过/pass) (AS 了/ASP) (NP-OBJ (NN 银行法/banking law)))))

  18. The Same Sentence, PropBanked Predicate: 通过 (pass, frameset f2); arg0: 国会 (Congress); argM: 最近 (recently); arg1: 银行法 (banking law) (IP (NP-SBJ arg0 (NN 国会)) (VP argM (ADVP (ADV 最近)) (VP f2 (VV 通过) (AS 了) arg1 (NP-OBJ (NN 银行法)))))

  19. Annotation procedure • PTB II – Extract all sentences of a verb • Create Frame File for that verb Paul Kingsbury (3400+ lemmas, 4700 framesets,120K predicates) • 1st pass: Automatic tagging Joseph Rosenzweig • 2nd pass: Double blind hand correction by verb Inter-annotator agreement 84% (87% Arg#’s) • 3rd pass: Adjudication Olga Babko-Malaya • 4th pass: Train automatic semantic role labellers Dan Gildea, Sameer Pradhan, Nianwen Xue, Szuting Yi, …. CoNLL-04 shared task, 2004, 2005, ….

  20. Word Senses in PropBank • Orders to ignore word sense not feasible for 700+ verbs • Mary left the room • Mary left her daughter-in-law her pearls in her will Frameset leave.01 "move away from": Arg0: entity leaving Arg1: place left Frameset leave.02 "give": Arg0: giver Arg1: thing given Arg2: beneficiary How do these relate to traditional word senses in WordNet?
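The two framesets for leave can be sketched as alternative rolesets, where a sense tag must be chosen per instance. The coverage heuristic below is a crude illustration only — PropBank frameset tagging is done by human annotators, not by this rule:

```python
# The two framesets for "leave" from the slide.
framesets = {
    "leave.01": {"gloss": "move away from",
                 "roles": {"Arg0": "entity leaving", "Arg1": "place left"}},
    "leave.02": {"gloss": "give",
                 "roles": {"Arg0": "giver", "Arg1": "thing given",
                           "Arg2": "beneficiary"}},
}

def compatible(frameset_id, observed_args):
    """Crude cue: a frameset must license every observed argument."""
    return set(observed_args) <= set(framesets[frameset_id]["roles"])

# "Mary left her daughter-in-law her pearls in her will" realizes
# three arguments; only leave.02 ("give") licenses an Arg2.
args = ["Arg0", "Arg1", "Arg2"]
choices = [f for f in framesets if compatible(f, args)]
assert choices == ["leave.02"]
```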

  21. PropBank II – English/Chinese (100K) We still need relations between events and entities: • Event ID’s with event coreference • Selective sense tagging • Tagging nominalizations w/ WordNet sense • Grouped WN senses - selected verbs and nouns • Nominal Coreference • not names • Clausal Discourse connectives – selected subset Level of representation that reconciles many surface differences between the languages

  22. Event IDs – Parallel Prop II (1) • Aspectual verbs do not receive event IDs: • 今年/this year 中国/China 继续/continue 发挥/play 其/it 在/at 支持/support 外商/foreign business 投资/investment 企业/enterprise 方面/aspect 的/DE 主/main 渠道/channel 作用/role “This year, the Bank of China will continue to play the main role in supporting foreign-invested businesses.”

  23. Event IDs – Parallel Prop II (2) • Nominalized verbs do: • He will probably be extradited to the US for trial. done as part of sense-tagging (all 7 WN senses for “trial” are events.) • 随着/with 中国/China 经济/economy 的/DE 不断/continued 发展/development… “With the continued development of China’s economy…” The same events may be described by verbs in English and nouns in Chinese, or vice versa. Event IDs help to abstract away from POS tag

  24. Event reference – Parallel Prop II • Pronouns (overt or covert) that refer to events: [This] is gonna be a word of mouth kind of thing. 这些/these 成果/achievements 被/BEI 企业/enterprise 用/apply (e15) 到/to 生产/production 上/on 点石成金/spin gold from straw, *pro*-e15 大大/greatly 提高/improve 了/le 中国/China 镍/nickel 工业/industry 的/DE 生产/production 水平/level 。 “These achievements have been applied (e15) to production by enterprises to spin gold from straw, which-e15 greatly improved the production level of China’s nickel industry.” • Prerequisites: • pronoun classification • free trace annotation

  25. Chinese PB II: Sense tagging • Much lower polysemy than English • Avg of 3.5 (Chinese) vs. 16.7 (English) Dang, Chia, Chiou, Palmer, COLING-02 • More than 2 Framesets 62/4865 (250K) Ch vs. 294/3635 (1M) English • Mapping Grouped English senses to Chinese (English tagging - 93 verbs/168 nouns, 5000+ instances) • Selected 12 polysemous English words (7 verbs/5 nouns) • For 9 (6 verbs/3 nouns), grouped English senses map to unique Chinese translation sets (synonyms)

  26. Mapping of Grouped Sense Tags to Chinese raise – translations by group: • lift, elevate, orient upwards → 仰 / yang3 • increase → 提高 / ti2gao1 • collect, levy → 募集 / mu4ji2, 筹措 / chou2cuo4, 筹... / chou2… • invoke, elicit, set off → 提 / ti4

  27. Discourse connectives: The Penn Discourse TreeBank • WSJ corpus (~1M words, ~2400 texts) http://www.cis.upenn.edu/~pdtb Miltsakaki, Prasad, Joshi and Webber, LREC-04, NAACL-04 Frontiers Prasad, Miltsakaki, Joshi and Webber ACL-04 Discourse Annotation • Chinese: 10 explicit discourse connectives that include subordination conjunctions, coordinate conjunctions, and discourse adverbials. • Argument determination, sense disambiguation [arg1 学校/school 不/not 教/teach 理财/finance management], [conn结果/as a result] [arg2 报章/newspaper 上/on 的/DE 各/all 种/kind 专栏/column 就/then 成为/become 信息/information 的/DE 主要/main 来源/source]。 “The school does not teach finance management. As a result, the different kinds of columns become the main source of information.”

  28. Mapping of Grouped Sense Tags to Chinese • Zhejiang|浙江 zhe4jiang1 will|将 jiang1 raise|提高 ti2gao1 the level|水平 shui3ping2 of|的 de opening up|开放 kai1fang4 to|对 dui4 the outside world|外 wai4. (浙江将提高对外开放的水平。) • I|我 wo3 raised|仰 yang3 my|我的 wo3de head|头 tou2 in expectation|期望 qi1wang4. (我仰头望去。) • …, raising|筹措 chou2cuo4 funds|资金 zi1jin1 of|的 de 15 billion|150亿 yi1bai3wu3shi2yi4 yuan|元 yuan2 (…筹措资金150亿元。) • The meeting|会议 hui4yi4 passed|通过 tong1guo4 the “decision regarding motions”|议案 yi4an4 raised|提 ti4 by 32 NPC|人大 ren2da4 representatives|代表 dai4biao3 (会议通过了32名人大代表所提的议案。)

  29. NomBank Provides argument structure for 5000 common noun lemmas from the Penn Treebank II corpus. Borrows heavily from PropBank where possible (for example for nominalizations)

  30. NomBank Examples • Verb-Related • Powell’s/ARG0 meeting with Zhu Rongji/ARG1 • Adjective Related • The absence of patent lawyers/ARG1 in the court/ARG2 • Nominals (16 classes) • Her/ARG1 husband/ARG0 • An Oct. 1/ARG2 date for the attack/ARG1

  31. NomBank Annotation Example • According to [Rel_report.01 reports], [Arg1 sea [Rel_trial.01 trials] [Arg1 for [Arg1-CF_launch.01 a patrol boat] developed by Kazakhstan]] are being conducted and the [ArgM-MNR formal] [Rel_launch.01 launch] is planned for the [Rel_beginning.01 beginning [Arg1 of April this year]].

  32. Opinion Annotation I think people are happy because Chavez has fallen. • direct subjective — span: think; source: <writer, I> • attitude — span: think; type: positive arguing; intensity: medium; target span: people are happy because Chavez has fallen • direct subjective — span: are happy; source: <writer, I, people> • attitude — span: are happy; type: pos sentiment; intensity: medium; target span: Chavez has fallen • inferred attitude — span: are happy because Chavez has fallen; type: neg sentiment; intensity: medium; target span: Chavez
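The nesting in this annotation can be sketched as frames with attached attitudes and targets. Field names below loosely follow the MPQA scheme but are illustrative, and the `inferred` flag is an invented marker:

```python
# Illustrative nested structure for the opinion annotation on the slide.
annotation = {
    "direct_subjective": [
        {"span": "think", "source": ["writer", "I"],
         "attitudes": [
             {"span": "think", "type": "positive arguing",
              "intensity": "medium",
              "target": "people are happy because Chavez has fallen"},
         ]},
        {"span": "are happy", "source": ["writer", "I", "people"],
         "attitudes": [
             {"span": "are happy", "type": "pos sentiment",
              "intensity": "medium", "target": "Chavez has fallen"},
             {"span": "are happy because Chavez has fallen",
              "type": "neg sentiment", "intensity": "medium",
              "target": "Chavez", "inferred": True},  # invented flag
         ]},
    ]
}

# Collect every attitude type expressed in the sentence.
types = [a["type"]
         for ds in annotation["direct_subjective"]
         for a in ds["attitudes"]]
assert "neg sentiment" in types and len(types) == 3
```

The nesting matters: a single private state (here, people being happy) can carry both a direct positive sentiment and an inferred negative one toward a different target.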

  33. Motivating Example “I think people are happy because Chavez has fallen. But there’s also a feeling of uncertainty about how the country’s obvious problems are going to be solved,” said Ms. Ledesma. AAAI 2004

  34. Motivating Example Though some of them did not conceal their criticisms of Hugo Chavez [medium strength], the member countries of the Organization of American States condemned the coup [high strength] and recognized the legitimacy of the elected president [low strength].

  35. Private States and Subjective Expressions Private state: covering term for opinions, emotions, sentiments, attitudes, speculations, etc. (Quirk et al., 1985) Subjective Expressions: words and phrases that express private states (Banfield, 1982) “The US fears a spill-over,” said Xirao-Nima. “The report is full of absurdities,” he complained.

  36. Corpus of Opinion Annotations • Multi-perspective Question Answering (MPQA) Corpus • Sponsored by NRRC ARDA • Released November, 2003 • http://nrrc.mitre.org/NRRC/publications.htm • Detailed expression-level annotations of private states: strength • See Wilson and Wiebe (SIGdial 2003) Freely Available

  37. Penn Discourse Treebank (PDTB) • Annotate discourse connectives and their arguments • Discourse connectives take clauses as their arguments and express relations between clauses • i.e., relations between propositions, events, situations • Discourse connectives such as: and, or, but, because, since, while, when, however, instead, although, also, for example, then, so that, insofar as, nonetheless • Subordinate conjunctions, Coordinate conjunctions, Adverbial connectives, Implicit connectives • Because [Arg2 he was sick], [Arg1 John left early] • Since [Arg2 the store is closed], [Arg1 we’ll go home].

  38. The Problem Given a discourse connective, identify the heads of its two arguments. Example: After adjusting for inflation, the Commerce Department said spending didn’t change in September. • Connective: After • Arg2: adjusting for inflation (head: adjusting) • Arg1: spending didn’t change in September (head: change)

  39. Identifying Arguments in PDTB • Task • Identify lexicalized relations in Penn Discourse TreeBank (PDTB) • Identify head-words of arguments • Don’t identify relation type or non-lexicalized relations • Approach • Rank Arg1 & Arg2 candidate arguments separately • Apply MaxEnt statistical ranker • Re-rank top N argument pairs • Model both argument candidates jointly • Re-ranking reduces error 5-11% • Main Results: 74% accuracy at identifying both arguments correctly for a connective • Using gold-standard TreeBank parses
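The two-stage idea — score each argument candidate independently, then re-rank pairs jointly — can be sketched as follows. The scores are invented stand-ins for MaxEnt model outputs over the "After adjusting for inflation…" example, and the joint constraint shown (distinct heads) is just one example of what joint modeling buys:

```python
# Invented per-candidate scores for Arg1 and Arg2 head-words
# (stand-ins for MaxEnt ranker outputs).
arg1_scores = {"change": 0.6, "said": 0.3, "adjusting": 0.1}
arg2_scores = {"adjusting": 0.7, "change": 0.2, "said": 0.1}

def joint_score(a1, a2):
    """Joint re-ranking step: the two arguments need distinct heads."""
    if a1 == a2:
        return 0.0
    return arg1_scores[a1] * arg2_scores[a2]

pairs = sorted(((joint_score(a1, a2), a1, a2)
                for a1 in arg1_scores for a2 in arg2_scores),
               reverse=True)
best_score, best_arg1, best_arg2 = pairs[0]
assert (best_arg1, best_arg2) == ("change", "adjusting")
```

Modeling the pair jointly rules out degenerate outputs (both arguments landing on the same head) that independent rankers can produce, which is the error reduction the slide reports.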

  40. PDTB Examples • Coordinator: Choose 203 business executives, including, perhaps, someone from your own staff, and put them out on the streets, to be deprived for one month of their homes, families and income. • Subordinator: Drug makers shouldn’t be able to duck liability because people couldn’t identify precisely which identical drug was used. • Discourse Adverbial: France’s second-largest government-owned insurance company, Assurances Generales de France, has been building its own Navigation Mixte stake, currently thought to be between 8% and 10%. Analysts said they don’t think it is contemplating a takeover, however, and its officials couldn’t be reached.

  41. Motivation for time and event markup • Natural language is filled with references to past and future events, as well as planned activities and goals; • Without a robust ability to identify and temporally situate events of interest from language, the real importance of the information can be missed; • A robust annotation standard can help extract this information from natural language text.

  42. Temporal Awareness in Real Text • The bridge collapsed during the storm but after traffic was rerouted to the Bay Bridge. • President Roosevelt died in April 1945 before • the war ended. (event happened) • he dropped the bomb. (event didn’t happen) • The CEO plans to retire next month. • Last week Bill was running the marathon when he twisted his ankle. Someone had tripped him. He fell and didn't finish the race.

  43. Current Time Analysis Technology • Document Time Linking • Find the document creation time and link that to all events in the text; • Local Time Stamping • find an event and a “local temporal expression”, and link it to that time;
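Document time linking and local time stamping can be sketched together: every event is anchored to the document creation time (DCT) unless a nearby temporal expression resolves to something more specific. The resolver below handles only the expressions in the April 25, 2010 mine-disaster passage on the next slide and is purely illustrative:

```python
from datetime import date

dct = date(2010, 4, 25)  # document creation time; this date is a Sunday

# Events paired with the textual time expression nearest to them, if any.
events = [
    {"event": "paid tribute", "expr": "Sunday"},
    {"event": "ordered a review", "expr": "earlier this month"},
    {"event": "blast", "expr": None},  # no local expression: DCT link only
]

def resolve(expr, dct):
    """Toy resolver covering only the expressions in this passage."""
    if expr == "Sunday":
        return dct  # the DCT itself falls on a Sunday here
    if expr == "earlier this month":
        return (dct.replace(day=1), dct)  # some interval within the month
    return None

anchors = {e["event"]: resolve(e["expr"], dct) for e in events}
assert anchors["paid tribute"] == date(2010, 4, 25)
```

Events with no resolvable anchor (like "blast") are exactly the cases where document-time linking alone under-specifies the timeline, motivating the event-ordering machinery of the later slides.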

  44. Document Time Stamping April 25, 2010 • President Obama paid tribute Sunday to 29 workers killed in an explosion at a West Virginia coal mine earlier this month, saying they died "in pursuit of the American dream." The blast at the Upper Big Branch Mine was the worst U.S. mine disaster in nearly 40 years. Obama ordered a review earlier this month and blamed mine officials for lax regulation.

  47. Time Stamping: the good, bad, … • ☺ Set up a meeting on Tuesday with EMC. ✓ • ☺ Franklin arrives tomorrow from London. ✓ • ☹ Franklin arrives on the afternoon flight from London tomorrow. ✗ • ☹☹ Most people drive today while talking on the phone. ✗

  48. Temporal Awareness Challenge • Identification of all important events in a text • Actual temporal ordering and time anchoring of these events to temporal expressions.

  49. ISO-TimeML Enables Temporal Parsing • A new generation of language analysis tools that are able to temporally organize events in terms of their ordering and time of occurrence • These tools can be integrated with visualization, summarization, question answering, and link analysis systems to help analyze large event-rich information spaces.

  50. ISO-TimeML Provides elements to: • Find all events and times in newswire text • Link events to the document time and to local times • Order events relative to other events • Ensure consistency of the temporal relations
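The consistency requirement in the last bullet can be sketched as a cycle check over BEFORE links via transitive closure. This is a minimal sketch: ISO-TimeML defines a much richer relation inventory (INCLUDES, SIMULTANEOUS, and others), and "repaired" below is an invented event added for illustration:

```python
# Toy BEFORE graph from the bridge example: traffic was rerouted
# before the bridge collapsed; the collapse preceded the (invented)
# repairs.
links = {("rerouted", "collapsed"), ("collapsed", "repaired")}

def transitive_closure(pairs):
    """Add (a, d) whenever (a, b) and (b, d) are both present."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

def consistent(pairs):
    # A BEFORE graph is inconsistent if any event precedes itself.
    return all(a != b for (a, b) in transitive_closure(pairs))

assert consistent(links)
assert not consistent(links | {("repaired", "rerouted")})  # cycle
```

Checks of this kind are what let an annotation tool flag contradictory temporal links as soon as an annotator (or an automatic parser) introduces them.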
