
Using Syntax to Disambiguate Explicit Discourse Connectives in Text

Source: ACL-IJCNLP 2009. Authors: Emily Pitler and Ani Nenkova. Reporter: Yong-Xiang Chen.


Presentation Transcript


  1. Using Syntax to Disambiguate Explicit Discourse Connectives in Text
     Source: ACL-IJCNLP 2009
     Authors: Emily Pitler and Ani Nenkova
     Reporter: Yong-Xiang Chen

  2. Discourse connectives
     • Words or phrases that explicitly signal the presence of a discourse relation, such as:
       • once
       • since
       • on the contrary
     • Implicit relations
       • the discourse connective is absent and the relation is inferred by the reader
       • hard to identify automatically
     • Explicit relations are much easier to predict, but…

  3. Two types of ambiguity
     • Discourse vs. non-discourse usage
       • For example, “once” can be
         • a temporal discourse connective
         • simply a word meaning “formerly”
     • Some connectives are ambiguous in terms of the relation they mark
       • For example, “since” can serve as
         • a temporal connective
         • a causal connective

  4. Goal
     • Explore the predictive power of syntactic features for both disambiguation tasks

  5. Corpus and features
     • Corpus: Penn Discourse Treebank (PDTB)
       • Each discourse connective is assigned a sense from a three-level hierarchy of senses
       • Annotates 40,600 discourse relations (the largest public resource)
         • 18,459 Explicit relations, covering 100 explicit discourse connectives
         • 16,053 Implicit relations
         • Other relations
       • Annotators were allowed to provide two senses for a given connective

  6. Relation categories of discourse connectives in the PDTB
     • This work considers only the top-level categories
       • general enough to be annotated with high inter-annotator agreement
     • Expansion (progressive / explanatory relations)
       • one clause elaborates on information in the other
     • Comparison (coordinate relations)
       • information in the two clauses is compared or contrasted
     • Contingency (causal / conditional relations)
       • one clause expresses the cause of the other
     • Temporal (successive relations)
       • information in the two clauses is related because of its timing

  7. Syntactic features
     • Syntax has not been used for discourse vs. non-discourse disambiguation
     • Syntax has been used extensively for dividing sentences into elementary discourse units
     • Idea: discourse connectives appear in specific syntactic contexts
     • Four feature categories:
       • Self Category
       • Parent Category
       • Left Sibling Category
       • Right Sibling Category
     • [Figure: parse-tree diagram labeling the Parent, Left Sibling, Self, and Right Sibling nodes]

  8. Self Category
     • The highest node in the tree that dominates the words in the connective
     • For single-word connectives
       • this might correspond to the POS tag of the word
     • For multi-word connectives
       • Example cue phrase: “in addition”
       • Parsed as (PP (IN In) (NP (NN addition)))
       • Preposition + noun
       • The Self Category of the phrase is PP (prepositional phrase)

  9. Parent Category
     • The category of the immediate parent of the Self Category
     • Example: “My favorite colors are blue and green”
       • here “and” does not have a discourse function
       • the parent of “and” is an NP (“blue and green”)

  10. Left Sibling Category
      • The syntactic category of the sibling immediately to the left of the Self Category
      • If the left sibling does not exist, this feature takes the value “NONE”
        • in that case the Self Category is taken to have a discourse function
      • In the example above (“blue and green”), the left sibling of “and” is an NP
        • so “and” does not have a discourse function there

  11. Right Sibling Category
      • The syntactic category of the sibling immediately to the right of the Self Category
      • English is a right-branching language
        • the right sibling is often the dependent of the potential discourse connective
      • If the connective string has a discourse function
        • this dependent will often be a clause (SBAR)
      • Example:
        • “After I went to the store, I went home” (“After” is followed by a clause: discourse usage)
        • “After May, I will go on vacation” (“After” is followed by a noun phrase: non-discourse usage)
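The four categories from slides 8 to 11 can be read off a constituency parse. The sketch below is a minimal illustration and not the authors' code: the `Node` class, the `parse` and `syntactic_features` helpers, and the bracketed parses are all hypothetical and hand-written for this example. It assumes the connective is given as a token span and that the Self Category is the highest node covering exactly that span.

```python
class Node:
    """A constituency-tree node; `word` is set only on preterminals."""
    def __init__(self, label):
        self.label, self.children, self.word, self.parent = label, [], None, None

def index_spans(node, start=0):
    """Annotate each node with the (start, end) token span it covers."""
    if node.word is not None:
        node.span = (start, start + 1)
        return start + 1
    end = start
    for child in node.children:
        end = index_spans(child, end)
    node.span = (start, end)
    return end

def parse(bracketed):
    """Parse a Penn-Treebank-style bracketed string into Node objects."""
    tokens = bracketed.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0
    def read():
        nonlocal pos
        pos += 1                          # skip "("
        node = Node(tokens[pos])
        pos += 1
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                child = read()
                child.parent = node
                node.children.append(child)
            else:
                node.word = tokens[pos]
                pos += 1
        pos += 1                          # skip ")"
        return node
    root = read()
    index_spans(root)
    return root

def find_self(node, span):
    """Highest node whose leaves are exactly the connective tokens."""
    if node.span == span:
        return node
    for child in node.children:
        if child.span[0] <= span[0] and span[1] <= child.span[1]:
            return find_self(child, span)
    return None                           # connective is not a constituent

def syntactic_features(root, span):
    """Self, Parent, Left Sibling and Right Sibling categories for a span."""
    node = find_self(root, span)
    if node is None:
        return None
    siblings = node.parent.children if node.parent else [node]
    i = siblings.index(node)
    return {
        "Self": node.label,
        "Parent": node.parent.label if node.parent else "NONE",
        "LeftSibling": siblings[i - 1].label if i > 0 else "NONE",
        "RightSibling": siblings[i + 1].label if i + 1 < len(siblings) else "NONE",
    }

# Hand-written (assumed) parses of two example sentences from the slides.
after_clause = parse(
    "(S (SBAR (IN After) (S (NP (PRP I)) (VP (VBD went) (PP (TO to) "
    "(NP (DT the) (NN store)))))) (, ,) (NP (PRP I)) (VP (VBD went) (NP (NN home))))"
)
colors = parse(
    "(S (NP (PRP$ My) (JJ favorite) (NNS colors)) "
    "(VP (VBP are) (NP (NP (JJ blue)) (CC and) (NP (JJ green)))))"
)
after_features = syntactic_features(after_clause, (0, 1))   # "After"
and_features = syntactic_features(colors, (5, 6))           # "and"
```

With these hand-written parses, “After” has no left sibling and a clausal right sibling, while “and” sits inside an NP with NP siblings on both sides, matching the contrast the slides describe.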

  12. More features about the right sibling
      • Example:
        • “NASA won’t attempt a rescue; instead, it will try to predict whether any of the rubble will smash to the ground and where.”
        • Although the syntactic category of “where” is SBAR, “and” does not have a discourse function
      • So two additional features describe the contents of the right sibling:
        • Right Sibling Contains a VP
        • Right Sibling Contains a Trace
          • the trace in this example is a wh-trace
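The two extra right-sibling features can be sketched as simple searches over the right sibling's subtree. This is an illustration under stated assumptions: trees are nested tuples of the form `(label, child, ...)`, the helper names are hypothetical, the subtrees are hand-written simplifications, and any node labeled `-NONE-` (the Penn Treebank label for empty elements) is treated as a trace.

```python
def labels(tree):
    """Yield every constituent label in a nested-tuple tree."""
    label, *rest = tree
    yield label
    for child in rest:
        if isinstance(child, tuple):      # plain strings are words, not nodes
            yield from labels(child)

def contains_vp(tree):
    """True if any node is a VP (function tags such as VP-1 are ignored)."""
    return any(label.split("-")[0] == "VP" for label in labels(tree))

def contains_trace(tree):
    """True if the subtree has an empty element (assumption: -NONE- = trace)."""
    return "-NONE-" in labels(tree)

# Assumed, simplified right sibling of "and" in "... and where".
where_sbar = ("SBAR", ("WHADVP", ("WRB", "where")), ("S", ("-NONE-", "*T*")))

# Assumed right sibling of "After" in "After I went to the store, ...".
went_clause = ("S", ("NP", ("PRP", "I")), ("VP", ("VBD", "went")))

where_features = (contains_vp(where_sbar), contains_trace(where_sbar))
clause_features = (contains_vp(went_clause), contains_trace(went_clause))
```

Under these assumptions the bare “where” has a trace but no VP, whereas the full clause has a VP and no trace, which is the distinction the two features are meant to capture.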

  13. Discourse vs. non-discourse usage
      • Only 11 PDTB connectives appear as a discourse connective more than 90% of the time:
        • although, in turn, afterward, consequently, additionally, alternatively, whereas, on the contrary, if and when, lest, and on the one hand...on the other hand
      • By contrast, “or” serves a discourse function only 2.8% of the times it appears

  14. Training and testing
      • Positive examples:
        • explicit discourse connectives annotated in the PDTB
      • Negative examples:
        • the same strings in the PDTB texts that were not annotated as explicit connectives
      • Results are reported using a maximum entropy classifier
      • 2 sections (0 and 1) of the PDTB were used for development of the features
      • 21 sections (2-22) were used for ten-fold cross-validation
      • Baseline: the string of the connective as the only feature
        • F-score = 75.33%, accuracy = 85.86%
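To make the connective-string baseline and its two metrics concrete, here is a rough stand-in. It is not the maximum entropy classifier from the slides: it simply predicts, for each string, the label seen most often in training, and then computes accuracy and F-score for the "discourse" class. The function names and the toy data are invented for illustration and carry no empirical claim.

```python
from collections import Counter, defaultdict

def train_string_baseline(examples):
    """Map each connective string to its most frequent training label."""
    counts = defaultdict(Counter)
    for connective, is_discourse in examples:
        counts[connective][is_discourse] += 1
    return {c: labels.most_common(1)[0][0] for c, labels in counts.items()}

def evaluate(model, examples, default=False):
    """Return (accuracy, F-score) with 'discourse usage' as the positive class."""
    tp = fp = fn = correct = 0
    for connective, gold in examples:
        pred = model.get(connective, default)
        correct += pred == gold
        tp += pred and gold
        fp += pred and not gold
        fn += gold and not pred
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return correct / len(examples), f_score

# Invented toy data: (connective string, annotated as discourse connective?)
train = ([("although", True)] * 9 + [("although", False)]
         + [("or", False)] * 9 + [("or", True)]
         + [("and", True)] * 4 + [("and", False)] * 6)
test = [("although", True), ("or", False), ("and", True),
        ("and", False), ("or", True)]

model = train_string_baseline(train)
accuracy, f_score = evaluate(model, test)
```

On this toy split the baseline always labels “and” and “or” as non-discourse, so it misses every discourse use of those strings; syntactic context is what lets a classifier separate such cases.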

  15. Combinations of features
      • Different connectives have different syntactic contexts
        • so pair-wise interaction features are added between the connective and each syntactic feature
        • For example: connective=also-RightSibling=SBAR
      • Interaction terms are also added between pairs of syntactic features
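A minimal sketch of how such interaction features could be generated as strings, following the `connective=also-RightSibling=SBAR` naming shown on the slide. The function name and the example feature values are assumptions made for illustration.

```python
from itertools import combinations

def feature_strings(connective, syntax):
    """Build base features plus pair-wise interaction features."""
    conn = f"connective={connective}"
    syn = [f"{name}={value}" for name, value in syntax.items()]
    features = [conn] + syn
    # Connective-syntax interactions, e.g. connective=also-RightSibling=SBAR
    features += [f"{conn}-{s}" for s in syn]
    # Syntax-syntax interactions between every pair of syntactic features
    features += [f"{a}-{b}" for a, b in combinations(syn, 2)]
    return features

# Hypothetical syntactic context for one occurrence of "also".
example = feature_strings(
    "also",
    {"Self": "RB", "Parent": "S", "LeftSibling": "NONE", "RightSibling": "SBAR"},
)
```

With four syntactic features this yields 1 connective feature, 4 syntactic features, 4 connective-syntax pairs, and 6 syntax-syntax pairs.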

  16. Sense classification
      • A few connectives are quite ambiguous
        • “since” indicates either Temporal or Contingency
      • Contingency and Temporal are the senses most often annotated together
      • Classification among the four senses is done for each explicit relation
        • using a Naive Bayes classifier
      • The connectives most often doubly annotated are
        • when
        • and
        • as
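For readers unfamiliar with Naive Bayes, the sketch below shows a small multinomial variant with add-one smoothing over string features. It is a generic illustration and not the authors' setup: the class name, the feature strings, and the six training rows are invented and make no claim about the PDTB.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial Naive Bayes over string features, with additive smoothing."""
    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, rows, senses):
        self.class_counts = Counter(senses)
        self.feature_counts = defaultdict(Counter)   # sense -> feature -> count
        self.vocab = set()
        for features, sense in zip(rows, senses):
            for f in features:
                self.feature_counts[sense][f] += 1
                self.vocab.add(f)
        return self

    def predict(self, features):
        total = sum(self.class_counts.values())
        best_sense, best_score = None, -math.inf
        for sense, n in self.class_counts.items():
            counts = self.feature_counts[sense]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n / total)              # log prior
            for f in features:
                if f in self.vocab:                  # ignore unseen features
                    score += math.log((counts[f] + self.alpha) / denom)
            if score > best_score:
                best_sense, best_score = sense, score
        return best_sense

# Invented toy rows: (features, top-level sense).
data = [
    (["connective=since", "RightSibling=NP"], "Temporal"),
    (["connective=since", "RightSibling=NP"], "Temporal"),
    (["connective=since", "RightSibling=S"], "Contingency"),
    (["connective=because", "RightSibling=S"], "Contingency"),
    (["connective=but", "RightSibling=S"], "Comparison"),
    (["connective=also", "RightSibling=S"], "Expansion"),
]
clf = NaiveBayes().fit([f for f, _ in data], [s for _, s in data])
sense_a = clf.predict(["connective=since", "RightSibling=NP"])
sense_b = clf.predict(["connective=because", "RightSibling=S"])
```

The point of the toy data is only that the same connective string (“since”) can receive different senses once a second feature is taken into account.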

  17. Results
      • The human inter-annotator agreement on the top-level sense class was also 94%
        • suggesting that further improvements may not be possible

  18. Error analysis
      • Temporal relations are the least frequent of the four senses (19% of the explicit relations)
      • But more than half of the errors involve the Temporal class
      • The most commonly confused pairing was Contingency relations classified as Temporal relations
        • making up 29% of errors

  19. Conclusion
      • Using a few syntactic features leads to state-of-the-art accuracy for discourse vs. non-discourse usage classification
      • Syntactic features also help sense class identification
        • results are already at the level of human annotator agreement
