
Layering of Annotations in the Penn Discourse TreeBank (PDTB)



Presentation Transcript


  1. Layering of Annotations in the Penn Discourse TreeBank (PDTB)
     Rashmi Prasad
     Institute for Research in Cognitive Science, University of Pennsylvania
     Workshop on Treebanking, HLT/NAACL, Rochester

  2. Discourse Relations in the PDTB
     • Argument structure of Explicit/Implicit connectives (spans):
       • She hasn’t played any music since the earthquake hit.
       • “We asked police to investigate why they are allowed to distribute the flag in this way. Implicit=because It should be considered against the law,” said Danny Leish, a spokesman for the association.
     • Semantics (labels) of connectives: Causal, Temporal
     • Attribution (spans and four features (labels)):
       • Source=Writer (implicit), span=unmarked
       • Source=Other agent, span=marked
       • Three other attribution features:
         • Type: Assertion, Belief, Factive, Intention
         • Scopal Polarity: I don’t think X > I think NOT X
         • Determinacy: I might think X !> I think X
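The attribution labels on this slide can be pictured as a small record attached to a relation or argument. The following is only a sketch: the class and field names, the boolean encoding of scopal polarity and determinacy, the choice of Assertion for "said", and the exact extent of the attribution span are all assumptions made for illustration, not the PDTB's own file format.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Source(Enum):
    WRITER = "Writer"
    OTHER = "Other agent"


class AttributionType(Enum):
    ASSERTION = "Assertion"
    BELIEF = "Belief"
    FACTIVE = "Factive"
    INTENTION = "Intention"


@dataclass(frozen=True)
class Attribution:
    source: Source
    attr_type: AttributionType
    # Hypothetical encodings of the two remaining features:
    scopal_polarity: bool = False   # True if negation on the attribution scopes over the content
    indeterminate: bool = False     # True for cases like "I might think X"
    # Character offsets (start, end) of the attribution text; None when unmarked.
    span: Optional[Tuple[int, int]] = None


TEXT = (
    '"We asked police to investigate why they are allowed to distribute '
    'the flag in this way. It should be considered against the law," '
    "said Danny Leish, a spokesman for the association."
)

# Attribution to the writer is implicit, so no span is marked.
writer = Attribution(Source.WRITER, AttributionType.ASSERTION)

# Attribution to another agent carries a marked span (extent assumed here).
phrase = "said Danny Leish, a spokesman for the association"
start = TEXT.index(phrase)
other = Attribution(
    Source.OTHER, AttributionType.ASSERTION, span=(start, start + len(phrase))
)
```

Slicing `TEXT` with `other.span` returns the attribution phrase itself, which is the sense in which the span is "marked" for a non-writer source.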

  3. Layering with the PTB
     • Stand-off annotations of connective, argument and attribution spans:
       • Character offsets in the WSJ raw texts: generated during the annotation
       • Tree node addresses of constituents in PTB trees (constituent sets for spans not dominated by a single node and for discontinuous text spans): generated in post-annotation phase
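A minimal sketch of stand-off annotation by character offsets, using the "since" example from the previous slide. The `Span` class, the `find_span` helper, and the role names are hypothetical; the real PDTB files use their own format. The point is only that the raw text is never modified: each annotation is a pair of offsets into it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int  # inclusive character offset into the raw text
    end: int    # exclusive character offset

    def text(self, raw: str) -> str:
        return raw[self.start:self.end]


def find_span(raw: str, phrase: str) -> Span:
    """Locate the first occurrence of phrase and return its offsets."""
    start = raw.index(phrase)
    return Span(start, start + len(phrase))


RAW = "She hasn't played any music since the earthquake hit."

# The annotation lives apart from RAW and refers to it only by offsets.
relation = {
    "connective": find_span(RAW, "since"),
    "arg1": find_span(RAW, "She hasn't played any music"),
    "arg2": find_span(RAW, "the earthquake hit"),
}

# Resolving the offsets against the raw text recovers each span's string.
resolved = {role: span.text(RAW) for role, span in relation.items()}
```

Because the layers share only the raw text, a second layer such as tree node addresses can be added later without touching the first.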

  4. PTB Affecting PDTB Choices
     • Effect of PS vs. dependency annotation: none
     • Distinct POS marking of connectives in the PTB could have allowed for automatic identification of connectives:
       • "For example," (discourse connective): (PP (IN For) (NP (NN example)))
       • "For John," (not a discourse connective): (PP (IN For) (NP (NN John)))
     • Subordinating conjunctions marked as adverbs:
       • When: (WHADVP (WRB When))
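To make the point about POS marking concrete, the sketch below reads the part-of-speech tags out of the two bracketings (written here in standard PTB form) and compares them. The regular expression and the `pos_tags` helper are illustrative assumptions, not part of any PTB tooling; they handle only simple preterminals like `(IN For)`.

```python
import re

# Matches a preterminal node such as "(IN For)" or "(NN example )".
PRETERMINAL = re.compile(r"\(([^\s()]+)\s+([^\s()]+)\s*\)")


def pos_tags(bracketed: str) -> list:
    """Return the POS tags of the words in a bracketed tree, left to right."""
    return [tag for tag, _word in PRETERMINAL.findall(bracketed)]


connective = "(PP (IN For) (NP (NN example)))"
non_connective = "(PP (IN For) (NP (NN John)))"

# Both phrases receive the same tag sequence, so tags alone cannot
# separate the discourse connective from the ordinary prepositional phrase.
same_tags = pos_tags(connective) == pos_tags(non_connective)
```

With identical tag sequences, any automatic identification would have to fall back on the words themselves, which is the gap a distinct connective tag would have closed.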

  5. PTB Affecting PDTB Choices
     [Tree diagram: PTB parse of "When John was hired he says Sue had already left", with node labels S, SBAR-TMP, WHADVP-1, WRB, S, NP-SBJ, PRP, VP, VBZ, S]
     • Discourse relations occurring intra-sententially could have been marked in the underlying annotation if not constrained by certain syntactic choices:
       • Syntax incorrectly forces attribution to be the temporally modified element
       • Syntax assumption: All words/phrases must be connected in a tree!

  6. What else could be annotated
     • Attribution phrases: since they often lead to a mismatch with discourse arguments of connectives
       • When Max was hired, he says Sue had already left.
       • Representative list obtainable from PDTB. Directly observable during syntactic annotation.
     • Alternative Lexicalizations (AltLex): lexical realizations of discourse relations with non-connective expressions
       • Mary has been depressed lately. The reason: she failed
       • Representative list obtainable from PDTB. May involve some multi-sentence processing.

  7. Methodology and Quality Control
     • Choices made at more basic levels should make the task easier for discourse-level annotations.
     • Do some annotations at more basic levels if it prevents a reassessment of annotator choices/judgements.
     • Quality control can be done by checking existing annotations (or representative samples thereof)
       • Stand-off annotation: prevents incompatibilities in representation where unavoidable
       • Alignments with other layers to check for incompatibilities, e.g., attribution in PDTB and PTB
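One way to picture an alignment check between layers is to compare character spans from the two annotations of the same sentence. The sketch below uses the "When Max was hired" example; the span choices, helper names, and report strings are assumptions for illustration, not an actual PDTB or PTB procedure.

```python
SENTENCE = "When Max was hired, he says Sue had already left."


def span_of(phrase, text=SENTENCE):
    """Return (start, end) character offsets of phrase in text."""
    start = text.index(phrase)
    return (start, start + len(phrase))


def contains(outer, inner):
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def overlaps(a, b):
    return a[0] < b[1] and b[0] < a[1]


# Discourse layer (assumed): the argument excludes the attribution phrase.
discourse_arg = span_of("Sue had already left")
attribution = span_of("he says")
# Syntactic layer (assumed): the clause that the temporal clause modifies.
syntactic_clause = span_of("he says Sue had already left")


def check_alignment(discourse_arg, attribution, syntactic_clause):
    """Classify how the discourse argument lines up with the syntactic clause."""
    if discourse_arg == syntactic_clause:
        return "aligned"
    if (contains(syntactic_clause, discourse_arg)
            and contains(syntactic_clause, attribution)
            and not overlaps(discourse_arg, attribution)):
        return "mismatch: attribution inside syntactic clause"
    return "mismatch: other"


report = check_alignment(discourse_arg, attribution, syntactic_clause)
```

Running such a check over a representative sample would surface the cases where the syntactic layer attaches a modifier to the attribution rather than to the discourse argument.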
