Recognizing Implicit Discourse Relations in the Penn Discourse Treebank

Presentation Transcript


  1. Recognizing Implicit Discourse Relations in the Penn Discourse Treebank Ziheng Lin, Min-Yen Kan, and Hwee Tou Ng, Department of Computer Science, National University of Singapore, EMNLP 2009

  2. Outline • Recognizing implicit discourse relations • Related work • Penn Discourse Treebank • Model and features • Experiments • Discussion • Conclusion

  3. Implicit discourse relations • When there is a discourse connective between two text spans, it is often easy to recognize the relation between the spans. • because, but, so, ... • I am in Singapore, but I live in the United States. • It is difficult to recognize the discourse relation when there is no explicit textual cue. • The main conference is over Wednesday. I am staying for EMNLP.

  4. Background • Recently, the Penn Discourse Treebank (PDTB) has been released, featuring discourse-level annotation of both explicit and implicit relations. • The recent release of the second version of this corpus provides a cleaner and more thorough annotation of implicit relations. • This presents an opportunity to address this area of work.

  5. Related work • Generated implicit relation annotation • Marcu & Echihabi (2002) • Saito et al. (2006) • Pitler et al. (2009): implicit relation classification on the second version of the PDTB • Uses several linguistically informed features, such as word polarity, verb classes, and word pairs • Four-class classification (first level of relations)

  6. Penn Discourse Treebank • The PDTB is a discourse-level annotation over the one-million-word Wall Street Journal corpus. • Annotations for both explicit and implicit discourse relations are provided. • Explicit: In any case, the brokerage firms are clearly moving faster to create new ads than they did in the fall of 1987. But it remains to be seen whether their ads will be any more effective. • Implicit: “A lot of investor confidence comes from the fact that they can speak to us,” he says. [so] “To maintain that dialogue is absolutely crucial.”

  7. Methodology • The implicit relation classifier is built using supervised learning with a maximum entropy (MaxEnt) classifier. • Represent the annotated argument pairs as binary feature vectors suitable for training. • Four classes of features are used, as together they cover a wide range of information: • Contextual • Constituent parse • Dependency parse • Lexical
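As a rough illustration of this pipeline (not from the original slides), the sketch below vectorizes binary feature dictionaries and trains a multinomial logistic regression model, which is equivalent to MaxEnt; scikit-learn stands in for the OpenNLP MaxEnt package actually used, and all feature names and labels are invented.

```python
# Minimal sketch of the training pipeline: each argument pair becomes a
# binary feature dict, vectorized and fed to a maximum-entropy model.
# scikit-learn's LogisticRegression stands in for the OpenNLP MaxEnt
# package used in the paper; features and labels are invented.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

train = [
    ({"rule:S -> NP VP": 1, "wordpair:(fell, rose)": 1}, "Contrast"),
    ({"rule:NP -> PRP": 1, "ctx:prev=Cause": 1}, "Cause"),
    ({"wordpair:(call, wake)": 1}, "Restatement"),
]

vec = DictVectorizer()
X = vec.fit_transform([feats for feats, _ in train])
y = [label for _, label in train]

clf = LogisticRegression(max_iter=1000)  # MaxEnt = multinomial logistic regression
clf.fit(X, y)

print(clf.predict(vec.transform([{"wordpair:(fell, rose)": 1}])))
```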

  8. Contextual features • For each relation curr we want to classify, we use the previous relation prev and the next relation next as evidence to fire six binary features.
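The slide does not spell out the six features. As an illustration only, the sketch below assumes they encode argument sharing and embedding between curr and its neighbours, with argument spans given as character offsets; the specific feature definitions and names are hypothetical, not the paper's exact set.

```python
from dataclasses import dataclass

@dataclass
class Relation:
    arg1: tuple  # (start, end) character offsets of Arg1
    arg2: tuple  # (start, end) character offsets of Arg2

def embeds(outer, inner):
    """True if span `inner` lies fully inside span `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]

def contextual_features(prev, curr, nxt):
    """Fire up to six binary features from the neighbouring relations.
    The definitions below are assumptions, not the paper's feature set."""
    feats = {}
    if prev is not None:
        feats["prev.Arg2 embeds curr.Arg1"] = embeds(prev.arg2, curr.arg1)
        feats["curr.Arg1 embeds prev.Arg2"] = embeds(curr.arg1, prev.arg2)
        feats["prev shares Arg with curr"] = prev.arg2 == curr.arg1
    if nxt is not None:
        feats["next.Arg1 embeds curr.Arg2"] = embeds(nxt.arg1, curr.arg2)
        feats["curr.Arg2 embeds next.Arg1"] = embeds(curr.arg2, nxt.arg1)
        feats["next shares Arg with curr"] = curr.arg2 == nxt.arg1
    return feats
```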

  9. Constituent parse features • Build the syntactic tree and collect all production rules from the parse tree, with function tags (e.g. SBJ) and POS tags. • S -> NP VP, NP -> PRP, PRP -> “We”
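A small sketch of this step using NLTK: production rules, with function tags and POS tags retained, can be read directly off a bracketed Penn Treebank parse. The sentence and its parse here are invented for illustration.

```python
# Collect all production rules from a constituent parse, keeping the
# function tags (e.g. SBJ) and POS tags. Any parser that outputs Penn
# Treebank-style trees would do; this bracketed string is invented.
from nltk import Tree

tree = Tree.fromstring(
    "(S (NP-SBJ (PRP We)) (VP (VBD had) (NP (DT the) (NNS problems))))")

rules = {str(p) for p in tree.productions()}
for r in sorted(rules):
    print(r)
# e.g. S -> NP-SBJ VP
#      NP-SBJ -> PRP
#      PRP -> 'We'
```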

  10. Dependency parse features • Produce a dependency tree for each argument using the Stanford dependency parser. • The dependency trees encode additional information at the word level that is not explicitly present in the constituent trees. • “had” <- nsubj dobj • “problems” <- det nn advmod • “at” <- dep
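A sketch of the dependency-rule features: for each head word, gather the labels of all its outgoing dependencies, yielding rules like "had" <- nsubj dobj. The (head, label, dependent) triples below are hand-written stand-ins for Stanford parser output.

```python
# Build dependency-rule features of the form '"head" <- label1 label2 ...'.
# The triples are hand-written here; the paper obtains them from the
# Stanford dependency parser.
from collections import defaultdict

triples = [
    ("had", "nsubj", "We"),
    ("had", "dobj", "problems"),
    ("problems", "det", "the"),
    ("problems", "nn", "software"),
]

children = defaultdict(list)
for head, label, _dependent in triples:
    children[head].append(label)

dep_rules = {f'"{head}" <- {" ".join(labels)}'
             for head, labels in children.items()}
print(dep_rules)  # {'"had" <- nsubj dobj', '"problems" <- det nn'}
```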

  11. Lexical features • Word pairs extracted from the respective text spans are a good signal of the discourse relation between arguments. • Stemmed and collected all word pairs from Arg1 and Arg2, i.e., all (wi, wj) where wi is a word from Arg1 and wj is a word from Arg2.
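A minimal sketch of the word-pair extraction: stem both argument spans, then take the cross product of Arg1 and Arg2 tokens. The example arguments are taken from slide 3; the use of NLTK's Porter stemmer is an assumption.

```python
# Word-pair features: every (w_i, w_j) with w_i from Arg1 and w_j from
# Arg2, after lowercasing and stemming.
from itertools import product
from nltk.stem import PorterStemmer

stem = PorterStemmer().stem

arg1 = "The main conference is over Wednesday".split()
arg2 = "I am staying for EMNLP".split()

word_pairs = {(stem(a.lower()), stem(b.lower()))
              for a, b in product(arg1, arg2)}
print(len(word_pairs), "pairs, e.g.", sorted(word_pairs)[0])
```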

  12. Feature selection • Used a frequency cutoff of 5 to remove infrequent features. • 11,113 production rules. • 5,031 dependency rules. • 105,783 word pairs. • Rank the features with mutual information (MI).
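The two steps on this slide can be sketched as follows: discard features occurring fewer than 5 times, then rank the survivors by mutual information with the relation label. The function names and data layout are assumptions; only the cutoff and MI criterion come from the slide.

```python
# Feature selection sketch: frequency cutoff, then MI ranking.
# `instances` is a list of (feature_set, label) pairs.
import math
from collections import Counter

MIN_COUNT = 5

def mutual_information(fired, labels):
    """MI between a binary feature (fired[i] in {True, False}) and labels."""
    n = len(labels)
    joint = Counter(zip(fired, labels))
    px, py = Counter(fired), Counter(labels)
    return sum(nxy / n * math.log(nxy * n / (px[x] * py[y]))
               for (x, y), nxy in joint.items())

def select(instances):
    counts = Counter(f for feats, _ in instances for f in feats)
    kept = [f for f, c in counts.items() if c >= MIN_COUNT]
    labels = [y for _, y in instances]
    return sorted(kept,
                  key=lambda f: mutual_information(
                      [f in feats for feats, _ in instances], labels),
                  reverse=True)
```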

  13. Experiment setup • OpenNLP MaxEnt package • PDTB Sections 2-21 as the training set and Section 23 as the test set. • Baseline: the majority class Cause. • Accuracy of 26.1% on the test set.
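For reference, the majority-class baseline amounts to the trivial predictor sketched below; the function is illustrative, with the 26.1% figure coming from the slide rather than from this code.

```python
# Majority-class baseline: always predict the most frequent training
# label (Cause in PDTB Sections 2-21) and score accuracy on the test set.
from collections import Counter

def majority_baseline(train_labels, test_labels):
    majority, _ = Counter(train_labels).most_common(1)[0]
    acc = sum(y == majority for y in test_labels) / len(test_labels)
    return majority, acc
```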

  14. Efficacy of the different feature classes

  15. Performance for the 11 relations

  16. Why are implicit discourse relations difficult to recognize? • Ambiguity • Inference • Context • Possibly need to read the whole article and understand what was happening. • World knowledge • Sometimes need world knowledge to understand the arguments and hence to interpret the relation. • Implicit discourse relation classification needs deeper semantic representations, more robust system design, and access to more external knowledge.

  17. Ambiguity • Contrast and Conjunction in the PDTB annotation are very similar to each other. • In the third quarter, AMR said, net fell to $137 million, or $2.16 a share, from $150.3 million, or $2.50 a share. [while] Revenue rose 17% to $2.73 billion from $2.33 billion a year earlier. (Contrast) • Dow’s third quarter net fell to $589 million, or $3.29 a share, from $632 million, or $3.36 a share, a year ago. [while] Sales in the latest quarter rose 2% to $4.25 billion from $4.15 billion a year earlier. (Conjunction) • Sales surged 40% to 250.17 billion yen from 178.61 billion. [meanwhile] Net income rose 11% to 29.62 billion yen from 26.68 billion. (Conjunction; Contrast)

  18. Inference • Sometimes inference and a knowledge base are required to resolve the relation type. • “I had calls all night long from the States,” he said. [in fact] “I was woken up every hour – 1:30, 2:30, 3:30, 4:30.” (Restatement) • I had calls all night long -> I was woken up every hour.

  19. Conclusion • Implemented an implicit discourse relation classifier and showed initial results on the recently released Penn Discourse Treebank. • The features include the modeling of the context of relations, features extracted from constituent and dependency parse trees, and word pair features. • The classifier achieves an accuracy of 40.2%, a 14.1% absolute improvement over the baseline. • Conducted a data analysis and discussed four challenges that need to be addressed.
