
Toward Opinion Summarization: Linking the Sources

Presentation Transcript


  1. ACL 2006 Workshop on Sentiment and Subjectivity in Text. Toward Opinion Summarization: Linking the Sources. Veselin Stoyanov and Claire Cardie, Department of Computer Science, Cornell University, Ithaca, NY 14850, USA, {ves,cardie}@cs.cornell.edu. Advisor: Hsin-Hsi Chen. Speaker: Yong-Sheng Lo. Date: 2006/10/23

  2. Agenda • Introduction • Toward opinion summarization • Source coreference resolution • Data set • The method • Transformation • Standard noun phrase coreference resolution • Coreference resolution • By Ng and Cardie (2002) • Evaluation • Conclusion

  3. Introduction 1/4 • Problem of opinion summarization • Addressing the dearth of approaches for summarizing opinion information • Source coreference resolution • Deciding which source mentions (opinion holders) are associated with opinions that belong to the same real-world entity • Example (see next page) • Coreference resolution • Deciding which noun phrases in a text refer to the same real-world entities • e.g., 阿扁 (A-bian), 陳總統 (President Chen), and 中華民國陳總統 (ROC President Chen) all refer to 陳水扁 (Chen Shui-bian)

  4. Introduction 2/4 • Example (corpus of manually annotated opinions) • “[Target Delaying of Bulgaria’s accession to the EU] would be a serious mistake”, [Source Bulgarian Prime Minister Sergey Stanishev] said in an interview for the German daily Suddeutsche Zeitung. “[Target Our country] serves as a model and encourages countries from the region to follow despite the difficulties”, [Source he] added. [Target Bulgaria] is criticized by [Source the EU] because of slow reforms in the judiciary branch, the newspaper notes. Stanishev was elected prime minister in 2005. Since then, [Source he] has been a prominent supporter of [Target his country’s accession to the EU].

  5. Introduction 3/4

  6. Introduction 4/4 • Example (source coreference resolution) • “[Target Delaying of Bulgaria’s accession to the EU] would be a serious mistake”, [Source Bulgarian Prime Minister Sergey Stanishev] said in an interview for the German daily Suddeutsche Zeitung. “[Target Our country] serves as a model and encourages countries from the region to follow despite the difficulties”, [Source he] added. [Target Bulgaria] is criticized by [Source the EU] because of slow reforms in the judiciary branch, the newspaper notes. Stanishev was elected prime minister in 2005. Since then, [Source he] has been a prominent supporter of [Target his country’s accession to the EU].

  7. Data set 1/2 • MPQA corpus (Wilson and Wiebe, 2003) • Multi-Perspective Question Answering • Annotations developed using GATE • General Architecture for Text Engineering • Example (see next page) • 535 documents manually annotated with phrase-level opinion information • Collected over an 11-month period, between June 2001 and May 2002 • Relevant to the political, government, and commercial domains • Source coreference chains can be derived from the annotations • Contains no coreference information for general NPs (those that are not sources)

  8. Data set 2/2 • Example of annotations in GATE

  9. The method 1/10 • To solve source coreference resolution • Transformation • How can source coreference resolution (SCR) be transformed into standard noun phrase coreference resolution (NPCR)? • Differences between SCR and NPCR: • The sources of opinions do not exactly correspond to the automatic extractors’ notion of noun phrases (NPs) • The time-consuming nature of coreference annotation

  10. The method 2/10 • The general approach to SCR • Preprocessing • To obtain an augmented set of NPs in the text • As in Ng and Cardie (2002) • Running a tokenizer, sentence splitter, POS tagger, parser, base NP finder, and named entity finder (a minimal sketch follows below) • Source to noun phrase mapping • Three problems • Using a set of heuristics • Coreference resolution • Applying a state-of-the-art coreference resolution approach to the transformed data • “Improving Machine Learning Approaches to Coreference Resolution” [Ng and Cardie (2002)]
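The preprocessing step can be pictured with a short sketch. This is an illustration only: spaCy is used here as a stand-in for the actual Ng and Cardie (2002) pipeline, and the model name en_core_web_sm is an assumption, not something named in the slides.

```python
# A minimal sketch of preprocessing: extract an augmented set of candidate NPs
# (base noun chunks plus named entities) from raw text, using spaCy as a
# stand-in for the tokenizer / sentence splitter / POS tagger / parser /
# base NP finder / named entity finder pipeline described above.
import spacy

nlp = spacy.load("en_core_web_sm")  # assumption: this model is installed


def extract_candidate_nps(text):
    """Return sorted (start, end, text) spans for noun chunks and named entities."""
    doc = nlp(text)
    spans = {(np.start_char, np.end_char, np.text) for np in doc.noun_chunks}
    spans |= {(ent.start_char, ent.end_char, ent.text) for ent in doc.ents}
    return sorted(spans)


if __name__ == "__main__":
    sample = ("Bulgarian Prime Minister Sergey Stanishev said the delay "
              "would be a serious mistake.")
    for start, end, span_text in extract_candidate_nps(sample):
        print(start, end, span_text)
```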

  11. The method 3/10 • Three problems • Inexact span match • “Venezuelan people” vs. “the Venezuelan people” • “Muslims rulers” was not recognized, while “Muslims” and “rulers” were recognized by the NP extractor • Multiple NP match • “the country’s new president, Eduardo Duhalde” • “Latin American leaders at a summit meeting in Costa Rica” • “Britain, Canada and Australia” • No matching NP • “Carmona named new ministers, including two military officers who rebelled against Chavez” • “many”, “which”, and “domestically” • “lash” and “taskforce”

  12. The method 4/10 • Using a set of heuristics • Rule 1 • If a source matches any NP exactly in span, match that source to the NP; do this even if multiple NPs overlap the source • Example_1 • [determiner] “the Venezuelan people” • [NP extractor] “the Venezuelan people” • Example_2 • [determiner] “the country’s new president, Eduardo Duhalde” • [NP extractor] “the country’s new president”, “Eduardo Duhalde”

  13. The method 5/10 • Rule 2 • If no NP matches exactly in span, then: • If a single NP overlaps the source, • Then map the source to that NP • If multiple NPs overlap the source, • Then prefer three cases: • The outermost NP • Because longer NPs contain more information • The last NP • Because it is likely to be the head NP of a phrase • NPs before a preposition • Because a preposition signals an explanatory prepositional phrase

  14. The method 6/10 • Example • The outermost NP • [determiner] • “Prime Minister Sergey Stanishev” • [NP extractor] • “Bulgarian Prime Minister”, “Sergey Stanishev” • “Bulgarian Prime Minister Sergey Stanishev” • The last NP • [determiner] • “new president, Eduardo Duhalde” • [NP extractor] • “the country’s new president”, “Eduardo Duhalde” • NPs before a preposition • [determiner] • “Latin American leaders at a summit meeting in Costa Rica” • [NP extractor] • “Latin American leaders”, “summit meeting”, “Costa Rica”

  15. The method 7/10 • Rule 3 • If no NP overlaps the source, select the last NP before the source • Stanishev was elected prime minister in 2005. Since then, [Source he] has been a prominent supporter. • [determiner] => “he” • [NP extractor] • “Stanishev”, “prime minister”, “prominent supporter” • In half of the cases we are dealing with the word who, which typically refers to the last preceding NP • “Carmona named new ministers, including two military officers who rebelled against Chavez” • (a sketch combining Rules 1–3 follows below)
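A minimal sketch of how Rules 1–3 might be implemented over character spans. This is not the authors' code: the span representation, the PREPOSITIONS list, and the relative priority among the three Rule 2 preferences (the slide lists them without an explicit order) are all assumptions made for illustration.

```python
# Source-to-NP mapping heuristics (Rules 1-3), sketched over (start, end) spans.
PREPOSITIONS = {"at", "in", "of", "on", "for", "with", "by", "to"}


def overlaps(a, b):
    """True if the two (start, end) spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def map_source_to_np(source_span, nps, text):
    """Map an annotated source span to one NP span; nps are in document order."""
    # Rule 1: an exact span match wins, even if other NPs also overlap the source.
    if source_span in nps:
        return source_span
    overlapping = [np for np in nps if overlaps(np, source_span)]
    if overlapping:
        # Rule 2: a single overlapping NP is taken as-is; with several, prefer the
        # outermost NP, then an NP followed by a preposition, then the last one
        # (likely the head NP of the phrase).  The ordering here is an assumption.
        if len(overlapping) == 1:
            return overlapping[0]
        outermost = [np for np in overlapping
                     if not any(o != np and o[0] <= np[0] and np[1] <= o[1]
                                for o in overlapping)]
        if len(outermost) == 1:
            return outermost[0]
        for np in overlapping:
            next_word = text[np[1]:].lstrip().split(" ", 1)[0].lower().strip(",.")
            if next_word in PREPOSITIONS:
                return np
        return overlapping[-1]
    # Rule 3: no overlap at all; fall back to the last NP before the source
    # (this covers pronouns such as "who" and "he").
    preceding = [np for np in nps if np[1] <= source_span[0]]
    return preceding[-1] if preceding else None
```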

  16. The method 8/10 • Coreference resolution • Using the standard combination of classification and single-link clustering • Soon et al. (2001) and Ng and Cardie (2002) • Machine learning approach • Computing a vector of 57 features for every pair of source noun phrases from the preprocessed corpus • (source, NP) • Training • To predict whether a source NP pair should be classified as positive (the NPs refer to the same entity) or negative • Testing • To predict whether a source NP pair is positive • and single-link clustering to group together sources that belong to the same entity

  17. The method 9/10 • Example (single-link clustering) • Training (positive instances) • (source, NP) + feature set • (李登輝 [Lee Teng-hui], 李前總統 [former President Lee]) + 57 features • (李登輝 [Lee Teng-hui], 登輝先生 [Mr. Teng-hui]) + 57 features • (阿輝伯 [A-hui-bo], 登輝先生 [Mr. Teng-hui]) + 57 features • Testing • (李前總統, 登輝先生) => positive • (阿輝伯, 李前總統) => positive • 阿輝伯 -- 李前總統 -- 登輝先生 (all linked into one chain, referring to Lee Teng-hui)
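The classification-plus-clustering step can be sketched as follows. The union-find implementation and the hard-coded classify function are illustrative assumptions; in the actual system the positive/negative decisions come from the learned pairwise classifier.

```python
# Single-link clustering over pairwise coreference decisions, via union-find.
def single_link_clusters(n_sources, classify):
    """Group source mentions 0..n_sources-1 into chains; classify(i, j) -> bool."""
    parent = list(range(n_sources))

    def find(x):
        # Path-compressed union-find lookup.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    # Every pair the classifier marks positive links its two mentions into one chain.
    for j in range(n_sources):
        for i in range(j):
            if classify(i, j):
                union(i, j)

    clusters = {}
    for i in range(n_sources):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


# Toy run mirroring the example above: mentions 0, 1, 2 end up in one chain,
# mention 3 stays alone; the classifier here is a hard-coded stand-in.
print(single_link_clusters(4, lambda i, j: (i, j) in {(0, 1), (1, 2)}))
# -> [[0, 1, 2], [3]]
```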

  18. The method 10/10 • Machine learning techniques • Trying the reportedly best techniques for pairwise classification • RIPPER (Cohen, 1995) • Repeated Incremental Pruning to Produce Error Reduction • Using 24 different settings • SVMlight • Support Vector Machines • Using 56 different settings • Feature set • 57 = 12 + 41 + ?? • 12 by Soon et al. (2001) • 41 by Ng and Cardie (ACL 2002)
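How the pairwise classifier is trained can be pictured with a short sketch. scikit-learn's LinearSVC is used purely as a stand-in for SVMlight / RIPPER, and the random 57-dimensional vectors are placeholders for the real feature set; none of this is the authors' code or data.

```python
# Training a pairwise source-coreference classifier over 57-feature vectors.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X_train = rng.random((1000, 57))          # placeholder: one row per (source NP, NP) pair
y_train = rng.integers(0, 2, size=1000)   # placeholder labels: 1 = coreferent, 0 = not

clf = LinearSVC(C=1.0, max_iter=10000).fit(X_train, y_train)

X_test = rng.random((10, 57))
pairwise_decisions = clf.predict(X_test)  # these decisions feed single-link clustering
print(pairwise_decisions)
```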

  19. Feature set (12 features) • (NPi, NPj)

  20. Feature set (41 features)

  21. Feature set (41 features) cont.

  22. Evaluation • MPQA corpus (535 documents) • 400 documents for the training set (randomly selected) • 135 documents for the test set (the remainder) • The purpose of the evaluation • To create a strong baseline • Using the best settings found for NP coreference resolution

  23. Evaluation • Instance selection • Adopting the method of Soon et al. (2001) • Selects, for each NP, the pairs with the n preceding coreferent instances and all intervening non-coreferent pairs • Soon 1 (n=1) [Ng and Cardie (2002)] • Soon 2 (n=2) [Ng and Cardie (2002)] • None
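A minimal sketch of this Soon-style instance selection, assuming mentions and their gold coreference chain ids are already available; the data structures here are illustrative, not the original implementation.

```python
# Soon-style training-instance selection: for each anaphor, keep pairs with its
# n closest preceding coreferent mentions plus all intervening negative pairs.
def soon_instances(mentions, chain_id, n=1):
    """mentions: ids in document order; chain_id: mention id -> gold chain id.
    Returns (antecedent, anaphor, label) triples, label 1 = coreferent."""
    instances = []
    for j in range(len(mentions)):
        found = 0
        negatives = []
        # Walk backwards from the anaphor toward the start of the document.
        for i in range(j - 1, -1, -1):
            if chain_id[mentions[i]] == chain_id[mentions[j]]:
                instances.append((mentions[i], mentions[j], 1))
                instances.extend(negatives)   # intervening non-coreferent pairs
                negatives = []
                found += 1
                if found >= n:
                    break
            else:
                negatives.append((mentions[i], mentions[j], 0))
    return instances


# Toy usage: a and c corefer (chain 1), b and d corefer (chain 2).
print(soon_instances(["a", "b", "c", "d"], {"a": 1, "b": 2, "c": 1, "d": 2}, n=1))
```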

  24. Evaluation • Using performance measures for coreference resolution • B-CUBED (Bagga and Baldwin, 1998) • MUC score (Vilain et al., 1995) • Positive identification • Precision, recall, and F1 • Using these metrics on the identification of the positive class • Based on the pairwise decisions as the classifier outputs them • Example (see next page) • Actual positive identification • Precision, recall, and F1 • Using these metrics on the identification of the positive class • By performing the clustering of the source NPs and then considering a pairwise decision to be positive if the two source NPs belong to the same cluster • Example (see next page)

  25. (source) 陳水扁 [Chen Shui-bian] → (NP) 陳水扁總統 [President Chen Shui-bian] • (source) 馬英九 [Ma Ying-jeou] → (NP) 市長馬英九 [Mayor Ma Ying-jeou] • (source) 陳總統, 阿扁 [President Chen, A-bian] → (NP) 陳總統, 阿扁總統 [President Chen, President A-bian]
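How a cluster-level score such as B-CUBED is computed can be sketched briefly. This is a simplified illustration of the Bagga and Baldwin (1998) scheme written for this transcript, not the evaluation code used in the paper; it assumes both clusterings cover the same set of mentions.

```python
# B-CUBED scoring: per-mention precision/recall from the overlap of the
# predicted and gold clusters containing each mention, averaged over mentions.
def b_cubed(gold_clusters, pred_clusters):
    gold = {m: frozenset(c) for c in gold_clusters for m in c}
    pred = {m: frozenset(c) for c in pred_clusters for m in c}
    mentions = list(gold)
    precision = sum(len(gold[m] & pred[m]) / len(pred[m]) for m in mentions) / len(mentions)
    recall = sum(len(gold[m] & pred[m]) / len(gold[m]) for m in mentions) / len(mentions)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy usage: the gold chain {A, B, C} vs. a system that splits it into {A, B} and {C}:
# precision stays 1.0 while recall drops, lowering F1.
print(b_cubed([["A", "B", "C"]], [["A", "B"], ["C"]]))
```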

  26. Evaluation

  27. Evaluation

  28. Evaluation

  29. Evaluation

  30. Evaluation

  31. Conclusion • As a first step toward opinion summarization • To target the problem of source coreference resolution • To show that this problem can be tackled effectively as noun phrase coreference resolution • To create a baseline • Next step • To develop a method that utilizes the unlabeled NPs in the corpus via a structured rule learner
