Document-level Semantic Orientation and Argumentation

Presented by Marta Tatu

CS7301

March 15, 2005

Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews

Peter D. Turney
ACL-2002


Overview

  • Unsupervised learning algorithm for classifying reviews as recommended or not recommended

  • The classification is based on the semantic orientation of the phrases in the review that contain adjectives or adverbs

Algorithm

Input: review

  • Identify phrases that contain adjectives or adverbs by using a part-of-speech tagger

  • Estimate the semantic orientation of each phrase

  • Assign a class to the given review based on the average semantic orientation of its phrases

    Output: classification (recommended or not recommended)

Step 1

  • Apply Brill’s part-of-speech tagger to the review

  • Adjectives are good indicators of subjective sentences, but in isolation they can be ambiguous:

    • “unpredictable steering” (negative) / “unpredictable plot” (positive)

  • Extract two consecutive words: one is an adjective or adverb, the other provides the context (a sketch follows below)
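
A minimal sketch of this extraction step, not Turney’s exact implementation: NLTK’s off-the-shelf tagger stands in for Brill’s tagger, and the two patterns cover only part of the paper’s Table 1 (for instance, the constraints on the word following the pair are omitted).

```python
# Sketch of Step 1: extract two-word candidate phrases where the first word
# is an adjective or adverb and the second provides context. NLTK's
# perceptron tagger stands in for Brill's tagger; requires the 'punkt' and
# 'averaged_perceptron_tagger' data packages.
import nltk

ADJ = {"JJ", "JJR", "JJS"}
ADV = {"RB", "RBR", "RBS"}
NOUN = {"NN", "NNS"}

def extract_phrases(text):
    tagged = nltk.pos_tag(nltk.word_tokenize(text))
    phrases = []
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        # adjective + noun, or adverb + adjective (subset of Table 1)
        if (t1 in ADJ and t2 in NOUN) or (t1 in ADV and t2 in ADJ):
            phrases.append(f"{w1} {w2}")
    return phrases

# extract_phrases("This car has unpredictable steering.")
# -> ['unpredictable steering']
```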

Step 2

  • Estimate the semantic orientation of the extracted phrases using PMI-IR (Turney, 2001)

  • Pointwise Mutual Information (Church and Hanks, 1989):

    $\mathrm{PMI}(word_1, word_2) = \log_2 \frac{p(word_1 \,\&\, word_2)}{p(word_1)\, p(word_2)}$

  • Semantic Orientation:

    $\mathrm{SO}(phrase) = \mathrm{PMI}(phrase, \text{“excellent”}) - \mathrm{PMI}(phrase, \text{“poor”})$

  • PMI-IR estimates PMI by issuing queries to a search engine (AltaVista, ~350 million pages):

    $\mathrm{SO}(phrase) = \log_2 \frac{\mathrm{hits}(phrase\ \mathrm{NEAR}\ \text{“excellent”}) \cdot \mathrm{hits}(\text{“poor”})}{\mathrm{hits}(phrase\ \mathrm{NEAR}\ \text{“poor”}) \cdot \mathrm{hits}(\text{“excellent”})}$

Step 2 – continued

  • 0.01 was added to the hit counts to avoid division by zero

  • If both hits(phrase NEAR “excellent”) ≤ 4 and hits(phrase NEAR “poor”) ≤ 4, the phrase is eliminated

  • “AND (NOT host:epinions)” was added to the queries so that they exclude the Epinions website itself (the sketch below combines these adjustments)
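
A minimal sketch combining the SO formula with the adjustments above; the hit counts are passed in by the caller, since in the paper they came from AltaVista NEAR queries:

```python
# Sketch of Step 2: semantic orientation from search-engine hit counts,
# with the 0.01 smoothing and the <= 4 elimination rule described above.
import math

def semantic_orientation(hits_phrase_excellent, hits_phrase_poor,
                         hits_excellent, hits_poor):
    # Eliminate unreliable phrases: both NEAR counts at most 4.
    if hits_phrase_excellent <= 4 and hits_phrase_poor <= 4:
        return None
    # 0.01 is added to the phrase counts to avoid division by zero.
    return math.log2(((hits_phrase_excellent + 0.01) * hits_poor) /
                     ((hits_phrase_poor + 0.01) * hits_excellent))
```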

Step 3

  • Calculate the average semantic orientation of the phrases in the given review

  • If the average is positive, the review is classified as recommended

  • If the average is negative, the review is classified as not recommended
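
A sketch of this final step, reusing the values produced by the Step 2 sketch (None marks an eliminated phrase):

```python
# Sketch of Step 3: average the phrase orientations and assign a class.
def classify_review(orientations):
    scores = [so for so in orientations if so is not None]
    average = sum(scores) / len(scores) if scores else 0.0
    return "recommended" if average > 0 else "not recommended"
```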

Experiments

  • 410 reviews from Epinions

    • 170 (41%) not recommended

    • 240 (59%) recommended

    • Average phrases per review: 26

  • Baseline accuracy (always guessing “recommended”): 59%

  • Average accuracy of the algorithm: 74%, ranging from 66% on movie reviews to 84% on automobile reviews

Discussion

  • What makes movie reviews hard to classify?

    • The average SO tends to classify a recommended movie as not recommended

    • Evil characters make good movies

    • The whole is not necessarily the sum of the parts

  • Good beaches do not necessarily add up to a good vacation

  • But good automobile parts usually add up to a good automobile

Applications

  • Summary statistics for search engines

  • Summarization of reviews

    • Pick out the sentence with the highest positive/negative semantic orientation given a positive/negative review

  • Filtering “flames” for newsgroups

    • When the semantic orientation drops below a threshold, the message might be a potential flame

Questions?

  • Comments?

  • Observations?

Thumbs up? Sentiment Classification using Machine Learning Techniques

Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan
EMNLP-2002


Overview

  • Consider the problem of classifying documents by overall sentiment

  • Three machine learning methods are compared against baselines built from human-generated lists of words:

    • Naïve Bayes

    • Maximum Entropy

    • Support Vector Machines

Experimental Data

  • Movie-review domain

  • Source: Internet Movie Database (IMDb)

  • Star or numerical ratings were converted into positive, negative, or neutral labels, so there is no need to hand-label the data for training or testing

  • Maximum of 20 reviews/author/sentiment category

    • 752 negative reviews

    • 1301 positive reviews

    • 144 reviewers

List of Words Baseline

  • Maybe there are certain words that people tend to use to express strong sentiments

  • Classification is done by counting the number of positive and negative words in the document (see the sketch below)

  • Random-choice baseline: 50%
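
A minimal sketch of the counting baseline; the two word lists below are illustrative placeholders, not the human-generated lists evaluated in the paper:

```python
# Sketch of the word-list baseline: count positive and negative cue words.
POSITIVE = {"love", "wonderful", "best", "great", "superb", "beautiful"}
NEGATIVE = {"bad", "worst", "stupid", "waste", "boring", "terrible"}

def baseline_classify(tokens):
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    if pos == neg:
        return "tie"
    return "positive" if pos > neg else "negative"
```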

Machine Learning Methods

  • Bag-of-features framework:

    • {f1,…,fm} is a predefined set of m features

    • ni(d) = number of times fi occurs in document d

  • Naïve Bayes:

    $P_{\mathrm{NB}}(c \mid d) = \frac{P(c)\, \prod_{i=1}^{m} P(f_i \mid c)^{n_i(d)}}{P(d)}$

Machine Learning Methods – continued

  • Maximum Entropy:

    $P_{\mathrm{ME}}(c \mid d) = \frac{1}{Z(d)} \exp\Big(\sum_i \lambda_{i,c}\, F_{i,c}(d, c)\Big)$

    where Fi,c is a feature/class function:

    $F_{i,c}(d, c') = \begin{cases} 1 & \text{if } n_i(d) > 0 \text{ and } c' = c \\ 0 & \text{otherwise} \end{cases}$

  • Support vector machines: find the hyperplane that maximizes the margin. The solution of the constrained optimization problem has the form $\vec{w} = \sum_j \alpha_j c_j \vec{d}_j$ with $\alpha_j \ge 0$

    • cj ∈ {1, −1} is the correct class of document dj
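
The three learners can be sketched on unigram-presence features with scikit-learn as a modern stand-in (the paper used its own Naïve Bayes and maximum entropy implementations and Joachims’ SVM^light):

```python
# Sketch: the paper's three classifiers over unigram-presence features.
# LogisticRegression is scikit-learn's maximum entropy classifier.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

docs = ["a wonderful , superb movie", "a boring waste of film"]  # toy data
labels = ["positive", "negative"]

vectorizer = CountVectorizer(binary=True)  # presence, not frequency
X = vectorizer.fit_transform(docs)

for clf in (MultinomialNB(), LogisticRegression(), LinearSVC()):
    clf.fit(X, labels)
    print(type(clf).__name__,
          clf.predict(vectorizer.transform(["a superb movie"])))
```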

Evaluation

  • 700 positive-sentiment and 700 negative-sentiment documents

  • 3 equal-sized folds

  • The tag “NOT_” was added to every word between a negation word (“not”, “isn’t”, “didn’t”) and the first punctuation mark that follows it (see the sketch after this list)

    • so that “good” is kept distinct from the oppositely oriented “good” in “not very good”

  • Features:

    • 16,165 unigrams appearing at least 4 times in the 1400-document corpus

    • 16,165 most often occurring bigrams in the same data
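
The negation tagging described above might look like the following sketch (the negation-word and punctuation lists are illustrative, not the paper’s exact lists):

```python
# Sketch of the negation preprocessing: prefix "NOT_" to every token
# between a negation word and the next punctuation mark.
import re

NEGATION_WORDS = {"not", "isn't", "didn't", "no", "never"}  # illustrative
PUNCTUATION = re.compile(r"^[.,:;!?]+$")

def tag_negation(tokens):
    tagged, negating = [], False
    for token in tokens:
        if PUNCTUATION.match(token):
            negating = False
            tagged.append(token)
        elif negating:
            tagged.append("NOT_" + token)
        else:
            tagged.append(token)
            if token.lower() in NEGATION_WORDS:
                negating = True
    return tagged

# tag_negation("it is not very good .".split())
# -> ['it', 'is', 'not', 'NOT_very', 'NOT_good', '.']
```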

Results

  • POS information was added to differentiate between “I love this movie” and “This is a love story”

Conclusion

  • Results produced by the machine learning techniques are better than the human-generated baselines

    • SVMs tend to do the best

    • Unigram presence information is the most effective

  • Frequency vs. presence: in “thwarted expectation” reviews, many words are indicative of the opposite sentiment to that of the entire review

  • Some form of discourse analysis is necessary

Questions?

  • Comments?

  • Observations?

Summarizing Scientific Articles: Experiments with Relevance and Rhetorical Status

Simone Teufel and Marc Moens
Computational Linguistics, 2002


Overview

  • Summarization of scientific articles: restore the discourse context of extracted material by adding the rhetorical status of each sentence in the document

  • Gold standard data for summaries: computational linguistics articles annotated with the rhetorical status and relevance of each sentence

  • Supervised learning algorithm which classifies sentences into 7 rhetorical categories

Why?

  • Knowledge about the rhetorical status of a sentence enables tailoring summaries to the user’s expertise and task

    • Nonexpert summary: background information and the general purpose of the paper

    • Expert summary: no background, instead differences between this approach and similar ones

  • Contrasts or complementarity among articles can be expressed

Rhetorical Status

  • Generalizations about the nature of scientific texts provide the information needed to construct better summaries:

  • Problem structure: problems (research goals), solutions (methods), and results

  • Intellectual attribution: what the new contribution is, as opposed to previous work and background (generally accepted statements)

  • Scientific argumentation

  • Attitude toward other people’s work: rival approach, prior approach with a fault, or an approach contributing parts of the authors’ own solution

Metadiscourse and Agentivity

  • Metadiscourse is an aspect of scientific argumentation and a way of expressing attitude toward previous work

    • “we argue that”, “in contrast to common belief, we”

  • Agent roles in argumentation: rivals, contributors of part of the solution (they), the entire research community, or the authors of the paper (we)

Citations and Relatedness

  • Just knowing that an article cites another is often not enough

  • One needs to read the context of the citation to understand the relation between the articles

    • Article cited negatively or contrastively

    • Article cited positively or in which the authors state that their own work originates from the cited work

Rhetorical Annotation Scheme

  • Only one category is assigned to each full sentence

  • Nonoverlapping, nonhierarchical scheme with seven categories: AIM, TEXTUAL, OWN, BACKGROUND, CONTRAST, BASIS, and OTHER

  • The rhetorical status is determined on the basis of the global context of the paper

Relevance

  • Select important content from text

  • Highly subjective, hence low human agreement

  • Here, a sentence is considered relevant if it describes the research goal or states a difference from a rival approach

  • Other definitions: a sentence is relevant if it shows a high degree of similarity to a sentence in the abstract

Corpus

  • 80 conference articles

    • Association for Computational Linguistics (ACL)

    • European Chapter of the Association for Computational Linguistics (EACL)

    • Applied Natural Language Processing (ANLP)

    • International Joint Conference on Artificial Intelligence (IJCAI)

    • International Conference on Computational Linguistics (COLING)

  • XML markup was added

The Gold Standard

  • 3 task-trained annotators

  • 17 pages of guidelines

  • 20 hours of training

  • No communication between annotators

  • Evaluation measures of the annotation:

    • Stability

    • Reproducibility

Results of Annotation

  • Kappa coefficient K (Siegel and Castellan, 1988):

    $K = \frac{P(A) - P(E)}{1 - P(E)}$

    where P(A) = pairwise agreement and P(E) = random agreement

  • Stability: K = .82, .81, .76 (N = 1,220, k = 2)

  • Reproducibility: K = .71
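
A sketch of the kappa computation for two annotators, with chance agreement P(E) estimated from the pooled category distribution as in Siegel and Castellan:

```python
# Sketch: kappa for two annotators, K = (P(A) - P(E)) / (1 - P(E)).
from collections import Counter

def kappa(labels_a, labels_b):
    n = len(labels_a)
    p_agree = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from the pooled category distribution.
    pooled = Counter(labels_a) + Counter(labels_b)
    p_chance = sum((count / (2 * n)) ** 2 for count in pooled.values())
    return (p_agree - p_chance) / (1 - p_chance)

# kappa(["AIM", "OWN", "OWN"], ["AIM", "OWN", "BASIS"]) -> 0.45...
```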

The System

  • Supervised machine learning: a Naïve Bayes classifier over the features described on the following slides

Features

  • Absolute location of a sentence

    • Limitations of the author’s own method can be expected to be found toward the end, while limitations of other researchers’ work are discussed in the introduction

Features – continued

  • Section structure: relative and absolute position of sentence within section:

    • First, last, second or third, second-last or third-last, or somewhere in the first, second, or last third of the section

  • Paragraph structure: relative position of sentence within a paragraph

    • Initial, medial, or final

Features – continued

  • Headlines: type of headline of current section

    • Introduction, Implementation, Example, Conclusion, Result, Evaluation, Solution, Experiment, Discussion, Method, Problems, Related Work, Data, Further Work, Problem Statement, or Non-Prototypical

  • Sentence length

    • Longer or shorter than 12 words (threshold)

Features – continued

  • Title word contents: does the sentence contain words also occurring in the title?

  • TF*IDF word contents

    • High values to words that occur frequently in one document, but rarely in the overall collection of documents

    • Do any of the 18 highest-scoring TF*IDF words appear in the sentence?

  • Verb syntax: voice, tense, and modal linguistic features

Features – continued

  • Citation

    • Citation (self), citation (other), author name, or none + location of the citation in the sentence (beginning, middle, or end)

  • History: most probable previous category

    • AIM tends to follow CONTRAST

    • Calculated as a second-pass process during training

Features – continued

  • Formulaic expressions: list of phrases described by regular expressions, divided into 18 classes, comprising a total of 644 patterns

    • Clustering prevents data sparseness
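
Matching such pattern classes can be sketched with regular expressions; the class names and patterns below are illustrative stand-ins, not entries from the paper’s 644-pattern list:

```python
# Sketch: match formulaic expressions against regex classes.
import re

FORMULAIC = {  # illustrative classes and patterns
    "GAP": [re.compile(r"\bto (our|my) knowledge\b", re.I),
            re.compile(r"\bas far as we know\b", re.I)],
    "CONTINUATION": [re.compile(r"\bin (our|my) (previous|earlier) (work|paper)\b", re.I)],
}

def formulaic_class(sentence):
    for cls, patterns in FORMULAIC.items():
        if any(p.search(sentence) for p in patterns):
            return cls
    return "NONE"
```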

Features – continued

  • Agent: 13 types, 167 patterns

    • The placeholder WORK_NOUN can be replaced by a set of 37 nouns including theory, method, prototype, algorithm

    • Agent classes with a distribution very similar to the overall distribution of target categories were excluded

Features – continued

  • Action: 365 verbs clustered into 20 classes based on semantic concepts such as similarity, contrast

    • PRESENTATION_ACTIONs: present, report, state

    • RESEARCH_ACTIONs: analyze, conduct, define, and observe

    • Negation is considered
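
Putting the features together, the system’s Naïve Bayes classifier over categorical feature values can be sketched as follows (the add-one smoothing is an assumption here, not a detail from the paper):

```python
# Sketch: Naïve Bayes over categorical sentence features (location,
# headline type, citation, agent, action, ...). Each sentence is a dict
# mapping feature name -> value; labels are the rhetorical categories.
import math
from collections import Counter, defaultdict

def train(feature_dicts, labels):
    prior = Counter(labels)
    cond = defaultdict(Counter)  # (feature, category) -> value counts
    for feats, cat in zip(feature_dicts, labels):
        for name, value in feats.items():
            cond[(name, cat)][value] += 1
    return prior, cond

def classify(feats, prior, cond):
    def log_score(cat):
        score = math.log(prior[cat])
        for name, value in feats.items():
            counts = cond[(name, cat)]
            # Add-one smoothing over the observed values (assumption).
            score += math.log((counts[value] + 1) /
                              (sum(counts.values()) + len(counts) + 1))
        return score
    return max(prior, key=log_score)
```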

System Evaluation

  • 10-fold cross-validation

Feature Impact

  • The most distinctive single feature is Location, followed by SegAgent, Citations, Headlines, Agent, and Formulaic

Questions?

  • Comments?

  • Observations?

Thank You!