807 - TEXT ANALYTICS

Presentation Transcript


  1. 807 - TEXT ANALYTICS. Massimo Poesio. Lecture 4: Sentiment analysis (aka Opinion Mining)

  2. FACTS AND OPINIONS • Two main types of textual information on the Web: FACTS and OPINIONS • Current search engines search for facts (and assume they are true) • Facts can be expressed with topic keywords.

  3. THERE ARE PLENTY OF OPINIONS ON THE WEB

  4. SENTIMENT ANALYSIS (also known as opinion mining) Attempts to identify the opinion/sentiment that a person may hold towards an object

  5. Components of an opinion • Basic components of an opinion: • Opinion holder: the person or organization that holds a specific opinion on a particular object. • Object: the entity on which an opinion is expressed. • Opinion: a view, attitude, or appraisal on an object from an opinion holder.

  6. SENTIMENT ANALYSIS GRANULARITY • At the document (or review) level: • Task: sentiment classification of reviews • Classes: positive, negative, and neutral • Assumption: each document (or review) focuses on a single object (not true in many discussion posts) and contains opinion from a single opinion holder.

  7. DOCUMENT-LEVEL SENTIMENT ANALYSIS EXAMPLE

  8. SENTIMENT ANALYSIS GRANULARITY • At the document (or review) level: • Task: sentiment classification of reviews • Classes: positive, negative, and neutral • Assumption: each document (or review) focuses on a single object (not true in many discussion posts) and contains opinion from a single opinion holder. • At the sentence level: • Task 1: identifying subjective/opinionated sentences • Classes: objective and subjective (opinionated) • Task 2: sentiment classification of sentences • Classes: positive, negative and neutral. • Assumption: a sentence contains only one opinion; not true in many cases. • Then we can also consider clauses or phrases.

  9. SENTENCE-LEVEL SENTIMENT ANALYSIS EXAMPLE Id: Abc123 on 5-1-2008 “I bought an iPhone a few days ago. It is such a nice phone. The touch screen is really cool. The voice quality is clear too. It is much better than my old Blackberry, which was a terrible phone and so difficult to type with its tiny keys. However, my mother was mad with me as I did not tell her before I bought the phone. She also thought the phone was too expensive, …”

  10. SENTENCE-LEVEL SENTIMENT ANALYSIS (same review text as slide 9)

  11. SENTENCE-LEVEL SENTIMENT ANALYSIS (same review text as slide 9)

  12. SENTIMENT ANALYSIS GRANULARITY • At the feature level: • Task 1: Identify and extract object features that have been commented on by an opinion holder (e.g., a reviewer). • Task 2: Determine whether the opinions on the features are positive, negative or neutral. • Task 3: Group feature synonyms. • Produce a feature-based opinion summary of multiple reviews.

  13. SENTIMENT ANALYSIS GRANULARITY • At the feature level: • Task 1: Identify and extract object features that have been commented on by an opinion holder (e.g., a reviewer). • Task 2: Determine whether the opinions on the features are positive, negative or neutral. • Task 3: Group feature synonyms. • Produce a feature-based opinion summary of multiple reviews. • Opinion holders: identifying holders is also useful, e.g., in news articles, but in user-generated content they are usually known, i.e., the authors of the posts.
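
The last two feature-level steps, grouping feature synonyms and producing a feature-based summary, can be sketched in a few lines of Python. This is a minimal illustration, assuming Tasks 1 and 2 have already produced (feature, polarity) pairs; the synonym table `FEATURE_SYNONYMS` and the function `summarize` are hypothetical, not part of any published system.

```python
from collections import defaultdict

# Hypothetical synonym table for Task 3: surface feature name -> canonical feature.
FEATURE_SYNONYMS = {
    "picture": "picture quality",
    "photo": "picture quality",
    "picture quality": "picture quality",
    "battery": "battery life",
    "battery life": "battery life",
}


def summarize(opinions, synonyms=FEATURE_SYNONYMS):
    """Count positive/negative/neutral opinions per canonical feature.

    `opinions` is assumed to be the output of Tasks 1 and 2: a list of
    (feature, polarity) pairs with polarity in {"positive", "negative", "neutral"}.
    """
    summary = defaultdict(lambda: {"positive": 0, "negative": 0, "neutral": 0})
    for feature, polarity in opinions:
        key = feature.lower()
        canonical = synonyms.get(key, key)  # Task 3: group feature synonyms
        summary[canonical][polarity] += 1
    return dict(summary)


if __name__ == "__main__":
    extracted = [
        ("picture", "positive"),
        ("photo", "positive"),
        ("battery", "negative"),
        ("picture quality", "negative"),
    ]
    for feature, counts in summarize(extracted).items():
        print(feature, counts)
```

Features missing from the table are kept under their own lower-cased name, so the summary never silently drops an extracted feature.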

  14. FEATURE-LEVEL SENTIMENT ANALYSIS

  15. ENTITY AND ASPECT (Hu and Liu, 2004; Liu, 2006)

  16. OPINION TARGET

  17. A DEFINITION OF OPINION (Liu, Ch. in NLP handbook, 2010)

  18. SENTIMENT ANALYSIS: THE TASK

  19. Applications • Businesses and organizations: • product and service benchmarking. • market intelligence. • Businesses spend a huge amount of money to find consumer sentiments and opinions. • Consultants, surveys, focus groups, etc. • Individuals: interested in others’ opinions when • purchasing a product or using a service, • finding opinions on political topics • Ad placement: placing ads in user-generated content • Place an ad when one praises a product. • Place an ad from a competitor if one criticizes a product. • Opinion retrieval/search: providing general search for opinions.

  20. DOCUMENT-LEVEL SENTIMENT ANALYSIS

  21. DOCUMENT-LEVEL SENTIMENT ANALYSIS

  22. DOCUMENT-LEVEL SENTIMENT ANALYSIS = TEXT CLASSIFICATION

  23. ASSUMPTIONS AND GOALS

  24. LEXICON-BASED APPROACHES • Use sentiment and subjectivity lexicons • Rule-based classifier • A sentence is subjective if it has at least two words in the lexicon • A sentence is objective otherwise
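
The rule-based classifier on this slide can be sketched as follows. The lexicon here is a tiny hypothetical word list picked from the iPhone review of slide 9; a real system would load a full resource such as the OpinionFinder lexicon listed on slide 35. The default `threshold=2` encodes the slide's "at least two words" rule, and repeated lexicon words are counted each time they occur.

```python
import re

# Tiny hypothetical lexicon, taken from words in the iPhone review of slide 9.
SUBJECTIVITY_LEXICON = {"nice", "cool", "clear", "better", "terrible", "difficult", "mad", "expensive"}


def is_subjective(sentence, lexicon=SUBJECTIVITY_LEXICON, threshold=2):
    """Slide rule: subjective if at least `threshold` lexicon words occur, else objective."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    hits = sum(1 for token in tokens if token in lexicon)  # counts repeated words too
    return hits >= threshold


if __name__ == "__main__":
    for s in [
        "I bought an iPhone a few days ago.",
        "It is such a nice phone.",
        "It is much better than my old Blackberry, which was a terrible phone.",
    ]:
        print("subjective" if is_subjective(s) else "objective", "|", s)
```

Only the third sentence reaches two lexicon words. "It is such a nice phone." contains just one and is therefore labelled objective, which shows how coarse a fixed threshold can be.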

  25. SUPERVISED CLASSIFICATION • Treat sentiment analysis as a type of classification • Use corpora annotated for subjectivity and/or sentiment • Train machine learning algorithms: • Naïve Bayes • Decision trees • SVM • … • Learn to automatically annotate new text
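
As a concrete instance of the first learner in the list, here is a from-scratch multinomial Naïve Bayes sketch with add-one (Laplace) smoothing. The class name `NaiveBayes` and the four training "reviews" are hypothetical toy material; in practice one would use a library implementation trained on an annotated corpus.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = set().union(*self.word_counts.values())
        doc_counts = Counter(labels)
        self.log_prior = {c: math.log(doc_counts[c] / len(labels)) for c in self.classes}
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        scores = {}
        for c in self.classes:
            denom = self.totals[c] + len(self.vocab)
            score = self.log_prior[c]
            for token in tokenize(doc):
                if token in self.vocab:  # words unseen in training are skipped
                    score += math.log((self.word_counts[c][token] + 1) / denom)
            scores[c] = score
        return max(scores, key=scores.get)


if __name__ == "__main__":
    train_docs = [
        "great phone, love the screen",
        "terrible battery, awful keys",
        "love it, great value",
        "awful service, terrible phone",
    ]
    train_labels = ["positive", "negative", "positive", "negative"]
    model = NaiveBayes().fit(train_docs, train_labels)
    print(model.predict("great screen"), model.predict("awful keys"))
```

Scores are summed in log space to avoid numerical underflow on long documents.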

  26. TYPICAL SUPERVISED APPROACH

  27. FEATURES FOR SUPERVISED DOCUMENT-LEVEL SENTIMENT ANALYSIS • A large set of features has been tried by researchers (see, e.g., work here at Essex by Roseline Antai) • Term frequency and different IR weighting schemes, as in other work on classification • Part-of-speech (POS) tags • Opinion words and phrases • Negations • Syntactic dependency
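
Three of these feature types can be sketched without external tools: term frequency, opinion-word counts, and negation. For negation, the sketch uses a common heuristic from the literature: every token between a negation word and the next punctuation mark gets a `NOT_` prefix, so "not good" and "good" become different features. The word lists and feature names (`__opinion_words__`, `__negations__`) are hypothetical; POS and dependency features would additionally need a tagger and a parser.

```python
import re
from collections import Counter

NEGATIONS = {"not", "no", "never", "isn't", "don't", "doesn't", "didn't", "wasn't", "can't"}
PUNCTUATION = set(".,!?;:")
# Hypothetical opinion-word list, used only for the aggregate count feature.
OPINION_WORDS = {"good", "bad", "great", "terrible", "cool", "nice"}


def mark_negation(text):
    """Prefix NOT_ to each token between a negation word and the next punctuation mark."""
    tokens = re.findall(r"[a-z']+|[.,!?;:]", text.lower())
    marked, negated = [], False
    for token in tokens:
        if token in PUNCTUATION:
            negated = False
            marked.append(token)
        elif token in NEGATIONS:
            negated = True
            marked.append(token)
        else:
            marked.append("NOT_" + token if negated else token)
    return marked


def extract_features(text):
    """Term-frequency features over negation-marked tokens, plus two count features."""
    tokens = [t for t in mark_negation(text) if t not in PUNCTUATION]
    features = Counter(tokens)
    features["__opinion_words__"] = sum(1 for t in tokens if t in OPINION_WORDS)
    features["__negations__"] = sum(1 for t in tokens if t in NEGATIONS)
    return features


if __name__ == "__main__":
    print(extract_features("The screen is not good, but the keys are great."))
```

Here `NOT_good` is kept separate from `good` and is not counted as an opinion word; IR weighting such as tf-idf could be applied on top of these raw counts.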

  28. EASIER AND HARDER PROBLEMS • Tweets from Twitter are probably the easiest • short and thus usually straight to the point • Reviews are next • entities are given (almost) and there is little noise • Discussions, comments, and blogs are hard • multiple entities, comparisons, noise, sarcasm, etc.

  29. ASPECT-BASED SENTIMENT ANALYSIS • Sentiment classification at the document or sentence (or clause) level is useful, but it does not find what people liked and disliked. • It does not identify the targets of opinions, i.e., ENTITIES and their ASPECTS • Without knowing targets, opinions are of limited use.

  30. ASPECT-BASED SENTIMENT ANALYSIS • Much of the research is based on online reviews • For reviews, aspect-based sentiment analysis is easier because the entity (i.e., product name) is usually known • Reviewers simply express positive and negative opinions on different aspects of the entity. • For blogs, forum discussions, etc., it is harder: • both the entity and its aspects are unknown • there may also be many comparisons • and there is also a lot of irrelevant information.

  31. BRIEF DIGRESSION • Regular opinions: sentiment/opinion expressions on some target entities • Direct opinions: “The touch screen is really cool” • Indirect opinions: “After taking the drug, my pain has gone” • COMPARATIVE opinions: comparisons of more than one entity • “iPhone is better than Blackberry”

  32. Find entities (entity set expansion) • Although similar, this is somewhat different from traditional named entity recognition (NER) (see next lectures) • E.g., one wants to study opinions on phones • given Motorola and Nokia, find all phone brands and models in a corpus, e.g., Samsung, Moto, …
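
One simple way to approach entity set expansion, assumed here for illustration and not necessarily the method the lecture has in mind, is distributional similarity: words that occur in the same contexts as the seed entities are ranked highest. The sketch builds context-count vectors from a window of one word on each side and ranks candidates by cosine similarity to the summed seed vectors; the toy corpus and the functions `context_vectors`, `cosine` and `expand` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences, window=1):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    vectors[token][tokens[j]] += 1
    return vectors


def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def expand(seeds, sentences, top_k=3):
    """Rank non-seed words by similarity to the centroid of the seed context vectors."""
    vectors = context_vectors(sentences)
    centroid = Counter()
    for seed in seeds:
        centroid.update(vectors[seed])
    scores = {w: cosine(vec, centroid) for w, vec in vectors.items() if w not in seeds}
    return sorted(scores, key=lambda w: -scores[w])[:top_k]


if __name__ == "__main__":
    corpus = [
        "I bought a Nokia phone",
        "I bought a Motorola phone",
        "I bought a Samsung phone",
        "The battery drains fast",
    ]
    print(expand(["nokia", "motorola"], corpus))
```

On this toy corpus "samsung" ranks first, but "bought" comes second because it shares the context word "a" with the seeds. This is why practical systems restrict candidates, for example to proper nouns, rather than ranking every word.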

  33. Feature/Aspect extraction • May extract frequent nouns and noun phrases • Sometimes limited to a set known to be related to the entity of interest or using part discriminators • e.g., for a scanner entity “scanner”, “scanner has” • opinion and target relations • Proximity or syntactic dependency • Standard IE methods • Rule-based or supervised learning • Often HMMs or CRFs (like standard IE)
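
The first bullet, extracting frequent nouns and noun phrases, can be sketched as follows. To stay self-contained, the sketch assumes sentences that are already POS-tagged with Penn Treebank tags and approximates noun phrases by maximal runs of `NN`/`NNS` tokens. The support threshold `min_support` and the example sentences are hypothetical.

```python
from collections import Counter

NOUN_TAGS = {"NN", "NNS"}


def candidate_aspects(tagged_sentence):
    """Nouns plus maximal noun runs (a crude stand-in for noun phrases) in one sentence."""
    candidates, run = set(), []
    for word, tag in list(tagged_sentence) + [("", "END")]:  # sentinel flushes the last run
        if tag in NOUN_TAGS:
            run.append(word.lower())
        else:
            if run:
                candidates.add(" ".join(run))
                candidates.update(run)
            run = []
    return candidates


def frequent_aspects(tagged_sentences, min_support=2):
    """Keep candidates that occur in at least `min_support` sentences."""
    support = Counter()
    for sentence in tagged_sentences:
        support.update(candidate_aspects(sentence))  # a set, so each sentence counts once
    return {aspect: n for aspect, n in support.items() if n >= min_support}


if __name__ == "__main__":
    sentences = [
        [("The", "DT"), ("touch", "NN"), ("screen", "NN"), ("is", "VBZ"), ("really", "RB"), ("cool", "JJ")],
        [("The", "DT"), ("voice", "NN"), ("quality", "NN"), ("is", "VBZ"), ("clear", "JJ")],
        [("I", "PRP"), ("love", "VBP"), ("the", "DT"), ("touch", "NN"), ("screen", "NN")],
        [("The", "DT"), ("screen", "NN"), ("scratches", "VBZ"), ("easily", "RB")],
    ]
    print(frequent_aspects(sentences))
```

With a threshold of two sentences the toy input yields "screen" and "touch screen", but also the fragment "touch", which is why practical systems add pruning steps on top of raw frequency.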

  34. Aspect extraction using dependency grammar
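
Following the "opinion and target relations: proximity or syntactic dependency" bullet above, dependency-based aspect extraction can be illustrated with two rules: if a known opinion word modifies a noun (`amod`) or has a noun as its subject (`nsubj`, as in Universal Dependencies analyses of copular sentences), that noun is taken as an opinion target. This is a minimal sketch, assuming parses are already available as (head, relation, dependent) triples; the opinion-word list and the function `extract_targets` are hypothetical, and the two rules show a single step of such methods rather than a complete algorithm.

```python
# Hypothetical opinion lexicon; in practice this would come from a sentiment lexicon.
OPINION_WORDS = {"cool", "clear", "nice", "terrible", "expensive"}


def extract_targets(parse, opinion_words=OPINION_WORDS):
    """Return (aspect, opinion) pairs from (head, relation, dependent) triples.

    Two illustrative rules over Universal Dependencies-style relations:
      amod:  noun --amod--> opinion word    ("a nice phone")
      nsubj: opinion word --nsubj--> noun   ("the screen is cool")
    """
    pairs = set()
    for head, relation, dependent in parse:
        if relation == "amod" and dependent in opinion_words:
            pairs.add((head, dependent))
        elif relation == "nsubj" and head in opinion_words:
            pairs.add((dependent, head))
    return pairs


if __name__ == "__main__":
    # "The touch screen is really cool."
    parse1 = [("cool", "nsubj", "screen"), ("screen", "compound", "touch"),
              ("screen", "det", "the"), ("cool", "cop", "is"), ("cool", "advmod", "really")]
    # "It is such a nice phone."
    parse2 = [("phone", "nsubj", "it"), ("phone", "cop", "is"),
              ("phone", "det", "a"), ("phone", "amod", "nice")]
    print(extract_targets(parse1), extract_targets(parse2))
```

The sketch does not check that the dependent is a noun, so "it is cool" would return the pronoun "it"; it also ignores the `compound` relation, so "touch screen" is reduced to "screen". A real system would filter by part of speech and merge compounds.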

  35. RESOURCES FOR SENTIMENT ANALYSIS • Annotated corpora • Used in statistical approaches (Hu & Liu 2004, Pang & Lee 2004) • MPQA corpus (Wiebe et al., 2005) • Tools • Algorithm based on minimum cuts (Pang & Lee, 2004) • OpinionFinder (Wiebe et al., 2005) • Lexicons • General Inquirer (Stone et al., 1966) • OpinionFinder lexicon (Wiebe & Riloff, 2005) • SentiWordNet (Esuli & Sebastiani, 2006)

  36. Lexical resources for Sentiment and Subjectivity Analysis Overview

  37. Sentiment (or opinion) lexica

  38. Sentiment lexica

  39. Sentiment-bearing words ICWSM 2008 • Adjectives (Hatzivassiloglou & McKeown 1997, Wiebe 2000, Kamps & Marx 2002, Andreevskaia & Bergler 2006) • positive: honest important mature large patient • Ron Paul is the only honest man in Washington. • Kitchell’s writing is unbelievably mature and is only likely to get better. • To humour me my patient father agrees yet again to my choice of film

  40. Negative adjectives • Adjectives • negative: harmful hypocritical inefficient insecure • It was a macabre and hypocritical circus. • Why are they being so inefficient?

  41. Subjective adjectives • Adjectives • Subjective (but not positive or negative sentiment): curious, peculiar, odd, likely, probable • He spoke of Sue as his probable successor. • The two species are likely to flower at different times.

  42. Other words • Other parts of speech (Turney & Littman 2003; Riloff, Wiebe & Wilson 2003; Esuli & Sebastiani 2006) • Verbs • positive: praise, love • negative: blame, criticize • subjective: predict • Nouns • positive: pleasure, enjoyment • negative: pain, criticism • subjective: prediction, feeling

  43. Phrases • Phrases containing adjectives and adverbs (Turney 2002; Takamura, Inui & Okumura 2007) • positive: high intelligence, low cost • negative: little variation, many troubles

  44. Creating sentiment lexica • Humans • Semi-automatic • Fully automatic

  45. (Semi) Automatic creation of sentiment lexica • Find relevant words, phrases, patterns that can be used to express subjectivity • Determine the polarity of subjective expressions
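
For the second step, determining polarity, a common dictionary-based idea is to start from a few seed words of known polarity and propagate it through a thesaurus: synonyms inherit a word's polarity and antonyms receive the opposite one. The sketch below uses a tiny hypothetical synonym/antonym table standing in for a resource such as WordNet, with polarity encoded as +1 or -1; `propagate` is an illustrative name.

```python
from collections import deque

# Toy stand-in for a thesaurus such as WordNet (hypothetical entries, one direction only).
SYNONYMS = {"good": ["fine", "nice"], "nice": ["pleasant"], "bad": ["poor"], "poor": ["inferior"]}
ANTONYMS = {"good": ["bad"], "pleasant": ["unpleasant"]}


def propagate(seeds, synonyms=SYNONYMS, antonyms=ANTONYMS):
    """Breadth-first expansion from seed words with polarity +1 or -1."""
    polarity = dict(seeds)
    queue = deque(seeds)
    while queue:
        word = queue.popleft()
        neighbours = [(w, polarity[word]) for w in synonyms.get(word, [])]    # same polarity
        neighbours += [(w, -polarity[word]) for w in antonyms.get(word, [])]  # opposite polarity
        for w, p in neighbours:
            if w not in polarity:  # first assignment wins; conflicts are ignored
                polarity[w] = p
                queue.append(w)
    return polarity


if __name__ == "__main__":
    print(propagate({"good": 1}))
```

Because the toy table lists each relation in one direction only, the result depends on which words are chosen as seeds. A real thesaurus would be traversed in both directions, and conflicting assignments would need a better policy than "first one wins".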

  46. FINDING POLARITY IN CORPORA USING PATTERNS
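
A well-known corpus-pattern method is that of Hatzivassiloglou & McKeown (1997), cited on slide 39. It rests on the observation that adjectives conjoined by "and" tend to share polarity, while "but" tends to join adjectives of opposite polarity. Their method learned these same/different links statistically and then clustered the adjectives; the sketch below replaces both steps with simple propagation from seed words, so it is a simplification, not their algorithm. The toy sentences, adjective set and function names are hypothetical.

```python
import re
from collections import defaultdict, deque


def conjunction_links(sentences, adjectives):
    """Collect (adj1, adj2, same_polarity) links from 'ADJ and ADJ' / 'ADJ but ADJ'."""
    links = []
    for sentence in sentences:
        # The lookahead lets one adjective take part in two links, e.g. "nice and cheap but flimsy".
        for a, conj, b in re.findall(r"\b(\w+) (and|but) (?=(\w+)\b)", sentence.lower()):
            if a in adjectives and b in adjectives:
                links.append((a, b, conj == "and"))
    return links


def propagate_polarity(links, seeds):
    """Spread +1/-1 from seed adjectives: 'and' keeps the sign, 'but' flips it."""
    graph = defaultdict(list)
    for a, b, same in links:
        graph[a].append((b, same))
        graph[b].append((a, same))
    polarity = dict(seeds)
    queue = deque(seeds)
    while queue:
        word = queue.popleft()
        for other, same in graph[word]:
            if other not in polarity:  # first assignment wins
                polarity[other] = polarity[word] if same else -polarity[word]
                queue.append(other)
    return polarity


if __name__ == "__main__":
    sentences = ["The phone is nice and cheap but flimsy.", "A simple and elegant design.", "Elegant but expensive."]
    adjectives = {"nice", "cheap", "flimsy", "simple", "elegant", "expensive"}
    print(propagate_polarity(conjunction_links(sentences, adjectives), {"nice": 1, "simple": 1}))
```

Adjectives that are never linked to a seed, directly or indirectly, receive no polarity at all.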

  47. USING PATTERNS • Lexico-syntactic patterns (Riloff & Wiebe 2003) • way with <np>: … to ever let China use force to have its way with … • expense of <np>: at the expense of the world’s security and stability • underlined <dobj>: Jiang’s subdued tone … underlined his desire to avoid disputes …
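
One way to decide which patterns indicate subjectivity, roughly in the spirit of Riloff & Wiebe (2003), is to count how often each pattern fires in sentences labelled subjective versus objective and to keep the patterns that are both frequent and reliable. In this sketch the syntactic patterns are crudely approximated by surface regular expressions. The thresholds `min_freq` and `min_prob`, the function `select_patterns` and the labelled sentences are all invented for illustration.

```python
import re

# Surface regexes standing in for syntactic extraction patterns; this is a
# simplification, since the original patterns are matched over parsed clauses.
PATTERNS = {
    "way with <np>": r"\bway with\b",
    "expense of <np>": r"\bexpense of\b",
    "underlined <dobj>": r"\bunderlined\b",
}


def select_patterns(labelled_sentences, patterns=PATTERNS, min_freq=2, min_prob=0.6):
    """Keep patterns seen at least `min_freq` times whose estimated
    P(subjective | pattern) is at least `min_prob`."""
    selected = {}
    for name, regex in patterns.items():
        labels = [subj for sentence, subj in labelled_sentences
                  if re.search(regex, sentence, re.IGNORECASE)]
        if len(labels) >= min_freq:
            prob = sum(labels) / len(labels)  # fraction of matching sentences that are subjective
            if prob >= min_prob:
                selected[name] = prob
    return selected


if __name__ == "__main__":
    # Invented toy sentences labelled True (subjective) or False (objective).
    data = [
        ("They always get their way with the regulators.", True),
        ("He wants to have his way with everyone.", True),
        ("Growth came at the expense of stability.", True),
        ("The hotel was booked at the expense of the company.", False),
        ("Each expense of the trip was recorded.", False),
        ("The report underlined his desire to avoid disputes.", True),
    ]
    print(select_patterns(data))
```

On this toy data only "way with <np>" survives: "expense of <np>" fires mostly in objective sentences, and "underlined <dobj>" is too rare to meet the frequency threshold.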

  48. DICTIONARY-BASED METHODS

  49. SEMI-SUPERVISED LEARNING (Esuli and Sebastiani, 2005)

  50. Corpora for Sentiment and Subjectivity Analysis Overview
