807 - TEXT ANALYTICS

Massimo Poesio
Lecture 4: Sentiment analysis (aka Opinion Mining)

FACTS AND OPINIONS
  • Two main types of textual information on the Web: FACTS and OPINIONS
  • Current search engines search for facts (assume they are true)
    • Facts can be expressed with topic keywords.
SENTIMENT ANALYSIS

(also known as opinion mining)

Attempts to identify the opinion/sentiment that a person may hold towards an object

Components of an opinion
  • Basic components of an opinion:
    • Opinion holder: The person or organization that holds a specific opinion on a particular object.
    • Object: the entity on which an opinion is expressed
    • Opinion: a view, attitude, or appraisal on an object from an opinion holder.
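A minimal way to make these components concrete is a small record type. The sketch below is purely illustrative: the class name, field names, and the polarity field are assumptions for this example, not part of the lecture.

```python
from dataclasses import dataclass

@dataclass
class Opinion:
    """Basic components of an opinion, as listed above."""
    holder: str    # opinion holder: the person or organization holding the opinion
    target: str    # object: the entity on which the opinion is expressed
    view: str      # the opinion itself: a view, attitude, or appraisal
    polarity: str  # e.g. "positive", "negative", or "neutral"

# Hypothetical instance based on the iPhone review used later in the lecture
o = Opinion(holder="Abc123",
            target="iPhone (touch screen)",
            view="The touch screen is really cool.",
            polarity="positive")
print(o)
```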
SENTIMENT ANALYSIS GRANULARITY
  • At the document (or review) level:
    • Task: sentiment classification of reviews
    • Classes: positive, negative, and neutral
    • Assumption: each document (or review) focuses on a single object (not true in many discussion posts) and contains opinion from a single opinion holder.
  • At the sentence level:
    • Task 1: identifying subjective/opinionated sentences
      • Classes: objective and subjective (opinionated)
    • Task 2: sentiment classification of sentences
      • Classes: positive, negative and neutral.
      • Assumption: a sentence contains only one opinion; not true in many cases.
      • In such cases we can also work at the clause or phrase level.
SENTENCE-LEVEL SENTIMENT ANALYSIS EXAMPLE

Id: Abc123 on 5-1-2008 “I bought an iPhone a few days ago. It is such a nice phone. The touch screen is really cool. The voice quality is clear too.

It is much better than my old Blackberry, which was a terrible phone and so difficult to type with its tiny keys. However, my mother was mad with me as I did not tell her before I bought the phone. She also thought the phone was too expensive, …”

SENTIMENT ANALYSIS GRANULARITY
  • At the feature level:
    • Task 1: Identify and extract object features that have been commented on by an opinion holder (e.g., a reviewer).
    • Task 2: Determine whether the opinions on the features are positive, negative or neutral.
    • Task 3: Group feature synonyms.
      • Produce a feature-based opinion summary of multiple reviews.
  • Opinion holders: identifying the holder is also useful (e.g., in news articles), but in user-generated content the holders are usually known, i.e., the authors of the posts.
Applications
  • Businesses and organizations:
    • product and service benchmarking.
    • market intelligence.
    • Businesses spend large amounts of money to find out consumer sentiments and opinions.
      • Consultants, surveys, focus groups, etc.
  • Individuals: interested in others’ opinions when
    • purchasing a product or using a service,
    • finding opinions on political topics
  • Ad placement: placing ads in user-generated content
    • Place an ad when one praises a product.
    • Place an ad from a competitor if one criticizes a product.
  • Opinion retrieval/search: providing general search for opinions.
LEXICON-BASED APPROACHES
  • Use sentiment and subjectivity lexicons
  • Rule-based classifier
    • A sentence is subjective if it has at least two words in the lexicon
    • A sentence is objective otherwise
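A minimal sketch of this rule-based classifier is shown below. The toy lexicon is an assumption made for the example; a real system would use a resource such as the OpinionFinder lexicon or SentiWordNet (listed later in the lecture).

```python
# Tiny illustrative subjectivity lexicon (not a real resource)
SUBJECTIVITY_LEXICON = {"nice", "cool", "terrible", "mad", "expensive",
                        "love", "hate", "great", "awful", "difficult"}

def is_subjective(sentence, lexicon=SUBJECTIVITY_LEXICON, threshold=2):
    """Rule: a sentence is subjective if it contains at least `threshold`
    lexicon words; otherwise it is objective."""
    tokens = sentence.lower().split()
    hits = sum(1 for t in tokens if t.strip(".,!?") in lexicon)
    return hits >= threshold

print(is_subjective("It is such a nice phone and the screen is really cool."))  # True
print(is_subjective("I bought an iPhone a few days ago."))                      # False
```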
SUPERVISED CLASSIFICATION
  • Treat sentiment analysis as a type of classification
  • Use corpora annotated for subjectivity and/or sentiment
  • Train machine learning algorithms:
    • Naïve Bayes
    • Decision trees
    • SVM
  • Learn to automatically annotate new text
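A minimal sketch of this supervised setup, assuming scikit-learn and a toy labelled corpus; in practice the model would be trained on an annotated resource such as the review data of Pang & Lee rather than the four invented sentences below.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy annotated corpus (stand-in for a real sentiment-annotated dataset)
texts = ["The touch screen is really cool.",
         "The voice quality is clear too.",
         "It was a terrible phone.",
         "So difficult to type with its tiny keys."]
labels = ["positive", "positive", "negative", "negative"]

# Naïve Bayes over TF-IDF unigram and bigram features
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), MultinomialNB())
model.fit(texts, labels)

print(model.predict(["The screen is really cool."]))  # likely ['positive'] on this toy data
```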
FEATURES FOR SUPERVISED DOCUMENT-LEVEL SENTIMENT ANALYSIS
  • A large set of features has been tried by researchers (see e.g., work here at Essex by Roseline Antai)
    • Term frequency and different IR weighting schemes, as in other work on classification
    • Part of speech (POS) tags
    • Opinion words and phrases
    • Negations
    • Syntactic dependency
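As an illustration of one of these feature types, the sketch below marks negation scope before feature extraction: tokens following a negation word get a _NEG suffix until the next punctuation mark. The negation list and the scope rule are simplifying assumptions for this example, not the exact features used in the work cited above.

```python
import re

# Small illustrative negation list (an assumption, not an exhaustive resource)
NEGATIONS = {"not", "no", "never", "don't", "didn't", "isn't", "wasn't"}

def mark_negation(sentence):
    """Append _NEG to tokens between a negation word and the next punctuation
    mark, so that 'good' and 'not ... good' become distinct features."""
    tokens = re.findall(r"[\w']+|[.,!?;]", sentence.lower())
    marked, in_scope = [], False
    for tok in tokens:
        if tok in NEGATIONS:
            in_scope = True
            marked.append(tok)
        elif tok in {".", ",", "!", "?", ";"}:
            in_scope = False
            marked.append(tok)
        else:
            marked.append(tok + "_NEG" if in_scope else tok)
    return marked

print(mark_negation("The phone is not good, but the screen is fine."))
# ['the', 'phone', 'is', 'not', 'good_NEG', ',', 'but', 'the', 'screen', 'is', 'fine', '.']
```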
EASIER AND HARDER PROBLEMS
  • Tweets from Twitter are probably the easiest
    • short and thus usually straight to the point
  • Reviews are next
    • the entities are (almost) given and there is little noise
  • Discussions, comments, and blogs are hard.
    • Multiple entities, comparisons, noise, sarcasm, etc.
ASPECT-BASED SENTIMENT ANALYSIS
  • Sentiment classification at the document or sentence (or clause) level is useful, but does not find what people liked and disliked.
  • They do not identify the targets of opinions, i.e., ENTITIES and their ASPECTS
  • Without knowing targets, opinions are of limited use.
ASPECT-BASED SENTIMENT ANALYSIS
  • Much of the research is based on online reviews
  • For reviews, aspect-based sentiment analysis is easier because the entity (i.e., product name) is usually known
    • Reviewers simply express positive and negative opinions on different aspects of the entity.
  • For blogs, forum discussions, etc., it is harder:
    • both entity and aspects of entity are unknown
    • there may also be many comparisons
    • and there is also a lot of irrelevant information.
BRIEF DIGRESSION
  • Regular opinions: Sentiment/opinion expressions on some target entities
    • Direct opinions: “The touch screen is really cool”
    • Indirect opinions: “After taking the drug, my pain has gone”
  • COMPARATIVE opinions: Comparisons of more than one entity.
    • “iPhone is better than Blackberry”
Find entities (entity set expansion)
  • Although similar, it is somewhat different from the traditional named entity recognition (NER). (See next lectures)
  • E.g., one wants to study opinions on phones
    • given Motorola and Nokia as seeds, find all phone brands and models in a corpus, e.g., Samsung, Moto, etc.
Feature/Aspect extraction
  • May extract frequent nouns and noun phrases
    • Sometimes limited to a set known to be related to the entity of interest, or selected using part discriminators
    • e.g., for a scanner entity: “scanner”, “scanner has”
  • Opinion and target relations
    • Proximity or syntactic dependency
  • Standard IE methods
    • Rule-based or supervised learning
    • Often HMMs or CRFs (like standard IE)
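A minimal sketch of the frequent-noun heuristic for finding candidate aspects, assuming NLTK with its default tokenizer and POS tagger (the corresponding NLTK data packages must be downloaded first). The reviews and the frequency threshold are toy assumptions; real systems add noun-phrase chunking, pruning, and the opinion-target relations mentioned above.

```python
from collections import Counter
import nltk  # requires the punkt and averaged perceptron tagger data

reviews = [
    "The touch screen is really cool.",
    "The voice quality is clear too.",
    "The screen is bright but the battery is weak.",
]

noun_counts = Counter()
for review in reviews:
    tokens = nltk.word_tokenize(review)
    for word, tag in nltk.pos_tag(tokens):
        if tag.startswith("NN"):          # keep nouns (NN, NNS, NNP, ...)
            noun_counts[word.lower()] += 1

# Candidate aspects = nouns that occur at least twice across the reviews
print([noun for noun, count in noun_counts.items() if count >= 2])  # e.g. ['screen']
```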
RESOURCES FOR SENTIMENT ANALYSIS
  • Annotated corpora
    • Used in statistical approaches (Hu & Liu 2004, Pang & Lee 2004)
    • MPQA corpus (Wiebe et al., 2005)
  • Tools
    • Algorithm based on minimum cuts (Pang & Lee, 2004)
    • OpinionFinder (Wiebe et al., 2005)
  • Lexicons
    • General Inquirer (Stone et al., 1966)
    • OpinionFinder lexicon (Wiebe & Riloff, 2005)
    • SentiWordNet (Esuli & Sebastiani, 2006)
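As a small usage example for one of these lexicons, the sketch below queries SentiWordNet through its NLTK interface (the wordnet and sentiwordnet NLTK data must be downloaded first). SentiWordNet scores are per word sense, so averaging over all senses, as done here, is a crude simplification rather than the recommended way to use the resource.

```python
from nltk.corpus import sentiwordnet as swn  # requires wordnet + sentiwordnet data

def average_polarity(word):
    """Average (positive - negative) score over all SentiWordNet senses of a word."""
    synsets = list(swn.senti_synsets(word))
    if not synsets:
        return 0.0
    return sum(s.pos_score() - s.neg_score() for s in synsets) / len(synsets)

for w in ["cool", "terrible", "phone"]:
    print(w, round(average_polarity(w), 3))
```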
Sentiment-bearing words

  • Adjectives (Hatzivassiloglou & McKeown 1997, Wiebe 2000, Kamps & Marx 2002, Andreevskaia & Bergler 2006)
    • positive: honest, important, mature, large, patient
      • Ron Paul is the only honest man in Washington.
      • Kitchell’s writing is unbelievably mature and is only likely to get better.
      • To humour me, my patient father agrees yet again to my choice of film.
Negative adjectives

  • Adjectives
    • negative: harmful, hypocritical, inefficient, insecure
      • It was a macabre and hypocritical circus.
      • Why are they being so inefficient?
Subjective adjectives

  • Adjectives
    • Subjective (but not positive or negative sentiment): curious, peculiar, odd, likely, probable
      • He spoke of Sue as his probable successor.
      • The two species are likely to flower at different times.
Other words

  • Other parts of speech (Turney & Littman 2003; Riloff, Wiebe & Wilson 2003; Esuli & Sebastiani 2006)
    • Verbs
      • positive: praise, love
      • negative: blame, criticize
      • subjective: predict
    • Nouns
      • positive: pleasure, enjoyment
      • negative: pain, criticism
      • subjective: prediction, feeling
Phrases

  • Phrases containing adjectives and adverbs (Turney 2002; Takamura, Inui & Okumura 2007)
    • positive: high intelligence, low cost
    • negative: little variation, many troubles
Creating sentiment lexica

  • Humans
  • Semi-automatic
  • Fully automatic
(Semi) Automatic creation of sentiment lexica

  • Find relevant words, phrases, patterns that can be used to express subjectivity
  • Determine the polarity of subjective expressions
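One fully automatic way to determine polarity, in the spirit of Turney (2002), scores a phrase by how much more strongly it co-occurs with a positive seed word ("excellent") than with a negative one ("poor"). The sketch below uses invented co-occurrence counts in place of the web hit counts of the original method; the 0.01 smoothing constant is also an assumption for the example.

```python
import math

def semantic_orientation(hits_phrase_excellent, hits_phrase_poor,
                         hits_phrase, hits_excellent, hits_poor, total):
    """SO(phrase) = PMI(phrase, "excellent") - PMI(phrase, "poor"),
    estimated from co-occurrence counts with light smoothing."""
    pmi_excellent = math.log2(((hits_phrase_excellent + 0.01) * total) /
                              ((hits_phrase + 0.01) * (hits_excellent + 0.01)))
    pmi_poor = math.log2(((hits_phrase_poor + 0.01) * total) /
                         ((hits_phrase + 0.01) * (hits_poor + 0.01)))
    return pmi_excellent - pmi_poor

# Toy counts: the phrase co-occurs four times more often with "excellent" than with "poor"
so = semantic_orientation(hits_phrase_excellent=20, hits_phrase_poor=5,
                          hits_phrase=100, hits_excellent=1000,
                          hits_poor=1000, total=1_000_000)
print(so)  # > 0, so the phrase would be labelled positive
```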
USING PATTERNS

Lexico-syntactic patterns (Riloff & Wiebe 2003)

way with <np>: … to ever let China use force to have its way with …

expense of <np>: at the expense of the world’s security and stability

underlined <dobj>: Jiang’s subdued tone … underlined his desire to avoid disputes …
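Riloff & Wiebe (2003) learn such patterns automatically with an extraction-pattern learner; the sketch below only illustrates what matching them against text could look like, with the <np> slot crudely approximated by a short regular expression (an assumption for the example, not how the original system works).

```python
import re

# Very rough stand-in for a noun phrase: up to four word-like tokens
NP = r"(?:[\w']+\s+){0,3}[\w']+"
PATTERNS = {
    "expense of <np>":   re.compile(rf"\bexpense of ({NP})"),
    "underlined <dobj>": re.compile(rf"\bunderlined ({NP})"),
    "way with <np>":     re.compile(rf"\bway with ({NP})"),
}

# Illustrative text loosely based on the examples above
text = ("Jiang's subdued tone underlined his desire to avoid disputes, "
        "at the expense of the world's security and stability")

for name, pattern in PATTERNS.items():
    for match in pattern.finditer(text.lower()):
        print(f"{name:18} -> {match.group(1)}")
# expense of <np>    -> the world's security and
# underlined <dobj>  -> his desire to avoid
```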

Definitions and Annotation Scheme

  • Manual annotation: human markup of corpora (bodies of text)
  • Why?
    • Understand the problem
    • Create gold standards (and training data)

Wiebe, Wilson, Cardie LRE 2005

Wilson & Wiebe ACL-2005 workshop

Somasundaran, Wiebe, Hoffmann, Litman ACL-2006 workshop

Somasundaran, Ruppenhofer, Wiebe SIGdial 2007

Wilson 2008 PhD dissertation

Overview

  • Fine-grained: expression-level rather than sentence or document level
  • Annotate
    • Subjective expressions
    • Material attributed to a source but presented objectively
Corpus

  • MPQA: www.cs.pitt.edu/mpqa/databaserelease (version 2)
  • English language versions of articles from the world press (187 news sources)
  • Also includes contextual polarity annotations (later)
  • Themes of the instructions:
    • No rules about how particular words should be annotated.
    • Don’t take expressions out of context and think about what they could mean, but judge them as they are used in that sentence.
Gold Standards

  • Derived from manually annotated data
  • Derived from “found” data (examples):
    • Blog tags (Balog, Mishne & de Rijke, EACL 2006)
    • Websites for reviews, complaints, political arguments
      • amazon.com (Pang & Lee, ACL 2004)
      • complaints.com (Kim & Hovy, ACL 2006)
      • bitterlemons.com (Lin & Hauptmann, ACL 2006)
  • Word lists (example):
    • General Inquirer (Stone et al., 1966)
READINGS
  • Bo Pang & Lillian Lee, 2008 – Opinion Mining and Sentiment Analysis – Foundations and Trends in Information Retrieval, v. 2, 1-2
    • On the website
ACKNOWLEDGMENTS
  • Some slides borrowed from
    • Janyce Wiebe’s tutorials
    • Bing Liu’s tutorials
    • Ronen Feldman’s IJCAI 2013 tutorial