Opinion mining

Opinion Mining

James G. Shanahan

[email protected]

Clairvoyance Corporation

Pittsburgh, PA


Clairvoyance Corporate Research

  • CC was spun off from Carnegie Mellon University in 1992 and acquired by Justsystem (Japan) in 1996

  • 4 P Research Philosophy

    • (Pertinence), Profit, Pain Killer

    • Patent

    • Prototype

    • Publish

  • Pertinence

    • Corporate knowledge/information management

    • High performance Information retrieval

    • Cross language information retrieval

    • Machine Learning

    • Ontologies


Opinion Mining Outline

  • Background

  • Monolingual Opinion Mining

  • Multilingual Opinion Mining

  • Conclusions


Opinion Mining

Motivation and Background

  • Current information management systems operate at a low level and capture only limited semantics

  • Much of product feedback is web-based

    • Provided online by customers and critics through websites, discussion boards, mailing lists, blogs, and CRM portals.

  • Market research is becoming unwieldy

    • Sources are heterogeneous and, increasingly, multilingual in nature


Examples of Opinion on WWW


Amazon.co.jp


Affect in a Reporting Point of View

“Microsoft Togetherness”

Economist, January 22–28th, 2000, Business

There is both more and less than meets the eye to the decision of Bill Gates to pass the chief executive’s mantle to his best friend, Steve Ballmer. It is still business as usual at the world’s biggest software company. … Nor does the move presage a change in strategy. A belligerent Mr Ballmer reaffirmed the company’s hardline approach to defending the continuing antitrust action, predictably describing the break-up of the company that the government is rumoured to favour as reckless and irresponsible. Although Mr Gates spoke excitedly about Next Generation Windows Services (NGWS), a new idea that he would be working on, it is, in effect, just an ugly umbrella name for the grand Internet strategy under development at Redmond for some time. …


CRM: Support Desk Inquiries

I spoke today with an hp technican and he really upset me.

He told me that sj 4100 (usb) will be not supported.

There won't be any patches.

Can someone confirm that because I'm really pissed off.


Monolingual Opinion Mining

[Diagram: WWW, E-opinion Sites, and CRM → Opinion Spider → Opinion Classifier → Opinion Aggregator; example output: "Product XYZ: Very Popular in US"]


Monolingual Opinion Mining

  • Build a positive and negative opinion classifier

  • Spider web for opinion on a product/person

    • Go to well-known opinion sites

    • Search web for related text

  • Classify each piece of text

  • Aggregate classifications
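
The classify-then-aggregate steps can be sketched as follows. This is a minimal illustration, not the system described in the slides: `classify` is a hypothetical stand-in that counts a few hand-picked cue words (a real system would use a trained classifier), and `aggregate` simply reports the share of snippets in each class.

```python
from collections import Counter

# Hypothetical cue words, chosen only for this illustration.
POSITIVE_CUES = {"great", "love", "excellent"}
NEGATIVE_CUES = {"upset", "broken", "terrible"}


def classify(text: str) -> str:
    """Label one snippet 'positive' or 'negative' by counting cue words."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    score = sum(w in POSITIVE_CUES for w in words) - sum(w in NEGATIVE_CUES for w in words)
    return "positive" if score >= 0 else "negative"


def aggregate(labels: list[str]) -> dict[str, float]:
    """Turn per-snippet labels into the fraction of each class."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: counts[label] / total for label in ("positive", "negative")}


snippets = [
    "I love this scanner, excellent value.",
    "The technician really upset me.",
    "Great product!",
]
labels = [classify(s) for s in snippets]
summary = aggregate(labels)
```

Here `summary` holds the positive and negative fractions for the product, which is the kind of figure an aggregator could turn into a statement such as "very popular".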


Outline

  • Background

  • Monolingual Opinion Mining

  • Multilingual Opinion Mining

  • Conclusions


Monolingual Opinion Mining

  • Lexicon-based approaches

  • Supervised learning approaches

  • Mixed Learning approaches

  • Hybrid approaches

Human input



Lexicon-based Approaches

  • Human provides linguistic resources

    • One or more linguists characterize each word along two dimensions: centrality (the degree to which a word belongs to an affect category) and intensity (the strength of the affect level described by that entry) [Subasic and Huettner, 2000; Liu et al. 2003; Sano 2003; Tong 2001].

    • Liu et al. explore the use of the Open Mind Commonsense database as a means of constructing a model for measuring the affective qualities of emails. This model is based upon a manually constructed six-state affect lexicon [Liu et al. 2003].


Clairvoyance Fuzzy Affect Lexicon

Entry format: <lexical entry> <POS> <category> <centrality> <intensity>

Example: "arrogance" sn "superiority" 0.7 0.9

[Subasic and Huettner, 2000]


Categories/Scales

Category        Opposing Category
absurdity       reasonableness
advantage       disadvantage
amity           anger
attraction      repulsion
avoidance       desire
boredom         excitement
clarity         confusion
conflict        cooperation
courage         fear
creation        destruction
crime           public-spiritedness
death
deception       honesty
desire          avoidance

[83 Total]


POS, Centralities, Intensities

"emasculate" vb "weakness"
"emasculate" vb "lack"
"emasculate" vb "violence"

[Centrality and intensity values shown on the slide: 0.7, 0.8, 0.4, 0.9, 0.3, 0.4]


Subjective Judgments Give Ranks


Distribution of Affect Typing


Fuzzy Thesaurus

attraction love 0.80

admiration sn attraction 0.80 0.50
admire vb attraction 0.80 0.50
… dazzle vb attraction 0.80 0.90 ...
magnetism sn attraction 1.00 0.50

adoration sn love 0.90 1.00
adore vb love 0.90 1.00
… dazzle vb love 0.90 1.00
...
passionate adj love 0.70 0.90


Fuzzy Tagging

macabre,adj,death,0.50,0.60

macabre,adj,horror,0.90,0.60

...

savage,adj,violence,1.00,1.00

...

secret,sn,slyness,0.50,0.50

secret,sn,deception,0.50,0.50

prosperous,adj,surfeit,0.50,0.50

rat,sn,disloyalty,0.30,0.90

rat,sn,horror,0.20,0.60

rat,sn,repulsion,0.60,0.70

...

portent,sn,promise,0.70,0.90

portent,sn,warning,1.00,0.80

...

surrealistic,adj,absurdity,0.80,0.50

surrealistic,adj,creation,0.30,0.40

surrealistic,adj,insanity,0.50,0.30

surrealistic,adj,surprise,0.30,0.30

success,sn,success,1.00,0.60

whisper,vb,slyness,0.40,0.50

whisper,vb,slander,0.40,0.40

...

greed,sn,desire,0.60,1.00

greed,sn,greed,1.00,0.70

lust,vb,desire,0.80,0.90

envy,sn,desire,0.7,0.6

envy,sn,greed,0.7,0.6

envy,sn,inferiority,0.4,0.4

envy,sn,lack,0.5,0.5

envy,sn,slyness,0.5,0.6

fill,sn,surfeit,0.70,0.40

Luis Bunuel's The Exterminating Angel (1962) is a macabre comedy, a mordant view of human nature that suggests we harbor savage instincts and unspeakable secrets. Take a group of prosperous dinner guests and pen them up long enough, he suggests, and they'll turn on one another like rats in an overpopulation study. Bunuel begins with small, alarming portents. The cook and the servants suddenly put on their coats and escape, just as the dinner guests are arriving. The hostess is furious; she planned an after-dinner entertainment involving a bear and two sheep. Now it will have to be canceled. It is typical of Bunuel that such surrealistic touches are dropped in without comment. The dinner party is a success. The guests whisper slanders about each other, their eyes playing across the faces of their fellow guests with greed, lust and envy. After dinner, they stroll into the drawing room, where we glimpse a woman's purse, filled with chicken feathers and rooster claws.

violence 1.0

humor 1.0

warning 1.0

anger 1.0

success 1.0

slander 1.0

greed 1.0

horror 0.90

aversion 0.90

absurdity 0.80

excitement 0.80

desire 0.80

pleasure 0.70

promise 0.70

surfeit 0.70

repulsion 0.60

fear 0.60

lack 0.50

death 0.50

slyness 0.50

intelligence 0.50

deception 0.50

insanity 0.50

clarity 0.40

innocence 0.40

inferiority 0.40

pain 0.30

disloyalty 0.30

failure 0.30

creation 0.30

surprise 0.30
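
The tagging step can be sketched in a few lines. This is a minimal illustration under one assumption that the slides do not state explicitly: a document's score for a category is taken to be the maximum centrality among the matching lexicon entries (a fuzzy union). That choice agrees with the scores listed above for the entries used here (for example desire 0.80 and horror 0.90). The function names are hypothetical, and matching is exact, so inflected forms such as "rats" or "portents" would need stemming that is not shown.

```python
import csv
import io
import re

# A few entries copied from the slide: word, POS, category, centrality, intensity.
LEXICON_CSV = """\
macabre,adj,death,0.50,0.60
macabre,adj,horror,0.90,0.60
greed,sn,desire,0.60,1.00
greed,sn,greed,1.00,0.70
lust,vb,desire,0.80,0.90
envy,sn,desire,0.7,0.6
"""


def load_lexicon(text):
    """Map each word to a list of (category, centrality, intensity) tuples."""
    lexicon = {}
    for word, _pos, category, centrality, intensity in csv.reader(io.StringIO(text)):
        lexicon.setdefault(word, []).append((category, float(centrality), float(intensity)))
    return lexicon


def tag_document(text, lexicon):
    """Return {category: max centrality} over lexicon words found in the text."""
    scores = {}
    for token in re.findall(r"[a-z]+", text.lower()):
        for category, centrality, _intensity in lexicon.get(token, []):
            # Assumed combination rule: fuzzy union (maximum) of centralities.
            scores[category] = max(scores.get(category, 0.0), centrality)
    return scores


lexicon = load_lexicon(LEXICON_CSV)
scores = tag_document("A macabre comedy about greed, lust and envy.", lexicon)
```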


Assessing Affect (In N Dimensions)

[Figure: affect scores plotted on the axes Excitement, Humor, Intelligence, Love, and Fear; values shown: 0.6, 0.7, 0.2, 0.4, 0.8]


Affect Selective Visualization



Affect Total Visualization


Fuzzy Retrieval

Retrieve D1


Lexicon-based Approaches

  • Human provides linguistic resources

    • One or more linguists characterize each word along two dimensions: centrality (the degree to which a word belongs to an affect category) and intensity (the strength of the affect level described by that entry) [Subasic and Huettner, 2000; Liu et al. 2003; Sano 2003].

    • Liu et al. explore the use of the Open Mind Commonsense database as a means of constructing a model for measuring the affective qualities of emails. This model is based upon a manually constructed six-state affect lexicon [Liu et al. 2003].

  • These approaches, while interesting, are labor intensive, error prone, and costly to maintain.

  • However, the lexicon can be grown automatically

    • PMI-based semantic orientation by association


Monolingual Opinion Mining

  • Lexicon-based approaches

  • Supervised learning approaches

  • Mixed Learning approaches

  • Hybrid approaches

Human input


Opinion Classifier

Requirements

  • A labeled database of opinion

    • Download ratings from Amazon.com, epinions.com etc.

  • Build a binary opinion classifier

    • From positive and negative ratings

      • Merge 1 and 2 stars to negative and 3, 4 and 5 to positive

    • Using a thresholded SVM (support vector machine)
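
The label construction described above can be sketched as follows. The star-to-class mapping comes straight from the slide; `thresholded_decision` is a hypothetical helper showing one reading of "thresholded", namely moving the cut-off on the classifier's real-valued output away from the default of zero. Training the SVM itself is not shown, and the example reviews are made up.

```python
def stars_to_label(stars: int) -> str:
    """Map a 1-5 star rating to the binary classes used on the slide."""
    if not 1 <= stars <= 5:
        raise ValueError(f"expected 1-5 stars, got {stars}")
    # 1 and 2 stars are negative; 3, 4 and 5 stars are positive.
    return "negative" if stars <= 2 else "positive"


def thresholded_decision(score: float, threshold: float = 0.0) -> str:
    """Apply a decision threshold to a classifier's real-valued output."""
    return "positive" if score >= threshold else "negative"


# Hypothetical reviews: (star rating, text).
reviews = [(5, "Loved it"), (3, "It was fine"), (2, "Disappointing"), (1, "Awful")]
labeled = [(text, stars_to_label(stars)) for stars, text in reviews]
```

Raising the threshold makes the classifier more conservative about calling a review positive, which may matter because 3-star reviews sit close to the class boundary.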


Supervised learning approaches

  • Build affect and opinion models automatically from labeled examples.

    • Pang et al. classify movie ratings (Pang et al. 2002).

    • Dave et al. compare machine learning and information retrieval approaches for product ratings classification (Dave et al. 2003).

    • Das and Chen apply a classifier to investor bulletin boards to test whether apparently positive ratings correlate with positive stock price movements.

    • Shanahan et al. use a thresholded SVM (support vector machine) [Shanahan et al. 2003]

  • Require labeled data


Monolingual Opinion Mining

  • Lexicon-based approaches

  • Supervised learning approaches

  • Mixed Learning approaches

  • Hybrid approaches

Human input


Mixed Learning approaches

  • Learning semantic orientation and intensity of terms

    • A word is characterised by the company it keeps [Firth 1957]

    • Turney, P.D., and Littman, M.L. (2003), Measuring praise and criticism: Inference of semantic orientation from association, ACM Transactions on Information Systems (TOIS), 21 (4), 315-346

    • Clustering and classification [Hatzivassiloglou and McKeown 1997]

    • “Extending affect lexicon”, AAAI EAAT Spring Symposium 2003, Grefenstette, Evans, Qu, Shanahan

  • Potentially a very cheap, general and powerful approach

  • Needs further evaluation


Mixed Learning

Semantic Orientation by Association

  • “A word is characterised by the company it keeps” [Firth 1957]

  • Each word is characterised by its orientation (nice, nasty) and intensity (okay, fabulous)

  • Provide sets of labeled positively oriented (Pwords) and negatively oriented (Nwords) words

  • The semantic orientation (i.e., positive or negative) of a word is calculated from the strength of its association with a set of positive words, minus the strength of association with negative words
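
Written as a formula, with $A(w_1, w_2)$ standing for any measure of association strength between two words (the symbol names are chosen here for illustration; the form follows Turney and Littman 2003):

```latex
\mathrm{SO}(word) \;=\; \sum_{p \,\in\, \mathit{Pwords}} A(word, p) \;-\; \sum_{n \,\in\, \mathit{Nwords}} A(word, n)
```

A word is treated as positively oriented when $\mathrm{SO}(word) > 0$ and negatively oriented when it is below zero; the magnitude can be read as a rough indication of intensity.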


Labeled Semantic Orientation Words

  • Pwords =

    • {good, nice, excellent, positive, fortunate, correct, superior}

  • Nwords =

    • {bad, nasty, poor, negative, unfortunate, wrong, inferior}.

  • Labeled words provided by linguist

    • The sets consist of opposing pairs

    • Words are insensitive to context (i.e., almost always have the same meaning)


Semantic Orientation by Association

  • Various approaches to calculating the semantic association of two words

    • Pointwise Mutual Information (PMI) [Church and Hanks 1989]

    • Latent Semantic Indexing (LSI) [Dumais et al. 1990]

    • Likelihood Ratios [Dunning 1993]


Semantic Orientation

Pointwise Mutual Information (PMI)

  • Introduced by [Church and Hanks 1989]

  • PMI is a form of correlation measure

    • Positive when words co-occur more often than chance would predict, and negative when they co-occur less often

  • Given two words, word1 and word2

Pwords = {good, nice, excellent, positive, fortunate, correct, superior}

Nwords = {bad, nasty, poor, negative, unfortunate, wrong, inferior}.
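
For reference, the standard definition of PMI from Church and Hanks (1989), where $p(\cdot)$ is a probability of occurrence estimated from a corpus and $p(word_1 \,\&\, word_2)$ is the probability that the two words co-occur:

```latex
\mathrm{PMI}(word_1, word_2) \;=\; \log_2 \frac{p(word_1 \,\&\, word_2)}{p(word_1)\, p(word_2)}
```

Using PMI as the association measure gives the semantic orientation of a word as its summed PMI with the Pwords minus its summed PMI with the Nwords.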


Pointwise Mutual Information (PMI)

  • PMI can be calculated by issuing queries to a search engine

  • Given two words, word1 and word2
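
A minimal sketch of the hit-count calculation, following the form used by Turney and Littman (2003), in which the positive and the negative paradigm words are each combined into a single query. The function name, the smoothing constant of 0.01, and all counts below are assumptions for illustration; no search engine is actually queried.

```python
import math


def so_pmi(hits_near_pos, hits_near_neg, hits_pos, hits_neg, smoothing=0.01):
    """Semantic orientation from search-engine hit counts.

    hits_near_pos / hits_near_neg: pages where the word occurs near any
    positive / negative paradigm word. hits_pos / hits_neg: pages containing
    any positive / negative paradigm word. The smoothing constant avoids
    division by zero and log(0) for rare words.
    """
    numerator = (hits_near_pos + smoothing) * (hits_neg + smoothing)
    denominator = (hits_near_neg + smoothing) * (hits_pos + smoothing)
    return math.log2(numerator / denominator)


# Made-up counts for illustration only.
score = so_pmi(hits_near_pos=900, hits_near_neg=100, hits_pos=1_000_000, hits_neg=1_000_000)
```

With these counts the word appears near positive paradigm words about nine times as often as near negative ones, so the score is positive. The total page count and the word's own hit count cancel out of the ratio, which is why they do not appear as parameters.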


Semantic Orientation Results


Semantic Orientation by Association

Summary

  • A word is characterised by the company it keeps [Firth 1957]

  • Bootstrap from 14 paradigm words

    • Use LSA or Pointwise mutual information (PMI) to calculate the SO and intensity of expressions/sentences/documents

  • Very encouraging results, but needs further evaluation


Hybrid Approach

  • The Clairvoyance Affect lexicon is incomplete

    • 4,000 entries

  • Automatically compute the orientation and intensity of unseen terms using PMI

    • “Extending affect lexicon”, AAAI EAAT Spring Symposium 2003, Grefenstette, Evans, Qu, Shanahan


Commercial Efforts

  • Commercial efforts (lexicon-based)

    • Justsystem’s CB Market Intelligence system that organizes feedback data in an affect map (Sano 2003).

    • NEC’s SurveyAnalyzer, which mines the reputations of products (Morinaga et al. 2003).

    • SPSS’s TextSmart

  • Other Applications

    • Opinion Timelines

    • Flame control

      • Emails/chat/communication

      • Newsgroups

    • Directed Search

    • Survey analysis

    • CRM


Outline

  • Background

  • Monolingual Opinion Mining

  • Multilingual Opinion Mining

  • Conclusions


Opinion Mining

Motivation and Background

  • Much of product feedback is web-based

    • Provided online by customers and critics through websites, discussion boards, mailing lists, blogs, and CRM.

  • Market research is becoming unwieldy

    • Sources are heterogeneous and, increasingly, multilingual in nature


Multilingual Opinion Mining

  • The approaches presented above all focus on the monolingual aspects of affect and opinion.

  • Multilingual Opinion Mining

    • Practical Solution

      • Build a classifier for each language from labeled data

    • Research Solution

      • Examine a combination of classification and translation to build opinion classifiers


Monolingual Opinion Mining

[Diagram: WWW and E-opinion Sites → Opinion Spider → Opinion Classifier → Opinion Aggregator; example output: "Product XYZ: Very Popular in US"]


Opinion Classifier

Requirements

  • A labeled database of opinion

    • Download ratings from Amazon.com, epinions.com etc.

  • Build a binary opinion classifier

    • From positive and negative ratings

      • Merge 1 and 2 stars to negative and 3, 4 and 5 to positive

    • Using an SVM (support vector machine)


Multilingual Opinion Mining (0)

  • For each language build a corresponding monolingual opinion classifier


Multilingual Opinion Mining

[Diagram: WWW and E-opinion Sites → Opinion Spider → Opinion Classifier (Chinese, English, Japanese) → Opinion Aggregator; example output for Product XYZ: "Very Popular in US", "Not so Popular in Japan", "Overall so so world popularity"]


Multilingual Opinion Mining (1)

Classification + Translation

[Diagram: Labeled Ratings → Translate Corpus (Lang 1, Lang 2, Lang3) → Train Classifiers (Classifier1, Classifier2, Classifier3) → Mine Opinion from WWW; example output for Product XYZ: "Popular in US", "Popular in Japan"]

Multilingual Opinion Mining (2)

Classification + Translation

[Diagram: Labeled Ratings → Train Classifiers → Translate Classifiers (Classifier1, Classifier2, Classifier3) → Mine Opinion from WWW; example output for Product XYZ: "Popular in US", "Popular in Japan"]


Multilingual Opinion Mining

Classification + Translation

  • Approach 0

    • Build classifier for each language using labeled native documents

  • Approach 1

    • Translate source language documents and build classifiers from translated documents

  • Approach 2

    • Build source language classifier and translate classifier into target languages using cross language IR (CLIR)

  • Approach 3

    • Build target language classifiers from target language documents and translated source language documents
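
Approach 2 can be illustrated with a small sketch. It assumes the source-language classifier is linear over terms and that "translating the classifier" means mapping its term weights through a bilingual dictionary. The slides mention CLIR but do not specify the method, so the even split of weight across multiple translations, the function names, the weights, and the toy English-to-Spanish dictionary are all hypothetical.

```python
def translate_classifier(weights, dictionary):
    """Map source-language term weights onto target-language terms.

    A source term with several translations splits its weight evenly among
    them; weights landing on the same target term are summed. Terms without
    a dictionary entry are dropped.
    """
    translated = {}
    for term, weight in weights.items():
        targets = dictionary.get(term, [])
        for target in targets:
            translated[target] = translated.get(target, 0.0) + weight / len(targets)
    return translated


def score(text, weights, bias=0.0):
    """Linear decision score: positive values suggest a positive opinion."""
    return bias + sum(weights.get(tok, 0.0) for tok in text.lower().split())


# Hypothetical English term weights and a toy English-to-Spanish dictionary.
english_weights = {"excellent": 1.5, "good": 1.0, "bad": -1.0, "awful": -2.0}
en_to_es = {"excellent": ["excelente"], "good": ["bueno", "buen"], "bad": ["malo"]}

spanish_weights = translate_classifier(english_weights, en_to_es)
```

The term "awful" has no dictionary entry and is lost in translation, which hints at one weakness of this approach: the translated classifier is only as good as the dictionary's coverage. Approach 1 moves the same dependency into document translation instead.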


Results


Experimental Setup


Amazon.com Ratings Corpus

  • Downloaded reviews from Amazon.com

    • For a variety of books.

  • Reduced five star scale to two classes:

    • a positive class corresponding to scalar values of 3 (3 stars), 4, and 5; and a negative class (1 and 2 stars).

  • Results


Outline

  • Background

  • Monolingual Opinion Mining

  • Multilingual Opinion Mining

  • Conclusions


Conclusions

  • Various strategies for building (multilingual) opinion miners

  • Many applications

  • Ongoing work

    • Build and evaluate proposed approaches

    • AAAI 2004 Spring Symposium on Affect Analysis


Interests and Ongoing Work

  • Adhoc information retrieval (TREC HARD,

  • Anticipatory information systems

    • Document Souls

  • Clustering

  • Evaluation

  • Machine learning (support vector machines, probabilistic models)

  • Multilingual Opinion Mining


The End

