
Mining the peanut gallery: Opinion extraction and semantic classification of product reviews


Presentation Transcript


  1. Mining the peanut gallery: Opinion extraction and semantic classification of product reviews • Kushal Dave (IBM), Steve Lawrence (Google), David M. Pennock (Overture) • Work done at NEC Laboratories

  2. Problem • Many reviews spread out across Web • Product-specific sites • Editorial reviews at C|net, magazines, etc. • User reviews at C|net, Amazon, Epinions, etc. • Reviews in blogs, author sites • Reviews in Google Groups (Usenet) • No easy aggregation, but would like to get numerous diverse opinions • Even at one site, synthesis difficult

  3. Solution! • Find reviews on the web and mine them… • Filtering (find the reviews) • Classification (positive or negative) • Separation (identify and rate specific attributes)

  4. Existing work Classification varies in granularity/purpose: • Objectivity • Does this text express fact or opinion? • Words • What connotations does this word have? • Sentiments • Is this text expressing a certain type of opinion?

  5. Objectivity classification • Best features: relative frequencies of parts of speech (Finn 02) • Subjectivity is…subjective (Wiebe 01)

  6. Word classification • Polarity vs. intensity • Conjunctions • (Hatzivassiloglou and McKeown, 97) • Collocations • (Turney and Littman 02) • Linguistic collocations • (Lin 98), (Pereira 93)

  7. Sentiment classification • Manual lexicons • Fuzzy logic affect sets (Subasic and Huettner 01) • Directionality – block/enable (Hearst 92) • Common Sense and emotions (Liu et al 03) • Recommendations • Stocks in Yahoo (Das and Chen 01) • URLs in Usenet (Terveen 97) • Movie reviews in IMDb (Pang et al 02)

  8. Applied tools • AvaQuest’s GoogleMovies • http://24.60.188.10:8080/demos/GoogleMovies/GoogleMovies.cgi • Clipping services • PlanetFeedback, Webclipping, eWatch, TracerLock, etc. • Commonly use templates or simple searches • Feedback analysis: • NEC’s Survey Analyzer • IBM’s Eclassifier • Targeted aggregation • BlogStreet, onfocus, AllConsuming, etc. • Rotten Tomatoes

  9. Our approach

  10. Train/test • Existing tagged corpora! • C|net – two cases • even, mixed; 10-fold test • 4,480 reviews • 4 categories • messy, by category; 7-fold test • 32,970 reviews

  11. Other corpora • Preliminary work with Amazon • (Potentially) easier problem • Pang et al IMDb movie reviews corpus • Our simple algorithm is insignificantly worse: 80.6 vs. 82.9 (unigram SVM) • Different sort of problem

  12. Domain obstacles • Typos, user error, inconsistencies, repeated reviews • Skewed distributions • 5x as many positive reviews • 13,000 reviews of MP3 players, 350 of networking kits • fewer than 10 reviews for ½ of products • Variety of language, sparse data • 1/5 of reviews have fewer than 10 tokens • More than 2/3 of terms occur in fewer than 3 documents • Misleading passages (ambivalence/comparison) • Battery life is good, but... Good idea, but...

  13. Base Features • Unigrams • great, camera, support, easy, poor, back, excellent • Bigrams • can't beat, simply the, very disappointed, quit working • Trigrams, substrings • highly recommend this, i am very happy, not worth it, i sent it back, it stopped working • Near • greater would, find negative, refund this, automatic poor, company totally
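A minimal sketch (not the authors' code) of how unigram, bigram, and trigram features could be collected from a tokenized review; substring and proximity ("near") features are omitted, and the function name is illustrative.

```python
from collections import Counter

def ngram_features(tokens, max_n=3):
    """Count every n-gram up to length max_n as a candidate feature."""
    feats = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats

# e.g. a short positive review
print(ngram_features("i am very happy with this camera".split()))
```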

  14. Generalization • Replacing product names (metadata) • the iPod is an excellent ==> the _productname is an excellent • domain-specific words (statistical) • excellent [picture|sound] quality ==> excellent _producttypeword quality • rare words (statistical) • overshadowed by ==> _unique by
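One possible reading of these substitutions, sketched in Python; the placeholder tokens follow the slide, while product_names, type_words, term_df, and the rarity threshold are assumed inputs derived from product metadata and corpus statistics.

```python
def generalize(tokens, product_names, type_words, term_df, rare_df=3):
    """Replace product names, domain-specific words, and rare words
    with placeholder tokens before feature extraction."""
    out = []
    for tok in tokens:
        if tok in product_names:
            out.append("_productname")
        elif tok in type_words:
            out.append("_producttypeword")
        elif term_df.get(tok, 0) < rare_df:   # rarely-seen term
            out.append("_unique")
        else:
            out.append(tok)
    return out

tokens = "the ipod is an excellent player".split()
print(generalize(tokens, {"ipod"}, {"player"},
                 {"the": 900, "is": 950, "an": 800, "excellent": 40}))
```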

  15. Generalization (2) • Finding synsets in WordNet • anxious ==> 2377770 • nervous ==> 2377770 • Stemming • was ==> be • compromised ==> compromis
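An illustrative sketch of the synset and stemming substitutions, assuming NLTK with its WordNet data installed; it is not the pipeline the authors used.

```python
from nltk.corpus import wordnet as wn
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

def synset_id(word):
    """Map a word to the offset of its first WordNet synset (fallback: the word)."""
    synsets = wn.synsets(word)
    return synsets[0].offset() if synsets else word

print(synset_id("anxious"), synset_id("nervous"))  # identical ids only if they share a synset
print(stemmer.stem("compromised"))                 # -> 'compromis'
```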

  16. Qualification • Parsing for collocation • this stupid piece of garbage ==> (piece(N):mod:stupid(A))... • this piece is stupid and ugly ==> (piece(N):mod:stupid(A))... • Negation • not good or useful ==> NOTgood NOTor NOTuseful
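A toy version of the NOT-prefixing step, not the parser-based qualification; the negator and clause-boundary sets are assumptions.

```python
NEGATORS = {"not", "no", "never", "n't"}
CLAUSE_END = {".", ",", ";", "!", "?", "but"}

def mark_negation(tokens):
    """Prefix tokens after a negation word with NOT until the clause ends."""
    out, negating = [], False
    for tok in tokens:
        if tok in NEGATORS:
            out.append(tok)
            negating = True
        elif tok in CLAUSE_END:
            out.append(tok)
            negating = False
        else:
            out.append("NOT" + tok if negating else tok)
    return out

print(mark_negation("not good or useful".split()))
# -> ['not', 'NOTgood', 'NOTor', 'NOTuseful']
```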

  17. Optimal features • Confidence/precision tradeoff • Try to find best-length substrings • How to find features? • Marginal improvement when traversing suffix tree • Information gain worked best • How to score? • [p(C|w) – p(C’|w)] * df during construction • Dynamic programming or simple during use • Still usually short ( n < 5 ) • Although… “i have had no problems with”
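The construction-time score written out as a small function; here p(C|w) and p(C'|w) are estimated from the candidate substring's document frequencies in the two classes, which is one plausible reading of the slide rather than the authors' exact formulation.

```python
def substring_score(pos_df, neg_df):
    """[p(C|w) - p(C'|w)] * df, with df = pos_df + neg_df."""
    df = pos_df + neg_df
    if df == 0:
        return 0.0
    return ((pos_df / df) - (neg_df / df)) * df

print(substring_score(pos_df=40, neg_df=5))   # a strongly positive substring
```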

  18. Clean up • Thresholding • Reduce space with minimal impact • But not too much! (sparse data) • > 3 docs • Smoothing: allocate weight to unseen features, regularize other values • Laplace, Simple Good-Turing, Witten-Bell tried • Laplace helps Naïve Bayes
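A minimal sketch of the Laplace (add-one) smoothing that helped naive Bayes; counts, total_count, and vocab_size are assumed per-class statistics.

```python
def laplace_prob(feature, counts, total_count, vocab_size):
    """Add-one estimate of p(feature | class); unseen features get a small non-zero mass."""
    return (counts.get(feature, 0) + 1) / (total_count + vocab_size)

counts = {"great": 50, "poor": 2}
print(laplace_prob("great", counts, total_count=1000, vocab_size=5000))
print(laplace_prob("unseen_term", counts, total_count=1000, vocab_size=5000))
```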

  19. Scoring • SVMlight, naïve Bayes • Neither does better in both tests • Our scoring is simpler, less reliant on confidence, document length, corpus regularity • Give each word a score ranging –1 to 1 • Our metric: score(fi) = [p(fi|C) – p(fi|C')] / [p(fi|C) + p(fi|C')] • Tried Fisher discriminant, information gain, odds ratio • If sum > 0, it's a positive review • Other probability models – presence • Bootstrapping barely helps
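A sketch of this scoring rule, assuming smoothed per-class feature probabilities are already available; the helper names and example probabilities are illustrative, not the authors' code.

```python
def feature_score(p_pos, p_neg):
    """score(f) = (p(f|C) - p(f|C')) / (p(f|C) + p(f|C')), in [-1, 1]."""
    denom = p_pos + p_neg
    return (p_pos - p_neg) / denom if denom else 0.0

def classify(tokens, p_pos, p_neg):
    """A review is positive when its feature scores sum to more than zero."""
    total = sum(feature_score(p_pos.get(t, 0.0), p_neg.get(t, 0.0)) for t in tokens)
    return "positive" if total > 0 else "negative"

p_pos = {"great": 0.02, "broken": 0.001}
p_neg = {"great": 0.005, "broken": 0.01}
print(classify("great camera easy to use".split(), p_pos, p_neg))
```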

  20. Reweighting • Document frequency • df, idf, normalized df, ridf • Some improvement from log df and Gaussian (arbitrary) • Analogues for term frequency, product frequency, product type frequency • Tried bootstrapping, different ways of assigning scores
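One of the reweightings (idf) as a hedged sketch; the slide lists several variants, and the plus-one in the denominator is an assumption to avoid division by zero.

```python
import math

def idf_weight(feature, doc_freq, n_docs):
    """Down-weight features that appear in many documents."""
    return math.log(n_docs / (1 + doc_freq.get(feature, 0)))

print(idf_weight("great", {"great": 500}, n_docs=4480))
```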

  21. Test 1 (clean) Best Results • Unigrams: Baseline 85.0 %, Naïve Bayes + Laplace 87.0 %, Weighting (log df) 85.7 %, Weighting (Gaussian tf) 85.7 % • Trigrams: Baseline 88.7 %, + _productname 88.9 %

  22. Test 2 (messy) Best Results • Unigrams: Baseline 82.2 %, Prob. after threshold 83.3 %, Presence prob. model 83.1 %, + Odds ratio 83.3 % • Bigrams: Baseline 84.6 %, + Odds ratio 85.4 %, + SVM 85.8 % • Variable: Baseline 85.1 %, + _productname 85.3 %

  23. Extraction Can we use the techniques from classification to mine reviews?

  24. Obstacles • Words have different implications in general usage • “main characters” is negatively biased in reviews, but has innocuous uses elsewhere • Much non-review subjective content • previews, sales sites, technical support, passing mentions and comparisons, lists

  25. Results • Manual test set + web interface • Substrings better…bigrams too general? • Initial filtering is bad • Make “review” part of the actual search • Threshold sentence length, score • Still get non-opinions and ambiguous opinions • Test corpus, with red herrings removed, 76% accuracy in top confidence tercile

  26. Attribute heuristics • Look for _producttypewords • Or, just look for things starting with “the” • Of course, some thresholding • And stopwords
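A toy rendering of the "dumb heuristic" from this slide: collect two-word phrases that start with "the", drop a few stopword phrases, and keep the most frequent ones; the stopword list and thresholds are illustrative guesses.

```python
from collections import Counter

STOP_PHRASES = {"the best", "the same", "the only", "the first"}  # illustrative

def attribute_candidates(reviews, top_k=5, min_count=2):
    counts = Counter()
    for review in reviews:
        toks = review.lower().split()
        for i, tok in enumerate(toks[:-1]):
            if tok == "the":
                phrase = f"the {toks[i + 1]}"
                if phrase not in STOP_PHRASES:
                    counts[phrase] += 1
    return [p for p, c in counts.most_common(top_k) if c >= min_count]

reviews = ["The taste is crisp", "I like the taste and the price",
           "the flavor beats the price"]
print(attribute_candidates(reviews))   # -> ['the taste', 'the price']
```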

  27. Results • No way of judging accuracy, arbitrary • For example, Amstel Light • Ideally: taste, price, image... • Statistical method: beer, bud, taste, adjunct, other, brew, lager, golden, imported... • Dumb heuristic: the taste, the flavor, the calories, the best…

  28. Future work • Methodology • More precise corpus • More tests, more precise tests • Algorithm • New combinations of techniques • Decrease overfitting to improve web search • Computational complexity • Ambiguity detection? • New pieces • Separate review/non-review classifier • Review context (source, time) • Segment pages
