
  1. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales Bo Pang and Lillian Lee, Cornell University / Carnegie Mellon University, ACL 2005

  2. About this problem • Goal: label reviews on a multi-point rating scale • Differs from binary "thumbs up" or not classification • Differs from identifying opinion strength • Differs from ranking (+ classification) • Movie reviews from Rotten Tomatoes • A study on human subjects • Three algorithms

  3. Problem validation and formulation (1) • Check how humans perform, to compare against the machine's performance • Use reviews of a single author to factor out the effects of cross-author divergence • One notch equals a half star in a four- or five-star scheme, or 10 points on a 100-point scale • Random-choice baseline: 33% (three classes)

  4. Problem validation and formulation (2) • A three-class task seems like one that most people would do quite well at • To address class balance, the five-class problem is reduced to four classes

  5. A scale dataset • Movie reviews from four corpora • Rating indicators removed • Objective sentences removed • A total of 1,770, 902, 1,307, and 1,027 documents from the four authors, respectively

  6. Algorithm (1) • Uses the SVMlight package • Algorithm 1: One-vs-all (OVA) • One SVM binary classifier per label l, distinguishing label l from not-l • Algorithm 2: Regression • Find the hyperplane that best fits the training data (points within distance epsilon incur no loss) • Similar items, similar labels (a sketch of both algorithms follows)
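A minimal sketch of Algorithms 1 and 2, using scikit-learn's SVM implementations rather than the SVMlight package the slides mention; the tiny corpus and the 0..2 rating scale below are hypothetical placeholders.

```python
# Sketch of Algorithms 1 and 2 in scikit-learn (the paper used SVMlight).
# The tiny corpus and the 0..2 rating scale here are hypothetical.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC, LinearSVR

docs = ["a stirring and funny film",
        "watchable but flat and uninspired",
        "an utter waste of two hours"]
ratings = [2, 1, 0]  # higher = more positive

X = CountVectorizer(binary=True).fit_transform(docs)

# Algorithm 1: one-vs-all -- one binary SVM per label l vs. not-l;
# predict the label whose classifier is most confident.
ova = OneVsRestClassifier(LinearSVC()).fit(X, ratings)

# Algorithm 2: SVM regression -- fit a hyperplane to the ratings;
# points within distance epsilon of it incur no loss. Continuous
# predictions are rounded to the nearest valid rating.
svr = LinearSVR(epsilon=0.1).fit(X, ratings)
predicted = svr.predict(X).round().clip(0, 2)
```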

  7. Algorithm (2) • Algorithm 3: Metric labeling • Algorithm 1 or 2 plus a similarity measure • A distance metric on labels • The k nearest neighbors of item x according to sim • An item-similarity function sim • Related to locally-weighted learning (a per-item sketch follows)
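When the k nearest neighbors are labeled training items, the metric-labeling objective decomposes so each test item can be labeled independently: choose the label l minimizing pi(x, l) + alpha * sum over neighbors y of sim(x, y) * d(l, label(y)). A minimal sketch, with hypothetical costs, similarities, and alpha, and |l − l′| as the label distance:

```python
# Sketch of metric labeling in the decomposable case: neighbors are
# labeled training items, so each test item is optimized on its own.
# The cost values, similarities, and alpha below are hypothetical.

def metric_label(pi, neighbors, alpha, labels=(0, 1, 2)):
    """pi: dict label -> cost from the initial classifier (Alg. 1 or 2).
    neighbors: (similarity, label) pairs for the k nearest training
    items under sim. Label distance d(l, l') = |l - l'|."""
    def cost(l):
        return pi[l] + alpha * sum(s * abs(l - ly) for s, ly in neighbors)
    return min(labels, key=cost)

# Hypothetical example: the classifier slightly prefers label 2, but
# similar training items are all labeled 1, pulling the choice down.
print(metric_label(pi={0: 1.0, 1: 0.6, 2: 0.5},
                   neighbors=[(0.9, 1), (0.8, 1), (0.7, 1)],
                   alpha=0.5))  # -> 1
```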

  8. Algorithm (3) • Finding a label-correlated item-similarity function: vocabulary overlap (e.g., cosine over term vectors) turns out not to be suitable

  9. Algorithm (PSP) • Uses PSP (positive-sentence percentage) • An NB classifier trained on 10,662 movie-review snippets (striking extracts, usually exactly one sentence long) • This classifier is applied to the sentences of the test data (a PSP sketch follows)
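A minimal sketch of PSP and a PSP-based item similarity. Here classify_sentence is a hypothetical stand-in for the trained NB snippet classifier; the similarity is the cosine between the two-dimensional vectors (PSP(x), 1 − PSP(x)) and (PSP(y), 1 − PSP(y)), following the paper's construction.

```python
# Sketch of PSP-based similarity. classify_sentence is a hypothetical
# stand-in for the NB sentence-polarity classifier trained on snippets.
import math

def psp(review_sentences, classify_sentence):
    """Positive-sentence percentage: the fraction of a review's
    sentences that the NB classifier labels positive (1 = positive)."""
    votes = [classify_sentence(s) for s in review_sentences]
    return sum(votes) / max(len(votes), 1)

def psp_sim(x_sentences, y_sentences, classify_sentence):
    """Cosine of the angle between the two-dimensional PSP vectors
    (PSP(x), 1 - PSP(x)) and (PSP(y), 1 - PSP(y))."""
    px = psp(x_sentences, classify_sentence)
    py = psp(y_sentences, classify_sentence)
    dot = px * py + (1 - px) * (1 - py)
    return dot / (math.hypot(px, 1 - px) * math.hypot(py, 1 - py))
```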

  10. Algorithm (PSP) • Distinguishing terms: terms that appear more than 20 times and have 50% or more of their occurrences in a single class (a sketch of this filter follows)
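A minimal sketch of this term filter, assuming tokenized documents paired with class labels; the corpus format and function name here are illustrative.

```python
# Sketch of the "distinguishing terms" filter: keep terms occurring
# more than 20 times overall, with at least 50% of their occurrences
# in a single class. Corpus format is a hypothetical assumption.
from collections import Counter, defaultdict

def distinguishing_terms(docs, labels, min_count=20, min_share=0.5):
    total = Counter()                 # term -> total occurrences
    per_class = defaultdict(Counter)  # class -> term -> occurrences
    for tokens, label in zip(docs, labels):
        for t in tokens:
            total[t] += 1
            per_class[label][t] += 1
    return {t for t, n in total.items()
            if n > min_count
            and max(c[t] for c in per_class.values()) / n >= min_share}
```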

  11. Experiment Results (1)

  12. Experiment Results (2) • Adding PSP is useful; however, PSP by itself is not good enough

  13. Multiple authors • Comparable results are obtained across authors

  14. Future Work • Varying the SVM kernel • Using mixture models (combining "positive" and "negative" language models) to capture class relationships • Multi-class but non-scale-based categorization problems (positive vs. negative vs. neutral) • The transductive setting (a small amount of labeled data, exploiting relationships between unlabeled items), well-suited to the metric-labeling approach
