
Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification






Presentation Transcript


  1. Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification. John Blitzer, Mark Dredze and Fernando Pereira, University of Pennsylvania, ACL 2007.

  2. Research Purposes • How can we adapt sentiment classifiers across domains (books, DVDs, electronics, and kitchen appliances)? • Structural correspondence learning (SCL) • How can we select domains to annotate that would be good proxies for many other domains? • The A-distance

  3. Domain adaptation: SCL • SCL: structural correspondence learning • Two types of words: • Pivot features: words such as excellent and awful that behave the same way in both domains. • Domain-specific features: new words that appear in only one domain, e.g., good-quality reception in a cell phone review or fast dual-core in a computer review.

  4. SCL & SCL-MI • Selecting pivot features: • SCL selects the m pivot features that occur most frequently in both domains. • Frequency works well for POS tagging, where frequent words are very often function words, but not as well for sentiment classification. • SCL-MI instead chooses the pivots with the highest mutual information with the source label (positive, negative); see the sketch below.
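A minimal sketch of the SCL-MI selection step (not the authors' code): it assumes binary document-feature matrices and uses scikit-learn's mutual_info_score; the function name, thresholds, and pivot count are illustrative.

```python
# Hypothetical sketch of SCL-MI pivot selection: rank candidate pivots by
# mutual information between feature occurrence and the source label.
import numpy as np
from sklearn.metrics import mutual_info_score

def select_pivots_mi(X_src, y_src, X_tgt, num_pivots=1000, min_count=5):
    """X_src, X_tgt: binary doc-feature matrices; y_src: 0/1 sentiment labels."""
    src_counts = (X_src > 0).sum(axis=0)
    tgt_counts = (X_tgt > 0).sum(axis=0)
    # Candidates must be frequent in BOTH domains (cf. slide 7's threshold).
    candidates = np.where((src_counts > min_count) & (tgt_counts > min_count))[0]
    # Score each candidate by MI(feature present; source label).
    scores = np.array([mutual_info_score(y_src, X_src[:, j] > 0)
                       for j in candidates])
    return candidates[np.argsort(scores)[::-1][:num_pivots]]
```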

  5. SCL & SCL-MI (exclusive pivots) • Top pivots selected by SCL but not SCL-MI (left), and vice versa (right). • The SCL pipeline: observe a feature vector x; learn a projection matrix θ (k pivots × d features); apply the projection θx; learn the sentiment predictor on the augmented features. A sketch of the projection step follows.
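A rough sketch of the projection step under my reading of SCL: train one linear predictor per pivot on unlabeled data pooled from both domains, stack the weight vectors, and take the top singular vectors as the shared projection θ. All names and the dimension h are illustrative, not the paper's settings.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

def scl_projection(X_unlabeled, pivot_idx, h=50):
    """X_unlabeled: binary doc-feature matrix pooled from BOTH domains (n x d).
    Returns theta (h x d), the projection onto the shared feature subspace."""
    n, d = X_unlabeled.shape
    non_pivot = X_unlabeled.copy()
    non_pivot[:, pivot_idx] = 0              # pivots must not predict themselves
    W = np.zeros((d, len(pivot_idx)))        # one weight column per pivot
    for col, j in enumerate(pivot_idx):
        y = (X_unlabeled[:, j] > 0).astype(int)  # "does this doc contain pivot j?"
        clf = SGDClassifier(loss="modified_huber").fit(non_pivot, y)
        W[:, col] = clf.coef_.ravel()
    U, _, _ = np.linalg.svd(W, full_matrices=False)  # top singular directions
    return U[:, :h].T                        # theta: (h x d)

# The sentiment classifier is then trained on augmented features [x, theta @ x]:
# X_aug = np.hstack([X, X @ theta.T])
```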

  6. Dataset • Amazon product reviews: books, DVDs, electronics, and kitchen appliances. • Star ratings (1–5): 1–2 labeled negative, 4–5 labeled positive, 3 dropped as ambiguous. • Balanced labeled sets: 1,000 positive and 1,000 negative examples per domain. • Unlabeled data: e.g., 3,685 DVD and 5,945 kitchen instances.

  7. Baseline & experiment settings • Linear predictors on unigram and bigram features for classification. • Trained with stochastic gradient descent to minimize a Huber loss. • For SCL & SCL-MI: pivots must occur in more than five documents in each domain. A minimal baseline sketch follows.
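A minimal baseline sketch matching this slide's description, with scikit-learn's modified_huber loss standing in for the paper's Huber-style loss; the variable names are placeholders.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline

# Linear predictor over unigram + bigram features, fit by SGD.
baseline = make_pipeline(
    CountVectorizer(ngram_range=(1, 2), binary=True),
    SGDClassifier(loss="modified_huber"),
)
# baseline.fit(source_texts, source_labels)                # train on source
# accuracy = baseline.score(target_texts, target_labels)   # evaluate on target
```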

  8. Experiments • Labeled data per domain: 1,600 training instances and 400 test instances. • Baseline: a linear classifier trained on the source domain without adaptation. • Upper bound: an in-domain classifier trained and tested within the target domain. • Example: baseline 72.8%, SCL-MI adaptation 79.7%, in-domain 80.4%. Adaptation loss is 7.6% for the baseline and 0.7% for SCL-MI, a 90.8% relative reduction in the error due to adaptation (worked through below).
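The arithmetic behind the slide's example, spelled out:

```python
# Worked check of the slide's example numbers.
in_domain, baseline, scl_mi = 80.4, 72.8, 79.7

loss_base  = in_domain - baseline   # adaptation loss for the baseline: 7.6
loss_sclmi = in_domain - scl_mi     # adaptation loss for SCL-MI: 0.7
reduction  = (loss_base - loss_sclmi) / loss_base
print(f"{loss_base:.1f}% {loss_sclmi:.1f}% {reduction:.1%}")  # 7.6% 0.7% 90.8%
```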

  9. Experiment Results

  10. Results analysis

  11. Correcting Misalignments • Supervised training objective: fit the target model while penalizing its distance from the source model weight vector v_s. • Uses only 50 labeled target-domain instances (few enough for a single engineer to label with minimal effort). A sketch follows.
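A hedged sketch of the correction objective as described on this slide: minimize the modified Huber loss on the ~50 labeled target instances plus a penalty λ‖w − v_s‖² that keeps the new weights near the source model. λ, the learning rate, and the step count here are illustrative, not values from the paper.

```python
import numpy as np

def modified_huber_grad(w, X, y):
    """Gradient of the average modified Huber loss; labels y in {-1, +1}."""
    m = y * (X @ w)                          # per-instance margins
    g = np.zeros_like(w)
    mid = (m >= -1) & (m < 1)                # quadratic region: (1 - m)^2
    g -= (2 * (1 - m[mid]) * y[mid]) @ X[mid]
    low = m < -1                             # linear region: -4m
    g -= (4 * y[low]) @ X[low]
    return g / len(y)

def correct_weights(v_s, X_tgt, y_tgt, lam=1.0, lr=0.01, steps=500):
    """Fit target weights near the source vector v_s on ~50 labeled instances."""
    w = v_s.copy()                           # start from the source model
    for _ in range(steps):
        g = modified_huber_grad(w, X_tgt, y_tgt) + 2 * lam * (w - v_s)
        w -= lr * g
    return w
```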

  12. Experiment results: loss • Shows adaptation results for only the two domain pairs on which SCL-MI performed worst relative to the supervised baseline.

  13. Experimental Results: +50 labeled target instances

  14. Measuring Adaptability • The A-distance: two domains can differ in arbitrary ways, but we are only interested in the differences that affect classification accuracy. • Here A is the family of sets on which some linear classifier returns a positive value.
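For reference, the underlying definition from Ben-David et al. (2006), which this part of the paper builds on, with 𝒜 the family of positive sets of linear classifiers:

```latex
% A-distance between domain distributions D and D', where \mathcal{A} is
% the family of sets on which some linear classifier returns a positive value:
d_{\mathcal{A}}(D, D') = 2 \sup_{A \in \mathcal{A}}
  \bigl|\, \Pr_{D}[A] - \Pr_{D'}[A] \,\bigr|
```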

  15. Use the Huber loss as a proxy for the A-distance. • Given two domains, compute the SCL representation, label each instance with its domain, and train a linear classifier to discriminate between the two domains. • Compute that classifier's empirical average per-instance Huber loss and report 100 · (1 − loss); the slides refer to this value as the (proxy) A-distance. A sketch follows.
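A sketch of the proxy computation as I read this slide, with scikit-learn standing in for the paper's classifier and theta coming from the earlier SCL sketch. Intuition: the harder the two domains are to tell apart, the higher the loss and the smaller the proxy distance.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

def modified_huber_loss(margins):
    """Per-instance modified Huber loss for margins y * f(x)."""
    return np.where(margins >= 1, 0.0,
                    np.where(margins >= -1, (1 - margins) ** 2, -4 * margins))

def proxy_a_distance(X_a, X_b, theta):
    # Project both domains into the SCL space and label each row by domain.
    X = np.vstack([X_a @ theta.T, X_b @ theta.T])
    y = np.concatenate([np.ones(len(X_a)), -np.ones(len(X_b))])
    clf = SGDClassifier(loss="modified_huber").fit(X, y)
    loss = modified_huber_loss(y * clf.decision_function(X)).mean()
    return 100 * (1 - loss)
```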

  16. Proxy A-distance & adaptation loss • Domain pairs with a low proxy A-distance (books and DVDs; kitchen and electronics) adapt well to each other, so when choosing domains to annotate, select books or DVDs, but not both.

  17. Conclusion and future work • Domain adaptation is useful for sentiment classification: SCL is improved by selecting pivots with mutual information (SCL-MI), and misalignments can be corrected with a small amount of labeled target-domain data. • The proxy A-distance can guide which domains to label. • Future work: addressing the ranking problem.
