Modeling and Exploiting Review Helpfulness for Summarization

Diane Litman, Professor, Computer Science Department; Senior Scientist, Learning Research & Development Center; Co-Director, Intelligent Systems Program; University of Pittsburgh

Presentation Transcript


  1. Modeling and Exploiting Review Helpfulness for Summarization • Diane Litman, Professor, Computer Science Department; Senior Scientist, Learning Research & Development Center; Co-Director, Intelligent Systems Program; University of Pittsburgh • Joint work with Wenting Xiong, Computer Science (PhD dissertation; now at IBM)

  2. Online reviews • Online reviews are influential in customer decision-making

  3. Online peer reviews • Student peer reviews have been used for grading assignments in Massive Open Online Courses (MOOCs) • Online peer-review software • E.g., SWoRD, developed at the University of Pittsburgh

  4. While reviews thrive on the internet… Overwhelming!

  5. While reviews thrive on the internet… Overwhelming! Mixed quality!

  6. Review metadata includes user-provided quality assessments (e.g., helpfulness votes)

  7. Review metadata includes user-provided quality assessments (e.g., helpfulness votes) Research Problem 1: What if helpfulness metadata is not available?

  8. Helpfulness metadata, in turn, has been used to facilitate review exploration

  9. Helpfulness metadata has been used to facilitate review exploration Research Problem 2: What about helpfulness for summarization?

  10. Outline • Introduction • Challenges for NLP • Review content analysis for helpfulness prediction • From customer reviews to peer reviews • A general helpfulness model based on review text • Helpfulness-guided review summarization • Human summary analysis • User studies • Conclusions

  11. Challenges for NLP • The definition of review helpfulness varies • E.g. Educational aspects of peer reviews

  12. Product review examples • More helpful review (annotated: personal experience; product support) • Less helpful review (annotated: comparison with iPad)

  13. Peer review examples • Criticism, expert-rated helpfulness = 5 (annotated: problem localization; solution): I thought there were some good opportunities to provide further data to strengthen your argument. For example the statement “These methods of intimidation, and the lack of military force offered by the government to stop the KKK, led to the rescinding of African American democracy.” Maybe here include data about how… (126 words omitted) • Praise, expert-rated helpfulness = 2: The author also has great logic in this paper. How can we consider the United States a great democracy when everyone is not treated equal. All of the main points were indeed supported in this piece. • Problem localization and solutions are significantly correlated with the likelihood of feedback implementation <Nelson and Schunn 2009>

  14. Challenges for NLP • The definition of review helpfulness varies • E.g. Educational aspects of peer reviews • Review content may have multiple sources • E.g. A description of movie plot

  15. Review content from multiple sources The external content is highlighted in green • Product reviews The Nikon D3100 is a very good entry-level digital SLR. Clearly targeted toward the beginner, its combination of Guide Modes, assist images, and help screens easily makes it the most accessible of any D-SLR out there.

  16. Review content from multiple sources The external content is highlighted in green • Movie reviews: …Schultz tells Django to pick out whatever he likes. Django looks at the smiling white man in disbelief. You’re gonna let me pick out my own clothes? Django can’t believe it. The following shot delivered one of the biggest laughs from the audience I watched the film with. … • Peer reviews: The paragraph about Abraham Lincoln's actions towards the former slaves is not clear. Which social and political reforms were not made quickly by Lincoln? It may well be true that Lincoln did not accomplish everything he intended before his assassination, but this sentence is too vague to know whether the writer is historically accurate.

  17. Challenges for NLP • The definition of review helpfulness varies • E.g. Educational aspects of peer reviews • Review content may have multiple sources • E.g. A description of movie plot • User helpfulness ratings are not available at a fine granularity • E.g. At the paragraph rather than the sentence level

  18. Identifying review helpfulness at fine granularity • An example: I really like this camera. It has 10x optical, image stabilization, a 3.0inch lcd with 230,000 pixels, and more. The size is great for a 10x zoom camera. Image stabilization and is great for getting shots that would come out blurry with my Canon Powershot A620. My other favorite feature besides the zoom and image stabilization, is the wide angle. It is great to finally get cityscapes and have the whole skyline in one shot!! And with the camera set to 16X9, I can get a 24mm shot!

  19. Identifying review helpfulness at fine granularity • Sentence-level review helpfulness prediction

  20. Identifying review helpfulness at fine granularity • Highlight the most helpful sentences
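Slides 19 and 20 describe predicting helpfulness per sentence and highlighting the most helpful sentences, but not how sentence scores are produced. A minimal sketch of the highlighting step, assuming a hypothetical sentence-level model has already scored each sentence (the scores below are made up), could look like this:

```python
def highlight_top_sentences(sentences, scores, k=2):
    """Return the k highest-scoring sentences in their original order."""
    if len(sentences) != len(scores):
        raise ValueError("need exactly one score per sentence")
    # Rank sentence positions by predicted helpfulness, highest first.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    # Re-sort the selected positions so the highlights follow the review's order.
    chosen = sorted(ranked[:k])
    return [sentences[i] for i in chosen]


review_sentences = [
    "I really like this camera.",
    "It has 10x optical, image stabilization, a 3.0inch lcd with 230,000 pixels, and more.",
    "The size is great for a 10x zoom camera.",
]
# Hypothetical predicted scores; a real system would take these from a trained model.
predicted = [0.2, 0.9, 0.6]
for sentence in highlight_top_sentences(review_sentences, predicted, k=2):
    print(sentence)
```

With these scores it prints the second and third sentences. Restoring the original order is a design choice so the highlighted sentences read as they appear in the review.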

  21. Challenges for NLP • The definition of review helpfulness varies • E.g. Educational aspects of peer reviews • Review content may have multiple sources • E.g. A description of movie plot • User helpfulness ratings are not available at a fine granularity • E.g. At the paragraph rather than the sentence level • Existing summarization heuristics are not designed for reviews • E.g. Similarity of word distributions

  22. Challenges for NLP • The definition of review helpfulness varies • E.g. Educational aspects of peer reviews • Review content may have multiple sources • E.g. A description of movie plot • User helpfulness ratings are not available at a fine granularity • E.g. At the paragraph rather than the sentence level • Existing summarization heuristics are not designed for reviews • E.g. Similarity of word distributions • Specialized subject pools are needed for user studies • E.g. Students or teachers for peer reviews

  23. Research questions • Can we model review helpfulness based on review textual content automatically? • Can we improve summarization performance by introducing review helpfulness?

  24. Outline • Introduction • Challenges for NLP • Review content analysis for helpfulness prediction • From customer reviews to peer reviews • A general helpfulness model based on review text • Helpfulness-guided review summarization • Human summary analysis • User studies • Conclusions

  25. Automatically assessing peer-review helpfulness Our approach – Adaptation • From product reviews <Kim et al 2006> to peer reviews • Introduce peer-review domain knowledge

  26. Annotated peer-review corpus • Collected from an introductory college-level history class • 22 papers and 267 reviews • Paper ratings • Review helpfulness ratings provided by experts • Prior annotations <Nelson and Schunn 2009> • Feedback types: praise, summary, criticism (Kappa = .92) • For criticisms • Localization information of the problem: pLocalization (Kappa = .69) • Concrete solution to problems: Solution (Kappa = .87) • Annotation example (feedbackType = criticism, pLocalization = True, Solution = True): I thought there were some good opportunities to provide further data to strengthen your argument. For example the statement “These methods of intimidation, and the lack of military force offered by the government to stop the KKK, led to the rescinding of African American democracy.” Maybe here include data about how… (126 words omitted)
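The agreement figures above are kappa statistics. The slide does not say which kappa variant was used; assuming Cohen's kappa between two annotators, the computation can be sketched as follows. The labels are invented for illustration and are not taken from the corpus.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items that received the same label.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_chance = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical feedback-type labels from two annotators on four comments.
annotator_1 = ["praise", "criticism", "summary", "criticism"]
annotator_2 = ["praise", "criticism", "criticism", "criticism"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))  # 0.556
```

Here raw agreement is 0.75, but because both annotators use "criticism" often, chance agreement is 0.4375, which pulls kappa down to about 0.56.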

  27. Adaptation from product reviews to peer reviews • Generic features motivated by prior work on product reviews <Kim et al. 2006> • Topic words are automatically extracted from students’ papers using publicly available software (by Annie Louis 2008) • Sentiment words are extracted from the General Inquirer dictionary
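Topic-word and sentiment-word features of this kind reduce to counting review tokens that appear in a word list. A minimal sketch, assuming simple lowercase tokenization and tiny hypothetical word lists that stand in for the extracted topic words and the General Inquirer categories (neither is reproduced here):

```python
import re

# Hypothetical stand-in word lists. The actual features use topic words
# extracted from the students' papers and sentiment words from the
# General Inquirer dictionary.
TOPIC_WORDS = {"democracy", "government", "lincoln", "reconstruction"}
POSITIVE_WORDS = {"good", "great", "clear"}
NEGATIVE_WORDS = {"vague", "unclear", "weak"}


def lexicon_counts(review_text):
    """Count tokens of a review that appear in each word list."""
    tokens = re.findall(r"[a-z']+", review_text.lower())
    return {
        "topic_words": sum(t in TOPIC_WORDS for t in tokens),
        "positive_words": sum(t in POSITIVE_WORDS for t in tokens),
        "negative_words": sum(t in NEGATIVE_WORDS for t in tokens),
    }


print(lexicon_counts("The argument about democracy is good, "
                     "but the Lincoln paragraph is vague."))
# {'topic_words': 2, 'positive_words': 1, 'negative_words': 1}
```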

  28. Introducing domain knowledge • Peer-review specialized features

  29. Experiment 1 • Comparison • Generic features vs. peer-review specialized features • Algorithm • SVM Regression (SVMlight) • Evaluation • 10-fold cross validation • Pearson correlation coefficient r
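Experiment 1 trains SVM regression with SVMlight and reports Pearson's r under 10-fold cross-validation. The sketch below shows only the evaluation loop. To stay dependency-free it swaps in a one-feature least-squares regressor for SVMlight (a different learner), and it assumes out-of-fold predictions are pooled before computing r; averaging per-fold correlations is another common choice. The data are synthetic.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def fit_least_squares(xs, ys):
    """Stand-in learner (not SVMlight): one-feature ordinary least squares."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov / var_x if var_x else 0.0
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def cross_validated_r(xs, ys, fit=fit_least_squares, k=10, seed=0):
    """k-fold CV: train on k-1 folds, predict the held-out fold, then
    correlate the pooled out-of-fold predictions with the gold ratings."""
    order = list(range(len(xs)))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    predictions = [0.0] * len(xs)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in order if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            predictions[i] = model(xs[i])
    return pearson_r(predictions, ys)


# Synthetic data, not from the study: the rating rises with the feature.
feature = [float(x) for x in range(40)]
rating = [2 * x + (x % 3) for x in feature]
print(round(cross_validated_r(feature, rating), 3))
```

Any learner with the same `fit(xs, ys) -> predict` shape can be plugged in through the `fit` parameter.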

  30. Results – Analysis of the generic features • Most helpful features: STR • Best feature combination: STR+UGR+META

  31. Results – Analysis of the generic features • Most helpful features: STR • Best feature combination: STR+UGR+META • Feature redundancy effect: combining all features does not add up their individual predictive power

  32. Results – Analysis of the peer-review features • Introducing peer-review specific features enhances performance • Feature redundancy effect is reduced after replacing UGR with Lexical Categories

  34. Outline • Introduction • Challenges for NLP • Review content analysis for helpfulness prediction • From customer reviews to peer reviews • A general helpfulness model based on review text • Helpfulness-guided review summarization • Human summary analysis • User studies • Conclusions

  35. Modeling review helpfulness based on content patterns of multiple sources • High-level representation of review content patterns • Differentiating review content sources

  36. Content patterns – LU • Linguistic Inquiry and Word Count (LIWC) <Pennebaker et al. 2007> • To examine review language usage patterns
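LIWC characterizes a text by the percentage of its words that fall into each dictionary category. A rough sketch of that idea, using a tiny hypothetical dictionary rather than the real LIWC 2007 lexicon (which is licensed and far larger); the category names and entries here are illustrative only:

```python
import re

# Hypothetical mini-dictionary in the spirit of LIWC. An entry ending in "*"
# matches any token with that prefix, mirroring LIWC's wildcard convention.
CATEGORIES = {
    "posemo": ["good", "great", "like", "favorite"],
    "negemo": ["vague", "blurry", "unclear"],
    "insight": ["think*", "thought", "consider*"],
}


def _matches(token, entries):
    """True if the token equals an entry or starts with a wildcard prefix."""
    return any(token.startswith(e[:-1]) if e.endswith("*") else token == e
               for e in entries)


def category_percentages(text):
    """Percentage of all tokens in `text` that fall into each category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {category: 0.0 for category in CATEGORIES}
    return {
        category: 100.0 * sum(_matches(t, entries) for t in tokens) / len(tokens)
        for category, entries in CATEGORIES.items()
    }


print(category_percentages("I think the paragraph is vague"))
```

For this six-token sentence, "think" and "vague" each account for one token, so insight and negemo are about 16.7% and posemo is 0%.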

  37. Content patterns – CD • Language entropy over the word distribution <Stark et al. 2012>
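Reading "language entropy over the word distribution" as the Shannon entropy of a review's unigram distribution (an assumption; the formulation in <Stark et al. 2012> may differ in tokenization or smoothing), the feature can be sketched as below. Higher values mean the review spreads its words over a more varied vocabulary.

```python
import math
import re
from collections import Counter


def word_entropy(text):
    """Shannon entropy, in bits, of the unigram distribution of `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    n = len(tokens)
    counts = Counter(tokens)
    # H = sum_w p(w) * log2(1 / p(w)), with p(w) the relative frequency of w.
    return sum((c / n) * math.log2(n / c) for c in counts.values())


print(word_entropy("great great great great"))  # 0.0 (one word type)
print(word_entropy("great zoom nice lens"))     # 2.0 (four equally likely types)
```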

  38. Content patterns – hRT • Statistical topic modeling: sLDA <Blei et al. 2007> • Introduces document-level information (the helpfulness rating) as supervision

  39. Content patterns – hRT • Topic words learned from peer reviews
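For reference, sLDA (the Blei et al. model cited above) extends LDA with a per-document response variable. With a Gaussian response, its generative process is as follows, where $z_{d,n}$ is treated as a one-hot vector over $K$ topics and, in this setting, the response $y_d$ would be the review's helpfulness rating. How the topic-word features are read off the fitted model is not stated on the slides.

```latex
\begin{aligned}
\theta_d &\sim \operatorname{Dir}(\alpha) \\
z_{d,n} \mid \theta_d &\sim \operatorname{Mult}(\theta_d), \quad n = 1, \dots, N_d \\
w_{d,n} \mid z_{d,n} &\sim \operatorname{Mult}(\beta_{z_{d,n}}) \\
y_d \mid z_{d,1:N_d} &\sim \mathcal{N}\!\left(\eta^\top \bar{z}_d,\ \sigma^2\right),
\qquad \bar{z}_d = \frac{1}{N_d} \sum_{n=1}^{N_d} z_{d,n}
\end{aligned}
```

Because the response depends on the empirical topic frequencies $\bar{z}_d$, fitting the model pushes topics toward word groupings that help predict helpfulness, which is what distinguishes it from unsupervised LDA.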

  40. Differentiating review content sources • Feature extraction with respect to different content sources • Internal content: reviewers’ judgments • External content: reviewers’ references to the review item • Consider review external content as external topic words • Topic signature acquisition algorithm <Lin and Hovy, 2000> • Software: TopicS <Nenkova and Louis, 2008> • Example (movie review): …Schultz tells Django to pick out whatever he likes. Django looks at the smiling white man in disbelief. You’re gonna let me pick out my own clothes? Django can’t believe it. The following shot delivered one of the biggest laughs from the audience I watched the film with. …
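Topic signatures <Lin and Hovy, 2000> are words that are significantly more frequent in a topic text than in a background corpus, judged by a log-likelihood ratio test. The sketch below is written from scratch and is not TopicS's code. It assumes the reviewed item's own text (e.g., a plot summary or the student paper) serves as the topic text, uses a cutoff of 10.83 (the chi-square critical value for p = 0.001 with one degree of freedom), and then splits review tokens into internal and external content. The helper names and toy data are hypothetical.

```python
import math
from collections import Counter


def _log_likelihood(k, n, p):
    """Binomial log-likelihood of k hits in n tokens (coefficient omitted,
    since it cancels in the ratio). Zero-count terms are skipped so that
    p = 0 or p = 1 never reaches math.log."""
    ll = 0.0
    if k:
        ll += k * math.log(p)
    if n - k:
        ll += (n - k) * math.log(1 - p)
    return ll


def topic_signatures(topic_tokens, background_tokens, threshold=10.83):
    """Words whose -2 log(lambda) exceeds `threshold` and that are
    relatively more frequent in the topic text than in the background."""
    topic_counts, bg_counts = Counter(topic_tokens), Counter(background_tokens)
    n1, n2 = len(topic_tokens), len(background_tokens)
    signatures = set()
    for word, k1 in topic_counts.items():
        k2 = bg_counts[word]
        p1, p2 = k1 / n1, k2 / n2
        if p1 <= p2:
            continue  # not over-represented in the topic text
        p = (k1 + k2) / (n1 + n2)  # pooled estimate under the null hypothesis
        llr = 2 * (_log_likelihood(k1, n1, p1) + _log_likelihood(k2, n2, p2)
                   - _log_likelihood(k1, n1, p) - _log_likelihood(k2, n2, p))
        if llr > threshold:
            signatures.add(word)
    return signatures


def split_by_source(review_tokens, external_words):
    """Partition review tokens into (internal, external) content."""
    internal = [t for t in review_tokens if t not in external_words]
    external = [t for t in review_tokens if t in external_words]
    return internal, external


# Toy data (hypothetical): item text vs. a generic background corpus.
item_text = ["django"] * 5 + ["the"] * 5
background = ["the"] * 50 + ["movie"] * 50
external = topic_signatures(item_text, background)
print(external)  # {'django'}
print(split_by_source(["django", "looks", "in", "disbelief"], external))
```

"the" is equally frequent in both corpora and is skipped, while "django" scores roughly 26.8 and becomes an external topic word. Features can then be computed separately over the internal tokens, the external tokens, or both.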

  41. Data • Three domains • Camera reviews: from Amazon.com <Jindal and Liu 2008> • Movie reviews: collected from IMDB.com • Educational peer reviews <Xiong and Litman 2011> • Each camera/movie review is voted on by more than 3 people • Helpfulness gold standard • Camera/movie reviews <Kim et al. 2006> • Peer reviews: 5-point expert ratings <Nelson and Schunn 2009>
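For camera and movie reviews, the gold standard follows <Kim et al. 2006>, commonly formulated as the fraction of helpful votes among all votes a review received. The sketch assumes that formulation and applies the slide's "more than 3 people" filter; the `Review` type and its field names are hypothetical. Peer reviews use the 5-point expert ratings instead.

```python
from dataclasses import dataclass


@dataclass
class Review:
    text: str
    helpful_votes: int  # voters who marked the review helpful
    total_votes: int    # all voters, helpful or not


def helpfulness(review):
    """Share of voters who found the review helpful."""
    return review.helpful_votes / review.total_votes


def gold_standard(reviews, min_votes=3):
    """Score reviews voted on by more than `min_votes` people; drop the rest."""
    return [(r.text, helpfulness(r)) for r in reviews if r.total_votes > min_votes]


reviews = [Review("review A", 3, 4), Review("review B", 1, 2), Review("review C", 0, 10)]
print(gold_standard(reviews))  # [('review A', 0.75), ('review C', 0.0)]
```

Review B is dropped because only two people voted on it, which keeps very noisy ratios out of the training data.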

  42. Experiment 2 • Comparison • Content patterns (LU, CD, hRT) vs. unigram • Content patterns + others vs. unigram + others • Content sources: F, I, E, I+E • Algorithm • SVM Regression (SVMlight) • Evaluation • 10-fold cross validation • Pearson correlation coefficient r

  43. Experiment 2 – Feature results • The proposed features outperform unigrams for movie and peer reviews • The best result is in bold • Significant improvements over baselines are marked with + • Unigrams work best for camera reviews • The same pattern holds with down-sampling • Domain difficulty: movie > peer > camera (?)

  45. Experiment 2 – Feature results • Content patterns + others vs. unigram + others • Same pattern holds

  46. Experiment 2 – Content source results • The best content source is in bold for each feature type • Significant improvement over F is in purple • Movie reviews • Peer reviews • For movie reviews: external > internal • For both domains: internal + external yields the most predictive models (LU+CD+hRT)

  49. Lessons learned • Techniques used in predicting product review helpfulness can be effectively adapted to the new peer-review domain • Prediction performance can be further improved by incorporating features that capture helpfulness information specific to peer reviews • Content features that capture review content patterns at a high level work better than unigrams for predicting review helpfulness • Review content source also matters for modeling review helpfulness: differentiating content sources yields better performance

  50. Outline • Introduction • Challenges for NLP • Review content analysis for helpfulness prediction • From customer reviews to peer reviews • A general helpfulness model based on review text • Helpfulness-guided review summarization • Human summary analysis • User studies • Conclusions
