Exploiting Social Context for Review Quality Prediction

Presentation Transcript

  1. Exploiting Social Context for Review Quality Prediction. Yue Lu (University of Illinois at Urbana-Champaign), Panayiotis Tsaparas (Microsoft Research), Alexandros Ntoulas (Microsoft Research), Livia Polanyi (Microsoft) • April 28, WWW'2010 • Raleigh, NC

  2. Why do we care about Predicting Review Quality? • User "helpfulness" votes help prioritize reading (e.g., a product page showing "User reviews (1764)") • But not all reviews have votes: new reviews, and reviews aggregated from multiple sources

  3. What has been done? • Treated as a classification or regression problem: learn from labeled reviews, predict for unlabeled ones • Textual features • Meta-data features [Zhang & Varadarajan '06] [Kim et al. '06] [Liu et al. '08] [Ghose & Ipeirotis '10]

  4. Reviews are NOT Stand-Alone Documents. We also observe: Reviewer Identity + Social Network = Social Context. Our Work: Exploiting Social Context for Review Quality Prediction

  5. Roadmap • Motivation • Review Quality Prediction Algorithms • Experimental Evaluation • Conclusions

  6. Textual Features. FeatureVector(review) = the text-only baseline features: • Text statistics: NumSent, NumTokens, SentLen, CapRatio, UniqWordRatio • Syntactic: POS:RB, POS:PP, POS:V, POS:CD, POS:JJ, POS:NN, POS:SYM, POS:COM, POS:FW • Conformity: KLDiv • Sentiment: SentiPositive, SentiNegative
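The slide lists feature names only. As a minimal sketch, the text-statistics and conformity features could be computed as below. The exact definitions are assumptions, not taken from the slides: sentences are split on `.!?`, CapRatio is taken as the share of tokens starting with an uppercase letter, and KLDiv is the add-one-smoothed KL divergence from the review's unigram distribution to that of all reviews of the same product. POS and sentiment features would additionally need a tagger and a sentiment lexicon and are omitted.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"[A-Za-z0-9']+")


def tokenize(text):
    return TOKEN_RE.findall(text)


def text_statistics(text):
    """Hypothetical definitions of the 'text statistics' features."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tokens = tokenize(text)
    num_sent, num_tokens = len(sentences), len(tokens)
    return {
        "NumSent": num_sent,
        "NumTokens": num_tokens,
        "SentLen": num_tokens / num_sent if num_sent else 0.0,  # average tokens per sentence
        # Assumed: fraction of tokens that start with an uppercase letter.
        "CapRatio": sum(t[0].isupper() for t in tokens) / num_tokens if num_tokens else 0.0,
        "UniqWordRatio": len({t.lower() for t in tokens}) / num_tokens if num_tokens else 0.0,
    }


def kl_div(review_text, product_texts):
    """Assumed conformity feature: KL(review || all reviews of the product), add-one smoothed."""
    review = Counter(t.lower() for t in tokenize(review_text))
    product = Counter(t.lower() for text in product_texts for t in tokenize(text))
    vocab = set(review) | set(product)
    r_total = sum(review.values()) + len(vocab)
    p_total = sum(product.values()) + len(vocab)
    kl = 0.0
    for word in vocab:
        p = (review[word] + 1) / r_total
        q = (product[word] + 1) / p_total
        kl += p * math.log(p / q)
    return kl
```

A review that uses words in roughly the same proportions as the other reviews of its product gets a KLDiv near zero; an off-topic review gets a larger value.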

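  7. Base Model: Linear Regression. Quality(review i) = Weights × FeatureVector(review i), i.e. $\hat{y}_i = w^\top x_i$, with $w = \arg\min_w \sum_{i \in \mathcal{L}} (w^\top x_i - y_i)^2$ over the labeled reviews $\mathcal{L}$. Closed form: $w = (X_L^\top X_L)^{-1} X_L^\top y_L$, where the rows of $X_L$ are the feature vectors of the labeled reviews and $y_L$ holds their quality labels.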
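A minimal NumPy sketch of the closed-form base model, assuming the rows of `X` are review feature vectors (a bias term can be included as a constant feature column). The function names are illustrative. It solves the normal equations with `np.linalg.solve` instead of forming $(X^\top X)^{-1}$ explicitly, which is numerically more stable.

```python
import numpy as np


def fit_linear_regression(X, y):
    """w = argmin_w sum_i (w^T x_i - y_i)^2, via the normal equations (X^T X) w = X^T y."""
    return np.linalg.solve(X.T @ X, X.T @ y)


def predict_quality(X, w):
    """Predicted quality of each review: w^T x_i for every row x_i of X."""
    return X @ w
```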

  8. Straightforward Approach: Adding Social Context as Features. FeatureVector(review) = [Textual Features, Social Context Features]. Disadvantages: • Social context features are not always available (anonymous reviews? a new reviewer?) • Needs more training data

  9. Our Approach: Social Context as Constraints. Our intuitions: • Quality(review) is related to its Reviewer Identity • Quality(reviewer) is related to the reviewer's Social Network. How do we combine such intuitions with textual information?

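  10. Formally: Graph-based Regularizers. $w = \arg\min_w \{ \sum_{i \in \mathcal{L}} (w^\top x_i - y_i)^2 + \beta \times \text{GraphRegularizer}(w) \}$, where the first term is the baseline loss function on the labeled reviews $\mathcal{L}$, β is a trade-off parameter, and the graph regularizer is computed over labeled and unlabeled data and designed to "favor" our intuitions. We will define four regularizers based on four hypotheses. Advantages: • Semi-supervised: makes use of unlabeled data • Applicable to reviews without social context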

  11. 1. Reviewer Consistency Hypothesis: reviews written by the same reviewer have similar quality, e.g. Quality(review 1) ~ Quality(review 2) and Quality(review 3) ~ Quality(review 4). Reviewers are consistent!

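  12. Regularizer for Reviewer Consistency • Reviewer Regularizer: $R_A(w) = \sum_{(i,j) \in A} (w^\top x_i - w^\top x_j)^2 = w^\top X^\top \Delta_A X w$, summed over all data (train + test) for all pairs of reviews connected in the same-author graph A • Closed-form solution [Zhou et al. '03] [Zhu et al. '03] [Belkin et al. '06]: $w = (X_L^\top X_L + \beta X^\top \Delta_A X)^{-1} X_L^\top y_L$, where X is the review-feature matrix over all reviews and $\Delta_A$ is the graph Laplacian of A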
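A minimal NumPy sketch of the closed-form reviewer-regularized fit, assuming the objective is the squared loss on labeled reviews plus β times the same-author regularizer over all reviews. Summing $(f_i - f_j)^2$ once per edge of a symmetric graph equals $f^\top (D - A) f$, so setting the gradient to zero gives the linear system solved below. All function names are hypothetical.

```python
import numpy as np


def same_author_adjacency(authors):
    """A[i, j] = 1 if reviews i != j were written by the same author."""
    authors = np.asarray(authors)
    A = (authors[:, None] == authors[None, :]).astype(float)
    np.fill_diagonal(A, 0.0)
    return A


def laplacian(A):
    """Graph Laplacian D - A of a symmetric adjacency matrix."""
    return np.diag(A.sum(axis=1)) - A


def fit_reviewer_regularized(X_lab, y_lab, X_all, authors_all, beta):
    """Minimise ||X_lab w - y_lab||^2 + beta * sum over same-author pairs (x_i.w - x_j.w)^2.

    X_all holds every review (labeled + unlabeled); authors_all gives each one's author.
    """
    L = laplacian(same_author_adjacency(authors_all))
    lhs = X_lab.T @ X_lab + beta * X_all.T @ L @ X_all
    return np.linalg.solve(lhs, X_lab.T @ y_lab)
```

One way to handle anonymous reviews under this sketch is to give each a unique author ID: such a review then has no edges and contributes nothing to the regularizer, consistent with the framework being applicable to reviews without social context.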

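  13. 2. Trust Consistency Hypothesis: if reviewer A trusts reviewer B, then Quality(A) - Quality(B) ≤ 0, where the quality of a reviewer is defined as the average quality of that reviewer's reviews, Quality(A) = AVG(Quality(reviews by A)). "I trust people with quality at least as good as mine!"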

  14. Regularizer for Trust Consistency • Trust Regularizer: $R_T(w) = \sum_{(A,B) \in E_{\text{trust}}} \max[0, \text{Quality}(A) - \text{Quality}(B)]^2$, summed over all data (train + test) for all pairs of reviewers connected in the trust graph (A trusts B) • No closed-form solution, but the objective is still convex, so it can be minimized by gradient descent
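A minimal gradient-descent sketch for the trust-regularized objective, under stated assumptions: reviewer quality is the average predicted quality of the reviewer's reviews (as defined on slide 13), trust edges are `(A, B)` pairs meaning "A trusts B", and the loss is the squared error on labeled reviews. The function names and the fixed step size are illustrative choices, not the authors' implementation. The step is $1/\text{Lip}$, where Lip upper-bounds the Lipschitz constant of the gradient; for a smooth convex objective that step keeps the objective from increasing.

```python
import numpy as np


def reviewer_average_matrix(review_authors, n_reviewers):
    """M[u, i] = 1 / (number of reviews by u) if review i is written by reviewer u."""
    M = np.zeros((n_reviewers, len(review_authors)))
    M[review_authors, np.arange(len(review_authors))] = 1.0
    return M / np.maximum(M.sum(axis=1, keepdims=True), 1.0)  # reviewers without reviews keep a zero row


def trust_difference_matrix(X_all, review_authors, n_reviewers, trust_edges):
    """Row e of D satisfies D[e] @ w = Quality(A) - Quality(B) for the e-th edge (A trusts B)."""
    G = reviewer_average_matrix(review_authors, n_reviewers) @ X_all  # reviewer qualities = G @ w
    edges = np.asarray(trust_edges, dtype=int).reshape(-1, 2)
    return G[edges[:, 0]] - G[edges[:, 1]]


def trust_objective(w, X_lab, y_lab, D, beta):
    """Squared loss on labeled reviews + beta * sum over trust edges of max(0, Q(A) - Q(B))^2."""
    hinge = np.maximum(0.0, D @ w)
    return float(np.sum((X_lab @ w - y_lab) ** 2) + beta * np.sum(hinge ** 2))


def fit_trust_regularized(X_lab, y_lab, D, beta, n_iter=5000):
    # Frobenius norms upper-bound spectral norms, so lip bounds the gradient's Lipschitz constant.
    lip = 2 * np.linalg.norm(X_lab) ** 2 + 2 * beta * np.linalg.norm(D) ** 2
    w = np.zeros(X_lab.shape[1])
    for _ in range(n_iter):
        hinge = np.maximum(0.0, D @ w)  # only violated edges (truster rated above trusted) contribute
        grad = 2 * X_lab.T @ (X_lab @ w - y_lab) + 2 * beta * D.T @ hinge
        w -= grad / lip
    return w
```

Because the hinge is zero whenever the trusted reviewer already scores at least as high as the truster, a model that satisfies every trust edge is left unchanged by the regularizer.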

  15. 3. Co-Citation Consistency Hypothesis: if reviewers A and B are co-cited (the co-citation graph is derived from the trust graph), then Quality(A) - Quality(B) → 0. "I am consistent with my 'trust standard'!"
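A minimal sketch of deriving a co-citation graph from the trust graph, assuming the usual bibliometric definition: two reviewers are co-cited when at least one reviewer trusts both of them. `T[a, b] = 1` means reviewer a trusts reviewer b. Whether edges should be weighted by the number of common trusters is not specified on the slides, so it is left as a hypothetical `weighted` flag.

```python
import numpy as np


def cocitation_graph(T, weighted=False):
    """Return C with C[a, b] > 0 iff some reviewer trusts both a and b (a != b)."""
    T = np.asarray(T, dtype=float)
    C = T.T @ T  # C[a, b] = number of reviewers who trust both a and b
    np.fill_diagonal(C, 0.0)  # a reviewer is not co-cited with itself
    return C if weighted else (C > 0).astype(float)
```

For example, if reviewer 0 trusts reviewers 1 and 2, then 1 and 2 become co-cited even though neither trusts the other.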

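  16. Regularizer for Co-citation Consistency • Co-citation Regularizer: $R_C(w) = \sum_{(A,B) \in C} [\text{Quality}(A) - \text{Quality}(B)]^2$, summed over all data (train + test) for all pairs of reviewers connected in the co-citation graph C • Closed-form solution: $w = (X_L^\top X_L + \beta X^\top M^\top \Delta_C M X)^{-1} X_L^\top y_L$, where M is the review-reviewer matrix that maps review qualities to reviewer qualities and $\Delta_C$ is the graph Laplacian of C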
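A minimal NumPy sketch of the closed form for a reviewer-level graph regularizer such as the co-citation one, assuming the review-reviewer matrix M averages each reviewer's predicted review qualities (matching the reviewer-quality definition on slide 13) and that `C` is a symmetric reviewer adjacency matrix. Reviewer qualities are then $M X w$, the regularizer is $w^\top X^\top M^\top \Delta_C M X w$, and zeroing the gradient yields the system below. Function names are hypothetical.

```python
import numpy as np


def reviewer_average_matrix(review_authors, n_reviewers):
    """M[u, i] = 1 / (number of reviews by u) if review i is written by reviewer u."""
    M = np.zeros((n_reviewers, len(review_authors)))
    M[review_authors, np.arange(len(review_authors))] = 1.0
    return M / np.maximum(M.sum(axis=1, keepdims=True), 1.0)


def fit_reviewer_graph_regularized(X_lab, y_lab, X_all, review_authors, C, beta):
    """Closed form for ||X_lab w - y_lab||^2 + beta * sum_{(a,b) in C} (Q(a) - Q(b))^2,
    where reviewer qualities are Q = M @ X_all @ w."""
    C = np.asarray(C, dtype=float)
    L = np.diag(C.sum(axis=1)) - C  # Laplacian of the symmetric reviewer graph
    G = reviewer_average_matrix(review_authors, C.shape[0]) @ X_all
    lhs = X_lab.T @ X_lab + beta * G.T @ L @ G
    return np.linalg.solve(lhs, X_lab.T @ y_lab)
```

Only the graph changes between the co-citation and link regularizers, so under these assumptions the same solver serves both.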

  17. 4. Link Consistency Hypothesis: if reviewers A and B are linked (the link graph is derived from the trust graph), then Quality(A) - Quality(B) → 0. "I trust people with similar quality as mine!"

  18. Regularizer for Link Consistency • Link Regularizer: $R_{\text{link}}(w) = \sum_{(A,B) \in E_{\text{link}}} [\text{Quality}(A) - \text{Quality}(B)]^2$, summed over all data (train + test) for all pairs of reviewers connected in the link graph • Closed-form solution!
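A minimal sketch of the link graph and the value of the link regularizer, assuming the link graph is the trust graph with edge direction ignored (A and B are linked if either trusts the other); the slides only say it is derived from the trust graph. `T[a, b] = 1` means a trusts b, and `q` holds reviewer qualities. Because mutual trust collapses into a single undirected edge, each linked pair is penalized once. With `q` defined as averaged predicted review qualities, the regularizer is quadratic in w, which is what makes a closed-form solution possible.

```python
import numpy as np


def link_graph(T):
    """Assumed definition: reviewers a and b are linked if either one trusts the other."""
    T = np.asarray(T, dtype=float)
    A = ((T + T.T) > 0).astype(float)
    np.fill_diagonal(A, 0.0)
    return A


def link_regularizer(q, A):
    """Sum over linked pairs {a, b} of (q[a] - q[b])^2, computed as q^T (D - A) q."""
    L = np.diag(A.sum(axis=1)) - A
    return float(q @ L @ q)
```

For instance, with reviewers 0 and 1 trusting each other and reviewer 2 trusting reviewer 1, there are two linked pairs, {0, 1} and {1, 2}; qualities `[1, 2, 4]` give a penalty of (1 - 2)^2 + (2 - 4)^2 = 5.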

  19. Roadmap • Motivation • Review Quality Prediction Algorithms • Experimental Evaluation • Conclusions

  20. Data from Ciao UK

  21. Hypothesis Testing: Reviewer Consistency. (Plot, Cellphone: density of the difference in review quality Qg between two reviews, for pairs from the same reviewer vs. pairs from different reviewers.) The Reviewer Consistency Hypothesis is supported by the data.

  22. Hypothesis Testing: Social Network-based Consistencies. (Plot, Cellphone: density of the difference in reviewer quality Qg(B) - Qg(A) for four cases: B is not linked to A, B trusts A, B is co-cited with A, B is linked to A.) The social network-based consistencies are supported by the data.

  23. Prediction Performance: Exploiting Social Context. (Plot, Cellphone: % of MSE difference vs. percentage of training data, 10%, 25%, 50%, 100%, for AddFeatures, Reg:Reviewer, Reg:Trust, Reg:Cocitation, Reg:Link.) • AddFeatures is most effective given sufficient training data • With limited training data, the Reg methods work best: Reg:Reviewer > Reg:Trust > Reg:Cocitation > Reg:Link

  24. Prediction Performance: Comparing Three Categories. (Plot: % of MSE difference for Reg:Link, Reg:Trust, Reg:Cocitation, Reg:Reviewer on Cellphone, Beauty, and Digital Camera.) • The improvement on Digital Camera is smaller due to sparse social context (reviews/reviewer ratio = 1.06)

  25. Parameter Sensitivity. (Plots, Cellphone and Beauty: mean squared error vs. regularization parameter, compared with the text-only baseline.) The regularized methods are consistently better than the baseline when the parameter is < 0.1.

  26. Conclusions • Improve review quality prediction using social context • Formalize it as a semi-supervised graph regularization framework • Utilize both labeled and unlabeled data • Applicable to data with no social context • Promising results on real-world data, especially with limited labels and rich social context

  27. Future Work • Combine multiple regularizers • Optimize by nDCG instead of MSE • Infer trust network • Spam detection

  28. Thank you! Questions?