Utility Scoring of Product Reviews (CIKM’06)

Zhu Zhang

Department of Management Information System University of Arizona

Balaji Varadarajan

Department of Computer Science University of Arizona

Outline

  • Introduction

  • Problem Definition

  • Data Collection

  • Utility Scoring Using Statistical Regression Models

  • Experiment and Analysis

  • Conclusion and Future Work

1. Introduction (1/4)

  • Product review data have been heavily used for subjectivity classification, polarity prediction, and opinion extraction, due to their linguistic properties and their implications for E-commerce applications

  • Product reviews are not equally “useful”

  • Below is a customer review of a digital camera, the Canon PowerShot SD300, which 170 out of 178 Amazon shoppers found helpful:

1. Introduction (2/4)

1. Introduction (3/4)

  • On the other hand, imagine product reviews as simple as


  • They certainly encode strong polarity and reflect their authors’ strong opinions

  • However, they are not particularly reliable or useful in informing their readers’ shopping decisions

1. Introduction (4/4)

  • We envision the following “weighted average” framework, in which the overall evaluation of a product P is a weighted average of the polarity of each individual review, weighted by its utility:

    eval(P) = Σ_i u(r_i) × polarity(r_i) / Σ_i u(r_i)

  • Much previous research has focused on predicting the polarity of text; we instead try to approach the utility of reviews
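The framework above can be sketched as follows; the assumption that utility scores in [0, 1] serve directly as the weights, and polarity values in [-1, 1], are illustrative choices, not the paper's specification:

```python
def overall_evaluation(reviews):
    """Overall evaluation of a product P as the utility-weighted
    average of the polarities of its individual reviews.

    `reviews`: list of (polarity, utility) pairs; polarity in [-1, 1]
    and utility in [0, 1] are illustrative assumptions.
    """
    total_utility = sum(u for _, u in reviews)
    if total_utility == 0:
        return 0.0  # no reliable reviews to aggregate
    return sum(p * u for p, u in reviews) / total_utility

# Two high-utility positive reviews outweigh one low-utility negative one
score = overall_evaluation([(1.0, 0.9), (0.8, 0.7), (-1.0, 0.1)])
```

Under this weighting, an unreliable review barely moves the overall evaluation, while the same polarity from a high-utility review would shift it substantially.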

2. Problem Definition (1/2)

  • Utility (or reliability, usefulness, informativeness) is not the same as indifference

  • Totally indifferent or neutral reviews are useless; well-grounded subjective opinions can be convincing and illuminating

  • The utility of a product review is a property orthogonal to its polarity or embedded opinions

  • The goal is to build a computational model to predict the utility of reviews

2. Problem Definition (2/2)

  • The problem is viewed as one of regression

  • The task is to approximate a function F that maps a product review T to its utility score

  • Given an estimated function F, we can use the following metrics to evaluate its quality:

    • Squared correlation coefficient: r² = (Σ_i (f_i − f̄)(y_i − ȳ))² / (Σ_i (f_i − f̄)² × Σ_i (y_i − ȳ)²)

    • Mean squared error: MSE = (1/n) Σ_i (f_i − y_i)²

T: product review; y_i: the real utility scores; f_i: the predicted utility scores

f̄ and ȳ: the means of the corresponding samples, respectively
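For concreteness, the two evaluation metrics can be computed with a plain-Python sketch (f_i are predicted scores, y_i the real ones):

```python
def squared_correlation(pred, true):
    """r^2: squared Pearson correlation between predicted and real scores."""
    n = len(pred)
    mean_p, mean_t = sum(pred) / n, sum(true) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, true))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_t = sum((t - mean_t) ** 2 for t in true)
    return cov ** 2 / (var_p * var_t)

def mean_squared_error(pred, true):
    """MSE: average squared difference between predictions and real scores."""
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(pred)
```

A perfectly linear relationship between predictions and targets gives r² = 1, while MSE additionally penalizes any offset or scale mismatch.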

3. Data Collection (1/3)

  • Download reviews and related data in three different domains from CNET and Amazon: Electronics, Video, and Books.

    • For electronics and books, we used keyword-based search to identify Canon products (Canon) and engineering books (Engineering)

    • For videos, we used the “AudienceRating” field to retrieve PG-13 movies (PG-13)

3. Data Collection (2/3)

  • Relevant to this study, we downloaded the following data:

    • Customer Reviews for product items whose ASIN (Amazon Standard Identification Number) is seen in the search response to the ItemSearch queries corresponding to the three domains above

    • Product meta data (including both product description and Amazon Editorial Reviews) corresponding to the product items

    • All reviews written by any customer whose review appears in the customer reviews collected above

3. Data Collection (3/3)

  • For all customer-written reviews, we also collected the votes they received regarding their usefulness (“x out of y people found the following review helpful”)

  • The basic statistics of the three collections are summarized in Table 1

4. Utility Scoring Using Statistical Regression Models (1/5)

4.1 Learning Algorithms

  • Experiment with two types of regression algorithms:

    • ε-Support Vector Regression (ε-SVR) implemented in LIBSVM

      • Just as SVM represents the state of the art in machine learning, SVR is attractive due to the power of different kernel functions

      • We use the radial basis kernel function (RBF), which handles the potentially non-linear relationship between the target value and features

    • Simple linear regression (SLR) implemented in WEKA [22]

[22] I. H. Witten and E. Frank. Data Mining: Practical machine learning tools and techniques.

Morgan Kaufmann, San Francisco, 2005.
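As an illustration, scikit-learn's SVR (which wraps LIBSVM) can fit an ε-SVR with an RBF kernel on review feature vectors; the feature values and utility scores below are made up:

```python
import numpy as np
from sklearn.svm import SVR

# Hypothetical feature vectors for four reviews (e.g., a similarity
# score, a word count, a subjectivity-clue count) and utility targets.
X = np.array([[0.20, 120, 5],
              [0.10,  40, 2],
              [0.30, 300, 9],
              [0.05,  15, 1]])
y = np.array([0.90, 0.40, 0.95, 0.20])

# epsilon-SVR with the radial basis kernel; in practice features
# should be scaled before using an RBF kernel.
model = SVR(kernel="rbf", C=1.0, epsilon=0.1)
model.fit(X, y)
predictions = model.predict(X)
```

The RBF kernel lets the model capture a non-linear relationship between the features and the utility score, which simple linear regression cannot.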

4. Utility Scoring Using Statistical Regression Models (2/5)

4.2 Features

  • 4.2.1 Lexical Similarity Features (LexSim)

    • Clearly an informative customer review should not be a literal copy or loyal rephrase of the product specification S

    • On the other hand, a good review is supposed to base subjective judgment on objective observation

    • A similar situation is conceivable between a customer review and an editorial review E, the latter of which approximates a relatively objective and authoritative view of the product

    • Measure the similarity between the customer review and the product specification, sim(T, S), and that between the customer review and the editorial review, sim(T, E)

    • Using the standard cosine similarity in the vector space model, with TF*IDF term weighting
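A minimal sketch of the similarity computation; the smoothed IDF variant and the whitespace tokenization are assumptions, since the slide does not pin down the exact TF*IDF formula:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build a TF*IDF vector (as a dict) for each whitespace-tokenized doc.

    IDF is smoothed (log((1+n)/(1+df)) + 1) so terms shared by every
    document in a tiny corpus do not vanish; this variant is an assumption.
    """
    n = len(docs)
    df = Counter(w for d in docs for w in set(d.split()))
    idf = {w: math.log((1 + n) / (1 + df[w])) + 1 for w in df}
    return [{w: tf * idf[w] for w, tf in Counter(d.split()).items()}
            for d in docs]

def cosine(u, v):
    """Standard cosine similarity between two sparse vectors."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# T: customer review, S: product spec, E: editorial review (toy texts)
T, S, E = tfidf_vectors(["great zoom easy menus",
                         "zoom lens battery specs",
                         "easy camera great zoom"])
sim_T_S, sim_T_E = cosine(T, S), cosine(T, E)
```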

4. Utility Scoring Using Statistical Regression Models (3/5)

  • 4.2.2 Shallow Syntactic Features (ShallowSyn)

    • Compute counts of words with the following part-of-speech tags in T, in order to characterize the subjectivity-objectivity mixture of the text at a shallow syntactic level:

      • Proper nouns

      • Numbers

      • Modal verbs

      • Interjections

      • Comparative and superlative adjectives

      • Comparative and superlative adverbs

      • Wh-determiners, wh-pronouns, possessive wh-pronouns, and wh-adverbs

    • Also compute simple counts such as number of words and number of sentences in the review text T
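Given POS-tagged text from any tagger, the feature extraction reduces to counting tag groups; the Penn Treebank tag sets are standard, but the group names below are ours:

```python
# Penn Treebank tags grouped as on the slide; group names are our own.
TAG_GROUPS = {
    "ProperNoun": {"NNP", "NNPS"},
    "Number": {"CD"},
    "ModalVerb": {"MD"},
    "Interjection": {"UH"},
    "CompSupAdj": {"JJR", "JJS"},
    "CompSupAdv": {"RBR", "RBS"},
    "WhWord": {"WDT", "WP", "WP$", "WRB"},
}

def shallow_syntactic_features(tagged_sentences):
    """Count tag-group occurrences plus word and sentence counts.

    `tagged_sentences`: list of sentences, each a list of
    (word, Penn-Treebank-tag) pairs produced by a POS tagger.
    """
    feats = {group: 0 for group in TAG_GROUPS}
    feats["NumWords"] = feats["NumSentences"] = 0
    for sent in tagged_sentences:
        feats["NumSentences"] += 1
        for _, tag in sent:
            feats["NumWords"] += 1
            for group, tags in TAG_GROUPS.items():
                if tag in tags:
                    feats[group] += 1
    return feats

example = [[("Canon", "NNP"), ("makes", "VBZ"), ("better", "JJR"),
            ("cameras", "NNS"), ("than", "IN"), ("most", "JJS")]]
features = shallow_syntactic_features(example)
```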

4. Utility Scoring Using Statistical Regression Models (4/5)

  • 4.2.3 Lexical Subjectivity Clues (LexSubj)

    • We use lexical subjectivity clues to capture the subjectivity-objectivity mixture at a lexical semantic level, taking advantage of lexical resources created by other researchers

    • Calculate counts of words in the following clue lists respectively:

      • The list of subjective adjectives learned in [18]

      • The list of subjective adjectives learned in [4], which contains the following categories:

        • Dynamic adjectives (e.g., “careful”, “serious”)

        • Polarity Plus adjectives (e.g., “amusing”), manually or automatically identified

        • Polarity Minus adjectives (e.g., “awful”), manually or automatically identified

        • Gradability Plus adjectives (e.g., “appropriate”), manually or automatically identified

        • Gradability Minus adjectives (e.g., “potential”), manually or automatically identified

[4] V. Hatzivassiloglou and J. M. Wiebe. Effects of adjective orientation and gradability on sentence subjectivity. In Proceedings of the 18th conference on Computational linguistics, pages 299–305, Morristown, NJ, USA, 2000. Association for Computational Linguistics.

[18] J. Wiebe. Learning subjective adjectives from corpora. In AAAI/IAAI, pages 735–740, 2000.

4. Utility Scoring Using Statistical Regression Models (5/5)

  • The list of strong subjective nouns and weak subjective nouns generated by Basilisk [16]

  • The list of strong subjective nouns (e.g., “domination”, “evil”) and weak subjective nouns (e.g., “reaction”, “security”) generated by MetaBoot [13]

  • We do not count the frequency of each individual word; we only count the total occurrences of words in each list

  • For example, if the review text T contains 5 weak subjective nouns, we give a value “5” to the feature “WeakSubjNoun” instead of assigning the value “1” to 5 binary features

  • In total, we have 14 word lists, therefore 14 features in this category

  • [13] E. Riloff, J. Wiebe, and T. Wilson. Learning subjective nouns using extraction pattern bootstrapping. In W. Daelemans and M. Osborne, editors, Proceedings of CoNLL-2003, pages 25–32. Edmonton, Canada, 2003.

    [16] M. Thelen and E. Riloff. A bootstrapping method for learning semantic lexicons using extraction pattern contexts. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 214–221, Philadelphia, July 2002. Association for Computational Linguistics.
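The list-level counting can be sketched as follows; the miniature clue lists below are hypothetical stand-ins for the 14 real lists drawn from [18], [4], [16], and [13]:

```python
# Hypothetical miniature clue lists; the real lists are much larger.
CLUE_LISTS = {
    "StrongSubjNoun": {"evil", "domination"},
    "WeakSubjNoun": {"reaction", "security"},
    "PolarityMinusAdj": {"awful"},
}

def clue_features(review_text):
    """One aggregate occurrence count per clue list,
    not a binary feature per individual word."""
    tokens = review_text.lower().split()
    return {name: sum(tok in words for tok in tokens)
            for name, words in CLUE_LISTS.items()}

feats = clue_features("an awful security reaction to pure evil")
```

This keeps the feature vector at one dimension per list (14 in the paper) rather than one dimension per clue word.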

    5. Experiment and Analysis (1/6)

    5.1 Data Treatment

    • Almost every Amazon customer review comes with a vote regarding its usefulness (“x out of y people found the following review helpful”)

    • The utility of a review is defined as u = x / y

    • Using the reviews with at least 10 votes (i.e., y ≥ 10)
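The target-score extraction can be sketched as follows (the cutoff of 10 follows the slide; the vote pairs are illustrative):

```python
def utility_scores(votes, min_votes=10):
    """Turn helpfulness votes into regression targets.

    `votes`: list of (x, y) pairs from "x out of y people found the
    following review helpful". Reviews with fewer than `min_votes`
    total votes are dropped, since their ratios are too noisy to
    serve as reliable targets.
    """
    return [x / y for x, y in votes if y >= min_votes]

targets = utility_scores([(170, 178), (3, 4), (9, 12)])  # drops (3, 4)
```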

    5. Experiment and Analysis (2/6)

    5.2 Experimental Results

    • One might intuitively expect that the utility of a review strongly correlates with its length

    5. Experiment and Analysis (3/6)

    • The regression performance on the three review collections (Canon, PG-13, and Engineering) is summarized in Tables 4, 5, and 6, respectively

    • All results presented are based on 10-fold cross validation

    5. Experiment and Analysis (4/6)

    5. Experiment and Analysis (5/6)

    5.3 Discussion

    • The strongest model for each collection always achieves r² > 0.30 and MSE < 0.10, and clearly outperforms the length-based baseline

    • SVR significantly outperforms SLR, mostly due to its regularized model and the non-linear power of its kernel function

    • Since the prediction model only depends on a subset of the training data, SVR is more resistant to outliers, from which SLR sometimes suffers (e.g., producing a very large mean squared error)

    5. Experiment and Analysis (6/6)

    • Looking at the feature vector for the regression models, different groups of features have different effects on the final output:

      • The set of lexical similarity features play a very minor role in the regression model

        • how useful a review is does not necessarily correlate with how “similar” it is to the corresponding product specification or authoritative review

      • The lexical subjectivity clues have very limited influence on the utility scoring

        • The perceived “usefulness” of a product review barely correlates with the subjectivity or polarity embedded in the text

      • The shallow syntactic features account for most predicting power of the regression model

        • High-utility reviews do stand out due to the linguistic styles in which they are written

    6. Conclusion and Future Work (1/2)

    • This study identified a new task in the ongoing research in text sentiment analysis: predicting utility scores of product reviews

    • We viewed the problem as one of regression and built regression models incorporating a diverse set of features, which achieved highly competitive performance on three Amazon product review collections

    • The shallow syntactic features turned out to be the most influential predictors, which indicates that the perceived utility of a product review highly depends on its linguistic style

    6. Conclusion and Future Work (2/2)

    • The following directions are identified for future work:

      • It is certainly desirable to consider utility prediction in concert with polarity classification, opinion extraction, strength scoring, and other aspects, in an integrated framework

      • In the context of online shopping, it makes sense to incorporate stronger user profiling (e.g., demographic info, shopping history, etc.) into text sentiment analysis
