1 / 21

Mining and Summarizing Customer Reviews

Mining and Summarizing Customer Reviews. Advisor : Dr. Hsu Reporter : Chun Kai Chen Author : Minqing Hu and Bing Liu. 2004 SIGKDD. Outline. Motivation Objective Introduction Feature-based opinion summarization Experimental Evaluation Conclusions Personal Opinion. Motivation.

poncem
Download Presentation

Mining and Summarizing Customer Reviews

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Mining and Summarizing Customer Reviews Advisor :Dr. Hsu Reporter:Chun Kai Chen Author:Minqing Hu and Bing Liu 2004 SIGKDD

  2. Outline • Motivation • Objective • Introduction • Feature-based opinion summarization • Experimental Evaluation • Conclusions • Personal Opinion

  3. Motivation • As e-commerce is becoming more and more popular, the number of customer reviews that a product receives grows rapidly • difficult for a potential customer to read them to make an informed decision on whether to purchase the product • difficult for the manufacturer of the product to keep track and to manage customer opinions

  4. Objective • In this research, we aim to mine and to summarize all the customer reviews of a product • only mine the features of the product on which the customers have expressed their opinions and whether the opinions are positive or negative • do not summarize the reviews

  5. Introduction(1/2) • Given a set of customer reviews of a particular product, the task involves three subtasks: • identifying features of the product • customers have expressed their opinions on (called product features) • for each feature, identifying review sentences • give positive or negative opinions • producing a summary • using the discovered information

  6. Introduction(2/2) • Our task is different from traditional text summarization [15, 39, 36] in a number of ways • First • a summary in our case is structured rather than another (but shorter) free text document as produced by most text summarization systems • Second • only interested in features of the product • do not summarize the reviews • by selecting or rewriting a subset of the original sentences from the reviews to capture their main points as in traditional text summarization

  7. Feature-based opinion summarization <S> <NG><W C='PRP' L='SS' T='w' S='Y'> I</W> </NG> <VG> <W C='VBP'> am</W><W C='RB'> absolutely </W> </VG> 3.1 3.2 Apriori algorithm (only N) Compactness pruning Redundancy pruning 3.3 形容詞“The pictures are very clear.” 3.4 WordNet (同義/反義)形容詞 3.5 3.6 positive orientation(e.g., beautiful, awesome) negative orientation (e.g., disappointing) 3.7

  8. Part-of-Speech Tagging (POS) • Product features are usually nouns or noun phrases in review sentences • used the NLProcessor linguistic parser [31] to parse each review to split text into sentences and to produce the part-of-speech tag for each word • A transaction file is then created for the generation of frequent features in the next step • includes only the identified nouns and noun phrases of the sentence • Some pre-processing of words is also performed • which includes removal of stopwords, stemming and fuzzy matching

  9. Frequent Features Identification • Due to the difficulty of natural language understanding, some types of sentences are hard to deal with • “The pictures are very clear.” • “While light, it will not easily fit in pockets.” (size) • we focus onfinding features that appear explicitly as nouns or noun phrases in the reviews • we focus on finding frequent features, (finding infrequent features will be discussed later) • We run the association miner CBA [26] • based on the Apriori algorithm in [1] on the transaction set of noun/noun phrases produced in the previous step

  10. Frequent Features Identification-Feature Pruning • Compactness pruning • Focus on removing feature that contain at lease two words • Association rule doesn’t consider the position of the items • aims to prune those candidate whose words do not appear together in a specific order • Redundancy pruning • Focus on removing features that contain single word • Use p-support (pure support) to describe redundant features • For instance, life by itself is not a useful feature while battery life is a meaningful feature phrase.

  11. Opinion Words Extraction • We now identify opinion words • people use to express a positive or negative opinion • primarily used to express subjective opinions • this paper uses adjectives as opinion words • For example • “The strap is horrible and gets in the way of parts of the camera you need access to.” • horribleis the effective opinion of strap • Effective opinions will be useful when we predict the orientation of opinion sentences

  12. Orientation Identification for Opinion Words(1/2) • For each opinion word • identify its semantic orientation (by training) • be used to predict the semantic orientation of each opinion sentence • Words that encode a orientation state • a positive orientation(e.g., beautiful, awesome) • a negative orientation (e.g., disappointing) • no orientation (e.g., external, digital) [17]. • In this work • we are interested in only positive and negative orientations

  13. Orientation Identification for Opinion Words(2/2) • Unfortunately • dictionaries and similar sources do not include semantic orientation information for each word • In this research • utilizing the adjectivesynonym set and antonym setin WordNet [29] to predict the semantic orientations of adjectives • WordNet cannot recognize • they are discarded as they may not be valid words • cannot find orientations • they will also be removed from the opinion words list

  14. Infrequent Feature Identification • There are some features that only a small number of people talked about • association mining is unable to identify such features • How to extract these infrequent features • use the nearest noun/noun phrase • could also find nouns/noun phrases that are irrelevant to the given product

  15. Predicting the Orientations of Opinion Sentences • In general • we use the dominant orientation of the opinion words in the sentence to determine the orientation of the sentence. • In the case where there is the same number of positive and negative opinion words • case 1 • The user likes or dislikes most or all the features in one sentence • “overall this is a good camera with a really good picture clarity & an exceptional close-up shooting capability.” • case 2 • The user likes or dislikes most of the features in one sentence, but there is an equal number of positive and negative opinion words • “the auto and manual along with movie modes are very easy to use, but the software is not intuitive.” • All the other cases

  16. Summary Generation • After all the previous steps, we are ready to generate the final feature-based review summary • A count is computed to show how many reviews give positive/negative opinions to the feature • ranked according to the frequency

  17. Experimental Evaluation • We now evaluate FBS from three perspectives • The effectiveness of feature extraction • The effectiveness of opinion sentence extraction • The accuracy of orientation prediction of opinion sentences • We have conducted experiments on the customer reviews of five electronics products • 2 digital cameras, 1 DVD player, 1 mp3 player, and 1 cellular phone. • The two websites where we collected the reviews • Amazon.com and C|net.com.

  18. To evaluate the discovered features • a human tagger manually read all the reviews and produced a manual feature list for each product • Columns 3-8 demonstrate • clearly the effectiveness of these two pruning techniques • Columns 9 and 10 • after infrequent feature identification is done. The recall is improved dramatically

  19. Conclusion • In this paper • proposed a set of techniques for mining and summarizing product reviews based on data mining and natural language processing methods • The objective • to provide a feature-based summary of a large number of customer reviews of a product sold online • Experimental results • indicate that the proposed techniques are very promising in performing their tasks

  20. Personal Opinion • Strength • proposed a new valid method of mining customer reviews • Weakness • feature must be explicitly mentioned • opinion words must be adjectives • Application • Future Work

More Related