Summarization of Multiple, Metadata Rich, Product Reviews

Department of Informatics – Aristotle University of Thessaloniki. LPIS Group: http://lpis.csd.auth.gr. Summarization of Multiple, Metadata Rich, Product Reviews. Fotis Kokkoras, Efstratia Lampridou, Konstantinos Ntonas, Ioannis Vlahavas. MSoDa '08

Presentation Transcript


  1. Department of Informatics – Aristotle University of Thessaloniki • LPIS Group: http://lpis.csd.auth.gr • Summarization of Multiple, Metadata Rich, Product Reviews • Fotis Kokkoras, Efstratia Lampridou, Konstantinos Ntonas, Ioannis Vlahavas • MSoDa '08: ECAI 2008 Workshop on Mining Social Data

  2. Introduction • Modern, successful on-line shops allow consumers to express their opinion on products and services they purchased. • These reviews are valuable for new customers. • If there are dozens, or even hundreds, of reviews for a single product, their utilization is time-consuming. • The need for automatically generated summaries of these reviews is obvious.

  3. Summarization Background • Types of summary: • Extractive: uses sentences from the original text • Abstractive: builds new sentences, possibly reusing fragments of the original • Text features usually used: • frequency and location of words, sentence location in the article, syntactic rules, dictionaries of important words • Various techniques/approaches: • Machine learning techniques • LSA (Latent Semantic Analysis) • Lexical chains • Cluster-based • These perform well on article-style texts.

  4. The Special Nature of Reviews • On-line product reviews in e-shops are quite different from article-style texts: • They are usually short and do not obey strict syntactic rules. • They convey only the subjective opinion of each reviewer. • There are a lot of reviewers! • They include a lot of repeated content. • There are usually too many reviews.

  5. What is the problem? • Traditional summarization techniques do not work very well on such data. • Why? • summarizers that work at the sentence level can repeat a frequently mentioned problem many times in the summary • reusing sentence fragments to construct new sentences is risky because reviews are short, with weak syntax • it is difficult to detect biased reviews based on their text alone

  6. Motivation • On-line reviews are usually accompanied by various metadata, such as: • buyer's technology level, • ownership of the product, • overall judgment of the product or service, on some scale, • labeled (positive or negative) or unlabeled comments, • usefulness of the review to other customers, etc. • How can these metadata help in summarization?

  7. Our Approach • ReSum Algorithm (Review Summarizer) • Creates extractive summary • Uses dictionary of important words and metadata • Is applied separately for (+) and (-) comments • For each product two summaries are created • How it works • Scores the sentences based on their words • Adjusts the initial score based on the metadata • Selects sentences avoiding repetition of concepts • Tested on newegg.com

  8. Requirements • A dictionary D of important words for the domain: • automatically created from a few thousand reviews of the domain in question • concatenation of reviews • removal of the 500 most common English words • selection of the top 150 most frequent words • Access to the reviews (and their metadata): • we use DEiXTo, an in-house developed web content extraction system • HTML/DOM based extraction rules
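The dictionary-building steps on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `build_dictionary`, the tokenization rule, and the tiny stand-in list of common words are all assumptions made for the example.

```python
import re
from collections import Counter

def build_dictionary(reviews, common_words, size=150):
    """Return the `size` most frequent words that are not in `common_words`."""
    text = " ".join(reviews).lower()        # concatenate the reviews
    words = re.findall(r"[a-z]+", text)     # assumed tokenization: alphabetic runs
    counts = Counter(w for w in words if w not in common_words)
    return [w for w, _ in counts.most_common(size)]

# Toy data; a real run would use a few thousand reviews and a 500-word list.
reviews = [
    "Great resolution and a clear picture.",
    "The resolution is great, but the stand is weak.",
    "Weak stand, great picture.",
]
common = {"a", "and", "the", "is", "but"}   # stand-in for the common-word list
dictionary = build_dictionary(reviews, common, size=3)
print(dictionary)
```

With real data, `size` would stay at its default of 150, matching the slide.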

  9. ReSum – Initial Scoring • Step 1: • Concatenate all positive (or negative) comments and divide them into separate sentences. • Remove stop words, punctuation, numbers, etc. • Count the frequency f_v of every word v. • Step 2: • Score every sentence i based on its words and the dictionary D:
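The scoring formula itself did not survive in this transcript, so the sketch below only illustrates the general idea of Steps 1 and 2. It assumes, purely for illustration, that a sentence's initial score is the sum of the frequencies f_v of its distinct words that appear in D; the stop list, the sentence splitter, and all function names are hypothetical.

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "is", "and", "it", "of"}   # illustrative stop list

def split_sentences(comments):
    """Concatenate comments and split them on sentence-ending punctuation."""
    text = " ".join(comments)
    return [s.strip() for s in re.split(r"[.!?;]+", text) if s.strip()]

def tokenize(sentence):
    """Lowercase, keep alphabetic tokens only, drop stop words."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower())
            if w not in STOP_WORDS]

def initial_scores(comments, dictionary):
    """ASSUMED scoring rule: sum f_v over the distinct dictionary words of a sentence."""
    sentences = split_sentences(comments)
    freq = Counter(w for s in sentences for w in tokenize(s))   # f_v
    scores = [sum(freq[w] for w in set(tokenize(s)) if w in dictionary)
              for s in sentences]
    return sentences, scores

comments = ["Great resolution. Clear picture!", "Great picture and great price."]
D = {"resolution", "picture", "price"}
sentences, scores = initial_scores(comments, D)
for s, r in zip(sentences, scores):
    print(r, s)
```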

  10. ReSum – Metadata Contribution • Metadata used: • Reviewer’s Technology Level (w1) • Ownership duration of the product (w2) • Usefulness of a review to other users (w3) • Step 3: • Initial score Ri is adjusted based on the metadata, in a weighted fashion: • weights are initialized using multicriteria techniques (will be explained later)
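The adjustment formula is also missing from this transcript. As a hedged sketch of what a weighted adjustment could look like, the code below multiplies the initial score by a weighted sum of the three metadata values. The multiplicative form, the normalization of each metadata value to [0, 1], and the names `ReviewMeta` and `adjust_score` are assumptions; the default weights are the initial weights reported on the weight-initialization slide.

```python
from dataclasses import dataclass

@dataclass
class ReviewMeta:
    tech_level: float   # reviewer's technology level, assumed normalized to [0, 1]
    ownership: float    # ownership duration, assumed normalized to [0, 1]
    usefulness: float   # usefulness to other users, assumed normalized to [0, 1]

def adjust_score(initial_score, meta, weights=(0.14, 0.24, 0.62)):
    """ASSUMED form: scale the initial score by a weighted sum of the metadata."""
    w1, w2, w3 = weights
    factor = w1 * meta.tech_level + w2 * meta.ownership + w3 * meta.usefulness
    return initial_score * factor

expert_owner = ReviewMeta(tech_level=1.0, ownership=1.0, usefulness=0.9)
drive_by = ReviewMeta(tech_level=0.2, ownership=0.0, usefulness=0.1)
print(adjust_score(10.0, expert_owner), adjust_score(10.0, drive_by))
```

Under this assumed form, a sentence from a review that other users found useful keeps most of its initial score, while one from a low-rated review is pushed down the ranking.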

  11. ReSum – Redundancy Elimination • Step 4: • Select the sentence with the highest score S. • Penalize the remaining sentences that share words with the selected one. • This eliminates redundancy. • The step is repeated until the desired number of sentences is reached.
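Step 4 is a greedy select-and-penalize loop, which might be sketched as below. The slide does not give the penalty itself, so the rule used here (multiply the score by `penalty` once per shared word) and the whitespace tokenization are assumptions chosen only to make the example concrete.

```python
def select_sentences(sentences, scores, n, penalty=0.5):
    """Greedily pick up to n sentences, penalizing word overlap with each pick.

    The penalty rule is an assumption: a remaining sentence's score is
    multiplied by `penalty` for every word it shares with the selected one.
    """
    words = [set(s.lower().split()) for s in sentences]
    scores = list(scores)                     # work on a copy
    remaining = list(range(len(sentences)))
    chosen = []
    while remaining and len(chosen) < n:
        best = max(remaining, key=lambda i: scores[i])
        chosen.append(best)
        remaining.remove(best)
        for i in remaining:
            overlap = len(words[i] & words[best])
            scores[i] *= penalty ** overlap
    return [sentences[i] for i in chosen]

sentences = ["great picture quality", "picture quality great", "weak stand"]
summary = select_sentences(sentences, [5.0, 4.5, 3.0], n=2)
print(summary)
```

In the toy input, the second sentence repeats the first, so after the first pick its score drops below that of "weak stand", which is selected instead.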

  12. Weight Initialization (1/3) • Subjective task • we need a consistent way for weight initialization • Analytic Hierarchy Process (AHP, Saaty '99) • multicriteria method • provides a methodology to calculate consistent weights for selection criteria, according to the importance we assign to them • importance values are selected from a predefined scale (defined by AHP)
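To make the AHP step concrete, the sketch below derives weights from a pairwise comparison matrix using the row geometric mean, a common approximation of the principal-eigenvector weights AHP prescribes. The comparison values are hypothetical, since the importance values the authors used are not preserved in this transcript, and the function name `ahp_weights` is an assumption.

```python
import math

def ahp_weights(matrix):
    """Approximate AHP priorities: normalized geometric mean of each row."""
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]

# HYPOTHETICAL pairwise comparisons for three criteria
# (technology level, ownership duration, usefulness).
# Entry [i][j] says how much more important criterion i is than j,
# and entry [j][i] is its reciprocal.
pairwise = [
    [1,   1 / 2, 1 / 4],
    [2,   1,     1 / 3],
    [4,   3,     1],
]
weights = ahp_weights(pairwise)
print([round(w, 3) for w in weights])
```

The weights always sum to 1, and a criterion judged more important in the pairwise comparisons receives a larger weight.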

  13. Weight Initialization (2/3) • Fundamental Scale of AHP • Subjective Importance Values we used

  14. Weight Initialization (3/3) • Calculated weights: w’1=0.14, w’2=0.24, w’3=0.62 • Initial weights were further adjusted based on the metadata values:

  15. Experimental Results (1/2) • Dataset: • 1587 reviews from newegg.com • 3 domains (monitors, printers, CPU coolers) • 9 products (3 from each domain) • Reference Summary • manually generated by 3 human experts • Comparison Systems • Two commercial summarizers: • TextAnalyst (Megaputer Intelligence Inc) • Copernic (Copernic Inc) • Naive ReSum • ReSum with the contribution of metadata (step 3) removed

  16. Experimental Results (2/2) • Average Recall: 91.7 (78.8), 69.5, 54 • Average Precision: 73.3 (62.8), 58.3, 53.3
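For readers unfamiliar with the two measures, the sketch below computes precision and recall of a system summary against a reference summary. The transcript does not say how summary content was matched against the reference, so treating each summary as a set of concept labels is an assumption, as are the function name and the toy data.

```python
def precision_recall(system, reference):
    """Precision and recall of a system summary, both given as sets of concepts."""
    system, reference = set(system), set(reference)
    hits = len(system & reference)
    precision = hits / len(system) if system else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall

# Toy example: concepts covered by a system summary vs. a reference summary.
system_concepts = {"resolution", "price", "hdmi", "speakers"}
reference_concepts = {"resolution", "price", "hdmi"}
p, r = precision_recall(system_concepts, reference_concepts)
print(f"precision={p:.2f} recall={r:.2f}")
```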

  17. Interesting Facts in our Summaries • Neither biased nor abusive comments appeared • this did happen in the other 3 systems • Comments with low frequency but with significant meaning were included • this was not the case for the other 3 systems • Repetition of concepts was minimal or absent thanks to the redundancy elimination step • that's why naive ReSum performed so well • repetition in Copernic and TextAnalyst was evident

  18. Conclusions • Metadata can contribute to a better summary. • We proposed an algorithm for summarizing on-line, metadata rich, product reviews. • It is statistical in nature. • It assumes labeled comments (pros & cons). • It works at the sentence level: • Ranks sentences based on some "importance" measure and selects the N most important of them. • Uses metadata to produce a "good" ranking.

  19. Future Work • Generalize our methodology to adapt to the availability or not of the various metadata. • the scoring algorithm is modular – can easily add or remove weights/metadata • Remove the requirement for categorized reviews (positive and negative)

  20. Department of Informatics – Aristotle University of Thessaloniki • LPIS Group: http://lpis.csd.auth.gr • Summarization of Multiple, Metadata Rich, Product Reviews • Thank you! • Fotis Kokkoras, Efstratia Lampridou, Konstantinos Ntonas, Ioannis Vlahavas • MSoDa '08: ECAI 2008 Workshop on Mining Social Data

  21. Monitor A - ReSum • PROS • Great resolution, clear picture, very very good price, 24in monitors are gigantic, widescreen aspect ratio makes dvds look awesome • Very, VERY bright, HDMI, no dead pixels, looks much nicer than online photos, unbeatable viewing angle • Excellent color reproduction; fantastic image and text quality; very good brightness and contrast; HDMI input; unbeatable value • Several things stood out above all other monitors I'd considered: Almost non-existent issues of dead/stuck pixels • Resolution & sharpness is amazing In my opinion, sleek design Functional speakers (not the best) Audio output is available Multiple inputs • CONS • So when Windows power management turns off the monitor signal, instead of turning off the monitor goes to bluescreen and says "no signal" on the HDMI input • no height or rotation adjustments; flimsy base; awkward location of OSD buttons; no DVI connection (no DVI to HDMI cable included) • Weak stand, awful menu controls, no audio out, no USB ports, low buzzing sound when brightness turned down • This monitor is so darn tall it strains my neck a bit to view it - but that's simply a natural consequence of its size • Doesn't come with a DVI to HDMI cable that you will need to run this with a computer to get a good picture (don't use the vga port)