Stylistics in Customer Reviews of Cultural Objects

Presentation Transcript

  1. THE ANDREW W. MELLON FOUNDATION Stylistics in Customer Reviews of Cultural Objects Xiao Hu, J. Stephen Downie The International Music Information Retrieval Systems Evaluation Lab (IMIRSEL) University of Illinois at Urbana-Champaign

  2. Agenda • Motivation • Customer reviews in epinions.com • Experiments • Genre classification • Rating classification • Usage classification • Feature studies • Conclusions & Future Work

  3. Motivation • Online customer reviews on cultural objects: • User-generated → user-centered retrieval • Detailed descriptions → contextual info. • Large amount → rich resource • Self-organized ground truth • Text mining: • Mature techniques and handy tools • Review mining: a place to play Stylistics Text Analysis!

  4. Motivation [diagram] • Customer Reviews (Epinions.com, Amazon.com, …) • Classify Reviews: Genres, Ratings, Usages (Class 1, Class 2) • Identify User Descriptions: Prominent Features (D1, D2, D3) • Connect to Objects: User-centered access points

  5. Customer Reviews • Published on www.epinions.com • Focused on book, movie and music reviews • Each review associated with: • a genre label • a numerical quality rating • a recommended usage (for music reviews)

  6. [Annotated screenshot of a review] • numerical rating • associated full text, to be analyzed • recommended usage

  7. Genre Taxonomy (music) • 28 major genre categories: Jazz, Rock, Country, Classical, Blues, Gospel, Punk, … • Renaissance, Medieval, Baroque, Romantic, …

  8. Experiments • To build and evaluate a prototype system that could automatically: • predict the genre of the work being reviewed • predict the quality rating assigned to the reviewed item • predict the usage recommended by the reviewer • discover distinctive features contributing to each of the above

  9. Models and Methods • Prediction problem: • Naïve Bayesian (NB) Classifier • Computationally efficient • Empirically effective • Hierarchical clustering (for usage prediction only) • Feature analysis: • Frequent pattern mining • Naïve Bayesian feature ranking
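
A minimal sketch of the kind of Naïve Bayesian text classifier named on this slide, assuming a multinomial model with add-one smoothing over token lists. The slides do not state which NB variant or smoothing was used, and the function names `train_nb` and `predict_nb` and the toy reviews are illustrative only.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels, alpha=1.0):
    """Train a multinomial Naive Bayes model (add-alpha smoothing is an assumption)."""
    class_counts = Counter(labels)
    term_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        term_counts[label].update(tokens)
        vocab.update(tokens)
    model = {"prior": {}, "loglik": {}, "vocab": vocab}
    for c, n_c in class_counts.items():
        # Log prior: fraction of training reviews in class c.
        model["prior"][c] = math.log(n_c / len(docs))
        total = sum(term_counts[c].values()) + alpha * len(vocab)
        # Smoothed log likelihood of every vocabulary term given class c.
        model["loglik"][c] = {
            t: math.log((term_counts[c][t] + alpha) / total) for t in vocab
        }
    return model

def predict_nb(model, tokens):
    """Return the class with the highest log posterior; unseen terms are ignored."""
    scores = {}
    for c, prior in model["prior"].items():
        scores[c] = prior + sum(
            model["loglik"][c][t] for t in tokens if t in model["vocab"]
        )
    return max(scores, key=scores.get)

# Toy data, invented for illustration (not from the study).
toy_docs = [
    ["great", "guitar", "riff"],
    ["smooth", "sax", "solo"],
    ["loud", "guitar", "solo"],
    ["sax", "swing", "band"],
]
toy_labels = ["rock", "jazz", "rock", "jazz"]
toy_model = train_nb(toy_docs, toy_labels)
print(predict_nb(toy_model, ["guitar", "riff"]))
```

Working in log space avoids numeric underflow on long reviews, which is one reason NB stays computationally cheap on full-text input.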

  10. Data Preprocessing • HTML tags were stripped out; • Stop words were NOT stripped out; • Punctuation was NOT stripped out; • They may contain stylistic information • Tokens were stemmed
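
The preprocessing steps above can be sketched as follows. This is an illustration under stated assumptions: the regular expressions, the lowercasing step, and the crude `toy_stem` suffix stripper are stand-ins, since the slides do not say which tokenizer or stemmer was used.

```python
import re
from html import unescape

TAG_RE = re.compile(r"<[^>]+>")          # matches HTML tags
TOKEN_RE = re.compile(r"\w+|[^\w\s]")    # words, or single punctuation marks

def toy_stem(token):
    """Very crude suffix stripper; a hypothetical stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(html_text):
    """Strip tags, keep stop words and punctuation as tokens, then stem."""
    text = unescape(TAG_RE.sub(" ", html_text))
    tokens = TOKEN_RE.findall(text.lower())  # lowercasing is an assumption
    return [toy_stem(t) for t in tokens]

print(preprocess("<p>It just doesn't sound good!</p>"))
```

Note that stop words such as "it" and punctuation such as "!" survive as tokens, matching the slide's point that they may carry stylistic information.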

  11. Genre Classifications • Data set

  12. Genres Examined

  13. Genre Classification Results • 5-fold random cross-validation for book and movie reviews • 3-fold random cross-validation for music reviews

  14. Confusion : Book Reviews

  15. Confusion : Movie Reviews

  16. Confusion : Music Reviews

  17. Rating Classification • Five-class classification • 1 star vs. 2 stars vs. 3 stars vs. 4 stars vs. 5 stars • Binary group classification • 1 star + 2 stars vs. 4 stars + 5 stars • Ad extremis classification • 1 star vs. 5 stars • 5-fold random cross-validation for book and movie review experiments • 5-fold cross-validation for music review experiments
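
The three rating setups can be expressed as a mapping from star ratings to class labels. The sketch below is one reading of the slide: it assumes that reviews outside the listed star values (for example 3-star reviews in the binary group setup) are left out, and the function name, scheme names, and the "positive"/"negative" labels are illustrative.

```python
def rating_label(stars, scheme):
    """Map a 1-5 star rating to a class label, or None if the review is excluded."""
    if stars not in (1, 2, 3, 4, 5):
        raise ValueError("stars must be an integer from 1 to 5")
    if scheme == "five_class":
        return str(stars)
    if scheme == "binary_group":
        # 1 star + 2 stars vs. 4 stars + 5 stars; 3 stars assumed excluded.
        if stars in (1, 2):
            return "negative"
        if stars in (4, 5):
            return "positive"
        return None
    if scheme == "ad_extremis":
        # 1 star vs. 5 stars; everything in between assumed excluded.
        return {1: "negative", 5: "positive"}.get(stars)
    raise ValueError(f"unknown scheme: {scheme}")

for s in range(1, 6):
    print(s, rating_label(s, "binary_group"), rating_label(s, "ad_extremis"))
```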

  18. Rating : Book Reviews

  19. Rating : Movie Reviews

  20. Rating : Music Reviews

  21. Confusion : Book Reviews

  22. Confusion : Movie Reviews

  23. Confusion : Music Reviews

  24. Usage Classification • Each music review has one usage suggested by the reviewer • It can be chosen from a ready-made list of 13 usages • Chose the most popular 11 usages for experiments

  25. Usage Categories and Counts

  26. Data and initial result 10 fold cross validation

  27. Confusion matrix

  28. Usage super-classes • Frequent confusions: a measure of similarity • Hierarchical clustering based on the confusion matrix
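
A rough sketch of clustering classes by how often they are confused with each other. The slides do not give the linkage method or the exact similarity measure, so average linkage over symmetrized, row-normalized confusion rates is an assumption here, and the confusion counts are invented for illustration.

```python
def cluster_from_confusion(labels, conf, n_clusters):
    """Agglomerative clustering where similarity = mean mutual confusion rate."""
    n = len(labels)
    # Row-normalize so each entry is the rate at which class i is predicted as j.
    rates = [[conf[i][j] / sum(conf[i]) for j in range(n)] for i in range(n)]
    sim = [[(rates[i][j] + rates[j][i]) / 2 for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
                avg = sum(sim[i][j] for i, j in pairs) / len(pairs)
                if best is None or avg > best[0]:
                    best = (avg, a, b)
        _, a, b = best
        # Merge the two most frequently confused clusters.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(labels[i] for i in c) for c in clusters]

# Hypothetical confusion counts (rows = true usage, columns = predicted usage).
usages = ["Going to sleep", "Reading or studying", "Driving", "Exercising"]
toy_conf = [
    [50, 30, 5, 5],
    [25, 55, 5, 5],
    [5, 5, 50, 30],
    [5, 5, 28, 52],
]
print(cluster_from_confusion(usages, toy_conf, 2))
```

The idea is that two usages the classifier often mistakes for each other are probably described in similar language, so merging them first yields super-classes that are easier to separate.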

  29. Hierarchical clustering [dendrogram] • Cluster labels: R1 Relaxing, R2 Stimulating, S1, S2 • Usages: Going to sleep, Listening, Reading or studying, Romancing, Cleaning the house, At work, Hanging out with friends, Getting ready to go out, Driving, Waking up, Exercising

  30. Classifications on usage super-classes 10 fold cross validation

  31. Feature studies • What makes the classes distinguishable? • What are important features? • How important are they? • Two techniques applied • Frequent Pattern Mining • Naïve Bayesian Feature Ranking • Focus on music reviews

  32. Frequent Pattern Mining (FPM) • Originally used to discover association rules • Finds patterns consisting of items that frequently occur together in individual transactions • Items = candidate words (terms), depending on specific questions • Transactions = review sentences
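
To make the items/transactions framing concrete, here is a small sketch that counts term patterns of up to three terms across sentences and keeps those meeting a minimum support. It enumerates combinations by brute force, which is only practical for short sentences; the slides do not say which mining algorithm was used, and the function name and toy sentences are illustrative.

```python
from collections import Counter
from itertools import combinations

def frequent_patterns(transactions, min_support, max_size=3):
    """Count term sets (size 1..max_size) per sentence; keep those with enough support."""
    counts = Counter()
    for sentence in transactions:
        items = sorted(set(sentence))  # each term counts once per sentence
        for size in range(1, max_size + 1):
            counts.update(combinations(items, size))
    return {p: c for p, c in counts.items() if c >= min_support}

# Toy transactions: each inner list is one review sentence (candidate terms only).
sentences = [
    ["not", "good"],
    ["sound", "good"],
    ["not", "sound", "good"],
    ["bad"],
]
for pattern, support in sorted(frequent_patterns(sentences, 2).items()):
    print(pattern, support)
```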

  33. Positive and negative descriptive patterns • Recall: rating classification on music reviews

  34. Positive and negative descriptive patterns • Mining frequent descriptive patterns in positive and negative reviews • Included: adjectives, adverbs, verbs, and negatives • Excluded: nouns and stop words

  35. Single term patterns Good = Bad?! Digging deeper ----

  36. “good” in a negative context • Negation: “Nothing is good.” “It just doesn't sound good.” • Song titles: “Good Charlotte, you make me so mad.” “Feels So Good is dated and reprehensibly bad.” • Rhetoric: “And this is a good ruiner: …” “What a waste of my good two dollars…” • Faint praise: “…the only good thing… is the packaging.” • Expressions: “You all have heard … the good old cliché.”

  37. Double term patterns Good ≠ Bad?! Digging deeper and deeper --

  38. Triple term patterns

  39. Noun patterns in genre classification • Recall: genre classification on music reviews

  40. Noun patterns in genre classification • Studied four popular genres • Only nouns considered

  41. Single term patterns

  42. Double term patterns

  43. Naïve Bayesian Feature Ranking (NBFR) • Based on NB text categorization model • Prediction in binary classification cases: [formula not preserved] > 0, d_i is in C_j; < 0, d_i is not in C_j
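
The slide's formula did not survive in this transcript, so the following is only one plausible reading: in a binary NB model, a document's score can be written as a sum of per-term log-likelihood ratios, and terms can be ranked by their ratio. The sketch assumes add-one smoothing; `rank_terms` and the toy documents are illustrative, not the authors' implementation.

```python
import math
from collections import Counter

def rank_terms(docs, labels, target, alpha=1.0):
    """Rank terms by log P(t | target) - log P(t | not target), highest first."""
    in_counts, out_counts = Counter(), Counter()
    for tokens, label in zip(docs, labels):
        (in_counts if label == target else out_counts).update(tokens)
    vocab = set(in_counts) | set(out_counts)
    in_total = sum(in_counts.values()) + alpha * len(vocab)
    out_total = sum(out_counts.values()) + alpha * len(vocab)
    scores = {
        t: math.log((in_counts[t] + alpha) / in_total)
        - math.log((out_counts[t] + alpha) / out_total)
        for t in vocab
    }
    # Positive scores push a document toward the target class, negative away.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy data, invented for illustration.
toy_docs = [
    ["calm", "soft", "piano"],
    ["soft", "slow"],
    ["loud", "fast", "beat"],
    ["fast", "beat", "beat"],
]
toy_labels = ["relaxing", "relaxing", "stimulating", "stimulating"]
for term, score in rank_terms(toy_docs, toy_labels, "relaxing"):
    print(f"{term:6s} {score:+.2f}")
```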

  44. Features in usage super-classes • Recall: classification on usage super-classes

  45. Top-ranked terms in super-classes • Terms in parentheses were manually added for clarity

  46. Artist-usage relationship • Binomial exact test on artists with >10 reviews (p < 0.05)
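
A binomial exact test of this kind can be sketched with the standard library alone. The sketch assumes a one-sided (upper-tail) test of whether an artist's reviews carry a given usage more often than a baseline rate; the slides do not state the tail or the baseline, and the numbers below are invented.

```python
from math import comb

def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): the one-sided exact p-value."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Hypothetical artist: 12 reviews, 9 of them tagged with one usage,
# against an assumed baseline rate of 0.2 for that usage overall.
p_value = binomial_upper_tail(9, 12, 0.2)
print(p_value < 0.05)
```

Restricting the test to artists with more than 10 reviews, as the slide does, keeps very small samples from producing unstable p-values.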

  47. Implementation & T2K (demo) • Text-to-Knowledge (T2K) Toolkit • A text mining framework • Ready-to-use modules and itineraries • Natural Language Processing tools integrated • Supporting fast prototyping of text mining • [Diagram labels: Data Preprocessing, NB Classifier]

  48. Conclusions • Text analysis of user-generated reviews on cultural objects • NB on genre, rating, and usage classification • Feature studies: FPM and NBFR • Customer reviews are good resources for connecting users’ opinions to cultural objects and thus facilitating information access via novel, user-oriented facets.

  49. Future work • More text mining techniques • Other critical text • blogs, wikis, etc. • Feature studies • other kinds of features

  50. THE ANDREW W. MELLON FOUNDATION Questions? IMIRSEL Thank you!