The JDPA Sentiment Corpus for the Automotive Domain

Presentation Transcript

  1. The JDPA Sentiment Corpus for the Automotive Domain. Jason S. Kessler, Miriam Eckert, Lyndsie Clark, Nicolas Nicolov. J.D. Power and Associates; Indiana University

  2. Overview • 335 blog posts containing opinions about cars • 223K tokens of blog data • Goal of annotation project: capture examples of how words interact to evaluate entities; the annotations encode these interactions • Entities are physical objects invoked in the text, and their properties • Not just cars and car parts: also people, locations, organizations, times

  3. Excerpt from the corpus: “last night was nice. sean bought me caribou and we went to my house to watch the baseball game …” “… yesturday i helped me mom with brians house and then we went and looked at a kia spectra. it looked nice, but when we got up to it, i wasn't impressed ...”

  4. Outline • Motivating example • Overview of annotation types • Some statistics • Potential uses of corpus • Comparison to other resources

  5. [Slide figure] The running example, with entity mentions labeled: “John recently purchased a Honda Civic. It had a great engine, a mildly disappointing stereo. He also considered a BMW which, while priced highly, had a better stereo and was very grippy.” Mention types: John (PERSON); Honda Civic, It, BMW (CAR); engine, stereo (CAR-PART); priced (CAR-FEATURE). REFERS-TO links connect “It” to “Honda Civic” and “He” to “John”.

  6. [Slide figure] The same example with TARGET links: each sentiment expression (great, disappointing, better, grippy, priced highly) points to the mention it evaluates.

  7. [Slide figure] The same example with entity relations: PART-OF links the engine and the stereos to their cars, and FEATURE-OF links priced to its car.

  8. [Slide figure] The same example with a comparison: a MORE/LESS DIMENSION annotation on “better”, ranking the BMW’s stereo against the Civic’s.

  9. [Slide figure] All annotation layers combined (REFERS-TO, TARGET, PART-OF, FEATURE-OF, MORE/LESS DIMENSION), yielding entity-level sentiment labels: positive for one car, mixed for the other.

  10. Outline • Motivating example • Overview of annotation types • Some statistics • Potential uses of corpus • Comparison to other resources

  11. Entity annotations • Example: “John recently purchased a Civic. It had a great engine and was priced well.” Mentions: John (PERSON); Civic, It (CAR); engine (CAR-PART); priced (CAR-FEATURE); REFERS-TO links “It” to “Civic” • >20 semantic types, drawn from the ACE Entity Mention Detection Task plus generic automotive types
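One way to picture these annotations: each span is a mention carrying a semantic type, and REFERS-TO links group coreferent mentions into entities. A minimal sketch; the class and function names here are mine, not the corpus file format:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Mention:
    start: int      # token offsets of the span
    end: int
    sem_type: str   # e.g. PERSON, CAR, CAR-PART, CAR-FEATURE

def group_entities(mentions, refers_to):
    """refers_to: pairs (anaphor, antecedent). Returns entities as
    frozensets of coreferent mentions (transitive closure via union-find)."""
    parent = {m: m for m in mentions}
    def find(m):
        while parent[m] is not m:
            parent[m] = parent[parent[m]]   # path halving
            m = parent[m]
        return m
    for a, b in refers_to:
        parent[find(a)] = find(b)
    groups = {}
    for m in mentions:
        groups.setdefault(find(m), set()).add(m)
    return [frozenset(g) for g in groups.values()]

# "John recently purchased a Civic. It had a great engine..."
john  = Mention(0, 1, "PERSON")
civic = Mention(4, 5, "CAR")
it    = Mention(6, 7, "CAR")
print(len(group_entities([john, civic, it], [(it, civic)])))  # 2 entities
```

Grouping mentions this way is what makes the corpus's mention count (61K) larger than its entity count (43K).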

  12. Entity-relation annotations • Relations between entities, e.g. engine PART-OF Civic, priced FEATURE-OF Civic • Entity-level sentiment annotations (e.g. Civic: positive) • Sentiment flows between entities through relations • My car has a great engine. • Honda, known for its high standards, made my car.
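The idea that sentiment flows between entities through relations can be sketched as a naive score propagation, where an evaluated part contributes to the sentiment of the whole it belongs to. The function name and scoring scheme below are illustrative assumptions, not the corpus schema:

```python
# Sketch: propagate mention-level sentiment to entities through
# PART-OF / FEATURE-OF relations (one level, no transitive closure).
from collections import defaultdict

def entity_sentiment(mention_polarity, relations):
    """mention_polarity: {mention: +1/-1}; relations: [(child, rel, parent)].
    Returns a naive per-entity score: its own evaluations plus its parts'."""
    score = defaultdict(int)
    for mention, pol in mention_polarity.items():
        score[mention] += pol
    for child, _rel, parent in relations:
        score[parent] += score[child]     # flow the child's sentiment upward
    return dict(score)

# "My car has a great engine." -> the positive evaluation of the engine
# also counts toward the car.
print(entity_sentiment({"engine": +1}, [("engine", "PART-OF", "car")]))
# {'engine': 1, 'car': 1}
```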

  13. Entity annotation type: statistics • Inter-annotator agreement: 83% among mentions, 68% for Refers-to • 61K mentions and 43K entities in the corpus • 103 documents annotated by around 3 annotators each • Agreement requires exact span match: two annotators marking the same span of “Kia Rio” match; marking different spans of it does not
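The mention-agreement figure rests on exact span matching: two annotators agree on a mention only when both boundaries coincide, which is what the Kia Rio example illustrates. A small sketch of such a metric, assuming an F1-style average over the two annotators' span sets (the exact formula used by JDPA is not stated on the slide):

```python
def span_agreement(spans_a, spans_b):
    """Exact-match agreement between two annotators' mention spans.
    Spans are (start, end) token offsets; only identical spans count,
    so marking all of 'Kia Rio' vs. only 'Rio' is a non-match."""
    a, b = set(spans_a), set(spans_b)
    if not a and not b:
        return 1.0
    matched = len(a & b)
    return 2 * matched / (len(a) + len(b))   # F1-style agreement

# A1 marks "Kia Rio" (tokens 3-4); A2 marks only "Rio" (token 4): no match.
print(span_agreement([(3, 5)], [(4, 5)]))  # 0.0
```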

  14. Sentiment expressions • Evaluations, each with target mentions • Prior polarity: semantic orientation given the target (positive, negative, neutral, mixed) • Examples: “a great engine” (prior polarity: positive), “highly priced” (prior polarity: negative), “highly spec’ed” (prior polarity: positive)

  15. Sentiment expressions • Occurrences in corpus: 10K • 13% are multi-word • like no other, get up and go • 49% are headed by adjectives • 22% nouns (damage, good amount) • 20% verbs (likes, upset) • 5% adverbs (highly)

  16. Sentiment expressions • 75% of sentiment expression occurrences have non-evaluative uses in the corpus • “light”: …the car seemed too light to be safe… vs. …vehicles in the light truck category… • 77% of sentiment expression occurrences are positive • Inter-annotator agreement: 75% spans, 66% targets, 95% prior polarity

  17. Modifiers -> contextual polarity • Negators: not a good car, not a very good car • Intensifiers: upward (a very good car), downward (a kind of good car) • Committers: upward (I am sure the car is good), downward (I suspect the car is good) • Neutralizers: if the car is good, I hope the car is good
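These modifier classes suggest a simple compositional rule: neutralizers zero out the evaluation, negators flip it, and intensifiers and committers scale its strength up or down. A toy sketch of that composition; the numeric weights and word lists are illustrative assumptions, not the annotation scheme:

```python
# Toy contextual-polarity composition over the slide's modifier classes.
# Scores and word lists are illustrative, not the JDPA guidelines.
NEGATORS     = {"not"}
INTENSIFIERS = {"very": 2.0, "kind of": 0.5}    # upward / downward
COMMITTERS   = {"sure": 1.5, "suspect": 0.75}   # upward / downward
NEUTRALIZERS = {"if", "hope"}

def contextual_polarity(prior, modifiers):
    """prior: signed prior-polarity score, e.g. +1 for 'good'.
    modifiers: modifier words in scope, applied innermost-out."""
    score = prior
    for m in modifiers:
        if m in NEUTRALIZERS:
            return 0.0                           # 'if the car is good'
        if m in NEGATORS:
            score = -score                       # 'not a good car'
        else:
            score *= INTENSIFIERS.get(m) or COMMITTERS.get(m, 1.0)
    return score

print(contextual_polarity(+1, ["not"]))   # 'not a good car'     -> -1
print(contextual_polarity(+1, ["if"]))    # 'if the car is good' -> 0.0
```

Applying modifiers innermost-out also handles nesting such as “not a very good car” (intensify first, then negate).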

  18. Other annotations • Speech events (not sourced from author) • John thinks the car is good. • Comparisons: • Car X has a better engine than car Y. • Handles a variety of cases

  19. Outline • Motivating example • Overview of annotation types • Some statistics • Potential uses of corpus • Comparison to other resources

  20. Possible tasks • Detecting mentions, sentiment expressions, and modifiers • Identifying targets of sentiment expressions, modifiers • Coreference resolution • Finding part-of, feature-of, etc. relations • Identifying errors/inconsistencies in data

  21. Possible tasks • Exploring how elements interact: • Some idiot thinks this is a good car. • Evaluating unsupervised sentiment systems or those trained on other domains • How do relations between entities transfer sentiment? • The car’s paint job is flawless but the safety record is poor. • Solution to one task may be useful in solving another.

  22. But wait, there’s more! • 180 digital camera blog posts were annotated • Total of 223,001 + 108,593 = 331,594 tokens

  23. Outline • Motivating example • Elements combine to render entity-level sentiment • Overview of annotation types • Some statistics • Potential uses of corpus • Comparison to other resources

  24. Other resources • MPQA Version 2.0 • Wiebe, Wilson and Cardie (2005) • Largely professionally written news articles • Subjective expression • “beliefs, emotions, sentiments, speculations, etc.” • Attitude, contextual sentiment on subjective expressions • Target, source annotations • 226K tokens (JDPA: 332K)

  25. Other resources • Data sets provided by Bing Liu (2004, 2008) • Customer-written consumer electronics product reviews • Contextual sentiment toward mention of product • Comparison annotations • 130K tokens (JDPA: 332K)

  26. Thank you! • Obtaining the corpus: • Research and educational purposes • ICWSM.JDPA.corpus@gmail.com • June 2010 • Annotation guidelines: http://www.cs.indiana.edu/~jaskessl • Thanks to: Prof. Michael Gasser, Prof. James Martin, Prof. Martha Palmer, Prof. Michael Mozer, William Headden

  27. Top 20 annotations by type

  28. Inter-annotator agreement