The JDPA Sentiment Corpus for the Automotive Domain
Jason S. Kessler (Indiana University)
Miriam Eckert, Lyndsie Clark, Nicolas Nicolov (J.D. Power and Associates)
Overview
• 335 blog posts containing opinions about cars
• 223K tokens of blog data
• Goal of the annotation project:
  • capture examples of how words interact to evaluate entities
  • annotations encode these interactions
• Entities are the physical objects and properties invoked in the text
  • not just cars and car parts
  • also people, locations, organizations, times
Excerpt from the corpus
“last night was nice. sean bought me caribou and we went to my house to watch the baseball game …”
“… yesturday i helped me mom with brians house and then we went and looked at a kia spectra. it looked nice, but when we got up to it, i wasn't impressed ...”
(Spelling and grammar are reproduced verbatim from the blog posts.)
Outline • Motivating example • Overview of annotation types • Some statistics • Potential uses of corpus • Comparison to other resources
Motivating example (built up over five slides; annotations appear as labels and links in the original figures)

Example passage: “John recently purchased a Honda Civic. It had a great engine, a mildly disappointing stereo, and was very grippy. He also considered a BMW which, while highly priced, had a better stereo.”

• Entity mentions: John, He (PERSON); Honda Civic, It, BMW (CAR); engine, stereo (CAR-PART); priced (CAR-FEATURE)
• REFERS-TO links tie coreferent mentions together: It → Honda Civic, He → John
• TARGET links attach each sentiment expression (great, disappointing, grippy, highly priced, better) to the mention it evaluates
• PART-OF and FEATURE-OF relations tie engine, stereo, and priced to their cars
• A comparison with MORE/LESS arguments and a DIMENSION captures “better stereo” (the BMW’s stereo MORE, the Civic’s LESS)
• Combining the layers yields entity-level sentiment: positive toward the Honda Civic, mixed toward the BMW
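The layers above can be pictured as plain data structures. Below is a minimal sketch in Python (a hypothetical in-memory representation for illustration, not the corpus's actual annotation format):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Mention:
    text: str        # surface span, e.g. "Honda Civic"
    sem_type: str    # semantic type: PERSON, CAR, CAR-PART, CAR-FEATURE, ...

# Mentions from the example passage
civic  = Mention("Honda Civic", "CAR")
it     = Mention("It", "CAR")
engine = Mention("engine", "CAR-PART")
stereo = Mention("stereo", "CAR-PART")
bmw    = Mention("BMW", "CAR")

# REFERS-TO: coreferent mentions grouped into one entity
civic_entity = [civic, it]

# TARGET: sentiment expression -> the mention it evaluates
targets = {"great": engine, "disappointing": stereo, "grippy": it}

# PART-OF: part mention -> the whole it belongs to
part_of = {engine: civic, stereo: civic}
```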
Outline • Motivating example • Overview of annotation types • Some statistics • Potential uses of corpus • Comparison to other resources
Entity annotations

Example: “John recently purchased a Civic. It had a great engine and was priced well.”
Mentions: John (PERSON); Civic, It (CAR); engine (CAR-PART); priced (CAR-FEATURE); REFERS-TO links It back to Civic.

• >20 semantic types, drawn from:
  • the ACE Entity Mention Detection Task
  • generic automotive types
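Pairwise REFERS-TO links induce entities via their transitive closure; one standard way to compute the grouping is union-find. A sketch, assuming links are given as mention pairs:

```python
def group_entities(mentions, refers_to):
    """Group mentions into entities via REFERS-TO links (union-find)."""
    parent = {m: m for m in mentions}

    def find(m):
        while parent[m] != m:
            parent[m] = parent[parent[m]]  # path compression
            m = parent[m]
        return m

    for a, b in refers_to:        # each REFERS-TO link merges two chains
        parent[find(a)] = find(b)

    entities = {}
    for m in mentions:
        entities.setdefault(find(m), []).append(m)
    return list(entities.values())

mentions = ["John", "Civic", "It", "engine", "priced"]
links = [("It", "Civic")]         # It REFERS-TO Civic
print(group_entities(mentions, links))
# [['John'], ['Civic', 'It'], ['engine'], ['priced']]
```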
Entity-relation annotations

• Relations between entities, e.g. PART-OF (engine → Civic) and FEATURE-OF (priced → Civic)
• Entity-level sentiment annotations, e.g. positive toward the Civic
• Sentiment flows between entities through relations:
  • My car has a great engine.
  • Honda, known for its high standards, made my car.
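The idea that sentiment flows through relations can be sketched as score propagation over PART-OF edges. A minimal sketch, assuming a simple additive scheme with a damping factor (both are illustrative choices, not part of the annotation scheme):

```python
# Sentiment expressed on a part contributes, attenuated, to the whole.
sentiment = {"engine": 1.0}       # "My car has a great engine." -> positive on engine
part_of   = {"engine": "my car"}  # engine PART-OF my car

DAMPING = 0.5                     # assumed attenuation per relation hop

entity_score = dict(sentiment)
for part, whole in part_of.items():
    entity_score[whole] = entity_score.get(whole, 0.0) + DAMPING * sentiment.get(part, 0.0)

print(entity_score)               # {'engine': 1.0, 'my car': 0.5}
```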
Entity annotation statistics
• 61K mentions and 43K entities in the corpus
• 103 documents annotated by roughly three annotators each
• Inter-annotator agreement:
  • mentions: 83%
  • REFERS-TO: 68%
• Two annotators agree on a mention only when both mark exactly the same span:
  • MATCH: A1 and A2 both mark “Kia Rio”
  • NOT A MATCH: A1 and A2 mark different spans over “Kia Rio” (e.g., one marks only part of the phrase)
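The mention-agreement figure can be computed by exact span matching, as the MATCH / NOT A MATCH contrast suggests. A sketch using Dice-style agreement over (start, end) offsets (the exact metric behind the 83% figure is an assumption here):

```python
def span_agreement(spans_a, spans_b):
    """Dice-style agreement: a mention matches only when both
    annotators mark exactly the same (start, end) span."""
    a, b = set(spans_a), set(spans_b)
    return 2 * len(a & b) / (len(a) + len(b))

a1 = {(10, 17), (30, 38), (50, 58)}  # character offsets from annotator 1
a2 = {(10, 17), (31, 38), (50, 58)}  # second span starts one character later
print(round(span_agreement(a1, a2), 2))  # 0.67
```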
Sentiment expressions
• Evaluations with target mentions
• Prior polarity: semantic orientation given the target
  • positive, negative, neutral, mixed
• Examples:
  • … a great engine → prior polarity: positive
  • highly priced → prior polarity: negative
  • highly spec’ed → prior polarity: positive
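Because prior polarity is defined relative to the target (“highly priced” vs. “highly spec’ed”), a polarity lexicon has to key on more than the modifier alone. A toy lookup, with entries invented for illustration:

```python
# Toy prior-polarity lexicon keyed on the full expression, since the same
# modifier ("highly") flips polarity depending on what is being evaluated.
PRIOR_POLARITY = {
    "a great engine": "positive",
    "highly priced":  "negative",   # paying more is bad ...
    "highly spec'ed": "positive",   # ... but having more features is good
}

def prior_polarity(expression: str) -> str:
    return PRIOR_POLARITY.get(expression.lower(), "unknown")

print(prior_polarity("highly priced"))   # negative
print(prior_polarity("highly spec'ed"))  # positive
```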
Sentiment expressions • Occurrences in corpus: 10K • 13% are multi-word • like no other, get up and go • 49% are headed by adjectives • 22% nouns (damage, good amount) • 20% verbs (likes, upset) • 5% adverbs (highly)
Sentiment expressions
• 75% of sentiment expression occurrences are of strings that also have non-evaluative uses in the corpus
  • “light”:
    • …the car seemed too light to be safe… (evaluative)
    • …vehicles in the light truck category… (non-evaluative)
• 77% of sentiment expression occurrences are positive
• Inter-annotator agreement: 75% spans, 66% targets, 95% prior polarity
Modifiers → contextual polarity
• Negators: not a good car; not a very good car
• Intensifiers:
  • upward: a very good car
  • downward: a kind of good car
• Committers:
  • upward: I am sure the car is good
  • downward: I suspect the car is good
• Neutralizers: if the car is good; I hope the car is good
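These modifier classes compose naturally with a numeric prior-polarity score: negators flip it, intensifiers and committers scale it up or down, and neutralizers cancel it. A sketch (the numeric values are illustrative assumptions, not annotation values):

```python
# Each modifier maps a polarity score to a new score.
MODIFIERS = {
    "not":       lambda s: -s,        # negator: flips polarity
    "very":      lambda s: 1.5 * s,   # upward intensifier: strengthens
    "kind of":   lambda s: 0.5 * s,   # downward intensifier: weakens
    "I am sure": lambda s: 1.2 * s,   # upward committer: more committed
    "I suspect": lambda s: 0.8 * s,   # downward committer: less committed
    "if":        lambda s: 0.0,       # neutralizer: cancels the evaluation
    "I hope":    lambda s: 0.0,       # neutralizer
}

def contextual_polarity(prior, modifiers):
    score = prior
    for m in reversed(modifiers):     # innermost modifier applies first
        score = MODIFIERS[m](score)
    return score

print(contextual_polarity(1.0, ["not", "very"]))  # -1.5  "not a very good car"
print(contextual_polarity(1.0, ["if"]))           #  0.0  "if the car is good"
```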
Other annotations
• Speech events (for opinions attributed to someone other than the author)
  • John thinks the car is good.
• Comparisons:
  • Car X has a better engine than car Y.
  • Handles a variety of cases
Outline • Motivating example • Overview of annotation types • Some statistics • Potential uses of corpus • Comparison to other resources
Possible tasks • Detecting mentions, sentiment expressions, and modifiers • Identifying targets of sentiment expressions, modifiers • Coreference resolution • Finding part-of, feature-of, etc. relations • Identifying errors/inconsistencies in data
Possible tasks
• Exploring how elements interact:
  • Some idiot thinks this is a good car.
• Evaluating unsupervised sentiment systems, or systems trained on other domains
• How do relations between entities transfer sentiment?
  • The car’s paint job is flawless but the safety record is poor.
• A solution to one task may be useful in solving another.
But wait, there’s more!
• 180 digital camera blog posts were also annotated
• Total: 223,001 (automotive) + 108,593 (camera) = 331,594 tokens
Outline • Motivating example • Elements combine to render entity-level sentiment • Overview of annotation types • Some statistics • Potential uses of corpus • Comparison to other resources
Other resources • MPQA Version 2.0 • Wiebe, Wilson and Cardie (2005) • Largely professionally written news articles • Subjective expression • “beliefs, emotions, sentiments, speculations, etc.” • Attitude, contextual sentiment on subjective expressions • Target, source annotations • 226K tokens (JDPA: 332K)
Other resources • Data sets provided by Bing Liu (2004, 2008) • Customer-written consumer electronics product reviews • Contextual sentiment toward mention of product • Comparison annotations • 130K tokens (JDPA: 332K)
Thank you!
• Obtaining the corpus:
  • for research and educational purposes
  • ICWSM.JDPA.corpus@gmail.com
  • available June 2010
• Annotation guidelines: http://www.cs.indiana.edu/~jaskessl
• Thanks to: Prof. Michael Gasser, Prof. James Martin, Prof. Martha Palmer, Prof. Michael Mozer, William Headden