The JDPA Sentiment Corpus for the Automotive Domain

The JDPA Sentiment Corpus for the Automotive Domain • Jason S. Kessler (Indiana University) • Miriam Eckert, Lyndsie Clark, Nicolas Nicolov (J.D. Power and Associates)

Overview • 335 blog posts containing opinions about cars • 223K tokens of blog data • Goal of annotation project: • Examples of how words interact to evaluate entities • Annotations encode these interactions • Entities are invoked physical objects and their properties • Not just cars, car parts • People, locations, organizations, times

Excerpt from the corpus • “last night was nice. sean bought me caribou and we went to my house to watch the baseball game …” • “… yesturday i helped me mom with brians house and then we went and looked at a kia spectra. it looked nice, but when we got up to it, i wasn't impressed ...”

Outline • Motivating example • Overview of annotation types • Some statistics • Potential uses of corpus • Comparison to other resources

Motivating example • “John recently purchased a Honda Civic. It had a great engine, a mildly disappointing stereo, and was very grippy. He also considered a BMW which, while highly priced, had a better stereo.” • Mentions: John, He (PERSON); Honda Civic, It, BMW (CAR); engine, stereo (CAR-PART); priced (CAR-FEATURE) • REFERS-TO: links coreferent mentions • TARGET: links each sentiment expression to the mention it evaluates • PART-OF, FEATURE-OF: link parts and features to cars • Comparison (better): MORE, LESS, DIMENSION • Entity-level sentiment labels shown: positive, mixed

Entity annotations • Example: “John recently purchased a Civic. It had a great engine and was priced well.” • Mentions: John (PERSON), Civic (CAR), It (CAR), engine (CAR-PART), priced (CAR-FEATURE) • REFERS-TO links coreferent mentions (Civic, It) • >20 semantic types from • ACE Entity Mention Detection Task • Generic automotive types
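
The mention/entity distinction on this slide can be sketched in a few lines of Python. This is only an illustration, not the corpus's file format: the `Mention` class, the character-offset convention, and the index-pair encoding of REFERS-TO are all assumptions made for the example. Mentions joined by REFERS-TO are merged into one entity with a small union-find.

```python
from dataclasses import dataclass

TEXT = "John recently purchased a Civic. It had a great engine and was priced well."


@dataclass(frozen=True)
class Mention:
    """Hypothetical mention record; the field names are illustrative only."""
    text: str
    start: int          # character offset into TEXT (assumed convention)
    semantic_type: str  # e.g. PERSON, CAR, CAR-PART, CAR-FEATURE


def mention(text, semantic_type):
    # Locate the first occurrence of the mention text in the example sentence.
    return Mention(text, TEXT.index(text), semantic_type)


def group_entities(mentions, refers_to):
    """Merge mentions connected by REFERS-TO links into entities (union-find)."""
    parent = list(range(len(mentions)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for source, antecedent in refers_to:
        parent[find(source)] = find(antecedent)

    entities = {}
    for index, m in enumerate(mentions):
        entities.setdefault(find(index), []).append(m)
    return list(entities.values())


MENTIONS = [
    mention("John", "PERSON"),
    mention("Civic", "CAR"),
    mention("It", "CAR"),
    mention("engine", "CAR-PART"),
    mention("priced", "CAR-FEATURE"),
]
REFERS_TO = [(2, 1)]  # "It" refers to "Civic" (indices into MENTIONS)

ENTITIES = group_entities(MENTIONS, REFERS_TO)
for entity in ENTITIES:
    print([m.text for m in entity])
```

Five mentions collapse into four entities here, which is the same kind of gap the corpus statistics report between mention and entity counts.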

Entity-relation annotations • Relations between entities • Entity-level sentiment annotations • Sentiment flow between entities through relations • My car has a great engine. • Honda, known for its high standards, made my car. • Example: engine (CAR-PART) PART-OF Civic (CAR); priced (CAR-FEATURE) FEATURE-OF Civic (CAR); entity-level sentiment for Civic: positive
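
One way to picture sentiment flowing through relations is the sketch below. It is a toy aggregation rule, not the annotators' procedure: the tuple encodings, the choice to let sentiment flow only through PART-OF and FEATURE-OF, the treatment of "well" as a sentiment expression targeting "priced", and the "mixed when polarities disagree" rule are all assumptions made for illustration.

```python
# Relations through which sentiment is assumed to flow toward the parent entity.
FLOW_RELATIONS = {"PART-OF", "FEATURE-OF"}

# Hypothetical encoding of "It had a great engine and was priced well."
RELATIONS = [
    ("engine", "PART-OF", "Civic"),
    ("priced", "FEATURE-OF", "Civic"),
]
# (sentiment expression, prior polarity, target mention)
EXPRESSIONS = [
    ("great", "positive", "engine"),
    ("well", "positive", "priced"),
]


def related_mentions(entity, relations):
    """Return the entity plus everything that is transitively a part or feature of it."""
    members = {entity}
    changed = True
    while changed:
        changed = False
        for child, relation, parent in relations:
            if relation in FLOW_RELATIONS and parent in members and child not in members:
                members.add(child)
                changed = True
    return members


def entity_level_sentiment(entity, relations, expressions):
    """Aggregate polarities targeting the entity or its parts and features."""
    members = related_mentions(entity, relations)
    polarities = {polarity for _, polarity, target in expressions if target in members}
    if not polarities:
        return None  # no sentiment expression reaches this entity
    if len(polarities) == 1:
        return next(iter(polarities))
    return "mixed"


print(entity_level_sentiment("Civic", RELATIONS, EXPRESSIONS))
```

Adding a negative expression on any part, for example a disappointing stereo that is PART-OF the same car, makes this rule return "mixed" for the car.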

Entity annotation type: statistics • Inter-annotator agreement • Mentions: 83% • Refers-to: 68% • 61K mentions in corpus and 43K entities • 103 documents annotated by around 3 annotators • Span matching between annotators A1 and A2 (span highlighting not recoverable): MATCH: A1 “…Kia Rio…”, A2 “…Kia Rio…”; NOT A MATCH: A1 “…Kia Rio…”, A2 “…Kia Rio…”

Sentiment expressions • Evaluations • Target mentions • Prior polarity: • Semantic orientation given target • positive, negative, neutral, mixed • Examples: “… a great engine” (prior polarity: positive); “highly priced” (prior polarity: negative); “highly spec’ed” (prior polarity: positive)

Sentiment expressions • Occurrences in corpus: 10K • 13% are multi-word • like no other, get up and go • 49% are headed by adjectives • 22% nouns (damage, good amount) • 20% verbs (likes, upset) • 5% adverbs (highly)

Sentiment expressions • 75% of sentiment expression occurrences have non-evaluative uses in corpus • “light” • …the car seemed too light to be safe… • …vehicles in the light truck category… • 77% of sentiment expression occurrences are positive • Inter-annotator agreement: • 75% spans, 66% targets, 95% prior polarity

Modifiers -> contextual polarity • Negators: not a good car; not a very good car • Intensifiers: upward (a very good car); downward (a kind of good car) • Committers: upward (I am sure the car is good); downward (I suspect the car is good) • Neutralizers: if the car is good; I hope the car is good
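
A rough sketch of how the four modifier classes might turn prior polarity into contextual polarity follows. The slides name the classes but do not give a composition function, so everything here is an assumption: the `Evaluation` record, the rule that negators flip polarity, the rule that neutralizers yield neutral, and the decision to merely record intensifier and committer direction rather than compute a strength.

```python
from dataclasses import dataclass, replace
from typing import Optional

FLIP = {"positive": "negative", "negative": "positive"}


@dataclass(frozen=True)
class Evaluation:
    """Hypothetical record of a sentiment expression in context."""
    polarity: str                      # positive, negative, neutral, or mixed
    intensifier: Optional[str] = None  # "upward" or "downward", if one is attached
    committer: Optional[str] = None    # "upward" or "downward", if one is attached


def apply_modifier(evaluation, kind, direction=None):
    """Apply one modifier; neutral and mixed polarities are left unflipped."""
    if kind == "negator":
        return replace(evaluation, polarity=FLIP.get(evaluation.polarity, evaluation.polarity))
    if kind == "neutralizer":
        return replace(evaluation, polarity="neutral")
    if kind == "intensifier":
        return replace(evaluation, intensifier=direction)
    if kind == "committer":
        return replace(evaluation, committer=direction)
    raise ValueError(f"unknown modifier kind: {kind}")


def contextual(prior_polarity, modifiers):
    """Apply modifiers innermost first, e.g. 'very' before 'not' in 'not a very good car'."""
    evaluation = Evaluation(prior_polarity)
    for kind, direction in modifiers:
        evaluation = apply_modifier(evaluation, kind, direction)
    return evaluation


# "not a very good car": good (positive) + upward intensifier + negator
print(contextual("positive", [("intensifier", "upward"), ("negator", None)]))
# "if the car is good": good (positive) + neutralizer
print(contextual("positive", [("neutralizer", None)]))
```

Keeping the modifier direction as a tag instead of a number avoids committing to how strongly, say, "not very good" should score, which the source does not specify.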

Other annotations • Speech events (not sourced from author) • John thinks the car is good. • Comparisons: • Car X has a better engine than car Y. • Handles a variety of cases

Possible tasks • Detecting mentions, sentiment expressions, and modifiers • Identifying targets of sentiment expressions, modifiers • Coreference resolution • Finding part-of, feature-of, etc. relations • Identifying errors/inconsistencies in data

Possible tasks • Exploring how elements interact: • Some idiot thinks this is a good car. • Evaluating unsupervised sentiment systems or those trained on other domains • How do relations between entities transfer sentiment? • The car’s paint job is flawless but the safety record is poor. • Solution to one task may be useful in solving another.

But wait, there’s more! • 180 digital camera blog posts were annotated • Total of 223,001 + 108,593 = 331,594 tokens

Outline • Motivating example • Elements combine to render entity-level sentiment • Overview of annotation types • Some statistics • Potential uses of corpus • Comparison to other resources

Other resources • MPQA Version 2.0 • Wiebe, Wilson and Cardie (2005) • Largely professionally written news articles • Subjective expression • “beliefs, emotions, sentiments, speculations, etc.” • Attitude, contextual sentiment on subjective expressions • Target, source annotations • 226K tokens (JDPA: 332K)

Other resources • Data sets provided by Bing Liu (2004, 2008) • Customer-written consumer electronics product reviews • Contextual sentiment toward mention of product • Comparison annotations • 130K tokens (JDPA: 332K)

Thank you! • Obtaining the corpus: • Research and educational purposes • ICWSM.JDPA.corpus@gmail.com • June 2010 • Annotation guidelines: http://www.cs.indiana.edu/~jaskessl • Thanks to: Prof. Michael Gasser, Prof. James Martin, Prof. Martha Palmer, Prof. Michael Mozer, William Headden

Top 20 annotations by type

Inter-annotator agreement
