The JDPA Sentiment Corpus for the Automotive Domain

Jason S. Kessler

Miriam Eckert, Lyndsie Clark, Nicolas Nicolov

J.D. Power and Associates

Indiana University


Overview

  • 335 blog posts containing opinions about cars

    • 223K tokens of blog data

  • Goal of annotation project:

    • Examples of how words interact to evaluate entities

    • Annotations encode these interactions

  • Entities are invoked physical objects and their properties

    • Not just cars, car parts

    • People, locations, organizations, times


Excerpt from the corpus

“last night was nice. sean bought me caribou and we went to my house to watch the baseball game …”

“… yesturday i helped me mom with brians house and then we went and looked at a kia spectra. it looked nice, but when we got up to it, i wasn't impressed ...”


Outline

  • Motivating example

  • Overview of annotation types

    • Some statistics

  • Potential uses of corpus

  • Comparison to other resources


Motivating example

“John recently purchased a Honda Civic. It had a great engine, a mildly disappointing stereo, and was very grippy. He also considered a BMW which, while highly priced, had a better stereo.”

[Diagram, built up over five slides around the sentences above; only the labels survive in this transcript.]

  • Mention types: PERSON, CAR, CAR-PART, CAR-FEATURE

  • REFERS-TO links

  • TARGET links

  • PART-OF and FEATURE-OF relations

  • Comparison labels: MORE, LESS, DIMENSION

  • Entity-level sentiment: positive; entity-level sentiment: mixed

Outline

  • Motivating example

  • Overview of annotation types

    • Some statistics

  • Potential uses of corpus

  • Comparison to other resources


Entity annotations

“John recently purchased a Civic. It had a great engine and was priced well.”

[Diagram: John (PERSON), Civic (CAR), engine (CAR-PART), priced (CAR-FEATURE); REFERS-TO links connect coreferent mentions such as “It” and “Civic”.]

  • >20 semantic types from

    • ACE Entity Mention Detection Task

    • Generic automotive types
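
One way to picture the entity annotations on this slide is as typed text spans with optional REFERS-TO links. The sketch below is only an illustration: the `Mention` class, the `find` helper, and the character-offset representation are assumptions made here, not the corpus's actual file format, and typing "It" as CAR follows the motivating example rather than anything stated on this slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Mention:
    """Hypothetical in-memory form of one annotated mention."""
    text: str
    start: int                              # character offset, inclusive
    end: int                                # character offset, exclusive
    semantic_type: Optional[str] = None     # e.g. PERSON, CAR, CAR-PART
    refers_to: Optional["Mention"] = None   # REFERS-TO link to an antecedent


def find(sentence, word, semantic_type=None, refers_to=None):
    """Build a Mention for the first occurrence of `word` in `sentence`."""
    start = sentence.index(word)
    return Mention(word, start, start + len(word), semantic_type, refers_to)


def entity_of(mention):
    """Follow REFERS-TO links back to the mention that anchors the entity."""
    while mention.refers_to is not None:
        mention = mention.refers_to
    return mention


sentence = ("John recently purchased a Civic. "
            "It had a great engine and was priced well.")
john = find(sentence, "John", "PERSON")
civic = find(sentence, "Civic", "CAR")
it = find(sentence, "It", "CAR", refers_to=civic)  # assumed type, see above
engine = find(sentence, "engine", "CAR-PART")
priced = find(sentence, "priced", "CAR-FEATURE")

print(entity_of(it).text)
```

Grouping mentions by `entity_of` is what turns a list of mentions into a smaller set of entities, which is the distinction the statistics slide draws between mentions and entities.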


Entity-relation annotations

  • Relations between entities

  • Entity-level sentiment annotations

  • Sentiment flow between entities through relations

    • My car has a great engine.

    • Honda, known for its high standards, made my car.

[Diagram: engine (CAR-PART) PART-OF Civic (CAR); priced (CAR-FEATURE) FEATURE-OF Civic (CAR). Entity-level sentiment: Positive.]
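
The idea that sentiment flows between entities through relations can be sketched as a small aggregation over relation triples. In the corpus the entity-level sentiment is a human annotation; the propagation rule below (a parent inherits the polarities of its parts and features, and conflicting polarities give "mixed") is an assumption made for illustration, as are the function and variable names.

```python
# (child, relation, parent) triples, as in the diagram for the example sentence.
relations = [
    ("engine", "PART-OF", "Civic"),
    ("priced", "FEATURE-OF", "Civic"),
]

# Polarities of sentiment expressions that target each mention directly:
# "great engine" and "priced well".
direct = {"engine": ["positive"], "priced": ["positive"]}


def entity_sentiment(entity, relations, direct):
    """Combine direct sentiment with sentiment inherited from parts/features.

    Assumes the relation graph has no cycles.
    """
    found = set(direct.get(entity, []))
    for child, _relation, parent in relations:
        if parent == entity:
            inherited = entity_sentiment(child, relations, direct)
            if inherited == "mixed":
                found.update({"positive", "negative"})
            elif inherited != "neutral":
                found.add(inherited)
    if {"positive", "negative"} <= found:
        return "mixed"
    if "positive" in found:
        return "positive"
    if "negative" in found:
        return "negative"
    return "neutral"


print(entity_sentiment("Civic", relations, direct))
```

Adding a negatively evaluated part to `relations` and `direct` would make the same call return "mixed", which is the kind of interaction the annotations are meant to expose.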



Entity annotation type: statistics

  • Inter-annotator agreement

    • Mentions: 83%

    • Refers-to: 68%

  • 61K mentions in corpus and 43K entities

  • 103 documents annotated by around 3 annotators

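[Figure: span-matching example; the span highlighting that distinguished the two cases is lost in this transcript.]

  • MATCH: A1: …Kia Rio… / A2: …Kia Rio…

  • NOT A MATCH: A1: …Kia Rio… / A2: …Kia Rio…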
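
The slide does not say how the agreement percentages were computed. As a rough illustration only, the sketch below scores two annotators by exact span match, counting a span as agreed when both marked identical offsets (a Dice-style measure). The measure, the `span_agreement` name, and the example text and offsets are all assumptions.

```python
def span_agreement(spans_a1, spans_a2):
    """Exact-match agreement between two annotators' (start, end) spans."""
    a1, a2 = set(spans_a1), set(spans_a2)
    if not a1 and not a2:
        return 1.0  # nothing annotated by either: trivially in agreement
    return 2 * len(a1 & a2) / (len(a1) + len(a2))


text = "I test drove a Kia Rio yesterday"   # hypothetical sentence
kia_rio = (15, 22)                          # covers "Kia Rio"
rio_only = (19, 22)                         # covers "Rio"

print(span_agreement([kia_rio], [kia_rio]))   # identical spans
print(span_agreement([kia_rio], [rio_only]))  # overlapping but not identical
```

Under exact matching, two annotators who pick overlapping but different spans get no credit, so a stricter or looser matching rule would change the reported figures.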


Sentiment expressions

  • Evaluations

  • Target mentions

  • Prior polarity:

    • Semantic orientation given target

    • positive, negative, neutral, mixed

  • Examples:

    • “… a great engine”: prior polarity positive

    • “highly priced”: prior polarity negative

    • “highly spec’ed”: prior polarity positive


Sentiment expressions

  • Occurrences in corpus: 10K

  • 13% are multi-word

    • like no other, get up and go

  • 49% are headed by adjectives

  • 22% nouns (damage, good amount)

  • 20% verbs (likes, upset)

  • 5% adverbs (highly)


Sentiment expressions

  • 75% of sentiment expression occurrences have non-evaluative uses in corpus

  • “light”

    • …the car seemed too light to be safe…

    • …vehicles in the light truck category…

  • 77% of sentiment expression occurrences are positive

  • Inter-annotator agreement:

    • 75% spans, 66% targets, 95% prior polarity


Modifiers -> contextual polarity

  • NEGATORS

    • not a good car

    • not a very good car

  • INTENSIFIERS

    • UPWARD: a very good car

    • DOWNWARD: a kind of good car

  • COMMITTERS

    • UPWARD: I am sure the car is good

    • DOWNWARD: I suspect the car is good

  • NEUTRALIZERS

    • if the car is good

    • I hope the car is good
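
To make the step from prior polarity to contextual polarity concrete, the sketch below applies the four modifier types from this slide to a simple evaluation record. The corpus annotates modifiers as links, not numbers; the numeric fields, the scaling factors 1.5 and 0.5, and every name in the code are assumptions chosen only for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Evaluation:
    polarity: int            # prior polarity: +1 positive, -1 negative
    strength: float = 1.0    # changed by intensifiers
    certainty: float = 1.0   # changed by committers
    asserted: bool = True    # set to False by neutralizers


def apply_modifier(ev, kind, direction=None):
    """Return a new Evaluation after one modifier is applied."""
    factor = 1.5 if direction == "upward" else 0.5   # arbitrary scale
    if kind == "negator":
        return replace(ev, polarity=-ev.polarity)
    if kind == "intensifier":
        return replace(ev, strength=ev.strength * factor)
    if kind == "committer":
        return replace(ev, certainty=ev.certainty * factor)
    if kind == "neutralizer":
        return replace(ev, asserted=False)
    raise ValueError(f"unknown modifier kind: {kind}")


good = Evaluation(polarity=+1)                              # "good"
very_good = apply_modifier(good, "intensifier", "upward")   # "very good"
not_good = apply_modifier(good, "negator")                  # "not a good car"
i_suspect = apply_modifier(good, "committer", "downward")   # "I suspect ..."
if_good = apply_modifier(good, "neutralizer")               # "if the car is good"

print(not_good.polarity, very_good.strength, if_good.asserted)
```

Because each modifier returns a new record, stacked cases such as "not a very good car" can be modeled by chaining calls, although how strength and negation should really combine is left open here.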


Other annotations

  • Speech events (not sourced from author)

    • John thinks the car is good.

  • Comparisons:

    • Car X has a better engine than car Y.

    • Handles a variety of cases
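
The comparison example can be pictured as a record with the three labels that appear in the motivating-example diagram: MORE, LESS and DIMENSION. The sketch below fills such a record from the one sentence pattern shown on this slide. The reading of the slots (MORE is the entity ranked higher on the dimension), the `Comparison` class, and the regular expression are assumptions; the corpus annotation handles a variety of cases that this toy pattern does not.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Comparison:
    more: str        # entity ranked higher on the dimension (assumed reading)
    less: str        # entity ranked lower
    dimension: str   # what is being compared


# Matches only sentences of the form "<X> has a better <D> than <Y>."
PATTERN = re.compile(
    r"^(?P<more>.+?) has a better (?P<dimension>.+?) than (?P<less>.+?)\.?$"
)


def parse_comparison(sentence) -> Optional[Comparison]:
    """Return a Comparison for the single supported pattern, else None."""
    match = PATTERN.match(sentence)
    if match is None:
        return None
    return Comparison(match["more"], match["less"], match["dimension"])


print(parse_comparison("Car X has a better engine than car Y."))
```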


Outline

  • Motivating example

  • Overview of annotation types

    • Some statistics

  • Potential uses of corpus

  • Comparison to other resources


Possible tasks

  • Detecting mentions, sentiment expressions, and modifiers

  • Identifying targets of sentiment expressions, modifiers

  • Coreference resolution

  • Finding part-of, feature-of, etc. relations

  • Identifying errors/inconsistencies in data


Possible tasks

  • Exploring how elements interact:

    • Some idiot thinks this is a good car.

  • Evaluating unsupervised sentiment systems or those trained on other domains

  • How do relations between entities transfer sentiment?

    • The car’s paint job is flawless but the safety record is poor.

  • Solution to one task may be useful in solving another.


But wait, there’s more!

  • 180 digital camera blog posts were annotated

  • Total of 223,001 + 108,593 = 331,594 tokens


Outline

  • Motivating example

    • Elements combine to render entity-level sentiment

  • Overview of annotation types

    • Some statistics

  • Potential uses of corpus

  • Comparison to other resources


Other resources

  • MPQA Version 2.0

    • Wiebe, Wilson and Cardie (2005)

    • Largely professionally written news articles

    • Subjective expression

      • “beliefs, emotions, sentiments, speculations, etc.”

    • Attitude, contextual sentiment on subjective expressions

    • Target, source annotations

    • 226K tokens (JDPA: 332K)


Other resources

  • Data sets provided by Bing Liu (2004, 2008)

    • Customer-written consumer electronics product reviews

    • Contextual sentiment toward mention of product

    • Comparison annotations

    • 130K tokens (JDPA: 332K)


Thank you!

  • Obtaining the corpus:

    • Research and educational purposes

    • ICWSM.JDPA.corpus@gmail.com

    • June 2010

    • Annotation guidelines:

      http://www.cs.indiana.edu/~jaskessl

  • Thanks to: Prof. Michael Gasser, Prof. James Martin, Prof. Martha Palmer, Prof. Michael Mozer, William Headden