Justin Martineau

Automatic Domain Adaptive Sentiment Analysis Phase 1



Outline

  • Introduction

    • Problem Definition

    • Thesis Statement

    • Motivation

  • Background and Related Work

    • Challenges

    • Approaches

  • Research Plan

    • Approach

    • Evaluation

    • Timeline

  • Conclusion


Problem Definition

  • Sentiment Analysis is the automatic detection and measurement of sentiment in text segments by machines.

  • Three Subtasks

    • Objective vs. Subjective

    • Topic Detection

    • Positive vs. Negative

  • Commonly applied to web data

  • Very Domain-Dependent



Sentiment Analysis Example



Thesis Statement

This dissertation will develop and evaluate techniques to discover and encode domain-specific, domain-independent, and semantic knowledge that improve both single-domain and multiple-domain sentiment analysis of textual data when little labeled data is available.



Motivation: Private Sector

  • Market Research

    • Surveys

    • Focus Groups

    • Feature Analysis

    • Customer targeting (free samples, etc.)

  • Consumer Sentiment Search

    • Compare pros and cons

    • Overall opinion of products/services



Motivation: Public Sector

  • Political

    • Alternative Polling

    • Determine popular support for legislation

    • Choose campaign issues

  • National Security

    • Detect individuals at risk for radicalization

    • Determine local sentiment about US policy

    • Determine local values and sentimental icons

    • Portray actions positively using local flavor

  • Public Health

    • Detect individuals at risk of suicide

    • Detect mentally unstable people



Challenges

  • Text Representation

  • Unedited Text

  • Sentiment Drift

  • Negation

  • Sarcasm

  • Sentiment Target Identification

  • Granularity

  • Domain Dependence



Domain Dependence 1: Domain-Dependent Sentiment

  • The same sentence can mean two very different things in different domains

    • Ex: “Read the book.” Good for books, bad for movies.

    • Ex: “Jolting, heart pounding, You’re in for one hell of a bumpy ride!” Good for movies and books, bad for cars.

  • The sentiment associated with a word changes with the domain

    • Fuzzy cameras are bad, but fuzzy teddy bears are good.

    • Big trucks are good, but big iPods are bad.

    • Bad is bad, but bad villains are good.
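The examples above imply that a polarity lexicon must be keyed by domain (or target) as well as by word. The sketch below is a minimal illustration under that assumption: the lexicon contents, the domain names, and the `polarity` helper are hypothetical and only encode the slide's own examples, with a small domain-independent fallback.

```python
# Hypothetical toy lexicons built only from the slide's examples.
# Keys are (domain or target, word); +1 = positive, -1 = negative.
DOMAIN_LEXICON = {
    ("cameras", "fuzzy"): -1,
    ("teddy bears", "fuzzy"): 1,
    ("trucks", "big"): 1,
    ("ipods", "big"): -1,
    ("villains", "bad"): 1,
}

# Hypothetical domain-independent fallback for words with no domain entry.
GENERIC_LEXICON = {"bad": -1, "good": 1}


def polarity(word: str, domain: str) -> int:
    """Return +1, -1, or 0 (unknown); a domain-specific entry overrides the generic one."""
    word = word.lower()
    if (domain, word) in DOMAIN_LEXICON:
        return DOMAIN_LEXICON[(domain, word)]
    return GENERIC_LEXICON.get(word, 0)


if __name__ == "__main__":
    for domain in ("cameras", "teddy bears"):
        print(f"fuzzy {domain}: {polarity('fuzzy', domain):+d}")
    print(f"bad villains: {polarity('bad', 'villains'):+d}")
    print(f"bad movies: {polarity('bad', 'movies'):+d}")
```

A hand-built table like this does not scale to new domains; it only shows why a single domain-independent lexicon is not enough.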



Domain Dependence 2: Endless Possibilities



Domain Dependence 3: Organization and Granularity



Theory of the Three Signals

  • Authors communicate messages using three types of signals

    • Domain-Specific Signals

    • Domain-Independent Signals

    • Semantic Signals

  • More specific signals are generally more powerful than more generic signals
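One hypothetical way to act on the last point is to weight each signal type by its specificity and average only the signals that actually fire for a word. The weights below are made up for illustration; the slides do not say how (or whether) the three signals are combined numerically.

```python
from typing import Dict, Optional

# Hypothetical weights: more specific signals count for more, following the slide's
# claim. The slides do not give actual weights.
SIGNAL_WEIGHTS = {"domain_specific": 3.0, "domain_independent": 2.0, "semantic": 1.0}


def combine_signals(scores: Dict[str, Optional[float]]) -> float:
    """Weighted average of the available signal scores (None or missing = no evidence)."""
    total = 0.0
    weight_sum = 0.0
    for name, weight in SIGNAL_WEIGHTS.items():
        score = scores.get(name)
        if score is not None:
            total += weight * score
            weight_sum += weight
    return total / weight_sum if weight_sum else 0.0
```

For example, a domain-specific score of +1 and a domain-independent score of -1 combine to +0.2: the more specific signal wins, but with less confidence than if the two agreed.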



Domain-Specific Signals

  • Fuzzy teddy bears

  • Sharp pictures

  • Sharp knives

  • Smooth rides

  • New ideas

  • Fast servers

  • Fast cars

  • Slow-roasted burgers

  • Slow motion

  • Small cameras

  • Big cars

  • Dependent on problem and domain

  • Considered more useful by readers

    • Tell readers what is good or bad about the topic

    • Domain knowledge determines sentiment orientation

  • Very strong in context, but weak or misleading out of context

  • Can cause overgeneralization errors when overvalued

  • New domain-specific signal words are ignored in CDT



Proposed Approach

  • Sentiment Search is more than just a classification problem

  • Detecting and Using the three signals

    • Dynamic Domain Adapting Classifiers

    • Generic Feature Detection using unlabeled data

    • Semantic Feature Spaces



Dynamic Domain Adapting Classifiers

  • Before query time, a (preferably domain-independent) model is built from a set of labeled data using computationally intensive algorithms.

  • Users interact through a query box

  • Query results define the domain of interest

  • Domain-specific adaptations are calculated

    • compare how the domain of interest differs from known cases

    • use semantic knowledge about word senses and relations

    • must be computed by a fast algorithm: users are waiting

  • Domain-specific adaptations are woven into the domain-independent model

    • resulting model is temporary

    • used to classify documents as positive, negative, or objective

  • Sentimental search results are processed for significant components and presented for human consumption
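The query-time flow above can be sketched as: keep an expensive general model fixed, derive cheap adaptations from the query results, and classify with a temporary merged copy. This is a minimal sketch under loud assumptions: the general model is taken to be a linear word-weight table, and the adaptation rule (give each word the general model does not know the average score of the result documents it appears in) is hypothetical, not the adapter planned in this work.

```python
from collections import defaultdict
from typing import Dict, List


def adapt_model(general: Dict[str, float], domain_docs: List[List[str]],
                strength: float = 0.5) -> Dict[str, float]:
    """Return a temporary, domain-adapted copy of a general word-weight model.

    Hypothetical adaptation rule: each word unknown to the general model gets the
    average general-model score of the query-result documents it appears in,
    scaled by `strength`. Known words keep their general weights.
    """
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for doc in domain_docs:
        doc_score = sum(general.get(w, 0.0) for w in doc)
        for w in set(doc):
            if w not in general:
                totals[w] += doc_score
                counts[w] += 1
    adapted = dict(general)  # copy, so the general model is never modified
    for w, n in counts.items():
        adapted[w] = strength * totals[w] / n
    return adapted


def classify(model: Dict[str, float], doc: List[str], threshold: float = 0.25) -> str:
    """Label a document positive, negative, or objective from its summed word weights."""
    score = sum(model.get(w, 0.0) for w in doc)
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "objective"


# Toy data: a general model trained offline and the results of a hypothetical query.
GENERAL_MODEL = {"great": 1.0, "terrible": -1.0}
QUERY_RESULTS = [["fuzzy", "great"], ["fuzzy", "soft", "great"], ["scratchy", "terrible"]]
ADAPTED_MODEL = adapt_model(GENERAL_MODEL, QUERY_RESULTS)
```

With the general model alone, `["fuzzy", "soft"]` scores 0 and is labeled objective; with the adapted copy it is labeled positive. The adaptation is a single pass over the query results, which keeps it cheap enough to run while the user waits.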


Overview

Diagram components: Query; Business Intelligence; Query Results (define a new domain); Lucene Index; Labeled data from known domain; Dynamic Domain Adapter; Component Analysis; Semantic Knowledge; General Model; Context-Specific Model; Sentiment Classifier; Sentimental Search Results (- / +)

Key: User Level, Source Data, Knowledge, Labeled Data, Algorithms, Search Results


Subjective Context Scoring

  • Multiply:

    • PMI(Word,Context)

    • IDF

    • Co-occurrence with known generic sentiment seed words, times each seed's bias (from movie reviews)

  • Seeds:

    • Negative: bad, worst, stupid, ridiculous, terrible, poorly

    • Positive: great, best, perfect, wonderful, excellent, effective
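The scoring rule above multiplies three factors. A minimal sketch, assuming documents are token sets, probabilities are estimated from document frequencies, the "context" is the set of documents matching the query, and each seed's bias is a placeholder of +1 or -1 (the slide says the real biases come from movie reviews but does not give them). The function and data names are hypothetical.

```python
import math
from typing import List, Set

# Seed words from the slide. The +1/-1 biases are placeholders: the slide says the
# real biases come from movie reviews, but does not give their values.
SEED_BIAS = {w: -1.0 for w in ("bad", "worst", "stupid", "ridiculous", "terrible", "poorly")}
SEED_BIAS.update({w: 1.0 for w in ("great", "best", "perfect", "wonderful", "excellent", "effective")})


def subjective_context_score(word: str, corpus: List[Set[str]], context: List[Set[str]]) -> float:
    """PMI(word, context) * IDF(word) * bias-weighted seed co-occurrence in the context.

    `context` is assumed to be a subset of `corpus` (the documents matching the query).
    """
    ctx_docs = [doc for doc in context if word in doc]
    if not ctx_docs:
        return 0.0
    n = len(corpus)
    df = sum(1 for doc in corpus if word in doc)
    pmi = math.log((len(ctx_docs) / len(context)) / (df / n))  # P(word|context) / P(word)
    idf = math.log(n / df)
    seed_term = sum(bias * sum(1 for doc in ctx_docs if seed in doc)
                    for seed, bias in SEED_BIAS.items() if seed != word)
    return pmi * idf * seed_term


# Toy corpus: the first three documents match a hypothetical "iPod" query.
IPOD_CORPUS = [
    {"ipod", "battery", "great"},
    {"ipod", "battery", "excellent"},
    {"ipod", "screen", "terrible"},
    {"movie", "plot", "great"},
    {"movie", "plot", "boring"},
    {"movie", "actor", "bad"},
]
IPOD_QUERY_DOCS = IPOD_CORPUS[:3]
```

Because the seed factor is signed, the product carries polarity: "battery" scores positive here (it appears with "great" and "excellent"), while "screen" scores negative (it appears with "terrible").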


Rocchio Baseline

  • Rocchio: a query expansion algorithm for search

    • Similar goal to ours: find more relevant words

    • Does not account for sentiment

  • The new query is a weighted sum of

    • Matching document vectors

    • Query vector

    • Non-matching document vectors (with negative weight)
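The standard Rocchio update is q' = α·q + β·centroid(matching docs) − γ·centroid(non-matching docs). The sketch below uses sparse term-weight dictionaries; the defaults α = 1, β = 0.75, γ = 0.15 are commonly cited textbook values, not values from the slide, and dropping non-positive weights is a common convention rather than part of the slide.

```python
from collections import defaultdict
from typing import Dict, List

Vector = Dict[str, float]  # sparse term -> weight vector


def rocchio(query: Vector, relevant: List[Vector], nonrelevant: List[Vector],
            alpha: float = 1.0, beta: float = 0.75, gamma: float = 0.15) -> Vector:
    """Rocchio update: alpha*q + beta*centroid(relevant) - gamma*centroid(nonrelevant)."""
    new_query: Dict[str, float] = defaultdict(float)
    for term, weight in query.items():
        new_query[term] += alpha * weight
    for docs, coef in ((relevant, beta), (nonrelevant, -gamma)):
        for doc in docs:
            for term, weight in doc.items():
                new_query[term] += coef * weight / len(docs)
    # Terms that end up with non-positive weight are dropped, a common convention.
    return {term: w for term, w in new_query.items() if w > 0}


EXPANDED = rocchio(
    {"ipod": 1.0},
    relevant=[{"ipod": 1.0, "battery": 1.0}, {"ipod": 1.0, "screen": 1.0}],
    nonrelevant=[{"ipod": 1.0, "case": 1.0}],
)
```

The expanded query gains "battery" and "screen" from the matching documents, while "case" (only in the non-matching document) is pushed below zero and dropped. Nothing in the update looks at sentiment, which is why it serves only as a baseline here.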







iPod according to TFIDF

Figure panels: Positive Sentiment in Movie Reviews | Negative Sentiment in Movie Reviews


Sentimental Context

  • Components:

    • PMI(Word,Context)

    • TF

    • IDF

    • log(actual co-occurrence of word and seed in the context / co-occurrence expected by chance)

  • What the score rewards:

    • Abnormality relative to other docs

    • Popular words in the context

    • Rare words in the corpus

    • Words that occur with sentiment words in the query documents
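The four components above can be multiplied into one score. This is a minimal sketch under stated assumptions: documents are token lists, "context" means the documents returned by the query, seed words are treated as one unsigned set (any seed counts), and "expected by chance" is read as the number of context documents that would contain both the word and a seed if the two were independent. The names are hypothetical.

```python
import math
from typing import List

# Seed words from the Subjective Context Scoring slide, used here without polarity.
SEEDS = {"bad", "worst", "stupid", "ridiculous", "terrible", "poorly",
         "great", "best", "perfect", "wonderful", "excellent", "effective"}


def sentimental_context_score(word: str, corpus: List[List[str]],
                              context: List[List[str]]) -> float:
    """PMI * TF * IDF * log(observed / chance co-occurrence with any seed word).

    `context` is assumed to be a subset of `corpus` (the documents returned by the query).
    """
    ctx_with_word = [doc for doc in context if word in doc]
    if not ctx_with_word:
        return 0.0
    n, n_ctx = len(corpus), len(context)
    df = sum(1 for doc in corpus if word in doc)
    pmi = math.log((len(ctx_with_word) / n_ctx) / (df / n))  # abnormality vs. other docs
    tf = sum(doc.count(word) for doc in context)             # popular in the context
    idf = math.log(n / df)                                   # rare in the corpus
    seeds = SEEDS - {word}
    seed_docs = sum(1 for doc in context if seeds & set(doc))
    observed = sum(1 for doc in ctx_with_word if seeds & set(doc))
    if observed == 0:
        return 0.0  # log(0) is undefined: the word never appears with a seed word
    expected = len(ctx_with_word) * seed_docs / n_ctx  # co-occurrences expected by chance
    return pmi * tf * idf * math.log(observed / expected)


REVIEW_CORPUS = [
    ["battery", "life", "great", "battery"],
    ["battery", "died", "terrible"],
    ["screen", "scratched", "easily"],
    ["ships", "in", "box"],
    ["plot", "great"],
    ["plot", "boring"],
]
REVIEW_CONTEXT = REVIEW_CORPUS[:4]
```

Unlike the seed term in Subjective Context Scoring, this factor is unsigned: it rewards any word that keeps company with sentiment words ("battery" here), and leaves the direction of the sentiment to a later step.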










Google Hits (Battery Related):

  • iPod battery good ~ 13.5 Mill

  • iPod battery bad ~ 900 K

  • iPod nano battery good ~ 3 Mill

  • iPod nano battery bad ~ 785 K

  • iPod shuffle battery good ~ 1.6 Mill

  • iPod shuffle battery bad ~ 230 K

  • iPod shuffle battery price good ~ 2.6 Mill (not a typo)

  • iPod shuffle battery price bad ~ 230 K

  • iPod battery price good ~ 13.5 Mill

  • iPod battery price bad ~ 850 K

  • iPod nano battery price good ~ 3 Mill

  • iPod nano battery price bad ~ 785 K
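One simple reading of these counts, assumed here rather than stated on the slide, is the ratio of "good" to "bad" hits for each query: 13.5 M / 900 K = 15 for "iPod battery", against 3 M / 785 K ≈ 3.8 for "iPod nano battery". The sketch computes that ratio and its base-2 log for every query, using the approximate counts exactly as listed.

```python
import math

# Approximate hit counts as listed on the slide:
# query -> (hits for the query plus "good", hits for the query plus "bad").
HITS = {
    "iPod battery": (13_500_000, 900_000),
    "iPod nano battery": (3_000_000, 785_000),
    "iPod shuffle battery": (1_600_000, 230_000),
    "iPod shuffle battery price": (2_600_000, 230_000),
    "iPod battery price": (13_500_000, 850_000),
    "iPod nano battery price": (3_000_000, 785_000),
}

# Ratio of "good" to "bad" hits; its log is positive when "good" dominates.
RATIOS = {query: good / bad for query, (good, bad) in HITS.items()}

if __name__ == "__main__":
    for query, ratio in sorted(RATIOS.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{query:28} good/bad = {ratio:5.2f}   log2 = {math.log2(ratio):4.2f}")
```

Every query leans toward "good", but by very different margins, which is the kind of context-dependent signal the slide's battery examples point at.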



Summary

  • Interesting problem with many potential applications

  • Domain dependence is the core challenge

  • The keys to success are:

    • Vast quantities of unlabeled data

    • Semantic knowledge from freely available sources

    • Semantics must guide and influence but not overrule the statistics





PMI - Pointwise Mutual Information

  • a.k.a. Specific Mutual Information

  • Do two events (e.g., two words) co-occur more often than chance would predict?
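PMI answers that question as PMI(x, y) = log [ P(x, y) / (P(x) P(y)) ]: positive when x and y co-occur more often than independence predicts, zero when they are independent, and negative when they co-occur less. A minimal sketch, assuming probabilities are estimated from document-level co-occurrence (window-based counts are another common choice) and using log base 2 by convention.

```python
import math
from typing import List, Set


def pmi(x: str, y: str, docs: List[Set[str]]) -> float:
    """log2 P(x, y) / (P(x) P(y)), with probabilities estimated from document frequencies."""
    n = len(docs)
    p_x = sum(1 for d in docs if x in d) / n
    p_y = sum(1 for d in docs if y in d) / n
    p_xy = sum(1 for d in docs if x in d and y in d) / n
    if p_xy == 0:
        return float("-inf")  # the two words never co-occur
    return math.log2(p_xy / (p_x * p_y))


PMI_DOCS = [{"ipod", "battery"}, {"ipod", "battery"}, {"ipod", "screen"}, {"movie", "plot"}]
```

Here "movie" and "plot" each appear in one of four documents and always together, so PMI = log2(0.25 / (0.25 × 0.25)) = 2 bits.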

