
Deriving Marketing Intelligence from Online Discussion

Natalie Glance and Matthew Hurst

CMU Information Retrieval Seminar, April 19, 2006

© 2006 Nielsen BuzzMetrics, A VNU business affiliate


Overview

  • Motivation

  • Content Segment: The Blogosphere

    • Structural Aspects

    • Topical Aspects

  • Deriving market intelligence

  • Conclusion

Motivation

[Slide graphic: Social Media → Mobile phone data]

The celly 31 is awesome, but the screen is a bit too dim.


The Blogosphere


Profile Analysis

Hurst, “24 Hours in the Blogosphere”, 2006 AAAI Spring Symposium on Computational Approaches to Analysing Weblogs.

Hypotheses

  • Different hosts attract users with different capacity to disclose profile information (?)

  • Blogspot users are more disposed to disclose information (?)

  • Different interface implementations perform differently at extracting/encouraging information from users (?)

Per Capita: Spaces

  • variance in average age

  • variance in profiles with age

  • variance in per capita bloggers

Per Capita: Blogspot


The graphical structure of the blogosphere

Graphical Structure of the Blogosphere

  • Citations between blogs indicate some form of relationship, generally topical.

  • A link is certainly evidence of awareness, consequently reciprocal links are evidence of mutual awareness.

  • Mutual awareness suggests some commonality, perhaps common interests.

  • The graph of reciprocal links can be considered a social network.

  • Areciprocal links suggest topical relationships, but not social ones.
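The reciprocal/areciprocal distinction can be computed directly: a link is reciprocal exactly when its reverse also appears in the citation set. A minimal sketch (blog names are made up):

```python
# Split a directed citation graph into reciprocal links (mutual
# awareness -> social network) and areciprocal links (topical only).
def split_links(edges):
    """edges: set of (source_blog, cited_blog) pairs."""
    edges = set(edges)
    reciprocal = {(a, b) for (a, b) in edges if (b, a) in edges}
    areciprocal = edges - reciprocal
    return reciprocal, areciprocal

rec, arec = split_links({("a", "b"), ("b", "a"), ("a", "c")})
# ("a","b") and ("b","a") are mutual; ("a","c") is one-way.
```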

Graph Layout

  • Hierarchical Force Layout

    • Graph has 2 types of links: reciprocal links and areciprocal links

    • Create the set of partitions P, where each partition is a connected component of the reciprocal graph.

    • Create a graph whose nodes are the members of P and whose edges are formed from areciprocal links between (nodes within) members of P.

    • Lay out the partition graph.

    • Lay out each partition.
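The partitioning steps above can be sketched in a few lines of Python (the force-directed layout itself is omitted, and the example node and edge sets are made up):

```python
# Sketch of the partitioning step of the hierarchical force layout:
# reciprocal links define connected components (the partitions), and
# areciprocal links between partitions form the higher-level graph.
from collections import defaultdict

def connected_components(nodes, edges):
    """Connected components, treating each edge as undirected."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, components = set(), []
    for n in nodes:
        if n in seen:
            continue
        stack, comp = [n], set()
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        components.append(comp)
    return components

def partition_graph(nodes, reciprocal, areciprocal):
    parts = connected_components(nodes, reciprocal)
    part_of = {n: i for i, p in enumerate(parts) for n in p}
    edges = {(part_of[a], part_of[b]) for a, b in areciprocal
             if part_of[a] != part_of[b]}
    return parts, edges

parts, edges = partition_graph(["a", "b", "c", "d"],
                               {("a", "b")},              # reciprocal
                               {("a", "c"), ("c", "d")})  # areciprocal
# parts = [{"a", "b"}, {"c"}, {"d"}], edges = {(0, 1), (1, 2)}
```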


[Graph visualization, r = 2, p = 25: clusters labeled Japanese, cooking, and knitting]

[Graph visualization, r = 2, p = 1: kbcafe/rss, scoble, engadget, instapundit, boingboing, gizmodo, powerline, michellemalkin, crooksandliars]


[Graph visualization, r = 3, p = 100: technology and social/politics clusters]

The English blogosphere is political.

Political Blogosphere

L. Adamic and N. Glance, “The Political Blogosphere and the 2004 U.S. Election: Divided They Blog”, 2nd Annual Workshop on the Weblogging Ecosystem, Chiba, Japan, 2005.

Political Blogs & Readership

  • Pew Internet & American Life Project Report, January 2005, reports:

    • 63 million U.S. citizens use the Internet to stay informed about politics (mid-2004, Pew Internet Study)

    • 9% of Internet users read political blogs preceding the 2004 U.S. Presidential Election

  • 2004 Presidential Campaign Firsts

    • Candidate blogs: e.g. Dean’s blogforamerica.com

    • Successful grassroots campaign conducted via websites & blogs

    • Bloggers credentialed as journalists & invited to nominating conventions

Research Goals & Questions

  • Are we witnessing a cyberbalkanization of the Internet?

    • Linking behavior of blogs may make it easier to read only like-minded bloggers

    • On the other hand, bloggers systematically react to and comment on each others’ posts, both in agreement and disagreement (Balkin 2004)

  • Goal: study the linking behavior & discussion topics of political bloggers

    • Measure the degree of interaction between liberal and conservative bloggers

    • Find any differences in the structure of the two communities: is there a significant difference in “cohesiveness” in one community over another?

The Greater Political Blogosphere

  • Citation graph of greater political blogosphere

    • Front page of each blog crawled in February 2005

    • Directed link between blog A and blog B, if A links to B

    • Method is biased toward blogroll/sidebar links (as opposed to links in posts)

  • Results

    • 91% of links point to blog of same persuasion (liberal vs. conservative)

    • Conservative blogs show greater tendency to link

      • 82% of conservative blogs are linked to at least once; 84% link to at least one other blog

      • 67% of liberal blogs are linked to at least once; 74% link to at least one other blog

    • Average # of links per blog is similar: 13.6 for liberal; 15.1 for conservative

    • Higher proportion of liberal blogs that are not linked to at all


Citations between blogs extracted from posts

(Aug 29th – Nov 15th, 2004)

  • All citations between A-list blogs in 2 months preceding the 2004 election

  • Citations between A-list blogs with at least 5 citations in both directions

  • Edges further limited to those exceeding 25 combined citations

Only 15% of the citations bridge communities


Are political blogs echo chambers?

  • Performed pairwise comparison of URL citations and phrase usage from blog posts

  • Link-based similarity measure

    • Cosine similarity: cos(A,B) = (vA · vB) / (||vA|| ||vB||), where vA is a binary vector with each entry 1 or 0 depending on whether blog A cites a particular URL

    • Average similarity(L,R) = 0.03; cos(R,R) = 0.083; cos(L,L) = 0.087

  • Phrase-based similarity measure

    • Extracted set of phrases, informative wrt background model

    • Entries in vA are TF*IDF weight for each phrase = (# of phrase mentions by blog)*log[(# blogs)/(# blogs citing the phrase)]

    • Average similarity(L,R) = 0.10; cos(R,R) = 0.54; cos(L,L) = 0.57
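The link-based measure above can be sketched directly from the definition; the URL universe and citation sets here are made up for illustration:

```python
# Cosine similarity over binary citation vectors: each entry is 1 if
# the blog cites that URL, 0 otherwise (per the slide's definition).
import math

def cosine(va, vb):
    dot = sum(x * y for x, y in zip(va, vb))
    na = math.sqrt(sum(x * x for x in va))
    nb = math.sqrt(sum(x * x for x in vb))
    return dot / (na * nb) if na and nb else 0.0

all_urls = ["u1", "u2", "u3", "u4"]
cites_a = {"u1", "u2"}   # URLs cited by blog A
cites_b = {"u2", "u3"}   # URLs cited by blog B
va = [1 if u in cites_a else 0 for u in all_urls]
vb = [1 if u in cites_b else 0 for u in all_urls]
print(cosine(va, vb))    # one shared URL out of two each -> 0.5
```

The phrase-based variant is the same computation with TF*IDF weights in place of the 0/1 entries.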

Influence on mainstream media

Notable examples of blogs breaking a story

  • Swiftvets.com anti-Kerry video

    • Bloggers linked to this in late July, keeping accusations alive

    • Kerry responded in late August, bringing mainstream media coverage

  • CBS memos alleging preferential treatment of Pres. Bush during the Vietnam War

    • Powerline broke the story on Sep. 9th, launching flurry of discussion

    • Dan Rather apologized later in the month

  • “Was Bush Wired?”

    • Salon.com asked the question first on Oct. 8th, echoed by Wonkette & PoliticalWire.com

    • MSM followed up the next day


Deriving Market Intelligence

N. Glance, M. Hurst, K. Nigam, M. Siegler, R. Stockton and T. Tomokiyo. Deriving Marketing Intelligence from Online Discussion. Eleventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2005).

Automating Market Research

  • Brand managers want to know:

    • Do consumers prefer my brand to another?

    • Which features of my product are most valued?

    • What should we change or improve?

    • Alert me when a rumor starts to spread!

Comparative mentions: Halo 2

[Chart: comparative mentions of ‘halo 2’]

Case Study: PDAs

  • Collect online discussion in target domain (order of 10K to 10M posts)

  • Classify discussion into domain-specific topics (brand, feature, price)

  • Perform base analysis over combination of topics: buzz, sentiment/polarity, influencer identification

Dell Axim, 11.5% buzz, 3.4 polarity

Interactive analysis

  • Top-down approach: drill down from aggregate findings to drivers of those findings

  • Global view of data used to determine focus

  • Model parent and child slice

  • Use data driven methods to identify what distinguishes one data set from the other


[Chart: ‘SD card’ identified as a distinguishing phrase]

Social network analysis for discussion about the Dell Axim

Drilling down to sentence level

  • Discussion centers on poor quality of sound hardware & IR ports

    • “It is very sad that the Axim’s audio AND Irda output are so sub-par, because otherwise it is a great Pocket PC.”

    • “Long story made short: the Axim has a considerably inferior audio output than any other Pocket PC we have ever tested.”

    • “When we tested it we found that there was a problem with the audio output of the Axim.”

    • “The Dell Axim has a lousy IR transmitter AND a lousy headphone jack.”

  • Note: these examples are automatically extracted.

Technology

  • Data Collection:

    • Document acquisition and analysis

    • Classification (relevance/topic)

  • Topical Analysis:

    • Topic classification using a hierarchy of topic classifiers operating at sentence level.

    • Phrase mining and association.

  • Intentional Analysis:

    • Interpreting sentiment/polarity

  • Community analysis

  • Aggregate metrics

Topical Analysis

  • Hierarchy of topics with specific ‘dimensions’:

    • Brand dimension

      • Pocket PC:

        • Dell Axim

        • Toshiba

          • e740

      • Palm

        • Zire

        • Tungsten

    • Feature dimension:

      • Components

        • Battery

Topical Analysis

  • Each topic is a classifier, e.g. a boolean expression with sentence and/or message scoped sub-expressions.

  • Measured precision of classifier allows for projection of raw counts.

  • Intersection of typed dimensions allows for a basic approach to association (e.g. find sentences discussing the battery of the Dell Axim).
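A minimal sketch of topics-as-classifiers and of association by intersection. The keyword lists are illustrative stand-ins for the real boolean expressions, and the helper names are hypothetical:

```python
# Each topic is a sentence-scoped classifier; intersecting a brand
# topic with a feature topic gives a basic association test.
def topic(*keywords):
    """Build a trivial keyword-match classifier over a sentence."""
    return lambda sentence: any(k in sentence.lower() for k in keywords)

dell_axim = topic("axim")               # brand dimension
battery = topic("battery", "charge")    # feature dimension

def discusses(sentence, *classifiers):
    """True if the sentence matches every given topic classifier."""
    return all(c(sentence) for c in classifiers)

s = "The Axim battery drains too fast."
print(discusses(s, dell_axim, battery))  # True: brand AND feature match
```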

Polarity: What is it?

  • Opinion, evaluation/emotional state wrt some topic.

    • It is excellent

    • I love it.

  • Desirable or undesirable condition

    • It is broken (objective, but negative).

  • We use a lexical/syntactic approach.

  • Cf. related work on boolean document classification task using supervised classifiers.

Polarity Identification

This car is really great

POS: DT NN VB RR JJ

Lexical orientation: 0 0 0 0 +

Chunking: BNP BVP BADJP

Interpretation (parsing): Positive


Polarity Challenges

  • Methodological: ‘She told me she didn’t like it.’

  • Syntactic: ‘His cell phone works in some buildings, but in others it doesn’t.’

  • Valence:

    • ‘I told you I didn’t like it’,

    • ‘I heard you didn’t like it’,

    • ‘I didn’t tell you I liked it’,

    • ‘I didn’t hear you liked it’: many verbs (tell, hear, say, …) require semantic/functional information for polarity interpretation.

  • Association
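A greatly simplified, purely lexical sketch of polarity scoring, with naive negation flipping. The real approach also uses POS tagging, chunking, and parsing; the lexicon entries here are illustrative:

```python
# Toy lexical-orientation scorer: sum orientations from a tiny
# hand-made lexicon, flipping the sign after a negator. This ignores
# the syntactic and valence issues listed in the slide.
LEXICON = {"great": 1, "awesome": 1, "excellent": 1,
           "lousy": -1, "inferior": -1, "broken": -1}
NEGATORS = {"not", "didn't", "doesn't", "never"}

def polarity(sentence):
    score, negate = 0, False
    for tok in sentence.lower().split():
        if tok in NEGATORS:
            negate = True
        elif tok in LEXICON:
            score += -LEXICON[tok] if negate else LEXICON[tok]
            negate = False
    return score

print(polarity("This car is really great"))    # 1 (positive)
print(polarity("the headphone jack is lousy")) # -1 (negative)
```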


Polarity Examples

Polarity Metric

  • Function of counts of polar statements on a topic: f(size, f_topic, f_topic+pos, f_topic+neg)

  • Use empirical priors to smooth observed counts (helps with low counts)

  • Use P/R of system to project true counts and provide error bars (requires labeled data)

  • Example: +/- ratio metric maps ratio to 0-10 score
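One way such a metric can look, as a sketch: the prior pseudo-counts and the linear 0–10 mapping below are assumptions for illustration, not the published formula.

```python
# Toy +/- ratio metric: smoothed share of positive mentions on a
# topic, mapped onto a 0-10 score. Prior pseudo-counts pull sparse
# topics toward a neutral 5.0.
def polarity_score(n_pos, n_neg, prior_pos=1.0, prior_neg=1.0):
    pos = n_pos + prior_pos
    total = n_pos + n_neg + prior_pos + prior_neg
    return 10.0 * pos / total

print(polarity_score(0, 0))    # 5.0: no data -> neutral prior
print(polarity_score(30, 10))  # mostly positive buzz -> above 7
```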


Predicting Movie Sales from Blogger Sentiment

G. Mishne and N. Glance, “Predicting Movie Sales from Blogger Sentiment,” 2006 AAAI Spring Symposium on Computational Approaches to Analysing Weblogs.

Blogger Sentiment and Impact on Sales

  • What we know:

    • There is a correlation between references to a product in the blogspace and its financial figures

      • Tong 2001: Movie buzz in Usenet is correlated with sales

      • Gruhl et al., 2005: Spikes in Amazon book sales follow spikes in blog buzz

  • What we want to find out:

    • Does taking into account the polarity of the references yield a better correlation?

  • Product of choice: movies

    • Methodology: compare correlation of references to sales with the correlation of polar references to sales

Experiment

  • 49 movies

    • Budget > 1M$

    • Released between Feb. and Aug. 2005

    • Sales data from IMDB

      • “Income per Screen” = opening weekend sales / screens

  • Blog post collection

    • References to the movies in a 2-month window

    • Used IMDB link + simple heuristics

  • Measure:

    • Pearson’s r between the Income per Screen and {references in blogs, positive/polar references in blogs}

    • Applied to various context lengths around the reference
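The correlation measure itself is standard; a self-contained sketch with made-up income and reference counts:

```python
# Pearson's r between income-per-screen and blog reference counts.
# The sample figures below are invented for illustration only.
import math

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

income_per_screen = [12000, 8000, 15000, 3000]
positive_refs = [40, 25, 55, 10]
print(round(pearson_r(income_per_screen, positive_refs), 3))  # 0.994
```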

Results

Income per screen vs. positive references

  • For 80% of the movies, r > 0.75 for pre-release positive sentiment

  • 12% improvement compared with correlation of movie sales with simple buzz count (0.542 vs. 0.484)

Conclusion

  • The intersection of Social Media and Data/Text Mining algorithms presents a viable business opportunity, poised to replace traditional forms of market research, social trend analysis, etc.

    • Key elements include topic detection and sentiment mining.

  • The success of the blogosphere has driven interest in a distinct form of online content which has a long history but is becoming more and more visible.

  • The blogosphere itself is a fascinating demonstration of social content and interaction and will enjoy many applications of traditional and novel analysis.


  • Internships: openings available for this summer

    • e-mail: Matthew.Hurst@buzzmetrics.com

  • Data set: weblog data for July 2005

    • e-mail: Natalie.Glance@buzzmetrics.com

  • 3rd Annual Workshop on the Weblogging Ecosystem

    • http://www.blogpulse.com/www2006-workshop

  • 1st International Conference on Weblogs and Social Media, March 2007

    • http://www.icwsm.com (under construction)

  • Company info

    • Company website: http://nielsenbuzzmetrics.com/

    • Blog search: http://www.blogpulse.com/

Phrase Finding

  • Goal: find key phrases which discriminate between foreground corpus and background corpus

  • First step: KeyBigramFinder

    • Identifies phrases that score high in informativeness and phraseness

    • Informativeness: measure of ability to discriminate foreground from background

    • Phraseness: measure of collocation of consecutive words
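These two scores are commonly formulated as pointwise KL divergences; the slide does not give formulas, so the exact functions and the example probabilities below are assumptions:

```python
# Sketch of bigram scoring: phraseness measures how much more often
# the two words co-occur than independence predicts; informativeness
# measures how much the foreground probability exceeds the background.
import math

def phraseness(p_bigram_fg, p_w1_fg, p_w2_fg):
    """Pointwise KL of the bigram vs the product of its unigrams."""
    return p_bigram_fg * math.log(p_bigram_fg / (p_w1_fg * p_w2_fg))

def informativeness(p_fg, p_bg):
    """Pointwise KL of foreground probability vs background."""
    return p_fg * math.log(p_fg / p_bg)

# e.g. "sd card": frequent in PDA discussion, rare in general blogs.
score = phraseness(0.002, 0.004, 0.003) + informativeness(0.002, 0.0001)
print(score > 0)  # strongly collocated AND domain-specific
```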

Phrase Finding Pipeline

  • Seeded by KeyBigramFinder

  • Sample pipeline

    • APrioriPhraseExpander: expands top N bigrams into longer phrases, adapting the APRIORI algorithm to text and features of text

    • ConstituentFinder: uses contextual evidence to identify noun phrases

  • Final list sorted either by frequency or informativeness score
