
Modeling Political Blog Posts with Response

Tae Yano

Carnegie Mellon University

taey@cs.cmu.edu

IBM SMiLe Open House

Yorktown Heights, NY

October 8, 2009

Talk is about

How we are designing topic models for online political discussion

Political blogs

Why (should we) study political blogs?

  • An influential social phenomenon.
  • An important venue for civil discourse.
  • Blog text is relatively understudied.
  • Interest in text analysis from social/political science researchers
    • Monroe et al., 2009; Hopkins and King, 2009; many others
Political blogs

Why (should we) study political blogs?

A different, interesting type of text that we don’t usually deal with in NLP

  • Spontaneous text: often ungrammatical, with copious misspellings and colloquialisms.
  • Elusive information needs (“popularity”, “influence”, “trustworthiness”).
  • Difficult and costly to handle with a classical supervised approach.
  • The text is composed of a mixture of diverse linguistic styles.
Political blogs - Illustration

Posts are often coupled with comment sections

Comment style is casual, creative, less carefully edited


Political blogs - Illustration

Comments often meander across several themes

“If the President gets health care”

“Taxes and Fee”

On topic

“The rock that keeps things off the table”

Tangent

Ranting?

Political blogs - Illustration

Posts tend to discuss multiple themes

House Republicans?

Government neglect?

Energy policy?

Oil companies?


Political blogs - Illustration

Comments can be constructive and formal: “I am in total agreement … In contrast … My understanding is….”

…or subjective and conversational: “Iowa-Shiowa”


Political blogs - Illustration

Comments can be very long

…or quite terse: “Absurd”


Political blogs - Illustration

How should we approach this sort of data?

Our approach is to treat it as an instance of Topic Modeling

Latent Dirichlet Allocation or LDA (Blei, Ng, and Jordan, 2003)

Topic modeling

What does this approach buy us?

  • Naturally express the idea that a text is comprised of several distinctive components:
    • A post and its reactions (comments)
    • A mixture of different themes within one post
    • Diverse personal styles and pet peeves
  • A convenient choice for corpora with uncertainty
    • We can encode hypotheses, and have the model learn from data.
    • Modularity makes it easy to change the model
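
As a rough illustration of the "mixture of themes" idea, the generative story of vanilla LDA can be sketched in a few lines of Python. This is a minimal sketch, not the authors' code: the function name, the toy sizes K and V, and the prior values are assumptions chosen only for illustration.

```python
import numpy as np

def generate_lda_document(n_words, alpha, beta, rng):
    """Sample one document from the vanilla LDA generative story.

    alpha: Dirichlet prior over topics, shape (K,)
    beta:  per-topic word distributions, shape (K, V), each row sums to 1
    """
    theta = rng.dirichlet(alpha)  # topic mixture for this document
    # Each word position first picks a topic from the document's mixture...
    topics = rng.choice(len(alpha), size=n_words, p=theta)
    # ...and then picks a word from that topic's word distribution.
    words = np.array([rng.choice(beta.shape[1], p=beta[z]) for z in topics])
    return theta, topics, words

rng = np.random.default_rng(0)
K, V = 3, 8  # toy numbers of topics and vocabulary items (assumed)
alpha = np.full(K, 0.5)
beta = rng.dirichlet(np.full(V, 0.5), size=K)
theta, topics, words = generate_lda_document(20, alpha, beta, rng)
```

Changing the model then amounts to changing this sampling story, which is the modularity the list above refers to.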

CommentLDA

Modeling political blogs

Our proposed political blog model:

z, z′ = topic

w = word (in post)

w′ = word (in comments)

u = user

D = # of documents; N = # of words in post; M = # of words in comments

CommentLDA

Modeling political blogs

Our proposed political blog model:

The left-hand side (LHS) is vanilla LDA

[Plate diagram: hyperparameters α and β, topic mixture θ_d, topic z_i, and word w_i, inside plates N_d and D]


CommentLDA

Modeling political blogs

The right-hand side (RHS) captures the generation of reactions separately from the post body

Our proposed political blog model:

Two chambers share the same topic-mixture

Two separate sets of word distributions


CommentLDA

Modeling political blogs

Our proposed political blog model:

User IDs of the commenters are treated as a part of the comment text

Generate the words in the comment section
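
The shared topic mixture, the two separate sets of word distributions, and the user IDs can be sketched as a sampling procedure. This is one illustrative reading of the slides, not the authors' implementation: the names `beta_post`, `beta_comment`, and `gamma` (a per-topic distribution over user IDs) are assumptions, and the sketch draws one user ID per comment word.

```python
import numpy as np

def generate_post_with_comments(n_post_words, n_comment_words, alpha,
                                beta_post, beta_comment, gamma, rng):
    """Sample one post and its comment section (hypothetical helper)."""
    n_topics = len(alpha)
    theta = rng.dirichlet(alpha)  # one topic mixture shared by both sides

    # Post body: plain LDA.
    z = rng.choice(n_topics, size=n_post_words, p=theta)
    w = [int(rng.choice(beta_post.shape[1], p=beta_post[k])) for k in z]

    # Comment section: same theta, but its own word distributions,
    # plus a user ID drawn from the topic of every comment word.
    z_c = rng.choice(n_topics, size=n_comment_words, p=theta)
    u = [int(rng.choice(gamma.shape[1], p=gamma[k])) for k in z_c]
    w_c = [int(rng.choice(beta_comment.shape[1], p=beta_comment[k])) for k in z_c]
    return {"theta": theta, "post_words": w,
            "comment_users": u, "comment_words": w_c}

rng = np.random.default_rng(1)
K, V, U = 2, 6, 4  # toy numbers of topics, vocabulary items, users (assumed)
sample = generate_post_with_comments(
    n_post_words=10, n_comment_words=5, alpha=np.full(K, 0.5),
    beta_post=rng.dirichlet(np.ones(V), size=K),
    beta_comment=rng.dirichlet(np.ones(V), size=K),
    gamma=rng.dirichlet(np.ones(U), size=K), rng=rng)
```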


CommentLDA

Modeling political blogs

Three variations on user ID generation:

  • “Verbosity” (original model): M = # of words in all comments; L = 1
  • “Comment frequency”: M = # of comments to the post; L = # of words in the comment
  • “Response”: M = # of participants to the post; L = # of words by one participant
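
The three variations differ only in how often a commenter's user ID is counted for a given post. A minimal sketch of that counting follows, assuming each comment is a hypothetical `(user_id, text)` pair and that words are split on whitespace; the function and scheme names are illustrative, not the authors'.

```python
from collections import Counter

def user_counts(comments, scheme):
    """Count user IDs for one post under one of the three schemes.

    comments: list of (user_id, comment_text) pairs for a single post.
    """
    counts = Counter()
    if scheme == "verbosity":
        # One count per word the user wrote.
        for user, text in comments:
            counts[user] += len(text.split())
    elif scheme == "comment_frequency":
        # One count per comment the user posted.
        for user, _ in comments:
            counts[user] += 1
    elif scheme == "response":
        # One count per user who responded at all.
        for user in {u for u, _ in comments}:
            counts[user] = 1
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return counts

comments = [("ann", "Liberty Democracy Fraternity"),
            ("bob", "Whatever"),
            ("ann", "Equality")]
verbosity = user_counts(comments, "verbosity")
frequency = user_counts(comments, "comment_frequency")
response = user_counts(comments, "response")
```

In this toy post, "ann" counts four times under verbosity, twice under comment frequency, and once under response, while "bob" counts once under all three.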

Think of this as encoding a hypothesis about which type of user ought to weigh more!

[Illustration: the example words “Liberty”, “Democracy”, “Fraternity”, “Equality”, and “Whatever” shown under the three counting schemes: Verbosity, Comment frequency, and Response]

CommentLDA

Modeling political blogs

Another model we tried:

Took out the words from the comment section!

This is a model agnostic to the words in the comment section!

Modeling political blogs

Another model we tried:

LinkLDA (Erosheva et al., 2004)

The model is structurally (but not semantically) equivalent to the LinkLDA of Erosheva et al., 2004 and Nallapati and Cohen, 2008.

Topic discovery

What topics did the models discover?

What differences are there between the post and comments?

  • Data sets: 5 major US blogs collected over a year - this data is available on our website (http://www.ark.cs.cmu.edu/blog-data).
  • Each site has 1000 to 2000 training posts; details about the data sets in Yano, Cohen, and Smith, 2009.
  • Inference is implemented with Gibbs sampling.
  • The following are some topics from Matthew Yglesias's site.
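
For readers unfamiliar with Gibbs sampling for topic models, here is a minimal collapsed Gibbs sampler for vanilla LDA in the style of Griffiths & Steyvers (2004). It is a sketch under stated assumptions, not the setup used in the talk (which relied on HBC, and on CommentLDA and LinkLDA rather than plain LDA); the function name, hyperparameter values, and toy corpus are all illustrative.

```python
import numpy as np

def gibbs_lda(docs, n_topics, vocab_size, alpha=0.1, beta=0.01,
              n_iters=50, seed=0):
    """Collapsed Gibbs sampling for vanilla LDA.

    docs: list of documents, each a list of integer word IDs.
    Returns (theta, phi): document-topic and topic-word estimates.
    """
    rng = np.random.default_rng(seed)
    n_dk = np.zeros((len(docs), n_topics))   # document-topic counts
    n_kw = np.zeros((n_topics, vocab_size))  # topic-word counts
    n_k = np.zeros(n_topics)                 # words assigned to each topic
    z = []
    for d, doc in enumerate(docs):           # random initial assignment
        z_d = rng.integers(n_topics, size=len(doc))
        z.append(z_d)
        for w, k in zip(doc, z_d):
            n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                  # remove the current assignment
                n_dk[d, k] -= 1; n_kw[k, w] -= 1; n_k[k] -= 1
                # Conditional over topics given all other assignments.
                p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) \
                    / (n_k + vocab_size * beta)
                k = rng.choice(n_topics, p=p / p.sum())
                z[d][i] = k                  # record the new assignment
                n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1

    theta = (n_dk + alpha) / (n_dk + alpha).sum(axis=1, keepdims=True)
    phi = (n_kw + beta) / (n_kw + beta).sum(axis=1, keepdims=True)
    return theta, phi

toy_docs = [[0, 1, 0, 1], [2, 3, 2, 3]]  # two tiny documents (assumed data)
theta, phi = gibbs_lda(toy_docs, n_topics=2, vocab_size=4)
```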
Comment prediction

A guessing game:

Can we predict which users will react given an unseen post?

  • Infer the topic mixture for each test post using the fitted model
  • Rank users according to p(user | post, model) for each user
  • Envisioned as useful for personalized blog filtering or a recommendation system
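
The ranking step can be sketched as follows. The slides do not spell out how p(user | post, model) is computed, so this assumes the mixture form that the generative story suggests: the post's inferred topic mixture weighted by a per-topic distribution over users. The names `theta_post` and `gamma`, and all the numbers, are hypothetical.

```python
import numpy as np

def rank_users(theta_post, gamma, user_ids):
    """Rank users by an assumed p(user | post) = sum_k theta_k * gamma[k, user].

    theta_post: inferred topic mixture of the test post, shape (K,)
    gamma:      per-topic distributions over users, shape (K, U)
    """
    scores = theta_post @ gamma
    order = np.argsort(-scores)  # highest score first
    return [(user_ids[i], float(scores[i])) for i in order]

theta_post = np.array([0.7, 0.3])
gamma = np.array([[0.6, 0.3, 0.1],
                  [0.1, 0.2, 0.7]])
ranking = rank_users(theta_post, gamma, ["ann", "bob", "cat"])
ranked_names = [user for user, _ in ranking]
```

With these toy numbers the ranking is ann, cat, bob: "cat" is favored by the minority topic just enough to overtake "bob".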
Comment prediction

[Figure: precision at top 5, 10, 20, and 30 user prediction for the MY and RS sites. From left to right: LinkLDA (-v, -r, -c), CommentLDA (-v, -r, -c)]

Highlighted precision (%) at top 5, 10, 20, and 30:

  • MY, CommentLDA (R, C): 27.54, 20.54, 14.83, 12.56
  • RS, LinkLDA (R): 25.19, 16.92, 12.14, 9.82

CommentLDA performs consistently better for the MY site; LinkLDA is a much better option for RS.

Does our model lack the expressive power to reflect site differences?

Our models perform at least as well as a word-based Naïve Bayes (NB) baseline.
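
Precision at top n is the fraction of the n highest-ranked users who actually commented on the post. A minimal sketch follows, with made-up user names; how the talk's evaluation handled ties or posts with fewer than n candidate users is not stated, so none of that is modeled here.

```python
def precision_at_n(ranked_users, actual_commenters, n):
    """Fraction of the top-n ranked users who actually commented."""
    top = ranked_users[:n]
    hits = sum(1 for user in top if user in actual_commenters)
    return hits / n

ranked = ["ann", "cat", "bob", "dan", "eve"]  # hypothetical model ranking
actual = {"ann", "bob"}                       # hypothetical true commenters
p_at_2 = precision_at_n(ranked, actual, 2)
p_at_5 = precision_at_n(ranked, actual, 5)
```

Here one of the top two users and two of the top five actually commented, giving 0.5 and 0.4.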

Comment prediction

Variation in user counting does make a difference.

Giving more weight to verbose users does not help for this task.

[Figure: Verbosity vs. Response, for CommentLDA on MY and LinkLDA on RS. From left to right: cutoffs at n = 5, 10, 20, and 30 top-ranked users]

Future work

What forecasting task can our model do?

Using Comment LDA to predict the topics of the post given comments:

Useful for automatic text categorization or text search when post has no searchable text.

Future work

Can we automatically adjust how much the words influence the topics given the site?

  • Better comment prediction?
  • Inferential questions involving multiple sites

Future work

Can we guess which posts will collect more responses (number of comments, volume of comments)?

  • A variant of SLDA (Blei and McAuliffe, 2007) with comments
  • Link LDA-type model also possible.

Summary

Political blogs are an exciting new domain for language and learning research.

Topic modeling is a viable framework for analyzing the text of online political discussions.

It is convenient and competitive in tasks that have potential uses in real applications.

References
  • Our published version of this work includes a detailed profile of our data set, as well as more experiments.

http://www.aclweb.org/anthology/N/N09/N09-1054.pdf

  • Please refer back to the original LDA paper for the complete picture.

http://www.cs.princeton.edu/~blei/papers/BleiNgJordan2003.pdf

  • The Gibbs sampling for LDA is detailed in Griffiths & Steyvers, 2004.

http://www.pnas.org/cgi/reprint/0307752101v1.pdf

  • Hierarchical Bayesian Compiler (HBC) used for Gibbs sampling:

http://www.cs.utah.edu/~hal/HBC

Comment prediction

[Figure: precision at top 10 user prediction for the MY, RS, and CB sites. From left to right: LinkLDA (-v, -r, -c), CommentLDA (-v, -r, -c), baselines (Freq, NB)]

  • MY: 20.54%, CommentLDA (R)
  • RS: 16.92%, LinkLDA (R)
  • CB: 32.06%, LinkLDA (C)

Modest performance (16% to 32% precision), but it compares favorably to the Naïve Bayes baseline.