Modeling

Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms



Modeling Users and Content : Structured Probabilistic Representation and Scalable Online Inference Algorithms . Amr Ahmed Thesis Defense. The Infosphere. News Sources. Social Media. Research Publications. President Obama had an accident while playing a basketball match.



Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms

Amr Ahmed

Thesis Defense


The Infosphere

  • News Sources

  • Social Media

  • Research Publications

[Figure: the same item, "President Obama had an accident while playing a basketball match", appears repeatedly across the sources; user interest labels: Soccer, Car deals, Fashion; component label: Online inference]

Thesis question

How to model users and content?


Questions

What do we mean by Content?

What characterizes user and content?


  • Research Publications: ArXiv, conference proceedings, PubMed Central, journal transactions

  • News Sources: Yahoo! News, CNN, Google News, BBC

  • Social Media: blogs, Red State, Daily Kos


Multi-faceted nature

  • "Choice is a fundamental, constitutional right" vs. "Ban abortion with Constitutional amendment"

Temporal dynamics

  • [Figure: research areas (Phy, Bio, CS) shown over time]

  • [Figure: a news story shown over time: drill explosion; BP: "We will make this right."; "BP wasn't prepared for an oil spill at such depths"]


What Characterizes Users?

  • Long-term interests

    • Baseball

    • Graphical models

    • Music

  • Short-term interests

    • Buying a car

    • Getting a new camera

  • Spurious interests

    • What is the buzz about the oil spill?


Thesis Question

  • How to build a structured representation of users and content

    • Temporal Dynamics

      • How ideas/events evolve over time

      • How user interests change over time

    • Structural Correspondence

      • How ideas are addressed across modalities and communities

      • How to learn user interests from multimodal sources


Thesis Approach

  • Models

    • Probabilistic graphical models

      • Topic models and Non-parametric Bayes

    • Principled, expressive and modular

  • Algorithms

    • Distributed

      • To deal with large-scale datasets

    • Online

      • To update the representation with new data


Outline

  • Background

    • Mixed-membership Models

  • Recurrent Chinese Restaurant Process

  • Modeling Temporal Dynamics

    • News

    • Research publications

    • User intents

  • Modeling multi-faceted Content

    • Ideological Perspective


What is a Good Model for Documents?

  • Clustering

    • Mixture of unigram model

  • How to specify a model?

  • Generative process

    • Assume some hidden variables

    • Use them to generate documents

  • Inference

    • Invert the process

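      • Given documents → hidden variables

[Graphical model: mixing proportions $\pi$; component parameters $\phi$ (plate of size K); for each of N documents, a cluster indicator $c_i$ and the observed document $w_i$]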


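Mixture of Unigram

[Graphical model: mixing proportions $\pi = (\pi_1, \dots, \pi_K)$; component parameters $\phi_1, \dots, \phi_K$; cluster indicator $c_i$ and document $w_i$, repeated for N documents]

Generative Process

  • For document $w_i$

    • Sample $c_i \sim \mathrm{Mult}(\pi)$

    • Sample $w_i \sim \mathrm{Mult}(\phi_{c_i})$

Is this a good model for documents?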

When is this a good model for documents?

  • When documents are single-topic

  • Not true in our settings
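The mixture-of-unigrams generative process can be sketched in a few lines. This is only an illustration under assumptions: the function name `sample_document`, the toy values of `pi` and `phi`, and the use of integer word ids are all hypothetical and not taken from the slides.

```python
import random

def sample_document(pi, phi, doc_len, rng):
    """Mixture of unigrams: pick ONE cluster for the document,
    then draw every word from that cluster's unigram distribution."""
    cluster = rng.choices(range(len(pi)), weights=pi)[0]          # c_i ~ Mult(pi)
    vocab = range(len(phi[cluster]))
    words = rng.choices(vocab, weights=phi[cluster], k=doc_len)   # w_i ~ Mult(phi_{c_i})
    return cluster, words

rng = random.Random(0)
pi = [0.5, 0.5]                 # toy mixing proportions (assumed)
phi = [
    [0.5, 0.5, 0.0, 0.0],       # cluster 0 only emits word ids 0 and 1
    [0.0, 0.0, 0.5, 0.5],       # cluster 1 only emits word ids 2 and 3
]
cluster, words = sample_document(pi, phi, doc_len=8, rng=rng)
```

Because a single `cluster` is drawn per document, every word comes from the same distribution, which is the single-topic limitation noted above.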


What Do We Need to Model?

A Hierarchical Phrase-Based Model for Statistical Machine Translation

We present a statistical phrase-based translation model that uses hierarchical phrases (phrases that contain sub-phrases). The model is formally a synchronous context-free grammar but is learned from a bitext without any syntactic information. Thus it can be seen as a shift to the formal machinery of syntax-based translation systems without any linguistic commitment. In our experiments using BLEU as a metric, the hierarchical phrase-based model achieves a relative improvement of 7.5% over Pharaoh, a state-of-the-art phrase-based system.

  • Q: What is it about?

  • A: Mainly MT, with syntax, some learning

Mixing Proportion

  • MT: 0.6

  • Syntax: 0.3

  • Learning: 0.1

Topics (unigram over vocabulary)

  • MT: Source, Target, SMT, Alignment, Score, BLEU

  • Syntax: Parse, Tree, Noun, Phrase, Grammar, CFG

  • Learning: likelihood, EM, Hidden, Parameters, Estimation, argMax

Topic Models


Mixed-Membership Models

[Graphical model: prior over the document mixing vector $\theta = (\theta_1, \dots, \theta_K)$; topics $\phi_1, \dots, \phi_K$; topic indicator $z$ and word $w$ for each of N words in each of D documents]

Generative Process

  • For each document d

    • Sample $\theta_d \sim \mathrm{Prior}$

    • For each word w in d

      • Sample $z \sim \mathrm{Mult}(\theta_d)$

      • Sample $w \sim \mathrm{Mult}(\phi_z)$
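The mixed-membership generative process (a per-document mixing vector, then a topic per word) can be sketched as follows. The slides leave the prior unspecified; a Dirichlet prior is assumed here, and the names `sample_dirichlet`, `sample_mixed_membership_doc`, `alpha`, and the toy `phi` are hypothetical.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_mixed_membership_doc(alpha, phi, doc_len, rng):
    """Each word gets its own topic, drawn from the document's mixing vector."""
    theta = sample_dirichlet(alpha, rng)                        # theta_d ~ Prior (assumed Dirichlet)
    topics, words = [], []
    for _ in range(doc_len):
        z = rng.choices(range(len(theta)), weights=theta)[0]   # z ~ Mult(theta_d)
        w = rng.choices(range(len(phi[z])), weights=phi[z])[0]  # w ~ Mult(phi_z)
        topics.append(z)
        words.append(w)
    return theta, topics, words

rng = random.Random(0)
phi = [
    [0.5, 0.5, 0.0, 0.0],   # topic 0 emits word ids 0 and 1
    [0.0, 0.0, 0.5, 0.5],   # topic 1 emits word ids 2 and 3
]
theta, topics, words = sample_mixed_membership_doc([1.0, 1.0], phi, doc_len=10, rng=rng)
```

Unlike the mixture of unigrams, a single document can now mix words from several topics, in proportions given by `theta`.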


Outline

  • Background

    • Mixed-membership Models

  • Recurrent Chinese Restaurant Process

  • Modeling Temporal Dynamics

    • Research publications

    • News

    • User intents

  • Modeling multi-faceted Content

    • Ideological Perspective


Chinese Restaurant Process (CRP)

  • Allows the number of mixtures to grow with the data

  • Also called non-parametric

    • Means the number of effective parameters grows with the data

    • Still has hyper-parameters that control the rate of growth

      • $\alpha$: how fast a new cluster/mixture is born

      • $G_0$: prior over mixture component parameters


The Chinese Restaurant Process

[Figure: tables with parameters $\phi_1$, $\phi_2$, $\phi_3$]

Generative Process

  • For data point $x_i$

    • Choose table $j \propto N_j$ and sample $x_i \sim f(\phi_j)$

    • Choose a new table $K+1 \propto \alpha$

      • Sample $\phi_{K+1} \sim G_0$ and sample $x_i \sim f(\phi_{K+1})$

The rich-get-richer effect

CANNOT handle sequential data
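A minimal sketch of the CRP seating rule, under assumptions: only the table assignments are simulated (the parameters $\phi_j$ and the observations $x_i$ are left out), and the function name `crp_assign` is hypothetical.

```python
import random

def crp_assign(n_points, alpha, rng):
    """Seat n_points customers: an existing table j is chosen with weight N_j,
    a new table with weight alpha."""
    counts = []        # counts[j] = N_j, number of customers at table j
    assignments = []
    for _ in range(n_points):
        weights = counts + [alpha]                  # last slot stands for "new table"
        j = rng.choices(range(len(weights)), weights=weights)[0]
        if j == len(counts):
            counts.append(1)                        # a new table is born
        else:
            counts[j] += 1                          # rich-get-richer: big tables attract more
        assignments.append(j)
    return assignments, counts

assignments, counts = crp_assign(n_points=50, alpha=1.0, rng=random.Random(0))
```

The number of tables is not fixed in advance; it grows with the data at a rate controlled by `alpha`.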


Recurrent CRP (RCRP) [Ahmed and Xing 2008]

  • Adapts the number of mixture components over time

    • Mixture components can die out

    • New mixture components can be born at any time

    • The parameters of retained mixture components evolve according to Markovian dynamics


Recurrent CRP (RCRP)

  • Three equivalent constructions (see [Ahmed & Xing 2008])

    • Infinite limit of a fixed-dimensional dynamic model

    • Recurrent Chinese Restaurant

    • Time-dependent random measures


The Recurrent Chinese Restaurant Process

  • The restaurant operates in epochs

  • The restaurant is closed at the end of each epoch

  • The state of the restaurant at time epoch t depends on that at time epoch t-1

    • Can be extended to higher-order dependencies.


The Recurrent Chinese Restaurant Process

[Figure: the restaurant at T=1 with tables $\phi_{1,1}$, $\phi_{2,1}$, $\phi_{3,1}$; $\phi_{3,1}$ is the dish eaten at table 3 at time epoch 1, or equivalently the parameters of cluster 3 at time epoch 1]

Generative Process

  • Customers at time T=1 are seated as before:

    • Choose table $j \propto N_{j,1}$ and sample $x_i \sim f(\phi_{j,1})$

    • Choose a new table $K+1 \propto \alpha$

      • Sample $\phi_{K+1,1} \sim G_0$ and sample $x_i \sim f(\phi_{K+1,1})$


The Recurrent Chinese Restaurant Process

[Animation: seating at epoch T=2 given the state at epoch T=1]

  • At the end of epoch T=1 the table counts are $N_{1,1}=2$, $N_{2,1}=3$, $N_{3,1}=1$; they are carried over to T=2

  • Sample $\phi_{1,2} \sim P(\cdot \mid \phi_{1,1})$

  • And so on ...

  • At the end of epoch 2: cluster 3 has died out and cluster 4, with parameters $\phi_{4,2}$, is newly born

  • Counts carried over to T=3: $N_{1,2}=2$, $N_{2,2}=2$, $N_{4,2}=1$
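One epoch of recurrent seating can be sketched as below. This is a first-order illustration under assumptions: a table's weight is taken to be its count from the previous epoch plus its count so far in the current epoch, parameter evolution is omitted, and the name `rcrp_epoch` is hypothetical.

```python
import random

def rcrp_epoch(prev_counts, n_customers, alpha, rng):
    """Seat one epoch of customers given the previous epoch's table counts.

    prev_counts maps table id -> count at epoch t-1. Tables absent from
    the returned dict received no customers and have died out."""
    counts = {}
    next_id = max(prev_counts, default=-1) + 1      # id for a newly born table
    for _ in range(n_customers):
        tables = sorted(set(prev_counts) | set(counts))
        weights = [prev_counts.get(k, 0) + counts.get(k, 0) for k in tables]
        choice = rng.choices(tables + [next_id], weights=weights + [alpha])[0]
        counts[choice] = counts.get(choice, 0) + 1
        if choice == next_id:
            next_id += 1
    return counts

prev = {1: 2, 2: 3, 3: 1}        # N_{1,1}=2, N_{2,1}=3, N_{3,1}=1 from the slides
epoch2 = rcrp_epoch(prev, n_customers=5, alpha=1.0, rng=random.Random(0))
```

Tables inherited from the previous epoch keep attracting customers in proportion to their past popularity, while any table that stays empty simply does not appear in `epoch2`.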


RCRP

  • Can be extended to model higher-order dependencies

  • Can decay dependencies over time

    • The pseudo-count for table k at time t is

      $\sum_{w=1}^{W} e^{-w/\lambda} \, N_{k,t-w}$

      where $W$ is the history size, $\lambda$ is the decay factor, and $N_{k,t-w}$ is the number of customers sitting at table k at time epoch t-w


[Figure: the restaurant at T=1, T=2, and T=3; the weight of table 2 at T=3 is the decayed sum of its past counts, with history size $W$ and decay factor $\lambda$]

$N_{2,3} = \sum_{w=1}^{W} e^{-w/\lambda} \, N_{2,3-w}$


RCRP

  • $(W, \lambda, \alpha)$ can generate interesting clustering configurations
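The decayed pseudo-count, a sum over the last $W$ epochs of $e^{-w/\lambda}$ times the past count, is easy to compute directly. The sketch below assumes the history is stored as a dict from epoch to per-table counts; the name `pseudo_count` and that storage layout are hypothetical.

```python
import math

def pseudo_count(history, k, t, W, lam):
    """Decayed pseudo-count of table k at epoch t:
    sum over w = 1..W of exp(-w / lam) * N_{k, t-w}."""
    total = 0.0
    for w in range(1, W + 1):
        past = history.get(t - w, {})               # counts at epoch t-w, if any
        total += math.exp(-w / lam) * past.get(k, 0)
    return total

# Counts from the running example: epoch 1 and epoch 2.
history = {1: {1: 2, 2: 3, 3: 1}, 2: {1: 2, 2: 2, 4: 1}}
n_23 = pseudo_count(history, k=2, t=3, W=2, lam=1.0)
```

With `W=0` every pseudo-count is zero, so epochs are clustered independently; with a very large `lam` all past epochs inside the window count equally.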


TDPM Generative Power

  • DPM: $W = T$, $\lambda = \infty$

  • TDPM: $W = 4$, $\lambda = 0.4$

  • Independent DPMs: $W = 0$, $\lambda$ = any

[Figure label: power-law curve]


Outline

  • Background

    • Mixed-membership Models

  • Recurrent Chinese Restaurant Process

  • Modeling Temporal Dynamics

    • News

    • User intents

    • Research publications

  • Modeling multi-faceted Content

    • Ideological Perspective


Modeling Temporal Dynamics

RCRP

Infinite storylines from streaming text

Evolution of research ideas

Online scalable inference

Dynamic user interests

Online distributed inference


Outline

  • Background

    • Mixed-membership Models

  • Recurrent Chinese Restaurant Process

  • Modeling Temporal Dynamics

    • News

    • User intents

    • Research publications

  • Modeling multi-faceted Content

    • Ideological Perspective


Understanding the News

  • Clustering

    • Group similar articles together

  • Classification

    • High-level topics like sports and politics

  • Analysis

    • How a story develops over time

    • Who are the main entities

  • Challenges

    • Large scale and online

      • Almost one document per second


A Unified Model

  • Jointly solves the three main tasks

    • Clustering

    • Classification

    • Analysis

  • Building blocks

    • A Topic model

      • High-level concepts (unsupervised classification)

    • Dynamic clustering (RCRP)

      • Discover tightly-focused concepts

        • Named entities

        • Story developments


Dynamic Clustering

  • Recurrent Chinese restaurant process (RCRP)

    • Discovers time-sensitive stories

Generative Process

  • For each document $w_d$ at time t

    • [Equation not preserved; annotated "priors: stories’ trend + prior at time t"]

    • Sample $w_d \sim \mathrm{Multinomial}(\beta_s)$


Infinite Dynamic Cluster-Topic Hybrid

Topics

  • Politics: Government, Minister, Authorities, Opposition, Officials, Leaders, group

  • Accidents: Police, Attack, run, man, group, arrested, move

  • Sports: games, Won, Team, Final, Season, League, held

Stories

  • UEFA-soccer

    • Words: Champions, Goal, Coach, Striker, Midfield, penalty

    • Entities: Juventus, AC Milan, Lazio, Ronaldo, Lyon

  • Tax-Bill

    • Words: Tax, Billion, Cut, Plan, Budget, Economy

    • Entities: Bush, Senate, Fleischer, White House, Republican

  • Border-Tension

    • Words: Nuclear, Border, Dialogue, Diplomatic, militant, Insurgency, missile

    • Entities: Pakistan, India, Kashmir, New Delhi, Islamabad, Musharraf, Vajpayee


The Graphical Model

[Figure: graphical model with a tightly-focused (story) part and a high-level concepts (topic) part]

  • Each story has:

    • Distribution over words

    • Distribution over topics

    • Distribution over named entities

  • A document’s mixing-vector is sampled from its story prior

  • Words inside a document can come either from global topics or from the story-specific topic


The Generative Process


Outline

  • Background

    • Mixed-membership Models

  • Recurrent Chinese Restaurant Process

  • Modeling Temporal Dynamics

    • News

      • Online inference

      • Experiments

    • User intents

    • Research publications

  • Modeling multi-faceted Content

    • Ideological Perspective


Online Inference Algorithm

  • A Particle filtering algorithm

  • Each particle maintains a hypothesis

    • What are the stories

    • Document-story associations

    • Topic-word distributions

  • Collapsed sampling

    • Sample (zd,sd) only for each document


Particle Filter Representation


Particle Filter Algorithm

  • s and z are tightly coupled

  • Alternative to MCMC

    • Sample s then sample z (high variance)

    • Sample z then sample s (doesn’t make sense)

  • Idea (following a similar trick by Jain and Neal)

    • Run a few iterations of MCMC over s and z

    • Take last sample as the proposed value

Fold the document into the structure of each filter


Particle Filter Algorithm

  • How good does each filter look now?

  • Get rid of bad filters

  • Replicate good ones
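The "get rid of bad filters, replicate good ones" step is the resampling step of a particle filter. The sketch below uses plain multinomial resampling, which is an assumption: the slides do not say which resampling scheme is used, and the names `resample` and `effective_sample_size` are hypothetical.

```python
import random

def effective_sample_size(weights):
    """A common diagnostic: equals the number of particles when weights are uniform."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)

def resample(particles, weights, rng):
    """Multinomial resampling: draw particles in proportion to their weights,
    so low-weight particles tend to vanish and high-weight ones get replicated."""
    n = len(particles)
    survivors = rng.choices(particles, weights=weights, k=n)
    return survivors, [1.0 / n] * n                 # weights are reset to uniform

particles = ["hypothesis A", "hypothesis B", "hypothesis C"]
weights = [0.0, 1.0, 0.0]                            # only B explains the new document
survivors, new_weights = resample(particles, weights, random.Random(0))
```

In a real filter each particle would be a full hypothesis (stories, document-story associations, topic-word statistics), which is why replication needs an efficient shared representation rather than deep copies.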


MCMC over a given document

  • Sample z

    • [Equation not preserved; its two annotated factors are how likely topic k is to generate word w, and how likely topic k is in document td]

  • C is the co-occurrence counts

  • Same as in LDA but with a story-specific prior
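Since the slide's equation for sampling z did not survive, the sketch below falls back on the standard collapsed Gibbs conditional for LDA, with the symmetric document prior swapped for a per-story vector. That substitution, the function name `topic_conditional`, and every parameter name are assumptions made for illustration, not the author's exact formula.

```python
def topic_conditional(word, topic_word, topic_total, doc_topic,
                      story_prior, beta, vocab_size):
    """p(z = k | rest) for one word, in the usual two-factor form:
    (how likely topic k generates the word) * (how likely topic k is in the document)."""
    scores = []
    for k in range(len(topic_total)):
        # Smoothed topic-word co-occurrence counts C.
        word_term = (topic_word[k].get(word, 0) + beta) / (topic_total[k] + vocab_size * beta)
        # Document-topic counts plus the story-specific prior (assumed form).
        doc_term = doc_topic[k] + story_prior[k]
        scores.append(word_term * doc_term)
    total = sum(scores)
    return [s / total for s in scores]

topic_word = [{"goal": 8}, {"tax": 8}]   # toy co-occurrence counts
probs = topic_conditional("goal", topic_word, topic_total=[8, 8],
                          doc_topic=[1, 1], story_prior=[0.5, 0.5],
                          beta=0.1, vocab_size=2)
```

The counts passed in are expected to exclude the word currently being resampled, as in any collapsed Gibbs sweep.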


MCMC over a given document

  • Sample s

    • [Equation not preserved; it involves the document's entities and the set of words in document td generated from the story-specific topic (topic K+1)]

  • O(|S| |N|) → very slow

  • Solution: use a proposal to sample s


A proposal to sample s

  • One part is computed once (does not depend on z)

  • The expensive calculation is only computed twice per iteration


MCMC Algorithm


Efficient Computation and Storage

  • Particles get replicated

    • Use a thread-safe inheritance tree

    • Inverted representation for fast lookup

[Figure: inheritance tree over particles 1, 2, and 3. Root node: "India: story 1; US: story 2, story 3". Other nodes: "India: story 5; Pakistan: story 1", "Bush: story 2; India: story 3", "Congress: story 2", and "(empty)"]
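One way to read the inheritance tree is as a copy-on-write map: each node stores only its own entries and falls back to its ancestors for everything else, so replicating a particle costs a single new node. The sketch below is an illustration under that assumption; the class `ParticleNode` is hypothetical and is not thread-safe, unlike the structure the slide mentions.

```python
class ParticleNode:
    """A node in an inheritance tree: local entries override inherited ones."""

    def __init__(self, parent=None):
        self.parent = parent
        self.local = {}                 # only the entries this node changed

    def get(self, key, default=None):
        node = self
        while node is not None:         # walk up until some ancestor has the key
            if key in node.local:
                return node.local[key]
            node = node.parent
        return default

    def set(self, key, value):
        self.local[key] = value         # never touches ancestors or siblings

    def branch(self):
        return ParticleNode(parent=self)    # O(1) replication of a particle

root = ParticleNode()
root.set("India", "story 1")
particle_a = root.branch()
particle_b = root.branch()
particle_a.set("India", "story 5")      # particle A revises its hypothesis
```

Here `particle_a` sees its own override while `particle_b` still inherits the root's entry, so the two replicas share everything they have not changed.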


Efficient Computation and Storage

  • Why is this useful?

  • Only focus on stories that mention at least one entity

    • Otherwise pre-compute and reuse


Hyperparameters

  • Make a huge difference

  • Optimize

  • Optimize

  • Optimization carried out every 200 documents


Outline

  • Background

    • Mixed-membership Models

  • Recurrent Chinese Restaurant Process

  • Modeling Temporal Dynamics

    • News

      • Online inference

      • Experiments

    • User intents

    • Research publications

  • Modeling multi-faceted Content

    • Ideological Perspective


Experiments

  • Yahoo! News datasets over two months

    • Three sub-sampled sets with different characteristics

  • Editorially-labeled documents

    • Cannot-link and must-link pairs

  • Performance measured using clustering accuracy

  • Baseline

    • A strong single-link offline clustering algorithm

      • Scaled with LSH to compute neighborhood graph (similar to Petrovic 2010)
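The slides do not spell out how clustering accuracy is computed from the labeled pairs. One plausible reading, assumed here, is the fraction of must-link pairs placed in the same cluster plus cannot-link pairs placed in different clusters; the function name `pairwise_accuracy` and the toy document ids are hypothetical.

```python
def pairwise_accuracy(assignment, must_link, cannot_link):
    """Fraction of labeled pairs that the clustering gets right.

    assignment maps document id -> cluster (story) id."""
    correct = 0
    for a, b in must_link:
        correct += assignment[a] == assignment[b]      # should share a story
    for a, b in cannot_link:
        correct += assignment[a] != assignment[b]      # should be in different stories
    return correct / (len(must_link) + len(cannot_link))

assignment = {"d1": 0, "d2": 0, "d3": 1, "d4": 1}
accuracy = pairwise_accuracy(
    assignment,
    must_link=[("d1", "d2"), ("d3", "d4")],
    cannot_link=[("d1", "d3"), ("d2", "d4"), ("d1", "d2")],
)
```

In this toy case four of the five labeled pairs are satisfied; the cannot-link pair `("d1", "d2")` is violated because both documents share cluster 0.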


Structured Browsing

Topics

  • Politics: Government, Minister, Authorities, Opposition, Officials, Leaders, group

  • Accidents: Police, Attack, run, man, group, arrested, move

  • Sports: games, Won, Team, Final, Season, League, held

Stories

  • Border-Tension

    • Entities: Pakistan, India, Kashmir, New Delhi, Islamabad, Musharraf, Vajpayee

    • Words: Nuclear, Border, Dialogue, Diplomatic, militant, Insurgency, missile

  • Tax-bills

    • Entities: Bush, Senate, US, Congress, Fleischer, White House, Republican

    • Words: Tax, Billion, Cut, Plan, Budget, Economy, lawmakers

  • UEFA-soccer

    • Entities: Juventus, AC Milan, Real Madrid, Milan, Lazio, Ronaldo, Lyon

    • Words: Champions, Goal, Leg, Coach, Striker, Midfield, penalty


Structured Browsing

  • Border-Tension

    • Entities: Pakistan, India, Kashmir, New Delhi, Islamabad, Musharraf, Vajpayee

    • Words: Nuclear, Border, Dialogue, Diplomatic, militant, Insurgency, missile

More like the India-Pakistan story

  • Based on topics: Middle-east-conflict

    • Entities: Israel, Palestinian, West Bank, Sharon, Hamas, Arafat

    • Words: Peace, Roadmap, Suicide, Violence, Settlements, bombing

  • Nuclear + topics [politics]: Nuclear programs

    • Entities: North Korea, South Korea, U.S, Bush, Pyongyang

    • Words: Nuclear, summit, warning, policy, missile, program


Quantitative Evaluation

Number of topics = 100

Effect of number of topics


Scalability


Optimization

  • Fix = .01

  • Optimized

    • With accuracy = 0.8289


Effect of Number of Particles

  • Usually when we care about “structure” = mode, then few particles are good enough

  • Also a good indication of the efficiency of the MCMC-based proposal


Model Contribution

  • Named entities are very important

  • Removing time increases processing time up to 2 seconds per document



Outline

  • Background

    • Mixed-membership Models

  • Recurrent Chinese Restaurant Process

  • Modeling Temporal Dynamics

    • News

    • User intents

      • Model

      • Online Distributed inference

      • Experiments

    • Research publications

  • Modeling multi-faceted Content

    • Ideological Perspective


Modeling Users

  • Understand user intents from user interactions

    • What is a good model?

  • Capture how intents change over time

    • Short-term vs. long-term

  • Learn those dynamic intents from user interactions

    • Online and large scale algorithms

Dimensions


Problem formulation

[Figure: one user's actions, each shown as a group of words: "Car, Deals, van"; "job, Hiring, diet"; "Auto, Price, Used, inception"; "Hiring, Salary, Diet, calories"; "Flight, London, Hotel, weather"; "Movies, Theatre, Art, gallery"; "Diet, Calories, Recipe, chocolate"; "School, Supplies, Loan, college". Intent labels added in the final build: Cars, Art, Jobs, Diet, Travel, College finance]


Problem formulation

Input

  • Queries issued by the user or tags of watched content

  • Snippet of page examined by user

  • Time stamp of each action (day resolution)

Output

  • Users’ daily distribution over intents

  • Dynamic intent representation


Formulation as a Mixed-Membership Model

  • Objects

    • Job Hiring; speed price; part-time Camry; Career opening; bonus package; card diet calories; loan recipe milk; Weight lb kg

  • Degree of membership

  • Mixtures

    • Diet: Recipe, Chocolate, Pizza, Food, Chicken, Milk, Butter, Powder

    • Cars: Car, Blue, Book, Kelley, Prices, Small, Speed, large

    • Job: job, Career, Business, Assistant, Hiring, Part-time, Receptionist

    • Finance: Bank, Online, Credit, Card, debt, portfolio, Finance, Chase


Temporal User Model

  • Generative process

    • Polya-Urn representation

    • Hierarchical Recurrent Chinese restaurant process


At time t

[Figure: global topics (word distributions) with global topic trends; user-specific topic trends (mixing-vector); user interactions (queries, keywords from pages viewed). Example interactions built up across the animation: "Car speed offer camry accord career" and "Food Chicken pizza mileage"]

Generative Process

  • For each user interaction

    • Choose an intent from the local distribution

      • Sample a word from the topic’s word-distribution

    • Choose a new intent $\propto \alpha$

      • Sample a new intent from the global distribution

        • Sample a word from the new topic’s word-distribution
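The two-level choice in this generative process (reuse one of the user's own intents, or with weight proportional to α draw one from the global distribution) can be sketched as a pair of nested urns. This is a simplification under assumptions: the set of intents is fixed, no brand-new global intent is created, the global count is incremented on every global draw, and the names `sample_interaction`, `user_counts`, `global_counts`, and `topics` are hypothetical.

```python
import random

def sample_interaction(user_counts, global_counts, topics, alpha, rng):
    """Draw one (intent, word) pair for a user and update both urns in place."""
    intents = list(global_counts)
    local_weights = [user_counts.get(i, 0) for i in intents]
    local_mass = sum(local_weights)
    if rng.random() * (local_mass + alpha) < local_mass:
        # Reuse an intent from the user's local distribution.
        intent = rng.choices(intents, weights=local_weights)[0]
    else:
        # With weight alpha, fall back to the global distribution.
        global_weights = [global_counts[i] for i in intents]
        intent = rng.choices(intents, weights=global_weights)[0]
        global_counts[intent] += 1
    user_counts[intent] = user_counts.get(intent, 0) + 1
    vocab = list(topics[intent])
    word = rng.choices(vocab, weights=[topics[intent][v] for v in vocab])[0]
    return intent, word

topics = {"cars": {"camry": 1.0}, "diet": {"recipe": 1.0}}   # toy word distributions
user_counts, global_counts = {}, {"cars": 3, "diet": 0}
intent, word = sample_interaction(user_counts, global_counts, topics, 1.0, random.Random(1))
```

A new user has no local counts, so the first interaction always comes from the global urn; after that the user's own history starts to dominate.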


At time t → at time t+1

Observation 1

  • Popular topics at time t are likely to be popular at time t+1

    • [Equation not fully preserved: a product of pseudo-counts and a decay factor]

  • $\phi_{k,t+1}$ is likely to smoothly evolve from $\phi_{k,t}$

    • $\phi_{k,t+1} \sim \mathrm{Dir}(\beta_{k,t+1})$

    • Intuition: captures the current trend of the car industry (e.g., new releases)

[Figure: the Cars topic at time t (Car, Blue, Book, Kelley, Prices, Small, Speed, large) evolves at time t+1 into (Car, Altima, Accord, Book, Kelley, Prices, Small, Speed)]


At time t → at time t+1

How do we get a prior that captures both long- and short-term interests?

Observation 2

  • User prior at time t+1 is a mixture of the user's short- and long-term interests


Prior for user actions at time t

[Figure: the prior combines the user's history at several time scales, from short-term to long-term: week (μ), month (μ2), and all (μ3). Example words along the time axis (t, t+1): food, chicken, pizza, mileage; part-time, opening, salary; Kelly; recipe, cuisine; job, hiring. Topics shown: Diet, Job, Cars, Finance]
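A prior that mixes short- and long-term interest can be sketched as a weighted sum of the user's intent counts at three time scales. The exact combination rule is not preserved in the transcript, so the linear form below, the function name `user_prior`, and the weight names `mu_week`, `mu_month`, and `mu_all` are assumptions for illustration.

```python
def user_prior(week, month, all_time, mu_week, mu_month, mu_all):
    """Combine intent counts from three windows into one unnormalized prior."""
    intents = set(week) | set(month) | set(all_time)
    return {
        k: mu_week * week.get(k, 0)
           + mu_month * month.get(k, 0)
           + mu_all * all_time.get(k, 0)
        for k in intents
    }

# Toy counts: a long-standing interest in diet, a recent burst of car shopping.
week = {"cars": 4}
month = {"cars": 4, "diet": 2}
all_time = {"cars": 4, "diet": 20, "job": 6}
prior = user_prior(week, month, all_time, mu_week=1.0, mu_month=0.5, mu_all=0.25)
```

Raising `mu_week` makes the prior follow short-term intents such as buying a car, while raising `mu_all` favors long-term interests.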


At time t

At time t+1

[Figure: topic word lists (food, car, job, finance) at time t and t+1, with short-term priors built from recent user actions such as "Food Chicken Pizza millage", "Car speed offer", and "camry accord career".]

Generative Process

  • For each user interaction

    • Choose an intent from the local distribution

      • Sample a word from the topic’s word distribution

    • Or choose a new intent, with probability ∝ α

      • Sample a new intent from the global distribution

        • Sample a word from the new topic’s word distribution
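
The generative process above can be sketched as follows. This is a simplified illustration: the weight given to the global distribution (`alpha`), the data layout, and all function names are assumptions, and the temporal priors from the earlier slides are left out.

```python
import random

def choose_intent(local_counts, global_probs, alpha, rng):
    """Reuse a local intent (weight = its count) or, with weight alpha,
    draw an intent from the global distribution."""
    intents = list(local_counts)
    weights = [local_counts[k] for k in intents] + [alpha]
    pick = rng.choices(intents + [None], weights=weights)[0]
    if pick is None:
        topics = list(global_probs)
        pick = rng.choices(topics, weights=[global_probs[k] for k in topics])[0]
    return pick

def sample_word(topic_words, intent, rng):
    """Sample a word from the chosen intent's word distribution."""
    words = list(topic_words[intent])
    return rng.choices(words, weights=[topic_words[intent][w] for w in words])[0]

def generate(n, local_counts, global_probs, topic_words, alpha, rng):
    local = dict(local_counts)
    out = []
    for _ in range(n):
        k = choose_intent(local, global_probs, alpha, rng)
        local[k] = local.get(k, 0) + 1  # the chosen intent becomes more likely locally
        out.append((k, sample_word(topic_words, k, rng)))
    return out

# Hypothetical topics and one user's starting counts.
topic_words = {
    "cars": {"car": 0.5, "altima": 0.25, "accord": 0.25},
    "diet": {"food": 0.4, "chicken": 0.3, "pizza": 0.3},
}
events = generate(20, {"cars": 2}, {"cars": 0.5, "diet": 0.5}, topic_words, 1.0, random.Random(1))
```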


[Figure: the global process and the per-user processes (User 1, User 2, User 3) unrolled over times t, t+1, t+2, and t+3; counts are labeled m, m', n, and n'.]


The Graphical Model

[Figure: the graphical model at time t and time t+1, built up over several slides. Example car topic before and after: (Car, Blue, Book, Kelley, Prices, Small, Speed, large) and (Car, Altima, Accord, Book, Kelley, Prices, Small, Speed). Example user action: Food Chicken Pizza millage.]

Do topics evolve over time?

Do users’ intents evolve over time?

Does it capture the long- and short-term interests of users?

Large-scale: can it handle millions of users?

Can inference be done online?


Outline

  • Background

    • Mixed-membership Models

  • Recurrent Chinese Restaurant Process

  • Modeling Temporal Dynamics

    • News

    • User intents

      • Model

      • Online Distributed inference

      • Experiments

    • Research publications

  • Modeling multi-faceted Content

    • Ideological Perspective


Work Flow

[Figure: daily work flow. Today's user interactions and the current users’ models (the system state) go into a daily update (inference) step that produces the new users’ models. Scale: tens of millions.]


Online Scalable Inference

  • Online algorithm

    • Greedy 1-particle filtering algorithm

    • Works well in practice

    • Collapse all multinomials except Ωt

      • This makes distributed inference easier

    • At each time t:

  • Distributed scalable implementation

    • Used [Smola and Narayanamurthy, VLDB 2010] as a subroutine

    • Added synchronous sampling capabilities


Distributed Inference

[Figure: the topic word lists (food, car, job, finance) are kept in shared memory (memcached); two clients each process their own users' actions, e.g. "Car speed offer", "camry accord career", and "Food Chicken Pizza millage".]

Distributed Inference

[Figure: shared memory (memcached) holding the topic word lists and Ωt; each of the two clients also holds a copy of Ωt while processing its own users' actions.]


Sampling Equations

  • Sample:

  • Sample z: the conditional is the product of two factors

    • How likely is topic k given the current and prior state of the user

    • How likely is it to generate word w from topic k
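
The exact sampling equation did not survive extraction, so the following is only a sketch of a two-factor conditional of the kind the slide describes. The concrete forms (user counts plus a prior, and a smoothed word frequency with hypothetical parameter `beta`) are the standard collapsed Gibbs terms for topic models and are assumptions here, not the thesis's formula.

```python
import random

def topic_weights(user_counts, prior, topic_word_counts, topic_totals,
                  word, vocab_size, beta=0.01):
    """Unnormalized weight of each topic k for one observed word."""
    weights = {}
    for k in prior:
        # Factor 1: how likely is topic k given the user's current and prior state.
        user_term = user_counts.get(k, 0) + prior[k]
        # Factor 2: how likely is topic k to generate this word.
        word_term = (topic_word_counts[k].get(word, 0) + beta) / (
            topic_totals[k] + vocab_size * beta
        )
        weights[k] = user_term * word_term
    return weights

def sample_z(weights, rng):
    """Draw a topic with probability proportional to its weight."""
    topics = list(weights)
    return rng.choices(topics, weights=[weights[k] for k in topics])[0]

# Hypothetical state for one user and two topics.
prior = {"cars": 0.5, "diet": 0.5}
user_counts = {"cars": 3}
topic_word_counts = {"cars": {"accord": 5, "altima": 5}, "diet": {"milk": 10}}
topic_totals = {"cars": 10, "diet": 10}
w = topic_weights(user_counts, prior, topic_word_counts, topic_totals, "accord", vocab_size=3)
z = sample_z(w, random.Random(0))
```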


Sampling Equations

  • Sample:

  • Sample Ωt :

    • Use auxiliary-variable methods

    • Introduce auxiliary variable mkt

      • How many times the global distribution was visited by any user while sampling Z

    • ~ Antoniak
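
A draw from the Antoniak distribution (the number of occupied tables in a Chinese restaurant process) can be obtained as a sum of independent Bernoulli draws. The sketch below shows that standard construction; how its arguments map onto the counts and concentration parameters of this model is an assumption left to the reader.

```python
import random

def sample_num_tables(n_customers, concentration, rng):
    """Customer i (0-based) opens a new table with probability
    concentration / (concentration + i); the count of new tables is the draw."""
    return sum(
        1
        for i in range(n_customers)
        if rng.random() < concentration / (concentration + i)
    )

rng = random.Random(0)
m = sample_num_tables(50, 1.0, rng)
```

The first customer always opens a table, so the result is at least 1 whenever there is at least one customer.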


Distributed Inference

[Figure: shared memory (memcached) holding the topic word lists and Ωt; each of the two clients also holds a copy of Ωt while processing its own users' actions.]


Distributed Sampling Cycle

[Diagram: four clients running in parallel]

  • Each client samples Z for its users

  • Sample Ωt (requires a reduction step)

    • Each client writes its counts to memcached

    • Barrier

    • One client collects the counts and samples Ω; the other clients do nothing

    • Barrier

    • Each client reads Ω from memcached


Distributed Sampling Cycle

[Diagram: the cycle across four clients. Each client samples Z for its users and writes its counts to memcached; after a barrier, one client collects the counts and samples Ω while the others do nothing; after a second barrier, every client reads Ω from memcached.]

Used a reverse-sense barrier algorithm whose delay is O(log number of clients)
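
The cycle can be sketched with threads standing in for clients and a dictionary standing in for memcached. Several simplifications are assumed: Python's `threading.Barrier` replaces the barrier algorithm named on the slide, the Z-sampling step is replaced by fixed counts, and "sample Ω" is replaced by a deterministic normalization of the summed counts. The key names and `run_cycle` are hypothetical.

```python
import threading

def run_cycle(client_counts):
    """client_counts: one dict of topic -> count per client (stand-in for sampled Z)."""
    n = len(client_counts)
    store = {}                       # stand-in for memcached
    barrier = threading.Barrier(n)   # reusable: resets after all clients pass
    seen = [None] * n

    def client(i):
        # Write this client's counts to the shared store.
        store[f"counts:{i}"] = client_counts[i]
        barrier.wait()
        # Client 0 performs the reduction; the other clients do nothing.
        if i == 0:
            total = {}
            for j in range(n):
                for k, c in store[f"counts:{j}"].items():
                    total[k] = total.get(k, 0) + c
            norm = sum(total.values())
            store["omega"] = {k: c / norm for k, c in total.items()}
        barrier.wait()
        # Every client reads the same global state.
        seen[i] = store["omega"]

    threads = [threading.Thread(target=client, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen

views = run_cycle([{"cars": 2}, {"cars": 1, "diet": 1}, {"diet": 4}])
```

The two barriers are what make the step synchronous: no client reads the global state until the reduction is finished, and the reduction does not start until every client has written its counts.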


Outline

  • Background

    • Mixed-membership Models

  • Recurrent Chinese Restaurant Process

  • Modeling Temporal Dynamics

    • News

    • User intents

      • Model

      • Online Distributed inference

      • Experiments

    • Research publications

  • Modeling multi-faceted Content

    • Ideological Perspective


Experimental Results

  • The task is predicting conversions in display advertising

  • Use two datasets

    • 6 weeks of user history

    • The last week's responses to ads are used for testing

  • Baselines:

    • Raw user data as features

    • Static topic model


Datasets


Performance in Display Advertising

[Plot: ROC against number of conversions]

[Plot: weighted ROC measure; effect of the number of topics]


Scalability

[Plot: time per iteration, in minutes]

Used a reverse-sense barrier algorithm whose delay is O(log number of clients)


Interpretability

  • What does the model learn?

User-1

User-2


Outline

  • Background

    • Mixed-membership Models

  • Recurrent Chinese Restaurant Process

  • Modeling Temporal Dynamics

    • News

    • User intents

    • Research publications

  • Modeling multi-faceted Content

    • Ideological Perspective


Modeling Research Publications

[Figure: global menus for two epochs. Global menu T=1 (epoch 1): φ_{1,1}, φ_{2,1}, φ_{3,1}, φ_{4,1}, φ_{5,1}. Global menu T=2 (epoch 2): φ_{1,2}, φ_{2,2}, φ_{3,2}, φ_{6,2}.]

The recurrent Chinese restaurant franchise process, UAI 2010


[Figure: timeline of research topics, with the years 1987, 1990, 1991, 1994, 1995, and 1996 marked. Topics shown: SOM, ICA, boosting, speech, RL, Memory, Neuroscience, Bayesian, Kernels, Mixtures, NN, Generalization, Classification, Clustering, Methods, Control, Prob. Models (PM), image.]


[Figure: top words of four topics (PM, Mixtures, ICA, Methods) at several epochs between 1987 and 1999. Word lists are given per topic in slide order.]

  • PM: variables graph tree probability field structure node distribution energy; probability variables tree field distribution graph nodes belief node inference propagation; field code temperature tree boltzmann energy annealing node probability; field tree level energy probability node annealing boltzmann variables; tree variables node level probability field distribution structure graph energy

  • Mixtures: mixture em likelihood missing experts mixtures gaussian parameters; mixture gaussian em likelihood parameters analysis density factor variables distribution; em expert mixture gating missing experts gaussian parameters density

  • ICA: wavelet natural separation source ica coefficients independent basis; source ica blind separation coefficients natural independent basis wavelet

  • Methods: matrix algorithms gradient convergence equation optimal method parameter; method solution energy values gradient convergence equation algorithms; gradient weight method methods local rate optimal descent solution; gradient matrix weight algorithms local rate problems point equation


[Figure: top words of the Kernels topic at four epochs (1996, 1997, 1998, 1999), with representative papers.]

Kernels

  • support kernel svm regularization sv vectors feature regression

  • kernel support sv svm machines regression vapnik feature solution

  • kernel support svm regression feature machines solution margin pca

  • kernel svm support regression solution machines matrix feature regularization

Representative papers

  • Uniqueness of the SVM Solution, C. Burges and D. Crisp

  • An Improved Decomposition Algorithm for Regression Support Vector Machines, P. Laskov

  • ..... Many more

  • Support Vector Method for Function Approximation, Regression Estimation, and Signal Processing, V. Vapnik, S. E. Golowich and A. Smola

  • Support Vector Regression Machines, H. Drucker, C. Burges, L. Kaufman, A. Smola and V. Vapnik

  • Improving the Accuracy and Speed of Support Vector Machines, C. Burges and B. Schoelkopf

  • From Regularization Operators to Support Vector Kernels, A. Smola and B. Schoelkopf

  • Prior Knowledge in Support Vector Kernels, B. Schoelkopf, P. Simard, A. Smola and V. Vapnik


Outline

  • Background

    • Mixed-membership Models

  • Recurrent Chinese Restaurant Process

  • Modeling Temporal Dynamics

    • News

    • User intents

    • Research publications

  • Modeling multi-faceted Content

    • Ideological Perspective


Problem Statement

Given

Build a model that can answer the following

Visualization

  • How does each ideology view mainstream events?

  • On which topics do they differ?

  • On which topics do they agree?

Classification

  • Given a new news article or blog post, the system should decide:

    • From which side it was written

    • Justify its answer on a topical level

      • E.g., because its view on abortion coincides with the pro-choice stance

Structured browsing

  • Given a new news article or blog post, the user can ask for:

    • Examples of other articles from the same ideology about the same topic

    • Documents that could exemplify alternative views from other ideologies


Approach: Build a Factored Model

[Figure: factored model. Topics: β_1, ..., β_{k-1}, β_k. Ideology 1 views: Ω_1 and φ_{1,1}, φ_{1,2}, ..., φ_{1,k}. Ideology 2 views: Ω_2 and φ_{2,1}, φ_{2,2}, ..., φ_{2,k}.]


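Graphical Model

[Figure: graphical model of the factored model. Topics: β_1, β_2, ..., β_{k-1}, β_k. Ideology 1 views: Ω_1 and φ_{1,1}, φ_{1,2}, ..., φ_{1,k}. Ideology 2 views: Ω_2 and φ_{2,1}, φ_{2,2}, ..., φ_{2,k}. Edges are weighted by λ and 1-λ.]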


Datasets

  • Bitterlemons:

    • Middle East conflict; documents written by Israeli and Palestinian authors

    • ~300 documents from each view, with an average length of 740

    • Multi-author collection

    • 80-20 split for test and train

  • Political Blog-1:

    • American political blogs (Democrat and Republican)

    • 2040 posts with average post length = 100 words

    • Follow test and train split as in (Yano et al., 2009)

  • Political Blog-2 (test generalization to a new writing style)

    • Same as 1 but 6 blogs, 3 from each side

    • ~14k posts with ~200 words per post

    • 4 blogs for training and 2 blogs for test


Example: Bitterlemons corpus

[Figure: three topics (US role, Roadmap process, Arab Involvement), each shown with its topic words and the Palestinian and Israeli views. Word lists follow in slide order.]

US role

arafat state leader roadmap election month iraq yasir senior involvement clinton terrorism

bush US president american sharon administration prime pressure policy washington

powell minister colin visit internal policy statement express pro previous package work transfer european

Roadmap process

palestinian israeli peace year political process state end right government need conflict way security

palestinian israeli peace political occupation process end security conflict way government people time year force negotiation

process force terrorism unit provide confidence element interim discussion union succee point build positive recognize present timetable

roadmap phase security ceasefire state plan international step authority

end settlement implementation obligation stop expansion commitment fulfill unit illegal present previous assassination meet forward

Arab Involvement

peace strategic plo hizballah islamic neighbor territorial radical iran relation think obviou countri mandate greater conventional intifada affect jihad time

syria syrian negotiate lebanon deal conference concession asad agreement regional october initiative relationship

track negotiation official leadership position withdrawal time victory present second stand circumstance represent sense talk strategy issue participant parti negotiator


Classification


Generalization to New Blogs


Getting Alternative View

  • Given a document written in one ideology, retrieve its equivalent from the other ideology

  • Baseline: SVM + cosine similarity


Can We Use Unlabeled Data?

  • In theory this is simple

    • Add a step that samples the document view (v)

    • Doesn’t mix in practice because of the tight coupling between v and (x1,x2,z)

  • Solution

    • Sample v and (x1,x2,z) as a block using a Metropolis-Hastings step

    • This is a huge proposal!


Metropolis-Hastings Algorithm

  • Approach

    • Construct V proposals: one for each possible view of the document

    • Get a sample from the above proposal using a restricted Gibbs scan

    • Sample a view uniformly and use its proposal to get sample v* and (x*1,x*2,z*)

    • Compute the acceptance ratio as:

    • This is just the likelihood ratio test!

    • Accept the proposal with probability r
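
The acceptance formula itself did not survive extraction. As a minimal sketch of the final step only, assume the ratio r reduces to a ratio of likelihoods, as the slide states, and that log-likelihoods of the current and proposed states are available; the function name and arguments are hypothetical. Working in log space avoids underflow, and a ratio above 1 is treated as certain acceptance.

```python
import math
import random

def accept(log_lik_current, log_lik_proposed, rng):
    """Accept with probability min(1, r), where r = exp(proposed - current)."""
    log_r = log_lik_proposed - log_lik_current
    return log_r >= 0 or rng.random() < math.exp(log_r)

rng = random.Random(0)
better = accept(-10.0, -5.0, rng)      # proposal is more likely than the current state
far_worse = accept(0.0, -1000.0, rng)  # ratio underflows to zero
```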


Results

Evaluation

  • Using Blog-2

    • Use R% of the training data with labels and the remaining (100-R)% unlabeled

    • Test classification performance by varying R

Note that to match the performance of the semi-supervised model, one has to roughly double the number of labels!


Summary

  • Bayesian models are a flexible framework

  • Very useful if you

    • Care about the hidden structure

    • Want to leverage the hidden structure in tasks for which you have few labels

    • Have partially labeled data

  • Bayesian and hierarchical models are not slow

    • They can be scaled

    • They can be made to work online


Main Contributions

  • Models

    • Time-varying non-parametric framework

  • Inference

    • Distributed incremental inference algorithms

    • Online SMC algorithms

  • Applications

    • In research publications

    • Social media

    • User modeling


Work done in PhD but not in thesis

  • Inferring the evolution of latent dynamic networks Modeling [PNAS 2009]

  • Semi-supervised Learning in Deep Architectures [ECCV 2008]


Thanks!

Questions?

