Empirical Development of an Exponential Probabilistic Model

Using Textual Analysis to Build a Better Model

Jaime Teevan & David R. Karger

CSAIL (LCS+AI), MIT

- Generative v. discriminative model
- Applies to many applications
- Information retrieval (IR)
- Relevance feedback
- Using unlabeled data

- Classification

- Assumptions explicit

Hyper-learn

- Define model
- Learn parameters from query
- Rank documents

- Better model improves applications
- Trickle down to improve retrieval
- Classification, relevance feedback, …

- Corpus specific models

- Related work
- Probabilistic models
- Example: Poisson Model
- Compare model to text

- Hyper-learning the model
- Exponential framework
- Investigate retrieval performance

- Conclusion and future work

- Using text for retrieval algorithm
- [Jones, 1972], [Greiff, 1998]

- Using text to model text
- [Church & Gale, 1995], [Katz, 1996]

- Learning model parameters
- [Zhai & Lafferty, 2002]

Hyper-learn the model from text!

- Rank documents by RV = Pr(rel|d)
- Naïve Bayesian models

$$\mathrm{RV} = \Pr(\mathrm{rel} \mid d)$$

$$\Pr(d \mid \mathrm{rel}) = \prod_{\text{features } t} \Pr(d_t \mid \mathrm{rel})$$

Here $d_t$ is the number of occurrences of feature t in the document, and the features are words.

- Open assumptions
- Feature definition
- Feature distribution family

Defines the model!

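- Poisson Model
- θ: specifies term distribution

$$\Pr(d_t \mid \mathrm{rel}) = \frac{\theta^{d_t} e^{-\theta}}{d_t!}$$

[Figure: Poisson Pr(dt|rel) for θ = 0.0006. x-axis: term occurs exactly dt times; y-axis: Pr(dt|rel); annotation: Pr(dt|rel) ≈ 1E-15]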
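
As a rough illustration of the Poisson term model, the following Python sketch evaluates Pr(dt|rel) for a given θ. The function name `poisson_pmf` is an illustrative choice, not something from the slides. The computation is done in log space so that the factorial does not overflow for large dt.

```python
import math


def poisson_pmf(dt: int, theta: float) -> float:
    """Probability that a term occurs exactly dt times under a Poisson model."""
    if dt < 0 or theta <= 0:
        raise ValueError("dt must be >= 0 and theta must be > 0")
    # log Pr = dt*log(theta) - theta - log(dt!), where log(dt!) = lgamma(dt + 1)
    log_p = dt * math.log(theta) - theta - math.lgamma(dt + 1)
    return math.exp(log_p)


if __name__ == "__main__":
    theta = 0.0006  # value shown on the slide
    for dt in (0, 1, 2, 5):
        print(dt, poisson_pmf(dt, theta))
```

With θ this small, almost all of the probability mass sits at dt = 0, and the probability of repeated occurrences falls off very quickly.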

- Learn a θ for each term
- Maximum likelihood θ
- Term’s average number of occurrences

- Incorporate prior expectations
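
A minimal sketch of this learning step for the Poisson case: the maximum-likelihood θ is the term's average number of occurrences in the observed documents. The slides do not say how prior expectations are incorporated; the Gamma prior used here (hypothetical parameters `prior_shape` and `prior_rate`) is an assumption, chosen because it is the standard conjugate prior for a Poisson rate.

```python
from typing import Sequence


def mle_theta(counts: Sequence[int]) -> float:
    """Maximum-likelihood Poisson rate: the average number of occurrences."""
    if not counts:
        raise ValueError("need at least one observed document")
    return sum(counts) / len(counts)


def posterior_mean_theta(
    counts: Sequence[int], prior_shape: float, prior_rate: float
) -> float:
    """Posterior mean of theta under an ASSUMED Gamma(prior_shape, prior_rate) prior.

    Gamma prior + Poisson likelihood gives a
    Gamma(prior_shape + sum(counts), prior_rate + len(counts)) posterior,
    whose mean is shape / rate.
    """
    return (prior_shape + sum(counts)) / (prior_rate + len(counts))
```

With no observations the estimate falls back to the prior mean `prior_shape / prior_rate`, and it moves toward the observed average as more documents are seen.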

- For each document, find RV
- Sort documents by RV

$$\mathrm{RV} = \prod_{\text{words } t} \Pr(d_t \mid \mathrm{rel})$$
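
The ranking step can be sketched as follows, assuming the Poisson term model and a dictionary of per-term θ values learned beforehand. The names `relevance_value` and `rank_documents` are illustrative, and the sketch assumes the product runs over every term that has a learned θ, with dt = 0 for terms absent from a document. The product is computed as a sum of logarithms, which preserves the ordering and avoids floating-point underflow.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def log_poisson(dt: int, theta: float) -> float:
    """Log of the Poisson probability of seeing a term exactly dt times."""
    return dt * math.log(theta) - theta - math.lgamma(dt + 1)


def relevance_value(doc_tokens: List[str], thetas: Dict[str, float]) -> float:
    """Log of RV: sum over terms t of log Pr(dt | rel)."""
    counts = Counter(doc_tokens)  # missing terms count as 0
    return sum(log_poisson(counts[t], theta) for t, theta in thetas.items())


def rank_documents(
    docs: Dict[str, List[str]], thetas: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Return (doc_id, log RV) pairs sorted from highest to lowest RV."""
    scored = [(doc_id, relevance_value(tokens, thetas)) for doc_id, tokens in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```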

- Define model
- Learn parameters from query
- Rank documents

Which step goes wrong?

[Figure: Poisson model with θ = 0.0006, Pr(dt|rel) = θ^dt e^(-θ) / dt!, compared to text. x-axis: term occurs exactly dt times (15 times marked); y-axis: Pr(dt|rel); annotation: Misfit!]

Hyper-learning a Better Fit Through Textual Analysis

Using an Exponential Framework

- Need framework for hyper-learning
- Goal: Same benefits as Poisson Model
- One parameter
- Easy to work with (e.g., prior)

[Figure: distribution families shown: Mixtures, Poisson, Bernoulli, Normal; label: One parameter exponential families]

- Well understood, learning easy
- [Bernardo & Smith, 1994], [Gous, 1998]

$$\Pr(d_t \mid \mathrm{rel}) = f(d_t)\, g(\theta)\, e^{\theta h(d_t)}$$

- Functions f(dt) and h(dt) specify family
- E.g., Poisson: f(dt) = 1/dt!, h(dt) = dt
- Parameter θ: term’s specific distribution
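
To check that the Poisson model fits this framework, the sketch below evaluates f(dt) g(θ) e^(θ h(dt)) with f(dt) = 1/dt! and h(dt) = dt and compares it with the usual Poisson formula. One detail the slide leaves implicit: in this form θ is the natural parameter, the logarithm of the Poisson mean, and the normalizer is g(θ) = exp(-e^θ). All function names are illustrative.

```python
import math
from typing import Callable


def exp_family_pmf(
    dt: int,
    theta: float,
    f: Callable[[int], float],
    g: Callable[[float], float],
    h: Callable[[int], float],
) -> float:
    """One-parameter exponential family: f(dt) * g(theta) * exp(theta * h(dt))."""
    return f(dt) * g(theta) * math.exp(theta * h(dt))


def poisson_f(dt: int) -> float:
    return 1.0 / math.factorial(dt)


def poisson_h(dt: int) -> float:
    return float(dt)


def poisson_g(theta: float) -> float:
    # Normalizer for the Poisson family when theta = log(mean)
    return math.exp(-math.exp(theta))


def poisson_direct(dt: int, mean: float) -> float:
    """Textbook Poisson probability, for comparison."""
    return mean ** dt * math.exp(-mean) / math.factorial(dt)


if __name__ == "__main__":
    mean = 1.5
    theta = math.log(mean)
    for dt in range(4):
        print(dt, exp_family_pmf(dt, theta, poisson_f, poisson_g, poisson_h),
              poisson_direct(dt, mean))
```

Swapping in different f and h while keeping the same calling code is what makes the family a convenient search space for a better model.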

- Hyper-learn model
- Learn parameters from query
- Rank documents

- Want “best” f(dt) and h(dt)
- Iterative hill climbing
- Local maximum
- Poisson starting point
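
The slides give no detail on the hill-climbing procedure, so the following is only a generic coordinate-wise hill climber, not the authors' algorithm. It treats whatever is being tuned as a list of numbers (for example the values h(0), h(1), and so on) and takes an `objective` to maximize (for example a log-likelihood on training text); both are placeholders supplied by the caller. Like any hill climber it stops at a local maximum, which is why the starting point matters.

```python
from typing import Callable, List, Sequence


def hill_climb(
    objective: Callable[[List[float]], float],
    start: Sequence[float],
    step: float = 0.1,
    max_iters: int = 1000,
) -> List[float]:
    """Greedy coordinate-wise hill climbing from `start`.

    Each pass tries moving every coordinate up or down by `step` and keeps
    any move that increases the objective. It stops when a full pass yields
    no improvement (a local maximum at this step size) or after max_iters.
    """
    current = list(start)
    best = objective(current)
    for _ in range(max_iters):
        improved = False
        for i in range(len(current)):
            for delta in (step, -step):
                candidate = list(current)
                candidate[i] += delta
                value = objective(candidate)
                if value > best:
                    current, best = candidate, value
                    improved = True
        if not improved:
            break
    return current
```

In the setting of the slides, `start` would hold the Poisson values, so the search begins from the existing model and only moves away from it when the fit improves.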

- Data: TREC query result sets
- Past queries to learn about future queries

- Hyper-learn and test with different sets

Hyper-learned Model - h(dt)

[Figure: h(dt) plotted against dt for the model Pr(dt|rel) = f(dt) g(θ) e^(θ h(dt))]

Hyper-learned Distribution

[Figure: Pr(dt|rel) plotted against "Term occurs exactly dt times"; panels labeled 15 times, 5 times, 30 times, and 300 times]

Labeled docs

- Learn θ for each term

$$\Pr(d_t \mid \mathrm{rel}) = f(d_t)\, g(\theta)\, e^{\theta h(d_t)}$$

- Sufficient statistics
- Summarize all observed data
- τ1: # of observations
- τ2: Σ_{observations d} h(dt)

- Incorporating prior easy
- Map τ1 and τ2 to θ

20 labeled documents
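
A sketch of how the two sufficient statistics could be mapped to θ. For a one-parameter exponential family, the maximum-likelihood θ is the value at which the model's expected h(dt) equals the observed average τ2/τ1. The code below finds that value by bisection. Several things here are assumptions made for illustration rather than details from the slides: the support is truncated at `max_dt` so the normalizer can be summed directly, the family is passed in as `log_f` (the logarithm of f) to keep the arithmetic stable, the search interval is fixed, and the prior is treated as pseudo-observations added to τ1 and τ2.

```python
import math
from typing import Callable, Sequence, Tuple


def sufficient_stats(
    counts: Sequence[int],
    h: Callable[[int], float],
    prior: Tuple[float, float] = (0.0, 0.0),
) -> Tuple[float, float]:
    """Return (tau1, tau2); `prior` holds ASSUMED pseudo-observation counts."""
    tau1 = prior[0] + len(counts)
    tau2 = prior[1] + sum(h(d) for d in counts)
    return tau1, tau2


def expected_h(theta, log_f, h, max_dt: int = 200) -> float:
    """Model expectation of h(dt) on the truncated support 0..max_dt."""
    log_w = [log_f(d) + theta * h(d) for d in range(max_dt + 1)]
    shift = max(log_w)  # subtract the max so exp() cannot overflow
    weights = [math.exp(lw - shift) for lw in log_w]
    total = sum(weights)
    return sum(w * h(d) for d, w in enumerate(weights)) / total


def theta_from_stats(tau1, tau2, log_f, h, lo=-20.0, hi=5.0, iters=100) -> float:
    """Bisection for theta such that expected_h(theta) == tau2 / tau1.

    expected_h is increasing in theta (its derivative is a variance),
    so bisection on [lo, hi] is valid when the target lies in that range.
    """
    target = tau2 / tau1
    for _ in range(iters):
        mid = (lo + hi) / 2
        if expected_h(mid, log_f, h) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def poisson_log_f(dt: int) -> float:
    return -math.lgamma(dt + 1)  # log(1 / dt!)


def poisson_h(dt: int) -> float:
    return float(dt)
```

For the Poisson family the answer is known in closed form, θ = log(τ2/τ1), which gives a convenient check on the numerical routine.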

Results: Labeled Documents

[Figure: precision-recall plot; axes: Precision, Recall]

Short query

Retrieval: Query

- Query = single labeled document
- Vector space-like equation

$$\mathrm{RV} = \sum_{t \in \text{doc}} a(t,d) + \sum_{q \in \text{query}} b(q,d)$$

- Problem: Document dominates
- Solution: Use only query portion
- Another solution: Normalize

[Figures: three precision-recall plots; axes: Precision, Recall]

- Probabilistic models
- Example: Poisson Model

- Hyper-learning the model
- Exponential framework
- Learned a better model
- Investigate retrieval performance

- Bad text model

- Easy to work with

- Heavy tailed!

- Better …

- Use model better
- Use for other applications
- Other IR applications
- Classification

- Correct for document length
- Hyper-learn on different corpora
- Test if learned model generalizes
- Different for genre? Language? People?

- Hyper-learn model better

Contact us with questions:

Jaime Teevan

teevan@ai.mit.edu

David Karger

karger@theory.lcs.mit.edu