Formal Multinomial and Multiple-Bernoulli Language Models
Don Metzler

Overview

  • Two formal estimation techniques

    • MAP estimates [Zaragoza, Hiemstra, Tipping, SIGIR’03]

    • Posterior expectations

  • Language models considered

    • Multinomial

    • Multiple-Bernoulli (2 models)


Bayesian Framework (MAP Estimation)

  • Assume textual data X (document, query, etc.) is generated by sampling from some distribution P(X | θ) parameterized by θ

  • Assume some prior over θ.

  • For each X, we want to find the maximum a posteriori (MAP) estimate (see the formula below):

  • θX is our (language) model for data X
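
The MAP equation on this slide was an image; the standard form it refers to is:

  θX = argmaxθ P(θ | X) = argmaxθ P(X | θ) P(θ)

i.e., θX is the mode of the posterior over θ given the observed text X.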


Multinomial

  • Modeling assumptions (reconstructed below):

  • Why Dirichlet?

    • Conjugate prior to multinomial

    • Easy to work with
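
The modeling-assumption formulas here were images; a standard reconstruction, assuming a Dirichlet prior with parameters αw, is:

  P(X | θ) ∝ ∏w θw^tf(w,X)            (multinomial likelihood)
  P(θ) = Dir(θ; α) ∝ ∏w θw^(αw − 1)   (Dirichlet prior)

  MAP: θw = (tf(w,X) + αw − 1) / (|X| + Σw′ (αw′ − 1))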



How do we set
How do we set α?

  • α = 1 => uniform prior => ML estimate

  • α = 2 => Laplacian smoothing

  • Dirichlet-like smoothing:
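
The formula image is missing from the transcript; the Dirichlet-like choice, consistent with the figure below, is αw = μP(w | C) + 1 (a pseudo-count of μP(w | C) added to each term frequency), giving

  P(w | θX) = (tf(w,X) + μP(w | C)) / (|X| + μ)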


[Figure: MAP estimates for X = A B B B with P(A | C) = 0.45, P(B | C) = 0.55. Left: ML estimate (α = 1); center: Laplace (α = 2); right: α = μP(w | C) with μ = 10.]
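
A minimal Python sketch of the three estimates in the figure (μ and the collection probabilities P(w | C) are the slide's values):

  from collections import Counter

  X = ["A", "B", "B", "B"]
  P_C = {"A": 0.45, "B": 0.55}  # collection model P(w | C)
  mu = 10.0

  tf = Counter(X)
  n = len(X)
  V = len(P_C)

  ml = {w: tf[w] / n for w in P_C}                                # left: alpha = 1 (ML)
  laplace = {w: (tf[w] + 1) / (n + V) for w in P_C}               # center: alpha = 2 (Laplace)
  dirichlet = {w: (tf[w] + mu * P_C[w]) / (n + mu) for w in P_C}  # right: Dirichlet smoothing

  print(ml)         # {'A': 0.25, 'B': 0.75}
  print(laplace)    # {'A': 0.33..., 'B': 0.66...}
  print(dirichlet)  # {'A': 0.39..., 'B': 0.60...}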


Multiple-Bernoulli

  • Assume vocabulary V = A B C D

  • How do we model text X = D B B D?

    • In the multinomial model, we represent X as the sequence D B B D

    • In the multiple-Bernoulli model, we represent X as the vector [0 1 0 1], denoting that terms B and D occur in X

    • Each X is represented by a single binary vector (see the sketch below)
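
A small Python sketch of the two representations, using the vocabulary and text from this slide:

  V = ["A", "B", "C", "D"]
  X = ["D", "B", "B", "D"]

  multinomial_repr = X                              # multinomial: the full sequence (order and counts kept)
  bernoulli_repr = [1 if w in X else 0 for w in V]  # multiple-Bernoulli: one binary incidence vector
  print(bernoulli_repr)  # [0, 1, 0, 1]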


Multiple-Bernoulli (Model A)

  • Modeling assumptions (reconstructed below):

    • Each X is a single sample from a multiple-Bernoulli distribution parameterized by θ

    • Use conjugate prior (multiple-Beta)
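
The distribution formulas here were images; a standard reconstruction, writing xw ∈ {0, 1} for the incidence of term w in X, is:

  P(X | θ) = ∏w θw^xw (1 − θw)^(1 − xw)       (multiple-Bernoulli likelihood)
  P(θ) ∝ ∏w θw^(αw − 1) (1 − θw)^(βw − 1)     (multiple-Beta prior)

  MAP: θw = (xw + αw − 1) / (αw + βw − 1)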



Problems with Model A

  • Ignores document length

    • This may be desirable in some applications

  • Ignores term frequencies

  • How to solve this?

    • Model X as a collection of samples (one per word occurrence) from an underlying multiple-Bernoulli distribution

    • Example: V = A B C D, X = B D D B. Representation: { [0 1 0 0], [0 0 0 1], [0 0 0 1], [0 1 0 0] }
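
A minimal Python sketch of this per-occurrence representation:

  V = ["A", "B", "C", "D"]
  X = ["B", "D", "D", "B"]

  # Model B: one indicator vector per word occurrence (a multiset of samples)
  samples = [[1 if v == w else 0 for v in V] for w in X]
  print(samples)  # [[0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 0, 1], [0, 1, 0, 0]]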


Multiple-Bernoulli (Model B)

  • Modeling assumptions (reconstructed below):

    • Each X is a collection (multiset) of indicator vectors sampled from a multiple-Bernoulli distribution parameterized by θ

    • Use conjugate prior (multiple-Beta)
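
The formulas here were again images; with n = |X| occurrence-level samples and term count tf(w,X), a standard reconstruction is:

  P(X | θ) = ∏w θw^tf(w,X) (1 − θw)^(n − tf(w,X))

  MAP: θw = (tf(w,X) + αw − 1) / (n + αw + βw − 2)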



How do we set α, β?

  • α = β = 1 => uniform prior => ML estimate

  • But we want smoothed probabilities…

    • One possibility:
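
The formula image is missing from the transcript; one possibility consistent with the smoothed panels in the figure below is αw = μP(w | C) + 1 and βw = μ(1 − P(w | C)) + 1, which gives the Dirichlet-like smoothed estimate

  θw = (tf(w,X) + μP(w | C)) / (|X| + μ)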


[Figure: Multiple-Bernoulli Model B estimates for X = A B B B with P(A | C) = 0.45, P(B | C) = 0.55. Left: ML estimate (α = β = 1); center: smoothed (μ = 1); right: smoothed (μ = 10).]


Another approach…

  • Another way to formally estimate language models is via the expectation over the posterior (see the formula below)

  • Takes more uncertainty into account than the MAP estimate

  • Because we chose conjugate priors, the integral can be evaluated analytically
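
The integral on this slide was an image; the posterior-expectation estimate it refers to is:

  P(w | X) = ∫ P(w | θ) P(θ | X) dθ = E[θw | X]

For the conjugate Dirichlet/multinomial pair this evaluates to (tf(w,X) + αw) / (|X| + Σw′ αw′), i.e., the posterior mean rather than the mode.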


Multinomial / Multiple-Bernoulli Connection

  • Multinomial

  • Multiple-Bernoulli

  • Dirichlet smoothing
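
The three formulas on this slide were images; a reconstruction consistent with the posterior-expectation estimates above, assuming αw = μP(w | C) for the multinomial and αw = μP(w | C), βw = μ(1 − P(w | C)) for the multiple-Bernoulli, is:

  Multinomial:        P(w | X) = (tf(w,X) + μP(w | C)) / (|X| + μ)
  Multiple-Bernoulli: θw = (tf(w,X) + μP(w | C)) / (|X| + μ)

Both reduce to the same Dirichlet-smoothing form, which is the connection the slide draws.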


Bayesian Framework (Ranking)

  • Query likelihood

    • estimate model θD for each document D

    • score document D by P(Q | θD)

    • measures likelihood of observing query Q given model θD

  • KL-divergence

    • estimate model for both query and document

    • score document D by KL(θQ || θD)

    • measures “distance” between two models

  • Predictive density
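
The predictive-density formula was an image; it scores document D by integrating out the model rather than plugging in a point estimate:

  P(Q | D) = ∫ P(Q | θ) P(θ | D) dθ

A minimal Python sketch of query-likelihood ranking with a Dirichlet-smoothed document model (the toy corpus and query are illustrative, not from the slides):

  import math
  from collections import Counter

  def score(query, doc, collection, mu=10.0):
      # log P(Q | thetaD) under a Dirichlet-smoothed document model
      tf = Counter(doc)
      n = len(doc)
      total = sum(len(d) for d in collection)

      def p_c(w):
          # collection model P(w | C)
          return sum(d.count(w) for d in collection) / total

      return sum(math.log((tf[w] + mu * p_c(w)) / (n + mu)) for w in query)

  docs = [["A", "B", "B", "B"], ["A", "A", "C", "D"]]
  query = ["B", "A"]
  ranked = sorted(docs, key=lambda d: score(query, d, docs), reverse=True)
  print(ranked[0])  # ["A", "B", "B", "B"] ranks first for this query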



Conclusions

  • Both estimation and smoothing can be achieved using Bayesian estimation techniques

  • Little difference between MAP and posterior expectation estimates – mostly depends on μ

  • Not much difference between Multinomial and Multiple-Bernoulli language models

  • Scoring multinomial is cheaper

  • No good reason to choose multiple-Bernoulli over multinomial in general

