Formal Multinomial and Multiple-Bernoulli Language Models

Don Metzler

Overview
  • Two formal estimation techniques
    • MAP estimates [Zaragoza, Hiemstra, Tipping, SIGIR’03]
    • Posterior expectations
  • Language models considered
    • Multinomial
    • Multiple-Bernoulli (2 models)
Bayesian Framework (MAP Estimation)
  • Assume textual data X (document, query, etc.) is generated by sampling from some distribution P(X | θ) parameterized by θ
  • Assume some prior P(θ) over θ
  • For each X, we want to find the maximum a posteriori (MAP) estimate: θX = argmaxθ P(θ | X)
  • θX is our (language) model for the data X
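The equation on this slide was an image that did not survive the transcript; the following LaTeX is a standard reconstruction, using Bayes' rule and the fact that P(X) does not depend on θ:

```latex
% Bayes' rule; P(X) is constant in \theta, so it drops out of the argmax:
\theta_X = \arg\max_{\theta} P(\theta \mid X)
         = \arg\max_{\theta} \frac{P(X \mid \theta)\, P(\theta)}{P(X)}
         = \arg\max_{\theta} P(X \mid \theta)\, P(\theta)
```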
Multinomial
  • Modeling assumptions: X is a sequence of terms, each drawn i.i.d. from a multinomial distribution over the vocabulary parameterized by θ, with a Dirichlet(α) prior on θ
  • Why Dirichlet?
    • Conjugate prior to multinomial
    • Easy to work with
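The slide's equations are likewise missing; a reconstruction from the stated assumptions, writing tf(w,X) for the count of w in X:

```latex
% Multinomial likelihood with conjugate Dirichlet(\alpha) prior:
P(X \mid \theta) \propto \prod_{w \in V} \theta_w^{tf_{w,X}},
\qquad
P(\theta) \propto \prod_{w \in V} \theta_w^{\alpha_w - 1}
% The posterior is Dirichlet(tf_{w,X} + \alpha_w); its mode is the MAP estimate:
\hat{\theta}_w = \frac{tf_{w,X} + \alpha_w - 1}{|X| + \sum_{w' \in V} \alpha_{w'} - |V|}
```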
How do we set α?
  • α = 1 => uniform prior => ML estimate
  • α = 2 => Laplace smoothing
  • Dirichlet-like smoothing: αw = μP(w | C) + 1, which gives P(w | θX) = (tf(w,X) + μP(w | C)) / (|X| + μ)
[Figure: MAP estimates of P(w | θX) for X = A B B B with P(A | C) = 0.45, P(B | C) = 0.55. Left: ML estimate (α = 1). Center: Laplace (α = 2). Right: α = μP(w | C), μ = 10.]
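A minimal Python sketch (mine, not part of the deck) that reproduces the figure's three estimates, assuming the vocabulary is just {A, B}:

```python
from collections import Counter

def map_estimate(text, alpha):
    """MAP estimate: theta_w = (tf_w + alpha_w - 1) / (|X| + sum(alpha) - |V|)."""
    tf = Counter(text)
    denom = len(text) + sum(alpha.values()) - len(alpha)
    return {w: (tf[w] + alpha[w] - 1) / denom for w in alpha}

X = list("ABBB")
P_C = {"A": 0.45, "B": 0.55}  # collection model P(w | C)
mu = 10.0

print(map_estimate(X, {w: 1.0 for w in P_C}))               # alpha = 1 -> ML: A 0.25, B 0.75
print(map_estimate(X, {w: 2.0 for w in P_C}))               # alpha = 2 -> Laplace: A 0.33, B 0.67
print(map_estimate(X, {w: mu * P_C[w] + 1.0 for w in P_C}))  # Dirichlet-like: A 0.39, B 0.61
```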

Multiple-Bernoulli
  • Assume vocabulary V = A B C D
  • How do we model text X = D B B D?
    • In the multinomial model, we represent X as the sequence D B B D
    • In the multiple-Bernoulli model, we represent X as the vector [0 1 0 1], denoting that terms B and D occur in X
    • Each X is represented by a single binary vector
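A small sketch (not from the deck) contrasting the two representations:

```python
V = ["A", "B", "C", "D"]
X = "D B B D".split()

# Multinomial: X is the word sequence itself.
multinomial_repr = X                                    # ['D', 'B', 'B', 'D']

# Multiple-Bernoulli: X is one binary occurrence vector over V.
bernoulli_repr = [1 if v in set(X) else 0 for v in V]   # [0, 1, 0, 1]
```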
Multiple-Bernoulli (Model A)
  • Modeling assumptions:
    • Each X is a single sample from a multiple-Bernoulli distribution parameterized by θ
    • Use conjugate prior (multiple-Beta)
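The equations for these assumptions are missing from the transcript; a reconstruction, writing xw ∈ {0, 1} for whether term w occurs in X:

```latex
% Multiple-Bernoulli likelihood of the single binary vector x:
P(X \mid \theta) = \prod_{w \in V} \theta_w^{x_w} (1 - \theta_w)^{1 - x_w}
% Independent Beta ("multiple-Beta") prior, one per term; conjugate,
% so the posterior for each term is again a Beta distribution:
P(\theta) \propto \prod_{w \in V} \theta_w^{\alpha_w - 1} (1 - \theta_w)^{\beta_w - 1}
```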
Problems with Model A
  • Ignores document length
    • This may be desirable in some applications
  • Ignores term frequencies
  • How to solve this?
    • Model X as a collection of samples (one per word occurrence) from an underlying multiple-Bernoulli distribution
    • Example: V = A B C D, X = B D D B. Representation: { [0 1 0 0], [0 0 0 1], [0 0 0 1], [0 1 0 0] }
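A short sketch (mine) of this per-occurrence representation:

```python
V = ["A", "B", "C", "D"]
X = "B D D B".split()

# One indicator vector per word occurrence; term frequencies are preserved.
vectors = [[1 if v == w else 0 for v in V] for w in X]
# [[0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 0, 1], [0, 1, 0, 0]]
```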
Multiple-Bernoulli (Model B)
  • Modeling assumptions:
    • Each X is a collection (multiset) of indicator vectors sampled from a multiple-Bernoulli distribution parameterized by θ
    • Use conjugate prior (multiple-Beta)
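A reconstruction of the implied likelihood and MAP estimate, with N = |X| samples and nw = tf(w,X), since each occurrence vector has exactly one nonzero entry:

```latex
% Likelihood of the N indicator vectors:
P(X \mid \theta) = \prod_{w \in V} \theta_w^{n_w} (1 - \theta_w)^{N - n_w}
% The Beta(\alpha_w, \beta_w) posterior mode gives the MAP estimate:
\hat{\theta}_w = \frac{n_w + \alpha_w - 1}{N + \alpha_w + \beta_w - 2}
```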
How do we set α, β?
  • α = β = 1 => uniform prior => ML estimate
  • But we want smoothed probabilities…
    • One possibility, mirroring the Dirichlet-like smoothing above: αw = μP(w | C) + 1, βw = μ(1 − P(w | C)) + 1, which gives P(w | θX) = (nw + μP(w | C)) / (|X| + μ)
[Figure: Multiple-Bernoulli Model B estimates for X = A B B B with P(A | C) = 0.45, P(B | C) = 0.55. Left: ML estimate (α = β = 1). Center: smoothed (μ = 1). Right: smoothed (μ = 10).]
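A Python sketch (mine) reproducing the Model B estimates above, assuming the priors αw = μP(w | C) + 1, βw = μ(1 − P(w | C)) + 1 from the previous slide:

```python
from collections import Counter

def model_b_map(text, P_C, mu):
    """Under the assumed priors the MAP estimate simplifies to
    p_w = (n_w + mu * P(w|C)) / (N + mu)."""
    n = Counter(text)   # n_w = tf_w: number of vectors in which w is set
    N = len(text)       # number of occurrence vectors
    return {w: (n[w] + mu * P_C[w]) / (N + mu) for w in P_C}

X = list("ABBB")
P_C = {"A": 0.45, "B": 0.55}

print(model_b_map(X, P_C, mu=1e-9))  # ~ML estimate: A 0.25, B 0.75
print(model_b_map(X, P_C, mu=1.0))   # smoothed:     A 0.29, B 0.71
print(model_b_map(X, P_C, mu=10.0))  # smoothed:     A 0.39, B 0.61
```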

Another approach…
  • Another way to formally estimate language models is via the expectation over the posterior: θX = E[θ | X] = ∫ θ P(θ | X) dθ
  • Takes more uncertainty into account than the MAP estimate
  • Because we chose conjugate priors, the integral can be evaluated analytically
Multinomial / Multiple-Bernoulli Connection
  • Multinomial
  • Multiple-Bernoulli
  • Dirichlet smoothing
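The equations for this slide are missing from the transcript; a plausible reconstruction of the connection via posterior expectations, assuming priors αw = μP(w | C) for the multinomial and αw = μP(w | C), βw = μ(1 − P(w | C)) for the multiple-Bernoulli (note: no +1 here, since the posterior mean, unlike the mode, needs none):

```latex
% Multinomial, Dirichlet posterior mean:
E[\theta_w \mid X] = \frac{tf_{w,X} + \mu P(w \mid C)}{|X| + \mu}
% Multiple-Bernoulli (Model B), Beta posterior mean:
E[\theta_w \mid X] = \frac{n_w + \mu P(w \mid C)}{|X| + \mu}
% Both coincide with the familiar Dirichlet-smoothed estimate:
P(w \mid \theta_X) = \frac{tf_{w,X} + \mu P(w \mid C)}{|X| + \mu}
```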
Bayesian Framework (Ranking)
  • Query likelihood
    • estimate model θD for each document D
    • score document D by P(Q | θD)
    • measures likelihood of observing query Q given model θD
  • KL-divergence
    • estimate model for both query and document
    • score document D by KL(θQ || θD)
    • measures “distance” between two models
  • Predictive density
    • score document D by the predictive density P(Q | D) = ∫ P(Q | θ) P(θ | D) dθ
    • integrates over model uncertainty rather than committing to a point estimate
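A minimal query-likelihood scorer (mine, with hypothetical toy data), assuming a Dirichlet-smoothed document model:

```python
import math

def log_query_likelihood(query, doc_tf, doc_len, P_C, mu=1000.0):
    """log P(Q | theta_D), with the document model smoothed as
    P(w | theta_D) = (tf_w + mu * P(w|C)) / (|D| + mu)."""
    return sum(
        math.log((doc_tf.get(w, 0) + mu * P_C[w]) / (doc_len + mu))
        for w in query
    )

# Hypothetical toy data:
doc_tf = {"language": 3, "model": 2}
P_C = {"language": 0.01, "model": 0.02, "retrieval": 0.005}
score = log_query_likelihood(["language", "retrieval"], doc_tf, 7, P_C, mu=100.0)
```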
Conclusions
  • Both estimation and smoothing can be achieved using Bayesian estimation techniques
  • Little difference between MAP and posterior expectation estimates; results mostly depend on μ
  • Not much difference between Multinomial and Multiple-Bernoulli language models
  • Scoring multinomial is cheaper
  • No good reason to choose multiple-Bernoulli over multinomial in general