Maximum Likelihood Estimation


Presentation Transcript


  1. Learning Probabilistic Graphical Models: Parameter Estimation, Maximum Likelihood Estimation

  2. Biased Coin Example P is a Bernoulli distribution: P(X=1) = θ, P(X=0) = 1 − θ • Tosses are independent of each other • Tosses are sampled from the same distribution (identically distributed) • Together: tosses are sampled IID from P
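
As a quick illustration (a minimal sketch, not from the slides, assuming NumPy is available; the bias 0.7, the seed, and the sample size are arbitrary choices), an IID Bernoulli dataset of coin tosses can be sampled like this:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 0.7                    # true bias of the coin (unknown in practice)
M = 100                        # number of tosses
D = rng.binomial(1, theta, M)  # IID Bernoulli(theta) samples: 1 = heads, 0 = tails
print(D[:10])
```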

  3. IID as a PGM [Figure: plate model in which the parameter θ is a shared parent of the observed tosses X[1], ..., X[M]; the plate replicates X[m] over the M instances in the Data]

  4. L(D:)  0 0.2 0.4 0.6 0.8 1 Maximum Likelihood Estimation • Goal: find [0,1] that predicts D well • Prediction quality = likelihood of D given 

  5. Maximum Likelihood Estimator • Observations: MH heads and MT tails • Find θ maximizing the likelihood L(D:θ) = θ^MH (1 − θ)^MT • Equivalent to maximizing the log-likelihood log L(D:θ) = MH log θ + MT log(1 − θ) • Differentiating the log-likelihood and solving for θ: θ̂ = MH / (MH + MT)
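
The closed form is easy to check in code; this sketch (same hypothetical toy dataset as above, assuming NumPy) computes θ̂ = MH / (MH + MT) directly:

```python
import numpy as np

def bernoulli_mle(D):
    MH = D.sum()               # number of heads
    MT = len(D) - MH           # number of tails
    return MH / (MH + MT)      # theta_hat = MH / (MH + MT)

D = np.array([1, 1, 0, 1, 0])
print(bernoulli_mle(D))        # 0.6, matching the grid maximizer above
```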

  6. Sufficient Statistics • For computing θ̂ in the coin toss example, we only needed MH and MT, since L(D:θ) = θ^MH (1 − θ)^MT • MH and MT are sufficient statistics

  7. Sufficient Statistics • A sufficient statistic maps datasets to statistics that summarize them: a function s(D) from instances to a vector in ℝ^k is a sufficient statistic if for any two datasets D and D′ and any θ we have s(D) = s(D′) ⟹ L(D:θ) = L(D′:θ)
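
The definition can be demonstrated numerically: the sketch below (hypothetical datasets, assuming NumPy) builds two datasets with the same counts but different orderings and checks that they yield the same log-likelihood for every θ tried:

```python
import numpy as np

def log_likelihood(D, theta):
    MH = D.sum()
    MT = len(D) - MH
    return MH * np.log(theta) + MT * np.log(1 - theta)

D1 = np.array([1, 0, 1, 1, 0])   # 3 heads, 2 tails ...
D2 = np.array([0, 1, 1, 0, 1])   # ... same counts, different order
for theta in (0.3, 0.5, 0.9):
    assert np.isclose(log_likelihood(D1, theta), log_likelihood(D2, theta))
# s(D1) = s(D2) = (3, 2), so the two datasets are
# indistinguishable as far as L(D:theta) is concerned
```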

  8. Sufficient Statistic for Multinomial • For a dataset D over a variable X with k values, the sufficient statistics are the counts <M1, ..., Mk>, where Mi is the number of times that X[m] = xi in D • The per-instance sufficient statistic s(x) is a tuple of dimension k: s(xi) = (0, ..., 0, 1, 0, ..., 0), with the 1 in position i
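
A small sketch of this one-hot encoding (hypothetical data with k = 3, assuming NumPy); summing the per-instance statistics s(x[m]) yields the count vector <M1, ..., Mk>:

```python
import numpy as np

k = 3
D = np.array([0, 2, 1, 0, 0, 2])   # X[m] takes values in {0, ..., k-1}

def s(x):
    # per-instance sufficient statistic: one-hot tuple of dimension k
    e = np.zeros(k)
    e[x] = 1.0
    return e

counts = sum(s(x) for x in D)      # <M1, M2, M3> = (3, 1, 2)
print(counts)
```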

  9. Sufficient Statistic for Gaussian • Gaussian distribution: p(x) = (1 / (√(2π) σ)) exp(−(x − μ)² / (2σ²)) • Rewrite as an exponential of a function that is linear in x and x²: p(x) ∝ exp(−x²/(2σ²) + μx/σ² − μ²/(2σ²)) • Sufficient statistics for the Gaussian: s(x) = <1, x, x²>
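
The sketch below (synthetic data, assuming NumPy; the true mean and scale are arbitrary) accumulates the Gaussian sufficient statistics <1, x, x²> over a dataset and recovers the mean and variance from them alone:

```python
import numpy as np

rng = np.random.default_rng(1)
D = rng.normal(loc=2.0, scale=0.5, size=1000)

# accumulate the sufficient statistics s(x) = <1, x, x^2> over the dataset
s = np.array([len(D), D.sum(), (D**2).sum()])

# mean and variance are functions of the sufficient statistics alone
mu = s[1] / s[0]
var = s[2] / s[0] - mu**2
print(mu, var)                 # close to 2.0 and 0.25
```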

  10. Maximum Likelihood Estimation • MLE Principle: choose θ to maximize L(D:θ) • Multinomial MLE: θ̂i = Mi / (M1 + ... + Mk) • Gaussian MLE: μ̂ = (1/M) Σm x[m], σ̂² = (1/M) Σm (x[m] − μ̂)²
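
Both closed forms are short in code; a minimal sketch (hypothetical toy inputs, assuming NumPy; the function names are mine, not from the slides):

```python
import numpy as np

def multinomial_mle(D, k):
    counts = np.bincount(D, minlength=k)   # <M1, ..., Mk>
    return counts / counts.sum()           # theta_i = M_i / sum_j M_j

def gaussian_mle(D):
    mu = D.mean()
    sigma = np.sqrt(((D - mu)**2).mean())  # MLE divides by M, not M - 1
    return mu, sigma

print(multinomial_mle(np.array([0, 2, 1, 0]), k=3))   # [0.5, 0.25, 0.25]
print(gaussian_mle(np.array([1.0, 2.0, 3.0])))        # (2.0, ~0.8165)
```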

  11. Summary • Maximum likelihood estimation is a simple principle for parameter selection given D • The likelihood function is uniquely determined by the sufficient statistics that summarize D • The MLE has a closed-form solution for many parametric distributions

  12. END
