4. Maximum Likelihood

Prof. A.L. Yuille

Stat 231. Fall 2004.


Learning Probability Distributions.

  • Learn the likelihood functions and priors from datasets.

  • Two main strategies: Parametric and Non-Parametric.

  • This lecture and the next will concentrate on Parametric methods (these assume a parametric form for the distributions).


Maximum Likelihood Estimation.

Assume the distribution is of the form $p(x|\theta)$, with unknown parameters $\theta$.

  • Independent Identically Distributed (I.I.D.) samples $x_1, \dots, x_N$, so $p(x_1, \dots, x_N|\theta) = \prod_{i=1}^{N} p(x_i|\theta)$.

  • Choose $\theta^* = \arg\max_\theta \prod_{i=1}^{N} p(x_i|\theta)$, equivalently $\theta^* = \arg\max_\theta \sum_{i=1}^{N} \log p(x_i|\theta)$.

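As a minimal numerical sketch of this recipe (my own illustration, not from the slides; the exponential model and all names below are assumptions): maximize the i.i.d. log-likelihood by minimizing its negative with scipy.

```python
# Minimal MLE sketch: choose theta that maximizes the i.i.d. log-likelihood,
# here the scale parameter of an (assumed) exponential model.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
data = rng.exponential(scale=2.0, size=1000)  # i.i.d. samples, true scale = 2.0

def neg_log_likelihood(scale):
    # Exponential density: p(x|scale) = (1/scale) * exp(-x/scale)
    return -np.sum(-np.log(scale) - data / scale)

result = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 100.0), method="bounded")
print(result.x)     # numerical MLE, approx 2.0
print(data.mean())  # analytic MLE for the exponential scale, for comparison
```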

Supervised versus Unsupervised Learning.

  • Supervised Learning assumes that we know the class label for each datapoint.

  • I.e. we are given pairs $(x_i, y_i)$, where $x_i$ is the datapoint and $y_i$ is the class label.

  • Unsupervised Learning does not assume that the class labels are specified. This is a harder task.

  • But “unsupervised methods” can also be used for supervised data if the goal is to determine structure in the data (e.g. mixture of Gaussians).

  • Stat 231 is almost entirely concerned with supervised learning.


Example of MLE.

  • One-Dimensional Gaussian Distribution: $p(x|\mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$.

  • Solve for $\mu, \sigma$ by differentiation of the log-likelihood: $\hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} x_i$, $\hat{\sigma}^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \hat{\mu})^2$.

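A short check of the closed-form estimates (a sketch assuming the standard $\hat{\mu}$, $\hat{\sigma}^2$ formulas above; note the MLE variance divides by $N$, not $N-1$):

```python
# Closed-form Gaussian MLE: sample mean and (biased, divide-by-N) variance.
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(loc=5.0, scale=3.0, size=10_000)

mu_hat = data.mean()                        # (1/N) sum x_i
sigma2_hat = ((data - mu_hat) ** 2).mean()  # (1/N) sum (x_i - mu_hat)^2

print(mu_hat, np.sqrt(sigma2_hat))  # approx 5.0 and 3.0
```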

MLE

  • The Gaussian is unusual in that the parameter estimates can be written as analytic expressions of the data.

  • More usually, numerical algorithms are required.

  • Modeling problem: for complicated patterns – shape of fish, natural language, etc. – it requires considerable work to find a suitable parametric form for the probability distributions.


MLE and Kullback-Leibler

  • What happens if the data are not generated by the model that we assume?

  • Suppose the true distribution is $f(x)$ and our models are of the form $p(x|\theta)$.

  • The Kullback-Leibler divergence is: $D(f \,\|\, p_\theta) = \int f(x) \log \frac{f(x)}{p(x|\theta)}\, dx$.

  • This is $\geq 0$, with equality if and only if $f(x) = p(x|\theta)$.

  • K-L is a measure of the difference between $f(x)$ and $p(x|\theta)$.

MLE and Kullback-Leibler

  • Samples $x_1, \dots, x_N$ drawn from the true distribution $f(x)$.

  • Approximate $D(f \,\|\, p_\theta)$ by the empirical KL: $\hat{D}(f \,\|\, p_\theta) = \mathrm{const} - \frac{1}{N}\sum_{i=1}^{N} \log p(x_i|\theta)$, where the constant does not depend on $\theta$.

  • Minimizing the empirical KL is equivalent to MLE.

  • We find the distribution of form $p(x|\theta)$ that is closest to the true distribution $f(x)$ in the KL sense.

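A hedged sketch of this equivalence (my own example): since the empirical KL equals the average negative log-likelihood up to a $\theta$-independent constant, minimizing it over a grid of candidate means recovers the MLE.

```python
# Minimizing the empirical KL = minimizing the average negative
# log-likelihood; for a N(mu, 1) model this recovers the sample mean.
import numpy as np

rng = np.random.default_rng(2)
data = rng.normal(loc=1.5, scale=1.0, size=5000)

mus = np.linspace(0.0, 3.0, 301)  # grid of candidate model means
avg_nll = [np.mean(0.5 * np.log(2 * np.pi) + 0.5 * (data - mu) ** 2)
           for mu in mus]

print(mus[np.argmin(avg_nll)])  # approx 1.5 (up to grid spacing)
print(data.mean())              # closed-form MLE, same value
```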

MLE Example.

We denote the log-likelihood, as a function of $\theta$, by $L(\theta) = \sum_{i=1}^{N} \log p(x_i|\theta)$.

$\theta^*$ is computed by solving the equations $\partial L(\theta) / \partial \theta = 0$.

For example, the Gaussian family gives a closed-form solution.
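The closed-form solution the slide refers to can be written out explicitly (a standard derivation, reconstructed here rather than copied from the slides):

```latex
% Gaussian log-likelihood and its stationary points.
L(\mu, \sigma^2) = -\frac{N}{2}\log(2\pi\sigma^2)
                   - \frac{1}{2\sigma^2}\sum_{i=1}^{N}(x_i - \mu)^2

\frac{\partial L}{\partial \mu} = \frac{1}{\sigma^2}\sum_{i=1}^{N}(x_i - \mu) = 0
  \;\Rightarrow\; \hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} x_i

\frac{\partial L}{\partial \sigma^2} = -\frac{N}{2\sigma^2}
  + \frac{1}{2\sigma^4}\sum_{i=1}^{N}(x_i - \mu)^2 = 0
  \;\Rightarrow\; \hat{\sigma}^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \hat{\mu})^2
```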


Learning with a Prior.

  • We can put a prior $p(\theta)$ on the parameter values.

  • We can estimate this recursively (if samples are i.i.d.): $p(\theta|x_1, \dots, x_N) \propto p(x_N|\theta)\, p(\theta|x_1, \dots, x_{N-1})$.

  • Bayes Learning: estimate a probability distribution $p(\theta|x_1, \dots, x_N)$ on the parameters $\theta$.
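A minimal sketch of the recursive update (illustrative assumptions: Gaussian likelihood with known variance and a conjugate Gaussian prior on the mean, so every posterior stays Gaussian):

```python
# Recursive Bayes learning: each i.i.d. sample updates the posterior,
# p(theta | x_1..x_n) ∝ p(x_n | theta) p(theta | x_1..x_{n-1}).
import numpy as np

rng = np.random.default_rng(3)
true_mu, sigma2 = 2.0, 1.0       # likelihood: x ~ N(true_mu, sigma2), sigma2 known
prior_mu, prior_var = 0.0, 10.0  # prior: theta ~ N(prior_mu, prior_var)

for x in rng.normal(true_mu, np.sqrt(sigma2), size=100):
    post_var = 1.0 / (1.0 / prior_var + 1.0 / sigma2)
    post_mu = post_var * (prior_mu / prior_var + x / sigma2)
    prior_mu, prior_var = post_mu, post_var  # posterior becomes the next prior

print(prior_mu, prior_var)  # mean approx 2.0; variance shrinks with more data
```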


