Pattern Recognition

Presentation Transcript


1. Chapter 3: Maximum Likelihood and Bayesian Estimation – Part 1 (Pattern Recognition)

2. Practical Issues
• We could design an optimal classifier if we knew:
  • P(ωi) (priors)
  • p(x|ωi) (class-conditional densities)
• In practice, we rarely have this complete information!
• Instead, we design the classifier from a set of training examples.
• Estimating P(ωi) is usually easy.
• Estimating p(x|ωi) is more difficult:
  • the number of samples is often too small
  • the dimensionality of the feature space is large

3. Parameter Estimation
• Assumptions:
  • We are given a sample set D = {x1, x2, …, xn}, where the samples were drawn according to p(x|ωj).
  • p(x|ωj) has a known parametric form, i.e., it is determined by a parameter vector θ, e.g., p(x|ωi) ~ N(μi, Σi).
• Parameter estimation problem: given D, find the best possible θ.
• This is a classical problem in statistics!

4. Main Methods in Parameter Estimation
• Maximum Likelihood (ML)
  • Assumes that the values of the parameters are fixed but unknown.
  • The best estimate is obtained by maximizing the probability of obtaining the samples actually observed (i.e., the training data).
• Bayesian Estimation
  • Assumes that the parameters are random variables having some known a priori distribution.
  • Observing the samples converts this prior into a posterior density, which is then used to estimate the true values of the parameters.

5. Maximum Likelihood (ML) Estimation – Assumptions
• Suppose the training data is divided into c sets (i.e., one per class): D1, D2, …, Dc.
• Assume that the samples in Dj have been drawn independently according to p(x|ωj).
• Assume that p(x|ωj) has a known parametric form with parameters θj, e.g., θj = (μj, Σj) for Gaussian distributions or, in general, θj = (θ1, θ2, …, θp)ᵗ.

6. ML Estimation – Problem Definition and Solution
• Problem: given D1, D2, …, Dc and a model for each class, estimate θ1, θ2, …, θc.
• If the samples in Dj give no information about θi (for i ≠ j), we can solve c independent problems (i.e., one per class).
• The ML estimate for D = {x1, x2, …, xn} is the value of θ that maximizes the likelihood p(D|θ) (i.e., that best supports the training data); by the independence assumption, p(D|θ) = ∏k p(xk|θ).
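As a concrete illustration of the likelihood p(D|θ) = ∏k p(xk|θ), here is a minimal Python sketch (the sample values and the known σ are made up for the example) that evaluates the likelihood of a one-dimensional Gaussian over a grid of candidate means. Note how small the raw product already is for a handful of samples, which is one reason the log-likelihood introduced on the following slides is used in practice.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical 1-D training set, assumed drawn from N(mu, sigma^2) with sigma known
D = np.array([2.1, 1.7, 2.5, 1.9, 2.3])
sigma = 0.5  # assumed known for this illustration

def likelihood(mu, D, sigma):
    """p(D|mu) = product over k of p(x_k|mu), using the independence assumption."""
    return np.prod(norm.pdf(D, loc=mu, scale=sigma))

# Evaluate p(D|mu) on a grid of candidate values and pick the maximizer
grid = np.linspace(0.0, 4.0, 401)
values = np.array([likelihood(mu, D, sigma) for mu in grid])
mu_hat = grid[np.argmax(values)]

print(f"grid-search ML estimate of mu: {mu_hat:.3f}")
print(f"sample mean:                   {np.mean(D):.3f}")  # should agree closely
```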

7. ML Parameter Estimation (cont'd)
[Figure: illustration of the likelihood and log-likelihood as functions of the parameter for the case θ = μ.]

8. ML Parameter Estimation (cont'd)
• How do we find the maximum?
• It is easier to consider the log-likelihood: l(θ) = ln p(D|θ) = Σk ln p(xk|θ).
• The solution θ̂ maximizes p(D|θ) or, equivalently, ln p(D|θ); it can be found by solving ∇θ ln p(D|θ) = 0.
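A numerical counterpart to the slide above: instead of solving ∇θ ln p(D|θ) = 0 analytically, this sketch minimizes the negative log-likelihood with scipy.optimize.minimize (the data set is the same made-up example as before) and compares the result to the closed-form answers derived on the following slides.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Same made-up data; now both mu and sigma are treated as unknown
D = np.array([2.1, 1.7, 2.5, 1.9, 2.3])

def neg_log_likelihood(theta, D):
    """-ln p(D|theta) = -sum_k ln p(x_k|theta), with theta = (mu, ln sigma) for a 1-D Gaussian."""
    mu, log_sigma = theta
    return -np.sum(norm.logpdf(D, loc=mu, scale=np.exp(log_sigma)))

# Maximizing ln p(D|theta) is the same as minimizing its negative
result = minimize(neg_log_likelihood, x0=np.array([0.0, 0.0]), args=(D,))
mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])

print(f"numerical ML estimate: mu = {mu_hat:.3f}, sigma = {sigma_hat:.3f}")
print(f"closed-form ML:        mu = {D.mean():.3f}, sigma = {D.std(ddof=0):.3f}")
```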

9. Maximum A-Posteriori (MAP) Estimator
• Assume that θ is a random vector and that we know the prior p(θ).
• Given D, MAP converts p(θ) to the posterior p(θ|D) using Bayes' rule: p(θ|D) = p(D|θ)p(θ) / p(D).
• The goal is to maximize p(θ|D) or, equivalently (since p(D) does not depend on θ), p(D|θ)p(θ).
• MAP is equivalent to ML when p(θ) is uniform.
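To make the MAP idea concrete, here is a small sketch under the usual conjugate assumption (Gaussian likelihood with known σ and a Gaussian prior N(μ0, σ0²) on μ; all numbers are made up for the example). In this case maximizing p(D|μ)p(μ) has a closed form, and letting σ0 → ∞ (a flat prior) recovers the ML estimate, illustrating the last bullet.

```python
import numpy as np

# Hypothetical setting: 1-D Gaussian likelihood with known sigma, Gaussian prior on mu
D = np.array([2.1, 1.7, 2.5, 1.9, 2.3])
sigma = 0.5             # known standard deviation of the likelihood
mu0, sigma0 = 0.0, 1.0  # assumed prior p(mu) = N(mu0, sigma0^2)

n = len(D)
# Maximizing p(D|mu) p(mu) in this conjugate case gives a closed-form MAP estimate
mu_map = (sigma0**2 * D.sum() + sigma**2 * mu0) / (n * sigma0**2 + sigma**2)

print(f"MAP estimate: {mu_map:.3f}")
print(f"ML estimate:  {D.mean():.3f}")  # MAP -> ML as sigma0 -> infinity (flat prior)
```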

10. ML for Gaussian Density: Case of Unknown θ = μ
• Consider the log of a single Gaussian sample term (known Σ):
  ln p(xk|μ) = −(1/2) ln[(2π)^d |Σ|] − (1/2)(xk − μ)ᵗ Σ⁻¹ (xk − μ)
• Computing the gradient with respect to μ, we have:
  ∇μ ln p(xk|μ) = Σ⁻¹ (xk − μ)

11. ML for Gaussian Density: Case of Unknown θ = μ (cont'd)
• Setting ∇μ ln p(D|μ) = Σk Σ⁻¹ (xk − μ) = 0, we have Σk (xk − μ̂) = 0.
• The solution is given by μ̂ = (1/n) Σk xk.
• The ML estimate is simply the "sample mean".

12. ML for Gaussian Density: Case of Unknown θ = (θ1, θ2) = (μ, σ²)
• Consider the univariate case with θ1 = μ and θ2 = σ²:
  ln p(xk|θ) = −(1/2) ln(2πθ2) − (xk − θ1)² / (2θ2)
• The partial derivatives are:
  ∂ ln p(xk|θ)/∂θ1 = (xk − θ1)/θ2
  ∂ ln p(xk|θ)/∂θ2 = −1/(2θ2) + (xk − θ1)²/(2θ2²)

13. ML for Gaussian Density: Case of Unknown θ = (θ1, θ2) = (μ, σ²) (cont'd)
• Setting the derivatives of ln p(D|θ) to zero:
  Σk (xk − θ1)/θ2 = 0
  −Σk 1/(2θ2) + Σk (xk − θ1)²/(2θ2²) = 0
• The solutions are given by:
  θ̂1 = μ̂ = (1/n) Σk xk
  θ̂2 = σ̂² = (1/n) Σk (xk − μ̂)²

14. ML for Gaussian Density: Case of Unknown θ = (μ, Σ)
• In the general case (i.e., multivariate Gaussian) the solutions are:
  μ̂ = (1/n) Σk xk
  Σ̂ = (1/n) Σk (xk − μ̂)(xk − μ̂)ᵗ
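These two formulas translate directly into a few lines of NumPy; the following sketch (using an arbitrary synthetic 2-D data set) computes the sample mean and the ML covariance estimate with the 1/n factor.

```python
import numpy as np

# Hypothetical 2-D training set, assumed drawn from a multivariate Gaussian
rng = np.random.default_rng(0)
D = rng.multivariate_normal(mean=[1.0, -2.0], cov=[[1.0, 0.3], [0.3, 2.0]], size=500)

n = D.shape[0]
mu_hat = D.mean(axis=0)           # sample mean (ML estimate of mu)
diff = D - mu_hat
Sigma_hat = (diff.T @ diff) / n   # ML estimate of Sigma (note the 1/n factor)

print("mu_hat =", np.round(mu_hat, 3))
print("Sigma_hat =\n", np.round(Sigma_hat, 3))
```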

15. Biased and Unbiased Estimates
• An estimate θ̂ is unbiased when E[θ̂] = θ, where θ is the true value.
• The ML estimate μ̂ is unbiased, i.e., E[μ̂] = μ.
• The ML estimates σ̂² and Σ̂ are biased:
  E[σ̂²] = ((n − 1)/n) σ²,  E[Σ̂] = ((n − 1)/n) Σ

16. Biased and Unbiased Estimates (cont'd)
• The following are unbiased estimators of σ² and Σ:
  s² = (1/(n − 1)) Σk (xk − μ̂)²
  C = (1/(n − 1)) Σk (xk − μ̂)(xk − μ̂)ᵗ
• Note: the ML estimates σ̂² and Σ̂ are asymptotically unbiased, since (n − 1)/n → 1 as n → ∞.
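A quick Monte Carlo check of the bias statements (all parameters are made up for the example): for small n the ML variance estimate is systematically low by the factor (n − 1)/n, while the 1/(n − 1) version is unbiased.

```python
import numpy as np

# Compare the expected value of the biased (ML) and unbiased variance estimators
rng = np.random.default_rng(1)
true_var, n, trials = 4.0, 5, 200_000

samples = rng.normal(loc=0.0, scale=np.sqrt(true_var), size=(trials, n))
ml_var = samples.var(axis=1, ddof=0)        # (1/n)       * sum (x_k - mean)^2
unbiased_var = samples.var(axis=1, ddof=1)  # (1/(n - 1)) * sum (x_k - mean)^2

print(f"E[ML estimate]       ~ {ml_var.mean():.3f}  (theory: {(n - 1) / n * true_var:.3f})")
print(f"E[unbiased estimate] ~ {unbiased_var.mean():.3f}  (theory: {true_var:.3f})")
```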

17. Some Comments about ML
• ML estimation is usually simpler than alternative methods.
• It has good convergence properties as the number of training samples increases.
• If the model chosen for p(x|θ) is correct, and the independence assumptions among variables hold, ML will give very good results.
• If the model is wrong, ML will give poor results.
