# 240-650 Principles of Pattern Recognition

Montri Karnjanadecha · [email protected] · http://fivedots.coe.psu.ac.th/~montri





### 240-650 Principles of Pattern Recognition

[email protected]

http://fivedots.coe.psu.ac.th/~montri

240-650: Chapter 3: Maximum-Likelihood and Bayesian Parameter Estimation

### Chapter 3

Maximum-Likelihood and Bayesian Parameter Estimation

### Introduction

• We could design an optimal classifier if we knew the prior probabilities P(ω_i) and the class-conditional densities p(x|ω_i)
• We rarely have such complete knowledge of the probabilistic structure of the problem
• We therefore estimate P(ω_i) and p(x|ω_i) from training data, or design samples

### Maximum-Likelihood Estimation

• Maximum-likelihood (ML) estimates nearly always have good convergence properties as the number of training samples increases
• ML estimation is often simpler than alternative methods

### The General Principle

• Suppose we separate a collection of samples according to class, so that we have c data sets D_1, …, D_c, with the samples in D_j having been drawn independently according to the probability law p(x|ω_j)
• We say such samples are i.i.d. – independent and identically distributed random variables

### The General Principle

• We assume that p(x|ω_j) has a known parametric form and is determined uniquely by the value of a parameter vector θ_j
• For example, we might assume p(x|ω_j) ~ N(μ_j, Σ_j), with θ_j consisting of the components of μ_j and Σ_j
• To show the dependence on θ_j explicitly, we write p(x|ω_j) as p(x|ω_j, θ_j)

### Problem Statement

• Use the information provided by the training samples to obtain good estimates of the unknown parameter vectors θ_1, …, θ_c associated with each category

### Simplified Problem Statement

• If we assume that samples in D_i give no information about θ_j for i ≠ j, we can handle each class independently
• This leaves c separate problems of the following form:

Use a set D of training samples drawn independently from the probability density p(x|θ) to estimate the unknown parameter vector θ.

Suppose that D contains n samples, x_1, …, x_n.
• Because the samples were drawn independently, we have

  p(D|θ) = ∏_{k=1}^{n} p(x_k|θ)

• Viewed as a function of θ, p(D|θ) is called the likelihood of θ with respect to the set of samples
• The maximum-likelihood estimate of θ is the value θ̂ that maximizes p(D|θ)
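Not part of the original slides: the product form of the likelihood is easy to verify numerically. The sketch below (the Gaussian model, the data, and all names are illustrative assumptions) evaluates p(D|θ) for a univariate Gaussian at two candidate means and confirms that the likelihood is larger near the mean that generated the data:

```python
import math
import random

def gauss_pdf(x, mu, sigma):
    """Univariate normal density p(x | mu, sigma)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def likelihood(data, mu, sigma):
    """p(D | theta): product of p(x_k | theta) over the i.i.d. samples."""
    p = 1.0
    for x in data:
        p *= gauss_pdf(x, mu, sigma)
    return p

random.seed(0)
data = [random.gauss(5.0, 1.0) for _ in range(50)]

# The likelihood of the generating mean beats that of a distant candidate
assert likelihood(data, 5.0, 1.0) > likelihood(data, 2.0, 1.0)
```

In practice one works with the log-likelihood, which turns this shrinking product into a sum; the slides introduce it shortly.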

Let θ = (θ_1, …, θ_p)^t
• Let ∇_θ be the gradient operator

  ∇_θ = [∂/∂θ_1, …, ∂/∂θ_p]^t

### Log-Likelihood Function

• We define l(θ) as the log-likelihood function

  l(θ) = ln p(D|θ)

• We can write our solution as

  θ̂ = arg max_θ l(θ)

### MLE

• From

  l(θ) = ln p(D|θ) = ln ∏_{k=1}^{n} p(x_k|θ)

• we have

  l(θ) = Σ_{k=1}^{n} ln p(x_k|θ)

• and

  ∇_θ l = Σ_{k=1}^{n} ∇_θ ln p(x_k|θ)

• Necessary condition for the MLE:

  ∇_θ l = 0
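Not from the slides: the necessary condition – the gradient of the log-likelihood vanishing at the MLE – can be checked numerically. A minimal sketch, assuming a univariate Gaussian with known σ = 1 and treating the sample mean as the candidate MLE (all names illustrative):

```python
import math
import random

def log_likelihood(data, mu, sigma=1.0):
    """l(theta) = sum of ln p(x_k | theta) for a univariate Gaussian."""
    return sum(-0.5 * math.log(2 * math.pi * sigma ** 2)
               - (x - mu) ** 2 / (2 * sigma ** 2) for x in data)

random.seed(1)
data = [random.gauss(3.0, 1.0) for _ in range(200)]
mu_hat = sum(data) / len(data)  # candidate MLE: the sample mean

# Central-difference estimate of dl/dmu at mu_hat: should be ~0
h = 1e-6
grad = (log_likelihood(data, mu_hat + h) - log_likelihood(data, mu_hat - h)) / (2 * h)
assert abs(grad) < 1e-3
```

Perturbing mu_hat away from the sample mean makes the gradient nonzero, which is what singles the sample mean out as the maximizer.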

### The Gaussian Case: Unknown μ

• Suppose that the samples are drawn from a multivariate normal population with mean μ and covariance matrix Σ
• Let μ be the only unknown parameter
• Consider a sample point x_k; then

  ln p(x_k|μ) = -(1/2) ln[(2π)^d |Σ|] - (1/2)(x_k - μ)^t Σ^{-1} (x_k - μ)

• and

  ∇_μ ln p(x_k|μ) = Σ^{-1} (x_k - μ)

• The MLE of μ must satisfy

  Σ_{k=1}^{n} Σ^{-1} (x_k - μ̂) = 0

• After multiplying by Σ and rearranging

  μ̂ = (1/n) Σ_{k=1}^{n} x_k

### Sample Mean

• The MLE for the unknown population mean is just the arithmetic average of the training samples (the sample mean)
• If we think of the n samples as a cloud of points, the sample mean is the centroid of that cloud
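Not from the slides: a short sketch of the centroid interpretation, using a 2-D cloud of points (the coordinates and names are illustrative assumptions):

```python
import random

random.seed(2)
# A "cloud" of 2-D points scattered around (1.0, -2.0)
cloud = [(random.gauss(1.0, 0.5), random.gauss(-2.0, 0.5)) for _ in range(1000)]

n = len(cloud)
# Sample mean = centroid: the per-coordinate arithmetic average
mu_hat = tuple(sum(p[d] for p in cloud) / n for d in range(2))

# The centroid sits close to the center of the generating distribution
assert abs(mu_hat[0] - 1.0) < 0.1
assert abs(mu_hat[1] + 2.0) < 0.1
```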

### The Gaussian Case: Unknown μ and Σ

• This is the more typical case, in which both the mean and the covariance matrix are unknown
• Consider the univariate case with θ_1 = μ and θ_2 = σ²; the log-likelihood of a single point is

  ln p(x_k|θ) = -(1/2) ln(2πθ_2) - (x_k - θ_1)²/(2θ_2)

• Its derivative is

  ∇_θ ln p(x_k|θ) = [ (x_k - θ_1)/θ_2 ,  -1/(2θ_2) + (x_k - θ_1)²/(2θ_2²) ]^t

• Setting the gradient of the full log-likelihood to 0 gives

  Σ_{k=1}^{n} (x_k - θ̂_1)/θ̂_2 = 0

• and

  -Σ_{k=1}^{n} 1/θ̂_2 + Σ_{k=1}^{n} (x_k - θ̂_1)²/θ̂_2² = 0

• With a little rearranging, we have

  μ̂ = (1/n) Σ_{k=1}^{n} x_k    and    σ̂² = (1/n) Σ_{k=1}^{n} (x_k - μ̂)²
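Not from the slides: the two rearranged estimates – the sample mean and the 1/n sample variance – can be checked on a tiny data set. Python's statistics.pvariance uses the same 1/n formula, which gives an independent check (the data values are illustrative):

```python
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(data)

mu_hat = sum(data) / n                              # MLE of the mean
var_hat = sum((x - mu_hat) ** 2 for x in data) / n  # MLE of the variance (1/n)

assert mu_hat == 5.0
assert var_hat == 4.0
assert var_hat == statistics.pvariance(data)  # pvariance also divides by n
```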

### MLE for the Multivariate Case

• For the multivariate Gaussian, the maximum-likelihood estimates are

  μ̂ = (1/n) Σ_{k=1}^{n} x_k    and    Σ̂ = (1/n) Σ_{k=1}^{n} (x_k - μ̂)(x_k - μ̂)^t

### Bias

• The MLE for the variance σ² is biased:

  E[σ̂²] = ((n-1)/n) σ² ≠ σ²

• That is, the expected value of the sample variance over all data sets of size n is not equal to the true variance
• An unbiased estimator for Σ is given by

  C = (1/(n-1)) Σ_{k=1}^{n} (x_k - μ̂)(x_k - μ̂)^t
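Not from the slides: the bias can be made visible by averaging the 1/n variance estimate over many small data sets, where E[σ̂²] = ((n-1)/n)σ² is a standard result. A simulation sketch (sample size, variance, and trial count are illustrative):

```python
import random

random.seed(3)
true_var = 4.0   # variance of the generating Gaussian
n = 5            # small samples make the bias easy to see
trials = 20000

total = 0.0
for _ in range(trials):
    xs = [random.gauss(0.0, true_var ** 0.5) for _ in range(n)]
    m = sum(xs) / n
    total += sum((x - m) ** 2 for x in xs) / n  # biased MLE: divides by n

avg_mle = total / trials
# Expected value of the MLE is ((n-1)/n) * sigma^2 = 0.8 * 4.0 = 3.2
assert abs(avg_mle - (n - 1) / n * true_var) < 0.1

# Rescaling by n/(n-1) (i.e., dividing by n-1 instead) removes the bias
avg_unbiased = avg_mle * n / (n - 1)
assert abs(avg_unbiased - true_var) < 0.15
```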
