
Parameter Estimation: Bayesian Estimation – Chapter 3 (Duda et al.), Sections 3.3-3.7


Presentation Transcript


  1. Parameter Estimation: Bayesian Estimation – Chapter 3 (Duda et al.), Sections 3.3-3.7. CS479/679 Pattern Recognition, Dr. George Bebis

  2. Parameter Estimation: Main Methods • Maximum Likelihood (ML) • Views the parameters θ as quantities whose values are fixed but unknown. • Estimates θ by maximizing the likelihood of obtaining the samples observed. • Bayesian Estimation (BE) • Views the parameters θ as random variables having some known prior distribution p(θ). • Observing new samples D converts the prior p(θ) into a posterior density p(θ|D) (i.e., the samples D revise our estimate of the parameters).

  3. Parameter Estimation: Main Methods (cont’d) • Before we observe the data, the parameters are described by a prior density p(θ). • Once we obtain data, we make use of Bayes’ theorem to find the posterior: p(θ|D) = p(D|θ) p(θ) / ∫ p(D|θ) p(θ) dθ • Ideally we want the data to sharpen the posterior p(θ|D), that is, reduce our uncertainty about the parameters (figure: a broad prior p(θ) becoming a sharply peaked posterior p(θ|D) as data are observed).

  4. Role of Training Examples in Classification • Bayes’ rule allows us to compute the posterior probabilities P(ωi|x): P(ωi|x) = p(x|ωi) P(ωi) / Σj p(x|ωj) P(ωj) • Consider the role of the training examples D by introducing them into the computation of the posterior probabilities: P(ωi|x, D) = p(x|ωi, D) P(ωi|D) / Σj p(x|ωj, D) P(ωj|D)
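The following is a minimal sketch (not from the slides) of the class-posterior computation above, assuming one-dimensional Gaussian class-conditional densities; the function names and the two-class parameters are hypothetical.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """1-D Gaussian density N(mu, sigma^2) evaluated at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def class_posteriors(x, class_params, priors):
    """Bayes' rule: P(w_i|x) = p(x|w_i) P(w_i) / sum_j p(x|w_j) P(w_j)."""
    likelihoods = [gaussian_pdf(x, mu, sigma) for (mu, sigma) in class_params]
    joint = [lik * prior for lik, prior in zip(likelihoods, priors)]
    evidence = sum(joint)                     # p(x), the normalizing term
    return [j / evidence for j in joint]

# Hypothetical two-class example: p(x|w1) ~ N(0, 1), p(x|w2) ~ N(3, 1), equal priors.
print(class_posteriors(1.0, [(0.0, 1.0), (3.0, 1.0)], [0.5, 0.5]))
```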

  5. Role of Training Examples (cont’d) • Marginalizing over the classes gives P(ωi|x, D) = p(x|ωi, D) P(ωi|D) / Σj p(x|ωj, D) P(ωj|D). • Assuming the samples in Dj carry no information about class ωi (i ≠ j), each density can be estimated using only the samples from its own class: P(ωi|x, D) = p(x|ωi, Di) P(ωi|Di) / Σj p(x|ωj, Dj) P(ωj|Dj)

  6. Role of Training Examples (cont’d) • The training examples are important in determining both the class-conditional densities and the prior probabilities: P(ωi|x, D) = p(x|ωi, Di) P(ωi|Di) / Σj p(x|ωj, Dj) P(ωj|Dj) • For simplicity, replace P(ωi|Di) with P(ωi): P(ωi|x, D) = p(x|ωi, Di) P(ωi) / Σj p(x|ωj, Dj) P(ωj)

  7. Bayesian Estimation (BE) • We need to estimate p(x|ωi, Di) for every class ωi. • Since the samples in Dj give no information about θi (i ≠ j), we need to solve c independent problems of the form: “Given D, estimate p(x|D)”

  8. BE Approach • Estimate p(x|D) by marginalizing over the parameters: p(x|D) = ∫ p(x, θ|D) dθ = ∫ p(x|θ, D) p(θ|D) dθ • Since p(x|θ, D) = p(x|θ) (the model assumed, i.e., Gaussian), we have: p(x|D) = ∫ p(x|θ) p(θ|D) dθ
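As an illustration of the integral above, here is a small sketch (not from the slides) that approximates p(x|D) = ∫ p(x|θ) p(θ|D) dθ numerically on a grid of θ values; the grid, the assumed Gaussian model p(x|θ), and the tabulated posterior are all hypothetical.

```python
import numpy as np

def predictive_density(x, theta_grid, posterior, likelihood):
    """Approximate p(x|D) = ∫ p(x|θ) p(θ|D) dθ on a discrete grid of θ values."""
    lik = np.array([likelihood(x, th) for th in theta_grid])   # p(x|θ) at each grid point
    dtheta = theta_grid[1] - theta_grid[0]
    return np.sum(lik * posterior) * dtheta                    # Riemann-sum approximation

# Hypothetical setup: p(x|θ) ~ N(θ, 1) and a posterior p(θ|D) tabulated on the grid.
theta_grid = np.linspace(-5, 5, 1001)
posterior = np.exp(-0.5 * (theta_grid - 1.0) ** 2 / 0.2)       # unnormalized, roughly N(1, 0.2)
posterior /= posterior.sum() * (theta_grid[1] - theta_grid[0]) # normalize numerically
gauss = lambda x, th: np.exp(-0.5 * (x - th) ** 2) / np.sqrt(2 * np.pi)
print(predictive_density(0.5, theta_grid, posterior, gauss))
```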

  9. BE vs ML/MAP • ML/MAP makes a point estimate: θ̂ = argmaxθ p(D|θ) (ML) or θ̂ = argmaxθ p(D|θ) p(θ) (MAP) • BE estimates a distribution: p(x|D) = ∫ p(x|θ) p(θ|D) dθ • Note that the BE solution might not be of the exact parametric form assumed.

  10. Interpretation of BE Solution • The BE solution implies that if we are less certain about the exact value of θ, we should consider a weighted average of p(x|θ) over the possible values of θ: p(x|D) = ∫ p(x|θ) p(θ|D) dθ • The samples D exert their influence on p(x|D) through p(θ|D).

  11. Relation to the ML Solution • If p(D|θ) peaks sharply at θ̂ (the ML solution), then p(θ|D) will, in general, peak sharply at θ̂ too (assuming p(θ) is broad and smooth), so that p(x|D) ≈ p(x|θ̂). • Therefore, ML is a special case of BE!

  12. BE Main Steps • (1) Compute p(θ|D): p(θ|D) = p(D|θ) p(θ) / ∫ p(D|θ) p(θ) dθ • (2) Compute p(x|D): p(x|D) = ∫ p(x|θ) p(θ|D) dθ

  13. Case 1: Univariate Gaussian, Unknown μ (known σ²) • Assume p(x|μ) ~ N(μ, σ²) with σ² known, and a prior p(μ) ~ N(μ0, σ0²). • D = {x1, x2, …, xn} (independently drawn) • (1) Compute p(μ|D) ∝ p(D|μ) p(μ) = [∏k p(xk|μ)] p(μ)

  14. Case 1: Univariate Gaussian, Unknown μ (cont’d) • It can be shown that p(μ|D) has the following form: p(μ|D) ~ N(μn, σn²), where μn = (nσ0² / (nσ0² + σ²)) μ̂n + (σ² / (nσ0² + σ²)) μ0, σn² = σ0²σ² / (nσ0² + σ²), and μ̂n = (1/n) Σk xk is the sample mean. • p(μ|D) peaks at μn.

  15. Case 1: Univariate Gaussian, Unknown μ (cont’d) • μn is a weighted combination of the sample mean μ̂n and the prior mean μ0 (i.e., it lies between them). • As n → ∞, μn → μ̂n (the ML estimate) and σn² → 0.

  16. Case 1: Univariate Gaussian, Unknown μ (cont’d) • Bayesian learning: as the number of samples n increases, p(μ|D) becomes more and more sharply peaked (figure: posterior densities p(μ|D) for increasing n).

  17. Case 1: Univariate Gaussian, Unknown μ (cont’d) • (2) Compute p(x|D) = ∫ p(x|μ) p(μ|D) dμ (the factor independent of μ can be pulled out of the integral). • Note that we assumed p(x|μ) ~ N(μ, σ²); however, p(x|D) ~ N(μn, σ² + σn²). • As the number of samples increases, σn² → 0 and p(x|D) converges to p(x|μ).
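A short sketch of Case 1 in code, using the closed-form expressions for μn and σn² given above; the data, prior parameters, and function name are hypothetical.

```python
import numpy as np

def univariate_bayes_mean(samples, sigma2, mu0, sigma0_2):
    """Posterior p(mu|D) ~ N(mu_n, sigma_n^2) for a Gaussian likelihood with known
    variance sigma2 and a Gaussian prior N(mu0, sigma0_2) on the mean."""
    n = len(samples)
    mu_hat = np.mean(samples)                                   # ML estimate (sample mean)
    mu_n = (n * sigma0_2 * mu_hat + sigma2 * mu0) / (n * sigma0_2 + sigma2)
    sigma_n2 = (sigma0_2 * sigma2) / (n * sigma0_2 + sigma2)
    return mu_n, sigma_n2

# Hypothetical data drawn from N(2, 1); prior p(mu) ~ N(0, 10).
data = np.random.normal(2.0, 1.0, size=20)
mu_n, sigma_n2 = univariate_bayes_mean(data, sigma2=1.0, mu0=0.0, sigma0_2=10.0)
# Predictive density: p(x|D) ~ N(mu_n, sigma^2 + sigma_n^2)
print(mu_n, sigma_n2, 1.0 + sigma_n2)
```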

  18. Case 2: Multivariate Gaussian, Unknown μ • Assume p(x|μ) ~ N(μ, Σ) and p(μ) ~ N(μ0, Σ0). • D = {x1, x2, …, xn} (independently drawn) • (1) Compute p(μ|D):

  19. Case 2: Multivariate Gaussian, Unknown μ (cont’d) • It can be shown that p(μ|D) has the following form: p(μ|D) ~ N(μn, Σn), where: μn = Σ0 (Σ0 + (1/n)Σ)⁻¹ μ̂n + (1/n)Σ (Σ0 + (1/n)Σ)⁻¹ μ0, Σn = Σ0 (Σ0 + (1/n)Σ)⁻¹ (1/n)Σ, and μ̂n = (1/n) Σk xk is the sample mean.

  20. Case 2: Multivariate Gaussian, Unknown μ (cont’d) • (2) Compute p(x|D) = ∫ p(x|μ) p(μ|D) dμ. • Note that we assumed p(x|μ) ~ N(μ, Σ); however, p(x|D) ~ N(μn, Σ + Σn). • As the number of samples increases, p(x|D) converges to p(x|μ).
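A corresponding sketch for the multivariate case, using the expressions for μn and Σn above; the data and prior parameters are hypothetical.

```python
import numpy as np

def multivariate_bayes_mean(X, Sigma, mu0, Sigma0):
    """Posterior p(mu|D) ~ N(mu_n, Sigma_n) for x ~ N(mu, Sigma) (Sigma known)
    and prior mu ~ N(mu0, Sigma0).  X has one sample per row."""
    n = X.shape[0]
    mu_hat = X.mean(axis=0)                                     # ML estimate of the mean
    A = np.linalg.inv(Sigma0 + Sigma / n)                       # (Sigma0 + Sigma/n)^{-1}
    mu_n = Sigma0 @ A @ mu_hat + (Sigma / n) @ A @ mu0
    Sigma_n = Sigma0 @ A @ (Sigma / n)
    return mu_n, Sigma_n

# Hypothetical 2-D data with known Sigma = I and a broad prior.
X = np.random.multivariate_normal([1.0, -1.0], np.eye(2), size=30)
mu_n, Sigma_n = multivariate_bayes_mean(X, np.eye(2), np.zeros(2), 10 * np.eye(2))
# Predictive density: p(x|D) ~ N(mu_n, Sigma + Sigma_n)
print(mu_n, np.eye(2) + Sigma_n)
```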

  21. Recursive Bayes Learning • Idea: develop an incremental learning algorithm. Write Dⁿ = {x1, x2, …, xn−1, xn}, so that Dⁿ = Dⁿ⁻¹ ∪ {xn}. • Rewrite the likelihood as follows: p(Dⁿ|θ) = p(xn|θ) p(Dⁿ⁻¹|θ) • Substitute this into Bayes’ rule to express p(θ|Dⁿ) in terms of Dⁿ⁻¹.

  22. Recursive Bayes Learning (cont’d) • Substituting the conditional probability and marginalizing over θ in the denominator gives the recursion: p(θ|Dⁿ) = p(xn|θ) p(θ|Dⁿ⁻¹) / ∫ p(xn|θ) p(θ|Dⁿ⁻¹) dθ, for n = 1, 2, …, with p(θ|D⁰) = p(θ).
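A grid-based sketch of the recursion above (not from the slides); the Gaussian likelihood, the flat prior, and the sample values are hypothetical.

```python
import numpy as np

def recursive_bayes(prior, theta_grid, samples, likelihood):
    """Grid-based recursive Bayes learning:
    p(θ|D^n) ∝ p(x_n|θ) p(θ|D^{n-1}), starting from p(θ|D^0) = p(θ)."""
    dtheta = theta_grid[1] - theta_grid[0]
    posterior = prior.copy()
    history = [posterior]
    for x in samples:                                           # one update per new sample
        posterior = np.array([likelihood(x, th) for th in theta_grid]) * posterior
        posterior /= posterior.sum() * dtheta                   # normalize (marginalization term)
        history.append(posterior)
    return history

# Hypothetical example: p(x|θ) ~ N(θ, 1), flat prior on a grid, samples drawn from N(2, 1).
theta_grid = np.linspace(-5, 5, 1001)
prior = np.ones_like(theta_grid) / (theta_grid[-1] - theta_grid[0])
gauss = lambda x, th: np.exp(-0.5 * (x - th) ** 2) / np.sqrt(2 * np.pi)
posteriors = recursive_bayes(prior, theta_grid, np.random.normal(2.0, 1.0, 10), gauss)
print(theta_grid[np.argmax(posteriors[-1])])                    # posterior peaks near the true mean
```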

  23. Example • (figure: the prior density p(θ) before any samples are observed)

  24. Example (cont’d) • After observing the fourth sample (x4 = 8), the posterior sharpens further. • In general: p(θ|Dⁿ) ∝ p(xn|θ) p(θ|Dⁿ⁻¹)

  25. Example (cont’d) • p(θ|D⁴) peaks at the ML estimate θ̂ (figure: the posteriors p(θ|Dⁿ) over the iterations, starting from p(θ) = p(θ|D⁰)). • ML estimate: a single point θ̂. • Bayesian estimate: the full posterior density p(θ|Dⁿ), used to form p(x|D).
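The slides’ worked example is only partly recoverable here; the sketch below reproduces the same qualitative behaviour under the assumption of a uniform likelihood p(x|θ) ~ U(0, θ) and a flat prior, as in the recursive-learning example of Duda et al. Apart from x4 = 8, the sample values are hypothetical.

```python
import numpy as np

# Assumed setup (hedged): p(x|θ) = 1/θ for 0 <= x <= θ, prior p(θ) ~ U(0, 10).
theta_grid = np.linspace(0.01, 10.0, 1000)
dtheta = theta_grid[1] - theta_grid[0]
posterior = np.ones_like(theta_grid) / 10.0                     # flat prior p(θ|D^0) = p(θ)

samples = [4.0, 7.0, 2.0, 8.0]           # hypothetical values; only x4 = 8 appears in the slides
for x in samples:
    likelihood = np.where(theta_grid >= x, 1.0 / theta_grid, 0.0)  # p(x|θ) = 1/θ for θ >= x
    posterior = likelihood * posterior
    posterior /= posterior.sum() * dtheta                       # renormalize after each sample

# The posterior p(θ|D^4) peaks at the largest observed sample (the ML estimate).
print(theta_grid[np.argmax(posterior)])                         # ≈ max(samples) = 8
```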

  26. ML vs Bayesian Estimation • Number of training data • The two methods become equivalent given an infinite number of training samples (and prior distributions that do not exclude the true solution). • For small training sets, they give different results in most cases. • Computational complexity • ML uses differential calculus or gradient search to maximize the likelihood. • Bayesian estimation requires complex multidimensional integration techniques.

  27. ML vs Bayesian Estimation (cont’d) • Solution interpretation • ML solutions are easier to interpret (i.e., they must be of the assumed parametric form). • A Bayesian estimation solution might not be of the parametric form assumed. • Prior distribution • If the prior distribution p(θ) is uniform, Bayesian estimation solutions are equivalent to ML solutions. • In general, the two methods will give different solutions.

  28. Computational Complexity: ML Estimation • (dimensionality: d, # training samples: n, # classes: c; assume n > d) • Learning complexity: estimating the Gaussian parameters for one class involves the sample mean, O(dn); the sample covariance, O(d²n); inverting the covariance, O(d³); plus lower-order terms of O(d²), O(n), and O(1). • For n > d the total is dominated by the O(d²n) covariance computation. • These computations must be repeated c times (once for each class).
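A brief sketch (not from the slides) of where the dominant terms come from in per-class ML learning for a Gaussian model; the function name and data layout are hypothetical.

```python
import numpy as np

def ml_gaussian_learning(X):
    """Per-class ML learning for a Gaussian model; X has n rows (samples) and d columns."""
    n, d = X.shape
    mu_hat = X.mean(axis=0)                                     # sample mean: O(dn)
    centered = X - mu_hat
    Sigma_hat = centered.T @ centered / n                       # sample covariance: O(d^2 n)
    Sigma_inv = np.linalg.inv(Sigma_hat)                        # matrix inverse: O(d^3)
    log_det = np.linalg.slogdet(Sigma_hat)[1]                   # log-determinant: O(d^3)
    return mu_hat, Sigma_inv, log_det

# For n > d the O(d^2 n) covariance computation dominates; repeated once per class (c times).
```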

  29. Computational Complexity (cont’d) • (dimensionality: d, # training samples: n, # classes: c) • Classification complexity: evaluating the quadratic discriminant term costs O(d²); the remaining terms cost O(1). • These computations must be repeated c times (once per class), and the class with the maximum discriminant value is selected.

  30. Computational Complexity Bayesian Estimation • Learning complexity: higher than ML • Classification complexity: same as ML

  31. Main Sources of Error in Classifier Design • Bayes error: the error due to overlapping class-conditional densities p(x|ωi). • Model error: the error due to choosing an incorrect model. • Estimation error: the error due to incorrectly estimated parameters.
