PatReco: Estimation/Training

This presentation explains the estimation and training process for a Gaussian model using maximum likelihood (ML) estimation, maximum-a-posteriori (MAP) estimation, and Bayesian estimation. It discusses supervised and unsupervised training, fully observed and partially observed data, and the assumptions and equations involved in each method. It also highlights the importance of selecting a good prior for Bayesian adaptation and the advantages of each estimation approach.

Presentation Transcript


  1. PatReco: Estimation/Training Alexandros Potamianos, School of ECE, NTUA, Fall 2014-15

  2. Estimation/Training • Goal: Given observed data, (re-)estimate the parameters of the model; e.g., for a Gaussian model, estimate the mean and variance of each class

  3. Supervised-Unsupervised • Supervised training: All data has been (manually) labeled, i.e., assigned to classes • Unsupervised training: Data is not assigned a class label

  4. Observable data • Fully observed data: all information necessary for training is available (features, class labels, etc.) • Partially observed data: some of the features or some of the class labels are missing

  5. Supervised Training(fully observable data) • Maximum likelihood estimation (ML) • Maximum a posteriori estimation (MAP) • Bayesian estimation (BE)

  6. Training process • The data collected for training consists of the examples D = {x1, x2, …, xN} • Step 1: Label each example with the corresponding class label ω1, ω2, …, ωK • Step 2: For each class separately, estimate the model parameters using ML, MAP, or BE and the corresponding training examples D1, D2, …, DK

  7. Training Process: Step 1 D = {x1, x2, x3, x4, x5, …, xN} → label manually with ω1, ω2, …, ωK → D1 = {x11, x12, x13, …, x1N1}, D2 = {x21, x22, x23, …, x2N2}, …, DK = {xK1, xK2, xK3, …, xKNK}
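
A minimal sketch of Step 1 in code (Python/numpy); the names data, labels, and num_classes are illustrative assumptions, and the manual labeling itself is taken as given:

    import numpy as np

    def split_by_class(data, labels, num_classes):
        """Split D into the per-class training sets D1, ..., DK.

        data: (N, d) array of feature vectors
        labels: (N,) array of class indices in 0..K-1 (the manual labels)
        """
        return [data[labels == k] for k in range(num_classes)]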

  8. Training Process: Step 2 • Maximum likelihood: θ1 = argmaxθ1 P(D1|θ1) • Maximum-a-posteriori: θ1 = argmaxθ1 P(D1|θ1) P(θ1) • Bayesian estimation: P(x|D1) = ∫ P(x|θ1) P(θ1|D1) dθ1
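
To make the difference between the first two criteria concrete, here is a hedged sketch for the mean of a 1-D Gaussian with known variance; the Gaussian prior N(mu0, sigma0_2) and all argument names are assumptions for the example, not part of the slides:

    import numpy as np

    def ml_mean(D):
        # argmax_theta P(D|theta): the sample mean
        return np.mean(D)

    def map_mean(D, sigma2, mu0, sigma0_2):
        # argmax_theta P(D|theta) P(theta) with a N(mu0, sigma0_2) prior:
        # the prior pulls the estimate toward mu0 when the sample is small
        n = len(D)
        return (sigma0_2 * np.sum(D) + sigma2 * mu0) / (n * sigma0_2 + sigma2)

Bayesian estimation, by contrast, returns a full predictive density rather than a point estimate (see slides 13-14).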

  9. ML Estimation Assumptions • P(x|ωi) follows a parametric distribution with parameters θ • Dj (for j ≠ i) tells us nothing about P(x|ωi) (functional independence) • Observations x1, x2, x3, …, xN are i.i.d. (independent, identically distributed) • 4a (ML only!): θ is a quantity whose value is fixed but unknown

  10. ML estimation θ = argmaxθ P(θ|D) = argmaxθ P(D|θ) P(θ) = (by 4a) argmaxθ P(D|θ) = argmaxθ P(x1, x2, …, xN|θ) = (by 3) argmaxθ Πj P(xj|θ) ⇒ ∂[Πj P(xj|θ)]/∂θ = 0 ⇒ θ = …
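
In practice the product is maximized through its logarithm, which has the same maximizer since log is monotonic. The following toy check (an illustrative sketch; the data, grid, and names are all assumed) recovers the analytic answer, the sample mean, by a grid search over the log-likelihood of N(mu, 1):

    import numpy as np

    x = np.random.default_rng(0).normal(loc=2.0, scale=1.0, size=100)
    grid = np.linspace(0.0, 4.0, 4001)
    # log Prod_j P(xj|mu) = Sum_j log P(xj|mu) = -0.5 * Sum_j (xj - mu)^2 + const
    loglik = np.array([-0.5 * np.sum((x - mu) ** 2) for mu in grid])
    mu_ml = grid[np.argmax(loglik)]
    assert abs(mu_ml - x.mean()) < 1e-3   # matches the closed-form estimate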

  11. ML estimate for Gaussian pdf If P(x|ω) = N(μ, σ²) and θ = (μ, σ²), then 1-D: μ = (1/N) Σj=1..N xj, σ² = (1/N) Σj=1..N (xj - μ)² Multi-D: θ = (μ, Σ), μ = (1/N) Σj=1..N xj, Σ = (1/N) Σj=1..N (xj - μ)(xj - μ)^T
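
A direct numpy transcription of these formulas (a sketch; X, an (N, d) array of feature rows, is an assumed name). Note the 1/N normalization, as on the slide, rather than the unbiased 1/(N-1):

    import numpy as np

    def ml_gaussian(X):
        mu = X.mean(axis=0)                       # (1/N) Sum_j xj
        centered = X - mu
        Sigma = centered.T @ centered / len(X)    # (1/N) Sum_j (xj - mu)(xj - mu)^T
        return mu, Sigma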

  12. Bayesian Estimation Assumptions • P(x|ωi) follows a parametric distribution with parameters θ • Dj (for j ≠ i) tells us nothing about P(x|ωi) (functional independence) • Observations x1, x2, x3, …, xN are i.i.d. (independent, identically distributed) • 4b (MAP, BE): θ is a random variable whose prior distribution p(θ) is known

  13. Bayesian Estimation P(x|D) = ∫ P(x,θ|D) dθ = ∫ P(x|θ,D) P(θ|D) dθ = ∫ P(x|θ) P(θ|D) dθ STEP 1: P(θ) → P(θ|D), where P(θ|D) = P(D|θ) P(θ) / P(D) STEP 2: P(x|θ) → P(x|D)
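
When no closed form is available, both steps can be carried out numerically. Below is a toy grid-based sketch for a 1-D parameter θ, here the mean of N(θ, 1); the grid, prior, and data values are all illustrative assumptions:

    import numpy as np

    theta = np.linspace(-5.0, 5.0, 1001)          # discretized parameter space
    dt = theta[1] - theta[0]
    prior = np.exp(-0.5 * theta ** 2)             # P(theta), here N(0, 1) up to a constant
    D = np.array([1.2, 0.7, 1.9])

    # STEP 1: P(theta|D) = P(D|theta) P(theta) / P(D)
    loglik = np.array([-0.5 * np.sum((D - t) ** 2) for t in theta])
    posterior = np.exp(loglik) * prior
    posterior /= (posterior * dt).sum()           # normalizing plays the role of 1/P(D)

    # STEP 2: P(x|D) = integral of P(x|theta) P(theta|D) dtheta, at a query point x
    x = 1.0
    px_given_theta = np.exp(-0.5 * (x - theta) ** 2) / np.sqrt(2.0 * np.pi)
    px_given_D = (px_given_theta * posterior * dt).sum()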

  14. Bayesian Estimate for Gaussian pdf and priors If P(x|θ) = N(μ, σ²) with σ² known and prior p(θ) = p(μ) = N(μ0, σ0²), then STEP 1: P(θ|D) = N(μn, σn²) STEP 2: P(x|D) = N(μn, σ² + σn²) where μn = [σ0²/(n σ0² + σ²)] Σj xj + [σ²/(n σ0² + σ²)] μ0 and σn² = σ² σ0²/(n σ0² + σ²) For large n (number of training samples), maximum likelihood and Bayesian estimation become equivalent!
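
The same update written out as a small function (a sketch; argument names are illustrative, with sigma2 the known data variance and (mu0, sigma0_2) the prior on the mean):

    import numpy as np

    def bayes_gaussian_mean(D, sigma2, mu0, sigma0_2):
        n = len(D)
        denom = n * sigma0_2 + sigma2
        mu_n = (sigma0_2 * np.sum(D) + sigma2 * mu0) / denom
        sigma_n2 = sigma2 * sigma0_2 / denom
        # predictive density: P(x|D) = N(mu_n, sigma2 + sigma_n2)
        return mu_n, sigma_n2

As n grows, sigma_n2 goes to 0 and mu_n approaches the sample mean, i.e., the ML estimate, which is the large-n equivalence noted on the slide.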

  15. Conclusions • Maximum likelihood estimation is simple and gives good estimates when the number of training samples is large • Bayesian adaptation gives good estimates even for small amounts of training data, provided that a good prior is selected • Bayesian adaptation is hard and often has no closed-form solution (in which case try iterative/recursive Bayesian estimation)
