PatReco: Estimation/Training

This presentation explains the estimation and training process for a Gaussian model using maximum likelihood (ML) estimation, maximum-a-posteriori (MAP) estimation, and Bayesian estimation. It discusses supervised and unsupervised training, fully observed and partially observed data, and the assumptions and equations involved in each method. It also highlights the importance of selecting a good prior for Bayesian adaptation and the advantages of each estimation approach.

Presentation Transcript


  1. PatReco: Estimation/Training Alexandros Potamianos, School of ECE, NTUA, Fall 2014-15

  2. Estimation/Training • Goal: Given observed data, (re-)estimate the parameters of the model; e.g., for a Gaussian model, estimate the mean and variance of each class

  3. Supervised-Unsupervised • Supervised training: All data has been (manually) labeled, i.e., assigned to classes • Unsupervised training: Data is not assigned a class label

  4. Observable data • Fully observed data: all information necessary for training is available (features, class labels, etc.) • Partially observed data: some of the features or some of the class labels are missing

  5. Supervised Training(fully observable data) • Maximum likelihood estimation (ML) • Maximum a posteriori estimation (MAP) • Bayesian estimation (BE)

  6. Training process • The data collected for training consists of the examples D = {x1, x2, …, xN} • Step 1: Label each example with the corresponding class label ω1, ω2, …, ωK • Step 2: For each class separately, estimate the model parameters using ML, MAP, or BE and the corresponding training examples D1, D2, …, DK

  7. Training Process: Step 1 D = {x1, x2, x3, x4, x5, …, xN} → label manually with ω1, ω2, …, ωK → D1 = {x11, x12, x13, …, x1N1}, D2 = {x21, x22, x23, …, x2N2}, …, DK = {xK1, xK2, xK3, …, xKNK}
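
A minimal sketch of Step 1 in code (Python/numpy); the names data, labels, and num_classes are illustrative assumptions, and the manual labeling itself is taken as given:

    import numpy as np

    def split_by_class(data, labels, num_classes):
        """Split D into the per-class training sets D1, ..., DK.

        data: (N, d) array of feature vectors
        labels: (N,) array of class indices in 0..K-1 (the manual labels)
        """
        return [data[labels == k] for k in range(num_classes)]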

  8. Training Process: Step 2 • Maximum likelihood: θ1 = argmaxθ1 P(D1|θ1) • Maximum-a-posteriori: θ1 = argmaxθ1 P(D1|θ1) P(θ1) • Bayesian estimation: P(x|D1) = ∫ P(x|θ1) P(θ1|D1) dθ1
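
To make the difference between the first two criteria concrete, here is a hedged sketch for the mean of a 1-D Gaussian with known variance; the Gaussian prior N(mu0, sigma0_2) and all argument names are assumptions for the example, not part of the slides:

    import numpy as np

    def ml_mean(D):
        # argmax_theta P(D|theta): the sample mean
        return np.mean(D)

    def map_mean(D, sigma2, mu0, sigma0_2):
        # argmax_theta P(D|theta) P(theta) with a N(mu0, sigma0_2) prior:
        # the prior pulls the estimate toward mu0 when the sample is small
        n = len(D)
        return (sigma0_2 * np.sum(D) + sigma2 * mu0) / (n * sigma0_2 + sigma2)

Bayesian estimation, by contrast, returns a full predictive density rather than a point estimate (see slides 13-14).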

  9. ML Estimation Assumptions • P(x|ωi) follows a parametric distribution with parameters θ • Dj (for j ≠ i) tells us nothing about P(x|ωi) (functional independence) • Observations x1, x2, x3, …, xN are i.i.d. (independent, identically distributed) • 4a (ML only!): θ is a quantity whose value is fixed but unknown

  10. ML estimation θ = argmaxθ P(θ|D) = argmaxθ P(D|θ) P(θ) = (by 4a) argmaxθ P(D|θ) = argmaxθ P(x1, x2, …, xN|θ) = (by 3) argmaxθ Πj P(xj|θ) ⇒ ∂[Πj P(xj|θ)]/∂θ = 0 ⇒ θ = …
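
In practice the product is maximized through its logarithm, which has the same maximizer since log is monotonic. The following toy check (an illustrative sketch; the data, grid, and names are all assumed) recovers the analytic answer, the sample mean, by a grid search over the log-likelihood of N(mu, 1):

    import numpy as np

    x = np.random.default_rng(0).normal(loc=2.0, scale=1.0, size=100)
    grid = np.linspace(0.0, 4.0, 4001)
    # log Prod_j P(xj|mu) = Sum_j log P(xj|mu) = -0.5 * Sum_j (xj - mu)^2 + const
    loglik = np.array([-0.5 * np.sum((x - mu) ** 2) for mu in grid])
    mu_ml = grid[np.argmax(loglik)]
    assert abs(mu_ml - x.mean()) < 1e-3   # matches the closed-form estimate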

  11. ML estimate for Gaussian pdf If P(x|ω) = N(μ, σ²) and θ = (μ, σ²), then 1-D: μ = (1/N) Σj=1..N xj, σ² = (1/N) Σj=1..N (xj - μ)² Multi-D: θ = (μ, Σ), μ = (1/N) Σj=1..N xj, Σ = (1/N) Σj=1..N (xj - μ)(xj - μ)^T
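
A direct numpy transcription of these formulas (a sketch; X, an (N, d) array of feature rows, is an assumed name). Note the 1/N normalization, as on the slide, rather than the unbiased 1/(N-1):

    import numpy as np

    def ml_gaussian(X):
        mu = X.mean(axis=0)                       # (1/N) Sum_j xj
        centered = X - mu
        Sigma = centered.T @ centered / len(X)    # (1/N) Sum_j (xj - mu)(xj - mu)^T
        return mu, Sigma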

  12. Bayesian Estimation Assumptions • P(x|ωi) follows a parametric distribution with parameters θ • Dj (for j ≠ i) tells us nothing about P(x|ωi) (functional independence) • Observations x1, x2, x3, …, xN are i.i.d. (independent, identically distributed) • 4b (MAP, BE): θ is a random variable whose prior distribution p(θ) is known

  13. Bayesian Estimation P(x|D) = ∫ P(x,θ|D) dθ = ∫ P(x|θ,D) P(θ|D) dθ = ∫ P(x|θ) P(θ|D) dθ STEP 1: P(θ) → P(θ|D), where P(θ|D) = P(D|θ) P(θ) / P(D) STEP 2: P(x|θ) → P(x|D)
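
When no closed form is available, both steps can be carried out numerically. Below is a toy grid-based sketch for a 1-D parameter θ, here the mean of N(θ, 1); the grid, prior, and data values are all illustrative assumptions:

    import numpy as np

    theta = np.linspace(-5.0, 5.0, 1001)          # discretized parameter space
    dt = theta[1] - theta[0]
    prior = np.exp(-0.5 * theta ** 2)             # P(theta), here N(0, 1) up to a constant
    D = np.array([1.2, 0.7, 1.9])

    # STEP 1: P(theta|D) = P(D|theta) P(theta) / P(D)
    loglik = np.array([-0.5 * np.sum((D - t) ** 2) for t in theta])
    posterior = np.exp(loglik) * prior
    posterior /= (posterior * dt).sum()           # normalizing plays the role of 1/P(D)

    # STEP 2: P(x|D) = integral of P(x|theta) P(theta|D) dtheta, at a query point x
    x = 1.0
    px_given_theta = np.exp(-0.5 * (x - theta) ** 2) / np.sqrt(2.0 * np.pi)
    px_given_D = (px_given_theta * posterior * dt).sum()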

  14. Bayesian Estimate for Gaussian pdf and priors If P(x|θ) = N(μ, σ²) with σ² known and prior p(θ) = p(μ) = N(μ0, σ0²), then STEP 1: P(θ|D) = N(μn, σn²) STEP 2: P(x|D) = N(μn, σ² + σn²) where μn = [σ0²/(n σ0² + σ²)] Σj xj + [σ²/(n σ0² + σ²)] μ0 and σn² = σ² σ0²/(n σ0² + σ²) For large n (number of training samples), maximum likelihood and Bayesian estimation become equivalent!
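
The same update written out as a small function (a sketch; argument names are illustrative, with sigma2 the known data variance and (mu0, sigma0_2) the prior on the mean):

    import numpy as np

    def bayes_gaussian_mean(D, sigma2, mu0, sigma0_2):
        n = len(D)
        denom = n * sigma0_2 + sigma2
        mu_n = (sigma0_2 * np.sum(D) + sigma2 * mu0) / denom
        sigma_n2 = sigma2 * sigma0_2 / denom
        # predictive density: P(x|D) = N(mu_n, sigma2 + sigma_n2)
        return mu_n, sigma_n2

As n grows, sigma_n2 goes to 0 and mu_n approaches the sample mean, i.e., the ML estimate, which is the large-n equivalence noted on the slide.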

  15. Conclusions • Maximum likelihood estimation is simple and gives good estimates when the number of training samples is large • Bayesian adaptation gives good estimates even for small amounts of training data, provided that a good prior is selected • Bayesian adaptation is hard and often has no closed-form solution (in which case try iterative/recursive Bayesian estimation)
