
Bayesian Learning & Estimation Theory


Presentation Transcript


  1. Bayesian Learning & Estimation Theory

  2. Maximum likelihood estimation
  • Example: for a Gaussian likelihood P(x|θ) = N(x | μ, σ²), the log-likelihood is L = Σ_n ln N(x_n | μ, σ²), and the ML estimate is the θ that maximizes L
  • Objective of regression: minimize the error E(w) = ½ Σ_n ( t_n − y(x_n, w) )²
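A minimal numpy sketch of the Gaussian example (the data and variable names are illustrative, not from the slides): the parameters that maximize L are the sample mean and the sample variance.

```python
import numpy as np

# Illustrative data, assumed drawn from N(x | mu, sigma^2)
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=0.5, size=1000)

# Maximum likelihood estimates for a Gaussian:
# mu_ML is the sample mean, sigma2_ML the (biased) sample variance
mu_ml = x.mean()
sigma2_ml = ((x - mu_ml) ** 2).mean()

print(mu_ml, sigma2_ml)  # close to 2.0 and 0.25
```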

  3. A probabilistic view of linear regression
  • Model the targets with Gaussian noise of precision β = 1/σ²: p(t|x, w, β) = Π_n N( t_n | y(x_n, w), β⁻¹ )
  • Compare to the error function: E(w) = ½ Σ_n ( t_n − y(x_n, w) )²
  • Since argmin_w E(w) = argmax_w p(t|x, w, β), regression is equivalent to ML estimation of w
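A small numerical check of this equivalence, under assumed details (a cubic model, synthetic data, and an arbitrary fixed β): the w that minimizes E(w) also maximizes the Gaussian log-likelihood.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 30)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.shape)

# Design matrix for a cubic model y(x, w) = w0 + w1 x + w2 x^2 + w3 x^3
Phi = np.vander(x, 4, increasing=True)

# w minimizing E(w) = 1/2 sum_n (t_n - y(x_n, w))^2  (least squares)
w_ls, *_ = np.linalg.lstsq(Phi, t, rcond=None)

def log_likelihood(w, beta=25.0):
    # sum_n ln N(t_n | y(x_n, w), beta^{-1})
    resid = t - Phi @ w
    n = len(t)
    return 0.5 * n * np.log(beta / (2 * np.pi)) - 0.5 * beta * (resid ** 2).sum()

# Perturbing w away from the least squares solution lowers the likelihood
print(log_likelihood(w_ls) >= log_likelihood(w_ls + 0.01))  # True
```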

  4. Bayesian learning
  • View the data D and parameter θ as random variables (for regression, D = (x, t) and θ = w)
  • The data induces a distribution over the parameter: P(θ|D) = P(D, θ) / P(D) ∝ P(D, θ)
  • Substituting P(D, θ) = P(D|θ) P(θ), we obtain Bayes’ theorem: P(θ|D) ∝ P(D|θ) P(θ), i.e. Posterior ∝ Likelihood × Prior

  5. Bayesian prediction
  • Predictions (e.g., predict t from x using data D) are mediated through the parameter: P(prediction|D) = ∫ P(prediction|θ) P(θ|D) dθ
  • Maximum a posteriori (MAP) estimation: θ_MAP = argmax_θ P(θ|D), giving P(prediction|D) ≈ P(prediction|θ_MAP)
  • Accurate when P(θ|D) is concentrated on θ_MAP
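A toy sketch of slides 4 and 5 under assumed details (a Bernoulli likelihood and a grid over θ, neither of which is from the slides): the posterior is likelihood × prior up to normalization, and the fully Bayesian prediction averages P(prediction|θ) over the posterior instead of plugging in θ_MAP.

```python
import numpy as np

# Grid over a Bernoulli parameter theta (hypothetical example)
theta = np.linspace(1e-3, 1 - 1e-3, 999)
prior = np.ones_like(theta)                        # flat prior P(theta)
prior /= prior.sum()

# Data D: 3 successes out of 4 trials
k, n = 3, 4
likelihood = theta ** k * (1 - theta) ** (n - k)   # P(D | theta)

# Bayes' theorem: posterior proportional to likelihood x prior
posterior = likelihood * prior
posterior /= posterior.sum()

# MAP estimate vs fully Bayesian prediction of the next outcome
theta_map = theta[np.argmax(posterior)]
p_next_map = theta_map                             # P(next = 1 | theta_MAP)
p_next_bayes = (theta * posterior).sum()           # integral of P(next = 1 | theta) P(theta | D)

print(theta_map, p_next_map, p_next_bayes)         # ~0.75 vs ~0.67
```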

  6. A probabilistic view of regularized regression
  • E(w) = ½ Σ_n ( t_n − y(x_n, w) )² + (λ/2) Σ_m w_m²
  • Prior: the w_m are IID Gaussian, p(w) = Π_m (1/√(2πλ⁻¹)) exp{ −λ w_m² / 2 }
  • The two terms of E(w) correspond (up to scale and sign) to ln p(t|x, w) and ln p(w)
  • Since argmin_w E(w) = argmax_w p(t|x, w) p(w), regularized regression is equivalent to MAP estimation of w
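A quick numerical check (illustrative data; the particular α, β, and degree are assumptions): the minimizer of the regularized error equals the MAP weights for a Gaussian prior when λ = α/β.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 25)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.shape)
Phi = np.vander(x, 6, increasing=True)      # y(x, w) = w0 + ... + w5 x^5

alpha, beta = 0.5, 25.0                     # prior and noise precisions (assumed values)
lam = alpha / beta                          # regularizer implied by the probabilistic view

# Regularized least squares: argmin_w  1/2 ||t - Phi w||^2 + lam/2 ||w||^2
w_reg = np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ t)

# MAP estimate: argmax_w  ln p(t|x,w) + ln p(w)  with Gaussian likelihood and prior
w_map = np.linalg.solve(beta * Phi.T @ Phi + alpha * np.eye(Phi.shape[1]), beta * Phi.T @ t)

print(np.allclose(w_reg, w_map))            # True
```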

  7. Bayesian linear regression
  • Likelihood: p(t|x, w, β) = Π_n N( t_n | y(x_n, w), β⁻¹ ); β specifies the precision of the data noise
  • Prior: p(w) = Π_{m=0..M} N( w_m | 0, α⁻¹ ); α specifies the precision of the weights
  • Posterior: p(w|x, t) ∝ p(t|x, w) p(w); this is an M+1 dimensional Gaussian density, computed using linear algebra (see textbook)
  • Prediction: p(t|x, D) = ∫ p(t|x, w) p(w|D) dw
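A sketch of these computations using the standard Gaussian linear-model results (S_N = (αI + βΦᵀΦ)⁻¹, m_N = βS_N Φᵀt); the function names and the synthetic data are illustrative, while the degree-9 polynomial with α = 5×10⁻³ and β = 11.1 matches the example on slide 9.

```python
import numpy as np

def bayesian_linear_regression(Phi, t, alpha, beta):
    """Posterior N(w | m_N, S_N) for prior N(w | 0, alpha^{-1} I) and noise precision beta."""
    M1 = Phi.shape[1]                                  # M + 1 weights
    S_N = np.linalg.inv(alpha * np.eye(M1) + beta * Phi.T @ Phi)
    m_N = beta * S_N @ Phi.T @ t
    return m_N, S_N

def predictive(phi_x, m_N, S_N, beta):
    """Predictive mean and variance at a new input with feature vector phi_x."""
    mean = phi_x @ m_N
    var = 1.0 / beta + phi_x @ S_N @ phi_x
    return mean, var

# Degree-9 polynomial with alpha = 5e-3 and beta = 11.1 (values quoted on slide 9)
rng = np.random.default_rng(3)
x = np.linspace(0, 1, 10)
t = np.sin(2 * np.pi * x) + rng.normal(scale=1 / np.sqrt(11.1), size=x.shape)
Phi = np.vander(x, 10, increasing=True)                # columns 1, x, ..., x^9

m_N, S_N = bayesian_linear_regression(Phi, t, alpha=5e-3, beta=11.1)
mean, var = predictive(np.vander([0.5], 10, increasing=True)[0], m_N, S_N, beta=11.1)
print(mean, np.sqrt(var))                              # predictive mean and std dev at x = 0.5
```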

  8. Example: y(x) = w_0 + w_1 x, with the posterior updated one data point at a time
  [Figure: the likelihood, the prior/posterior over (w_0, w_1), and y(x) sampled from the posterior, shown with no data, after the 1st point, after the 2nd point, ..., and after the 20th point]

  9. Example: y(x) = w_0 + w_1 x + … + w_M x^M
  • M = 9, α = 5×10⁻³: gives a reasonable range of functions
  • β = 11.1: known precision of the noise
  [Figure: mean and one standard deviation of the predictive distribution]

  10. Example: y(x) = w_0 + w_1 φ_1(x) + … + w_M φ_M(x), with Gaussian basis functions φ_m(x)
  [Figure: Gaussian basis functions on x ∈ [0, 1]]
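A minimal sketch of a design matrix built from Gaussian basis functions; the slide only names the basis type, so the usual form φ_m(x) = exp{−(x − μ_m)²/(2s²)}, the evenly spaced centres, and the width s are assumptions.

```python
import numpy as np

def gaussian_design_matrix(x, centres, s):
    """Columns: a constant, then phi_m(x) = exp(-(x - mu_m)^2 / (2 s^2))."""
    phi = np.exp(-(x[:, None] - centres[None, :]) ** 2 / (2 * s ** 2))
    return np.hstack([np.ones((len(x), 1)), phi])      # prepend a bias column for w0

x = np.linspace(0, 1, 50)
centres = np.linspace(0, 1, 9)                         # mu_1 ... mu_M, assumed evenly spaced
Phi = gaussian_design_matrix(x, centres, s=0.1)
print(Phi.shape)                                       # (50, 10): one column per weight w0..wM
```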

  11. How are we doing on the pass sequence?
  • Least squares regression…
  • Choosing a particular M and w seems wrong – we should hedge our bets
  • Cross validation reduced the training data, so the red line isn’t as accurate as it should be
  • The red line doesn’t reveal different levels of uncertainty in predictions
  [Figure: hand-labeled horizontal coordinate, t, with the least squares fit shown as the red line]

  12. How are we doing on the pass sequence?
  • Bayesian regression addresses the concerns above: it hedges over w rather than committing to one choice, it uses all of the training data, and its predictive distribution shows different levels of uncertainty
  [Figure: hand-labeled horizontal coordinate, t, for the least squares fit and for Bayesian regression]

  13. Estimation theory
  • Provided with a predictive distribution p(t|x), how do we estimate a single value for t?
  • Example: in the pass sequence, Cupid must aim at and hit the man in the white shirt, without hitting the man in the striped shirt
  • Define L(t, t*) as the loss incurred by estimating t* when the true value is t
  • Assuming p(t|x) is correct, the expected loss is E[L] = ∫ L(t, t*) p(t|x) dt
  • The minimum loss estimate is found by minimizing E[L] w.r.t. t*

  14. Squared loss
  • A common choice: L(t, t*) = ( t − t* )², so E[L] = ∫ ( t − t* )² p(t|x) dt
  • Not appropriate for Cupid’s problem
  • To minimize E[L], set its derivative to zero: dE[L]/dt* = −2 ∫ ( t − t* ) p(t|x) dt = 0, i.e. −∫ t p(t|x) dt + t* = 0
  • Minimum mean squared error (MMSE) estimate: t* = E[t|x] = ∫ t p(t|x) dt
  • For regression: t* = y(x, w)
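A small numeric check (discretizing p(t|x) on a grid, which is an assumption made for illustration) that the expected squared loss is minimized by the posterior mean E[t|x].

```python
import numpy as np

# A discretized predictive distribution p(t|x) on a grid (illustrative skewed mixture)
t_grid = np.linspace(-5, 10, 2001)
p = 0.7 * np.exp(-0.5 * (t_grid - 0.0) ** 2) + 0.3 * np.exp(-0.5 * ((t_grid - 4.0) / 2.0) ** 2)
p /= p.sum()

def expected_squared_loss(t_star):
    # E[L] = sum over the grid of (t - t*)^2 p(t|x)
    return ((t_grid - t_star) ** 2 * p).sum()

# Minimize E[L] over candidate estimates and compare with the mean E[t|x]
candidates = np.linspace(-5, 10, 2001)
t_best = candidates[np.argmin([expected_squared_loss(c) for c in candidates])]
t_mean = (t_grid * p).sum()
print(t_best, t_mean)   # the two agree up to grid resolution
```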

  15. Other loss functions
  [Figure: squared loss and absolute loss]

  16. Mean and median
  • Absolute loss, for samples t_1, …, t_7: L = |t* − t_1| + |t* − t_2| + |t* − t_3| + |t* − t_4| + |t* − t_5| + |t* − t_6| + |t* − t_7|
  • Consider moving t* to the left by ε: L decreases by 6ε and increases by ε
  • Changes in L are balanced when t* = t_4
  • The median of t under p(t|x) minimizes absolute loss
  • Important: the median is invariant to monotonic transformations of t
  [Figure: samples t_1 … t_7 on the t axis, with the mean and the median marked]
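A companion check for absolute loss (the same grid-based setup as above, assumed for illustration): the minimizer of the expected absolute loss matches the median of p(t|x), and the median of a monotonically transformed variable is the transform of the median.

```python
import numpy as np

# An illustrative skewed density over t (roughly lognormal), discretized on a grid
t_grid = np.linspace(0.1, 10, 2001)
p = np.exp(-0.5 * (np.log(t_grid) - 1.0) ** 2) / t_grid
p /= p.sum()

def expected_abs_loss(t_star):
    return (np.abs(t_grid - t_star) * p).sum()

t_best = t_grid[np.argmin([expected_abs_loss(c) for c in t_grid])]
cdf = np.cumsum(p)
t_median = t_grid[np.searchsorted(cdf, 0.5)]
print(t_best, t_median)                          # both close to the median, exp(1) ~ 2.72

# Invariance to monotonic transformations: the median of ln t equals ln(median of t)
u_median = np.log(t_grid)[np.searchsorted(cdf, 0.5)]
print(np.isclose(u_median, np.log(t_median)))    # True
```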

  17. D-dimensional estimation
  • Suppose t is D-dimensional, t = (t_1, …, t_D)
  • Example: 2-dimensional tracking
  • Approach 1: minimum marginal loss estimation – find the t_d* that minimizes ∫ L(t_d, t_d*) p(t_d|x) dt_d
  • Approach 2: minimum joint loss estimation – define a joint loss L(t, t*) and find the t* that minimizes ∫ L(t, t*) p(t|x) dt

  18. Questions?

  19. How are we doing on the pass sequence?
  • Bayesian regression and estimation enable us to track the man in the striped shirt based on labeled data
  • Can we track the man in the white shirt?
  [Figure: the feature x is the fraction of pixels in each column with intensity > 0.9, over horizontal locations 0 to 320; its 1st moment gives x = 224, while the hand-labeled horizontal coordinate is t = 290 – the man in the white shirt is occluded]
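A sketch of the feature described on this slide, under assumptions about the frame format (a 2-D grayscale array scaled to [0, 1]; the 0.9 threshold is from the slide, the array and function names are not).

```python
import numpy as np

def column_feature(frame, threshold=0.9):
    """Fraction of bright pixels in each column, and its 1st moment (a horizontal location)."""
    bright = frame > threshold                    # frame: 2-D grayscale array in [0, 1]
    frac = bright.mean(axis=0)                    # fraction of bright pixels per column
    cols = np.arange(frame.shape[1])              # horizontal locations 0 .. width-1
    x = (cols * frac).sum() / frac.sum()          # 1st moment of the column profile
    return frac, x

# Illustrative 240 x 320 frame with a bright vertical band near column 224
frame = np.zeros((240, 320))
frame[:, 220:228] = 1.0
_, x = column_feature(frame)
print(x)                                          # ~223.5
```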

  20. How are we doing on the pass sequence?
  • Bayesian regression and estimation enable us to track the man in the striped shirt based on labeled data
  • Can we track the man in the white shirt? Not very well: regression fails to identify that there really are two classes of solution
  [Figure: hand-labeled horizontal coordinate, t, plotted against the feature, x]
