
Learning Structural SVMs with Latent Variables


Presentation Transcript


  1. Learning Structural SVMs with Latent Variables Xionghao Liu

  2. Annotation Mismatch (Action Classification): input x, annotation y, latent variable h; for example, y = “jumping”. The desired output during test time is y, so there is a mismatch between the desired and the available annotations. The exact value of the latent variable is not “important”.

  3. Outline – Annotation Mismatch: Latent SVM • Optimization • Practice • Extensions. (Andrews et al., NIPS 2001; Smola et al., AISTATS 2005; Felzenszwalb et al., CVPR 2008; Yu and Joachims, ICML 2009)

  4. Weakly Supervised Data: input x, output y ∈ {-1, +1}, hidden h; for example, y = +1.

  5. Weakly Supervised Classification: feature Φ(x,h) and joint feature vector Ψ(x,y,h); for example, y = +1.

  6. Weakly Supervised Classification: for y = +1 the feature fills the top block of the joint feature vector, Ψ(x,+1,h) = [Φ(x,h); 0].

  7. Weakly Supervised Classification: for y = -1 the feature fills the bottom block, Ψ(x,-1,h) = [0; Φ(x,h)].

  8. Weakly Supervised Classification: score f : Ψ(x,y,h) → (-∞, +∞); optimize the score over all possible y and h.
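A minimal sketch of the joint feature construction from slides 5–8, assuming Φ(x,h) is already available as a NumPy vector (the function name and the block layout are illustrative):

```python
import numpy as np

def joint_feature(phi_xh, y):
    """Joint feature vector Psi(x, y, h) for binary y in {-1, +1}.

    Stacks Phi(x, h) into the block selected by the label:
    Psi(x, +1, h) = [Phi(x, h); 0] and Psi(x, -1, h) = [0; Phi(x, h)],
    so a single weight vector w can score both labels.
    """
    d = phi_xh.shape[0]
    psi = np.zeros(2 * d)
    if y == +1:
        psi[:d] = phi_xh
    else:
        psi[d:] = phi_xh
    return psi
```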

  9. Latent SVM: scoring function wᵀΨ(x,y,h) with parameters w. Prediction: (y(w), h(w)) = argmax_{y,h} wᵀΨ(x,y,h).
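A small sketch of this prediction rule, assuming finite label and latent spaces that can be enumerated and a user-supplied joint feature function psi(x, y, h); all names are illustrative, and a structured problem would replace the exhaustive loop with a specialised argmax:

```python
from itertools import product

def predict(w, x, labels, latent_values, psi):
    """Latent SVM prediction: (y(w), h(w)) = argmax_{y,h} w^T Psi(x, y, h)."""
    return max(product(labels, latent_values),
               key=lambda yh: w @ psi(x, yh[0], yh[1]))
```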

  10. Learning Latent SVM: training data {(x_i, y_i), i = 1, 2, …, n}. Empirical risk minimization: min_w Σ_i Δ(y_i, y_i(w)). There is no restriction on the loss function, so it can reflect the annotation mismatch.

  11. Learning Latent SVM: the empirical risk min_w Σ_i Δ(y_i, y_i(w)) is non-convex and the parameters cannot be regularized, so we look for a regularization-sensitive upper bound.

  12. Learning Latent SVM: add and subtract the score of the prediction, Δ(y_i, y_i(w)) = Δ(y_i, y_i(w)) + wᵀΨ(x_i, y_i(w), h_i(w)) - wᵀΨ(x_i, y_i(w), h_i(w)).

  13. Learning Latent SVM: Δ(y_i, y_i(w)) ≤ Δ(y_i, y_i(w)) + wᵀΨ(x_i, y_i(w), h_i(w)) - max_{h_i} wᵀΨ(x_i, y_i, h_i), since (y(w), h(w)) = argmax_{y,h} wᵀΨ(x,y,h) means the subtracted term can only decrease.

  14. Learning Latent SVM: min_w ||w||² + C Σ_i ξ_i, s.t. max_{y,h} [wᵀΨ(x_i, y, h) + Δ(y_i, y)] - max_{h_i} wᵀΨ(x_i, y_i, h_i) ≤ ξ_i. The parameters can now be regularized. Is this also convex?

  15. Learning Latent SVM: min_w ||w||² + C Σ_i ξ_i, s.t. max_{y,h} [wᵀΨ(x_i, y, h) + Δ(y_i, y)] - max_{h_i} wᵀΨ(x_i, y_i, h_i) ≤ ξ_i. The constraint is convex minus convex, so this is a difference-of-convex (DC) program.
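A sketch of the per-example upper bound appearing in that constraint, again assuming enumerable label and latent spaces and illustrative names psi and delta:

```python
from itertools import product

def latent_hinge(w, x, y_true, labels, latent_values, psi, delta):
    """Upper bound on Delta(y_i, y_i(w)):
    max_{y,h} [w^T Psi(x,y,h) + Delta(y_true,y)] - max_h w^T Psi(x,y_true,h).

    The first term is convex in w and the second is concave, so the bound
    is a difference of convex functions.
    """
    loss_augmented = max(w @ psi(x, y, h) + delta(y_true, y)
                         for y, h in product(labels, latent_values))
    completed = max(w @ psi(x, y_true, h) for h in latent_values)
    return loss_augmented - completed
```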

  16. Recap: scoring function wᵀΨ(x,y,h). Prediction: (y(w), h(w)) = argmax_{y,h} wᵀΨ(x,y,h). Learning: min_w ||w||² + C Σ_i ξ_i, s.t. wᵀΨ(x_i, y, h) + Δ(y_i, y) - max_{h_i} wᵀΨ(x_i, y_i, h_i) ≤ ξ_i for all y, h.

  17. Outline – Annotation Mismatch: Latent SVM • Optimization • Practice • Extensions

  18. Learning Latent SVM: min_w ||w||² + C Σ_i ξ_i, s.t. max_{y,h} [wᵀΨ(x_i, y, h) + Δ(y_i, y)] - max_{h_i} wᵀΨ(x_i, y_i, h_i) ≤ ξ_i. A difference-of-convex (DC) program.

  19. Concave-Convex Procedure: the objective max_{y,h} [wᵀΨ(x_i, y, h) + Δ(y_i, y)] - max_{h_i} wᵀΨ(x_i, y_i, h_i) is convex plus concave. Step 1: linearly upper-bound the concave part.

  20. Concave-Convex Procedure: Step 2: optimize the resulting convex upper bound.

  21. Concave-Convex Procedure: Step 3: recompute the linear upper bound of the concave part at the new estimate.

  22. Concave-Convex Procedure: repeat until convergence.

  23. Concave-Convex Procedure: how do we form the linear upper bound of the concave part?

  24. Linear Upper Bound of -max_{h_i} wᵀΨ(x_i, y_i, h_i): with the current estimate w_t, compute h_i* = argmax_{h_i} w_tᵀΨ(x_i, y_i, h_i). Then -wᵀΨ(x_i, y_i, h_i*) ≥ -max_{h_i} wᵀΨ(x_i, y_i, h_i) for every w, giving an upper bound that is linear in w and tight at w_t.
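A sketch of this latent-variable completion step, with illustrative names and an enumerable latent space:

```python
def impute_latent(w_t, x, y_true, latent_values, psi):
    """h* = argmax_h w_t^T Psi(x, y_true, h) at the current estimate w_t.

    -w^T Psi(x, y_true, h*) then upper-bounds -max_h w^T Psi(x, y_true, h)
    for every w, and the bound is linear in w and tight at w = w_t.
    """
    return max(latent_values, key=lambda h: w_t @ psi(x, y_true, h))
```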

  25. CCCP for Latent SVM: start with an initial estimate w_0. Repeat: (i) update h_i* = argmax_{h_i ∈ H} w_tᵀΨ(x_i, y_i, h_i); (ii) update w_{t+1} as the ε-optimal solution of min_w ||w||² + C Σ_i ξ_i, s.t. wᵀΨ(x_i, y_i, h_i*) - wᵀΨ(x_i, y, h) ≥ Δ(y_i, y) - ξ_i for all y, h. Repeat until convergence.
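A rough end-to-end sketch of this loop under strong simplifying assumptions: finite, enumerable label and latent spaces, and the inner convex problem solved by plain subgradient descent rather than the ε-optimal cutting-plane solver the slide refers to. All names (cccp_latent_svm, psi, delta, lr, …) are illustrative:

```python
import numpy as np
from itertools import product

def cccp_latent_svm(data, labels, latent_values, psi, delta, dim,
                    C=1.0, outer_iters=10, inner_iters=100, lr=1e-3):
    """CCCP for a latent SVM (rough sketch).

    data: list of (x, y) pairs; psi(x, y, h) returns a length-dim NumPy vector;
    delta(y_true, y) is the loss. The inner convex problem is only solved
    approximately, by subgradient descent on the convexified objective.
    """
    w = np.zeros(dim)
    for _ in range(outer_iters):
        # Step 1: impute latent variables at the current estimate w_t.
        imputed = [max(latent_values, key=lambda h: w @ psi(x, y, h))
                   for x, y in data]
        # Step 2: approximately minimise the convex upper bound
        #   ||w||^2 + C * sum_i [ max_{y,h}(w^T Psi + Delta) - w^T Psi(x_i, y_i, h_i*) ].
        for _ in range(inner_iters):
            grad = 2.0 * w
            for (x, y_true), h_star in zip(data, imputed):
                # Loss-augmented inference for the convex term.
                y_hat, h_hat = max(
                    product(labels, latent_values),
                    key=lambda yh: w @ psi(x, yh[0], yh[1]) + delta(y_true, yh[0]))
                grad += C * (psi(x, y_hat, h_hat) - psi(x, y_true, h_star))
            w = w - lr * grad
        # A full implementation would also check convergence of the objective here.
    return w
```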

  26. Thanks & Q&A
