
Loss-based Learning with Weak Supervision


Presentation Transcript


  1. Loss-based Learning with Weak Supervision M. Pawan Kumar

  2. Computer Vision Data [plot: Information (x-axis) vs. Log (Size) (y-axis)] Segmentation: ~ 2000

  3. Computer Vision Data [same plot] Bounding Box: ~ 1 M; Segmentation: ~ 2000

  4. Computer Vision Data [same plot] Image-Level (“Chair”, “Car”): > 14 M; Bounding Box: ~ 1 M; Segmentation: ~ 2000

  5. Computer Vision Data [same plot] Noisy Label: > 6 B; Image-Level: > 14 M; Bounding Box: ~ 1 M; Segmentation: ~ 2000

  6. Computer Vision Data • Detailed annotation is expensive • Sometimes annotation is impossible • Desired annotation keeps changing • Learn with missing information (latent variables)

  7. Outline • Two Types of Problems • Part I – Annotation Mismatch • Part II – Output Mismatch

  8. Annotation Mismatch Action Classification Input x Annotation y Latent h y = “jumping” Desired output during test time is y Mismatch between desired and available annotations Exact value of latent variable is not “important”

  9. Output Mismatch Action Classification Input x Annotation y Latent h y = “jumping”

  10. Output Mismatch Action Detection Input x Annotation y Latent h y = “jumping” Desired output during test time is (y,h) Mismatch between output and available annotations Exact value of latent variable is important

  11. Part I

  12. Outline – Annotation Mismatch • Latent SVM • Optimization • Practice • Extensions Andrews et al., NIPS 2001; Smola et al., AISTATS 2005; Felzenszwalb et al., CVPR 2008; Yu and Joachims, ICML 2009

  13. Weakly Supervised Data Input x Output y ∈ {-1,+1} Hidden h y = +1

  14. Weakly Supervised Classification x Feature Φ(x,h) h Joint Feature Vector Ψ(x,y,h) y = +1

  15. Weakly Supervised Classification Feature Φ(x,h) Joint Feature Vector Ψ(x,+1,h) = [Φ(x,h); 0] y = +1

  16. Weakly Supervised Classification Feature Φ(x,h) Joint Feature Vector Ψ(x,-1,h) = [0; Φ(x,h)] y = +1

  17. Weakly Supervised Classification Feature Φ(x,h) Joint Feature Vector Ψ(x,y,h) y = +1 Score f : Ψ(x,y,h) → (-∞, +∞) Optimize score over all possible y and h
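The block structure of slides 15–16 can be written down directly. The sketch below is a minimal NumPy illustration (not the tutorial's code), assuming a hypothetical feature function phi(x, h) that returns a 1-D array:

```python
# Minimal sketch of the joint feature map on slides 15-17, assuming a
# hypothetical feature function phi(x, h) that returns a 1-D NumPy array.
import numpy as np

def joint_feature(phi, x, y, h):
    """Psi(x, y, h): place phi(x, h) in the block selected by y in {-1, +1}."""
    f = phi(x, h)
    d = f.shape[0]
    psi = np.zeros(2 * d)
    if y == +1:
        psi[:d] = f      # Psi(x, +1, h) = [phi(x, h); 0]
    else:
        psi[d:] = f      # Psi(x, -1, h) = [0; phi(x, h)]
    return psi
```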

  18. Latent SVM Scoring function w^T Ψ(x,y,h) with parameters w Prediction y(w),h(w) = argmax_{y,h} w^T Ψ(x,y,h)
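Prediction on slide 18 is a joint maximization over y and h. A brute-force sketch, assuming a small enumerable latent space H and reusing the hypothetical joint_feature helper above:

```python
# Sketch of latent SVM prediction: (y(w), h(w)) = argmax_{y,h} w^T Psi(x, y, h).
# Assumes an enumerable latent space H and the joint_feature sketch above.
import numpy as np

def predict(w, x, H, phi, labels=(-1, +1)):
    best = (-np.inf, None, None)
    for y in labels:
        for h in H:
            score = float(w @ joint_feature(phi, x, y, h))
            if score > best[0]:
                best = (score, y, h)
    return best[1], best[2]   # y(w), h(w)
```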

  19. Learning Latent SVM Training data {(x_i,y_i), i = 1,2,…,n} Empirical risk minimization: min_w Σ_i Δ(y_i, y_i(w)) No restriction on the loss function Annotation mismatch

  20. Learning Latent SVM Empirical risk minimization: min_w Σ_i Δ(y_i, y_i(w)) Non-convex Parameters cannot be regularized Find a regularization-sensitive upper bound

  21. Learning Latent SVM Δ(y_i, y_i(w)) = Δ(y_i, y_i(w)) + w^T Ψ(x_i, y_i(w), h_i(w)) - w^T Ψ(x_i, y_i(w), h_i(w))

  22. Learning Latent SVM Δ(y_i, y_i(w)) ≤ Δ(y_i, y_i(w)) + w^T Ψ(x_i, y_i(w), h_i(w)) - max_{h_i} w^T Ψ(x_i, y_i, h_i), since y(w),h(w) = argmax_{y,h} w^T Ψ(x,y,h)

  23. Learning Latent SVM min_w ||w||^2 + C Σ_i ξ_i s.t. max_{y,h} [w^T Ψ(x_i,y,h) + Δ(y_i,y)] - max_{h_i} w^T Ψ(x_i,y_i,h_i) ≤ ξ_i Parameters can be regularized Is this also convex?

  24. Learning Latent SVM min_w ||w||^2 + C Σ_i ξ_i s.t. max_{y,h} [w^T Ψ(x_i,y,h) + Δ(y_i,y)] - max_{h_i} w^T Ψ(x_i,y_i,h_i) ≤ ξ_i Convex - Convex Difference of convex (DC) program
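Written in unconstrained form, the learning problem of slides 23–24 makes the difference-of-convex structure explicit; the LaTeX below is simply a clean restatement of the slide:

```latex
\min_{w} \; \|w\|^2 + C \sum_i \Big[
  \underbrace{\max_{y,h}\big( w^\top \Psi(x_i,y,h) + \Delta(y_i,y) \big)}_{\text{convex in } w}
  \;-\;
  \underbrace{\max_{h_i} w^\top \Psi(x_i,y_i,h_i)}_{\text{convex in } w}
\Big]
```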

  25. Recap Scoring function w^T Ψ(x,y,h) Prediction y(w),h(w) = argmax_{y,h} w^T Ψ(x,y,h) Learning min_w ||w||^2 + C Σ_i ξ_i s.t. w^T Ψ(x_i,y,h) + Δ(y_i,y) - max_{h_i} w^T Ψ(x_i,y_i,h_i) ≤ ξ_i for all y, h
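As a sanity check of the recap, the slack ξ_i of a sample can be evaluated by enumeration. The sketch below makes the same assumptions as the earlier sketches (small H, hypothetical joint_feature) and takes a user-supplied loss delta(y_i, y):

```python
# Sketch: per-sample upper bound (slack) from slides 23-25, by enumeration.
# Assumes the joint_feature sketch above and a user-supplied loss delta(y_i, y).
def sample_slack(w, x, y_true, H, phi, delta, labels=(-1, +1)):
    loss_augmented = max(float(w @ joint_feature(phi, x, y, h)) + delta(y_true, y)
                         for y in labels for h in H)
    best_annotated = max(float(w @ joint_feature(phi, x, y_true, h)) for h in H)
    return loss_augmented - best_annotated
```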

  26. Outline – Annotation Mismatch • Latent SVM • Optimization • Practice • Extensions

  27. Learning Latent SVM min_w ||w||^2 + C Σ_i ξ_i s.t. max_{y,h} [w^T Ψ(x_i,y,h) + Δ(y_i,y)] - max_{h_i} w^T Ψ(x_i,y_i,h_i) ≤ ξ_i Difference of convex (DC) program

  28. Concave-Convex Procedure max_{y,h} [w^T Ψ(x_i,y,h) + Δ(y_i,y)] - max_{h_i} w^T Ψ(x_i,y_i,h_i) Linear upper-bound of concave part

  29. Concave-Convex Procedure max_{y,h} [w^T Ψ(x_i,y,h) + Δ(y_i,y)] - max_{h_i} w^T Ψ(x_i,y_i,h_i) Optimize the convex upper bound

  30. Concave-Convex Procedure max_{y,h} [w^T Ψ(x_i,y,h) + Δ(y_i,y)] - max_{h_i} w^T Ψ(x_i,y_i,h_i) Linear upper-bound of concave part

  31. Concave-Convex Procedure max_{y,h} [w^T Ψ(x_i,y,h) + Δ(y_i,y)] - max_{h_i} w^T Ψ(x_i,y_i,h_i) Until Convergence

  32. Concave-Convex Procedure max_{y,h} [w^T Ψ(x_i,y,h) + Δ(y_i,y)] - max_{h_i} w^T Ψ(x_i,y_i,h_i) Linear upper bound?

  33. Linear Upper Bound -max_{h_i} w^T Ψ(x_i,y_i,h_i) Current estimate = w_t h_i* = argmax_{h_i} w_t^T Ψ(x_i,y_i,h_i) -w^T Ψ(x_i,y_i,h_i*) ≥ -max_{h_i} w^T Ψ(x_i,y_i,h_i)

  34. CCCP for Latent SVM Start with an initial estimate w_0 Update h_i* = argmax_{h_i ∈ H} w_t^T Ψ(x_i,y_i,h_i) Update w_{t+1} as the ε-optimal solution of min ||w||^2 + C Σ_i ξ_i s.t. w^T Ψ(x_i,y_i,h_i*) - w^T Ψ(x_i,y,h) ≥ Δ(y_i,y) - ξ_i Repeat until convergence
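A compact, deliberately naive sketch of the outer loop on slide 34 is below. It reuses the hypothetical helpers from the earlier sketches and replaces the ε-optimal structural SVM solver with plain subgradient descent, which is enough to show the two alternating steps: impute h_i*, then minimize the convex upper bound.

```python
# Sketch of CCCP for latent SVM (slide 34). NOT the tutorial's solver: the inner
# epsilon-optimal problem is approximated here by subgradient descent.
# dim must equal 2 * len(phi(x, h)), matching the joint_feature sketch above.
import numpy as np

def cccp_latent_svm(data, H, phi, delta, dim, C=1.0,
                    labels=(-1, +1), outer_iters=20, inner_iters=200, lr=1e-3):
    w = np.zeros(dim)
    for _ in range(outer_iters):
        # Step 1: impute the latent variables with the current parameters w_t.
        h_star = [max(H, key=lambda h: float(w @ joint_feature(phi, x, y, h)))
                  for x, y in data]
        # Step 2: approximately minimize ||w||^2 + C sum_i xi_i with h_i* fixed.
        for _ in range(inner_iters):
            grad = 2.0 * w                      # gradient of ||w||^2
            for (x, y), h_i in zip(data, h_star):
                # Loss-augmented prediction: argmax_{y,h} w^T Psi + Delta.
                y_hat, h_hat = max(((yy, hh) for yy in labels for hh in H),
                                   key=lambda p: float(w @ joint_feature(phi, x, *p))
                                                 + delta(y, p[0]))
                margin = float(w @ (joint_feature(phi, x, y, h_i)
                                    - joint_feature(phi, x, y_hat, h_hat)))
                if delta(y, y_hat) - margin > 0:   # constraint violated
                    grad = grad + C * (joint_feature(phi, x, y_hat, h_hat)
                                       - joint_feature(phi, x, y, h_i))
            w = w - lr * grad
    return w
```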

  35. Outline – Annotation Mismatch • Latent SVM • Optimization • Practice • Extensions

  36. Action Classification Train: input x_i, output y_i Test: input x, output y (e.g. y = “Using Computer”) PASCAL VOC 2011 classes: Jumping, Phoning, Playing Instrument, Reading, Riding Bike, Riding Horse, Running, Taking Photo, Using Computer, Walking 80/20 Train/Test Split 5 Folds

  37. Setup • 0-1 loss function • Poselet-based feature vector • 4 seeds for random initialization • Code + Data • Train/Test scripts with hyperparameter settings http://www.centrale-ponts.fr/tutorials/cvpr2013/
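For reference, the 0-1 loss named in this setup is the simplest choice for the delta argument assumed in the earlier sketches:

```python
# 0-1 loss on the annotation y, as used in the action classification setup.
def zero_one_loss(y_true, y):
    return 0.0 if y == y_true else 1.0
```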

  38. Objective

  39. Train Error

  40. Test Error

  41. Time

  42. Outline – Annotation Mismatch • Latent SVM • Optimization • Practice • Annealing the Tolerance • Annealing the Regularization • Self-Paced Learning • Choice of Loss Function • Extensions

  43. Start with an initial estimate w_0 Update h_i* = argmax_{h_i ∈ H} w_t^T Ψ(x_i,y_i,h_i) Update w_{t+1} as the ε-optimal solution of min ||w||^2 + C Σ_i ξ_i s.t. w^T Ψ(x_i,y_i,h_i*) - w^T Ψ(x_i,y,h) ≥ Δ(y_i,y) - ξ_i Repeat until convergence Overfitting in initial iterations

  44. Start with an initial estimate w_0 Update h_i* = argmax_{h_i ∈ H} w_t^T Ψ(x_i,y_i,h_i) Update w_{t+1} as the ε'-optimal solution of min ||w||^2 + C Σ_i ξ_i s.t. w^T Ψ(x_i,y_i,h_i*) - w^T Ψ(x_i,y,h) ≥ Δ(y_i,y) - ξ_i Anneal the tolerance: ε' ← ε'/K until ε' = ε Repeat until convergence
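One way to read slide 44 is as a schedule for the inner solver's tolerance: start loose and tighten by a factor K each CCCP iteration until the target ε is reached. A hypothetical sketch (eps0 and K are illustrative values, not taken from the slides):

```python
# Sketch of annealing the tolerance (slide 44): the inner tolerance eps_prime
# starts loose and is divided by K each CCCP iteration until it reaches the
# final tolerance eps. eps0 and K are illustrative values.
def annealed_tolerances(eps0=1.0, eps=1e-3, K=10.0):
    eps_prime = eps0
    while eps_prime > eps:
        yield eps_prime
        eps_prime /= K
    while True:
        yield eps
```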

  45. Objective

  46. Objective

  47. Train Error

  48. Train Error

  49. Test Error

  50. Test Error
