Loss-based Learning with Weak Supervision - PowerPoint PPT Presentation

Presentation Transcript

  1. Loss-based Learning with Weak Supervision M. Pawan Kumar

  2. Computer Vision Data [Plot: log(dataset size) vs. annotation information] Segmentation: ~2,000 images

  3. Computer Vision Data [Plot] Bounding Box: ~1 M; Segmentation: ~2,000

  4. Computer Vision Data [Plot] Image-Level ("Chair", "Car"): >14 M; Bounding Box: ~1 M; Segmentation: ~2,000

  5. Computer Vision Data [Plot] Noisy Label: >6 B; Image-Level: >14 M; Bounding Box: ~1 M; Segmentation: ~2,000

  6. Computer Vision Data • Detailed annotation is expensive • Sometimes annotation is impossible • Desired annotation keeps changing • Learn with missing information (latent variables)

  7. Outline • Two Types of Problems • Part I – Annotation Mismatch • Part II – Output Mismatch

  8. Annotation Mismatch: Action Classification. Input x, annotation y, latent h; y = "jumping". Desired output during test time is y. Mismatch between desired and available annotations. Exact value of the latent variable is not "important".

  9. Output Mismatch: Action Classification. Input x, annotation y, latent h; y = "jumping".

  10. Output Mismatch: Action Detection. Input x, annotation y, latent h; y = "jumping". Desired output during test time is (y, h). Mismatch between output and available annotations. Exact value of the latent variable is important.

  11. Part I

  12. Outline – Annotation Mismatch • Latent SVM • Optimization • Practice • Extensions Andrews et al., NIPS 2001; Smola et al., AISTATS 2005; Felzenszwalb et al., CVPR 2008; Yu and Joachims, ICML 2009

  13. Weakly Supervised Data. Input x, output y ∈ {-1, +1}, hidden h; y = +1.

  14. Weakly Supervised Classification. Feature Φ(x,h), joint feature vector Ψ(x,y,h); y = +1.

  15. Weakly Supervised Classification. Feature Φ(x,h); joint feature vector Ψ(x,+1,h) = [Φ(x,h); 0]; y = +1.

  16. Weakly Supervised Classification. Feature Φ(x,h); joint feature vector Ψ(x,-1,h) = [0; Φ(x,h)]; y = +1.

  17. Weakly Supervised Classification. Feature Φ(x,h), joint feature vector Ψ(x,y,h); y = +1. Score f : Ψ(x,y,h) → (-∞, +∞). Optimize the score over all possible y and h.
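
To make the joint feature vector concrete, here is a minimal NumPy sketch of the construction on slides 15-16, where Ψ(x,+1,h) stacks Φ(x,h) on top of a zero block and Ψ(x,-1,h) does the reverse. The function name, and the assumption that Φ(x,h) is handed in as a d-dimensional vector, are mine rather than from the slides.

```python
import numpy as np

def joint_feature(phi, y):
    """Joint feature vector Psi(x, y, h) for binary y in {-1, +1}.

    Places the latent-dependent feature Phi(x, h) in the block that
    corresponds to the label y and fills the other block with zeros.
    """
    d = phi.shape[0]
    psi = np.zeros(2 * d)
    if y == +1:
        psi[:d] = phi    # Psi(x, +1, h) = [Phi(x, h); 0]
    else:
        psi[d:] = phi    # Psi(x, -1, h) = [0; Phi(x, h)]
    return psi
```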

  18. Latent SVM. Scoring function (parameters w): w^T Ψ(x,y,h). Prediction: (y(w), h(w)) = argmax_{y,h} w^T Ψ(x,y,h).
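
A sketch of the prediction rule, reusing `joint_feature` from the previous block. It assumes the latent variable ranges over a finite candidate set `candidate_hs` and that a hypothetical helper `phi(x, h)` returns Φ(x,h); with richer latent spaces the exhaustive search would be replaced by problem-specific inference.

```python
import numpy as np

def predict(w, x, candidate_hs, phi):
    """Latent SVM prediction: (y(w), h(w)) = argmax_{y,h} w^T Psi(x, y, h)."""
    best_score, best_y, best_h = -np.inf, None, None
    for h in candidate_hs:
        for y in (-1, +1):
            score = w @ joint_feature(phi(x, h), y)
            if score > best_score:
                best_score, best_y, best_h = score, y, h
    return best_y, best_h
```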

  19. Learning Latent SVM. Training data {(x_i, y_i), i = 1, 2, …, n}. Empirical risk minimization: min_w Σ_i Δ(y_i, y_i(w)). No restriction on the loss function. Annotation mismatch.

  20. Learning Latent SVM. Empirical risk minimization: min_w Σ_i Δ(y_i, y_i(w)). Non-convex; parameters cannot be regularized. Find a regularization-sensitive upper bound.

  21. Learning Latent SVM. Δ(y_i, y_i(w)) = Δ(y_i, y_i(w)) + w^T Ψ(x_i, y_i(w), h_i(w)) - w^T Ψ(x_i, y_i(w), h_i(w)) (add and subtract the score of the prediction).

  22. Learning Latent SVM. Δ(y_i, y_i(w)) ≤ Δ(y_i, y_i(w)) + w^T Ψ(x_i, y_i(w), h_i(w)) - max_{h_i} w^T Ψ(x_i, y_i, h_i), since (y(w), h(w)) = argmax_{y,h} w^T Ψ(x,y,h).

  23. Learning Latent SVM. min_w ||w||^2 + C Σ_i ξ_i, s.t. max_{y,h} [w^T Ψ(x_i, y, h) + Δ(y_i, y)] - max_{h_i} w^T Ψ(x_i, y_i, h_i) ≤ ξ_i. Parameters can be regularized. Is this also convex?

  24. Learning Latent SVM. min_w ||w||^2 + C Σ_i ξ_i, s.t. max_{y,h} [w^T Ψ(x_i, y, h) + Δ(y_i, y)] - max_{h_i} w^T Ψ(x_i, y_i, h_i) ≤ ξ_i. Convex minus convex: a difference-of-convex (DC) program.

  25. Recap. Scoring function: w^T Ψ(x,y,h). Prediction: (y(w), h(w)) = argmax_{y,h} w^T Ψ(x,y,h). Learning: min_w ||w||^2 + C Σ_i ξ_i, s.t. w^T Ψ(x_i, y, h) + Δ(y_i, y) - max_{h_i} w^T Ψ(x_i, y_i, h_i) ≤ ξ_i for all y, h.
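
The slack in this recap can be computed directly for one training example. The sketch below does so under the same assumptions as before (finite latent candidate set, hypothetical `phi` and `delta` helpers, `joint_feature` from the earlier block), where `delta(y_i, y)` plays the role of the task loss Δ.

```python
def slack(w, x_i, y_i, candidate_hs, phi, delta):
    """Upper bound on Delta(y_i, y_i(w)) for one training example:

        max_{y,h} [ w^T Psi(x_i, y, h) + Delta(y_i, y) ]
        - max_{h_i} w^T Psi(x_i, y_i, h_i)
    """
    loss_augmented = max(w @ joint_feature(phi(x_i, h), y) + delta(y_i, y)
                         for h in candidate_hs for y in (-1, +1))
    ground_truth = max(w @ joint_feature(phi(x_i, h), y_i)
                       for h in candidate_hs)
    return loss_augmented - ground_truth
```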

  26. Outline – Annotation Mismatch • Latent SVM • Optimization • Practice • Extensions

  27. Learning Latent SVM. min_w ||w||^2 + C Σ_i ξ_i, s.t. max_{y,h} [w^T Ψ(x_i, y, h) + Δ(y_i, y)] - max_{h_i} w^T Ψ(x_i, y_i, h_i) ≤ ξ_i. Difference-of-convex (DC) program.

  28. Concave-Convex Procedure. Objective: max_{y,h} [w^T Ψ(x_i, y, h) + Δ(y_i, y)] - max_{h_i} w^T Ψ(x_i, y_i, h_i). Linear upper-bound of the concave part.

  29. Concave-Convex Procedure. max_{y,h} [w^T Ψ(x_i, y, h) + Δ(y_i, y)] - max_{h_i} w^T Ψ(x_i, y_i, h_i). Optimize the convex upper bound.

  30. Concave-Convex Procedure. max_{y,h} [w^T Ψ(x_i, y, h) + Δ(y_i, y)] - max_{h_i} w^T Ψ(x_i, y_i, h_i). Linear upper-bound of the concave part.

  31. Concave-Convex Procedure. max_{y,h} [w^T Ψ(x_i, y, h) + Δ(y_i, y)] - max_{h_i} w^T Ψ(x_i, y_i, h_i). Until convergence.

  32. Concave-Convex Procedure. max_{y,h} [w^T Ψ(x_i, y, h) + Δ(y_i, y)] - max_{h_i} w^T Ψ(x_i, y_i, h_i). Linear upper bound?

  33. Linear Upper Bound. Concave term: -max_{h_i} w^T Ψ(x_i, y_i, h_i). Current estimate = w_t. Set h_i* = argmax_{h_i} w_t^T Ψ(x_i, y_i, h_i); then -w^T Ψ(x_i, y_i, h_i*) ≥ -max_{h_i} w^T Ψ(x_i, y_i, h_i).
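
In code, this linearization step is just an argmax over the latent candidates with the current parameters w_t (same hypothetical helpers as in the earlier sketches).

```python
def impute_latent(w_t, x_i, y_i, candidate_hs, phi):
    """h_i* = argmax_{h_i} w_t^T Psi(x_i, y_i, h_i).

    Fixing h_i* replaces the concave term -max_{h_i} w^T Psi(x_i, y_i, h_i)
    with the linear upper bound -w^T Psi(x_i, y_i, h_i*).
    """
    return max(candidate_hs,
               key=lambda h: w_t @ joint_feature(phi(x_i, h), y_i))
```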

  34. CCCP for Latent SVM. Start with an initial estimate w_0. Update h_i* = argmax_{h_i ∈ H} w_t^T Ψ(x_i, y_i, h_i). Update w_{t+1} as the ε-optimal solution of min ||w||^2 + C Σ_i ξ_i, s.t. w^T Ψ(x_i, y_i, h_i*) - w^T Ψ(x_i, y, h) ≥ Δ(y_i, y) - ξ_i. Repeat until convergence.
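
A minimal end-to-end sketch of this alternation, under the same assumptions as the earlier blocks (finite latent set, hypothetical `phi` and `delta` helpers, `joint_feature` and `impute_latent` from above). The inner ε-optimal structural-SVM solve is replaced here by plain subgradient descent on the convex upper bound, so this illustrates the alternation rather than the exact solver on the slides.

```python
import numpy as np

def cccp_latent_svm(data, candidate_hs, phi, delta, dim,
                    C=1.0, outer_iters=10, inner_iters=100, lr=1e-3):
    """CCCP sketch: alternate imputing h_i* and minimizing the convex bound."""
    w = np.zeros(dim)
    for _ in range(outer_iters):
        # Step 1: impute the latent variables with the current parameters.
        imputed = [impute_latent(w, x_i, y_i, candidate_hs, phi)
                   for x_i, y_i in data]
        # Step 2: approximately minimize the convex upper bound
        # (subgradient descent stands in for the epsilon-optimal solver).
        for _ in range(inner_iters):
            grad = 2.0 * w                       # gradient of ||w||^2
            for (x_i, y_i), h_star in zip(data, imputed):
                # Loss-augmented inference: most violating (y, h).
                y_hat, h_hat = max(
                    ((y, h) for y in (-1, +1) for h in candidate_hs),
                    key=lambda yh: w @ joint_feature(phi(x_i, yh[1]), yh[0])
                                   + delta(y_i, yh[0]))
                psi_true = joint_feature(phi(x_i, h_star), y_i)
                psi_hat = joint_feature(phi(x_i, h_hat), y_hat)
                violation = w @ psi_hat + delta(y_i, y_hat) - w @ psi_true
                if violation > 0:                # active slack contributes
                    grad += C * (psi_hat - psi_true)
            w -= lr * grad
    return w
```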

  35. Outline – Annotation Mismatch • Latent SVM • Optimization • Practice • Extensions

  36. Action Classification. Train: inputs x_i with outputs y_i; test: input x, output y = "Using Computer". Classes: Jumping, Phoning, Playing Instrument, Reading, Riding Bike, Riding Horse, Running, Taking Photo, Using Computer, Walking. PASCAL VOC 2011, 80/20 train/test split, 5 folds.

  37. Setup • 0-1 loss function • Poselet-based feature vector • 4 seeds for random initialization • Code + Data • Train/Test scripts with hyperparameter settings http://www.centrale-ponts.fr/tutorials/cvpr2013/

  38. Objective

  39. Train Error

  40. Test Error

  41. Time

  42. Outline – Annotation Mismatch • Latent SVM • Optimization • Practice • Annealing the Tolerance • Annealing the Regularization • Self-Paced Learning • Choice of Loss Function • Extensions

  43. Start with an initial estimate w_0. Update h_i* = argmax_{h_i ∈ H} w_t^T Ψ(x_i, y_i, h_i). Update w_{t+1} as the ε-optimal solution of min ||w||^2 + C Σ_i ξ_i, s.t. w^T Ψ(x_i, y_i, h_i*) - w^T Ψ(x_i, y, h) ≥ Δ(y_i, y) - ξ_i. Overfitting in initial iterations. Repeat until convergence.

  44. Start with an initial estimate w_0. Update h_i* = argmax_{h_i ∈ H} w_t^T Ψ(x_i, y_i, h_i). Update w_{t+1} as the ε'-optimal solution of min ||w||^2 + C Σ_i ξ_i, s.t. w^T Ψ(x_i, y_i, h_i*) - w^T Ψ(x_i, y, h) ≥ Δ(y_i, y) - ξ_i. Anneal the tolerance: ε' ← ε'/K, until ε' = ε. Repeat until convergence.
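
One reading of the annealed tolerance, sketched below: start the inner solver with a loose tolerance and divide it by K after every CCCP iteration until the target ε is reached. The function name and the exact schedule are my assumptions, not taken from the slides.

```python
def tolerance_schedule(eps_start, eps_final, K, num_iters):
    """Per-iteration inner-solver tolerances for 'annealing the tolerance':
    begin loose and divide by K each CCCP iteration, flooring at eps_final.
    """
    schedule, eps_prime = [], eps_start
    for _ in range(num_iters):
        schedule.append(eps_prime)
        eps_prime = max(eps_prime / K, eps_final)
    return schedule

# Example: tolerance_schedule(1.0, 0.001, 10, 6)
# -> [1.0, 0.1, 0.01, 0.001, 0.001, 0.001]
```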

  45. Objective

  46. Objective

  47. Train Error

  48. Train Error

  49. Test Error

  50. Test Error