
Loss-based Learning with Weak Supervision


Presentation Transcript


  1. Loss-based Learning with Weak Supervision M. Pawan Kumar

  2. About the Talk • Methods that use latent structured SVM • A little math-y • Initial stages

  3. Outline • Latent SSVM • Ranking • Brain Activation Delays in M/EEG • Probabilistic Segmentation of MRI Andrews et al., NIPS 2001; Smola et al., AISTATS 2005; Felzenszwalb et al., CVPR 2008; Yu and Joachims, ICML 2009

  4. Weakly Supervised Data • Input x • Output y ∈ {-1, +1} • Hidden h (the pictured example has y = +1)

  5. Weakly Supervised Classification • Feature vector Φ(x,h) • Joint feature vector Ψ(x,y,h)

  6. Weakly Supervised Classification • Feature vector Φ(x,h) • Joint feature vector for y = +1: Ψ(x,+1,h) = [Φ(x,h); 0]

  7. Weakly Supervised Classification • Feature vector Φ(x,h) • Joint feature vector for y = -1: Ψ(x,-1,h) = [0; Φ(x,h)]
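
As a concrete reading of slides 6-7, here is a minimal NumPy sketch of the joint feature construction (not code from the talk; `phi` stands in for Φ(x,h)):

```python
import numpy as np

def joint_feature(phi, y):
    """Joint feature vector Psi(x, y, h) for binary y in {-1, +1}.

    Stacks Phi(x, h) into the block selected by the label, so a single
    weight vector w = [w_pos; w_neg] scores both classes.
    """
    zeros = np.zeros_like(phi)
    if y == +1:
        return np.concatenate([phi, zeros])  # Psi(x, +1, h) = [Phi(x,h); 0]
    return np.concatenate([zeros, phi])      # Psi(x, -1, h) = [0; Phi(x,h)]
```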

  8. Weakly Supervised Classification • Feature vector Φ(x,h) • Joint feature vector Ψ(x,y,h) • Score f : Ψ(x,y,h) → (-∞, +∞) • Optimize the score over all possible y and h

  9. Latent SSVM • Scoring function wᵀΨ(x,y,h) • Prediction: (y(w), h(w)) = argmax_{y,h} wᵀΨ(x,y,h)
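
A sketch of this prediction rule, assuming a hypothetical finite candidate set `candidate_h` for the hidden variable and the `joint_feature` helper above:

```python
def predict(w, x, candidate_h, phi):
    """Return (y, h) maximizing the score w^T Psi(x, y, h) by enumeration."""
    best = None
    for y in (+1, -1):
        for h in candidate_h:
            score = w @ joint_feature(phi(x, h), y)
            if best is None or score > best[0]:
                best = (score, y, h)
    return best[1], best[2]  # (y(w), h(w))
```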

  10. Learning Latent SSVM • Training data {(x_i, y_i), i = 1,…,n} • w* = argmin_w Σ_i Δ(y_i, y_i(w)) • Minimize the empirical risk specified by the loss function Δ • Highly non-convex in w • Cannot regularize w to prevent overfitting

  11. Learning Latent SSVM • Training data {(x_i, y_i), i = 1,…,n} • Upper bound on the loss: Δ(y_i, y_i(w)) = wᵀΨ(x_i, y_i(w), h_i(w)) + Δ(y_i, y_i(w)) - wᵀΨ(x_i, y_i(w), h_i(w)) ≤ wᵀΨ(x_i, y_i(w), h_i(w)) + Δ(y_i, y_i(w)) - max_{h_i} wᵀΨ(x_i, y_i, h_i) ≤ max_{y,h} {wᵀΨ(x_i, y, h) + Δ(y_i, y)} - max_{h_i} wᵀΨ(x_i, y_i, h_i), where the first inequality holds because (y_i(w), h_i(w)) maximizes the score

  12. Learning Latent SSVM • Training data {(x_i, y_i), i = 1,…,n} • min_w ||w||² + C Σ_i ξ_i s.t. wᵀΨ(x_i, y, h) + Δ(y_i, y) - max_{h_i} wᵀΨ(x_i, y_i, h_i) ≤ ξ_i for all y, h • Difference-of-convex program in w • Local minimum or saddle point solution (CCCP)

  13. CCCP • Start with an initial estimate of w • Impute hidden variables (loss independent): h_i* = argmax_h wᵀΨ(x_i, y_i, h) • Update w (loss dependent): min_w ||w||² + C Σ_i ξ_i s.t. wᵀΨ(x_i, y, h) + Δ(y_i, y) - wᵀΨ(x_i, y_i, h_i*) ≤ ξ_i for all y, h • Repeat until convergence
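
A minimal sketch of the CCCP alternation, assuming a hypothetical `solve_ssvm(data, imputed)` that solves the convex (loss dependent) update, e.g. with a cutting-plane method; nothing here is prescribed by the talk beyond the two alternating steps:

```python
import numpy as np

def cccp(w, data, candidate_h, phi, solve_ssvm, max_iters=50, tol=1e-4):
    """Alternate imputing hidden variables and updating w until convergence."""
    for _ in range(max_iters):
        # Loss independent step: h_i* = argmax_h w^T Psi(x_i, y_i, h)
        imputed = [max(candidate_h,
                       key=lambda h: w @ joint_feature(phi(x, h), y))
                   for x, y in data]
        # Loss dependent step: solve the now-convex structured SVM problem
        w_new = solve_ssvm(data, imputed)
        if np.linalg.norm(w_new - w) < tol:
            break
        w = w_new
    return w
```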

  14. Recap • Scoring function wᵀΨ(x,y,h) • Prediction: (y(w), h(w)) = argmax_{y,h} wᵀΨ(x,y,h) • Learning: min_w ||w||² + C Σ_i ξ_i s.t. wᵀΨ(x_i, y, h) + Δ(y_i, y) - max_{h_i} wᵀΨ(x_i, y_i, h_i) ≤ ξ_i for all y, h

  15. Outline • Latent SSVM • Ranking • Brain Activation Delays in M/EEG • Probabilistic Segmentation of MRI Joint work with Aseem Behl and C. V. Jawahar

  16. Ranking • Example: six images shown in ranks 1-6 • Average Precision = 1

  17. Ranking • The same six images under different rankings: Average Precision = 1 with Accuracy = 1; Average Precision = 0.92; Average Precision = 0.81 with Accuracy = 0.67
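
For reference, a short sketch of how average precision is computed for a ranked list of binary labels (the standard definition, not the talk's code). The slide's values are consistent with three positives and three negatives, e.g. [1,1,0,1,0,0] gives AP ≈ 0.92 and [1,0,1,1,0,0] gives AP ≈ 0.81, though the exact orderings pictured are an assumption:

```python
def average_precision(ranked_labels):
    """AP of a ranking: mean precision@k over the ranks of the positives.

    ranked_labels: 0/1 labels ordered from rank 1 downwards.
    """
    hits, precisions = 0, []
    for k, label in enumerate(ranked_labels, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / k)  # precision at this positive's rank
    return sum(precisions) / len(precisions) if precisions else 0.0

assert average_precision([1, 1, 1, 0, 0, 0]) == 1.0
assert round(average_precision([1, 1, 0, 1, 0, 0]), 2) == 0.92
```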

  18. Ranking • During testing, AP is frequently used • During training, a surrogate loss is used • Contradictory to loss-based learning • Optimize AP directly

  19. Outline • Latent SSVM • Ranking • Supervised Learning • Weakly Supervised Learning • Latent AP-SVM • Experiments • Brain Activation Delays in M/EEG • Probabilistic Segmentation of MRI Yue, Finley, Radlinski and Joachims, 2007

  20. Supervised Learning - Input • Training images X, partitioned into a positive set P and a negative set N • Bounding boxes H = {H_P, H_N}

  21. Supervised Learning - Output • Ranking matrix Y: Y_ik = +1 if i is ranked higher than k, -1 if k is ranked higher than i, 0 if i and k are ranked equally • Optimal ranking Y*

  22. SSVM Formulation • Joint feature vector: Ψ(X, Y, {H_P, H_N}) = (1 / (|P||N|)) Σ_{i∈P} Σ_{k∈N} Y_ik (Φ(x_i, h_i) - Φ(x_k, h_k)) • Scoring function wᵀΨ(X, Y, {H_P, H_N})

  23. Prediction using SSVM • Y(w) = argmax_Y wᵀΨ(X, Y, {H_P, H_N}) • Sort samples by the value of their score wᵀΦ(x_i, h_i) • Same as a standard binary SVM
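
A sketch of why this maximization reduces to sorting: since Ψ is a sum of Y_ik-weighted score differences, the best Y orders samples by wᵀΦ(x_i, h_i) (names here are illustrative):

```python
def rank_by_score(w, features):
    """Predict a ranking by sorting samples by their scores w^T Phi.

    features: list of feature vectors Phi(x_i, h_i); returns indices
    ordered from rank 1 (highest score) downwards.
    """
    return sorted(range(len(features)), key=lambda i: -(w @ features[i]))
```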

  24. Learning SSVM • min_w Δ(Y*, Y(w)) • Loss = 1 - AP of the prediction

  25. Learning SSVM • Δ(Y*, Y(w)) = wᵀΨ(X, Y(w), {H_P, H_N}) + Δ(Y*, Y(w)) - wᵀΨ(X, Y(w), {H_P, H_N})

  26. Learning SSVM • ≤ wᵀΨ(X, Y(w), {H_P, H_N}) + Δ(Y*, Y(w)) - wᵀΨ(X, Y*, {H_P, H_N}), since Y(w) maximizes the score

  27. Learning SSVM • min_w ||w||² + C ξ s.t. max_Y {wᵀΨ(X, Y, {H_P, H_N}) + Δ(Y*, Y)} - wᵀΨ(X, Y*, {H_P, H_N}) ≤ ξ

  28. Learning SSVM • min_w ||w||² + C ξ s.t. max_Y {wᵀΨ(X, Y, {H_P, H_N}) + Δ(Y*, Y)} - wᵀΨ(X, Y*, {H_P, H_N}) ≤ ξ • Computing the max over Y is the loss augmented inference problem

  29. Loss Augmented Inference • Rank the positives according to their sample scores

  30. Loss Augmented Inference • Rank the negatives according to their sample scores

  31. Loss Augmented Inference • Slide the best negative to a higher rank, continuing until the score stops increasing • Slide the next negative to a higher rank, again until the score stops increasing • Terminate after considering the last negative • This greedy procedure is optimal loss augmented inference
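
A simplified sketch of this sliding procedure (after Yue et al., 2007). It encodes a ranking by c[j], the number of positives ranked above negative j, recomputes the objective Δ + wᵀΨ naively at every step, and greedily slides each negative up while the objective improves; the efficient, provably optimal version uses incremental updates instead:

```python
def ap_loss_augmented_inference(pos_scores, neg_scores):
    """Greedy search for argmax_Y { Delta(Y*, Y) + w^T Psi(X, Y, .) }."""
    p = sorted(pos_scores, reverse=True)
    n = sorted(neg_scores, reverse=True)
    P, N = len(p), len(n)
    c = [P] * N  # start with every negative ranked below all positives

    def objective(c):
        # Score term: (1/|P||N|) sum_{i,k} Y_ik (s_i - s_k), with Y_ik = +1
        # iff positive i is ranked above negative k (i.e. i < c[k]).
        score = sum((1 if i < c[k] else -1) * (p[i] - n[k])
                    for i in range(P) for k in range(N)) / (P * N)
        # Loss term: 1 - AP of the ranking encoded by c.
        ap = sum((i + 1) / (i + 1 + sum(ck <= i for ck in c))
                 for i in range(P)) / P
        return (1.0 - ap) + score

    for j in range(N):  # best-scored negative first
        while c[j] > 0:
            trial = c[:j] + [c[j] - 1] + c[j + 1:]
            if objective(trial) > objective(c):
                c = trial  # sliding this negative up improved the objective
            else:
                break
    return c
```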

  32. Recap • Scoring function wᵀΨ(X, Y, {H_P, H_N}) • Prediction: Y(w) = argmax_Y wᵀΨ(X, Y, {H_P, H_N}) • Learning using optimal loss augmented inference

  33. Outline • Latent SSVM • Ranking • Supervised Learning • Weakly Supervised Learning • Latent AP-SVM • Experiments • Brain Activation Delays in M/EEG • Probabilistic Segmentation of MRI

  34. Weakly Supervised Learning - Input Training images X

  35. Weakly Supervised Learning - Latent • Training images X • Bounding boxes H_P (latent, not annotated) • All bounding boxes in negative images are negative

  36. Intuitive Prediction Procedure Select the best bounding boxes in all images

  37. Intuitive Prediction Procedure • Rank the selected boxes according to their sample scores

  38. Weakly Supervised Learning - Output • Ranking matrix Y: Y_ik = +1 if i is ranked higher than k, -1 if k is ranked higher than i, 0 if i and k are ranked equally • Optimal ranking Y*

  39. Latent SSVM Formulation • Joint feature vector: Ψ(X, Y, {H_P, H_N}) = (1 / (|P||N|)) Σ_{i∈P} Σ_{k∈N} Y_ik (Φ(x_i, h_i) - Φ(x_k, h_k)) • Scoring function wᵀΨ(X, Y, {H_P, H_N})

  40. Prediction using Latent SSVM • max_{Y,H} wᵀΨ(X, Y, {H_P, H_N})

  41. Prediction using Latent SSVM • max_{Y,H} wᵀ Σ_{i∈P} Σ_{k∈N} Y_ik (Φ(x_i, h_i) - Φ(x_k, h_k)) • Chooses the best bounding box for positives • Chooses the worst bounding box for negatives • Not what we wanted
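
A sketch of why the joint max over H behaves this way: once Y ranks positives above negatives (Y_ik = +1), each positive box enters the score with a plus sign and each negative box with a minus sign, so the maximization picks the lowest-scoring box for negatives. `boxes` and `phi` are hypothetical helpers:

```python
def latent_ssvm_boxes(w, pos_images, neg_images, boxes, phi):
    """Boxes selected by max_H of w^T sum_{i,k} Y_ik (Phi_i - Phi_k)."""
    # Positives appear with +Phi(x_i, h_i): take the highest-scoring box.
    best_pos = [max(boxes(x), key=lambda h: w @ phi(x, h)) for x in pos_images]
    # Negatives appear with -Phi(x_k, h_k): take the LOWEST-scoring box.
    worst_neg = [min(boxes(x), key=lambda h: w @ phi(x, h)) for x in neg_images]
    return best_pos, worst_neg
```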

  42. Learning Latent SSVM • min_w Δ(Y*, Y(w)) • Loss = 1 - AP of the prediction

  43. Learning Latent SSVM • Δ(Y*, Y(w)) = wᵀΨ(X, Y(w), {H_P(w), H_N(w)}) + Δ(Y*, Y(w)) - wᵀΨ(X, Y(w), {H_P(w), H_N(w)})

  44. Learning Latent SSVM • ≤ wᵀΨ(X, Y(w), {H_P(w), H_N(w)}) + Δ(Y*, Y(w)) - max_H wᵀΨ(X, Y*, {H_P, H_N})

  45. Learning Latent SSVM • min_w ||w||² + C ξ s.t. max_{Y,H} {wᵀΨ(X, Y, {H_P, H_N}) + Δ(Y*, Y)} - max_H wᵀΨ(X, Y*, {H_P, H_N}) ≤ ξ

  46. Learning Latent SSVM • min_w ||w||² + C ξ s.t. max_{Y,H} {wᵀΨ(X, Y, {H_P, H_N}) + Δ(Y*, Y)} - max_H wᵀΨ(X, Y*, {H_P, H_N}) ≤ ξ • The loss augmented inference (the max over Y and H) cannot be solved optimally

  47. Recap • Unintuitive prediction • Unintuitive objective function • Non-optimal loss augmented inference • Can we do better?

  48. Outline • Latent SSVM • Ranking • Supervised Learning • Weakly Supervised Learning • Latent AP-SVM • Experiments • Brain Activation Delays in M/EEG • Probabilistic Segmentation of MRI

  49. Latent AP-SVM Formulation • Joint feature vector: Ψ(X, Y, {H_P, H_N}) = (1 / (|P||N|)) Σ_{i∈P} Σ_{k∈N} Y_ik (Φ(x_i, h_i) - Φ(x_k, h_k)) • Scoring function wᵀΨ(X, Y, {H_P, H_N})

  50. Prediction using Latent AP-SVM • Choose the best bounding box for all samples: h_i(w) = argmax_h wᵀΦ(x_i, h) • Optimize over the ranking: Y(w) = argmax_Y wᵀΨ(X, Y, {H_P(w), H_N(w)}) • Sort by sample scores
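
A sketch of this two-step prediction (`boxes` and `phi` again hypothetical): the best box is chosen for every sample, positive or negative alike, and the ranking is then a plain sort by the resulting scores:

```python
def latent_apsvm_predict(w, images, boxes, phi):
    """Latent AP-SVM prediction: best box per image, then sort by score."""
    # Step 1: h_i(w) = argmax_h w^T Phi(x_i, h) for every sample.
    best = [max(boxes(x), key=lambda h: w @ phi(x, h)) for x in images]
    scores = [w @ phi(x, h) for x, h in zip(images, best)]
    # Step 2: Y(w) sorts samples by their scores.
    order = sorted(range(len(images)), key=lambda i: -scores[i])
    return best, order
```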
