This tutorial explores the challenges of annotation mismatch and output mismatch in weakly supervised learning. Annotation mismatch occurs when the output desired at test time differs from the available annotations; output mismatch occurs when the desired output additionally includes latent variables that are never annotated. The latent support vector machine (latent SVM) framework is introduced to address the annotation-mismatch setting, with details on its optimization via the concave-convex procedure, practical implementation, and several extensions, highlighting the importance of learning when annotation is noisy or incomplete, an essential ingredient for advancing computer vision methodologies.
Loss-based Learning with Weak Supervision – M. Pawan Kumar
Computer Vision Data (log scale of dataset size vs. annotation detail): segmentation-level annotation ~2,000 images; bounding-box annotation ~1M; image-level labels (e.g. "Chair", "Car") >14M; noisy labels >6B.
Computer Vision Data
• Detailed annotation is expensive
• Sometimes annotation is impossible
• The desired annotation keeps changing
• Therefore, learn with missing information (latent variables)
Outline • Two Types of Problems • Part I – Annotation Mismatch • Part II – Output Mismatch
Annotation Mismatch (Action Classification)
• Input x, annotation y = "jumping", latent variable h
• The desired output during test time is y
• Mismatch between the desired and available annotations
• The exact value of the latent variable is not "important"
Output Mismatch (Action Detection)
• Input x, annotation y = "jumping" (the action class only, as in classification), latent variable h
• The desired output during test time is the pair (y, h)
• Mismatch between the desired output and the available annotations
• The exact value of the latent variable is important
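To make the distinction concrete, the following is a minimal sketch (with hypothetical arguments) of how the two settings are evaluated: under annotation mismatch only the predicted label y is scored, while under output mismatch the latent variable h is scored as well.

```python
def annotation_mismatch_loss(y_true, y_pred):
    """Annotation mismatch (e.g. action classification): only the label y
    is evaluated; the latent variable h is treated as a nuisance variable."""
    return 0.0 if y_true == y_pred else 1.0


def output_mismatch_loss(y_true, h_true, y_pred, h_pred, overlap):
    """Output mismatch (e.g. action detection): the pair (y, h) is evaluated.
    `overlap` is a hypothetical similarity measure in [0, 1] between the
    predicted and ground-truth latent values (e.g. a box-overlap score)."""
    if y_true != y_pred:
        return 1.0
    return 1.0 - overlap(h_true, h_pred)
```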
Outline – Annotation Mismatch • Latent SVM • Optimization • Practice • Extensions
References: Andrews et al., NIPS 2001; Smola et al., AISTATS 2005; Felzenszwalb et al., CVPR 2008; Yu and Joachims, ICML 2009
Weakly Supervised Data
• Input x
• Output y ∈ {-1, +1}
• Hidden variable h
• Example: y = +1
Weakly Supervised Classification
• Feature Φ(x, h)
• Joint feature vector Ψ(x, y, h), built by stacking Φ(x, h) according to the label:
  Ψ(x, +1, h) = [Φ(x, h); 0],  Ψ(x, -1, h) = [0; Φ(x, h)]
• Score f: Ψ(x, y, h) → (-∞, +∞)
• Optimize the score over all possible y and h
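A minimal sketch of this joint feature construction, assuming the feature Φ(x, h) is already available as a NumPy vector:

```python
import numpy as np

def joint_feature(phi, y):
    """Joint feature vector Psi(x, y, h) for y in {-1, +1}:
    Psi(x, +1, h) = [Phi(x, h); 0] and Psi(x, -1, h) = [0; Phi(x, h)]."""
    d = phi.shape[0]
    psi = np.zeros(2 * d)
    if y == +1:
        psi[:d] = phi
    else:
        psi[d:] = phi
    return psi
```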
Latent SVM
• Scoring function with parameters w: wᵀΨ(x, y, h)
• Prediction: (y(w), h(w)) = argmax_{y,h} wᵀΨ(x, y, h)
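A sketch of the prediction rule by exhaustive search, assuming a finite set of candidate latent values and a placeholder feature function phi(x, h), and reusing joint_feature from the sketch above. Exhaustive enumeration is only feasible for small latent spaces; structured problems would use a problem-specific maximization instead.

```python
def predict(w, x, latent_values, phi):
    """(y(w), h(w)) = argmax over y in {-1, +1} and h in latent_values
    of w . Psi(x, y, h), by exhaustive search."""
    best = None
    for y in (-1, +1):
        for h in latent_values:
            score = float(w @ joint_feature(phi(x, h), y))
            if best is None or score > best[0]:
                best = (score, y, h)
    return best[1], best[2]
```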
Learning Latent SVM
• Training data {(xᵢ, yᵢ), i = 1, 2, …, n}
• Empirical risk minimization: min_w Σᵢ Δ(yᵢ, yᵢ(w))
• No restriction on the loss function Δ
• Suited to the annotation-mismatch setting, since the loss depends only on the annotated output y
Learning Latent SVM
The empirical risk min_w Σᵢ Δ(yᵢ, yᵢ(w)) is non-convex, and its parameters cannot be regularized directly. Instead, find a regularization-sensitive upper bound.
Learning Latent SVM
Upper-bounding the loss:
Δ(yᵢ, yᵢ(w)) = Δ(yᵢ, yᵢ(w)) + wᵀΨ(xᵢ, yᵢ(w), hᵢ(w)) - wᵀΨ(xᵢ, yᵢ(w), hᵢ(w))
≤ Δ(yᵢ, yᵢ(w)) + wᵀΨ(xᵢ, yᵢ(w), hᵢ(w)) - max_{hᵢ} wᵀΨ(xᵢ, yᵢ, hᵢ)
≤ max_{y,h} {Δ(yᵢ, y) + wᵀΨ(xᵢ, y, h)} - max_{hᵢ} wᵀΨ(xᵢ, yᵢ, hᵢ),
using (y(w), h(w)) = argmax_{y,h} wᵀΨ(x, y, h).
Learning Latent SVM • minw ||w||2 + C Σiξi (yi, y) • maxy,h • wT(xi,y,h) + • ≤ ξi • -maxhiwT(xi,yi,hi) Parameters can be regularized Is this also convex?
Learning Latent SVM • minw ||w||2 + C Σiξi (yi, y) • maxy,h • wT(xi,y,h) + • ≤ ξi • -maxhiwT(xi,yi,hi) Convex - Convex Difference of convex (DC) program
Recap
• Scoring function: wᵀΨ(x, y, h)
• Prediction: (y(w), h(w)) = argmax_{y,h} wᵀΨ(x, y, h)
• Learning: min_w ||w||² + C Σᵢ ξᵢ, s.t. wᵀΨ(xᵢ, y, h) + Δ(yᵢ, y) - max_{hᵢ} wᵀΨ(xᵢ, yᵢ, hᵢ) ≤ ξᵢ for all y, h
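To make the constraint concrete, here is a sketch of the slack ξᵢ for a single training example, again reusing joint_feature and the placeholder phi; delta stands for the loss Δ.

```python
def example_slack(w, x, y_true, latent_values, phi, delta):
    """xi_i = max_{y,h} [w.Psi(x, y, h) + Delta(y_i, y)]
             - max_{h}  [w.Psi(x, y_i, h)],
    which upper-bounds Delta(y_i, y_i(w))."""
    loss_augmented = max(
        float(w @ joint_feature(phi(x, h), y)) + delta(y_true, y)
        for y in (-1, +1) for h in latent_values
    )
    best_with_true_label = max(
        float(w @ joint_feature(phi(x, h), y_true)) for h in latent_values
    )
    return loss_augmented - best_with_true_label
```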
Outline – Annotation Mismatch • Latent SVM • Optimization • Practice • Extensions
Learning Latent SVM • minw ||w||2 + C Σiξi (yi, y) • maxy,h • wT(xi,y,h) + • ≤ ξi • -maxhiwT(xi,yi,hi) Difference of convex (DC) program
Concave-Convex Procedure (CCCP)
Each constraint max_{y,h} {wᵀΨ(xᵢ, y, h) + Δ(yᵢ, y)} - max_{hᵢ} wᵀΨ(xᵢ, yᵢ, hᵢ) is the sum of a convex part and a concave part. Repeat until convergence:
• Linearly upper-bound the concave part -max_{hᵢ} wᵀΨ(xᵢ, yᵢ, hᵢ)
• Optimize the resulting convex upper bound
How do we obtain the linear upper bound?
Linear Upper Bound
For the concave term -max_{hᵢ} wᵀΨ(xᵢ, yᵢ, hᵢ), let the current estimate be wₜ and set hᵢ* = argmax_{hᵢ} wₜᵀΨ(xᵢ, yᵢ, hᵢ). Then
-wᵀΨ(xᵢ, yᵢ, hᵢ*) ≥ -max_{hᵢ} wᵀΨ(xᵢ, yᵢ, hᵢ) for all w,
with equality at w = wₜ.
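A small numerical check of this inequality on random data, reusing joint_feature; the dimensions, latent set, and random features below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, latent_values, y_i = 5, range(4), +1
x_i = rng.normal(size=(len(latent_values), dim))
phi = lambda x, h: x[h]                  # placeholder: one feature vector per h

w_t = rng.normal(size=2 * dim)           # current estimate
h_star = max(latent_values,
             key=lambda h: w_t @ joint_feature(phi(x_i, h), y_i))

for _ in range(100):                     # the bound holds for every w ...
    w = rng.normal(size=2 * dim)
    lhs = -w @ joint_feature(phi(x_i, h_star), y_i)
    rhs = -max(w @ joint_feature(phi(x_i, h), y_i) for h in latent_values)
    assert lhs >= rhs - 1e-12
# ... and it is tight at w = w_t, where h_star attains the maximum.
```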
CCCP for Latent SVM
Start with an initial estimate w₀. Repeat until convergence:
• Impute the latent variables: hᵢ* = argmax_{hᵢ ∈ H} wₜᵀΨ(xᵢ, yᵢ, hᵢ)
• Update wₜ₊₁ as the ε-optimal solution of
  min_w ||w||² + C Σᵢ ξᵢ
  s.t. wᵀΨ(xᵢ, yᵢ, hᵢ*) - wᵀΨ(xᵢ, y, h) ≥ Δ(yᵢ, y) - ξᵢ for all y, h
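A compact sketch of this alternation, reusing the helpers above. The inner convex problem is solved here by plain subgradient descent on a scaled objective (0.5·||w||² + C·Σᵢξᵢ) as a stand-in for an ε-optimal structural-SVM solver; the iteration counts and step size are arbitrary.

```python
import numpy as np

def cccp_latent_svm(data, latent_values, phi, delta, dim,
                    C=1.0, outer_iters=10, inner_iters=200, lr=1e-3):
    """CCCP for latent SVM: alternate between (i) imputing the latent
    variables with the current parameters and (ii) approximately solving
    the convex structural-SVM problem with the imputed h_i* held fixed."""
    w = np.zeros(2 * dim)
    for _ in range(outer_iters):
        # (i) Impute latent variables: h_i* = argmax_h  w . Psi(x_i, y_i, h).
        h_star = [max(latent_values,
                      key=lambda h: w @ joint_feature(phi(x, h), y))
                  for x, y in data]
        # (ii) Subgradient descent on 0.5*||w||^2 + C * sum_i slack_i.
        for _ in range(inner_iters):
            grad = w.copy()
            for (x, y), h_i in zip(data, h_star):
                # Loss-augmented prediction under the current w.
                y_hat, h_hat = max(
                    ((yy, h) for yy in (-1, +1) for h in latent_values),
                    key=lambda p: float(w @ joint_feature(phi(x, p[1]), p[0]))
                                  + delta(y, p[0]))
                slack = (w @ joint_feature(phi(x, h_hat), y_hat)
                         + delta(y, y_hat)
                         - w @ joint_feature(phi(x, h_i), y))
                if slack > 0:
                    grad += C * (joint_feature(phi(x, h_hat), y_hat)
                                 - joint_feature(phi(x, h_i), y))
            w = w - lr * grad
    return w
```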
Outline – Annotation Mismatch • Latent SVM • Optimization • Practice • Extensions
Action Classification
• Train: inputs xᵢ with outputs yᵢ; Test: input x, output y (e.g. y = "Using Computer")
• 10 classes: Jumping, Phoning, Playing Instrument, Reading, Riding Bike, Riding Horse, Running, Taking Photo, Using Computer, Walking
• PASCAL VOC 2011, 80/20 train/test split, 5 folds
Setup • 0-1 loss function • Poselet-based feature vector • 4 seeds for random initialization • Code + Data • Train/Test scripts with hyperparameter settings http://www.centrale-ponts.fr/tutorials/cvpr2013/
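A small sketch of this setup's loss and multi-seed initialization; train_fn and score_fn are placeholders for the actual training routine and the criterion used to pick among the runs.

```python
import numpy as np

def zero_one_loss(y_true, y_pred):
    """The 0-1 loss used for action classification."""
    return 0.0 if y_true == y_pred else 1.0

def train_with_random_seeds(train_fn, score_fn, data, seeds=(0, 1, 2, 3)):
    """Run training from several random initializations (the setup uses
    4 seeds) and keep the run that score_fn ranks lowest, e.g. by the
    final training objective."""
    runs = []
    for seed in seeds:
        np.random.seed(seed)       # controls the random initialization
        runs.append(train_fn(data))
    return min(runs, key=score_fn)
```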
Outline – Annotation Mismatch • Latent SVM • Optimization • Practice • Annealing the Tolerance • Annealing the Regularization • Self-Paced Learning • Choice of Loss Function • Extensions
CCCP for Latent SVM (revisited)
Start with an initial estimate w₀. Repeat until convergence:
• Impute the latent variables: hᵢ* = argmax_{hᵢ ∈ H} wₜᵀΨ(xᵢ, yᵢ, hᵢ)
• Update wₜ₊₁ as the ε-optimal solution of min_w ||w||² + C Σᵢ ξᵢ, s.t. wᵀΨ(xᵢ, yᵢ, hᵢ*) - wᵀΨ(xᵢ, y, h) ≥ Δ(yᵢ, y) - ξᵢ
Problem: overfitting in the initial iterations.
Annealing the Tolerance
Same algorithm, but update wₜ₊₁ as only an ε'-optimal solution of the convex problem: start with a loose tolerance and tighten it across iterations (e.g. ε' ← ε'/K) until ε' = ε.
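A sketch of this annealing schedule wrapped around the CCCP loop; solve_convex_problem (an ε'-tolerant structural-SVM solver) and has_converged (a convergence test) are hypothetical callables supplied by the caller, and the default constants are arbitrary.

```python
import numpy as np

def cccp_annealed_tolerance(data, latent_values, phi, delta, dim,
                            solve_convex_problem, has_converged,
                            C=1.0, eps=1e-3, eps0=1.0, K=10.0):
    """CCCP where each inner convex problem is solved only to eps'-optimality,
    with the tolerance annealed from a loose eps0 down to the target eps."""
    w = np.zeros(2 * dim)
    eps_prime = eps0
    while True:
        # Impute the latent variables with the current parameters.
        h_star = [max(latent_values,
                      key=lambda h: w @ joint_feature(phi(x, h), y))
                  for x, y in data]
        # eps'-optimal solve of the convex problem (hypothetical solver).
        w_new = solve_convex_problem(data, h_star, C, tol=eps_prime)
        if eps_prime <= eps and has_converged(w, w_new):
            return w_new
        w = w_new
        eps_prime = max(eps, eps_prime / K)   # anneal the tolerance
```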