Loss-based Visual Learning with Weak Supervision
M. Pawan Kumar
Joint work with Pierre-Yves Baudin, Danny Goodman, Puneet Kumar, Nikos Paragios, Noura Azzabou, Pierre Carlier
SPLENDID: Self-Paced Learning for Exploiting Noisy, Diverse or Incomplete Data
Machine Learning: weak annotations, noisy annotations
Applications: computer vision, medical imaging
Nikos Paragios (Equipe Galen, INRIA Saclay) and Daphne Koller (DAGS, Stanford)
2 visits from INRIA to Stanford, 1 visit from Stanford to INRIA, 3 visits planned
2012: ICML; 2013: MICCAI
Medical Image Segmentation MRI Acquisitions of the thigh Segments correspond to muscle groups
Random Walks Segmentation
Probabilistic segmentation algorithm; computationally efficient
Interactive segmentation (L. Grady, 2006)
Automated shape-prior-driven segmentation (L. Grady, 2005; Baudin et al., 2012)
Random Walks Segmentation
x: medical acquisition
y(i,s): probability that voxel i belongs to segment s
min_y E(x,y) = yᵀL(x)y + w_shape ||y - y0||²
L(x): positive semi-definite Laplacian matrix, so the problem is convex
||y - y0||²: shape prior on the segmentation
w_shape: parameter of the RW algorithm, hand-tuned
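The random-walker objective above is a convex quadratic, so its minimizer can be found by solving a linear system: setting the gradient to zero gives (L + w_shape·I) y = w_shape·y0. A minimal NumPy sketch, assuming a dense Laplacian; the 3-voxel chain graph and prior values are illustrative, not from the paper:

```python
import numpy as np

def random_walker_soft_seg(L, y0, w_shape):
    """Minimise E(y) = y^T L y + w_shape * ||y - y0||^2.

    Setting the gradient 2*L@y + 2*w_shape*(y - y0) to zero gives
    (L + w_shape * I) y = w_shape * y0, which is well posed because
    L is positive semi-definite and w_shape > 0.
    """
    n = L.shape[0]
    A = L + w_shape * np.eye(n)
    return np.linalg.solve(A, w_shape * y0)

# Toy 3-voxel chain-graph Laplacian (illustrative only).
L = np.array([[ 1., -1.,  0.],
              [-1.,  2., -1.],
              [ 0., -1.,  1.]])
y0 = np.array([1.0, 0.5, 0.0])   # shape prior per voxel
y = random_walker_soft_seg(L, y0, w_shape=0.1)
```

Real volumes would use sparse Laplacians and a sparse solver, but the closed-form structure is the same.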
Random Walks Segmentation
Several Laplacians: L(x) = Σα wα Lα(x)
Several shape and appearance priors: Σβ wβ ||y - yβ||²
Hand-tuning a large number of parameters is onerous
Parameter Estimation
Learn the best parameters from training data:
Σα wα yᵀLα(x)y + Σβ wβ ||y - yβ||²
Parameter Estimation
Learn the best parameters from training data: wᵀΨ(x,y)
w is the set of all parameters; Ψ(x,y) is the joint feature vector of input and output
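The energy is linear in the parameters: stacking the quadratic Laplacian terms and the prior terms into one vector gives Ψ(x,y), so the energy is the inner product wᵀΨ(x,y). A small sketch of that construction, with toy sizes and hypothetical values:

```python
import numpy as np

def joint_feature(laplacians, priors, y):
    """Joint feature vector Psi(x, y): one entry per Laplacian term
    y^T L_a(x) y and one per shape/appearance prior ||y - y_b||^2,
    so the energy is the inner product w^T Psi(x, y)."""
    quad = [y @ (L @ y) for L in laplacians]
    shape = [np.sum((y - yb) ** 2) for yb in priors]
    return np.array(quad + shape)

# Toy example with one Laplacian and one prior (illustrative sizes).
L = np.array([[1., -1.], [-1., 1.]])
y = np.array([0.8, 0.2])
y0 = np.array([1.0, 0.0])
psi = joint_feature([L], [y0], y)
w = np.array([1.0, 0.5])
energy = w @ psi    # equals y^T L y + 0.5 * ||y - y0||^2
```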
Outline • Parameter Estimation • Supervised Learning • Hard vs. Soft Segmentation • Mathematical Formulation • Optimization • Experiments • Related and Future Work in SPLENDID
Supervised Learning
Dataset of segmented MRIs
For sample xk and voxel i:
zk(i,s) = 1 if s is the ground-truth segment, 0 otherwise
A probabilistic segmentation?? (the ground truth is hard, not soft)
Supervised Learning
min_w Σk ξk + λ||w||²
s.t. wᵀΨ(xk,ŷ) - wᵀΨ(xk,zk) ≥ Δ(ŷ,zk) - ξk, for all ŷ
(energy of segmentation ŷ minus energy of the ground truth)
Δ(ŷ,zk) = fraction of incorrectly labeled voxels
Structured-output Support Vector Machine (Taskar et al., 2003; Tsochantaridis et al., 2004)
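A minimal sketch of one subgradient step on this objective, treating wᵀΨ as an energy (lower is better) and using brute-force enumeration of ŷ in place of loss-augmented inference; the two-label toy problem and its features are hypothetical, not the paper's model:

```python
import numpy as np

def ssvm_step(w, x, z, candidates, psi, loss, lam=0.01, lr=0.1):
    """One subgradient step on sum_k xi_k + lam * ||w||^2, where the
    hinge slack is
        xi = max_yhat [ loss(yhat, z) - w.Psi(x, yhat) + w.Psi(x, z) ]
    (w.Psi is an energy, so the ground truth should score lower).
    `candidates` enumerates yhat; real solvers replace enumeration
    with loss-augmented inference."""
    viol, worst = 0.0, None
    for yhat in candidates:
        v = loss(yhat, z) - w @ psi(x, yhat) + w @ psi(x, z)
        if v > viol:
            viol, worst = v, yhat
    grad = 2.0 * lam * w
    if worst is not None:
        grad = grad + psi(x, z) - psi(x, worst)
    return w - lr * grad

# Hypothetical two-labeling toy problem with indicator features.
psi = lambda x, y: np.array([1.0, 0.0]) if y == "a" else np.array([0.0, 1.0])
loss = lambda yhat, z: 0.0 if yhat == z else 1.0
w = ssvm_step(np.zeros(2), None, "a", ["a", "b"], psi, loss)
# After one step the ground truth "a" has lower energy than "b".
```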
Supervised Learning
Convex, with several efficient algorithms
But no parameter setting yields a 'hard' segmentation
We only need a correct 'soft' probabilistic segmentation
Outline • Parameter Estimation • Supervised Learning • Hard vs. Soft Segmentation • Mathematical Formulation • Optimization • Experiments • Related and Future Work in SPLENDID
Hard vs. Soft Segmentation Hard segmentation zk Don’t require 0-1 probabilities
Hard vs. Soft Segmentation
Soft segmentation yk compatible with zk: binarizing yk gives zk
Hard vs. Soft Segmentation
Soft segmentation yk compatible with zk: yk ∈ C(zk)
Which yk to use??
The yk provided by the best parameter, which is unknown
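The compatibility set C(zk) can be made concrete as a membership test: a soft segmentation lies in C(zk) exactly when binarizing it (argmax over segments) recovers the hard ground truth. A small sketch, with hypothetical array shapes and values:

```python
import numpy as np

def in_compatibility_set(y_soft, z_hard):
    """Membership test for C(z_k): y_soft is a (voxels, segments)
    array of probabilities, z_hard a (voxels,) array of labels.
    Compatible means binarising y_soft (argmax over segments)
    recovers z_hard."""
    return bool(np.array_equal(y_soft.argmax(axis=1), z_hard))

# Hypothetical 2-voxel, 2-segment example.
y = np.array([[0.7, 0.3],
              [0.4, 0.6]])
compatible = in_compatibility_set(y, np.array([0, 1]))      # True
incompatible = in_compatibility_set(y, np.array([1, 0]))    # False
```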
Outline • Parameter Estimation • Supervised Learning • Hard vs. Soft Segmentation • Mathematical Formulation • Optimization • Experiments • Related and Future Work in SPLENDID
Learning with Hard Segmentation
min_w Σk ξk + λ||w||²
s.t. wᵀΨ(xk,ŷ) - wᵀΨ(xk,zk) ≥ Δ(ŷ,zk) - ξk
Learning with Soft Segmentation
min_w Σk ξk + λ||w||²
s.t. wᵀΨ(xk,ŷ) - wᵀΨ(xk,yk) ≥ Δ(ŷ,zk) - ξk
Learning with Soft Segmentation
min_w Σk ξk + λ||w||²
s.t. wᵀΨ(xk,ŷ) - min_{yk ∈ C(zk)} wᵀΨ(xk,yk) ≥ Δ(ŷ,zk) - ξk
Latent Support Vector Machine (Smola et al., 2005; Felzenszwalb et al., 2008; Yu et al., 2009)
Outline • Parameter Estimation • Optimization • Experiments • Related and Future Work in SPLENDID
Latent SVM
min_w Σk ξk + λ||w||²
s.t. wᵀΨ(xk,ŷ) - min_{yk ∈ C(zk)} wᵀΨ(xk,yk) ≥ Δ(ŷ,zk) - ξk
Difference-of-convex problem: solved with the Concave-Convex Procedure (CCCP)
CCCP
1. Estimate the soft segmentation: yk* = argmin_{yk ∈ C(zk)} wᵀΨ(xk,yk)
   (efficient optimization using dual decomposition)
2. Update the parameters: min_w Σk ξk + λ||w||²
   s.t. wᵀΨ(xk,ŷ) - wᵀΨ(xk,yk*) ≥ Δ(ŷ,zk) - ξk
   (convex optimization)
3. Repeat until convergence
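The alternation can be sketched as a generic outer loop with the two steps passed in as callables; the dummy impute/solve functions below are placeholders to exercise the loop, not the paper's dual-decomposition or convex SSVM solvers:

```python
import numpy as np

def cccp(w0, examples, impute, ssvm_solve, max_iter=20, tol=1e-6):
    """CCCP outer loop for the latent SVM: alternate (1) imputing the
    best compatible soft segmentation y_k* = argmin over C(z_k) of
    w.Psi(x_k, y_k), and (2) solving the resulting convex
    structured-SVM problem with the latents fixed."""
    w = np.asarray(w0, dtype=float)
    for _ in range(max_iter):
        latents = [impute(w, x, z) for (x, z) in examples]
        w_new = ssvm_solve(examples, latents, w)
        if np.linalg.norm(w_new - w) < tol:   # parameters stopped moving
            break
        w = w_new
    return w

# Placeholder callables: imputation returns z itself and the "solver"
# returns a fixed parameter vector, so the loop converges at once.
w = cccp(np.zeros(2), [(None, np.array([0, 1]))],
         impute=lambda w, x, z: z,
         ssvm_solve=lambda ex, lat, w: np.array([1.0, 2.0]))
```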
Outline • Parameter Estimation • Optimization • Experiments • Related and Future Work in SPLENDID
Dataset
30 MRI volumes of the thigh
Dimensions: 224 × 224 × 100
4 muscle groups + background
80% for training, 20% for testing
Parameters
4 Laplacians, 2 shape priors, 1 appearance prior
(Baudin et al., 2012; Grady, 2005)
Baselines
Hand-tuned parameters
Structured-output SVM with hard segmentation
Structured-output SVM with soft segmentation based on a signed distance transform
Results Small but statistically significant improvement
Outline • Parameter Estimation • Optimization • Experiments • Related and Future Work in SPLENDID
Loss-based Learning
x: input; a: annotation; h: hidden information
(e.g. a = "jumping", h = "soft segmentation")
Annotation Mismatch: min Σk Δ(correct ak, predicted ak)
Small improvement using a small medical dataset; large improvement using a large vision dataset
Loss-based Learning
Output Mismatch: min Σk Δ(correct {ak,hk}, predicted {ak,hk})
The hidden information h is modeled using a distribution
Inexpensive annotation: no experts required
Richer models can be learnt
Kumar, Packer and Koller, ICML 2012