
Key idea: restrict learning to a model-dependent "easy" set of samples; the general form of the objective is given in the transcript below.






Presentation Transcript


Self-Paced Learning for Latent Variable Models
M. Pawan Kumar, Ben Packer, and Daphne Koller

Motivation
• Intuitions from human learning: presenting all information at once may be confusing and lead to bad local minima; instead, start with "easy" examples the learner is prepared to handle.
• Bengio et al. [1] (curriculum learning) rely on a user-specified ordering of examples, but "easy for a human" ≠ "easy for a computer", "easy for Learner A" ≠ "easy for Learner B", the ordering is task-specific, and it is onerous on the user.
• In self-paced learning, the schedule of examples is set automatically by the learner.

Latent Variable Models
• x: input or observed variables
• y: output or observed variables
• h: hidden/latent variables
• Example: for an image x with label y = "Deer", h is the bounding box of the object.

Learning Latent Variable Models
Goal: given D = {(x1,y1), …, (xn,yn)}, learn parameters w.

Expectation-Maximization for maximum likelihood
• Maximize the log likelihood: max_w Σ_i log P(xi,yi; w)
• Iterate: find the expected value of the hidden variables using the current w; update w to maximize the log likelihood subject to this expectation.

Latent structural SVM [2]
• Minimize an upper bound on the risk:
  min_w ||w||^2 + C Σ_i max_{y',h'} [w·Ψ(xi,y',h') + Δ(yi,y',h')] - C Σ_i max_h [w·Ψ(xi,yi,h)]
• Iterate (CCCP): impute the hidden variables; update the weights to minimize the upper bound on the risk given these hidden variables.

Self-Paced Learning
• Aim: learn an accurate set of parameters for latent variable models.
• Restrict learning to a model-dependent "easy" set of samples.
• Standard learning objective (general form): min_w r(w) + Σ_i li(w)
• Introduce an indicator of "easiness" vi for each sample: min_{w,v} r(w) + Σ_i vi li(w) - (1/K) Σ_i vi
• K determines the threshold for a sample being easy; it is annealed over successive iterations until all samples are used.
• Using easier subsets in early iterations avoids learning from samples whose hidden variables are imputed incorrectly.

Optimization (a minimal code sketch of this loop is given after the references)
• Initialize K to be large.
• Iterate:
  - Run inference over h.
  - Alternately update w and v: set v by sorting li(w) and comparing to the threshold 1/K; perform the normal update for w over the selected subset of data. Repeat until convergence.
  - Anneal K ← K/μ.
• Stop when all vi = 1 and the objective cannot be reduced within tolerance.

Experiments
Self-paced learning (SPL) is compared to standard CCCP as in [2].
• Handwriting recognition (MNIST): x is the raw image, y is the digit, h is the image rotation; a linear kernel is used. Digit pairs: 1 vs. 7, 2 vs. 7, 3 vs. 8, 8 vs. 9.
• Object classification (Mammals dataset): the image label y is the object class only, h is the bounding box, and Ψ(xi,yi,hi) is the HOG feature vector in the bounding box (offset by class).
• Motif finding (UniProbe dataset): x is a DNA sequence, h is the motif position, y is the binding affinity.
• Noun phrase coreference (MUC6): x consists of pairwise features between pairs of nouns, y is a clustering of the nouns, and h specifies a forest over the nouns such that each tree is a cluster.
[Figures: objective value, training error (%), and test error (%) for CCCP vs. SPL at small, medium, and large K; imputed bounding boxes at iterations 1, 3, 5, and 7.]

Discussion
• The self-paced strategy outperforms the state of the art.
• Global solvers for biconvex optimization may improve accuracy.
• The method is ideally suited to handling multiple levels of annotations.

References
[1] Y. Bengio, J. Louradour, R. Collobert, and J. Weston. Curriculum learning. In ICML, 2009.
[2] C.-N. Yu and T. Joachims. Learning structural SVMs with latent variables. In ICML, 2009.
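
Code sketch of the self-paced loop
To make the alternating optimization concrete, the following is a minimal sketch in Python/NumPy. It is not the authors' implementation: for readability it assumes a plain regularized least-squares model with no latent variables (the poster applies the same loop to the latent structural SVM above), and the helper names fit_weighted_ridge and self_paced_learning are illustrative.

```python
import numpy as np

def fit_weighted_ridge(X, y, v, lam):
    """w-update: regularized least-squares fit restricted to the samples
    currently marked easy (v_i = 1). Hypothetical helper, not from the paper."""
    mask = v.astype(bool)
    Xs, ys = X[mask], y[mask]
    d = X.shape[1]
    return np.linalg.solve(Xs.T @ Xs + lam * np.eye(d), Xs.T @ ys)

def self_paced_learning(X, y, lam=1.0, K=1000.0, mu=1.3, max_outer=100):
    """Self-paced loop: alternate the closed-form v-update with the w-update,
    annealing K so that the easiness threshold 1/K grows over iterations."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(max_outer):
        losses = (X @ w - y) ** 2             # per-sample losses l_i(w)
        v = (losses < 1.0 / K).astype(float)  # v_i = 1 iff l_i(w) < 1/K
        if v.sum() > 0:
            w = fit_weighted_ridge(X, y, v, lam)  # update w on the easy subset
        if v.sum() == n:
            break        # all samples included: equivalent to standard training
        K /= mu          # anneal K (mu > 1) so harder samples qualify next time
    return w
```

In the latent-variable setting of the poster, li(w) would be computed after imputing hi (the inner inference step), but the closed-form v-update (threshold the losses at 1/K) and the annealing of K are unchanged.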
