
Ranking with High-Order and Missing Information


Presentation Transcript


  1. Ranking with High-Order and Missing Information. M. Pawan Kumar, Ecole Centrale Paris. Aseem Behl, Puneet Kumar, Pritish Mohapatra, C. V. Jawahar.

  2. PASCAL VOC “Jumping” classification pipeline: images → processing → features → training → classifier.

  3. PASCAL VOC “Jumping” classification pipeline: images → processing → features ✗ → training → classifier. Think of a classifier!

  4. PASCAL VOC “Jumping” ranking pipeline: images → processing → features ✗ → training → classifier. Think of a classifier!

  5. Ranking vs. Classification: six images ranked from Rank 1 to Rank 6, with every positive ahead of every negative. Average Precision = 1.

  6. Ranking vs. Classification: the same six images, ranked differently, give Average Precision = 1, 0.92, or 0.81, while Accuracy = 1 or 0.67; the two criteria need not agree.

  7. Ranking vs. Classification: ranking is not the same as classification, and average precision is not the same as accuracy. Should we use 0-1 loss based classifiers? No: a basic “machine learning” principle is to optimize the criterion on which you will be evaluated. A small example follows.
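
To make the gap concrete, here is a minimal sketch of average precision for a ranked list of binary labels; the two example rankings are illustrative, not taken from the slides:

```python
import numpy as np

def average_precision(ranked_labels):
    """AP of a ranked list of binary labels (1 = positive, 0 = negative)."""
    labels = np.asarray(ranked_labels, dtype=float)
    hits = np.cumsum(labels)                         # positives seen so far
    precision_at_k = hits / (np.arange(len(labels)) + 1)
    return float((precision_at_k * labels).sum() / labels.sum())

# Three positives, three negatives: the ordering changes AP, whereas 0-1
# accuracy only sees which side of a threshold each image falls on.
print(average_precision([1, 1, 1, 0, 0, 0]))  # 1.0
print(average_precision([1, 1, 0, 1, 0, 0]))  # ~0.92
```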

  8. Outline • Structured Output SVM • Optimizing Average Precision • High-Order Information • Missing Information • Related Work Taskar, Guestrin and Koller, NIPS 2003; Tsochantaridis, Hofmann, Joachims and Altun, ICML 2004

  9. Structured Output SVM. Input x; output y; joint feature vector Ψ(x,y). Scoring function s(x,y;w) = wTΨ(x,y). Prediction y(w) = argmaxy s(x,y;w).
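
As a minimal sketch, assuming a task-specific joint_feature function and a small, explicitly enumerable set of candidate outputs (both hypothetical here):

```python
import numpy as np

def score(w, joint_feature, x, y):
    # s(x, y; w) = w^T Psi(x, y)
    return float(w @ joint_feature(x, y))

def predict(w, joint_feature, x, candidate_outputs):
    # y(w) = argmax_y s(x, y; w), here by brute-force enumeration
    return max(candidate_outputs, key=lambda y: score(w, joint_feature, x, y))
```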

  10. Parameter Estimation. Training data {(xi,yi), i = 1,2,…,m}; loss function for the i-th sample: Δ(yi,yi(w)). Minimize the regularized sum of losses over the training data. This objective is highly non-convex in w, and regularization plays no role (overfitting may occur).

  11. Parameter Estimation. Training data {(xi,yi), i = 1,2,…,m}. Since yi(w) maximizes the score, wTΨ(x,yi) ≤ wTΨ(x,yi(w)), and therefore Δ(yi,yi(w)) ≤ wTΨ(x,yi(w)) + Δ(yi,yi(w)) - wTΨ(x,yi) ≤ maxy{ wTΨ(x,y) + Δ(yi,y) } - wTΨ(x,yi). This upper bound on the loss is convex in w and sensitive to the regularization of w.

  12. Parameter Estimation. Training data {(xi,yi), i = 1,2,…,m}. minw ||w||² + C Σi ξi, such that for all y: wTΨ(x,y) + Δ(yi,y) - wTΨ(x,yi) ≤ ξi. This is a quadratic program, and only the cutting planes maxy{ wTΨ(x,y) + Δ(yi,y) } are required to solve it.

  13. Parameter Estimation. Training data {(xi,yi), i = 1,2,…,m}. minw ||w||² + C Σi ξi, such that for all y: s(x,y;w) + Δ(yi,y) - s(x,yi;w) ≤ ξi. This is a quadratic program, and only the cutting planes maxy{ s(x,y;w) + Δ(yi,y) } are required to solve it.
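
A schematic cutting-plane training loop under these definitions. The helpers passed in (loss_augmented_inference for the maxy step, solve_qp for re-solving the quadratic program over the working sets) are stand-ins assumed by this sketch, not calls to a real library:

```python
def cutting_plane_train(samples, joint_feature, loss,
                        loss_augmented_inference, solve_qp,
                        C=1.0, epsilon=1e-3, max_iter=50):
    """Schematic cutting-plane training for a structured output SVM.

    samples: list of (x_i, y_i) pairs; features are numpy arrays.
    loss_augmented_inference(w, x, y_i): argmax_y s(x,y;w) + Delta(y_i,y).
    solve_qp(working_sets, C): hypothetical QP solver for the objective
        restricted to the constraints in the working sets; returns (w, xi)
        with one slack xi[i] per sample.
    """
    working_sets = [[] for _ in samples]
    w, xi = solve_qp(working_sets, C)             # unconstrained start
    for _ in range(max_iter):
        added = False
        for i, (x, y_true) in enumerate(samples):
            y_hat = loss_augmented_inference(w, x, y_true)
            # violation of: s(x,y;w) + Delta(y_i,y) - s(x,y_i;w) <= xi_i
            violation = (w @ joint_feature(x, y_hat) + loss(y_true, y_hat)
                         - w @ joint_feature(x, y_true) - xi[i])
            if violation > epsilon:
                working_sets[i].append(y_hat)     # add most violated plane
                added = True
        if not added:
            return w                              # all constraints satisfied
        w, xi = solve_qp(working_sets, C)         # re-solve the QP
    return w
```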

  14. Recap • Problem Formulation • Input • Output • Joint Feature Vector or Scoring Function • Learning Formulation • Loss function (‘test’ evaluation criterion) • Optimization for Learning • Cutting plane (loss-augmented inference) • Prediction • Inference

  15. Outline • Structured Output SVM • Optimizing Average Precision (AP-SVM) • High-Order Information • Missing Information • Related Work Yue, Finley, Radlinski and Joachims, SIGIR 2007

  16. Problem Formulation. Single input X: features Φ(xi) for all i ∈ P (the positives) and Φ(xk) for all k ∈ N (the negatives).

  17. Problem Formulation. Single output R: Rik = +1 if i is ranked better than k, and Rik = -1 if k is ranked better than i.

  18. Problem Formulation. Scoring function: si(w) = wTΦ(xi) for all i ∈ P, sk(w) = wTΦ(xk) for all k ∈ N, and S(X,R;w) = Σi∈P Σk∈N Rik (si(w) - sk(w)).
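
In code, a direct transcription of this scoring function; the array shapes (|P|×d positives, |N|×d negatives, |P|×|N| ranking matrix) are assumptions of the sketch:

```python
import numpy as np

def joint_ranking_score(w, phi_pos, phi_neg, R):
    """S(X, R; w) = sum over i in P, k in N of R_ik * (s_i(w) - s_k(w))."""
    s_pos = phi_pos @ w              # s_i(w) = w^T Phi(x_i), shape (|P|,)
    s_neg = phi_neg @ w              # s_k(w) = w^T Phi(x_k), shape (|N|,)
    return float((R * (s_pos[:, None] - s_neg[None, :])).sum())
```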

  19. Learning Formulation. Loss function: Δ(R*,R) = 1 - (AP of the ranking R).
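
A small sketch of this loss for the ranking induced by sorting scores; the score-based interface is an assumption of the sketch (the slides define the loss on the ranking matrix R directly):

```python
import numpy as np

def ap_loss(labels, scores):
    """Delta(R*, R) = 1 - AP of the ranking obtained by sorting the scores.

    labels: 1 for samples in P, 0 for samples in N.
    """
    ranked = np.asarray(labels)[np.argsort(-np.asarray(scores))]
    hits = np.cumsum(ranked)
    precision_at_k = hits / (np.arange(len(ranked)) + 1)
    ap = float((precision_at_k * ranked).sum() / ranked.sum())
    return 1.0 - ap
```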

  20. Optimization for Learning. Cutting plane computation: the optimal greedy algorithm runs in O(|P||N|) time. Yue, Finley, Radlinski and Joachims, SIGIR 2007

  21. Ranking: sort in decreasing order of the individual scores si(w). Yue, Finley, Radlinski and Joachims, SIGIR 2007

  22. Experiments. Images: PASCAL VOC 2011. Features: Poselets. Cross-validation. Classes (10 ranking tasks): Jumping, Phoning, Playing Instrument, Reading, Riding Bike, Riding Horse, Running, Taking Photo, Using Computer, Walking.

  23. AP-SVM vs. SVM on the PASCAL VOC ‘test’ dataset (difference in AP): AP-SVM is better in 8 classes and tied in 2 classes.

  24. AP-SVM vs. SVM on folds of the PASCAL VOC ‘trainval’ dataset (difference in AP): AP-SVM is statistically better in 3 classes; SVM is statistically better in 0 classes.

  25. Outline • Structured Output SVM • Optimizing Average Precision • High-Order Information (M4-AP-SVM) • Missing Information • Related Work Kumar, Behl, Jawahar and Kumar, Submitted

  26. High-Order Information • People perform similar actions • People strike similar poses • Objects are of the same/similar sizes • “Friends” have similar habits • How can we use this information for ranking, rather than just classification?

  27. Problem Formulation. Input x = {x1,x2,x3}; output y ∈ {-1,+1}³. The joint feature vector Ψ(x,y) stacks unary features Ψ1(x,y) and pairwise features Ψ2(x,y).
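
A toy construction of such a joint feature vector for three persons; the particular unary and pairwise definitions below are illustrative choices, not the paper's:

```python
import numpy as np

def joint_feature(x, y):
    """Toy Psi(x, y) for three persons: unary part stacked on pairwise part.

    x: list of three per-person feature vectors; y: labels in {-1,+1}^3.
    Unary (illustrative): sum of the features of persons labelled +1.
    Pairwise (illustrative): one agreement indicator per pair of persons.
    """
    unary = np.zeros(len(x[0]))
    for xi, yi in zip(x, y):
        if yi == +1:
            unary += np.asarray(xi, dtype=float)
    pairs = [(0, 1), (0, 2), (1, 2)]
    pairwise = np.array([1.0 if y[i] == y[j] else 0.0 for i, j in pairs])
    return np.concatenate([unary, pairwise])
```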

  28. Learning Formulation. Input x = {x1,x2,x3}; output y ∈ {-1,+1}³. Δ(y*,y) = fraction of incorrectly classified persons.

  29. Optimization for Learning. Input x = {x1,x2,x3}; output y ∈ {-1,+1}³. maxy wTΨ(x,y) + Δ(y*,y): solved by graph cuts (if supermodular), an LP relaxation, or exhaustive search.

  30. Classification. Input x = {x1,x2,x3}; output y ∈ {-1,+1}³. maxy wTΨ(x,y): solved by graph cuts (if supermodular), an LP relaxation, or exhaustive search.

  31. Ranking? Input x = {x1,x2,x3}; output y ∈ {-1,+1}³. Use the difference of max-marginals.

  32. Max-Marginal for the Positive Class. Input x = {x1,x2,x3}; output y ∈ {-1,+1}³. The best possible score when person i is positive: mm+(i;w) = maxy: yi=+1 wTΨ(x,y). Convex in w.

  33. Max-Marginal for the Negative Class. Input x = {x1,x2,x3}; output y ∈ {-1,+1}³. The best possible score when person i is negative: mm-(i;w) = maxy: yi=-1 wTΨ(x,y). Convex in w.

  34. Ranking. Input x = {x1,x2,x3}; output y ∈ {-1,+1}³. HOB-SVM uses the difference of max-marginals: si(w) = mm+(i;w) - mm-(i;w). Difference-of-convex in w.
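
A minimal sketch of these max-marginals by exhaustive search over the tiny output space; joint_feature is assumed to be supplied (for example, the toy one above):

```python
import itertools
import numpy as np

def max_marginals(w, x, joint_feature, i, n=3):
    """mm+(i;w) and mm-(i;w) by exhaustive search over y in {-1,+1}^n.

    joint_feature(x, y) is assumed to return Psi(x, y) as a numpy array;
    exhaustive search is fine for tiny n, with graph cuts replacing it at scale.
    """
    mm = {+1: -np.inf, -1: -np.inf}
    for y in itertools.product((-1, +1), repeat=n):
        s = float(w @ joint_feature(x, y))       # s(x, y; w) = w^T Psi(x, y)
        mm[y[i]] = max(mm[y[i]], s)              # best score with y_i fixed
    return mm[+1], mm[-1]

def hob_score(w, x, joint_feature, i, n=3):
    # s_i(w) = mm+(i;w) - mm-(i;w)
    mm_pos, mm_neg = max_marginals(w, x, joint_feature, i, n)
    return mm_pos - mm_neg
```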

  35. Ranking. Why not optimize AP directly? Combining the max-margin framework of AP-SVM with max-marginals gives M4-AP-SVM, with si(w) = mm+(i;w) - mm-(i;w).

  36. Problem Formulation. Single input X: features Φ(xi) for all i ∈ P and Φ(xk) for all k ∈ N.

  37. Problem Formulation. Single output R: Rik = +1 if i is ranked better than k, and Rik = -1 if k is ranked better than i.

  38. Problem Formulation. Scoring function: si(w) = mm+(i;w) - mm-(i;w) for all i ∈ P, sk(w) = mm+(k;w) - mm-(k;w) for all k ∈ N, and S(X,R;w) = Σi∈P Σk∈N Rik (si(w) - sk(w)).

  39. Learning Formulation. Loss function: Δ(R*,R) = 1 - (AP of the ranking R).

  40. Optimization for Learning. A difference-of-convex program, solved by a very efficient CCCP: the linearization step uses dynamic graph cuts (Kohli and Torr, ECCV 2006), and the update step is equivalent to AP-SVM. Kumar, Behl, Jawahar and Kumar, Submitted
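
A generic CCCP skeleton for minimizing a difference of convex functions u(w) - v(w); the two callbacks stand in for the dynamic-graph-cuts linearization and the AP-SVM update, and are assumptions of this sketch rather than the paper's code:

```python
import numpy as np

def cccp(w0, grad_concave_part, solve_convex_subproblem,
         tol=1e-4, max_iter=100):
    """Generic CCCP for minimizing u(w) - v(w) with u and v convex.

    grad_concave_part(w): gradient g of v at the current iterate
        (the "linearization step"; dynamic graph cuts play this role
        in the paper, which this sketch does not reproduce).
    solve_convex_subproblem(g): argmin over w' of u(w') - g^T w'
        (the "update step", equivalent to an AP-SVM problem here).
    """
    w = np.asarray(w0, dtype=float)
    for _ in range(max_iter):
        g = grad_concave_part(w)                 # linearize the concave part
        w_new = solve_convex_subproblem(g)       # minimize convex upper bound
        if np.linalg.norm(w_new - w) < tol:      # iterates have stabilized
            return w_new
        w = w_new
    return w
```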

  41. Ranking: sort in decreasing order of the individual scores si(w).

  42. Experiments. Images: PASCAL VOC 2011. Features: Poselets. Cross-validation. Classes (10 ranking tasks): Jumping, Phoning, Playing Instrument, Reading, Riding Bike, Riding Horse, Running, Taking Photo, Using Computer, Walking.

  43. HOB-SVM vs. AP-SVM on the PASCAL VOC ‘test’ dataset (difference in AP): HOB-SVM is better in 4 classes, worse in 3, and tied in 3.

  44. HOB-SVM vs. AP-SVM on folds of the PASCAL VOC ‘trainval’ dataset (difference in AP): HOB-SVM is statistically better in 0 classes; AP-SVM is statistically better in 0 classes.

  45. M4-AP-SVM vs. AP-SVM on the PASCAL VOC ‘test’ dataset (difference in AP): M4-AP-SVM is better in 7 classes, worse in 2, and tied in 1.

  46. M4-AP-SVM vs. AP-SVM on folds of the PASCAL VOC ‘trainval’ dataset (difference in AP): M4-AP-SVM is statistically better in 4 classes; AP-SVM is statistically better in 0 classes.

  47. Outline • Structured Output SVM • Optimizing Average Precision • High-Order Information • Missing Information (Latent-AP-SVM) • Related Work Behl, Jawahar and Kumar, CVPR 2014
