
Asymmetric Boosting for Face Detection



Presentation Transcript


  1. Asymmetric Boosting for Face Detection presented by Minh-Tri Pham, Ph.D. Candidate and Research Associate, Nanyang Technological University, Singapore

  2. Overview • Online Asymmetric Boosting • Fast Training and Selection of Haar-like Features using Statistics • Choosing Goal for Asymmetric Boosting • Margin-based Asymmetric Error Bounds

  3. Online Asymmetric Boosting CVPR’07 oral paper: Minh-Tri Pham and Tat-Jen Cham. Online Learning Asymmetric Boosted Classifiers for Object Detection. In Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Minneapolis, MN, 2007.

  4. Motivation • Usual goal of object detector: • Focused on accuracy • General detectors are designed to deal with different input spaces • Only one input space is used per application [Diagram: a global input space with an offline-learned boundary between object and non-object regions; several application-specific subspaces (input space 1, 2, 3) that could instead be learned online]

  5. Supervisor-Student paradigm • Supervisor = existing object detector: slow but general • Student = online-learned object detector: fast but limited • Less complex model • Faster detection speed [Diagram: the input is fed to the Supervisor Detector, whose output is used to train the Student Detector]

  6. Problem overview • Common appearance-based approach: • Classify a patch using a cascade or tree of boosted classifiers (Viola-Jones and variants): • F1, F2, …, FN: boosted classifiers • Main challenges for online learning a boosted classifier: • Asymmetric: P(non-object) >> P(object) • Online data [Diagram: a patch passes through F1 → F2 → … → FN; a patch that passes every stage is labeled object, and a reject at any stage is labeled non-object]
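The cascade evaluation described above can be sketched as follows. This is a minimal illustration; the stage functions and thresholds are made up, not the trained classifiers from the slides.

```python
# A minimal sketch of cascade evaluation as described above.
# The stage functions and thresholds below are hypothetical.

def evaluate_cascade(patch, stages):
    """Run a patch through boosted-classifier stages F1..FN.

    Each stage is a (score_fn, threshold) pair; the patch is rejected
    as non-object at the first stage whose score falls below threshold.
    """
    for score_fn, threshold in stages:
        if score_fn(patch) < threshold:
            return False  # rejected early: non-object
    return True  # passed every stage: object

# Hypothetical two-stage cascade over toy "patch" dicts.
stages = [
    (lambda p: p["edge_score"], 0.3),
    (lambda p: p["edge_score"] + p["texture_score"], 0.9),
]
```

Early rejection is what makes the cascade fast: the vast majority of non-object patches are discarded by the first few cheap stages.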

  7. Review of current methods • P(non-object) >> P(object): • Viola and Jones (2002) • Ma and Ding (2003) • Hou et al. (2006) • Reweight positives higher and negatives lower • Offline learning only • Online learning for boosting: • Online Boosting of Oza (2005) • Replace offline weak learners with online weak learners • Propagate weights similarly to AdaBoost • Only works well when P(non-object) ≈ P(object) • Asymmetric Online Boosting • Incorporate an asymmetric reweighting scheme into Online Boosting • Skewness balancing: • New reweighting scheme giving better accuracy • Polarity balancing: • Faster learning convergence rate

  8. Skewness balancing • Skewness: • Measures the degree of asymmetry of the class probability distribution • Defined as: λ = log P(negative) − log P(positive) • Viola-Jones' reweighting scheme: • Reweight positives by the same amount more than negatives at every weak learner • km = reweighting amount at the m-th weak learner • k = total reweighting amount [Plot of skewness across weak learners: initial skewness λ1 > 0; after reweighting, λ1′ = λ1 − log k1; after training weak learner 1, λ2 ≈ 0; after reweighting, λ2′ = λ2 − log k2; after training weak learner 2, λ3 ≈ 0; and likewise λ3′ = λ3 − log k3, λ4 ≈ 0, λ4′ = λ4 − log k4]

  9. Skewness balancing • Our approach: • Reweight positives more than negatives by a different amount each time, so that equal skewness is presented to every weak learner • λm = skewness after training the (m−1)-th weak learner • km = reweighting amount at the m-th weak learner • k = total reweighting amount [Plot of skewness across weak learners: initial skewness λ1 > 0; after each reweighting, λm′ = λm − log km; after training each weak learner, the skewness returns to ≈ 0]
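The skewness-balancing idea can be sketched numerically. This is a toy illustration with made-up weights; λ and km follow the definitions above, with class probabilities estimated from the total weight mass of each class.

```python
import math

# Toy illustration of skewness balancing (example weights are made up).
# Skewness per the slide: lambda = log P(negative) - log P(positive),
# estimated here from the total weight mass of each class.

def skewness(pos_weights, neg_weights):
    return math.log(sum(neg_weights)) - math.log(sum(pos_weights))

def reweight_positives(pos_weights, k):
    # Multiplying every positive weight by k reduces skewness by log k.
    return [w * k for w in pos_weights]
```

Choosing km = exp(λm) drives the skewness presented to the next weak learner to zero, which is the per-learner balancing the slide describes, in contrast to Viola-Jones' fixed k.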

  10. Skewness balancing • Effective for initial boosted classifiers in the cascade • Better accuracy → faster detection speed • Effectiveness degrades as boosted classifiers get more complicated [ROC curve for a 4-feature boosted classifier; ROC curve for a 200-feature boosted classifier]

  11. Polarity balancing • After training a weak learner with AdaBoost: • Σ correctly classified weights = Σ misclassified weights • Σ positive weights = Σ negative weights (if the weak learner is optimal) • To maintain AdaBoost's properties online: • Online Boosting ensures asymptotically: • Σ correctly classified weights = Σ misclassified weights • Our method ensures asymptotically: • Σ correctly classified weights = Σ misclassified weights • Σ positive weights = Σ negative weights → faster convergence rate [Diagram: weight distribution after training a weak learner, split into TP/TN (correctly classified) and FN/FP (wrongly classified), positive vs. negative]

  12. Polarity balancing • Learning time: • About 5-30% faster with Polarity balancing [Plot: learning time when online-learning a 20-feature boosted classifier]

  13. Overall performance • ROC curves: • Similar results

  14. Online Learning a Face Detector • Video clip: • Length: 20 minutes • Resolution: 352x288 • 25fps • Learn online from the first 10 minutes, using OpenCV's face detector as supervisor • Test with the remaining 10 minutes • OpenCV's face detector: detection speed 15fps • Our online-learned face detector: detection speed 30fps

  15. Online Learning a Face Detector • Distribution of weak learners over the cascade:

  16. Concluding remarks • Skewness balancing: • Effective for early boosted classifiers • Better accuracy → faster detection speed • Polarity balancing: • Reduction in learning time of about 5-30% empirically • Online learning an object detector from an offline counterpart: • Worst case: • detection accuracy and speed similar • Average case: • detection speed can be faster (up to twice as fast)

  17. Fast Training and Selection of Haar-like Features using Statistics • ICCV'07 oral paper: • Minh-Tri Pham and Tat-Jen Cham. Fast Training and Selection of Haar Features using Statistics in Boosting-based Face Detection. In Proc. International Conference on Computer Vision (ICCV), Rio de Janeiro, Brazil, 2007. • Won Travel Grant Award • Won Second Prize, Best Student Paper in Year 2007 Award, Pattern Recognition and Machine Intelligence Association (PREMIA), Singapore

  18. Motivation • Face detectors today • Real-time detection speed …but… • Weeks of training time

  19. Why is Training so Slow? • Time complexity: O(MNT log N) • 15ms to train a feature classifier • 10 minutes to train a weak classifier • 27 days to train a face detector
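As a sanity check, the three timings above are consistent with each other. In the sketch below, T ≈ 40,000 features and roughly 3,900 weak classifiers in the full detector are assumed figures for illustration; the slide states only the per-level times.

```python
# Back-of-envelope check of the timings above. T = 40,000 features and
# ~3,900 weak classifiers in the full cascade are assumed figures.

ms_per_feature = 15          # 15 ms to train one feature classifier
T = 40_000                   # assumed number of Haar-like features
sec_per_weak = ms_per_feature * T / 1000
minutes_per_weak = sec_per_weak / 60                    # about 10 minutes

weak_classifiers = 3_900     # assumed size of the whole cascade
days_total = weak_classifiers * sec_per_weak / 86_400   # about 27 days
```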

  20. Why Should the Training Time be Improved? • Tradeoff between time and generalization • E.g. training 100 times slower if we increase both N and T by 10 times • Trial and error to find key parameters for training • Much longer training time needed • Online-learning face detectors have the same problem

  21. Existing Approaches to Reduce the Training Time • Sub-sample the Haar-like feature set • Simple but loses generalization • Use histograms and real-valued boosting (B. Wu et al. '04) • Pro: Reduce from O(MNT log N) to O(MNT) • Con: Raises overfitting concerns: • Real AdaBoost is not known to be resistant to overfitting • A weak classifier may overfit if too many histogram bins are used • Pre-compute the feature values' sorting orders (J. Wu et al. '07) • Pro: Reduce from O(MNT log N) to O(MNT) • Con: Requires huge memory storage • For N = 10,000 and T = 40,000, a total of 800MB is needed.

  22. Why is Training so Slow? • Time complexity: O(MNT log N) • 15ms to train a feature classifier • 10min to train a weak classifier • 27 days to train a face detector • Bottleneck: • At least O(NT) to train a weak classifier • Can we avoid O(NT)?

  23. Our Proposal • Fast StatBoost: To train feature classifiers using statistics rather than using input data • Con: • Less accurate … but not critical for a feature classifier • Pro: • Much faster training time: • Constant time instead of linear time

  24. Fast StatBoost • Training feature classifiers using statistics: • Assumption: the feature value v(t) is normally distributed given the face class c • Closed-form solution for the optimal threshold • Fast linear projections of the statistics of a window's integral image into the 1D statistics of a feature value → constant time to train a feature classifier [Figure: non-face and face feature-value distributions with the optimal threshold; legend: the mean and variance of feature value v(t); the random vector representing a window's integral image; its mean vector and covariance matrix; the Haar-like feature, a sparse vector with fewer than 20 non-zero elements]
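Under the Gaussian assumption, the optimal threshold sits where the two weighted class densities cross. A sketch of that closed-form computation follows; this is my own illustrative derivation (equating the two log-densities and solving the resulting quadratic), not the paper's code.

```python
import math

# Sketch: threshold where two weighted 1D Gaussian class densities
# w1*N(x; mu1, var1) and w2*N(x; mu2, var2) cross. Illustrative only.

def gaussian_threshold(mu1, var1, w1, mu2, var2, w2):
    """Return the crossing point(s), sorted, as a list."""
    # Equate the log-densities and collect terms into a*x^2 + b*x + c = 0.
    a = 1.0 / (2 * var2) - 1.0 / (2 * var1)
    b = mu1 / var1 - mu2 / var2
    c = (mu2 ** 2) / (2 * var2) - (mu1 ** 2) / (2 * var1) \
        - math.log(w2 / w1) - 0.5 * math.log(var1 / var2)
    if abs(a) < 1e-12:                  # equal variances: one crossing
        return [-c / b]
    r = math.sqrt(b * b - 4 * a * c)
    return sorted([(-b - r) / (2 * a), (-b + r) / (2 * a)])
```

With equal variances and equal class weights this reduces to the midpoint (mu1 + mu2) / 2, as expected.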

  25. Fast StatBoost • Integral image’s statistics are obtained directly from the weighted input data • Input: N training integral images and their current weights w(m): • We compute: • Sample total weight: • Sample mean vector: • Sample covariance matrix:
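These three weighted sample quantities can be sketched directly in code. The matrix layout below (one flattened integral image per row of X) is an assumption for illustration.

```python
import numpy as np

# Sketch of the weighted statistics on this slide: sample total weight,
# sample mean vector, and sample covariance matrix of N integral images
# under the current boosting weights. X is N x d, w has length N.

def weighted_stats(X, w):
    W = w.sum()                              # sample total weight
    mu = (w[:, None] * X).sum(axis=0) / W    # sample mean vector
    D = X - mu
    cov = (w[:, None] * D).T @ D / W         # sample covariance matrix
    return W, mu, cov
```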

  26. Fast StatBoost • To train a weak classifier: • Extract the class-conditional integral image statistics • Time complexity: O(Nd2) • Factor d2 negligible because fast algorithms exist, hence in practice: O(N) • Train T feature classifiers by projecting the statistics into 1D: • Time complexity: O(T) • Select the best feature classifier • Time complexity: O(T) • Time complexity: O(N+T)
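The O(T) projection step above works because a Haar-like feature is a sparse vector f with fewer than 20 non-zeros, so the 1D statistics of the feature value follow from mu_v = fᵀmu and var_v = fᵀΣf using only those entries. A sketch, with the sparse (index, weight) representation being my own assumption:

```python
import numpy as np

# Sketch of the constant-time projection of integral-image statistics
# (mu, Sigma) into the 1D statistics of one Haar-like feature.

def feature_stats(feature, mu, cov):
    """feature: list of (pixel_index, weight) pairs (the non-zeros of f)."""
    idx = [i for i, _ in feature]
    w = np.array([c for _, c in feature])
    mean_v = w @ mu[idx]                     # f' mu
    var_v = w @ cov[np.ix_(idx, idx)] @ w    # f' Sigma f
    return mean_v, var_v
```

Since the cost depends only on the number of non-zero entries, it is constant per feature regardless of N, giving the O(N + T) total per weak classifier.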

  27. Experimental Results • Setup • Intel Pentium IV 2.8GHz • 19 types → 295,920 Haar-like features • Time for extracting the statistics: • Main factor: covariance matrices • GotoBLAS: 0.49 seconds per matrix • Time for training T features: • 2.1 seconds • Total training time: 3.1 seconds per weak classifier with 300K features • Existing methods: 1-10 minutes with 40K features or fewer [Figure: the nineteen feature types used in our experiments – edge, corner, diagonal line, line, and center-surround features]

  28. Experimental Results • Comparison with Fast AdaBoost (J. Wu et. al. ‘07), the fastest known implementation of Viola-Jones’ framework:

  29. Experimental Results • Performance of a cascade: ROC curves of the final cascades for face detection

  30. Conclusions • Fast StatBoost: use of statistics instead of input data to train feature classifiers • Time: • Reduction of the face detector training time from up to a month to 3 hours • Significant gain in both N and T with little increase in training time • Due to O(N+T) per weak classifier • Accuracy: • Even better accuracy for the face detector • Due to many more Haar-like features being explored

  31. Detection with Multi-exit Asymmetric Boosting • CVPR’08 poster paper: • Minh-Tri Pham and Viet-Dung D. Hoang and Tat-Jen Cham. Detection with Multi-exit Asymmetric Boosting. In Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Anchorage, Alaska, 2008. • Won Travel Grant Award

  32. Problem overview • Common appearance-based approach: • F1, F2, …, FN: boosted classifiers • f1,1, f1,2, …, f1,K: weak classifiers • θ: threshold [Diagram: cascade F1 → F2 → … → FN with a pass/reject decision at each stage; inside F1, the weak-classifier scores f1,1 + f1,2 + … + f1,K are summed and compared against θ: pass if > θ, otherwise reject]

  33. Objective • Find f1,1, f1,2, …, f1,K, and θ such that: • K is minimized – K is proportional to F1's evaluation time [Diagram: F1 sums f1,1 + … + f1,K and passes if the sum exceeds θ, otherwise rejects]

  34. Existing trends (1) • Idea: • For k from 1 until convergence: • Learn a new weak classifier f1,k(x) • Adjust θ to see if we can achieve FAR(F1) ≤ α0 and FRR(F1) ≤ β0 • Break the loop if such a θ exists • Issues: • Weak classifiers are sub-optimal w.r.t. the training goal. • Too many weak classifiers are required in practice.

  35. Existing trends (2) • Idea: • For k from 1 until convergence: • Learn a new weak classifier f1,k(x) • Break the loop if FAR(F1) ≤ α0 and FRR(F1) ≤ β0 • Pros: • Reduces FRR at the cost of increasing FAR – acceptable for cascades • Fewer weak classifiers • Cons: • How to choose the asymmetric weighting? • Much longer training time • Solution to the con: • Trial and error: choose the weighting such that K is minimized.

  36. Our solution • Learn every weak classifier using the same asymmetric goal: … • Why?

  37. Because… • Consider two desired bounds (or targets) for learning a boosted classifier: • (1) Exact bound: FAR ≤ α0 and FRR ≤ β0 • (2) Conservative bound • (2) is more conservative than (1) because (2) => (1). • For every new weak classifier learned, the ROC operating point moves fastest toward the conservative bound. [Figure: two FAR-vs-FRR plots showing the ROC operating points H1, H2, H3, … after each weak classifier, relative to the exact bound and the conservative bound]

  38. Multi-exit Boosting • A method to train a single boosted classifier with multiple exit nodes • An exit node fi is a weak classifier followed by a decision to continue or reject [Diagram: weak classifiers f1 + f2 + … + f8 summed in sequence, grouped into exits F1, F2, F3, each with a pass/reject decision; a patch passing every exit is labeled object] • Features: • Weak classifiers are trained with the same goal • Every pass/reject decision is guaranteed with the desired FAR and FRR bounds • The classifier is a cascade • The score is propagated from one node to the next • Main advantages: • Weak classifiers are learned (approximately) optimally • No training of multiple boosted classifiers • Far fewer weak classifiers are needed than in traditional cascades
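The score propagation across exit nodes can be sketched as follows. The exit positions, thresholds, and weak scores below are made up for illustration; the point is that, unlike a traditional cascade, the running score is not reset at each stage.

```python
# Sketch of score propagation in a multi-exit classifier: one running
# score over all weak classifiers, tested against a threshold at each
# exit node. Exit positions, thresholds, and weak scores are made up.

def multi_exit_classify(patch, weak_fns, exits):
    """exits: list of (k, threshold) -- test after the k-th weak classifier."""
    thresholds = dict(exits)
    score = 0.0
    for k, f in enumerate(weak_fns, start=1):
        score += f(patch)                    # score carries across exits
        if k in thresholds and score < thresholds[k]:
            return False                     # rejected at this exit node
    return True                              # passed every exit: object
```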

  39. Results: Goal vs. Number of weak classifiers (K) • Toy problem: learn a (single-exit) boosted classifier F for classifying face/non-face patches such that FAR(F) < 0.8 and FRR(F) < 0.01 • Similar results were obtained for tests on other desired error rates. [Plot: the asymmetric goal vs. K; the best goal and the goal our method chooses nearly coincide]

  40. Ours vs. Others (in Face Detection) • Use Fast StatBoost as base method for fast-training a weak classifier.

  41. Ours vs. Others (in Face Detection) • MIT+CMU Frontal Face Test set:

  42. Conclusion • Multi-exit Asymmetric Boosting trains every weak classifier approximately optimally. • Better accuracy • Much fewer weak classifiers • Significantly reduces training time • No more trial-and-error for training a boosted classifier

  43. Margin-based Bounds on Asymmetric Error • CVPR’08 poster paper: • Minh-Tri Pham and Viet-Dung D. Hoang and Tat-Jen Cham. Detection with Multi-exit Asymmetric Boosting. In Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Anchorage, Alaska, 2008. • Won Travel Grant Award

  44. Summary • Online Asymmetric Boosting • Integrates Asymmetric Boosting with Online Learning • Fast Training and Selection of Haar-like Features using Statistics • Dramatically reduces training time from weeks to a few hours • Multi-exit Asymmetric Boosting • Approximately minimizes the number of weak classifiers

  45. Thank You!

  46. Backup Slides

  47. Online Asymmetric Boosting CVPR’07 oral paper: Minh-Tri Pham and Tat-Jen Cham. Online Learning Asymmetric Boosted Classifiers for Object Detection. In Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Minneapolis, MN, 2007.

  48. AdaBoost (Freund-Schapire'96) [Diagram: two stages; in each stage an offline weak learner is trained, then wrongly classified examples are up-weighted and correctly classified ones down-weighted before the next stage; positive and negative examples shown]
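The reweighting step illustrated above can be sketched as follows; a minimal illustration of the standard AdaBoost update, not code from the paper.

```python
# Sketch of AdaBoost's reweighting step (Freund-Schapire '96): after a
# weak learner with weighted error e, correctly classified examples are
# scaled by beta = e / (1 - e) and the weights renormalized, so the
# correctly and wrongly classified parts end up with equal total weight.

def adaboost_reweight(weights, correct, error):
    """correct: per-example booleans; error: the weighted error (0 < e < 0.5)."""
    beta = error / (1.0 - error)
    new = [w * (beta if ok else 1.0) for w, ok in zip(weights, correct)]
    z = sum(new)
    return [w / z for w in new]
```

The resulting equal split between correctly and wrongly classified mass is exactly the first property quoted on the polarity-balancing slide.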

  49. Asymmetric Boost (Viola-Jones'02) • To address P(non-object) >> P(object): • Weight positives k times more than negatives [Diagram: two stages with offline weak learners, as in the AdaBoost slide, but with positive examples weighted up; positive and negative examples shown]

  50. Online Boosting (Oza-Russell'01) • To learn data online: • If wrongly classified: increase weight; otherwise: decrease weight [Diagram: two online weak learners, each splitting examples into correctly and wrongly classified; positive and negative examples shown]
