Hand Detection with a Cascade of Boosted Classifiers Using Haar-like Features Qing Chen Discover Lab, SITE, University of Ottawa May 2, 2006
Outline • 1. Introduction • 2. Haar-like features • 3. Adaboost • 4. The Cascade of Classifiers • 5. Preliminary Results • 6. Future Work
1. Introduction • A hand-based Human-Computer Interface (HCI) should meet the requirements of real-time performance, accuracy, and robustness. • Haar-like features are used to meet the real-time requirement. • The cascade of AdaBoost (Adaptive Boosting) classifiers is used to achieve both accuracy and speed. • The algorithm has been used for face detection, where it achieved high detection accuracy and ran approximately 15 times faster than previous approaches. • The algorithm is a generic object detection/recognition method.
2. Haar-Like Features • Each Haar-like feature consists of two or three adjoining “black” and “white” rectangles. • The value of a Haar-like feature is the difference between the sums of the pixel gray-level values within the black and white rectangular regions: $f(x) = \sum_{\text{black rectangle}} (\text{pixel gray level}) - \sum_{\text{white rectangle}} (\text{pixel gray level})$ • Compared with raw pixel values, Haar-like features can reduce the in-class variability and increase the out-of-class variability, thus making classification easier. Figure 1: A set of basic Haar-like features. Figure 2: A set of extended Haar-like features.
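2. Haar-Like Features (cont’d) The rectangular Haar-like features can be computed rapidly using the “integral image”. The integral image at location (x, y) contains the sum of the pixel values above and to the left of (x, y), inclusive: $ii(x, y) = \sum_{x' \le x,\ y' \le y} i(x', y')$, where $i(x', y')$ is the pixel value of the original image. The sum of the pixel values within region “D” can be computed from the integral-image values at the four corner points P1, P2, P3, and P4. [Figure: rectangular regions A, B, C, D with corner points P1, P2, P3, P4 and a point P(x, y)]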
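The integral-image idea can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used for these slides: the function names are hypothetical, and the mapping of `p1` to `p4` onto the figure's corner points P1 to P4 assumes the usual convention (P1 above-left, P2 above-right, P3 below-left, P4 bottom-right of region D).

```python
def integral_image(img):
    """Return ii where ii[y][x] is the sum of img over all rows <= y and columns <= x."""
    h, w = len(img), len(img[0])
    ii = [[0] * w for _ in range(h)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y][x] = row_sum + (ii[y - 1][x] if y > 0 else 0)
    return ii


def rect_sum(ii, x0, y0, x1, y1):
    """Sum of the pixels in the inclusive rectangle [x0..x1] x [y0..y1] with four lookups."""
    def at(x, y):
        # Positions outside the image contribute zero.
        return ii[y][x] if x >= 0 and y >= 0 else 0

    p1 = at(x0 - 1, y0 - 1)  # assumed P1: above-left of the rectangle
    p2 = at(x1, y0 - 1)      # assumed P2: above-right
    p3 = at(x0 - 1, y1)      # assumed P3: below-left
    p4 = at(x1, y1)          # assumed P4: bottom-right corner
    return p4 + p1 - p2 - p3


def two_rect_feature(ii, x, y, w, h):
    """Black-minus-white value of a two-rectangle feature.

    Assumed layout: left half black, right half white.
    """
    half = w // 2
    black = rect_sum(ii, x, y, x + half - 1, y + h - 1)
    white = rect_sum(ii, x + half, y, x + 2 * half - 1, y + h - 1)
    return black - white


img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12]]
ii = integral_image(img)
center_sum = rect_sum(ii, 1, 1, 2, 2)           # 6 + 7 + 10 + 11
feature = two_rect_feature(ii, 0, 0, 4, 3)      # left columns minus right columns
```

Once the integral image is built, every rectangle sum costs four array lookups regardless of the rectangle size, which is what makes the features cheap to evaluate at any scale.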
2. Haar-Like Features (cont’d) • To detect the hand, the image is scanned by a sub-window containing a Haar-like feature. • Based on each Haar-like feature $f_j$, a weak classifier $h_j(x)$ is defined as: $h_j(x) = 1$ if $p_j f_j(x) < p_j \theta_j$, and $h_j(x) = 0$ otherwise, where x is a sub-window, $\theta_j$ is a threshold, and $p_j$ indicates the direction of the inequality sign.
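A weak classifier of this kind is just a thresholded feature value. The sketch below assumes the form used in Viola and Jones's detector (output 1 when the parity-signed feature value is below the parity-signed threshold, 0 otherwise); the function name and the numbers are illustrative only.

```python
def weak_classify(feature_value, theta, parity):
    """Return 1 (object) if parity * feature_value < parity * theta, else 0.

    parity is +1 or -1 and flips the direction of the inequality.
    """
    return 1 if parity * feature_value < parity * theta else 0


# With parity +1 the classifier fires for values below the threshold,
# with parity -1 it fires for values above the threshold.
below = weak_classify(5, 10, 1)
above = weak_classify(15, 10, -1)
```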
3. Adaboost • The computation cost using Haar-like features: Example: original image size: 320×240, sub-window size: 24×24, frame rate: 15 frames/second. The total number of sub-windows evaluated per second with one Haar-like feature: (320 - 24 + 1) × (240 - 24 + 1) × 15 = 966,735. Considering the scaling factor and the total number of Haar-like features, the computation cost is huge. • AdaBoost (Adaptive Boosting) is an iterative learning algorithm that constructs a “strong” classifier using only a training set and a “weak” learning algorithm. At each iteration, the learning algorithm selects the “weak” classifier with the minimum classification error. • AdaBoost is adaptive in the sense that later classifiers are tuned in favor of the sub-windows misclassified by previous classifiers.
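The sub-window count in the example can be reproduced with a short sketch. The single-scale count follows the slide's arithmetic directly; the multi-scale function is only an assumption about how the scaling factor enters (the window is enlarged by a fixed factor until it no longer fits, with a one-pixel step), and the factor 1.3 is borrowed from the preliminary results.

```python
def count_subwindows(img_w, img_h, win_w, win_h):
    """Number of positions of a win_w x win_h window inside the image (1-pixel step)."""
    if win_w > img_w or win_h > img_h:
        return 0
    return (img_w - win_w + 1) * (img_h - win_h + 1)


def count_multiscale(img_w, img_h, win_w, win_h, scale_factor):
    """Total positions over all window scales (assumed scheme: grow the window)."""
    total = 0
    scale = 1.0
    while True:
        w, h = int(win_w * scale), int(win_h * scale)
        if w > img_w or h > img_h:
            break
        total += count_subwindows(img_w, img_h, w, h)
        scale *= scale_factor
    return total


per_frame = count_subwindows(320, 240, 24, 24)            # 297 * 217
per_second = per_frame * 15                               # at 15 frames/second
all_scales = count_multiscale(320, 240, 24, 24, 1.3)      # larger than per_frame
```

Multiplying the multi-scale count by the number of candidate features shows why evaluating every feature on every sub-window is impractical and why feature selection is needed.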
3. Adaboost (cont’d) • The algorithm:
3. Adaboost (cont’d) • AdaBoost starts with a uniform distribution of “weights” over the training examples. The weights tell the learning algorithm how important each example is. • Obtain a weak classifier hj(x) from the weak learning algorithm. • Increase the weights of the training examples that were misclassified. • (Repeat) • At the end, form the strong classifier as a weighted linear combination of the weak classifiers obtained at all iterations.
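The steps above can be sketched as a small training loop. This is a minimal sketch of standard discrete AdaBoost with labels +1/-1 and threshold stumps on a single feature value, which may differ in detail from the variant shown on the algorithm slide; all names and the toy data are hypothetical.

```python
import math


def stump_predict(x, theta, parity):
    """Threshold stump: +1 if parity * x < parity * theta, else -1."""
    return 1 if parity * x < parity * theta else -1


def train_stump(xs, ys, weights):
    """Weak learner: pick the stump with the minimum weighted error."""
    best = None
    for theta in sorted(set(xs)):
        for parity in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(x, theta, parity) != y)
            if best is None or err < best[0]:
                best = (err, theta, parity)
    return best


def adaboost(xs, ys, rounds):
    """Return a list of (alpha, theta, parity) weak classifiers."""
    n = len(xs)
    weights = [1.0 / n] * n                      # uniform initial weights
    ensemble = []
    for _ in range(rounds):
        err, theta, parity = train_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)    # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, theta, parity))
        # Misclassified examples gain weight, correctly classified ones lose weight.
        weights = [w * math.exp(-alpha * y * stump_predict(x, theta, parity))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def strong_classify(ensemble, x):
    """Sign of the weighted linear combination of the weak classifiers."""
    score = sum(a * stump_predict(x, t, p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1


# Toy data: small feature values are positives, large ones negatives.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, 1, -1, -1, -1]
ensemble = adaboost(xs, ys, rounds=3)
predictions = [strong_classify(ensemble, x) for x in xs]
```

In the detector, each weak classifier would threshold one Haar-like feature, so each boosting round also acts as a feature selection step.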
4. The Cascade of Classifiers • A series of classifiers is applied to every sub-window. • The first classifier eliminates a large number of negative sub-windows and passes almost all positive sub-windows (at the cost of a high false positive rate) with very little processing. • Subsequent layers eliminate additional negative sub-windows (those passed by the first classifier) but require more computation. • After several stages of processing, the number of negative sub-windows has been reduced radically.
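The early-rejection behaviour of the cascade can be illustrated with a short sketch. The stage functions below are hypothetical stand-ins for boosted classifiers (in a real detector each stage would be a strong classifier over Haar-like features); the point is only that a sub-window is dropped at the first stage that rejects it.

```python
def run_cascade(stages, window):
    """Apply the stages in order; return (accepted, number_of_stages_evaluated)."""
    for depth, stage in enumerate(stages, start=1):
        if not stage(window):
            return False, depth      # rejected early, later stages are skipped
    return True, len(stages)


# Hypothetical stages, ordered from cheap and permissive to stricter.
stages = [
    lambda w: sum(w) / len(w) > 50,  # stage 1: bright enough on average
    lambda w: max(w) - min(w) > 30,  # stage 2: enough contrast
    lambda w: w[0] < w[-1],          # stage 3: brightness increases across the window
]

dark = run_cascade(stages, [10, 20, 30])        # fails stage 1
flat = run_cascade(stages, [100, 100, 100])     # fails stage 2
good = run_cascade(stages, [60, 100, 120])      # passes all stages
```

Because most sub-windows in an image are negatives, most of them exit after the first one or two stages, so the average cost per sub-window stays low.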
4. The Cascade of Classifiers (cont’d) • Negative samples: non-object images. Negative samples are taken from arbitrary images, which must not contain any representation of the object. • Positive samples: images that contain the object (a hand in our case). The hand in each positive sample must be marked out for classifier training.
5. Preliminary Results • Number of positive samples: 144 • Number of negative samples: 3142 • Sample resolution: 640×480 • Initial sub-window size: 15×30 • Scale factor: 1.3 • Cascade obtained: 12 stages
6. Future Work • Extended Haar-like features: will they improve the detection accuracy (still an open problem), and what is the performance tradeoff? • Parallel cascades for multiple hand gestures: how can we select the hand gesture configurations that can be detected more effectively with the employed Haar-like feature set? • Improve the robustness against hand rotation. • How much improvement can be achieved with more training samples? For comparison, the Intel face detection classifier: 5000 positive samples, 10000 negative samples, accuracy: 98%.
References: • Wu Bo, et al., “A Multi-View Face Detection Based on Real Adaboost Algorithm,” Computer Research and Development, 42 (9), pp. 1612-1621, 2005. • Paul Viola and Michael J. Jones, “Robust Real-time Object Detection,” Technical Report, Cambridge Research Lab, Compaq, 2001. • Cynthia Rudin, Robert E. Schapire, Ingrid Daubechies, “Analysis of Boosting Algorithms using the Smooth Margin Function: A Study of Three Algorithms,” 2004. • Rainer Lienhart, Alexander Kuranov, Vadim Pisarevsky, “Empirical Analysis of Detection Cascades of Boosted Classifiers for Rapid Object Detection,” MRL Technical Report, May 2002. • Andre L. C. Barczak, Farhad Dadgostar, “Real-time Hand Tracking Using a Set of Cooperative Classifiers and Haar-Like Features,” Research Letters in the Information and Mathematical Sciences, ISSN 1175-2777, Vol. 7, pp. 29-42, 2005. • Mathias Kölsch and Matthew Turk, “Robust Hand Detection,” Proc. IEEE Intl. Conference on Automatic Face and Gesture Recognition, May 2004. • Intel OpenCV Documents. • Acknowledgement goes to Urtho’s training data for eye detection and F. Dadgostar’s hand palm database.