Robust Real-Time Face Detection Advisor: Prof. 萬書言 Presenter: 何炳杰 Date: 2010/11/26
Source • Title: Robust Real-Time Face Detection • Authors: Paul Viola, Michael J. Jones • Publication: International Journal of Computer Vision • Publisher: Springer • Date: May 1, 2004 http://www.springerlink.com/content/q70v4h6715v5p152/
Outline • Background • Key words • Abstract • 1. Introduction • 2. Features • 3. Learning Classification Functions • 4. The Attentional Cascade • 5. Results • 6. Conclusions • References
Background • Chellappa, R. Sinha, P. Phillips, P.J. 2010. Face Recognition by Computers and Humans. IEEE Computer Society. http://www.opencv.org.cn/index.php/Cv%E6%A8%A1%E5%BC%8F%E8%AF%86%E5%88%AB
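Background • The object detection method was first proposed by Paul Viola and later improved by Rainer Lienhart. First, a classifier is trained on the Haar features of sample images (roughly a few hundred), which yields a cascaded boosted classifier. The training samples are divided into positive and negative samples: positive samples show the target object (e.g. faces or cars), and negative samples are arbitrary other images. All sample images are normalized to the same size (e.g. 20x20), which is also the size of the windows to be tested. http://www.opencv.org.cn/index.php/Cv%E6%A8%A1%E5%BC%8F%E8%AF%86%E5%88%AB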
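Background • Once trained, the classifier can be applied to a region of interest in an input image (of the same size as the training samples). The classifier outputs 1 if the region contains the target (car or face) and 0 otherwise. To search a whole image, a search window is moved across the image and every position is checked for a possible target. To find objects of different sizes, the classifier is designed so that it can be resized, which is more efficient than resizing the image under test. Therefore, to detect objects of unknown size, the scanning procedure usually has to scan the image several times with search windows of different scales. http://www.opencv.org.cn/index.php/Cv%E6%A8%A1%E5%BC%8F%E8%AF%86%E5%88%AB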
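The scanning procedure described above can be sketched as follows. This is a minimal illustration, not the OpenCV implementation: the function names, the scale step of 1.25, the one-pixel shift, and the `classify` callback (which stands in for the trained classifier) are all assumptions made for the example.

```python
def scan_windows(img_w, img_h, base=20, scale_step=1.25, shift=1.0):
    """Yield (x, y, size) for every search window.

    The window grows from `base` by `scale_step` per pass, so the detector
    is rescaled instead of the image. All parameter values are assumptions.
    """
    scale = 1.0
    while True:
        size = int(round(base * scale))
        if size > min(img_w, img_h):
            break  # window no longer fits inside the image
        step = max(1, int(round(shift * scale)))
        for y in range(0, img_h - size + 1, step):
            for x in range(0, img_w - size + 1, step):
                yield x, y, size
        scale *= scale_step


def detect(img_w, img_h, classify):
    """Return every window for which the (hypothetical) classifier outputs 1."""
    return [(x, y, s) for x, y, s in scan_windows(img_w, img_h)
            if classify(x, y, s) == 1]


if __name__ == "__main__":
    # Dummy classifier that only "detects" one window, for illustration.
    hits = detect(40, 30, lambda x, y, s: 1 if (x, y, s) == (0, 0, 20) else 0)
    print(hits)
```

Even a small 40 x 30 image produces several hundred windows, which is why the per-window cost of the classifier matters so much.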
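Background • The word “cascade” means that the final classifier is made up of several simple classifiers chained in stages. During detection, a candidate window passes through each stage in turn, so most candidate regions are rejected in the first few stages; a region that passes every stage is reported as a target. • “Cascade”: http://www.opencv.org.cn/index.php/Cv%E6%A8%A1%E5%BC%8F%E8%AF%86%E5%88%AB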
Key words • Integral image • AdaBoost • Cascade classifier
Abstract Three main concepts: integral image, cascade classifier, Adaptive Boosting (AdaBoost)
Abstract • This paper describes a face detection framework that is capable of processing images extremely rapidly while achieving high detection rates. There are three key contributions. • The first is the introduction of a new image representation called the “Integral Image” which allows the features used by our detector to be computed very quickly. • The second is a simple and efficient classifier which is built using the AdaBoost learning algorithm to select a small number of critical visual features from a very large set of potential features. • The third contribution is a method for combining classifiers in a “cascade” which allows background regions of the image to be quickly discarded while spending more computation on promising face-like regions.
1. Introduction • Preface: - In this paper, Paul Viola and Michael Jones combine three techniques to find faces quickly: the integral image, AdaBoost, and the cascade classifier. The introduction also restates the three main contributions listed in the abstract.
1.1 Overview • Section 2: - It will detail the form of the features as well as a new scheme for computing them rapidly. • Section 3: - It will discuss the method in which these features are combined to form a classifier (AdaBoost). • Section 4: - It will describe a method for constructing a cascade of classifiers. • Section 5: - It will describe a number of experimental results. • Section 6: - It contains a discussion of this system and its relationship to related systems.
Features • The authors use three kinds of features. The value of a two-rectangle feature is the difference between the sums of the pixels within two rectangular regions.
Features • The regions have the same size and shape and are horizontally or vertically adjacent. • A three-rectangle feature computes the sum within two outside rectangles subtracted from the sum in a center rectangle. • Finally, a four-rectangle feature computes the difference between diagonal pairs of rectangles. • Given that the base resolution of the detector is 24 × 24, the exhaustive set of rectangle features is quite large: about 160,000.
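To get a feel for where a number of this magnitude comes from, the sketch below counts every position and scale of five upright prototypes inside a 24 × 24 window. The choice of prototypes (two orientations each of the two- and three-rectangle features plus one four-rectangle feature) and the enumeration rule are assumptions for illustration; the slides only quote the approximate total.

```python
def count_rectangle_features(window=24):
    """Count all placements of the assumed prototype shapes in a square window.

    Each prototype is given as (cells_wide, cells_high); a feature of pixel
    size w x h must have w and h divisible by the prototype's cell counts.
    """
    prototypes = [
        (2, 1),  # two-rectangle, side by side
        (1, 2),  # two-rectangle, stacked
        (3, 1),  # three-rectangle, side by side
        (1, 3),  # three-rectangle, stacked
        (2, 2),  # four-rectangle, diagonal pairs
    ]
    total = 0
    for cells_w, cells_h in prototypes:
        for w in range(cells_w, window + 1, cells_w):
            for h in range(cells_h, window + 1, cells_h):
                # Number of top-left positions where a w x h feature fits.
                total += (window - w + 1) * (window - h + 1)
    return total


if __name__ == "__main__":
    print(count_rectangle_features(24))
```

Under these assumptions the count is 162,336, which is consistent with the rounded figure of 160,000 on the slide.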
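2.1 Integral Image • Concept: a simple analogy with calculus. If we need to compute $\int_a^b f(x)\,dx$ often, we first compute the antiderivative $F(x) = \int f(x)\,dx$, and then $\int_a^b f(x)\,dx = F(b) - F(a)$. The integral image works in the same way. (Figure: analogy between the integral image and integration)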
2.1 Integral Image • The integral image at location x, y contains the sum of the pixels above and to the left of x, y, inclusive: $ii(x, y) = \sum_{x' \le x,\ y' \le y} i(x', y')$
2.1 Integral Image - ii(x, y) is the integral image. - i(x, y) is the original image.
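2.1 Integral Image • The integral image can be computed in one pass over the original image using the pair of recurrences $s(x, y) = s(x, y-1) + i(x, y)$ and $ii(x, y) = ii(x-1, y) + s(x, y)$. • s(x, y) is the cumulative row sum. • s(x, -1) = 0, ii(-1, y) = 0 • Here s(x, y) is the sum of the original-image pixels at (x, y) and directly above it in the y direction, also described as the "column integral sum". • The integral image at point A(x, y) is the sum of all pixels in the rectangle above and to its left (the shaded region in the figure).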
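The one-pass computation can be sketched directly from the two recurrences and their boundary conditions. This is a minimal pure-Python version for illustration; representing the image as a list of rows indexed `img[y][x]` is an assumption of the example.

```python
def integral_image(img):
    """Return ii with ii[y][x] = sum of img over all x' <= x and y' <= y.

    Implements s(x, y) = s(x, y-1) + i(x, y) and
    ii(x, y) = ii(x-1, y) + s(x, y), with s(x, -1) = 0 and ii(-1, y) = 0.
    """
    h = len(img)
    w = len(img[0]) if h else 0
    s = [[0] * w for _ in range(h)]   # cumulative sum along the y direction
    ii = [[0] * w for _ in range(h)]  # integral image
    for y in range(h):
        for x in range(w):
            s_above = s[y - 1][x] if y > 0 else 0    # s(x, -1) = 0
            s[y][x] = s_above + img[y][x]
            ii_left = ii[y][x - 1] if x > 0 else 0   # ii(-1, y) = 0
            ii[y][x] = ii_left + s[y][x]
    return ii


if __name__ == "__main__":
    print(integral_image([[1, 2, 3],
                          [4, 5, 6]]))
```

The bottom-right entry always equals the sum of the whole image, which is a convenient sanity check.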
2.1 Integral Image • Using the integral image, any rectangular sum can be computed in four array references (see Fig. 3).
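The four-reference lookup can be sketched as below. To avoid special cases at the image border, this example pads the integral image with an extra zero row and column; that padding, the function names, and the sign convention of the two-rectangle feature (left minus right) are assumptions of the sketch.

```python
def padded_integral(img):
    """Integral image with a leading zero row and column.

    ii[y][x] holds the sum of img over rows < y and columns < x.
    """
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w x h rectangle with top-left pixel (x, y): four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Two horizontally adjacent w x h rectangles: left sum minus right sum."""
    return rect_sum(ii, x, y, w, h) - rect_sum(ii, x + w, y, w, h)


if __name__ == "__main__":
    img = [[1, 2, 3, 4],
           [5, 6, 7, 8],
           [9, 10, 11, 12]]
    ii = padded_integral(img)
    print(rect_sum(ii, 1, 1, 2, 2))          # pixels 6, 7, 10, 11
    print(two_rect_feature(ii, 0, 0, 2, 3))  # left two columns minus right two
```

The cost of `rect_sum` does not depend on the rectangle size, which is what makes evaluating features at every scale cheap.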
2.1 Integral Image - The integral image can be explained in terms of the "boxlets" work of Simard et al. Those authors point out that in the case of linear operations (e.g. f · g), any invertible linear operation can be applied to f or g if its inverse is applied to the result. Ex: in the case of convolution, if the derivative operator is applied both to the image and the kernel, the result must then be double integrated: $f * g = \iint (f' * g')$
2.1 Integral Image - The authors go on to show that convolution can be significantly accelerated if the derivatives of f and g are sparse (or can be made so). A similar insight is that an invertible linear operation can be applied to f if its inverse is applied to g: $(f'') * \left(\iint g\right) = f * g$
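2.1 Integral Image - Viewed in this framework, computation of the rectangle sum can be expressed as a dot product, i · r, where i is the image and r is the box car image (with value 1 within the rectangle of interest and 0 outside). This operation can be rewritten as $i \cdot r = \left(\iint i\right) \cdot r''$. The integral image is the double integral of the image, and the second derivative of the rectangle is nonzero only at its four corners, which is why four array accesses suffice.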
2.2 Feature Discussion • Although rectangle features are sensitive to the presence of edges, bars, and other simple image structure, they are quite coarse. • The only orientations available are vertical, horizontal and diagonal.
Learning Classification Function • AdaBoost concept: • AdaBoost is short for Adaptive Boosting. It is an iterative algorithm whose core idea is to train different classifiers (weak classifiers) on the same training set and then combine them into a stronger final classifier (the strong classifier). • In our system a variant of AdaBoost is used both to select the features and to train the classifier. In its original form, the AdaBoost learning algorithm is used to boost the classification performance of a simple learning algorithm.
Learning Classification Function • AdaBoost concept: • It does this by combining a collection of weak classification functions to form a stronger classifier. In the language of boosting the simple learning algorithm is called a weak learner. • "Weak learner": guessing at random on a yes/no question is correct 50% of the time. A hypothesis that improves the probability of a correct guess only slightly is a weak learner, and the process of obtaining it is called weak learning; conversely, a hypothesis that improves that probability substantially is called a strong learner.
Learning Classification Function • The learner is called weak because we do not expect even the best classification function to classify the training data well (i.e. for a given problem the best perceptron may only classify the training data correctly 51% of the time). • A weak classifier h(x, f, p, θ) thus consists of a feature (f), a threshold (θ) and a polarity (p) indicating the direction of the inequality: $h(x, f, p, \theta) = 1$ if $p f(x) < p \theta$, and $0$ otherwise. Here x is a 24 × 24 pixel sub-window of an image.
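The weak classifier is small enough to write out directly. In this sketch the feature `f` is any callable that maps a sub-window to a number; the toy "mean brightness" feature used in the demonstration is a stand-in chosen for illustration, not one of the rectangle features.

```python
def weak_classifier(x, f, p, theta):
    """h(x, f, p, theta): 1 if p * f(x) < p * theta, else 0.

    p = +1 predicts 1 for feature values below the threshold;
    p = -1 flips the inequality and predicts 1 for values above it.
    """
    return 1 if p * f(x) < p * theta else 0


if __name__ == "__main__":
    # Hypothetical feature: mean pixel value of a tiny 2 x 2 "sub-window".
    mean_brightness = lambda win: sum(map(sum, win)) / (len(win) * len(win[0]))
    dark = [[10, 20], [30, 40]]
    bright = [[200, 210], [220, 230]]
    print(weak_classifier(dark, mean_brightness, +1, 100))
    print(weak_classifier(bright, mean_brightness, +1, 100))
    print(weak_classifier(bright, mean_brightness, -1, 100))
```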
Learning Classification Function • Table 1. (See paper p.142, please.)
＊ Given example images $(x_1, y_1), \ldots, (x_n, y_n)$ where $y_i = 0, 1$ for negative and positive examples respectively.
＊ Initialize weights $w_{1,i} = \frac{1}{2m}, \frac{1}{2l}$ for $y_i = 0, 1$ respectively, where m and l are the number of negatives and positives respectively.
＊ For t = 1, ..., T:
1. Normalize the weights: $w_{t,i} \leftarrow \frac{w_{t,i}}{\sum_{j=1}^{n} w_{t,j}}$
2. Select the best weak classifier with respect to the weighted error: $\epsilon_t = \min_{f, p, \theta} \sum_i w_i \, |h(x_i, f, p, \theta) - y_i|$
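Learning Classification Function • Table 1. (See paper p.142, please.)
3. Define $h_t(x) = h(x, f_t, p_t, \theta_t)$ where $f_t$, $p_t$ and $\theta_t$ are the minimizers of $\epsilon_t$. (This selects the best weak classifier, the one with the smallest error rate $\epsilon_t$.)
4. Update the weights: $w_{t+1,i} = w_{t,i} \, \beta_t^{1 - e_i}$, with $\beta_t = \frac{\epsilon_t}{1 - \epsilon_t}$. (The weights are adjusted according to this best weak classifier: $e_i = 0$ means $x_i$ is classified correctly, and $e_i = 1$ means $x_i$ is classified incorrectly.)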
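Learning Classification Function • Table 1. (See paper p.142, please.)
＊ The final strong classifier is: $C(x) = 1$ if $\sum_{t=1}^{T} \alpha_t h_t(x) \ge \frac{1}{2} \sum_{t=1}^{T} \alpha_t$, and $0$ otherwise, where $\alpha_t = \log \frac{1}{\beta_t}$.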
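The steps of Table 1 can be sketched end to end on toy data. Several simplifications here are assumptions of the example rather than part of the paper: each example is a short list of precomputed feature values instead of an image, the best weak classifier is found by brute force over all thresholds, and a small floor on the error keeps β and α finite when a weak classifier is perfect on the training set.

```python
import math


def stump(x, f, p, theta):
    """Weak classifier on feature index f of the feature vector x."""
    return 1 if p * x[f] < p * theta else 0


def best_stump(X, y, w):
    """Brute-force search for the (error, f, p, theta) with least weighted error."""
    best = None
    for f in range(len(X[0])):
        values = {x[f] for x in X}
        for theta in sorted(values | {max(values) + 1}):
            for p in (1, -1):
                err = sum(wi * abs(stump(x, f, p, theta) - yi)
                          for x, yi, wi in zip(X, y, w))
                if best is None or err < best[0]:
                    best = (err, f, p, theta)
    return best


def adaboost(X, y, T):
    """Return a list of (alpha_t, f_t, p_t, theta_t), following Table 1."""
    m, l = y.count(0), y.count(1)  # assumes both classes are present
    w = [1 / (2 * m) if yi == 0 else 1 / (2 * l) for yi in y]
    strong = []
    for _ in range(T):
        total = sum(w)
        w = [wi / total for wi in w]                     # 1. normalize
        err, f, p, theta = best_stump(X, y, w)           # 2-3. select h_t
        err = max(err, 1e-10)                            # guard (assumption)
        beta = err / (1 - err)
        # 4. correctly classified examples (e_i = 0) are scaled by beta.
        w = [wi * (beta if stump(x, f, p, theta) == yi else 1.0)
             for x, yi, wi in zip(X, y, w)]
        strong.append((math.log(1 / beta), f, p, theta))
    return strong


def strong_classify(strong, x):
    """C(x) = 1 if sum(alpha_t * h_t(x)) >= half the sum of alpha_t."""
    score = sum(a * stump(x, f, p, theta) for a, f, p, theta in strong)
    return 1 if score >= 0.5 * sum(a for a, _, _, _ in strong) else 0


if __name__ == "__main__":
    X = [[5, 1], [3, 2], [4, 7], [1, 8]]  # toy feature vectors
    y = [1, 1, 0, 0]
    model = adaboost(X, y, T=2)
    print([strong_classify(model, x) for x in X])
```

In the toy data the second feature separates the classes, so the first round already picks a perfect weak classifier; with real rectangle features each round instead reweights the examples toward those the previous weak classifiers got wrong.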
3.1 Learning Discussion • The algorithm described in Table 1 is used to select key weak classifiers from the set of possible weak classifiers. • For a given set of N training examples, all thresholds that fall between the same pair of sorted examples are equivalent, so the total number of distinct thresholds per feature is N. • Since there is one weak classifier for each distinct feature/threshold combination, there are effectively KN weak classifiers, where K is the number of features and N is the number of examples. • Given a task with N = 20000 and K = 160000 there are 3.2 billion distinct binary weak classifiers.
3.1 Learning Discussion • The weak classifier selection algorithm proceeds as follows (training and selecting weak classifiers): 1. For each feature, the examples are sorted based on feature value. 2. The AdaBoost optimal threshold for that feature can then be computed in a single pass over this sorted list. 3. For each element in the sorted list, four sums are maintained and evaluated: - $T^+$: The total sum of positive example weights. $T^-$: The total sum of negative example weights. $S^+$: The sum of positive weights below the current example. $S^-$: The sum of negative weights below the current example.
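3.1 Learning Discussion • The weak classifier selection algorithm proceeds as follows (training and selecting weak classifiers): 3. For each element in the sorted list, four sums are maintained and evaluated. In face-detection terms: - $T^+$: the total weight of all face examples. $T^-$: the total weight of all non-face examples. $S^+$: the total weight of the face examples before this element. $S^-$: the total weight of the non-face examples before this element.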
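3.1 Learning Discussion • The weak classifier selection algorithm proceeds as follows: • The error for a threshold that splits the range between the current and previous example in the sorted list is $e = \min\left(S^+ + (T^- - S^-),\ S^- + (T^+ - S^+)\right)$, where $T^+$ and $T^-$ are the total positive and negative weights and $S^+$ and $S^-$ are the positive and negative weights below the current example. • Therefore, a single scan of the sorted list from beginning to end is enough to choose the threshold that minimizes the classification error (the optimal threshold) for the weak classifier, which amounts to selecting the best weak classifier. (Figure: algorithm for training and selecting the best classifier)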
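The single-pass threshold search for one feature can be sketched as follows. The function name and return format are assumptions of the example, and so is the handling of tied feature values (a threshold is only considered where the value changes, so that "below the current example" matches the strict inequality of the weak classifier).

```python
def best_threshold(values, labels, weights):
    """Return (error, theta, p) minimizing the weighted error for one feature.

    values[i] is the feature value of example i, labels[i] is 0 or 1,
    weights[i] is its current AdaBoost weight. The chosen weak classifier is
    h = 1 if p * value < p * theta else 0.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    t_pos = sum(w for w, y in zip(weights, labels) if y == 1)  # T+
    t_neg = sum(w for w, y in zip(weights, labels) if y == 0)  # T-
    s_pos = s_neg = 0.0                                        # S+, S-
    best_err, best_theta, best_p = float("inf"), None, None
    prev = None
    for i in order:
        v = values[i]
        if v != prev:
            # p = +1 labels everything below v positive: errors are the
            # negatives below plus the positives at or above.
            err_plus = s_neg + (t_pos - s_pos)
            # p = -1 labels everything below v negative.
            err_minus = s_pos + (t_neg - s_neg)
            err, p = min((err_plus, 1), (err_minus, -1))
            if err < best_err:
                best_err, best_theta, best_p = err, v, p
            prev = v
        if labels[i] == 1:
            s_pos += weights[i]
        else:
            s_neg += weights[i]
    return best_err, best_theta, best_p


if __name__ == "__main__":
    print(best_threshold([1, 2, 3, 4], [1, 1, 0, 0], [0.25] * 4))
```

After the sort, the scan touches each example once, so the cost per feature is dominated by sorting rather than by trying every threshold against every example.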
3.2 Learning Results • Initial experiments demonstrated that a classifier constructed from 200 features would yield reasonable results (see Fig. 4).
3.2 Learning Results • The first feature selected seems to focus on the property that the region of the eyes is often darker than the region of the nose and cheeks (see Fig. 5). • The second feature selected relies on the property that the eyes are darker than the bridge of the nose.
3.2 Learning Results • In summary the 200-feature classifier provides initial evidence that a boosted classifier constructed from rectangle features is an effective technique for face detection.
The Attentional Cascade • Cascade concept: - The idea of the cascade classifier is shown in Figure 6. The features are first divided among several classifiers. The earliest classifier is the least discriminating, but it can filter out a large share of the non-face images right away; the next classifier handles somewhat harder cases and rejects fewer images than the first; and so on until the last classifier. What remains at the end are the face images we are looking for.
The Attentional Cascade • This section describes an algorithm for constructing a cascade of classifiers which achieves increased detection performance while radically reducing computation time. • The key insight is that smaller, and therefore more efficient, boosted classifiers can be constructed which reject many of the negative sub-windows while detecting almost all positive instances.
The Attentional Cascade • A positive result from the first classifier triggers the evaluation of a second classifier which has also been adjusted to achieve very high detection rates. A positive result from the second classifier triggers a third classifier, and so on. A negative outcome at any point leads to the immediate rejection of the sub-window.
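The control flow of the cascade is a short early-exit loop. In this sketch each stage is any callable that returns True for "possibly a face"; the numeric stand-in stages in the demonstration are hypothetical and only show how many stages get evaluated before a rejection.

```python
def cascade_classify(stages, window):
    """Run a sub-window through the stages in order.

    Returns (accepted, stages_evaluated). The first negative outcome
    rejects the window immediately, so later stages are never run.
    """
    evaluated = 0
    for stage in stages:
        evaluated += 1
        if not stage(window):
            return False, evaluated
    return True, evaluated


if __name__ == "__main__":
    # Hypothetical stages operating on a number instead of an image window.
    stages = [lambda w: w > 0, lambda w: w > 5, lambda w: w % 2 == 0]
    for window in (-1, 3, 7, 8):
        print(window, cascade_classify(stages, window))
```

Most windows in a real image behave like the first case here and are discarded after one cheap stage, which is where the speed-up comes from.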
4.1 Training a Cascade of Classifiers • Past face detection systems have achieved good detection rates (between 85 and 95 percent) and extremely low false positive rates (on the order of $10^{-5}$ or $10^{-6}$). The number of cascade stages and the size of each stage must be sufficient to achieve similar detection performance while minimizing computation. • The false positive rate of a trained cascade is $F = \prod_{i=1}^{K} f_i$. • F: The false positive rate of the cascaded classifier. • K: The number of classifiers. • $f_i$: The false positive rate of the ith classifier on the examples that get through to it.
4.1 Training a Cascade of Classifiers • The detection rate of a trained cascade is the product of the per-stage detection rates: $D = \prod_{i=1}^{K} d_i$. • D: The detection rate of the cascaded classifier. • K: The number of classifiers. • $d_i$: The detection rate of the ith classifier on the examples that get through to it.
4.1 Training a Cascade of Classifiers • Purpose: • Given concrete goals for overall false positive and detection rates, target rates can be determined for each stage in the cascade process. • Ex: ※ A detection rate of 0.9 can be achieved by a 10-stage classifier if each stage has a detection rate of 0.99 (since $0.9 \approx 0.99^{10}$).
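The arithmetic behind per-stage targets is a product in one direction and a K-th root in the other. The helper names below are assumptions, and the per-stage false positive rate of 0.30 is an illustrative value chosen for the example.

```python
import math


def cascade_rate(stage_rates):
    """Overall rate of a cascade: the product of its per-stage rates."""
    return math.prod(stage_rates)


def per_stage_rate(overall_target, k):
    """Equal per-stage rate needed so that k stages reach the overall target."""
    return overall_target ** (1.0 / k)


if __name__ == "__main__":
    # Ten stages, each keeping 99% of the faces that reach it.
    print(round(cascade_rate([0.99] * 10), 4))
    # Per-stage detection rate needed for an overall rate of exactly 0.9.
    print(round(per_stage_rate(0.9, 10), 4))
    # Ten stages that each pass 30% of non-faces (illustrative value).
    print(cascade_rate([0.30] * 10))
```

The same product that barely lowers the detection rate drives the false positive rate down by several orders of magnitude, which is why each stage can afford to be a fairly weak filter.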
4.1 Training a Cascade of Classifiers • The key measure of each classifier is its “positive rate”, the proportion of windows which are labelled as potentially containing a face. • The expected number of features which are evaluated is: $N = n_0 + \sum_{i=1}^{K} \left( n_i \prod_{j<i} p_j \right)$ • N: The expected number of features evaluated. • K: The number of classifiers. • $p_i$: The positive rate of the ith classifier. • $n_i$: The number of features in the ith classifier.
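The expected cost per window can be sketched by accumulating, stage by stage, the fraction of windows that get that far. The stage sizes and positive rates in the demonstration are made-up values, and the stages are indexed from zero in the code, so the sum below covers every stage including the first.

```python
def expected_features(n, p):
    """Expected number of features evaluated per sub-window.

    n[i] is the number of features in stage i and p[i] its positive rate.
    Stage i is only evaluated for windows that passed all earlier stages,
    which happens with probability p[0] * p[1] * ... * p[i-1].
    """
    total = 0.0
    reach = 1.0  # fraction of windows reaching the current stage
    for n_i, p_i in zip(n, p):
        total += n_i * reach
        reach *= p_i
    return total


if __name__ == "__main__":
    # Hypothetical three-stage cascade: small first stage, larger later ones.
    print(expected_features([2, 10, 50], [0.5, 0.2, 0.1]))
```

With these made-up numbers the average cost is about 12 features per window even though the cascade contains 62 features in total.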
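4.1 Training a Cascade of Classifiers • Table 2. (See paper p.146, please.)
＊ User selects values for f, the maximum acceptable false positive rate per layer, and d, the minimum acceptable detection rate per layer.
＊ User selects target overall false positive rate, $F_{target}$.
＊ P = set of positive examples; N = set of negative examples
＊ $F_0 = 1.0$; $D_0 = 1.0$; i = 0
＊ while $F_i > F_{target}$:
- $i \leftarrow i + 1$
- $n_i = 0$; $F_i = F_{i-1}$
- while $F_i > f \times F_{i-1}$:
  1. $n_i \leftarrow n_i + 1$
  2. Use P and N to train a classifier with $n_i$ features using AdaBoost.
  3. Evaluate the current cascaded classifier on a validation set to determine $F_i$ and $D_i$.
  4. Decrease the threshold for the ith classifier until the current cascaded classifier has a detection rate of at least $d \times D_{i-1}$ (this also affects $F_i$).
- $N \leftarrow \emptyset$
- If $F_i > F_{target}$ then evaluate the current cascaded detector on the set of non-face images and put any false detections into the set N.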
4.2 Simple Experiment • In order to explore the feasibility of the cascade approach two simple detectors were trained: • A monolithic 200-feature classifier (the ensemble concept). • A cascade of ten 20-feature classifiers. ＊ The first stage: - The first stage classifier in the cascade was trained using 5000 faces and 10000 non-face sub-windows randomly chosen from non-face images. ＊ The second stage: - The second stage classifier was trained on the same 5000 faces plus 5000 false positives of the first classifier.
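4.2 Simple Experiment • (Diagram) Type 1: all sub-windows go through a single monolithic classifier, which outputs T or F. Type 2: all sub-windows go through cascade stages 1, 2, 3, ..., 10; a T outcome passes the sub-window to the next stage and an F outcome at any stage rejects it.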
4.2 Simple Experiment • Outcome of the two experiments: • Type 1: The monolithic 200-feature classifier was trained on the union of all examples used to train all the stages of the cascaded classifier. Note that without reference to the cascaded classifier, it might be difficult to select a set of non-face training examples to train the monolithic classifier. • Type 2: The sequential way in which the cascaded classifier is trained effectively reduces the non-face training set by throwing out easy examples and focusing on the “hard” ones.
4.2 Simple Experiment • Figure 7. ROC curves comparing a 200-feature classifier with a cascaded classifier containing ten 20-feature classifiers. Accuracy is not significantly different, but the cascaded classifier is almost 10 times faster.
Results • Preface: - This section describes the final face detection system. The discussion includes details on the structure and training of the cascaded detector as well as results on a large real-world testing set.
Training Dataset • The face training set consisted of 4916 hand labeled faces scaled and aligned to a base resolution of 24 by 24 pixels. • This bounding box was then enlarged by 50% and then cropped and scaled to 24 by 24 pixels.