
Face detection with boosted Gaussian features



  1. Face detection with boosted Gaussian features (Pattern Recognition, Feb. 2007). Presented by 井民全

  2. Outline • Introduction • A brief overview of AdaBoost • The VC-Dimension concept • The features • Anisotropic Gaussian filters • Gaussian vs. Haar-like • Experiments and results

  3. Introduction • Automatic face detection is a key step in any face processing system • It is far from a trivial task • faces are highly deformable objects • lighting conditions and poses vary widely • holistic methods • consider the face as a single global object • feature-based methods • detect parts of the face and assemble them to make the final decision

  4. Introduction • The classical approach for face detection • Step 1: scan the input image with a sliding window • Step 2: for each position, classify the window as either face or non-face • The efficient exploration of the search space is a key ingredient for obtaining a fast face detector • Skin color, a coarse-to-fine approach, etc. (a minimal scan sketch follows below)
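
As a rough illustration of this scan (not the paper's code), the sketch below slides a fixed 24x24 window over a grayscale image; classify_window stands in for any face/non-face classifier, and the step size is an arbitrary choice. In practice the window (or the image) is also scanned at multiple scales.

```python
# Hypothetical sliding-window scan; `classify_window` is a placeholder classifier.
import numpy as np

def detect_faces(image, classify_window, window=24, step=4):
    """Return (x, y, w, h) boxes for every window position classified as a face."""
    detections = []
    height, width = image.shape[:2]
    for y in range(0, height - window + 1, step):
        for x in range(0, width - window + 1, step):
            patch = image[y:y + window, x:x + window]
            if classify_window(patch) == 1:      # 1 = face, 0 = non-face
                detections.append((x, y, window, window))
    return detections
```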

  5. Introduction • A fast algorithm was proposed by Viola and Jones • Three main ideas • first, train a strong classifier from weak classifiers based on Haar-like features • use the so-called integral image as the image representation → features are computed very efficiently • use a cascade classification structure → speed

  6. A brief overview of AdaBoost • A strong classifier is built by combining many weak classifiers h1, h2, h3, h4, …, hT-1, hT, each operating on a 24×24 sub-window

  7. Cascaded Classifiers Structure • Stage 1 → Stage 2 → Stage 3 → … • Each stage is an AdaBoost learner (feature selection & classifier) trained on the feature set • A window that passes a stage is sent to the next stage; a window rejected (False) at any stage is discarded immediately • Each stage is tuned for ~100% detection rate at ~50% false positives, rejecting as many negatives as possible while keeping false negatives minimal (a cascade sketch follows below)
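
A minimal sketch of the cascade's control flow, assuming each stage exposes a scoring function and a threshold (the interface names are illustrative, not the paper's):

```python
# Hypothetical cascade: a window must pass every stage to be accepted as a face.
def cascade_classify(patch, stages):
    """stages: list of (strong_classifier, threshold) pairs, cheapest first."""
    for strong_classifier, threshold in stages:
        if strong_classifier(patch) < threshold:
            return 0        # rejected early; most non-face windows stop here
    return 1                # survived every stage: declared a face
```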

  8. Haar-like features • Two-Rectangle Feature: the difference between the sums of pixels within two rectangular regions; the regions have the same size and shape and are horizontally or vertically adjacent • Three-Rectangle Feature: the sum within two outside rectangles subtracted from the sum in a center rectangle • Four-Rectangle Feature: the difference between the diagonal pairs of rectangles • The base resolution is 24×24 • The exhaustive set of rectangle features is large: over 180,000 (an integral-image sketch follows below)
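
A small sketch of how rectangle sums, and hence Haar-like features, are computed in constant time with the integral image mentioned in slide 5; the two-rectangle layout below is just one illustrative example.

```python
import numpy as np

def integral_image(img):
    # ii[y, x] = sum of img[0:y, 0:x]; zero-padded so every rectangle needs 4 lookups
    return np.pad(img.astype(np.int64), ((1, 0), (1, 0))).cumsum(0).cumsum(1)

def rect_sum(ii, x, y, w, h):
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def two_rect_feature(ii, x, y, w, h):
    # two horizontally adjacent w-by-h rectangles: left sum minus right sum
    return rect_sum(ii, x, y, w, h) - rect_sum(ii, x + w, y, w, h)
```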

  9. The feature values (an example) • Over 180,000 rectangle features are associated with each 24×24 sub-image

  10. The training process for a weak learner • Let's see an example

  11. The training process for a weak classifier (an example) • Each example xi has a label yi and a feature vector {f1(xi), f2(xi), …, fj(xi), …, f180,000(xi)} • x1 = {10, 23, …, 5, …}: y1 = 1 • x2 = {7, 20, …, 25, …}: y2 = 1 • x3 = {15, 21, …, 100, …}: y3 = 0 • x4 = {15, 21, …, 20, …}: y4 = 0 • A weak classifier thresholds a single feature, e.g. h1(xi) = 1 if fj(xi) < 30, 0 otherwise • Search for the feature whose training error is minimal! (a decision-stump sketch follows below)
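
A sketch of this weak-learner search, assuming the feature values are precomputed into a matrix; polarity handling and efficient threshold sweeping are omitted for brevity.

```python
import numpy as np

def train_weak_classifier(features, labels, weights, thresholds):
    """features: (n_examples, n_features); returns the feature/threshold pair
    whose weighted training error is minimal."""
    best_j, best_theta, best_err = None, None, np.inf
    for j in range(features.shape[1]):
        for theta in thresholds:
            preds = (features[:, j] < theta).astype(int)   # h(x) = 1 if f_j(x) < theta
            err = np.sum(weights * (preds != labels))
            if err < best_err:
                best_j, best_theta, best_err = j, theta, err
    return best_j, best_theta, best_err
```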

  12. The 1st iteration • h1(xi) = 1 if fj(xi) < 30, 0 otherwise • x1 = {10, 23, …, 5, …}: y1 = 1, h1(x1) = 1 • x2 = {7, 20, …, 25, …}: y2 = 1, h1(x2) = 1 • x3 = {15, 21, …, 100, …}: y3 = 0, h1(x3) = 0 • x4 = {7, 23, …, 20, …}: y4 = 0, h1(x4) = 1 (false positive)

  13. h1(xi) = 1 if fj(xi) < 30, 0 otherwise • One non-face example falls below the threshold and is classified as a face: a false positive

  14. The training error for h1 • x1 = {10, 23, …, 5, …}: y1 = 1, h1(x1) = 1, weight 1/4, error 0 • x2 = {7, 20, …, 25, …}: y2 = 1, h1(x2) = 1, weight 1/4, error 0 • x3 = {15, 21, …, 100, …}: y3 = 0, h1(x3) = 0, weight 1/4, error 0 • x4 = {7, 23, …, 20, …} (false positive): y4 = 0, h1(x4) = 1, weight 1/4, error 1/4 • Total training error = 0 + 0 + 0 + 1/4 = 1/4 for h1

  15. Update the weights (1/2) • Redistribute the weights so that the examples misclassified in this round carry more of the cost in the next round: distribute the contribution!

  16. Update the weights (2/2) • x1: y1 = 1, h1(x1) = 1, error 0, new weight 1/4·β with β < 1 (decreases) • x2: y2 = 1, h1(x2) = 1, error 0, new weight 1/4·β (decreases) • x3: y3 = 0, h1(x3) = 0, error 0, new weight 1/4·β (decreases) • x4 (false positive): y4 = 0, h1(x4) = 1, error 1/4, weight stays 1/4 (unchanged) • Training error = 1/4 for h1

  17. Normalize the weights • Divide each weight by the sum of all weights so that they sum to 1 (N = # of examples)

  18. Normalized weights • x1 = {10, 23, …, 5, …}: y1 = 1, h1(x1) = 1, weight 0.166 • x2 = {7, 20, …, 25, …}: y2 = 1, h1(x2) = 1, weight 0.166 • x3 = {15, 21, …, 100, …}: y3 = 0, h1(x3) = 0, weight 0.166 • x4 = {7, 23, …, 20, …} (false positive): y4 = 0, h1(x4) = 1, weight 0.5 • The example that was just misclassified now has a larger weight: 1/4 → 0.5 (a numeric check follows below)
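
The numbers on slides 14 to 18 are reproduced by the usual Viola-Jones weight update, w ← w·β for correctly classified examples with β = ε/(1 − ε); that exact rule is an assumption here, but it matches the 0.166 / 0.5 weights above:

```python
import numpy as np

labels  = np.array([1, 1, 0, 0])          # y_i
preds   = np.array([1, 1, 0, 1])          # h1(x_i); the last example is the false positive
weights = np.full(4, 1 / 4)               # initial weights

eps = np.sum(weights * (preds != labels)) # weighted training error = 1/4
beta = eps / (1 - eps)                    # = 1/3
weights *= np.where(preds == labels, beta, 1.0)   # correct examples shrink, mistakes keep their weight
weights /= weights.sum()                  # normalize
print(np.round(weights, 3))               # [0.167 0.167 0.167 0.5]
```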

  19. Analysis • The overall training error when classifying with feature j • We choose the classifier by selecting the feature whose overall classification error is minimal • The examples misclassified in the previous round now carry a higher error cost, so the training process tends not to misclassify them again in this round • Each example's weight is its misclassification cost

  20. Cascaded Classifiers Structure • h1 → h2: the false positives of h1 (non-face windows that slipped through) are passed on and handled by the next classifier h2

  21. The Boost algorithm for classifier learning • Step 1: Given example images (xi, yi), with yi = 1 for positive and yi = 0 for negative examples • Step 2: Initialize the weights • For t = 1, …, T: 1. Normalize the weights; 2. For each feature j, train a classifier hj restricted to using that single feature (the weak learner constructor) and select the one with the minimal weighted error; 3. Update the weights

  22. The final strong classifier • The weak classifiers vote; if more than half of the total weighted vote approves, the window passes as a face • The weight of each weak classifier's vote is determined by its accuracy (a sketch follows below)
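
A sketch of that weighted vote, assuming the standard AdaBoost weighting α_t = log(1/β_t) so that more accurate weak classifiers carry more of the vote:

```python
import numpy as np

def strong_classify(patch, weak_classifiers, betas):
    """weak_classifiers: callables h_t(patch) -> 0/1; betas: their beta_t values."""
    alphas = np.log(1.0 / np.asarray(betas))            # vote weight from accuracy
    votes = np.array([h(patch) for h in weak_classifiers])
    return int(alphas @ votes >= 0.5 * alphas.sum())    # pass if over half the weighted vote
```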

  23. Introduction • A brief overview of AdaBoost • The VC-Dimension concept • The features • Anisotropic Gaussian filters • Gaussian vs. Haar-like • Experiments and results

  24. The VC-Dimension concept • A learning machine f takes an input x and transforms it, somehow using weights α (some vector of adjustable parameters), into a predicted output • In some papers the definition is written y = f(x, α)

  25.–27. Examples: figures of example learning machines f

  28. How do we characterize “power”? • Different machines have different amounts of “power” • Tradeoff between: • More power: Can model more complex classifiers but might overfit • Less power: Not going to overfit, but restricted in what it can model • How do we characterize the amount of power?

  29. Some definitions • Given some machine f • And under the assumption that all training points (xk, yk) were drawn i.i.d. from some distribution • And under the assumption that future test points will be drawn from the same distribution • i.i.d. → independent and identically distributed

  30. Definitions • TESTERR = probability of misclassification on future data • TRAINERR = fraction of the training set misclassified • R = # of training examples

  31. Vapnik-Chervonenkis dimension • Given some machine f, let h be its VC dimension • h and the number of training examples R are known quantities • Vapnik showed a bound, holding with probability 1 − η, that relates TESTERR to TRAINERR, h and R (a reconstruction of the bound follows below)
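
The bound itself appears only as an image on the original slide; the standard Vapnik bound it presumably shows, in the notation of slide 30, is that with probability 1 − η:

```latex
\mathrm{TESTERR}(\alpha) \;\le\; \mathrm{TRAINERR}(\alpha)
  + \sqrt{\frac{h\left(\ln\frac{2R}{h} + 1\right) - \ln\frac{\eta}{4}}{R}}
```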

  32. • Both TRAINERR and h are known, so this gives us a way to estimate the error on future data based only on the training error and the VC-dimension of f

  33. But given a machine f, how do we define and compute h, the VC-dimension of f?

  34. Shattering • Machine f can shatter a set of points x1, x2, …, xr if and only if… • For every possible training set of the form (x1, y1), (x2, y2), …, (xr, yr) with yk ∈ {−1, +1} • There exists some value of α that gets zero training error

  35. Question • Can the following f shatter the following points?

  36. Answer: No problem • There are four training sets (labelings) to consider • a horizontal line separates them (ok) • a diagonal line separates them (ok) • the same diagonal line with signs flipped (ok) • the same horizontal line with signs flipped (ok)

  37. Question • Can the following f shatter the following points?

  38. Answer: No way, my friend • One class outside the circle, one class inside (ok) • One class outside the circle, one class inside (ok) • The remaining labeling is a conflict: no parameter change can fix it, because in f(x, b) we can only adjust b, not x

  39. Definition of VC dimension • Given machine f, the VC-dimension h is the maximum number of points that can be arranged so that f shatters them • In other words, the largest number of examples that the machine can classify with zero error under every possible labeling • What's the VC dimension of this machine? Ans: 1

  40. VC dim of line machine • For 2-d inputs, what’s VC-dim of f(x,w,b) = sign(w.x+b)? • Well, can we find four points that f can shatter? …

  41. The larger a machine's VC-dimension, the greater its power.

  42. Structural Risk Minimization • considers a sequence of hypothesis spaces of increasing complexity • For example, polynomials of increasing degree.

  43. Structural Risk Minimization • We're trying to decide which machine to use • We train each machine and make a table, ordered from the simplest machine to the most complex…

  44. Analysis • Vapnik-Chervonenkis theory tells us that the test error of any machine is related to its VC-dimension (i.e. the machine's complexity) • For the same data set, the more complex the machine, the more it overfits the training examples

  45. Generalization error for AdaBoost, as proposed by Freund • d = VC-dimension of the weak classifier space, TRAINERR = training error (a reconstructed form of the bound follows below)
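
The formula is an image on the slide; the usual Freund and Schapire generalization bound it presumably shows, for T boosting rounds, weak classifiers of VC-dimension d, and R training examples, is:

```latex
\Pr\left[H(x) \neq y\right] \;\le\; \mathrm{TRAINERR}
  + \tilde{O}\!\left(\sqrt{\frac{T\,d}{R}}\right)
```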

  46. For example • The AdaBoost cascade proposed by [1] uses a total of 6,061 features across all layers • AdaBoost has an important drawback • It tends to overfit training examples

  47. Introduction • A brief overview of AdaBoost • The VC-Dimension concept • The features • Anisotropic Gaussian filters • Gaussian vs. Haar-like • Experiments and results

  48. The proposed new features: anisotropic Gaussian filters • The generating function • They efficiently capture contour singularities with a smooth low-resolution function

  49. The transformations • Translation • Rotation • Bending by r

  50. Anisotropic scaling • By combining these four basic transformations, a whole dictionary of filters can be derived from the single generating function (a sketch follows below)
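
A rough sketch of how one filter could be produced by composing the four transformations on a generating function; the generating function (a first derivative of a Gaussian) and the bending parameterization below are assumptions for illustration only, not the paper's exact definitions.

```python
import numpy as np

def generating_function(u, v):
    # assumed form: first derivative of a Gaussian along u
    return u * np.exp(-(u ** 2 + v ** 2))

def make_filter(size=24, tx=0.0, ty=0.0, theta=0.0, r=np.inf, sx=1.0, sy=1.0):
    ys, xs = np.mgrid[0:size, 0:size]
    u = xs - size / 2 - tx                        # translation
    v = ys - size / 2 - ty
    c, s = np.cos(theta), np.sin(theta)           # rotation
    u, v = c * u + s * v, -s * u + c * v
    if np.isfinite(r):
        v = v + u ** 2 / (2 * r)                  # bending (parabolic approximation, assumed)
    return generating_function(u / sx, v / sy)    # anisotropic scaling
```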
