Dense Object Recognition
2. Template Matching
Face Detection
We will investigate face detection using a scanning window technique:
Think that this task sounds easy?
Non-Faces
Faces
800 random non-face regions, 60×60, taken from the same data as the faces
800 face images, 60×60, taken from an online dating website
Concatenate the face pixels into a “vector”, x = [x1, x2, x3, …, xN]ᵀ.
All that remains is to specify the form of the likelihood terms Pr(x|face) and Pr(x|non-face).
Norm_x[μ, Σ] denotes an n-dimensional Gaussian (normal) distribution in the variable x with mean μ and symmetric positive-definite covariance matrix Σ, which comes in three flavours: full, diagonal, and uniform (a single shared variance).
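As a hedged sketch (toy dimensionality, synthetic numbers, not the lecture's code), the three covariance flavours can be built and evaluated like this:

```python
import numpy as np

def gauss_pdf(x, m, S):
    """Density of an n-dimensional Gaussian Norm_x[m, S]."""
    n = len(m)
    d = x - m
    return np.exp(-0.5 * d @ np.linalg.solve(S, d)) / np.sqrt(
        (2.0 * np.pi) ** n * np.linalg.det(S))

rng = np.random.default_rng(0)
n = 4                                    # toy dimensionality (10,800 for the 60x60 patches)
x = rng.normal(size=n)
m = np.zeros(n)

A = rng.normal(size=(n, n))
S_full = A @ A.T + n * np.eye(n)         # full: any symmetric positive-definite matrix
S_diag = np.diag(np.diag(S_full))        # diagonal: independent per-dimension variances
S_unif = S_full.trace() / n * np.eye(n)  # uniform: one shared variance

p_full = gauss_pdf(x, m, S_full)
p_diag = gauss_pdf(x, m, S_diag)
p_unif = gauss_pdf(x, m, S_unif)
```

Note the diagonal flavour factorizes into a product of one-dimensional Gaussians, which is why it is so cheap to fit per pixel.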
Fit model using maximum likelihood criterion
[Figure: fitted mean templates μ_face and μ_non-face, illustrated with a 2-D (Pixel 1 vs Pixel 2) Gaussian fit; the face ‘template’ is the mean face.]
Results based on 200 cropped faces and 200 non-faces from the same database.
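A minimal sketch of this classifier, on synthetic stand-ins for the training patches (the data, dimensionality, and function names here are illustrative, not the lecture's): fit one diagonal Gaussian to faces and one to non-faces by maximum likelihood, then classify with the log-likelihood ratio.

```python
import numpy as np

rng = np.random.default_rng(1)
D = 16                                            # stand-in for the 10,800 pixels
faces    = rng.normal(0.6, 0.1, size=(200, D))    # synthetic "face" vectors
nonfaces = rng.normal(0.4, 0.2, size=(200, D))    # synthetic "non-face" vectors

def fit_diag(X):
    # ML estimates for a diagonal Gaussian: per-dimension mean and variance
    return X.mean(axis=0), X.var(axis=0)

def diag_loglik(x, m, v):
    return -0.5 * np.sum(np.log(2 * np.pi * v) + (x - m) ** 2 / v)

m_f, v_f = fit_diag(faces)
m_n, v_n = fit_diag(nonfaces)

def log_like_ratio(x):
    # > 0 favours "face", < 0 favours "non-face"
    return diag_loglik(x, m_f, v_f) - diag_loglik(x, m_n, v_n)
```

Thresholding `log_like_ratio` at different values traces out the ROC curves shown later.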
How does this work with a real image?
[Figure: ROC curve, Pr(Hit) vs Pr(False Alarm).]
[Figure: detection on a real image at three scales; maxima in the log-likelihood ratio before and after thresholding; original image, superimposed log-likelihood ratio, positions of maxima, and detected faces.]
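The scan-then-threshold step can be sketched as follows. This is a hedged illustration, not the lecture's code: `window_score` is a stand-in for the trained log-likelihood ratio, and the image, window size, and threshold are toy choices.

```python
import numpy as np

rng = np.random.default_rng(2)
image = rng.random((32, 32))            # stand-in image
W = 8                                   # window side (60 in the lecture)

def window_score(patch):
    # placeholder for the log-likelihood ratio of a trained model
    return patch.mean()

H, Wd = image.shape
scores = np.full((H - W + 1, Wd - W + 1), -np.inf)
for r in range(H - W + 1):              # slide the window over every position
    for c in range(Wd - W + 1):
        scores[r, c] = window_score(image[r:r + W, c:c + W])

# keep local maxima of the score map that exceed a threshold
thresh = scores.mean()
detections = []
for r in range(1, scores.shape[0] - 1):
    for c in range(1, scores.shape[1] - 1):
        if scores[r, c] > thresh and scores[r, c] == scores[r-1:r+2, c-1:c+2].max():
            detections.append((r, c))
```

Running this at several image scales, as in the figure, handles faces of different sizes.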
Fit model using maximum likelihood criterion
[Figure: fitted means μ_face and μ_non-face, illustrated in 2-D (Pixel 1 vs Pixel 2).]
Results based on 200 cropped faces and 200 non-faces from the same database.
The more sophisticated model unsurprisingly classifies new faces and non-faces better.
[Figure: ROC curves, Pr(Hit) vs Pr(False Alarm), for the diagonal and uniform covariance models.]
Fit model using maximum likelihood criterion
PROBLEM: we cannot fit this model. We don’t have enough data to estimate the full covariance matrix.
N = 800 training images
D = 10,800 dimensions
Total number of measured numbers = ND = 800 × 10,800 = 8,640,000
Total number of parameters in the covariance matrix = D(D+1)/2 = 10,800 × 10,801 / 2 = 58,325,400
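The counting argument can be checked directly; this is just the arithmetic above:

```python
# With N = 800 training images of D = 10,800 dimensions, the full covariance
# has far more free parameters than there are measured numbers.
N, D = 800, 10_800
measured = N * D                 # total measured numbers
full_cov = D * (D + 1) // 2      # free parameters in a symmetric D x D covariance
diag_cov = D                     # diagonal model: one variance per dimension
```

The diagonal model, by contrast, needs only D variances, which is why it remains fittable.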
We could induce some covariance by using a mixture of Gaussians model in which each component is uniform or diagonal. For a small number of mixture components, the number of parameters is not too bad.
For diagonal Gaussians, there are 2D + 1 unknowns per component (D parameters for the mean, D for the diagonal covariance, and 1 for the weight of the Gaussian), i.e. K(2D + 1) for K components.
3. Mixtures of Templates
Key idea: represent the probability as a weighted sum (mixture) of Gaussian distributions. The weights must sum to 1, or the result is not a pdf.
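The weighted-sum idea in one dimension, as a small sketch (toy weights, means, and widths — not the lecture's model), with a numerical check that the result is still a pdf:

```python
import numpy as np

def norm_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

weights = np.array([0.3, 0.5, 0.2])     # must sum to 1
means   = np.array([-2.0, 0.0, 3.0])
sigmas  = np.array([0.5, 1.0, 0.8])

def mixture_pdf(x):
    # Pr(x) = sum_k weight_k * Norm_x[mean_k, sigma_k^2]
    return sum(w * norm_pdf(x, m, s) for w, m, s in zip(weights, means, sigmas))

# numerically integrate to confirm it is (approximately) a valid pdf
xs = np.linspace(-10.0, 10.0, 20001)
area = np.sum(mixture_pdf(xs)) * (xs[1] - xs[0])
```

Because each component integrates to 1, the mixture integrates to the sum of the weights; this is exactly why the weights must sum to 1.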
Try to think about the same problem in a different way...
Marginalize over h: Pr(x) = Σ_h Pr(x, h) = Σ_h Pr(x|h) Pr(h)
THING TO NOTICE #1:
If we knew the hidden variables hi for the training data, it would be very easy to estimate the parameters θ: just estimate the individual Gaussians separately.
THING TO NOTICE #2:
If we knew the parameters θ, it would be very easy to estimate the posterior distribution over each hidden variable hi using Bayes’ rule:
Pr(h|x) = Pr(x|h) Pr(h) / Σ_h Pr(x|h) Pr(h)
[Figure: component likelihoods Pr(x|h=1), Pr(x|h=2), Pr(x|h=3), and the posterior Pr(h|x) over the components h = 1, 2, 3.]
Solution: Expectation Maximization (EM) algorithm (Dempster, Laird and Rubin 1977)
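A minimal EM sketch in the spirit of the algorithm, for a two-component one-dimensional mixture on toy data (this is an illustration, not the lecture's implementation): the E-step computes responsibilities by Bayes' rule as in "thing to notice #2", and the M-step re-fits each Gaussian from its weighted data as in "thing to notice #1".

```python
import numpy as np

rng = np.random.default_rng(3)
# toy 1-D data drawn from two clusters
data = np.concatenate([rng.normal(-2.0, 0.5, 300), rng.normal(2.0, 0.8, 200)])

K = 2
w  = np.full(K, 1.0 / K)           # mixing weights (priors)
mu = np.array([-1.0, 1.0])         # initial means
sd = np.ones(K)                    # initial standard deviations

def norm_pdf(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

for _ in range(50):
    # E-step: posterior responsibility of each component for each point (Bayes' rule)
    lik = np.stack([w[k] * norm_pdf(data, mu[k], sd[k]) for k in range(K)])
    r = lik / lik.sum(axis=0)
    # M-step: re-estimate each Gaussian from its responsibility-weighted data
    Nk = r.sum(axis=1)
    w  = Nk / len(data)
    mu = (r * data).sum(axis=1) / Nk
    sd = np.sqrt((r * (data - mu[:, None]) ** 2).sum(axis=1) / Nk)
```

Alternating the two steps increases the data log likelihood at every iteration, which is the key guarantee of EM.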
Component priors: 0.4999 and 0.5001.
The face model and nonface model have divided the data into two clusters. In each case, these clusters have roughly equal weights.
The primary thing that these seem to have captured is the photometric (luminance) variation.
Note that the standard deviations have become smaller than for the single Gaussian model as any given data point is likely to be close to one mean or the other.
[Figure: face model parameters: each component’s prior, mean, and standard-deviation images.]
[Figure: non-face model parameters, priors 0.5325 and 0.4675: each component’s mean and standard-deviation images.]
Performance improves relative to a single Gaussian model, although it is not dramatic.
We have a better description of the data likelihood.
[Figure: ROC curves, Pr(Hit) vs Pr(False Alarm), comparing MOG 2 with the diagonal and uniform models.]
[Figure: face model parameters for a five-component mixture (priors 0.0988, 0.1925, 0.2062, 0.2275, 0.1575) and non-face model parameters (priors 0.1737, 0.2250, 0.1950, 0.2200, 0.1863): each component’s mean and standard-deviation images.]
[Figure: ten-component mixture priors — face model: 0.0075, 0.1425, 0.1437, 0.0988, 0.1038, 0.1187, 0.1638, 0.1175, 0.1038, 0.0000; non-face model: 0.1137, 0.0688, 0.0763, 0.0800, 0.1338, 0.1063, 0.1063, 0.1263, 0.0900, 0.0988.]
Results for the MOG 2 model.
Performance improves slightly more, particularly at low false alarm rates.
What if we move to an infinite number of Gaussians?
[Figure: ROC curves, Pr(Hit) vs Pr(False Alarm), comparing MOG 10, MOG 2, diagonal, and uniform models.]
4. Subspace models: factor analysis
Consider putting the means of the Gaussian mixture components all on a line and forcing their diagonal covariances to be identical.
What happens if we keep adding more and more Gaussians along this line? In the limit, the hidden variable becomes continuous.
[Figure: mixture components placed at h = 0, ±1, ±2, … along a line in (Pixel 1, Pixel 2) space; marginalizing over h gives the density. As more components are added, the discrete hidden variable becomes continuous.]
Now consider weighting the constituent Gaussians...
If the weights decrease with distance from a central point, we can get something like an oriented Gaussian.
[Figure: Gaussian components placed along a direction φ from the mean μ at h = 0, ±1, ±2, …, with weights decreasing away from the centre; as the hidden variable becomes continuous, marginalizing over h produces an oriented Gaussian in (Pixel 1, Pixel 2) space.]
Weight components by another Gaussian distribution with mean 0 and variance 1
For a general subspace of dh dimensions in a larger space of size dx, each data point is generated by a deterministic transformation plus additive noise:
x = μ + Φh + ε
where h is the dh-dimensional hidden variable, Φ is a dx × dh factor matrix, and ε is additive Gaussian noise.
[Figure: hidden dimensions mapped into the observed dimensions; x1 = μ + Φh1 + ε, and similarly for x2, x3, ….]
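A hedged sketch of sampling from this generative model (toy dimensions and synthetic parameters, not the lecture's values); the marginal covariance of the samples should approach Φ Φᵀ + Σ:

```python
import numpy as np

rng = np.random.default_rng(4)
dx, dh = 6, 2                       # observed and hidden dimensionality (toy)
mu   = rng.normal(size=dx)
Phi  = rng.normal(size=(dx, dh))    # factor matrix: subspace directions
sig2 = 0.01 * np.ones(dx)           # diagonal noise variances (Sigma)

def sample(n):
    h   = rng.normal(size=(n, dh))                  # h ~ Norm[0, I]
    eps = rng.normal(size=(n, dx)) * np.sqrt(sig2)  # eps ~ Norm[0, Sigma]
    return h @ Phi.T + mu + eps                     # x = mu + Phi h + eps

X = sample(100_000)
emp_cov   = np.cov(X, rowvar=False)                 # empirical covariance
model_cov = Phi @ Phi.T + np.diag(sig2)             # Phi Phi^T + Sigma
```

This makes the "induced covariance" concrete: the dh latent directions spread the noise ellipse along the columns of Φ.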
A Generative View
Pr(h) = Norm_h[0, I],  Pr(x|h) = Norm_x[μ + Φh, Σ]
Equivalent Description — Joint Distribution (marginalize over h to get Pr(x)):
Pr(x) = Norm_x[μ, ΦΦᵀ + Σ]
For a general subspace of dh dimensions in a larger space of size dx, this gives a total of dx(dh + 1) parameters.
If dh is reasonably small and dx is large, this is much less than the full covariance, which has dx(dx + 1)/2.
It is reasonable to assume that an ensemble of images (like faces) genuinely lies largely within a subspace of the very high-dimensional image space, so this is not a bad model.
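The saving is easy to quantify; here with the lecture's dx = 10,800 and an illustrative dh = 10:

```python
# Subspace model: dx x dh factor matrix plus dx diagonal noise terms,
# i.e. dx * (dh + 1) parameters, versus dx * (dx + 1) / 2 for a full covariance.
dx, dh = 10_800, 10
subspace = dx * (dh + 1)
full_cov = dx * (dx + 1) // 2
ratio = full_cov / subspace
```

With these numbers the full covariance needs several hundred times more parameters than the subspace model.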
Interlude: Gaussian and Matrix Identities
The multivariate generalization of the 1-D Gaussian (normal) distribution depends on a mean vector μ and a (symmetric, positive-definite) covariance matrix Σ. The multivariate normal distribution has pdf:
Norm_x[μ, Σ] = (2π)^(−n/2) |Σ|^(−1/2) exp[−(x − μ)ᵀ Σ⁻¹ (x − μ)/2]
where n is the dimensionality of the space under consideration.
Property (Gaussian relation #1): when we multiply two Gaussian distributions in the same variable (common when applying Bayes’ rule), the resulting distribution is also Gaussian. In particular:
Norm_x[a, A] · Norm_x[b, B] = κ · Norm_x[(A⁻¹ + B⁻¹)⁻¹(A⁻¹a + B⁻¹b), (A⁻¹ + B⁻¹)⁻¹]
where the normalization constant κ is itself Gaussian in either a or b. Intuitively you can see that the product must be a Gaussian: each of the original Gaussians has an exponent that is quadratic in x, and when we multiply the two Gaussians we add the exponents, giving another quadratic.
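A quick numerical check of this identity in one dimension (toy means and variances): the pointwise product of the two densities should differ from the stated Gaussian only by a constant factor κ.

```python
import numpy as np

def norm_pdf(x, m, v):                       # 1-D Gaussian with variance v
    return np.exp(-0.5 * (x - m) ** 2 / v) / np.sqrt(2 * np.pi * v)

a, A = 1.0, 2.0                              # first Gaussian: mean a, variance A
b, B = -0.5, 0.5                             # second Gaussian: mean b, variance B
v_prod = 1.0 / (1.0 / A + 1.0 / B)           # product covariance (A^-1 + B^-1)^-1
m_prod = v_prod * (a / A + b / B)            # product mean

xs = np.linspace(-5.0, 5.0, 11)
lhs = norm_pdf(xs, a, A) * norm_pdf(xs, b, B)
rhs = norm_pdf(xs, m_prod, v_prod)
ratio = lhs / rhs                            # should be the constant kappa for every x
```

The constant ratio confirms the product is proportional to the stated Gaussian.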
Proof (sketch): multiply the two densities and remove the terms that do not depend on x, placing them in the constant κ. It can be seen from the quadratic term that the result looks like a Gaussian with covariance (A⁻¹ + B⁻¹)⁻¹; completing the square and rearranging recovers the stated mean. As required.
Gaussian relation #2: consider a Gaussian in x with a mean that is a linear function, H, of y. We can rearrange to express this in terms of a Gaussian in y:
Norm_x[Hy, Σ] = κ · Norm_y[(Hᵀ Σ⁻¹ H)⁻¹ Hᵀ Σ⁻¹ x, (Hᵀ Σ⁻¹ H)⁻¹]
Proof (sketch): the quadratic term in y resembles the quadratic term of a Gaussian in y with covariance (Hᵀ Σ⁻¹ H)⁻¹; completing the square and rearranging gives the result. As required.
Matrix Identity 1: consider the d × d matrix P, the k × k matrix R, and the k × d matrix H, where P and R are symmetric, positive-definite covariance matrices. The following equality holds:
(P⁻¹ + Hᵀ R⁻¹ H)⁻¹ Hᵀ R⁻¹ = P Hᵀ (H P Hᵀ + R)⁻¹
Proof: both Hᵀ R⁻¹ (H P Hᵀ + R) and (P⁻¹ + Hᵀ R⁻¹ H) P Hᵀ expand to Hᵀ R⁻¹ H P Hᵀ + Hᵀ; multiplying on the left by (P⁻¹ + Hᵀ R⁻¹ H)⁻¹ and on the right by (H P Hᵀ + R)⁻¹ gives the identity.
Matrix Identity 2: for the same matrices P, R, and H:
(P⁻¹ + Hᵀ R⁻¹ H)⁻¹ = P − P Hᵀ (H P Hᵀ + R)⁻¹ H P
This is known as the Matrix Inversion Lemma.
Proof (sketch): multiply the right-hand side by (P⁻¹ + Hᵀ R⁻¹ H) and use Matrix Identity 1 to show that the product reduces to the identity matrix. As required.
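Both identities can be spot-checked numerically on random symmetric positive-definite P and R and a random H (toy sizes; this is a sanity check, not a proof):

```python
import numpy as np

rng = np.random.default_rng(5)
d, k = 5, 3

def spd(n):
    # random symmetric positive-definite matrix
    A = rng.normal(size=(n, n))
    return A @ A.T + n * np.eye(n)

P, R = spd(d), spd(k)
H = rng.normal(size=(k, d))
inv = np.linalg.inv

# Identity 1: (P^-1 + H^T R^-1 H)^-1 H^T R^-1 = P H^T (H P H^T + R)^-1
lhs1 = inv(inv(P) + H.T @ inv(R) @ H) @ H.T @ inv(R)
rhs1 = P @ H.T @ inv(H @ P @ H.T + R)

# Matrix Inversion Lemma:
# (P^-1 + H^T R^-1 H)^-1 = P - P H^T (H P H^T + R)^-1 H P
lhs2 = inv(inv(P) + H.T @ inv(R) @ H)
rhs2 = P - P @ H.T @ inv(H @ P @ H.T + R) @ H @ P
```

Note the left-hand sides invert d × d matrices while the right-hand sides invert only k × k ones, which is the whole point when k ≪ d.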
[Slide: summary of identities 1–6 for the d × d matrix P, the k × k matrix R, and the k × d matrix H, where P and R are symmetric, positive-definite covariance matrices.]
(returning to)
4. Subspace models: factor analysis
GOAL: given a data set x1…N, estimate the factor analysis model parameters θ = {μ, Φ, Σ}.
Let’s make life somewhat easier: it is fairly obvious that the maximum likelihood estimate of the mean is the sample mean, μ̂ = (1/N) Σi xi.
We’ll use this estimate and subtract the mean from each of the training vectors, giving a slightly simpler generative model.
Goal: learn the parameters defining the model, θ = {Φ, Σ}.
Problem: it is hard to estimate the parameters θ since we don’t know the latent identity vectors, h.
Method: the Expectation Maximization (EM) algorithm. Alternately perform the E-Step and M-Step until convergence.
Can express this as x = Φh + ε, where h ~ Norm_h[0, I] and ε ~ Norm_x[0, Σ].
[Figure: generative model mapping the hidden dimensions to the observed dimensions.]
In the E-Step, we use Bayes’ rule to find the distribution for the identity vector h given the observed data, x:
Pr(h|x) = Pr(x|h) Pr(h) / Pr(x)
In this simple subspace model, both of the terms in the numerator are Gaussian, so this posterior probability for h can be calculated in closed form.
[Figure: probabilistic inversion via Bayes’ rule, from the observed dimensions x back to the hidden dimensions h.]
Let’s consider just the numerator of this expression, since the denominator is just a scaling constant:
Pr(x|h) Pr(h) = Norm_x[Φh, Σ] · Norm_h[0, I]
Now apply Gaussian relation #2 to the first term to give:
κ · Norm_h[(Φᵀ Σ⁻¹ Φ)⁻¹ Φᵀ Σ⁻¹ x, (Φᵀ Σ⁻¹ Φ)⁻¹] · Norm_h[0, I]
Notice that we have a Gaussian times a Gaussian in the same variable here — this must give a Gaussian result. To find its mean and covariance, we use Gaussian relation #1:
Pr(h|x) = Norm_h[(Φᵀ Σ⁻¹ Φ + I)⁻¹ Φᵀ Σ⁻¹ x, (Φᵀ Σ⁻¹ Φ + I)⁻¹]
This distribution has moments around the mean given by:
E[h] = (Φᵀ Σ⁻¹ Φ + I)⁻¹ Φᵀ Σ⁻¹ x,  E[h hᵀ] = (Φᵀ Σ⁻¹ Φ + I)⁻¹ + E[h] E[h]ᵀ
We can reformulate these terms using our two matrix relations, which show that
Φᵀ (Φ Φᵀ + Σ)⁻¹ x = (Φᵀ Σ⁻¹ Φ + I)⁻¹ Φᵀ Σ⁻¹ x
Why should we bother to do this? The matrix in brackets on the left is dx × dx, whereas the matrices in brackets on the right are only dh × dh.
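The equivalence of the two forms of the posterior mean, and the size difference of the inverted matrices, can be checked on toy dimensions (synthetic Φ and Σ, not the lecture's fitted values):

```python
import numpy as np

rng = np.random.default_rng(6)
dx, dh = 8, 2                             # toy observed and hidden sizes
Phi  = rng.normal(size=(dx, dh))
sig2 = 0.1 + rng.random(dx)               # diagonal noise variances
Sig     = np.diag(sig2)
Sig_inv = np.diag(1.0 / sig2)
x = rng.normal(size=dx)
inv = np.linalg.inv

# dx x dx form: E[h] = Phi^T (Phi Phi^T + Sigma)^-1 x
Eh_big   = Phi.T @ inv(Phi @ Phi.T + Sig) @ x
# dh x dh form: E[h] = (Phi^T Sigma^-1 Phi + I)^-1 Phi^T Sigma^-1 x
Eh_small = inv(Phi.T @ Sig_inv @ Phi + np.eye(dh)) @ Phi.T @ Sig_inv @ x
```

For faces, dx = 10,800 while dh might be 10, so inverting the dh × dh matrix is dramatically cheaper.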
In the M-Step, the objective function is the joint log likelihood of the latent variables and the data.
Using the expected values of h, take derivatives of the log likelihood with respect to the parameters θ, set them to zero, and solve, substituting in the expected values of h.
[Figure: learned factor analysis parameters: the mean μ, the noise Σ, the factors Φ1 and Φ2, and the mean perturbed along each factor, μ + 2Φ1 and μ + 2Φ2.]
Can calculate factor analysis performance for face detection in terms of a receiver operating characteristic (ROC) curve.
[Figure: five-factor model parameters: the mean μ, the noise Σ, the factors Φ1–Φ5, and the perturbed means μ + 2Φ1, …, μ + 2Φ5, with the corresponding ROC curve.]
Sampling from the 10-parameter model.
Factors are ambiguous up to a rotation: for any orthogonal matrix R, replacing Φ with ΦR leaves the marginal covariance ΦΦᵀ + Σ unchanged.
There is an infinite set of equivalent models, each of which has the same probability.
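The rotation ambiguity is easy to demonstrate numerically (toy sizes, synthetic parameters): rotating the factors leaves the model covariance, and hence the likelihood, untouched.

```python
import numpy as np

rng = np.random.default_rng(7)
dx, dh = 6, 2
Phi = rng.normal(size=(dx, dh))
Sig = np.diag(0.1 + rng.random(dx))           # diagonal noise

theta = 0.7                                   # any 2-D rotation angle
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

cov_orig = Phi @ Phi.T + Sig                  # original model covariance
cov_rot  = (Phi @ R) @ (Phi @ R).T + Sig      # rotated factors: R R^T = I
```

Because Pr(x) depends on Φ only through ΦΦᵀ, every rotation of the factors defines the same distribution over images.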
Mixture of factor analyzers (MOFA)
[Figure: mixture of factor analyzers fitted in (Pixel 1, Pixel 2) space.]
Gaussian Process Latent Variable Models
[Figure: Gaussian process latent variable model fitted in (Pixel 1, Pixel 2) space.]
7. Relationship to non-probabilistic methods
Approximate each image as the mean plus a weighted sum of factors: x′ = μ + h1φ1 + h2φ2 + h3φ3 + …
[Figure: the generative mapping from hidden to observed dimensions, and the principal component view: data in (x1, x2) coordinates are rotated into decorrelated coordinates (x′1, x′2), with standard deviations s1 and s2 along the principal axes.]
As before, we break this covariance matrix down into the product of three other matrices, Σ = U L Uᵀ, where U is a rotation matrix that transforms the principal axes of the fitted Gaussian back to the original coordinate system and L is diagonal.
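This decomposition is an eigendecomposition of the sample covariance; a small sketch on synthetic correlated 2-D data (the mixing matrix here is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(8)
# synthetic correlated 2-D data
X = rng.normal(size=(5000, 2)) @ np.array([[2.0, 0.0], [1.0, 0.5]])
C = np.cov(X, rowvar=False)

# C = U L U^T: columns of U are the principal axes, L holds variances along them
evals, U = np.linalg.eigh(C)
recon = U @ np.diag(evals) @ U.T

# rotating the (centred) data by U decorrelates it
X_rot = (X - X.mean(axis=0)) @ U
```

After the rotation, the coordinates are uncorrelated with variances `evals`, which is exactly the non-probabilistic (PCA) picture of the subspace models above.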
5. Known objects under unknown pose and illumination
6. Objects under partial occlusion