Dense Object Recognition

Dense Object Recognition

2. Template Matching



Face Detection

We will investigate face detection using a scanning window technique:

Think this task sounds easy?



Training Data

Faces: 800 face images, 60×60 pixels, taken from an online dating website.

Non-faces: 800 random non-face regions, 60×60 pixels, taken from the same data as the faces.



Vectorizing Images

Concatenate the face pixels into a single vector, x = (x1, x2, x3, …, xN).
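The slides contain no code; a minimal NumPy sketch of this vectorization step (the function name and data layout are our own choices, not from the slides) might look like:

```python
import numpy as np

def vectorize(image):
    """Flatten a 60x60 (possibly colour) crop into a single 1-D vector x."""
    return np.asarray(image, dtype=np.float64).reshape(-1)

# Hypothetical usage: stack the training crops into a data matrix,
# one vectorized image per row.
# X = np.stack([vectorize(im) for im in face_crops], axis=0)
```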



Overview of Approach

  • GENERATIVE APPROACH

  • Calculate models for data likelihood given each class

  • Compare likelihoods – in this case we will just calculate the likelihood ratio:

  • Threshold likelihood ratio to decide if face / non-face

All that remains is to specify the form of the likelihood terms.
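The slide's equation is not reproduced in this transcript; a standard way to write the likelihood-ratio test it describes is:

```latex
\frac{\Pr(\mathbf{x} \mid \text{face})}{\Pr(\mathbf{x} \mid \text{non-face})}
\;\;\gtrless\;\; \tau
\qquad \text{(classify as a face if the ratio exceeds the threshold } \tau\text{)}
```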



The Multivariate Gaussian

This denotes an n-dimensional Gaussian or normal distribution in the variable x, with mean m and symmetric, positive-definite covariance matrix S, which comes in three flavours: uniform (spherical), diagonal, and full.



Model # 1: Gaussian, uniform covariance

Fit the model using the maximum-likelihood criterion.

[Figure: training data in pixel space (Pixel 1 vs Pixel 2) with the fitted means m_face and m_non-face; values 59.1 (face) and 69.1 (non-face) shown. The fitted face mean acts as a face 'template'.]
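A minimal NumPy sketch of Model #1 and the resulting likelihood-ratio classifier (a reconstruction under the assumptions above, with our own function names):

```python
import numpy as np

def fit_spherical_gaussian(X):
    """ML fit of a Gaussian with uniform (spherical) covariance.
    X holds one vectorized training image per row (N x D)."""
    mu = X.mean(axis=0)
    sigma2 = ((X - mu) ** 2).mean()      # one shared variance for every dimension
    return mu, sigma2

def log_likelihood(x, mu, sigma2):
    """Log density of x under the spherical Gaussian (mu, sigma2 * I)."""
    D = x.size
    return -0.5 * D * np.log(2 * np.pi * sigma2) - 0.5 * np.sum((x - mu) ** 2) / sigma2

def is_face(x, face_params, nonface_params, threshold=0.0):
    """Threshold the log-likelihood ratio to decide face / non-face."""
    log_ratio = log_likelihood(x, *face_params) - log_likelihood(x, *nonface_params)
    return log_ratio > threshold
```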



Model 1 Results

Results based on 200 cropped faces and 200 non-faces from the same database.

How does this work with a real image?

[Figure: ROC curve, Pr(Hit) against Pr(False Alarm).]


Scale 1

Maxima in the log-likelihood ratio


Scale 2

Maxima in the log-likelihood ratio


Scale 3

Maxima in the log-likelihood ratio



Threshold Maxima

[Figure: maxima maps at scales 1–3, shown before and after thresholding.]



Results

[Figure panels: original image; superimposed log-likelihood ratio; positions of maxima; detected faces.]



Model # 2: Gaussian, diagonal covariance

Fit the model using the maximum-likelihood criterion.

[Figure: training data in pixel space (Pixel 1 vs Pixel 2) with the fitted face and non-face means, m_face and m_non-face.]



Model 2 Results

Results based on 200 cropped faces and 200 non-faces from the same database.

The more sophisticated model unsurprisingly classifies new faces and non-faces better.

[Figure: ROC curves, Pr(Hit) against Pr(False Alarm), for the diagonal and uniform covariance models.]


Model # 3: Gaussian, full covariance

Fit the model using the maximum-likelihood criterion.

PROBLEM: we cannot fit this model. We don't have enough data to estimate the full covariance matrix.

N = 800 training images

D = 10,800 dimensions

Total number of measured values: N × D = 800 × 10,800 = 8,640,000

Total number of parameters in the covariance matrix: D(D+1)/2 = 10,800 × 10,801 / 2 = 58,325,400

[Figure: a full-covariance Gaussian in pixel space (Pixel 1 vs Pixel 2).]



Possible Solution

We could induce some covariance by using a mixture of Gaussians model in which each component is uniform or diagonal. For a small number of mixture components, the number of parameters is not too bad.


For diagonal Gaussians, there are 2D+1 unknowns per component (D parameters for the mean, D for the diagonal covariance, and 1 for the weight of the Gaussian), i.e. K(2D+1) parameters for K components.



Dense Object Recognition

3. Mixtures of Templates



Mixture of Gaussians

Key idea: represent the probability density as a weighted sum (mixture) of Gaussian distributions. The weights must sum to 1, otherwise the result is not a valid pdf.
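The slide's formula is not in the transcript; with K components, weights w_k, means m_k and covariances S_k, the mixture density has the standard form:

```latex
\Pr(\mathbf{x}) \;=\; \sum_{k=1}^{K} w_k \,
\mathcal{N}\!\left(\mathbf{x};\, \mathbf{m}_k, \mathbf{S}_k\right),
\qquad \sum_{k=1}^{K} w_k = 1,\quad w_k \ge 0 .
```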

[Figure: a one-dimensional mixture of Gaussians, Pr(x) plotted against x.]



Hidden Variable Interpretation

Try to think about the same problem in a different way...

Marginalize over h



Hidden Variable Interpretation

  • ASSUMPTIONS

  • for each training datum xi there is a hidden variable hi.

  • hi represents which Gaussian xi came from

  • hence hi takes discrete values

  • OUR GOAL:

  • To estimate the parameters q:

    • means m,

    • variances s2

    • weights w

  • for each of the K components.

THING TO NOTICE #1:

If we knew the hidden variables hi for the training data, it would be very easy to estimate the parameters q – just estimate the individual Gaussians separately.



Hidden Variable Interpretation

THING TO NOTICE #2:

If we knew the parameters q, it would be very easy to estimate the posterior distribution over each hidden variable hi using Bayes' rule:

[Figure: the component likelihoods Pr(x|h=1), Pr(x|h=2), Pr(x|h=3) and the resulting posterior Pr(h|x) over h = 1, 2, 3.]
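The Bayes' rule computation described here (the equation is not reproduced in the transcript) has the standard form: the posterior probability that datum x_i came from component k is

```latex
\Pr(h_i = k \mid \mathbf{x}_i)
\;=\;
\frac{w_k \,\mathcal{N}\!\left(\mathbf{x}_i;\, \mathbf{m}_k, \mathbf{S}_k\right)}
     {\sum_{j=1}^{K} w_j \,\mathcal{N}\!\left(\mathbf{x}_i;\, \mathbf{m}_j, \mathbf{S}_j\right)} .
```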



Expectation Maximization

  • Chicken and egg problem:

    • could find h1...N if we knew q

    • could find q if we knew h1...N

Solution: Expectation Maximization (EM) algorithm (Dempster, Laird and Rubin 1977)

  • Alternate between:

  • 1. Expectation Step (E-Step)

    • For fixed q find posterior distribution over h1...N

  • 2. Maximization Step (M-Step)

    • Given these distributions, maximize a lower bound on the likelihood w.r.t. q
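The slides contain no code; the following is a minimal NumPy sketch of this alternation for a mixture of diagonal-covariance Gaussians (the function name, the initialization and the small variance floor are our own choices):

```python
import numpy as np

def em_mog_diag(X, K, n_iters=50, seed=0):
    """EM for a K-component mixture of diagonal-covariance Gaussians.
    X holds one data point per row (N x D)."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    # Initialise: random data points as means, data variance, uniform weights.
    mu = X[rng.choice(N, K, replace=False)].copy()       # (K, D)
    var = np.ones((K, D)) * X.var(axis=0)                 # (K, D)
    w = np.full(K, 1.0 / K)                               # (K,)

    for _ in range(n_iters):
        # E-step: responsibilities r[i, k] = Pr(h_i = k | x_i, parameters).
        log_r = np.zeros((N, K))
        for k in range(K):
            diff = X - mu[k]
            log_r[:, k] = (np.log(w[k])
                           - 0.5 * np.sum(np.log(2 * np.pi * var[k]))
                           - 0.5 * np.sum(diff**2 / var[k], axis=1))
        log_r -= log_r.max(axis=1, keepdims=True)         # numerical stability
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)

        # M-step: re-estimate weights, means and diagonal variances.
        Nk = r.sum(axis=0)                                # effective counts
        w = Nk / N
        mu = (r.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            var[k] = (r[:, k:k+1] * diff**2).sum(axis=0) / Nk[k] + 1e-6

    return w, mu, var
```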



MOG 2 Components

[Figure: face model parameters — prior weights 0.4999 and 0.5001 for the two components, with the mean and standard-deviation images for each.]

The face model and non-face model have divided the data into two clusters. In each case, these clusters have roughly equal weights.

The primary thing that these seem to have captured is the photometric (luminance) variation.

Note that the standard deviations have become smaller than for the single Gaussian model, as any given data point is likely to be close to one mean or the other.

[Figure: non-face model parameters — prior weights 0.5325 and 0.4675 for the two components, with the mean and standard-deviation images for each.]



Results for MOG 2 Model

Performance improves relative to a single Gaussian model, although it is not dramatic.

We have a better description of the data likelihood.

[Figure: ROC curves, Pr(Hit) against Pr(False Alarm), comparing the MOG 2, diagonal and uniform models.]



MOG 5 Components

[Figure: face model parameters — prior weights 0.0988, 0.1925, 0.2062, 0.2275, 0.1575 for the five components, with the mean and standard-deviation images for each.]

[Figure: non-face model parameters — prior weights 0.1737, 0.2250, 0.1950, 0.2200, 0.1863 for the five components, with the mean and standard-deviation images for each.]



MOG 10 Components

[Figure: face and non-face model parameters for the ten components; prior weights shown — first set: 0.0075, 0.1425, 0.1437, 0.0988, 0.1038, 0.1187, 0.1638, 0.1175, 0.1038, 0.0000; second set: 0.1137, 0.0688, 0.0763, 0.0800, 0.1338, 0.1063, 0.1063, 0.1263, 0.0900, 0.0988.]


Results for MOG 10 Model

Performance improves slightly more, particularly at low false alarm rates.

What if we move to an infinite number of Gaussians?

[Figure: ROC curves, Pr(Hit) against Pr(False Alarm), comparing the MOG 10, MOG 2, diagonal and uniform models.]



Dense Object Recognition

4. Subspace models: factor analysis



Factor Analysis: Intuitions

Consider putting the means of the Gaussian mixture components all on a line and forcing their diagonal covariances to be identical.

What happens if we keep adding more and more Gaussians along this line?

[Figure: three Gaussians with means at h = -1, 0, 1 along a line in pixel space; marginalizing over h gives the combined density.]



Factor Analysis: Intuitions

Consider putting the means of the Gaussian mixture components all on a line and forcing their diagonal covariances to be identical.

What happens if we keep adding more and more Gaussians along this line? In the limit, the hidden variable becomes continuous.

[Figure: five Gaussians with means at h = -2, -1, 0, 1, 2 along the line; marginalizing over h gives the combined density.]



Factor Analysis: Intuitions

Consider putting the means of the Gaussian mixture components all on a line and forcing their diagonal covariances to be identical.

What happens if we keep adding more and more Gaussians along this line? In the limit, the hidden variable becomes continuous.

[Figure: a continuum of Gaussians along the line; marginalizing over the continuous hidden variable h gives the combined density.]

Now consider weighting the constituent Gaussians...



Factor Analysis: Intuitions

Consider putting the means of the Gaussian mixture components all on a line and forcing their diagonal covariances to be identical.

What happens if we keep adding more and more Gaussians along this line? In the limit, the hidden variable becomes continuous.

[Figure: a continuum of Gaussians along the line; marginalizing over the continuous hidden variable h gives the combined density.]

If the weights decrease with distance from a central point, we can get something like an oriented Gaussian.


Factor Analysis: Maths

[Figure: Gaussians with means m + h·f placed at h = -1, 0, 1 along the direction f from the mean m; marginalizing over h gives the combined density.]






Factor Analysis: Maths

[Figure: Gaussians with means m + h·f at h = -2, -1, 0, 1, 2 along the direction f; marginalizing over h gives the combined density.]


Factor Analysis: Maths

[Figure: a continuum of Gaussians along the direction f; marginalizing over the continuous hidden variable h gives the combined density.]

Now consider weighting the constituent Gaussians...


Factor Analysis: Maths

[Figure: the continuum of Gaussians, now weighted by a Gaussian over h.]

Weight the components by another Gaussian distribution with mean 0 and variance 1.


Factor Analysis: Maths

  • This integral does in fact evaluate to a new Gaussian whose principal axis is oriented along the line given by m + kf.

  • This is not obvious!

  • The line along which the Gaussians are placed is termed a subspace.

  • Since h was just a number and there was only one column in f, this was a one-dimensional subspace.

  • This is not necessarily the case in general, but dh < dx always holds.
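For reference, the standard statement of the result being described (the slide's own equations are not reproduced in the transcript), written for one factor f and, more generally, a factor matrix F with diagonal noise covariance S:

```latex
\int \mathcal{N}\!\left(\mathbf{x};\, \mathbf{m} + \mathbf{f}h,\ \mathbf{S}\right)\,
\mathcal{N}\!\left(h;\, 0, 1\right)\, dh
\;=\; \mathcal{N}\!\left(\mathbf{x};\, \mathbf{m},\ \mathbf{f}\mathbf{f}^{\mathsf T} + \mathbf{S}\right),
\qquad
\int \mathcal{N}\!\left(\mathbf{x};\, \mathbf{m} + \mathbf{F}\mathbf{h},\ \mathbf{S}\right)\,
\mathcal{N}\!\left(\mathbf{h};\, \mathbf{0}, \mathbf{I}\right)\, d\mathbf{h}
\;=\; \mathcal{N}\!\left(\mathbf{x};\, \mathbf{m},\ \mathbf{F}\mathbf{F}^{\mathsf T} + \mathbf{S}\right) .
```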


Factor Analysis: Maths

For a general subspace of dh dimensions in a larger space of size dx:

  • F has dh columns, each of length dx – these are termed factors.

  • They are basis vectors that span the subspace.

  • h now weights these basis vectors to define a position in the subspace.

  • Concrete example: a 2D subspace in a 3D space.

  • F will contain two 3D vectors in its columns, spanning the plane subspace.

  • h determines the weighting of these vectors.

  • h therefore determines the position on the plane.



A Generative View

  • We have considered factor analysis as an infinite mixture of Gaussians, but there are other ways to think about it.

    • Consider a rule for creating new data points xi

    • Created from some smaller set of underlying random variables hi

  • To generate:

    • Choose the factor loadings, hi, from a standard normal distribution

    • Multiply by the factors, F

    • Add the mean, m

    • Add a random noise component, ei, with diagonal covariance S

[Figure: graphical model in which the hidden variable h generates the observed variable x.]



A Generative View

  • Choose the factor loadings, hi, from a standard normal distribution

  • Multiply by the factors, F

  • Add the mean, m

  • Add a random noise component, ei, with diagonal covariance S

x1 = m + Fh1 + e1

[Figure: points h1, h2, h3 in the two-dimensional hidden space are mapped into the three-dimensional observed space as x1, x2, x3 by a deterministic transformation plus additive noise.]
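A minimal NumPy sketch of this generation rule (the dimensions and parameter values are illustrative placeholders, not values from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
d_x, d_h = 10800, 10                      # observed / hidden dimensionality (illustrative)

m = np.zeros(d_x)                         # mean
F = rng.normal(size=(d_x, d_h))           # factor matrix: columns are the factors
s2 = np.full(d_x, 0.1)                    # diagonal of the noise covariance S

def generate():
    h = rng.standard_normal(d_h)          # 1. factor loadings h ~ N(0, I)
    e = rng.normal(scale=np.sqrt(s2))     # 4. noise e ~ N(0, S), S diagonal
    return m + F @ h + e                  # 2-3. multiply by factors, add mean, add noise
```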



A Generative View

  • Choose the factor loadings, hi, from a standard normal distribution

  • Multiply by the factors, F

  • Add the mean, m

  • Add a random noise component, ei, with diagonal covariance S

Equivalent description: h is distributed as a standard normal, and x given h is Gaussian with mean m + Fh and diagonal covariance S.

Joint distribution: Pr(x, h) = Pr(x | h) Pr(h); marginalize over h to get Pr(x).



Factor Analysis Parameter Count

For a general subspace of dh dimensions in a larger space of size dx.

  • Factor analysis covariance has:

    • dh × dx parameters in the factor matrix, F

    • dx parameters in the diagonal covariance, S

This gives a total of dx(dh + 1) parameters.

If dh is reasonably small and dx is large, this is much less than the full covariance, which has dx(dx + 1)/2 parameters. For example, with dx = 10,800 and dh = 10, factor analysis has 10,800 × 11 = 118,800 covariance parameters rather than 58,325,400.

It is a reasonable assumption that an ensemble of images (like faces) genuinely lies largely within a subspace of the very high-dimensional image space, so this is not a bad model.

  • But given some data, how do we estimate F, S, and m?

    • Unfortunately, to do this, we will need some more maths!



Dense Object Recognition

Interlude: Gaussian and Matrix Identities



Multivariate Normal Distribution

The multivariate generalization of the 1D Gaussian or normal distribution. It depends on a mean vector m and a (symmetric, positive-definite) covariance matrix S. The multivariate normal distribution has PDF:
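(The slide's equation is not reproduced in the transcript; the standard form of this PDF is:)

```latex
\Pr(\mathbf{x}) \;=\;
\frac{1}{(2\pi)^{n/2}\,\lvert \mathbf{S} \rvert^{1/2}}
\exp\!\left[ -\tfrac{1}{2} (\mathbf{x}-\mathbf{m})^{\mathsf T}\, \mathbf{S}^{-1}\, (\mathbf{x}-\mathbf{m}) \right]
```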

where n is the dimensionality of the space under consideration.


Gaussian Identity #1: Multiplication of Gaussians

Property: When we multiply two Gaussian distributions (common when applying Bayes’ rule) then the resulting distribution is also Gaussian. In particular:

where:

The normalization constant is also Gaussian in either a or b. Intuitively you can see that the product must be a Gaussian, as each of the original Gaussians has an exponent that is quadratic in x. When we multiply the two Gaussians, we add the exponents giving another quadratic.
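Written out (a standard reconstruction, since the slide's equations are not in the transcript), for two Gaussians in x with means a, b and covariances A, B:

```latex
\mathcal{N}\!\left(\mathbf{x};\, \mathbf{a}, \mathbf{A}\right)\,
\mathcal{N}\!\left(\mathbf{x};\, \mathbf{b}, \mathbf{B}\right)
\;=\; \kappa \cdot
\mathcal{N}\!\left(\mathbf{x};\,
\left(\mathbf{A}^{-1}+\mathbf{B}^{-1}\right)^{-1}\!\left(\mathbf{A}^{-1}\mathbf{a}+\mathbf{B}^{-1}\mathbf{b}\right),\;
\left(\mathbf{A}^{-1}+\mathbf{B}^{-1}\right)^{-1}\right),
\qquad
\kappa = \mathcal{N}\!\left(\mathbf{a};\, \mathbf{b},\ \mathbf{A}+\mathbf{B}\right) .
```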



Proof:


where we have removed the terms that do not depend on x and placed them in the constant, k. It can be seen from the quadratic term that this looks like a Gaussian with covariance:



Completing the Square:

Re-arranging:

As required.




Gaussian Identity #2

Consider a Gaussian in x with a mean that is a linear function, H of y. We can re-arrange to express this in terms of a Gaussian in y:
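The identity itself is not reproduced in the transcript; the standard statement, for a Gaussian in x with mean Hy and covariance S, is:

```latex
\mathcal{N}\!\left(\mathbf{x};\, \mathbf{H}\mathbf{y},\ \mathbf{S}\right)
\;=\; \kappa \cdot
\mathcal{N}\!\left(\mathbf{y};\,
\left(\mathbf{H}^{\mathsf T}\mathbf{S}^{-1}\mathbf{H}\right)^{-1}\mathbf{H}^{\mathsf T}\mathbf{S}^{-1}\mathbf{x},\;
\left(\mathbf{H}^{\mathsf T}\mathbf{S}^{-1}\mathbf{H}\right)^{-1}\right),
```

where κ is a constant that does not depend on y.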

Proof:

Looking at the quadratic term in y, it resembles the quadratic term of a Gaussian in y with covariance:



Completing the Square:

Re-arranging:

As Required



Matrix Identity #1

Consider the d x d matrix P, the k x k matrix R, and the k x d matrix H, where P and R are symmetric, positive-definite covariance matrices. The following equality holds:
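The equality itself is missing from the transcript; a standard identity consistent with these dimensions, and the one used later in the factor-analysis E-step, is the "push-through" relation:

```latex
\left(\mathbf{P}^{-1} + \mathbf{H}^{\mathsf T}\mathbf{R}^{-1}\mathbf{H}\right)^{-1}
\mathbf{H}^{\mathsf T}\mathbf{R}^{-1}
\;=\;
\mathbf{P}\,\mathbf{H}^{\mathsf T}
\left(\mathbf{H}\,\mathbf{P}\,\mathbf{H}^{\mathsf T} + \mathbf{R}\right)^{-1}
```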

Proof:

Taking the inverse of both sides:



Matrix Identity 2: The Matrix Inversion Lemma

Consider the d x d matrix P, the k x k matrix R, and the k x d matrix H, where P and R are symmetric, positive-definite covariance matrices. The following equality holds:
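The equality itself is missing from the transcript; the standard statement of the matrix inversion lemma with these dimensions is:

```latex
\left(\mathbf{P}^{-1} + \mathbf{H}^{\mathsf T}\mathbf{R}^{-1}\mathbf{H}\right)^{-1}
\;=\;
\mathbf{P} \;-\; \mathbf{P}\,\mathbf{H}^{\mathsf T}
\left(\mathbf{H}\,\mathbf{P}\,\mathbf{H}^{\mathsf T} + \mathbf{R}\right)^{-1}
\mathbf{H}\,\mathbf{P}
```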

This is known as the Matrix Inversion Lemma.

Proof:



Remember Matrix Identity 1

As required



Maths Review

[Summary slide: the preceding Gaussian and matrix identities, numbered 1–6, restated for reference; the equations are not reproduced in this transcript.]



Dense Object Recognition

(returning to)

4. Subspace models: factor analysis



Learning Factor Analysis Models

GOAL: Given a data set x1...N, estimate factor analysis model parameters q = {m,F,S}.

Let's make life somewhat easier: it is fairly obvious that the maximum-likelihood estimate of the mean m is just the mean of the training data.

We'll use this estimate and subtract the mean from each of the training vectors, giving a slightly simpler generative model.



Learning Factor Analysis Models

Goal: learn the parameters defining the model, q = {F, S}.

Problem: Hard to estimate parameters q since we don’t know the latent identity vectors, h.

Method: Expectation Maximization (EM) algorithm. Alternately perform E-Step and M-Step until convergence:

  • E-STEP: Calculate the posterior distribution over the latent identity variable, Pr(h|x,q)

  • M-STEP: Maximize the likelihood of the parameters q using expected values of h



Learning: E-Step

We can express this as: h is distributed as a standard normal, and x given h is Gaussian with mean Fh and diagonal covariance S (the mean has already been subtracted from the data).

[Figure: the generative model maps the two-dimensional hidden space, via the factors plus additive noise, to the three-dimensional observed space.]



Learning: E-Step

In the E-Step, we use Bayes' rule to find the distribution for the identity vector h given the observed data, x:

In this simple subspace model, both of the terms in the denominator are Gaussian so this posterior probability for h can be calculated in closed form.
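The Bayes' rule expression referred to here is not reproduced in the transcript; in standard form (for the mean-subtracted model) it is:

```latex
\Pr(\mathbf{h} \mid \mathbf{x})
\;=\;
\frac{\Pr(\mathbf{x} \mid \mathbf{h})\,\Pr(\mathbf{h})}{\Pr(\mathbf{x})}
\;=\;
\frac{\mathcal{N}\!\left(\mathbf{x};\, \mathbf{F}\mathbf{h},\ \mathbf{S}\right)\,
      \mathcal{N}\!\left(\mathbf{h};\, \mathbf{0}, \mathbf{I}\right)}
     {\int \mathcal{N}\!\left(\mathbf{x};\, \mathbf{F}\mathbf{h},\ \mathbf{S}\right)\,
      \mathcal{N}\!\left(\mathbf{h};\, \mathbf{0}, \mathbf{I}\right)\, d\mathbf{h}} .
```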

[Figure: probabilistic inversion via Bayes' rule — the generative mapping from hidden to observed space is inverted to infer h from x.]



Learning: E-Step

Let’s consider just the numerator of this expression, since the denominator is just a scaling constant

Now apply Gaussian Identity #2 to the first term, to give:



Learning: E-Step

Notice that we have a Gaussian times a Gaussian in the same variable here – this must give a Gaussian result. To find its mean and covariance, we use Gaussian Identity #1.



Learning: E-Step

This distribution has moments around the mean which are given by:

We can reformulate these terms using our two matrix relations:


Learning: E-Step

We can reformulate these terms using our two matrix relations:

to give:

Why should we bother to do this? The matrices in brackets at the top are dx x dx, whereas the matrices at the bottom are dh x dh.
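The slide's expressions are not reproduced in the transcript; the standard factor-analysis E-step moments, in the original form and after applying the two matrix identities, are:

```latex
\mathrm{E}\!\left[\mathbf{h}_i\right]
= \mathbf{F}^{\mathsf T}\!\left(\mathbf{F}\mathbf{F}^{\mathsf T}+\mathbf{S}\right)^{-1}\mathbf{x}_i
= \left(\mathbf{F}^{\mathsf T}\mathbf{S}^{-1}\mathbf{F}+\mathbf{I}\right)^{-1}\mathbf{F}^{\mathsf T}\mathbf{S}^{-1}\mathbf{x}_i ,
\qquad
\mathrm{E}\!\left[\mathbf{h}_i\mathbf{h}_i^{\mathsf T}\right]
= \mathbf{I}-\mathbf{F}^{\mathsf T}\!\left(\mathbf{F}\mathbf{F}^{\mathsf T}+\mathbf{S}\right)^{-1}\mathbf{F}
+ \mathrm{E}\!\left[\mathbf{h}_i\right]\mathrm{E}\!\left[\mathbf{h}_i\right]^{\mathsf T}
= \left(\mathbf{F}^{\mathsf T}\mathbf{S}^{-1}\mathbf{F}+\mathbf{I}\right)^{-1}
+ \mathrm{E}\!\left[\mathbf{h}_i\right]\mathrm{E}\!\left[\mathbf{h}_i\right]^{\mathsf T} .
```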





Learning: M-Step

Objective function is joint log likelihood of latent variables and data:

Using the expected values of h from the E-step, take derivatives of the log likelihood with respect to the parameters q, set them to zero, and solve.
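The update equations are not reproduced in the transcript; the standard factor-analysis M-step for the mean-subtracted data is:

```latex
\mathbf{F} \;=\;
\left(\sum_{i=1}^{N}\mathbf{x}_i\,\mathrm{E}\!\left[\mathbf{h}_i\right]^{\mathsf T}\right)
\left(\sum_{i=1}^{N}\mathrm{E}\!\left[\mathbf{h}_i\mathbf{h}_i^{\mathsf T}\right]\right)^{-1},
\qquad
\mathbf{S} \;=\; \frac{1}{N}\,\mathrm{diag}\!\left[
\sum_{i=1}^{N}\left(\mathbf{x}_i\mathbf{x}_i^{\mathsf T}
- \mathbf{F}\,\mathrm{E}\!\left[\mathbf{h}_i\right]\mathbf{x}_i^{\mathsf T}\right)\right] .
```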



Learning results: two factor model

[Figure: learned two-factor model — images of the mean m, the diagonal covariance S, the factors F1 and F2, and the mean perturbed along each factor, m + 2F1 and m + 2F2.]





Factor Analysis Performance

We can calculate the factor analysis performance for face detection in terms of a receiver operating characteristic (ROC) curve.



Learning results: five factor model

[Figure: learned five-factor model — images of the mean m, the diagonal covariance S, the factors F1–F5, and the mean perturbed along each factor, m + 2F1 … m + 2F5.]







Sampling from 10 parameter model

  • To generate:

    • Choose factor loadings, hi from standard normal distribution

    • Multiply by factors, F

    • Add mean, m

    • Add a random noise component, ei, with diagonal covariance S



Rotational Ambiguity

Factors are ambiguous up to a rotation:

There is an infinite set of equivalent models each of which has the same probability.
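The equation is not reproduced in the transcript; the standard statement is that for any rotation matrix R (with R Rᵀ = I),

```latex
\left(\mathbf{F}\mathbf{R}\right)\left(\mathbf{F}\mathbf{R}\right)^{\mathsf T} + \mathbf{S}
\;=\; \mathbf{F}\mathbf{R}\mathbf{R}^{\mathsf T}\mathbf{F}^{\mathsf T} + \mathbf{S}
\;=\; \mathbf{F}\mathbf{F}^{\mathsf T} + \mathbf{S} ,
```

so F and FR define exactly the same density over x.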



Non-Linear Extensions 1

Mixture of factor analyzers (MOFA)

  • Two levels of the EM algorithm

  • One to learn each factor analyzer

  • One to learn the mixture model

  • Learning subject to local minima

  • Can describe quite complex manifold structures in high dimensions with only a limited number of parameters




Non-linear Extensions 2

Gaussian Process Latent Variable Models

  • Non-linear version of factor analysis

  • Still a latent space, but now function mapping latent to observed space is nonlinear

  • Learning subject to local minima




Dense Object Recognition

7. Relationship to non-probabilistic methods



Factor Analysis and PCA

  • Factor analysis is very closely related to another common technique in computer vision: principal component analysis (PCA).

  • Motivation of PCA is quite different from that for factor analysis.

    • It is not probabilistic

    • It is primarily concerned with dimensionality reduction

  • Dimensionality Reduction

  • Consider the hidden space as a smaller set of numbers that can approximately describe the image.



Dimensionality Reduction

x' ≈ m + h1f1 + h2f2 + h3f3 + …

[Figure: a face is approximately reconstructed as the mean plus a weighted sum of the factors; the low-dimensional hidden vector serves as co-ordinates in the high-dimensional observed space.]

  • The face is approximately represented by the weighted sum of the factors.

  • h (low dimensional) can be used as a proxy for x (high dimensional).



Principal Components Analysis

  • KEY IDEAS:

  • Describe data as multivariate Gaussian

  • Project data onto axes of this Gaussian with largest variance

  • Discard all but the largest few dimensions

  • Finds a small set of numbers that describes as much of the variance in the dataset as possible (dimensionality reduction).


Bivariate Axis-Aligned Gaussian

[Figure: a bivariate axis-aligned Gaussian with standard deviations s1 and s2 along the x1 and x2 axes.]




Bivariate Non-Axis-Aligned Distribution

[Figure: a bivariate Gaussian that is not aligned with the (x1, x2) axes; the rotated axes x'1 and x'2 follow its principal directions.]



Bivariate Distribution


Fitting a Gaussian

  • Mean and covariance matrix of data define a Gaussian model



Parameters of Gaussian

  • Mean

  • Covariance



Eigen-Decomposition

As before, we break down this covariance matrix into the product of three other matrices:

where U is a rotation matrix that transforms the principal axes of the fitted Gaussian back to the original co-ordinate system



Eigenvector Decomposition

  • If S is an m x m covariance matrix, there exist m linearly independent eigenvectors, and all the corresponding eigenvalues are non-negative.

  • We can decompose S as
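(The decomposition is not shown in the transcript; the standard eigendecomposition is:)

```latex
\mathbf{S} \;=\; \mathbf{U}\,\mathbf{L}\,\mathbf{U}^{\mathsf T},
\qquad
\mathbf{U} = \left[\mathbf{u}_1, \ldots, \mathbf{u}_m\right],\quad
\mathbf{L} = \mathrm{diag}\!\left(\lambda_1, \ldots, \lambda_m\right),\quad
\lambda_1 \ge \cdots \ge \lambda_m \ge 0 ,
```

where the columns of U are the eigenvectors and the λ's are the corresponding eigenvalues.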



Principal Component Analysis

  • Compute the eigenvectors of the covariance matrix

  • Eigenvectors: the main directions of the data

  • Eigenvalues: the variance along each eigenvector
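A minimal NumPy sketch of this procedure (not from the slides; the function and variable names are illustrative):

```python
import numpy as np

def pca(X, p):
    """PCA by eigendecomposition of the covariance.
    X: data matrix with one observation per row (N x D).
    p: number of principal components to keep."""
    mu = X.mean(axis=0)
    Xc = X - mu                                  # centre the data
    S = np.cov(Xc, rowvar=False)                 # D x D covariance matrix
    evals, evecs = np.linalg.eigh(S)             # eigh: S is symmetric
    order = np.argsort(evals)[::-1]              # sort by decreasing variance
    U = evecs[:, order[:p]]                      # top-p principal directions
    H = Xc @ U                                   # low-dimensional co-ordinates
    return mu, U, H

# Approximate reconstruction from the reduced representation:
# X_approx = mu + H @ U.T
```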



Dimensionality Reduction

  • Co-ords often correlated

  • Nearby points move together



Dimensionality Reduction

  • Data lies in a subspace of reduced dimension.

  • However, for some p,



Approximation

  • Each element of the data can be written



Comparison of PCA and Factor Analysis

  • Factor analysis gives a probability

  • Factor analysis has a separate noise parameter for each dimension

  • Factors are arbitrary length, but principal components length 1

  • Factor loadings distributed as standard normal, PCA loadings arbitrary scale

  • Principal components are ordered; factors are unordered



Dense Object Recognition

5. Known objects under unknown pose and illumination



Dense Object Recognition

6. Objects under partial occlusion

