- By **dani**

## PowerPoint Slideshow about 'INFORMATION REPRESENTATION AND COMPRESSION' - dani


Presentation Transcript

We do not know how to describe the locations of blocks, so...

Let’s think first about a GLOBAL content description in which locations are not considered!

That is, we first look into the problem in which only block STATISTICS are considered (we illustrated with CAMSHIFT that color statistics give good results).

Distribution of DCT coefficients for typical 8x8 DCT block

We can see that higher frequency coefficients are small.

If we use strong quantization they will be quantized to zero.

Under strong quantization only the first 4x4 block of coefficients will be nonzero. This is roughly equivalent to using a 4x4 DCT transform.

There is another effect too:

The stronger the quantization, the smaller the number of DIFFERENT blocks.

In fact, with no quantization, practically every block is different.

Quantization rounds the coefficients to a limited number of values.

Coefficients of the 4x4 blocks

- DC: the zero-frequency coefficient in the top-left corner of the block, i.e. the average light level in the block
- AC: all remaining coefficients, which correspond to different frequencies

Quantization by QP

[DC] = round(DC / QP)

[AC] = round(AC / QP)

Higher QP -> more zeros in the block
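The effect of QP can be tried out with a small sketch. It assumes an orthonormal 4x4 DCT-II (with this scaling the DC coefficient is 4 times the mean pixel value of the block) and Python's built-in rounding; the function names and the sample pixel values are illustrative and do not come from the slides.

```python
import math

def dct2(block):
    """Orthonormal 2-D DCT-II of a square block given as a list of rows."""
    n = len(block)

    def scale(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = scale(u) * scale(v) * s
    return out

def quantize(coeffs, qp):
    """[C] = round(C / QP), applied to the DC and all AC coefficients."""
    return tuple(tuple(int(round(c / qp)) for c in row) for row in coeffs)

def count_zeros(qblock):
    """Number of coefficients that were quantized to zero."""
    return sum(1 for row in qblock for c in row if c == 0)

# Hypothetical 4x4 pixel block (values chosen only for illustration).
block = [[52, 55, 61, 66],
         [63, 59, 55, 90],
         [62, 59, 68, 113],
         [63, 58, 71, 122]]
coeffs = dct2(block)
for qp in (1, 10, 40):
    print("QP =", qp, "zeros =", count_zeros(quantize(coeffs, qp)))
```

The zero count can only grow with QP in this sketch, because a coefficient is rounded to zero whenever its magnitude is at most QP/2. Returning the quantized block as a tuple of tuples makes it hashable, which is convenient when counting identical blocks.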

Here is an illustration for a picture

QP is the quantization parameter. We see that as it increases, the number of distinct DCT patterns is strongly reduced.

Now we use the following idea:

Let’s see how the histogram of the quantized DCT blocks looks!

For example, let’s find which blocks appear most often in a picture and create a histogram of, e.g., the 40 most frequent patterns.

The shape of this histogram obviously depends on the quantization. If the quantization is low, the histogram will tend to be flat. If the quantization is high, it will tend to have a peak.
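A minimal sketch of building such a histogram, assuming each quantized block is stored as a hashable tuple and that the histogram is normalized to relative frequencies (the slides do not say whether counts or frequencies are used). The helper `histogram_vector` additionally assumes a shared, fixed list of patterns so that histograms of different pictures have matching bins; both function names are illustrative.

```python
from collections import Counter

def pattern_histogram(qblocks, size=40):
    """Return the `size` most frequent quantized blocks with their relative frequencies."""
    counts = Counter(qblocks)
    total = len(qblocks)
    return [(pattern, n / total) for pattern, n in counts.most_common(size)]

def histogram_vector(qblocks, patterns):
    """Relative frequency of each pattern in `patterns` (assumed shared across pictures)."""
    counts = Counter(qblocks)
    total = len(qblocks)
    return [counts[p] / total for p in patterns]

# Toy picture made of four quantized "blocks" (two coefficients each, for brevity).
toy_blocks = [(3, 0), (3, 0), (3, 0), (2, 1)]
print(pattern_histogram(toy_blocks, size=2))
```

With strong quantization most blocks collapse onto a few patterns, so the first bins dominate; with weak quantization almost every block is its own pattern and the bins are nearly equal.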

Let us see an example of histograms for two pictures

Histograms of two face images

The database retrieval problem based on block histograms

Assume we have a database D of pictures 1, 2, ..., i, ..., j, ..., m.

We take a picture and want to check whether it is in the database or whether there are similar pictures there.

Example: database of passport photographs.

In our approach we use a similarity measure between pictures based on their quantized histograms.

Histograms are treated as vectors and similarity is based on the following formula:

$$B_{i,j} = \sum_{k} \left| H_i(k) - H_j(k) \right|, \qquad i, j \in D$$

where $H_i(k)$ is bin $k$ of the histogram of picture $i$.

This is the city-block measure (the sum of the absolute values of the differences between corresponding histogram bins). It reaches its minimum value of 0 when the two histogram vectors are identical. The closer the value is to zero, the more similar the pictures should be. Remember that the blocks are quantized, so noise and irrelevant features are removed.
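The retrieval step can be sketched as follows, assuming the measure is the sum of absolute differences between corresponding histogram bins and that all histograms share the same bins. The names `city_block` and `retrieve` and the toy database are illustrative.

```python
def city_block(h_i, h_j):
    """City-block (L1) distance between two histogram vectors of equal length."""
    if len(h_i) != len(h_j):
        raise ValueError("histograms must have the same number of bins")
    return sum(abs(a - b) for a, b in zip(h_i, h_j))

def retrieve(key_hist, database):
    """Name of the database picture whose histogram is closest to the key."""
    return min(database, key=lambda name: city_block(key_hist, database[name]))

# Toy database: picture name -> histogram vector.
database = {"a": [0.5, 0.5, 0.0], "b": [0.1, 0.2, 0.7]}
print(retrieve([0.4, 0.6, 0.0], database))
```

A distance of 0 means the two histograms are identical, which does not by itself guarantee identical pictures, since block locations are ignored.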

The question is how well such a scheme performs, but before we can check this we need to look into the light normalization problem.

The values of the DCT transform coefficients depend on the light level. If the light level is higher, the values are higher. If we use the same quantization for two identical pictures with different light levels, the quantized blocks will be different.

The light level can be normalized. First, let’s calculate the average light level of a picture. For this we use the values of the DC coefficients of its blocks:

$$DC_{mean} = \frac{1}{N} \sum_{n=1}^{N} DC_n$$

where $N$ is the number of blocks in the picture. This gives the average light level of the picture.

The average light level $DC_{all}$ of the database is calculated in the same way from the values of $DC_{mean}$ of each picture. Next, the light-level values of each picture are rescaled by the factor $DC_{all} / DC_{mean}$.

After rescaling, the values of the coefficients in the quantized blocks will be similar.
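A sketch of the normalization, assuming the rescaling factor of a picture is the ratio of the database average DC_all to that picture's own DC_mean, and that the factor is applied to the coefficients before quantization. The function names and the toy DC values are illustrative.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)

def normalization_factors(dc_per_picture):
    """Map picture name -> assumed rescaling factor DC_all / DC_mean.

    dc_per_picture: dict of picture name -> list of DC coefficients of its blocks.
    """
    dc_mean = {name: mean(dcs) for name, dcs in dc_per_picture.items()}
    dc_all = mean(dc_mean.values())  # average of the per-picture averages
    return {name: dc_all / m for name, m in dc_mean.items()}

# The same scene captured at two light levels (toy DC values).
pictures = {"dark": [10, 30], "bright": [20, 60]}
factors = normalization_factors(pictures)
rescaled = {name: [dc * factors[name] for dc in dcs]
            for name, dcs in pictures.items()}
print(rescaled)
```

Because the DCT is linear, multiplying every coefficient of a picture by the same factor corresponds to scaling its pixel values, so both toy pictures end up with identical DC values here.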

At high quantization levels very many blocks will have only a DC coefficient. The only information about these blocks is then the DC value, that is, the average light level in the block.

But what is of interest is how the average light level changes between blocks. We want to use this information.

To do so, we account for the information in the differences between the DC values of neighbouring blocks.

In a) we see a fragment of a picture in which the DC values of the blocks are shown. Each block has 8 neighbours, as shown in b). We calculate 9 differences between the neighbours (8 for the directions and 1 for the average over all directions), as shown in c). We then order the differences and form a vector from the first k of them, as shown in d) for k=4.
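One possible reading of this construction is sketched below. The slides do not spell out the sign convention or the ordering rule, so the sketch assumes that each difference is neighbour minus centre, that the ninth value is the mean of the eight neighbours minus the centre, and that the differences are ordered by decreasing magnitude. The function name is illustrative.

```python
def dc_difference_vector(grid, row, col, k=4):
    """First k of the 9 ordered DC differences around the block at (row, col).

    grid: 2-D list of per-block DC values; (row, col) must not lie on the border.
    """
    centre = grid[row][col]
    neighbours = [grid[row + dr][col + dc]
                  for dr in (-1, 0, 1)
                  for dc in (-1, 0, 1)
                  if (dr, dc) != (0, 0)]
    diffs = [n - centre for n in neighbours]         # 8 directional differences
    diffs.append(sum(neighbours) / 8 - centre)       # 1 difference to the neighbour average
    diffs.sort(key=abs, reverse=True)                # assumed ordering: largest magnitude first
    return tuple(diffs[:k])

dc_values = [[10, 10, 10],
             [10, 20, 10],
             [10, 10, 50]]
print(dc_difference_vector(dc_values, 1, 1, k=4))
```

The resulting tuples can be counted into a histogram in the same way as the quantized AC blocks.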

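A combined histogram for AC blocks and DC vectors is now formed:

$$H = [\, H_{AC},\ \alpha \cdot H_{DC} \,]$$

where α is a numerical parameter which will be optimized later.

The combined histogram means that there are two distances to minimize, and they are summed with the weight α:

$$B_{i,j} = \sum_{k} \left| H_{AC,i}(k) - H_{AC,j}(k) \right| + \alpha \sum_{l} \left| H_{DC,i}(l) - H_{DC,j}(l) \right|, \qquad i, j \in D$$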
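A short sketch of the combined measure, assuming it is the city-block distance of the AC histograms plus α times the city-block distance of the DC-vector histograms. For α ≥ 0 this equals the city-block distance between the concatenated vectors [HAC, α·HDC]. Function names and the toy histograms are illustrative.

```python
import math

def city_block(h_i, h_j):
    """Sum of absolute bin differences between two equal-length histograms."""
    return sum(abs(a - b) for a, b in zip(h_i, h_j))

def combined_distance(hac_i, hdc_i, hac_j, hdc_j, alpha):
    """AC distance plus alpha-weighted DC distance."""
    return city_block(hac_i, hac_j) + alpha * city_block(hdc_i, hdc_j)

def combined_histogram(hac, hdc, alpha):
    """Concatenated vector H = [HAC, alpha * HDC]."""
    return list(hac) + [alpha * x for x in hdc]

hac_1, hdc_1 = [0.6, 0.4], [0.2, 0.8]
hac_2, hdc_2 = [0.5, 0.5], [0.7, 0.3]
alpha = 0.7  # hypothetical weight, to be tuned per database
d_sum = combined_distance(hac_1, hdc_1, hac_2, hdc_2, alpha)
d_cat = city_block(combined_histogram(hac_1, hdc_1, alpha),
                   combined_histogram(hac_2, hdc_2, alpha))
print(math.isclose(d_sum, d_cat))
```

Setting α to 0 reduces the measure to the AC histogram alone, and large values of α let the DC-vector histogram dominate.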

Optimization of database retrieval

The question is: how good can database retrieval based on the combined histogram be?

That is, for example, how many errors will be made?

But we can also ask another question: what is the best achievable performance of this approach?

Remember that we use only statistical information, but we have several parameters which can be selected:

- quantization level

- size of histograms

- parameter α for combining histograms

- size of DC difference vectors

We can examine this by taking some databases and optimizing the parameters for the best retrieval. This will show us what the maximum performance is. We did this for face databases using the following scheme:

Given a certain classification threshold, an input face image of person A may be falsely classified as person B. Suppose the target person is person A.

The fraction of images of person A that are classified as other persons is called the False Rejection Rate, FRR.

The fraction of images of other persons that are classified as person A is called the False Acceptance Rate, FAR.

From the FAR and FRR, an Equal Error Rate (EER) is obtained at the point where both measures take equal values. The lower the EER, the better the system's performance, since the total error rate, which is the sum of the FAR and the FRR at the EER point, decreases.
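These rates can be sketched as follows, assuming a verification-style setup in which a comparison is accepted when its distance is at most the threshold. `genuine` holds distances between images of the same person and `impostor` holds distances between images of different persons. The EER is approximated at the threshold where FAR and FRR are closest; the names and the toy distances are illustrative.

```python
def far_frr(genuine, impostor, threshold):
    """False acceptance and false rejection rates at one threshold."""
    frr = sum(d > threshold for d in genuine) / len(genuine)     # same person rejected
    far = sum(d <= threshold for d in impostor) / len(impostor)  # other person accepted
    return far, frr

def equal_error_rate(genuine, impostor):
    """Approximate EER: sweep observed distances, pick the threshold with the smallest |FAR - FRR|."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]

genuine = [0.1, 0.2, 0.3, 0.6]
impostor = [0.4, 0.7, 0.8, 0.9]
eer, threshold = equal_error_rate(genuine, impostor)
print(eer, threshold)
```

With finite data the two curves rarely cross exactly, which is why the sketch reports the mean of FAR and FRR at the closest point.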

Typical performance of EER histogram for two face databases

There are two cases:

1. A database in which there is only one (standard) picture of each person.
2. A database in which there are multiple pictures of each person (and they might be very different).

In case 2 the same person should be retrieved for any of their pictures, which can be difficult.

The GTF (Georgia Tech Face) database contains the face images of 50 people, both male and female, each with 15 images. Most of the images were taken in two different sessions to account for variations in illumination conditions, facial expression, appearance, scale and orientation. For the test, we store the first 11 images of each person in the database and the remaining 4 images serve as key images for retrieval. Therefore, the total number of stored images is 550 and the total number of key images is 200.

The ORL (Olivetti Research Laboratory) database contains 10 different images of each of 40 persons. Images were taken at different times, with slightly varying lighting, various facial expressions (open/closed eyes, smiling/non-smiling) and facial details (glasses/no glasses). The ORL database thus has more variation among the images of one person. For the experiment, we store the first 6 images of each person in the database and the remaining 4 images serve as key images. Therefore, the total number of stored images is 240 and the total number of key images is 160.

We present results for AC only, for DC only and for the combined histogram.

The best result of ORL is obtained when: QP_AC=36, number of AC patterns=80, QP_DC=75, number of Direction-Vector patterns = 300 and α=0.7, γ=7. The best result of GTF is obtained when: QP_AC=10, number of AC patterns = 250, QP_DC = 20, number of Direction-Vector patterns = 400 and α=0.9, γ=5.

The FERET database contains more than 10,000 images of more than 1000 individuals, taken in widely varying circumstances. The FERET database images are divided into several sets, which are formed to match its evaluation methodology. Here we made a test based on the sets fa and fb. In both of them each face has one picture, with the picture in fb taken seconds after the corresponding picture in fa. The fa set, which contains 994 images, serves as the database; the fb set, which contains 992 images, is used as key images for retrieval from fa.

FERET is considered a difficult database and is used in the evaluation of professional applications:

The best EER result is obtained when: QP_AC = 12, number of AC patterns = 400, QP_DC=12, number of Direction-Vector patterns = 400 and α=0.5, γ=4.

FERET METHODOLOGY OF EVALUATION

For FERET there is another methodology, based on calculating how many correct retrievals are obtained among n trials, n = 1, 2, 3, ...

The FERET evaluation measure is called the cumulative match score.

Results for the histogram method (red) are overlaid with those of other known good methods. Rank means how many retrievals are made; a single retrieval (rank 1) is the most demanding.
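A cumulative match score can be sketched like this, assuming that for every key image we have its distances to all database images and know the index of the correct match. The score at rank n is the fraction of key images whose correct match is among the n closest database images. The function name and the toy distances are illustrative.

```python
def cumulative_match_scores(distance_rows, true_index, max_rank):
    """Cumulative match score for ranks 1..max_rank.

    distance_rows[q][i]: distance between key image q and database image i.
    true_index[q]: index of the correct database image for key image q.
    """
    ranks = []
    for distances, truth in zip(distance_rows, true_index):
        order = sorted(range(len(distances)), key=distances.__getitem__)
        ranks.append(order.index(truth) + 1)  # 1 means the closest image is correct
    return [sum(r <= n for r in ranks) / len(ranks)
            for n in range(1, max_rank + 1)]

distance_rows = [[0.1, 0.5, 0.9],   # correct match found at rank 1
                 [0.6, 0.2, 0.4],   # correct match found at rank 2
                 [0.3, 0.2, 0.1]]   # correct match found at rank 3
true_index = [0, 2, 0]
cms = cumulative_match_scores(distance_rows, true_index, max_rank=3)
print(cms)
```

The curve never decreases with rank, and its value at rank 1 is the plain recognition rate.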

For each non-border 4x4 image block, there are eight blocks surrounding it. Such a 3x3 block matrix is utilized here to generate a Binary Feature Vector (BFV). Taking the DC coefficients as an example: the nine DC coefficients within this area form a 3x3 DC coefficient matrix. By measuring and thresholding the magnitude of the differences between the non-center DCs and the central DC coefficient, a binary vector of length 8 is formed.

- Features based on Binary Feature Vectors

Two different cases are considered here:

Case1:

0 – current coefficient ≤ threshold

1 – current coefficient > threshold

Case2:

0 – current coefficient < threshold

1 – current coefficient ≥ threshold
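The two cases can be sketched as below. This is one reading of the slides: the value compared with the threshold is assumed to be the magnitude of the difference between a neighbouring DC coefficient and the central one, and the neighbours are visited row by row. The function name and the toy values are illustrative.

```python
def binary_feature_vector(grid, row, col, threshold, strict=True):
    """8-bit BFV for the 3x3 neighbourhood centred at (row, col).

    strict=True  -> Case 1: bit is 1 when |neighbour - centre| >  threshold
    strict=False -> Case 2: bit is 1 when |neighbour - centre| >= threshold
    """
    centre = grid[row][col]
    bits = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if (dr, dc) == (0, 0):
                continue  # skip the central coefficient itself
            diff = abs(grid[row + dr][col + dc] - centre)
            bits.append(int(diff > threshold) if strict else int(diff >= threshold))
    return tuple(bits)

dc_values = [[10, 10, 10],
             [10, 20, 10],
             [10, 10, 50]]
print(binary_feature_vector(dc_values, 1, 1, threshold=10, strict=True))
print(binary_feature_vector(dc_values, 1, 1, threshold=10, strict=False))
```

The two cases differ only for neighbours whose difference equals the threshold exactly, as in the toy example, where seven of the eight differences are exactly 10.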

Example

- DC-BFV Histogram (based on DC coefficients)
- AC-BFV Histogram (based on AC coefficients)

Example of DC-BFV histogram

Performance results for the FERET database

The result is quite good if we take into account that the method uses statistical information only.

On the FERET plot we see that the best performance is 95%. Which method is it?

It is called EIGENFACES and it is based on the calculation of eigenvectors and eigenvalues of matrices.

- Construction of Face Space

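Suppose a face image consists of N pixels, so it can be represented by a vector $\Gamma$ of dimension N. Let $\{\Gamma_i \mid i = 1, \ldots, M\}$ be the training set of face images. The average face of these M images is given by

$$\Psi = \frac{1}{M} \sum_{i=1}^{M} \Gamma_i$$

Then each face $\Gamma_i$ differs from the average face $\Psi$ by $\Phi_i$:

$$\Phi_i = \Gamma_i - \Psi, \qquad i = 1, \ldots, M$$

Now the covariance matrix of the training images can be constructed:

$$C = \frac{1}{M} \sum_{i=1}^{M} \Phi_i \Phi_i^T = \frac{1}{M} A A^T$$

where $A = [\Phi_1, \Phi_2, \ldots, \Phi_M]$.

The basis vectors of the face space, i.e., the eigenfaces, are then the orthogonal eigenvectors of the covariance matrix $C$.

Since the number of training images is usually less than the number of pixels in an image, there will be only M-1, instead of N, meaningful eigenvectors.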

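$x$ is an eigenvector of the matrix $A$ and $\lambda$ is the corresponding eigenvalue if $Ax = \lambda x$.

If $S$ is a nonsingular n×n matrix, then the matrix $B = SAS^{-1}$ has the same eigenvalues as $A$.

An n×n matrix has n eigenvalues (counted with multiplicity).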

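Therefore, the eigenfaces are computed by first finding the eigenvectors, $v_l$ ($l = 1, \ldots, M$), of the M by M matrix L:

$$L = A^T A$$

where the columns of $A$ are the difference face images $\Phi_k$. This works because $A^T A v_l = \lambda_l v_l$ implies $A A^T (A v_l) = \lambda_l (A v_l)$, so $A v_l$ is an eigenvector of $A A^T$ and hence of the covariance matrix $C$.

The eigenvectors, $u_l$, of the matrix $C$ are then expressed by a linear combination of the difference face images, $\Phi_k$, weighted by $v_{lk}$:

$$u_l = \sum_{k=1}^{M} v_{lk} \Phi_k, \qquad l = 1, \ldots, M$$

In practice, a smaller set of M' (M' < M) eigenfaces is sufficient for face identification. Hence, only M' significant eigenvectors of L, corresponding to the largest M' eigenvalues, are selected for the eigenface computation.

Thus further data compression can be obtained. M' is determined by a threshold, $\theta_\lambda$, of the ratio of the eigenvalue summation:

$$\frac{\sum_{i=1}^{M'} \lambda_i}{\sum_{i=1}^{M} \lambda_i} \ge \theta_\lambda$$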
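The computation can be sketched with NumPy, assuming the training faces are given as an M by N array with one flattened image per row. The function name `eigenfaces`, the parameter `energy` (playing the role of the eigenvalue-ratio threshold) and the random stand-in data are illustrative. The sketch normalizes each eigenface to unit length and drops numerically zero eigenvalues, which the slides do not discuss.

```python
import numpy as np

def eigenfaces(images, energy=0.95):
    """Return (average face, eigenface matrix U of shape N x M', kept eigenvalues)."""
    gamma = np.asarray(images, dtype=float)      # M x N, one face per row
    psi = gamma.mean(axis=0)                     # average face
    A = (gamma - psi).T                          # N x M, columns are difference faces
    L = A.T @ A                                  # small M x M matrix instead of N x N
    eigvals, V = np.linalg.eigh(L)               # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]            # reorder to descending
    eigvals, V = eigvals[order], V[:, order]
    keep = eigvals > 1e-10 * eigvals[0]          # centring leaves at most M-1 nonzero
    eigvals, V = eigvals[keep], V[:, keep]
    ratio = np.cumsum(eigvals) / eigvals.sum()
    m_prime = min(int(np.searchsorted(ratio, energy)) + 1, len(eigvals))
    U = A @ V[:, :m_prime]                       # u_l = sum_k v_lk * Phi_k
    U /= np.linalg.norm(U, axis=0)               # unit-length eigenfaces
    return psi, U, eigvals[:m_prime]

rng = np.random.default_rng(0)
faces = rng.random((6, 50))                      # 6 random stand-in "faces" of 50 pixels
psi, U, lam = eigenfaces(faces, energy=0.9)
print(U.shape, lam)
```

Working with the M by M matrix keeps the eigenproblem small when the number of pixels N is much larger than the number of training images M.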

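In the training stage, the face of each known individual, $\Gamma_k$, is projected into the face space and an M'-dimensional vector, $\Omega_k$, is obtained:

$$\Omega_k = U^T (\Gamma_k - \Psi), \qquad k = 1, \ldots, N_c$$

where $U = [u_1, \ldots, u_{M'}]$ holds the selected eigenfaces, $\Psi$ is the average face and $N_c$ is the number of face classes.

A distance threshold, $\theta_c$, that defines the maximum allowable distance from a face class as well as from the face space, is set up by computing half the largest distance between any two face classes:

$$\theta_c = \frac{1}{2} \max_{j,k} \left\| \Omega_j - \Omega_k \right\|, \qquad j, k = 1, \ldots, N_c$$

In the recognition stage, a new image, $\Gamma$, is projected into the face space to obtain a vector, $\Omega$:

$$\Omega = U^T (\Gamma - \Psi)$$

The distance of $\Omega$ to each face class is defined by

$$\epsilon_k = \left\| \Omega - \Omega_k \right\|, \qquad k = 1, \ldots, N_c$$

For the purpose of discriminating between face images and non-face like images, the distance, $\epsilon$, between the original image, $\Gamma$, and its reconstructed image from the eigenface space, $\Gamma_f$, is also computed:

$$\epsilon = \left\| \Gamma - \Gamma_f \right\|$$

where

$$\Gamma_f = U \Omega + \Psi$$

- These distances are compared with the threshold $\theta_c$ defined above and the input image is classified by the following rules:
- IF $\epsilon \ge \theta_c$ THEN input image is not a face image;
- IF $\epsilon < \theta_c$ AND $\epsilon_k \ge \theta_c$ for all $k$ THEN input image contains an unknown face;
- IF $\epsilon < \theta_c$ AND $\epsilon_{k^*} = \min_k \epsilon_k < \theta_c$ THEN input image contains the face of individual $k^*$.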
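The three rules can be sketched as follows, assuming the eigenfaces are the orthonormal columns of a matrix `U`, `psi` is the average face and `class_vectors` holds one projected vector per known individual. The exact placement of the strict and non-strict inequalities, the function names and the returned labels are assumptions made for illustration.

```python
import numpy as np

def class_threshold(class_vectors):
    """Half the largest distance between any two face-class vectors."""
    diffs = class_vectors[:, None, :] - class_vectors[None, :, :]
    return 0.5 * np.linalg.norm(diffs, axis=2).max()

def classify(gamma, psi, U, class_vectors, theta_c):
    """Apply the three decision rules to one input image vector."""
    omega = U.T @ (gamma - psi)                  # projection into face space
    gamma_f = U @ omega + psi                    # reconstruction from face space
    eps = np.linalg.norm(gamma - gamma_f)        # distance from face space
    if eps >= theta_c:
        return ("not_a_face", None)
    eps_k = np.linalg.norm(class_vectors - omega, axis=1)  # distance to each class
    k = int(np.argmin(eps_k))
    if eps_k[k] >= theta_c:
        return ("unknown_face", None)
    return ("known_face", k)

# Tiny hand-made example: 3 "pixels", 2 eigenfaces, 2 known individuals.
psi = np.zeros(3)
U = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
class_vectors = np.array([[0.0, 0.0], [4.0, 0.0]])
theta_c = class_threshold(class_vectors)
print(classify(np.array([0.5, 0.0, 0.0]), psi, U, class_vectors, theta_c))
```

In the toy example an image lying mostly outside the span of the two eigenfaces is rejected as a non-face, while an image halfway between the two class vectors is reported as an unknown face.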

The eigenface-based face recognition method was tested on the ORL face database. 150 images of 15 individuals were selected for the experiments.

In the training stage, three images of each individual were used as training samples, forming a training set totalling 45 images.

The average face of the training set

The first 15 eigenfaces corresponding to the 15 largest eigenvalues.

Recognition rate

The recognition rate depends on the training images: when single-view images are used for training, recognition is much worse.

Faces with calm expressions in the training stage and faces of the same individual but with various expressions in the testing stage

Training images and test images; the lower images are their projections in the face space.

The eigenfaces method treats images globally; no local information is used. Compression is done at the global level. The method requires a lot of computation, but the results are good. Explanation of the good results: images are represented as combinations of "simple" images and the system is trained on them.
