
Structural information




  1. Structural information

  2. Structural information • Structural information deals with the geometry of objects. We can handle only very limited amounts of structural information. How do we interpret structural information? We showed before that this is a difficult problem. We will introduce it via the SHAPE CONTEXT method.

  3. We now take a very difficult case. Handwriting is very hard: we recognize numbers easily even when they are heavily distorted. What are the algorithms that achieve this?

  4. We think that first the contour of the object is detected, as illustrated below.

  5. Next we think that the locations of points on the contour determine the geometry of the object • We therefore need to measure the location of EACH contour point RELATIVE to all other points. In other words, we need the vectors from a point to all other points. For example, for point Z we need all 6 red vectors. Having all vectors for all points describes the object completely, but is very complicated.

  6. So we now reduce the description by using an APPROXIMATE polar coordinate net. The center of the net is placed at each point in turn, and we only count HOW MANY other points fall in each area of the net.

  7. Shape histogram • The shape histogram of a contour point a_i is denoted by H_i; it is the vector obtained from the polar net by counting the number of points in each area: H_i = { h_i(k) = #(points in bin k) }, for k = 1, ..., K bins. For a contour with M points we obtain a list of M histograms. Two contours are similar if the sum of differences between their histograms is small; a sketch of the computation follows below.
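
A minimal Python sketch of this histogram computation. The exact bin layout (5 radial x 12 angular log-polar bins, with radii normalised by the mean pairwise distance) is an assumption borrowed from the usual shape-context setup; the slides only say "approximate polar coordinate net":

```python
import numpy as np

def shape_context(points, n_r=5, n_theta=12):
    """points: (M, 2) array of contour points -> (M, n_r*n_theta) histograms."""
    pts = np.asarray(points, dtype=float)
    M = len(pts)
    diff = pts[None, :, :] - pts[:, None, :]        # vectors from each point to all others
    r = np.hypot(diff[..., 0], diff[..., 1])
    theta = np.arctan2(diff[..., 1], diff[..., 0])  # angle in [-pi, pi]

    mean_r = r[r > 0].mean()                        # normalise radii by the mean distance
    r_edges = np.logspace(np.log10(0.125), np.log10(2.0), n_r + 1) * mean_r
    t_edges = np.linspace(-np.pi, np.pi, n_theta + 1)

    H = np.zeros((M, n_r * n_theta), dtype=int)
    for i in range(M):
        mask = np.arange(M) != i                    # exclude the point itself
        r_bin = np.digitize(r[i, mask], r_edges) - 1
        t_bin = np.clip(np.digitize(theta[i, mask], t_edges) - 1, 0, n_theta - 1)
        ok = (r_bin >= 0) & (r_bin < n_r)           # drop points outside the net
        np.add.at(H[i], r_bin[ok] * n_theta + t_bin[ok], 1)
    return H
```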

  8. Histogram differences • The difference H_i - H_j between two histograms measures how differently points i and j see the rest of the contour. Summing such differences over all contour points gives the difference between the contours. Two contours which are very similar will have a very small difference.
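
The slides only say "sum of differences"; one concrete choice, used in the original shape-context paper by Belongie, Malik and Puzicha, is the chi-squared statistic:

```python
import numpy as np

def hist_cost(h_i, h_j):
    """Chi-squared cost between two shape histograms."""
    h_i = np.asarray(h_i, dtype=float)
    h_j = np.asarray(h_j, dtype=float)
    denom = h_i + h_j
    ok = denom > 0                                  # skip bins empty in both histograms
    return 0.5 * np.sum((h_i[ok] - h_j[ok]) ** 2 / denom[ok])
```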

  9. Example: below we can see contours with marked points and examples of histograms for those points.

  10. Example: here we see handwritten numbers and the histograms of the marked contour points, shown as grey levels.

  11. Here we can see contours with points, and the polar net with its areas marked in different colours. What counts is the number of points in each area; this forms the histogram.

  12. Other methods - examples • There are hundreds of other methods for object retrieval and recognition. It is impossible to lecture about all of them, since they are based on different principles. To illustrate this, we can look at one of the best methods currently known: the method of eigenfaces, which uses a completely different principle.

  13. EIGENFACES – global method • Construction of the face space. Suppose a face image consists of N pixels, so it can be represented by a vector of dimension N. Let Γ_1, Γ_2, ..., Γ_M be the training set of face images. The average face of these M images is given by Ψ = (1/M) Σ_i Γ_i. Then each face Γ_i differs from the average face by Φ_i = Γ_i - Ψ.
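
A minimal numpy sketch of this construction; the variable names mirror the slide notation (Gamma for the stacked training images, Psi for the average face, Phi for the difference faces):

```python
import numpy as np

def difference_faces(images):
    """images: (M, N) array, one flattened face image of N pixels per row."""
    Gamma = np.asarray(images, dtype=float)
    Psi = Gamma.mean(axis=0)        # average face: (1/M) * sum of all Gamma_i
    Phi = Gamma - Psi               # each face minus the average face
    return Psi, Phi
```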

  14. EIGENFACES. Now the covariance matrix of the training images can be constructed: C = (1/M) Σ_n Φ_n Φ_n^T = A A^T, where A = [Φ_1, Φ_2, ..., Φ_M]. The basis vectors of the face space, i.e. the eigenfaces, are the orthonormal eigenvectors of the covariance matrix C. Since the number of training images is usually smaller than the number of pixels in an image, there are only M-1 meaningful eigenvectors instead of N.

  15. Eigenvalues, eigenvectors. x is an eigenvector of the matrix A, with eigenvalue λ, if A x = λ x. If S is a nonsingular n x n matrix, then the matrix B = S A S^-1 has the same eigenvalues as A. An n x n matrix has n eigenvalues.
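
A quick numerical check of the similarity-transform property, purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
A = A + A.T                              # symmetric, so the eigenvalues are real
S = rng.normal(size=(4, 4))              # a random S is almost surely nonsingular
B = S @ A @ np.linalg.inv(S)

# B = S A S^-1 has the same eigenvalues as A (up to numerical error)
eig_A = np.sort_complex(np.linalg.eigvals(A))
eig_B = np.sort_complex(np.linalg.eigvals(B))
print(np.allclose(eig_A, eig_B))         # True
```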

  16. EIGENFACES. Therefore the eigenfaces are computed by first finding the eigenvectors v_l, l = 1, ..., M, of the M by M matrix L = A^T A, whose entries are L_mn = Φ_m^T Φ_n. The eigenfaces u_l are then linear combinations of the difference face images, weighted by the components of v_l: u_l = Σ_k v_lk Φ_k. In practice a smaller set of M' (M' < M) eigenfaces is sufficient for face identification. Hence only the M' significant eigenvectors of L, corresponding to the M' largest eigenvalues, are selected for the eigenface computation; a sketch follows below.
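
The sketch below implements this trick: diagonalise the small M x M matrix L = A^T A and map its eigenvectors back to eigenfaces u_l = A v_l. Normalising each eigenface and discarding the near-zero eigenvalue are standard details assumed here, not spelled out on the slide:

```python
import numpy as np

def eigenfaces(Phi):
    """Phi: (M, N) difference faces as rows. Returns eigenvalues and eigenfaces."""
    A = Phi.T                              # N x M matrix whose columns are Phi_k
    L = A.T @ A                            # small M x M matrix instead of N x N
    lam, V = np.linalg.eigh(L)             # eigh, since L is symmetric
    order = np.argsort(lam)[::-1]          # sort by decreasing eigenvalue
    lam, V = lam[order], V[:, order]
    U = A @ V                              # u_l = sum_k v_lk * Phi_k
    U /= np.linalg.norm(U, axis=0)         # normalise each eigenface column
    return lam[:-1], U[:, :-1]             # only M-1 meaningful eigenvectors
```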

  17. Thus further data compression can be obtained. M' is determined by a threshold θ on the ratio of the eigenvalue sums: (λ_1 + ... + λ_M') / (λ_1 + ... + λ_M) >= θ. In the training stage the face of each known individual, Γ_k, is projected into the face space and an M'-dimensional vector Ω_k is obtained: Ω_k = U^T (Γ_k - Ψ), k = 1, ..., N_c, where N_c is the number of face classes.
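
A sketch of both steps, assuming `lam` and `U` come from the eigenface computation above; the default threshold value 0.95 is purely illustrative:

```python
import numpy as np

def choose_M_prime(lam, theta=0.95):
    """Smallest M' with (lam_1 + ... + lam_M') / (lam_1 + ... + lam_M) >= theta."""
    ratio = np.cumsum(lam) / np.sum(lam)
    return int(np.searchsorted(ratio, theta) + 1)

def project(U, Psi, image, M_prime):
    """Omega = U^T (Gamma - Psi), truncated to the first M' eigenfaces."""
    return U[:, :M_prime].T @ (image - Psi)
```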

  18. A distance threshold θ_d, which defines the maximum allowable distance from a face class as well as from the face space, is set to half the largest distance between any two face classes: θ_d = (1/2) max_{j,k} ||Ω_j - Ω_k||. In the recognition stage a new image Γ is projected into the face space to obtain its vector Ω = U^T (Γ - Ψ). The distance of Ω to each face class is defined by ε_k = ||Ω - Ω_k||.

  19. For the purpose of discriminating between face images and non-face images, the distance ε between the original image and its reconstruction from the eigenface space is also computed: ε = ||Φ - Φ_f||, where Φ = Γ - Ψ and Φ_f = Σ_i ω_i u_i. These distances are compared with the threshold θ_d defined above, and the input image is classified by the following rules: • IF ε >= θ_d THEN the input image is not a face image; • IF ε < θ_d AND min_k ε_k >= θ_d THEN the input image contains an unknown face; • IF ε < θ_d AND min_k ε_k < θ_d THEN the input image contains the face of individual k. A sketch of these rules follows below.
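
A sketch of these decision rules; the function and argument names are illustrative, and the single threshold theta_d is used for both tests as on the slides:

```python
import numpy as np

def classify(Omega, Phi, Phi_f, face_classes, theta_d):
    """Omega: projection of the new image; Phi: Gamma - Psi; Phi_f: its
    reconstruction from the eigenfaces; face_classes: stored vectors Omega_k."""
    eps = np.linalg.norm(Phi - Phi_f)          # distance from the face space
    if eps >= theta_d:
        return "not a face image"
    dists = [np.linalg.norm(Omega - Omega_k) for Omega_k in face_classes]
    k = int(np.argmin(dists))
    if dists[k] >= theta_d:
        return "unknown face"
    return f"individual {k}"                   # closest face class within threshold
```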

  20. Experimental results. The eigenface-based face recognition method was tested on the ORL face database. 150 images of 15 individuals were selected for the experiments.

  21. Experimental results. In the training stage, three images of each individual were used as training samples, giving a training set of 45 images in total. The figure shows the average face of the training set.

  22. Experimental results. The first 15 eigenfaces, corresponding to the 15 largest eigenvalues.

  23. Experimental results. The recognition rate depends on the training images: when single-view images are used for training, recognition is much worse.

  24. Experimental results. Faces with calm expressions were used in the training stage, and faces of the same individuals with various expressions in the testing stage. Training images (top), test images (bottom); the lower images are their projections into the face space.

  25. CONCLUSIONS. The eigenfaces method treats images globally; no local information is used. Compression is done at the global level. The method requires a lot of computation, but the results are good. Explanation of the good results: images are represented as combinations of "simple" images, and the system is trained on them.

  26. THERE ARE MANY OTHER METHODS FOR OBJECT RECOGNITION AND REPRESENTATION. THEY CAN BE CLASSIFIED AS:
  - STRUCTURAL DESCRIPTIONS (WE ALREADY MENTIONED CHAIN CODES)
  - TRANSFORM METHODS
  - TRAINING/LEARNING METHODS
  BUT THERE ARE ALSO METHODS BASED ON CLEVER TRICKS WHICH WORK VERY WELL... NEXT

  27. A TRANSFORM METHOD. HERE WE TRY TO TRANSFORM THE PICTURE (OR OBJECT INFORMATION) INTO SOME OTHER DOMAIN, TO GET THE INFORMATION IN A MORE CONVENIENT FORM.

  28. THE METHOD OF MOMENTS. MOMENTS OF ORDER p,q ARE DEFINED AS m_pq = Σ_x Σ_y x^p y^q f(x,y). FOR PHYSICAL OBJECTS THE FIRST-ORDER MOMENTS GIVE THE CENTER OF GRAVITY (xc = m_10/m_00, yc = m_01/m_00); AS A POINT OF THE OBJECT IT OF COURSE DOES NOT DEPEND ON HOW THE OBJECT IS LOCATED, SO MOMENTS COMPUTED RELATIVE TO IT ARE INVARIANT TO LOCATION.
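
A minimal numpy sketch of this definition for a grey-level image f(x, y):

```python
import numpy as np

def raw_moment(f, p, q):
    """m_pq = sum_x sum_y x^p y^q f(x, y) for a 2-D image f."""
    f = np.asarray(f, dtype=float)
    y, x = np.mgrid[0:f.shape[0], 0:f.shape[1]]
    return np.sum((x ** p) * (y ** q) * f)

# Centre of gravity from the zeroth and first-order moments:
# xc = raw_moment(f, 1, 0) / raw_moment(f, 0, 0)
# yc = raw_moment(f, 0, 1) / raw_moment(f, 0, 0)
```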

  29. CENTRAL MOMENTS: μ_pq = Σ_x Σ_y (x - xc)^p (y - yc)^q f(x,y), WHERE xc = m_10/m_00 AND yc = m_01/m_00.

  30. HIGHER-ORDER CENTRAL MOMENTS: μ_20, μ_11, μ_02, μ_30, μ_21, μ_12, μ_03, ... AND SO ON, FROM THE SAME FORMULA WITH p + q = 2, 3, ...

  31. NEXT, NORMALIZED CENTRAL MOMENTS ARE CREATED: η_pq = μ_pq / μ_00^γ WITH γ = (p+q)/2 + 1, AND FROM THEM THE INVARIANT MOMENTS, E.G. φ_1 = η_20 + η_02, φ_2 = (η_20 - η_02)^2 + 4 η_11^2, ... (SEVEN IN HU'S CLASSICAL SET). OTHER MOMENTS CAN BE DEFINED TOO.
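
OpenCV packages exactly this pipeline: cv2.moments returns the raw, central, and normalised central moments, and cv2.HuMoments computes the seven classical invariants. A small sketch on a toy binary object:

```python
import cv2
import numpy as np

img = np.zeros((100, 100), np.uint8)
cv2.rectangle(img, (20, 30), (60, 70), 255, -1)   # a toy filled object

m = cv2.moments(img, binaryImage=True)            # dict with m00, mu20, nu20, ...
hu = cv2.HuMoments(m).ravel()                     # the 7 invariant moments
print(hu)
```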

  32. THESE MOMENTS ARE INVARIANT TO TRANSLATION, ROTATION, AND SCALE CHANGE. THUS, ONCE THE MOMENTS ARE CALCULATED, THEY WILL NOT CHANGE WHEN THE OBJECT ROTATES OR CHANGES SIZE. THIS IS A VERY DESIRABLE FEATURE. HOWEVER, MOMENTS ARE SENSITIVE TO NOISE AND ILLUMINATION CHANGE.

  33. EXAMPLE: A ROTATED AND SCALED OBJECT. HERE THE MOMENTS CALCULATION IS SHOWN; PLEASE NOTE THAT FOR THE TRANSFORMED PICTURE THE MOMENTS REMAIN CONSTANT.

  34. PRACTICAL METHODS FOR DEALING WITH VISUAL OBJECTS: • THEY ARE BASED ON TRICKS, WITH THE RESULT THAT THEY WORK VERY WELL FOR A SPECIFIC PROBLEM BUT ARE NOT GENERAL. WE ILLUSTRATE THIS WITH THE EXAMPLE OF A PRACTICAL FACE TRACKING SYSTEM.

  35. WHAT IS FACE TRACKING? THERE IS A CAMERA IN FRONT OF THE PC AND SOFTWARE WHICH MARKS THE FACE LOCATION AND POSITION OF THE USER SITTING AT THE DISPLAY. HERE WE DESCRIBE A METHOD AND SYSTEM FOR FACE TRACKING WHICH IS QUITE SIMPLE, ROBUST, AND RUNS IN REAL TIME ON A PC! THE METHOD IS BASED ON FACE COLOR HISTOGRAM STATISTICS AND MOMENTS.

  36. HERE IS THE BLOCK DIAGRAM OF THE FACE TRACKING ALGORITHM. FIRST THE COLOR IMAGE IS CONVERTED TO HUE, SATURATION, INTENSITY. NEXT THE SKIN COLOR HISTOGRAM IS CALCULATED. FINALLY MOMENTS ARE CALCULATED AND THE WINDOW SIZE IS ADJUSTED ITERATIVELY.

  37. SKIN COLOR HISTOGRAM. COLOR = HUE IN THE HSI REPRESENTATION. PEOPLE HAVE THE SAME SKIN COLOR (HUE); ONLY THE SATURATION LEVELS DIFFER. HERE IS THE DISTRIBUTION OF PLACES CORRESPONDING TO FACE "COLOR".

  38. COLOR IS A GOOD FEATURE IF WE HAVE A COLOR CAMERA. HAVING THE FACE COLOR DISTRIBUTION WE CAN TREAT IT AS A TWO-DIMENSIONAL FUNCTION I(x,y) AND CALCULATE: FIRST WE SELECT A WINDOW OF A CERTAIN SIZE. NEXT WE CALCULATE THE ZEROTH AND FIRST MOMENTS IN THIS WINDOW: M00 = Σ_x Σ_y I(x,y), M10 = Σ_x Σ_y x I(x,y), M01 = Σ_x Σ_y y I(x,y).

  39. NEXT THE NEW CENTER OF THE WINDOW IS CALCULATED: xc = M10/M00, yc = M01/M00. AFTER ITERATING THIS CALCULATION THE ALGORITHM WILL CONVERGE TO A SPECIFIC POSITION.

  40. HOW IS THE WINDOW SIZE SELECTED? IT DEPENDS ON THE SIZE OF THE FACE, SO IT IS ADJUSTED ITERATIVELY. STARTING WITH SIZE 3, WE THEN SET THE WINDOW SIZE TO 2·sqrt(M00 / max pixel value). BY THIS, THE WINDOW POSITION AND SIZE ARE CONTINUOUSLY ADAPTED UNTIL THEY STABILIZE. THIS CAN THUS BE USED FOR FACE TRACKING; A SKETCH OF THE LOOP FOLLOWS BELOW.
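
A sketch of the whole loop on a skin-colour probability image P(x, y). The slide's size rule is interpreted as 2*sqrt(M00 / max pixel value), as in Bradski's CAMSHIFT paper; the stopping test and the iteration cap are assumptions:

```python
import numpy as np

def track_window(P, cx, cy, s=3.0, max_value=255.0, n_iter=20):
    """P: 2-D skin-probability image; (cx, cy): window centre; s: window width."""
    for _ in range(n_iter):
        h = s / 2.0
        x0, x1 = max(int(cx - h), 0), min(int(cx + h) + 1, P.shape[1])
        y0, y1 = max(int(cy - h), 0), min(int(cy + h) + 1, P.shape[0])
        win = P[y0:y1, x0:x1]
        y, x = np.mgrid[y0:y1, x0:x1]
        m00 = win.sum()                             # zeroth moment in the window
        if m00 == 0:
            break                                   # no skin pixels under the window
        new_cx = (x * win).sum() / m00              # M10 / M00
        new_cy = (y * win).sum() / m00              # M01 / M00
        s = 2.0 * np.sqrt(m00 / max_value)          # adapt the window size
        done = abs(new_cx - cx) < 0.5 and abs(new_cy - cy) < 0.5
        cx, cy = new_cx, new_cy
        if done:
            break                                   # centre has stabilised
    return cx, cy, s
```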

  41. THIS PROCESS IS ILLUSTRATED HERE: THE START IS FROM A SMALL WINDOW SIZE; THE SIZE IS ADJUSTED AND THE CENTER OF THE WINDOW IS MOVED UNTIL IT STABILIZES. HERE THE FACE HAS MOVED; IN THE NEXT PICTURE THE WINDOW WILL ALSO MOVE TO THE NEW POSITION.

  42. THIS ALGORITHM IS SURPRISINGLY ROBUST. NOISE DOES NOT HARM IT, AND AS WE CAN SEE IT IS ROBUST AGAINST DISTRACTORS: ANOTHER FACE ON THE LEFT, A HAND ON THE RIGHT.

  43. THE METHOD CAN ALSO BE USED FOR EVALUATION OF HEAD ROLL, WIDTH, AND LENGTH (THE FIGURE ILLUSTRATES THE ROLL ANGLE).

  44. PARAMETERS FOR THE HEAD POSITION CAN BE CALCULATED BASED ON THE SYMMETRY OF LENGTH L AND WIDTH W.
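
The slides do not give the formulas; one standard way (used in Bradski's CAMSHIFT paper) derives roll, length L, and width W from the second-order moments of the probability image inside the window:

```python
import numpy as np

def head_pose(P):
    """P: probability image cropped to the tracking window."""
    P = np.asarray(P, dtype=float)
    y, x = np.mgrid[0:P.shape[0], 0:P.shape[1]]
    m00 = P.sum()
    xc, yc = (x * P).sum() / m00, (y * P).sum() / m00
    a = (x * x * P).sum() / m00 - xc ** 2          # central second moments
    b = 2.0 * ((x * y * P).sum() / m00 - xc * yc)
    c = (y * y * P).sum() / m00 - yc ** 2
    roll = 0.5 * np.arctan2(b, a - c)              # head roll angle
    root = np.sqrt(b ** 2 + (a - c) ** 2)
    L = np.sqrt((a + c + root) / 2.0)              # major axis (length)
    W = np.sqrt((a + c - root) / 2.0)              # minor axis (width)
    return roll, L, W
```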

  45. THIS SYSTEM CAN BE USED FOR FACE TRACKING, E.G. AS AN INTERFACE TO COMPUTER GAMES.

  46. ANOTHER EXAMPLE: AMBULATORY VIDEO - A COMPUTER WITH A CAMERA WEARABLE BY THE USER.

  47. THE GOAL IS TO BUILD A COMPUTER WHICH WILL KNOW WHERE THE USER IS. The user wears a small camera attached e.g. to the head. The camera produces circular pictures which are not very good, but good enough.

  48. HOW TO RECOGNIZE WHERE THE USER IS? (E.G. ROOM, STREET) FIRST, SPLIT THE VIDEO INTO LIGHT INTENSITY I AND CHROMINANCES IN A VERY APPROXIMATE WAY: I = R + G + B, Cr = R/I, Cg = G/I. SECOND, SEGMENT THE PICTURE INTO REGIONS AND CALCULATE PARAMETERS FOR EACH: MEAN AND COVARIANCE. A SKETCH FOLLOWS BELOW.
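
A sketch of both steps; the segmentation itself is assumed to be given as a boolean region mask, and grouping the per-pixel features as (I, Cr, Cg) is an assumption consistent with the slide:

```python
import numpy as np

def chromaticity(rgb):
    """rgb: (H, W, 3) float image. Returns I, Cr, Cg as defined on the slide."""
    R, G, B = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    I = R + G + B
    I_safe = np.where(I > 0, I, 1.0)               # avoid division by zero
    return I, R / I_safe, G / I_safe

def region_stats(I, Cr, Cg, mask):
    """Mean and covariance of (I, Cr, Cg) over one segmented region."""
    feats = np.stack([I[mask], Cr[mask], Cg[mask]])  # 3 x n_pixels
    return feats.mean(axis=1), np.cov(feats)
```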

  49. FOR EACH ENVIRONMENT THERE WILL BE DIFFERENT STATISTICAL DISTRIBUTIONS OF THE SIGNALS; WE CAN USE THEM TO FIND OUT TO WHICH CLASS A RECORDED VIDEO BELONGS.

  50. FOR 2 HOURS OF RECORDING THE RESULTS ARE VERY GOOD:

  Label     Correlation Coeff.
  Office    0.9124
  Lobby     0.7914
  Bedroom   0.8620
  Cashier   0.8325
