
Presentation Transcript


  1. Object Tracking And SHOSLIF Tree Based Classification Using Shape And Color Features. Authors: Lucio Marcenaro, Franco Oberti and Carlo S. Regazzoni. CIS – 750. Advisor – Longin Jan Latecki. Presented by – Venugopal Rajagopal.

  2. Introduction
  • Main functionalities of a video-surveillance system: detection and tracking of objects acting within the guarded environment.
  • Higher modules of the system are responsible for object and event classification.
  • Shape, color, and motion are the features most frequently used to achieve these tasks.
  • Here, shape- and color-related features are used for tracking and recognizing objects:
  Shape – classifies among different postures and provides a finer discriminant feature, allowing objects within the same general class to be distinguished.
  Histogram – basis for classification between different objects.

  3. Introduction (contd.) • A novel approach for tracking and recognition. • Corner groups and object histograms are used as basis features for a multilevel shape representation. • The models used in the tracking and recognition phases are based on the Generalized Hough Transform and on SHOSLIF trees.

  4. System Architecture. Figure: block diagram of the surveillance system modules. The sensor (camera) feeds low-level IP (ROI, blobs), which feeds high-level IP (corners, histograms), which feeds classification (corner-based representation, short-term memory, SHOSLIF tree).

  5. Low Level Image Processing
  • Performs the first stage of abstraction from the sequence acquired from the sensor to the representation used for tracking and classification.
  • From the acquired frame, mobile areas of the image (blobs) are detected by a frame-background difference and analyzed by extracting numerical characteristics (e.g., geometrical and shape properties).
  • Blob analysis is performed by the following modules (see the sketch below):
  Change detection: by using statistical morphological operators, it identifies the mobile blobs in the image that exhibit a remarkable difference with respect to the background.
  Focus of attention: the minimum bounding rectangle (MBR) of each blob in the image is detected using a fast image segmentation algorithm.
  • The history of detected ROIs and blobs is maintained as a temporal graph, which is used for further processing by the higher-level modules.
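  A minimal sketch of this stage, assuming OpenCV is available; the difference threshold, the plain morphological opening (a stand-in for the paper's statistical morphological operators), and the minimum-area filter are illustrative choices, not the authors' values:

```python
import cv2
import numpy as np

def detect_blobs(frame, background, diff_thresh=30, min_area=100):
    """Frame-background differencing followed by blob extraction.
    Returns the minimum bounding rectangles (MBRs) of the mobile blobs."""
    # Change detection: pixels that differ strongly from the background
    diff = cv2.absdiff(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY),
                       cv2.cvtColor(background, cv2.COLOR_BGR2GRAY))
    _, mask = cv2.threshold(diff, diff_thresh, 255, cv2.THRESH_BINARY)
    # Morphological clean-up of the change-detection mask
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((3, 3), np.uint8))
    # Focus of attention: MBR of each connected component (blob)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) >= min_area]
```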

  6. Detecting BLOBS and MBR

  7. Temporal Graph
  • The temporal graph provides information on the current bounding boxes and their relations to the boxes detected in the previous frames.
  • The nodes at each level are the blobs detected in the corresponding frame.
  • Relationships among blobs belonging to adjacent levels are represented as arcs between the nodes.
  • Arcs are inserted on the basis of the superposition of the blob areas on the image plane: if a blob at step (k-1) overlaps a blob at step k, a link between them is created, and the blob at step (k-1) is called the "father" of the blob at step k (its "son").

  8. Temporal Graph (contd.)
  • Different events can occur (see the sketch below):
  1) If a blob has only one "father", its type is set to "one-overlapping" (type o), and the father's label is assigned to it.
  2) If a blob has more than one "father", its type is set to "merge" (type m), and a new label is assigned.
  3) If a blob is not the only "son" of its father, its type is set to "split" (type s), and a new label is assigned.
  4) If a blob has no "father", its type is set to "new" (type n), and a new label is assigned.
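  A sketch of this labeling rule, assuming the overlap links have already been computed; `fathers_of` and `sons_of` are hypothetical names for the two directions of the arc relation:

```python
def classify_blobs(fathers_of, sons_of):
    """Assign a temporal-graph type to each current-frame blob.
    fathers_of[i]: previous-frame blobs overlapping current blob i.
    sons_of[j]: current-frame blobs overlapping previous blob j."""
    types = {}
    for i, fathers in fathers_of.items():
        if not fathers:
            types[i] = "n"  # new: no father, new label assigned
        elif len(fathers) > 1:
            types[i] = "m"  # merge: more than one father, new label
        elif len(sons_of[fathers[0]]) > 1:
            types[i] = "s"  # split: not the only son, new label
        else:
            types[i] = "o"  # one-overlapping: inherits father's label
    return types
```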

  9. Temporal Graph (contd.) A sequence of images showing critical cases of blob splitting, merging and displacement. Each image contains the detected blobs with their numerical label and type.

  10. Temporal Graph (contd.) • Figure showing the bounding boxes and the temporal graph representing the correspondences between them.

  11. High Level Image Processing (Corner Extraction)
  • High-level image processing extracts high-curvature points (corners) and histograms from each detected object.
  • General procedure to extract corners (see the sketch below):
  • The gradient of the input gray-level image is computed using the Sobel operator.
  • Edges are extracted using the gradient magnitude: a pixel is considered to be an edge point if its gradient magnitude is greater than a fixed threshold.
  • If a large variation in the direction of the gradient is found in a neighborhood of edge points, a corner is detected.
  • In detail: edges are extracted first using the Sobel filter; the maximum variation of the gradient direction of the edge points inside a square kernel is evaluated; if the maximum variation is greater than a threshold, the pixel at the center of the kernel is selected as a corner and its gradient direction is taken as the corner direction.
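  A sketch of this procedure; the edge threshold, the angle-variation threshold, and the kernel size are illustrative values, and angle wrap-around is ignored for simplicity:

```python
import cv2
import numpy as np

def extract_corners(gray, edge_thresh=100.0, angle_var=np.deg2rad(60), k=5):
    """Corner extraction following the slide's procedure:
    Sobel gradient, edge thresholding, then maximum gradient-direction
    variation inside a k x k kernel around each edge pixel."""
    gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0)      # Sobel gradient
    gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1)
    mag, ang = np.hypot(gx, gy), np.arctan2(gy, gx)
    edges = mag > edge_thresh                   # fixed-threshold edge map
    corners, r = [], k // 2
    for y in range(r, gray.shape[0] - r):
        for x in range(r, gray.shape[1] - r):
            if not edges[y, x]:
                continue
            win_ang = ang[y - r:y + r + 1, x - r:x + r + 1]
            win_edge = edges[y - r:y + r + 1, x - r:x + r + 1]
            # Maximum variation of gradient direction among the edge
            # points inside the square kernel
            if win_edge.sum() > 1 and np.ptp(win_ang[win_edge]) > angle_var:
                corners.append((x, y, ang[y, x]))  # direction = corner direction
    return corners
```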

  12. Corner Extraction (contd.) • This figure shows the corner extraction steps: original image, edge image, extracted corners.

  13. Tracking and Recognition Modules • The system uses a short-term memory associated with the tracking process and a long-term memory associated with the recognition process. • This module performs tasks in two working modalities: learning and matching. • The tracking module enters the learning modality whenever the object is not overlapped, in order to update the short-term object model. • The recognition module builds up a self-organizing tree during the learning modality.

  14. Tracking and Recognition Modules (contd.) • Recognition module (learning phase): a set of human-classified samples is presented to the tree, which automatically organizes them so as to maximize the inter-class distances while minimizing the intra-class variances. • Recognition module (matching phase): the SHOSLIF tree is used for object classification; each object detected by the lower levels of the system is presented to the classification tree, which outputs the estimated class for that object and the nearest training sample.

  15. Generalized Hough Transform (GHT) • A technique used to find arbitrary curves in an image without requiring a parametric equation for them. • A look-up table, called the R-table, is used to model the template shape of the object. • This R-table is used as the transform mechanism. • To build the R-table, a reference point and several feature points of the shape are first selected.

  16. GHT (contd.) Given a shape we wish to localize, the first stage is to build a look-up table, known as the R-table, which replaces the need for a parametric equation in the transform stage.

  17. GHT (contd.) For each feature point, the orientation "omega" of the tangential line at that point, the length "r", and the orientation "beta" of the radial vector joining the reference point and the feature point can be calculated.

  18. GHT (contd.) • If "n" is the number of feature points, an indexed table of size n x 2 can be created using all "n" pairs (r, beta), with "omega" as the index. • This table is the model of the shape, and it can be used with a transformation to find occurrences of the same object in other images. • The shape is localized using a voting technique.

  19. GHT (contd.) • Given an unknown image, each edge point is extracted and its orientation "omega" is calculated. Using "omega" as an index into the R-table, each (r, beta) pair stored at that location is retrieved.

  20. GHT (contd.) • Using each pair (r, beta), the possible position of the reference point is computed and the accumulator at that position is incremented; with high probability, the maximum accumulator value will occur at the actual reference point (see the sketch below).
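  A minimal sketch of the classical R-table construction and voting, assuming feature points come as (x, y, omega) tuples; the quantization into 36 orientation bins and the absence of rotation/scale handling are simplifying assumptions:

```python
import numpy as np
from collections import defaultdict

def quantize(omega, n_bins=36):
    """Quantize an orientation in radians into an R-table index."""
    return int(omega % (2 * np.pi) / (2 * np.pi) * n_bins) % n_bins

def build_r_table(feature_points, ref_point):
    """R-table: for each feature point, store (r, beta), the polar vector
    from the point to the reference point, indexed by omega."""
    table = defaultdict(list)
    rx, ry = ref_point
    for x, y, omega in feature_points:
        dx, dy = rx - x, ry - y
        table[quantize(omega)].append((np.hypot(dx, dy), np.arctan2(dy, dx)))
    return table

def localize(edge_points, table, image_shape):
    """Voting: each edge point votes for candidate reference positions;
    the accumulator maximum localizes the shape."""
    acc = np.zeros(image_shape, dtype=np.int32)
    h, w = image_shape
    for x, y, omega in edge_points:
        for r, beta in table[quantize(omega)]:
            xc = int(round(x + r * np.cos(beta)))
            yc = int(round(y + r * np.sin(beta)))
            if 0 <= yc < h and 0 <= xc < w:
                acc[yc, xc] += 1
    return np.unravel_index(np.argmax(acc), acc.shape)  # (row, col) of peak
```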

  21. Modified GHT • In this approach, the GHT is modified to automatically extract the model of the object (R-table) and to locate the position of the object (voting). • The corners extracted from the object are used as feature points, and a different parameterization is used: instead of pairs (r, beta), pairs (dx, dy) are used, where dx and dy are the differences in "x" and "y" with respect to the reference point. • Instead of a 2 x N indexed table, a 3 x N table is used.

  22. Modified GHT (contd.) • The first value is the direction "omega" of the gradient vector at the corner position in the original image. The obtained triplet (omega, dx, dy) models the position and orientation of the corner with respect to the reference point. In this approach, a corner does not vote for all of the "n" possible entries, but only for the ones whose "omega" is similar to the one obtained, thus minimizing computational time and memory requirements.

  23. Corner Based Tracker • The output of the low-level image processing stage (MBRs and the correspondence graphs) is used as the input of the tracking stage, in order to keep detecting the objects that were present in isolated boxes once they merge to form a group. • The model-learning phase is applied to the isolated rectangles in the 2 or 3 frames before the union takes place. • When the boxes are merged, the matching phase is used to find the position of each object inside the merged rectangle.

  24. Corner Based Tracker (Learning Phase) • The input is the gray-level image of the desired object. The center of the gray-level image is selected as the reference point. • A gradient operator (Sobel) is applied to extract edges, and for every edge point the direction of the gradient is calculated. Then the corners are extracted. • For each corner, "dx" and "dy" are calculated and stored in the R-table, which represents the obtained model of the object. • For robustness, this procedure is applied to different images of the object (frames of a sequence), and a unique R-table is constructed by selecting the corners that are present in most of the images at the same location and with the same orientation.

  25. Corner Based Tracker (Matching Phase) • The inputs are the R-table of the searched object and the gray-level image in which the object should be present. • As in the learning phase, the gradient operator is applied to the input image and corners are extracted. • For every extracted corner, "omega" is computed; if it is present in the R-table, the possible position of the reference point is calculated using (dx, dy) and its accumulator is incremented. As in the GHT, the reference point is found at the maximum accumulator value (see the sketch below).
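  A sketch of the modified parameterization, reusing the imports and the quantize helper from the previous sketch; the (omega, dx, dy) triplets and the omega-filtered voting follow the slides, while the bin width remains an illustrative choice:

```python
def learn_model(corners, ref_point):
    """Learning phase: store (dx, dy) offsets to the reference point,
    indexed by the corner's quantized gradient direction omega."""
    table = defaultdict(list)
    rx, ry = ref_point
    for x, y, omega in corners:
        table[quantize(omega)].append((rx - x, ry - y))
    return table

def match(corners, table, image_shape):
    """Matching phase: only corners whose omega appears in the R-table
    vote, so far fewer accumulator cells are touched than in plain GHT."""
    acc = np.zeros(image_shape, dtype=np.int32)
    h, w = image_shape
    for x, y, omega in corners:
        for dx, dy in table.get(quantize(omega), []):
            xc, yc = int(round(x + dx)), int(round(y + dy))
            if 0 <= yc < h and 0 <= xc < w:
                acc[yc, xc] += 1
    return np.unravel_index(np.argmax(acc), acc.shape)
```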

  26. Object Classification • The long-term recognition module uses the corner representation and histogram features extracted by the image processing modules (previous steps) as a basis for object classification. • SHOSLIF (Self-Organizing Hierarchical Optimal Subspace Learning and Inference Framework) is the tool used for object classification. • The input to the SHOSLIF is a set of labeled patterns X = {(xn, wn): n = 1..N}, i.e. the training set, where "xn" is a vector of dimensionality K representing the observed sample and "wn" is the class associated with "xn", chosen from a set of "C" classes. • The SHOSLIF algorithm produces as output a tree whose nodes contain decreasing sets of samples, with the root node containing all samples in X.

  27. SHOSLIF • Uses the theory of optimal linear projection to generate a space defined by the training images. • This space is generated using two projections: a Karhunen-Loeve projection to produce a set of Most Expressive Features (MEFs), followed by a discriminant analysis projection to produce a set of Most Discriminating Features (MDFs). • The system builds a network that tessellates these MEF/MDF spaces for recognizing objects from images.

  28. SHOSLIF (contd.) Fig: Tree example. a) Sample partitioning in the feature space. b) Tree structure.

  29. SHOSLIF (contd.) Most Expressive Features (MEF) • Each input sub-image is treated as a high-dimensional feature vector by concatenating the rows of the sub-image. • Principal component analysis (PCA) is performed on the set of training images. • PCA uses the eigenvectors of the sample scatter matrix associated with the largest eigenvalues. These vectors point in the directions of the major variations in the samples and can therefore be used as a basis set with which to describe the image samples. Using these eigenvectors, the image can be reconstructed close to the original. • Since the features produced by this projection give the minimum square error when approximating an image and show good performance in image reconstruction, they are called the Most Expressive Features.

  30. SHOSLIF (contd.) Most Discriminating Features (MDF) • The features produced by the MEF projection are not good for discriminating among the classes defined by the set of samples (e.g., they fail when the same image appears with two different light intensities). • Therefore, linear discriminant analysis (LDA) is performed on the features obtained from the MEF projection. • In LDA the between-class scatter is maximized while the within-class scatter is minimized. • The features obtained from LDA optimally discriminate among the classes represented in the training set; for this reason they are called the Most Discriminating Features (see the sketch below).
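  A minimal sketch of the two projections using scikit-learn, assuming the training images are equally sized gray-level arrays; the number of MEF components is an illustrative choice, and the paper's own per-node computation of the projection matrices may differ:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def mef_mdf_features(images, labels, n_mef=30):
    """Karhunen-Loeve (PCA) projection for the Most Expressive Features,
    then LDA on the MEF vectors for the Most Discriminating Features."""
    # Each sub-image becomes one row vector by concatenating its rows
    X = np.array([img.ravel() for img in images], dtype=np.float64)
    pca = PCA(n_components=n_mef)
    mef = pca.fit_transform(X)            # MEF space (projection matrix V)
    lda = LinearDiscriminantAnalysis()
    mdf = lda.fit_transform(mef, labels)  # MDF space (projection matrix W)
    return mef, mdf, pca, lda
```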

  31. SHOSLIF (contd.) Tree Construction • Each level of the tree has an expected radius r(l) for the space it covers, where "l" is the level. • d(X, A) is the distance measure between a node "N" with center vector "A" and a sample vector "X". • The root node contains all the images from the training set. • Every node that contains more than a single training image computes a projection matrix "V", which projects the samples into the MEF space. • If the training samples contained in a node are drawn from multiple classes (indicated by the labels), the MEF vectors are used to compute a projection matrix "W", which projects the MEF features into the MDF space.

  32. SHOSLIF (contd.) Tree Construction (contd.) • If the training samples in a node are all from a single class, the node is left as it is. • Each node contains the feature vectors that fall within the radius covered by one of its children. • To add a training sample "X" to a node "N" at level "l", a check is first made as to whether the feature vector of "X" falls within the radius covered by one of the children of "N": if so, "X" is added as a descendant of that child; if the feature vector of "X" is outside every child's radius, "X" is added as a new child of "N" (see the sketch below).
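  A minimal sketch of this insertion rule operating directly on feature vectors; the Euclidean distance, the geometric radius schedule r(l) = r0 * alpha^l, and the omission of the per-node MEF/MDF projections are all simplifying assumptions:

```python
import numpy as np

class Node:
    """SHOSLIF tree node sketch: center vector, level, children."""
    def __init__(self, center, level):
        self.center = center
        self.level = level
        self.children = []

def radius(level, r0=1.0, alpha=0.5):
    """Expected radius r(l), shrinking with depth; r0, alpha illustrative."""
    return r0 * (alpha ** level)

def insert(node, x):
    """Insert feature vector x: descend into a child whose radius covers
    x; otherwise attach x as a new child of the current node."""
    for child in node.children:
        if np.linalg.norm(x - child.center) <= radius(child.level):
            insert(child, x)
            return
    node.children.append(Node(x, node.level + 1))
```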

  33. SHOSLIF (contd.) Image Retrieval. Figure: general flow of each SHOSLIF processing element.

  34. Object Classification (contd.) • The SHOSLIF setup is used to organize the corners extracted from the blobs associated during a learning phase. • A training set "X" is represented by a set of (corners, class) pairs. • One problem is that the dimension of the input vectors is fixed in a SHOSLIF tree. • Feature selection is therefore performed by partitioning the corner set C(t) into "M" regions, where "M" is the desired cardinality of the pattern "x" to be given to the SHOSLIF.

  35. Object Classification (contd.) X* Corner partitioning process example: a) first division along x-axis b) second division along y-axis c) third division along x – axis d) final areas with M = 16 12 6 6 25 25 X* 13 X* (b) (c) (d) (a)

  36. Object Classification (contd.) • The corner set is partitioned by iteratively splitting the blob into two areas, each containing the same number of corners. • For each region, the vector-median corner of the surviving local population is chosen as the representative sample (see the sketch below). • The next figure shows the surviving corners as two sets of connected points: a) external closed lines connect the median corner points in the outer regions; b) internal lines connect the corner points in the inner regions.
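  A sketch of the iterative equal-population splits, assuming M is a power of two and that the split axis simply alternates between x and y as in the figure; taking the coordinate-wise median as the "vector median" is a simplification:

```python
import numpy as np

def representative_corners(corners, m=16):
    """Split the corner set into M equally populated regions by
    alternating median splits, then keep one median corner per region."""
    regions, axis = [np.asarray(corners, dtype=float)], 0
    while len(regions) < m:
        next_regions = []
        for pts in regions:
            order = np.argsort(pts[:, axis])
            half = len(pts) // 2  # same number of corners on each side
            next_regions += [pts[order[:half]], pts[order[half:]]]
        regions, axis = next_regions, axis ^ 1  # alternate x / y axis
    # Representative sample: median corner of each surviving region
    return np.array([np.median(pts, axis=0) for pts in regions if len(pts)])
```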

  37. Object Classification (contd.) Examples of surviving corners.

  38. Object Classification (contd.) • In this way, the vector "xn" is computed for each sample and a class label is associated with it; this labeled pattern is given as input to the SHOSLIF tree.

  39. Results • Training set: 328 samples distributed over the classes. • Test set: 30 samples. • The misdetection probability over the test set was 15%. • A second test was done using histograms for object identification. • The misdetection probability over the test set was then 8%.

  40. Results (contd.) The example figure shows the probe image on the left-hand side and the retrieved image on the right-hand side.

  41. Conclusion • A method for tracking and classifying objects in a video-surveillance system has been presented. • A corner-based shape model is used for tracking and for recognizing an object. • Classification is performed using SHOSLIF trees. • The computed misdetection probabilities confirm the validity of the proposed approach.

  42. References • A. Tesei, A. Teschioni, C.S. Regazzoni and G. Vernazza, "Long Memory Matching of Interacting Complex Objects from Real Image Sequences." • F. Oberti and C.S. Regazzoni, "Real-Time Robust Detection of Moving Objects in Cluttered Scenes." • D.L. Swets and J. Weng, "Hierarchical Discriminant Analysis for Image Retrieval."
