Visual Object Tracking Based on Local Steering Kernels and Color Histograms
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 23, NO. 5, MAY 2013
Olga Zoidi, Anastasios Tefas, Member, IEEE, Ioannis Pitas, Fellow, IEEE
Overview: Introduction; Proposed method
Visual tracking is difficult to accomplish for several reasons.
Model-based methods: exploit a priori information about the object shape to create a model.
Handle object tracking under illumination variations, changes in viewing angle, and partial occlusion.
D. Koller et al., “Model-based object tracking in monocular image sequences of road traffic scenes,” Int. J. Comput. Vision, vol. 10, pp. 257–281, Mar. 1993.
Appearance-based methods: use the visual information of the object's projection on the image plane, i.e., color, texture, and shape.
Handle simple object transformations.
Sensitive to illumination changes.
Contour-based methods: track the object by employing shape-matching or contour-evolution techniques. The contour can be represented by active models, such as snakes or B-splines.
Handle both rigid and nonrigid objects.
Can be combined with occlusion detection and estimation techniques.
 A. Yilmaz, X.Li, and M. Shah, “Contour-based object tracking with occlusion handling in video acquired using mobile cameras”, IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 11, pp. 1531-1536, Nov. 2004
Y. Wang and O. Lee, “Active mesh – a feature seeking and tracking image sequence representation scheme,” IEEE Trans. Image Process., vol. 3, no. 5, pp. 610–624, Sep. 1994.
Feature-based methods: track a set of feature points, which are then grouped into objects.
The main problem is correctly distinguishing target-object features from background features.
The position of the object in the following frame is usually predicted using a linear Kalman filter.
 G. Welch and G. Bishop, “An introduction to the Kalman filter,” Univ. North Carolina, Chapel Hill, NC, Tech. Rep. TR95041, 2000.
Key ideas of the proposed tracking algorithm
The proposed tracking approach is an appearance-based method that uses both color histograms (CHs) and the local steering kernel (LSK) descriptor.
First, image regions in the video frame that have high color similarity to the object CH are identified as candidate regions.
Next, LSK descriptors of both the target object and the candidate search regions are extracted.
After the image regions with small CH similarity to the object CH are discarded, the new object position is selected as the image region whose LSK representation has the maximum similarity to that of the target object.
As tracking evolves, the appearance of the target object changes, so different instances of the object are kept in a stack. The stack is updated with the representation of the most recently detected object.
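The stack of object instances behaves like a bounded buffer of recent appearances. The sketch below is a minimal illustration under stated assumptions: the class name `InstanceStack`, the capacity, and the drop-oldest policy when the stack is full are all hypothetical, since the slides state only that the most recent detection is pushed.

```python
from collections import deque
from typing import Deque, Generic, List, TypeVar

T = TypeVar("T")


class InstanceStack(Generic[T]):
    """Hypothetical bounded stack of recent object representations.

    The newest instance goes on top; when the stack is full, the oldest
    instance is dropped from the bottom (assumed eviction policy).
    """

    def __init__(self, max_size: int = 5) -> None:
        # max_size is illustrative, not a value from the slides
        self._items: Deque[T] = deque(maxlen=max_size)

    def push(self, representation: T) -> None:
        # a deque with maxlen silently discards its leftmost (oldest) item
        self._items.append(representation)

    def top(self) -> T:
        return self._items[-1]

    def instances(self) -> List[T]:
        # ordered oldest first, newest last
        return list(self._items)

    def __len__(self) -> int:
        return len(self._items)
```

For example, pushing five detections into `InstanceStack(max_size=3)` keeps only the last three. In the tracker, each stored item would be the object's CH and LSK representation.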
After object position prediction and search region selection, the search region of size $R_1 \times R_2$ is divided into candidate ROIs of size $Q_1 \times Q_2$.
The parameter $d$ determines a uniform sampling of the candidate object ROIs every $d$ pixels in the search region.
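The sampling of candidate ROIs can be sketched as below. The helper name `candidate_roi_corners` is hypothetical, and corners are (row, column) offsets of each ROI's top-left pixel inside the search region. The number of sampled rows and columns also gives the size of the per-patch similarity matrices used later.

```python
from typing import List, Tuple


def candidate_roi_corners(R1: int, R2: int, Q1: int, Q2: int, d: int) -> List[Tuple[int, int]]:
    """Top-left (row, col) corners of Q1 x Q2 candidate ROIs in an R1 x R2
    search region, sampled uniformly every d pixels."""
    if Q1 > R1 or Q2 > R2:
        raise ValueError("candidate ROI must fit inside the search region")
    if d < 1:
        raise ValueError("sampling step d must be at least 1")
    rows = range(0, R1 - Q1 + 1, d)  # last row offset that still fits is R1 - Q1
    cols = range(0, R2 - Q2 + 1, d)
    return [(r, c) for r in rows for c in cols]
```

With `R1 = R2 = 10`, `Q1 = Q2 = 4`, and `d = 3`, the row and column offsets are 0, 3, and 6, giving a 3 × 3 grid of nine candidates. A larger `d` trades localization precision for fewer candidates to score.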
The CH similarities $S$ of all patches, computed over the three color channels, form a matrix $\mathbf{M}_{CH}$.
The distribution of the values of $\mathbf{M}_{CH}$ is used to set a threshold for deciding whether a patch is a valid candidate ROI; the threshold is defined from the mean, maximal, and minimal values of the entries of $\mathbf{M}_{CH}$.
Finally, a binary matrix $\mathbf{B}_{CH}$ is formed, whose entry is set to 1 if the corresponding entry of $\mathbf{M}_{CH}$ is ≥ the threshold, and 0 otherwise. $\mathbf{B}_{CH}$ is used in the object localization step.
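The color-similarity step can be sketched with NumPy. This sketch rests on three assumptions. Each CH is the concatenation of three per-channel histograms, with a hypothetical 8 bins per channel. $S$ is the cosine similarity between histograms. Because the exact threshold rule built from the mean, maximum, and minimum of $\mathbf{M}_{CH}$ is not given here, the code uses the mean as a clearly labeled placeholder. The function names are hypothetical.

```python
from typing import Optional, Tuple

import numpy as np


def color_histogram(patch: np.ndarray, bins: int = 8) -> np.ndarray:
    """Concatenated per-channel histograms of an H x W x 3 uint8 patch (assumed CH layout)."""
    hists = [np.histogram(patch[..., ch], bins=bins, range=(0, 256))[0] for ch in range(3)]
    return np.concatenate(hists).astype(float)


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    return float(a @ b / (na * nb)) if na > 0 and nb > 0 else 0.0


def ch_similarity(search_region: np.ndarray, object_patch: np.ndarray, d: int,
                  tau: Optional[float] = None) -> Tuple[np.ndarray, np.ndarray]:
    """Return (M_CH, B_CH) over candidate ROIs sampled every d pixels."""
    Q1, Q2 = object_patch.shape[:2]
    R1, R2 = search_region.shape[:2]
    h_obj = color_histogram(object_patch)
    rows = range(0, R1 - Q1 + 1, d)
    cols = range(0, R2 - Q2 + 1, d)
    M_ch = np.array([[cosine_similarity(color_histogram(search_region[r:r + Q1, c:c + Q2]), h_obj)
                      for c in cols] for r in rows])
    if tau is None:
        # PLACEHOLDER threshold: the mean of M_CH. The slides derive the threshold
        # from the mean, maximum, and minimum of M_CH; that rule is not reproduced.
        tau = float(M_ch.mean())
    B_ch = (M_ch >= tau).astype(np.uint8)  # 1 = valid candidate ROI, 0 = discarded
    return M_ch, B_ch
```

Passing an explicit `tau` lets the actual threshold rule be plugged in without changing the rest of the code.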
Introduction to LSK descriptors
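LSK descriptors are a nonlinear combination of weighted spatial distances between a pixel $\mathbf{p}$ of an image of size $N_1 \times N_2$ and its surrounding $M \times M$ pixels $\mathbf{p}_i$ ($M = 3$ in this paper).
The distance $K$ is measured using a weighted Euclidean distance that uses the covariance matrix $\mathbf{C}_i$ of the image gradients as weights.
To obtain the matrix $\mathbf{C}_i$ used in $K_i(\mathbf{p})$, the gradient vectors $\mathbf{g}$ in the $M \times M$ window around $\mathbf{p}_i$ are stacked into a matrix $\mathbf{G}_i \in \mathbb{R}^{M^2 \times 2}$, where each row $\mathbf{g}^T = [\partial I/\partial x, \partial I/\partial y]$ is the image gradient at one pixel of the window.
$\mathbf{C}_i$ can then be computed via the singular value decomposition (SVD) of $\mathbf{G}_i$.
For each pixel $\mathbf{p}$, the kernel values $K_i(\mathbf{p})$, $i = 1, \dots, M^2$, are collected into a vector $\mathbf{K}(\mathbf{p})$ and normalized as $\mathbf{K}(\mathbf{p}) / \|\mathbf{K}(\mathbf{p})\|_1$, where $\|\cdot\|_1$ is the L1 norm.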
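The LSK steps for a single image channel can be sketched with NumPy. The sketch follows the locally adaptive (steering) kernel form of Seo and Milanfar, cited below, with these assumptions. Gradients come from `np.gradient`. $\mathbf{C}_i$ is the regularized SVD-based estimate, with elongation and scaling terms and illustrative parameters `lam1`, `lam2`, and `alpha`. The kernel is $\sqrt{\det \mathbf{C}_i}\,\exp(-(\mathbf{p}_i-\mathbf{p})^T \mathbf{C}_i (\mathbf{p}_i-\mathbf{p}) / 2h^2)$ up to a constant, which cancels under L1 normalization. The smoothing parameter `h` is assumed. Image borders are handled by edge replication.

```python
import numpy as np


def steering_covariances(img, M=3, lam1=1.0, lam2=1e-7, alpha=0.5):
    """C for every pixel, from the SVD of its M^2 x 2 gradient matrix G (assumed regularization)."""
    img = np.asarray(img, dtype=float)
    gy, gx = np.gradient(img)                 # derivatives along rows (y) and columns (x)
    r = M // 2
    gxp, gyp = np.pad(gx, r, mode="edge"), np.pad(gy, r, mode="edge")
    N1, N2 = img.shape
    C = np.empty((N1, N2, 2, 2))
    for y in range(N1):
        for x in range(N2):
            # rows of G are gradient vectors [gx, gy] of the M x M window
            G = np.stack([gxp[y:y + M, x:x + M].ravel(),
                          gyp[y:y + M, x:x + M].ravel()], axis=1)
            _, s, vt = np.linalg.svd(G, full_matrices=False)
            sigma1 = (s[0] + lam1) / (s[1] + lam1)            # elongation
            gamma = ((s[0] * s[1] + lam2) / (M * M)) ** alpha  # scaling
            C[y, x] = gamma * (sigma1 * np.outer(vt[0], vt[0])
                               + np.outer(vt[1], vt[1]) / sigma1)
    return C


def lsk_descriptors(img, M=3, h=1.0, **kw):
    """L1-normalized LSK vector (length M*M) for every pixel of a 2-D image."""
    C = steering_covariances(img, M, **kw)
    r = M // 2
    N1, N2 = C.shape[:2]
    Cp = np.pad(C, ((r, r), (r, r), (0, 0), (0, 0)), mode="edge")
    dy, dx = np.mgrid[-r:r + 1, -r:r + 1]
    offsets = np.stack([dx.ravel(), dy.ravel()], axis=1).astype(float)  # p_i - p as (x, y)
    out = np.empty((N1, N2, M * M))
    for y in range(N1):
        for x in range(N2):
            Ci = Cp[y:y + M, x:x + M].reshape(M * M, 2, 2)  # C_i of each neighbour p_i
            quad = np.einsum("ij,ijk,ik->i", offsets, Ci, offsets)
            K = np.sqrt(np.linalg.det(Ci)) * np.exp(-quad / (2 * h ** 2))
            out[y, x] = K / K.sum()  # L1 normalization (all K are non-negative)
    return out
```

On a flat image, all gradients vanish, so $\mathbf{C}_i$ is isotropic and the descriptor is close to uniform ($1/M^2$ per entry). Near edges, the kernel weight spreads along the edge and decays across it.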
The concepts above are applied as follows. First, the ROI and the search region are converted from RGB to the La*b* color space, and the LSKs are computed separately for each channel using the steps above.
The final representation of the ROI is obtained by applying PCA.
Finally, the search region is divided into patches, and the LSK similarity matrix, which is used in the next section, is estimated (as for color similarity) by applying the cosine similarity measure.
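The PCA and cosine-similarity steps can be sketched as below, under several assumptions. Per-pixel LSK features are stacked over the three La*b* channels, giving F = 3 × M² = 27 values for M = 3. The PCA basis is learned from the ROI's pixel features only, with a hypothetical k = 4 components. Features are projected without mean subtraction. The function names are hypothetical.

```python
import numpy as np


def pca_basis(features: np.ndarray, k: int) -> np.ndarray:
    """Top-k principal directions (F x k) of the rows of an n x F feature matrix."""
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k].T


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    return float(a @ b / (na * nb)) if na > 0 and nb > 0 else 0.0


def lsk_similarity_matrix(roi_feat: np.ndarray, region_feat: np.ndarray,
                          d: int, k: int = 4) -> np.ndarray:
    """Cosine similarity between the PCA-projected ROI and every candidate patch.

    roi_feat: Q1 x Q2 x F LSK features of the ROI.
    region_feat: R1 x R2 x F LSK features of the search region.
    """
    Q1, Q2, F = roi_feat.shape
    R1, R2, _ = region_feat.shape
    basis = pca_basis(roi_feat.reshape(-1, F), k)  # learned from the ROI (assumption)
    roi_vec = (roi_feat.reshape(-1, F) @ basis).ravel()
    proj = region_feat @ basis                     # R1 x R2 x k
    rows = range(0, R1 - Q1 + 1, d)
    cols = range(0, R2 - Q2 + 1, d)
    M = np.empty((len(rows), len(cols)))
    for ri, r in enumerate(rows):
        for ci, c in enumerate(cols):
            patch_vec = proj[r:r + Q1, c:c + Q2].ravel()  # same pixel-major order as roi_vec
            M[ri, ci] = cosine_similarity(patch_vec, roi_vec)
    return M
```

Projecting onto a few principal directions keeps the dominant LSK structure of the ROI while reducing the cost of each cosine comparison.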
 H. Seo and P. Milanfar, “Training-free, generic object detection using locally adaptive regression kernels,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 9, pp. 1688–1704, Sep. 2010.
Object localization in the search region takes into account the CH and LSK similarity of each patch to (1) the ROI in the previous frame and (2) the object instances in the stack.
First, the search region is divided into overlapping patches of the same size as the detected object, and CH and LSK features are extracted for each patch.
Then, three cosine similarity matrices are constructed over all patches.
The new ROI is determined by the final decision matrix $\mathbf{M}$, which is computed from these matrices using element-wise matrix multiplication (*) and a weight $\lambda$ that usually takes the value 0.5.
The new candidate object position is at the patch with the maximal value $\max_{i,j} M_{ij}$.
Four additional decision matrices are computed for rotated and scaled versions of the object. The new object position is the one that corresponds to the maximal value over all five decision matrices.
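The fusion and selection steps can be sketched as follows. The exact decision-matrix formula did not survive, so the code assumes the form $\mathbf{M} = \mathbf{B}_{CH} * (\lambda \mathbf{M}_{prev} + (1-\lambda)\mathbf{M}_{stack})$. Here `M_prev` and `M_stack` are hypothetical names for the LSK similarity matrices to the previous-frame ROI and to the stack instances. The five decision matrices (original plus four rotated or scaled versions, in a hypothetical order) are passed in as a list.

```python
from typing import List, Tuple

import numpy as np


def decision_matrix(B_ch: np.ndarray, M_prev: np.ndarray, M_stack: np.ndarray,
                    lam: float = 0.5) -> np.ndarray:
    """Assumed fusion: B_CH * (lam * M_prev + (1 - lam) * M_stack), element-wise."""
    return B_ch * (lam * M_prev + (1.0 - lam) * M_stack)


def best_candidate(decision_matrices: List[np.ndarray]) -> Tuple[int, int, int, float]:
    """Return (matrix index, row, col, score) of the largest entry over all matrices."""
    best = None
    for t, M in enumerate(decision_matrices):
        i, j = np.unravel_index(np.argmax(M), M.shape)
        score = float(M[i, j])
        if best is None or score > best[3]:
            best = (t, int(i), int(j), score)
    return best
```

Multiplying by the binary $\mathbf{B}_{CH}$ zeroes out patches rejected by the color test, so only color-consistent patches can win on LSK similarity.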
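The object motion state is $\mathbf{x}_k$, and the new state is given by $\mathbf{x}_k = \mathbf{A}\mathbf{x}_{k-1} + \mathbf{w}_{k-1}$, where $\mathbf{w}_{k-1}$ denotes the process noise, with probability distribution $p(\mathbf{w}) \sim N(\mathbf{0}, \mathbf{Q})$.
After the state is obtained, the Kalman filter equations are computed.
In these equations, a noise covariance matrix defines the stochastic model, and the model is adjusted through the Kalman prediction and correction equations.
In these equations, the predicted state gives the predicted position of the search region's center.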
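A linear Kalman filter that predicts the search-region center can be sketched as below. The constant-velocity state $[c_x, c_y, v_x, v_y]$, the matrices `A` and `H`, and the noise levels `q` and `r` are assumptions for illustration, not values from the slides. The predict and update steps follow the standard linear form described by Welch and Bishop.

```python
import numpy as np


class ConstantVelocityKalman:
    """Linear Kalman filter with assumed state x = [cx, cy, vx, vy]."""

    def __init__(self, cx: float, cy: float, q: float = 1e-2, r: float = 1.0, dt: float = 1.0):
        self.x = np.array([cx, cy, 0.0, 0.0])
        self.P = np.eye(4)                                # state error covariance
        self.A = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)    # constant-velocity transition
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)    # only the position is measured
        self.Q = q * np.eye(4)                            # process noise covariance
        self.R = r * np.eye(2)                            # measurement noise covariance

    def predict(self) -> np.ndarray:
        """A priori estimate; returns the predicted search-region center."""
        self.x = self.A @ self.x
        self.P = self.A @ self.P @ self.A.T + self.Q
        return self.x[:2].copy()

    def update(self, z) -> np.ndarray:
        """Correct the estimate with the detected object center z = (cx, cy)."""
        z = np.asarray(z, dtype=float)
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)          # Kalman gain
        self.x = self.x + K @ (z - self.H @ self.x)
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2].copy()
```

In each frame, `predict()` gives the center around which the $R_1 \times R_2$ search region is placed, and `update()` is called with the localized object center. A constant-velocity model explains why sudden changes in direction or speed are hard to follow.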
Quantitative evaluation and comparison are performed using the frame detection accuracy (FDA) measure.
FDA measures the overlap between the ground-truth object and the detected object D at a given frame t.
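A commonly used definition of FDA divides the sum of per-object overlap ratios $|G \cap D| / |G \cup D|$ by the average number of ground-truth and detected objects in the frame. With exactly one object and one detection, it reduces to intersection over union. The sketch assumes axis-aligned boxes `(x_min, y_min, x_max, y_max)` and already-matched ground-truth/detection pairs; the function names are hypothetical.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def box_area(b: Box) -> float:
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def overlap_ratio(g: Box, d: Box) -> float:
    """|G ∩ D| / |G ∪ D| for two axis-aligned boxes."""
    iw = max(0.0, min(g[2], d[2]) - max(g[0], d[0]))
    ih = max(0.0, min(g[3], d[3]) - max(g[1], d[1]))
    inter = iw * ih
    union = box_area(g) + box_area(d) - inter
    return inter / union if union > 0 else 0.0


def fda(matched: Sequence[Tuple[Box, Box]], n_gt: int, n_det: int) -> float:
    """Frame detection accuracy at one frame t from matched (G, D) pairs."""
    if n_gt + n_det == 0:
        raise ValueError("FDA is undefined when there are no objects")
    return sum(overlap_ratio(g, d) for g, d in matched) / ((n_gt + n_det) / 2)
```

For example, two 2 × 2 boxes offset by one pixel diagonally overlap in a 1 × 1 square, giving an overlap ratio of 1/7. A frame where the tracker outputs nothing scores 0.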
The performance of the proposed tracker is compared with that of two other trackers: the PF tracker and the FT tracker.
The tracker extracts a representation of the target object based on LSK and CH at frame t and tries to find its location in frame t + 1.
The proposed method is effective for object tracking under severe appearance changes, affine transformations, and partial occlusion.
The method cannot handle full occlusion: the tracker then continues by tracking another object in the background.
The Kalman filter cannot follow sudden changes in object direction or speed. A larger search region may solve this issue, but it would sharply reduce the tracking speed.