
Feature Reconstruction Using Lucas-Kanade Feature Tracking and Tomasi-Kanade Factorization



  1. Feature Reconstruction Using Lucas-Kanade Feature Tracking and Tomasi-Kanade Factorization EE7740 Project I Dr. Gunturk

  2. ABSTRACT • Recovering 3-D structure from motion in noisy 2-D images is a problem addressed by many vision-system researchers. By consistently tracking feature points of interest across multiple images using the methodology first described by Lucas and Kanade, the 3-D shape of the scene can be reconstructed from these feature points using the factorization method developed by Tomasi and Kanade.

  3. Velocity Flow • [Figure: a feature at location x in image I and its match at x + d in image J, with displacement vector d.] The image flow, or velocity field, in the image plane due to object/camera motion can be computed using feature matching.

  4. Total error E is the weighted sum-squared difference over the feature window W: E = Σ over x in W of w(x) [ J(x) − I(x − d) ]²

  5. Approximate I(x − d) using the Taylor series expansion • A good match occurs when E is small, so we need to find a displacement d that minimizes E. • This can be achieved by differentiating E with respect to d, setting the result equal to zero, and solving for d. We can approximate the value of I(x − d) using the Taylor series expansion: I(x − d) ≈ I(x) − ∇I(x) · d.

  6. Approximate (cont.) • The first-order term of the expansion is sufficient for the calculations. • With the gradient of the intensity I written as g = [∂I/∂x, ∂I/∂y]T, we can represent the shifted intensity as I(x − d) ≈ I(x) − g · d. • The sum-squared difference can now be represented as E ≈ Σ over x in W of w(x) [ J(x) − I(x) + g · d ]².

  7. Approximate (cont.) • Taking the partial derivatives of E with respect to the x and y components of d: ∂E/∂d = 2 Σ over x in W of w(x) [ J(x) − I(x) + g · d ] g, or equivalently ∂E/∂d = 2 Σ w(x) [ J(x) − I(x) ] g + 2 ( Σ w(x) g gT ) d.

  8. Approximate (cont.) • Setting the differential to 0 gives ( Σ w(x) g gT ) d = Σ w(x) [ I(x) − J(x) ] g. • This can be represented in matrix form as Zd = e, where Z = Σ w(x) g gT is the 2×2 spatial gradient matrix and e = Σ w(x) [ I(x) − J(x) ] g.
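A minimal numpy sketch of this single-window solve (the window size, the uniform weights w = 1, and the assumption that the feature sits well inside the image are mine, not the original slides'):

import numpy as np

def lk_step(I, J, x, y, half=7):
    """One Lucas-Kanade solve of Z d = e for the displacement d of the feature
    at integer location (x, y), using a (2*half+1)^2 window, uniform weights,
    and float grayscale images I, J of equal size. (x, y) must lie at least
    half+1 pixels from the image border."""
    rows = slice(y - half, y + half + 1)
    cols = slice(x - half, x + half + 1)
    # Central-difference gradients of I inside the window
    Ix = 0.5 * (I[rows, x - half + 1:x + half + 2] - I[rows, x - half - 1:x + half])
    Iy = 0.5 * (I[y - half + 1:y + half + 2, cols] - I[y - half - 1:y + half, cols])
    dI = I[rows, cols] - J[rows, cols]            # intensity difference I - J
    # Z = sum of w g g^T and e = sum of w (I - J) g, with w = 1
    Z = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                  [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
    e = np.array([np.sum(dI * Ix), np.sum(dI * Iy)])
    return np.linalg.solve(Z, e)  # displacement d = (dx, dy); assumes Z is invertible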

  9. Corner detection: Harris filter • The Harris "cornerness" function uses the two eigenvalues of Z to give a quantitative measure of the corner and edge quality at a pixel: R = det(Z) − k·trace(Z)² = λ1λ2 − k(λ1 + λ2)², with k typically around 0.04–0.06.
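As a small illustration of that measure (the choice k = 0.05 is an assumed value inside the usual 0.04–0.06 range):

import numpy as np

def harris_cornerness(Z, k=0.05):
    """Harris corner/edge measure from the 2x2 spatial gradient matrix Z:
    large positive values indicate corners, large negative values edges."""
    lam1, lam2 = np.linalg.eigvalsh(Z)           # the two eigenvalues of Z
    return lam1 * lam2 - k * (lam1 + lam2) ** 2  # det(Z) - k * trace(Z)^2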

  10. Lucas-Kanade assumptions • Z is invertible, • the two eigenvalues of Z are large enough to be discernible from noise, • and the ratio of the two eigenvalues is well-behaved (larger/smaller is not too large). • This is normally not the case.
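A sketch of a corresponding trackability test on Z (the eigenvalue floor and ratio limit are assumed values, not from the original slides):

import numpy as np

def is_trackable(Z, min_eig=1e-2, max_ratio=10.0):
    """Accept a feature window only if both eigenvalues of Z rise above the
    noise floor and their ratio stays reasonable."""
    lam_small, lam_large = np.linalg.eigvalsh(Z)  # eigenvalues in ascending order
    return lam_small > min_eig and lam_large / lam_small < max_ratio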

  11. Desirable parameters for a tracker • Accuracy relates to the local sub-pixel resolution, for which a smaller integration window is desirable in order not to "smooth out" the details in the image. • Robustness pertains to the sensitivity of the tracker to changes in lighting, the size of the image motion, etc. To handle larger motions, a larger integration window intuitively works better. • One solution to this accuracy/robustness trade-off is the pyramidal Lucas-Kanade algorithm.

  12. Pyramidal Lucas-Kanade algorithm • Build a Gaussian pyramid of each image and estimate the velocity at each feature at the coarsest level by solving the Lucas-Kanade equations. • Warp the image toward that estimate using bilinear interpolation, so all computation stays at sub-pixel accuracy, then upsample the estimate. • Repeat this same process at each layer of the pyramid all the way to the highest resolution (the original image).

  13. [Figure: Coarse-to-fine optical flow estimation on the Gaussian pyramids of image It-1 and image I; a motion of u = 10 pixels at the original resolution appears as u = 5, 2.5, and 1.25 pixels at successively coarser levels.]

  14. warp & upsample run iterative L-K . . . image J image It-1 image I image I Gaussian pyramid of image It-1 Gaussian pyramid of image I Coarse-to-fine optical flow estimation run iterative L-K

  15. Pseudo-code • Goal: let u be a point on image I; find its corresponding location v on image J. • Build pyramid representations of I and J: {IL}, L = 0, …, Lm and {JL}, L = 0, …, Lm. • Initialization of the pyramidal guess: gLm = [0 0]T.

  16. for L = Lm down to 0 with step of −1:
  • Location of point u on image IL: uL = [px py]T = u / 2^L
  • Derivative of IL with respect to x: Ix(x, y) = [IL(x + 1, y) − IL(x − 1, y)] / 2
  • Derivative of IL with respect to y: Iy(x, y) = [IL(x, y + 1) − IL(x, y − 1)] / 2
  • Spatial gradient matrix: G = Σ over the window of [ Ix² IxIy ; IxIy Iy² ]
  • Initialization of iterative L-K: v0 = [0 0]T
  • for k = 1 to K with step of 1 (or until the update falls below the accuracy threshold):
  •   Image difference: δIk(x, y) = IL(x, y) − JL(x + gLx + vk−1,x , y + gLy + vk−1,y)
  •   Image mismatch vector: bk = Σ over the window of [ δIk Ix ; δIk Iy ]
  •   Optical flow (Lucas-Kanade): ηk = G^(−1) bk
  •   Guess for next iteration: vk = vk−1 + ηk
  • end of for-loop on k

  17. • Final optical flow at level L: dL = vK • Guess for the next level L − 1: gL−1 = 2 (gL + dL) • end of for-loop on L • Final optical flow vector: d = g0 + d0 • Location of the point on J: v = u + d • Solution: the corresponding point is at location v on image J
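In practice the whole pyramidal tracker is available off the shelf; a minimal sketch using OpenCV's pyramidal Lucas-Kanade implementation (the file names, window size, and pyramid depth are assumptions):

import cv2
import numpy as np

# Two consecutive grayscale frames (hypothetical file names)
I = cv2.imread("frame0.png", cv2.IMREAD_GRAYSCALE)
J = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)

# Points u on image I, chosen with the minimum-eigenvalue criterion
pts_I = cv2.goodFeaturesToTrack(I, maxCorners=400, qualityLevel=0.05, minDistance=10)

# Pyramidal Lucas-Kanade: returns the corresponding locations v on image J,
# a per-point status flag, and a tracking error estimate
pts_J, status, err = cv2.calcOpticalFlowPyrLK(
    I, J, pts_I, None,
    winSize=(15, 15),   # integration window at each pyramid level
    maxLevel=3,         # Lm: number of pyramid levels above the original image
    criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 20, 0.03))

tracked = pts_J[status.ravel() == 1]  # keep only the successfully tracked points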

  18. Initial Feature Points • The methodology used to select the initial feature points on image I is as follows: • Compute the G matrix and its minimum eigenvalue λm at every pixel of image I. • Determine the maximum λmax of all the minimum eigenvalues over the whole image. • Retain the image pixels whose λm value is larger than 5%–10% of λmax. • From those pixels, keep the local-maximum pixels (i.e., a pixel is kept if its λm value is larger than that of any other pixel in its 3×3 neighborhood). • Keep a subset of those pixels such that the minimum distance between any pair of retained pixels is larger than a given threshold distance (typically 5 or 10 pixels).
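A sketch of this selection procedure in numpy/OpenCV (the Sobel gradients and the 7×7 summation window are assumptions; the 5% quality fraction and 10-pixel spacing are taken from the ranges above):

import cv2
import numpy as np

def select_features(I, win=7, quality=0.05, min_dist=10):
    """Select initial feature points on image I by the minimum-eigenvalue rule."""
    I = I.astype(np.float32)
    Ix = cv2.Sobel(I, cv2.CV_32F, 1, 0, ksize=3)
    Iy = cv2.Sobel(I, cv2.CV_32F, 0, 1, ksize=3)
    # Entries of G summed over a win x win window around every pixel
    k = np.ones((win, win), np.float32)
    Sxx = cv2.filter2D(Ix * Ix, -1, k)
    Syy = cv2.filter2D(Iy * Iy, -1, k)
    Sxy = cv2.filter2D(Ix * Iy, -1, k)
    # Minimum eigenvalue of the 2x2 matrix [[Sxx, Sxy], [Sxy, Syy]] at each pixel
    lam_min = 0.5 * ((Sxx + Syy) - np.sqrt((Sxx - Syy) ** 2 + 4 * Sxy ** 2))
    # Keep pixels above a fraction of the global maximum that are 3x3 local maxima
    mask = (lam_min > quality * lam_min.max()) & \
           (lam_min == cv2.dilate(lam_min, np.ones((3, 3), np.uint8)))
    ys, xs = np.nonzero(mask)
    order = np.argsort(-lam_min[ys, xs])        # strongest features first
    kept = []
    for y, x in zip(ys[order], xs[order]):      # enforce the minimum pairwise distance
        if all((x - kx) ** 2 + (y - ky) ** 2 >= min_dist ** 2 for kx, ky in kept):
            kept.append((x, y))
    return np.array(kept)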

  19. Orthographic Case • Input: the trajectories of image coordinates {(ufp, vfp)}, f = 1, …, F, p = 1, …, P, arranged into the registered measurement matrix Ŵ.

  20. The rank theorem • Place the origin of the world coordinate system at the centroid of the P object points. • The unit vectors if and jf point along the X and Y directions of image frame f, respectively.

  21. The rank theorem • The projection (ufp, vfp), i.e. the image feature point of object point sp = (xp, yp, zp)T in frame f, is ufp = ifT (sp − tf), vfp = jfT (sp − tf), where tf is the vector from the world origin to the origin of image frame f. • Note: since the origin of the world coordinates is placed at the centroid of the object points, Σp sp = 0.

  22. The rank theorem • For the registered (mean-subtracted) horizontal image projection we have ũfp = ufp − (1/P) Σq ufq = ifT ( sp − (1/P) Σq sq ) = ifT sp. • To summarize: ũfp = ifT sp and ṽfp = jfT sp.

  23. The rank theorem • The registered measurement matrix can be expressed in matrix form as Ŵ = R S, where the 2F×3 matrix R, whose rows are i1T, …, iFT, j1T, …, jFT, represents the camera rotation, and the 3×P matrix S = [s1 … sP] is the shape matrix.

  24. The rank theorem • Since R is 2F×3 and S is 3×P, rank(Ŵ) = rank(RS) ≤ 3. • Rank theorem: without noise, the registered measurement matrix is at most of rank 3.
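A small synthetic check of the rank theorem (the random shape and rotations below are made up purely for illustration):

import numpy as np

rng = np.random.default_rng(0)
F, P = 10, 30                       # number of frames and feature points

# Random 3-D shape, centered so the world origin is the centroid of the points
S = rng.normal(size=(3, P))
S -= S.mean(axis=1, keepdims=True)

# Per-frame orthonormal image axes i_f, j_f taken from random rotation matrices
rows_i, rows_j = [], []
for _ in range(F):
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # random orthonormal basis
    rows_i.append(Q[0])                            # i_f
    rows_j.append(Q[1])                            # j_f
R = np.vstack(rows_i + rows_j)                     # 2F x 3 rotation matrix

W_hat = R @ S                                      # noise-free registered measurement matrix
print(np.linalg.matrix_rank(W_hat))                # prints 3, as the rank theorem predicts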

  25. • The registered measurement matrix Ŵ will be at most of rank three without noise. • When noise corrupts the images, however, Ŵ will not be exactly of rank 3. • The rank theorem can nevertheless be extended to the case of noisy measurements in a well-defined manner, using the notion of approximate rank.

  26. Approximate rank • Ŵ can be decomposed by the singular value decomposition into three matrices, Ŵ = O1 Σ O2, where O1 and O2 are unitary matrices and Σ is the diagonal matrix of singular values. • Partitioning the decomposition into the three greatest singular values and the rest, we have Ŵ = O1' Σ' O2' + O1'' Σ'' O2''. • Ideally, Σ' should contain all the nonzero singular values of Ŵ, so that O1'' Σ'' O2'' is entirely due to noise.

  27. Rank theorem for noisy measurements • All the shape and rotation information in Ŵ is contained in its three greatest singular values, together with the corresponding left and right singular vectors. • The best possible rank-3 approximation of Ŵ is therefore Ŵ* = O1' Σ' O2'.

  28. • Define Ř = O1' (Σ')^(1/2) and Š = (Σ')^(1/2) O2', which have the same size as the desired rotation and shape matrices R and S. • The decomposition is not unique: (ŘQ)(Q−1Š) = Ř(QQ−1)Š = ŘŠ = Ŵ* for any invertible 3×3 matrix Q. • Since that column space is 3-D by the rank theorem, R and Ř are different bases for the same space, so there is a linear transformation between them: • Ř is a linear transformation of the true rotation matrix R, and • Š is a linear transformation of the true shape matrix S.
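A minimal numpy sketch of this decomposition step (variable names are assumptions; numpy's svd returns the right factor O2 directly):

import numpy as np

def factorize(W_hat):
    """Rank-3 factorization W_hat ≈ R_hat @ S_hat via the singular value decomposition."""
    O1, sigma, O2 = np.linalg.svd(W_hat, full_matrices=False)  # W_hat = O1 diag(sigma) O2
    root = np.diag(np.sqrt(sigma[:3]))   # square root of the three greatest singular values
    R_hat = O1[:, :3] @ root             # 2F x 3, a linear transformation of the true R
    S_hat = root @ O2[:3, :]             # 3 x P,  a linear transformation of the true S
    return R_hat, S_hat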

  29. The metric constraints • There exists a 3×3 matrix Q such that R = ŘQ and S = Q−1 Š. • To find Q, use the fact that the rows of the true rotation matrix R are unit vectors and that, for each frame f, the row if is orthogonal to the row jf. • Writing the rows of Ř as îfT and ĵfT, these metric constraints read îfT Q QT îf = 1, ĵfT Q QT ĵf = 1, îfT Q QT ĵf = 0, and they yield an over-constrained quadratic system in the entries of Q. • This is a simple nonlinear data-fitting problem.
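One way to attack that data-fitting problem, sketched with scipy (the least_squares solver and the identity initialization are my assumptions, not the original author's procedure):

import numpy as np
from scipy.optimize import least_squares

def metric_upgrade(R_hat, S_hat):
    """Solve the metric constraints for Q, then return R = R_hat Q and S = Q^-1 S_hat."""
    F = R_hat.shape[0] // 2
    i_hat, j_hat = R_hat[:F], R_hat[F:]       # rows i^T_f (top half) and j^T_f (bottom half)

    def residuals(q):
        Q = q.reshape(3, 3)
        M = Q @ Q.T
        r_unit_i = np.einsum('fi,ij,fj->f', i_hat, M, i_hat) - 1.0  # |i_f| = 1
        r_unit_j = np.einsum('fi,ij,fj->f', j_hat, M, j_hat) - 1.0  # |j_f| = 1
        r_orth = np.einsum('fi,ij,fj->f', i_hat, M, j_hat)          # i_f orthogonal to j_f
        return np.concatenate([r_unit_i, r_unit_j, r_orth])

    Q = least_squares(residuals, np.eye(3).ravel()).x.reshape(3, 3)  # start from the identity
    return R_hat @ Q, np.linalg.inv(Q) @ S_hat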

  30. Experimental Results • [Figure: the 430 features selected by the automatic detection method on the Tomasi-Kanade test image.]

  31. Experimental Results • [Figure: 388 features selected by the automatic detection method on the Bishop image.] • [Figure: reconstructed Bishop image.] • [Figure: 288 features tracked across 10 images of the Bishop sequence by the automatic detection method.]

  32. Conclusions • The pyramidal Lucas-Kanade tracker worked quite well on the images I submitted to it. For larger motions I would like to implement the Shi-Tomasi-style improvements described in [7] concerning an automatic scheme for rejecting spurious features, but time constraints have not allowed me to implement them yet. • The Tomasi-Kanade factorization method proved to be a robust solution for generating 3-D coordinates of the feature points of rigid objects from the points tracked by the pyramidal Lucas-Kanade tracker.

  33. References • [1] J.-Y. Bouguet, "Pyramidal Implementation of the Lucas Kanade Feature Tracker: Description of the Algorithm", Intel Corporation, Microprocessor Research Labs, jean-yves.bouguet@intel.com. • [2] C. Harris and M. Stephens, "A Combined Corner and Edge Detector", Proceedings of the Fourth Alvey Vision Conference, Manchester, pp. 147–151, 1988. • [3] J. Shi and C. Tomasi, "Good Features to Track", IEEE Conference on Computer Vision and Pattern Recognition (CVPR '94), Seattle, June 1994. • [4] C. Tomasi and T. Kanade, "Shape and Motion from Image Streams under Orthography: A Factorization Method", International Journal of Computer Vision, 9(2):137–154, November 1992. • [5] http://mathworld.wolfram.com/UnitaryMatrix.html • [6] D. Weinshall and C. Tomasi, "Linear and Incremental Acquisition of Invariant Shape Models from Image Sequences", Proceedings of the Fourth IEEE International Conference on Computer Vision, pp. 675–682, Berlin, May 1993. • [7] A. Fusiello, E. Trucco, T. Tommasini, and V. Roberto, "Improving Feature Tracking with Robust Statistics", Pattern Analysis & Applications, 2:312–320, © 1999 Springer-Verlag London Limited.
