Augmented Reality: Object Tracking and Active Appearance Model Presented by Pat Chan 01/03/2005 Group Meeting
Outline • Introduction to Augmented Reality • Object Tracking • Active Appearance Model (AAM) • Object Tracking with AAM • Future Direction • Conclusion
Introduction • An Augmented Reality system supplements the real world with virtual objects that appear to coexist in the same space as the real world • Properties: • Combines real and virtual objects in a real environment • Runs interactively and in real time • Registers (aligns) real and virtual objects with each other
Introduction • Display • Presenting virtual objects in the real environment • Tracking • Following the user's and the virtual objects' movements by means of special devices or techniques • 3D Modeling • Forming virtual objects • Registration • Blending real and virtual objects
Object Tracking • Visual content can be modeled as a hierarchy of abstractions. • At the first level are the raw pixels with color or brightness information. • Further processing yields features such as edges, corners, lines, curves, and color regions. • A higher abstraction layer may combine and interpret these features as objects and their attributes.
Object Tracking • Accurately tracking the user's position is crucial for AR registration • The objective is to obtain an accurate estimate of the position (x,y) of the tracked object • Tracking = correspondence + constraints + estimation • Tracking objects in a sequence of video frames consists of two main stages: • Isolation of objects from the background in each frame • Association of objects across successive frames in order to trace them
Object Tracking • Object tracking in image processing is usually based on a reference image of the object or on properties of the object. • Tracking techniques: • Kalman filtering • Correlation-based tracking • Change-based tracking • 2D layer tracking • Tracking of articulated objects
Object Tracking • Object tracking can be broadly divided into the following stages: • Input (object and camera) • Finding correspondence • Motion estimation • Corrective feedback • Occlusion detection
Input • Tracking algorithms can be classified into • Single Object & Single Camera • Single Object & Multiple Cameras • Multiple Objects & Single Camera • Multiple Objects & Multiple Cameras
Single Object & Single Camera • Requires accurate camera calibration and an accurate scene model • Suffers from occlusions • Not robust and object-dependent
Single Object & Multiple Cameras • Requires accurate point correspondence between scenes • Occlusions can be minimized or even avoided • Redundant information allows better estimation • Communication between multiple cameras is a problem
Static Point Correspondence • The output of the tracking stage is • A simple scene model is used to get a real estimate of the coordinates • Both affine and perspective models were used for scene modeling • Static corresponding points were used for parameter estimation • Least mean squares was used to improve parameter estimation
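The slide does not show how the scene-model parameters are estimated from static correspondences. A minimal sketch, assuming 2D point pairs and the affine scene model only, fits the six affine parameters with ordinary least squares via `np.linalg.lstsq` (the perspective model would need a homography fit instead, not shown). `fit_affine` and `apply_affine` are hypothetical names.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares affine map dst ~ A @ src + t from N >= 3 point pairs.

    src, dst: (N, 2) arrays of corresponding (x, y) points.
    Returns the 2x3 matrix [A | t].
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    # Design matrix: one row [x, y, 1] per static corresponding point
    X = np.hstack([src, np.ones((len(src), 1))])
    # Solve X @ P ~ dst in the least-squares sense; P has shape (3, 2)
    P, *_ = np.linalg.lstsq(X, dst, rcond=None)
    return P.T

def apply_affine(M, pts):
    """Map (N, 2) points through the 2x3 affine matrix M."""
    pts = np.asarray(pts, dtype=float)
    return pts @ M[:, :2].T + M[:, 2]
```

With more than three well-spread points the system is overdetermined, so measurement noise in individual correspondences is averaged out rather than copied into the parameters.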
Block-Based Motion Estimation • Typically, precise sub-pixel optical flow estimation is not needed in object tracking. • Motion can be on the order of several pixels, which precludes the use of gradient-based methods. • A simple sum of squared differences (SSD) error criterion, coupled with a full search in a limited region around the tracking window, can be applied.
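The SSD criterion with full search can be sketched as follows. This is a minimal illustration, assuming grayscale frames as 2D numpy arrays and a square search region of hypothetical size `radius` around the window; `ssd_full_search` is a hypothetical name.

```python
import numpy as np

def ssd_full_search(prev, curr, top, left, h, w, radius):
    """Return ((dy, dx), ssd) minimizing the SSD between the h x w window at
    (top, left) in prev and a shifted window in curr, searching every integer
    shift within +/- radius pixels."""
    template = prev[top:top + h, left:left + w].astype(float)
    best, best_err = (0, 0), np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            # Skip candidate windows that fall outside the frame
            if y < 0 or x < 0 or y + h > curr.shape[0] or x + w > curr.shape[1]:
                continue
            cand = curr[y:y + h, x:x + w].astype(float)
            err = np.sum((cand - template) ** 2)
            if err < best_err:
                best, best_err = (dy, dx), err
    return best, best_err
```

For example, if `curr` is `prev` shifted down by 2 pixels and left by 3, the search returns `(2, -3)` with zero error. The cost grows with `(2 * radius + 1) ** 2`, which is why the search is kept to a limited region.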
Adaptive Window Sizing • Simple block-based motion estimation may work reasonably well when motion is purely translational, but it can lose the object if the object's relative size changes. • If the object's size in the camera's field of view shrinks, the SSD error is strongly influenced by the background. • If the object's size in the camera's field of view grows, the window fails to use the information from the entire object and can slip away.
Four Corner Method • This technique divides the rectangular object window into 4 basic regions, one per quadrant. • A motion vector is calculated for each subregion, and each subregion controls one of the four corners. • Translational motion is captured when all four corners move equally, while the window size is modulated when the motion is differential. • The resulting tracking window can be non-rectangular, i.e., any quadrilateral approximated by four rectangles with a shared center corner.
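A minimal sketch of the four corner update, assuming SSD full-search block matching per quadrant and an axis-aligned starting window. It moves only the four outer corners; the shared center corner of the slide is not modelled here. `block_motion` and `four_corner_update` are hypothetical names.

```python
import numpy as np

def block_motion(prev, curr, top, left, h, w, radius):
    """Full-search SSD motion vector (dy, dx) for one h x w block."""
    tpl = prev[top:top + h, left:left + w].astype(float)
    best, best_err = (0, 0), np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + h > curr.shape[0] or x + w > curr.shape[1]:
                continue
            err = np.sum((curr[y:y + h, x:x + w] - tpl) ** 2)
            if err < best_err:
                best, best_err = (dy, dx), err
    return best

def four_corner_update(prev, curr, top, left, h, w, radius=4):
    """Move each outer corner of the window by the motion of its own quadrant.

    Equal motion of all quadrants translates the window; differing motion
    stretches or shrinks it into a general quadrilateral.
    """
    hh, hw = h // 2, w // 2
    # Top-left position of each quadrant subregion
    quadrants = {"tl": (top, left), "tr": (top, left + hw),
                 "bl": (top + hh, left), "br": (top + hh, left + hw)}
    # Outer window corner controlled by each quadrant
    corners = {"tl": (top, left), "tr": (top, left + w),
               "bl": (top + h, left), "br": (top + h, left + w)}
    new_corners = {}
    for name, (qy, qx) in quadrants.items():
        dy, dx = block_motion(prev, curr, qy, qx, hh, hw, radius)
        cy, cx = corners[name]
        new_corners[name] = (cy + dy, cx + dx)
    return new_corners
```

Because each corner is updated only from the previous frame, small per-quadrant errors accumulate over time, which is the drift problem the correlative method addresses.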
Example: Four Corner Method • [Figure: synthetically generated test sequences]
Correlative Method • The four corner method is strongly subject to error accumulation, which can cause one or more of the tracking window quadrants to drift. • Once drift occurs, the window sizing becomes highly inaccurate. • A method with corrective feedback is needed so that the window can converge to the correct size even after some errors. • Correlating the current object features with a template view is one solution.
Correlative Method (con't) • The basic form of the technique stores the initial view of the object as a reference image. • Block matching is performed using a combined interframe and correlative MSE, where $s_c'(x_0, y_0, 0)$ is the resized stored template image. • Furthermore, the minimum correlative MSE is used to direct the resizing of the current window.
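The slide's exact combined MSE formula is not recoverable, so the following is only a sketch under stated assumptions: the combined cost is taken to be a weighted sum with a hypothetical weight `alpha` of the interframe MSE and the correlative MSE against the template resized with nearest-neighbour sampling, and window resizing picks, from a hypothetical set of candidate `scales`, the size whose content has the lowest correlative MSE. All function names are hypothetical.

```python
import numpy as np

def resize_nearest(img, h, w):
    """Nearest-neighbour resize of a 2D array to h x w."""
    rows = (np.arange(h) * img.shape[0] / h).astype(int)
    cols = (np.arange(w) * img.shape[1] / w).astype(int)
    return img[np.ix_(rows, cols)]

def combined_mse(cand, prev_block, template, alpha=0.5):
    """Assumed cost: (1 - alpha) * interframe MSE + alpha * correlative MSE.
    cand and prev_block must have the same shape."""
    cand = np.asarray(cand, dtype=float)
    tpl = resize_nearest(template, *cand.shape)  # stored view resized to the window
    inter = np.mean((cand - prev_block) ** 2)
    corr = np.mean((cand - tpl) ** 2)
    return (1 - alpha) * inter + alpha * corr

def best_scale(curr, cy, cx, base_h, base_w, template, scales=(0.9, 1.0, 1.1)):
    """Return (correlative MSE, h, w) of the window size centred at (cy, cx)
    whose content best matches the resized template, or None if none fit."""
    best = None
    for s in scales:
        h, w = max(1, round(base_h * s)), max(1, round(base_w * s))
        top, left = cy - h // 2, cx - w // 2
        if top < 0 or left < 0 or top + h > curr.shape[0] or left + w > curr.shape[1]:
            continue
        cand = curr[top:top + h, left:left + w].astype(float)
        err = np.mean((cand - resize_nearest(template, h, w)) ** 2)
        if best is None or err < best[0]:
            best = (err, h, w)
    return best
```

Because the template never changes, the correlative term pulls the window back toward the stored view even after interframe errors have accumulated, which is the corrective feedback the previous slide asks for.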
Occlusion Detection • Each camera must be able to assess the validity of its tracking (e.g., to detect occlusion). • Comparing the minimum error at each point to a fixed absolute threshold is problematic, since the error can grow even when tracking is still valid. • The threshold must adapt to current conditions. • One solution is to use a threshold of k (a constant > 1) times the moving average of the MSE. • Thus, only steep changes in error indicate possibly wrong tracking.
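A minimal sketch of the adaptive threshold, assuming the tracker reports one minimum MSE per frame. `OcclusionDetector`, the default k = 2.0 and the 10-frame averaging window are hypothetical choices. Not feeding flagged errors back into the moving average is also an assumption, made so that an occluder does not inflate the baseline.

```python
from collections import deque

class OcclusionDetector:
    """Flags possibly wrong tracking when the matching error exceeds
    k times the moving average of recent accepted errors."""

    def __init__(self, k=2.0, window=10):
        if k <= 1:
            raise ValueError("k must be > 1")
        self.k = k
        self.history = deque(maxlen=window)  # recent accepted MSE values

    def update(self, mse):
        """Return True if this frame's MSE looks like an occlusion."""
        if self.history:
            avg = sum(self.history) / len(self.history)
            suspicious = mse > self.k * avg
        else:
            suspicious = False  # no baseline yet on the first frame
        if not suspicious:
            # Only errors that look valid update the moving average
            self.history.append(mse)
        return suspicious
```

With k = 2, a slowly growing error sequence such as 10, 15, 20, 25, 30 is never flagged, while a jump from about 11 to 50 is, matching the idea that only steep changes should trigger an alarm.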
Improvements • Possible improvements: • Better filtering algorithms • More adequate dynamical models • Better shape/appearance models
Active Appearance Models (AAMs) • Active Appearance Models are generative models commonly used to model faces • Can also be useful for other phenomena • Matching object classes • Deformable appearance models
Active Appearance Models (AAMs) • The 2D linear shape is defined by a 2D triangulated mesh, in particular by the vertex locations of the mesh: $s = (x_1, y_1, \dots, x_v, y_v)^T$. • The shape s can be expressed as a base shape s0 plus a linear combination of m shape vectors si: $s = s_0 + \sum_{i=1}^{m} p_i s_i$ • The coefficients pi are the shape parameters. • s0 is the mean shape, and the vectors si are the eigenvectors corresponding to the m largest eigenvalues.
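The linear shape model can be sketched with PCA. This assumes the training shapes are already aligned (e.g., by Procrustes analysis, not shown) and stacked as rows $(x_1, y_1, \dots, x_v, y_v)$; `build_shape_model` and `shape_instance` are hypothetical names.

```python
import numpy as np

def build_shape_model(shapes, m):
    """shapes: (N, 2v) array of aligned training shapes.
    Returns the mean shape s0 and the m leading eigenvectors as rows of S."""
    shapes = np.asarray(shapes, dtype=float)
    s0 = shapes.mean(axis=0)
    centered = shapes - s0
    # Rows of vt are eigenvectors of the sample covariance, ordered by
    # decreasing eigenvalue because SVD returns sorted singular values
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return s0, vt[:m]

def shape_instance(s0, S, p):
    """s = s0 + sum_i p_i s_i for shape parameters p."""
    return s0 + np.asarray(p, dtype=float) @ S
```

Since the rows of `S` are orthonormal, the shape parameters of a given aligned shape `s` can be obtained as `S @ (s - s0)`.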
Active Appearance Models (AAMs) • The appearance of an independent AAM is defined within the base mesh s0: A(u) is defined over the pixels u ∈ s0. • A(u) can be expressed as a base appearance A0(u) plus a linear combination of l appearance images Ai(u): $A(u) = A_0(u) + \sum_{i=1}^{l} \lambda_i A_i(u)$ • The coefficients λi are the appearance parameters. • [Figure: base appearance A0(u) and appearance images A1(u), A2(u), A3(u)]
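A minimal sketch of the linear appearance model, assuming the pixels u ∈ s0 are flattened into vectors of length P, so `A0` has shape (P,) and the appearance images are the rows of `A` with shape (l, P). `appearance_instance` and `appearance_params` are hypothetical names.

```python
import numpy as np

def appearance_instance(A0, A, lam):
    """A(u) = A0(u) + sum_i lambda_i A_i(u), evaluated for all pixels in s0."""
    return A0 + np.asarray(lam, dtype=float) @ A

def appearance_params(A0, A, image_in_base_mesh):
    """Appearance parameters best explaining an image already sampled on s0.

    Solves min_lambda ||A0 + A^T lambda - I||^2 by linear least squares;
    for orthonormal appearance images this reduces to A @ (I - A0).
    """
    lam, *_ = np.linalg.lstsq(A.T, image_in_base_mesh - A0, rcond=None)
    return lam
```

Because the model is linear in λ, the appearance parameters for a fixed shape can always be found in closed form; only the shape parameters p enter the fitting problem nonlinearly.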
Active Appearance Models (AAMs) • The AAM model instance M with shape parameters p and appearance parameters λ is then created by warping the appearance A from the base mesh s0 to the model shape s: M(W(u; p)) = A(u). • Piecewise affine warp W(u; p): (1) for any pixel u in s0, find the triangle it lies in; (2) warp u with the affine warp for that triangle.
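The two-step piecewise affine warp can be sketched per pixel with barycentric coordinates: the affine map of a triangle sends a point to the point with the same barycentric coordinates in the corresponding target triangle. This assumes the triangulation is given as vertex-index triples shared by s0 and s; the function names and the tolerance `eps` are hypothetical.

```python
import numpy as np

def barycentric(pt, tri):
    """Barycentric coordinates (a, b, c) of 2D point pt in triangle tri (3x2)."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    a = ((y2 - y3) * (pt[0] - x3) + (x3 - x2) * (pt[1] - y3)) / det
    b = ((y3 - y1) * (pt[0] - x3) + (x1 - x3) * (pt[1] - y3)) / det
    return np.array([a, b, 1.0 - a - b])

def piecewise_affine_warp(u, base_vertices, target_vertices, triangles, eps=1e-9):
    """W(u; p): map pixel u from the base mesh s0 to the shape s.

    (1) find the triangle of s0 containing u (all barycentric weights >= 0);
    (2) apply that triangle's affine warp, i.e. reuse the same weights on
        the corresponding vertices of s.
    Returns None if u lies outside s0.
    """
    base_vertices = np.asarray(base_vertices, dtype=float)
    target_vertices = np.asarray(target_vertices, dtype=float)
    for tri in triangles:
        wts = barycentric(u, base_vertices[list(tri)])
        if np.all(wts >= -eps):
            return wts @ target_vertices[list(tri)]
    return None
```

For example, with s0 the unit square split into triangles (0, 1, 2) and (0, 2, 3) and s the same square scaled by 2 and shifted by (3, 1), the pixel (0.25, 0.5) maps to (3.5, 2.0).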
Fitting AAMs • Minimize the error between the input image I(u) and the model instance M(W(u; p)) = A(u). • If u is a pixel in s0, then the corresponding pixel in the input image I is W(u; p). • At pixel u, the AAM has the appearance $A(u) = A_0(u) + \sum_{i=1}^{l} \lambda_i A_i(u)$. • At pixel W(u; p), the input image has the intensity I(W(u; p)). • Minimize the sum of squares of the difference between these two quantities: $\sum_{u \in s_0} \left[ A_0(u) + \sum_{i=1}^{l} \lambda_i A_i(u) - I(W(u; p)) \right]^2$
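To make the objective concrete, the following is a deliberately simplified toy fit, not a practical AAM fitter: it assumes the warp W(u; p) is restricted to an integer translation p = (ty, tx) of a rectangular h x w base mesh, searches every translation exhaustively, and solves λ in closed form for each candidate. Practical fitters instead optimize over the full piecewise affine warp with gradient-based methods. `aam_error` and `fit_translation` are hypothetical names.

```python
import numpy as np

def aam_error(A0, A, lam, warped):
    """Sum over u in s0 of [A0(u) + sum_i lam_i A_i(u) - I(W(u; p))]^2."""
    residual = A0 + np.asarray(lam, dtype=float) @ A - warped
    return float(np.sum(residual ** 2))

def fit_translation(image, A0, A, h, w):
    """Toy fit: for every integer translation p = (ty, tx), sample I(W(u; p)),
    solve the appearance parameters by linear least squares, and keep the p
    with the smallest error. Returns (error, p, lambda)."""
    best = None
    for ty in range(image.shape[0] - h + 1):
        for tx in range(image.shape[1] - w + 1):
            warped = image[ty:ty + h, tx:tx + w].astype(float).ravel()
            lam, *_ = np.linalg.lstsq(A.T, warped - A0, rcond=None)
            err = aam_error(A0, A, lam, warped)
            if best is None or err < best[0]:
                best = (err, (ty, tx), lam)
    return best
```

The split mirrors the structure of the objective: λ enters linearly and has a closed-form solution for any fixed p, while p (here just a translation) must be searched or iterated.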
Object Tracking with AAM • Objects can be tracked with the trained AAM • 3-D face tracking with AAM search • Pose estimation with AAM
Example • The training set consisted of five images of a DAT tape cassette • The DAT cassette was annotated using 12 landmarks • From the five training images, a two-level multi-scale AAM was built. • [Video: aam_tracking_mpeg4.avi]
Future Direction • Propose a general object tracking algorithm with the help of AAM • Improve the accuracy of the object tracking algorithm • Improve the fitting speed of the AAM
Conclusion • Introduction to Augmented Reality • Survey of object tracking • Introduction to Active Appearance Models • Improving the accuracy of object tracking with AAMs • Proposal of our future research direction