Mobile Motion Tracking using Onboard Camera • Supervisor: Prof. LYU, Rung Tsong Michael • Prepared by: Lam Man Kit, Wong Yuk Man
Outline • Motivations • Objective • Methods • Results • Future Work • Q&A
Motivations • The use of camera phones is increasing rapidly. • The camera is commonly used only for taking photos or capturing video. • Is it possible to add more value to the camera and make full use of it?
Motivations • Unlike a traditional camera, a camera phone can do more than just take photos. • A camera phone can perform image processing tasks on the device itself. • Symbian OS makes programming on mobile phones possible.
Objective • Add more value to the camera phone so that it enhances human-computer interaction. • Implement real-time motion tracking on a Symbian phone without requiring additional hardware. • The detected movement acts as an innovative input method for different applications: a camera mouse to control the cursor, a new input method for interactive games, and gesture input.
Objective • Novel ubiquitous computing applications can be developed!
Motion Estimation • Motion estimation is the process of finding the motion vector of the current frame relative to one or more reference frames. • Two common approaches: optical flow and block matching. • The block matching algorithm is an integral part of most motion-compensated video coding standards, e.g. MPEG-1, MPEG-2, H.263.
Block Matching • Divide the previous frame into small rectangular blocks. • For each reference block, find the best match in the current frame. • Calculate the motion vector (MV) between the block in the previous frame and its counterpart in the current frame. • Typical block size: 16x16 pixels. • Search range W: typically 16 or 32 pixels. • Similarity measures: Mean Absolute Error (MAE), Mean Square Error (MSE), Sum of Absolute Differences (SAD). • SAD is used in our project. • [Figure: a block in the previous frame, its match in the current frame, and the motion vector (MV) between them]
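The three similarity measures named on this slide can be sketched as follows. This is a minimal illustration, assuming blocks are given as equally sized 2D lists of grayscale values; the function names are hypothetical and not taken from the project code.

```python
def _pairs(block_a, block_b):
    """Yield corresponding pixel pairs of two equally sized 2D blocks."""
    for row_a, row_b in zip(block_a, block_b):
        for a, b in zip(row_a, row_b):
            yield a, b


def sad(block_a, block_b):
    """Sum of Absolute Differences: no division or multiplication needed."""
    return sum(abs(a - b) for a, b in _pairs(block_a, block_b))


def mae(block_a, block_b):
    """Mean Absolute Error: SAD divided by the number of pixels."""
    n = len(block_a) * len(block_a[0])
    return sad(block_a, block_b) / n


def mse(block_a, block_b):
    """Mean Square Error: average of the squared pixel differences."""
    n = len(block_a) * len(block_a[0])
    return sum((a - b) ** 2 for a, b in _pairs(block_a, block_b)) / n


previous_block = [[1, 2], [3, 4]]
current_block = [[1, 0], [5, 4]]
print(sad(previous_block, current_block))  # 4
print(mae(previous_block, current_block))  # 1.0
print(mse(previous_block, current_block))  # 2.0
```

For a fixed block size, SAD and MAE rank candidates identically, so SAD gives the same best match while skipping the division.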
Block Matching • Search window (in current frame): a region that has the same center as the selected block in the previous frame, extended by W pixels in both directions. • [Figure labels: center pixel; block in previous frame of size (2BW + 1) x (2BH + 1); search window of size (2W + 1) x (2H + 1); example values BW = 1, BH = 1, W = 4, H = 4]
Block-Matching Motion Estimation • Two kinds of methods are commonly used: • Fast search algorithms: 2-D Logarithmic Search, Three-Step Search (TSS), Diamond Search • Exhaustive Search Algorithm (ESA)
Fast Algorithms • Fast search algorithms: 2-D Logarithmic Search, Three-Step Search (TSS), Diamond Search • Assumption: the matching error increases monotonically as the search position moves away from the optimal motion vector.
Fast Algorithms - Three-Step Search (TSS) • 1st step: search the central point and its 8 surrounding points at a distance of W/2 pixels, and find the best match. • 2nd step: use the previous best match as the center and repeat the 1st step with a distance of W/4 pixels. • 3rd step: repeat the 1st step with a distance of W/8 pixels. • Only 25 points are searched. • [Figure: search window around the center of the block, with the points examined in steps 1, 2 and 3 labeled 1, 2 and 3]
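The three steps above can be sketched as follows. This is a hedged illustration rather than the project's implementation: the matching error is passed in as a hypothetical callable `cost(dx, dy)` (in practice a SAD between the reference block and the candidate at that displacement), and `toy_cost` is an invented, perfectly monotonic error surface used only for the demonstration.

```python
def three_step_search(cost, w):
    """Three-step search over displacements within roughly +/- w pixels.

    cost(dx, dy) returns the matching error at displacement (dx, dy).
    Returns (best_displacement, number_of_cost_evaluations).
    """
    best = (0, 0)
    best_cost = cost(0, 0)  # the central point is evaluated once
    evaluations = 1
    for step in (w // 2, w // 4, w // 8):
        step = max(1, step)
        cx, cy = best  # center of this step = best match so far
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                if dx == 0 and dy == 0:
                    continue  # center was already evaluated
                c = cost(cx + dx, cy + dy)
                evaluations += 1
                if c < best_cost:
                    best, best_cost = (cx + dx, cy + dy), c
    return best, evaluations


def toy_cost(dx, dy):
    """Assumed error surface with a single minimum at (3, -2)."""
    return (dx - 3) ** 2 + (dy + 2) ** 2


motion_vector, evaluations = three_step_search(toy_cost, w=8)
print(motion_vector, evaluations)  # (3, -2) 25
```

The count of 25 is 1 central point plus 8 new points in each of the 3 steps. On a real SAD surface with several local minima, the same procedure can settle on a suboptimal displacement.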
Fast Algorithms • Advantages: extremely fast. • Disadvantages: • All fast algorithms rely heavily on the matching criterion increasing monotonically around the location of the optimal motion vector. • They easily fall into a local minimum. • Only a limited number of positions inside the search window are examined (only 25 points for TSS), so they may find only a suboptimal solution.
Exhaustive Search • All candidates within the search window are examined. • (2W + 1)² positions have to be examined. • Advantage: good accuracy; it finds the best match. • Disadvantage: high computational load; impractical for real-time applications. • Solution: fast exhaustive search.
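A minimal sketch of exhaustive search, assuming frames are 2D lists of grayscale values and that the caller keeps the whole search window inside the frame. The helper `shifted` and the random test frame are invented for the demonstration only.

```python
import random


def sad_at(prev, cur, bx, by, dx, dy, size):
    """SAD between the block at (bx, by) in prev and the block
    displaced by (dx, dy) in cur."""
    total = 0
    for j in range(size):
        for i in range(size):
            total += abs(prev[by + j][bx + i] - cur[by + dy + j][bx + dx + i])
    return total


def exhaustive_search(prev, cur, bx, by, size, w):
    """Examine all (2w + 1)^2 displacements; return (best_mv, best_sad)."""
    best, best_sad = None, None
    for dy in range(-w, w + 1):
        for dx in range(-w, w + 1):
            s = sad_at(prev, cur, bx, by, dx, dy, size)
            if best_sad is None or s < best_sad:
                best, best_sad = (dx, dy), s
    return best, best_sad


def shifted(frame, dx, dy):
    """Demo helper: move the frame content by (dx, dy), padding with 0."""
    h, w = len(frame), len(frame[0])
    return [[frame[y - dy][x - dx] if 0 <= y - dy < h and 0 <= x - dx < w else 0
             for x in range(w)] for y in range(h)]


rng = random.Random(0)
prev_frame = [[rng.randrange(256) for _ in range(20)] for _ in range(20)]
cur_frame = shifted(prev_frame, 2, -1)  # true motion: 2 right, 1 up
mv, best_sad = exhaustive_search(prev_frame, cur_frame, bx=8, by=8, size=4, w=3)
print(mv, best_sad)
```

With w = 3 this evaluates 49 full SADs for one block, which shows why the cost grows quickly for the typical search ranges of 16 or 32 pixels.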
Fast Exhaustive Block Matching Algorithms • Much faster • No performance loss • Idea: exclude many search positions while still finding the best match: • SEA (Successive Elimination Algorithm) • PNSA (Progressive Norm Successive Algorithm) • SEA and PNSA can be calculated quickly
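SEA Algorithm • The SAD of two blocks X and Y is defined as $SAD(X, Y) = \sum_{i,j} |X_{ij} - Y_{ij}|$ (slow to compute). • By the Minkowski inequality, $\left|\sum_{i,j} X_{ij} - \sum_{i,j} Y_{ij}\right| \le \sum_{i,j} |X_{ij} - Y_{ij}|$. • Thus $D \le SAD(X, Y)$, where $D = \left|\sum_{i,j} X_{ij} - \sum_{i,j} Y_{ij}\right|$ is the block-sum difference (fast to compute). • By calculating the block-sum difference first, we can eliminate many candidate blocks (those with D greater than the current minimum SAD) before computing the slow SAD. • A fast method exists to calculate the block sums for SEA. • About 2 times faster than exhaustive search!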
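The elimination rule can be sketched as below. The slide does not say which fast block-sum method is used; this sketch assumes a summed-area table (integral image), and all function names are hypothetical. The search still returns the same best match as plain exhaustive search because a candidate is skipped only when its lower bound D already exceeds the current minimum SAD.

```python
import random


def integral_image(frame):
    """Summed-area table with one extra row and column of zeros."""
    h, w = len(frame), len(frame[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += frame[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def block_sum(ii, x, y, size):
    """Sum of the size x size block with top-left corner (x, y)."""
    return ii[y + size][x + size] - ii[y][x + size] - ii[y + size][x] + ii[y][x]


def sad_at(prev, cur, bx, by, dx, dy, size):
    return sum(abs(prev[by + j][bx + i] - cur[by + dy + j][bx + dx + i])
               for j in range(size) for i in range(size))


def sea_search(prev, cur, bx, by, size, w):
    """Exhaustive search with SEA; returns (best_mv, best_sad, sads_computed)."""
    ii_cur = integral_image(cur)
    ref_sum = block_sum(integral_image(prev), bx, by, size)
    best, best_sad, sads_computed = None, None, 0
    for dy in range(-w, w + 1):
        for dx in range(-w, w + 1):
            d = abs(ref_sum - block_sum(ii_cur, bx + dx, by + dy, size))
            if best_sad is not None and d > best_sad:
                continue  # SAD >= D > current minimum: cannot be the best match
            s = sad_at(prev, cur, bx, by, dx, dy, size)
            sads_computed += 1
            if best_sad is None or s < best_sad:
                best, best_sad = (dx, dy), s
    return best, best_sad, sads_computed


def shifted(frame, dx, dy):
    """Demo helper: move the frame content by (dx, dy), padding with 0."""
    h, w = len(frame), len(frame[0])
    return [[frame[y - dy][x - dx] if 0 <= y - dy < h and 0 <= x - dx < w else 0
             for x in range(w)] for y in range(h)]


rng = random.Random(0)
prev_frame = [[rng.randrange(256) for _ in range(20)] for _ in range(20)]
cur_frame = shifted(prev_frame, 2, -1)
mv, best_sad, sads_computed = sea_search(prev_frame, cur_frame, 8, 8, 4, 3)
print(mv, best_sad, sads_computed, "of", (2 * 3 + 1) ** 2)
```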
Fast Exhaustive Block-Matching Algorithm • Search range = 2W + 1 • Total number of candidate blocks: (2W + 1)² • Probability of eliminating an invalid candidate block: SEA < PNSA < SAD • Computation load: SEA < PNSA < SAD • [Diagram: the candidate blocks pass through SEA, then PNSA, then SAD; the smallest SAD is kept and updated]
Feature Selection • Which block should be chosen for tracking? • A flat-colored block is not good. • A block in a region with a repeated pattern is not good. • Why is the “eye” a good candidate? It is a good tracking location because of the brightness difference between the black and skin colors. • How do we find a good feature block? • [Figure: example blocks, two captioned "Is that block good? No" and one captioned "It is a good block!"]
Feature Selection • Goal: find a good reference block for tracking. • Criteria: the candidate block should have a large SAD with its neighbors, and it should contain “complex” information. • Large SAD with neighboring blocks: • Prevents ambiguous detection • Speeds up the searching algorithm • Many candidate blocks are eliminated by the tree at the upper level • Complex block: • Prevents choosing a flat region as the reference block • Enhances the performance of PDE (Partial Distortion Elimination)
PDE (Partial Distortion Elimination) • X: candidate block; Y: feature block • Simple, with small overhead • The comparison can stop halfway: stop if the partial (sub-block) SAD between blocks X and Y is already larger than the previous minimum SAD. • Removes unnecessary computations efficiently. • If the feature block Y has high complexity, it will have a large SAD with block X, which increases the chance of stopping halfway. • We implement a simple feature selection algorithm based on the above criteria.
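The halfway stop can be sketched as follows. This is a minimal illustration, assuming the partial SAD is checked after each row; the slide does not specify the sub-block granularity, and the function name is hypothetical.

```python
def sad_with_pde(block_x, block_y, min_sad):
    """Accumulate the SAD row by row and stop as soon as the partial
    sum exceeds min_sad, the smallest SAD found so far.

    Returns (sad_or_partial_sum, rows_processed).
    """
    partial = 0
    rows_processed = 0
    for row_x, row_y in zip(block_x, block_y):
        partial += sum(abs(a - b) for a, b in zip(row_x, row_y))
        rows_processed += 1
        if partial > min_sad:
            break  # this candidate can no longer beat the current best
    return partial, rows_processed


candidate = [[0, 0], [0, 0]]
feature = [[10, 10], [10, 10]]
print(sad_with_pde(candidate, feature, min_sad=100))  # (40, 2): full SAD computed
print(sad_with_pde(candidate, feature, min_sad=15))   # (20, 1): stopped after one row
```

The earlier a good match is found, the smaller `min_sad` becomes and the sooner later candidates are rejected.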
Feature Selection • Divide the current frame into small rectangular blocks. • For each block, sum all the pixel values; the sum is denoted Ixy (intensity of the block). • Calculate the variance of each block, which represents the complexity of the block. • Apply a Laplacian mask to each block. • The Laplacian operator indicates how much the reference block differs from its neighbors: a flat background gives a small output, and a block dissimilar to its neighbors gives a large output. • Select the block that has the largest Ixy and a large variance as the feature block. • [Figure: Laplacian mask]
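The selection steps above might look like the sketch below. Several details are assumptions, since the slide's mask figure is not available: the common 3x3 Laplacian mask (8 at the center, -1 around it) is applied to the grid of block sums, "large variance" is modeled as a hypothetical threshold `min_variance`, and only blocks with 8 neighbors are considered.

```python
def select_feature_block(frame, size, min_variance):
    """Return the (block_x, block_y) index of the chosen block, or None."""
    rows, cols = len(frame) // size, len(frame[0]) // size
    sums = [[0] * cols for _ in range(rows)]
    variances = [[0.0] * cols for _ in range(rows)]
    for by in range(rows):
        for bx in range(cols):
            pixels = [frame[by * size + j][bx * size + i]
                      for j in range(size) for i in range(size)]
            total = sum(pixels)
            mean = total / len(pixels)
            sums[by][bx] = total  # block intensity Ixy
            variances[by][bx] = sum((p - mean) ** 2 for p in pixels) / len(pixels)

    best, best_score = None, -1
    for by in range(1, rows - 1):
        for bx in range(1, cols - 1):
            if variances[by][bx] < min_variance:
                continue  # flat block: too little detail to track
            neighbours = sum(sums[by + j][bx + i]
                             for j in (-1, 0, 1) for i in (-1, 0, 1)) - sums[by][bx]
            # Assumed mask: 8 * center - sum of the 8 neighbors
            score = abs(8 * sums[by][bx] - neighbours)
            if score > best_score:
                best, best_score = (bx, by), score
    return best


# Flat 6x6 frame with one textured 2x2 block in the middle
frame = [[10] * 6 for _ in range(6)]
frame[2][2], frame[2][3] = 0, 200
frame[3][2], frame[3][3] = 200, 0
print(select_feature_block(frame, size=2, min_variance=1.0))  # (1, 1)
```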
Adaptive Search Window • Conventional method: the search window is defined as a rectangle with the same center as the block in the previous frame, extended by W pixels in both directions. • [Figure labels: search window; block; center of search window]
Adaptive Search Window • Proposed method: the center of the search window is predicted from the previous displacement and the previous predicted displacement. • Example: if the previous motion vector is (1, 0), i.e. one pixel to the right, the predicted center of the search window can be the position next to the center of the previous block. • [Figure labels: search window; block; center of search window]
Adaptive Search Window • Motivation: • To increase the speed of the fast full search algorithm by searching the most probable optimal position first (this requires combining it with Spiral Scan) • To increase the chance of finding the true optimum point • Explained in the following slides
Conventional Search Window • We used a web camera to track the motion of an object and plotted its x-axis velocity against time. • Because the search window has a limited size, if the object moves too fast, the optimal position falls outside the search window and a detection error results. • Constraint: |Velocity| < W pixels/s, assuming the algorithm is run once every second.
Adaptive Search Window • Based on the previous optimal position and motion vector, we estimate the next optimal position, which becomes the center of the search window. • P: predicted displacement • P’: previous predicted displacement • L: learning factor, in the range [0.5, 1.0] • D: previous displacement
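The slide defines P, P’, L and D, but the prediction formula itself is not reproduced in the text. One update rule consistent with these definitions is exponential smoothing, P = L·D + (1 − L)·P’. The sketch below assumes that form; it is an illustration, not the confirmed formula, and the function names are hypothetical.

```python
def predict_displacement(d, p_prev, learning_factor):
    """Assumed rule: P = L * D + (1 - L) * P'.

    d: previous displacement D as (dx, dy)
    p_prev: previous predicted displacement P' as (dx, dy)
    learning_factor: L, expected in [0.5, 1.0]
    """
    l = learning_factor
    return (l * d[0] + (1 - l) * p_prev[0],
            l * d[1] + (1 - l) * p_prev[1])


def search_window_center(prev_position, predicted):
    """Place the search window center at the previous optimal position
    plus the rounded predicted displacement."""
    return (prev_position[0] + round(predicted[0]),
            prev_position[1] + round(predicted[1]))


p = predict_displacement(d=(4, 2), p_prev=(2, 0), learning_factor=0.5)
print(p)                                  # (3.0, 1.0)
print(search_window_center((10, 10), p))  # (13, 11)
```

With L = 1.0 the prediction equals the previous displacement, which matches the slide example where a motion vector of (1, 0) moves the window center one pixel to the right.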
Adaptive Search Window • With the adaptive search window method, the relative velocity falls within the range [-20, 20]; all true optimum points fall inside the search window, so no serious detection error results. • Relative velocity = actual displacement - expected displacement • Constraint: |Acceleration (relative velocity)| < W pixels/s, where W is the search range, assuming the algorithm is run once every second.
Raster Scan Method • Conventional block scanning method: • When we use the previous block to find the best match in the current frame, we first calculate the Sum of Absolute Differences (SAD) between the previous block and the current block at the top-left position of the search window, then scan from top to bottom and from left to right. • Simple to implement • Small code overhead • [Figure: search window in which each number represents the priority of a current block in block matching]
Spiral Scan Method • Proposed block scanning method • Observation: • The order of scanning can affect the time needed to reach the optimum candidate block. • When the SEA, PPNM or PDE methods are used, this affects the amount of computation. • When the adaptive search window method is used, the motion vectors are center-biased. • Objective: • Search for the motion vector around the center of the search window first. • There is a higher chance of meeting the optimal position earlier, so the algorithm runs faster.
Spiral Scan Method • First find the SAD at the center of the search window. • Then find the SAD at the positions that are n pixels away from the center, for n in [1, BW]. • [Figure: search window]
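A center-first scan order can be generated as in the sketch below. It assumes that "n pixels away" means the square ring at distance n from the center (the maximum of |dx| and |dy|); the parameter name `radius` and the direction of travel around each ring are choices made for this illustration.

```python
def spiral_order(radius):
    """Return displacements (dx, dy): the center first, then the square
    rings at distance n = 1 .. radius, each walked clockwise from its
    top-left corner."""
    order = [(0, 0)]
    for n in range(1, radius + 1):
        for dx in range(-n, n + 1):          # top edge, left to right
            order.append((dx, -n))
        for dy in range(-n + 1, n + 1):      # right edge, top to bottom
            order.append((n, dy))
        for dx in range(n - 1, -n - 1, -1):  # bottom edge, right to left
            order.append((dx, n))
        for dy in range(n - 1, -n, -1):      # left edge, bottom to top
            order.append((-n, dy))
    return order


for position in spiral_order(1):
    print(position)
```

Each ring contributes 8n positions, so a radius of r covers all (2r + 1)² candidates exactly once, the same set as a raster scan but visited center-first.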
Spiral Scan Method • Proposed block scanning method • Result: • Requires more memory: if fast block-sum calculation is used together with spiral scan, the whole 2D block-sum array needs to be stored. • Slightly larger code overhead • No significant degradation in speed: spiral scan complexity is O(DX²), while SAD complexity is O(DX² · BW²). • The speed of the algorithm improves significantly when it is used with the adaptive search window method: about 2-3 times speed-up in real-time motion tracking.
Static Frame Motion Tracking Result • [Table: time required to find the motion vector at different regions using different algorithms; each algorithm is run 5 times on a 2.0 GHz CPU] • Optimal motion vector = (-5, 12), previous motion vector = (-2, 4); these values affect the Spiral Scan and Adaptive Search Window algorithms. • This only illustrates that the speed can be improved with our final algorithm; a more general measurement is presented on the next slide.
Real-Time Motion Tracking Result • The adaptive spiral method reduces the average distance between the search window’s center and the optimum block’s position, and thus improves the speed of the algorithm (as illustrated on the previous slide). • [Figure labels: Average Velocity/Distance ~ 15 pixels; Average Velocity/Distance ~ 6 pixels]
Summary • [Flowchart] Video source captured by camera → Source Frames Extractor → frame t and, through a delay, frame t-1 • Already selected feature block? • No: Feature Selection; a feature block is selected as the reference block. • Yes: block-matching algorithm using the two image frames. • Output: MV of the reference block → Transmitter (e.g. Bluetooth) → Server Application
Contribution • Proposed a method to improve the block matching algorithm for our application: the Adaptive Spiral Method • Improves performance • Requires more memory • New combination: Adaptive Spiral SEA PPNM PDE SAD algorithm
Testing Platform on Windows • To test the performance of our algorithms, we have written a GUI program using Windows MFC and the OpenCV library.
Testing Platform on Symbian • Finally, we built an application on Symbian as a testing platform to further test and fine-tune our algorithm. • It is ready for other applications to build on top of it and use the motion tracking result directly.
Simple Applications Finished • A “pong” game written in C#, played on Windows using a web camera as the input device • A “pong” game written for Symbian, played on a Symbian phone using the onboard camera
Future Work • Further improve the block matching algorithm using a hierarchical method • Study and implement algorithms to detect the rotation angle • Develop a virtual mouse application • Develop a multiplayer game • Build a motion tracking API on Symbian