EE 492 ENGINEERING PROJECT

EE 492 ENGINEERING PROJECT LIP TRACKING Yusuf Ziya Işık & Ashat Turlibayev Advisor: Prof. Dr. Bülent Sankur

Outline • IDENTIFICATION OF THE PROBLEM • LIP CONTOUR EXTRACTION • LIP TRACKING • RESULTS AND CONCLUSION • FUTURE WORK

IDENTIFICATION OF THE PROBLEM • Automatic Speech Recognition (ASR) systems 1. Systems Using Only Acoustic Information - Poor performance in noisy environments 2. Bimodal Audio-Visual Systems - The visual signal often contains information that is complementary to the audio information - Visual information is not affected by acoustic noise - The overall performance of the combined system is better

Recognition ratio of audio, visual and audio-visual approaches

LIP READING • Obtaining the visual information is known as the lip reading problem. • Lip tracking is a crucial step in extracting visual features.

LIP TRACKING • The lip tracking problem can be solved in 2 steps: • Extracting the lip boundary in the first frame with the help of the user • Tracking the obtained contour through the subsequent frames automatically

Lip Contour Extraction • Fully automatic segmentation is a very difficult task. • Semi-automatic methods are therefore both necessary and desirable. • Intelligent Scissors is a robust, accurate, and interactive semi-automatic boundary extraction tool which requires minimal user input.

Intelligent Scissors I • The Intelligent Scissors tool extracts an object’s contour from several seed points specified interactively by the user. • The Intelligent Scissors algorithm converts object boundary extraction into an optimal path search problem in a weighted graph.

Obtaining Weighted Graph • Weighted Graph: A local cost is calculated from every pixel in the image to each of its neighbouring pixels. • Local Cost Functionals: - Laplacian zero crossing - Gradient Magnitude - Gradient Direction • Pixels that exhibit strong edge features are made to have low local costs.
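The idea that strong edges receive low local costs can be illustrated with a minimal sketch. It covers only the gradient magnitude term; the Laplacian zero crossing and gradient direction terms are left out, and the function names, the central-difference gradient and the normalisation `1 - G / max(G)` are assumptions made for illustration, not details taken from the slides.

```python
import math

def gradient_magnitude(img):
    """Central-difference gradient magnitude of a 2-D list of grey levels."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Clamp indices at the border so every pixel has two neighbours.
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            mag[y][x] = math.hypot(gx, gy)
    return mag

def gradient_cost(img):
    """Map gradient magnitude to a cost in [0, 1]: strong edge -> low cost."""
    mag = gradient_magnitude(img)
    peak = max(max(row) for row in mag)
    if peak == 0:
        # Flat image: no edges anywhere, so every pixel is equally expensive.
        return [[1.0] * len(row) for row in mag]
    return [[1.0 - g / peak for g in row] for row in mag]

# Toy image: dark left half, bright right half (a vertical edge in the middle).
image = [[0, 0, 10, 10] for _ in range(4)]
cost = gradient_cost(image)
```

In this toy image the two columns next to the intensity step get the lowest cost, so a minimal-cost path would prefer to run along them.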

Optimal Path Selection • User Interaction: Seed points are specified on the image after all local costs are calculated. • Contour = Minimal Cost Path: The optimal path from every pixel in the image to the seed point is determined by using Dijkstra’s algorithm.
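The minimal-cost path search can be sketched as follows. This is a simplified illustration, assuming an 8-connected pixel grid where the cost of a step is the local cost of the pixel being entered; the function names `shortest_paths_to_seed` and `live_wire` are hypothetical, and the link costs of the actual tool (which also include direction terms) are not reproduced.

```python
import heapq

def shortest_paths_to_seed(cost, seed):
    """Dijkstra's algorithm expanded from `seed` over an 8-connected grid.

    cost[y][x] is the (non-negative) local cost of stepping onto pixel (y, x).
    Returns a dict mapping each pixel to the next pixel on its cheapest
    path towards the seed (None for the seed itself).
    """
    h, w = len(cost), len(cost[0])
    dist = {seed: 0.0}
    parent = {seed: None}
    heap = [(0.0, seed)]
    while heap:
        d, (y, x) = heapq.heappop(heap)
        if d > dist[(y, x)]:
            continue  # stale queue entry, a cheaper path was already found
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if (dy, dx) == (0, 0) or not (0 <= ny < h and 0 <= nx < w):
                    continue
                nd = d + cost[ny][nx]
                if nd < dist.get((ny, nx), float("inf")):
                    dist[(ny, nx)] = nd
                    parent[(ny, nx)] = (y, x)
                    heapq.heappush(heap, (nd, (ny, nx)))
    return parent

def live_wire(parent, free_point):
    """Follow the parent pointers from the free point back to the seed."""
    path = []
    node = free_point
    while node is not None:
        path.append(node)
        node = parent[node]
    return path

# Toy cost map: the middle column is a cheap "edge", everything else is costly.
cost = [[1.0, 0.1, 1.0] for _ in range(4)]
parent = shortest_paths_to_seed(cost, seed=(0, 1))
on_edge = live_wire(parent, (3, 1))   # free point already on the edge
off_edge = live_wire(parent, (3, 0))  # free point next to the edge
```

Because the paths to the seed are computed once for all pixels, redrawing the path for a new cursor position only requires following the parent pointers. In the toy example the path from the off-edge pixel joins the cheap column after one step.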

Live-Wire Tool • Live-Wire Tool: As the user moves the mouse, the optimal path from the free point to the seed point is displayed. • Property of the ‘live-wire’: If the cursor comes close to an edge, the ‘live-wire’ snaps to the object boundary. • Extracting the Contour: When a new seed point is specified, the live wire from this point to the previous seed point is taken as a segment of the contour.

Extraction of a Lip Contour Using Intelligent Scissors • At every move of the mouse, the previous ‘live-wire’ is deleted and a new one, beginning at the current position of the cursor and ending at the seed point, is displayed.

Extraction of Outer Boundaries of Lena and a Lip Image Using Intelligent Scissors

LIP TRACKING • Method 1: Non-Rigid Object Tracking Algorithm • Method 2: Tracking with “Intelligent Scissors” • Method 3: Active Shape Models

Non-Rigid Object Tracking

Results of Non-Rigid Object Tracking • Esra-8 Video Sequence • Aysel-0 Video Sequence • Esra-6 Video Sequence

Evaluation of Algorithm (figure panels: Color Edge, Color Segmentation; Frame 67, Frame 68)

Remarks • The overall performance of the algorithm is satisfactory. • Advantage: Ability to track the lips through a large number of frames. • Drawback: The long computation time of this algorithm in closed-loop mode makes it inappropriate for accurate tracking in real-time applications.

Lip Tracking Using Intelligent Scissors Motivations: • A desire to obtain a more accurate and faster lip tracking tool. • Intelligent Scissors may easily be extended from lip segmentation to lip tracking.

Lip Tracking Using Intelligent Scissors • Seed points from the first frame are tracked into the following frames, and the contour of the lip is then extracted automatically by using Intelligent Scissors. • Suitable seed points are located by using a priori information about the lip image. • Used Features: • Gradient Magnitude • Hue Value • Distance between successive seed points

Gradient Magnitude Feature • The lip region has a larger gradient magnitude than its surrounding region. • The N points with the highest gradient magnitudes (N << M×M, where M is the search range) are seed candidates.

Hue Values • The hue value is very useful for separating the boundary from the inner lip regions. • Hue triple: In addition to the hue of the seed point that is going to be tracked, the hues of the neighbours that are p pixels above and below the current point are calculated. • Selected Seed Point: From the N points having the largest gradients, the one whose hue triple is the most similar to the previous seed’s triple is selected.
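The two-stage seed selection (gradient candidates, then hue matching) can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, M is taken to be an odd window size centred on the previous seed, and similarity is measured as a sum of squared hue differences that ignores the circular nature of hue. The slides do not specify the similarity measure.

```python
def hue_triple(hue, y, x, p):
    """Hues p pixels above, at, and p pixels below (y, x), clamped at borders."""
    h = len(hue)
    return (hue[max(y - p, 0)][x], hue[y][x], hue[min(y + p, h - 1)][x])

def track_seed(grad, hue, prev_seed, prev_triple, M, N, p):
    """Pick the new seed position inside an M x M window around prev_seed."""
    py, px = prev_seed
    h, w = len(grad), len(grad[0])
    r = M // 2
    window = [(y, x)
              for y in range(max(py - r, 0), min(py + r, h - 1) + 1)
              for x in range(max(px - r, 0), min(px + r, w - 1) + 1)]
    # Stage 1: keep the N pixels with the largest gradient magnitude.
    candidates = sorted(window, key=lambda q: grad[q[0]][q[1]], reverse=True)[:N]

    # Stage 2: among them, choose the one whose hue triple best matches
    # the triple of the seed in the previous frame.
    def mismatch(q):
        triple = hue_triple(hue, q[0], q[1], p)
        return sum((a - b) ** 2 for a, b in zip(triple, prev_triple))

    return min(candidates, key=mismatch)

# Toy 5 x 5 frame with three strong-gradient pixels; only (1, 3) has the
# same hue pattern as the previous seed.
grad = [[0, 0, 0, 0, 0],
        [0, 9, 0, 8, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 7, 0, 0],
        [0, 0, 0, 0, 0]]
hue = [[0, 0, 0, 10, 0],
       [0, 0, 0, 20, 0],
       [0, 0, 0, 30, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0]]
new_seed = track_seed(grad, hue, prev_seed=(2, 2),
                      prev_triple=(10, 20, 30), M=5, N=3, p=1)
```

In the toy frame the pixel with the largest gradient, (1, 1), is rejected because its hue triple does not match, which is the role the hue feature plays in the method.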

The Distance Between Seed Points • The relative position of the seed points is very important during tracking. The Intelligent Scissors tool gives wrong results if they get too close to or too far away from each other. • In the figure above, the search range of seed point s2 in the following frame is shown.

Result • Result of the “Tracking Using Intelligent Scissors” method applied to the 20-frame lip sequence

Active Shape Models Motivations: • Lip tracking is a specific case of the general object tracking problem. Therefore, taking into account knowledge about the shape of the lip will increase the performance of a tracker. • Active Shape Models may be used for lip tracking on their own, as well as for complementing a tracker based on Intelligent Scissors and correcting its errors.

Lip Training Set • The shape of a lip is represented by a set of n 2-D points: x={x1,x2,x3,...,xn,y1,y2,y3,...,yn} • If there are s training examples in a set, the corresponding s vectors are constructed and brought into the same coordinate frame.

Active Shape Models I • Shape Model: We look for a parametric model x=M(b), where b is the vector of model parameters. • Principal Component Analysis: Helps to reduce the dimensionality of the data. • Covariance matrix S of shape vectors:
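The formula for S did not survive on this slide. For reference, the form commonly used in the Active Shape Model literature is given below; it is assumed, not confirmed, that the slide used the same normalisation. Here x_i are the s aligned training shape vectors and \bar{x} denotes their mean.

```latex
\bar{x} = \frac{1}{s} \sum_{i=1}^{s} x_i ,
\qquad
S = \frac{1}{s-1} \sum_{i=1}^{s} (x_i - \bar{x})(x_i - \bar{x})^{T}
```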

Active Shape Models II • Eigenlips: The eigenvectors (φi) of S are computed and the corresponding eigenvalues (λi) are determined. • The matrix Φ is formed, which contains the t eigenvectors corresponding to the t largest eigenvalues. Hence: • New Lip Shapes: By changing the components of the vector b in a controlled way, we may obtain new plausible lip shapes.
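The relation that followed "Hence" is missing from the extracted slide. In the standard Active Shape Model formulation, which is assumed here, a shape is approximated by the mean training shape \bar{x} plus a weighted sum of the t retained eigenvectors. The limit of three standard deviations on each component of b is a common convention for keeping generated shapes plausible, not a value taken from the slides.

```latex
S \varphi_i = \lambda_i \varphi_i ,
\qquad
\Phi = (\varphi_1 \;|\; \varphi_2 \;|\; \dots \;|\; \varphi_t)

x \approx \bar{x} + \Phi b ,
\qquad
b = \Phi^{T} (x - \bar{x}) ,
\qquad
|b_i| \le 3 \sqrt{\lambda_i}
```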

Applications of Active Shape Models 1. Determining Visemes of a Language 2. Increasing Robustness of any Tracking Algorithm 3. If the shape model of an object is extracted a priori: i) To locate the object in the image ii) To track that object through an image sequence

Visemes of a Language • Determining the viseme of each letter: Using Active Shape Models, the parameter vector b of the lip shape corresponding to a letter of a language is obtained. • Benefits to Speech Recognition: Parameter vectors obtained from an image sequence may be fused with acoustic information, thus increasing the recognition rate.

Contribution of EigenLips to Lip Tracking Algorithms • Lip tracking algorithms may give wrong lip contours for frames far from the first frame. • The shape vector of a wrong lip x’ is projected into the shape space: • Distribution of the parameter vector b: • If p(b’) is larger than a given threshold, the contour is accepted as correct. • If p(b’) is smaller, then the closest b vector is assigned to the lip, thus correcting the wrong boundary.
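The accept-or-correct step can be sketched as follows. The formulas for the projection and for p(b) are missing from the extracted slide, so this sketch rests on assumptions: the projection is taken to be b’ = Φᵀ(x’ − mean shape), p(b) is assumed Gaussian with independent components of variance λi (so thresholding p(b’) is equivalent to thresholding the Mahalanobis distance), and an implausible b’ is rescaled back onto the threshold surface. The rescaling is one common choice and may differ from the "closest b vector" rule the authors used. All function names and the default `d_max=3.0` are hypothetical.

```python
import math

def project(x, mean, phi):
    """Shape parameters b = Phi^T (x - mean); phi is a list of eigenvectors."""
    diff = [xi - mi for xi, mi in zip(x, mean)]
    return [sum(p * d for p, d in zip(vec, diff)) for vec in phi]

def reconstruct(b, mean, phi):
    """Shape vector x = mean + Phi b."""
    x = list(mean)
    for bi, vec in zip(b, phi):
        for k, p in enumerate(vec):
            x[k] += bi * p
    return x

def correct_shape(x, mean, phi, lambdas, d_max=3.0):
    """Accept x if its parameters are plausible, otherwise pull them back."""
    b = project(x, mean, phi)
    # Mahalanobis distance of b under the assumed Gaussian model.
    d = math.sqrt(sum(bi * bi / lam for bi, lam in zip(b, lambdas)))
    if d > d_max:
        # Implausible shape: shrink b onto the d = d_max surface.
        b = [bi * d_max / d for bi in b]
    return reconstruct(b, mean, phi), b

# Toy model: 2 landmarks (4 coordinates), 2 retained modes.
mean = [0.0, 0.0, 0.0, 0.0]
phi = [[1.0, 0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0, 0.0]]
lambdas = [4.0, 1.0]

fixed, b_fixed = correct_shape([12.0, 0.0, 5.0, 0.0], mean, phi, lambdas)
kept, b_kept = correct_shape([2.0, 1.0, 0.0, 0.0], mean, phi, lambdas)
```

In the toy example the first contour lies six standard deviations along the first mode and is pulled back to three, while the second contour is already plausible and is returned unchanged. Note that projection also discards any part of the contour that the retained modes cannot represent.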

Conclusion I • “Intelligent Scissors” is an interactive semi-automatic image segmentation tool. • It may be used for extracting the initial lip boundary as well as for tracking that boundary through an image sequence.

Conclusion II • Non-Rigid Object Tracking Algorithm - High time complexity - Tracking through a large number of frames • Tracking with Intelligent Scissors - More accurate results - Low time complexity - Tracking through a small number of frames

Future Works • Active Shape Models • The library of lip shapes was obtained • Viseme group for Turkish language • Correction of wrong contours • Extraction & Tracking of contours

Future Works II • The method of “Lip Tracking Using Intelligent Scissors” may be made more robust by imposing a shape constraint. • Given an image, the region of the lip may be located by using Shape Models. • A lip tracking system which is fully based on Active Shape Models may be developed.
