
Real-Time Vision-Based Gesture Recognition Using Haar-like Features

By: Qing Chen, Nicolas D. Georganas and Emil M. Petriu

IMTC 2007, Warsaw, Poland, May 1-3, 2007


Outline

  • 1. Introduction

  • 2. Two-level Approach

  • 3. Posture Recognition

  • 4. Gesture Recognition

  • 5. Conclusions


1. Introduction

  • Human-Virtual Environment (VE) interaction requires utilizing different modalities (e.g. speech, body position, hand gestures, haptic response, etc.) and integrating them together for a more immersive user experience.

  • Hand gestures are an intuitive yet powerful communication modality that has not been fully explored for H-VE interaction.

  • The latest computer vision and image processing techniques make real-time vision-based hand gesture recognition feasible for human-computer interaction.

  • A vision-based hand gesture recognition system needs to meet requirements for real-time performance, robustness and recognition accuracy.


1. Introduction (cont’d)

  • Vision-based gesture recognition techniques can be divided into two categories:

  • Appearance-based approaches:
    • Pros: simple hand models; efficient implementation; real-time performance is easier to achieve.
    • Cons: limited capability to model 3D hand gestures.
    • We choose this approach to achieve real-time performance.

  • 3D hand model-based approaches:
    • Pros: potential to model more natural hand gestures.
    • Cons: complex hand model; real-time performance is difficult to achieve; user-dependent.


2. Two-level Approach

  • Definition 1 (Posture/Pose) A posture or pose is defined solely by the (static) hand configurations and hand locations.

  • Definition 2 (Gesture) A gesture is a series of postures over a time span connected by motions (global hand motion and local finger motion). (A minimal data-structure sketch of these two definitions follows.)
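
For illustration only (not from the slides), these two definitions map naturally onto a small data structure: a posture is a single label from a fixed vocabulary, and a gesture is an ordered sequence of such labels. A minimal Python sketch using the four postures introduced later in the paper:

    from enum import Enum
    from typing import List

    class Posture(Enum):
        # Static hand configurations: the lower-level primitives (Definition 1).
        TWO_FINGER = "two_finger"
        PALM = "palm"
        FIST = "fist"
        LITTLE_FINGER = "little_finger"

    # A gesture is a series of postures observed over a time span (Definition 2).
    Gesture = List[Posture]

    example_gesture: Gesture = [Posture.PALM, Posture.FIST, Posture.PALM]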


2. Two-level Approach (cont’d)

  • Given the hierarchical nature of these definitions, it is natural to decouple the gesture classification problem into two levels (see the sketch below):

    • Lower-level: recognition of primitives (postures);

      • Solution: Viola and Jones algorithm

    • Higher-level: recognition of structure (gesture);

      • Solution: Grammar-based analysis

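
A rough sketch of this decoupling (hypothetical Python, not the authors' code; the two stub functions stand in for the Viola & Jones posture detector and the grammar-based gesture analyzer described in the following sections):

    from typing import Iterable, List, Optional

    def recognize_posture(frame) -> Optional[str]:
        # Lower level: classify one video frame into a posture primitive
        # (in the paper this is done with Viola & Jones cascade classifiers).
        ...

    def recognize_gesture(postures: List[str]) -> Optional[str]:
        # Higher level: parse the posture sequence with grammar-based
        # syntactic analysis and return the composite gesture, if any.
        ...

    def process_video(frames: Iterable) -> Optional[str]:
        # Glue: collect one posture label per frame, then analyze the sequence.
        sequence = [p for p in (recognize_posture(f) for f in frames) if p is not None]
        return recognize_gesture(sequence)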


3. Posture Recognition

  • Viola and Jones Algorithm (2001):

    • A statistical approach originally for the task of human face detection and tracking.

    • 15 times faster than any previous face detection approach while achieving accuracy equivalent to the best published results.

    • Employed 3 techniques:

      • Haar-like features

      • Integral image

      • AdaBoost learning algorithm

    • Issues for hand postures:

      • Applicability

      • Classification besides detection

      • Selection of posture sets

      • Calibration


3. Posture Recognition (cont’d)

  • Haar-like features:

  • The value of a Haar-like feature:

    f(x) = Σ (pixel gray levels in the black rectangle) − Σ (pixel gray levels in the white rectangle)

  • Compared with raw pixels, Haar-like features can reduce the in-class variability and increase the out-of-class variability, making classification easier (see the sketch below).

Figure 1: The set of basic Haar-like features.

Figure 2: The set of extended Haar-like features.
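
To make the feature value concrete, here is a small sketch (assumed NumPy code, not from the paper) that evaluates a simple two-rectangle Haar-like feature with the “black” rectangle on the left half and the “white” rectangle on the right half; the patch size and feature position are arbitrary:

    import numpy as np

    def two_rect_feature(img: np.ndarray, x: int, y: int, w: int, h: int) -> float:
        # f(x) = sum over the black rectangle minus sum over the white rectangle.
        black = img[y:y + h, x:x + w // 2].sum()          # left half: "black"
        white = img[y:y + h, x + w // 2:x + w].sum()      # right half: "white"
        return float(black - white)

    # Evaluate the feature on a random 24 x 24 grayscale patch.
    patch = np.random.randint(0, 256, size=(24, 24))
    print(two_rect_feature(patch, x=4, y=6, w=8, h=10))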


3. Posture Recognition (cont’d)

Figure: rectangles A, B, C and D on an integral image, with corner points P1, P2, P3, P4 and a point P(x, y).

  • The rectangular Haar-like features can be computed rapidly using the “integral image”.

  • The integral image at location (x, y) contains the sum of the pixel values above and to the left of (x, y), inclusive: ii(x, y) = Σ_(x′ ≤ x, y′ ≤ y) i(x′, y′), where i is the original image.

  • The sum of the pixel values within rectangle “D” can be computed as P1 + P4 − P2 − P3 (sketched below).
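
A brief sketch of the integral-image computation and the four-lookup rectangle sum (assumed NumPy code; the image size and rectangle coordinates are illustrative):

    import numpy as np

    def integral_image(img: np.ndarray) -> np.ndarray:
        # ii(x, y) = sum of the pixels above and to the left of (x, y), inclusive.
        return img.cumsum(axis=0).cumsum(axis=1)

    def rect_sum(ii: np.ndarray, top: int, left: int, bottom: int, right: int) -> int:
        # Sum of img[top:bottom+1, left:right+1] from four integral-image lookups
        # (the P1 + P4 - P2 - P3 computation on the slide).
        total = ii[bottom, right]                     # P4 (bottom-right corner)
        if top > 0:
            total -= ii[top - 1, right]               # P2 (above the rectangle)
        if left > 0:
            total -= ii[bottom, left - 1]             # P3 (left of the rectangle)
        if top > 0 and left > 0:
            total += ii[top - 1, left - 1]            # P1 (added back once)
        return int(total)

    img = np.random.randint(0, 256, size=(240, 320))
    ii = integral_image(img)
    assert rect_sum(ii, 10, 20, 30, 40) == img[10:31, 20:41].sum()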


3. Posture Recognition (cont’d)

  • To detect the hand, the image is scanned by a sub-window containing a Haar-like feature.

  • Based on each Haar-like feature fj, a weak classifier hj(x) is defined as: hj(x) = 1 if pj fj(x) < pj θj, and 0 otherwise, where x is a sub-window, θj is a threshold and pj is a parity indicating the direction of the inequality sign.
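
A direct transcription of this definition into Python (a sketch; the feature value, threshold and parity below are made-up numbers):

    def weak_classifier(feature_value: float, theta: float, parity: int) -> int:
        # h_j(x) = 1 if p_j * f_j(x) < p_j * theta_j, else 0.
        # parity (p_j) is +1 or -1 and flips the direction of the inequality.
        return 1 if parity * feature_value < parity * theta else 0

    print(weak_classifier(feature_value=12.0, theta=20.0, parity=+1))  # 1: feature below threshold
    print(weak_classifier(feature_value=12.0, theta=20.0, parity=-1))  # 0: inequality reversed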


3. Posture Recognition (cont’d)

  • In machine vision:

    • HARD to find a single accurate classification rule;

    • EASY to find rules with classification accuracy slightly better than 50% (weak classifiers).

    • AdaBoost (Adaptive Boosting) is an iterative algorithm that improves the accuracy stage by stage based on a series of weak classifiers.

    • Adaptive: later classifiers are tuned in favor of the samples misclassified by previous classifiers.


3. Posture Recognition (cont’d)

  • AdaBoost starts with a uniform distribution of “weights” over the training examples. The weights tell the learning algorithm the importance of each example.

  • Obtain a weak classifier from the weak learning algorithm, hj(x).

  • Increase the weights on the training examples that were misclassified.

  • (Repeat)

  • At the end, form a weighted linear combination of the weak classifiers obtained at all iterations (sketched below).
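
The steps above can be sketched as a tiny AdaBoost loop over decision stumps (a simplified illustration, not the paper's implementation; it assumes the Haar-like feature values are precomputed into a matrix F, and uses ±1 labels and weak classifiers rather than the 1/0 form shown earlier):

    import numpy as np

    def adaboost(F: np.ndarray, y: np.ndarray, n_rounds: int = 10):
        # F: (n_samples, n_features) precomputed feature values; y: labels in {+1, -1}.
        n_samples, n_features = F.shape
        w = np.full(n_samples, 1.0 / n_samples)   # start with uniform weights
        strong = []                               # the selected weak classifiers
        for _ in range(n_rounds):
            best = None
            # Pick the stump (feature, threshold, parity) with the lowest
            # weighted error under the current weights.
            for j in range(n_features):
                for theta in np.unique(F[:, j]):
                    for parity in (+1, -1):
                        pred = np.where(parity * F[:, j] < parity * theta, 1, -1)
                        err = w[pred != y].sum()
                        if best is None or err < best[0]:
                            best = (err, j, theta, parity, pred)
            err, j, theta, parity, pred = best
            alpha = 0.5 * np.log((1.0 - err) / max(err, 1e-10))
            w *= np.exp(-alpha * y * pred)        # raise the weights of misclassified examples
            w /= w.sum()
            strong.append((j, theta, parity, alpha))
        return strong                             # weighted combination of weak classifiers

    def strong_classify(strong, f) -> int:
        score = sum(alpha * (1 if parity * f[j] < parity * theta else -1)
                    for j, theta, parity, alpha in strong)
        return 1 if score >= 0 else -1

    # Toy usage: 6 samples, 2 feature columns.
    F = np.array([[1.0, 5.0], [2.0, 4.0], [3.0, 1.0], [6.0, 2.0], [7.0, 8.0], [8.0, 9.0]])
    y = np.array([1, 1, 1, -1, -1, -1])
    model = adaboost(F, y, n_rounds=3)
    print([strong_classify(model, f) for f in F])  # [1, 1, 1, -1, -1, -1]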


3. Posture Recognition (cont’d)

  • A series of classifiers are applied to every sub-window.

  • The first classifier:

    • Eliminates a large number of negative sub-windows;

    • Passes almost all positive sub-windows (at the cost of a high false positive rate) with very little processing.

  • Subsequent layers eliminate additional negative sub-windows (passed by the first classifier) but require more computation.

  • After several stages of processing, the number of negative sub-windows has been reduced radically (see the sketch below).
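
A sketch of the cascade's early-rejection logic (hypothetical stage functions and thresholds; each "stage" here is just a callable returning a score for a sub-window):

    from typing import Callable, Sequence

    def cascade_accepts(stages: Sequence[Callable[[object], float]],
                        thresholds: Sequence[float],
                        window: object) -> bool:
        # Run the sub-window through each stage in turn; reject as soon as one
        # stage's score falls below its threshold, so most negative sub-windows
        # are discarded cheaply by the early stages.
        for stage, thresh in zip(stages, thresholds):
            if stage(window) < thresh:
                return False          # rejected: later stages are never evaluated
        return True                   # survived every stage: report a detection

    # Toy usage with two "stages" that score a numeric window value.
    stages = [lambda wnd: wnd * 1.0, lambda wnd: wnd - 3.0]
    print(cascade_accepts(stages, thresholds=[0.5, 0.5], window=5))  # True
    print(cascade_accepts(stages, thresholds=[0.5, 0.5], window=0))  # False (rejected at stage 1)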


3. Posture Recognition (cont’d)

  • Four hand postures have been tested with the Viola & Jones algorithm:

  • Input device: a low-cost Logitech QuickCam web camera with a resolution of 320 × 240 at up to 15 frames per second.


3. Posture Recognition (cont’d)

  • Training samples collection:

    • Negative samples: images that must not contain object representations. We collected 500 random images as negative samples.

    • Positive samples: hand posture images collected from human hands or generated with a 3D hand model. For each posture, we collected around 450 positive samples. As an initial test, we used a white wall as the background.


3. Posture Recognition (cont’d)

  • After the training process based on the AdaBoost learning algorithm, we obtain a cascade classifier for each hand posture once the required accuracy is achieved:

    • “Two-finger” posture: 15-stage cascade classifier;

    • “Palm” posture: 10-stage cascade classifier;

    • “Fist” posture: 15-stage cascade classifier;

    • “Little finger” posture: 14-stage cascade classifier.

  • The performance of the trained classifiers on 100 test images:


3. Posture Recognition (cont’d)

  • To recognize these different hand postures, a parallel structure that includes all of the cascade classifiers is implemented:
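
For illustration, such a parallel structure could be wired up with OpenCV's CascadeClassifier; this is a sketch rather than the authors' implementation, and the XML file names are hypothetical stand-ins for the four trained posture cascades. A frame is labeled with the first posture whose cascade fires:

    import cv2

    # Hypothetical file names for the four trained posture cascades.
    CASCADE_FILES = {
        "two_finger": "two_finger_cascade.xml",
        "palm": "palm_cascade.xml",
        "fist": "fist_cascade.xml",
        "little_finger": "little_finger_cascade.xml",
    }
    cascades = {name: cv2.CascadeClassifier(path) for name, path in CASCADE_FILES.items()}

    def classify_posture(frame):
        # Run every posture cascade on the frame; return (label, bounding box)
        # for the first cascade that detects something, else (None, None).
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        for name, cascade in cascades.items():
            boxes = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
            if len(boxes) > 0:
                return name, boxes[0]
        return None, None

    # Grab one frame from a webcam, as in the paper's 320 x 240 setup.
    cap = cv2.VideoCapture(0)
    ok, frame = cap.read()
    if ok:
        print(classify_posture(frame))
    cap.release()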


3. Posture Recognition (cont’d)

  • The real-time performance of the posture recognition:


4. Gesture Recognition

  • As a gesture is a series of postures, grammar-based syntactic analysis is well suited to describing composite gestures in terms of postures, enabling the system to recognize gestures from their representations (a minimal sketch follows at the end of this slide).

  • For pattern recognition, a grammar G = (N, T, P, S) consists of:

    • A finite set N of non-terminal symbols;

    • A finite set T of terminal symbols that is disjoint from N;

    • A finite set P of production rules;

    • A distinguished symbol S ∈ N that is the start symbol.

  • Issues in modeling the structure of hand gestures:

    • Choice of basic primitives

    • Choice of an appropriate grammar type (context-free, stochastic context-free, regular, HMM)
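
As a minimal illustration of grammar-based gesture description (a hypothetical regular grammar over the four posture primitives; the production rules and gesture names below are invented, not the paper's), a composite gesture can be recognized by matching the recognized posture sequence against rule patterns:

    import re

    # Terminal symbols: one character per posture primitive.
    TERMINALS = {"two_finger": "T", "palm": "P", "fist": "F", "little_finger": "L"}

    # Hypothetical production rules written as regular expressions over terminals,
    # e.g. "grab" = one or more palms followed by one or more fists.
    GESTURE_RULES = {
        "grab": re.compile(r"P+F+"),
        "release": re.compile(r"F+P+"),
    }

    def recognize_gesture(postures):
        # Map a recognized posture sequence to a gesture name, if any rule matches.
        sentence = "".join(TERMINALS[p] for p in postures)
        for gesture, rule in GESTURE_RULES.items():
            if rule.fullmatch(sentence):
                return gesture
        return None

    print(recognize_gesture(["palm", "palm", "fist"]))  # grab
    print(recognize_gesture(["fist", "palm"]))          # release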


5. Conclusions

  • The parallel cascade structure based on Haar-like features and the AdaBoost learning algorithm achieves satisfactory real-time hand posture classification results;

  • The experimental results show that the Viola and Jones algorithm is robust to scale changes and has a certain degree of robustness against in-plane rotation (±15°) and out-of-plane rotation;

  • The Viola and Jones algorithm also performs well under different illumination conditions, but poorly against different backgrounds;

  • A two-level architecture that captures the hierarchical nature of gesture classification is proposed: the lower level focuses on posture recognition, while the higher level focuses on describing composite gestures using grammar-based syntactic analysis.


Dziękuję (Thank you)

