Human Gesture Recognition by Mohamed Bécha Kaâniche

Presentation Transcript


  1. Human Gesture Recognition by Mohamed Bécha Kaâniche, 11/02/2009

  2. Outline • Introduction • State of the art • Proposed Method • Human Gesture Descriptor • Human Gesture Learning and Classification • Preliminary results • Conclusion

  3. Introduction • Human Gesture Recognition? • Human Gesture? • Gesture Recognition?

  4. Introduction (2) • What is a gesture? • Any meaningful movement of the human body, used to convey information or to interact with the environment. • [Pei 1984] identifies 700,000 non-verbal signals. • [Birdwhistell 1963] estimates 250,000 facial expressions. • [Krout 1935] identifies 5,000 hand gestures. • The meaning of a gesture differs widely from one culture to another. • Gestures are synchronous with speech, gaze and facial expressions. • According to [Hall 1973], 65% of communication is non-verbal. • Non-verbal: gesture, appearance, voice, chronemics, haptics.

  5. Introduction (3)

  6. Introduction (4) • What kind of gesture recognition? • Automatically identify, and possibly interpret, human gestures. • Use a set of sensors and electronic processing units. • According to the type of sensor we distinguish: • Pen-based gesture recognition • Multi-touch surface based gesture recognition • Tracker-based gesture recognition • Instrumented gloves, Wii remote control,… • Body suits. • Vision-based gesture recognition

  7. Introduction (5) • Vision-based gesture recognition? • Advantages: passive, non-obtrusive and low-cost. • Challenges: • Efficiency: real-time constraints. • Robustness: background/foreground changes. • Occlusion: change of the point of view, self-occlusion,… • Categories: • Head/face gesture recognition • Hand/arm gesture recognition • Body gesture recognition

  8. State of the art • Vision-based Gesture Recognition System: Sensor Processing → Feature Extraction → Gesture Classification (using a Gesture Database) → Recognized Gesture

  9. State of the art (2) • Issues: • Number of cameras: mono/multi-camera, stereo/multi-view? • Speed and latency: fast enough, with low enough latency, for interaction. • Structured environment: background, lighting, motion speed. • User requirements: clothes, body markers, glasses, beard,… • Primary features: edges, regions, silhouettes, moments, histograms. • 2D/3D representation. • Time representation: how the temporal aspect of gestures is modeled.

  10. State of the art (3)

  11. State of the art (4)

  12. State of the art (5) • About motion-model based approaches: • Automata-based recognition: • Very complex and difficult process. • Unreliable in a monocular environment. • Computationally expensive. • Integrates the time aspect into the gesture model (a single, unified model). • Techniques dedicated to posture recognition can be reused. • Early methods: optical flow, motion history (MHI, 3D-MHM). • [Calderara 2008] proposes a global descriptor: the action signature. • [Liu 2008] proposes local descriptors: cuboids.

  13. Proposed Method • Hypotheses • Monocular environment. • Dedicated to isolated individuals (for implementation reasons). • No restrictions on the environment or on the clothes of the targets. • Distinguishable body parts: targets too far from the camera are not handled. • Sensor processing algorithms are assumed to be provided: • Availability of a segmentation algorithm. • Availability of a people classifier. • Availability of a people tracker (for implementation reasons).

  14. Proposed Method (2) Type of gestures and actions to recognize

  15. Proposed Method (3) • Method Overview: Sensor Processing → Local Motion Descriptors Extraction → Gesture Classification (using the Gesture Codebook) → Recognized Gesture

  16. Proposed Method (4) • Local Motion Descriptors Extraction: from sensor processing → Corners Extraction → 2D-HoG Descriptors Computation → 2D-HoG Descriptors Tracker → Local Motion Descriptors

  17. Proposed Method (5) • Gesture Codebook Learning: Training video sequences → Sensor Processing → Local Motion Descriptors Extraction → Clustering (with sequence annotations) → Code-words → Gesture Codebook

  18. Human Gesture Descriptor • Steps for Human Gesture Descriptor generation: • Corners detection • Find interest points where the motion can be easily tracked. • Ensure a uniform distribution of features across the body. • 2D HoG descriptors extraction • For each interest point, compute a 2D HoG descriptor. • Local motion descriptors computation • Track the 2D HoG descriptors to build local motion descriptors. • Gesture descriptor computation: match the local motion descriptors against the learned code-words.

  19. Human Gesture Descriptor (2) • Corners detection: • Shi-Tomasi features: given an image I and its gradients Ix and Iy along the x axis and the y axis respectively, the Harris matrix for an image pixel over a window of size (u,v) is M = Σ_{(u,v)} [ Ix^2, Ix·Iy ; Ix·Iy, Iy^2 ]. [Shi 1994] prove that min(λ1, λ2) is a better measure of corner strength than the measure proposed by the Harris detector, where λ1 and λ2 are the eigenvalues of the Harris matrix.
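A minimal sketch of this step, assuming OpenCV: cv2.goodFeaturesToTrack ranks candidate pixels by the Shi-Tomasi measure min(λ1, λ2). The input path and the parameter values are placeholders, not the presentation's settings.

```python
import cv2

# Placeholder input frame; in the real pipeline this comes from sensor processing.
gray = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)

# Shi-Tomasi corners: pixels are scored by min(lambda1, lambda2) of the
# Harris matrix, the measure [Shi 1994] showed to be stronger than Harris'.
corners = cv2.goodFeaturesToTrack(
    gray,
    maxCorners=100,     # upper bound on the number of corners returned
    qualityLevel=0.01,  # minimum accepted strength, relative to the best corner
    minDistance=10,     # spreads corners out, cf. "uniform distribution of features"
)
corners = corners.reshape(-1, 2)  # (N, 2) array of (x, y) positions
```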

  20. Human Gesture Descriptor (3) • Corners detection (cont'd): • FAST features (Features from Accelerated Segment Test): a pixel is kept as a corner when a long enough contiguous arc of the 16-pixel circle around it is uniformly brighter or darker than the pixel itself.
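A minimal OpenCV sketch of this test; the threshold and non-maximum suppression settings are illustrative assumptions.

```python
import cv2

gray = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)  # placeholder input frame

# FAST keeps a pixel as a corner when a contiguous arc of the 16-pixel
# circle around it is uniformly brighter or darker than the pixel by
# more than `threshold`; non-max suppression keeps only local maxima.
fast = cv2.FastFeatureDetector_create(threshold=20, nonmaxSuppression=True)
keypoints = fast.detect(gray, None)
points = [kp.pt for kp in keypoints]  # (x, y) corner positions
```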

  21. Human Gesture Descriptor (4) • 2D HoG Descriptor: • Descriptor block of 3×3 cells, each cell 5×5 or 7×7 pixels, centered on the corner point.

  22. Human Gesture Descriptor (5) • 2D HoG Descriptor (cont'd): • For each pixel (x,y) in the descriptor block we compute the gradient magnitude m(x,y) = sqrt(Ix^2 + Iy^2) and the orientation θ(x,y) = arctan(Iy / Ix). • For each cell c in the descriptor block we compute an orientation histogram h_c = (h_c(1), …, h_c(K)), where K is the number of orientation bins and each pixel votes for the bin containing its orientation θ with a weight equal to its magnitude m.

  23. Human Gesture Descriptor (6) • 2D HoG Descriptor (cont'd): • The 2D HoG descriptor associated with the descriptor block is the concatenation of the nine cell histograms, H = (h_1, …, h_9) / η, where η is a normalisation coefficient (e.g. the norm of the concatenated histograms). • The dimension of H is 9 × K and its component values lie in [0..1].
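A NumPy sketch of how such a block descriptor could be computed. The cell size, the number of bins K, and the L2 normalisation below are illustrative assumptions standing in for the formulas lost from the slides, not the presentation's exact definitions.

```python
import numpy as np

def hog_descriptor(gray, cx, cy, cell=5, K=8):
    """2D HoG descriptor for a 3x3 block of (cell x cell)-pixel cells
    centered on the corner (cx, cy); returns a normalized 9*K vector."""
    half = 3 * cell // 2
    patch = gray[cy - half:cy + half + 1, cx - half:cx + half + 1].astype(float)
    gy, gx = np.gradient(patch)                   # per-pixel gradients Iy, Ix
    mag = np.sqrt(gx ** 2 + gy ** 2)              # gradient magnitude m(x, y)
    ang = np.mod(np.arctan2(gy, gx), np.pi)       # unsigned orientation in [0, pi)
    bins = np.minimum((ang / np.pi * K).astype(int), K - 1)
    hists = []
    for i in range(3):                            # one K-bin histogram per cell,
        for j in range(3):                        # votes weighted by magnitude
            m = mag[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell]
            b = bins[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell]
            hists.append(np.bincount(b.ravel(), weights=m.ravel(), minlength=K))
    h = np.concatenate(hists)                     # dimension 9 * K
    return h / (np.linalg.norm(h) + 1e-9)         # assumed L2 normalisation
```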

  24. Human Gesture Descriptor (7) • Local Motion Descriptor: • Track each 2D HoG descriptor with the least-squares method using a Kalman filter: • Initialization (t=0, first frame): compute new 2D HoG descriptors; for each of them, store its position x_0 and initialize the error tolerance P_0 (a 2×2 covariance matrix). • Prediction (t>0): for each 2D HoG descriptor in the last frame, use the Kalman filter to predict the descriptor's position x_t^-, which is taken as the search center.

  25. Human Gesture Descriptor (8) • Local Motion Descriptor (cont'd): • Correction (t>0): locate the 2D HoG descriptor in the current frame (in the neighborhood of the predicted position x_t^-) by finding its real position z_t (the measurement, obtained by minimizing the squared error), then use the Kalman filter to correct the position, yielding the final estimate x_t. • Steps 2 and 3 are repeated while the tracking runs. • For a 2D HoG descriptor tracked successfully over a temporal window, the Local Motion Descriptor is the concatenation of all the descriptor values within this temporal window.

  26. Human Gesture Descriptor (9) • Local Motion Descriptor (cont'd): • Time Update (Predict): • Project the position ahead: x_t^- = A x_{t-1} • Project the error covariance: P_t^- = A P_{t-1} A^T + Q • Measurement Update (Correct): • Compute the Kalman gain: K_t = P_t^- H^T (H P_t^- H^T + R)^{-1} • Update the estimate with the measurement: x_t = x_t^- + K_t (z_t - H x_t^-) • Update the error covariance: P_t = (I - K_t H) P_t^-
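A minimal NumPy sketch of this predict/correct cycle for one descriptor's (x, y) position, assuming a constant-velocity motion model; the matrices A, H, Q and R are illustrative choices, since the presentation does not give them.

```python
import numpy as np

# State: (x, y, vx, vy). Constant-velocity transition, position-only measurement.
A = np.array([[1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [0., 0., 1., 0.],
              [0., 0., 0., 1.]])
H = np.array([[1., 0., 0., 0.],
              [0., 1., 0., 0.]])
Q = np.eye(4) * 1e-2   # process noise covariance (assumed)
R = np.eye(2) * 1.0    # measurement noise covariance (assumed)

def predict(x, P):
    """Time update: project the state and the error covariance ahead."""
    return A @ x, A @ P @ A.T + Q

def correct(x_pred, P_pred, z):
    """Measurement update: fold the measured position z into the estimate."""
    K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)  # Kalman gain
    x = x_pred + K @ (z - H @ x_pred)
    P = (np.eye(4) - K @ H) @ P_pred
    return x, P
```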

  27. Human Gesture Learning and Classification • Gesture Learning: Training video sequences → Sensor Processing → Local Motion Descriptors Extraction → K-means Clustering (with sequence annotations) → Code-words → Gesture Codebook

  28. Human Gesture Learning and Classification (2) • Gesture Learning (cont'd): • k-means: classify the generated local descriptors (for all gestures) into k clusters. • Let n be the number of generated local descriptors and m the number of gestures in the training set; the number of clusters k is derived from n, m and a parameter T (a strictly positive integer), which can be fixed empirically or learned with an Expectation-Maximization (EM) algorithm. • Minimize the total intra-cluster variance (the squared-error function): V = Σ_{i=1..k} Σ_{x_j ∈ S_i} ||x_j - μ_i||^2, where μ_i is the centroid of cluster S_i.
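A sketch of the codebook learning step, assuming scikit-learn; the data shapes and the rule k = m * T used below are placeholders standing in for the slide's dropped formula.

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in data: n local motion descriptors gathered over all training sequences.
X = np.random.rand(5000, 720)   # n = 5000 descriptors of dimension (frames * 9 * K)

m, T = 6, 40                    # gestures in the training set, parameter T (assumed)
k = m * T                       # one plausible way to set k from m and T (assumption)

kmeans = KMeans(n_clusters=k, n_init=10).fit(X)  # minimizes intra-cluster variance
codebook = kmeans.cluster_centers_               # the k code-words
labels = kmeans.labels_                          # cluster index of each descriptor
```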

  29. Human Gesture Learning and Classification (3) • Gesture Classification: • The k-nearest neighbors algorithm: • Given a gesture codebook database {(code-word, gesture)} and an input set of code-words: • For each code-word in the input, select its k nearest code-words in the database using the Euclidean distance. • Each corresponding output gesture receives a vote. • Select the gesture that wins the vote.
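A minimal NumPy sketch of this voting scheme; the function and variable names, and the choice k = 5, are hypothetical.

```python
import numpy as np
from collections import Counter

def classify_gesture(input_words, db_words, db_gestures, k=5):
    """Vote over the k nearest database code-words for each code-word
    extracted from the input sequence; db_gestures[i] is the gesture
    label associated with the code-word db_words[i]."""
    votes = Counter()
    for w in input_words:
        dists = np.linalg.norm(db_words - w, axis=1)  # Euclidean distances
        for idx in np.argsort(dists)[:k]:             # k nearest code-words
            votes[db_gestures[idx]] += 1
    return votes.most_common(1)[0][0]                 # gesture that wins the vote
```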

  30. Preliminary Results • Current progress: • Evaluate Local Motion Descriptors generation. • Training gestures from the KTH and IXMAS databases. • (Figures: Boxing, Walking)

  31. Conclusion • Contributions: • Local Motion Descriptors for gesture representation. • Tracking of local texture-based descriptors. • Future Work: • Add likelihood information to the gesture learning process by using a Maximization of Mutual Information algorithm. • Evaluate an SVM classifier and compare its results to the k-nearest neighbors algorithm.

  32. Thank you for your attention!
