1 / 145

Gesture Recognition

Gesture Recognition. CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington. Gesture Recognition. What is a gesture?. Gesture Recognition. What is a gesture? Body motion used for communication. Gesture Recognition. What is a gesture?

wirt
Download Presentation

Gesture Recognition

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Gesture Recognition CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

  2. Gesture Recognition • What is a gesture?

  3. Gesture Recognition • What is a gesture? • Body motion used for communication.

  4. Gesture Recognition • What is a gesture? • Body motion used for communication. • There are different types of gestures.

  5. Gesture Recognition • What is a gesture? • Body motion used for communication. • There are different types of gestures. • Hand gestures (e.g., waving goodbye). • Head gestures (e.g., nodding). • Body gestures (e.g., kicking).

  6. Gesture Recognition • What is a gesture? • Body motion used for communication. • There are different types of gestures. • Hand gestures (e.g., waving goodbye). • Head gestures (e.g., nodding). • Body gestures (e.g., kicking). • Example applications:

  7. Gesture Recognition • What is a gesture? • Body motion used for communication. • There are different types of gestures. • Hand gestures (e.g., waving goodbye). • Head gestures (e.g., nodding). • Body gestures (e.g., kicking). • Example applications: • Human-computer interaction. • Controlling robots, appliances, via gestures. • Sign language recognition.

  8. Dynamic Gestures • What gesture did the user perform? Class “8”

  9. Gesture Types:10 Digits

  10. Gesture Recognition Example • Recognize 10 simple gestures performed by the user. • Each gesture corresponds to a number, from 0, to 9. • Only the trajectory of the hand matters, not the handshape. • This is just a choice we make for this example application. Many systems need to use handshape as well.

  11. Decomposing Gesture Recognition • We need modules for:

  12. Decomposing Gesture Recognition • We need modules for: • Computing how the person moved. • Person detection/tracking. • Hand detection/tracking. • Articulated tracking (tracking each body part). • Handshape recognition. • Recognizing what the motion means.

  13. Decomposing Gesture Recognition • We need modules for: • Computing how the person moved. • Person detection/tracking. • Hand detection/tracking. • Articulated tracking (tracking each body part). • Handshape recognition. • Recognizing what the motion means. • Motion estimation and recognition are quite different tasks.

  14. Decomposing Gesture Recognition • We need modules for: • Computing how the person moved. • Person detection/tracking. • Hand detection/tracking. • Articulated tracking (tracking each body part). • Handshape recognition. • Recognizing what the motion means. • Motion estimation and recognition are quite different tasks. • When we see someone signing in ASL, we know how they move, but not what the motion means.

  15. Nearest-Neighbor Recognition Query M1 M2 • Question: how should we measure similarity? MN

  16. Motion Energy Images • A simple approach. • Representing a gesture: • Sum of all the motion occurring in the video sequence.

  17. Motion Energy Images

  18. Motion Energy Images 7 6 1 2 9 3 8 0 5 4

  19. Motion Energy Images • Assumptions/Limitations: • No clutter. • We know the times when the gesture starts and ends.

  20. If Hand Location Is Known: Example database gesture • Assumption: hand location is known in all frames of the database gestures. • Database is built offline. • In worst case, manual annotation. • Online user experience is not affected.

  21. Comparing Trajectories We can make a trajectory based on the location of the hand at each frame. 6 9 2 4 6 2 8 3 4 5 7 8 1 3 7 5 1

  22. Comparing Trajectories How do we compare trajectories? 6 9 2 4 6 2 8 3 4 5 7 8 1 3 7 5 1

  23. Comparing i-th frame to i-th frame is problematic. What do we do with frame 9? Matching Trajectories 2 2 4 6 8 3 1 5 7 8 9 6 4 3 7 5 1

  24. Alignment: ((1, 1), (2, 2), (2, 3), (3, 4), (4, 5), (4, 6), (5, 7), (6, 7), (7, 8), (8, 9)). ((s1, t1), (s2, t2), …, (sp, tp)) Matching Trajectories 2 2 4 6 8 3 1 5 7 8 9 6 4 3 7 5 1

  25. Matching Trajectories 2 4 6 8 3 4 1 7 8 9 6 2 5 1 7 5 3 • Alignment: • ((1, 1), (2, 2), (2, 3), (3, 4), (4, 5), (4, 6), (5, 7), (6, 7), (7, 8), (8, 9)). • ((s1, t1), (s2, t2), …, (sp, tp))

  26. Matching Trajectories 2 4 6 8 3 4 1 7 8 9 6 2 5 1 7 5 3 • Alignment: • ((1, 1), (2, 2), (2, 3), (3, 4), (4, 5), (4, 6), (5, 7), (6, 7), (7, 8), (8, 9)). • ((s1, t1), (s2, t2), …, (sp, tp))

  27. Matching Trajectories 2 4 6 8 3 4 1 7 8 9 6 2 5 1 7 5 3 • Alignment: • ((1, 1), (2, 2), (2, 3), (3, 4), (4, 5), (4, 6), (5, 7), (6, 7), (7, 8), (8, 9)). • ((s1, t1), (s2, t2), …, (sp, tp))

  28. Matching Trajectories 2 4 6 8 3 4 1 7 8 9 6 2 5 1 7 5 3 • Alignment: • ((1, 1), (2, 2), (2, 3), (3, 4), (4, 5), (4, 6), (5, 7), (6, 7), (7, 8), (8, 9)). • ((s1, t1), (s2, t2), …, (sp, tp))

  29. Matching Trajectories 2 4 6 8 3 4 1 7 8 9 6 2 5 1 7 5 3 • Alignment: • ((1, 1), (2, 2), (2, 3), (3, 4), (4, 5), (4, 6), (5, 7), (6, 7), (7, 8), (8, 9)). • ((s1, t1), (s2, t2), …, (sp, tp))

  30. Matching Trajectories 2 4 6 8 3 4 1 7 8 9 6 2 5 1 7 5 3 • Alignment: • ((1, 1), (2, 2), (2, 3), (3, 4), (4, 5), (4, 6), (5, 7), (6, 7), (7, 8), (8, 9)). • ((s1, t1), (s2, t2), …, (sp, tp))

  31. Matching Trajectories 2 4 6 8 3 4 1 7 8 9 6 2 5 1 7 5 3 • Alignment: • ((1, 1), (2, 2), (2, 3), (3, 4), (4, 5), (4, 6), (5, 7), (6, 7), (7, 8), (8, 9)). • ((s1, t1), (s2, t2), …, (sp, tp))

  32. Matching Trajectories 2 4 6 8 3 4 1 7 8 9 6 2 5 1 7 5 3 • Alignment: • ((1, 1), (2, 2), (2, 3), (3, 4), (4, 5), (4, 6), (5, 7), (6, 7), (7, 8), (8, 9)). • ((s1, t1), (s2, t2), …, (sp, tp))

  33. Matching Trajectories 2 4 6 8 3 4 1 7 8 9 6 2 5 1 7 5 3 • Alignment: • ((1, 1), (2, 2), (2, 3), (3, 4), (4, 5), (4, 6), (5, 7), (6, 7), (7, 8), (8, 9)). • ((s1, t1), (s2, t2), …, (sp, tp))

  34. Matching Trajectories 6 8 6 7 8 2 5 4 1 3 4 9 2 7 3 1 5 M = (M1, M2, …, M8). Q = (Q1, Q2, …, Q9). • Alignment: • ((1, 1), (2, 2), (2, 3), (3, 4), (4, 5), (4, 6), (5, 7), (6, 7), (7, 8), (8, 9)). • ((s1, t1), (s2, t2), …, (sp, tp)) • Can be many-to-many. • M1 is matched to Q2 and Q3.

  35. Matching Trajectories 6 8 6 7 8 2 5 4 1 3 4 9 2 7 3 1 5 M = (M1, M2, …, M8). Q = (Q1, Q2, …, Q9). • Alignment: • ((1, 1), (2, 2), (2, 3), (3, 4), (4, 5), (4, 6), (5, 7), (6, 7), (7, 8), (8, 9)). • ((s1, t1), (s2, t2), …, (sp, tp)) • Can be many-to-many. • M4 is matched to Q5 and Q6.

  36. Matching Trajectories 6 8 6 7 8 2 5 4 1 3 4 9 2 7 3 1 5 M = (M1, M2, …, M8). Q = (Q1, Q2, …, Q9). • Alignment: • ((1, 1), (2, 2), (2, 3), (3, 4), (4, 5), (4, 6), (5, 7), (6, 7), (7, 8), (8, 9)). • ((s1, t1), (s2, t2), …, (sp, tp)) • Can be many-to-many. • M5 and M6 are matched to Q7.

  37. Matching Trajectories 6 8 7 6 5 8 2 3 4 1 9 2 4 7 3 1 5 • Alignment: • ((1, 1), (2, 2), (2, 3), (3, 4), (4, 5), (4, 6), (5, 7), (6, 7), (7, 8), (8, 9)). • ((s1, t1), (s2, t2), …, (sp, tp)) • Cost of alignment:

  38. Matching Trajectories Alignment: ((1, 1), (2, 2), (2, 3), (3, 4), (4, 5), (4, 6), (5, 7), (6, 7), (7, 8), (8, 9)). ((s1, t1), (s2, t2), …, (sp, tp)) Cost of alignment: cost(s1, t1) + cost(s2, t2) + … + cost(sm, tn) 8 7 5 2 6 8 6 2 4 3 9 4 1 1 7 3 5

  39. Matching Trajectories 6 8 7 6 5 8 2 3 4 1 9 2 4 7 3 1 5 • Alignment: • ((1, 1), (2, 2), (2, 3), (3, 4), (4, 5), (4, 6), (5, 7), (6, 7), (7, 8), (8, 9)). • ((s1, t1), (s2, t2), …, (sp, tp)) • Cost of alignment: • cost(s1, t1) + cost(s2, t2) + … + cost(sm, tn) • Example: cost(si, ti) = Euclidean distance between locations. • Cost(3, 4) = Euclidean distance between M3 and Q4.

  40. Matching Trajectories Alignment: ((1, 1), (2, 2), (2, 3), (3, 4), (4, 5), (4, 6), (5, 7), (6, 7), (7, 8), (8, 9)). ((s1, t1), (s2, t2), …, (sp, tp)) Rules of alignment. Is alignment ((1, 5), (2, 3), (6, 7), (7, 1)) legal? 8 7 5 2 6 8 6 2 4 3 9 4 1 1 7 3 5

  41. Matching Trajectories Alignment: ((1, 1), (2, 2), (2, 3), (3, 4), (4, 5), (4, 6), (5, 7), (6, 7), (7, 8), (8, 9)). ((s1, t1), (s2, t2), …, (sp, tp)) Rules of alignment. Is alignment ((1, 5), (2, 3), (6, 7), (7, 1)) legal? Depends on what makes sense in our application. 8 7 5 2 6 8 6 2 4 3 9 4 1 1 7 3 5

  42. Matching Trajectories Alignment: ((1, 1), (2, 2), (2, 3), (3, 4), (4, 5), (4, 6), (5, 7), (6, 7), (7, 8), (8, 9)). ((s1, t1), (s2, t2), …, (sp, tp)) Dynamic time warping rules: boundaries s1 = 1, t1 = 1. sp = m = length of first sequence tp = n = length of second sequence. 6 8 6 7 8 2 5 4 4 3 2 9 1 7 3 1 5 first elements match last elements match

  43. Matching Trajectories Illegal alignment (violating monotonicity): (…, (3, 5), (4, 3), …). ((s1, t1), (s2, t2), …, (sp, tp)) Dynamic time warping rules: monotonicity. 0 <= (st+1 - st|) 0 <= (tt+1 - tt|) 6 8 6 7 8 2 5 4 4 3 2 9 1 7 3 1 5 The alignment cannot go backwards.

  44. Matching Trajectories Illegal alignment (violating continuity). (…, (3, 5), (6, 7), …). ((s1, t1), (s2, t2), …, (sp, tp)) Dynamic time warping rules: continuity (st+1 - st|) <= 1 (tt+1 - tt|) <= 1 6 8 6 7 8 2 5 4 4 3 2 9 1 7 3 1 5 The alignment cannot skip elements.

  45. Matching Trajectories Alignment: ((1, 1), (2, 2), (2, 3), (3, 4), (4, 5), (4, 6), (5, 7), (6, 7), (7, 8), (8, 9)). ((s1, t1), (s2, t2), …, (sp, tp)) Dynamic time warping rules: monotonicity, continuity 0 <= (st+1 - st|) <= 1 0 <= (tt+1 - tt|) <= 1 6 8 6 7 8 2 5 4 4 3 2 9 1 7 3 1 5 The alignment cannot go backwards. The alignment cannot skip elements.

  46. Dynamic Time Warping Dynamic Time Warping (DTW) is a distance measure between sequences of points. The DTW distance is the cost of the optimal alignment between two trajectories. The alignment must obey the DTW rules defined in the previous slides. 8 7 5 2 6 8 6 2 4 3 9 4 1 1 7 3 5

  47. DTW Assumptions • The gesturing hand must be detected correctly. • For each gesture class, we have training examples. • Given a new gesture to classify, we find the most similar gesture among our training examples. • What type of classifier is this?

  48. DTW Assumptions • The gesturing hand must be detected correctly. • For each gesture class, we have training examples. • Given a new gesture to classify, we find the most similar gesture among our training examples. • Nearest neighbor classification, using DTW as the distance measure.

  49. Computing DTW 6 8 7 5 8 6 2 9 2 4 3 1 4 7 3 1 5 Q M • Training example M = (M1, M2, …, M8). • Test example Q = (Q1, Q2, …, Q9). • Each Mi and Qj can be, for example, a 2D pixel location.

  50. Computing DTW • Training example M = (M1, M2, …, M10). • Test example Q = (Q1, Q2, …, Q15). • We want optimal alignment between M and Q. • Dynamic programming strategy: • Break problem up into smaller, interrelated problems (i,j). • Problem(i,j): find optimal alignment between (M1, …, Mi) and (Q1, …, Qj). • Solve problem(1, 1):

More Related