
Gesture Recognition & Machine Learning for Real-Time Musical Interaction



Presentation Transcript


  1. Gesture Recognition & Machine Learning for Real-Time Musical Interaction Rebecca Fiebrink Assistant Professor of Computer Science (also Music) Princeton University Nicholas Gillian Postdoc in Responsive Environments MIT Media Lab

  2. Introductions

  3. Outline • ~40 min: Machine learning fundamentals • ~1 hour: Wekinator: Intro & hands-on • ~1 hour: EyesWeb: Intro & hands-on • Wrap-up

  4. Models in gesture recognition & mapping • What is the current state (e.g., pose)? • Was a control motion performed? • If so, which one? • How was it performed? • What sound should result from this state, motion, motion quality, etc.? [Diagram: human + sensors produce a sensed action; the computer's model interprets it and generates a response (music, visuals, etc.)]

  5. Supervised learning [Diagram: Training: inputs → training data → algorithm → model; model → outputs]

  6. Supervised learning [Diagram: Training: inputs labeled “Gesture 1”, “Gesture 2”, “Gesture 3” → training data → algorithm → model. Running: new inputs → model → outputs, e.g., “Gesture 1”]

  7. Why use supervised learning? • Models capture complex relationships from the data. (feasible) • Models can generalize to new inputs. (accurate) • Supervised learning circumvents the need to explicitly define mapping functions or models. (efficient)

  8. Data, features, algorithms, and models: the basics

  9. Features • Each data point is represented as a feature vector

  10. Features • Good features can make a problem easier to learn!
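As a sketch of what a feature vector might look like in practice, here is a minimal Python example; the window statistics chosen here (mean, standard deviation, range) are illustrative assumptions, not a feature set prescribed by the slides:

```python
import math

def extract_features(window):
    """Summarize a window of 1-D sensor samples as a 3-element feature vector."""
    n = len(window)
    mean = sum(window) / n
    variance = sum((x - mean) ** 2 for x in window) / n
    # Feature vector: [mean, standard deviation, range]
    return [mean, math.sqrt(variance), max(window) - min(window)]

print(extract_features([0.1, 0.4, 0.35, 0.2, 0.5]))
```

Each data point fed to a learner would be one such vector, however the raw sensor stream was windowed.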

  11. Classification • This model: a separating line or hyperplane (decision boundary) in feature space [Plot: labeled points and a decision boundary in feature1–feature2 space]

  12. Regression • This model: a real-valued function of the input features [Plot: output as a function of a single feature]
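The kind of model this slide describes, a real-valued function of an input feature, can be sketched as a one-variable least-squares line fit (pure Python, purely illustrative; real mappings would use more features and richer models):

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b with a single input feature."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    b = my - a * mx
    return a, b

# Training examples generated by the rule y = 2x + 1:
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
```

Once trained, the model maps any new feature value to a continuous output, which is what makes regression useful for continuous sound control.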

  13. Unsupervised learning • Training set includes examples, but no labels • Example: Infer clusters from data [Plot: unlabeled points forming clusters in feature1–feature2 space]
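Cluster inference as on the slide can be sketched with a minimal k-means loop; the initialization from the first k points is a naive choice made for the demo, not a recommendation:

```python
def kmeans(points, k, iters=10):
    """Minimal k-means on 2-D points: alternate assignment and update steps."""
    centers = list(points[:k])  # naive init: first k points (demo only)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest center (squared distance).
            nearest = min(range(k),
                          key=lambda c: (p[0] - centers[c][0]) ** 2
                                        + (p[1] - centers[c][1]) ** 2)
            clusters[nearest].append(p)
        for i, cl in enumerate(clusters):
            if cl:  # keep the old center if a cluster goes empty
                centers[i] = (sum(p[0] for p in cl) / len(cl),
                              sum(p[1] for p in cl) / len(cl))
    return centers

centers = kmeans([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], 2)
```

No labels are used anywhere above; the structure is inferred from the data alone.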

  14. Temporal modeling • Examples and inputs are sequential data points in time • Model used for following, identification, recognition Image: Bevilacqua et al., NIME 2007

  15. Temporal modeling Image: Bevilacqua et al., NIME 2007
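One widely used temporal technique, mentioned later in this session, is dynamic time warping, which compares two gesture recordings that may differ in speed; a minimal sketch:

```python
def dtw_distance(a, b):
    """Dynamic time warping: align two 1-D sequences that may differ in speed."""
    INF = float("inf")
    n, m = len(a), len(b)
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])          # local distance between samples
            cost[i][j] = d + min(cost[i - 1][j],      # a advances
                                 cost[i][j - 1],      # b advances
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]

# A slowed-down copy of a gesture still aligns with zero cost:
print(dtw_distance([0, 0, 1, 2], [0, 1, 2]))
```

Comparing a live input against stored templates with this distance is one simple route to the "identification, recognition" uses named on the slide; following within a gesture requires the incremental variants developed by Bevilacqua et al.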

  16. How supervised learning algorithms work (the basics)

  17. The learning problem • Goal: Build the best** model given the training data • Definition of “best” depends on context, assumptions…

  18. Which classifier is best? [Plots: “underfit” vs. “overfit” decision boundaries] Competing goals: accurately model the training data vs. **accurately classify unseen data points** Image from Andrew Ng

  19. A simple classifier: nearest neighbor [Plot: an unlabeled query point “?” among labeled points in feature1–feature2 space]
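The nearest-neighbor rule fits in a few lines; the training set and gesture labels below are invented for illustration:

```python
def nearest_neighbor(train, query):
    """Return the label of the training example closest to the query point."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(train, key=lambda ex: dist2(ex[0], query))[1]

# Hypothetical training set: two labeled points in a 2-D feature space.
train = [([0.0, 0.0], "Gesture 1"), ([1.0, 1.0], "Gesture 2")]
print(nearest_neighbor(train, [0.2, 0.1]))  # nearest training point wins
```

There is no training step at all: the model is the data, which is why kNN is so quick to set up and so sensitive to noisy or irrelevant features.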

  20. Another simple classifier: Decision tree Images: http://ai.cs.umbc.edu/~oates/classes/2009/ML/homework1.html, http://nghiaho.com/?p=1300
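A learned decision tree is just a nested sequence of threshold tests on features. Here is a hand-written stand-in showing the shape such a model takes; the feature names and labels are invented, not produced by any learner:

```python
def classify(features):
    """What a small learned tree looks like: nested threshold tests."""
    speed, height = features  # hypothetical gesture features
    if speed > 0.5:
        if height > 0.3:
            return "raise"
        return "swipe"
    return "hold"

print(classify((0.9, 0.6)))
```

A tree learner such as J48 chooses these features and thresholds automatically from the training data; pruning removes branches that fit noise.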

  21. AdaBoost: Iteratively train a “weak” learner Image from http://www.cc.gatech.edu/~kihwan23/imageCV/Final2005/FinalProject_KH.htm

  22. Support vector machine • Re-map the input space into a higher-dimensional space and find a separating hyperplane

  23. Choosing a classifier: Practical considerations • k-Nearest neighbor: + Can tune k to adjust smoothness of decision boundaries; - Sensitive to noisy, redundant, or irrelevant features; prone to overfitting; behaves oddly in high dimensions • Decision tree: + Can prune to reduce overfitting; produces a human-understandable model; - Can still overfit • AdaBoost: + Theoretical benefits; less prone to overfitting; + Can tune by changing the base learner and the number of training rounds • Support vector machine: + Theoretical benefits similar to AdaBoost; - Many parameters to tune, and training can take a long time

  24. How to evaluate which classifier is better? • Compute a quality metric • Metrics on the training set (e.g., accuracy, RMS error) • Metrics on a held-out test set • Cross-validation • Use it in practice Image from http://blog.weisu.org/2011/05/cross-validation.html
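The cross-validation idea from the slide can be sketched as follows, using a 1-nearest-neighbor classifier as the model under evaluation (an arbitrary choice for the demo):

```python
def knn_predict(train, query):
    """1-nearest-neighbor: return the label of the closest training example."""
    return min(train,
               key=lambda ex: sum((a - b) ** 2 for a, b in zip(ex[0], query)))[1]

def cross_validate(data, k):
    """k-fold cross-validation accuracy: hold out every k-th example in turn."""
    correct = 0
    for i in range(k):
        test = data[i::k]                                  # held-out fold
        train = [ex for j, ex in enumerate(data) if j % k != i]
        correct += sum(knn_predict(train, x) == y for x, y in test)
    return correct / len(data)

data = [([0.0], "a"), ([0.1], "a"), ([0.2], "a"),
        ([1.0], "b"), ([1.1], "b"), ([1.2], "b")]
print(cross_validate(data, 3))
```

Because every example is scored by a model that never saw it during training, the resulting accuracy estimates generalization rather than memorization.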

  25. Neural Networks • TODO: Use nick’s slides
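This slide's content was left as a TODO in the source, so here is only a generic sketch of the building block a multilayer perceptron is made of, a single sigmoid unit; the weights and inputs are arbitrary:

```python
import math

def neuron(inputs, weights, bias):
    """One sigmoid unit: weighted sum of inputs, squashed into (0, 1)."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-activation))

print(neuron([0.0, 0.0], [1.0, 1.0], 0.0))  # zero activation gives 0.5
```

A multilayer perceptron stacks layers of such units and learns the weights from training examples, which is what makes it usable for the regression mappings discussed later.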

  26. Which learning method should you use? • Classification (e.g., kNN, AdaBoost, SVM, decision tree): • Apply 1 of N labels to a static pose or state • Label a dynamic gesture, when segmentation & normalization are trivial • E.g., feature vector is a fixed-length window in time • Regression (e.g., with neural networks): • Produce a real-valued output (or vector of real-valued outputs) for each feature vector • Dynamic time warping, HMMs, other temporal models • Identify when a gesture has occurred, identify probable location within a gesture, possibly also apply a label • Necessary when segmentation is non-trivial or online following is needed

  27. Suggested ML reading • Bishop, 2006: Pattern Recognition & Machine Learning. Science and Business Media, Springer • Duda, 2001: Pattern Classification, Wiley-Interscience • Witten, 2005: Data Mining: Practical machine learning tools and techniques, Morgan Kaufmann

  28. Suggested NIME-y reading • Lee, Freed, & Wessel, 1992. Neural networks for simultaneous classification and parameter estimation in musical instrument control. Adaptive and Learning Systems, 1706:244–55. (early example of ML in music) • Hunt, A. and Wanderley, M. M. 2002. Mapping performer parameters to synthesis engines. Organised Sound 7, 2, 97–108. (learning as a tool for generative mapping creation) • Chapter 2 of Rebecca’s dissertation: http://www.cs.princeton.edu/~fiebrink/thesis/ (historical/topic overview) • Recent publications by F. Bevilacqua & team @ IRCAM (HMMs, gesture follower) • TODO: Nick, anything else?

  29. Hands-on with Wekinator

  30. The Wekinator: Running in real time • Feature extractor(s) stream feature vectors (e.g., .01, .59, .03, ...) over OSC to the model(s) over time • The model(s) stream output parameters (e.g., 5, .01, 22.7, ...) over OSC to a parameterizable process • Inputs: from built-in feature extractors or OSC • Outputs: control a ChucK patch or go elsewhere using OSC

  31. Brief intro to OSC • Messages are sent to a host (e.g., localhost) and port (e.g., 6448) • The listener must listen on the same port • A message contains a message string (e.g., “/myOscMessage”) and optionally some data • Data can be of int, float, or string types • Listener code may listen for specific message strings & data formats
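The message format described above can be assembled by hand in a few lines; the address and port mirror the slide's examples, and in practice a library would normally do this encoding:

```python
import struct

def osc_pad(raw):
    """Null-terminate and pad bytes to a 4-byte boundary, per the OSC spec."""
    return raw + b"\x00" * (4 - len(raw) % 4)

def osc_message(address, *args):
    """Build a binary OSC message carrying float32 arguments."""
    type_tags = "," + "f" * len(args)       # e.g., ",f" for one float
    data = osc_pad(address.encode()) + osc_pad(type_tags.encode())
    for value in args:
        data += struct.pack(">f", value)    # big-endian 32-bit float
    return data

msg = osc_message("/myOscMessage", 0.5)
# A UDP socket can then deliver it, e.g.:
# socket.socket(socket.AF_INET, socket.SOCK_DGRAM).sendto(msg, ("localhost", 6448))
```

The listener must parse the same layout: padded address string, padded type-tag string, then the binary arguments in order.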

  32. Wekinator: Under the hood [Diagram: inputs (joystick_x, joystick_y, webcam_1, …) → features (Feature1, Feature2, Feature3, … FeatureN) → models (Model1, Model2, … ModelM) → parameters (Parameter1, Parameter2, … ParameterM, e.g., volume and pitch; example outputs: 3.3098, Class24)]

  33. Under the hood • Learning algorithms: • Classification: AdaBoost.M1, J48 decision tree, support vector machine, k-nearest neighbor • Regression: multilayer perceptron NNs [Diagram: features (Feature1 … FeatureN) → models (Model1 … ModelM) → parameters (Parameter1 … ParameterM); example outputs: 3.3098, Class24]

  34. Interactive ML with Wekinator [Diagram: Training: inputs labeled “Gesture 1”, “Gesture 2”, “Gesture 3” → training data → algorithm → model. Running: new inputs → model → outputs, e.g., “Gesture 1”]

  35. Interactive ML with Wekinator: creating training data [Diagram: Training: inputs labeled “Gesture 1”, “Gesture 2” → training data → algorithm → model. Running: new inputs → model → “Gesture 1”]

  36. Interactive ML with Wekinator: creating training data… evaluating the trained model [Diagram: training data → algorithm → model; running the model on new inputs to check its outputs]

  37. Interactive ML with Wekinator: creating training data, evaluating the trained model… modifying the training data (and repeating): interactive machine learning [Diagram: Training: inputs labeled “Gesture 1”, “Gesture 2”, “Gesture 3” → training data → algorithm → model. Running: new inputs → model → “Gesture 1”]

  38. Time to play • Discrete classifier • Continuous neural net mapping • Free-for-all
