
Gesture Recognition & Machine Learning for Real-Time Musical Interaction

Rebecca Fiebrink

Assistant Professor of Computer Science (also Music)

Princeton University

Nicholas Gillian

Postdoc in Responsive Environments

MIT Media Lab


Introductions


Outline

  • ~40 min: Machine learning fundamentals

  • ~1 hour: Wekinator: Intro & hands-on

  • ~1 hour: EyesWeb: Intro & hands-on

  • Wrap-up


Models in gesture recognition & mapping

  • What is the current state (e.g., pose)?

  • Was a control motion performed?

    • If so, which?

    • How?

  • What sound should result from this state, motion, motion quality, etc.?

[Diagram] Interaction loop: a human with sensors produces a sensed action; the computer’s model interprets it and generates a response (music, visuals, etc.), rendered as sound, visuals, etc., which the human perceives.


Supervised learning

[Diagram] Training: example inputs paired with outputs form the training data; a learning algorithm builds a model from them.

Supervised learning

[Diagram] Training: inputs labeled “Gesture 1”, “Gesture 2”, “Gesture 3” form the training data; the algorithm builds a model. Running: a new input is fed to the trained model, which outputs a label (e.g., “Gesture 1”).
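
To make the train/run loop concrete, here is a minimal sketch assuming scikit-learn (the slides do not prescribe a library or classifier); the feature values and gesture labels are made up:

```python
# Training: each row is a feature vector; each label names the gesture it came from.
from sklearn.neighbors import KNeighborsClassifier

inputs = [
    [0.10, 0.82], [0.12, 0.79],   # examples of "Gesture 1"
    [0.55, 0.40], [0.58, 0.43],   # examples of "Gesture 2"
    [0.90, 0.15], [0.88, 0.18],   # examples of "Gesture 3"
]
outputs = ["Gesture 1", "Gesture 1", "Gesture 2", "Gesture 2", "Gesture 3", "Gesture 3"]

model = KNeighborsClassifier(n_neighbors=1)
model.fit(inputs, outputs)            # the algorithm builds a model from the training data

# Running: feed a new input to the trained model and get a label back.
print(model.predict([[0.11, 0.80]]))  # -> ['Gesture 1']
```

The same fit-then-predict pattern applies whatever algorithm is used to build the model.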


Why use supervised learning?

  • Models capture complex relationships from the data. (feasible)

  • Models can generalize to new inputs. (accurate)

  • Supervised learning circumvents the need to explicitly define mapping functions or models. (efficient)



Features

  • Each data point is represented as a feature vector


Features

  • Good features can make a problem easier to learn!
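
For instance, a short window of raw sensor samples can be summarized by a few simple features. The sketch below is a hypothetical illustration (the feature choices and numbers are invented), using numpy:

```python
import numpy as np

window = np.array([0.02, 0.11, 0.35, 0.41, 0.38, 0.15])  # raw sensor samples over time

features = np.array([
    window.mean(),                 # average level
    window.std(),                  # how much the signal varies
    window.max() - window.min(),   # range of motion within the window
])
print(features)  # one fixed-length feature vector describing the whole window
```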


Classification

[Diagram] Labeled points plotted against feature1 and feature2. This model: a separating line or hyperplane (decision boundary).


Regression

[Diagram] Output plotted against a single input feature. This model: a real-valued function of the input features.
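
A minimal regression sketch, assuming scikit-learn’s multilayer perceptron regressor (the slide only requires some model that outputs real values); the data are invented:

```python
from sklearn.neural_network import MLPRegressor

X = [[0.0], [0.2], [0.4], [0.6], [0.8], [1.0]]   # one input feature
y = [0.0, 0.15, 0.35, 0.62, 0.80, 1.0]           # real-valued outputs (e.g., a synth parameter)

reg = MLPRegressor(hidden_layer_sizes=(10,), max_iter=5000, random_state=0)
reg.fit(X, y)                                    # learn a real-valued function of the feature
print(reg.predict([[0.5]]))                      # a real-valued output for an unseen input
```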


Unsupervised learning

  • Training set includes examples, but no labels

  • Example: Infer clusters from data:

[Diagram] Unlabeled points in the feature1–feature2 plane, grouped into clusters.
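
A minimal clustering sketch, assuming k-means from scikit-learn (the slide only illustrates the idea of inferring clusters); the points are made up:

```python
from sklearn.cluster import KMeans

X = [[0.1, 0.1], [0.15, 0.12], [0.9, 0.85], [0.88, 0.9], [0.5, 0.95], [0.52, 0.9]]

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.labels_)  # cluster index assigned to each point; no labels were given during training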


Temporal modeling

  • Examples and inputs are sequential data points in time

  • Model used for following, identification, recognition

Image: Bevilacqua et al., NIME 2007


Temporal modeling

Image: Bevilacqua et al., NIME 2007
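
As a rough illustration of the temporal-matching idea (not Bevilacqua et al.’s gesture follower, which is HMM-based), here is a from-scratch dynamic time warping sketch: it scores how well a live sequence matches a recorded template, regardless of tempo.

```python
def dtw_distance(a, b):
    """Alignment cost between two 1-D sequences a and b (smaller = better match)."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # skip a sample of a
                                 cost[i][j - 1],      # skip a sample of b
                                 cost[i - 1][j - 1])  # match the two samples
    return cost[len(a)][len(b)]

template = [0.0, 0.2, 0.8, 1.0, 0.5]          # recorded example gesture
live = [0.0, 0.1, 0.2, 0.7, 0.9, 1.0, 0.6]    # same shape, performed more slowly
print(dtw_distance(template, live))            # small cost = good match
```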



The learning problem

  • Goal: Build the “best” model given the training data

    • Definition of “best” depends on context, assumptions…


Which classifier is best?

[Diagram] Fitted models ranging from “underfit” to “overfit”. (Image from Andrew Ng)

Competing goals:

  • Accurately model the training data

  • Accurately classify unseen data points (the goal that really matters)


A simple classifier: nearest neighbor

[Diagram] A new point “?” in the feature1–feature2 plane is assigned the class of the nearest labeled training point.
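
The nearest-neighbor rule can be written out directly: label the unknown point “?” with the class of the closest training example. A from-scratch sketch with made-up points:

```python
import numpy as np

train_points = np.array([[0.1, 0.9], [0.2, 0.8], [0.8, 0.2], [0.9, 0.1]])
train_labels = ["Gesture 1", "Gesture 1", "Gesture 2", "Gesture 2"]

query = np.array([0.15, 0.85])                             # the "?" point
distances = np.linalg.norm(train_points - query, axis=1)   # Euclidean distance to every example
print(train_labels[int(np.argmin(distances))])             # -> 'Gesture 1'
```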


Another simple classifier: Decision tree

Images: http://ai.cs.umbc.edu/~oates/classes/2009/ML/homework1.html, http://nghiaho.com/?p=1300


AdaBoost: Iteratively train a “weak” learner

Image from http://www.cc.gatech.edu/~kihwan23/imageCV/Final2005/FinalProject_KH.htm
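
A minimal AdaBoost sketch, assuming scikit-learn (not the implementation pictured above); its default weak learner is a one-level decision tree (a “stump”), re-trained and re-weighted over many boosting rounds. Data are made up.

```python
from sklearn.ensemble import AdaBoostClassifier

X = [[0.1, 0.9], [0.2, 0.8], [0.8, 0.2], [0.9, 0.1]]
y = ["Gesture 1", "Gesture 1", "Gesture 2", "Gesture 2"]

clf = AdaBoostClassifier(n_estimators=50, random_state=0)  # 50 boosting rounds over weak learners
clf.fit(X, y)
print(clf.predict([[0.15, 0.85]]))                         # -> ['Gesture 1']
```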


Support vector machine

  • Re-map input space into a higher number of dimensions and find a separating hyperplane
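
A minimal SVM sketch, assuming scikit-learn; the RBF kernel performs the implicit re-mapping into a higher-dimensional space where a separating hyperplane can be found. Data are made up.

```python
from sklearn.svm import SVC

X = [[0.1, 0.9], [0.2, 0.8], [0.8, 0.2], [0.9, 0.1]]
y = ["Gesture 1", "Gesture 1", "Gesture 2", "Gesture 2"]

clf = SVC(kernel="rbf", C=1.0, gamma="scale")  # kernel re-maps inputs; C and gamma are tunable
clf.fit(X, y)
print(clf.predict([[0.15, 0.85]]))             # -> ['Gesture 1']
```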


Choosing a classifier: Practical considerations

  • k-Nearest Neighbor

    + Can tune k to adjust smoothness of decision boundaries

    - Sensitive to noisy, redundant, and irrelevant features; prone to overfitting; behaves poorly in high dimensions

  • Decision tree

    + Can prune to reduce overfitting; produces a human-understandable model

    - Can still overfit

  • AdaBoost

    + Theoretical benefits, less prone to overfitting

    + Can tune by changing base learner, number of training rounds

  • Support Vector Machine

    + Theoretical benefits similar to AdaBoost

    - Many parameters to tune; training can take a long time


How to evaluate which classifier is better?

  • Compute a quality metric

    • Metrics on training set (e.g., accuracy, RMS error)

    • Metrics on test set

    • Cross-validation

  • Use it in practice (run the model and judge the results)

Image from http://blog.weisu.org/2011/05/cross-validation.html
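
A sketch of comparing two classifiers by cross-validation, assuming scikit-learn and its built-in iris data as a stand-in for your own gesture features: the data are split into folds, each fold is held out once as a test set, and the per-fold accuracies are averaged to estimate performance on unseen data.

```python
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)  # stand-in dataset; substitute your own gesture features

for clf in (KNeighborsClassifier(n_neighbors=3), SVC(kernel="rbf")):
    scores = cross_val_score(clf, X, y, cv=5)       # 5-fold cross-validation accuracy
    print(type(clf).__name__, scores.mean())        # higher mean = better generalization estimate
```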


Neural Networks

  • TODO: Use Nick’s slides


Which learning method should you use?

  • Classification (e.g., kNN, AdaBoost, SVM, decision tree):

    • Apply 1 of N labels to a static pose or state

    • Label a dynamic gesture, when segmentation & normalization are trivial

      • E.g., feature vector is a fixed-length window in time

  • Regression (e.g., with neural networks):

    • Produce a real-valued output (or vector of real-valued outputs) for each feature vector

  • Dynamic time warping, HMMs, and other temporal models:

    • Identify when a gesture has occurred, identify probable location within a gesture, possibly also apply a label

    • Necessary when segmentation is non-trivial or online following is needed


Suggested ML reading

  • Bishop, 2006: Pattern Recognition and Machine Learning. Springer Science+Business Media.

  • Duda, 2001: Pattern Classification. Wiley-Interscience.

  • Witten, 2005: Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann.


Suggested NIME-y reading

  • Lee, Freed, & Wessel, 1992. Neural networks for simultaneous classification and parameter estimation in musical instrument control. Adaptive and Learning Systems, 1706:244–55. (early example of ML in music)

  • Hunt, A. and Wanderley, M. M. 2002. Mapping performer parameters to synthesis engines. Organised Sound 7, 2, 97–108. (learning as a tool for generative mapping creation)

  • Chapter 2 of Rebecca’s dissertation: http://www.cs.princeton.edu/~fiebrink/thesis/ (historical/topic overview)

  • Recent publications by F. Bevilacqua & team @ IRCAM (HMMs, gesture follower)

  • TODO: Nick, anything else?



The Wekinator: Running in real time

[Diagram] Feature extractor(s) send a stream of feature vectors (e.g., .01, .59, .03, ...) over OSC to the model(s); over time, the model(s) send a stream of output vectors (e.g., 5, .01, 22.7, …) over OSC to a parameterizable process.

Inputs: from built-in feature extractors or OSC.

Outputs: control a ChucK patch or go elsewhere using OSC.


Brief intro to OSC

  • Messages sent to host (e.g., localhost) and port (e.g., 6448)

    • Listener must listen on the same port

  • Message contains message string (e.g., “/myOscMessage”) and optionally some data

    • Data can be int, float, string types

    • Listener code may listen for specific message strings & data formats
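
A minimal sketch of sending and receiving such a message, assuming the python-osc package (the slides do not name a client library); the host, port, and message string echo the examples above.

```python
from pythonosc.udp_client import SimpleUDPClient

client = SimpleUDPClient("localhost", 6448)               # same host and port the listener uses
client.send_message("/myOscMessage", [0.42, 3, "hello"])  # ints, floats, and strings are allowed

# A matching listener (run it in a separate process), bound to the same port:
from pythonosc.dispatcher import Dispatcher
from pythonosc.osc_server import BlockingOSCUDPServer

def handle(address, *args):
    print(address, args)                       # e.g., /myOscMessage (0.42, 3, 'hello')

disp = Dispatcher()
disp.map("/myOscMessage", handle)              # only this message string triggers the handler
BlockingOSCUDPServer(("localhost", 6448), disp).serve_forever()
```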


Wekinator: Under the hood

[Diagram] Raw inputs (e.g., joystick_x, joystick_y, webcam_1) are turned into features (Feature1, Feature2, Feature3, … FeatureN). Each model (Model1, Model2, … ModelM) maps the feature vector to one output parameter (Parameter1, Parameter2, … ParameterM), e.g., volume or pitch; a parameter can be real-valued (e.g., 3.3098) or a class (e.g., Class24).


Under the hood

Learning algorithms:

  • Classification: AdaBoost.M1, J48 decision tree, support vector machine, k-nearest neighbor

  • Regression: multilayer perceptron neural networks

[Diagram] Same pipeline as above: features (Feature1 … FeatureN) feed models (Model1 … ModelM), which produce parameters (Parameter1 … ParameterM), real-valued (e.g., 3.3098) or class-valued (e.g., Class24).


Interactive ML with Wekinator

[Diagram] The familiar training/running loop: inputs labeled “Gesture 1”, “Gesture 2”, “Gesture 3” form the training data, the algorithm builds a model, and the running model maps new inputs to output labels.


Interactive ML with Wekinator

[Diagram] Same loop, highlighting the first interactive step: creating training data.


Interactive ML with Wekinator

[Diagram] Same loop: creating training data… then evaluating the trained model by running it.


Interactive ML with Wekinator

[Diagram] Interactive machine learning: creating training data, evaluating the trained model, and modifying the training data (and repeating).


Time to play

  • Discrete classifier

  • Continuous neural net mapping

  • Free-for-all

