Interactive Machine Learning: Leveraging Human Intelligence Dan R. Olsen Jr. Brigham Young University Dept. of Computer Science
Interactive Machine Learning (IML) More questions than answers
Why is IML of Interest? • Exponential Growth (at fixed cost) • Computer processor speed • Memory size • Available data on the Internet • Static Humans • Fixed usable screen size
• Direct manipulation will not scale to gigabytes of data • Machine Learning can leverage human expression
Why is IML of Interest? • Exponential drop in cost (fixed power, varying cost) • Computing in more domains • Computing freed from the desktop • Interaction • Many new input sensors • Many new situations • No design tools
IML Examples • Characteristics of IML • Feedback Effect • When are we done? • Future
Image Processing with Crayons • Design in minutes not months • Use image painting as the design metaphor • Base the learning on selection from hundreds of features rather than combination of a few
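The feature-selection idea behind Crayons can be sketched in a few lines. This is a hypothetical illustration (names and data invented, not the actual Crayons implementation): generate many candidate features per pixel and let training select the single rule that best separates the user's painted labels, rather than hand-combining a few features.

```python
# Hypothetical sketch (toy data, invented names) of selecting one
# feature-threshold rule from many candidate features, in the spirit
# of selection from hundreds of features rather than combination of a few.

def best_stump(examples):
    """Return (feature, threshold, accuracy) of the best single-feature
    rule "predict 1 when feature value > threshold"."""
    n_features = len(examples[0][0])
    best = (0, 0.0, 0.0)
    for f in range(n_features):
        for t in sorted({fv[f] for fv, _ in examples}):
            correct = sum((fv[f] > t) == bool(lab) for fv, lab in examples)
            acc = correct / len(examples)
            if acc > best[2]:
                best = (f, t, acc)
    return best

# Toy labeled pixels: (feature vector, label); here the third feature
# happens to separate the two classes perfectly.
pixels = [
    ([0.1, 0.5, 0.9], 1),
    ([0.2, 0.4, 0.8], 1),
    ([0.3, 0.6, 0.1], 0),
    ([0.4, 0.5, 0.2], 0),
]
f, t, acc = best_stump(pixels)   # selects feature 2
```

The point is that training does feature selection for the user; the user only paints labels.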
Safe/Unsafe Driving • Problem: Steering robots • Increase operator neglect time so that attention can be used elsewhere • Reduce collisions with unseen objects • Solution: Teach the robot what is safe and what is not
IML Examples • Characteristics of IML • Generalize to artifacts, not just images • Feedback Effect • When are we done? • Future
[IML architecture diagram: Unlabeled Artifacts, Labeled Artifacts, Feature Generator, Training Algorithm, Trained Function, IML User Interface with Feedback and Labeling, and downstream Analysis/Math Programs]
[IML architecture diagram, annotated with question marks]
Shape and texture features needed to separate grass from trees
[IML architecture diagram] Callout: Seconds, not days
[IML architecture diagram] Callout: Labeling time dominates the process
[IML architecture diagram] Callout: Rapid discovery of wrong solutions
[IML architecture diagram] Callout: Exploration of a space of problems
IML Examples • Characteristics of IML • Feedback Effect • When are we done? • Future
[IML architecture diagram] Does Feedback Matter?
Data → Learn → Feedback: implicit focus on the decision surface (like boosting)
Does feedback/correction reduce effort? • Simulation • Artificial oracles • 1) Train with random selection • 2) Train by selecting only corrections of the decision • Result: 15% fewer training examples • Margin-based rather than area-based classifiers
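The correction-only selection strategy compared in the simulation can be sketched as follows. This is a toy 1-D stand-in, not the actual simulation: the oracle, thresholds, and data are invented for illustration.

```python
# Hypothetical sketch of "train by selecting only corrections of the
# decision": keep only the stream items the current rule misclassifies,
# which concentrates training data near the decision surface.

def predict(threshold, x):
    return 1 if x > threshold else 0

def corrections_only(stream, oracle, threshold):
    """Keep only examples the current classifier gets wrong."""
    kept = []
    for x in stream:
        if predict(threshold, x) != oracle(x):
            kept.append((x, oracle(x)))
    return kept

oracle = lambda x: 1 if x > 0.5 else 0    # ground-truth concept
stream = [0.1, 0.45, 0.55, 0.9, 0.48, 0.52]
# With a poor initial threshold of 0.3, only items between 0.3 and 0.5
# are misclassified, so the kept set hugs the true boundary.
kept = corrections_only(stream, oracle, 0.3)   # [(0.45, 0), (0.48, 0)]
```

Random selection, by contrast, spends labels uniformly over the whole space, which is why correction-based selection can need fewer examples.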
Does feedback/correction reduce effort? • User Studies • Fully annotate 20 pictures to create a standard • 1) Have users “paint” classifications without feedback • 2) Have users “paint” with feedback • No significant difference • Why?
IML Examples • Characteristics of IML • Feedback Effect • Are we done yet? • Future
How does the user find out that the current feature set cannot separate grass from trees?
Estimating Accuracy • Strategies from Machine Learning • Hold-out set • K-fold cross validation
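The k-fold strategy mentioned above can be sketched in a few lines. The learner and data here are trivial stand-ins (a majority-class model over invented data), just to show the fold structure.

```python
# Hedged sketch of k-fold cross validation: split the labeled data into
# k folds, train on k-1 of them, test on the held-out fold, and average.
from collections import Counter

def k_fold_accuracy(data, k, train, evaluate):
    folds = [data[i::k] for i in range(k)]   # round-robin split
    scores = []
    for i in range(k):
        held_out = folds[i]
        training = [ex for j, f in enumerate(folds) if j != i for ex in f]
        scores.append(evaluate(train(training), held_out))
    return sum(scores) / k

def train(examples):
    # toy learner: always predict the majority label
    return Counter(lab for _, lab in examples).most_common(1)[0][0]

def evaluate(model, examples):
    return sum(lab == model for _, lab in examples) / len(examples)

data = [(x, 1) for x in range(8)] + [(x, 0) for x in range(2)]
acc = k_fold_accuracy(data, 5, train, evaluate)
```

A hold-out set is the k=2 degenerate case with only one fold ever held out; k-fold reuses every label for both training and testing.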
Incremental Difference Estimate • Create a classifier C(i) from a training set of i examples • Create a classifier C(i+500) after adding 500 more labeled examples • Compare how C(i) and C(i+500) classify the unlabeled data • The percent disagreement is the error estimate
[Diagram: Training Myopia, showing C(i) and C(i+1) over All Data vs. Labeled Data, with their area of disagreement highlighted]
Use incremental classifier distance to suggest regions to label
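The incremental difference estimate reduces to measuring disagreement between two snapshots of the classifier on unlabeled data. A minimal sketch, with invented threshold rules standing in for C(i) and C(i+500):

```python
# Sketch of the incremental-difference error estimate: the fraction of
# unlabeled items on which the old and new classifiers disagree.
# The two classifiers here are toy threshold rules, purely illustrative.

def disagreement_rate(c_old, c_new, unlabeled):
    diff = sum(c_old(x) != c_new(x) for x in unlabeled)
    return diff / len(unlabeled)

c_i = lambda x: x > 0.40      # classifier trained on the first i labels
c_i500 = lambda x: x > 0.50   # classifier after 500 more labels
unlabeled = [i / 100 for i in range(100)]   # 0.00 .. 0.99
estimate = disagreement_rate(c_i, c_i500, unlabeled)   # 0.1
```

The items where the two snapshots disagree are exactly the regions worth suggesting to the user for labeling.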
[IML architecture diagram] Does Feedback Matter? Maybe
IML Examples • Characteristics of IML • Observations of User Behavior • Observations of Algorithm Behavior • Future
Other artifacts • Text • Classification • Extraction • Video • Are there multiple people in the room? • Audio • Is someone talking in the room? • Sensor streams • Object being shaken • Objects bumped together • Brain sensing
[IML architecture diagram] Callout: Different artifact types have differing labeling effort
Alternative interfaces • Selection as classification: select all frames of a football game where the ball is actually in play
Alternative interfaces • Copy and Paste is a learnable data transformation
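One way to read "copy and paste is a learnable data transformation" is programming-by-example: a single copy/paste edit serves as a training pair from which a transformation is induced. The sketch below is hypothetical (a token-permutation learner invented for illustration, not a method from the talk):

```python
# Hypothetical sketch: learn a token reordering from one example edit
# and generalize it to other inputs, in the spirit of treating
# copy-and-paste as a learnable data transformation.

def learn_reorder(src, dst):
    """Learn a token permutation mapping src tokens to dst tokens."""
    s, d = src.split(), dst.split()
    return [s.index(tok) for tok in d]

def apply_reorder(perm, text):
    toks = text.split()
    return " ".join(toks[i] for i in perm)

perm = learn_reorder("John Smith", "Smith John")   # permutation [1, 0]
result = apply_reorder(perm, "Ada Lovelace")       # "Lovelace Ada"
```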
Alternative interfaces • Similarity metrics • Sesame Street Learning: one of these things is not like the other • Example documents: "A Niched Pareto Genetic Algorithm for Multiobjective Optimization", "Principles and Implementation of Deductive Parsing", "Grammatical Trigrams: A Probabilistic Model of Link Grammar"
Alternative interfaces • Placing artifacts in a folder structure is classification
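Filing artifacts into folders already produces labeled training data: the folder name is the class label. A minimal sketch (the paths are made up for illustration):

```python
# Sketch of "placing artifacts in a folder structure is classification":
# the folder an item is filed under serves as its training label.
import os

paths = [
    "mail/travel/itinerary.txt",
    "mail/work/minutes.txt",
    "mail/travel/hotel.txt",
]
# Label each artifact with its immediate folder name.
labeled = [(p, os.path.basename(os.path.dirname(p))) for p in paths]
# e.g. ("mail/travel/itinerary.txt", "travel")
```

A classifier trained on such pairs can then propose folders for new artifacts, with no explicit labeling step by the user.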
Alternative interfaces • Hints • Query by Critique ("price too high", "wrong color")