Interactive Machine Learning: Leveraging Human Intelligence Dan R. Olsen Jr. Brigham Young University Dept. of Computer Science
Interactive Machine Learning (IML) More questions than answers
Why is IML of Interest? • Exponential Growth (at fixed cost) • Computer processor speed • Memory size • Available data on the Internet • Static Humans • Fixed usable screen size
• Direct manipulation will not scale to gigabytes of data • Machine Learning can leverage human expression
Why is IML of Interest? • Exponential drop in cost (fixed power, varying cost) • Computing in more domains • Computing freed from the desktop • Interaction • Many new input sensors • Many new situations • No design tools
IML Examples • Characteristics of IML • Feedback Effect • When are we done? • Future
Image Processing with Crayons • Design in minutes not months • Use image painting as the design metaphor • Base the learning on selection from hundreds of features rather than combination of a few
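The feature-selection idea behind Crayons can be sketched in a few lines. This is a hypothetical illustration (names and data invented, not the actual Crayons implementation): generate many candidate features per pixel and let training select the single rule that best separates the user's painted labels, rather than hand-combining a few features.

```python
# Hypothetical sketch (toy data, invented names) of selecting one
# feature-threshold rule from many candidate features, in the spirit
# of selection from hundreds of features rather than combination of a few.

def best_stump(examples):
    """Return (feature, threshold, accuracy) of the best single-feature
    rule "predict 1 when feature value > threshold"."""
    n_features = len(examples[0][0])
    best = (0, 0.0, 0.0)
    for f in range(n_features):
        for t in sorted({fv[f] for fv, _ in examples}):
            correct = sum((fv[f] > t) == bool(lab) for fv, lab in examples)
            acc = correct / len(examples)
            if acc > best[2]:
                best = (f, t, acc)
    return best

# Toy labeled pixels: (feature vector, label); here the third feature
# happens to separate the two classes perfectly.
pixels = [
    ([0.1, 0.5, 0.9], 1),
    ([0.2, 0.4, 0.8], 1),
    ([0.3, 0.6, 0.1], 0),
    ([0.4, 0.5, 0.2], 0),
]
f, t, acc = best_stump(pixels)   # selects feature 2
```

The point is that training does feature selection for the user; the user only paints labels.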
Safe/Unsafe Driving • Problem: Steering robots • Increase operator neglect time so that attention can be used elsewhere • Reduce collisions with unseen objects • Solution: Teach the robot what is safe and what is not
IML Examples • Characteristics of IML • Generalize to artifacts, not just images • Feedback Effect • When are we done? • Future
[IML architecture diagram: Unlabeled Artifacts, Labeled Artifacts, Feature Generator, Training Algorithm, Trained Function, IML User Interface with Feedback and Labeling, and downstream Analysis/Math Programs]
[IML architecture diagram, annotated with question marks]
Shape and texture features needed to separate grass from trees
[IML architecture diagram] Callout: Seconds, not days
[IML architecture diagram] Callout: Labeling time dominates the process
[IML architecture diagram] Callout: Rapid discovery of wrong solutions
[IML architecture diagram] Callout: Exploration of a space of problems
IML Examples • Characteristics of IML • Feedback Effect • When are we done? • Future
[IML architecture diagram] Does Feedback Matter?
Data → Learn → Feedback: implicit focus on the decision surface (like boosting)
Does feedback/correction reduce effort? • Simulation • Artificial oracles • 1) Train with random selection • 2) Train by selecting only corrections of the decision • Result: 15% fewer training examples • Margin-based rather than area-based classifiers
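The correction-only selection strategy compared in the simulation can be sketched as follows. This is a toy 1-D stand-in, not the actual simulation: the oracle, thresholds, and data are invented for illustration.

```python
# Hypothetical sketch of "train by selecting only corrections of the
# decision": keep only the stream items the current rule misclassifies,
# which concentrates training data near the decision surface.

def predict(threshold, x):
    return 1 if x > threshold else 0

def corrections_only(stream, oracle, threshold):
    """Keep only examples the current classifier gets wrong."""
    kept = []
    for x in stream:
        if predict(threshold, x) != oracle(x):
            kept.append((x, oracle(x)))
    return kept

oracle = lambda x: 1 if x > 0.5 else 0    # ground-truth concept
stream = [0.1, 0.45, 0.55, 0.9, 0.48, 0.52]
# With a poor initial threshold of 0.3, only items between 0.3 and 0.5
# are misclassified, so the kept set hugs the true boundary.
kept = corrections_only(stream, oracle, 0.3)   # [(0.45, 0), (0.48, 0)]
```

Random selection, by contrast, spends labels uniformly over the whole space, which is why correction-based selection can need fewer examples.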
Does feedback/correction reduce effort? • User Studies • Fully annotate 20 pictures to create a standard • 1) Have users “paint” classifications without feedback • 2) Have users “paint” with feedback • No significant difference • Why?
IML Examples • Characteristics of IML • Feedback Effect • Are we done yet? • Future
How does the user find out that the current feature set cannot separate grass from trees?
Estimating Accuracy • Strategies from Machine Learning • Hold-out set • K-fold cross validation
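The k-fold strategy mentioned above can be sketched in a few lines. The learner and data here are trivial stand-ins (a majority-class model over invented data), just to show the fold structure.

```python
# Hedged sketch of k-fold cross validation: split the labeled data into
# k folds, train on k-1 of them, test on the held-out fold, and average.
from collections import Counter

def k_fold_accuracy(data, k, train, evaluate):
    folds = [data[i::k] for i in range(k)]   # round-robin split
    scores = []
    for i in range(k):
        held_out = folds[i]
        training = [ex for j, f in enumerate(folds) if j != i for ex in f]
        scores.append(evaluate(train(training), held_out))
    return sum(scores) / k

def train(examples):
    # toy learner: always predict the majority label
    return Counter(lab for _, lab in examples).most_common(1)[0][0]

def evaluate(model, examples):
    return sum(lab == model for _, lab in examples) / len(examples)

data = [(x, 1) for x in range(8)] + [(x, 0) for x in range(2)]
acc = k_fold_accuracy(data, 5, train, evaluate)
```

A hold-out set is the k=2 degenerate case with only one fold ever held out; k-fold reuses every label for both training and testing.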
Incremental Difference Estimate • Create a classifier C(i) from a training set of i examples • Create a classifier C(i+500) after adding 500 more labeled examples • Compare how C(i) and C(i+500) classify the unlabeled data • The percent disagreement is the error estimate
[Diagram: Training Myopia, showing C(i) and C(i+1) over All Data vs. Labeled Data, with their area of disagreement highlighted]
Use incremental classifier distance to suggest regions to label
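The incremental difference estimate reduces to measuring disagreement between two snapshots of the classifier on unlabeled data. A minimal sketch, with invented threshold rules standing in for C(i) and C(i+500):

```python
# Sketch of the incremental-difference error estimate: the fraction of
# unlabeled items on which the old and new classifiers disagree.
# The two classifiers here are toy threshold rules, purely illustrative.

def disagreement_rate(c_old, c_new, unlabeled):
    diff = sum(c_old(x) != c_new(x) for x in unlabeled)
    return diff / len(unlabeled)

c_i = lambda x: x > 0.40      # classifier trained on the first i labels
c_i500 = lambda x: x > 0.50   # classifier after 500 more labels
unlabeled = [i / 100 for i in range(100)]   # 0.00 .. 0.99
estimate = disagreement_rate(c_i, c_i500, unlabeled)   # 0.1
```

The items where the two snapshots disagree are exactly the regions worth suggesting to the user for labeling.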
[IML architecture diagram] Does Feedback Matter? Maybe
IML Examples • Characteristics of IML • Observations of User Behavior • Observations of Algorithm Behavior • Future
Other artifacts • Text • Classification • Extraction • Video • Are there multiple people in the room? • Audio • Is someone talking in the room? • Sensor streams • Object being shaken • Objects bumped together • Brain sensing
[IML architecture diagram] Callout: Different artifact types have differing labeling effort
Alternative interfaces • Selection as classification: select all frames of a football game where the ball is actually in play
Alternative interfaces • Copy and Paste is a learnable data transformation
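One way to read "copy and paste is a learnable data transformation" is programming-by-example: a single copy/paste edit serves as a training pair from which a transformation is induced. The sketch below is hypothetical (a token-permutation learner invented for illustration, not a method from the talk):

```python
# Hypothetical sketch: learn a token reordering from one example edit
# and generalize it to other inputs, in the spirit of treating
# copy-and-paste as a learnable data transformation.

def learn_reorder(src, dst):
    """Learn a token permutation mapping src tokens to dst tokens."""
    s, d = src.split(), dst.split()
    return [s.index(tok) for tok in d]

def apply_reorder(perm, text):
    toks = text.split()
    return " ".join(toks[i] for i in perm)

perm = learn_reorder("John Smith", "Smith John")   # permutation [1, 0]
result = apply_reorder(perm, "Ada Lovelace")       # "Lovelace Ada"
```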
Alternative interfaces • Similarity metrics • Sesame Street Learning: one of these things is not like the other • Example documents: "A Niched Pareto Genetic Algorithm for Multiobjective Optimization", "Principles and Implementation of Deductive Parsing", "Grammatical Trigrams: A Probabilistic Model of Link Grammar"
Alternative interfaces • Placing artifacts in a folder structure is classification
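Filing artifacts into folders already produces labeled training data: the folder name is the class label. A minimal sketch (the paths are made up for illustration):

```python
# Sketch of "placing artifacts in a folder structure is classification":
# the folder an item is filed under serves as its training label.
import os

paths = [
    "mail/travel/itinerary.txt",
    "mail/work/minutes.txt",
    "mail/travel/hotel.txt",
]
# Label each artifact with its immediate folder name.
labeled = [(p, os.path.basename(os.path.dirname(p))) for p in paths]
# e.g. ("mail/travel/itinerary.txt", "travel")
```

A classifier trained on such pairs can then propose folders for new artifacts, with no explicit labeling step by the user.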
Alternative interfaces • Hints • Query by Critique ("price too high", "wrong color")