Artificial Intelligence, Adv (E) (2013)

Do not distribute beyond this class

Learning Interaction Protocols through imitation: A data mining approach

Yasser Mohammad

Nishida Lab.

Situated Modules
  • Used in many systems to date, mainly with Robovie
  • Situated modules are executed serially

[Ishiguro et al. 1999]

Route Guidance Listener (2006)

(Diagram: the standard engineering pipeline: analyze human-human interactions, implement the model, evaluate the model, and obtain the controller; redesign acts as structure adjustment, supervised tuning/adaptation acts as parameter adjustment.)

2 WOZ experiments using motion-captured data

[Kanda et al. 2007]

Engineering vs. Learning Approaches

Standard Engineering Approach

(Diagram: analyze human-human interactions, implement the model, evaluate the model, and obtain the controller; redesign acts as structure adjustment, supervised tuning/adaptation acts as parameter adjustment.)

Learning/Imitation Approach

(Diagram: collect human-human interactions as training data, develop the controller, and interact; unsupervised adaptation adjusts both parameters and structure.)

Example Scenarios

  • Gaze control during listening (implicit protocol)
  • Guided navigation (explicit protocol)

Bird's Eye View

(Diagram: the learner robot progresses through four stages: Watch, Mimic, Interact, Adapt. Starting from primordial knowledge (a model of actions, a model of commands, and a communication protocol), it first builds learned models and protocol, then adapted models and protocol, establishing shared ground with the human partner: the robot's learned and adapted actions meet the human's actions in co-action.)

Our Long Term Model

(Diagram: watch external behavior, then learn the actions' model, the commands' model, and the communication protocol; commands, feedback, and actions flow between the operator, the actor, and the learner during interaction.)

  • Main Insights
  • Learning By Watching is Ubiquitous in humans
  • Learning Actions and Commands are related
  • Change in Behavior is what matters


Basic Architecture

  • Activation level
  • Execution time
  • Behavioral influence

Design Procedure

(Diagram, top: the standard engineering pipeline: analyze human-human interactions, implement the model, evaluate the model, and obtain the controller; redesign acts as structure adjustment, supervised tuning/adaptation acts as parameter adjustment.)

(Diagram, bottom: the proposed procedure: analyze the task and the required basic actions, decide the required behavior from human-human interactions, and learn the parameters with FPGA; evaluate/redesign feeds back as structure adjustment, and intentions and processes make up the resulting controller.)

Example: Gaze Control during Listening

(Diagram components: sensors, perception processes, behavior processes, and intentions.)

Floating Point Genetic Algorithm

Crossover
  • Select 2 individuals and generate 4:
  • Calculate the probability of passing:

Mutation
  1. Calculate probabilities over 1~m:
  2. Calculate P(mutation @ k) as:
  3. Select the mutation site according to P(mutation @ k)
  4. Mutate the selected parameter using:

Operators: elitism, tournament selection, crossover, mutation
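Since the slide's equations cannot be reproduced here, the following is a minimal, hedged sketch of a real-coded (floating point) GA with tournament selection, elitism, crossover, and single-site mutation. The passing-probability and mutation formulas from the slides are not reproduced; uniform crossover and Gaussian mutation are used as illustrative stand-ins, and all names and defaults are assumptions.

```python
import numpy as np

def fpga_sketch(fitness, n_params, pop_size=100, generations=100,
                lo=-1.0, hi=1.0, n_elite=2, sigma=0.1, seed=0):
    """Hedged sketch of a floating-point GA (not the exact FPGA of the slides)."""
    rng = np.random.default_rng(seed)
    pop = rng.uniform(lo, hi, size=(pop_size, n_params))

    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])
        order = np.argsort(-scores)                          # best individuals first
        new_pop = [pop[i].copy() for i in order[:n_elite]]   # elitism

        while len(new_pop) < pop_size:
            # Tournament selection of two parents
            i, j = rng.integers(0, pop_size, 2)
            p1 = pop[i] if scores[i] >= scores[j] else pop[j]
            i, j = rng.integers(0, pop_size, 2)
            p2 = pop[i] if scores[i] >= scores[j] else pop[j]

            # Uniform crossover (stand-in for the slide's passing probability)
            mask = rng.random(n_params) < 0.5
            child = np.where(mask, p1, p2)

            # Mutate one randomly selected site (stand-in for steps 1-4 above)
            k = rng.integers(0, n_params)
            child[k] = np.clip(child[k] + rng.normal(0.0, sigma), lo, hi)
            new_pop.append(child)

        pop = np.array(new_pop)

    best = max(pop, key=fitness)
    return best, fitness(best)

# Usage with the evaluation settings mentioned later (100 generations, 100 individuals)
# and a toy fitness function (the real fitness function is not shown on the slides):
best, score = fpga_sketch(lambda x: -float(np.sum((x - 0.3) ** 2)), n_params=18)
```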

FPGA – Preliminary Evaluation
  • Fitness function:
  • 100 generations
  • 100 individuals
  • Two comparison algorithms

Proposed > A1 (p = 0.0133)

Proposed > A2 (p = 0.0032)

[Mohammad & Nishida 2010d]

Applications – Gaze Control
  • Fixed Structure Gaze Controller (18 parameters)
  • Dynamic Structure Gaze Controller (7 parameters)

[Mohammad & Nishida 2010d]

Applications – Gaze Control
  • Fixed vs. Dynamic Structure GC
    • Six novel sessions
    • Four control GCs
      • Follow
      • Stare
      • Random

[Mohammad & Nishida 2010d]

Learning by watching/imitation/mimicry

(Diagram: the three stages of the approach.

1. Watch: the operator and the actor interact; the learner records the command stream and the action stream together with feedback.

2. Learn (offline): a discovery phase applies constrained motif discovery to convert the streams into discrete commands and discrete actions (symbol strings such as 320000310 and 23340003204402); an association phase uses Bayesian network induction to learn the interaction protocol; controller generation (piecewise linear controller generation) produces the behavior generation model.

3. Act (online): the learned actor runs the robot/agent controller together with a feedback controller, exchanging commands, feedback, and actions with the operator.)

Building Blocks
  • Behavior Discovery
    • Motif Discovery
      • Change Point Detection
  • Behavior Association
    • Bayesian Network Induction
      • Causality Analysis
  • Behavior Generation
    • Piecewise Linear Controller Generation
  • Behavior Adaptation
    • Bayesian Network Combination
Gaze Control: Data Collection Experiment
  • 44 participants
    • Ages 19-37 (27% female)
    • Not trained to interact with robots
  • Two objects (chair/stepper)
    • Easy to assemble (7 steps each)
    • Not too easy (2 ordering steps each)
  • Two roles:
    • Instructor: explains a single object three times, to a:
      • Good listener
      • Bad listener
      • Robot
    • Listener: listens to two explanations about the two objects, given by a:
      • Good listener
      • Bad listener
Gaze Control: Evaluation Experiment
  • Internet poll
  • 35 subjects
  • Watch 2 videos:
    • ISL learned controller
    • carefully designed controller
  • Age: ranged from 24 to 43 (average 31.16 years).
  • Gender: 8 females and 30 males.
  • Experience in dealing with robots: ranged from "I never saw one before" to "I program robots routinely".
  • Expectation of robot attention on a scale from 1 to 7: 4 ± 1.376.
  • Expectation of the robot's behavioral naturalness on a scale from 1 to 7: 3.2 ± 1.255.
  • Expectation of the robot's human-likeness on a scale from 1 to 7: 3.526 ± 1.52.
Building Blocks
  • Behavior Discovery
    • Motif Discovery
      • Change Point Detection
  • Behavior Association
    • Bayesian Network Induction
      • Causality Analysis
  • Behavior Generation
    • Piecewise Linear Controller Generation
  • Behavior Adaptation
    • Bayesian Network Combination
(1) Behavior Discovery

Proposed

(Diagram: the command stream and the action stream are each passed through the Robust Singular Spectrum Transform to discover change points; Granger-causality maximization finds the natural delay between the streams; constrained motif discovery then discovers motifs and removes irrelevant dimensions in each stream.)

Advantages
  • Utilizes the relation between actions and commands
  • Removes irrelevant dimensions
  • No need for a separate clustering step
  • No predefined model

Motif Discovery
  • Given a time series (an ordered list of real numbers), find approximately recurring subsequences

[Chiu 2003]

Motif Discovery
  • Given a time series X(t), find recurring patterns of length L using a distance function D
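For reference, a brute-force sketch of this fixed-length problem (not one of the algorithms discussed below): find the closest pair of non-overlapping length-L subsequences. It is quadratic in the series length, which is exactly what the constrained algorithms below try to avoid; the names and the toy example are illustrative assumptions.

```python
import numpy as np

def find_motif_pair(x, L):
    """Brute-force fixed-length motif discovery sketch: return the start indices
    of the two non-overlapping length-L subsequences with minimum Euclidean distance."""
    x = np.asarray(x, dtype=float)
    n = len(x) - L + 1
    best, best_pair = np.inf, None
    for i in range(n):
        for j in range(i + L, n):          # skip overlapping (trivial) matches
            d = np.linalg.norm(x[i:i + L] - x[j:j + L])
            if d < best:
                best, best_pair = d, (i, j)
    return best_pair, best

# Example: two noisy copies of the same bump, around t=10 and t=40
t = np.arange(60)
x = np.exp(-0.5 * ((t - 10) / 2.0) ** 2) + np.exp(-0.5 * ((t - 40) / 2.0) ** 2)
x += 0.05 * np.random.default_rng(0).normal(size=60)
print(find_motif_pair(x, L=9))
```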
Catalano's Algorithm

(Diagram: candidate subsequences are compared against the series to separate motif occurrences from noise; the top k candidates are kept rather than only the best.)

[Catalano 2006]

Constrained Motif Discovery
  • Given a time series X(t), find recurring patterns of length between L1 and L2 using a distance function D, subject to the constraint P(t), where P(t) is an estimate of the probability that a motif occurrence exists near time step t.

(Figure: an example constraint; the constraint is high where a motif is likely.)
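A hedged illustration (not DGCMD itself) of how such a constraint narrows the search: only windows where P(t) is high are treated as candidate occurrences, and only candidate pairs are compared. The thresholds and data layout are assumptions.

```python
import numpy as np

def constrained_motif_pairs(x, P, L, p_thresh=0.5, d_thresh=1.0):
    """Illustrative constrained motif discovery sketch: P[t] estimates the probability
    that a motif occurrence exists near t; only high-constraint windows are compared."""
    x, P = np.asarray(x, float), np.asarray(P, float)
    n = len(x) - L + 1
    candidates = [i for i in range(n) if P[i:i + L].max() >= p_thresh]
    pairs = []
    for a, i in enumerate(candidates):
        for j in candidates[a + 1:]:
            if j < i + L:                   # ignore overlapping windows
                continue
            d = np.linalg.norm(x[i:i + L] - x[j:j + L])
            if d <= d_thresh:
                pairs.append((i, j, d))
    return sorted(pairs, key=lambda p: p[2])   # closest candidate pairs first
```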

DGCMD

(Figure: an example signal, its constraint, and the threshold Tcon.)

[Mohammad & Nishida 2009]

DGCMD
  • Advantages:
    • Controlled Exhaustiveness (# candidates).
    • Controlled Sensitivity (Tc).
    • No random subwindow as needed by some MD algorithms.
    • No upper bound on motif size as needed by most MD algorithms.
  • Disadvantages:
    • Can become quadratic if # candidates is large.
    • Sensitive to outlier segments (long subwindows of outliers).
DGCMD – Evaluation
  • 50440 time series
  • Variable length (10²~10⁶)
  • Variable noise level (0~20%PP)
  • Variable motif types
  • Variable # of occurrences
  • Motif Discovery Algorithms:
    • Projections (most accurate)
    • Catalano et al. (fastest)
  • Constrained Motif Discovery Alg.:
    • MCFull
    • MCInc
    • DGCMD

[Mohammad & Nishida 2009]

How good is the constraint?

(Plot: probability of discovering a motif with and without using the constraint, as a function of the relative entropy between the constraint and the motif locations, the entropy of the constraint, the number of motif occurrences, the window length, the average motif length, and the time series length.)

[Mohammad & Nishida 2010a]

How to get the constraint?
  • Main insight
    • The generating dynamics change near the beginning and end of motifs.
    • We therefore need to find the points in the time series where the generating dynamics change, as sketched below.
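One simple way to realize this insight, sketched under assumptions (the actual constraint construction used with RSST is not reproduced here): smooth and normalize change-point scores so that regions around detected changes receive high constraint values.

```python
import numpy as np

def constraint_from_change_scores(change_scores, smooth=5):
    """Turn change-point scores into a motif-location constraint P(t) in [0, 1]
    by smoothing and normalizing them (the smoothing choice is illustrative)."""
    s = np.asarray(change_scores, dtype=float)
    kernel = np.ones(smooth) / smooth
    p = np.convolve(s, kernel, mode="same")   # spread mass around change points
    p -= p.min()
    return p / (p.max() + 1e-12)
```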
Building Blocks
  • Behavior Discovery
    • Motif Discovery
      • Change Point Detection
  • Behavior Association
    • Bayesian Network Induction
      • Causality Analysis
  • Behavior Generation
    • Piecewise Linear Controller Generation
  • Behavior Adaptation
    • Bayesian Network Combination
Change Point Discovery
  • Given a time series X(t), find for every time step the probability that X(t) is changing form (i.e., that the underlying dynamics are changing)
Available Techniques
  • CUMSUM
    • Detects only mean change
  • Inflection Point Detection
    • Assumes any variation is a change!!
  • Autoregressive Modeling
    • Assumes a specific generating model
  • Mixtures of Gaussians
    • Assumes a specific generating model
  • Discrete Cosine Transform
    • Finds only global changes
  • Wavelet Analysis
    • Tons of parameters
  • Singular Spectrum Transform (SST) [Ide et al. 2005]
    • Most General, no ad-hoc adjustment
Main idea
  • At every point:
    • Use a few values before it to represent the past: H
    • Use a few values after it to represent the future: G
    • Compare the past with the future; the more dissimilar they are, the higher the score

(Figure: the past H spans a hyperplane and the future G is a set of eigenvectors; the change score reflects how far the future directions lie from the past hyperplane.)
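A sketch of the SST scoring loop, assuming the standard Hankel-matrix formulation of Ide et al. (2005); here the future window length g is taken equal to w, as in the numeric example that follows, and the defaults and subspace-angle score are illustrative rather than the slides' exact formulation.

```python
import numpy as np

def sst_scores(x, w=4, n=2, m=2, l=1):
    """Change score at each t: angle between the leading l-dimensional subspaces
    of the past Hankel matrix H (n windows of length w ending at t) and the
    future Hankel matrix G (m windows of length w starting after t)."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    scores = np.zeros(T)
    for t in range(w + n - 1, T - (w + m - 1)):
        H = np.column_stack([x[t - w - i + 1:t - i + 1] for i in range(n)])   # past windows
        G = np.column_stack([x[t + i + 1:t + w + i + 1] for i in range(m)])   # future windows
        Uh = np.linalg.svd(H, full_matrices=False)[0][:, :l]   # past subspace
        Ug = np.linalg.svd(G, full_matrices=False)[0][:, :l]   # future subspace
        cos = np.linalg.svd(Uh.T @ Ug, compute_uv=False)[0]    # largest principal cosine
        scores[t] = 1.0 - cos                                  # dissimilarity = change score
    return scores
```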

Singular Spectrum Transform

(Diagram: windows from the past (parameters w, n) form the matrix H and windows from the future (parameters g, m) form G; the first l directions are compared, and the change score is derived from the angle between the past and future subspaces.)

[Ide et al. 2005]

Numeric Example
  • X(t) = {-4,-3,-2,-1,0,1,2,3,4,-1,1,-1,1,-1,1,-1,1}
  • Parameters: w = g = 4, n = m = 2, l = 1
  • At t = 6: (slide shows the past/future matrices, their SVD, and the resulting change angle)
  • At t = 10: (slide shows the past/future matrices, their SVD, and the resulting change angle)
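The sketch from the previous section can be run on this series; the output below comes from the sketch under its stated assumptions and does not reproduce the intermediate matrices or exact numbers on the slides.

```python
# Applying the sketch above to the example series (0-based Python indexing,
# so the slide's time steps may be offset by one):
x = [-4, -3, -2, -1, 0, 1, 2, 3, 4, -1, 1, -1, 1, -1, 1, -1, 1]
s = sst_scores(x, w=4, n=2, m=2, l=1)
print([round(v, 3) for v in s[5:12]])   # change scores over the valid range of t
```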

Singular Spectrum Transform
  • Advantages:
    • No predefined generation model.
    • Comparatively few parameters (5).
    • PCA using SVD works for ANY matrix, so no ad-hoc preprocessing is needed.
    • Linear in the length of the time series.
  • Disadvantages:
    • There are still 5 parameters that are hard to select.
    • Specificity degrades quickly as the noise level increases.
    • Inadequate for time series with no background signal.
Robust Singular Spectrum Transform

(Diagram: as in SST, past windows form H and future windows form G, but only the parameters w and n remain, and several change angles are used.)

[Mohammad & Nishida 2009b]

RSST vs. SST – Real world data
  • Explanation Scenario
  • 22 participants
  • 3 conditions:
    • Natural listening
    • Unnatural listening
    • Robot
  • Physiological Sensors:
    • Respiration
    • Skin Conductance
    • Pulse
Building Blocks
  • Behavior Discovery
    • Motif Discovery
      • Change Point Detection
  • Behavior Association
    • Bayesian Network Induction
      • Causality Analysis
  • Behavior Generation
    • Piecewise Linear Controller Generation
  • Behavior Adaptation
    • Bayesian Network Combination
Behavior Association

After discovering basic motifs in both actions and commands and detecting their occurrences in all time series (figure: occurrence plots for Command 1 and Action 1):

  • Use the natural delay between commands and actions calculated during the discovery phase.
  • For every command-action pair, calculate their joint activation as the number of occurrences of the action within the natural delay interval of the command.
  • Use the joint-activation values to induce a Bayesian network describing the relation between actions and commands.

[Mohammad & Nishida 2009]
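A small sketch of the joint-activation computation, under assumed data layouts (occurrence times per motif and a natural-delay window per command); the Bayesian network induction step itself is not shown.

```python
from collections import defaultdict

def joint_activation(command_occurrences, action_occurrences, natural_delay):
    """command_occurrences: {command_id: [occurrence times]}
       action_occurrences:  {action_id:  [occurrence times]}
       natural_delay:       {command_id: (min_delay, max_delay)}
       Returns counts of how often each action occurs inside each command's delay window."""
    counts = defaultdict(int)
    for cmd, cmd_times in command_occurrences.items():
        lo, hi = natural_delay[cmd]
        for act, act_times in action_occurrences.items():
            for tc in cmd_times:
                counts[(cmd, act)] += sum(1 for ta in act_times if lo <= ta - tc <= hi)
    return dict(counts)

# Toy example: command 0 tends to be followed by action 1 within 5-15 time steps
print(joint_activation({0: [10, 50]}, {1: [20, 60], 2: [5]}, {0: (5, 15)}))
```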

Causality Based Delay Estimation

To find the delay between the command (gesture) stream and the action stream:

  1. Regress actions using actions & gestures
  2. Regress actions using actions only
  3. Compare the residuals
  4. Calculate the G-causality statistic
  5. Find the delay that maximizes G-causality

[Mohammad & Nishida 2009c]
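A hedged sketch of the delay search: for each candidate delay d, fit a restricted autoregressive model of the action on its own past and a full model that also includes the gesture delayed by d, and score d by the log ratio of residual variances (a standard Granger-causality statistic). Scalar signals, least-squares fits, and the parameter defaults are assumptions here.

```python
import numpy as np

def granger_delay(action, gesture, max_delay=30, p=3):
    """Return the delay d (1..max_delay) that maximizes the G-causality statistic
    of the gesture signal on the action signal, using AR models of order p."""
    a = np.asarray(action, dtype=float)
    g = np.asarray(gesture, dtype=float)
    best_d, best_stat = 1, -np.inf
    for d in range(1, max_delay + 1):
        y, X_r, X_f = [], [], []
        for t in range(max(p, d), len(a)):
            past = a[t - p:t]
            y.append(a[t])
            X_r.append(np.r_[1.0, past])              # restricted: past actions only
            X_f.append(np.r_[1.0, past, g[t - d]])    # full: plus gesture delayed by d
        y, X_r, X_f = np.array(y), np.array(X_r), np.array(X_f)
        res_r = y - X_r @ np.linalg.lstsq(X_r, y, rcond=None)[0]
        res_f = y - X_f @ np.linalg.lstsq(X_f, y, rcond=None)[0]
        stat = np.log(np.var(res_r) / (np.var(res_f) + 1e-12))   # G-causality statistic
        if stat > best_stat:
            best_stat, best_d = stat, d
    return best_d, best_stat
```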

Example: Associating Actions and Gestures
  • Guided Navigation Scenario

Correct prediction 95.2%.

[Mohammad & Nishida 2009c]

Building Blocks
  • Behavior Discovery
    • Motif Discovery
      • Change Point Detection
  • Behavior Association
    • Bayesian Network Induction
      • Causality Analysis
  • Behavior Generation
    • Piecewise Linear Controller Generation
  • Behavior Adaptation
    • Bayesian Network Combination
Behavior Controller Generation

Convert the learned Bayesian network into an L0EICA controller:

Command → Process, Action → Process, Link → Effect Channel

Motor Babbling

PLGC

[Mohammad & Nishida 2010c]

Motor Babbling

Generate a straight line in one dimension while minimizing the disturbance to all other dimensions.

[Mohammad & Nishida 2010c]

PLGC

[Mohammad & Nishida 2010c]

Building Blocks
  • Behavior Discovery
    • Motif Discovery
      • Change Point Detection
  • Behavior Association
    • Bayesian Network Induction
      • Causality Analysis
  • Behavior Generation
    • Piecewise Linear Controller Generation
  • Behavior Adaptation
    • Bayesian Network Combination
Accumulation Phase

(Diagram: nodes a-f of one learned network are progressively associated with nodes 1-6 of another, e.g. a-1, b-2, c-3, d-4, f-5.)

ABN Combination
  • Main assumption
    • Action nodes are more compatible than gesture nodes
  • Algorithm
    • Associate action nodes with similar stored pattern
      • Set of action node association links
    • Associate gesture nodes with similar stored pattern
      • Set of gesture node association links
    • Calculate Link Competence Index for association links
      • Set of LCIs for gestures and actions
    • Resolve association link conflicts using LCIs
      • Final ABN
Associating action/gesture nodes
  • Compile AN1 and AN2 lists {every action node}
  • Calculate
  • Calculate for all nodes and order them
  • Create a link iff

for any

  • Set

Gesture association links are calculated the same way

[Mohammad & Nishida 2010c]

LCI Calculation

[Mohammad & Nishida 2010c]

Guided Navigation
  • Roles: Actor & Operator
  • Protocol: explicit
  • Nonverbal Behaviors:
    • Operator’s Gesture
    • Actor’s motion
  • Sensors
    • Accelerometers (BPACK)
    • Motion Capture (PhaseSpace)
  • Procedure
    • Offline experiment
    • Online experiment

Guided Navigation

  • Task-oriented
  • Explicit protocol
  • One-way interaction

[Mohammad and Nishida 2009c]

Guided Navigation – Online experiment
  • 18 subjects (6 days)
  • Task: operator in GN scenario
  • Procedure
    • WOZ session (training & familiarization)
    • 3 Sessions on these conditions:
      • WOZ
      • Per-participant learner
      • Accumulating learner

[Mohammad and Nishida 2010c]

Experimental Setup

[Mohammad and Nishida 2009c]

Action and Gesture Streams

Action stream: 5D

Gesture stream: 6D

Guided Navigation – Online Examples
  • Number of failures:
    • Per-participant learner: 1/18
    • Accumulating learner: 4/17
Conclusions
  • Unsupervised learning of interaction protocols is possible using three main data mining technologies:
    • Motif discovery <The Heart>
    • Change point discovery <The speeding engine>
    • Causality analysis <Natural delays>
  • Several algorithms for solving these three problems were introduced and are available (among others) in source code as a MATLAB toolbox called CPMD
  • We have shown that it is possible to learn both implicit interaction protocols (gaze control) and explicit interaction protocols (guided navigation) without explicit modeling
  • By manipulating learned BNs, it is possible to improve the interactive behavior of agents over time based on interactions with multiple people
References

[Ishiguro et al. 1999] Ishiguro, H.; Kanda, T.; Kimoto, K.; Ishida, T., "A robot architecture based on situated modules," Proceedings of the 1999 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS '99), vol. 3, pp. 1617-1624, 1999.

[Catalano 2006] Joe Catalano, Tom Armstrong, and Tim Oates. Discovering patterns in real-valued time series. In Knowledge Discovery in Databases: PKDD 2006, pages 462–469, 2006.

[Chiu 2003] B. Chiu, E. Keogh, and S. Lonardi, “Probabilistic discovery of time series motifs,” in KDD ’03: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining. New York, NY, USA: ACM, 2003, pp. 493–498.

[Ide et al. 2005] T. Ide and K. Inoue, "Knowledge discovery from heterogeneous dynamic systems using change-point correlations," in Proc. SIAM Intl. Conf. Data Mining, 2005.

[Kanda et al. 2007] Kanda, T., Kamasima, M., Imai, M. et al. “A Humanoid Robot That Pretends to Listen to Route Guidance from a Human”, Auton. Robots, Vol. 22, Number 1, pages 87-100, 2007.

[Mohammad and Nishida 2009a] Yasser Mohammad, Toyoaki Nishida, Shogo Okada, "Unsupervised Simultaneous Learning of Gestures, Actions and their Associations for Human-Robot Interaction," IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2009), pp. 2537-2544, 11-15 Oct. 2009.

[Mohammad and Nishida 2009b] Yasser Mohammad and Toyoaki Nishida, Robust Singular Spectrum Transform, The Twenty Second International Conference on Industrial, Engineering & Other Applications of Applied Intelligent Systems (IEA/AIE 2009), June 2009, Taiwan, pp 123-132.

[Mohammad and Nishida 2009c] Yasser Mohammad and Toyoaki Nishida, Measuring Naturalness During Close Encounters Using Physiological Signal Processing, The Twenty Second International Conference on Industrial, Engineering & Other Applications of Applied Intelligent Systems (IEA/AIE 2009), June 2009, Taiwan, pp. 281-290

[Mohammad and Nishida 2009d] Yasser Mohammad and Toyoaki Nishida, Using Physiological Signals to Detect Natural Interactive Behavior, Applied Intelligence, 13(1), pp. 79-92.

[Mohammad 2009 PhDThesis] Yasser Mohammad, Autonomous Development of Natural Interactive Behavior for Robots and Embodied Agents, PhD Thesis, Kyoto University, September 2009

[Mohammad and Nishida 2010a] Yasser Mohammad, Toyoaki Nishida, Learning Interaction Protocols using Augmented Baysian Networks Applied to Guided Navigation, IROS 2010, Taipei, Taiwan.

[Mohammad and Nishida 2010c] Yasser Mohammad, Toyoaki Nishida, Learning Interaction Protocols using Augmented Baysian Networks Applied to Guided Navigation, IROS 2010, Taipei, Taiwan.

[Mohammad and Nishida 2010d] Yasser Mohammad and Toyoaki Nishida, Controlling Gaze with an Embodied Interactive Control Architecture, Applied Intelligence, Vol. 32, No. 2, 2010, pp 148-163