
Lecture 6. The Mirror Neuron System Model (MNS) 1


Presentation Transcript


  1. Michael Arbib: CS664 – Neural Models for Visually Guided Behaviour, University of Southern California, Fall 2001 • Lecture 6 • The Mirror Neuron System Model (MNS) 1

  2. Visual Control of Grasping in Macaque Monkey • A key theme of visuomotor coordination: parietal affordances (AIP) drive frontal motor schemas (F5). • AIP: grasp affordances in parietal cortex (Hideo Sakata) • F5: grasp commands in premotor cortex (Giacomo Rizzolatti)

  3. F5 Motor Neurons • F5 motor neurons include all F5 neurons whose firing is related to motor activity. • We focus on grasp-related behavior; other F5 motor neurons are related to oro-facial movements. • F5 mirror neurons form the subset of grasp-related F5 motor neurons which discharge when the monkey observes meaningful hand movements. • F5 canonical neurons form the subset of grasp-related F5 motor neurons which fire when the monkey sees an object with related affordances.

  4. Mirror Neurons • Rizzolatti, Fadiga, Gallese, and Fogassi, 1995: Premotor cortex and the recognition of motor actions. • Mirror neurons form the subset of grasp-related premotor neurons of F5 which discharge when the monkey observes meaningful hand movements made by the experimenter or another monkey. • F5 is endowed with an observation/execution matching system.

  5. What is the mirror system (for grasping) for? • Mirror neurons: the cells that selectively discharge when the monkey executes particular actions as well as when the monkey observes another individual executing the same action. • Mirror neuron system (MNS): the mirror neurons and the brain regions involved in eliciting mirror behavior. • Interpretations: • Action recognition • Understanding (assigning meaning to others' actions) • Associative memory for actions

  6. Computing the Mirror System Response • The FARS model: recognize object affordances and determine the appropriate grasp. • The Mirror Neuron System (MNS) model: we must add recognition of trajectory and hand preshape to recognition of object affordances, and ensure that all three are congruent. • There are parietal systems other than AIP adapted to this task.

  7. Further Brain Regions Involved • cIPS (caudal intraparietal sulcus): axis and surface orientation. • 7a (PG, caudal part of the posterior parietal lobule): spatial coding for objects; analysis of motion during interaction of objects and self-motion. • STS (superior temporal sulcus): detection of biologically meaningful stimuli (e.g., hand actions); motion-related activity (MT/MST part). • 7b (PF, rostral part of the posterior parietal lobule): mainly somatosensory; mirror-like responses.

  8. Surface orientation selectivity of a cIPS cell (Sakata et al. 1997). [Figure: cIPS cell responses.]

  9. Key Criteria for Mirror Neuron Activation When Observing a Grasp • a) Does the preshape of the hand correspond to the grasp encoded by the mirror neuron? • b) Does this preshape match an affordance of the target object? • c) Do samples of the hand state indicate a trajectory that will bring the hand to grasp the object? • Modeling Challenges: • i) To have mirror neurons self-organize to learn to recognize grasps in the monkey’s motor repertoire • ii) To learn to activate mirror neurons from smaller and smaller samples of a trajectory.
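As a rough illustration of how criteria (a)–(c) might be checked computationally, here is a minimal Python sketch. Everything in it is an assumption for illustration: the function name, the discrete grasp labels, the threshold, and the heading-based trajectory test are hypothetical stand-ins, not the MNS model's actual mechanism (which, per challenges i and ii, learns these criteria from hand-state trajectories rather than testing them by rule).

```python
import numpy as np

# Hypothetical sketch of activation criteria (a)-(c) for one mirror neuron.
# Names, labels, and the tolerance are invented for illustration only.
def mirror_neuron_active(preshape_grasp, neuron_grasp, object_affordances,
                         hand_pos, object_pos, hand_vel, tol=0.1):
    # (a) the observed preshape matches the grasp this neuron encodes
    if preshape_grasp != neuron_grasp:
        return False
    # (b) that preshape matches some affordance of the target object
    if preshape_grasp not in object_affordances:
        return False
    # (c) sampled hand state indicates a trajectory toward the object:
    #     the wrist velocity points (roughly) at the object
    to_object = object_pos - hand_pos
    to_object = to_object / np.linalg.norm(to_object)
    heading = hand_vel / np.linalg.norm(hand_vel)
    return float(np.dot(to_object, heading)) > 1.0 - tol

print(mirror_neuron_active("precision", "precision", {"precision", "power"},
                           hand_pos=np.array([0.0, 0.0, 0.0]),
                           object_pos=np.array([0.3, 0.1, 0.0]),
                           hand_vel=np.array([0.29, 0.11, 0.0])))
```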

  10. Mirror Neuron Development Hypothesis • The development of the (grasp) mirror neuron system in a healthy infant is driven by the visual stimuli generated by the actions (grasps) performed by the infant himself. • The infant (with maturation of visual acuity) gains the ability to map other individuals' actions into his internal motor representation. [In the MNS model, the hand state provides the key representation for this transfer.] • Then the infant acquires the ability to create (internal) representations for novel actions observed. • In parallel with these achievements, the infant develops an action prediction capability (the recognition of an action given the prefix of the action and the target object).

  11. The Mirror Neuron System (MNS) Model

  12. Implementing the Basic Schemas of the Mirror Neuron System (MNS) Model • using Artificial Neural Networks • (Work of Erhan Oztop) • Hand State & Core Mirror Circuit • Visual Processing • Reach and Grasp generation

  13. MNS: Core Mirror Circuit and Hand State

  14. Opposition Spaces and Virtual Fingers • The goal of a successful preshape, reach, and grasp is to match the opposition axis defined by the virtual fingers of the hand with the opposition axis defined by an affordance of the object (Iberall and Arbib 1990).

  15. Hand State • Our current representation of the hand state defines a 7-dimensional trajectory F(t) with the following components: • F(t) = (d(t), v(t), a(t), o1(t), o2(t), o3(t), o4(t)) • d(t): distance to target at time t • v(t): tangential velocity of the wrist • a(t): aperture of the virtual fingers involved in grasping at time t • o1(t): angle between the object axis and the (index fingertip – thumb tip) vector [relevant for pad and palm oppositions] • o2(t): angle between the object axis and the (index finger knuckle – thumb tip) vector [relevant for side oppositions] • o3(t), o4(t): the two angles defining how close the thumb is to the hand, measured relative to the side of the hand and to the inner surface of the palm.
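A minimal Python sketch of this representation follows. The function name, the (T, 7) array layout for a sampled trajectory, and all numeric values are illustrative assumptions, not values from the model.

```python
import numpy as np

# One sample of the 7-D hand state F(t) defined on this slide.
def hand_state(d, v, a, o1, o2, o3, o4):
    """Return F(t) = (d, v, a, o1, o2, o3, o4) as a vector."""
    return np.array([d, v, a, o1, o2, o3, o4], dtype=float)

# A grasp observation is then a trajectory: a (T, 7) array of samples.
# These numbers are made up, sketching a hand closing in on a target.
trajectory = np.stack([
    hand_state(d=0.30, v=0.8, a=0.09, o1=0.4, o2=0.9, o3=0.2, o4=0.3),
    hand_state(d=0.12, v=0.5, a=0.06, o1=0.2, o2=0.8, o3=0.2, o4=0.3),
    hand_state(d=0.01, v=0.1, a=0.03, o1=0.1, o2=0.8, o3=0.2, o4=0.3),
])
```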

  16. Hand State Components • For most components we need to know the (3D) configuration of the hand.

  17. Assuming that we can compute the hand state trajectory, how can we recognize it as a grasp action? • The general problem: associate N-dimensional space curves with object affordances. • A special case: the recognition of two- (or three-) dimensional trajectories in physical space. • Simplest solution: map temporal information into the spatial domain, then apply known pattern recognition techniques. • Problem with the simplest solution: the speed of the moving point! The spatial representation may change drastically with speed. Scaling can overcome this, but the scaling must preserve the generalization ability of the pattern recognition engine. • Our solution: fit a cubic spline to the sampled values, then normalize and re-sample from the spline curve. • Result: very good generalization, with better performance than using Fourier coefficients to recognize curves.
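A minimal sketch of the spline-and-resample step using SciPy's CubicSpline. Treating the sample times as evenly spaced on [0, 1], and the choice of 30 re-sampled points (echoing the spatial resolution on the next slide), are assumptions.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def normalize_trajectory(samples, n_points=30):
    """Fit a cubic spline to sampled values of one hand-state component,
    then re-sample it at n_points equally spaced parameter values, so the
    result has a fixed length regardless of movement speed or duration."""
    t = np.linspace(0.0, 1.0, len(samples))  # normalized time of each sample
    spline = CubicSpline(t, samples)
    return spline(np.linspace(0.0, 1.0, n_points))

# e.g. a made-up distance-to-target curve sampled at 12 instants
raw = 0.3 * np.cos(np.linspace(0, np.pi / 2, 12))
fixed_length = normalize_trajectory(raw)  # always 30 values
```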

  18. A Simple Example of Curve Recognition • Curve recognition system demonstrated for hand-drawn numeral recognition (successful recognition examples for 2, 8, and 3). • Spatial resolution: 30 • Network input size: 30 • Hidden layer size: 15 • Output size: 5 • Training: back-propagation with momentum and adaptive learning rate. • [Figure legend: sampled points; points used for spline interpolation; fitted spline.]

  19. Core Mirror Circuit as a Neural Network • Assumptions: visual information about the hand and the object can be extracted, and the information about the hand and the object is represented by the hand state. • We can then apply the curve recognition idea to learning in the core mirror circuit: • We associate a two-layer feed-forward neural network with the core mirror circuit. • The learning task is then: given the 7-dimensional hand state trajectory, predict the grasp action observed.
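A hedged sketch of this learning setup using scikit-learn: a feed-forward network with one hidden layer mapping a flattened, re-sampled hand-state trajectory to a grasp class. The dimensions, hidden-layer size, and random stand-in data are assumptions; the actual model was trained on trajectories generated by the reach/grasp simulator described later.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Assumed dimensions: 7 hand-state components x 30 re-sampled points each,
# classified into three grasp types (power, precision, side).
N_POINTS, N_COMPONENTS, N_GRASPS = 30, 7, 3

rng = np.random.default_rng(0)
X = rng.normal(size=(200, N_POINTS * N_COMPONENTS))  # stand-in trajectories
y = rng.integers(0, N_GRASPS, size=200)              # stand-in grasp labels

# One hidden layer of 15 units, echoing the numeral-recognition example.
net = MLPClassifier(hidden_layer_sizes=(15,), max_iter=2000).fit(X, y)
predicted_grasp = net.predict(X[:1])  # grasp class for one observed trajectory
```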

  20. MNS: Visual Processing

  21. Visual Processing for the MNS Model • How much should we attempt to solve? • Even though computers are getting more powerful, the vision problem in its general form is unsolved in engineering. • There exist gesture recognition systems for human-computer interaction and sign language interpretation. • Our vision system must at least recognize: • 1) the hand and its configuration • 2) object features • We attempt (1).

  22. Simplifying the Problem • We simplify the problem of recognizing the hand and its configuration by using colored patches on the articulation points of the hand. • If we can extract the patch positions reliably, we can then try to extract some of the features that make up the hand state by estimating the 3D pose of the hand from its 2D projection. • Thus we have two steps: • Extract the color marker positions • Estimate the 3D pose

  23. The Color-Coded Hand • The vision task is simplified using colored tapes on the joints and articulation points. • The first step of hand configuration analysis is to locate the color patches unambiguously (not easy!). We use color segmentation, but must compensate for lighting, reflection, shading, and wrinkling problems: robust color detection.

  24. Robust Detection of the Colors – RGB Space • A color image in a computer is composed of a matrix of pixel triplets (Red, Green, Blue) that define the color of each pixel. • We want to label a given pixel color as belonging to one of the color patches we used to mark the hand, or as not belonging to any class. • A straightforward way to detect whether a given target color (R', G', B') matches the pixel color (R, G, B) is to threshold the squared distance (R − R')² + (G − G')² + (B − B')². • This does not work well, because shading and different lighting conditions affect the R, G, B values a lot, and our simple nearest-neighbor method fails. For example, an orange patch under shadow is very close to red in RGB space. • But we can do better: train a neural network to do the labeling for us.
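The fragile baseline in a few lines of Python; the marker colors and the threshold here are made-up values for illustration:

```python
import numpy as np

# Naive nearest-color rule described above. Marker colors and the
# squared-distance threshold are illustrative values only.
MARKERS = {"red": (200, 30, 30), "orange": (230, 120, 20), "green": (30, 180, 40)}
THRESHOLD = 60.0 ** 2  # hand-tuned squared-distance threshold

def label_pixel(rgb):
    """Return the nearest marker label, or None if no marker is close enough."""
    best, best_d2 = None, THRESHOLD
    for name, (r, g, b) in MARKERS.items():
        d2 = (rgb[0] - r) ** 2 + (rgb[1] - g) ** 2 + (rgb[2] - b) ** 2
        if d2 < best_d2:
            best, best_d2 = name, d2
    return best

print(label_pixel((120, 15, 15)))  # a shadowed red patch already falls outside
```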

  25. Robust Detection of Colors – the Color Expert • Create a training set using a test image by manually picking colors from the image and specifying their labels. • Create a NN – in our case a one-hidden-layer feed-forward network – that accepts the R, G, B values as input and outputs the marker label, or 0 for a non-marker color. • Make sure the network is not too "powerful", so that it generalizes rather than memorizing the training set. • Train it, then use it: given a pixel to classify, apply its RGB values to the trained network and read the output as the marker the pixel belongs to. • One then needs a segmentation system to aggregate the pixels into patches with a single color label.
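A minimal color-expert sketch using scikit-learn's MLPClassifier. The hidden-layer size, the label scheme (0 for non-marker), and the random stand-in training pixels are assumptions; in the real system the training set comes from manually labeled image pixels, as described above.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Stand-in training set: in practice these are pixels hand-picked from a
# test image, each labeled 0 (non-marker) or 1..3 (marker colors).
rng = np.random.default_rng(1)
pixels = rng.integers(0, 256, size=(500, 3)).astype(float) / 255.0
labels = rng.integers(0, 4, size=500)

# Keep the hidden layer small so the net generalizes instead of memorizing.
expert = MLPClassifier(hidden_layer_sizes=(10,), max_iter=3000)
expert.fit(pixels, labels)

def classify_pixel(r, g, b):
    """Label one pixel with the trained color expert."""
    return int(expert.predict([[r / 255.0, g / 255.0, b / 255.0]])[0])
```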

  26. Color Expert: Summary • Training phase: a color expert is generated by training a feed-forward network to approximate human perception of color. [Diagram: preprocessing feeds training of the color expert (network weights).]

  27. Color Segmentation and Feature Extraction • Actual processing: the hand image is fed to an NN-augmented segmentation system, which outputs features; the color decision during segmentation is made by consulting the color expert.

  28. Hand Configuration Extraction • Step 1 of hand shape recognition (feature extraction): the system processes the color-coded hand image and generates a set of features to be used by the second step. • Step 2 (model matching): the feature vector generated by the first step is used to fit a 3D kinematics model of the hand. The resulting hand configuration is sent to the classification module.

  29. MNS: Reach and Grasp generation

  30. Virtual Hand/Arm and Reach/Grasp Simulator • [Figures: a precision pinch, a power grasp, and a side grasp.]

  31. Kinematics Model of Arm and Hand • 19 degrees of freedom (DOF): shoulder (3), elbow (1), wrist (3), fingers (4×2), thumb (3) • Implementation requirements: • Rendering: given the 3D positions of each link's start and end points, generate a 2D representation of the arm/hand (easy). • Forward kinematics: given the 19 joint angles, compute the position of each link (easy). • Inverse kinematics: given a desired position in space for a particular link, find joint angles that achieve it (semi-hard). • Reach & grasp execution: harder than simple inverse kinematics, since there are more constraints to be satisfied (e.g., multiple target positions to be achieved at the same time).

  32. A 2D, 3-DOF Arm Example • Forward kinematics: given joint angles A, B, C (measured from the x-axis, per the equations) and link lengths a, b, c, compute the end-effector position P(x, y): • x = a·cos(A) + b·cos(B) + c·cos(C) • y = a·sin(A) + b·sin(B) + c·sin(C) • Inverse kinematics: given an end-effector position P(x, y), there are infinitely many joint-angle triplets that achieve it. [Figure: circles of radius a and c connected by segments of equal length b, showing several distinct configurations reaching the same P.]
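The forward-kinematics equations above translate directly into code; the link lengths here are arbitrary choices:

```python
import numpy as np

# Forward kinematics of the planar 3-DOF arm above. Angles A, B, C are
# absolute (measured from the x-axis), matching the slide's equations.
def forward(A, B, C, a=1.0, b=0.8, c=0.5):
    x = a * np.cos(A) + b * np.cos(B) + c * np.cos(C)
    y = a * np.sin(A) + b * np.sin(B) + c * np.sin(C)
    return np.array([x, y])

# One configuration and its end-effector position; many other (A, B, C)
# triplets reach this same point, which is the inverse-kinematics problem.
p = forward(0.9, 0.3, -0.4)
```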

  33. A Simple Inverse Kinematics Solution • Consider just the arm. • The forward kinematics of the arm can be represented as a vector function that maps the joint angles of the arm to the wrist position: (x, y, z) = F(s1, s2, s3, e), where s1, s2, s3 are the shoulder angles and e is the elbow angle. • We can formulate inverse kinematics as an optimization problem: given the desired position P' = (x', y', z') to be achieved, introduce the error function J = ||P' − F(s1, s2, s3, e)||. • We then compute the gradient of J with respect to s1, s2, s3, e and follow the negative gradient to the minimum of J. • This is called the Jacobian transpose method, since the partial derivatives of F encountered in the process can be arranged into the transpose of a derivative matrix called the Jacobian (of F).
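A minimal sketch of this gradient-descent IK, reusing forward() from the planar-arm example above (so three angles rather than the (s1, s2, s3, e) arm of this slide) and approximating the gradient of J by finite differences instead of an analytic Jacobian transpose. The step size, iteration count, and epsilon are illustrative choices.

```python
import numpy as np

# Gradient-descent inverse kinematics on J = ||target - F(angles)||^2,
# with a finite-difference gradient standing in for the Jacobian transpose.
def solve_ik(target, angles, lr=0.1, iters=500, eps=1e-5):
    angles = np.asarray(angles, dtype=float)
    for _ in range(iters):
        err = target - forward(*angles)     # forward() from the sketch above
        grad = np.zeros_like(angles)
        for i in range(len(angles)):        # finite-difference dJ/d(angle_i)
            bumped = angles.copy()
            bumped[i] += eps
            e2 = target - forward(*bumped)
            grad[i] = (e2 @ e2 - err @ err) / eps
        angles -= lr * grad                 # follow the negative gradient
    return angles

angles = solve_ik(np.array([1.2, 0.9]), [0.5, 0.5, 0.5])
```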

  34. Power Grasp Time-Series Data • [Figure legend: + aperture; * angle 1; × angle 2; remaining traces: 1-axis displacement 1, 1-axis displacement 2, speed, distance.]

  35. A single grasp trajectory viewed from three different angles, and how the network classifies the action as a power grasp. • Empty squares: power-grasp output; filled squares: precision-grasp output; crosses: side-grasp output. • The wrist trajectory during the grasp is shown by square traces; the distance between any two consecutive trace marks is traveled in equal time intervals.

  36. Power and Precision Grasp Resolution • (a) precision pinch mirror neuron; (b) power grasp mirror neuron. • Note that the modeling yields novel predictions for the time course of activity across a population of mirror neurons.

  37. “Spatial Perturbation” Experiment with the Trained Core Mirror Circuit • Figure A: a regular precision grasp (the hand spatially coincides with the target). • Figure B: the response of the network, classifying the action as a precision grasp. • Figure C: the target object is displaced to create a ‘fake’ grasp. • Figure D: the response of the network to the action in Figure C; the activity of the precision mirror neuron is reduced. • In the graphs, the x axes represent normalized time (0 at the start of the grasp, 1 at contact with the object) and the y axes represent cell firing rate.

  38. “Kinematics Alteration” Experiment with the Trained Core Mirror Circuit • Figure A: a regular precision grasp (the wrist has a bell-shaped velocity profile). • Figure B: the velocity profile is made (almost) linear. • Figure C: classification of the action in Figure A as a precision grasp. • Figure D: the activity vanishes during observation of the altered action in Figure B. • Note that the scales of graphs C and D are different. [Axes: normalized speed vs. normalized time in A–B; firing rate vs. normalized time in C–D.]

  39. Research Plan • Development of the Mirror System • Development of Grasp Specificity in F5 Motor and Canonical Neurons • Visual Feedback for Grasping: A Possible Precursor of the Mirror Property • Recognition of Novel and Compound Actions and their Context • The Pliers Experiment: Extending the Visual Vocabulary • Recognition of Compounds of Known Movements • From Action Recognition to Understanding: Context and Expectation

  40. Modeling Challenges • How can MNS be plugged into a learning-by-imitation system faithful to biological constraints (BG, cerebellum, SMA, PFC, etc.)? • How does the brain handle temporal data? Transform the learning network into one that can work directly on temporal data. • Eliminate the preprocessing required before the input can be applied to the MNS core circuit. • Extend the actions to be recognized beyond simple grasps. • Model the complementary circuit: learning to grasp by trial and error. • And a lot more!

  41. Experimental Challenges • What are "poor" mirror neurons coding? Temporal recognition codes? Transient responses to actions which are not exactly the preferred stimuli? • How can we relate different cells' responses to each other? Fix the condition and record from as many cells as possible under exactly the same condition. • Is it possible to record from mirror cells in monkeys of different age groups (i.e., infant to adult)?
