RESOLVER: To ask or to sense?

RESOLVER: To ask or to sense? A brief presentation of the ongoing project, by Nikolaos Mavridis

Resolver: To ask or to sense? • Resolver: Selecting and mixing questions with sensing actions towards referent resolution • For machines to speak with humans, they must at times resolve ambiguities. Imagine a conversational robot that can carry out sensing actions to collect more data about its world, for example through active visual attention and touch. Suppose it can also gain new information linguistically, by asking its human partner questions. Each kind of action, sensing and speech, has associated costs and expected payoffs. • Resolver is a planning algorithm that treats these actions in a common framework, enabling such a robot to integrate both kinds of action into coherent behavior, taking into account their costs and expected goal-oriented information-theoretic rewards. • Early motivation: Ripley’s primitive ambiguity resolution dialogue system • Similar information-theoretic / utilitarian frame of thought: E. Horvitz (Microsoft), A. Gorin (AT&T) • Wider picture: Language – action parallels (speech act theory, also motor neurons etc.) • FINDING THE NEXT QUESTION!

Resolver: Overview • The problem • The program • The algorithm • Performance evaluation • Potential as cognitive model • Extensions • Other applications: • Parallel theory refinement / experiment selection in science

The problem • Imagine the following scenario: • A human user and a robot are sitting around a table, where some objects have been placed. The human user has selected one of the objects on the table, and asks the robot to give it to him. The robot has not yet attended to the objects. What should the robot’s next moves be? Should it attend to the color of the first object, and then to the sizes of all? Should it attempt to weigh an object? Or should it ask for further information, for example whether the desired object is red? • Slight variation: the user provides an ambiguous partial description in his request. • IN ESSENCE: Active matching under double uncertainty: for the desired target as well as the options available

The program: Initial state • 4 Modes: Virtual world standalone (self-answering) / Virtual world text I/O / Virtual world speech I/O / Full mental-model and Ripley connectivity

The program: Intermediate state • After: “The heavy one” - “Is it small? No” - measuresize1-3 - “Is it medium?”

The program: Final state • After: “Is it medium? Yes” - “Is it black? No” - “Is it magenta? Yes” • Note cost breakdown. Costs might be given by master planner (tired, curious…)

The algorithm: Assumptions • Assumption families: • The objects and the intended referent • Nobj a priori known, unbiased choice • Measurements and descriptions of the objects • Properties, senses->prop, words->prop (me & user), referent uniqueness up to linguistic description • State and gradation of uncertainty • Contents of state (I, O, moves), initial state, full confidence in senses/answers (unchanging, unbiased by construction, cooperative user/hearing and nature/senses) • Priors on a solitary object • “Proximal” sensory natural modes, and their linguistic reflection • Priors on the set of objects and the intended referent • I belongs to O: interdepend., I unique in O: U interdepend. • Allowable actions • Q1: “Is it red?” / Q2: “What color is it?” / Q3: “Is it this one?” • A1: Measure prop of one O / A2: Measure prop of all O
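The allowable actions listed above can be enumerated mechanically once a property vocabulary and the number of objects are fixed. The following is a minimal sketch only: the `Action` record, the `allowable_actions` helper and the shape of the `properties` dictionary are hypothetical names chosen for illustration, not taken from the Resolver program.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Action:
    """Hypothetical record for one move; kind is Q1, Q2, Q3, A1 or A2."""
    kind: str
    prop: Optional[str] = None    # property dimension, e.g. "color"
    value: Optional[str] = None   # property value, only used by Q1
    obj: Optional[int] = None     # object index, used by Q3 and A1


def allowable_actions(properties, n_objects):
    """Enumerate all moves for a vocabulary {prop: [values]} and n_objects."""
    actions = []
    for prop, values in properties.items():
        # Q1: yes/no question about one value, e.g. "Is it red?"
        actions += [Action("Q1", prop=prop, value=v) for v in values]
        # Q2: open question about a dimension, e.g. "What color is it?"
        actions.append(Action("Q2", prop=prop))
        # A1: measure this property of a single object
        actions += [Action("A1", prop=prop, obj=i) for i in range(n_objects)]
        # A2: measure this property of all objects
        actions.append(Action("A2", prop=prop))
    # Q3: pointing question, "Is it this one?"
    actions += [Action("Q3", obj=i) for i in range(n_objects)]
    return actions
```

Because `Action` is frozen and hashable, taken moves can be kept in a set, which is convenient for the "moves" part of the state.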

The algorithm: Stages • Stages: • State at each moment • Encoding distributions • Effect of answers and sensory results on the state as a whole • I-O and O-O interdependence • Evaluation of present state • Calculating prob(I=Oi) • Choosing the next move • Expected entropy reward of consistent answers • Approx and computational tractability (underlying state of world and answer) • The effect of different cost settings • Change ordering – choose dominance / Q1-Q2, A1-A2, Q-A • Fusion of expected information gain with associated costs • Requirements for function
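The stages above (calculating prob(I=Oi), expected entropy reward, fusion with cost) can be illustrated with a one-step-ahead sketch. It rests on simplifying assumptions that are not from the slides: each object holds an independent categorical distribution per property, answers only shrink the set of values still allowed for the referent, referent uniqueness is ignored, and the fusion function is a plain linear trade-off. All function names are hypothetical.

```python
import math


def entropy(weights):
    """Entropy in bits of the distribution obtained by normalizing weights."""
    total = sum(weights)
    if total <= 0:
        return 0.0
    return -sum((w / total) * math.log2(w / total) for w in weights if w > 0)


def referent_weights(objects, allowed):
    """Unnormalized prob(I=Oi): mass object i puts on values the answers allow."""
    weights = []
    for obj in objects:
        w = 1.0
        for prop, dist in obj.items():
            w *= sum(p for value, p in dist.items() if value in allowed[prop])
        weights.append(w)
    return weights


def ask_outcomes(objects, allowed, prop, value):
    """Joint weights after a yes and after a no to 'Is it <value>?' (Q1)."""
    yes = {**allowed, prop: allowed[prop] & {value}}
    no = {**allowed, prop: allowed[prop] - {value}}
    return [referent_weights(objects, yes), referent_weights(objects, no)]


def sense_outcomes(objects, allowed, j, prop):
    """Joint weights for each possible result of measuring prop of object j (A1)."""
    outcomes = []
    for value, p in objects[j][prop].items():
        if p == 0:
            continue
        collapsed = list(objects)
        collapsed[j] = {**objects[j], prop: {value: 1.0}}  # measurement is trusted
        outcomes.append([p * w for w in referent_weights(collapsed, allowed)])
    return outcomes


def expected_gain(current, outcomes):
    """Expected entropy reduction; assumes evidence so far is consistent (sum > 0)."""
    total = sum(current)
    expected = sum(sum(g) / total * entropy(g) for g in outcomes)
    return entropy(current) - expected


def score(gain, cost, tradeoff=1.0):
    """Assumed linear fusion of reward and cost; the slides do not fix this form."""
    return gain - tradeoff * cost
```

For two objects with color distribution {red: 0.5, blue: 0.5} each and a request for "the red one", measuring the color of one object yields an expected gain of about 0.31 bits under this sketch, while with no linguistic constraint at all the same measurement yields zero gain for the referent, so a question would be preferred at equal cost.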

Performance evaluation & Potential as cognitive model • Performance evaluation: • Quantitative: • 2 baselines so far (random non-repetitive, consistent) • Metrics: σ, μ of nmoves, and Σcost • 20-25% better (parameters have effect!) • normal modes help even more! • Qualitative: • Subjective evaluation of robot behavior for specific cost settings • Potential as cognitive model: • Tunable generative model (play with costs!) • Acquiring experimental human data • fixations, saccades, words: relative position sensitive costs • Input-output equivalence vs. inner workings • One step ahead • Cost-reward fusion • Artificial setting but general applicability (non-sit. context etc.)
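The random non-repetitive baseline and the metrics named above can be sketched as follows. This is an illustration under assumptions: the function names are hypothetical, an episode is taken to be a (nmoves, Σcost) pair per run, and reporting μ and σ for both quantities is one reading of the slide, not a confirmed detail of the evaluation.

```python
import random
import statistics


def random_nonrepetitive_policy(actions, taken, rng):
    """Baseline: pick uniformly among moves not yet taken; None when exhausted."""
    remaining = [a for a in actions if a not in taken]
    return rng.choice(remaining) if remaining else None


def summarize(episodes):
    """Mean and (population) std of move counts and summed costs over runs.

    episodes: list of (n_moves, total_cost) pairs, one pair per run.
    """
    moves = [n for n, _ in episodes]
    costs = [c for _, c in episodes]
    return {
        "mean_moves": statistics.mean(moves),
        "std_moves": statistics.pstdev(moves),
        "mean_cost": statistics.mean(costs),
        "std_cost": statistics.pstdev(costs),
    }
```

Passing a seeded `random.Random` instance as `rng` keeps baseline runs reproducible when comparing against the planner under different cost settings.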

Future plans: Extensions • Relax assumptions: • Encoding distributions • Pruning by clustering and satisficing acceptability threshold • Unknown starting parameters (nobj etc.) • Non-cooperative user/nature • Imperfect linguistic/sensory channel (FUSE) • Other molecular action combinations • Multi-view object/shape id. sensory actions • Multi-step, non-approx (another baseline)

Other applications • In essence: • Active matching under double uncertainty: for the desired target as well as the options available • Thus, to apply, just choose an interpretation of the structures involved! • Parallel theory refinement / experiment selection in science • A number of groups of theoreticians are constructing theories in order to explain a phenomenon. Extending these theories to wider domains of validity is a costly process. But so is setting up experiments to verify the applicability of their predictions to various domains. • Consider the identification of: • Sensory property dimensions with application domains of the theories • Questions for a property dimension with experiments in an application domain • Answers with experimental results of the above questions • Sensory actions with theoretical work towards extending theories to a domain • And sensory data with the theoretical predictions which are the outcome of the above work • Thus: • The user is now identified with nature; nature is questioned by experiments, and answers in the form of experimental results (or freely collected data in a domain). The table previously filled with objects now corresponds to a part of the platonic universe; a subset of the set of possible theories is on the table. One can either examine nature, or examine possible theories, in order to reach a (hopefully somewhat permanent) temporary best match. And thus science marches on, hopefully with resources better targeted towards more vital experiments and theoretical groups.

Resolver: To ask or to sense? • Recap: • We started by wanting to expand Ripley’s ambiguity resolution dialogues • Created a general algorithm for active matching under uncertainty • It performs well for the original task • Interesting theoretical points have arisen • Also attractive as cognitive model • Many possible extensions… • Many alternative applications!

Resolver: Aftermath • One can never overestimate the importance and joy of Finding the next question!
