Selective Perception Policies for Guiding Sensing and Computation in Multimodal Systems



1. Selective Perception Policies for Guiding Sensing and Computation in Multimodal Systems
Brief presentation of the ICMI '03 paper by N. Oliver & E. Horvitz
Nikolaos Mavridis, Feb '02

2. Introduction
• The menu for today:
  • An application that served as testbed & excuse
  • The architecture of the recognition engines used
  • Two varieties of selective perception
  • Results
  • Big ideas
  • An intro to RESOLVER
• The main big idea: NO NEED TO NOTICE AND PROCESS EVERYTHING ALWAYS!

3. The Application
• SEER: a multimodal system for recognizing office activity
• General setting: a basic requirement for visual surveillance and multimodal HCI is the provision of rich, human-centric notions of context in a tractable manner…
• Prior work: mainly particular scenarios (waving the hand, etc.), HMMs, dynamic Bayesian networks
• Output categories:
  • PC = Phone Conversation
  • FFC = Face-to-Face Conversation
  • P = Presentation
  • O = Other Activity
  • NP = Nobody Present
  • DC = Distant Conversation (out of field of view)
• Input:
  • Audio: PCA of LPC coefficients, energy, μ and σ of ω0, zero-crossing rate
  • Audio localization: Time Delay of Arrival (TDOA)
  • Video: skin color, motion, foreground and face densities
  • Mouse & keyboard: history of 1, 5 and 60 sec of activity
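To make the audio inputs concrete, here is a minimal sketch (not the authors' code) of two of the listed features, computed per analysis window; the LPC/PCA and ω0 statistics would be computed alongside these in the same way:

    import numpy as np

    def frame_features(frame: np.ndarray) -> dict:
        """Short-time energy and zero-crossing rate for one window
        of audio samples (two of the features listed above)."""
        energy = float(np.mean(frame ** 2))
        # Zero-crossing rate: fraction of adjacent sample pairs whose signs differ.
        zcr = float(np.mean(np.signbit(frame[:-1]) != np.signbit(frame[1:])))
        return {"energy": energy, "zcr": zcr}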

4. Recognition Engine
• Recognition engine: LHMM (Layered HMM!)
• First level: parallel discriminative HMMs for the categories:
  • Audio: human speech, music, silence, noise, ring, keyboard
  • Video: nobody, static person, moving person, multiple people
• Second level:
  • Input: outputs of the above + derivative of the sound localization + keyboard histories
  • Output: PC, FFC, P, O, NP, DC – longer temporal extent!
• Selective perception strategies usable at both levels: selecting which features to use at the input of the HMMs!
  • Example: motion & skin density for one active person; skin density & face detection for multiple people
  • Also for the second stage: selecting which first-stage HMMs to run…
• HMMs vs. LHMMs:
  • Compared to CP HMMs (Cartesian product, one long feature vector), prior knowledge about the problem is encoded in the structure of the LHMM
  • I.e. decomposition into smaller subproblems -> less training required, more filtered output for the second stage, only the first level needs retraining!
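The two-level flow, as a hedged Python sketch (the bank dictionaries, function names and inputs are illustrative stand-ins for SEER's trained HMM scorers):

    import numpy as np
    from typing import Callable, Dict, Sequence

    def bank_output(window: np.ndarray,
                    hmm_loglik: Dict[str, Callable[[np.ndarray], float]]) -> str:
        """Score a short feature window with every HMM in a bank
        and return the winning class label."""
        return max(hmm_loglik, key=lambda label: hmm_loglik[label](window))

    def second_level_observation(audio_win, video_win,
                                 audio_bank, video_bank,
                                 sound_loc_deriv: float,
                                 keyboard_hist: Sequence[float]) -> tuple:
        """Assemble the discrete observation the second-level HMM sees:
        first-level labels plus the extra inputs listed above."""
        return (bank_output(audio_win, audio_bank),
                bank_output(video_win, video_bank),
                sound_loc_deriv,
                tuple(keyboard_hist))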

5. Selective Perception Strategies
Why sense everything and compute everything always?!?
• Two approaches:
  • EVI: Expected Value of Information (à la RESOLVER)
    • Decision theory and uncertainty reduction
    • EVI computed for different overlapping feature subsets, in real time, at every frame
    • Greedy, one-step-lookahead approach for computing the next best set of observations to evaluate (see the sketch after this slide)
  • Rate-based perception (somewhat similar to RIP BEHAVIOR)
    • Policies defined heuristically, specifying observational frequencies and duty cycles for each computed feature
• Two baselines for comparison:
  • Compute everything!
  • Randomly select feature subsets
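A minimal sketch of the greedy one-step lookahead, assuming a scoring function evi(subset) that returns the net expected value of information of turning a feature subset on at the next frame (the names and the max_size cap are illustrative):

    from itertools import combinations

    def next_best_subset(features, evi, max_size=2):
        """Enumerate candidate feature subsets one step ahead and keep
        the best; sensing nothing (net EVI = 0) is always an option."""
        best, best_score = frozenset(), 0.0
        for size in range(1, max_size + 1):
            for subset in combinations(features, size):
                score = evi(frozenset(subset))
                if score > best_score:
                    best, best_score = frozenset(subset), score
        return best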

6. Expected Value of Information
Endowing the perceptual system with knowledge of the value of action in the world…

7. Expected Value of Information
But what we are really interested in is what we stand to gain! Thus the net EVI (reconstructed after this slide), where we also account for:
• What we would get given no sensing at all
• The cost of sensing – but we have to map cost and utility onto the same currency!
• An HMM-ised implementation is used!
• Richer cost models:
  • Non-identity utility matrix U
  • Constant vs. activity-dependent costs (what else is running?) – successful results! (no significant decrease in accuracy ;-))
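The slide's formula did not survive transcription; below is a reconstruction of the paper's net-EVI expression in standard decision-theoretic notation (evidence E observed so far, hypotheses H_j, actions a, utility u(a, H_j), candidate feature set f with possible observations o_f) – check the paper for the exact form:

    \mathrm{NEVI}(f) =
        \int p(o_f \mid E) \, \max_a \sum_j u(a, H_j)\, p(H_j \mid E, o_f)\, do_f
        \;-\; \max_a \sum_j u(a, H_j)\, p(H_j \mid E)
        \;-\; \mathrm{cost}(f)

In words: the expected utility of the best action after seeing the new observations, minus the utility of the best action with no new sensing, minus the cost of computing f.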

8. Rate-based perception
• Simple idea: each feature is computed at a fixed, heuristically chosen observational frequency and duty cycle (see the sketch after this slide)
• In this case, no online tuning of the rates…
• Doesn't capture sequential prerequisites etc.
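A minimal sketch of such a fixed-rate schedule (the feature names and periods are illustrative, not the paper's values):

    RATES = {                   # feature -> compute on every k-th frame
        "skin_color": 1,
        "motion": 2,
        "sound_localization": 3,
        "face_detection": 5,
    }

    def features_due(frame_idx: int) -> set:
        """Features scheduled for this frame; everything else is skipped,
        trading a little accuracy for a lot of computation."""
        return {f for f, period in RATES.items() if frame_idx % period == 0}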

9. Results
• EVI: no significant performance decrease, with much less computational cost!
• Also effective in the activity-dependent cost mode
• And even more to be gained!

10. Take-home message: Big Ideas
• No need to sense & compute everything always!
• In essence we have a planner: a planner for goal-based sensing and cognition!
• Not only useful for AI: the approach might be useful for computational modeling of human performance, too…
• Simple satisficing works: no need for fully optimised planning; with some precautions, one step ahead with many approximations is sufficient – ALSO more plausible for humans! (ref: Ullman)
• Easy co-existence with other goal-based modules: we just need a method for distributing time-varying costs of sensing and cognising actions (a centralised stock market?)
• As a future direction: time-decreasing confidence is mentioned
