Selective Perception Policies for Guiding Sensing and Computation in Multimodal Systems



1. Selective Perception Policies for Guiding Sensing and Computation in Multimodal Systems
Brief presentation of the ICMI '03 paper by N. Oliver & E. Horvitz
Nikolaos Mavridis, Feb '02

2. Introduction
• The menu for today:
  • An application that served as testbed & excuse
  • The architecture of the recognition engines used
  • Two varieties of selective perception
  • Results
  • Big ideas
  • An intro to RESOLVER
• The main big idea: NO NEED TO NOTICE AND PROCESS EVERYTHING ALWAYS!

3. The Application
• SEER: a multimodal system for recognizing office activity
• General setting: a basic requirement for visual surveillance and multimodal HCI is the provision of rich, human-centric notions of context in a tractable manner…
• Prior work: mainly particular scenarios (waving the hand, etc.), HMMs, dynamic Bayesian networks
• Output categories:
  • PC = Phone Conversation
  • FFC = Face-to-Face Conversation
  • P = Presentation
  • O = Other Activity
  • NP = Nobody Present
  • DC = Distant Conversation (out of field of view)
• Input:
  • Audio: PCA of LPC coefficients, energy, μ and σ of ω0, zero-crossing rate
  • Audio localization: Time Delay of Arrival (TDOA)
  • Video: skin color, motion, foreground and face densities
  • Mouse & keyboard: history of 1, 5 and 60 sec of activity
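To make the audio inputs concrete, here is a minimal sketch (not the authors' code) of two of the listed features, computed per analysis window; the LPC/PCA and ω0 statistics would be computed alongside these in the same way:

    import numpy as np

    def frame_features(frame: np.ndarray) -> dict:
        """Short-time energy and zero-crossing rate for one window
        of audio samples (two of the features listed above)."""
        energy = float(np.mean(frame ** 2))
        # Zero-crossing rate: fraction of adjacent sample pairs whose signs differ.
        zcr = float(np.mean(np.signbit(frame[:-1]) != np.signbit(frame[1:])))
        return {"energy": energy, "zcr": zcr}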

4. Recognition Engine
• Recognition engine: LHMM (Layered HMM!)
• First level: parallel discriminative HMMs for the categories:
  • Audio: human speech, music, silence, noise, ring, keyboard
  • Video: nobody, static person, moving person, multiple people
• Second level:
  • Input: outputs of the above + derivative of the sound localization + keyboard histories
  • Output: PC, FFC, P, O, NP, DC – longer temporal extent!
• Selective perception strategies usable at both levels: selecting which features to use at the input of the HMMs!
  • Example: motion & skin density for one active person; skin density & face detection for multiple people
  • Also for the second stage: selecting which first-stage HMMs to run…
• HMMs vs. LHMMs:
  • Compared to CP HMMs (Cartesian product, one long feature vector), prior knowledge about the problem is encoded in the structure of the LHMM
  • I.e. decomposition into smaller subproblems -> less training required, more filtered output for the second stage, only the first level needs retraining!
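The two-level flow, as a hedged Python sketch (the bank dictionaries, function names and inputs are illustrative stand-ins for SEER's trained HMM scorers):

    import numpy as np
    from typing import Callable, Dict, Sequence

    def bank_output(window: np.ndarray,
                    hmm_loglik: Dict[str, Callable[[np.ndarray], float]]) -> str:
        """Score a short feature window with every HMM in a bank
        and return the winning class label."""
        return max(hmm_loglik, key=lambda label: hmm_loglik[label](window))

    def second_level_observation(audio_win, video_win,
                                 audio_bank, video_bank,
                                 sound_loc_deriv: float,
                                 keyboard_hist: Sequence[float]) -> tuple:
        """Assemble the discrete observation the second-level HMM sees:
        first-level labels plus the extra inputs listed above."""
        return (bank_output(audio_win, audio_bank),
                bank_output(video_win, video_bank),
                sound_loc_deriv,
                tuple(keyboard_hist))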

5. Selective Perception Strategies
Why sense everything and compute everything always?!?
• Two approaches:
  • EVI: Expected Value of Information (à la RESOLVER)
    • Decision theory and uncertainty reduction
    • EVI computed for different overlapping feature subsets, in real time, at every frame
    • Greedy, one-step-lookahead approach for computing the next best set of observations to evaluate (see the sketch after this slide)
  • Rate-based perception (somewhat similar to RIP BEHAVIOR)
    • Policies defined heuristically, specifying observational frequencies and duty cycles for each computed feature
• Two baselines for comparison:
  • Compute everything!
  • Randomly select feature subsets
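A minimal sketch of the greedy one-step lookahead, assuming a scoring function evi(subset) that returns the net expected value of information of turning a feature subset on at the next frame (the names and the max_size cap are illustrative):

    from itertools import combinations

    def next_best_subset(features, evi, max_size=2):
        """Enumerate candidate feature subsets one step ahead and keep
        the best; sensing nothing (net EVI = 0) is always an option."""
        best, best_score = frozenset(), 0.0
        for size in range(1, max_size + 1):
            for subset in combinations(features, size):
                score = evi(frozenset(subset))
                if score > best_score:
                    best, best_score = frozenset(subset), score
        return best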

6. Expected Value of Information
Endowing the perceptual system with knowledge of the value of action in the world…

7. Expected Value of Information
But what we are really interested in is what we stand to gain! Thus the net EVI (reconstructed after this slide), where we also account for:
• What we would get given no sensing at all
• The cost of sensing – but we have to map cost and utility onto the same currency!
• An HMM-ised implementation is used!
• Richer cost models:
  • Non-identity utility matrix U
  • Constant vs. activity-dependent costs (what else is running?) – successful results! (no significant decrease in accuracy ;-))
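The slide's formula did not survive transcription; below is a reconstruction of the paper's net-EVI expression in standard decision-theoretic notation (evidence E observed so far, hypotheses H_j, actions a, utility u(a, H_j), candidate feature set f with possible observations o_f) – check the paper for the exact form:

    \mathrm{NEVI}(f) =
        \int p(o_f \mid E) \, \max_a \sum_j u(a, H_j)\, p(H_j \mid E, o_f)\, do_f
        \;-\; \max_a \sum_j u(a, H_j)\, p(H_j \mid E)
        \;-\; \mathrm{cost}(f)

In words: the expected utility of the best action after seeing the new observations, minus the utility of the best action with no new sensing, minus the cost of computing f.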

8. Rate-based perception
• Simple idea: each feature is computed at a fixed, heuristically chosen observational frequency and duty cycle (see the sketch after this slide)
• In this case, no online tuning of the rates…
• Doesn't capture sequential prerequisites etc.
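A minimal sketch of such a fixed-rate schedule (the feature names and periods are illustrative, not the paper's values):

    RATES = {                   # feature -> compute on every k-th frame
        "skin_color": 1,
        "motion": 2,
        "sound_localization": 3,
        "face_detection": 5,
    }

    def features_due(frame_idx: int) -> set:
        """Features scheduled for this frame; everything else is skipped,
        trading a little accuracy for a lot of computation."""
        return {f for f, period in RATES.items() if frame_idx % period == 0}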

9. Results
• EVI: no significant performance decrease, with much less computational cost!
• Also effective in the activity-dependent cost mode
• And even more to be gained!

10. Take-home message: Big Ideas
• No need to sense & compute everything always!
• In essence we have a planner: a planner for goal-based sensing and cognition!
• Not only useful for AI: the approach might be useful for computational modeling of human performance, too…
• Simple satisficing works: no need for fully optimised planning; with some precautions, one step ahead with many approximations is sufficient – ALSO more plausible for humans! (ref: Ullman)
• Easy co-existence with other goal-based modules: we just need a method for distributing time-varying costs of sensing and cognising actions (a centralised stock market?)
• As a future direction: time-decreasing confidence is mentioned
