Explore the modal logic framework GRL uses to reason about states of knowledge and goals in behavior-based robotics: signal modalities, built-in inference axioms, goal reduction, and useful helper functions. Then survey planning, from a naive exponential algorithm to conditional planning, MDPs, and POMDPs.
Goals, plans, and planning
Northwestern University, CS 395 Behavior-Based Robotics
Ian Horswill
Modal logic
• Need to reason about
  • States of knowledge
  • Goals
• These aren't propositions about objects …
• … but rather about other propositions

(define-signal front-sonar
  …
  (mode (know (< front-sonar 2000))))
…
(define-signal fspace
  (min front-sonar front-left-sonar front-right-sonar))
(define-signal advance
  (behavior (know fspace)
            (rt-vector 0 fspace)))
Modalities in GRL
• In GRL, a modality is a special kind of signal procedure
• The signal it returns is just a default
• You can override it with a mode declaration
• It's memoized so that it always returns the same signal object when called on the same signal object

(define-signal-modality (mymode x)
  … compute default …)

(define-signal sig expr
  (mode (mymode expr)))
Simplified modality definitions

(define-signal-modality (know x)
  ;; a compound signal is known when all of its inputs are known
  (define inputs (signal-inputs x))
  (signal-expression (apply and (know inputs))))

(define-signal-modality (goal x)
  ;; x is a goal whenever any signal driving its goal mode asserts it
  (define the-mode (signal-expression (accumulate or)))
  ;; forward goal status to each of x's inputs (parallel goal reduction)
  (define (forward-goal y) (drive-signal! x y))
  (for-each forward-goal (signal-inputs x))
  the-mode)
GRL modal logic API
• (know x)
  Whether x's value is known
• (goal x)
  True if x is a goal of achievement: the robot "wants" to make it true and move on
• (maintain-goal x)
  True if x is a maintenance goal: the robot "wants" to make it true and keep it true
• (know-goal x)
  True if x is a knowledge goal: the robot "wants" to determine the value of x
Built-in inference axioms
Each modal expression on the left reduces to the expression on the right (a small sketch follows).
• (know (operator arg …)) ⇒ (and (know arg) …)
• (goal (know x)) ⇒ (know-goal x)
• (goal (maintain x)) ⇒ (maintain-goal x)
• (know (know x)) ⇒ true
• (know (goal x)) ⇒ true
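To make the reduction direction concrete, here is a minimal plain-Scheme sketch (not GRL code; the names modal? and reduce-modal are hypothetical) that applies one step of the axioms above to a quoted expression:

(define (modal? expr head)
  ;; true when expr looks like (head (something ...))
  (and (pair? expr) (eq? (car expr) head)
       (pair? (cdr expr)) (pair? (cadr expr))))

(define (reduce-modal expr)
  ;; apply one of the built-in axioms above, or return expr unchanged
  (cond ((and (modal? expr 'know) (memq (car (cadr expr)) '(know goal)))
         #t)                                            ; (know (know x)) / (know (goal x)) => true
        ((modal? expr 'know)                            ; (know (op arg ...)) => (and (know arg) ...)
         (cons 'and (map (lambda (arg) (list 'know arg)) (cdr (cadr expr)))))
        ((and (modal? expr 'goal) (eq? (car (cadr expr)) 'know))
         (list 'know-goal (cadr (cadr expr))))          ; (goal (know x)) => (know-goal x)
        ((and (modal? expr 'goal) (eq? (car (cadr expr)) 'maintain))
         (list 'maintain-goal (cadr (cadr expr))))      ; (goal (maintain x)) => (maintain-goal x)
        (else expr)))

; (reduce-modal '(know (< front-sonar 2000)))  =>  (and (know front-sonar) (know 2000))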
Goal reduction API
• (define-signal s (and a b c …))
  (define-reduction s parallel)
  • When s is a goal, all its inputs are goals
  • This is what the goal modality on the "Simplified modality definitions" slide implements
• (define-signal s (and a b c …))
  (define-reduction s serial)
  • When s is a goal, a is a goal
  • When s is a goal and a is true, b is a goal
  • When s is a goal and both a and b are true, c is a goal (see the sketch below)
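As a minimal illustration of the serial rule (plain Scheme, not GRL; the function name is hypothetical), the active goal of a serial conjunction is its first unsatisfied conjunct:

(define (serial-active-goal conjuncts values)
  ;; conjuncts: list of names; values: their current truth values
  ;; returns the conjunct that becomes the goal, or #f if all are satisfied
  (cond ((null? conjuncts) #f)
        ((car values) (serial-active-goal (cdr conjuncts) (cdr values)))
        (else (car conjuncts))))

; (serial-active-goal '(a b c) '(#t #f #f))  =>  b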
Useful functions
• (know-that x)
  True if (know x) and x
• (satisfied-goal x)
  True if x is a goal and x is true
• (unsatisfied-goal x)
  True if x is a goal and x is false
• (parallel-and a b c …)
  And gate with parallel goal reduction
• (serial-and a b c …)
  And gate with serial goal reduction
Planning
• Given
  • Goal (desired state of the environment)
  • Current state of the environment
  • Set of actions
  • Descriptions of how actions change the state of the environment
    • Actions are essentially functions from states to states
• Find a series of actions (called a plan) that will result in the desired goal state
A bad planning algorithm
• Key idea: simulate every possible series of actions until your simulation finds the goal

Plan(s, g) {
  if (s == g) return the empty plan;   // already at the goal
  for each action a {
    let s' = a(s);                     // the state after running a
    try { return a + Plan(s', g); }    // prepend a to a plan from s'
    catch backtrack { }                // that action failed; try another
  }
  throw backtrack;                     // no action leads to the goal from s
}
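As a concrete (and still exponential) version of the same idea, here is a runnable plain-Scheme sketch; the depth limit, which the slide's pseudocode omits, simply keeps the naive search from recursing forever. Names like plan and the inc/dec actions are illustrative, not from GRL:

(define (plan state goal actions depth)
  ;; actions: list of (name . procedure) pairs, each procedure mapping a state to a state
  ;; returns a list of action names reaching goal, or #f if none is found within depth steps
  (cond ((equal? state goal) '())
        ((zero? depth) #f)
        (else
         (let loop ((as actions))
           (if (null? as)
               #f                                        ; backtrack: no action works here
               (let ((rest (plan ((cdar as) state) goal actions (- depth 1))))
                 (if rest
                     (cons (caar as) rest)               ; prepend this action to the sub-plan
                     (loop (cdr as)))))))))

; Toy example: drive a counter from 0 to 3 with increment/decrement actions
; (plan 0 3 (list (cons 'inc (lambda (s) (+ s 1)))
;                 (cons 'dec (lambda (s) (- s 1))))
;       5)
; => (inc inc inc)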
Complexity
• Have to search a tree of plans
• If there are n possible actions, there are n^m possible m-step plans (with 10 actions and 10 steps, that is already 10^10 candidate plans)
• The naive algorithm is exponential
• Clever optimizations are possible, but it's still basically an exponential problem
Generalizations
• Conditional planning
  • Allow ifs inside of the plan to handle contingencies (see the sketch below)
  • More robust
  • More expensive to plan
• Automatic programming
  • Plans can be arbitrary programs
  • Fully undecidable
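For concreteness, a conditional plan can be pictured as an action sequence with branches on run-time tests; the following quoted Scheme data is purely illustrative (the action and test names are made up):

'(grasp-object
  (if holding-object?
      (go-to-bin drop-object)   ; normal case: carry on
      (grasp-object)))          ; contingency: the grasp failed, so try it again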
Generalizations (2)
• Markov Decision Problems (MDPs)
  • Actions aren't deterministic
    • Only know a probability distribution on the possible result states for each action
    • Actions are now functions from probability distributions to probability distributions
  • Plan can't be a program anymore (how do you know what the output state is?)
  • Payoff function that tells you how good a state is
  • Find the policy that gives you the best expected (i.e. average over the state probability distribution) payoff (see the sketch below)
  • Really, really expensive
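A minimal plain-Scheme sketch of the two ideas above, with distributions represented as association lists of (state . probability); all names are illustrative, not a real MDP library:

(define (expected-payoff dist payoff)
  ;; dist: list of (state . probability); payoff: state -> number
  (apply + (map (lambda (entry) (* (cdr entry) (payoff (car entry)))) dist)))

(define (apply-action dist action)
  ;; action: state -> distribution over successor states
  ;; returns the new distribution (duplicate states are not merged in this sketch)
  (apply append
         (map (lambda (entry)
                (map (lambda (succ) (cons (car succ) (* (cdr entry) (cdr succ))))
                     (action (car entry))))
              dist)))

; (expected-payoff '((at-goal . 0.7) (stuck . 0.3))
;                  (lambda (s) (if (eq? s 'at-goal) 1 0)))   =>  0.7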
Generalizations (3)
• Partially Observable MDPs (POMDPs)
  • Actions aren't deterministic
  • Don't know what state you're in
    • Sensors only give us a probability distribution on states, not the states themselves
  • Policy has to map probability distributions (called "belief states") to actions, not states to actions (see the sketch below)
  • Payoff function that tells you how good a state is
  • Find the policy that gives you the best expected (i.e. average over the state probability distribution) payoff
  • Really, really, really expensive
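To make "belief state" concrete, here is a hedged plain-Scheme sketch of the standard Bayesian update of a belief after an observation; it is not part of GRL and the names are illustrative:

(define (update-belief belief likelihood)
  ;; belief: list of (state . probability); likelihood: state -> P(observation | state)
  ;; returns the normalized posterior belief after the observation
  (let* ((unnormalized
          (map (lambda (entry)
                 (cons (car entry) (* (cdr entry) (likelihood (car entry)))))
               belief))
         (total (apply + (map cdr unnormalized))))
    (map (lambda (entry) (cons (car entry) (/ (cdr entry) total)))
         unnormalized)))

; (update-belief '((hallway . 0.5) (doorway . 0.5))
;                (lambda (s) (if (eq? s 'doorway) 0.8 0.2)))
; => ((hallway . 0.2) (doorway . 0.8))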
Generalizations (4)
• Can you detect a pattern here?
• How to get tenure
  • Find a complicated instance of a problem that current technology can't handle
  • Devise an elegant yet prohibitively expensive technology to solve it
  • Write a paper that starts with "To survive in complex dynamic worlds, an agent must …"
  • Add a description of your technique
  • Prove a lot of theorems about how your technique will solve all instances of the problem given more CPU time than the lifetime of the universe
  • Write: "Future work: make it fast"