Section 10

Section 10 Mid-term Review II

Topics

Brew the coffee!
• Three operators:
  1. load(x)
     precond: coffee(x), loaded(none)
     effects: loaded(x), ¬loaded(none)
  2. brew(x)
     precond: loaded(x), ¬loaded(none), ¬loaded(waste)
     effects: ¬loaded(x), loaded(waste), pot(x)
  3. unload(x)
     precond: loaded(x), ¬loaded(none)
     effects: ¬loaded(x), loaded(none)
• Objects: two types of coffee (caf, decaf), plus waste and none
• Initial state: coffee(caf), coffee(decaf), loaded(none)
• Goal state: pot(caf), pot(decaf)
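To make the operator semantics concrete, here is a minimal Python sketch that applies the three operators to a state and checks one candidate plan. The `Action` tuple, the `apply` helper, and the set-of-true-atoms state encoding are assumptions made for illustration; they are not part of the original problem statement.

```python
from typing import FrozenSet, NamedTuple

class Action(NamedTuple):
    name: str
    pos_pre: FrozenSet[str]  # atoms that must be true
    neg_pre: FrozenSet[str]  # atoms that must be false
    add: FrozenSet[str]      # positive effects
    delete: FrozenSet[str]   # negative effects

def load(x):
    return Action(f"load({x})", frozenset({f"coffee({x})", "loaded(none)"}),
                  frozenset(), frozenset({f"loaded({x})"}),
                  frozenset({"loaded(none)"}))

def brew(x):
    return Action(f"brew({x})", frozenset({f"loaded({x})"}),
                  frozenset({"loaded(none)", "loaded(waste)"}),
                  frozenset({"loaded(waste)", f"pot({x})"}),
                  frozenset({f"loaded({x})"}))

def unload(x):
    return Action(f"unload({x})", frozenset({f"loaded({x})"}),
                  frozenset({"loaded(none)"}), frozenset({"loaded(none)"}),
                  frozenset({f"loaded({x})"}))

def apply(state, action):
    """Return the successor state, or raise if the preconditions fail."""
    if not action.pos_pre <= state or action.neg_pre & state:
        raise ValueError(f"{action.name} is not applicable")
    return (state - action.delete) | action.add

INITIAL = frozenset({"coffee(caf)", "coffee(decaf)", "loaded(none)"})
GOAL = frozenset({"pot(caf)", "pot(decaf)"})

def run(plan, state=INITIAL):
    for action in plan:
        state = apply(state, action)
    return state

# One candidate plan: the waste must be unloaded before loading again.
plan = [load("caf"), brew("caf"), unload("waste"), load("decaf"), brew("decaf")]
final_state = run(plan)
```

Skipping `unload("waste")` makes `load("decaf")` inapplicable, because `loaded(none)` is false after brewing.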

Graphplan! (Problem 1)
• Graphplan works only for propositional planning problems!
• Core elements:
  Expand-Graph
  Keep track of mutex actions and propositions
  Extract-Solution

Propositionalize the PDDL
• Eliminate variables by replacing them with constant symbols
• Example of propositionalized fluents:
  loadedCaf: loaded(caf)
• Example of propositionalized actions:
  brewCaf: brew(caf)
    precond: loaded(caf), ¬loaded(none), ¬loaded(waste)
    effects: ¬loaded(caf), loaded(waste), pot(caf)
• Propositionalized initial state: coffeeCaf, coffeeDecaf, loadedNone
• Propositionalized goal state: potCaf, potDecaf
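The substitution step can be sketched in a few lines of Python. The string encoding of literals and the helper names `cap`, `ground_literal`, and `ground` are assumptions for illustration; only the operator schemas and the naming pattern (loaded(caf) becomes loadedCaf) come from the slides.

```python
# Operator schemas over the single variable x: (preconditions, effects).
SCHEMAS = {
    "load": (["coffee(x)", "loaded(none)"],
             ["loaded(x)", "¬loaded(none)"]),
    "brew": (["loaded(x)", "¬loaded(none)", "¬loaded(waste)"],
             ["¬loaded(x)", "loaded(waste)", "pot(x)"]),
    "unload": (["loaded(x)", "¬loaded(none)"],
               ["¬loaded(x)", "loaded(none)"]),
}

def cap(s):
    """Upper-case the first character only: 'caf' -> 'Caf'."""
    return s[0].upper() + s[1:]

def ground_literal(lit, const):
    """Turn e.g. '¬loaded(x)' with const='caf' into '¬loadedCaf'."""
    sign = "¬" if lit.startswith("¬") else ""
    pred, arg = lit.lstrip("¬").rstrip(")").split("(")
    return sign + pred + cap(const if arg == "x" else arg)

def ground(op, const):
    """Replace the variable x in an operator schema with a constant."""
    pre, eff = SCHEMAS[op]
    return {
        "name": op + cap(const),
        "precond": [ground_literal(l, const) for l in pre],
        "effects": [ground_literal(l, const) for l in eff],
    }

brew_caf = ground("brew", "caf")
```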

Expand the Graph
[Figure: planning graph with levels P0, A1, P1]
P0: coffeeCaf, coffeeDecaf, loadedNone
A1: loadCaf, loadDecaf, plus no-ops for coffeeCaf, coffeeDecaf, loadedNone
P1: coffeeCaf, loadedCaf, coffeeDecaf, loadedDecaf, loadedNone, ¬loadedNone

Keep track of the Mutex
• Mutex actions
  Not independent:
    Action A deletes Action B’s precondition
    Action A deletes Action B’s positive effect
  Any pair of their preconditions is mutex
• Mutex propositions
  All producer pairs are mutex

Mutex Actions and Propositions
• Mutex actions in A1 (loadedNone here names the no-op action that carries loadedNone forward):
  (loadCaf, loadDecaf)
  (loadCaf, loadedNone)
  (loadDecaf, loadedNone)
• Mutex propositions in P1:
  (loadedCaf, loadedDecaf)
  (loadedCaf, loadedNone)
  (loadedDecaf, loadedNone)
  (loadedNone, ¬loadedNone)
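The mutex pairs for A1 and P1 can be recomputed mechanically from the mutex rules. The sketch below is one way to do it, under some assumptions: literals are strings with a "¬" prefix for negation, no-op actions are named `noop(p)` (the slide lists them by the proposition name alone), and the precondition-mutex rule is skipped because no propositions are mutex in P0.

```python
from itertools import combinations

def neg(lit):
    """Negate a literal: 'p' <-> '¬p'."""
    return lit[1:] if lit.startswith("¬") else "¬" + lit

P0 = ["coffeeCaf", "coffeeDecaf", "loadedNone"]

# Action level A1: name -> (preconditions, effects).
A1 = {
    "loadCaf": ({"coffeeCaf", "loadedNone"}, {"loadedCaf", "¬loadedNone"}),
    "loadDecaf": ({"coffeeDecaf", "loadedNone"}, {"loadedDecaf", "¬loadedNone"}),
}
for p in P0:  # no-op (persistence) actions carry each proposition forward
    A1[f"noop({p})"] = ({p}, {p})

def interferes(a, b):
    """True if an effect of a negates a precondition or an effect of b."""
    (_, eff_a), (pre_b, eff_b) = A1[a], A1[b]
    return any(neg(e) in pre_b | eff_b for e in eff_a)

action_mutex = {
    frozenset((a, b)) for a, b in combinations(A1, 2)
    if interferes(a, b) or interferes(b, a)
}

# Producers of each P1 proposition.
producers = {}
for name, (_, eff) in A1.items():
    for e in eff:
        producers.setdefault(e, set()).add(name)

# Two propositions are mutex when every pair of their producers is mutex.
prop_mutex = {
    frozenset((p, q)) for p, q in combinations(sorted(producers), 2)
    if all(frozenset((a, b)) in action_mutex
           for a in producers[p] for b in producers[q])
}
```

Note that (loadedCaf, ¬loadedNone) is not mutex: loadCaf produces both, and an action is never mutex with itself.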

Continue Expanding the Graph
[Figure: planning graph extended with levels A2 and P2]
A2 adds: brewCaf, brewDecaf, unloadCaf, unloadDecaf (alongside loadCaf, loadDecaf, and the no-ops)
P2 adds: potCaf, potDecaf, loadedWaste

Extract Solution
• Graphplan starts to extract a solution iff
  All goal state fluents appear in a proposition level
  None of the goal state fluent pairs is mutex
• Extract the solution
  Graphplan gives you a valid plan, but not necessarily an optimal one (with the minimum number of actions)
  Multiple actions can take place in one action level!
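The two conditions for starting extraction translate directly into a small check. This is a sketch only: `ready_to_extract` is a hypothetical helper name, and the P2 mutex set below lists just the goal pair rather than every mutex at that level. The pair (potCaf, potDecaf) is mutex at P2 because their only producers, brewCaf and brewDecaf, have the mutex preconditions loadedCaf and loadedDecaf.

```python
from itertools import combinations

def ready_to_extract(goals, props, prop_mutex):
    """True iff every goal is in the level and no goal pair is mutex."""
    if not goals <= props:
        return False
    return all(frozenset(pair) not in prop_mutex
               for pair in combinations(goals, 2))

GOALS = {"potCaf", "potDecaf"}

P1 = {"coffeeCaf", "coffeeDecaf", "loadedNone",
      "loadedCaf", "loadedDecaf", "¬loadedNone"}
P2 = P1 | {"potCaf", "potDecaf", "loadedWaste"}

# Partial mutex set for P2: only the pair relevant to the goal.
P2_GOAL_MUTEX = {frozenset({"potCaf", "potDecaf"})}

at_p1 = ready_to_extract(GOALS, P1, set())          # goals missing
at_p2 = ready_to_extract(GOALS, P2, P2_GOAL_MUTEX)  # goals present but mutex
```

Both checks fail here, so Graphplan would keep expanding the graph before attempting extraction.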

Partial-Order Planning (Problem 2)
• Causal links
  Action A: precond: … effects: p(x), …
  Action B: precond: p(y), … effects: …
  A—p—B!
• Threats
  Action C: precond: … effects: ¬p(z), …
  C is a threat to the A—B causal link!

Causal Links and Threats • Causal Links Example load(x)—loaded(x)—brew(x) • Threats Example unload(x) could be a threat to the causal link above!

Demotion and Promotion
• A—p(x)—B, C is a threat to this causal link
  Demotion: C—A—B
  Promotion: A—B—C
• load(x)—loaded(x)—brew(x) is a causal link, unload(x) is a threat to this causal link
  Demotion: unload(x1)—load(x2)—brew(x3)
    possible variable bindings: x1=waste, x2=x3=decaf
  Promotion: load(x1)—brew(x2)—unload(x3)
    possible variable bindings: x1=x2=decaf, x3=waste
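For ground actions, threat detection and the two repairs can be sketched as follows. The `CausalLink` tuple, the `DELETES` table, and the helper names are assumptions for illustration; an ordering pair (a, b) means "a comes before b".

```python
from typing import NamedTuple

class CausalLink(NamedTuple):
    producer: str  # action that achieves prop
    prop: str      # the protected proposition
    consumer: str  # action that needs prop

# Delete effects of some ground actions in the coffee domain.
DELETES = {
    "unload(decaf)": {"loaded(decaf)"},
    "unload(waste)": {"loaded(waste)"},
}

def is_threat(action, link):
    """An outside action threatens a link if it deletes the protected prop."""
    if action in (link.producer, link.consumer):
        return False
    return link.prop in DELETES.get(action, set())

def resolutions(action, link):
    """Ordering constraints that remove the threat."""
    return {
        "demotion": (action, link.producer),   # threat before the producer
        "promotion": (link.consumer, action),  # threat after the consumer
    }

link = CausalLink("load(decaf)", "loaded(decaf)", "brew(decaf)")
options = resolutions("unload(decaf)", link)
```

With these bindings, unload(waste) does not threaten the link, since it deletes loaded(waste) rather than loaded(decaf).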

HTN (Problem 3)
• Serve_two_things(t)
  task: serve_coffee_and_cake(t)
  precond: table(t)
  subtasks: serve(coffee, t), serve(cake, t)
• Serve_coffee(x, t)
  task: serve(x, t)
  precond: coffee(x), table(t)
  subtasks: make-coffee(x), move(x, t)
• Serve_cake(x, t)
  task: serve(x, t)
  precond: cake(x), table(t)
  subtasks: make-cake(x), move(x, t)

HTN (cont’d)
• Make-Caf-Coffee(x, b, m)
  task: make-coffee(x)
  precond: bean(b), caf-bean(b), coffee-maker(m), coffee(x)
  subtasks: load(b, m), brew(b, m, x)
• Make-Decaf-Coffee(x, b, m)
  task: make-coffee(x)
  precond: bean(b), decaf-bean(b), coffee-maker(m), coffee(x)
  subtasks: load(b, m), brew(b, m, x)
• Load(b, m) [Primitive task!]
  precond: bean(b), coffee-maker(m), unloaded(m)
  effects: loaded(b, m)
• Brew(b, m, x) [Primitive task!]
  precond: loaded(b, m), bean(b), coffee-maker(m)
  effects: coffee(x), in(x, m)

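[Figure: decomposition tree]
serve_coffee_and_cake(t0)
  method: Serve_two_things(t0), precond: table(t0)
  subtasks: serve(coffee), serve(cake)
    serve(coffee)
      method: Serve_coffee(coffee, t0), precond: coffee(coffee), table(t0)
      subtasks: make-coffee(coffee), move(coffee, t0)
        make-coffee(coffee)
          method: Make-Caf-Coffee(coffee, caf-bean, machine)
          or method: Make-Decaf-Coffee(coffee, decaf-bean, machine)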
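The decomposition of the coffee branch can be enumerated with a short recursive sketch. It is deliberately simplified: tasks are already ground strings, preconditions are ignored, and any task without a method (including move, which the slides do not mark as primitive) is assumed to be primitive. The `METHODS` table and `decompose` function are illustrative names.

```python
# Compound task -> list of (method name, ordered subtasks).
METHODS = {
    "serve(coffee,t0)": [
        ("Serve_coffee", ["make-coffee(coffee)", "move(coffee,t0)"]),
    ],
    "make-coffee(coffee)": [
        ("Make-Caf-Coffee",
         ["load(caf-bean,machine)", "brew(caf-bean,machine,coffee)"]),
        ("Make-Decaf-Coffee",
         ["load(decaf-bean,machine)", "brew(decaf-bean,machine,coffee)"]),
    ],
}

def decompose(tasks):
    """Yield every fully primitive task sequence for a task list."""
    if not tasks:
        yield []
        return
    first, rest = tasks[0], tasks[1:]
    if first not in METHODS:
        # Primitive task: keep it and decompose what follows.
        for tail in decompose(rest):
            yield [first] + tail
    else:
        # Compound task: try each applicable method in turn.
        for _method, subtasks in METHODS[first]:
            yield from decompose(subtasks + rest)

plans = list(decompose(["serve(coffee,t0)"]))
```

The two resulting plans differ only in which make-coffee method was chosen.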

MDP (Problem 4)
• You are making a three-year investment plan. After some research, you find two companies you are interested in investing in: Boston Medicine and San Francisco Chips.
• Currently the stock price is $10 per share for Boston Medicine and $12 per share for San Francisco Chips.
• At the beginning of each year, you decide which company to invest in, and once you make the decision, you buy 1000 shares of that company.
• At the end of each year, you gain or lose money depending on whether the stock price of the company you invested in goes up or down.

MDP (Problem 4) • Particularly, the stock prices change according to the following transition matrices: For Boston Medicine: For San Francisco Chips:

MDP (Problem 4)
• States?
  <prevPriceBM, prevPriceSFC, currPriceBM, currPriceSFC, prevAct>
• Actions?
  BM, SFC
• Rewards?
  <prevPriceBM, prevPriceSFC, currPriceBM, currPriceSFC, BM> → (currPriceBM - prevPriceBM) * 1000
  <prevPriceBM, prevPriceSFC, currPriceBM, currPriceSFC, SFC> → (currPriceSFC - prevPriceSFC) * 1000
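The reward definition above maps directly onto a small function of the state tuple. In this sketch the `State` class and `reward` function are illustrative names, and the price movements in the examples are hypothetical values, not taken from the transition matrices.

```python
from typing import NamedTuple

SHARES = 1000  # shares bought each year

class State(NamedTuple):
    prevPriceBM: float
    prevPriceSFC: float
    currPriceBM: float
    currPriceSFC: float
    prevAct: str  # "BM" or "SFC": the company invested in last year

def reward(s):
    """Profit or loss on last year's purchase, read off the state."""
    if s.prevAct == "BM":
        return (s.currPriceBM - s.prevPriceBM) * SHARES
    if s.prevAct == "SFC":
        return (s.currPriceSFC - s.prevPriceSFC) * SHARES
    raise ValueError(f"unknown action {s.prevAct!r}")

# Hypothetical outcomes starting from the given prices ($10 and $12).
gain = reward(State(10, 12, 11, 12, "BM"))   # BM rose by $1
loss = reward(State(10, 12, 11, 10, "SFC"))  # SFC fell by $2
```

Storing the previous prices and the previous action in the state is what lets the reward depend on the state alone.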

MDP (Problem 4) • Transitions?

Logic-Based vs. Decision-Theory
• Decision theory:
  • Utilities (rewards)
  • Uncertainties (transition probabilities)
  • View the world as states
  • Policy defines: given a state, which action to take
• Logic based (propositional, PDDL)
  • Goal state we want to reach
  • Actions with preconditions and deterministic effects
  • Factored state representation
  • In HTNs, hierarchical representation of tasks

Which approach would you use?
• What approach would you use to model each of the following planning problems? If both options seem reasonable, explain the advantages and limitations of each:
  • Planning how your team should work on a class project
  • Programming a robot that participates in RoboCup
  • Deciding where to eat on campus every day

Other Questions
• Assume that we wanted to model what to eat in the dining room every day using an MDP. We defined the states as the available options, and we defined rewards based on our food preferences, taking into account other considerations such as not wanting to eat the same food two days in a row.
  • How would you go about defining the transition function?
  • If we use an optimal algorithm like value iteration to solve our MDP, are we guaranteed to have the optimal policy?
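As a starting point for these questions, here is one possible modeling sketch, not a definitive answer. It assumes the state is the meal eaten yesterday, that the chosen meal is always available (a deterministic transition function), and it uses made-up preference values, a made-up repeat penalty, and a discount of 0.9. All names and numbers are hypothetical.

```python
MEALS = ["pizza", "salad"]
PREFERENCE = {"pizza": 3.0, "salad": 2.0}  # hypothetical preferences
REPEAT_PENALTY = 2.0                       # hypothetical penalty

def T(s, a):
    """Deterministic transition: tomorrow's state is today's meal."""
    return {a: 1.0}

def R(s, a):
    """Preference for the meal, minus a penalty for repeating yesterday's."""
    return PREFERENCE[a] - (REPEAT_PENALTY if a == s else 0.0)

def value_iteration(states, actions, T, R, gamma=0.9, tol=1e-9):
    V = {s: 0.0 for s in states}
    while True:
        # One Bellman backup for every state-action pair.
        Q = {s: {a: R(s, a) + gamma * sum(p * V[s2]
                                          for s2, p in T(s, a).items())
                 for a in actions}
             for s in states}
        new_V = {s: max(Q[s].values()) for s in states}
        if max(abs(new_V[s] - V[s]) for s in states) < tol:
            policy = {s: max(Q[s], key=Q[s].get) for s in states}
            return new_V, policy
        V = new_V

V, policy = value_iteration(MEALS, MEALS, T, R)
```

Under these assumed numbers the resulting policy alternates between the two meals. The policy is optimal only with respect to the model as written, so its quality depends on how well the assumed transitions and rewards match reality.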
