
A Hybridized Planner for Stochastic Domains

Mausam and Daniel S. Weld

University of Washington, Seattle

Piergiorgio Bertoli

ITC-IRST, Trento


Planning under Uncertainty (ICAPS'03 Workshop)

  • Qualitative (disjunctive) uncertainty

  • Which real problem can you solve?

  • Quantitative (probabilistic) uncertainty

  • Which real problem can you model?


The Quantitative View

  • Markov Decision Process

  • models uncertainty with probabilistic outcomes

  • general decision-theoretic framework

  • algorithms are slow

  • do we need the full power of decision theory?

  • is an unconverged partial policy any good?


The Qualitative View

  • Conditional Planning

  • models uncertainty as a logical disjunction of outcomes

  • exploits classical planning techniques → fast

  • ignores probabilities → poor solutions

  • how bad are pure qualitative solutions?

  • can we improve the qualitative policies?


HybPlan: A Hybridized Planner

  • combines probabilistic + disjunctive planners

    • produces good solutions in intermediate times

    • anytime: makes effective use of resources

    • provides a quality guarantee (error bound) at termination

  • Quantitative View

    • completes the partial probabilistic policy by using the qualitative policy in some states

  • Qualitative View

    • improves the qualitative policy in the more important regions


Outline

  • Motivation

  • Planning with Probabilistic Uncertainty (RTDP)

  • Planning with Disjunctive Uncertainty (MBP)

  • Hybridizing RTDP and MBP (HybPlan)

  • Experiments

  • Conclusions and Future Work


Markov Decision Process

< S, A, Pr, C, s0, G >

S : a set of states

A : a set of actions

Pr : prob. transition model

C : cost model

s0 : start state

G: a set of goals

Find a policy π : S → A

  • minimizes expected cost to reach a goal

  • for an indefinite horizon

  • for a fully observable Markov decision process

Optimal cost function J* → optimal policy
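To make the tuple concrete, here is a minimal Python sketch of an MDP container used by the snippets that follow; the field types and names are my assumptions, not the authors' actual representation.

from dataclasses import dataclass
from typing import Callable, Hashable, List, Set, Tuple

State = Hashable
Action = str

@dataclass
class MDP:
    """<S, A, Pr, C, s0, G>: states, actions, probabilistic transition
    model, cost model, start state, and goal set."""
    states: Set[State]
    actions: Callable[[State], List[Action]]                   # A(s)
    pr: Callable[[State, Action], List[Tuple[State, float]]]   # [(s', Pr(s'|s,a))]
    cost: Callable[[State, Action], float]                     # C(s, a)
    s0: State
    goals: Set[State]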


Example

[Figure: a grid world with start state s0 and a goal. In one region all states are dead-ends; another direction is wrong, but the goal is still reachable from it via a longer path.]


Optimal State Costs

[Figure: the same grid annotated with each state's optimal expected cost to the goal, ranging from 0 at the goal to 8 in the farthest states.]

Optimal Policy

[Figure: the same grid annotated with the optimal action (greedy with respect to J*) in each state; every path leads to the goal.]


Bellman Backup

Create a better approximation to the cost function at s:

J(s) = min_a [ C(s,a) + Σ_s' Pr(s'|s,a) J(s') ]
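A minimal sketch of a single Bellman backup over the assumed MDP container above. Returning the greedy action alongside the value is a convenience for the trial code below, not part of the slide; unvisited states default to cost 0, matching the admissible initialization shown on the later slides.

def bellman_backup(mdp, J, s):
    """Set J(s) = min_a [ C(s,a) + sum_s' Pr(s'|s,a) * J(s') ];
    return the new value and a greedy action."""
    if s in mdp.goals:
        J[s] = 0.0
        return 0.0, None
    best_q, best_a = float("inf"), None
    for a in mdp.actions(s):
        q = mdp.cost(s, a) + sum(p * J.get(s2, 0.0) for s2, p in mdp.pr(s, a))
        if q < best_q:
            best_q, best_a = q, a
    J[s] = best_q
    return best_q, best_a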




Real-Time Dynamic Programming (Barto et al. '95; Bonet & Geffner '03)

Bellman backup: create a better approximation to the cost function at s.

Trial = simulate the greedy policy and update the visited states with Bellman backups.

Repeat trials until the cost function converges.
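One RTDP trial as a sketch, using the helpers above; the step cap and the way outcomes are sampled are implementation assumptions.

import random

def rtdp_trial(mdp, J, visited, max_steps=1000):
    """Simulate the greedy policy from s0, backing up every state along the way."""
    s, steps = mdp.s0, 0
    while s not in mdp.goals and steps < max_steps:
        visited[s] = visited.get(s, 0) + 1
        _, a = bellman_backup(mdp, J, s)       # update J(s), pick a greedy action
        outcomes = mdp.pr(s, a)
        s = random.choices([s2 for s2, _ in outcomes],
                           weights=[p for _, p in outcomes])[0]  # sample an outcome
        steps += 1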


Planning with Disjunctive Uncertainty

  • < S, A, T, s0, G >

    S : a set of states

    A : a set of actions

    T : disjunctive transition model

    s0 : the start state

    G: a set of goals

  • Find a strong-cyclic policy π : S → A

    • that guarantees reaching a goal

    • for an indefinite horizon

    • for a fully observable planning problem


Model-Based Planner (Bertoli et al.)

  • States, transitions, etc. are represented logically

    • Uncertainty → multiple possible successor states

  • Planning Algorithm

    • iteratively removes "bad" states

    • bad = states that reach nowhere, or reach only other bad states
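A much-simplified sketch of the pruning idea. MBP itself works symbolically on logically encoded state sets; the explicit loops and the helper signature here are illustrative only. The loop repeatedly discards states that either have no action keeping all disjunctive outcomes inside the surviving set, or can no longer reach the goal through it.

def strong_cyclic_policy(states, actions, T, goals):
    """Fixpoint pruning of 'bad' states; T(s, a) is the set of possible successors."""
    good = set(states)
    while True:
        # actions whose every disjunctive outcome stays inside the good set
        cand = {s: [a for a in actions(s) if all(t in good for t in T(s, a))]
                for s in good - goals}
        # backward sweep: which good states can still reach the goal?
        reach, changed = set(goals), True
        while changed:
            changed = False
            for s, acts in cand.items():
                if s not in reach and any(t in reach for a in acts for t in T(s, a)):
                    reach.add(s)
                    changed = True
        if reach == good:                      # fixpoint: no more bad states
            return {s: acts[0] for s, acts in cand.items() if acts}
        good = reach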


MBP Policy

[Figure: the grid showing the strong-cyclic policy MBP finds: it reaches the goal, but along the longer path, a sub-optimal solution.]


Outline

  • Motivation

  • Planning with Probabilistic Uncertainty (RTDP)

  • Planning with Disjunctive Uncertainty (MBP)

  • Hybridizing RTDP and MBP (HybPlan)

  • Experiments

  • Conclusions and Future Work


HybPlan Top-Level Code

0. run MBP to find a solution to the goal

1. run RTDP for some time

2. compute the partial greedy policy πrtdp

3. compute the hybridized policy πhyb by

  • πhyb(s) = πrtdp(s) if visited(s) > threshold

  • πhyb(s) = πmbp(s) otherwise

4. clean πhyb by removing

  • dead-ends

  • probability-1 cycles

5. evaluate πhyb

6. save the best policy obtained so far

Repeat steps 1-6 until (1) resources are exhausted or (2) a satisfactory policy is found.
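The same loop as a Python sketch; rtdp_trial is sketched above, while partial_greedy_policy, evaluate_policy, and break_prob1_cycles are sketched on the following slides. The trial count, threshold, and budgets are arbitrary choices, and dead-end removal is omitted for brevity. The gap between the evaluated policy value and RTDP's lower bound J(s0) is the error bound discussed later.

import time

def hybplan(mdp, mbp_policy, threshold=5, time_budget=60.0, target_error=1.0):
    """Hybridize RTDP's partial greedy policy with MBP's strong-cyclic policy."""
    J, visited = {}, {}
    best_policy, best_value = dict(mbp_policy), float("inf")
    start = time.time()
    while time.time() - start < time_budget:        # resources not yet exhausted
        for _ in range(20):                         # 1. run RTDP for some time
            rtdp_trial(mdp, J, visited)
        rtdp_pi = partial_greedy_policy(mdp, J, visited)    # 2.
        hyb = dict(mbp_policy)                              # 3. MBP fallback...
        hyb.update({s: a for s, a in rtdp_pi.items()
                    if visited.get(s, 0) > threshold})      # ...RTDP where confident
        hyb = break_prob1_cycles(hyb, mdp, mbp_policy)      # 4. cleanup (cycles)
        value = evaluate_policy(mdp, hyb)                   # 5. evaluate
        if value < best_value:                              # 6. keep the best
            best_policy, best_value = dict(hyb), value
        if best_value - J.get(mdp.s0, 0.0) <= target_error:
            break                                   # satisfactory policy found
    return best_policy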


First RTDP Trial

1. run RTDP for some time

[Figure: the grid at the start of the first trial; every state's cost is initialized to 0.]

Bellman Backup

1. run RTDP for some time

[Figure: a Bellman backup performed at the start state during the first trial.]

Q1(s, N) = 1 + 0.5 × 0 + 0.5 × 0 = 1

Q1(s, S) = Q1(s, W) = Q1(s, E) = 1

J1(s) = 1

Let the greedy action be North.

Simulation of Greedy Action

1. run RTDP for some time

[Figure: the greedy action (North) is simulated from the start state, whose cost is now 1; the trial moves to the sampled successor.]

Continuing First Trial

1. run RTDP for some time

[Figure: the trial continues from the sampled successor, backing up each visited state.]


Finishing First Trial

1. run RTDP for some time

[Figure: the trial reaches the goal; the states visited along the way have updated costs.]

Cost Function after First Trial

1. run RTDP for some time

[Figure: after the first trial, the states along the simulated path have costs of 1 or 2; all other states remain at 0.]

Partial Greedy Policy

2. compute the partial greedy policy πrtdp

[Figure: greedy actions are defined only at the states visited so far, a small region around the simulated path to the goal.]
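Step 2 as a sketch: the greedy policy is extracted only at states RTDP has visited, so it is partial by construction (same assumed container as above).

def partial_greedy_policy(mdp, J, visited):
    """Greedy action w.r.t. the current cost function J, defined only where RTDP has been."""
    policy = {}
    for s in visited:
        if s in mdp.goals:
            continue
        policy[s] = min(mdp.actions(s),
                        key=lambda a: mdp.cost(s, a)
                        + sum(p * J.get(s2, 0.0) for s2, p in mdp.pr(s, a)))
    return policy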


Construct Hybridized Policy with MBP

3. compute the hybridized policy πhyb (threshold = 0)

[Figure: πhyb follows πrtdp at the visited states and falls back to the MBP policy everywhere else, so every reachable state has an action leading toward the goal.]

Evaluate Hybridized Policy

5. evaluate πhyb

6. store πhyb

[Figure: evaluating πhyb on the grid; values range from 0 at the goal up to 5 at the start state.]

After the first trial: J(πhyb) = 5


Second Trial

[Figure: a second RTDP trial updates more state costs; the greedy policy now covers a larger region of the grid.]


Absence of MBP Policy

An MBP policy doesn't exist at dead-end states: there is no path to the goal from them.

[Figure: the greedy policy can reach a dead-end state (marked ×) for which MBP has no fallback action; the cleanup step must remove such states from πhyb.]

Third Trial

[Figure: after a third trial, the partial greedy policy contains a probability-1 cycle among visited states, which must be repaired.]


Probability 1 Cycles

repeat

  find a state s in the cycle

  πhyb(s) = πmbp(s)

until the cycle is broken

[Figure, animated over several slides: the states of the cycle are switched to their MBP actions one at a time until the cycle is broken and the goal is reachable again.]
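A sketch of the repair loop; detecting the cycle by walking most-likely outcomes is only a stand-in for real probability-1 cycle detection, and the helper name matches the call in the HybPlan sketch earlier.

def break_prob1_cycles(hyb, mdp, mbp_policy, max_walk=10_000):
    """Walk the policy graph from s0; on revisiting a state before the goal,
    patch that state with its MBP action and start over."""
    while True:
        seen, s, steps = set(), mdp.s0, 0
        while s not in mdp.goals and steps < max_walk:
            if s in seen:                      # cycle detected
                hyb[s] = mbp_policy[s]         # repeat: patch one state...
                break                          # ...until the cycle is broken
            seen.add(s)
            s = max(mdp.pr(s, hyb[s]), key=lambda t: t[1])[0]  # most likely outcome
            steps += 1
        else:
            return hyb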




Error Bound

J*(s0) ≤ J(πhyb) = 5 (upper bound: the value of the evaluated hybridized policy)

J*(s0) ≥ 1 (lower bound: RTDP's current admissible cost function at s0)

⇒ Error(πhyb) = 5 - 1 = 4

[Figure: the evaluated policy values after the first trial, with J(πhyb) = 5 at s0.]


Termination

  • when a policy with the required error bound is found

  • when the planning time is exhausted

  • when the available memory is exhausted

Properties

  • outputs a proper policy

  • anytime algorithm (once MBP terminates)

  • HybPlan = RTDP, if infinite resources are available

  • HybPlan = MBP, if resources are extremely limited

  • HybPlan = better than both, otherwise


Outline

  • Motivation

  • Planning with Probabilistic Uncertainty (RTDP)

  • Planning with Disjunctive Uncertainty (MBP)

  • Hybridizing RTDP and MBP (HybPlan)

  • Experiments

    • Anytime Properties

    • Scalability

  • Conclusions and Future Work


Domains

  • NASA Rover domain

  • Factory domain

  • Elevator domain





Conclusions

  • First algorithm that integrates disjunctive and probabilistic planners.

  • Experiments show that HybPlan

    • is anytime

    • scales better than RTDP

    • produces better-quality solutions than MBP

    • can interleave planning and execution


Hybridized Planning: A General Notion

  • Hybridize other pairs of planners

    • an optimal or close-to-optimal planner

    • a sub-optimal but fast planner

    to yield a planner that produces a good-quality solution in intermediate running times

  • Examples

    • POMDP : RTDP/PBVI with POND/MBP/BBSP

    • Oversubscription Planning : A* with greedy solutions

    • Concurrent MDP : Sampled RTDP with single-action RTDP