
A Hybridized Planner for Stochastic Domains

Mausam and Daniel S. Weld

University of Washington, Seattle

Piergiorgio Bertoli

ITC-IRST, Trento

Planning under Uncertainty (ICAPS’03 Workshop)
  • Qualitative (disjunctive) uncertainty
  • Which real problem can you solve?
  • Quantitative (probabilistic) uncertainty
  • Which real problem can you model?
The Quantitative View
  • Markov Decision Process
  • models uncertainty with probabilistic outcomes
  • general decision-theoretic framework
  • algorithms are slow
  • do we need the full power of decision theory?
  • is an unconverged partial policy any good?
The Qualitative View
  • Conditional Planning
  • Model uncertainty as logical disjunction of outcomes
  • exploits classical planning techniques ⇒ FAST
  • ignores probabilities ⇒ poor solutions
  • how bad are pure qualitative solutions?
  • can we improve the qualitative policies?
HybPlan: A Hybridized Planner
  • combine probabilistic + disjunctive planners
    • produces good solutions in intermediate times
    • anytime: makes effective use of resources
    • bounds termination with quality guarantee
  • Quantitative View
    • completes partial probabilistic policy by using qualitative policies in some states
  • Qualitative View
    • improves qualitative policies in more important regions
Outline
  • Motivation
  • Planning with Probabilistic Uncertainty (RTDP)
  • Planning with Disjunctive Uncertainty (MBP)
  • Hybridizing RTDP and MBP (HybPlan)
  • Experiments
  • Conclusions and Future Work
Markov Decision Process

< S, A, Pr, C, s0, G >

S : a set of states

A : a set of actions

Pr : prob. transition model

C : cost model

s0 : start state

G: a set of goals

Find a policy (S → A) that

  • minimizes expected cost to reach a goal
  • for an indefinite horizon
  • for a fully observable Markov decision process.

Optimal cost function J* ⇒ optimal policy
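To keep the later sketches concrete, here is a minimal explicit-state MDP container in Python. It is only an illustration of the tuple above; the field names and types are our own choices, not the planners' actual data structures.

    from dataclasses import dataclass
    from typing import Callable, Dict, List, Set, Tuple

    State = int
    Action = str

    @dataclass
    class MDP:
        S: Set[State]        # set of states
        A: List[Action]      # set of actions
        prob: Dict[Tuple[State, Action], List[Tuple[State, float]]]   # Pr(s' | s, a)
        cost: Callable[[State, Action], float]                        # C(s, a)
        s0: State            # start state
        goals: Set[State]    # set of goal states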

Example

[Figure: example grid-world with start state s0 and a goal. Annotations mark a "longer path", a region where "all states are dead-ends", and a "wrong direction, but goal still reachable".]

Optimal State Costs

[Figure: the example grid labeled with the optimal cost of each state, from 0 at the goal up to 8 at the farthest states.]

Optimal Policy

[Figure: the optimal policy over the example grid, with each state's action directed toward the goal.]

Real Time Dynamic Programming (Barto et al. ’95; Bonet & Geffner ’03)

Bellman Backup:

Create better approximation to cost function @ s

Repeat trials until cost function converges

Trial = simulate greedy policy & update visited states
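A rough explicit-state sketch of an RTDP trial with Bellman backups, written against the MDP container sketched earlier. The zero initialization (admissible for nonnegative costs), the visit counter, and max_trial_len are illustrative assumptions, not the authors' exact implementation.

    import random

    def bellman_backup(mdp, J, s):
        """Greedy backup at s: min over a of C(s,a) + sum_s' Pr(s'|s,a) * J(s')."""
        best_q, best_a = float("inf"), None
        for a in mdp.A:
            if (s, a) not in mdp.prob:
                continue                     # action not applicable in s
            q = mdp.cost(s, a) + sum(p * J.get(s2, 0.0) for s2, p in mdp.prob[(s, a)])
            if q < best_q:
                best_q, best_a = q, a
        return best_q, best_a

    def rtdp_trial(mdp, J, visits, max_trial_len=1000):
        """One trial: simulate the greedy policy from s0 and back up every visited state."""
        s = mdp.s0
        for _ in range(max_trial_len):
            if s in mdp.goals:
                break
            visits[s] = visits.get(s, 0) + 1
            J[s], a = bellman_backup(mdp, J, s)
            if a is None:                    # dead-end: no applicable action
                break
            outcomes, weights = zip(*mdp.prob[(s, a)])
            s = random.choices(outcomes, weights=weights, k=1)[0]

Repeating rtdp_trial until J stops changing at the states the greedy policy reaches is the "repeat trials until cost function converges" step.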

Planning with Disjunctive Uncertainty
  • < S, A, T, s0, G >

S : a set of states

A : a set of actions

T : disjunctive transition model

s0 : the start state

G: a set of goals

  • Find a strong-cyclic policy (S → A)
    • that guarantees reaching a goal
    • for an indefinite horizon
    • for a fully observable planning problem
Model Based Planner (Bertoli et al.)
  • States, transitions, etc. represented logically
    • Uncertainty ⇒ multiple possible successor states
  • Planning Algorithm
    • Iteratively removes “bad” states.
    • Bad = states that don’t reach anywhere, or that reach only other bad states (a rough sketch of this pruning loop follows).
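A rough explicit-state sketch of that pruning fixpoint. MBP itself works symbolically over BDD-represented state sets, so this is only the idea, with hypothetical names (strong_cyclic_policy, T):

    def strong_cyclic_policy(states, T, goals):
        """T[s][a] = set of possible successor states (disjunctive outcomes).
        Keep pruning 'bad' state-action pairs until a fixpoint is reached."""
        sa = {(s, a) for s in states if s not in goals for a in T.get(s, {})}
        while True:
            covered = {s for s, _ in sa} | goals
            # drop pairs with an outcome that falls outside the surviving states
            kept = {(s, a) for (s, a) in sa if T[s][a] <= covered}
            # drop pairs whose state can no longer possibly reach a goal
            reach, changed = set(goals), True
            while changed:
                changed = False
                for s, a in kept:
                    if s not in reach and T[s][a] & reach:
                        reach.add(s)
                        changed = True
            kept = {(s, a) for (s, a) in kept if s in reach}
            if kept == sa:
                break
            sa = kept
        policy = {}
        for s, a in sorted(sa, key=str):     # keep one surviving action per state
            policy.setdefault(s, a)
        return policy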
MBP Policy

[Figure: the MBP policy on the example grid: a sub-optimal solution that still reaches the goal.]

Outline
  • Motivation
  • Planning with Probabilistic Uncertainty (RTDP)
  • Planning with Disjunctive Uncertainty (MBP)
  • Hybridizing RTDP and MBP (HybPlan)
  • Experiments
  • Conclusions and Future Work
HybPlan Top Level Code

0. run MBP to find a solution to the goal
1. run RTDP for some time
2. compute partial greedy policy (rtdp)
3. compute hybridized policy (hyb) by
     hyb(s) = rtdp(s) if visited(s) > threshold
     hyb(s) = mbp(s) otherwise
4. clean hyb by removing
     dead-ends
     probability 1 cycles
5. evaluate hyb
6. save the best policy obtained so far

Repeat until (1) resources are exhausted, or (2) a satisfactory policy is found. (A simplified sketch of this loop follows.)
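A simplified Python sketch of this loop. It assumes the helpers sketched elsewhere in this transcript (rtdp_trial, hybridize, evaluate); greedy_policy and clean are hypothetical stand-ins for steps 2 and 4, and epsilon, trials_per_round, max_rounds are illustrative parameters:

    def hybplan(mdp, mbp_policy, epsilon=1.0, trials_per_round=50, max_rounds=100, threshold=0):
        J, visits = {}, {}                                  # RTDP cost function and visit counts
        best_policy, best_value = dict(mbp_policy), float("inf")
        for _ in range(max_rounds):
            for _ in range(trials_per_round):               # 1. run RTDP for some time
                rtdp_trial(mdp, J, visits)
            rtdp_pi = greedy_policy(mdp, J, visits)         # 2. partial greedy policy
            hyb = hybridize(rtdp_pi, mbp_policy, visits, threshold)   # 3. hybridized policy
            hyb = clean(mdp, hyb, mbp_policy)               # 4. remove dead-ends / prob.-1 cycles
            value = evaluate(mdp, hyb).get(mdp.s0, float("inf"))      # 5. evaluate hyb
            if value < best_value:                          # 6. save the best policy so far
                best_policy, best_value = hyb, value
            if best_value - J.get(mdp.s0, 0.0) <= epsilon:  # quality guarantee reached
                break
        return best_policy, best_value

Step 0 (running MBP) happens once, before this loop, and supplies mbp_policy.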

First RTDP Trial

run RTDP for some time

[Figure: the example grid with every state's cost initialized to 0; the first RTDP trial starts at s0.]

Bellman Backup

run RTDP for some time

[Figure: a Bellman backup at the start state s; all successor costs are still 0.]

Q1(s,N) = 1 + 0.5 × 0 + 0.5 × 0
Q1(s,N) = 1
Q1(s,S) = Q1(s,W) = Q1(s,E) = 1
J1(s) = 1

Let greedy action be North

Simulation of Greedy Action

run RTDP for some time

[Figure: the greedy action (North) is simulated from s; the start state now has cost 1, all other states are still 0.]

Continuing First Trial

run RTDP for some time

[Figure: the trial continues from the sampled successor; each newly visited state is backed up and its cost is updated to 1.]

Finishing First Trial

run RTDP for some time

[Figure: the trial ends when the goal is reached; the states visited along the way have been backed up.]

Cost Function after First Trial

run RTDP for some time

[Figure: after the first trial, the states visited by RTDP carry updated (nonzero) costs; all other states still have cost 0.]

Partial Greedy Policy

2. compute greedy policy (rtdp)

[Figure: the greedy policy defined only on the states RTDP has visited, leading from s0 toward the goal.]

Construct Hybridized Policy w/ MBP

3. compute hybridized policy (hyb), with threshold = 0

[Figure: states visited by RTDP keep the greedy RTDP action; the remaining states are filled in with the MBP action, giving a complete policy to the goal. A small sketch of this rule follows.]
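A small sketch of the threshold rule used in this step (hybridize is a hypothetical helper name, matching step 3 of the top-level code):

    def hybridize(rtdp_policy, mbp_policy, visits, threshold=0):
        """hyb(s) = rtdp(s) if visited(s) > threshold, else mbp(s)."""
        hyb = dict(mbp_policy)               # default to the MBP action everywhere it is defined
        for s, a in rtdp_policy.items():
            if visits.get(s, 0) > threshold:
                hyb[s] = a                   # trust RTDP where it has enough experience
        return hyb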

Evaluate Hybridized Policy

5. evaluate hyb
6. store hyb

[Figure: the hybridized policy is evaluated; state values along its routes range from 1 near the goal up to 5 at s0. A sketch of the evaluation step follows.]

After first trial: J(hyb) = 5

Second Trial

[Figure: the second RTDP trial; the costs of states along the new trajectory are updated further.]

Absence of MBP Policy

MBP policy doesn’t exist: no path to the goal from this state.

[Figure: the affected dead-end state is marked on the grid.]

Third Trial

[Figure: the third RTDP trial; more state costs are updated.]

Probability 1 Cycles

repeat
  find a state s in cycle
  hyb(s) = mbp(s)
until cycle is broken

[Figure: the greedy partial policy contains a cycle that never reaches the goal; one state in the cycle at a time is switched to its MBP action until the cycle is broken. A sketch of this loop follows.]
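A rough sketch of this cycle-breaking loop. possible_succ(s, a), returning the set of possible successors, is an assumed helper, and the code assumes the MBP action is defined at the chosen state:

    def break_prob1_cycles(hyb, mbp_policy, possible_succ, goals):
        """Switch states trapped in goal-unreachable cycles over to their MBP action."""
        while True:
            reach, changed = set(goals), True        # states with some path to a goal under hyb
            while changed:
                changed = False
                for s, a in hyb.items():
                    if s not in reach and possible_succ(s, a) & reach:
                        reach.add(s)
                        changed = True
            trapped = [s for s in hyb if s not in reach]
            if not trapped:
                return hyb                           # cycle is broken
            s = trapped[0]                           # find a state s in the cycle
            hyb[s] = mbp_policy[s]                   # hyb(s) = mbp(s)

Together with dead-end removal, this is the cleaning step (step 4) of the top-level code.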

Error Bound

[Figure: the evaluated costs of the hybridized policy over the example grid.]

After 1st trial: J(hyb) = 5

J*(s0) ≤ 5   (the hybridized policy achieves cost 5)
J*(s0) ≥ 1   (the RTDP cost function is a lower bound)
⇒ Error(hyb) = 5 - 1 = 4

(A tiny helper computing this bound follows.)

Termination
  • when a policy of the required error bound is found
  • when the planning time is exhausted
  • when the available memory is exhausted

Properties

  • outputs a proper policy
  • anytime algorithm (once MBP terminates)
  • HybPlan = RTDP, if infinite resources available
  • HybPlan = MBP, if extremely limited resources
  • HybPlan = better than both, otherwise
Outline
  • Motivation
  • Planning with Probabilistic Uncertainty (RTDP)
  • Planning with Disjunctive Uncertainty (MBP)
  • Hybridizing RTDP and MBP (HybPlan)
  • Experiments
    • Anytime Properties
    • Scalability
  • Conclusions and Future Work
Domains

[Figures: the NASA Rover domain, the Factory domain, and the Elevator domain.]

Conclusions
  • First algorithm that integrates disjunctive and probabilistic planners.
  • Experiments show that HybPlan is
    • anytime
    • scales better than RTDP
    • produces better quality solutions than MBP
    • can interleave planning and execution
Hybridized Planning: A General Notion
  • Hybridize other pairs of planners
    • an optimal or close-to-optimal planner
    • a sub-optimal but fast planner
  • to yield a planner that produces a good-quality solution in intermediate running times
  • Examples
    • POMDP : RTDP/PBVI with POND/MBP/BBSP
    • Oversubscription Planning : A* with greedy solutions
    • Concurrent MDP : Sampled RTDP with single-action RTDP