
### A Hybridized Planner for Stochastic Domains

Mausam and Daniel S. Weld

University of Washington, Seattle

Piergiorgio Bertoli

ITC-IRST, Trento

Planning under Uncertainty (ICAPS'03 Workshop)

- Qualitative (disjunctive) uncertainty
- Which real problem can you solve?

- Quantitative (probabilistic) uncertainty
- Which real problem can you model?

The Quantitative View

- Markov Decision Process
- models uncertainty with probabilistic outcomes
- general decision-theoretic framework
- algorithms are slow
- do we need the full power of decision theory?
- is an unconverged partial policy any good?

The Qualitative View

- Conditional Planning
- models uncertainty as a logical disjunction of outcomes
- exploits classical planning techniques → fast
- ignores probabilities → poor solutions
- how bad are pure qualitative solutions?
- can we improve the qualitative policies?

HybPlan: A Hybridized Planner

- combines probabilistic + disjunctive planners
- produces good solutions in intermediate times
- anytime: makes effective use of resources
- bounds termination with a quality guarantee
- Quantitative view: completes the partial probabilistic policy by using qualitative policies in some states
- Qualitative view: improves the qualitative policies in the more important regions

Outline

- Motivation
- Planning with Probabilistic Uncertainty (RTDP)
- Planning with Disjunctive Uncertainty (MBP)
- Hybridizing RTDP and MBP (HybPlan)
- Experiments
- Conclusions and Future Work

Markov Decision Process

< S, A, Pr, C, s0, G >

S : a set of states

A : a set of actions

Pr : prob. transition model

C : cost model

s0 : start state

G: a set of goals

Find a policy (S → A)

- minimizes expected cost to reach a goal
- for an indefinite horizon
- for a fully observable Markov decision process

Optimal cost function J* → optimal policy (acting greedily with respect to J* is optimal)
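For reference, this is the standard Bellman formulation implied by the tuple above (a textbook statement, not something new to this work):

```latex
J^*(s) = 0 \quad \text{if } s \in G
J^*(s) = \min_{a \in A}\Big[\, C(s,a) + \sum_{s' \in S} \Pr(s' \mid s, a)\; J^*(s') \,\Big] \quad \text{otherwise}
\pi^*(s) = \operatorname*{argmin}_{a \in A}\Big[\, C(s,a) + \sum_{s' \in S} \Pr(s' \mid s, a)\; J^*(s') \,\Big]
```

Approximating J* well over the states the policy actually visits is therefore enough to act well.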

Real Time Dynamic Programming (Barto et al. '95; Bonet & Geffner '03)

- Bellman backup: create a better approximation to the cost function at state s
- Trial: simulate the greedy policy and update the cost of the visited states
- Repeat trials until the cost function converges
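A minimal Python sketch of one RTDP trial under these definitions; the helpers `actions`, `cost`, `transitions` (returning a successor → probability map), and `goal` are illustrative assumptions, not the authors' implementation:

```python
import random

def bellman_backup(J, s, actions, cost, transitions):
    """One Bellman backup at s: improve the cost estimate J[s], return the greedy action."""
    best_q, best_a = float("inf"), None
    for a in actions(s):
        # Unseen states default to 0, matching the zero initialization in the example below.
        q = cost(s, a) + sum(p * J.get(t, 0.0) for t, p in transitions(s, a).items())
        if q < best_q:
            best_q, best_a = q, a
    J[s] = best_q
    return best_a

def rtdp_trial(J, s0, actions, cost, transitions, goal, max_steps=1000):
    """One trial: simulate the greedy policy from s0, backing up every state visited."""
    s = s0
    for _ in range(max_steps):
        if goal(s):
            break
        a = bellman_backup(J, s, actions, cost, transitions)
        succs = transitions(s, a)
        # Sample the next state from the probabilistic transition model.
        s = random.choices(list(succs), weights=list(succs.values()))[0]
```

Repeating such trials from s0 concentrates backups on the states the greedy policy actually visits, which is what makes even an unconverged RTDP cost function useful to HybPlan.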

Planning with Disjunctive Uncertainty

< S, A, T, s0, G >

S : a set of states

A : a set of actions

T : disjunctive transition model

s0 : the start state

G: a set of goals

Find a strong-cyclic policy (S → A)

- that guarantees reaching a goal
- for an indefinite horizon
- for a fully observable planning problem

Model Based Planner (Bertoli et al.)

- States, transitions, etc. are represented logically
- Uncertainty → multiple possible successor states
- Planning algorithm:
  - iteratively removes "bad" states
  - bad = states that reach nowhere or only reach other bad states
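A rough, explicit-state sketch of that "remove bad states" fixpoint (MBP itself works symbolically; `T(s, a)` returning the set of possible successors and the other helper names are assumptions for illustration, not MBP's API):

```python
def remove_bad_states(states, actions, T, goals):
    """Keep a state only if, using actions whose every possible successor is also
    kept, the goal remains reachable from it; iterate until nothing more has to
    be removed. The removed states are the 'bad' ones."""
    good = set(states)
    while True:
        # Backward reachability of the goal inside `good`, using only actions
        # that cannot fall outside `good`.
        reach = set(goals) & good
        grew = True
        while grew:
            grew = False
            for s in good - reach:
                for a in actions(s):
                    succ = set(T(s, a))
                    if succ <= good and succ & reach:
                        reach.add(s)
                        grew = True
                        break
        if reach == good:
            return good      # fixpoint reached: no more bad states
        good = reach         # drop the bad states and repeat
```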

Outline

- Motivation
- Planning with Probabilistic Uncertainty (RTDP)
- Planning with Disjunctive Uncertainty (MBP)
- Hybridizing RTDP and MBP (HybPlan)
- Experiments
- Conclusions and Future Work

HybPlan Top Level Code

0. run MBP to find a solution to the goal

repeat

- run RTDP for some time
- compute the partial greedy policy (rtdp)
- compute the hybridized policy (hyb), as sketched in code below:
  - hyb(s) = rtdp(s) if visited(s) > threshold
  - hyb(s) = mbp(s) otherwise
- clean hyb by removing
  - dead ends
  - probability-1 cycles
- evaluate hyb
- save the best policy obtained so far

until 1) resources are exhausted, or 2) a satisfactory policy is found
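The hybridization step itself is simple; here is a sketch (dictionary-based policies `pi_rtdp`/`pi_mbp`, a `visited` counter, and `threshold` are illustrative names, not the planner's actual interface):

```python
def hybridize(states, pi_rtdp, pi_mbp, visited, threshold):
    """Keep RTDP's greedy action where RTDP has explored enough;
    fall back to the MBP strong-cyclic action everywhere else."""
    pi_hyb = {}
    for s in states:
        if visited.get(s, 0) > threshold and s in pi_rtdp:
            pi_hyb[s] = pi_rtdp[s]       # well-explored state: trust RTDP
        else:
            pi_hyb[s] = pi_mbp[s]        # fringe state: use the MBP action
    return pi_hyb
```

The MBP fallback keeps the policy proper (the goal stays reachable), while RTDP's actions keep the expected cost low on the frequently visited states.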

Bellman Backup (worked example)

run RTDP for some time

[Figure: gridworld example; every state's cost estimate starts at 0 and one cell is the Goal. A Bellman backup is performed at the current state s of the trial.]

Q1(s,N) = 1 + 0.5 × 0 + 0.5 × 0 = 1

Q1(s,S) = Q1(s,W) = Q1(s,E) = 1

J1(s) = 1

Let the greedy action be North
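The same backup arithmetic as a quick check (the unit action cost and the two equally likely successors with value 0 are taken from the example above):

```python
cost, p = 1.0, 0.5
succ_values = [0.0, 0.0]            # both successors still have their initial value 0
q_north = cost + sum(p * v for v in succ_values)
print(q_north)                      # 1.0  ->  J1(s) = 1, so North is a greedy action
```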


Probability 1 Cycles

[Figure: gridworld example in which the current hybridized policy contains a probability-1 cycle that never reaches the Goal.]

repeat

- find a state s in the cycle
- hyb(s) = mbp(s)

until the cycle is broken
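A sketch of that repair loop: states that cannot reach the goal under the current hybridized policy are switched, one at a time, to their MBP action (the helper names and dictionary-based policies are illustrative assumptions):

```python
def clean_policy(pi_hyb, pi_mbp, transitions, goals):
    """Break probability-1 cycles and dead ends by falling back to MBP actions."""
    def reaches_goal():
        # States that reach a goal with positive probability under pi_hyb.
        ok, grew = set(goals), True
        while grew:
            grew = False
            for s, a in pi_hyb.items():
                if s not in ok and set(transitions(s, a)) & ok:
                    ok.add(s)
                    grew = True
        return ok

    switched = set()
    while True:
        ok = reaches_goal()
        stuck = [s for s in pi_hyb if s not in ok and s not in switched]
        if not stuck:
            return pi_hyb
        s = stuck[0]
        pi_hyb[s] = pi_mbp[s]        # fall back to the strong-cyclic action here
        switched.add(s)
```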

Termination

- when a policy of required error bound is found
- when the planning time exhausts
- when the available memory exhausts

Properties

- outputs a proper policy
- anytime algorithm (once MBP terminates)
- HybPlan = RTDP, if infinite resources are available
- HybPlan = MBP, if resources are extremely limited
- HybPlan = better than both, otherwise

Outline

- Motivation
- Planning with Probabilistic Uncertainty (RTDP)
- Planning with Disjunctive Uncertainty (MBP)
- Hybridizing RTDP and MBP (HybPlan)
- Experiments
- Anytime Properties
- Scalability
- Conclusions and Future Work

Domains

- NASA Rover domain
- Factory domain
- Elevator domain

Conclusions

- First algorithm that integrates disjunctive and probabilistic planners.
- Experiments show that HybPlan is
- anytime
- scales better than RTDP
- produces better quality solutions than MBP
- can interleave planning and execution

Hybridized Planning: A General Notion

- Hybridize other pairs of planners:
  - an optimal or close-to-optimal planner
  - a sub-optimal but fast planner
- to yield a planner that produces a good-quality solution in intermediate running times
- Examples:
  - POMDP: RTDP/PBVI with POND/MBP/BBSP
  - Oversubscription planning: A* with greedy solutions
  - Concurrent MDP: sampled RTDP with single-action RTDP
