
18th Feb


Using reachability heuristics for PO planning

Planning using Planning Graphs

A recent (turbulent) history of planning

In the beginning it was all POP. Then it was cruelly UnPOPped. The good times return with Re(vived)POP.

- 1970s-1995: UCPOP, Zeno [Penberthy & Weld]; IxTeT [Ghallab et al]. The whole world believed in POP and was happy to stack 6 blocks!
- 1995: Advent of CSP-style compilation approach: Graphplan [Blum & Furst], SATPLAN [Kautz & Selman]. Use of reachability analysis and disjunctive constraints.
- 1997: Domination of heuristic state search approach: HSP/R [Bonet & Geffner], UNPOP [McDermott]: POP is dead! Importance of good domain-independent heuristics.
- 2000 - : Hoffmann's FF, a state search planner, won the AIPS-00 competition! … but NASA's highly publicized RAX is still a POP dinosaur! POP believed to be a good framework to handle temporal and resource planning [Smith et al, 2000].

Outline

RePOP: A revival for partial order planning

- To show that POP can be made very efficient by exploiting the same ideas that scaled up state search and Graphplan planners
- Effective heuristic search control
- Use of reachability analysis
- Handling of disjunctive constraints

- RePOP, implemented on top of UCPOP
- Dramatically better than all known partial-order planners
- Outperforms Graphplan and is competitive with state search planners in many (parallel) domains

POP background

A partial plan is a tuple P = (A, O, L, OC, UL):

- A: set of action steps in the plan: S0, S1, S2, …, Sinf
- O: set of action orderings Si < Sj, …
- L: set of causal links Si --p--> Sj
- OC: set of open conditions (subgoals that remain to be satisfied)
- UL: set of unsafe links Si --p--> Sj where p is deleted by some action Sk

[Figure: example partial plan with I = {q1, q2} and G = {g1, g2}, containing steps S0, S1, S2, S3, and Sinf, open conditions oc1 and oc2, a causal link on p, and a step S2 with effect ~p]

- Flaw: open condition OR unsafe link
- Solution plan: a partial plan with no remaining flaw
- Every open condition must be satisfied by some action
- No unsafe links should exist (i.e. the plan is consistent)

POP background

- 1. Let P be an initial plan
- 2. Flaw selection: choose a flaw f (either an open condition or an unsafe link)
- 3. Flaw resolution:
- If f is an open condition, choose an action S that achieves f
- If f is an unsafe link, choose promotion or demotion
- Update P
- Return NULL if no resolution exists
- 4. If there is no flaw left, return P; else go to 2.

[Figure: 1. Initial plan: S0 and Sinf with open conditions g1 and g2. 2. Plan refinement (flaw selection and resolution): the example partial plan with steps S0, S1, S2, S3, Sinf and open conditions oc1, oc2]

- Choice points
- Flaw selection (open condition? unsafe link?)
- Flaw resolution (how to select (rank) partial plan?)
- Action selection (backtrack point)
- Unsafe link selection (backtrack point)

1. Ranking partial plans: use an effective distance-based heuristic estimator (the state-space idea of distance heuristics).

2. Exploit reachability analysis: use invariants to discover implicit conflicts in the plan.

3. Unsafe links are resolved by posting disjunctive ordering constraints into the partial plan: this avoids the unnecessary and exponential multiplication of failures due to promotion/demotion splitting.

Ideas 2 and 3 are CSP ideas of consistency enforcement.

- 1. Ranking function: f(P) = g(P) + w h(P)
- g(P): number of actions in P
- h(P): estimate of the number of new actions needed to refine P into a solution plan
- w: weight that increases the greediness of the heuristic search
- 2. Estimating h(P): h(P) ~ |O’|, where O’ is the set of new actions still to be added to P
- Difficulty in estimating |O’|: how to account for positive and negative
- - Interactions among actions in O’
- - Interactions among actions in P
- - Interactions between O’ and P
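
The effect of the weight w on the ranking can be sketched as follows. This is a minimal illustration, not RePOP's code: the function names `rank` and `pop_best` and the candidate values of g and h are made up for the example.

```python
import heapq

def rank(g, h, w=1.0):
    """f(P) = g(P) + w * h(P), as on the slide."""
    return g + w * h

def pop_best(candidates, w=1.0):
    """Return the name of the candidate partial plan with the lowest f.

    candidates: list of (name, g, h) triples (hypothetical values).
    """
    heap = [(rank(g, h, w), name) for name, g, h in candidates]
    heapq.heapify(heap)
    return heapq.heappop(heap)[1]

# P1 is short but far from a solution; P2 is longer but nearly done.
candidates = [("P1", 2, 4), ("P2", 5, 2)]
print(pop_best(candidates, w=1.0))  # P1: f = 6 vs 7
print(pop_best(candidates, w=5.0))  # P2: f = 22 vs 15
```

With a larger w the search leans on h(P) and prefers the plan that looks closer to completion, which is what "greediness" means here.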

[Figure: partial plan P with steps S0, S1, S2, S3, Sinf, plus new actions S4 and S5 supporting the open conditions q and r; h(P) ~ |O’| = 2]

- Assumption: negative effects of actions are relaxed (they are dealt with later, through the unsafe link set)
- P has no unsafe link flaws
- no negative interactions among actions in P
- no negative interactions between O’ and P
- |O’| ~ cost(S) needed to achieve the set of open conditions S from the initial state
- Any state-space distance heuristic can be adapted
- Informedness of the heuristic estimate can be improved by using a weaker relaxation assumption

[Figure: the open condition set S = {p, q, r, ...} of the partial plan is placed on a planning graph with levels 0 to 3; choosing action a for subgoal p replaces S with S + Prec(a) - Eff(a)]

Distance-based heuristic estimate using the length of relaxed plans (adapted from state-space heuristics extracted from planning graphs [Nguyen & Kambhampati 2000], [Hoffmann 2000], …)

Estimate h(P) = cost(S)

1. Build a planning graph PG from the initial state.

2. cost(S) := 0 if all subgoals in S are in level 0.

3. Let p be a subgoal in S that appears last in PG.

4. Pick an action a in the graph that first achieves p.

5. Update cost(S) := cost(a) + cost(S + Prec(a) - Eff(a)), where cost(a) = 0 if a is already in P, and 1 otherwise.

6. Replace S := S + Prec(a) - Eff(a), go to 2.
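
The six steps above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the RePOP implementation: the planning graph is built with delete effects ignored and without mutexes, Eff(a) is taken to be the add effects only, and the `Action` type, the helper names, and the small rocket-style domain are all invented for the example.

```python
from typing import NamedTuple

class Action(NamedTuple):
    name: str
    prec: frozenset
    add: frozenset  # positive effects only; deletes are relaxed away

def A(name, prec, add):
    return Action(name, frozenset(prec), frozenset(add))

def build_levels(init, actions):
    """Step 1: first level of each proposition and its first achiever."""
    level = {p: 0 for p in init}
    achiever = {}
    k, changed = 0, True
    while changed:
        changed = False
        k += 1
        reached = set(level)  # propositions available before level k
        for a in actions:
            if a.prec <= reached:
                for p in a.add:
                    if p not in level:
                        level[p] = k
                        achiever[p] = a
                        changed = True
    return level, achiever

def relaxed_cost(open_conds, level, achiever, in_plan=frozenset()):
    """Steps 2-6: regress S until every subgoal is at level 0."""
    S, cost = set(open_conds), 0
    while True:
        if any(p not in level for p in S):
            return float("inf")  # some subgoal is unreachable
        pending = [p for p in S if level[p] > 0]
        if not pending:
            return cost
        p = max(pending, key=lambda q: (level[q], q))  # appears last in PG
        a = achiever[p]                                # first achiever of p
        cost += 0 if a.name in in_plan else 1          # free if already in P
        S = (S | a.prec) - a.add

init = {"at_R_E", "at_A_E", "at_B_E"}
actions = [
    A("load_A", {"at_R_E", "at_A_E"}, {"in_A"}),
    A("load_B", {"at_R_E", "at_B_E"}, {"in_B"}),
    A("fly", {"at_R_E"}, {"at_R_M"}),
    A("unload_A", {"in_A", "at_R_M"}, {"at_A_M"}),
    A("unload_B", {"in_B", "at_R_M"}, {"at_B_M"}),
]
level, achiever = build_levels(init, actions)
goals = {"at_A_M", "at_B_M"}
print(relaxed_cost(goals, level, achiever))                   # 5
print(relaxed_cost(goals, level, achiever, in_plan={"fly"}))  # 4
```

The second call shows the cost(a) = 0 rule: an action already present in the partial plan is not counted again.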

- 1. For each unsafe link Si --p--> Sj threatened by another step Sk:
- Add the disjunctive constraint to O
- Sk < Si V Sj < Sk
- 2. Whenever a new ordering constraint is introduced to O (or whenever you feel like it), perform the constraint propagations:
- (S1 < S2 V S3 < S4) ^ S4 < S3 => S1 < S2
- S1 < S2 ^ S2 < S3 => S1 < S3
- S1 < S2 ^ S2 < S1 => False

[Figure: link Si --p--> Sj threatened by a step Sk with effect ~p]

- This avoids the unnecessary exponential multiplication of failing partial plans
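
The three propagation rules can be sketched as a small fixpoint loop. This is a minimal sketch, assuming orderings are stored as (earlier, later) pairs and each disjunction as a tuple of such pairs; the function name `propagate` and the step names are hypothetical.

```python
def propagate(orderings, disjunctions):
    """Apply the three rules until nothing changes.

    Returns (closed orderings, unresolved disjunctions), or None if the
    ordering constraints are inconsistent.
    """
    before = set(orderings)
    pending = [tuple(d) for d in disjunctions]
    changed = True
    while changed:
        changed = False
        # Rule 2: S1 < S2 ^ S2 < S3 => S1 < S3
        for a, b in list(before):
            for c, d in list(before):
                if b == c and (a, d) not in before:
                    before.add((a, d))
                    changed = True
        # Rule 3: S1 < S2 ^ S2 < S1 => False
        if any((b, a) in before for a, b in before):
            return None
        # Rule 1: drop refuted disjuncts; a single survivor becomes an ordering
        remaining = []
        for disj in pending:
            if any(o in before for o in disj):
                continue  # already satisfied
            live = [o for o in disj if (o[1], o[0]) not in before]
            if not live:
                return None
            if len(live) == 1:
                before.add(live[0])
                changed = True
            else:
                remaining.append(tuple(live))
        pending = remaining
    return before, pending

# Link Si --p--> Sj threatened by Sk: post Sk < Si V Sj < Sk.
threat = (("Sk", "Si"), ("Sj", "Sk"))
result = propagate({("Si", "Sj"), ("Si", "Sk")}, [threat])
print(result)  # Sk < Si is refuted, so Sj < Sk is forced
```

No search split into a promotion branch and a demotion branch is needed here: the disjunction stays in one partial plan until propagation decides it or proves it impossible.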

- Reachability analysis to detect inconsistency
- Example invariant: on(a,b) and clear(b) can never hold together
- How to get state information in a partial plan?

- 3. Cutset: set of literals that must be true at some point during execution of the plan
- For each action Sk:
- pre-C(Sk) = Prec(Sk) U {p | Si --p--> Sj is a link and Si < Sk < Sj}
- post-C(Sk) = Eff(Sk) U {p | Si --p--> Sj is a link and Si < Sk < Sj}
- 4. If there exists a cutset that violates an invariant, the partial plan is invalid and should be pruned
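
A cutset check along these lines might look as follows. It is only a sketch: links are assumed to be (Si, p, Sj) triples, `before` is assumed to be a transitively closed set of ordering pairs, invariants are given as pairs of literals that cannot hold together, and the blocks-world fragment is hypothetical.

```python
def cutsets(step, prec, eff, links, before):
    """Return (pre-C, post-C) for a step, following the definitions above."""
    spanning = {p for (si, p, sj) in links
                if (si, step) in before and (step, sj) in before}
    return prec[step] | spanning, eff[step] | spanning

def violates_invariant(literals, mutex_pairs):
    """True if two literals that can never hold together are both present."""
    return any(a in literals and b in literals for a, b in mutex_pairs)

# Hypothetical fragment: Sk stacks c on b while a link protects on(a,b).
prec = {"Sk": {"clear(b)", "holding(c)"}}
eff = {"Sk": {"on(c,b)"}}
links = [("Si", "on(a,b)", "Sj")]
before = {("Si", "Sk"), ("Sk", "Sj"), ("Si", "Sj")}
mutex_pairs = [("on(a,b)", "clear(b)")]

pre_c, post_c = cutsets("Sk", prec, eff, links, before)
print(violates_invariant(pre_c, mutex_pairs))  # True: prune this plan
```

Note that Sk does not delete on(a,b), so the link is not unsafe in the usual sense; the conflict only shows up through the invariant.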

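[Figure: step Sk falls inside two causal links, one on p (between Si and Sj) and one on q (between Sm and Sn), so pre-C(Sk) = Prec(Sk) + p + q and post-C(Sk) = Eff(Sk) + p + q]

- Disadvantage: inconsistency checking is passive and may be expensive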

- Generalizing unsafe link: Sk threatens Si --p--> Sj iff p is mutually exclusive (mutex) with either Prec(Sk) or Eff(Sk)
- The unsafe link is resolved by posting disjunctive constraints (as before)
- Sk < Si V Sj < Sk


- Detects indirect conflicts early
- Derives more disjunctive constraints to be propagated

- RePOP is implemented on top of UCPOP planner using the three ideas presented
- Written in Lisp, runs on Linux, 500MHz, 250MB
- RePOP deals with a set of fully instantiated actions and thus avoids binding constraints

- Compared RePOP against UCPOP, Graphplan and AltAlt in a number of benchmark domains
- Performance metrics
- Time
- Solution quality

[Charts: RePOP vs. UCPOP, Graphplan, and AltAlt]

- RePOP is very good in parallel domains (gripper, logistics, rocket, parallel blocks world)
- Completely dominates UCPOP
- Outperforms Graphplan in many domains
- Competitive with AltAlt

- RePOP is still inefficient in serial domains:
- Travel, Grid, 8-puzzle

- 1. Number of actions
- 2. Makespan: minimum completion time (number of time steps)
- 3. Flexibility: average number of actions that do not have ordering constraints with other actions
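
The three metrics can be computed from a plan's ordering constraints. The sketch below makes some assumptions: the plan is given as a non-empty set of actions plus an acyclic set of (earlier, later) pairs, all actions take one time step, and flexibility is read as the average, over actions, of the number of other actions left unordered with respect to it. The function name and the three sample plans are illustrative.

```python
from functools import lru_cache

def plan_metrics(actions, orderings):
    """Number of actions, makespan, and flexibility of a partial-order plan."""
    succ = {a: set() for a in actions}
    for a, b in orderings:
        succ[a].add(b)

    @lru_cache(maxsize=None)
    def reach(a):
        """All actions that must come after a (transitive closure)."""
        out = set()
        for b in succ[a]:
            out.add(b)
            out |= reach(b)
        return frozenset(out)

    @lru_cache(maxsize=None)
    def depth(a):
        """Length of the longest chain of actions starting at a."""
        return 1 + max((depth(b) for b in succ[a]), default=0)

    n = len(actions)
    ordered = sum(len(reach(a)) for a in actions)  # each ordered pair once
    unordered = n * (n - 1) - 2 * ordered          # counted from both sides
    return {
        "num_act": n,
        "makespan": max(depth(a) for a in actions),
        "flex": unordered / n,
    }

acts = [1, 2, 3, 4]
two_chains = plan_metrics(acts, [(1, 3), (2, 4)])
two_layers = plan_metrics(acts, [(1, 3), (1, 4), (2, 3), (2, 4)])
total_order = plan_metrics(acts, [(1, 2), (2, 3), (3, 4)])
print(two_chains, two_layers, total_order)
```

All three sample plans have four actions; the totally ordered one has the longest makespan and no flexibility, while the two independent chains leave each action unordered with respect to two others.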

[Figure: three example plans with four actions each: (Num_act=4, Makespan=2, Flex=1), (Num_act=4, Makespan=2, Flex=2), and (Num_act=4, Makespan=4, Flex=0)]

- RePOP generates partially ordered plans
- Number of actions: RePOP typically returns the shortest plans
- Number of time steps (makespan):
- Graphplan produces the optimal number of time steps (strictly so when all actions have the same duration)
- RePOP comes close
- Flexibility: RePOP typically returns the most flexible plans

CE: Consistency enforcement techniques (reachability analysis and disjunctive constraint handling)

HP: Distance-based heuristic

- RePOP doesn’t particularly concentrate on flaw selection order
- Any order will guarantee completeness, but different orders have different efficiency

- For RePOP, unsafe links are basically handled by disjunctive ordering constraints
- So, we need an order for open conditions
- Ideas:
- LIFO/FIFO
- Pick open conditions with the least # of resolution choices (LCFR)
- Pick open conditions that have the highest cost (in terms of reachability).
- Try a whole bunch in parallel! (this is what VHPOP does—although it doesn’t use reachability based ordering)

- Progression/Regression/Partial order planners
- Reachability heuristics for focusing them
- In practice, for classical planning, progression planners with reachability heuristics (e.g. FF) seem to do best
- Assuming that we care mostly about “finding” a plan that is cheapest in terms of # actions

- (Sort of) open issues include:
- Handling lifted actions (i.e. considering partially instantiated actions)
- Handling optimality criteria other than # actions
- Minimal cost (assuming actions have non-uniform costs)
- Minimal make-span
- Maximal flexibility

PGs can be used as a basis for finding plans directly

If there exists a k-length plan, it will be a subgraph of the k-length planning graph.

(see the highlighted subgraph of the PG for our example problem)

20th Feb

Finding the subgraphs that correspond to valid solutions:

--Can use specialized graph traversal techniques

--Start from the end, put the vertices corresponding to goals in.

--If they are mutex, no solution

--Else, put at least one of the supports of those goals in

--Make sure that the supports are not mutex

--If they are mutex, backtrack and choose another set of supports. {No backtracking if we have no mutexes; basis for “relaxed plans”}

--At the next level, subgoal on the preconds of the support actions we chose.

--The recursion ends at the init level

--Consider extracting the plan from the PG directly

--This search can also be cast as a CSP or SAT or IP
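
The backward traversal described above can be sketched as a short recursive search. This is a simplified sketch, not Graphplan itself: it omits memoization and proposition mutexes, and the data layout (`supporters[k][p]` for the actions giving p at level k, `mutex[k]` as a set of action pairs) and the one-level rocket example are assumptions made for the illustration.

```python
from itertools import product

def extract(goals, k, init, supporters, prec, mutex):
    """Backward search for a valid subgraph; returns action sets per level."""
    if k == 0:
        return [] if set(goals) <= init else None  # recursion ends at init
    goals = sorted(goals)
    # Try every way of picking one supporting action per goal.
    for choice in product(*(supporters[k].get(g, []) for g in goals)):
        acts = set(choice)
        if any(frozenset((a, b)) in mutex[k]
               for a in acts for b in acts if a < b):
            continue  # supports are mutex: try another set of supports
        subgoals = set().union(*(prec[a] for a in acts))
        rest = extract(subgoals, k - 1, init, supporters, prec, mutex)
        if rest is not None:
            return rest + [sorted(acts)]
    return None

init = {"at_R_E", "at_A_E", "at_B_E"}
prec = {
    "load_A": {"at_R_E", "at_A_E"},
    "load_B": {"at_R_E", "at_B_E"},
    "fly": {"at_R_E"},
}
supporters = {1: {"in_A": ["load_A"], "in_B": ["load_B"], "at_R_M": ["fly"]}}
mutex = {1: {frozenset(("load_A", "fly")), frozenset(("load_B", "fly"))}}

print(extract({"in_A", "in_B"}, 1, init, supporters, prec, mutex))
print(extract({"in_A", "at_R_M"}, 1, init, supporters, prec, mutex))
```

The first goal set is supported by two non-mutex loads at the same level; the second fails at level 1 because loading and flying are mutex, so a longer graph would be needed.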

The idea behind Graphplan

[Animated figure: planning graph for the example, with initial conditions I1-I3, actions A1-A11, propositions P1-P6, and goals G1-G4]

- Avrim Blum & Merrick Furst (1995) first came up with the Graphplan idea, at a time when the planning community was mostly enamored with PO planning
- Their original motivation was to develop a planner based on “max-flow” ideas
- Think of preconditions and effects as pipes and actions as valves. You want to cause maximal fluid flow from the init state to a certain set of literals in the goal level
- Maxflow is polynomial, but planning isn’t, because of the nonlinearity caused by actions: unless ALL preconditions are in, the “action valve” won’t activate the effect pipes
- So they wound up finding a backward search idea instead

- Check out the animation…

- Memos essentially tell us that a particular set S of conditions cannot be achieved at a particular level k in the PG.
- We may as well remember this information, so that if we wind up subgoaling on any set S’ of conditions at that level, where S’ is a superset of S, we can immediately declare failure
- “Nogood” learning: storage/matching cost vs. benefit of reduced search. Generally in our favor

- But just because a set S={C1….C100} cannot be achieved together doesn’t necessarily mean that the reason for the failure has to do with ALL those 100 conditions. Some of them may be innocent bystanders.
- Suppose we can “explain” the failure as being caused by the set U, which is a subset of S (say U={C45,C97}); then U is more powerful in pruning later failures
- The idea is called “Explanation based Learning”
- Improves Graphplan performance significantly

[Rao, IJCAI-99; JAIR 2000]

Whenever P can’t be given a value v because it conflicts with the assignment of Q, add Q to P’s conflict set

[Animated figure: backward search over propositions P1-P6 with supporting actions A5-A11, annotated as follows]

--Conflict set for P4 = P4 P2 P1

--Skip over P3 when backtracking from P4

--Absorb the conflict set being passed up (the conflict sets for P1, P2, and P3 are updated in turn)

--Store P1 P2 P3 P4 as a memo

When we reach a variable V with conflict set C during backtracking

--Skip other values of V if V is not in C (DDB)

--Absorb C into conflict set of V if V is in C

--Store C as a memo if V is the first variable at this level

[Figure: the memo P1 P2 P3 P4 regressed over actions A1-A4 to the goals G1-G4 at the previous level]

Regression: What is the minimum set of goals at the previous level whose chosen action supports generate a sub-goal set that covers the memo?

--Minimal set

--When there is a choice, choose a goal that has been assigned earlier

--Supports more DDB

P1 P2 P3 P4 regresses to G1 G2

--P1 could have been regressed to G4, but G1 was assigned earlier

--We can skip over G4 & G3 (DDB)

Costlier memo-matching strategy

--Clever indexing techniques available

--Set Enumeration Trees [Rymon, KRR92]

--UBTrees [Hoffmann & Koehler, IJCAI-99]

Allows generation of more effective memos at higher levels… Not possible with normal memoization

Smaller memos are more general and thus prune more failing branches

If any stored memo is a subset of the current goal set, backtrack immediately

- Return the memo as the conflict set
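
The subset test can be sketched with a plain per-level list of memos. This is a deliberately naive sketch: it scans every stored memo linearly instead of using set-enumeration trees or UB-trees, and the class and method names are hypothetical.

```python
class MemoTable:
    """Per-level store of condition sets known to be unachievable."""

    def __init__(self):
        self._memos = {}  # level -> list of frozensets

    def store(self, level, conditions):
        self._memos.setdefault(level, []).append(frozenset(conditions))

    def matching_memo(self, level, goals):
        """Return a stored memo contained in the goal set, or None.

        A hit means the goal set cannot be achieved at this level; the
        returned memo can serve as the conflict set.
        """
        goal_set = frozenset(goals)
        for memo in self._memos.get(level, []):
            if memo <= goal_set:
                return memo
        return None

table = MemoTable()
table.store(3, {"C45", "C97"})  # a small, EBL-style memo
print(table.matching_memo(3, {"C1", "C45", "C97"}))  # hit: backtrack
print(table.matching_memo(3, {"C1", "C45"}))         # None: keep searching
```

The two-condition memo prunes any goal set that contains both conditions, which a memo over all of C1…C100 would not.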

- Pick hardest-to-satisfy variables (goals) first
- Pick easiest-to-satisfy values (actions) first
- Hardness as
- Cardinality (goals that are supported by 15 actions are harder than those that can be supported by 17 actions)
- Cost
- Level of the goal (or set of action preconditions) in the PG
- The length of the relaxed plan for supporting that goal in the PG

[Romeo, AIPS-2000; also second part of AltAlt paper]

- Graphplan differentiated between Static Interference and Mutex
- Two actions interfere statically if one’s effects are inconsistent with the other action’s preconditions/effects
- Two actions are mutex if they are either statically interfering or have been marked mutex by the mutex propagation procedure
- As long as we have static interference relations marked, we are guaranteed to find a solution with backward search!
- Mutex propagation only IMPROVES the efficiency of the backward search
- Mutex propagation is thus very similar to consistency enforcement in CSP

Memoization improves it even further

Efficient memoization can improve it further still…

- Original Graphplan algorithm used “parallel planning graphs” (rather than serial planning graphs).
- Not every pair of non-noop actions is marked mutex
- This meant that you can get multiple actions per time step
- Serial PG has more mutex relations (apart from interferences that come because of preconditions/effects, we are basically adding some sort of “resource-based” mutexes, saying the agent doesn’t have resources to do more than one action per level).

- Original Graphplan will produce “step-optimal” plans
- NOT optimal wrt #actions
- Can get it with serial Graphplan
- NOT cost optimal
- Need Multi-PEGG..(according to Terry)

- Suppose we grew the graph to level-off and still did not find a solution.
- Is the problem unsolvable?
- Example: Actions A1…A100 give goals G1…G100. Can’t do more than one action at a level (assume we are using serial PG)
- Level at which G1..G100 are true=?
- Length of the plan=?

- One can see the process of extracting the plan as verifying that at least one execution thread is devoid of n-ary mutexes
- Unsolvable if memos also do not change from level to level

Conversion to CSP

-- This search can also be cast as a CSP

Variables: literals in proposition lists

Values: actions supporting them

Constraints: Mutex and Activation constraints

Variables/Domains:

~cl-B-2: {#, St-A-B-2, Pick-B-2}

he-2: {#, St-A-B-2, St-B-A-2, Ptdn-A-2, Ptdn-B-2}

h-A-1: {#, Pick-A-1}

h-B-1: {#, Pick-B-1}

….

Constraints:

he-2 = St-A-B-2 => h-A-1 != # {activation}

On-A-B-2 = St-A-B-2 => On-B-A-2 != St-B-A-2 {mutex constraints}

Goals: ~cl-B-2 != #, he-2 != #

Do & Kambhampati, 2000

But but WHY?

--We are taking the cost of converting PG into CSP (and also tend to lose the ability to use previous level search)

--There is NO reason why the search for finding the valid subgraph has to go level-by-level and back to front.

--CSP won’t be hobbled by level-by-level and back-to-front

- Suppose we start with a PG that only marks every pair of “interfering” actions as mutex
- Any pair of non-noop actions are interfering
- Any pair of actions are interfering if one gives P and other gives or requires ~P
- No propagation is done

- Converting this PG to CSP and solving it will still give a valid solution (if there is one)
- So what is mutex propagation doing?
- It is “explicating” implicit constraints
- A special subset of “3-consistency” enforcement
- Recall that enforcing k-consistency involves adding (k-1)-ary constraints
- *Not* full 3-consistency (which can be much costlier)
- So enforcing the consistency on PG is cheaper than enforcing it after conversion to CSP...

- The problem of finding a valid plan from the planning graph can be encoded on any combinatorial substrate
- Alternatives:
- CSP [GP-CSP]
- SAT [Blackbox; SATPLAN]
- IP [Vossen et al.]

Goals: In(A), In(B)

[Do & Kambhampati, 2000]

CSP: Given a set of discrete variables, the domains of the variables, and constraints on the specific values a set of variables can take in combination, FIND an assignment of values to all the variables which respects all constraints

Variables: Propositions (In-A-1, In-B-1, ..At-R-E-0 …)

Domains: Actions supporting that proposition in the plan

In-A-1: {Load-A-1, #}

At-R-E-1: {P-At-R-E-1, #}

Constraints:

Mutual exclusion: ~[(In-A-1 = Load-A-1) & (At-R-M-1 = Fly-R-1)]; etc.

Activation: In-A-1 != # & In-B-1 != # (goals must have action assignments)

In-A-1 = Load-A-1 => At-R-E-0 != #, At-A-E-0 != # (subgoal activation constraints)

[Corresponds to a regression-based proof]

[Kautz & Selman]

Goals: In(A),In(B)

SAT is CSP with Boolean Variables

Init: At-R-E-0 & At-A-E-0 & At-B-E-0

Goal: In-A-1 & In-B-1

Graph: “cond at k => one of the supporting actions at k-1”

In-A-1 => Load-A-1

In-B-1 => Load-B-1

At-R-M-1 => Fly-R-1

At-R-E-1 => P-At-R-E-1

“Actions => preconds”

Load-A-1 => At-R-E-0 & At-A-E-0

Load-B-1 => At-R-E-0 & At-B-E-0

P-At-R-E-1 => At-R-E-0

“Mutexes”

~In-A-1 V ~At-R-M-1

~In-B-1 V ~At-R-M-1
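
The encoding above is small enough to check by brute force. The sketch below writes each implication as a clause and enumerates all assignments; it stands in for a real SAT solver, and the helper names are invented for the example.

```python
from itertools import product

# One clause per line of the encoding; "~" marks a negated literal.
CLAUSES = [
    ["At-R-E-0"], ["At-A-E-0"], ["At-B-E-0"],           # init
    ["In-A-1"], ["In-B-1"],                             # goal
    ["~In-A-1", "Load-A-1"], ["~In-B-1", "Load-B-1"],   # cond => supporter
    ["~At-R-M-1", "Fly-R-1"], ["~At-R-E-1", "P-At-R-E-1"],
    ["~Load-A-1", "At-R-E-0"], ["~Load-A-1", "At-A-E-0"],  # action => preconds
    ["~Load-B-1", "At-R-E-0"], ["~Load-B-1", "At-B-E-0"],
    ["~P-At-R-E-1", "At-R-E-0"],
    ["~In-A-1", "~At-R-M-1"], ["~In-B-1", "~At-R-M-1"],    # mutexes
]

def parse(literal):
    """Split a literal into (variable, polarity)."""
    if literal.startswith("~"):
        return literal[1:], False
    return literal, True

def models(clauses):
    """Enumerate every assignment that satisfies all clauses."""
    variables = sorted({parse(l)[0] for c in clauses for l in c})
    found = []
    for values in product([False, True], repeat=len(variables)):
        m = dict(zip(variables, values))
        if all(any(m[v] == pol for v, pol in map(parse, c)) for c in clauses):
            found.append(m)
    return found

solutions = models(CLAUSES)
plan = sorted(v for v, val in solutions[0].items()
              if val and v.startswith("Load"))
print(plan)  # the load actions that every model must contain
```

Every satisfying assignment sets both load actions to true and At-R-M-1 to false; variables the clauses leave unconstrained (such as Fly-R-1) may take either value.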

ILP: Given a set of real-valued variables, a linear objective function on the variables, a set of linear inequalities on the variables, and a set of integrality restrictions on the variables, find the feasible values of the variables for which the objective function attains its maximum value

-- 0/1 integer programming corresponds closely to the SAT problem

- Motivations
- Ability to handle numeric quantities, and do optimization
- Heuristic value of the LP relaxation of ILP problems

- Conversion
- Convert a SAT/CSP encoding to ILP inequalities
- E.g. X v ~Y v Z => x + (1 - y) + z >= 1

- Explicitly set up tighter ILP inequalities (cutting constraints)
- If X,Y,Z are pairwise mutex, we can write x+y+z <= 1 (instead of x+y <= 1; y+z <= 1; z+x <= 1)
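
Both conversions can be checked on small cases. The sketch below turns a clause into a linear inequality over 0/1 variables and verifies it against the clause on every assignment, then shows why the single mutex inequality is tighter once integrality is relaxed. The function names and the clause representation are assumptions for the illustration.

```python
from itertools import product

def clause_to_inequality(clause):
    """Clause as [(var, is_positive)] -> (coeffs, rhs): sum(coeff*var) >= rhs.

    A positive literal X contributes x; a negative literal ~Y contributes
    (1 - y), whose constant is moved to the right-hand side.
    """
    coeffs, rhs = {}, 1
    for var, positive in clause:
        coeffs[var] = coeffs.get(var, 0) + (1 if positive else -1)
        if not positive:
            rhs -= 1
    return coeffs, rhs

def holds(coeffs, rhs, assignment):
    return sum(c * assignment[v] for v, c in coeffs.items()) >= rhs

clause = [("X", True), ("Y", False), ("Z", True)]  # X v ~Y v Z
coeffs, rhs = clause_to_inequality(clause)         # x - y + z >= 0

# The inequality and the clause agree on all eight 0/1 assignments.
equivalent = all(
    holds(coeffs, rhs, dict(zip("XYZ", bits)))
    == any(bits[i] == (1 if pos else 0) for i, (_, pos) in enumerate(clause))
    for bits in product([0, 1], repeat=3)
)

# Fractional point allowed by the LP relaxation of the pairwise form only.
half = (0.5, 0.5, 0.5)
pairwise_ok = all(half[i] + half[j] <= 1 for i, j in [(0, 1), (1, 2), (2, 0)])
clique_ok = sum(half) <= 1
print(equivalent, pairwise_ok, clique_ok)
```

The point x = y = z = 0.5 satisfies the three pairwise inequalities but violates x+y+z <= 1, so the single inequality cuts off fractional solutions that the pairwise form admits.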

[Walser & Kautz; Vossen et al.; Bockmayr & Dimopoulos]

- CSP encodings support implicit representations
- More compact encodings [Do & Kambhampati, 2000]
- Easier integration with Scheduling techniques

- ILP encodings support numeric quantities
- Seamless integration of numeric resource constraints [Walser & Kautz, 1999]
- Not competitive with CSP/SAT for problems without numeric constraints

- SAT encodings support axioms in propositional logic form
- May be more natural to add (for whom ;-)

Do & Kambhampati, 2000

Size of learning: k = 10 for both size-based and relevance-based

Speedup over GP-CSP up to 10x

Faster than SAT in most cases, up to 70x over Blackbox

DIRECT

- Need to adapt CSP/SAT techniques
- Can exploit approaches for compacting the plan
- Can make the search incremental across iterations

Compiled

- Can exploit the latest advances in SAT/CSP solvers
- Compilation stage can be time consuming, leads to memory blow-up
- Makes it harder to exploit search from previous iterations
- Makes it easier to add declarative control knowledge