A polynomial time algorithm for constructing k-maintainable policies

Chitta Baral (Arizona State University) and Thomas Eiter (Vienna University of Technology)

Motivation: What is ‘maintain’ f?
• Always f, also written □f: too strong for many kinds of maintainability (e.g., maintaining a clean room).
• Always Eventually f, also written □◊f:
  - Weak, in the sense that it gives no estimate of when f will be made true.
  - May not be achievable in the presence of continuous interference by belligerent agents.
• A spectrum of strength: □f ... □◊_k f ... □◊f
• □◊_3 f is shorthand for □(f ∨ ○f ∨ ○○f ∨ ○○○f).
• But if an external agent keeps interfering, how is one supposed to guarantee □◊_3 f?
• k-maintain f: if there is a break from the environment for k steps, then during that break the agent will reach a state where f is true.

Motivation: a controller-agent transcript
Controller (to the agent/robot): Your goal is to maintain the room clean.
Robot/Agent: Can you be precise about what you mean by ‘maintain’? Also, can I clean anytime, or are there restrictions?
Controller: You can only clean when the room is unoccupied.
Controller: By ‘maintain’ I mean ALWAYS clean.
Robot/Agent: I won’t be able to guarantee that. What if someone makes the room dirty while it is occupied?
Controller: OK, I understand. How about ALWAYS EVENTUALLY clean?
Controller’s Boss: ‘Eventually’ is too lenient. We can’t have the room unclean for too long. We should put some bound on it.

Controller-agent transcript (cont.)
Controller: Sorry, Sir. I should have made it more precise: ALWAYS EVENTUALLY_3 clean.
Robot/Agent: Sorry. I can guarantee neither ALWAYS EVENTUALLY clean nor ALWAYS EVENTUALLY_3 clean. What if the room is continuously in use? You told me I cannot clean while it is being used.
Controller: You have a good point. Let me clarify again. If you are given an opportunity of 3 units of time without the room being occupied (i.e., without any interference from external agents), then you should have the room clean during that time.
Robot/Agent: I think I understand you. But as you know, I am a robot and not that good at understanding English. Can you please input it in a precise language?

Formulating k-maintainability: a system
• A system is a quadruple 𝒜 = (𝒮, A, Φ, poss), where
  – 𝒮 is the set of system states;
  – A is the set of actions, which is the union of the set of agent actions, Aag, and the set of environmental actions, Aenv;
  – Φ : 𝒮 × A → 2^𝒮 is a non-deterministic transition function that specifies how the state of the world changes in response to actions;
  – poss : 𝒮 → 2^A is a function that describes which actions are possible (by the agent or the environment) in which states.

[Figure: example transition diagram over states b, c, d, f, g, h with edges labeled a, a’, e]
𝒮 = {b, c, d, f, g, h}; A = {a, a’, e}; Aag = {a, a’}; Aenv = {e}
Φ: as shown in the figure.
poss(b) = {a} when our policy dictates a to be executed at b.

Controls and super-controls
• Given a system 𝒜 = (𝒮, A, Φ, poss) and a set Aag ⊆ A of agent actions:
  – a control policy for 𝒜 w.r.t. Aag is a partial function K : 𝒮 → Aag such that K(s) ∈ poss(s) whenever K(s) is defined;
  – a super-control policy for 𝒜 w.r.t. Aag is a partial function K : 𝒮 → 2^Aag such that K(s) ⊆ poss(s) and K(s) ≠ { } whenever K(s) is defined.

Reachable states and closure
• Reachable states R(𝒜, s) from an individual state s: given a system 𝒜 = (𝒮, A, Φ, poss) and a state s, R(𝒜, s) is the smallest set of states that satisfies the following conditions:
  (i) s is in R(𝒜, s); and
  (ii) if s’ is in R(𝒜, s) and a is in poss(s’), then Φ(s’, a) ⊆ R(𝒜, s).
• Closure(S, 𝒜) of a set of states S: let 𝒜 = (𝒮, A, Φ, poss) be a system and let S ⊆ 𝒮. Then the closure of 𝒜 w.r.t. S, denoted Closure(S, 𝒜), is defined by Closure(S, 𝒜) = ∪_{s ∈ S} R(𝒜, s).

[Figure: example transition diagram over states b, c, d, f, g, h with edges labeled a, a’, e]
𝒜 = (𝒮, A, Φ, poss)
R(𝒜, d) = {d, h}
R(𝒜, f) = {f, g, h}
Closure({d, f}, 𝒜) = {d, f, g, h}
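The reachable-state set and the closure defined above can be sketched in Python as a plain graph search. The edge list is an assumption: the figure did not survive extraction, so the transitions are reconstructed from the results quoted on the example slides. The names PHI, POSS, reachable, and closure are illustrative only.

```python
# Assumed example system: transitions reconstructed from the results quoted
# on the slides, not read off the original figure.
PHI = {
    ("b", "a"): {"c"},
    ("b", "a'"): {"f"},
    ("c", "a"): {"d"},
    ("d", "a"): {"h"},
    ("f", "a"): {"h"},
    ("f", "e"): {"g"},  # e is the environment's action
}
POSS = {"b": {"a", "a'"}, "c": {"a"}, "d": {"a"},
        "f": {"a", "e"}, "g": set(), "h": set()}


def reachable(phi, poss, s):
    """Smallest set containing s and closed under every possible action."""
    seen, stack = {s}, [s]
    while stack:
        cur = stack.pop()
        for act in poss.get(cur, set()):
            for nxt in phi.get((cur, act), set()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return seen


def closure(phi, poss, states):
    """Union of the reachable sets of all given start states."""
    result = set()
    for s in states:
        result |= reachable(phi, poss, s)
    return result


print(sorted(reachable(PHI, POSS, "d")))       # ['d', 'h']
print(sorted(closure(PHI, POSS, {"d", "f"})))  # ['d', 'f', 'g', 'h']
```

Under the assumed edges, the output agrees with the values stated on the example slide.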

Unfold_k(s, 𝒜, K)
• An element of Unfold_k(s, 𝒜, K) is a sequence of states of length at most k + 1 that the system may go through if it follows the control K starting from the state s.

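[Figure: variant of the example transition diagram over states b, c, d, f, g, h, with one additional a-labeled edge]
Consider policy K: do action a in states b, c, and d.
Unfold_3(b, 𝒜, K) = { <b, c, d, h>, <b, g> }
Unfold_3(c, 𝒜, K) = { <c, d, h> }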
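A minimal sketch of how Unfold could be enumerated, under two assumptions that are not stated on the slides: sequences are taken to be maximal (a sequence stops early only where K is undefined or no successor exists), and in this variant of the figure action a at b may lead to either c or g, which is what the listed result <b, g> suggests. PHI_ND and unfold are illustrative names.

```python
# Assumed non-deterministic variant: action a in state b may lead to c or g.
PHI_ND = {
    ("b", "a"): {"c", "g"},
    ("b", "a'"): {"f"},
    ("c", "a"): {"d"},
    ("d", "a"): {"h"},
    ("f", "a"): {"h"},
    ("f", "e"): {"g"},
}
POLICY = {"b": "a", "c": "a", "d": "a"}  # K: do a in b, c, and d


def unfold(phi, policy, s, k):
    """All maximal state sequences with at most k transitions that follow
    the policy from s; a sequence ends early where the policy is undefined."""
    results = set()

    def extend(seq):
        last = seq[-1]
        succs = phi.get((last, policy[last]), set()) if last in policy else set()
        if len(seq) - 1 == k or not succs:
            results.add(tuple(seq))
            return
        for nxt in succs:
            extend(seq + [nxt])

    extend([s])
    return results


print(sorted(unfold(PHI_ND, POLICY, "b", 3)))  # [('b', 'c', 'd', 'h'), ('b', 'g')]
print(sorted(unfold(PHI_ND, POLICY, "c", 3)))  # [('c', 'd', 'h')]
```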

Definition of k-maintainability: the parameters
1. a system 𝒜 = (𝒮, A, Φ, poss);
2. a set Aag ⊆ A of agent actions;
3. a set of initial states S ⊆ 𝒮;
4. a set of desired states E that we want to maintain;
5. a maintainability parameter k;
6. a function exo : 𝒮 → 2^Aenv detailing exogenous actions, such that exo(s) ⊆ poss(s); and
7. a control K (mapping a relevant part of 𝒮 to Aag) such that K(s) ∈ poss(s).

Basic Idea
• Ignoring interference:
  - From any state under consideration, following the control policy should reach E within k steps.
• Accounting for interference:
  - Broaden the states under consideration from the initial states to all states reachable through the control and the environment (use Closure).
• When using Closure:
  - Account for the control policy.
  - Ignore other agent actions.
  - Consider only the exogenous actions in exo(s).

Definition of k-maintainability
• poss_{K,exo}(s) is the set {K(s)} ∪ exo(s).
• 𝒜_{K,exo} = (𝒮, A, Φ, poss_{K,exo})
• Given a system 𝒜 = (𝒮, A, Φ, poss), a set of agent actions Aag ⊆ A, and a specification of exogenous action occurrence exo, we say that a control K for 𝒜 w.r.t. Aag k-maintains S ⊆ 𝒮 with respect to E ⊆ 𝒮, where k ≥ 0, if
  - for each state s in Closure(S, 𝒜_{K,exo}) and each sequence σ = s_0, s_1, ..., s_r in Unfold_k(s, 𝒜, K) with s_0 = s, it holds that {s_0, s_1, ..., s_r} ∩ E ≠ { }.

[Figure: example transition diagram over states b, c, d, f, g, h with edges labeled a, a’, e]
Consider policy K: do action a in states b, c, and d.
poss(b) = {a, a’}
poss_{K,exo}(b) = {a}
Closure({b, c}, 𝒜) = {b, c, d, f, g, h}
Closure({b, c}, 𝒜_{K,exo}) = {b, c, d, h}

[Figure: example transition diagram over states b, c, d, f, g, h with edges labeled a, a’, e]
Goal: a 3-maintainable policy for S = {b} w.r.t. E = {h}.
Such a policy: do a in b, c, and d.
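The definition can be checked directly for a given control by combining the closure under poss_{K,exo} with Unfold. The sketch below does this for the running example; the transition table and exo(f) = {e} are assumptions reconstructed from the slides' stated results, Unfold is taken to produce maximal sequences, and all function names are illustrative.

```python
# Assumed example system (reconstructed, not read off the original figure).
PHI = {
    ("b", "a"): {"c"}, ("b", "a'"): {"f"}, ("c", "a"): {"d"},
    ("d", "a"): {"h"}, ("f", "a"): {"h"}, ("f", "e"): {"g"},
}
EXO = {"f": {"e"}}  # assumed: the environment may execute e in state f


def closure_under_policy(phi, policy, exo, init):
    """States reachable from init using only K(s) and the actions in exo(s)."""
    seen, stack = set(init), list(init)
    while stack:
        cur = stack.pop()
        acts = set(exo.get(cur, set()))
        if cur in policy:
            acts.add(policy[cur])
        for act in acts:
            for nxt in phi.get((cur, act), set()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return seen


def unfold(phi, policy, s, k):
    """Maximal sequences of at most k policy-driven transitions from s."""
    results = set()

    def extend(seq):
        last = seq[-1]
        succs = phi.get((last, policy[last]), set()) if last in policy else set()
        if len(seq) - 1 == k or not succs:
            results.add(tuple(seq))
            return
        for nxt in succs:
            extend(seq + [nxt])

    extend([s])
    return results


def k_maintains(phi, policy, exo, init, goal, k):
    """True if every unfolding from every closure state visits the goal set."""
    for s in closure_under_policy(phi, policy, exo, init):
        for seq in unfold(phi, policy, s, k):
            if not set(seq) & goal:
                return False
    return True


good = {"b": "a", "c": "a", "d": "a"}
print(k_maintains(PHI, good, EXO, {"b"}, {"h"}, 3))  # True
print(k_maintains(PHI, good, EXO, {"b"}, {"h"}, 2))  # False
```

Under these assumptions, a policy that picks a’ in b fails even though f is one step from h, because the exogenous action e can move the system from f to g, from which h is unreachable.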

[Figure: variant of the example transition diagram over states b, c, d, f, g, h, with one additional e-labeled edge]
Goal: find a 3-maintainable policy for S = {b} w.r.t. E = {h}.
No such policy!

Constructing k-maintainable control policies: pre-formulation attempts
• Handwritten policies: subsumption architecture, RAPs, situation control rules, protocols.
• Our initial motivation for formulating maintainability came from trying to formalize what a control module does.
• Kaelbling and Rosenschein 1991: in the control rule “if condition c is satisfied then do action a”, the action a is the action that leads to the goal from any state where the condition c is satisfied.

[Figure: example transition diagram over states b, c, d, f, g, h with edges labeled a, a’, e]
Forward search: if we use minimal paths or minimal-cost paths, we might pick a’; then we would have to backtrack.
Backward search: should we include both d and f?

Propositional Encoding of solutions
• Input: an input I consists of a system 𝒜 = (𝒮, A, Φ, poss), a set of goal states E ⊆ 𝒮, a set of initial states S ⊆ 𝒮, a set Aag ⊆ A, a function exo, and an integer k ≥ 0.
• Output: a control K such that S is k-maintainable with respect to E (using the control K), if such a control exists; otherwise the output is ‘NO’.
• AIM: given input I, construct sat(I) in PTIME such that
  – sat(I) is satisfiable if and only if the input I allows for a k-maintainable control,
  – satisfying assignments for sat(I) encode possible such controls, and
  – sat(I) is polynomially solvable.

Propositional encoding: notation
• s_i denotes that there is a path from state s to some state in E using only agent actions, and at most i of them (we refer to this as “there is an a-path from s to E of length at most i”).

The encoding sat(I)
(0) For all states s and all j, 0 ≤ j < k: s_j ⇒ s_{j+1}
(1) For all initial states s in E: s_0
(2) For all states s, t such that Φ(s, a) = t for some action a ∈ exo(s): s_k ⇒ t_k
(3) For all states s not in E and all i, 1 ≤ i ≤ k: s_i ⇒ ∨_{t ∈ PS(s)} t_{i-1}, where PS(s) = {t ∈ 𝒮 | ∃a ∈ Aag ∩ poss(s): t = Φ(s, a)}
(4) For all initial states s not in E: s_k
(5) For all states s not in E: ¬s_0

Constructing policies from the models of sat(I)
• Let M be a model of sat(I).
• C_M = {s ∈ 𝒮 | M ⊨ s_k}
• L_M(s): the smallest index j such that M ⊨ s_j (i.e., s_0, s_1, ..., s_{j-1} are false and s_j is true).
• K(s) is defined iff s ∈ C_M \ E, and K(s) ∈ {a ∈ Aag | Φ(s, a) = t, t ∈ C_M, L_M(t) < L_M(s)}.

Proposition
• Let I consist of a system 𝒜 = (𝒮, A, Φ, poss), where Φ is deterministic, a set Aag ⊆ A, sets of states E ⊆ 𝒮 and S ⊆ 𝒮, an exogenous function exo, and an integer k. Then:
  (i) S is k-maintainable w.r.t. E iff sat(I) is satisfiable;
  (ii) given any model M of sat(I), any control K constructed as above k-maintains S w.r.t. E.

Reverse Encoding (x’ stands for ¬x)
• a ⇒ b is equivalent to
• ¬a ∨ b, which is equivalent to
• ¬(¬b) ∨ ¬a, which is equivalent to
• ¬b ⇒ ¬a, which is equivalent to
• b’ ⇒ a’, which is equivalent to
• a’ ⇐ b’

Rearranging sat(I) to Horn (each original clause is followed by its Horn form over the primed atoms)
(0) For all states s and all j, 0 ≤ j < k: s_j ⇒ s_{j+1}; Horn form: s’_j ⇐ s’_{j+1}
(1) For all initial states s in E: s_0; Horn form: ⊥ ⇐ s’_0
(2) For all states s, t such that Φ(s, a) = t for some action a ∈ exo(s): s_k ⇒ t_k; Horn form: s’_k ⇐ t’_k
(3) For all states s not in E and all i, 1 ≤ i ≤ k: s_i ⇒ ∨_{t ∈ PS(s)} t_{i-1}; Horn form: s’_i ⇐ ∧_{t ∈ PS(s)} t’_{i-1}, where PS(s) = {t ∈ 𝒮 | ∃a ∈ Aag ∩ poss(s): t = Φ(s, a)}
(4) For all initial states s not in E: s_k; Horn form: ⊥ ⇐ s’_k
(5) For all states s not in E: ¬s_0; Horn form: s’_0

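[Figure: example transition diagram over states b, c, d, f, g, h with edges labeled a, a’, e]
(6) b’_0, c’_0, d’_0, f’_0, g’_0 (from 5)
(7) g’_1, g’_2, g’_3 (from 3)
(8) b’_1, c’_1 (from 6 and 3)
(9) f’_3 (from 7 and 2)
(10) f’_2 (from 9 and 0)
(11) f’_1 (from 10 and 0)
(12) b’_2 (from 8, 11, and 3)
Thus M = {f’_3, f’_2, f’_1, f’_0, g’_3, g’_2, g’_1, g’_0, b’_2, b’_1, b’_0, c’_1, c’_0, d’_0}.
Since M lists the primed atoms, L_M(s) is the smallest j with s’_j not in M: L_M(b) = 3, L_M(c) = 2, L_M(d) = 1.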
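The derivation above can be reproduced with a naive fixpoint over the Horn rules. This is a minimal sketch for the deterministic case, not the authors' implementation: the transition table is the assumed reconstruction of the figure, the pair (s, i) stands for the primed atom s’_i, and the loop is a simple polynomial iteration rather than a linear-time Horn solver. All names are illustrative.

```python
# Assumed deterministic example (reconstructed from the slides' results).
STATES = ["b", "c", "d", "f", "g", "h"]
PHI = {("b", "a"): "c", ("b", "a'"): "f", ("c", "a"): "d",
       ("d", "a"): "h", ("f", "a"): "h", ("f", "e"): "g"}
POSS = {"b": {"a", "a'"}, "c": {"a"}, "d": {"a"},
        "f": {"a", "e"}, "g": set(), "h": set()}
AGENT, EXO = {"a", "a'"}, {"f": {"e"}}


def least_model(states, goal, phi, agent, poss, exo, k):
    """Least model of the Horn rules; (s, i) in the result means s'_i holds."""
    def ps(s):  # successors of s under agent actions
        return {phi[(s, a)] for a in agent & poss.get(s, set()) if (s, a) in phi}

    m = {(s, 0) for s in states if s not in goal}            # rule (5)
    while True:
        new = set()
        for s in states:
            for j in range(k):                               # rule (0)
                if (s, j + 1) in m:
                    new.add((s, j))
            for a in exo.get(s, set()):                      # rule (2)
                if (phi.get((s, a)), k) in m:
                    new.add((s, k))
            if s not in goal:                                # rule (3)
                for i in range(1, k + 1):
                    if all((t, i - 1) in m for t in ps(s)):
                        new.add((s, i))
        if new <= m:
            return m
        m |= new


def build_policy(states, goal, init, phi, agent, poss, exo, k):
    """Return a control as a dict, or None if constraint (1) or (4) fails."""
    m = least_model(states, goal, phi, agent, poss, exo, k)
    for s in init:
        if (s, 0 if s in goal else k) in m:
            return None
    level = {s: min(j for j in range(k + 1) if (s, j) not in m)
             for s in states if (s, k) not in m}             # L_M on C_M
    policy = {}
    for s in level:
        if s in goal:
            continue
        for a in sorted(agent & poss.get(s, set())):
            t = phi.get((s, a))
            if t in level and level[t] < level[s]:
                policy[s] = a
                break
    return policy


print(build_policy(STATES, {"h"}, {"b"}, PHI, AGENT, POSS, EXO, 3))
# {'b': 'a', 'c': 'a', 'd': 'a'}
```

With k = 2 the atom b’_2 is derived, constraint (4) fires for the initial state b, and the sketch returns None.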

Big picture of the algorithm: summary
• Initialization: derive s’_0 for states not in E (5) and s’_i for states with no agent transitions (3).
• Backward reasoning from there using (2) and (3), and downward propagation using (0).
• Use (1) and (4) for inconsistency detection.
• Computation of L_M(s).
• Use L_M(s) to compute the control K(s).

Polynomial time generation of control policy and maximal control policy
• Horn satisfiability is well known to be solvable in polynomial time.
• Theorem: under deterministic state transitions, problem k-MAINTAIN is solvable in polynomial time.
• ‘Maximal control’:
  - Each satisfiable Horn theory T has a least model, M_T, which is given by the intersection of all its models.
  - M_T is computable in time linear in the size of the encoding.
  - M_T leads to a maximal control, in the sense that it works on a greatest set S’ of states w.r.t. E such that S ⊆ S’.
  - I.e., the control is robust with respect to increasing S.

Dealing with non-deterministic transition functions
• Notation: (s_a)_i, i > 0, denotes that there is an a-path from s to E of length at most i starting with action a.
• The encoding sat’(I) again has groups (0)-(5) of clauses, as follows:
  - (0), (1), (4), and (5) are the same as in sat(I).
  - (2) For any states s and t such that t ∈ Φ(s, a) for some action a ∈ exo(s): s_k ⇒ t_k

Dealing with non-deterministic transition functions (cont.)
(3) For every state s not in E and for all i, 1 ≤ i ≤ k:
  (3.1) s_i ⇒ ∨_{a ∈ Aag ∩ poss(s)} (s_a)_i;
  (3.2) for every a ∈ Aag ∩ poss(s) and t ∈ Φ(s, a): (s_a)_i ⇒ t_{i-1};
  (3.3) for every a ∈ Aag ∩ poss(s), if i < k: (s_a)_i ⇒ (s_a)_{i+1}.
Leading to a Horn theory!
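A sketch of how the reversed (Horn) form of sat’(I) could be evaluated, assuming the same clause reversal used in the deterministic case applies to groups (3.1)-(3.3). Here (s, i) stands for s’_i and ((s, a), i) for (s_a)’_i. The two transition tables are assumptions reconstructed from the example slides, and the function name is hypothetical.

```python
def least_model_nd(states, goal, phi, agent, poss, exo, k):
    """Least model of the reversed clauses; phi maps (s, a) to a set of states."""
    def acts(s):  # agent actions that are possible and have successors in s
        return [a for a in sorted(agent & poss.get(s, set())) if phi.get((s, a))]

    m = {(s, 0) for s in states if s not in goal}             # (5)
    while True:
        new = set()
        for s in states:
            for j in range(k):                                # (0)
                if (s, j + 1) in m:
                    new.add((s, j))
            for a in exo.get(s, set()):                       # (2)
                if any((t, k) in m for t in phi.get((s, a), set())):
                    new.add((s, k))
            if s in goal:
                continue
            for i in range(1, k + 1):
                for a in acts(s):
                    if any((t, i - 1) in m for t in phi[(s, a)]):  # (3.2)
                        new.add(((s, a), i))
                    if i < k and ((s, a), i + 1) in m:             # (3.3)
                        new.add(((s, a), i))
                if all(((s, a), i) in m for a in acts(s)):         # (3.1)
                    new.add((s, i))
        if new <= m:
            return m
        m |= new


STATES = ["b", "c", "d", "f", "g", "h"]
POSS = {"b": {"a", "a'"}, "c": {"a"}, "d": {"a"},
        "f": {"a", "e"}, "g": set(), "h": set()}
AGENT, EXO = {"a", "a'"}, {"f": {"e"}}
PHI_DET = {("b", "a"): {"c"}, ("b", "a'"): {"f"}, ("c", "a"): {"d"},
           ("d", "a"): {"h"}, ("f", "a"): {"h"}, ("f", "e"): {"g"}}
# Assumed non-deterministic variant: a in b may lead to c or g.
PHI_ND = dict(PHI_DET)
PHI_ND[("b", "a")] = {"c", "g"}

m_det = least_model_nd(STATES, {"h"}, PHI_DET, AGENT, POSS, EXO, 3)
m_nd = least_model_nd(STATES, {"h"}, PHI_ND, AGENT, POSS, EXO, 3)
print(("b", 3) in m_det)  # False: constraint (4) is not violated for S = {b}
print(("b", 3) in m_nd)   # True: b'_3 is derived, so no 3-maintaining control
```

In the non-deterministic variant every agent action in b can end in g or f, so b’_3 is derived and constraint (4) rules out a control for S = {b}.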

Direct algorithm using counters
• Idea: c[s] = i means s’_0, ..., s’_i are true, and c[s_a] = i means (s_a)’_0, ..., (s_a)’_i are true.
• Initialization:
  - For all states s not in E, make s’_0 true: c[s] := 0.
  - For all states s not in E without any outgoing edges labeled with agent actions, make s’_0, ..., s’_k true: c[s] := k.
  - For all states s, if agent action a is not executable in s, make (s_a)’_0, ..., (s_a)’_k true: c[s_a] := k.
• The other steps are similar.
• The idea can then be extended to actions with durations (or costs).

Computational Complexity
• k-maintainability is PTIME-complete (under log-space reductions).
• PTIME-hardness holds for 1-maintainability, even if all actions are deterministic and there is only one deterministic exogenous action.
• k-maintainability is EXPTIME-complete when we have a compact representation (e.g., STRIPS-like).
• EXPTIME-hardness holds for 1-maintainability, even if all actions are deterministic and there is only one deterministic exogenous action.

Conclusion
• k-maintainability is an important notion.
• Most specifications over infinite trajectories would be better off with k-maintainability-like notions as part of the specification.
  - Role 1 of k: length of the window of opportunity.
  - Role 2 of k: bound within which maintenance is guaranteed.
• k-maintainability is related to Dijkstra's notion of self-stabilization.
  - There is a large research community working on self-stabilization in distributed control and fault tolerance.
  - But it has not focused much on the automatic generation of controls (protocols, in their parlance).
  - It has focused more on proving the correctness of handwritten protocols.
• From a SAT encoding to a Horn logic program encoding: an interesting and fruitful approach to designing a polynomial algorithm.
  - One does not often think in terms of negative propositions.
• We have a prototype implementation using DLV.

THANK YOU!
