
A polynomial time algorithm for constructing k-maintainable policies

Chitta Baral (Arizona State University) and Thomas Eiter (Vienna University of Technology).


Presentation Transcript


  1. A polynomial time algorithm for constructing k-maintainable policies. Chitta Baral (Arizona State University) and Thomas Eiter (Vienna University of Technology).

  2. Motivation: What is "maintain f"? • Always f, also written as $\Box f$: too strong for many kinds of maintainability (e.g., maintaining a clean room). • Always eventually f, also written as $\Box\Diamond f$: weak in the sense that it gives no estimate of when f will be made true, and it may not be achievable in the presence of continuous interference by belligerent agents. • A bounded variant sits between the two; from strongest to weakest: $\Box f$, then $\Box\Diamond_k f$, then $\Box\Diamond f$. • $\Box\Diamond_3 f$ is shorthand for $\Box(f \lor \bigcirc f \lor \bigcirc\bigcirc f \lor \bigcirc\bigcirc\bigcirc f)$. • But if an external agent keeps interfering, how is one supposed to guarantee $\Box\Diamond_3 f$? • k-maintain f: if there is a break from the environment for k steps, then during that break the agent will reach a state where f is true.
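
The $k = 3$ shorthand on this slide extends to an arbitrary bound. The general form below is an extrapolation from that single instance rather than something the slide states, and $\bigcirc^i$ is assumed to mean $i$ nested next-step operators:

```latex
\Box\Diamond_k f \;\equiv\; \Box\bigl(f \lor \bigcirc f \lor \bigcirc^{2} f \lor \dots \lor \bigcirc^{k} f\bigr),
\qquad
\Box f \;\Rightarrow\; \Box\Diamond_k f \;\Rightarrow\; \Box\Diamond f .
```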

  3. Motivation: a controller-agent transcript. Controller (to the agent/robot): Your goal is to maintain the room clean. Robot/Agent: Can you be precise about what you mean by "maintain"? Also, can I clean anytime or are there restrictions? Controller: You can only clean when the room is unoccupied. Controller: By "maintain" I mean ALWAYS clean. Robot/Agent: I won't be able to guarantee that. What if, while the room is occupied, someone makes it dirty? Controller: OK, I understand. How about ALWAYS EVENTUALLY clean? Controller's Boss: "Eventually" is too lenient. We can't have the room unclean for too long. We should put some bound.

  4. Controller-agent transcript (cont.) Controller: Sorry, Sir. I should have made it more precise: ALWAYS EVENTUALLY$_3$ clean. Robot/Agent: Sorry. I can guarantee neither ALWAYS EVENTUALLY clean nor ALWAYS EVENTUALLY$_3$ clean. What if the room is continuously being used, and you told me I cannot clean while it is being used? Controller: You have a good point. Let me clarify again. If you are given an opportunity of 3 units of time without the room being occupied (i.e., without any interference from external agents), then you should have the room clean during that time. Robot/Agent: I think I understand you. But as you know, I am a robot and not that good at understanding English. Can you please input it in a precise language?

  5. Formulating k-maintainability: a system • A system is a quadruple $\mathcal{A} = (\mathcal{S}, A, \Phi, \mathit{poss})$, where: $\mathcal{S}$ is the set of system states; $A$ is the set of actions, which is the union of the set of agent actions, $A_{ag}$, and the set of environmental actions, $A_{env}$; $\Phi : \mathcal{S} \times A \to 2^{\mathcal{S}}$ is a non-deterministic transition function that specifies how the state of the world changes in response to actions; $\mathit{poss} : \mathcal{S} \to 2^{A}$ is a function that describes which actions are possible (by the agent or the environment) in which states.

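  6. [Figure: transition diagram over the states b, c, d, f, g, h with edges labelled a, a' and e.] $\mathcal{S} = \{b,c,d,f,g,h\}$, $A = \{a, a', e\}$, $A_{ag} = \{a, a'\}$, $A_{env} = \{e\}$, $\Phi$: as shown in the picture. $\mathit{poss}(b) = \{a\}$ when our policy dictates a to be executed at b.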
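
A minimal sketch of how such a system could be held in code. The class name `System`, the rule that `poss` is derived from the transition table, and above all the edge set are assumptions: the picture is not part of this transcript, so the edges were chosen to agree with the reachable sets and derivations quoted on later slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class System:
    """A system (S, A, Phi, poss), with A split into agent and environment actions."""
    states: frozenset
    agent_actions: frozenset
    env_actions: frozenset
    phi: dict  # (state, action) -> frozenset of possible successor states

    def poss(self, s):
        # Assumption: an action is possible in s iff phi lists a successor for it.
        return {a for (t, a), succ in self.phi.items() if t == s and succ}


# Assumed edge set for the running example (the original figure is unavailable).
EXAMPLE = System(
    states=frozenset("bcdfgh"),
    agent_actions=frozenset({"a", "a'"}),
    env_actions=frozenset({"e"}),
    phi={
        ("b", "a"): frozenset({"c"}),
        ("b", "a'"): frozenset({"f"}),
        ("c", "a"): frozenset({"d"}),
        ("d", "a"): frozenset({"h"}),
        ("f", "a"): frozenset({"h"}),
        ("f", "e"): frozenset({"g"}),
    },
)

print(sorted(EXAMPLE.poss("b")))
```

Storing successor sets rather than single states keeps the same structure usable for the non-deterministic case.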

  7. Controls and super-controls • Given a system $\mathcal{A} = (\mathcal{S}, A, \Phi, \mathit{poss})$ and a set $A_{ag} \subseteq A$ of agent actions: a control policy for $\mathcal{A}$ w.r.t. $A_{ag}$ is a partial function $K : \mathcal{S} \to A_{ag}$ such that $K(s) \in \mathit{poss}(s)$ whenever $K(s)$ is defined; a super-control policy for $\mathcal{A}$ w.r.t. $A_{ag}$ is a partial function $K : \mathcal{S} \to 2^{A_{ag}}$ such that $K(s) \subseteq \mathit{poss}(s)$ and $K(s) \ne \emptyset$ whenever $K(s)$ is defined.

  8. Reachable states and closure • Reachable states $R(\mathcal{A}, s)$ from an individual state $s$: given a system $\mathcal{A} = (\mathcal{S}, A, \Phi, \mathit{poss})$ and a state $s$, $R(\mathcal{A}, s)$ is the smallest set of states that satisfies the following conditions: (i) $s \in R(\mathcal{A}, s)$; and (ii) if $s' \in R(\mathcal{A}, s)$ and $a \in \mathit{poss}(s')$, then $\Phi(s', a) \subseteq R(\mathcal{A}, s)$. • $\mathit{Closure}(S, \mathcal{A})$ of a set of states $S$: let $\mathcal{A} = (\mathcal{S}, A, \Phi, \mathit{poss})$ be a system and let $S \subseteq \mathcal{S}$. Then the closure of $\mathcal{A}$ w.r.t. $S$, denoted by $\mathit{Closure}(S, \mathcal{A})$, is defined by $\mathit{Closure}(S, \mathcal{A}) = \bigcup_{s \in S} R(\mathcal{A}, s)$.

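  9. [Figure: transition diagram over the states b, c, d, f, g, h with edges labelled a, a' and e.] $\mathcal{A} = (\mathcal{S}, A, \Phi, \mathit{poss})$, $R(\mathcal{A}, d) = \{d, h\}$, $R(\mathcal{A}, f) = \{f, g, h\}$, $\mathit{Closure}(\{d, f\}, \mathcal{A}) = \{d, f, g, h\}$.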
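
The two definitions amount to a graph search. The sketch below is one possible reading, with hypothetical names `reachable` and `closure`; the edge table `PHI` is an assumed reconstruction of the lost figure, picked so that the values on this slide come out as stated.

```python
# Assumed edges: (state, action) -> set of successor states.
PHI = {
    ("b", "a"): {"c"}, ("b", "a'"): {"f"},
    ("c", "a"): {"d"}, ("d", "a"): {"h"},
    ("f", "a"): {"h"}, ("f", "e"): {"g"},
}


def poss(s):
    # Assumption: every action with a listed successor is possible in s.
    return {a for (t, a) in PHI if t == s}


def reachable(phi, poss_fn, s):
    """Smallest set containing s and closed under all possible actions."""
    seen, stack = {s}, [s]
    while stack:
        cur = stack.pop()
        for a in poss_fn(cur):
            for nxt in phi.get((cur, a), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return seen


def closure(phi, poss_fn, states):
    """Union of the reachable sets of every state in `states`."""
    result = set()
    for s in states:
        result |= reachable(phi, poss_fn, s)
    return result


print(sorted(reachable(PHI, poss, "d")))
print(sorted(closure(PHI, poss, {"d", "f"})))
```

Passing `poss_fn` as a parameter matters later: the definition of k-maintainability reuses the same closure with a restricted notion of possible actions.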

  10. $\mathit{Unfold}_k(s, \mathcal{A}, K)$ • An element of $\mathit{Unfold}_k(s, \mathcal{A}, K)$ is a sequence of states of length at most $k + 1$ that the system may go through if it follows the control $K$ starting from the state $s$.

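  11. [Figure: transition diagram over the states b, c, d, f, g, h with edges labelled a, a' and e.] Consider the policy $K$: do action a in states b, c, and d. $\mathit{Unfold}_3(b, \mathcal{A}, K) = \{\langle b,c,d,h \rangle, \langle b,g \rangle\}$, $\mathit{Unfold}_3(c, \mathcal{A}, K) = \{\langle c,d,h \rangle\}$.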
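
A hedged sketch of the unfolding. Two things are assumed here: sequences are taken to be maximal (they stop after k steps or as soon as the control is undefined), and action a in state b is taken to lead non-deterministically to c or g, which is the simplest edge set that yields the two sets quoted on this slide. The function name `unfold` is illustrative.

```python
# Assumed edges for this slide: a in b may lead to c or to g.
PHI_ND = {
    ("b", "a"): {"c", "g"}, ("b", "a'"): {"f"},
    ("c", "a"): {"d"}, ("d", "a"): {"h"},
    ("f", "a"): {"h"}, ("f", "e"): {"g"},
}
POLICY = {"b": "a", "c": "a", "d": "a"}


def unfold(phi, policy, s, k):
    """All maximal state sequences of at most k steps that follow `policy` from s."""
    done, frontier = set(), [(s,)]
    while frontier:
        path = frontier.pop()
        last = path[-1]
        # Successors exist only where the control is defined.
        succs = phi.get((last, policy[last]), ()) if last in policy else ()
        if len(path) - 1 == k or not succs:
            done.add(path)
        else:
            frontier.extend(path + (t,) for t in succs)
    return done


print(sorted(unfold(PHI_ND, POLICY, "b", 3)))
print(sorted(unfold(PHI_ND, POLICY, "c", 3)))
```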

  12. Definition of k-maintainability: the parameters. 1. A system $\mathcal{A} = (\mathcal{S}, A, \Phi, \mathit{poss})$. 2. A set $A_{ag} \subseteq A$ of agent actions. 3. A set of initial states $S$. 4. A set of desired states $E$ that we want to maintain. 5. A maintainability parameter $k$. 6. A function $\mathit{exo} : \mathcal{S} \to 2^{A_{env}}$ detailing exogenous actions, such that $\mathit{exo}(s) \subseteq \mathit{poss}(s)$. 7. A control $K$ (mapping a relevant part of $\mathcal{S}$ to $A_{ag}$) such that $K(s) \in \mathit{poss}(s)$.

  13. Basic idea • Ignoring interference: from any state under consideration, by following the control policy one should visit E within k steps. • Accounting for interference: broaden the states under consideration from the initial states to all states reachable due to the control and the environment (use "Closure"). • When using Closure: account for the control policy; ignore other agent actions; and only consider exogenous actions in $\mathit{exo}(s)$.

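  14. Definition of k-maintainability • $\mathit{poss}_{K,\mathit{exo}}(s)$ is the set $\{K(s)\} \cup \mathit{exo}(s)$. • $\mathcal{A}_{K,\mathit{exo}} = (\mathcal{S}, A, \Phi, \mathit{poss}_{K,\mathit{exo}})$. • Given a system $\mathcal{A} = (\mathcal{S}, A, \Phi, \mathit{poss})$, a set of agent actions $A_{ag} \subseteq A$ and a specification of exogenous action occurrence $\mathit{exo}$, we say that a control $K$ for $\mathcal{A}$ w.r.t. $A_{ag}$ k-maintains $S \subseteq \mathcal{S}$ with respect to $E \subseteq \mathcal{S}$, where $k \ge 0$, if for each state $s$ in $\mathit{Closure}(S, \mathcal{A}_{K,\mathit{exo}})$ and each sequence $\sigma = s_0, s_1, \dots, s_r$ in $\mathit{Unfold}_k(s, \mathcal{A}, K)$ with $s_0 = s$, it holds that $\{s_0, s_1, \dots, s_r\} \cap E \ne \emptyset$.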

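  15. [Figure: transition diagram over the states b, c, d, f, g, h with edges labelled a, a' and e.] Consider the policy $K$: do action a in states b, c, and d. $\mathit{poss}(b) = \{a, a'\}$, $\mathit{poss}_{K,\mathit{exo}}(b) = \{a\}$, $\mathit{Closure}(\{b,c\}, \mathcal{A}) = \{b,c,d,f,g,h\}$, $\mathit{Closure}(\{b,c\}, \mathcal{A}_{K,\mathit{exo}}) = \{b,c,d,h\}$.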

  16. [Figure: transition diagram over the states b, c, d, f, g, h with edges labelled a, a' and e.] Goal: a 3-maintainable policy for $S = \{b\}$ w.r.t. $E = \{h\}$. Such a policy: do a in b, c, and d.
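
The definition can be checked directly by combining the closure under $\mathit{poss}_{K,\mathit{exo}}$ with the unfolding. The following is a sketch under assumptions, not the authors' implementation: the edge table and the choice $\mathit{exo}(f) = \{e\}$ are a reconstruction of the lost figure, and `k_maintains` is a hypothetical helper name.

```python
PHI = {
    ("b", "a"): {"c"}, ("b", "a'"): {"f"},
    ("c", "a"): {"d"}, ("d", "a"): {"h"},
    ("f", "a"): {"h"}, ("f", "e"): {"g"},
}
EXO = {"f": {"e"}}  # assumed: e is the only exogenous action, possible in f


def closure(phi, poss_fn, start):
    seen, stack = set(start), list(start)
    while stack:
        cur = stack.pop()
        for a in poss_fn(cur):
            for nxt in phi.get((cur, a), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return seen


def unfold(phi, policy, s, k):
    done, frontier = set(), [(s,)]
    while frontier:
        path = frontier.pop()
        last = path[-1]
        succs = phi.get((last, policy[last]), ()) if last in policy else ()
        if len(path) - 1 == k or not succs:
            done.add(path)
        else:
            frontier.extend(path + (t,) for t in succs)
    return done


def k_maintains(phi, policy, exo, initial, goal, k):
    """True iff every unfolding from every closure state meets `goal`."""
    def poss_k_exo(s):
        own = {policy[s]} if s in policy else set()
        return own | exo.get(s, set())

    for s in closure(phi, poss_k_exo, initial):
        for seq in unfold(phi, policy, s, k):
            if not goal.intersection(seq):
                return False
    return True


K_GOOD = {"b": "a", "c": "a", "d": "a"}
print(k_maintains(PHI, K_GOOD, EXO, {"b"}, {"h"}, 3))
```

On this assumed graph the same policy fails for k = 2, and a policy that picks a' in b fails for k = 3, because the sequence then stops in f where no control is defined.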

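  17. [Figure: transition diagram over the states b, c, d, f, g, h with edges labelled a, a' and e, including an additional edge labelled e.] Goal: find a 3-maintainable policy for $S = \{b\}$ w.r.t. $E = \{h\}$. No such policy!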

  18. Constructing k-maintainable control policies: pre-formulation attempts • Handwritten policies: subsumption architecture, RAPs, situation control rules, protocols. • Our initial motivation for formulating maintainability came from trying to formalize what a control module was doing. • Kaelbling and Rosenschein 1991: in the control rule "if condition c is satisfied then do action a", the action a is the action that leads to the goal from any state where the condition c is satisfied.

  19. [Figure: transition diagram over the states b, c, d, f, g, h with edges labelled a, a' and e.] Forward search: if we use minimal paths or minimal-cost paths we might pick a'; then we would have to backtrack. Backward search: should we include both d and f?

  20. Propositional encoding of solutions • Input: an input $I$ is a system $\mathcal{A} = (\mathcal{S}, A, \Phi, \mathit{poss})$, a set of goal states $E \subseteq \mathcal{S}$, a set of initial states $S \subseteq \mathcal{S}$, a set $A_{ag} \subseteq A$, a function $\mathit{exo}$, and an integer $k \ge 0$. • Output: a control $K$ such that $S$ is k-maintainable with respect to $E$ (using the control $K$), if such a control exists. Otherwise the output is "NO". • Aim: given input $I$, construct $\mathit{sat}(I)$ in PTIME such that $\mathit{sat}(I)$ is satisfiable if and only if the input $I$ allows for a k-maintainable control, satisfying assignments for $\mathit{sat}(I)$ encode possible such controls, and $\mathit{sat}(I)$ is polynomially solvable.

  21. Propositional encoding: notation • $s_i$ denotes that there is a path from state $s$ to some state in $E$ using only agent actions and at most $i$ of them (to which we refer as "there is an a-path from $s$ to $E$ of length at most $i$").

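  22. The encoding $\mathit{sat}(I)$ (0) For all states $s$ and all $j$, $0 \le j < k$: $s_j \Rightarrow s_{j+1}$. (1) For all initial states $s$ in $E$: $s_0$. (2) For all states $s, t$ such that $\Phi(s,a) = t$ for some action $a \in \mathit{exo}(s)$: $s_k \Rightarrow t_k$. (3) For all states $s$ not in $E$ and all $i$, $1 \le i \le k$: $s_i \Rightarrow \bigvee_{t \in \mathit{PS}(s)} t_{i-1}$, where $\mathit{PS}(s) = \{t \in \mathcal{S} \mid \exists a \in A_{ag} \cap \mathit{poss}(s) : t = \Phi(s,a)\}$. (4) For all initial states $s$ not in $E$: $s_k$. (5) For all states $s$ not in $E$: $\neg s_0$.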

  23. Constructing policies from the models of $\mathit{sat}(I)$ • Let $M$ be a model of $\mathit{sat}(I)$. • $C_M = \{s \in \mathcal{S} \mid M \models s_k\}$. • $L_M(s)$: the smallest index $j$ such that $M \models s_j$ (i.e., $s_0, s_1, \dots, s_{j-1}$ are false and $s_j$ is true). • $K(s)$ is defined iff $s \in C_M \setminus E$, and $K(s) \in \{a \in A_{ag} \mid \Phi(s,a) = t,\ t \in C_M,\ L_M(t) < L_M(s)\}$.
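
The construction can be sketched as follows. The helper name `extract_policy`, the representation of a model as a set of `(state, index)` pairs for the true atoms, the deterministic edge table, and the sample model are all assumptions made for illustration; when several actions qualify, the sketch simply takes the first in sorted order.

```python
# Assumed deterministic transitions: (state, action) -> successor.
PHI = {
    ("b", "a"): "c", ("b", "a'"): "f",
    ("c", "a"): "d", ("d", "a"): "h",
    ("f", "a"): "h", ("f", "e"): "g",
}
# Assumed model of sat(I) for k = 3: the pairs (s, j) for which s_j is true.
MODEL = {("b", 3), ("c", 2), ("c", 3), ("d", 1), ("d", 2), ("d", 3),
         ("h", 0), ("h", 1), ("h", 2), ("h", 3)}


def extract_policy(model, states, goal, agent_actions, phi, k):
    """Build a control K from a model, following the C_M / L_M construction."""
    c_m = {s for s in states if (s, k) in model}
    level = {s: min(j for j in range(k + 1) if (s, j) in model) for s in c_m}
    policy = {}
    for s in sorted(c_m - goal):
        choices = [a for a in sorted(agent_actions)
                   if phi.get((s, a)) in c_m and level[phi[(s, a)]] < level[s]]
        # For a genuine model of sat(I), clause (3) guarantees a choice exists.
        if choices:
            policy[s] = choices[0]
    return policy, level


policy, level = extract_policy(MODEL, set("bcdfgh"), {"h"}, {"a", "a'"}, PHI, 3)
print(policy)
print(level)
```

Choosing an action that strictly lowers the level is what makes every followed path reach E within the bound.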

  24. Proposition • Let $I$ consist of a system $\mathcal{A} = (\mathcal{S}, A, \Phi, \mathit{poss})$, where $\Phi$ is deterministic, a set $A_{ag} \subseteq A$, sets of states $E \subseteq \mathcal{S}$ and $S \subseteq \mathcal{S}$, an exogenous function $\mathit{exo}$, and an integer $k$. Then, (i) $S$ is k-maintainable w.r.t. $E$ iff $\mathit{sat}(I)$ is satisfiable. (ii) Given any model $M$ of $\mathit{sat}(I)$, any control $K$ constructed by the algorithm above k-maintains $S$ w.r.t. $E$.

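  25. Reverse encoding (with $a'$ standing for $\neg a$ and $b'$ for $\neg b$) • $a \Rightarrow b$ is equivalent to • $\neg a \lor b$, which is equivalent to • $\neg(\neg b) \lor \neg a$, which is equivalent to • $\neg b \Rightarrow \neg a$, which is equivalent to • $b' \Rightarrow a'$, which is equivalent to • $a' \Leftarrow b'$.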

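  26. Rearranging $\mathit{sat}(I)$ to Horn (0) For all states $s$ and all $j$, $0 \le j < k$: $s_j \Rightarrow s_{j+1}$ becomes $s'_j \Leftarrow s'_{j+1}$. (1) For all initial states $s$ in $E$: $s_0$ becomes $\bot \Leftarrow s'_0$. (2) For all states $s, t$ such that $\Phi(s,a) = t$ for some action $a \in \mathit{exo}(s)$: $s_k \Rightarrow t_k$ becomes $s'_k \Leftarrow t'_k$. (3) For all states $s$ not in $E$ and all $i$, $1 \le i \le k$: $s_i \Rightarrow \bigvee_{t \in \mathit{PS}(s)} t_{i-1}$ becomes $s'_i \Leftarrow \bigwedge_{t \in \mathit{PS}(s)} t'_{i-1}$, where $\mathit{PS}(s) = \{t \in \mathcal{S} \mid \exists a \in A_{ag} \cap \mathit{poss}(s) : t = \Phi(s,a)\}$. (4) For all initial states $s$ not in $E$: $s_k$ becomes $\bot \Leftarrow s'_k$. (5) For all states $s$ not in $E$: $\neg s_0$ becomes $s'_0$.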

  27. [Figure: transition diagram over the states b, c, d, f, g, h with edges labelled a, a' and e.] (6) $b'_0, c'_0, d'_0, f'_0, g'_0$ (from 5). (7) $g'_1, g'_2, g'_3$ (from 3). (8) $b'_1, c'_1$ (from 6 and 3). (9) $f'_3$ (from 7 and 2). (10) $f'_2$ (from 9 and 0). (11) $f'_1$ (from 10 and 0). (12) $b'_2$ (from 8, 11, and 3). Thus $M = \{f'_3, f'_2, f'_1, f'_0, g'_3, g'_2, g'_1, g'_0, b'_2, b'_1, b'_0, c'_1, c'_0, d'_0\}$, and $L_M(b) = 3$, $L_M(c) = 2$, $L_M(d) = 1$.
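
The derivation above is forward chaining over the Horn clauses of slide 26. The sketch below reproduces it with a naive fixpoint loop, which is simple but not the linear-time procedure. The function name `least_model`, the pair `(s, j)` standing for the primed atom $s'_j$, and the edge table (a reconstruction of the lost figure) are assumptions.

```python
PHI = {  # assumed deterministic transitions: (state, action) -> successor
    ("b", "a"): "c", ("b", "a'"): "f",
    ("c", "a"): "d", ("d", "a"): "h",
    ("f", "a"): "h", ("f", "e"): "g",
}
STATES, GOAL, INITIAL = set("bcdfgh"), {"h"}, {"b"}
AGENT, EXO, K = {"a", "a'"}, {"f": {"e"}}, 3


def least_model(states, goal, initial, agent_actions, exo, phi, k):
    """Least model of the primed Horn theory; (s, j) stands for s'_j."""
    model = {(s, 0) for s in states - goal}                       # (5)
    rules = []                                                    # (body, head)
    for s in states:
        rules += [({(s, j + 1)}, (s, j)) for j in range(k)]       # (0)
    for (s, a), t in phi.items():
        if a in exo.get(s, ()):
            rules.append(({(t, k)}, (s, k)))                      # (2)
    for s in states - goal:
        succs = {t for (u, a), t in phi.items()
                 if u == s and a in agent_actions}
        for i in range(1, k + 1):
            rules.append(({(t, i - 1) for t in succs}, (s, i)))   # (3)
    changed = True
    while changed:                        # naive fixpoint iteration
        changed = False
        for body, head in rules:
            if head not in model and body <= model:
                model.add(head)
                changed = True
    # (1) and (4) are constraints: violating them means no control exists.
    consistent = (all((s, 0) not in model for s in initial & goal)
                  and all((s, k) not in model for s in initial - goal))
    return model, consistent


model, consistent = least_model(STATES, GOAL, INITIAL, AGENT, EXO, PHI, K)
levels = {s: min(j for j in range(K + 1) if (s, j) not in model)
          for s in STATES if (s, K) not in model}
print(consistent, levels)
```

A state with no agent successors gets an empty rule body in (3), so all of its primed atoms are derived immediately, which is step (7) of the derivation.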

  28. Big picture of the algorithm: summary • Initialization about states not in $E$ (5) and states with no agent transitions, to compute $s'_i$ (3). • Backward reasoning from there using (2) and (3), and downward propagation using (0). • Use (1) and (4) for inconsistency detection. • Computation of $L_M(s)$. • Use $L_M(s)$ to compute the control $K(s)$.

  29. Polynomial time generation of control policy and maximal control policy • Horn satisfiability is a well-known polynomial-time solvable problem. • Theorem: under deterministic state transitions, problem k-MAINTAIN is solvable in polynomial time. • "Maximal control": each satisfiable Horn theory $T$ has a least model, $M_T$, which is given by the intersection of all its models. • $M_T$ is computable in time linear in the size of the encoding. • $M_T$ leads to a maximal control, in the sense that it works on a greatest set $S'$ of states w.r.t. $E$ such that $S$ is a subset of $S'$. • I.e., it is robust with respect to increasing $S$.

  30. Dealing with non-deterministic transition functions • Notation: $s\_a_i$, $i > 0$, denotes that there is an a-path from $s$ to $E$ of length at most $i$ starting with action $a$. • The encoding $\mathit{sat}'(I)$ again has groups (0)-(5) of clauses, as follows: • (0), (1), (4) and (5) are the same as in $\mathit{sat}(I)$. • (2) For any states $s$ and $t$ such that $t \in \Phi(s,a)$ for some action $a \in \mathit{exo}(s)$: $s_k \Rightarrow t_k$.

  31. Dealing with non-deterministic transition functions (cont.) (3) For every state $s$ not in $E$ and for all $i$, $1 \le i \le k$: (3.1) $s_i \Rightarrow \bigvee_{a \in A_{ag} \cap \mathit{poss}(s)} s\_a_i$; (3.2) for every $a \in A_{ag} \cap \mathit{poss}(s)$ and $t \in \Phi(s,a)$: $s\_a_i \Rightarrow t_{i-1}$; (3.3) for every $a \in A_{ag} \cap \mathit{poss}(s)$, if $i < k$: $s\_a_i \Rightarrow s\_a_{i+1}$. Leading to a Horn theory!

  32. Direct algorithm using counters • Idea: $c[s] = i$ means $s'_0, \dots, s'_i$ are true, and $c[s\_a] = i$ means $s\_a'_0, \dots, s\_a'_i$ are true. • Initialization: for all states $s$ not in $E$, make $s'_0$ true and set $c[s] := 0$. • For all states $s$ not in $E$ without any outgoing edges with agent actions, make $s'_0, \dots, s'_k$ true and set $c[s] := k$. • For all states $s$, if agent action $a$ is not executable in $s$, then make $s\_a'_0, \dots, s\_a'_k$ true and set $c[s\_a] := k$. • The other steps are similar. • The idea can then be extended to actions with durations (or costs).

  33. Computational complexity • k-maintainability is PTIME-complete (under log-space reductions). • PTIME-hardness holds for 1-maintainability, even if all actions are deterministic and there is only one deterministic exogenous action. • k-maintainability is EXPTIME-complete when we have a compact representation (e.g., STRIPS-like). • EXPTIME-hardness holds for 1-maintainability, even if all actions are deterministic and there is only one deterministic exogenous action.

  34. Conclusion • k-maintainability is an important notion. • Most specifications over infinite trajectories would be better off with k-maintainability-like notions as part of the specification. • Role 1 of k: length of the window of opportunity. • Role 2 of k: bound within which maintenance is guaranteed. • k-maintainability is related to Dijkstra's notion of self-stabilization. • There is a big research community on self-stabilization in distributed control and fault tolerance. • But it has not focused much on automatic generation of control (protocol, in their parlance). • It has focused more on proving the correctness of handwritten protocols. • Going from a SAT encoding to a Horn logic program encoding is an interesting and fruitful approach to designing a polynomial algorithm. • One does not often think in terms of negative propositions. • We have a prototype implementation using DLV.

  35. THANK YOU!
