
Decision-Theoretic Planning with Asynchronous Events


Presentation Transcript


  1. Decision-Theoretic Planning with Asynchronous Events. Håkan L. S. Younes, Carnegie Mellon University

  2. Introduction • Asynchronous processes are abundant in the real world • Discrete-time models are inappropriate for systems with asynchronous events • Generalized semi-Markov (decision) processes are great for this!

  2. Stochastic Processes with Asynchronous Events • [Timeline diagram: machines m1 and m2; at t = 0 the state is "m1 up, m2 up"]

  3. Stochastic Processes with Asynchronous Events • [Timeline: m2 crashes at t = 2.5, moving from "m1 up, m2 up" to "m1 up, m2 down"]

  4. Stochastic Processes with Asynchronous Events • [Timeline continued: m1 crashes at t = 3.1, moving to "m1 down, m2 down"]

  5. Stochastic Processes with Asynchronous Events • [Timeline continued: m2 is rebooted at t = 4.9, moving to "m1 down, m2 up"]

  6. A Model of Stochastic Discrete Event Systems • Generalized semi-Markov process (GSMP) [Matthes 1962] • A set of events E • A set of states S

  7. Events • In a state s, a subset E_s ⊆ E of events is enabled • With each event e is associated: • A distribution G_e governing the time e must remain enabled before it triggers • A next-state probability distribution p_e(s′|s)

  8. Semantics of GSMP Model • Associate a real-valued clock t_e with each event e • For each e ∈ E_s, sample t_e from G_e • Let e* = argmin_{e ∈ E_s} t_e and t* = min_{e ∈ E_s} t_e • Sample s′ from p_e*(s′|s) • For each e ∈ E_s′: • t_e′ = t_e − t* if e ∈ E_s \ {e*} • sample t_e′ from G_e otherwise • Repeat with s = s′ and t_e = t_e′
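
A minimal Python sketch of this sampling loop (not part of the original slides; the model encoding via the enabled, duration_dist, and next_state dictionaries is a hypothetical illustration of the semantics above):

```python
import random

# Hypothetical GSMP encoding:
#   enabled[s]        -- the events E_s enabled in state s
#   duration_dist[e]  -- zero-argument sampler for G_e
#   next_state[e]     -- function s -> s' sampled from p_e(s'|s)

def simulate_gsmp(s0, enabled, duration_dist, next_state, horizon=10.0):
    """Follow the GSMP semantics: keep a clock t_e per enabled event,
    trigger the event with the smallest clock, and carry over the residual
    clocks of events that stay enabled without triggering."""
    s, t = s0, 0.0
    clocks = {e: duration_dist[e]() for e in enabled[s]}
    trajectory = [(t, s)]
    while t < horizon and clocks:
        e_star, t_star = min(clocks.items(), key=lambda kv: kv[1])  # e*, t*
        t += t_star
        s_next = next_state[e_star](s)                              # s' ~ p_e*(.|s)
        new_clocks = {}
        for e in enabled[s_next]:
            if e in clocks and e != e_star:
                new_clocks[e] = clocks[e] - t_star   # still enabled: keep residual clock
            else:
                new_clocks[e] = duration_dist[e]()   # newly enabled (or just triggered): resample
        s, clocks = s_next, new_clocks
        trajectory.append((t, s))
    return trajectory
```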

  9. Semantics: Example • In state "m1 up, m2 up", sampled clocks: m2 crashes: 2.5, m1 crashes: 3.1 • m2 crashes triggers first, leading to "m1 up, m2 down" • There, m1 crashes keeps its residual clock 3.1 − 2.5 = 0.6, while reboot m2 is freshly sampled: 2.4

  10. Notes on Semantics • Events that remain enabled across state transitions without triggering are not rescheduled • Asynchronous events! • Differs from a semi-Markov process in this respect • Continuous-time Markov chain if all G_e are exponential distributions

  11. General State-Space Markov Chain (GSSMC) • The model is Markovian if we include the clocks in the state space • Extended state space X • The next-state distribution f(x′|x) is well-defined • Clock values are not known to the observer • The time events have been enabled is known

  12. Observation Model • An observation o is a state s together with a real value u_e for each enabled event e, representing the time e has currently been enabled • f(x|o) is well-defined

  13. Observations: Example • Actual model (remaining clock times): in "m1 up, m2 up": m2 crashes: 2.5, m1 crashes: 3.1; in "m1 up, m2 down": m1 crashes: 0.6, reboot m2: 2.4 • Observed model (times enabled so far, u_e): in "m1 up, m2 up": m2 crashes: 0.0, m1 crashes: 0.0; in "m1 up, m2 down": m1 crashes: 2.5, reboot m2: 0.0

  14. Actions and Policies (GSMDPs) • Identify a set A ⊆ E of controllable events (actions) • A policy is a mapping from observations to sets of actions • Action choice can change at any time in a state s

  15. Rewards and Discounts • Lump-sum reward k(s,e,s′) associated with the transition from s to s′ caused by e • Continuous reward rate r(s,a) associated with a being enabled in s • Discount factor α • Unit reward earned at time t counts as e^(−αt)

  17. Value Function for GSMDPs
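
The formula on this slide is not preserved in the transcript. A standard expected-discounted-reward formulation, consistent with the rewards and discount factor of the previous slide, would look roughly as follows (a hedged reconstruction, not necessarily the slide's exact notation):

```latex
% Hedged reconstruction: expected discounted reward of policy \pi, with
% transition times T_0 = 0 < T_1 < T_2 < \dots, visited states s_i,
% triggering events e_{i+1}, and action set A_i enabled on [T_i, T_{i+1}).
v_\pi(x) = \mathrm{E}_\pi\!\left[\sum_{i=0}^{\infty}\left(
    \int_{T_i}^{T_{i+1}} e^{-\alpha t} \sum_{a \in A_i} r(s_i, a)\, dt
    \;+\; e^{-\alpha T_{i+1}}\, k(s_i, e_{i+1}, s_{i+1})\right)\right]
```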

  17. GSMDP Solution Method [Younes & Simmons 2004] • GSMDP → continuous-time MDP, via phase-type distributions • Continuous-time MDP → discrete-time MDP, via uniformization [Jensen 1953]
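
For the second arrow, uniformization in its standard discounted form (a sketch of the textbook construction for the continuous reward rate; lump-sum rewards need separate treatment, and the paper's exact formulation may differ):

```latex
% CTMDP with transition rates q(s'|s,a), exit rate q(s,a) = \sum_{s' \ne s} q(s'|s,a),
% reward rate r(s,a), discount rate \alpha; choose q \ge \max_{s,a} q(s,a).
\tilde{p}(s'|s,a) = \frac{q(s'|s,a)}{q} \;(s' \ne s), \quad
\tilde{p}(s|s,a) = 1 - \frac{q(s,a)}{q}, \quad
\tilde{r}(s,a) = \frac{r(s,a)}{\alpha + q}, \quad
\tilde{\gamma} = \frac{q}{\alpha + q}
% The uniformized discrete-time MDP then has the same optimal value function:
v(s) = \max_a \Big[ \tilde{r}(s,a) + \tilde{\gamma} \sum_{s'} \tilde{p}(s'|s,a)\, v(s') \Big]
```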

  19. Continuous Phase-Type Distributions [Neuts 1981] • Time to absorption in a continuous-time Markov chain with n transient states
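
In the standard (α, T) notation from Neuts, with α the initial distribution over the n transient states and T their sub-generator, the distribution function of the absorption time is:

```latex
F(t) = \Pr[\text{absorbed by time } t]
     = 1 - \boldsymbol{\alpha}\, e^{\mathbf{T} t}\, \mathbf{1}, \qquad t \ge 0
```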

  19. Exponential Distribution • [Diagram: a single transient state 0 with exit rate λ to the absorbing state]

  20. Two-Phase Coxian Distribution • [Diagram: transient states 0 and 1; state 0 exits with rate pλ1 to state 1 and with rate (1 − p)λ1 to absorption; state 1 exits with rate λ2 to absorption]

  21. Generalized Erlang Distribution • [Diagram: a chain of transient states 0, 1, …, n − 1, each with rate λ; after the first phase the process continues through the chain with probability p and is absorbed with probability 1 − p]

  23. Method of Moments • Approximate general distribution G with phase-type distribution PH by matching the first n moments

  23. Moments of a Distribution • The ith moment: μ_i = E[X^i] • Mean: μ_1 • Variance: σ² = μ_2 − μ_1² • Coefficient of variation: cv = σ/μ_1

  24. Matching One Moment • Exponential distribution: λ = 1/μ_1

  25. Matching Two Moments • [Diagram: cv² axis with marks at 0 and 1; the exponential distribution covers only the point cv² = 1]

  26. Matching Two Moments • [Diagram: as before, with the generalized Erlang distribution covering the region cv² < 1]

  27. Matching Two Moments • [Diagram: as before, with the two-phase Coxian distribution covering the region cv² > 1]
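
A small Python sketch of this two-moment case split (the function name and the returned encoding are illustrative, not the paper's interface; the Coxian formulas below are a classical exact fit for cv² > 1, while the plain Erlang branch matches cv² only approximately unless 1/cv² is an integer, where a generalized Erlang would match it exactly):

```python
def match_two_moments(mean, cv2):
    """Pick a phase-type distribution matching the mean and the squared
    coefficient of variation cv2, following the regions sketched above."""
    if abs(cv2 - 1.0) < 1e-9:
        # cv^2 = 1: a single exponential phase suffices.
        return {"type": "exponential", "rate": 1.0 / mean}
    if cv2 > 1.0:
        # cv^2 > 1: two-phase Coxian; this classical fit matches both moments exactly.
        lam1 = 2.0 / mean
        p = 1.0 / (2.0 * cv2)
        lam2 = 1.0 / (mean * cv2)
        return {"type": "coxian2", "lambda1": lam1, "lambda2": lam2, "p": p}
    # cv^2 < 1: Erlang-like chain with roughly n = 1/cv^2 phases;
    # matches the mean exactly and cv^2 approximately (cv^2 = 1/n).
    n = max(2, round(1.0 / cv2))
    return {"type": "erlang", "phases": n, "rate": n / mean}
```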

  28. Matching Moments: Example 1 • Weibull distribution: W(1, 1/2) • μ_1 = 2, cv² = 5 • [Figure: CDF F(t) of W(1, 1/2) compared with the matching two-phase Coxian distribution (parameters 1/10, 1/10, and 9/10 in the diagram)]

  29. Matching Moments: Example 2 • Uniform distribution: U(0,1) • μ_1 = 1/2, cv² = 1/3 • [Figure: CDF F(t) of U(0,1) compared with the matching three-phase Erlang distribution with rate 6 in each phase]
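
Feeding these two examples into the hypothetical match_two_moments sketch above reproduces the fitted chains shown on the slides:

```python
print(match_two_moments(2.0, 5.0))
# two-phase Coxian: lambda1 = 1, lambda2 = 1/10, p = 1/10  (consistent with Example 1)
print(match_two_moments(0.5, 1.0 / 3.0))
# Erlang with 3 phases of rate 6 each                      (consistent with Example 2)
```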

  31. Matching More Moments • Closed-form solution for matching three moments of positive distributions [Osogami & Harchol-Balter 2003] • Combination of Erlang distribution and two-phase Coxian distribution

  32. Approximating GSMDP with Continuous-time MDP • Each event with a non-exponential distribution is approximated by a set of events with exponential distributions • Phases become part of state description

  32. Policy Execution • Phases represent a discretization into random-length intervals of the time events have been enabled • Phases are not part of the real model • Simulate phase transitions during execution (see the sketch below)
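
One way to simulate such phase transitions at execution time, sketched here in Python for a two-phase Coxian clock (a hypothetical helper, not the tool's actual interface):

```python
import random

def advance_coxian_phase(dt, lam1, lam2, p, phase=0):
    """Advance a two-phase Coxian 'clock' by dt time units during execution.
    Returns the phase (0 or 1) the event is in after dt, or None if the
    phase-type clock would have expired (i.e., the approximated event fires)."""
    t = 0.0
    while True:
        rate = lam1 if phase == 0 else lam2
        t += random.expovariate(rate)      # sojourn time in the current phase
        if t > dt:
            return phase                   # still in this phase after dt
        if phase == 0 and random.random() < p:
            phase = 1                      # move on to the second phase
        else:
            return None                    # absorbed: the event triggers
```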

  33. The Foreman's Dilemma • When to enable the "Service" action in the "Working" state? • [Model diagram: states Working (reward rate c = 1), Serviced (c = 0.5), Failed (c = 0); events: Service ~ Exp(10) from Working to Serviced, Return ~ Exp(1) from Serviced to Working, Fail ~ G from Working to Failed, Replace ~ Exp(1/100) from Failed to Working]

  34. The Foreman's Dilemma: Optimal Solution • Find the delay t_0 (after which "Service" is enabled) that maximizes the expected value v_0, where Y is the time to failure in the "Working" state • [The expression for v_0 shown on this slide is not preserved in the transcript]

  35. The Foreman's Dilemma: SMDP Solution • Same formulas, but restricted choice: • Action is immediately enabled (t_0 = 0) • Action is never enabled (t_0 = ∞)

  36. The Foreman's Dilemma: Performance • [Plot: percent of optimal value (50–100%) against x, 5 ≤ x ≤ 50, for failure-time distribution U(5, x)]

  37. The Foreman's Dilemma: Performance • [Plot: percent of optimal value (50–100%) against x, 0 ≤ x ≤ 40, for failure-time distribution W(1.6x, 4.5)]

  39. System Administration • Network of n machines • Reward rate c(s) = k in states where k machines are up • One crash event and one reboot action per machine • At most one action enabled at any time

  39. System Administration: Performance • [Plot: reward (15–50) against the number of machines n, 1 ≤ n ≤ 13, for reboot-time distribution U(0,1)]

  41. System Administration: Performance

  42. The Role of Phases • Foreman’s dilemma: • Phases permit delay in enabling of actions • System administration: • Phases allow us to take into account the time an action has already been enabled

  43. Summary • Generalized semi-Markov (decision) processes allow asynchronous events • Phase-type distributions can be used to approximate a GSMDP with an MDP • Allows us to approximately solve GSMDPs using existing MDP techniques • Phase does matter!

  43. Future Work • Discrete phase-type distributions • Handle deterministic distributions • Avoid the uniformization step • Value function approximation • Take advantage of GSMDP structure • Other optimization criteria • Finite horizon, etc.

  45. Tempastic-DTP • A tool for GSMDP planning: http://www.cs.cmu.edu/~lorens/tempastic-dtp.html
