
Discounting the Future in Systems Theory


Presentation Transcript


  1. Discounting the Future in Systems Theory Luca de Alfaro, UC Santa Cruz Tom Henzinger, UC Berkeley Rupak Majumdar, UC Los Angeles Chess Review May 11, 2005 Berkeley, CA

  2. A Graph Model of a System: a graph with three states a, b, c.

  3. Property ◊c ("eventually c"), on the same graph with states a, b, c.

  4. Property ◊c ("eventually c"): ∃◊c … some trace has the property ◊c.

  5. Property ◊c ("eventually c"): ∃◊c … some trace has the property ◊c; ∀◊c … all traces have the property ◊c.

  6. Richer Models. FAIRNESS: ω-automaton. ADVERSARIAL CONCURRENCY: game graph. PROBABILITIES: Markov decision process. Combining them yields parity games and stochastic games.

  7. Concurrent Game: states a, b, c, with edges labeled by move pairs (1,1), (1,2), (2,1), (2,2) of player "left" and player "right". Used for modeling open systems [Abramsky, Alur, Kupferman, Vardi, …] and for strategy synthesis ("control") [Ramadge, Wonham, Pnueli, Rosner].

  8. Property ◊c, on the same concurrent game: ⟨⟨left⟩⟩◊c … player "left" has a strategy to enforce ◊c.

  9. Property ◊c: ⟨⟨left⟩⟩◊c … player "left" has a strategy to enforce ◊c; here "left" needs a randomized strategy (e.g. playing moves 1 and 2 with Pr(1) = 0.5 and Pr(2) = 0.5) to enforce ◊c.

  10. Qualitative Models
      Trace: sequence of observations.
      Property p: assigns a reward to each trace (boolean rewards, 𝔹).
      Model m: generates a set of traces ((game) graph).
      Value(p,m): defined from the rewards of the generated traces (∃ or ∀).

  11. Stochastic Game: states a, b, c; each player chooses a move 1 or 2, and the chosen move pair determines a probability distribution over successor states. From a, the four move pairs lead to (a: 0.6, b: 0.4), (a: 0.5, b: 0.5), (a: 0.1, b: 0.9), (a: 0.2, b: 0.8); from b they lead to (c: 1.0), (c: 1.0), (a: 0.7, b: 0.3), (b: 1.0).

  12. Property ◊c: with what probability can player "left" enforce ◊c in this stochastic game?

  13. Semi-Quantitative Models
      Trace: sequence of observations.
      Property p: assigns a reward to each trace (boolean rewards).
      Model m: generates a set of traces ((game) graph).
      Value(p,m): defined from the rewards of the generated traces (sup or inf, sup inf), with values in [0,1] ⊆ ℝ.

  14. A Systems Theory: a class of properties p over traces; an algorithm for computing Value(p,m) over models m; a distance between models w.r.t. property values.

  15. A Systems Theory, for GRAPHS: ω-regular properties (class of properties p over traces); the μ-calculus (algorithm for computing Value(p,m) over models m); bisimilarity (distance between models w.r.t. property values).

  16. Transition Graph: Q states; δ: Q → 2^Q transition relation.

  17. Graph Regions
      Q states
      δ: Q → 2^Q transition relation
      Γ = [Q → 𝔹] regions
      ∃pre, ∀pre: Γ → Γ
      For R ⊆ Q: q ∈ ∃pre(R) iff some successor of q is in R; q ∈ ∀pre(R) iff all successors of q are in R.

  18. Graph Property Values: Reachability ◊R. Given R ⊆ Q, find the states from which some trace leads to R.

  19. Graph Property Values: Reachability ◊R = (μX)(R ∨ ∃pre(X)). Given R ⊆ Q, find the states from which some trace leads to R. Iteration: R, then R ∪ ∃pre(R), then R ∪ ∃pre(R) ∪ ∃pre²(R), …
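The fixpoint iteration on slide 19 is plain backward reachability. A minimal Python sketch, assuming a successor-map representation of the graph; the three-state graph below is only a guess at the slides' example, and all names are illustrative:

```python
def exists_pre(succ, region):
    """∃pre(R): the states with at least one successor in R."""
    return {q for q, nxt in succ.items() if nxt & region}

def reach(succ, target):
    """Least fixpoint of X ↦ R ∪ ∃pre(X): states with some trace into R."""
    x = set(target)
    while True:
        nxt = x | exists_pre(succ, x)
        if nxt == x:
            return x
        x = nxt

succ = {"a": {"a", "b"}, "b": {"b", "c"}, "c": {"c"}}
print(reach(succ, {"c"}))  # {'a', 'b', 'c'}: every state can reach c
```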

  20. Concurrent Game: Q states; Λl, Λr moves of the two players; δ: Q × Λl × Λr → Q transition function.

  21. Game Regions
      Q states
      Λl, Λr moves of the two players
      δ: Q × Λl × Λr → Q transition function
      Γ = [Q → 𝔹] regions
      lpre, rpre: Γ → Γ
      For R ⊆ Q: q ∈ lpre(R) iff (∃l ∈ Λl) (∀r ∈ Λr) δ(q,l,r) ∈ R.
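Slide 21's controllable-predecessor operator is easy to state in code. A sketch, assuming the transition function is given as a dict keyed by (state, left move, right move) triples (a hypothetical representation, not from the slides):

```python
def lpre(delta, moves_l, moves_r, region):
    """lpre(R): states where some left move guarantees, against every
    right move, a successor inside the region R."""
    states = {q for (q, _, _) in delta}
    return {q for q in states
            if any(all(delta[(q, l, r)] in region for r in moves_r)
                   for l in moves_l)}
```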

  22. Game Property Values: Reachability ⟨⟨left⟩⟩◊R. Given R ⊆ Q, find the states from which player "left" has a strategy to force the game to R.

  23. Game Property Values: Reachability ⟨⟨left⟩⟩◊R = (μX)(R ∨ lpre(X)). Given R ⊆ Q, find the states from which player "left" has a strategy to force the game to R. Iteration: R, then R ∪ lpre(R), then R ∪ lpre(R) ∪ lpre²(R), …
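The game-reachability fixpoint then mirrors the graph case, with lpre in place of ∃pre; a sketch reusing the lpre function above:

```python
def left_reach(delta, moves_l, moves_r, target):
    """Least fixpoint of X ↦ R ∪ lpre(X): the states from which
    player "left" can force the game into R."""
    x = set(target)
    while True:
        nxt = x | lpre(delta, moves_l, moves_r, x)
        if nxt == x:
            return x
        x = nxt
```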

  24. An Open Systems Theory, for GAME GRAPHS: ω-regular properties (class of winning conditions p over traces); the (lpre, rpre) fixpoint calculus (algorithm for computing Value(p,m) over models m); alternating bisimilarity [Alur, Henzinger, Kupferman, Vardi] (distance between models w.r.t. property values).

  25. An Open Systems Theory, for GAME GRAPHS: ω-regular properties such as ⟨⟨left⟩⟩◊R; the (lpre, rpre) fixpoint calculus, e.g. (μX)(R ∨ lpre(X)). Every deterministic fixpoint formula f computes Value(p,m), where p is the linear interpretation [Vardi] of f.

  26. An Open Systems Theory, for GAME GRAPHS: two states agree on the values of all fixpoint formulas iff they are alternating bisimilar [Alur, Henzinger, Kupferman, Vardi].

  27. Stochastic Game: Q states; Λl, Λr moves of both players; δ: Q × Λl × Λr → Dist(Q) probabilistic transition function.

  28. Quantitative Game Regions
      Q states
      Λl, Λr moves of both players
      δ: Q × Λl × Λr → Dist(Q) probabilistic transition function
      Γ = [Q → [0,1]] quantitative regions
      lpre, rpre: Γ → Γ
      lpre(R)(q) = (sup l ∈ Λl) (inf r ∈ Λr) R(δ(q,l,r)),
      where R(δ(q,l,r)) denotes the expected value of R under the distribution δ(q,l,r).

  29. Quantitative Game Regions, generalizing the boolean case: [0,1] replaces 𝔹, and sup/inf replace ∃/∀.
      Γ = [Q → [0,1]] quantitative regions
      lpre, rpre: Γ → Γ
      lpre(R)(q) = (sup l ∈ Λl) (inf r ∈ Λr) R(δ(q,l,r))
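The quantitative lpre replaces the ∃/∀ move quantifiers by sup/inf and the membership test by an expectation. A sketch, assuming transitions map (state, left move, right move) triples to successor distributions given as dicts (names illustrative):

```python
def qlpre(delta, moves_l, moves_r, region):
    """Quantitative lpre(R)(q): sup over left moves of the inf over
    right moves of the expected value of R in the successor distribution."""
    def expect(dist):
        return sum(p * region[s] for s, p in dist.items())
    states = {q for (q, _, _) in delta}
    return {q: max(min(expect(delta[(q, l, r)]) for r in moves_r)
                   for l in moves_l)
            for q in states}
```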

  30. Probability with which player "left" can enforce ◊c: (μX)(c ∨ lpre(X)), where ∨ is pointwise max. First iterate (values at a, b, c): 0, 0, 1.

  31. Probability with which player "left" can enforce ◊c: (μX)(c ∨ lpre(X)); second iterate: 0, 1, 1.

  32. Probability with which player "left" can enforce ◊c: (μX)(c ∨ lpre(X)); third iterate: 0.8, 1, 1.

  33. Probability with which player "left" can enforce ◊c: (μX)(c ∨ lpre(X)); fourth iterate: 0.96, 1, 1.

  34. Probability with which player "left" can enforce ◊c: (μX)(c ∨ lpre(X)); in the limit: 1, 1, 1. In the limit, the deterministic fixpoint formulas work for all ω-regular properties [de Alfaro, Majumdar].
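Slides 30-34 are one run of value iteration for (μX)(c ∨ lpre(X)). The sketch below reuses qlpre from the slide-29 sketch; the assignment of move pairs to the distributions of slide 11 is a plausible reconstruction, chosen so that the iterates at state a come out as 0, 0.8, 0.96, … → 1 as on the slides:

```python
delta = {
    ("a", 1, 1): {"a": 0.6, "b": 0.4}, ("a", 1, 2): {"a": 0.5, "b": 0.5},
    ("a", 2, 1): {"a": 0.1, "b": 0.9}, ("a", 2, 2): {"a": 0.2, "b": 0.8},
    ("b", 1, 1): {"c": 1.0},           ("b", 1, 2): {"c": 1.0},
    ("b", 2, 1): {"a": 0.7, "b": 0.3}, ("b", 2, 2): {"b": 1.0},
    ("c", 1, 1): {"c": 1.0}, ("c", 1, 2): {"c": 1.0},
    ("c", 2, 1): {"c": 1.0}, ("c", 2, 2): {"c": 1.0},
}
c_region = {"a": 0.0, "b": 0.0, "c": 1.0}
x = dict(c_region)                                  # first iterate: the region c itself
for _ in range(50):
    pre = qlpre(delta, [1, 2], [1, 2], x)
    x = {q: max(c_region[q], pre[q]) for q in x}    # ∨ = pointwise max
print({q: round(v, 3) for q, v in x.items()})       # {'a': 1.0, 'b': 1.0, 'c': 1.0}
```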

  35. A Probabilistic Systems Theory, for MARKOV DECISION PROCESSES: ω-regular properties (class of properties p over traces); the quantitative fixpoint calculus (algorithm for computing Value(p,m) over models m); quantitative bisimilarity [Desharnais, Gupta, Jagadeesan, Panangaden] (distance between models w.r.t. property values).

  36. A Probabilistic Systems Theory, for MARKOV DECISION PROCESSES: quantitative ω-regular properties, e.g. the max expected value of satisfying ◊R; the quantitative fixpoint calculus, e.g. (μX)(R ∨ ∃pre(X)). Every deterministic fixpoint formula f computes the expected Value(p,m), where p is the linear interpretation of f.

  37. Qualitative Bisimilarity
      e: Q² → {0,1} … equivalence relation
      F … function on equivalences
      F(e)(q,q′) = 0 if q and q′ disagree on observations,
                 = min { e(r,r′) | r ∈ δ(q) ∧ r′ ∈ δ(q′) } otherwise
      Qualitative bisimilarity … greatest fixpoint of F
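The min on slide 37 is schematic; the sketch below computes the greatest fixpoint with the standard bisimulation matching condition (every successor of q is matched by some still-equivalent successor of q′, and vice versa), which is what the slide abbreviates:

```python
def bisim(succ, obs):
    """Greatest fixpoint over 0/1 indicators: e[(q, r)] == 1 means
    q and r have not (yet) been separated."""
    states = list(succ)
    e = {(q, r): 1 if obs[q] == obs[r] else 0 for q in states for r in states}
    changed = True
    while changed:
        changed = False
        for q in states:
            for r in states:
                if e[(q, r)] and not (
                        all(any(e[(s, t)] for t in succ[r]) for s in succ[q])
                        and all(any(e[(s, t)] for s in succ[q]) for t in succ[r])):
                    e[(q, r)] = 0
                    changed = True
    return e
```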

  38. Quantitative Bisimilarity
      d: Q² → [0,1] … pseudo-metric ("distance")
      F … function on pseudo-metrics
      F(d)(q,q′) = 1 if q and q′ disagree on observations,
                 = max of sup_l inf_r d(δ(q,l,r), δ(q′,l,r)) and sup_r inf_l d(δ(q,l,r), δ(q′,l,r)) otherwise
      Quantitative bisimilarity … greatest fixpoint of F
      A natural generalization of bisimilarity from binary relations to pseudo-metrics.
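One step of F can be sketched for the special case of deterministic transitions; the general stochastic case additionally needs d lifted to distributions (e.g. a Kantorovich-style lifting), which is omitted here. Iterating from the all-zeros distance converges to the fixpoint:

```python
def metric_step(delta, moves_l, moves_r, obs, d):
    """One application of F: distance 1 on an observation mismatch,
    otherwise the max of the two sup-inf move quantifications.
    Assumes delta[(q, l, r)] is a single successor state."""
    states = sorted({q for (q, _, _) in delta})
    nd = {}
    for q in states:
        for p in states:
            if obs[q] != obs[p]:
                nd[(q, p)] = 1.0
            else:
                nd[(q, p)] = max(
                    max(min(d[(delta[(q, l, r)], delta[(p, l, r)])]
                            for r in moves_r) for l in moves_l),
                    max(min(d[(delta[(q, l, r)], delta[(p, l, r)])]
                            for l in moves_l) for r in moves_r))
    return nd
```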

  39. A Probabilistic Systems Theory, for MARKOV DECISION PROCESSES: two states agree on the values of all quantitative fixpoint formulas iff their quantitative bisimilarity distance is 0.

  40. Great, BUT …
      1. The theory is too precise: even the smallest change in the probability of a transition can cause an arbitrarily large change in the value of a property.
      2. The theory is not computational: we cannot bound the rate of convergence for quantitative fixpoint formulas.

  41. Solution: Discounting. Economics: a dollar today is better than a dollar tomorrow. Value of $1 today: 1; tomorrow: α, for a discount factor 0 < α < 1; the day after tomorrow: α²; etc.

  42. Solution: Discounting. Economics: a dollar today is better than a dollar tomorrow (value 1 today, α tomorrow, α² the day after, for a discount factor 0 < α < 1). Engineering: a bug today is worse than a bug tomorrow.

  43. Discounted Reachability
      Reward(◊α c) = α^k if c is first true after k transitions,
                   = 0 if c is never true.
      The reward is proportional to how quickly c is satisfied.
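A one-liner makes the reward concrete: with α = 0.9 and c first true after two transitions, the reward is 0.9² = 0.81 (names illustrative):

```python
def discounted_reward(trace, alpha, pred):
    """Reward α^k if pred first holds at step k of the trace, else 0."""
    for k, state in enumerate(trace):
        if pred(state):
            return alpha ** k
    return 0.0

print(round(discounted_reward(["a", "b", "c"], 0.9, lambda s: s == "c"), 2))  # 0.81
```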

  44. Discounted Property ◊α c: on the graph with states a, b, c, the values are a: α², b: α, c: 1.

  45. Discounted Property ◊α c (values a: α², b: α, c: 1). Discounted fixpoint calculus: every pre(f) becomes α · pre(f).
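Replacing pre by α·pre turns the reachability fixpoint into a discounted one. A sketch on a guess at the slides' three-state graph, reproducing the values a: α², b: α, c: 1:

```python
def discounted_reach(succ, target, alpha, eps=1e-9):
    """Least fixpoint of X ↦ R ∨ α·∃pre(X), with ∨ = pointwise max."""
    r = {q: 1.0 if q in target else 0.0 for q in succ}
    x = dict(r)
    while True:
        nxt = {q: max(r[q], alpha * max((x[s] for s in succ[q]), default=0.0))
               for q in succ}
        if all(abs(nxt[q] - x[q]) < eps for q in succ):
            return nxt
        x = nxt

succ = {"a": {"b"}, "b": {"c"}, "c": {"c"}}
print(discounted_reach(succ, {"c"}, 0.9))  # a: 0.81 (α²), b: 0.9 (α), c: 1.0
```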

  46. Fully Quantitative Models
      Trace: sequence of observations.
      Property p: assigns a reward to each trace (real rewards, in [0,1] ⊆ ℝ).
      Model m: generates a set of traces ((game) graph).
      Value(p,m): defined from the rewards of the generated traces (sup or inf, sup inf).

  47. Discounted Bisimilarity
      d: Q² → [0,1] … pseudo-metric ("distance")
      F … function on pseudo-metrics
      F(d)(q,q′) = 1 if q and q′ disagree on observations,
                 = α · max of sup_l inf_r d(δ(q,l,r), δ(q′,l,r)) and sup_r inf_l d(δ(q,l,r), δ(q′,l,r)) otherwise
      Discounted bisimilarity … greatest fixpoint of F

  48. A Discounted Systems Theory, for STOCHASTIC GAMES: discounted ω-regular properties (class of winning rewards p over traces); the discounted fixpoint calculus (algorithm for computing Value(p,m) over models m); discounted bisimilarity (distance between models w.r.t. property values).

  49. A Discounted Systems Theory, for STOCHASTIC GAMES: discounted ω-regular properties, e.g. the max expected reward ◊α R achievable by the left player; the discounted fixpoint calculus, e.g. (μX)(R ∨ α · lpre(X)). Every discounted deterministic fixpoint formula f computes Value(p,m), where p is the linear interpretation of f.
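Discounting is what makes the theory computational: the map X ↦ R ∨ α·lpre(X) is a contraction with factor α, so after n iterations the error is at most α^n and the iteration can be stopped with an explicit bound. A sketch reusing qlpre from the slide-29 sketch:

```python
def discounted_game_value(delta, moves_l, moves_r, target, alpha, eps=1e-6):
    """Iterate X ↦ R ∨ α·lpre(X). Values live in [0,1], so after n
    steps the distance to the fixpoint is at most α^n; stop when α^n < eps."""
    states = {q for (q, _, _) in delta}
    r = {q: 1.0 if q in target else 0.0 for q in states}
    x = dict(r)
    n = 0
    while alpha ** n >= eps:
        pre = qlpre(delta, moves_l, moves_r, x)
        x = {q: max(r[q], alpha * pre[q]) for q in states}
        n += 1
    return x
```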

  50. A Discounted Systems Theory, for STOCHASTIC GAMES: the difference between two states in the values of discounted fixpoint formulas is bounded by their discounted bisimilarity distance.
