
Randomized pivoting rules for the simplex algorithm – Lower bounds
Uri Zwick – Tel Aviv Univ.
MDS summer school “The Combinatorics of Linear and Semidefinite Programming”, August 14–16, 2012


Presentation Transcript


  1. Uri Zwick – Tel Aviv Univ. Randomized pivoting rules for the simplex algorithm. Lower bounds. MDS summer school “The Combinatorics of Linear and Semidefinite Programming”, August 14–16, 2012.

  2. Deterministic pivoting rules: Largest improvement; Largest slope; Dantzig’s rule – largest modified cost; Bland’s rule – avoids cycling; Lexicographic rule – also avoids cycling. All known to require an exponential number of steps in the worst case: Klee-Minty (1972), Jeroslow (1973), Avis-Chvátal (1978), Goldfarb-Sit (1979), …, Amenta-Ziegler (1996).

  3. Klee-Minty cubes (1972) [figure taken from a paper by Gärtner-Henk-Ziegler]

  4. Randomized pivoting rules. Random-Edge – choose a random improving edge. Random-Facet – described in the previous lecture ☺ [Kalai (1992)] [Matoušek-Sharir-Welzl (1996)]. Random-Facet is sub-exponential! Are Random-Edge and Random-Facet polynomial???

  5. Abstract objective functions (AOFs) / Acyclic Unique Sink Orientations (AUSOs): every face should have a unique sink.

  6. AUSOs of n-cubes: 2n facets, 2^n vertices. USOs and AUSOs: Stickney-Watson (1978), Morris (2001), Szabó-Welzl (2001), Gärtner (2002). The directed diameter is exactly n. Exercise: Prove it.

  7. AUSO results. Random-Facet is sub-exponential [Kalai (1992)] [Matoušek-Sharir-Welzl (1996)]. Sub-exponential lower bound for Random-Facet [Matoušek (1994)]. Sub-exponential lower bound for Random-Edge [Matoušek-Szabó (2006)]. Lower bounds do not correspond to actual linear programs. Can geometry help?

  8. Random-Edge and Random-Facet are not polynomial for LPs. Consider LPs that correspond to Markov Decision Processes (MDPs): Simplex ⟷ Policy iteration. Obtain sub-exponential lower bounds for the Random-Edge and Random-Facet variants of the Policy Iteration algorithm for MDPs.

  9. Randomized Pivoting Rules. Lower bounds obtained for LPs whose diameter is n. [Kalai ’92] [Matoušek-Sharir-Welzl ’92] [Friedmann-Hansen-Zwick ’11]

  10. 3-bit counter

  11. Turn-based 2-Player Stochastic Games [Shapley ’53] [Gillette ’57] … [Condon ’92]. Total reward version; Discounted version; Limiting average version. Both players have optimal positional strategies. Can optimal strategies be found in polynomial time?

  12. Stopping condition. For the total reward version, assume: no matter what the players do, the game stops with probability 1. Exercise: Show that discounted games correspond directly to stopping total reward games.
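
As a hint for the exercise (my sketch, not on the slide): a discount factor γ behaves like a per-step stopping probability of 1 − γ, so the discounted value equals the total reward of a game that moves to a zero-reward sink with probability 1 − γ after every step:

$$\mathbb{E}\Big[\sum_{t\ge 0}\gamma^{t} r_t\Big] \;=\; \sum_{t\ge 0}\mathbb{E}[r_t]\cdot\gamma^{t},\qquad \gamma^{t} \;=\; \Pr[\text{the stopping game is still running at step } t].$$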

  13. Strategies / Policies. A deterministic strategy specifies which action to take given every possible history. A mixed strategy is a probability distribution over deterministic strategies. A memoryless strategy is a strategy that depends only on the current state. A positional strategy is a deterministic memoryless strategy.

  14. Values. $\mathrm{val}(s) \;=\; \sup_{\sigma\ \text{general}}\ \inf_{\tau\ \text{general}} \mathbb{E}_s^{\sigma,\tau}[\text{reward}] \;=\; \max_{\sigma\ \text{positional}}\ \min_{\tau\ \text{positional}} \mathbb{E}_s^{\sigma,\tau}[\text{reward}]$. Both players have positional optimal strategies. There are positional strategies that are optimal for every starting position.

  15. Markov Decision Processes [Shapley ’53] [Bellman ’57] [Howard ’60] … Total reward version; Discounted version; Limiting average version. Optimal positional policies can be found using LP. Is there a strongly polynomial time algorithm?

  16. Stochastic shortest paths (SSPs): minimize the expected cost of getting to the target.

  17. Turn-based non-Stochastic Games [Ehrenfeucht-Mycielski (1979)]. Total reward version – easy. Limiting average version; Discounted version. Both players have optimal positional strategies. Still no polynomial time algorithms known!

  18. • Turn-based Stochastic Games (SGs) – long-term planning in a stochastic and adversarial environment – 2½ players. • Non-Stochastic Games (MPGs) – adversarial, non-stochastic – 2 players. • Markov Decision Processes (MDPs) – non-adversarial, stochastic – 1½ players. • Deterministic MDPs (DMDPs) – non-stochastic, non-adversarial – 1 player.

  19. Parity Games (PGs) – a simple example. [figure: a game graph with vertex priorities 2, 3, 2, 1, 4, 1] EVEN wins if the largest priority seen infinitely often is even.

  20. Parity Games (PGs). [figure: ODD and EVEN vertices, priorities 8 and 3] EVEN wins if the largest priority seen infinitely often is even. Equivalent to many interesting problems in automata and verification: non-emptiness of ω-tree automata, modal μ-calculus model checking.

  21. Parity Games (PGs) → Mean Payoff Games (MPGs) [Stirling (1993)] [Puri (1995)]. Replace priority k by payoff (−n)^k. Move payoffs to outgoing edges.
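
A quick check that these payoffs encode parity correctly (my own calculation; n denotes the number of vertices): on a simple cycle whose largest priority is k, the top-priority payoff dominates the at most n − 1 remaining terms,

$$\Big|\sum_{i:\,k_i<k}(-n)^{k_i}\Big| \;\le\; (n-1)\,n^{k-1} \;<\; n^{k} \;=\; \big|(-n)^{k}\big|,$$

so the total (and hence the mean) payoff of the cycle is positive iff k is even.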

  22. Let’s focus on MDPs

  23. Evaluating a policy. MDP + policy ⟹ Markov chain. Values of a fixed policy can be found by solving a system of linear equations.
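
A minimal sketch of this evaluation step for a discounted MDP (illustrative data layout and names, not from the slides): fixing a policy π turns the MDP into a Markov chain, and the values solve the linear system (I − γ P_π) v = r_π.

```python
import numpy as np

def evaluate_policy(P, r, policy, gamma=0.9):
    """Values of a fixed policy in a discounted MDP.

    P[a][s, s'] : transition probability from s to s' under action a
    r[a][s]     : immediate reward of playing a in state s
    policy[s]   : the action the policy takes in state s
    Fixing the policy yields a Markov chain with transition matrix
    P_pi and reward vector r_pi; the values satisfy
    v = r_pi + gamma * P_pi v, a linear system we solve directly.
    """
    n = len(policy)
    P_pi = np.array([P[policy[s]][s] for s in range(n)])
    r_pi = np.array([r[policy[s]][s] for s in range(n)])
    return np.linalg.solve(np.eye(n) - gamma * P_pi, r_pi)
```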

  24. Improving a policy (using a single switch)
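
Concretely, in the discounted setting (standard formulation; notation mine), an action a at state s is an improving switch with respect to π exactly when a one-step lookahead through a beats π’s current value at s:

$$r(s,a) + \gamma \sum_{s'} p(s' \mid s,a)\, v_{\pi}(s') \;>\; v_{\pi}(s).$$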

  25. Policy iteration for MDPs [Howard ’60]
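
A compact sketch of Howard’s all-switches variant, reusing `evaluate_policy` from above (same illustrative conventions; the tolerance guard is an implementation detail of this sketch, not part of the original algorithm statement):

```python
def policy_iteration(P, r, gamma=0.9, tol=1e-9):
    """Howard '60: evaluate the policy, perform all improving
    switches simultaneously, and repeat until none remain."""
    n_actions, n_states = len(P), P[0].shape[0]
    policy = [0] * n_states                     # arbitrary initial policy
    while True:
        v = evaluate_policy(P, r, policy, gamma)
        improved = False
        for s in range(n_states):
            # one-step lookahead value of each action at state s
            best = max(range(n_actions),
                       key=lambda a: r[a][s] + gamma * P[a][s] @ v)
            if r[best][s] + gamma * P[best][s] @ v > v[s] + tol:
                policy[s] = best                # an improving switch
                improved = True
        if not improved:                        # no improving switch left
            return policy, v
```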

  26. Dual LP formulation for MDPs

  27. Dual LP formulation for MDPs. [In the figure, each constraint holding says that the corresponding action a is not an improving switch.] Basic solution ⟺ (positional) policy.
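
The LP itself appears only as a figure in the deck; for orientation, here is a standard flux (“dual”) form for discounted MDPs (a textbook rendering, not copied from the slide; normalization and sign conventions vary across sources):

$$\max\;\sum_{s,a} r(s,a)\,x_{s,a} \qquad \text{s.t.}\qquad \sum_{a} x_{s,a} \;-\; \gamma \sum_{s',a'} p(s \mid s',a')\,x_{s',a'} \;=\; 1 \;\;\forall s,\qquad x \ge 0.$$

A basic feasible solution has exactly one nonzero variable x_{s,a} per state s, i.e., it picks one action per state – exactly a positional policy.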

  28. Primal LP formulation for MDPs. Vertex ⟺ complement of a policy.
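
And the corresponding value (“primal”) form, again in standard textbook notation rather than the slide’s own:

$$\min\;\sum_{s} v_{s} \qquad \text{s.t.}\qquad v_{s} \;\ge\; r(s,a) + \gamma \sum_{s'} p(s' \mid s,a)\, v_{s'} \qquad \forall (s,a).$$

At a vertex, one constraint per state is tight; the tight constraints single out a policy, matching the slide’s vertex ⟺ complement-of-a-policy correspondence (the non-tight constraints are the actions the policy avoids).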

  29. TB2SG ∈ NP ∩ co-NP. TB2SG ∈ P ??? (Membership in NP ∩ co-NP: guess a positional strategy for either player; verifying its optimality amounts to solving the MDP it induces for the opponent.)

  30. Policy iteration variants

  31. Random-Facet for MDPs • Choose a random action not in the current policy and ignore it. • Solve recursively without this action. • If the ignored action is not an improving switch with respect to the returned policy, we are done. • Otherwise, switch to the ignored action and solve recursively. (A code sketch follows below.)
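
A recursive rendering of the rule just described (my own sketch; the helper callables `solve_base`, `is_improving`, and `switch` are assumptions, and the action set passed to the second recursion follows the slide’s wording rather than any particular formal variant):

```python
import random

def random_facet(actions, policy, solve_base, is_improving, switch):
    """Random-Facet policy iteration, following the slide's recursion.

    actions      -- set of actions not used by the current policy
    policy       -- the current (positional) policy
    solve_base   -- solves the problem when no spare action is left
    is_improving -- is_improving(a, policy): improving-switch test
    switch       -- switch(policy, a): policy with the switch applied
    All five parameters are assumptions of this sketch.
    """
    if not actions:                        # nothing left to ignore
        return solve_base(policy)
    a = random.choice(tuple(actions))      # choose a random action...
    pi = random_facet(actions - {a}, policy,
                      solve_base, is_improving, switch)   # ...ignore it
    if not is_improving(a, pi):            # ignored action not improving:
        return pi                          # the recursive solution stands
    # otherwise, switch to the ignored action and solve recursively again
    return random_facet(actions - {a}, switch(pi, a),
                        solve_base, is_improving, switch)
```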

  32. Policy iteration for 2-player games • Keep a strategy of player 1 and an optimal counter-strategy of player 2. • Perform improving switches for player 1 and recompute an optimal counter-strategy for player 2. Exercise: Does it really work? Random-Facet yields a sub-exponential algorithm for turn-based 2-player stochastic games!

  33. Lower bounds for Policy Iteration. Switch-All for Parity Games is exponential [Friedmann ’09]. Switch-All for MDPs is exponential [Fearnley ’10]. Random-Facet for Parity Games is sub-exponential [Friedmann-Hansen-Zwick ’11]. Random-Facet and Random-Edge for MDPs, and hence for LPs, are sub-exponential [FHZ ’11].

  34. Lower bound for Random-Facet: implement a randomized counter.

  35. Lower bound for Random-Edge: implement a standard counter.

  36. 3-bit counter [figure: the counter construction, with edge payoffs of the form (−N)^15]

  37. 3-bit counter [figure: the counter in state 0 1 0]

  38. 3-bit counter – improving switches. [figure: the counter in state 0 1 0] Random-Edge can choose either one of these improving switches…

  39. Cycle gadgets Cycles close one edge at a time Shorter cycles close faster

  40. Cycle gadgets Cycles open “simultaneously”

  41. 3-bit counter 23 1 0 1 0

  42. From b to b+1 in seven phases: 1. The B_k-cycle closes. 2. The C_k-cycle closes. 3. The U-lane realigns. 4. The A_i-cycles and B_i-cycles for i<k open. 5. The A_k-cycle closes. 6. The W-lane realigns. 7. The C_i-cycles of 0-bits open.

  43. 3-bit counter 34 1 0 1

  44. Size of cycles. Various cycles and lanes compete with each other: some are trying to open while some are trying to close. We need to make sure that our candidates win! Length of all A-cycles = 8n; length of all C-cycles = 22n; length of the B_i-cycles = 25i²n. O(n^4) vertices for an n-bit counter. Can be improved using a more complicated construction and an improved analysis (work in progress).

  45. Concluding remarks and open problems. The “game-theoretic” perspective helps understand the behavior of randomized pivoting rules. A polynomial pivoting rule? A polynomial bound on the diameter? Strongly polynomial algorithms for MDPs? Polynomial algorithms for 2-player games?
