An Affine-invariant Approach to Linear Programming

An Affine-invariant Approach to Linear Programming Mihály Bárász and Santosh Vempala

Talk outline • Brief history of LP • Ideas • Algorithm • Analysis • Conclusion

Brief history of LP • Max c.x s.t. Ax ≤ b. (several equivalent formulations) • Simplex method,1947, Dantzig. Many variants. Still most popular LP algorithm. No polynomial guarantee known.

History: polynomial algorithms • Ellipsoid method, Khachiyan 1979. • Interior point method, Karmarkar, 1984. • Nestorov, Nemirovski, Vaidya, Renegar et al., • Perceptron method, Block, Novikoff, Minsky-Papert 62 • Polynomial perceptron, Dunagan-V. 2004 • Random walk method, Bertsimas-V. 2002. • Simplex+Scaling, Kelner-Spielman, 2006. • … All these methods are geometric. They “scale” space for efficiency Complexity depends on #bits in the input.

Strongly polynomial cases • Maximum weight matching in general graphs Edmonds, 1965. • Linear programming in fixed dimension Megiddo, 1984. • Minimum cost flow E. Tardos, 1986. • MDPs with fixed discount rate Ye, 2010. • … These are specialized combinatorial algorithms

Simplex pivot rules • Assume polyhedron is nondegenerate, i.e., each vertex is the intersection of n facets. • Then simplex moves from vertex to vertex, improving objective value. • The rule for determining the next vertex is a “pivot” rule.

Simplex pivot rules Deterministic pivot rules • Dantzig’s largest coefficient rule. Exponential example: Klee and Minty, 1972. • Greatest increase. Example: Jeroslow, 1973. • Bland’s least index rule. Ex: Avis and Chvatal, 1978. • Steepest increase. Ex: Goldfarb and Sit, 1979. • Shadow vertex rule. Ex: Murty, 1980 and Goldfarb, 1983. • These constructions are “variations” of Klee–Minty’s. • They fit a generalization called deformed products defined by Amenta and Ziegler.

Simplex pivot rule • Randomized edge rule: Among all pivots that improve the objective, pick one at random. • Complexity was open for a long time. • Best known bound (G. Kalai), is subexponential. • Now a subexponential lower bound (FHZ10) • “Is there a polynomial simplex pivot rule?” This question has dominated the search for strongly polynomial algorithms.

Polytope diameter • (Hirsch) Diameter of a polytope with m facets is at most m-n (Diameter = Diameter of graph induced by vertices and edges of polytope) • Best known upper bound is super-polynomial (G. Kalai). • Long-standing open problem to improve this.

Hirsch vs Simplex • Complexity of any simplex pivot rule gives an upper bound on diameter of polytope graph. • Thus proving an efficient simplex rule would also be a major combinatorial breakthrough. • Is strongly polynomial LP really a nongeometric, combinatorial question?

But, • [Matousek-Szabo] Take a polytope combinatorially equivalent to a hypercube; orient the edges arbitrarily, so that each face has a unique sink. There exist orientations for which the random edge pivot rule is exponential! (pivot rule runs by picking a random out-edge at each vertex visited) • Other less abstract constructions [E.g. EHRR10] • Does this imply strongly polynomial pivot rules are impossible? NO, because not all orientations are geometrically realizable. • However, it does suggest that the geometry plays an important role.

New Approach • Algorithm will be affine-invariant • So complexity will not depend on how the input is scaled. Two step iteration: • At current vertex, pick a line to travel along in an affine-invariant manner and move along the line. • Go to vertex of at least as high objective value in the face reached.

Step 1: Affine-invariant direction How to pick a direction? • Compute the set of improving rays (edges that lead to vertices of higher objective value) • Take a linear combination of the improving rays. Two candidate rules: • Average of all improving rays (centroid rule) • Random convex combination of improving rays (random rule). Lemma: Both rules are affine-invariant. (i.e. applying an affine transformation before computing the direction gives the same effect as applying it after)

New algorithm, roughly • At current vertex, • pick direction • Follow to get to new face • Go to vertex of at least as high value • Introduces many geometric shortcuts in the polytope. • No bearing on polytope diameter or Hirsch conjecture!

Step 2: go to vertex • How to go to a vertex of at least as high value? • E.g., Follow gradient, keep adding facets hit as equalities till a vertex is reached. • Doing this step arbitrarily can lead to an exponential number of iterations. • So we will go to a vertex in an affine-invariant manner.

Algorithm: AFFINE INPUT: Polyhedron P: Ax ≤ b, objective c, vertex z. OUTPUT: A max objective vertex, or “unbounded” While the current vertex z is not optimal, repeat: • (Initialize) H = indices of active inequalities at z Find edges: For every t in H compute a vector vt s.t. ah.vt = 0 for h H\{t} and at .vt < 0. (vt satisfies all as equalities except at) T = {t H : c.vt≥ 0}, S = H\T.

Algorithm: AFFINE While the current vertex z is not optimal, repeat: 1. (initialize) H = active inequalities at z Find edges at z; let T = improving edges and S = H\T. 2. (Iteration) While T is nonempty, repeat: Compute improving rays: For every t in T compute a vector vt≠ 0 : ah. vt= 0 for h H\{t}, c. vt≥ 0 length of vt = largest value for which z + vt remains feasible. Pick direction: compute a nonnegative combination v of T. Move: Let r be max s.t. z + r v is in P (if no max, return “unbounded”) Move the current point: z := z + r v. Update inequalities: s = inequality that becomes active. t: any index in T such that {ah: h {s} U S U T \ t} is linearly independent. Set S := S U {s}; T := T\{t} and H := S U T.

Algorithm: notes • In Step 2, one new active inequality is added in each iteration; thus a vertex is reached in at most n iterations. • Step 2 can be viewed as a recursive application of the original procedure in lower dimensional faces. • Each iteration: O(mn).

Analysis: How many iterations? • Klee-Minty: n iterations. • Main Theorem: For any polytope that is a deformed product, Algorithm Affine takes at most n outer iterations. (O(n2) total iterations). • Thus, algorithm is efficient on all simplex counterexamples (prior to [FHZ10]).

Metrics • f(P,c,z) = (expected) Max #vertices visited when AFFINE is applied to P,c,z. • f(P) = supc,z f(P,c,z) • h(P,c) = Max length of a directed path in the graph of P, when edges are oriented according to c. • h(P) = supc h(P,c)

Deformed products • P: k-dim polytope Ax ≤ a • V,W: dim polytopes, normally equivalent • linear function

Deformed products

Properties • Q is combinatorially same as P x V. • If q = (y, y v + (1- y w) is a vertex of Q, then y, v, w are vertices of P,V,W resp. with v,w corresponding. • Facets of Q come from P or V/W.

Affine on deformed products Theorem. f(Q) ≤ h(V) + f(P) (max #vertices visited in Q ≤ max directed path length in V + max #vertices visited in P.) Corollary. Let Q be recursively defined, with dim(V) ≤ 2 at each step of the recursion. Then, f(Q) ≤ dim(Q) + |facets(Q)|

Analysis • n = Dim(Q). Intersection of n-1 defining hyperplanes of Q is called a (V,W)-ray if at most dim(V)-1 planes correspond to V,W inequalities. • Claim: The first dim(P) coordinates of a (V,W)-ray are zero.

Partial order For vertex x of Q, Lemma. Let x, y be consecutive vertices visited by the algorithm applied to Q. Then in the partial order induced on the vertices of V by cV unless is already maximal.

Proof of Theorem 1 Theorem. f(Q) ≤ h(V) + f(P) Proof. • #vertices of Q visited before πV(x) is maximal is at most h(V) (no vertex of V is repeated). • After reaching a maximal vertex of V, (V,W)-rays are never improving. • So algorithm proceeds as if in P and takes f(P) additional steps.

Products • Consider Q = P x V (standard product) • Then vertex x of Q can be written as x = (y,v) • Would like to say f(Q) ≤ f(P) + f(V). • But, moving along a ray in Q projects stops after hitting a facet in P or in V, but not both. • How to bound complexity?

Projected AFFINE (for analysis only) Each time a combination is picked in AFFINE, repeat arbitrarily many times: • Stop on the chord at any interior point • Recompute the coefficients and the combination Let g(P,c,z) = (expected) max #vertices visited by Proj. AFFINE Inserted steps are *not* counted

Product polytopes • Q = P x V Theorem. f(Q,c,z) ≤ g(Q,c,z) ≤ g(P,cP,zP) + g(V,cV,zP). Proof idea. • From x = (xp,xv) we go along a ray to x’ on a facet of either P or V, say P. • Projection is a step in P and a partial step in V. • Next step in Q projects to a step of AFFINE in P and a step of Proj. AFFINE in V.

Analysis • Proof also works for mild perturbations of deformed polytopes, which can change the combinatorial structure considerably (thus algorithm is not specifically designed for these counterexample classes). • Algorithm makes heavy use of geometric shortcuts through the interior of the polytope.

Next steps • Polynomial upper bound? (not strongly polynomial, so one could use a geometric scaling type argument showing progress towards optimum). • Counterexample? • Upper bound for combinatorial cubes? • Upper bound for random polytopes? (to get away from the combinatorial structure of known counterexamples) • Complexity for Markov Decision Processes?

Q. Is AFFINE currently the only explicit candidate for strong polynomiality?! (for LP or even MDP?)Thank you!

An Affine-invariant Approach to Linear Programming