
Scalable Multi-Stage Stochastic Programming

Scalable Multi-Stage Stochastic Programming. Cosmin Petra and Mihai Anitescu Mathematics and Computer Science Division Argonne National Laboratory LANS Informal Seminar March 2010. Work sponsored by U.S. Department of Energy Office of Nuclear Energy, Science & Technology.


Presentation Transcript


  1. Scalable Multi-Stage Stochastic Programming Cosmin Petra and Mihai Anitescu Mathematics and Computer Science Division Argonne National Laboratory LANS Informal Seminar March 2010 Work sponsored by U.S. Department of Energy Office of Nuclear Energy, Science & Technology

  2. Problem formulation • Multi-stage stochastic programming (SP) problem with recourse
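The objective and constraints on this slide were images and did not survive the transcript. A standard nested formulation of the multi-stage SP problem with recourse, in generic notation (the symbols $c_t, A_t, B_t, b_t$ are a common convention, not taken from the slides), is:

```latex
\min_{x_1}\; c_1^\top x_1
  + \mathbb{E}\Big[\min_{x_2}\; c_2^\top x_2
  + \mathbb{E}\big[\cdots
  + \mathbb{E}[\min_{x_T}\; c_T^\top x_T]\big]\Big]
\quad\text{subj. to}\quad
A_1 x_1 = b_1,\qquad
B_t x_{t-1} + A_t x_t = b_t,\quad x_t \ge 0,\quad t = 2,\dots,T,
```

where each decision $x_t$ may depend only on the random data observed up to stage $t$.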

  3. Deterministic approximation • Discrete and finite random variables yield a scenario tree (figure: three-stage scenario tree, stages t = 1, 2, 3, with branch probabilities) • Sample average approximation (SAA) is used for continuous or discrete infinitely supported random variables.

  4. The Deterministic SAA SP Problem

  5. Two-Stage SP Problem • Block-separable objective function • Half-arrow shaped Jacobian

  6. Multistage SP Problems • Depth-first traversal of the scenario tree (figure: tree with nodes 0–7) • Nested half-arrow shaped Jacobian • Block-separable objective function

  7. Linear Algebra of Primal-Dual Interior-Point Methods (IPM) • Convex quadratic problem • IPM linear system • Two-stage SP: arrow-shaped linear system (via a permutation)
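The matrices on this slide were images. A standard form of the IPM step system for a convex QP, and the block-arrow shape obtained for the two-stage SAA problem, can be sketched as follows (notation $Q, A, D, H_i, B_i$ is generic, not from the slides):

```latex
\begin{bmatrix} Q + D & A^\top \\ A & 0 \end{bmatrix}
\begin{bmatrix} \Delta x \\ \Delta y \end{bmatrix}
=
\begin{bmatrix} r_x \\ r_y \end{bmatrix},
\qquad D = X^{-1}Z,
```

and, after a symmetric permutation grouping each of the $S$ scenarios' variables together,

```latex
H = \begin{bmatrix}
H_1 &        &     & B_1^\top \\
    & \ddots &     & \vdots   \\
    &        & H_S & B_S^\top \\
B_1 & \cdots & B_S & H_0
\end{bmatrix}.
```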

  8. Linear Algebra for Multistage SP Problems • Two-stage SP -> arrow shape • T stages (T > 2): nested arrow shape via depth-first traversal of the scenario tree • Each diagonal block has the same structure as the two-stage H • For each non-leaf node a two-stage problem is solved.

  9. The Direct Schur Complement Method (DSC) • Uses the arrow shape of H • Solving Hz = r: implicit factorization, back substitution, diagonal solve, forward substitution
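As a toy illustration of the DSC steps on a block-arrow system (dense NumPy blocks stand in for the sparse factorizations PIPS actually uses; this sketch is not the solver's implementation):

```python
import numpy as np

def schur_solve(H_blocks, B_blocks, H0, r_blocks, r0):
    """Solve the block-arrow system
        [ H_1            B_1^T ] [z_1]   [r_1]
        [      ...        ...  ] [ . ] = [ . ]
        [           H_S  B_S^T ] [z_S]   [r_S]
        [ B_1  ...  B_S   H_0  ] [z_0]   [r_0]
    via the direct Schur complement method (toy dense version)."""
    S = H0.copy()
    rhs = r0.copy()
    for Hi, Bi, ri in zip(H_blocks, B_blocks, r_blocks):
        # Eliminate scenario block i: accumulate its Schur contribution.
        S -= Bi @ np.linalg.solve(Hi, Bi.T)
        rhs -= Bi @ np.linalg.solve(Hi, ri)
    # Solve with the (dense) Schur complement for the first-stage unknowns.
    z0 = np.linalg.solve(S, rhs)
    # Recover each scenario's unknowns independently (parallelizable).
    z = [np.linalg.solve(Hi, ri - Bi.T @ z0)
         for Hi, Bi, ri in zip(H_blocks, B_blocks, r_blocks)]
    return z, z0
```

Each scenario's elimination and back-recovery touches only that scenario's blocks, which is exactly the scenario-based parallelism exploited on the next slide.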

  10. Parallelizing the Schur Complement • Scenario-based parallelization • Bottlenecks from the Schur complement: implicit factorization, back substitution, diagonal solve, forward substitution • Solving S scenarios, each of cost w; the cost of the Schur complement step is c; P processes are used • Gondzio's OOPS; Zavala et al., 2007 (in IPOPT)
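Under the slide's notation (S scenarios of cost w each, Schur complement cost c, P processes), a simple cost model for one DSC solve and the resulting Amdahl-type bottleneck can be sketched as:

```latex
T_{\mathrm{DSC}}(P) \;\approx\; \frac{S\,w}{P} + c,
\qquad
\text{speedup} \;=\; \frac{S\,w + c}{\,S\,w/P + c\,}
\;\xrightarrow[P \to \infty]{}\; 1 + \frac{S\,w}{c},
```

so the serial Schur complement cost $c$ caps the achievable speedup; the PSC method on the next slide targets exactly this term.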

  11. The Preconditioned Schur Complement (PSC) • Goal: “hide” the Schur complement computation from the parallel execution flow. • How? Krylov subspace iterative methods. • Solve iteratively instead of factorizing exactly. • At each iteration, a product with C and a solve with the preconditioner are needed. • A preconditioner M for C should • be cheap to invert (or cheap to compute) • cluster the eigenvalues of the preconditioned matrix.
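A minimal sketch of the PSC idea using SciPy's BiCGStab. The sampling rule, scaling, and all names here are illustrative assumptions, not PIPS's code: the preconditioner averages a random subset of the scenario terms, is factorized once, and the Krylov method solves with the exact C.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve
from scipy.sparse.linalg import LinearOperator, bicgstab

def psc_solve(C_terms, H0, b, n_sample, rng):
    """Solve C x = b, where C = H0 - sum(C_terms) over S scenarios,
    with BiCGStab preconditioned by a sampled approximation M of C.
    Only M is factorized; C is applied in matrix-vector products
    (formed explicitly here for simplicity)."""
    S = len(C_terms)
    C = H0 - sum(C_terms)
    # Sampled preconditioner: n_sample scenario terms, scaled up to S.
    idx = rng.choice(S, size=n_sample, replace=False)
    M = H0 - (S / n_sample) * sum(C_terms[i] for i in idx)
    piv = lu_factor(M)                      # factorize the preconditioner once
    Minv = LinearOperator(C.shape, matvec=lambda v: lu_solve(piv, v))
    x, info = bicgstab(C, b, M=Minv)        # info == 0 means converged
    return x, info
```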

  12. The Stochastic Preconditioner • The exact structure of C • An i.i.d. subset of n scenarios • The stochastic preconditioner built from that subset • For C, use the constraint preconditioner (Keller et al., 2000)

  13. But I said that … • it has to be cheap to solve with the preconditioner • Solve with the factors of M • The factorization of M is done before C is computed • The cost of the Krylov solve is only slightly larger than before • Holds even when one process is dedicated to M (a separate process)

  14. Quality of the Stochastic Preconditioner • “Exponentially” better preconditioning as the sample size grows • Proof: Hoeffding's inequality • Assumptions on the problem's random data: boundedness and uniform full-rank conditions
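The slide invokes Hoeffding's inequality; its standard statement for i.i.d. bounded samples (notation mine, not from the slides) is:

```latex
X_1,\dots,X_n \ \text{i.i.d.},\quad a \le X_i \le b
\;\Longrightarrow\;
\mathbb{P}\Big(\Big|\tfrac{1}{n}\textstyle\sum_{i=1}^n X_i - \mathbb{E}\,X_1\Big| \ge \varepsilon\Big)
\;\le\; 2\exp\!\Big(-\frac{2n\varepsilon^2}{(b-a)^2}\Big).
```

Applied entrywise to the sampled scenario terms, it shows that the sampled preconditioner concentrates around C at a rate exponential in the sample size n, which is presumably the sense of "exponentially better preconditioning" above.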

  15. Quality of the Constraint Preconditioner • The preconditioned matrix has an eigenvalue 1 of high multiplicity • The remaining eigenvalues satisfy explicit bounds • Proof based on Bergamaschi et al., 2004

  16. The Krylov Methods Used • BiCGStab with the constraint preconditioner M • Projected CG (PCG) (Gould et al., 2001) • Preconditioned projection onto the null space of the constraints • Does not compute a basis of the null space explicitly

  17. Observations • Real-life performance on linear SAA SP problems • Few Krylov iterations for more than half of the IPM iterations • Several tens of inner iterations as the IPM approaches the solution • PCG takes fewer iterations than BiCGStab • Affected by the well-known ill-conditioning of IPMs • For convex quadratic SP the performance should improve.

  18. A Parallel Interior-Point Solver for Stochastic Programming (PIPS) • Convex QP SAA SP problems • Input: users specify the scenario tree • Object-oriented design based on OOQP • Linear algebra: tree vectors, tree matrices, tree linear systems • Scenario based parallelism • tree nodes (scenarios) are distributed across processors • inter-process communication based on MPI • dynamic load balancing • Mehrotra predictor-corrector IPM

  19. Tree Linear Algebra – Data, Operations & Linear Systems • Data • Tree vector: b, c, x, etc. • Tree symmetric matrix: Q • Tree general matrix: A • Operations • Linear systems: for each non-leaf node a two-stage problem is solved via the Schur complement methods described previously. (Figure: scenario tree with nodes 0–7.)

  20. Parallelization – Tree Distribution • The tree is distributed across processes (figure: a tree with nodes 0–7 assigned to 3 processes) • Dynamic load balancing of the tree • Number partitioning problem --> graph partitioning --> METIS
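PIPS delegates the real tree partitioning to METIS. As a minimal illustration of the number-partitioning view of the load-balancing problem, here is a greedy longest-processing-time heuristic (a hypothetical stand-in, not the solver's algorithm): assign each scenario, heaviest first, to the currently lightest process.

```python
import heapq

def greedy_partition(weights, n_procs):
    """Assign scenario weights to processes: sort scenarios by
    decreasing weight, then give each one to the process with the
    smallest current load (LPT heuristic). Returns {proc: [indices]}."""
    heap = [(0.0, p, []) for p in range(n_procs)]  # (load, proc id, scenarios)
    heapq.heapify(heap)
    order = sorted(range(len(weights)), key=lambda i: -weights[i])
    for i in order:
        load, p, items = heapq.heappop(heap)       # lightest process
        items.append(i)
        heapq.heappush(heap, (load + weights[i], p, items))
    return {p: items for _, p, items in heap}
```

For tree-structured problems the scenarios also carry communication dependencies, which is why a graph partitioner such as METIS, rather than pure number partitioning, is the right tool in practice.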

  21. Numerical Experiments • Experiments on two 2-stage SP problems • Economic Optimization of a Building Energy System • Unit Commitment with Wind Power Generation • The Building Energy System problem • The size of the problem was artificially increased • There is no benefit to solving it with that many CPUs • The Unit Commitment problem • The original problem models the Illinois grid with wind farms • Solved problems = 3 × the original problem + up to 13 × more sampling data • Strong scaling investigated

  22. Economic Optimization of a Building Energy System • Zavala et al., 2009 • 1.2 million variables, few degrees of freedom • Size of the Schur complement = 357 • Uncertainty in the RHS of the SP only

  23. PIPS on the Building Energy System Problem • The Direct Schur Complement – parallel scaling • 97.6% parallel efficiency from 5 to 50 processors

  24. Stochastic Unit Commitment with Wind Power Generation • Constantinescu et al., 2009 • MILP (its relaxation is solved) • Largest instance: 664k variables, 1.4M constraints, 400 scenarios, 2.2k Schur complement matrix • 70% efficiency from 10 to 200 CPUs • Uncertainty is again only in the RHS

  25. PIPS on a Unit Commitment Problem • DSC on P processes vs. PSC on P+1 processes • 120 scenarios • Optimal use of PSC • Beyond that point, the factorization of the preconditioner cannot be hidden anymore.

  26. Conclusions • The DSC method offers good parallelism in an IPM framework. • The PSC method improves scalability. • PIPS – a solver for SP problems. • PIPS appears ready for larger problems.

  27. Future work • New scalable methods for more efficient software • Linear algebra/optimization methods that account for the characteristics of the particular sampling method • SAA error estimates • Looking for applications • Parallelize the 2nd-stage subproblems in a 2-stage setup • Use multi-stage SP with slim nodes • PIPS • Continuous case: IPM hot-start, other stochastic preconditioners, etc. • Use with MILP/MINLP solvers • A ton of other small enhancements.

  28. Thank you for your attention! Questions?
