Scalable Stochastic Programming

Cosmin Petra and Mihai Anitescu

Mathematics and Computer Science Division

Argonne National Laboratory

INFORMS Computing Society Conference

Monterey, California

January, 2011

[email protected]


Motivation

  • Sources of uncertainty in complex energy systems

    • Weather

    • Consumer Demand

    • Market prices

  • Applications @Argonne – Anitescu, Constantinescu, Zavala

    • Stochastic Unit Commitment with Wind Power Generation

    • Energy management of Co-generation

    • Economic Optimization of a Building Energy System

Stochastic Unit Commitment with Wind Power

Zavala’s SA2 talk

  • Wind Forecast – WRF (Weather Research and Forecasting) Model

    • Real-time grid-nested 24h simulation

    • 30 samples require 1h on 500 CPUs ([email protected])

Slide courtesy of V. Zavala & E. Constantinescu

Optimization under Uncertainty

  • Two-stage stochastic programming with recourse (“here-and-now”)

(first- and second-stage formulations shown as slide images)

Sample average approximation (SAA)

(SAA formulation shown as a slide image)

Inference Analysis

M samples
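The formulas on this slide were images that did not survive extraction. As a hedged reconstruction, the standard two-stage form with recourse and its SAA (the symbols c, A, b, Q, q, T, W, h are generic textbook notation, not copied from the slide) read:

```latex
% Two-stage stochastic program with recourse ("here-and-now"):
\min_{x}\ c^{T}x + \mathbb{E}_{\xi}\!\left[Q(x,\xi)\right]
\quad \text{subj. to } Ax = b,\ x \ge 0,
% where the second-stage (recourse) value is
Q(x,\xi) = \min_{y}\ q(\xi)^{T}y
\quad \text{subj. to } W(\xi)\,y = h(\xi) - T(\xi)\,x,\ y \ge 0.
% Sample average approximation (SAA) with M samples:
\min_{x}\ c^{T}x + \frac{1}{M}\sum_{i=1}^{M} Q(x,\xi_{i})
\quad \text{subj. to } Ax = b,\ x \ge 0.
```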

Linear Algebra of Primal-Dual Interior-Point Methods

Convex quadratic problem and IPM linear system (formulas shown as slide images)

Multi-stage SP / two-stage SP: arrow-shaped linear system (via a permutation)
The Direct Schur Complement Method (DSC)

  • Uses the arrow shape of H

  • 1. Implicit factorization

  • 2. Solving Hz = r

    • 2.1. Back substitution

    • 2.2. Diagonal solve

    • 2.3. Forward substitution
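As an illustrative sketch of the steps above (plain Python with dense blocks; PIPS itself uses sparse factorizations and MPI, and the function names below are hypothetical):

```python
# Direct Schur Complement sketch for an arrow-shaped system
#   [ D_i          B_i^T ] [z_i]   [r_i]   (one block row per scenario i)
#   [ B_1 ... B_p   A0   ] [z_0] = [r_0]   (first-stage block row)
# Step 1 forms S = A0 - sum_i B_i D_i^{-1} B_i^T (implicit factorization);
# step 2 back-substitutes, solves with S, then forward-substitutes.

def solve_dense(A, b):
    """Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    x = [0.0] * n
    for k in reversed(range(n)):
        x[k] = (M[k][n] - sum(M[k][c] * x[c] for c in range(k + 1, n))) / M[k][k]
    return x

def dsc_solve(D_blocks, B_blocks, A0, r_blocks, r0):
    n0 = len(A0)
    S = [row[:] for row in A0]            # 1. implicit factorization: build S
    rhs = r0[:]                           # 2.1. back substitution on the rhs
    for D, B, r in zip(D_blocks, B_blocks, r_blocks):
        m = len(D)
        # columns of D^{-1} B^T, one small solve per first-stage column
        DinvBt = [solve_dense(D, [B[a][k] for k in range(m)]) for a in range(n0)]
        Dinvr = solve_dense(D, r)
        for a in range(n0):
            rhs[a] -= sum(B[a][k] * Dinvr[k] for k in range(m))
            for b in range(n0):
                S[a][b] -= sum(B[a][k] * DinvBt[b][k] for k in range(m))
    z0 = solve_dense(S, rhs)              # 2.2. first-stage ("diagonal") solve
    zs = []                               # 2.3. forward substitution, per scenario
    for D, B, r in zip(D_blocks, B_blocks, r_blocks):
        m = len(D)
        resid = [r[k] - sum(B[a][k] * z0[a] for a in range(n0)) for k in range(m)]
        zs.append(solve_dense(D, resid))
    return z0, zs
```

The scenario loops are independent, which is exactly what the parallel DSC variant on the next slides distributes across processes; only the first-stage solve with S is inherently serial.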

Parallelizing DSC – 1. Factorization phase

(diagram: per-scenario factorizations run in parallel on processes 1..p; the first-stage factorization runs on a single process)

Factorization of the 1st stage Schur complement matrix = BOTTLENECK

Parallelizing DSC – 2. Backsolve

(diagram: per-scenario backsolves run in parallel on processes 1..p; the first-stage backsolve runs on a single process)

1st stage backsolve = BOTTLENECK

Scalability of DSC

  • Unit commitment: 76.7% efficiency, but not always the case

  • Large number of 1st stage variables: 38.6% efficiency

  • Runs performed on Fusion @ Argonne

Preconditioned Schur Complement (PSC)

  • Preconditioner built on a separate process

  • REMOVES the factorization bottleneck

  • Slightly larger backsolve bottleneck

The Stochastic Preconditioner

  • The exact structure of C (shown as a slide image)

  • IID subset of n scenarios (shown as a slide image)

  • The stochastic preconditioner (Petra & Anitescu, 2010)

  • For C, use the constraint preconditioner (Keller et al., 2000)
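The formulas on this slide were images. As a hedged sketch of the idea only (the symbols C_0, C_i, and the index set I are illustrative, not copied from the slide): the first-stage Schur complement accumulates contributions from all N scenarios, while the preconditioner accumulates a rescaled IID subsample of n ≪ N scenarios, which concentrates around C because the scenario contributions are IID:

```latex
% Exact first-stage Schur complement: all N scenario contributions
C = C_{0} + \frac{1}{N}\sum_{i=1}^{N} C_{i},
% Stochastic preconditioner: an IID subsample \mathcal{I}, |\mathcal{I}| = n \ll N
M = C_{0} + \frac{1}{n}\sum_{i\in\mathcal{I}} C_{i}.
```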

The “Ugly” Unit Commitment Problem

  • DSC on P processes vs. PSC on P+1 processes

Optimal use of PSC – linear scaling

  • 120 scenarios

Factorization of the preconditioner cannot be hidden anymore.

Quality of the Stochastic Preconditioner

  • “Exponentially” better preconditioning (Petra & Anitescu 2010)

  • Proof: Hoeffding inequality

  • Assumptions on the problem’s random data (not restrictive):

    • Boundedness

    • Uniform full rank (matrices shown as a slide image)

Quality of the Constraint Preconditioner

  • The preconditioned matrix has the eigenvalue 1 with high multiplicity (exact count shown as a slide image).

  • The rest of the eigenvalues satisfy a bound (shown as a slide image).

  • Proof: based on Bergamaschi et al., 2004.

Performance of the preconditioner

  • Eigenvalue clustering & Krylov iterations

  • Affected by the well-known ill-conditioning of IPMs.

Solution 2: parallelization of the stage 1 linear algebra

Parallelizing the 1st stage linear algebra

  • We distribute the 1st stage Schur complement system.

  • C is treated as dense.

  • Alternative to PSC for problems with large number of 1st stage variables.

  • Removes the memory bottleneck of PSC and DSC.

  • We investigated ScaLAPACK and Elemental (successor of PLAPACK)

    • Neither has a solver for symmetric indefinite matrices (Bunch-Kaufman);

    • LU or Cholesky only,

    • so we had to modify one of them.

(slide formula: a dense symm. pos. def. block and a sparse full-rank block)

ScaLAPACK (ORNL)

  • Classical block distribution of the matrix

  • Blocked “down-looking” Cholesky - algorithmic blocks

    • Size of algorithmic block = size of distribution block!

  • For cache-performance - large algorithmic blocks

  • For good load balancing - small distribution blocks

  • Must trade off cache-performance for load balancing

  • Communication: basic MPI calls

  • Inflexible in working with sub-blocks

Elemental (UT Austin)

  • Unconventional “elemental” distribution: blocks of size 1.

  • Size of algorithmic block ≠ size of distribution block

  • Both cache-performance (large alg. blocks) and load balancing (distrib. blocks of size 1)

  • Communication

    • More sophisticated MPI calls

    • Overhead O(log(sqrt(p))), p is the number of processors.

  • Sub-blocks friendly

  • Better performance in a hybrid MPI+SMP approach than ScaLAPACK
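As a sketch of the distribution difference (hypothetical helper names, not Elemental's API): with distribution blocks of size 1 over an r × c process grid, entry (i, j) is owned by process (i mod r, j mod c), so load balance is near-perfect regardless of the algorithmic block size chosen for cache performance:

```python
# Elemental-style 2D cyclic distribution with blocks of size 1
# (illustrative sketch; function names are hypothetical).

def owner(i, j, r, c):
    """Grid coordinates of the process owning matrix entry (i, j)."""
    return (i % r, j % c)

def local_entries(p_row, p_col, r, c, m, n):
    """Global indices of all entries of an m x n matrix stored on
    process (p_row, p_col) of an r x c process grid."""
    return [(i, j) for i in range(p_row, m, r) for j in range(p_col, n, c)]
```

When r divides m and c divides n, every process holds exactly (m/r)·(n/c) entries; ScaLAPACK's larger distribution blocks can instead leave processes idle near the trailing blocks, which is the cache-vs-balance trade-off described above.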

Cholesky-based LDLᵀ-like factorization

  • Can be viewed as an “implicit” normal equations approach.

  • In-place implementation inside Elemental: no extra memory needed.

  • Idea: modify the Cholesky factorization, by changing the sign after processing p columns.

  • It is much easier to do in Elemental, since this distributes elements, not blocks.

  • Twice as fast as LU

  • Works for more general saddle-point linear systems, i.e., pos. semi-def. (2,2) block.
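A minimal dense sketch of such a sign-modified Cholesky (an assumed shape, not the in-place Elemental implementation): factor K = L·diag(d)·Lᵀ with d = (+1,…,+1,−1,…,−1), flipping the expected pivot sign after the first p columns instead of aborting as plain Cholesky would:

```python
import math

def signed_cholesky(K, p):
    """Factor K = L * diag(d) * L^T, with L lower triangular and
    d = (+1,...,+1, -1,...,-1) having p leading +1 entries.
    Assumes the pivots follow that sign pattern (quasi-definite K)."""
    n = len(K)
    A = [row[:] for row in K]                 # work on a copy
    L = [[0.0] * n for _ in range(n)]
    d = [1 if j < p else -1 for j in range(n)]
    for j in range(n):
        pivot = A[j][j]
        assert pivot * d[j] > 0, "pivot sign breaks the +/- pattern"
        L[j][j] = math.sqrt(abs(pivot))
        for i in range(j + 1, n):
            L[i][j] = A[i][j] / (d[j] * L[j][j])
        # symmetric rank-1 update of the trailing submatrix
        for i in range(j + 1, n):
            for k in range(j + 1, i + 1):
                A[i][k] -= d[j] * L[i][j] * L[k][j]
                A[k][i] = A[i][k]
    return L, d
```

Only one triangular factor is computed, consistent with the roughly 2× advantage over LU noted above.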

Distributing the 1st stage Schur complement matrix

  • All processors contribute to all of the elements of the (1,1) dense block

  • A large amount of inter-process communication occurs.

  • Possibly more costly than the factorization itself.

  • Solution: use a buffer to reduce the number of messages when doing a Reduce_scatter.

  • A symmetric (lower-triangle) approach also reduces the communication by half – only the lower triangle needs to be sent.

Reduce operations

  • Streamlined copying procedure - Lubin and Petra (2010)

    • Loop over contiguous memory and copy elements into the send buffer

    • Avoids divisions and modulus ops needed to compute the positions

  • “Symmetric” reduce – only the lower triangle is reduced

    • Fixed buffer size

    • A variable number of columns reduced.

  • Effectively halves the communication (both data & # of MPI calls).
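A pure-Python sketch of the symmetric reduce (hypothetical names; the real code packs into a fixed-size buffer and reduces a variable number of columns per MPI call):

```python
def pack_lower(M):
    """Packed row-major lower triangle (with diagonal) of a square matrix."""
    return [M[i][j] for i in range(len(M)) for j in range(i + 1)]

def reduce_lower(contributions):
    """Element-wise sum of the packed lower triangles, one per process;
    stands in for an MPI reduce over the packed buffers."""
    return [sum(vals) for vals in zip(*(pack_lower(M) for M in contributions))]

def unpack_lower(buf, n):
    """Rebuild the symmetric n x n matrix from its packed lower triangle."""
    M = [[0.0] * n for _ in range(n)]
    it = iter(buf)
    for i in range(n):
        for j in range(i + 1):
            M[i][j] = M[j][i] = next(it)
    return M
```

Packing n(n+1)/2 entries instead of n² roughly halves the data volume, and with a fixed buffer size the number of MPI calls drops by about the same factor.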

Large-scale performance

  • First-stage linear algebra: ScaLAPACK (LU), Elemental (LU), and the Cholesky-based LDLᵀ-like factorization

  • Strong scaling of PIPS (solver labels shown as slide images)

    • 90.1% from 64 to 1024 cores

    • 75.4% from 64 to 2048 cores

    • > 4,000 scenarios

SAA problem:

  • 1st stage variables: 82,000

  • Total # of variables: 189 million

  • Thermal units: 1,000

  • Wind farms: 1,200

Concluding remarks

  • PIPS – parallel interior-point solver for stochastic SAA problems

    • Largest SAA prob.

      • 189 Mil vars = 82k 1st-stage vars + 4k scens * 47k 2nd-stage vars

      • 2048 cores

  • Specialized linear algebra layer

    • Small-sized 1st-stage subproblems → DSC

    • Medium-sized 1st-stage → PSC

    • Large-sized 1st-stage → distributed SC

  • Current work: Scenario parallelization in a hybrid programming model MPI+SMP