Scalable Stochastic Programming

Cosmin Petra and Mihai Anitescu

Mathematics and Computer Science Division

Argonne National Laboratory

INFORMS Computing Society Conference

Monterey, California

January, 2011

Motivation

  • Sources of uncertainty in complex energy systems

    • Weather

    • Consumer Demand

    • Market prices

  • Applications @Argonne – Anitescu, Constantinescu, Zavala

    • Stochastic Unit Commitment with Wind Power Generation

    • Energy management of Co-generation

    • Economic Optimization of a Building Energy System

Stochastic Unit Commitment with Wind Power

Zavala’s SA2 talk

  • Wind forecast – WRF (Weather Research and Forecasting) model

    • Real-time grid-nested 24h simulation

    • 30 samples require 1h on 500 CPUs (Jazz@Argonne)

Slide courtesy of V. Zavala & E. Constantinescu

Optimization under Uncertainty

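  • Two-stage stochastic programming with recourse (“here-and-now”)

$$\min_{x}\; c^T x + \mathbb{E}_{\xi}\big[\mathcal{Q}(x,\xi)\big] \quad \text{subj. to } Ax = b,\; x \ge 0,$$

$$\mathcal{Q}(x,\xi) = \min_{y}\; q_{\xi}^T y \quad \text{subj. to } T_{\xi}\, x + W_{\xi}\, y = h_{\xi},\; y \ge 0.$$
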

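Sample average approximation (SAA), with N sampled scenarios $(q_i, T_i, W_i, h_i)$

$$\min_{x,\,y_1,\dots,y_N}\; c^T x + \frac{1}{N}\sum_{i=1}^{N} q_i^T y_i \quad \text{subj. to } Ax = b,\;\; T_i x + W_i y_i = h_i,\;\; x \ge 0,\; y_i \ge 0,\; i = 1,\dots,N$$

Inference analysis: M samples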
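Here is a minimal sketch of the SAA idea on a toy two-stage problem. The problem is chosen only for illustration and is not the unit-commitment model: a newsvendor with recourse. The first stage orders x units at unit cost c. Once the demand ξ is revealed, the shortfall is bought at a higher unit price q, assuming q > c > 0. The expectation is replaced by an average over N sampled demands. The helper names `saa_newsvendor` and `saa_replications` and the uniform demand distribution are hypothetical.

```python
import random

def saa_newsvendor(samples, c, q):
    """SAA of a two-stage newsvendor: order x at cost c, buy shortfall at q > c > 0."""
    n = len(samples)

    def saa_objective(x):
        # c*x + (1/N) * sum_i Q(x, xi_i), with recourse Q(x, xi) = q * max(xi - x, 0)
        return c * x + sum(q * max(xi - x, 0.0) for xi in samples) / n

    # The SAA objective is convex and piecewise linear with kinks at the samples;
    # its slope is c - q < 0 left of all samples and c > 0 right of them,
    # so some sample is a minimizer.
    best_x = min(sorted(samples), key=saa_objective)
    return best_x, saa_objective(best_x)

def saa_replications(m, n, c, q, seed=0):
    """Optimal values of M independent SAA problems, each with N fresh samples."""
    rng = random.Random(seed)
    values = []
    for _ in range(m):
        # Hypothetical demand distribution, chosen only for illustration.
        samples = [rng.uniform(0.0, 10.0) for _ in range(n)]
        values.append(saa_newsvendor(samples, c, q)[1])
    return values
```

For a minimization problem, the expected SAA optimal value is a lower bound on the true optimal value. Averaging the optimal values of M independent replications, as `saa_replications` does, is therefore one common way to estimate that bound statistically.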

Linear Algebra of Primal-Dual Interior-Point Methods

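Convex quadratic problem

$$\min_{x}\; \tfrac{1}{2} x^T Q x + c^T x \quad \text{subj. to } Ax = b,\; x \ge 0$$

IPM linear system

$$\begin{bmatrix} Q + D & A^T \\ A & 0 \end{bmatrix} \begin{bmatrix} \Delta x \\ \Delta \lambda \end{bmatrix} = r, \qquad D = X^{-1} S,\; X = \mathrm{diag}(x),\; S = \mathrm{diag}(s)$$

where $s$ are the multipliers of $x \ge 0$.

Multi-stage SP, two-stage SP → arrow-shaped linear system (via a permutation)

$$H = \begin{bmatrix} K_1 & & & B_1 \\ & \ddots & & \vdots \\ & & K_N & B_N \\ B_1^T & \cdots & B_N^T & K_0 \end{bmatrix}$$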

The Direct Schur Complement Method (DSC)

  • Uses the arrow shape of H

  • 1. Implicit factorization

  • 2. Solving Hz = r

    • 2.1. Back substitution

    • 2.2. Diagonal solve

    • 2.3. Forward substitution
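Here is a dense NumPy sketch of DSC on a small arrow-shaped system. It assumes diagonal scenario blocks K_1, …, K_N, border blocks B_i, and a 1st-stage block K_0 placed last. The 1st-stage Schur complement is C = K_0 − Σ_i B_i^T K_i^{-1} B_i. Dense `np.linalg.solve` calls stand in for the sparse symmetric indefinite factorizations that a real solver would compute once and reuse. The name `dsc_solve` is hypothetical.

```python
import numpy as np

def dsc_solve(K, B, K0, r, r0):
    """Solve the arrow-shaped system H z = r by a 1st-stage Schur complement.

    Rows i = 1..N:  K_i z_i + B_i z_0 = r_i
    Last row:       sum_i B_i^T z_i + K_0 z_0 = r_0
    """
    # 1. Implicit factorization: scenario solves and C = K_0 - sum_i B_i^T K_i^{-1} B_i
    KinvB = [np.linalg.solve(Ki, Bi) for Ki, Bi in zip(K, B)]
    C = K0 - sum(Bi.T @ X for Bi, X in zip(B, KinvB))
    # 2. Solve: eliminate the scenario blocks from the right-hand side ...
    Kinvr = [np.linalg.solve(Ki, ri) for Ki, ri in zip(K, r)]
    rhs0 = r0 - sum(Bi.T @ y for Bi, y in zip(B, Kinvr))
    # ... solve the 1st-stage (Schur complement) system ...
    z0 = np.linalg.solve(C, rhs0)
    # ... and recover every scenario block independently: z_i = K_i^{-1}(r_i - B_i z_0)
    z = [y - X @ z0 for y, X in zip(Kinvr, KinvB)]
    return z, z0
```

The code follows the elimination order: scenario blocks first, then the 1st-stage system, then the scenario unknowns. Which of these steps counts as "back" or "forward" substitution depends on where the 1st-stage block sits in the ordering, so the comments avoid those labels. Only the solve with C couples all scenarios; every other step is independent per scenario.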

Parallelizing DSC – 1. Factorization phase

[Figure: factorization work of Process 1, Process 2, …, Process p]

Factorization of the 1st stage Schur complement matrix = BOTTLENECK
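This is a serial simulation of the factorization phase. It assumes scenarios are split into contiguous, nearly equal chunks, one per process. Each process sums B_i^T K_i^{-1} B_i over its own scenarios with no communication. A reduction then adds the partial sums to form C = K_0 − Σ_i B_i^T K_i^{-1} B_i. The helpers `partition`, `local_contribution` and `parallel_schur_complement` are hypothetical; a real code would use MPI and sparse factorizations.

```python
import numpy as np

def partition(n_scen, p):
    """Split scenario indices 0..n_scen-1 into p contiguous, nearly equal chunks."""
    q, rem = divmod(n_scen, p)
    chunks, start = [], 0
    for rank in range(p):
        size = q + (1 if rank < rem else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks

def local_contribution(K, B, scenarios):
    """Work of one process: sum of B_i^T K_i^{-1} B_i over the scenarios it owns."""
    n0 = B[0].shape[1]
    acc = np.zeros((n0, n0))
    for i in scenarios:
        acc += B[i].T @ np.linalg.solve(K[i], B[i])
    return acc

def parallel_schur_complement(K, B, K0, p):
    """Simulate p processes forming C = K_0 - sum_i B_i^T K_i^{-1} B_i."""
    # Each "rank" works only on its own scenarios; no communication is needed here.
    partials = [local_contribution(K, B, chunk) for chunk in partition(len(K), p)]
    # A reduction (e.g. MPI_Allreduce in a real code) adds the partial sums.
    # The dense n0 x n0 matrix C must then be factored; that step does not
    # shrink as p grows, which is why it becomes the bottleneck.
    return K0 - sum(partials)
```

The result does not depend on p, only the distribution of the scenario work does. This is why the per-scenario part scales well while the 1st-stage factorization does not.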

Parallelizing DSC – 2. Backsolve

[Figure: backsolve work of Process 1, Process 2, …, Process p]

1st stage backsolve = BOTTLENECK

Scalability of DSC

Unit commitment: 76.7% efficiency,

but not always the case:

large number of 1st stage variables: 38.6% efficiency

(on Fusion @ Argonne)

Preconditioned Schur Complement (PSC)

(separate process)

REMOVES the factorization bottleneck

Slightly larger backsolve bottleneck

The Stochastic Preconditioner

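  • The exact structure of C (with scenario blocks $K_i$, border blocks $B_i$ and 1st-stage block $K_0$ of the arrow-shaped matrix H) is

$$C = K_0 - \sum_{i=1}^{N} B_i^T K_i^{-1} B_i$$

  • IID subset of n scenarios: $S = \{k_1, \dots, k_n\} \subset \{1, \dots, N\}$

  • The stochastic preconditioner (Petra & Anitescu, 2010)

$$M = K_0 - \frac{N}{n} \sum_{k \in S} B_k^T K_k^{-1} B_k$$

  • For C use the constraint preconditioner (Keller et al., 2000)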
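Here is a dense sketch of the sampling idea. It assumes C = K_0 − Σ_{i=1}^N B_i^T K_i^{-1} B_i and a preconditioner M built from a subset S of n scenarios. The subset sum is rescaled by N/n; that rescaling is our assumption about how the subset sum is made comparable to the full one. The function names are hypothetical, and the constraint-preconditioner step is not shown.

```python
import numpy as np

def schur_complement(K, B, K0, idx=None, scale=1.0):
    """K_0 - scale * sum_{i in idx} B_i^T K_i^{-1} B_i (idx defaults to all scenarios)."""
    if idx is None:
        idx = range(len(K))
    S = sum(B[i].T @ np.linalg.solve(K[i], B[i]) for i in idx)
    return K0 - scale * S

def stochastic_preconditioner(K, B, K0, n, rng):
    """Sampled approximation M of C built from n of the N scenarios."""
    N = len(K)
    # The N scenarios are themselves IID samples, so any n of them form an
    # IID sample too; here the subset is drawn at random without replacement.
    subset = rng.choice(N, size=n, replace=False)
    # Rescale by N/n so the subset sum estimates the full sum over N scenarios.
    return schur_complement(K, B, K0, subset, scale=N / n)
```

With n = N the subset contains every scenario and M coincides with C. For n < N, the spread of `np.linalg.eigvals(np.linalg.solve(M, C))` around 1 gives a quick feel for the preconditioner's quality.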

The “Ugly” Unit Commitment Problem

  • DSC on P processes vs PSC on P+1 processes

Optimal use of PSC – linear scaling

  • 120 scenarios

Factorization of the preconditioner cannot be hidden anymore.

Quality of the Stochastic Preconditioner

  • “Exponentially” better preconditioning (Petra & Anitescu 2010)

  • Proof: Hoeffding’s inequality

  • Assumptions on the problem’s random data

    • Boundedness

    • Uniform full rank of and

These assumptions are not restrictive.
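For reference, here is the standard form of Hoeffding’s inequality. If $X_1, \dots, X_n$ are independent random variables with $a \le X_k \le b$, then for every $t > 0$:

$$\Pr\left( \left| \frac{1}{n}\sum_{k=1}^{n} X_k - \frac{1}{n}\sum_{k=1}^{n}\mathbb{E}[X_k] \right| \ge t \right) \le 2\exp\left( -\frac{2 n t^2}{(b-a)^2} \right)$$

The bound decays exponentially in the sample size n, which is presumably where the “exponentially” comes from. The boundedness assumption is what makes the inequality applicable. How Petra & Anitescu (2010) apply it to the matrix-valued sampled sums is not reproduced here.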

Quality of the Constraint Preconditioner

  • has an eigenvalue 1 with order of multiplicity .

  • The rest of the eigenvalues satisfy

  • Proof: based on Bergamaschi et al., 2004.
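A well-known reference point for this kind of result comes from Keller, Gould and Wathen (2000). Take K = [[H, A^T], [A, 0]] and the constraint preconditioner P = [[G, A^T], [A, 0]], where the m × n matrix A has full row rank. Then P^{-1}K has the eigenvalue 1 with multiplicity 2m. Its remaining n − m eigenvalues are those of Z^T H Z v = λ Z^T G Z v, where Z is a basis of the null space of A. The NumPy sketch checks this numerically. The choice G = diag(H) and the helper names are assumptions made for illustration.

```python
import numpy as np

def saddle(G, A):
    """Assemble the saddle-point matrix [[G, A^T], [A, 0]]."""
    m = A.shape[0]
    return np.block([[G, A.T], [A, np.zeros((m, m))]])

def constraint_preconditioned_spectrum(H, A, G):
    """Eigenvalues of P^{-1} K, where P keeps K's constraint blocks but replaces H by G."""
    K, P = saddle(H, A), saddle(G, A)
    return np.linalg.eigvals(np.linalg.solve(P, K))

def reduced_eigenvalues(H, A, G):
    """Eigenvalues of (Z^T G Z)^{-1} Z^T H Z, with Z a basis of null(A)."""
    m = A.shape[0]
    _, _, Vt = np.linalg.svd(A)
    Z = Vt[m:].T  # rows m.. of V^T span null(A) when A has full row rank m
    return np.linalg.eigvals(np.linalg.solve(Z.T @ G @ Z, Z.T @ H @ Z))
```

The eigenvalue 1 is typically defective, with Jordan blocks in the preconditioned matrix, so its computed copies scatter slightly around 1 (on the order of the square root of machine precision). A small tolerance is therefore needed when counting them.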

Performance of the preconditioner

  • Eigenvalue clustering & Krylov iterations

  • Affected by the well-known ill-conditioning of IPMs.

Solution 2: parallelization of the 1st stage linear algebra

Parallelizing the 1st stage linear algebra

  • We distribute the 1st stage Schur complement system.

  • C is treated as dense.

  • Alternative to PSC for problems with large number of 1st stage variables.

  • Removes the memory bottleneck of PSC and DSC.

  • We investigated ScaLapack and Elemental (successor of PLAPACK)

    • Neither has a solver for symmetric indefinite matrices (Bunch-Kaufman);

    • LU or Cholesky only.

    • So we had to think about modifying one of them.

[Matrix annotations: dense symm. pos. def.; sparse, full rank]

ScaLapack (ORNL)

  • Classical block distribution of the matrix

  • Blocked “down-looking” Cholesky - algorithmic blocks

    • Size of algorithmic block = size of distribution block!

  • For cache-performance - large algorithmic blocks

  • For good load balancing - small distribution blocks

  • Must trade off cache-performance for load balancing

  • Communication: basic MPI calls

  • Inflexible in working with sub-blocks

Elemental (UT Austin)

  • Unconventional “elemental” distribution: blocks of size 1.

  • Size of algorithmic block ≠ size of distribution block

  • Both cache-performance (large alg. blocks) and load balancing (distrib. blocks of size 1)

  • Communication

    • More sophisticated MPI calls

    • Overhead O(log √p), where p is the number of processors.

  • Sub-block friendly

  • Better performance than ScaLapack in a hybrid MPI+SMP approach
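This sketch shows the load-balancing side of the trade-off. It counts how many lower-triangle entries each process owns, since the lower triangle is the part a Cholesky-type factorization works on. The layout is a 2D block-cyclic distribution with square nb × nb blocks, the layout ScaLAPACK uses; setting nb = 1 gives Elemental's element-wise distribution. The grid shape and block size are arbitrary illustration values, and `owner` and `lower_triangle_load` are hypothetical helpers.

```python
def owner(i, j, nb, pr, pc):
    """Grid coordinates of the process owning global entry (i, j) in a 2D
    block-cyclic layout with nb x nb blocks on a pr x pc process grid.
    nb = 1 gives the element-wise ("elemental") distribution."""
    return (i // nb) % pr, (j // nb) % pc

def lower_triangle_load(n, nb, pr, pc):
    """Number of lower-triangle entries (i >= j) of an n x n matrix per process."""
    load = {(r, c): 0 for r in range(pr) for c in range(pc)}
    for j in range(n):
        for i in range(j, n):
            load[owner(i, j, nb, pr, pc)] += 1
    return load
```

Take an 8 × 8 matrix on a 2 × 2 grid. With blocks of size 4, one process owns no lower-triangle entries and another owns 16. With nb = 1 the counts are 10, 6, 10 and 10.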

Cholesky-based LDL^T-like factorization

  • Can be viewed as an “implicit” normal equations approach.

  • In-place implementation inside Elemental: no extra memory needed.

  • Idea: modify the Cholesky factorization by changing the sign after processing the first p columns.

  • It is much easier to do in Elemental, since it distributes elements, not blocks.

  • Twice as fast as LU

  • Works for more general saddle-point linear systems, i.e., with a pos. semi-def. (2,2) block.
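Here is a minimal dense sketch of the sign-flipping idea. It assumes a quasidefinite saddle-point matrix K = [[Q, A^T], [A, −R]], where Q (p × p) is positive definite, A has full row rank and R is positive semidefinite. We write the (2,2) block as −R; that sign convention is our assumption. Under these assumptions the first p pivots are positive and the rest are negative. A column-by-column Cholesky that flips the sign after p columns therefore gives K = L D L^T with D = diag(+1, …, +1, −1, …, −1). Unlike the in-place Elemental implementation, this version writes a separate L for clarity. The name `signed_cholesky` is hypothetical.

```python
import numpy as np

def signed_cholesky(K, p):
    """Factor K = L D L^T with D = diag(+1 (first p), -1 (rest)), L lower triangular."""
    n = K.shape[0]
    L = np.zeros_like(K, dtype=float)
    # Sign pattern: positive pivots for the (1,1) block, negative afterwards.
    d = np.where(np.arange(n) < p, 1.0, -1.0)
    for j in range(n):
        # Pivot: d_j * L_jj^2 = K_jj - sum_{k<j} d_k * L_jk^2
        pivot = K[j, j] - np.sum(d[:j] * L[j, :j] ** 2)
        if d[j] * pivot <= 0:
            raise np.linalg.LinAlgError("pivot has the wrong sign; K is not quasidefinite")
        L[j, j] = np.sqrt(d[j] * pivot)
        # Below the diagonal: L_ij = (K_ij - sum_{k<j} d_k L_ik L_jk) / (d_j L_jj)
        for i in range(j + 1, n):
            L[i, j] = (K[i, j] - np.sum(d[:j] * L[i, :j] * L[j, :j])) / (d[j] * L[j, j])
    return L, d
```

Because D is just a fixed sign pattern, the loop is ordinary Cholesky with one sign switch. This is also why a distributed Cholesky code can be adapted to it rather than requiring a full symmetric indefinite (Bunch-Kaufman) solver.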

Distributing the 1st stage Schur complement matrix

  • All processors contribute to all of the elements of the (1,1) dense block

  • A large amount of inter-process communication occurs.

  • Possibly more costly than the factorization itself.

  • Solution: use a buffer to reduce the number of messages when doing a Reduce_scatter.

  • This approach also halves the communication: only the lower triangle needs to be sent.

Reduce operations

  • Streamlined copying procedure - Lubin and Petra (2010)

    • Loop over contiguous memory and copy elements into the send buffer

    • Avoids the divisions and modulus operations needed to compute the positions

  • “Symmetric” reduce for

  • Only the lower triangle is reduced

    • Fixed buffer size

    • A variable number of columns reduced.

  • Effectively halves the communication (both data & # of MPI calls).
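This is a serial simulation of the symmetric reduce. It assumes every process holds a full symmetric n × n contribution. Lower-triangle columns are packed into a buffer of fixed size, and each chunk holds as many whole columns as fit, so the number of columns per chunk varies. The packed buffers are summed elementwise in place of an MPI reduction. This is not the Lubin and Petra (2010) code; `column_chunks`, `pack`, `symmetric_reduce` and `buf_size` are hypothetical.

```python
import numpy as np

def column_chunks(n, buf_size):
    """Group lower-triangle columns into chunks that fit a fixed-size buffer.

    Column j of the lower triangle has n - j entries, so later (shorter)
    columns pack more densely: each chunk holds a variable number of columns.
    """
    chunks, start, used = [], 0, 0
    for j in range(n):
        col_len = n - j
        if col_len > buf_size:
            raise ValueError("buffer must hold at least one full column")
        if used + col_len > buf_size:
            chunks.append((start, j))
            start, used = j, 0
        used += col_len
    chunks.append((start, n))
    return chunks

def pack(M, j0, j1):
    # Copy the lower-triangle part of columns j0..j1-1; each M[j:, j] is a
    # contiguous run in column-major storage, so no index arithmetic is needed.
    return np.concatenate([M[j:, j] for j in range(j0, j1)])

def symmetric_reduce(contribs, buf_size):
    """Sum symmetric matrices from all 'processes', sending only lower triangles."""
    n = contribs[0].shape[0]
    total = np.zeros((n, n))
    messages = 0
    for j0, j1 in column_chunks(n, buf_size):
        # One reduction per chunk: elementwise sum of every process's buffer
        # (stands in for an MPI reduction on the packed buffers).
        summed = sum(pack(C, j0, j1) for C in contribs)
        messages += 1
        pos = 0
        for j in range(j0, j1):
            total[j:, j] = summed[pos:pos + n - j]
            pos += n - j
    # Mirror the strictly lower triangle to rebuild the full symmetric matrix.
    total = np.tril(total) + np.tril(total, -1).T
    return total, messages
```

For n = 6 and a 10-entry buffer, the 21 lower-triangle entries travel in three chunks of 1, 2 and 3 columns. The full matrix would have 36 entries.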

Large-scale performance

  • First-stage linear algebra: ScaLapack (LU), Elemental(LU), and

  • Strong scaling of PIPS with and

    • 90.1% from 64 to 1024 cores

    • 75.4% from 64 to 2048 cores

    • > 4,000 scenarios

SAA problem:

  • 1st stage variables: 82,000

  • Total # of variables: 189 million

  • Thermal units: 1,000

  • Wind farms: 1,200
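The strong-scaling efficiencies above can be converted into speedups. This assumes the usual definition relative to the 64-core run, E_p = (64 · T_64) / (p · T_p), so that the speedup is T_64 / T_p = E_p · p / 64. The function `implied_speedup` is a hypothetical helper.

```python
def implied_speedup(efficiency, base_cores, cores):
    """Speedup over the base run implied by a strong-scaling efficiency.

    Assumes efficiency = (T_base * base_cores) / (T_p * cores), i.e. measured
    against ideal linear speedup from the base core count.
    """
    return efficiency * cores / base_cores

# Figures from the slide: 64 -> 1024 cores at 90.1%, 64 -> 2048 cores at 75.4%.
s1024 = implied_speedup(0.901, 64, 1024)  # 0.901 * 16
s2048 = implied_speedup(0.754, 64, 2048)  # 0.754 * 32
```

Under this definition, the 1024-core run is about 14.4 times faster than the 64-core run (ideal: 16). The 2048-core run is about 24.1 times faster (ideal: 32).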

Concluding remarks

  • PIPS – parallel interior-point solver for stochastic SAA problems

    • Largest SAA problem

      • 189 Mil vars = 82k 1st-stage vars + 4k scens * 47k 2nd-stage vars

      • 2048 cores

  • Specialized linear algebra layer

    • Small-sized 1st-stage subproblems DSC

    • Medium-sized 1st-stage  PSC

    • Large-sized 1st-stage  Distributed SC

  • Current work: Scenario parallelization in a hybrid programming model MPI+SMP