Scalable Stochastic Programming

Cosmin Petra and Mihai Anitescu

Mathematics and Computer Science Division

Argonne National Laboratory

Informs Computing Society Conference

Monterey, California

January, 2011

petra@mcs.anl.gov

Motivation
  • Sources of uncertainty in complex energy systems
    • Weather
    • Consumer Demand
    • Market prices
  • Applications @Argonne – Anitescu, Constantinescu, Zavala
    • Stochastic Unit Commitment with Wind Power Generation
    • Energy management of Co-generation
    • Economic Optimization of a Building Energy System
Stochastic Unit Commitment with Wind Power

Zavala’s SA2 talk

  • Wind Forecast – WRF (Weather Research and Forecasting) Model
    • Real-time grid-nested 24h simulation
    • 30 samples require 1h on 500 CPUs (Jazz@Argonne)

Slide courtesy of V. Zavala & E. Constantinescu

Optimization under Uncertainty
  • Two-stage stochastic programming with recourse (“here-and-now”)
    • Slide shows the two-stage formulation (objective plus “subj. to” constraints for each stage), for both continuous and discrete distributions of the uncertainty
  • Sample average approximation (SAA)
    • Sampling with M samples; inference analysis
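For reference, a standard way to write the two-stage recourse model and its SAA; the symbols c, A, b, q(ξ), T, W, h and the scenario index i are generic notation assumed here, not necessarily the slide's:

\begin{align*}
\min_{x}\;& c^T x + \mathbb{E}_{\xi}\big[\,Q(x,\xi)\,\big]
  && \text{subj. to } A x = b,\; x \ge 0,\\
Q(x,\xi) = \min_{y}\;& q(\xi)^T y
  && \text{subj. to } W(\xi)\,y = h(\xi) - T(\xi)\,x,\; y \ge 0,
\end{align*}

and the sample average approximation over M sampled scenarios \xi_1,\dots,\xi_M:

\begin{align*}
\min_{x,\,y_1,\dots,y_M}\;& c^T x + \frac{1}{M}\sum_{i=1}^{M} q_i^T y_i
  && \text{subj. to } A x = b,\quad T_i x + W_i y_i = h_i,\quad x \ge 0,\; y_i \ge 0.
\end{align*}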

Linear Algebra of Primal-Dual Interior-Point Methods

  • Convex quadratic problem (min + “subj. to” constraints) and the IPM linear system it generates
  • Multi-stage SP: nested structure; two-stage SP: arrow-shaped linear system (via a permutation)
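A sketch of the arrow-shaped system, in block notation assumed for the notes below: K_i denotes the KKT block of scenario i, K_0 the 1st-stage KKT block, and B_i the blocks coupling scenario i to the 1st-stage variables.

\[
H z = r,
\qquad
H =
\begin{bmatrix}
K_1 &        &     & B_1 \\
    & \ddots &     & \vdots \\
    &        & K_N & B_N \\
B_1^T & \cdots & B_N^T & K_0
\end{bmatrix},
\qquad
z = \begin{bmatrix} z_1 \\ \vdots \\ z_N \\ z_0 \end{bmatrix},
\qquad
r = \begin{bmatrix} r_1 \\ \vdots \\ r_N \\ r_0 \end{bmatrix}.
\]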


The Direct Schur Complement Method (DSC)

  • Uses the arrow shape of H
  • 1. Implicit factorization
  • 2. Solving Hz = r
    • 2.1. Back substitution
    • 2.2. Diagonal solve
    • 2.3. Forward substitution
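In the same assumed notation, the DSC steps listed above read roughly as (a sketch, not the exact slide formulas):

\begin{align*}
\text{1. implicit factorization:}\quad
  & C = K_0 - \sum_{i=1}^{N} B_i^T K_i^{-1} B_i \quad\text{(1st-stage Schur complement, factorized once)},\\
\text{2.1. back substitution:}\quad
  & \tilde r_0 = r_0 - \sum_{i=1}^{N} B_i^T K_i^{-1} r_i,\\
\text{2.2. diagonal solve:}\quad
  & C\, z_0 = \tilde r_0,\\
\text{2.3. forward substitution:}\quad
  & K_i\, z_i = r_i - B_i\, z_0, \qquad i = 1, \dots, N.
\end{align*}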

Parallelizing DSC – 1. Factorization phase

  • (Diagram: scenario factorizations distributed across processes 1, 2, …, p; step 2 is the backsolve)
  • Factorization of the 1st stage Schur complement matrix = BOTTLENECK
Parallelizing DSC – 2. Backsolve

  • (Diagram: per-scenario backsolves distributed across processes 1, 2, …, p; step 1 is the factorization)
  • 1st stage backsolve = BOTTLENECK

Scalability of DSC

  • Unit commitment: 76.7% efficiency on Fusion @ Argonne
  • But not always the case – large number of 1st stage variables: 38.6% efficiency

Preconditioned Schur Complement (PSC)

  • The preconditioner is built and factorized on a separate process
  • REMOVES the factorization bottleneck
  • Slightly larger backsolve bottleneck
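A rough, self-contained sketch of the PSC idea with synthetic data: the Krylov solve for the 1st-stage system is preconditioned by the Schur complement of a small i.i.d. subset of scenarios, factorized once (in PSC, on the separate process). The names, sizes, the N/n scaling, and the choice of GMRES are illustrative assumptions, not the authors' implementation (which uses the constraint preconditioner described on the next slides).

import numpy as np
from scipy.linalg import lu_factor, lu_solve
from scipy.sparse.linalg import LinearOperator, gmres

rng = np.random.default_rng(0)
n0, n1, N, n_sub = 30, 60, 200, 20            # 1st/2nd-stage sizes, #scenarios, subset size

M0 = rng.standard_normal((n0, n0))
K0 = M0 @ M0.T + n0 * np.eye(n0)
Ks, Bs = [], []
for i in range(N):                            # SPD stand-ins for the scenario KKT blocks
    Mi = rng.standard_normal((n1, n1))
    Ks.append(Mi @ Mi.T + n1 * np.eye(n1))
    Bs.append(rng.standard_normal((n1, n0)))

def schur_complement(indices, scale):
    S = K0.copy()
    for i in indices:
        S -= scale * (Bs[i].T @ np.linalg.solve(Ks[i], Bs[i]))
    return S

C = schur_complement(range(N), 1.0)           # exact 1st-stage Schur complement
subset = rng.choice(N, size=n_sub, replace=False)
M_pre = schur_complement(subset, N / n_sub)   # stochastic preconditioner (assumed scaling)
lu, piv = lu_factor(M_pre)                    # factorized once; reused at every Krylov iteration

precond = LinearOperator((n0, n0), matvec=lambda v: lu_solve((lu, piv), v))
r0 = rng.standard_normal(n0)
z0, info = gmres(C, r0, M=precond)
print(info, np.linalg.norm(C @ z0 - r0))      # info == 0 on convergence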

The Stochastic Preconditioner
  • The exact structure of C is given on the slide
  • An i.i.d. subset of n scenarios is used to build the stochastic preconditioner (Petra & Anitescu, 2010)
  • For C, use the constraint preconditioner (Keller et al., 2000)
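The exact expression for C is an image on the slide. One way to write the underlying idea, in the assumed block notation from above (the exact weighting used by Petra & Anitescu may differ), is

\[
C \;=\; K_0 \;-\; \sum_{i=1}^{N} B_i^T K_i^{-1} B_i,
\qquad
M_n \;=\; K_0 \;-\; \frac{N}{n} \sum_{i \in S} B_i^T K_i^{-1} B_i,
\qquad |S| = n \ll N,
\]

with S an i.i.d. subset of the scenarios: since the full sum behaves like N times a sample average of i.i.d. terms, the sub-sampled, rescaled sum in M_n approximates C well.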
The “Ugly” Unit Commitment Problem
  • DSC on P processes vs. PSC on P+1 processes
  • 120 scenarios
  • Optimal use of PSC – linear scaling
  • Factorization of the preconditioner cannot be hidden anymore

Quality of the Stochastic Preconditioner
  • “Exponentially” better preconditioning (Petra & Anitescu, 2010)
  • Proof: Hoeffding inequality
  • Assumptions on the problem’s random data (not restrictive):
    • Boundedness
    • Uniform full rank of two problem matrices (given on the slide)

Quality of the Constraint Preconditioner
  • The preconditioned matrix has an eigenvalue 1, with the order of multiplicity given on the slide
  • The rest of the eigenvalues satisfy a bound given on the slide
  • Proof: based on Bergamaschi et al., 2004
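The specific multiplicity and eigenvalue bound are images on the slide. For orientation, the general shape of the Keller et al. constraint preconditioner for a saddle-point C is (notation assumed, not taken from the slide):

\[
C = \begin{bmatrix} Q_0 & A_0^T \\ A_0 & 0 \end{bmatrix},
\qquad
M = \begin{bmatrix} G & A_0^T \\ A_0 & 0 \end{bmatrix},
\qquad G \approx Q_0,
\]

i.e. the constraint blocks are kept exact and only the (1,1) block is approximated (here, by its stochastic estimate). The preconditioned matrix then has a large cluster of eigenvalues at 1, and the remaining eigenvalues are controlled by how well G approximates Q_0.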
Performance of the preconditioner
  • Eigenvalue clustering & Krylov iterations
  • Affected by the well-known ill-conditioning of IPMs.
Parallelizing the 1st stage linear algebra
  • We distribute the 1st stage Schur complement system.
  • C is treated as dense.
  • Alternative to PSC for problems with a large number of 1st stage variables.
  • Removes the memory bottleneck of PSC and DSC.
  • We investigated ScaLapack and Elemental (successor of PLAPACK)
    • Neither has a solver for symmetric indefinite matrices (Bunch-Kaufman); LU or Cholesky only
    • So we had to modify one of them
  • (On the slide: C has a dense symmetric positive definite block and a sparse full-rank block)

ScaLapack (ORNL)
  • Classical block distribution of the matrix
  • Blocked “down-looking” Cholesky - algorithmic blocks
    • Size of algorithmic block = size of distribution block!
  • For cache-performance - large algorithmic blocks
  • For good load balancing - small distribution blocks
  • Must trade off cache-performance for load balancing
  • Communication: basic MPI calls
  • Inflexible in working with sub-blocks
Elemental (UT Austin)
  • Unconventional “elemental” distribution: blocks of size 1.
  • Size of algorithmic block ≠ size of distribution block
  • Both cache-performance (large alg. blocks) and load balancing (distrib. blocks of size 1)
  • Communication
    • More sophisticated MPI calls
    • Overhead O(log(sqrt(p))), p is the number of processors.
  • Sub-blocks friendly
  • Better performance in a hybrid approach, MPI+SMP, than ScaLapack
Cholesky-based LDL^T-like factorization
  • Can be viewed as an “implicit” normal equations approach.
  • In-place implementation inside Elemental: no extra memory needed.
  • Idea: modify the Cholesky factorization by changing the sign after processing p columns.
  • This is much easier to do in Elemental, since it distributes elements, not blocks.
  • Twice as fast as LU
  • Works for more general saddle-point linear systems, i.e., pos. semi-def. (2,2) block.
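A small dense numpy sketch of the sign-flipping idea for a saddle-point matrix K = [[Q, A^T], [A, 0]] with Q symmetric positive definite and A of full row rank: ordinary Cholesky handles the first p columns, then the sign of the trailing Schur complement is flipped and Cholesky continues, giving K = L D L^T with D = diag(I_p, -I_m). This illustrates the idea only; it is not the blocked, distributed Elemental implementation.

import numpy as np

def signed_cholesky(K, p):
    """Factor K = L @ D @ L.T with D = diag(I_p, -I_m).
    Assumes the leading p-by-p block of K is positive definite and the
    trailing Schur complement is negative definite (true for saddle-point
    matrices [[Q, A.T], [A, 0]] with Q SPD and A of full row rank)."""
    n = K.shape[0]
    L = np.zeros((n, n))
    # ordinary Cholesky on the first p columns
    L[:p, :p] = np.linalg.cholesky(K[:p, :p])
    L[p:, :p] = np.linalg.solve(L[:p, :p], K[:p, p:]).T          # = A @ inv(L11).T
    # trailing Schur complement is negative definite: flip the sign, keep going
    S22 = K[p:, p:] - L[p:, :p] @ L[p:, :p].T
    L[p:, p:] = np.linalg.cholesky(-S22)
    D = np.diag(np.concatenate([np.ones(p), -np.ones(n - p)]))
    return L, D

# tiny check on a random saddle-point system
rng = np.random.default_rng(0)
p, m = 6, 3
Q = rng.standard_normal((p, p)); Q = Q @ Q.T + p * np.eye(p)
A = rng.standard_normal((m, p))
K = np.block([[Q, A.T], [A, np.zeros((m, m))]])
L, D = signed_cholesky(K, p)
print(np.allclose(L @ D @ L.T, K))                               # True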
Distributing the 1st stage Schur complement matrix
  • All processors contribute to all of the elements of the (1,1) dense block
  • A large amount of inter-process communication occurs.
  • Possibly more costly than the factorization itself.
  • Solution: use a buffer to reduce the number of messages when doing a Reduce_scatter.
  • This approach also reduces the communication by half – only the lower triangle needs to be sent.
Reduce operations
  • Streamlined copying procedure - Lubin and Petra (2010)
    • Loop over contiguous memory and copy elements into the send buffer
    • Avoids the division and modulus ops needed to compute the positions
  • “Symmetric” reduce for C
  • Only the lower triangle is reduced
    • Fixed buffer size
    • A variable number of columns reduced at a time
  • Effectively halves the communication (both data & # of MPI calls).
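A minimal mpi4py sketch of the lower-triangle-only reduce with a fixed-size buffer: each rank packs a slice of the lower triangle of its local contribution, the slices are summed onto rank 0 chunk by chunk, and rank 0 mirrors the result. The buffer size, the packing by fancy indexing, and the use of MPI Reduce (rather than PIPS's Reduce_scatter and its streamlined copy loop) are illustrative simplifications.

import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

n0 = 500                                            # 1st-stage dimension (toy value)
local_C = np.random.default_rng(rank).standard_normal((n0, n0))
local_C = local_C + local_C.T                       # symmetric local contribution

rows, cols = np.tril_indices(n0)                    # lower-triangle index pattern
tri = np.zeros(rows.size) if rank == 0 else None    # packed result on rank 0

chunk = 64 * 1024                                   # fixed buffer size, in doubles
for start in range(0, rows.size, chunk):
    stop = min(start + chunk, rows.size)
    sendbuf = local_C[rows[start:stop], cols[start:stop]]   # pack a lower-triangle slice
    recvbuf = tri[start:stop] if rank == 0 else None
    comm.Reduce(sendbuf, recvbuf, op=MPI.SUM, root=0)

if rank == 0:                                       # unpack and mirror to the upper triangle
    C = np.zeros((n0, n0))
    C[rows, cols] = tri
    C = C + np.tril(C, -1).T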
Large-scale performance
  • First-stage linear algebra: ScaLapack (LU), Elemental (LU), and the Cholesky-based LDL^T-like factorization
  • Strong scaling of PIPS:
    • 90.1% from 64 to 1024 cores
    • 75.4% from 64 to 2048 cores
    • > 4,000 scenarios
  • SAA problem:
    • 1st stage variables: 82,000
    • Total # of variables: 189 million
    • Thermal units: 1,000
    • Wind farms: 1,200

Concluding remarks
  • PIPS – parallel interior-point solver for stochastic SAA problems
    • Largest SAA prob.
      • 189 Mil vars = 82k 1st-stage vars + 4k scens * 47k 2nd-stage vars
      • 2048 cores
  • Specialized linear algebra layer
    • Small-sized 1st-stage subproblems → DSC
    • Medium-sized 1st-stage → PSC
    • Large-sized 1st-stage → Distributed SC
  • Current work: Scenario parallelization in a hybrid programming model MPI+SMP