Efficient Parallel Software for Large-Scale Semidefinite Programs

Makoto Yamashita @ Tokyo-Tech • Katsuki Fujisawa @ Chuo University • MSC 2010 @ Yokohama [2010/09/08]

Outline • SemiDefinite Programming • Conversion of stability condition for differential inclusions to an SDP • Primal-Dual Interior-Point Methods and their parallel implementation • Numerical Results

Many Applications of SDP • Control Theory • Stability Condition for Differential Inclusions • Discrete-Time Optimal Control Problem • Via SDP relaxation • Polynomial Optimization Problem • Sensor Network Problem • Quadratic Assignment Problem • Quantum Chemistry/Information • Large SDP ⇒ Parallel Solver

Standard form of SDP
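
The formulas on this slide did not survive in the transcript. As a reference point, the standard form given in the SDPA documentation is written below; that the slide showed exactly this form is an assumption. Here F_0, ..., F_m are symmetric data matrices, c is a vector in R^m, and U • V denotes the trace inner product.

```latex
\begin{aligned}
(P)\quad & \text{minimize}   && \sum_{k=1}^{m} c_k x_k \\
         & \text{subject to} && X = \sum_{k=1}^{m} F_k x_k - F_0, \quad X \succeq O, \\[4pt]
(D)\quad & \text{maximize}   && F_0 \bullet Y \\
         & \text{subject to} && F_k \bullet Y = c_k \ (k = 1, \dots, m), \quad Y \succeq O,
\end{aligned}
\qquad U \bullet V = \operatorname{tr}(U^{\top} V).
```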

Stability condition for differential inclusions to standard SDP [Boyd et al] • . • Does the solution remain in a bounded region? • i.e., • Yes, if

Conversion to SDP • . • To hold this inequality, • Bounding the condition number ⇒ SDP.

SDP from SCDI • . • Feasible solution ⇒ Boundedness of the solution • Some translation to standard SDP is needed, e.g. by YALMIP [J. Löfberg].

Discrete-Time Optimal Control Problems • This problem [Coleman et al] can be formulated as an SDP via SparsePOP [Kim et al].

Primal-Dual Interior-Point Methods • Solve both primal and dual simultaneously in polynomial time • Many software packages have been developed • SDPA [Yamashita et al] • SDPT3 [Toh et al] • SeDuMi [Sturm et al] • CSDP [Borchers et al]

Algorithmic Framework of Primal-Dual Interior-Point Methods • [Figure labels: Feasible Region, Central Path, Initial Point, Target Point, Search Direction, Step Length to keep interior property, Optimal Solution] • Most of the computation time is consumed by the Search Direction

Bottlenecks in PDIPM and SDPARA • To obtain the direction, we solve • ELEMENTS • CHOLESKY • In SDPARA, parallel computation is applied to these two bottlenecks • Xeon X5460, 3.16GHz
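
The equation on this slide is missing from the transcript. In the SDPA literature the search direction comes from the Schur complement system B dx = r with B_ij = (X^{-1} F_i Y) • F_j, and the sketch below assumes that form. ELEMENTS corresponds to forming B and CHOLESKY to factorizing it. The function names and the random test data are hypothetical, and this is serial NumPy, not SDPARA's implementation.

```python
import numpy as np


def schur_elements(F, X, Y):
    """ELEMENTS step: form B with B[i, j] = trace(X^{-1} F_i Y F_j)."""
    m = len(F)
    X_inv = np.linalg.inv(X)
    B = np.empty((m, m))
    for i in range(m):
        U = X_inv @ F[i] @ Y              # shared by every entry of row i
        for j in range(m):
            B[i, j] = np.sum(U * F[j].T)  # trace(U @ F_j) without the product
    return B


def solve_schur(B, r):
    """CHOLESKY step: factor B = L L^T, then solve with L and with L^T."""
    L = np.linalg.cholesky(B)
    z = np.linalg.solve(L, r)       # generic solver used for the triangular system
    return np.linalg.solve(L.T, z)


def random_instance(m=4, n=5, seed=0):
    """Hypothetical test data: symmetric F_k, positive definite X and Y."""
    rng = np.random.default_rng(seed)
    F = [(A + A.T) / 2 for A in rng.standard_normal((m, n, n))]
    G, H = rng.standard_normal((2, n, n))
    X = G @ G.T + n * np.eye(n)
    Y = H @ H.T + n * np.eye(n)
    return F, X, Y
```

With positive definite X and Y and linearly independent F_k, B is symmetric positive definite, which is why a Cholesky factorization applies.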

Nonzero pattern of Schur complement matrix (B) • DTOC: sparse Schur complement matrix • SCDI: fully dense Schur complement matrix

Exploitation of Sparsity in SDPA • We change the formula row by row • We keep this scheme in the parallel computation • F1 F2 F3
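
The slide's figure is lost, so the following is only a sketch of what "changing the formula" can mean. It assumes that F1, F2, F3 refer to three equivalent ways of evaluating one element B_ij = trace(X^{-1} F_i Y F_j), as described in the SDPA literature: a fully dense product, a product that skips the zeros of F_j, and a double sum over the nonzeros of both matrices. The function names are hypothetical.

```python
import numpy as np


def f1(Fi, Fj, X_inv, Y):
    """F1: form the full product U = X^{-1} F_i Y, then take trace(U F_j)."""
    U = X_inv @ Fi @ Y
    return float(np.sum(U * Fj.T))


def f2(Fi, Fj, X_inv, Y):
    """F2: form V = F_i Y once, then visit only the nonzeros of F_j."""
    V = Fi @ Y
    total = 0.0
    for b, a in zip(*np.nonzero(Fj)):
        # F_j[b, a] multiplies entry (a, b) of X^{-1} V
        total += Fj[b, a] * (X_inv[a, :] @ V[:, b])
    return float(total)


def f3(Fi, Fj, X_inv, Y):
    """F3: visit only the nonzeros of both F_i and F_j; no matrix products."""
    total = 0.0
    for c, d in zip(*np.nonzero(Fi)):
        for b, a in zip(*np.nonzero(Fj)):
            total += Fj[b, a] * X_inv[a, c] * Fi[c, d] * Y[d, b]
    return float(total)
```

All three return the same value; they differ only in cost, which depends on how many nonzeros F_i and F_j have. That is what makes a per-row choice worthwhile.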

Row-wise distribution for dense Schur complement matrix • 4 CPUs are available • Each CPU computes only its assigned rows • . • No communication between CPUs • Efficient memory management
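
A minimal sketch of the row-wise idea, assuming a cyclic (round-robin) assignment of rows to CPUs; the slide only states that each CPU computes its assigned rows, so the cyclic layout and all names here are illustrative. The `entry` callable stands in for whatever evaluates one element of B.

```python
def assigned_rows(rank, num_rows, num_cpus):
    """Rows owned by CPU `rank` under a cyclic (round-robin) layout."""
    return list(range(rank, num_rows, num_cpus))


def compute_local_rows(rank, num_rows, num_cpus, entry):
    """Evaluate only the rows owned by `rank`.

    Each row depends on the input data alone, never on rows held by
    another CPU, so this step needs no communication.
    """
    return {
        i: [entry(i, j) for j in range(num_rows)]
        for i in assigned_rows(rank, num_rows, num_cpus)
    }


if __name__ == "__main__":
    # Toy stand-in for a Schur complement entry (hypothetical).
    toy_entry = lambda i, j: 1.0 / (1 + i + j)
    for rank in range(4):
        local = compute_local_rows(rank, 10, 4, toy_entry)
        print(f"CPU {rank} holds rows {sorted(local)}")
```

Because each CPU stores only the rows it computes, the memory for B is split across the CPUs as well.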

Formula-Cost Based distribution for sparse Schur complement

Parallel Computation for CHOLESKY • We employ • ScaLAPACK [Blackford et al] ⇒ Dense • MUMPS [Amestoy et al] ⇒ Sparse • Different data storage schemes enhance the parallel Cholesky factorization

Problems for Numerical Results • 16 nodes • Xeon X5460 (3.16GHz) • 48GB memory

Computation time on SDP [SCDI1] • Xeon X5460 (3.16GHz), 48GB memory/node • Total 15.02 times • ELEMENTS 15.67 times • CHOLESKY 14.20 times • ELEMENTS attains high scalability

Computation time on SDP [DTOC1] • Xeon X5460 (3.16GHz), 48GB memory/node • Total 4.85 times • ELEMENTS 13.50 times • CHOLESKY 4.34 times • Parallel Sparse Cholesky is difficult • ELEMENTS is still enhanced
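
To put the reported speedups in perspective, the sketch below converts them to parallel efficiency (speedup divided by node count). It assumes the figures compare a run on all 16 nodes against a run on 1 node; the transcript does not state the baseline, so treat the percentages as illustrative.

```python
NODES = 16  # assumption: speedups are for 16 nodes relative to 1 node

# Speedups as reported on the SCDI1 and DTOC1 slides.
speedups = {
    "SCDI1": {"Total": 15.02, "ELEMENTS": 15.67, "CHOLESKY": 14.20},
    "DTOC1": {"Total": 4.85, "ELEMENTS": 13.50, "CHOLESKY": 4.34},
}


def parallel_efficiency(speedup, nodes=NODES):
    """Fraction of ideal linear speedup that was achieved."""
    return speedup / nodes


if __name__ == "__main__":
    for problem, parts in speedups.items():
        for part, s in parts.items():
            eff = parallel_efficiency(s)
            print(f"{problem} {part:9s} speedup {s:5.2f}  efficiency {eff:.1%}")
```

Under that assumption the total efficiency is about 94% for SCDI1 and about 30% for DTOC1, with the sparse CHOLESKY step accounting for the gap.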

Comparison with PCSDP [Ivanov et al] • SDPARA is faster than PCSDP • The scalability of SDPARA is higher • Only SDPARA can solve DTOC • Times are in seconds; O.M.: out of memory

Concluding Remarks & Future Work • SDP has many applications including control theory • SDPARA solves large-scale SDPs effectively by parallel computation • Appropriate parallel computations are the key to the SDPARA implementation • Future work: improvement of multi-threading for the sparse Schur complement matrix
