Optimized Parallel Distribution Load Flow Solver on Commodity Multi-core CPU

Optimized Parallel Distribution Load Flow Solver on Commodity Multi-core CPU Tao Cui (Presenter) Franz Franchetti Dept. of ECE. Carnegie Mellon University Pittsburgh PA tcui@ece.cmu.edu This work is supported by NSF 0931978 & 1116802

Smart Grids Image by Dr. M.Sanchez

Smart Grids • New players in the grid • Challenges • Undispatchable, large variances, great impact on grid • Large population exhibits stochastic properties Images from wikipedia Source: Pantos 2011 Source: LBNL-3884e Source: ORNL/TM2004/291

Motivation • Conventional Distribution System • Passively receiving power • Few realtime monitoring or controls • Challenges in Distribution System • Solar, wind, stochastic • Large variance and impact • Smart Distribution System • New Sensors: Smart Meters • High Performance Deskside Supercomputer • A Computational Tool for Probabilistic Grid Monitoring Image from: Wikipedia ~Tflop/s $1000 1kW power Image from: Dell

Outline • Motivation • Distribution System Load Flow Analysis • Code Optimization • Real Time Implementation • Conclusion

Core: Distribution Load Flow • Distribution System: • Radial , high R/X ratio, varying Z, unbalanced • NOT suitable for transmission load flow • Forward / Backward Sweep (FBS) • Implicit Z-matrix, detail model, convergence • Generalized Component Model[Kersting2006] • One Terminal Node Model: Constant PQ: • Two Terminal Link Model: IEEE 37 NodeTest Feeder: Based on an actual feeder in California Source: IEEE PES Distribution System Analysis Subcommittee Backward: Forward:

Core: Distribution Load Flow • Forward / Backward Sweep [Kersting2006] • Branch current based • Input: substation voltage, load; output: all node voltages • Steps: • 1: Initial current = 0, Initial voltage V = V0; • 2: Compute node current In using Node model; • 3: Backward: Compute branch current Ib using Link model & KCL; • 4: Forward: Update Vk+1 = Vk based on Ib over Link model; • 5: Check convergence (|dS|<Error Limit) stop or go to step 2. Forward Backward IEEE 4 Node Test Feeder Example

Core: Distribution Load Flow • 3-Phase Voltage on IEEE 37 Nodes Test Feeder Phase C Phase A Phase B Nominal 1.1 0.90 ANSI C84.1: Nominal: 115, Range A:110~126V Range B:107~127V

Parallel High Performance Power Flow Solver Our Approach Random Variable Sampling • Random Number Generator • Basic Uniform RNG + Transformation for different PDFs • Parallel strategy for multi-thread implementation • Optimized Parallel Distribution Load Flow Solver • Code optimizations • Highly parallel implementation for Monte Carlo applications • Density Estimation & Visualization • Kernel density estimation

Optimization: Data Structures • Data Structure Optimization • Baseline: C++ object oriented, a tree object • Translate to array access, exploit spatial/temporal locality • Other techniques: unroll innermost loops, scalar replacement, pre-compute as much as possible (C++) (C Array) GridLab-D: the Smart Grid Simulator www.gridlabd.org, opensource since 2009

Optimization: Pattern Based Syntehsis • Algorithm-level Optimization • Pattern based matrix-vector multiplication For A,B,c,d matrices: • Multi-grounded Cable: diagonal matrix • Ignore shunt & coupling: c = 0, d = I, A = I Reduce unnecessary operations Reduce unnecessary storage for better memory access Similar to [Belgin2009] (C Pattern) switch (mat_type){ casereal_diag_equal_mat: output[0] = *constant * input[0]; ... output[5] = *constant * input[5]; break; caseimag_diag_equal_mat: output[0] = -*constant * input[3]; output[1] = -*constant * input[4]; output[2] = -*constant * input[5]; output[3] = *constant * input[0]; output[4] = *constant * input[1]; output[5] = *constant * input[2]; break; ... } … case 1: case 2: case N: code 1 code 2 code N

Data Parallelism (SIMD) • SIMD parallelization • SIMD: Single Instruction Multiple Data SSE: Streaming SIMD Extensions • 128bit, 4 floats in one register • eg. 4 “fadd” at cost of 1 “addps” AVX: Advanced Vector eXtensions (256bit, 8 floats), Larrabee (512bit, 16 floats) • Vectorized solver on SIMD level for MCS: • Assumptions & Limitations: converge at same step 4-way SSE example (SIMD) vector registerxmm1 xmm0 vector operationaddps xmm0, xmm1 xmm0

Variant Synthesis with SPIRAL • Symbolic process[Puschel2005]: • pattern based matrix vector product code … case 1: case 2: case N: SPL Compiler

Multithreading • Multithreading, Run across All CPUs • Vectorized load flow solver in each thread • Each thread pinned to a physical core exclusively • Fully utilize computation power of Multi-core CPUs • Double buffer (automatic load balancing for MCS application) (Multi-Core)

Performance Results: Across Sizes • Performance of Optimized Code, Mass Amount Load Flow Pseudo flop/s: >60 % peak Flop/s: 50% Peak

Details: Performance Gains >50x >20x

Performance Results: Across Machines

Accuracy • Convergence of Monte Carlo • Very crude. MCG59+ICDF, 50 trials with “time(NULL)” seeds Out: Voltage on Node 738 In: Active Power P~ u=0,std=100kw on Phase A of Node 738,711,741

System Implementation • Distribution System Probabilistic Monitoring System (DSPMS) • System Structure: • MCS solver running on Multi-core Desktop Server (Code optimization) • Results published via ECE Web Server (TCP/IP socket) • Web based dynamic User Interface by client side scripts (JavaScript) • Smart meters in campus building (MODBUS/TCP)

System Implementation • Distribution System Probabilistic Monitoring System (DSPMS) • Web Server and User Interface • Link: www.ece.cmu.edu/~tcui/test/DistSim/DSPMS.htm

Conclusion • Smart Distribution Network: Impact of renewable and stochastic • Commodity HPC & code optimization: Millions of cases /sec on $1K class machine • Distribution System Probabilistic Monitor: A prove of concept real time application source: LBNL-3884e

References • [LBNL-3884E]. Mills, A. Implications of Wide-Area Geographic Diversity for Short-Term Variability of Solar Power, LBNL-3884E. Lawrence Berkeley National Laboratory, Berkeley, • [ORNL/TM2004/291]. B. Kirby, "Frequency Regulation Basics and Trends," ORNL/TM 2004/291, Oak Ridge National Laboratory, December 2004. • [Pantos2011]. Miloš Pantoš Stochastic optimal charging of electric-drive vehicles with renewable energy Energy, Volume 36, Issue 11, November 2011 • [Ghosh97]. A.K. Ghosh, D. L. Lubkeman, M. J. Downey, R. H. Jones “Distribution Circuit State Estimation Using a Probabilistic Approach,” IEEE Transactions on Power Systems, vol. 12, no. 1, pp. 45-51, 1997 • [Belgin2009]. M. Belgin, G. Back, and C. J. Ribbens, “Pattern-based sparse matrix representation for memory-efficient smvm kernels,” in Proceedings of the 23rd international conference on Supercomputing, ser. ICS ’09. New York, NY, USA: ACM, 2009, pp. 100–109. • [Puschel2005]. M. Puschel, J. M. F. Moura, J. Johnson, D. Padua, M. Veloso, B. Singer, J. Xiong, F. Franchetti, A. Gacic, Y. Voronenko, K. Chen, R. W. Johnson, and N. Rizzolo, “SPIRAL: Code generation for DSP transforms,” Proceedings of the IEEE, special issue on “Program Generation, Optimization, and Adaptation”, vol. 93, no. 2, pp. 232– 275, 2005. • [Kersting2006]. W. Kersting, Distribution system modeling and analysis. CRC, 2006

The End Thank You! Q&A

Optimized Parallel Distribution Load Flow Solver on Commodity Multi-core CPU

Optimized Parallel Distribution Load Flow Solver on Commodity Multi-core CPU

Presentation Transcript

Load Balancing Parallel Applications on Heterogeneous Platforms

Multi-Commodity Flow Based Routing

Multi commodity flows

A Parallel Implicit Harmonic Balance Solver for Forced Motion Transonic Flow

Multi-Commodity Flow Joe Monahan Josh Onuska

P-QEMU: A Parallel Multi-core System Emulator Based On QEMU

Parallel Data Mining with Services on Multi-core systems

Parallel Implementation of Fast Fourier Transform on a Multi-core System

LOAD FLOW STUDIES

Multi Cycle CPU

Optimal Speedup on a Low-Degree Multi-Core Parallel Architecture (LoPRAM)

Online Multi-Commodity Flow with High Demands

The ROOT Project in the multi-core CPU era

Load Flow Solution of Radial Distribution Networks

View-Oriented Parallel Programming for multi-core systems

Multi-Commodity Flow Based Routing

A Parallel Implicit Harmonic Balance Solver for Forced Motion Transonic Flow

Stateless Optimization of Multi-Commodity Flow

The ROOT Project in the multi-core CPU era

Pile LOAD DISTRIBUTION

Load And Traffic On The Hosting CPU