Are Supercomputers returning to Investment Banking?

Dr Alistair Dunlop, JPMorgan
Co-head QR Analytics and Head of HPC Strategy

The infrastructure and scale today
• What infrastructure is in use today?
  • Large-scale grid computational environment based on commodity Linux servers
  • Distributed across multiple data centres in all key regions
  • Provides complete pricing and risk computation for all asset classes
  • Asset classes “own” compute capacity
  • Heterogeneous compute infrastructure
  • Focus is:
    • Supplier-focused, to deliver the minimum cost per core to the IB
  • Programming model supports:
    • Host allocation meeting requirements
    • Reliable task execution with expanding/contracting resources, well suited to embarrassingly parallel applications
    • A single job could consist of more than 100,000 tasks
• System scale and growth
  • Average core growth of 36% per annum
  • Core counts at all tier-1 IBs are in the hundreds of thousands
  • Core growth is accelerating
  • Total costs are growing by over 20% per annum
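The programming model above (independent tasks, reliable execution, resources that come and go) can be sketched as a task farm that resubmits any task whose execution fails. This is a minimal single-machine sketch, assuming a thread pool stands in for grid hosts; `run_job`, `flaky_price` and the retry policy are hypothetical illustrations, not the bank's scheduler, and it does not model the expanding/contracting pool itself.

```python
import concurrent.futures as cf
import threading
from typing import Callable, Dict, List

def run_job(tasks: List[int], work: Callable[[int], float],
            max_workers: int = 4, max_attempts: int = 3) -> Dict[int, float]:
    """Run independent tasks on a worker pool, resubmitting any task that raises."""
    results: Dict[int, float] = {}
    attempts = {t: 0 for t in tasks}
    pending = list(tasks)
    with cf.ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending:
            # Submit every outstanding task; tasks are independent, so order does not matter.
            futures = {pool.submit(work, t): t for t in pending}
            pending = []
            for fut in cf.as_completed(futures):
                t = futures[fut]
                attempts[t] += 1
                try:
                    results[t] = fut.result()
                except Exception:
                    if attempts[t] >= max_attempts:
                        raise RuntimeError(f"task {t} failed {attempts[t]} times")
                    pending.append(t)  # retry in the next round
    return results

# Hypothetical task that fails on its first attempt for every 7th task,
# standing in for a lost host or a pre-empted slot.
_seen: set = set()
_lock = threading.Lock()

def flaky_price(t: int) -> float:
    with _lock:
        first_attempt = t not in _seen
        _seen.add(t)
    if first_attempt and t % 7 == 0:
        raise ConnectionError(f"host lost while running task {t}")
    return 0.5 * t

results = run_job(list(range(20)), flaky_price)
print(len(results), results[14])
```

Because no task depends on another, a failure only costs the re-run of that one task, which is what makes this model robust at the scale of 100,000-task jobs.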

Is this still the right technology?
• Traditional problems being solved
  • Options pricing and risk are generally not amenable to direct (analytical) solution
  • Monte Carlo simulation is widely used to generate approximate solutions
    • Negative: computationally intensive
    • Positive: embarrassingly parallel
    • Cost rises with the accuracy required (the standard error falls only as 1/√N in the number of paths N) and is inversely proportional to the processing time allowed
  • Historically the computational focus has been on individual option pricings within a desk
    • Options can be priced in parallel
    • Single options can be parallelised: multi-process, multi-threading, GPUs, any accelerator
• The fastest-growing area is firm-wide risk computation (regulatory driven)
  • Counterparty Credit Risk (CCR): the risk of counterparty default
  • Credit Valuation Adjustment (CVA): the market value of CCR
  • Determines the bank’s capital requirements
  • Computationally more complex than traditional problems due to:
    • Potentially large data sets for a single counterparty that may not fit into host memory
    • The ability to reuse computation
• Is an x86 approach with accelerators in a loosely coupled infrastructure still optimal?
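To make the Monte Carlo trade-off concrete, the sketch below prices a European call under geometric Brownian motion by splitting the paths into independent chunks, each of which could be a separate grid task, and then reducing the partial sums. The option parameters, chunk sizes and function names are hypothetical, and a vanilla call is used only because its Black-Scholes closed form lets the estimate be checked; a production system would also use a proper parallel random-number scheme rather than consecutive seeds.

```python
import math
import random
from statistics import NormalDist

def mc_chunk(seed, n_paths, s0, k, r, sigma, t):
    """One independent chunk of Monte Carlo paths for a European call.
    Returns (sum of discounted payoffs, sum of squares, path count)."""
    rng = random.Random(seed)  # separate generator per chunk
    drift = (r - 0.5 * sigma ** 2) * t
    vol = sigma * math.sqrt(t)
    disc = math.exp(-r * t)
    s = s2 = 0.0
    for _ in range(n_paths):
        st = s0 * math.exp(drift + vol * rng.gauss(0.0, 1.0))  # terminal GBM price
        payoff = disc * max(st - k, 0.0)
        s += payoff
        s2 += payoff * payoff
    return s, s2, n_paths

def mc_call(n_chunks, paths_per_chunk, s0=100.0, k=100.0, r=0.02, sigma=0.2, t=1.0):
    # Each chunk is independent, so on a grid each could be a separate task;
    # here they run in a loop and the partial sums are reduced at the end.
    parts = [mc_chunk(seed, paths_per_chunk, s0, k, r, sigma, t) for seed in range(n_chunks)]
    total = sum(p[0] for p in parts)
    total_sq = sum(p[1] for p in parts)
    n = sum(p[2] for p in parts)
    mean = total / n
    stderr = math.sqrt(max(total_sq / n - mean ** 2, 0.0) / n)
    return mean, stderr

def bs_call(s0=100.0, k=100.0, r=0.02, sigma=0.2, t=1.0):
    """Black-Scholes closed form, used only to check the simulation."""
    nd = NormalDist()
    d1 = (math.log(s0 / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return s0 * nd.cdf(d1) - k * math.exp(-r * t) * nd.cdf(d2)

price_small, se_small = mc_call(4, 5_000)
price_large, se_large = mc_call(16, 5_000)
print(bs_call(), price_large, se_large, se_large / se_small)
```

Quadrupling the number of paths roughly halves the standard error, so each extra digit of accuracy costs about 100 times more work; because the chunks are independent, that work can be spread across as many cores as the deadline requires.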

What CPUs and accelerators work best for our problems?
• Reviewed the performance of multiple CPUs and accelerators for pricing CDOs (Collateralised Debt Obligations) at a representative size
  • Single-factor Gaussian copula model with stochastic recovery
  • Highly parallel
  • Baseline: production code performance on x86 “Sandy Bridge”
  • New CPU and accelerator ports done in conjunction with Intel, IBM and Nvidia
• Performance (for medium to large tasks, with run-times from 1 second to 10 seconds)
  • The biggest improvement (>4x) came from optimising the Xeon code through vectorisation; the best absolute performance came from the GPU
  • The best single-socket CPU performance is similar to that of approximately two single CPU sockets
  • Intel x86 optimisation was the simplest port, delivering a 3x speedup within a day
Chart: speedup relative to the existing (pre-optimisation) single-processor Xeon
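For readers unfamiliar with the benchmark workload, the sketch below computes the expected loss of a CDO tranche under a one-factor Gaussian copula. Conditional on the common factor M, names default independently with probability Φ((Φ⁻¹(p) − √ρ·M)/√(1 − ρ)); the loss distribution is built by a standard default-count recursion and then integrated over M. This is a simplified sketch, not the production model: it assumes a fixed recovery rate (the benchmarked model uses stochastic recovery), equal notionals and a plain midpoint quadrature, and all names and parameter values are hypothetical.

```python
import math
from statistics import NormalDist

N01 = NormalDist()

def conditional_default_probs(thresholds, rho, m):
    """P(default | M = m) for each name under a one-factor Gaussian copula."""
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    return [N01.cdf((c - a * m) / b) for c in thresholds]

def default_count_dist(qs):
    """Distribution of the number of defaults, given independent default probabilities qs."""
    dist = [1.0]
    for q in qs:
        new = [0.0] * (len(dist) + 1)
        for k, pk in enumerate(dist):
            new[k] += pk * (1.0 - q)  # this name survives
            new[k + 1] += pk * q      # this name defaults
        dist = new
    return dist

def expected_tranche_loss(pds, rho, recovery, attach, detach, n_grid=201):
    """Expected tranche loss as a fraction of tranche notional (fixed recovery, equal notionals)."""
    n = len(pds)
    lgd = 1.0 - recovery
    thresholds = [N01.inv_cdf(p) for p in pds]
    lo, hi = -8.0, 8.0
    h = (hi - lo) / n_grid
    el = total_w = 0.0
    # Each grid point in M is independent of the others, which is what makes the model highly parallel.
    for i in range(n_grid):
        m = lo + (i + 0.5) * h
        w = N01.pdf(m) * h
        dist = default_count_dist(conditional_default_probs(thresholds, rho, m))
        etl = sum(pk * min(max(k * lgd / n - attach, 0.0), detach - attach)
                  for k, pk in enumerate(dist))
        el += w * etl
        total_w += w
    return el / total_w / (detach - attach)

pds = [0.02] * 50
whole = expected_tranche_loss(pds, 0.3, 0.4, 0.0, 1.0)
equity = expected_tranche_loss(pds, 0.3, 0.4, 0.0, 0.03)
senior = expected_tranche_loss(pds, 0.3, 0.4, 0.12, 1.0)
print(whole, equity, senior)
```

With the tranche set to the whole pool, the result should match the average default probability times loss-given-default, which is a convenient sanity check for any port of this kind.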

Single socket processing costs
Chart: relative capex cost per result (costs correct as at June 2013)
• Cost per result is very similar for all but the FPGA (which has higher capex costs per accelerator)
• But capex typically represents less than half of the opex over the processor lifetime
• Opex favours accelerators very significantly
• Other factors
  • Problem size: the problem size of interest can significantly change the results
  • Developer effort: a critical part of the evaluation was to understand the effort required to obtain the performance and to maintain the resulting code
  • A large reduction in memory per core necessitates a rewrite or algorithm review
  • Heterogeneous environments are harder to utilise efficiently
• Improving existing single-node performance through optimisation should be the first approach, as it is both simple and identifies the parallelism that would need to be exploited if alternative methods are considered
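The cost argument on this slide can be reproduced with simple arithmetic: split lifetime cost per result into a capex share and an opex share. All figures below (prices, running costs, throughputs, a four-year life, 80% utilisation) are hypothetical and chosen only so that capex per result is equal for both platforms, as the slide reports for most processors; they are not the 2013 measurements.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 24 * 365

@dataclass
class Platform:
    name: str
    capex: float             # purchase cost of one socket or device (hypothetical figure)
    opex_per_year: float     # power, cooling, space and support per year (hypothetical figure)
    results_per_hour: float  # sustained throughput on the workload of interest (hypothetical)

def cost_per_result(p: Platform, years: float = 4.0, utilisation: float = 0.8):
    """Split lifetime cost per result into capex and opex components."""
    results = p.results_per_hour * HOURS_PER_YEAR * years * utilisation
    capex_share = p.capex / results
    opex_share = p.opex_per_year * years / results
    return capex_share, opex_share, capex_share + opex_share

cpu = Platform("CPU socket", capex=2_000, opex_per_year=1_200, results_per_hour=100)
acc = Platform("Accelerator", capex=6_000, opex_per_year=1_500, results_per_hour=300)
for p in (cpu, acc):
    capex, opex, total = cost_per_result(p)
    print(f"{p.name:12s} capex {capex:.2e}  opex {opex:.2e}  total {total:.2e}")
```

When capex per result is a wash, the platform that delivers more results per unit of running cost wins on total cost, which is why opex dominates the comparison once capex is less than half of lifetime opex.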

Are low-latency, high-bandwidth infrastructures worthwhile?
• Evaluated a number of key larger applications on a Cray XC30
  • Core Analytics Library and the 2 largest applications used on the existing infrastructure
  • All applications were successfully moved without recompilation
  • Single-node performance was 7%-15% better (with identical memory and CPU)
    • Performance improvements attributed to memory organisation, BIOS and a streamlined OS
  • None of the major applications showed substantive improvements from the infrastructure (beyond node-level gains)
  • One highly parallel Market Risk application showed up to 45% better scaling, owing to larger trade and market data per task (0.4 GB)
• Applications port easily, but need to be developed to leverage the infrastructure
  • Are the additional infrastructure costs and code redevelopment costs offset by reduced core requirements?
  • Could new application requirements be met more cost-effectively using a low-latency, high-bandwidth infrastructure?
• Some key benefits were operational
  • Infrastructure capabilities facilitate different operational models
  • Application and service stack mounted via the file-system namespace
  • Minutes to scale from 500 cores to 10,000
  • Stateless compute nodes facilitate global optimisation
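One way to see why only the application with large per-task data (0.4 GB) benefited is a simple model of task time as latency plus data transfer plus compute. The sketch assumes each task's bandwidth is capped either by its node link or by its share of a shared network/storage layer when many tasks fetch data at once. The bandwidths, latencies, compute time and concurrency are all hypothetical figures, not measurements of the XC30 or of the existing grid.

```python
def task_time(compute_s, data_gb, node_bw_gbs, shared_bw_gbs, concurrent_tasks, latency_s):
    """Wall time for one task: fetch its input data, then compute."""
    # A task's effective bandwidth is capped by its node link or by its share
    # of the shared network/storage when many tasks fetch data at once.
    per_task_bw = min(node_bw_gbs, shared_bw_gbs / concurrent_tasks)
    return latency_s + data_gb / per_task_bw + compute_s

def efficiency(data_gb, net, compute_s=2.0, concurrent_tasks=1_000):
    """Fraction of wall time spent computing rather than moving data."""
    return compute_s / task_time(compute_s, data_gb, concurrent_tasks=concurrent_tasks, **net)

# Hypothetical network figures, not measurements of any specific system.
commodity = dict(node_bw_gbs=1.25, shared_bw_gbs=100.0, latency_s=1e-4)
high_bw = dict(node_bw_gbs=10.0, shared_bw_gbs=1_000.0, latency_s=1e-5)

for data_gb in (0.001, 0.4):
    print(data_gb, round(efficiency(data_gb, commodity), 3), round(efficiency(data_gb, high_bw), 3))
```

Under these assumptions, tasks with tiny inputs spend almost all their time computing on either network, so a faster interconnect changes little; only when per-task data is large does transfer time become a significant fraction of the run and the infrastructure start to pay off.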

Conclusions
• Cost reductions are most easily attained through CPU code optimisation
  • With every other approach, cost reductions are harder to realise, yet these approaches are often easier to sell and more attractive to sponsors
• Accelerators deliver higher performance than the CPU for appropriate problems, but the economic case isn’t straightforward
  • They need large-scale deployments and multi-tenancy to make sense
  • Initial development costs plus ongoing costs of maintaining multiple code paths
  • Often more relevant for capability rather than capacity reasons
• The scale of risk calculation in the IBs means that compute management is a large optimisation problem, and infrastructures need to support this
  • This requires high-performance networking and a high-performance I/O system
• The nature of computation is changing, becoming more integrated with and inter-related to other computations
  • Efficient applications will need to exploit more than the embarrassingly parallel approach
