
Highest performance parallel storage for HPC environments

Garth Gibson, CTO & Founder. IDC HPC User Forum, I/O and Storage Panel, April 21, 2009.




Presentation Transcript


  1. Highest performance parallel storage for HPC environments. Garth Gibson, CTO & Founder. IDC HPC User Forum, I/O and Storage Panel, April 21, 2009

  2. IDC Storage Panel Question 1 With the formalization of Parallel NFS as a standard, what steps are being provided to enable this to be hosted on current (and future) platform choices? • Panasas has contributed two engineers to the open-source Linux code effort for pNFS. Code has started to be included in Linux (2.6.30), and Red Hat is beginning to grease the path into Fedora and RHEL. • Panasas has an implementation working with the current head of the Linux development tree. • Los Alamos National Lab has started testing with Panasas on a 128-node cluster. Ultimate testing may be done on the world’s fastest supercomputer today, LANL’s Roadrunner, which runs PanFS now. • LANL also has plans to test with IBM, BlueArc, and maybe with LSI.

  3. IDC Storage Panel Question 2 What are the tools available to help optimize pNFS from the application level down? • Panasas collaborates with the DoE Petascale Data Storage Institute (PDSI) on its agenda to clear the path to Exascale. PDSI has built tracing tools and benchmarks for complex HPC codes that will be applied to pNFS. • In NFS development, the most important tool is frequent interoperability testing at Connectathon face-to-face engineering meetings. For debugging, Wireshark has added NFSv4.1 (pNFS) trace-specific parsing rules.

  4. IDC Storage Panel Question 3 We are all facing complexity and cost issues. With IB or 10GE (40/100GE), where should the HPC community focus its resources? • Panasas supports 1GE, 10GE, and IB connectivity. We believe 10GE is the most compelling long term, especially with Data Center Ethernet protocol enhancements, but IB is quite effective today.

  5. IDC Storage Panel Question 4 There are too many standards, interconnects, and media layers today. iSCSI/FCoIB/FCoE/FCoCEE have all been touted as the solution. Is this even relevant in the HPC arena? Is fragmentation the only choice? • Panasas believes in converged networks based on Ethernet. We use iSCSI today in solutions of up to 30 GB/s. • Converged and Enhanced Ethernet, or Data Center Ethernet, is an interesting optimization, but it is not required. • Panasas does not see FC or any of its variants as important in HPC. It might be incidentally present inside storage boxes, although SAS should be fine for that too.

  6. IDC Storage Panel Question 5 What are the top 3 main (technical or human) issues in HPC I/O today? • While high-performance parallel file systems now support concurrent writing into one file system from thousands of nodes at the same time, large numbers of codes are still written for the era in which storage was connected to only one node. Pushing all I/O through one node does not scale, but adapting and optimizing stable code bases takes many years. • Enterprise-class reliability and availability have long been absent from high-performance, scalable HPC systems. Too many solutions are too do-it-yourself to be reliable, well integrated, well supported AND scalable. • Multiple divergent business models confuse things: is scalable storage software developed and tested by hackers in their basements, is it a loss-leader cost center buried in the price of compute cluster hardware, or is it a valuable asset in itself?
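  The first issue above, funneling all I/O through one node versus writing into a shared file in parallel, can be sketched in plain Python. This is a hypothetical illustration, not Panasas or PanFS code: each simulated rank writes a fixed-size record at its own disjoint offset in one shared file, the N-to-1 pattern that parallel file systems make scalable (the record size and file layout here are invented for the example).

```python
# Sketch (hypothetical, not PanFS code): N "ranks" write disjoint,
# fixed-size records into one shared file concurrently. Because the
# byte ranges never overlap, no locking or single funnel node is needed.
import os
import tempfile
from multiprocessing import Pool

RECORD = 64  # bytes per rank; arbitrary record size for the example

def write_rank(args):
    path, rank = args
    # Each rank's payload is padded to exactly one record.
    data = f"rank{rank:04d}".encode().ljust(RECORD, b".")
    fd = os.open(path, os.O_WRONLY)
    try:
        # pwrite at a rank-specific offset: disjoint ranges, no contention.
        os.pwrite(fd, data, rank * RECORD)
    finally:
        os.close(fd)

def parallel_write(path, nranks):
    # Pre-size the file so every pwrite lands inside it.
    with open(path, "wb") as f:
        f.truncate(nranks * RECORD)
    with Pool(min(nranks, 8)) as pool:
        pool.map(write_rank, [(path, r) for r in range(nranks)])

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "shared.dat")
        parallel_write(path, 16)
        blob = open(path, "rb").read()
        print(len(blob), blob[:8])
```

  Funneling the same writes through one process would serialize them behind a single file descriptor and a single node's bandwidth; the disjoint-offset pattern is what lets thousands of clients of a parallel file system write one file at full aggregate rate.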
