Michael L. Norman Principal Investigator Interim Director, SDSC

Michael L. Norman, Principal Investigator; Interim Director, SDSC
Allan Snavely, Co-Principal Investigator; Project Scientist

What is Gordon?
• A “data-intensive” supercomputer based on SSD flash memory and virtual shared memory
  • Emphasizes MEM and IO over FLOPS
• A system designed to accelerate access to massive databases being generated in all fields of science, engineering, medicine, and social science
• The NSF’s most recent Track 2 award to the San Diego Supercomputer Center (SDSC)
• Coming Summer 2011

Why Gordon?
• Growth of digital data is exponential
  • “data tsunami”
• Driven by advances in digital detectors, networking, and storage technologies
• Making sense of it all is the new imperative
  • data analysis workflows
  • data mining
  • visual analytics
  • multiple-database queries
  • on demand data-driven applications

The Memory Hierarchy
Potential 10x speedup for random I/O to large files and databases
[Figure: memory hierarchy diagram; the flash tier is labeled “Flash SSD, O(TB), 1000 cycles”]

Gordon Architecture: “Supernode”
• 32 Appro Extreme-X compute nodes
  • Dual processor Intel Sandy Bridge
  • 240 GFLOPS
  • 64 GB
• 2 Appro Extreme-X IO nodes
  • Intel SSD drives
  • 4 TB ea.
  • 560,000 IOPS
• ScaleMP vSMP virtual shared memory
  • 2 TB RAM aggregate
  • 8 TB SSD aggregate
[Figure: supernode diagram showing a 4 TB SSD I/O node and 240 GF compute nodes with 64 GB RAM each, joined by vSMP memory virtualization]
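The aggregate figures on this slide follow from the per-node numbers. The sketch below redoes that arithmetic; the function and constant names are illustrative, the RAM total assumes 1 TB = 1024 GB, and the per-supernode peak FLOPS value is derived here rather than quoted from the slide.

```python
# Per-node figures quoted on the slide.
COMPUTE_NODES_PER_SUPERNODE = 32
GFLOPS_PER_COMPUTE_NODE = 240
RAM_GB_PER_COMPUTE_NODE = 64
IO_NODES_PER_SUPERNODE = 2
SSD_TB_PER_IO_NODE = 4


def supernode_aggregates():
    """Return aggregate RAM (TB), SSD (TB), and peak TFLOPS for one supernode."""
    # Assumption: binary units for RAM (1 TB = 1024 GB).
    ram_tb = COMPUTE_NODES_PER_SUPERNODE * RAM_GB_PER_COMPUTE_NODE / 1024
    ssd_tb = IO_NODES_PER_SUPERNODE * SSD_TB_PER_IO_NODE
    # Derived value, not stated on the slide.
    peak_tflops = COMPUTE_NODES_PER_SUPERNODE * GFLOPS_PER_COMPUTE_NODE / 1000
    return {"ram_tb": ram_tb, "ssd_tb": ssd_tb, "peak_tflops": peak_tflops}


if __name__ == "__main__":
    for name, value in supernode_aggregates().items():
        print(f"{name}: {value}")
```

The RAM and SSD results match the slide's "2 TB RAM aggregate" and "8 TB SSD aggregate".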

Gordon Architecture: Full Machine
• 32 supernodes = 1024 compute nodes
• Dual rail QDR Infiniband network
  • 3D torus (4x4x4)
• 4 PB rotating disk parallel file system
  • >100 GB/s
[Figure: full-machine diagram showing 32 blocks labeled SN and 6 blocks labeled D]
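As a rough illustration of the 3D torus (4x4x4) topology named above, the sketch below lists the six wraparound neighbors of a torus position and checks the compute-node count. The slide does not say what sits at each torus position, so the coordinates are purely illustrative, and `torus_neighbors` is a hypothetical helper.

```python
TORUS_DIMS = (4, 4, 4)       # 3D torus dimensions from the slide
SUPERNODES = 32              # from the slide
COMPUTE_NODES_PER_SUPERNODE = 32


def torus_neighbors(coord, dims=TORUS_DIMS):
    """Return the six neighbors of `coord` in a 3D torus with wraparound links."""
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            n = list(coord)
            # Modulo arithmetic implements the wraparound edge of the torus.
            n[axis] = (n[axis] + step) % dims[axis]
            neighbors.append(tuple(n))
    return neighbors


def total_compute_nodes():
    """Compute-node count for the full machine, per the slide's figures."""
    return SUPERNODES * COMPUTE_NODES_PER_SUPERNODE


if __name__ == "__main__":
    print(total_compute_nodes())
    print(torus_neighbors((0, 0, 0)))
```

In a torus every position has the same number of links, including positions on the "edge" of the grid, which is why (0, 0, 0) is adjacent to (3, 0, 0).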

Gordon Peak Capabilities

Gordon is designed specifically for data-intensive HPC applications
Such applications involve “very large data-sets or very large input-output requirements”
Two data-intensive application classes are important and growing
• Data Mining
  • “the process of extracting hidden patterns from data… with the amount of data doubling every three years, data mining is becoming an increasingly important tool to transform this data into information.” Wikipedia
• Data-Intensive Predictive Science
  • solution of scientific problems via simulations that generate large amounts of data

High Performance Computing (HPC) vs High Performance Data (HPD)

Data mining applications will benefit from Gordon
• De novo genome assembly from sequencer reads & analysis of galaxies from cosmological simulations and observations
  • Will benefit from large shared memory
• Federations of databases and interaction network analysis for drug discovery, social science, biology, epidemiology, etc.
  • Will benefit from low latency I/O from flash

Data-intensive predictive science will benefit from Gordon
• Solution of inverse problems in oceanography, atmospheric science, & seismology
  • Will benefit from a balanced system, especially large RAM per core & fast I/O
• Modestly scalable codes in quantum chemistry & structural engineering
  • Will benefit from large shared memory

Dash: towards a supercomputer for data intensive computing

Project Timeline
• Phase 1: Dash development (9/09-7/11)
• Phase 2: Gordon build and acceptance (3/11-7/11)
• Phase 3: Gordon operations (7/11-6/14)

Comparison of the Dash and Gordon systems
[Table and chart residue; surviving labels: 1.2 TB per 100K tpm-C; IOPS per tpm-C; 150K tpm-C; 1996 = 0.7-0.8; smaller memory subsystem; 2006 = 0.3-0.5]
Doubling capacity halves accessibility to any random data on a given media

Gordon project wins storage challenge at SC09 with Dash

We won SC09 Data Challenge with Dash!
• With these numbers:
  • IOR 4KB
  • RAMFS 4 million+ IOPS on up to 0.750 TB of DRAM (1 supernode’s worth)
  • 88K+ IOPS on up to 1 TB of flash (1 supernode’s worth)
  • Speed up Palomar Transients database searches 10x to 100x
  • Best IOPS per dollar
• Since that time we boosted flash IOPS to 540K (hitting our 2011 performance targets; it is now 2009)
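To put the IOPS figures in more familiar terms, the sketch below converts them to throughput at the 4 KB transfer size used in the IOR runs. It assumes "4KB" means 4096 bytes and reports decimal GB/s; because the slide gives the IOPS values as lower bounds ("+"), the results are lower bounds too. `iops_to_gb_per_s` is an illustrative helper, not part of IOR.

```python
TRANSFER_BYTES = 4 * 1024  # assumption: "4KB" means 4096 bytes


def iops_to_gb_per_s(iops, transfer_bytes=TRANSFER_BYTES):
    """Convert an IOPS rate at a fixed transfer size to decimal GB/s."""
    return iops * transfer_bytes / 1e9


if __name__ == "__main__":
    # IOPS figures quoted on the slide.
    figures = {
        "RAMFS (DRAM)": 4_000_000,
        "flash at SC09": 88_000,
        "flash after tuning": 540_000,
    }
    for label, iops in figures.items():
        print(f"{label}: {iops_to_gb_per_s(iops):.3f} GB/s")
    # Ratio of the tuned flash rate to the SC09 flash rate.
    print(f"flash improvement: {540_000 / 88_000:.1f}x")
```

Small random transfers are the worst case for throughput, so these numbers say more about latency-bound access than about streaming bandwidth.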

Dash Update – early vSMP test results

Next Steps
• Continue vSMP and flash SSD assessment and development on Dash
• Prototype Gordon application profiles using Dash
  • New application domains
  • New usage modes and operational support mechanisms
  • New user support requirements
• Work with TRAC to identify candidate apps
• Assemble Gordon User Advisory Committee
• International Data-Intensive Conference Fall 2010

