Models for University HPC Facilities
Dan Stanzione, Director, Fulton High Performance Computing
Presented at the University of Hawai’i, 2/18/09
[email protected]
The Givens
CFD Simulation of Golf Ball in Flight, 1 billion unknowns in grid
The iPlant Collaborative - a $100M NSF effort to build cyberinfrastructure for “Grand Challenges” in the plant science community. Key pieces of the storage, computation, and software development will be in the HPCI.
Ranger - The world’s largest system for open science. The first in NSF’s “Track 2” petascale system acquisitions, this $59M project is a collaboration between UT-Austin, ASU, Cornell, Sun, and AMD.
Evaluating future parallel programming paradigms for the DOD
Developing software teams for streams-based parallel programming on hybrid systems for the US Air Force
Partnered with Microsoft to port new HPC applications to Windows CCS
HPCI 2007/2008 Project Highlights
The HPCI maintains an active portfolio of HPC-oriented research and application projects
The HPCI works with more than 100 faculty to accelerate research discovery, and apply HPC in traditional and non-traditional disciplines, e.g.:
The HPCI primary datacenter has 750 kVA of UPS power, chilled-water racks, and under-floor cooling (700 W/sq ft).
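As a rough sanity check on those figures, this minimal Python sketch converts the UPS rating into supportable high-density floor space; the 0.9 power factor is an assumption, not something stated here.

    # Rough sizing check for the HPCI datacenter figures above.
    ups_va = 750_000        # UPS capacity in volt-amperes (from the slide)
    power_factor = 0.9      # assumed power factor, not stated on the slide
    floor_density_w = 700   # watts per square foot (from the slide)

    usable_watts = ups_va * power_factor                 # ~675 kW of IT load
    supported_area = usable_watts / floor_density_w      # ~960 sq ft of racks
    print(f"~{usable_watts/1e3:.0f} kW usable, ~{supported_area:.0f} sq ft at 700 W/sq ft")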
The Saguaro cluster has 4,900 cores running Linux, capable of about 45 trillion FLOP/s (a quick arithmetic check follows these specs)
10 TB primary RAM
400 TB scratch disk, 3 GB/s
InfiniBand 20 Gb/s interconnect, 6 µs max latency, 1.5 Tb/s backplane
In the last 6 months, 150,000 simulations completed (1-1,024 processors)
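Those Saguaro numbers hang together: a minimal Python sketch, assuming 4 flops/cycle per core and a clock near 2.3 GHz (both inferred, not stated above), reproduces the ~45 TFLOP/s figure.

    # Peak-performance estimate for the 4,900-core Saguaro cluster.
    cores = 4_900
    flops_per_cycle = 4     # assumed: two floating-point pipelines per core
    clock_hz = 2.3e9        # assumed clock rate, inferred from the 45 TFLOP/s figure

    peak_tflops = cores * flops_per_cycle * clock_hz / 1e12
    print(f"Estimated peak: {peak_tflops:.0f} TFLOP/s")   # ~45 TFLOP/s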
Compute power - 504 Teraflops (arithmetic check after this spec list)
3,936 Sun four-socket blades
15,744 AMD “Barcelona” processors
Quad-core, four flops/cycle (dual pipelines)
Memory - 123 Terabytes
2 GB/core, 32 GB/node
132 GB/s aggregate bandwidth
Disk subsystem - 1.7 Petabytes
72 Sun x4500 “Thumper” I/O servers, 24 TB each
40 GB/sec total aggregate I/O bandwidth
1 PB raw capacity in largest filesystem
Interconnect - 10 Gbps / 3.0 µs latency
Sun InfiniBand-based switches (2), up to 3456 4x ports each
Full non-blocking 7-stage Clos fabric
Mellanox ConnectX InfiniBand
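The headline figures in this spec list are internally consistent; the Python sketch below reproduces them from the per-part numbers. The 2.0 GHz clock is an assumption inferred from the 504 TF total rather than stated above.

    # Reproducing the headline figures from the per-part numbers.
    blades = 3_936
    sockets_per_blade = 4
    cores_per_socket = 4                    # quad-core "Barcelona"
    flops_per_cycle = 4                     # dual floating-point pipelines
    clock_hz = 2.0e9                        # assumed, inferred from the 504 TF total

    processors = blades * sockets_per_blade                  # 15,744 processors
    cores = processors * cores_per_socket                     # 62,976 cores
    peak_tf = cores * flops_per_cycle * clock_hz / 1e12       # ~504 teraflops
    memory_tb = cores * 2 / 1024                               # 2 GB/core -> ~123 TB
    disk_pb = 72 * 24 / 1024                                   # 72 Thumpers -> ~1.7 PB raw
    print(processors, cores, round(peak_tf), round(memory_tb), round(disk_pb, 1))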
We strive to be driven by science, not systems
Advanced Computing no longer “one size fits all”
Traditional supercomputer architectures excel at compute intensive problems
Data Intensive, High Throughput Computing is now equally important; HPCI Storage has long been a focus
HPCI “Cloud” system now deployed
Google/Amazon-style infrastructure
Focused on large-dataset applications
New class in cloud systems, in conjunction with Google, offered this spring
Storage: Reliable, secure, scalable storage for datasets of virtually infinite size (multi-site backups included!).
Visualization: Expert support in producing 2D or 3D visualizations of your data (SERV Program).
Application Consulting: Parallelization support on existing codes, development of new software.
Education and Training: Academic courses, seminars, and short courses in the effective use of HPC.
Proposal Support: Aid in developing applications for money or resources for projects that have cyberinfrastructure components (80+ proposals last year, >$500M).
Research Partnerships
HPC/Cyberinfrastructure Services
There is a difference between setting up a cluster and making HPC-enabled research work. This gap must be filled.
Be prepared for success (this isn’t easy)
Provide incentives to participate (mostly), and disincentives not to (a little), but don’t be too heavy-handed.
90% capture is good enough
The other 10% aren’t worth the price.
There are lots of kinds of storage (a quick price comparison follows this list)
High Performance Storage (large allocations free, but short term, ~30 days)
Long-term, high-integrity - $2,000/TB for a 5-year lease.
Still less than cost…
Not forever (but a terabyte will be almost free in 5 years)
Many times raw disk cost, but managed and secure, with snapshots and backups (multi-site mirror).
Premium service at $3k includes off-site copies (outsourced).
Ignore the lowest end (easily replaceable data… Best Buy can sell attached disk for $150/TB)
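To make the price points above comparable, this Python sketch puts them on a $/TB/month basis; it assumes the $3k premium tier is also priced per terabyte, which the slide does not state explicitly.

    # Comparing the storage price points above on a $/TB/month basis.
    lease_per_tb = 2_000       # managed, high-integrity storage, 5-year lease
    premium_per_tb = 3_000     # premium tier with off-site copies (per-TB pricing assumed)
    consumer_per_tb = 150      # bare attached disk, no management, backups, or mirroring
    months = 5 * 12

    print(f"Managed: ${lease_per_tb / months:.0f}/TB/month")      # ~$33
    print(f"Premium: ${premium_per_tb / months:.0f}/TB/month")    # ~$50
    print(f"Markup over bare disk: ~{lease_per_tb / consumer_per_tb:.0f}x up front")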