
Scientific Computing at SLAC


Presentation Transcript


  1. Scientific Computing at SLAC. Richard P. Mount, Director: Scientific Computing and Computing Services, Stanford Linear Accelerator Center. HEPiX, October 11, 2005

  2. SLAC Scientific Computing Balancing Act • Aligned with the evolving science mission of SLAC, but neither • subservient to the science mission, nor • unresponsive to SLAC mission needs

  3. SLAC Scientific Computing Drivers • BaBar (data-taking ends December 2008) • The world’s most data-driven experiment • Data analysis challenges until the end of the decade • KIPAC • From cosmological modeling to petabyte data analysis • Photon Science at SSRL and LCLS • Ultrafast Science, modeling and data analysis • Accelerator Science • Modeling electromagnetic structures (PDE solvers in a demanding application) • The Broader US HEP Program (aka LHC) • Contributes to the orientation of SLAC Scientific Computing R&D

  4. DOE Scientific Computing Funding at SLAC • Particle and Particle Astrophysics: $14M SCCS, $5M other • Photon Science: $0M SCCS, $1M SSRL(?) • Computer Science: $1.5M

  5. Scientific Computing: The relationship between Science and the components of Scientific Computing (diagram; SCCS FTE counts of ~20 and ~26)

  6. Scientific Computing: SLAC’s goals for Scientific Computing • Collaboration with Stanford and Industry

  7. Scientific Computing: Current SLAC leadership and recent achievements in Scientific Computing

  8. What does SCCS run (1)? Data analysis “farms” (also good for HEP simulation) • ~4000 processors • Linux and Solaris • including 256 dual-Opteron Sun V20zs Shared-memory multiprocessor • SGI 3700 • 72 processors • Linux Myrinet cluster • 128 processors • Linux

  9. What does SCCS run (2)? Application-specific clusters • each 32 to 128 processors • Linux And even … PetaCache Prototype • 64 nodes • 16 GB memory per node • Linux/Solaris

  10. What does SCCS run (3)? Disk Servers • About 500 TB • Network attached • Mainly xrootd • Some NFS • Some AFS • About 120 TB on Sun/Solaris servers with Sun FibreChannel disk arrays Tape Storage • 6 STK Powderhorn silos • Up to 6 petabytes capacity • Currently store 2 petabytes • HPSS

  11. What does SCCS run (4)? Networks • 10 Gigabits/s to ESnet • 10 Gigabits/s R&D • 96 fibers to Stanford • 10 Gigabits/s core in the computer center (as soon as we unpack the boxes)
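
As a sense of scale for those links, a quick back-of-envelope sketch (my own arithmetic, not from the slides) of how long bulk transfers take on a fully utilized 10 Gbit/s path:

```python
# Wall-clock time to move data over a saturated 10 Gbit/s link
# (ignores protocol overhead and any disk or tape bottlenecks).
LINK_GBITS = 10
bytes_per_second = LINK_GBITS * 1e9 / 8  # 1.25 GB/s

for label, nbytes in [("1 TB", 1e12), ("100 TB", 1e14), ("1 PB", 1e15)]:
    seconds = nbytes / bytes_per_second
    print(f"{label:>6}: {seconds / 3600:8.1f} hours ({seconds / 86400:6.1f} days)")
```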

  12. SLAC Computing - Principles and Practice (Simplify and Standardize) • Lights-out operation – no operators for the last 10 years • Run 24x7 with 8x5 (in theory) staff • (When there is a cooling failure on a Sunday morning, 10–15 SCS staff are on site by the time I turn up) • Science (and business-unix) raised-floor computing • Adequate reliability essential • Solaris and Linux • Scalable “cookie-cutter” approach • Only one type of farm CPU bought each year • Only one type of file-server + disk bought each year • Highly automated OS installations and maintenance • e.g. see the talk on how SLAC does clusters by Alf Wachsmann: http://www.slac.stanford.edu/~alfw/talks/RCcluster.pdf

  13. SLAC-BaBar Computing Fabric (diagram) • Clients: 1700 dual-CPU Linux and 800 single-CPU Sun/Solaris nodes • IP network (Cisco) between clients and disk servers • Disk servers: 120 dual/quad-CPU Sun/Solaris, ~400 TB Sun FibreChannel RAID arrays (plus some SATA), running HEP-specific ROOT software (xrootd) + the Objectivity/DB object database, some NFS • IP network (Cisco) between disk servers and tape servers • Tape servers: 25 dual-CPU Sun/Solaris, 40 STK 9940B and 6 STK 9840A drives, 6 STK Powderhorn silos holding over 1 PB of data, HPSS plus SLAC enhancements to ROOT and Objectivity server code

  14. Scientific Computing Research Areas (1) (Funded by DOE-HEP, DOE SciDAC and DOE-MICS) • Huge-memory systems for data analysis (SCCS Systems group and BaBar) • Expected major growth area (more later) • Scalable Data-Intensive Systems (SCCS Systems and Physics Experiment Support groups) • “The world’s largest database” (OK, not really a database any more) • How to maintain performance with data volumes growing like “Moore’s Law”? • How to improve performance by factors of 10, 100, 1000, …? (intelligence plus brute force) • Robustness, load balancing and troubleshootability in 1,000- to 10,000-box systems • Astronomical data analysis on a petabyte scale (in collaboration with KIPAC)
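
To make the “growing like Moore’s Law” point concrete, a small illustrative calculation (my assumption: data volume doubling roughly every 18 months, which is where the required factors of 10, 100, 1000 come from):

```python
# Growth factor if data volumes double roughly every 18 months.
DOUBLING_TIME_YEARS = 1.5  # assumed "Moore's Law" doubling time

for years in (3, 6, 9, 12):
    factor = 2 ** (years / DOUBLING_TIME_YEARS)
    print(f"after {years:2d} years: data volume grows by a factor of {factor:.0f}")
```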

  15. Scientific Computing Research Areas (2) (Funded by DOE-HEP, DOE SciDAC and DOE-MICS) • Grids and Security (SCCS Physics Experiment Support, Systems and Security groups) • PPDG: building the US HEP Grid – Open Science Grid • Security in an open scientific environment • Accounting, monitoring, troubleshooting and robustness • Network Research and Stunts (SCCS Network group – Les Cottrell et al.) • Land-speed record and other trophies • Internet Monitoring and Prediction (SCCS Network group) • IEPM: Internet End-to-End Performance Monitoring (~5 years) • INCITE: Edge-based Traffic Processing and Service Inference for High-Performance Networks

  16. Scientific Computing Research Areas (3) (Funded by DOE-HEP, DOE SciDAC and DOE-MICS) • GEANT4: simulation of particle interactions in million- to billion-element geometries (SCCS Physics Experiment Support group – M. Asai, D. Wright, T. Koi, J. Perl …) • BaBar, GLAST, LCD … • LHC program • Space • Medical • PDE solving for complex electromagnetic structures (Kwok Ko’s Advanced Computing Department + SCCS clusters)

  17. Growing Competences • Parallel Computing (MPI …) • Driven by KIPAC (Tom Abel) and ACD (Kwok Ko) • SCCS competence in parallel computing (= Alf Wachsmann currently) • MPI clusters and SGI SSI system • Visualization • Driven by KIPAC and ACD • SCCS competence is currently experimental-HEP focused (WIRED, HEPREP …) • (A polite way of saying that growth is needed)

  18. A Leadership-Class Facility for Data-Intensive Science: The PetaCache Project. Richard P. Mount, Director, SLAC Computing Services; Assistant Director, SLAC Research Division. Washington DC, April 13, 2004

  19. PetaCache Goals • The PetaCache architecture aims at revolutionizing the query and analysis of scientific databases with complex structure. • Generally this applies to feature databases (terabytes–petabytes) rather than bulk data (petabytes–exabytes) • The original motivation comes from HEP • Sparse (~random) access to tens of terabytes today, petabytes tomorrow • Access by thousands of processors today, tens of thousands tomorrow
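
To see why memory-resident data servers matter for this sparse access pattern, an illustrative back-of-envelope comparison (the latency figures are typical 2005-era assumptions of mine, not numbers from the talk):

```python
# Total time for many small random reads, dominated by per-access latency.
DISK_ACCESS_S   = 5e-3    # assumed ~5 ms per random disk access (seek + rotation)
MEMORY_ACCESS_S = 100e-6  # assumed ~100 us per access to a networked memory server

n_reads = 10_000_000      # sparse reads of small event objects
print(f"disk-resident data:   {n_reads * DISK_ACCESS_S / 3600:7.1f} hours")
print(f"memory-resident data: {n_reads * MEMORY_ACCESS_S / 3600:7.1f} hours")
```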

  20. PetaCacheThe Team • David Leith, Richard Mount, PIs • Randy Melen, Project Leader • Chuck Boeheim (Systems group leader) • Bill Weeks, performance testing • Andy Hanushevsky, xrootd • Systems group members • Network group members • BaBar (Stephen Gowdy)

  21. Latency and Speed – Random Access

  22. The PetaCache Strategy • Sitting back and waiting for technology is a BAD idea • Scalable petabyte memory-based data servers require much more than just cheap chips. Now is the time to develop: • Data server architecture(s) delivering latency and throughput cost-optimized for the science • Scalable data-server software supporting a range of data-serving paradigms (file-access, direct addressing, …) • “Liberation” of entrenched legacy approaches to scientific data analysis that are founded on the “knowledge” that accessing small data objects is crazily inefficient • Applications will take time to adapt not just codes, but their whole approach to computing, to exploit the new architecture • Hence: three phases • Prototype machine (In operation) • Commodity hardware • Existing “scalable data server software” (as developed for disk-based systems) • HEP-BaBar as co-funder and principal user • Tests of other applications (GLAST, LSST …) • Tantalizing “toy” applications only (too little memory for flagship analysis applications) • Industry participation • Development Machine (Next proposal) • Low-risk (purely commodity hardware) and higher-risk (flash memory system requiring some hardware development) components • Data server software – improvements to performance and scalability, investigation of other paradigms • HEP-BaBar as co-funder and principal user • Work to “liberate” BaBar analysis applications • Tests of other applications (GLAST, LSST …) • Major impact on a flagship analysis application • Industry partnerships, DOE Lab partnerships • Production Machine(s) • Storage-class memory with a range of interconnect options matched to the latency/throughput needs of differing applications • Scalable data-server software offering several data-access paradigms to applications • Proliferation – machines deployed at several labs • Economic viability – cost-effective for programs needing dedicated machines • Industry partnerships transitioning to commercialization

  23. Prototype Machine (Operational) (diagram) • Clients (existing HEP-funded BaBar systems): up to 2000 nodes, each 2 CPUs and 2 GB memory, Linux • Cisco switch between clients and data servers • Data servers (PetaCache, MICS + HEP-BaBar funding): 64–128 nodes, each a Sun V20z with 2 Opteron CPUs and 16 GB memory; up to 2 TB total memory; Solaris or Linux (mix and match) • Cisco switches

  24. Object-Serving Software • Xrootd/olbd (Andy Hanushevsky/SLAC) • Optimized for read-only access • File-access paradigm (filename, offset, bytecount) • Make 1000s of servers transparent to user code • Load balancing • Self-organizing • Automatic staging from tape • Failure recovery • Allows BaBar to start getting benefit from a new data-access architecture within months without changes to user code • The application can ignore the hundreds of separate address spaces in the data-cache memory
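
A minimal sketch of the file-access paradigm described above, i.e. reads addressed by (filename, offset, bytecount) with server selection hidden from user code. This is illustrative Python only; it is not the xrootd/olbd API or load-balancing algorithm, and the server names are hypothetical:

```python
# Illustrative only: the (filename, offset, bytecount) access pattern that
# xrootd exposes, with a redirection step hiding which server holds the file.
# Server names and the hash-based placement are hypothetical, not olbd's logic.
import hashlib

SERVERS = [f"data{i:02d}.example.org" for i in range(64)]  # hypothetical pool

def locate(filename: str) -> str:
    """Pick a server for a file (stand-in for the redirector/olbd step)."""
    digest = hashlib.md5(filename.encode()).hexdigest()
    return SERVERS[int(digest, 16) % len(SERVERS)]

def read(filename: str, offset: int, bytecount: int) -> bytes:
    """Read bytecount bytes at offset; the client never names a server."""
    server = locate(filename)
    # A real client would now send the request to `server`; we just fake it.
    print(f"GET {filename} [{offset}:{offset + bytecount}] from {server}")
    return b"\x00" * bytecount

payload = read("/store/babar/run42/events.root", offset=1_048_576, bytecount=4096)
```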

  25. Prototype Machine: Performance Measurements • Latency • Throughput (transaction rate) • (Aspects of) Scalability

  26. Latency (microseconds) versus data retrieved (bytes)
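
The shape of such a curve is usually well described by a fixed per-request cost plus a bandwidth term, t(n) ≈ t0 + n/B. A generic sketch with placeholder numbers (not the measured SLAC values):

```python
# Generic latency model: t(n) = t0 + n / bandwidth.
# t0 and the bandwidth below are placeholders, not the measured prototype values.
T0_US = 100.0        # fixed per-request latency, microseconds
BYTES_PER_US = 100   # ~100 MB/s expressed as bytes per microsecond

def latency_us(nbytes: int) -> float:
    return T0_US + nbytes / BYTES_PER_US

for n in (100, 1_000, 10_000, 100_000, 1_000_000):
    print(f"{n:>9} bytes -> {latency_us(n):9.1f} us")
```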

  27. Throughput Measurements (chart) • ~22 processor-microseconds per transaction
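
Simple arithmetic on the quoted figure: at 22 processor-microseconds per transaction, one server CPU can sustain roughly 45,000 transactions per second (the 64-node scaling below is a hypothetical illustration that ignores network and client overheads):

```python
# Transactions per second implied by 22 processor-microseconds per transaction.
US_PER_TRANSACTION = 22
per_cpu = 1e6 / US_PER_TRANSACTION
print(f"one server CPU:     {per_cpu:10,.0f} transactions/s")

# Hypothetical scaling across a 64-node, 2-CPU data-server tier.
nodes, cpus_per_node = 64, 2
print(f"64 x 2-CPU servers: {per_cpu * nodes * cpus_per_node:10,.0f} transactions/s")
```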

  28. Storage-Class Memory • New technologies coming to market in the next 3 – 10 years (Jai Menon – IBM) • Current not-quite-crazy example is flash memory

  29. Development Machine Plans (diagram) • Data servers (flash): 30 nodes, each 2 Opteron CPUs and 1 TB flash memory; ~30 TB total; Solaris/Linux • Data servers (memory): 80 nodes, each 8 Opteron CPUs and 128 GB memory; up to 10 TB total memory; Solaris/Linux • Clients (SLAC-BaBar system): up to 2000 nodes, each 2 CPUs and 2 GB memory, Linux • Switch with 10 Gigabit ports, Cisco switch fabric • PetaCache

  30. Minor Details? • 1970s • SLAC Computing Center designed for ~35 Watts/square foot • 0.56 MWatts maximum • 2005 • Over 1 MWatt by the end of the year • Locally high densities (16 kW racks) • 2010 • Over 2 MWatts likely needed • Onwards • Uncertain, but increasing power/cooling needs are likely
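
Back-of-envelope arithmetic behind those numbers (the rack footprint is my assumption, used only to show how far a 16 kW rack exceeds the original design density):

```python
# Implied raised-floor area and local rack power density.
DESIGN_W_PER_SQFT = 35    # 1970s design figure
DESIGN_MAX_W = 0.56e6     # 0.56 MW maximum
print(f"implied raised floor: {DESIGN_MAX_W / DESIGN_W_PER_SQFT:,.0f} sq ft")

RACK_KW = 16
RACK_FOOTPRINT_SQFT = 10  # assumed footprint including a share of aisle space
density = RACK_KW * 1000 / RACK_FOOTPRINT_SQFT
print(f"16 kW rack: ~{density:,.0f} W/sq ft "
      f"(~{density / DESIGN_W_PER_SQFT:.0f}x the 1970s design density)")
```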

  31. Crystal Ball (1) • The pessimist’s vision: • PPA computing winds down to about 20% of its current level as BaBar analysis ends in 2012 (GLAST is negligible, KIPAC is small) • Photon Science is dominated by non-SLAC/non-Stanford scientists who do everything at their home institutions • The weak drivers from the science base make SLAC unattractive for DOE computer science funding

  32. Crystal Ball (2) • The optimist’s vision: • PPA computing in 2012 includes: • Vigorous/leadership involvement in LHC physics analysis using innovative computing facilities • Massive computational cosmology/astrophysics • A major role in LSST data analysis (petabytes of data) • Accelerator simulation for the ILC • Photon Science computing includes: • A strong SLAC/Stanford faculty, leading much of LCLS science, fully exploiting SLAC’s strengths in simulation, data analysis and visualization • A major BES accelerator research initiative • Computer Science includes: • National/international leadership in computing for data-intensive science (supported at $25M to $50M per year) • SLAC and Stanford: • University-wide support for establishing leadership in the science of scientific computing • New SLAC/Stanford scientific computing institute.
