

  1. What Supercomputers Still Can’t Do – a Reflection on the State of the Art in CSE Horst D. Simon Associate Laboratory Director, Computing Sciences Director, NERSC CIS’04 Shanghai, P.R. China December 16, 2004 http://www.nersc.gov/~simon

  2. Overview • Introducing NERSC and Computing Sciences at Berkeley Lab • Current Trends in Supercomputing (High-End Computing) • What Supercomputers Do • What Supercomputers Can’t Do

  3. NERSC Serves the Scientific Community

  4. NERSC Center Overview • Funded by DOE, annual budget $38M, about 60 staff • Traditional strategy to invest equally in the newest compute platform, staff, and other resources • Supports open, unclassified, basic research • Close collaborations between universities and NERSC in computer science and computational science

  5. NERSC System Architecture • HPSS mass storage: 12 IBM SP servers, 15 TB of cache disk, 8 STK robots, 44,000 tape slots, 20 x 200 GB drives, 60 x 20 GB drives, maximum capacity 5-8 PB • Visualization server "escher": SGI Onyx 3400, 12 processors, 2 InfiniteReality 4 graphics pipes, 24 GB memory, 4 TB disk • SGI symbolic manipulation server • PDSF: 400 processors (peak 375 GFlop/s), 360 GB memory, 35 TB disk, Gigabit and Fast Ethernet; Ratio = (1, 93) • IBM SP NERSC-3 "Seaborg": 6,656 processors (peak 10 TFlop/s), 7.8 TB memory, 44 TB disk; Ratio = (8, 7) • LBNL "Alvarez" cluster: 174 processors (peak 150 GFlop/s), 87 GB memory, 1.5 TB disk, Myrinet 2000; Ratio = (0.6, 100) • Testbeds and servers • Networking: 10/100 Mb Ethernet, Gigabit and Jumbo Gigabit Ethernet, FC disk, OC-48 (2,400 Mbps) link to ESnet • Ratio = (RAM bytes per flop, disk bytes per flop)
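
The balance ratio quoted on this slide is defined as (RAM bytes per flop, disk bytes per flop). A small sketch below recomputes it from the specs as listed; the system names and dictionary layout are illustrative, not NERSC tooling.

```python
# Recompute Ratio = (RAM bytes per flop, disk bytes per flop) from the
# specs listed on the slide. Values are copied from the slide text.

systems = {
    # name: (peak flop/s, memory in bytes, disk in bytes)
    "Seaborg (NERSC-3)": (10e12, 7.8e12, 44e12),
    "PDSF":              (375e9, 360e9, 35e12),
    "Alvarez":           (150e9, 87e9, 1.5e12),
}

for name, (peak, mem, disk) in systems.items():
    print(f"{name}: RAM/flop = {mem / peak:.2f}, disk/flop = {disk / peak:.1f}")
```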

  6. NERSC Capability Plan (chart: projected capability in TeraFlop/s by year, 2005-2010, for the NERSC 3, NCS, NCSb, NERSC 5L, and NERSC 6L Cluster systems)

  7. Overview • Introducing NERSC and Computing Sciences at Berkeley Lab • Current Trends in Supercomputing (High-End Computing) • What Supercomputers Do • What Supercomputers Can’t Do

  8. Technology Trends: Microprocessor Capability • Moore's Law: 2X transistors/chip every 1.5 years • Gordon Moore (co-founder of Intel) predicted in 1965 that the transistor density of semiconductor chips would double roughly every 18 months • Microprocessors have become smaller, denser, and more powerful • Slide source: Jack Dongarra
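
A rough worked example of the doubling rule quoted on this slide: under a fixed 1.5-year doubling period, the 1993 to 2004 span chosen below yields roughly a 160x per-chip increase, which is consistent with the later claim that 2004 desktops match 1993 supercomputers. The span and function name are illustrative, not figures from the talk.

```python
# Capability multiplier under a fixed doubling period (Moore's Law as quoted).

def growth_factor(years: float, doubling_period: float = 1.5) -> float:
    """Return the capability multiplier after `years` of steady doubling."""
    return 2.0 ** (years / doubling_period)

# Eleven years (1993 -> 2004) of doubling every 1.5 years:
print(f"{growth_factor(2004 - 1993):.0f}x")   # roughly 160x
```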

  9. TOP 500 Performance Development (chart, with "My Laptop" marked for comparison)

  10. TOP 500 Performance Projection

  11. Asian Countries

  12. Supercomputing Today Microprocessors have made desktop computing in 2004 what supercomputing was in 1993. Massive Parallelism has changed the “high end” completely. Today clusters of Symmetric Multiprocessors are the standard supercomputer architecture. The microprocessor revolution will continue with little attenuation for at least another 10 years. Continued discussion over architecture for High-End Computing (custom versus commodity).

  13. Overview • Introducing NERSC and Computing Sciences at Berkeley Lab • Current Trends in Supercomputing (High-End Computing) • What Supercomputers Do • What Supercomputers Can’t Do

  14. What Supercomputers Do • Introducing Computational Science and Engineering (CSE) • Four important observations about CSE, illustrated by examples from NERSC

  15. Simulation: The Third Pillar of Science • Traditional scientific and engineering paradigm: (1) Do theory or paper design. (2) Perform experiments or build system. • Limitations: – Too difficult—build large wind tunnels. – Too expensive—build a throw-away passenger jet. – Too slow—wait for climate or galactic evolution. – Too dangerous—weapons, drug design, climate experimentation. • Computational science paradigm: (3) Use high performance computer systems to simulate the phenomenon • Based on known physical laws and efficient numerical methods.

  16. Computational Science – Third Pillar of Science • Many programs in DOE need dramatic advances in simulation capabilities to meet their mission goals – SciDAC program created in 2001 • Application areas shown: Combustion, Materials, Global Climate, Health Effects and Bioremediation, Components of Matter, Subsurface Transport, Fusion Energy

  17. Computational Science and Engineering (CSE) • CSE is a widely accepted label for an evolving field concerned with the science of, and the engineering of, systems and methodologies to solve computational problems arising throughout science and engineering • CSE is characterized by: multi-disciplinary, multi-institutional, requiring high-end resources, large teams, focus on community software • CSE is not "just programming" (and not CS) • Teraflop/s computing is necessary but not sufficient • Reference: Petzold, L., et al., Graduate Education in CSE, SIAM Rev., 43 (2001), 163-177

  18. First Observation about CSE • CSE permits us to ask new scientific questions • The increased computational capability available today lets us do more of the same (scaling to larger problems, more refinement, etc.), but it is most effectively used when addressing qualitatively new science questions

  19. High Resolution Climate Modeling on NERSC-3 – P. Duffy, et al., LLNL Wintertime Precipitation As model resolution becomes finer, results converge towards observations

  20. Tropical Cyclones and Hurricanes • Research by: Michael Wehner, Berkeley Lab; Phil Duffy and G. Bala, LLNL • Hurricanes are extreme events with large impacts on human and natural systems • Characterized by high vorticity (winds), very low pressure centers, and warm upper-air temperature anomalies • Wind speeds on the Saffir-Simpson Hurricane Scale • Category one: 74-95 mph (64-82 kt or 119-153 km/hr) • Category two: 96-110 mph (83-95 kt or 154-177 km/hr) • Category three: 111-130 mph (96-113 kt or 178-209 km/hr) • Category four: 131-155 mph (114-135 kt or 210-249 km/hr) • Category five: >155 mph (>135 kt or >249 km/hr) • How will the hurricane cycle change as the mean climate changes?
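
A minimal sketch of the Saffir-Simpson thresholds listed above, written as a classifier over sustained wind speed in mph. The function is illustrative only, not code from the Wehner/Duffy study.

```python
# Classify a sustained wind speed (mph) on the Saffir-Simpson scale
# using the category boundaries quoted on the slide.

def saffir_simpson_category(wind_mph: float) -> int:
    """Return hurricane category 1-5; 0 if below Category one."""
    if wind_mph > 155:
        return 5
    if wind_mph >= 131:
        return 4
    if wind_mph >= 111:
        return 3
    if wind_mph >= 96:
        return 2
    if wind_mph >= 74:
        return 1
    return 0

print(saffir_simpson_category(120))  # a 120 mph storm is Category three
```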

  21. Tropical Cyclones in Climate Models • Tropical cyclones are not generally seen in integrations of global atmospheric general circulation models at climate model resolutions (T42 ~ 300 km). • In fact, in CCM3 at T239 (50 km), the lowest pressure attained is 995 mb. No realistic cyclones are simulated. • However, in high resolution simulations of the finite volume dynamics version of CAM2, strong tropical cyclones are common.

  22. Finite Volume Dynamics CAM • Run in an 'AMIP' mode • Specified sea surface temperature and sea ice extent • Integrated from 1979 to 2000 • We are studying four resolutions • B: 2°x2.5° • C: 1°x1.25° • D: 0.5°x0.625° • E: 0.25°x0.375° • Processor configuration and cost (IBM SP3) • B: 64 processors, 10 wall clock hours / simulated year • C: 160 processors, 22 wall clock hours / simulated year • D: 640 processors, 33 wall clock hours / simulated year • E: 640 processors, 135 wall clock hours / simulated year
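
A quick sketch converting the per-resolution timings quoted above into CPU-hours per simulated year (processors times wall-clock hours), to show how the cost climbs as the grid is refined. The numbers are copied from the slide; the dictionary layout is illustrative.

```python
# CPU-hours per simulated year for the four CAM resolutions on the slide.

runs = {
    # resolution label: (processors, wall-clock hours per simulated year)
    "B (2 x 2.5 deg)":      (64, 10),
    "C (1 x 1.25 deg)":     (160, 22),
    "D (0.5 x 0.625 deg)":  (640, 33),
    "E (0.25 x 0.375 deg)": (640, 135),
}

for resolution, (processors, hours_per_year) in runs.items():
    print(f"{resolution}: {processors * hours_per_year:,} CPU-hours per simulated year")
```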

  23. New Science Question: Hurricane Statistics What is the effect of different climate scenarios on number and severity of tropical storms? Work in progress—results to be published later this year

  24. Second Observation about CSE 2. CSE makes most progress when applied mathematics and computer science are tightly integrated into the project • Increasing computer power alone will not give us sufficient capability to solve most important problems • Teraflop/s is necessary but not sufficient

  25. Application in Combustion: Block-Structured AMR (J. Bell and P. Colella, LBNL) • Each level is a union of rectangular patches • Each grid patch: • Logically structured, rectangular • Refined in space and time by evenly dividing coarse grid cells • Dynamically created/destroyed to track time-dependent features • In parallel, grids are distributed based on a work estimate • Block-structured hierarchical grids (Berger and Colella, 1989): Level 0, Level 1, Level 2
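
A minimal sketch of the refinement-flagging idea behind block-structured AMR; this is not the Berger-Colella code used by the LBNL group. Cells with steep gradients are flagged, and a real AMR library would then cover them with finer, logically rectangular patches refined in both space and time.

```python
# Flag coarse cells whose gradient magnitude is large; finer patches would
# then be laid over the flagged regions (illustrative sketch only).

import numpy as np

def flag_for_refinement(field: np.ndarray, threshold: float) -> np.ndarray:
    """Return a boolean mask marking coarse cells with large gradient magnitude."""
    gx, gy = np.gradient(field)
    return np.hypot(gx, gy) > threshold

coarse = np.random.default_rng(0).random((32, 32))
flags = flag_for_refinement(coarse, threshold=0.3)
print(f"{flags.sum()} of {flags.size} coarse cells flagged for refinement")
```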

  26. Experiment and Simulation • Experiment by R. Cheng in the LBNL combustion lab • Simulations by J. Bell and M. Day, LBNL, using NERSC

  27. V-Flame Simulation Stats • AMR stats • Run on seaborg.nersc.gov, 256 CPUs, 2 steps/hr • In 2004, the Berkeley Lab group is the only group capable of fully detailed simulations of laboratory-scale methane flames. Groups employing traditional simulation techniques are severely limited, even on vector-parallel supercomputers

  28. Third Observation about CSE 3. The most promising algorithms are a poor match for today’s most popular system architectures

  29. SciDAC Algorithm Success Story • A general sparse solver, Parallel SuperLU, developed at Berkeley Lab by Sherry Li, has been incorporated into NIMROD • Improvement in NIMROD execution time by a factor of five to ten on the NERSC IBM SP. "This would be the equivalent of three to five years progress in computing hardware." • Sustained performance of sparse solvers on current architectures is less than 10% of peak
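
A sketch of a sparse direct solve with SuperLU, accessed here through SciPy. SciPy's splu wraps the serial SuperLU library; the solver incorporated into NIMROD is the distributed-memory parallel SuperLU, which has its own interface. The 1-D Laplacian test matrix below is purely illustrative.

```python
# Sparse LU factorization and solve via SuperLU (SciPy wrapper).

import numpy as np
from scipy.sparse import diags, csc_matrix
from scipy.sparse.linalg import splu

n = 10_000
# Tridiagonal 1-D Laplacian: a typical sparse, banded test matrix.
A = csc_matrix(diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n)))
b = np.ones(n)

lu = splu(A)      # sparse LU factorization (SuperLU)
x = lu.solve(b)   # triangular solves reuse the factorization
print(np.allclose(A @ x, b))
```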

  30. Near Term Science Breakthroughs Enabled by Computing

  31. Science Drives Architecture • State-of-the-art computational science requires increasingly diverse and complex algorithms • Only balanced systems that can perform well on a variety of problems will meet future scientists' needs! • Data-parallel and scalar performance are both important

  32. New Science Presents New Architecture Challenges Future high end computing requires an architecture capable of achieving high performance across a spectrum of key state-of-the-art applications • Data parallel algorithms do well on machines with high memory bandwidth (vector or superscalar) • Irregular control flow requires excellent scalar performance • Spectral and other methods require high bisection bandwidth

  33. Scalar Performance Increasingly Important • Cannot use dense methods for the largest systems because of O(N^3) algorithm scaling. Need to use sparse and adaptive methods with irregular control flow • Complex microphysics results in complex inner loops • "It would be a major step backward to acquire a new platform that could reach the 100 Tflop level for only a few applications that had 'clean' microphysics. Increasingly realistic models usually mean increasingly complex microphysics. Complex microphysics is not amenable to [simple vector operations]." – Doug Swesty, SUNY Stony Brook
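
A back-of-the-envelope sketch of why O(N^3) dense methods are ruled out at scale: dense LU factorization costs about (2/3)N^3 flops. The 10 Tflop/s rate matches the Seaborg peak quoted earlier; the problem sizes are illustrative, and the run times assume the unrealistic best case of full peak throughout.

```python
# Time to factor a dense N x N system at a given flop rate, using the
# standard (2/3) N^3 flop count for dense LU.

def dense_lu_flops(n: float) -> float:
    return (2.0 / 3.0) * n ** 3

peak_flops = 10e12  # 10 Tflop/s, assuming the machine ran at full peak
for n in (1e5, 1e6, 1e7):
    hours = dense_lu_flops(n) / peak_flops / 3600.0
    print(f"N = {n:.0e}: about {hours:.2g} hours at peak")
```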

  34. Overview • Introducing NERSC and Computing Sciences at Berkeley Lab • Current Trends in Supercomputing (High-End Computing) • What Supercomputers Do • What Supercomputers Still Can’t Do

  35. Projected Performance Development

  36. TOP 500 Performance Projection

  37. The Exponential Growth of Computing, 1900-1998 (chart milestones: Hollerith Tabulator, Bell Calculator Model 1, ENIAC, IBM 704, IBM 360 Model 75, Cray 1, Pentium II PC) • Adapted from Kurzweil, The Age of Spiritual Machines

  38. The Exponential Growth of Computing, 1900-2100 Adapted from Kurzweil, The Age of Spiritual Machines

  39. Growth of Computing Power and “Mental Power” • Hans Moravec, CACM 46(10), 2003, pp. 90-97

  40. Why this simplistic view is wrong • Unsuitability of current architectures • Teraflop systems are focused on excelling at numerical computing, only one of the six (or eight) dimensions of human intelligence • Fundamental lack of mathematical models for cognitive processes • That is why we are not using the most powerful computers today for cognitive tasks • Complexity limits • We do not even know yet how to model turbulence; how then do we model thought?

  41. “The computer model turns out not to be helpful in explaining what people actually do when they think and perceive” – Hubert Dreyfus, p. 189 • Example: one of the biggest success stories of machine intelligence, the chess computer “Deep Blue”, did not teach us anything about how a chess grandmaster thinks.

  42. Six Dimensions of Intelligence • Verbal-Linguistic: ability to think in words and to use language to express and appreciate complex concepts • Logical-Mathematical: makes it possible to calculate, quantify, consider propositions and hypotheses, and carry out complex mathematical operations • Spatial: capacity to think and orient in a physical three-dimensional environment • Bodily-Kinesthetic: ability to manipulate objects and fine-tune physical skills • Musical: sensitivity to pitch, melody, rhythm, and tone • Interpersonal: capacity to understand and interact effectively with others • Howard Gardner, Frames of Mind: The Theory of Multiple Intelligences, New York: Basic Books, 1983, 1993

  43. Current State of Supercomputers

  44. Retina to Visual Cortex Mapping http://cgl.elte.hu/~racz/santafe.html

  45. Building New Models • About 1/3 of the human brain is probably dedicated to processing visual information • We have only very rudimentary knowledge of the principles of human vision computing • A research project by Don Glaser at UC Berkeley investigates the mapping from retina to visual cortex • Attempt to model "optical illusions" and simple movement of objects in the visual cortex • Current models are limited to about 10^5 neurons • Project at NERSC in 2005
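
As a rough illustration of the retina-to-cortex mapping mentioned on the previous two slides, the sketch below uses the complex-logarithm (log-polar) idealization, in which a retinal point z maps to a cortical position roughly proportional to log(z + a). This is a common textbook approximation offered only for illustration; it is not necessarily the model used in the Glaser project.

```python
# Log-polar idealization of the retina-to-visual-cortex mapping:
# cortical position ~ log(z + a) for a retinal point z = x + iy.

import numpy as np

def retina_to_cortex(x: float, y: float, a: float = 0.5):
    """Map a retinal point (x, y) to idealized cortical coordinates."""
    w = np.log(x + 1j * y + a)
    return w.real, w.imag  # eccentricity compresses logarithmically; angle roughly preserved

for eccentricity in (0.5, 2.0, 8.0, 32.0):
    u, _ = retina_to_cortex(eccentricity, 0.0)
    print(f"eccentricity {eccentricity:>4}: cortical coordinate ~ {u:.2f}")
```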

  46. Fourth Observation about CSE 4. There are vast areas of science and engineering where CSE has not even begun to make an impact • The current list of CSE applications is almost the same as 15 years ago • The current set of architectures captures only a small subset of human cognitive abilities • In many scientific areas there is still an almost complete absence of computational models • See also: Y. Deng, J. Glimm, and D. H. Sharp, Perspectives on Parallel Computing, Daedalus, Vol. 121 (1992), 31-52

  47. Major Application Areas of CSE • Science • Global climate modeling • Astrophysical modeling • Biology: genomics, protein folding, drug design • Computational chemistry • Computational material sciences and nanosciences • Engineering • Crash simulation • Semiconductor design • Earthquake and structural modeling • Computational fluid dynamics • Combustion • Business • Financial and economic modeling • Transaction processing, web services, and search engines • Defense • Nuclear weapons—test by simulations • Cryptography This list from 2004 is identical to a list from 1992!

  48. Conclusions • CSE has become well established in the US and is at the threshold of enabling significant scientific breakthroughs • CSE permits us to ask new scientific questions • CSE makes most progress when applied mathematics and computer science are tightly integrated • CSE offers tremendous research opportunities for computer scientists and applied mathematicians • The most promising algorithms are a poor match for today's most popular system architectures • There are vast areas of science and engineering where computational modeling has not even begun to make an impact (e.g., cognitive computing)
