The Memory is the Computer. Rob Schreiber HP Labs DOE Salishan Conference, 2014. Let’s Build an Exascale Machine. And make it useful. Adequate memory capacity No disks (except for archival store) 20 MW (good thing) More than a loosely-connected cluster It won’t always work!
The Memory is the Computer Rob Schreiber HP Labs DOE Salishan Conference, 2014
Let’s Build an Exascale Machine And make it useful. • Adequate memory capacity • No disks (except for archival store) • 20 MW (good thing) • More than a loosely-connected cluster • It won’t always work!
It will be a very parallel machine Exaflops at gigahertz a billion operations per clock
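The billion-operations figure follows directly from dividing the target rate by the clock rate. A minimal sketch of that arithmetic, assuming "exaflops" means 10^18 operations per second and "gigahertz" means exactly 1 GHz (the constant names are illustrative):

```python
# Operations that must complete every clock cycle to sustain an exaflop.
EXAFLOPS = 10**18   # operations per second (assumed decimal exa)
CLOCK_HZ = 10**9    # clock cycles per second (assumed 1 GHz)

ops_per_clock = EXAFLOPS // CLOCK_HZ
print(f"{ops_per_clock:,} operations per clock")
```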
How will we get that much parallelism? • Big problems • More than one problem • Pipelines of problems • Preprocess, mesh gen • Solve, resolve, UQ, optimize • Postprocess and visualize • Solving the same problem twice!
And we will need a lot of memory • Many problems on the machine at one time • Performance costs memory • No disks!
How slow is a millisecond? • From 1960 – present: clock improves 6000X (2 microseconds to 0.3 nanoseconds). • Rotational latency: 11X • IBM 1405, year = 1961, RPS = 23. Today 250 RPS • Seek: 100X (IBM 1405 = 600 ms. Today 6 ms.) The millisecond is not a reasonable unit in the exascale era.
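The ratios on this slide can be checked from the raw figures it quotes. The sketch below recomputes them and adds one derived number that is not on the slide: how many clock cycles fit into a single millisecond at a 0.3 ns clock. Variable names are illustrative, and rotational latency is assumed to scale inversely with revolutions per second (RPS).

```python
# Figures quoted on the slide.
clock_1960_s = 2e-6      # 2 microseconds
clock_now_s = 0.3e-9     # 0.3 nanoseconds
rps_1961 = 23            # IBM 1405 revolutions per second
rps_now = 250
seek_1961_ms = 600       # IBM 1405 seek time
seek_now_ms = 6

clock_gain = clock_1960_s / clock_now_s     # slide rounds this to 6000X
latency_gain = rps_now / rps_1961           # slide rounds this to 11X
seek_gain = seek_1961_ms / seek_now_ms      # 100X

# Derived here, not on the slide: clock cycles that pass during 1 ms.
cycles_per_ms = 1e-3 / clock_now_s

print(f"clock gain:   {clock_gain:.0f}X")
print(f"latency gain: {latency_gain:.1f}X")
print(f"seek gain:    {seek_gain:.0f}X")
print(f"cycles per millisecond: {cycles_per_ms:.2e}")
```

On these numbers a single 6 ms seek costs on the order of tens of millions of clock cycles, which is the imbalance the slide is pointing at.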
Things are becoming unbalanced!!! Argonne National Labs plans for leading-edge supercomputing. Thanks, Rick!
Moore’s Law: Last Call? “There’s no getting around the fact that we make these things out of atoms.” - Gordon Moore We need some new Moore’s Laws
Density challenge $10 per gigabyte (DRAM) today • A DRAM exabyte costs $10B • At exascale time, still billions Let’s consider other memory technologies
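The $10B figure is a straight multiplication. A minimal sketch, assuming decimal units (one exabyte = 10^9 gigabytes) and the slide's price of $10 per gigabyte; the constant names are illustrative:

```python
DRAM_DOLLARS_PER_GB = 10      # price quoted on the slide
GB_PER_EXABYTE = 10**9        # assumed decimal units

dram_exabyte_cost = DRAM_DOLLARS_PER_GB * GB_PER_EXABYTE
print(f"${dram_exabyte_cost / 1e9:.0f}B for one exabyte of DRAM")
```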
How warm As we move data from disk to memory, all other things being equal, the memory cools
Cost, Density, Power • Feature shrink – running on empty • MLC technologies • 3D technologies (not stacking, real 3D…) • Static power and memory capacity
Power challenges DRAM – Reduced-memory exascale • Overfetch, leakage, refresh, scrubbing • Giridhar et al, SC 13: 100PB can be achieved at 4.7 MW Nonvolatile memory is usually energy-costly to write, but no static power, no scrub, no refresh We could make a lot more money if our customers had a bigger plug to plug our machines into.
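To put the 4.7 MW figure in context, the sketch below compares it with the 20 MW system budget mentioned earlier in the talk and extrapolates it to a full exabyte. The extrapolation assumes DRAM power scales linearly with capacity, which is an assumption made here for illustration and not a claim from the slide or from Giridhar et al.

```python
REPORTED_CAPACITY_PB = 100    # from the slide (Giridhar et al., SC 13)
REPORTED_POWER_MW = 4.7       # from the slide
SYSTEM_BUDGET_MW = 20         # system power target from the talk
PB_PER_EXABYTE = 1000         # assumed decimal units

mw_per_pb = REPORTED_POWER_MW / REPORTED_CAPACITY_PB
share_of_budget = REPORTED_POWER_MW / SYSTEM_BUDGET_MW

# Linear extrapolation to 1 EB of DRAM: an assumption, not a measured result.
exabyte_dram_mw = mw_per_pb * PB_PER_EXABYTE

print(f"100 PB of DRAM uses {share_of_budget:.1%} of a 20 MW budget")
print(f"1 EB of DRAM, scaled linearly: about {exabyte_dram_mw:.0f} MW")
```

Under that assumption an exabyte of DRAM alone would exceed the whole 20 MW budget, which is why memory with no static power is attractive.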
Flash 3D NAND Flash is BIG 128Gb chips reported (vs. 4-8 Gb for DRAM). But ...
Flash in Exascale Systems Return of the millisecond And it can wear out; so it would be a separate tier Flash is the new disk
NVRAM has a future "I'm reasonably confident that ... nonvolatile technologies will replace flash and bring non-volatile memory very close [to compute] with dramatic improvements in latency. Architectures will clearly have to react and respond to that." -- Justin Rattner
New memory on the horizon • Spin-Torque-Transfer RAM (STTRAM) • Grandis (54nm, acquired by Samsung) • Phase-Change RAM (PCRAM) • Samsung (20nm, diode, up to 8Gb) • Micron and Nokia – In phones now • Resistive RAM (memristor) • Panasonic (180nm process, 4-layer xpoint) • Unity Semi (64MB, acquired by Rambus) • Crossbar • Several others under development
ReRAM • Very promising, still technical issues • Fits in the density range between DRAM and Flash • Should scale well • Low power – exabyte will be feasible
NVM Programming • Storage (files, databases) • Persistent heaps • Just plain memory • The SNIA PM Programming TWG • Caches and persistence, transactions, failure atomicity • A co-design opportunity
Attack of the killer cellphones The end of Moore’s Law a restoration of diversity
Networks and machine usability • Ethernet cadence: 1Gb, 10, 40, 100. • No Moore’s Law • Very high overheads • New interest, even outside HPC, in • RDMA • Topological routing • User-level comm • Very fine grained, low latency
Error detection and correction How will we know we got a correct answer? How do we respond to an error flag? Verify in-line (at every timestep) and recompute if there is a probable error! Embedded auxiliary scheme
[Figure: apparent local error (vertical axis, 0 to 0.8) versus iteration (horizontal axis, 0 to 500)]
Bigger errors are easier to detect There is high recall (almost all errors found) and very few false positives (For a simple case – the heat equation)
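The verify-and-recompute idea can be sketched for the 1D heat equation. The talk does not say which auxiliary scheme was used, so this example swaps in a stand-in: each explicit (FTCS) step is compared against two half-size steps, and the largest difference is treated as the apparent local error. The function names, grid size, tolerance, and fault-injection mechanism are all assumptions made for illustration.

```python
import math

def ftcs_step(u, r):
    """One explicit heat-equation step with fixed ends; r = alpha*dt/dx**2."""
    new = list(u)
    for i in range(1, len(u) - 1):
        new[i] = u[i] + r * (u[i - 1] - 2.0 * u[i] + u[i + 1])
    return new

def checked_step(u, r, tol, fault=None):
    """Advance one step, verify it in-line, and recompute if it looks wrong."""
    primary = ftcs_step(u, r)
    if fault is not None:                 # hypothetical injected soft error
        index, magnitude = fault
        primary[index] += magnitude
    # Stand-in auxiliary scheme (assumption): two half-size steps.
    auxiliary = ftcs_step(ftcs_step(u, r / 2.0), r / 2.0)
    error = max(abs(p - a) for p, a in zip(primary, auxiliary))
    flagged = error > tol
    if flagged:
        primary = ftcs_step(u, r)         # recompute; assumes a transient fault
    return primary, error, flagged

def run(steps, faults, n=51, r=0.25, tol=1e-4):
    """Integrate from a sine profile; faults maps step -> (index, magnitude)."""
    u = [math.sin(math.pi * i / (n - 1)) for i in range(n)]
    flagged_steps = []
    for step in range(steps):
        u, _, flagged = checked_step(u, r, tol, faults.get(step))
        if flagged:
            flagged_steps.append(step)
    return u, flagged_steps

clean, flagged_clean = run(200, {})
repaired, flagged_big = run(200, {50: (25, 0.1)})
_, flagged_small = run(200, {50: (25, 1e-9)})

print("flags with no fault:   ", flagged_clean)
print("flags with large fault:", flagged_big)
print("flags with tiny fault: ", flagged_small)
```

In this sketch the large perturbation is flagged and the recomputed run matches the fault-free one, while the tiny perturbation slips under the tolerance, consistent with the point that bigger errors are easier to detect. The stand-in check costs about twice the primary step; a practical embedded scheme would presumably be much cheaper.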
A plan for progress • DRAM cannot provide adequate low-static-power capacity • No disks. Solid-state memory+storage • Twilight of the one-size-fits-all server • Low-latency communication • Self-checking algorithms + in-NVRAM checkpoints for resilience
Generic Disclaimer, and Acknowledgements The views in this presentation are not necessarily those of HP. Thanks to: Sarah Anthony, Cullen Bash, Al Davis, Paolo Faraboschi, Dick Henze, Kevin Lim, Moray McLaren, Naveen Muralimanohar, Jerry Rolia, Mike Tan