
Exascale: Power, Cooling, Reliability, and Future Arithmetic


Presentation Transcript


  1. Exascale: Power, Cooling, Reliability, and Future Arithmetic. John Gustafson, HPC User Forum, Seattle, September 2010

  2. The First Cluster: The “Cosmic Cube” Chuck Seitz and Geoffrey Fox developed a $50,000 alternative to Cray vector mainframes in 1984. Motivated by QCD physics problems, but soon found useful for a very wide range of apps. 64 Intel 8086/8087 nodes, 700 watts, 6 cubic feet in total.

  3. But Fox & Seitz did not invent the Cosmic Cube. Stan Lee did. From Wikipedia: “A device created by a secret society of scientists to further their ultimate goal of world conquest.” Tales of Suspense #79, July 1966

  4. Terascale Power Use Today (Not to Scale) A 1-Tflop/s machine today draws about 5 kW in total:
  • Compute: 200 W (1 Tflop/s @ 200 pJ per flop)
  • Memory: 150 W (0.1 byte/flop @ 1.5 nJ per byte)
  • Communication: 100 W (100 pJ of communication per flop)
  • Disk: 100 W (10 TB of disk @ 10 W per TB)
  • Control: 1,500 W (0.2 instruction/flop @ 7.5 nJ per instruction)
  • Power supply loss: 950 W (19%; 81%-efficient power supplies)
  • Heat removal: 2,000 W (40% of total power consumed; all levels, chip to facility)
  Derived from data from S. Borkar, Intel Fellow.
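To double-check the slide's arithmetic, here is a minimal Python sketch (my addition, not part of the deck). The per-operation energies are the ones quoted above; the only assumption is that the 19% supply loss and 40% heat removal are fractions of total facility power, which is the reading under which the components sum to 5 kW.

```python
# Rebuild the terascale power budget from the per-operation energies on the slide.
# Figures come from the slide (derived from S. Borkar's data); only the arithmetic is mine.

FLOPS = 1e12  # 1 Tflop/s

loads_w = {
    "compute": 200e-12 * FLOPS,        # 200 pJ per flop              -> 200 W
    "memory":  0.1 * 1.5e-9 * FLOPS,   # 0.1 byte/flop @ 1.5 nJ/byte  -> 150 W
    "comm":    100e-12 * FLOPS,        # 100 pJ of comm. per flop     -> 100 W
    "disk":    10 * 10.0,              # 10 TB @ 10 W per TB          -> 100 W
    "control": 0.2 * 7.5e-9 * FLOPS,   # 0.2 instr/flop @ 7.5 nJ each -> 1500 W
}

useful = sum(loads_w.values())          # 2050 W of direct load
total = useful / (1 - 0.19 - 0.40)      # PSU loss is 19%, heat removal 40% of facility power
print(f"direct load {useful:.0f} W, facility total {total:.0f} W")
print(f"PSU loss {0.19 * total:.0f} W, heat removal {0.40 * total:.0f} W")
```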

  5. Let’s See that Drawn to Scale… A SIMD accelerator approach gives up Control to reduce watts per Tflop/s. This can work for applications that are very regular and SIMD-like (vectorizable with long vectors).

  6. Energy Cost by Operation Type Notice that 12000 pJ @ 3 GHz = 36 watts! SiCortex’s solution: drop the memory speed, but the performance dropped proportionately. Larger caches actually reduce power consumption.

  7. Energy Cost of a Future HPC Cluster No cost for interconnect? Hmm… But while we’ve been building HPC clusters, Google and Microsoft have been very, very busy…

  8. Cloud computing has already eclipsed HPC for sheer scale
  • Cloud computing means using a remote data center to manage scalable, reliable, on-demand access to applications
  • Provides applications and infrastructure over the Internet
  • “Scalable” here means:
    • Possibly millions of simultaneous users of the app.
    • Exploiting thousand-fold parallelism in the app.
  From Tony Hey, Microsoft

  9. Mega-Data Center Economy of Scale
  • A 50,000-server facility is 6–7x more cost-effective than a 1,000-server facility in key respects
  • Don’t expect a TOP500 score soon.
    • Secrecy?
    • Or… not enough interconnect?
  Over 18 million square feet. Each.
  Each data center is the size of 11.5 football fields.
  Data courtesy of James Hamilton

  10. Computing by the truckload
  • Build racks, cooling, and communication together in a “container”
  • Hookups: power, cooling, and interconnect
  • I estimate each center is already over 70 megawatts… and 20 petaflops total!
  • But: designed for capacity computing, not capability computing
  From Tony Hey, Microsoft

  11. Arming for search engine warfare • It’s starting to look like… • a steel mill!

  12. A steel mill takes ~500 megawatts
  • Self-contained power plant
  • Is this where “economy of scale” will top out for clusters as well?
  • Half the steel mills in the US are abandoned
  • Maybe some should be converted to data centers!

  13. “With great power comes great responsibility.” (Uncle Ben) “Yes, and also some really big heat sinks.” (John G.)

  14. An unpleasant math surprise lurks… 64-bit precision is looking long in the tooth. (gulp!)
  [Chart: hardware word size in bits by year, 1940–2010, against a Moore’s Law trend line: Zuse 22 bits; Univac and IBM 36 bits; CDC 60 bits; Cray 64 bits; most vendors 64 bits; x86 80 bits (stack only).]
  At 1 exaflop/s (10^18 flop/s), 15 decimals don’t last long.
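As a concrete illustration (mine, not the talk's) of why 15 decimals "don't last long": an IEEE 754 double carries a 53-bit significand, so a running sum stops absorbing terms once it is about 10^16 times their size, and a 10^18 flop/s machine performs that many operations in a hundredth of a second.

```python
# A double's 53-bit significand gives roughly 16 significant decimals, so unit
# increments are lost once a value reaches about 2^53 ~ 9.0e15.

big = 1.0e16
print(big + 1.0 == big)      # True: adding 1.0 no longer changes the value

total = 1.0e16
for _ in range(1000):
    total += 1.0
print(total)                 # still 1e+16; all 1000 additions were rounded away
```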

  15. It’s unlikely a code uses the best precision.
  • Too few bits gives unacceptable errors
  • Too many bits wastes memory, bandwidth, joules
  • This goes for integers as well as floating point
  [Chart: the optimum precision, in bits (0–80), plotted for all floating-point operations in an application, with IEEE 754 double precision (64 bits) marked for comparison.]

  16. Ways out of the dilemma…
  • Better hardware support for 128-bit, if only for use as a check
  • Interval arithmetic has promise, if programmers can learn to use it properly (not just apply it to point arithmetic methods); see the sketch below
  • Increasing precision automatically with the memory hierarchy might even allow a return to 32-bit
  • Maybe it’s time to restore Numerical Analysis to the standard curriculum of computer scientists?
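To give the interval-arithmetic bullet some flavor, here is a toy Python sketch (my illustration; the class name and the one-ulp outward rounding are stand-ins, not anything from the talk): every value is carried as a pair of bounds, and every operation widens its result outward so the exact result of the operation is guaranteed to stay enclosed. It needs Python 3.9+ for math.nextafter.

```python
import math  # math.nextafter requires Python 3.9+


class Interval:
    """Toy interval [lo, hi]: every result is widened outward by one float step,
    so the exact mathematical result of each operation stays enclosed."""

    def __init__(self, lo, hi=None):
        self.lo = float(lo)
        self.hi = float(lo if hi is None else hi)

    @staticmethod
    def _outward(lo, hi):
        # Python has no directed-rounding modes; stepping each endpoint one
        # representable float outward is a conservative substitute.
        return Interval(math.nextafter(lo, -math.inf),
                        math.nextafter(hi, math.inf))

    def __add__(self, other):
        return Interval._outward(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        p = [self.lo * other.lo, self.lo * other.hi,
             self.hi * other.lo, self.hi * other.hi]
        return Interval._outward(min(p), max(p))

    def __repr__(self):
        return f"[{self.lo:.17g}, {self.hi:.17g}]"


# The width of the result is an explicit, guaranteed bound on rounding error:
total = Interval(0.0)
for _ in range(10):
    total = total + Interval(0.1)
print(total)   # encloses the exact sum of the ten stored values; its width bounds the error
```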

  17. The hierarchical precision idea: more capacity toward RAM; lower latency, more precision, and more range toward the registers.
  • Registers: 16 FP values, 1-cycle latency, 69 significant decimals, range 10^-2525221 to 10^+2525221
  • L1 cache: 4,096 FP values, 6-cycle latency, 33 significant decimals, range 10^-9864 to 10^+9864
  • L2 cache: 2 million FP values, 30-cycle latency, 15 significant decimals, range 10^-307 to 10^+307
  • Local RAM: 2 billion FP values, 200-cycle latency, 7 significant decimals, range 10^-38 to 10^+38
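A minimal NumPy sketch of the idea behind slides 16 and 17 (my illustration, not the talk's design): the bulk data stays in 32-bit storage, while the accumulator closest to the arithmetic is carried in a wider type, which recovers most of the accuracy without paying 64-bit cost in memory capacity and bandwidth.

```python
import math
import numpy as np

# "Local RAM": a large array stored in single precision (about 7 significant decimals).
rng = np.random.default_rng(0)
data = rng.random(1_000_000, dtype=np.float32)

sum32 = float(data.sum(dtype=np.float32))   # accumulate in the storage precision
sum64 = float(data.sum(dtype=np.float64))   # accumulate in a wider "register" type
exact = math.fsum(float(x) for x in data)   # exactly rounded reference sum

print("storage is float32 either way; only the accumulator width differs")
print(f"float32 accumulator error: {abs(sum32 - exact):.2e}")
print(f"float64 accumulator error: {abs(sum64 - exact):.2e}")
```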

  18. Another unpleasant surprise lurking… No more automatic caches?
  • Hardware cache policies are designed to minimize miss rates, which leaves cache utilization low (typically around 20%).
  • Memory transfers will soon be half the power consumed by a computer, and computing is already power-constrained.
  • Software will need to manage memory hierarchies explicitly, and source codes will need to expose memory moves rather than hide them (as sketched below).
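As a toy example of what "exposing memory moves" can look like in source code, here is a blocked matrix multiply in Python (mine, for illustration only; the tile size and staging buffers are arbitrary choices, and real code would be in C, Fortran, or similar). Each operand tile is copied into a small contiguous buffer before use, so the data movement appears as explicit statements the programmer or an autotuner can schedule, rather than as a side effect of a hardware replacement policy.

```python
import numpy as np


def blocked_matmul(a, b, tile=64):
    """Square matrix multiply with explicit tiling and explicit tile staging."""
    n = a.shape[0]
    c = np.zeros((n, n), dtype=a.dtype)
    for i0 in range(0, n, tile):
        for j0 in range(0, n, tile):
            # Small working tile of the result, kept "close to the arithmetic".
            c_tile = np.zeros((min(tile, n - i0), min(tile, n - j0)), dtype=a.dtype)
            for k0 in range(0, n, tile):
                # Explicit memory moves: stage the operand tiles in contiguous buffers.
                a_tile = np.ascontiguousarray(a[i0:i0 + tile, k0:k0 + tile])
                b_tile = np.ascontiguousarray(b[k0:k0 + tile, j0:j0 + tile])
                c_tile += a_tile @ b_tile
            c[i0:i0 + c_tile.shape[0], j0:j0 + c_tile.shape[1]] = c_tile
    return c


# Quick correctness check against NumPy's own matmul.
x = np.random.default_rng(1).random((256, 256))
y = np.random.default_rng(2).random((256, 256))
assert np.allclose(blocked_matmul(x, y), x @ y)
```

In NumPy the extra copies buy nothing by themselves; the point is purely structural: the moves between levels of the hierarchy are visible in the source instead of being left to speculative, automatic caching.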

  19. Summary
  • Mega-data-center clusters have eclipsed HPC clusters, but HPC can learn a lot from their methods in getting to exascale.
  • Clusters may grow to the size of steel mills, dictated by economies of scale.
  • We may have to rethink the use of 64-bit flops everywhere, for a variety of reasons.
  • Speculative data motion (that is, automatic caching) reduces operations per watt… it’s on the way out.
