1 / 23

Reversible Logic for Supercomputing How to save the Earth with Reversible Computing

SAND 2005-2689C. Reversible Logic for Supercomputing How to save the Earth with Reversible Computing. Erik P. DeBenedictis Sandia National Laboratories May 5, 2005.

gates
Download Presentation

Reversible Logic for Supercomputing How to save the Earth with Reversible Computing

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. SAND 2005-2689C Reversible Logic for SupercomputingHow to save the Earth with Reversible Computing Erik P. DeBenedictis Sandia National Laboratories May 5, 2005 Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for theUnited States Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000.

  2. System Performance Applications Applications Technology No schedule provided by source No schedule provided by source Plasma Fusion Simulation [Jardin 03]  Nanotech +Reversible Logic mP (green)best-case logic (red) 1 Zettaflops MEMSOptimize Full Global Climate [Malone 03] 100 Exaflops 10 Exaflops 1 Exaflops Compute as fast as the engineer can think[NASA 99] 100 Petaflops 10 Petaflops 1 Petaflops  1001000 [SCaLeS 03] 100 Teraflops 2000 2000 2010 2010 2020 2020 [Jardin 03] S.C. Jardin, “Plasma Science Contribution to the SCaLeS Report,” Princeton Plasma Physics Laboratory, PPPL-3879 UC-70, available on Internet.[Malone 03] Robert C. Malone, John B. Drake, Philip W. Jones, Douglas A. Rotman, “High-End Computing in Climate Modeling,” contribution to SCaLeS report.[NASA 99] R. T. Biedron, P. Mehrotra, M. L. Nelson, F. S. Preston, J. J. Rehder, J. L. Rogers, D. H. Rudy, J. Sobieski, and O. O. Storaasli, “Compute as Fast as the Engineers Can Think!”NASA/TM-1999-209715, available on Internet.[SCaLeS 03] Workshop on the Science Case for Large-scale Simulation, June 24-25, proceedings on Internet a http://www.pnl.gov/scales/.[DeBenedictis 04], Erik P. DeBenedictis, “Matching Supercomputing to Progress in Science,” July 2004. Presentation at Lawrence Berkeley National Laboratory, also published asSandia National Laboratories SAND report SAND2004-3333P. Sandia technical reports are available by going to http://www.sandia.gov and accessing the technical library. [Jardin 03] S.C. Jardin, “Plasma Science Contribution to the SCaLeS Report,” Princeton Plasma Physics Laboratory, PPPL-3879 UC-70, available on Internet.[Malone 03] Robert C. Malone, John B. Drake, Philip W. Jones, Douglas A. Rotman, “High-End Computing in Climate Modeling,” contribution to SCaLeS report.[NASA 99] R. T. Biedron, P. Mehrotra, M. L. Nelson, F. S. Preston, J. J. Rehder, J. L. Rogers, D. H. Rudy, J. Sobieski, and O. O. Storaasli, “Compute as Fast as the Engineers Can Think!”NASA/TM-1999-209715, available on Internet.[SCaLeS 03] Workshop on the Science Case for Large-scale Simulation, June 24-25, proceedings on Internet a http://www.pnl.gov/scales/.[DeBenedictis 04], Erik P. DeBenedictis, “Matching Supercomputing to Progress in Science,” July 2004. Presentation at Lawrence Berkeley National Laboratory, also published asSandia National Laboratories SAND report SAND2004-3333P. Sandia technical reports are available by going to http://www.sandia.gov and accessing the technical library. Architecture: IBM Cyclops, FPGA, PIM  Red Storm/Cluster 2000 2010 2020 2030 Year  Applications and $100M Supercomputers

  3. Objectives and Challenges • Could reversible computing have a role in solving important problems? • Maybe, because power is a limiting factor for computers and reversible logic cuts power • However, a complete computer system is more than “low power” • Processing, memory, communication in right balance for application • Speed must match user’s impatience • Must use a real device, not just an abstract reversible device

  4. Outline • An Exemplary Zettaflops Problem • The Limits of Current Technology • Arbitrary Architectures for the Current Problem • Searching the Architecture Space • Bending the Rules to Find Something • Exemplary Solution • Conclusions

  5. Ensembles, scenarios 10 EmbarrassinglyParallel Run length100 Longer RunningTime New parameterizations 100 More ComplexPhysics Model Completeness 100 More ComplexPhysics Spatial Resolution104 (103-105) Resolution Clusters Now In Use (100 nodes, 5% efficient) FLOPS Increases for Global Climate Issue Scaling 1 Zettaflops 100 Exaflops 1 Exaflops 10 Petaflops 100 Teraflops 10 Gigaflops Ref. “High-End Computing in Climate Modeling,” Robert C. Malone, LANL, John B. Drake, ORNL, Philip W. Jones, LANL, and Douglas A. Rotman, LLNL (2004)

  6. Outline • An Exemplary Zettaflops Problem • The Limits of Current Technology • Arbitrary Architectures for the Current Problem • Searching the Architecture Space • Bending the Rules to Find Something • Exemplary Solution • Conclusions

  7. Best-CaseLogic MicroprocessorArchitecture PhysicalFactor Source of Authority Reliability limit 750KW/(80kBT) Esteemed physicists(T=60°C junction temperature) 21024 logic ops/s Derate 20,000 convert logic ops to floating point Floating point engineering (64 bit precision) ExpertOpinion 100 Exaflops 800 Petaflops Derate for manufacturing margin (4) Estimate  125:1  Estimate 25 Exaflops 200 Petaflops Uncertainty (6) Gap in chart 4 Exaflops 32 Petaflops Improved devices (4) Estimate 1 Exaflops 8 Petaflops Projected ITRS improvement to 22 nm (100) ITRS committee of experts Assumption: Supercomputer is size & cost of Red Storm: US$100M budget; consumes 2 MW wall power; 750 KW to active components 80 Teraflops Lower supply voltage(2) ITRS committee of experts 40 Teraflops Red Storm contract Scientific Supercomputer Limits

  8. Outline • An Exemplary Zettaflops Problem • The Limits of Current Technology • Arbitrary Architectures for the Current Problem • Searching the Architecture Space • Bending the Rules to Find Something • Exemplary Solution • Conclusions

  9. Supercomputer Expert System Application/Algorithm run time model as in applications modeling Results1. Block diagrampicture of optimalsystem (model) 2. Report ofFLOPS count asa function ofyears into thefuture Expert System & Optimizer(looks for best 3Dmesh of generalized MPIconnected nodes,mP and other) Logic & Memory Technologydesign rules and performanceparameters for varioustechnologies(CMOS, Quantum Dots, C Nano-tubes …) InterconnectSpeed, power,pin count, etc. Time Trend Lithography as afunction of yearsinto the future PhysicalCooling, packaging,etc.

  10. Simple case: finite difference equation Each node holds nnn grid points Volume-area rule Computing  n3 Communications  n2 Sample Analytical Runtime Model Tstep = 6 n2 Cbytes Tbyte + n3 Fgrind/floprate Volumen3 cells Face-to-face n2 cells n n n

  11. Applications Modeling Runtime Trun = f1(n, design) Technology Roadmap Gate speed = f2(year), chip density = f3(year), cost = $(n, design), … Scaling Objective Function I have $C1 & can wait Trun=C2 seconds. What is the biggest n I can solve in year Y? Use “Expert System” To Calculate: Report: and illustrate “design” Max n: $<C1, Trun<C2 All designs Expert System for Future Supercomputers Floating operations Trun(n, design)

  12. Outline • An Exemplary Zettaflops Problem • The Limits of Current Technology • Arbitrary Architectures for the Current Problem • Searching the Architecture Space • Bending the Rules to Find Something • Exemplary Solution • Conclusions

  13. Initially, didn’tmeet constraints Scaled Climate Model 2D  3D mesh,one cell per processor More Parallelism Parallelize cloud-resolving model and ensembles Consider special purpose logic withfast logic and low-power memory More Device Speed Consider only highest performance published nanotech device QDCA Initial reversible nanotech The Big Issue One Barely Plausible Solution

  14. ITRS Device Review 2016 + QDCA Data from ITRS ERD Section,data from Notre Dame

  15. Outline • An Exemplary Zettaflops Problem • The Limits of Current Technology • Arbitrary Architectures for the Current Problem • Searching the Architecture Space • Bending the Rules to Find Something • Exemplary Solution • Conclusions

  16. Pairs of molecules create a memory cell or a logic gate An Exemplary Device: Quantum Dots Ref. “Clocked Molecular Quantum-Dot Cellular Automata,” Craig S. Lent and Beth IsaksenIEEE TRANSACTIONS ON ELECTRON DEVICES, VOL. 50, NO. 9, SEPTEMBER 2003

  17. How could we increase“Red Storm” from40 Teraflops to1 Zettaflops? Answer >2.5107 powerreduction peroperation Faster devices more parallelism>2.5107 Smaller devicesto fit existingpackaging 2004 Device Level 108 1010 101 100 10-1 10-2 3000  faster Energy/Ek 10-3 10-4 30  faster 10-5 10-6 10-7 100 THz 10 THz 1 THz 100 GHz Ref. “Maxwell’s demon and quantum-dot cellular automata,” John Timler and Craig S. Lent, JOURNAL OF APPLIED PHYSICS 15 JULY 2003 1 Zettaflops Scientific Supercomputer

  18. A number of post-transistor deviceshave been proposed The shape of the performance curveshave been validatedby a consensus of reputable physicists However, validity ofany data point can be questioned Cross-checking appropriate; see  Helical Logic Not Specifically Advocating Quantum Dots 101 100 10-1 10-2 10-3 10-4 10-5 10-6 10-7 100 THz 10 THz 1 THz 100 GHz Ref. “Maxwell’s demon and quantum-dot cellular automata,” John Timler and Craig S. Lent, JOURNAL OF APPLIED PHYSICS 15 JULY 2003.Ref. “Helical logic,” Ralph C. Merkle and K. Eric Drexler, Nanotechnology 7 (1996) 325–339.

  19. Leading Thoughts Implement CPU logic using reversible logic High efficiency for the component doing the most logic Implement state and memory using conventional logic Low efficiency, but not many operations Permits programming much like today CPU Design CPU Logic Reversible Logic CPU State Irreversible Logic ConventionalMemory

  20. Atmosphere Simulation at a Zettaflops Supercomputer is 211K chips, eachwith 70.7K nodes of 5.77K cells of240 bytes; solves 86T=44.1Kx44.1Kx44.1K cell problem.System dissipates 332KW from thefaces of a cube 1.53m on a side, for a power density of47.3KW/m2. Power: 332KW activecomponents; 1.33MWrefrigeration; 3.32MW wallpower; 6.65MW from powercompany. System has been inflatedby 2.57 over minimumsize to provide enoughsurface area to avoidoverheating. Chips are at 99.22% full,comprised of 7.07Glogic, 101M memorydecoder, and 6.44Tmemory transistors. Gate cell edge is34.4nm (logic)34.4nm (decoder);memory cell edgeis 4.5nm(memory). Computepower is 768EFLOPS,completingan iterationin 224µsand arun in 9.88s.

  21. Performance Curve Custom QDCA Rev. Logic Rev. LogicMicroprocessor Custom FLOPS rate on Atmosphere Simulation  Cluster Year 

  22. Outline • An Exemplary Zettaflops Problem • The Limits of Current Technology • Arbitrary Architectures for the Current Problem • Searching the Architecture Space • Bending the Rules to Find Something • Exemplary Solution • Conclusions

  23. There are important applications that are believed to exceed the limits of irreversible logic At US$100M budget E. g. solution to global warming Reversible logic & nanotech point in the right direction Low power Device Requirements Push speed of light limit Substantially sub-kBT Molecular scales Software and Algorithms Must be much more parallel than today With all this, just barely works Conclusions appear to apply generally Conclusions

More Related