Defect and Fault Tolerant Architectures for Nanoscale Devices

Defect and Fault Tolerant Architectures for Nanoscale Devices David Newell, BSEE ‘07 Taylor Johnson, BSEE ‘08 ELEC527 March 22, 2007

Motivation “As silicon manufacturing technology reaches the nanoscale, architectural designs need to accommodate the uncertainty inherent at such scales. These uncertainties are germane in the miniscule dimension of the devices, quantum physical effects, reduced noise margins, system energy levels reaching computing thermal limits, manufacturing defects, aging and many other factors. Defect tolerant architectures and their reliability measures will gain importance for logic and micro-architecture designs based on nanoscale substrates.” Debayan Bhaduri, Sandeep Shukla, NANOLAB: A Tool for Evaluating Reliability of Defect-Tolerant Nano Architectures

State of the Art Yesterday http://www.rpi.edu/~schubert/Educational%20resources/Educational%20resources.htm

State of the Art Yesterday • Intel 4004, 1971 • Max clock speed: 740kHz • Process: 10um PMOS • 2250 transistors • Intel 8008, 1972 • Max clock speed: 800kHz • Process: 10um PMOS • 3500 transistors http://www.cpu-world.com/CPUs/CPU.html

State of the Art Yesterday (cont) • Intel 8080, 1974 • Max clock speed: 2MHz • Process: 6um NMOS • 6000 transistors • Intel 80286, 1982 • Max clock speed: 12.5MHz • Process: 1.5um CMOS • 134,000 transistors http://www.cpu-world.com/CPUs/CPU.html

State of the Art Yesterday (cont) • Intel 80386, 1985 • Max clock speed: 16MHz • Process: 1um CMOS • 275,000 transistors • Intel 80486, 1989 • Max clock speed: 25MHz • Process: 1um CMOS • 1.2 million transistors http://www.cpu-world.com/CPUs/CPU.html

State of the Art Yesterday (cont) • Pentium, 1993 • Max clock speed: 66MHz • Process: 0.8um CMOS • 3.1 million transistors • Pentium Pro, 1995 • Max clock speed: 200MHz • Process: 0.6um CMOS • 5.5 million transistors http://www.cpu-world.com/CPUs/CPU.html

State of the Art Yesterday (cont) • Pentium II, 1997 • Max clock speed: 300MHz • Process: 0.35um CMOS • 7.5 million transistors • Pentium III, 1999 • Max clock speed: 600MHz • Process: 0.25um CMOS • 9.5 million transistors http://www.cpu-world.com/CPUs/CPU.html

State of the Art Yesterday (cont) • Pentium 4, 2000 • Max clock speed: 1.5GHz • Process: 0.18um CMOS • 42 million transistors • Pentium 4HT, 2002 • Max clock speed: 3.06GHz • Process: 0.13um CMOS • 55 million transistors http://www.cpu-world.com/CPUs/CPU.html

State of the Art Yesterday (cont) • Pentium 4EE, 2003 • Max clock speed: 3.2GHz • Process: 0.13um CMOS • 178 million transistors • Pentium M, 2005 • Max clock speed: 2.13GHz • Process: 90nm CMOS • 140 million transistors www.wikipedia.org

State of the Art Yesterday (cont) • Core Duo, 2006 • Max clock speed: 2.33GHz • Process: 65nm CMOS • 291 million transistors www.wikipedia.org

State of the Art Today • Core 2 Duo, 2006-2007 • Max clock speed: 2.66GHz • Process: 65nm CMOS • 376 million transistors www.wikipedia.org
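The growth across these slides can be summarized as an average doubling time. The sketch below uses only the first and last data points quoted above (Intel 4004 and Core 2 Duo); the function name `doubling_time_years` is an illustrative assumption, and the result is a rough average, not a fitted trend.

```python
import math

# (year, transistor count) pairs copied from the slides above
first = (1971, 2250)            # Intel 4004
latest = (2006, 376_000_000)    # Core 2 Duo

def doubling_time_years(start, end):
    """Average number of years per doubling of transistor count."""
    years = end[0] - start[0]
    doublings = math.log2(end[1] / start[1])
    return years / doublings

print(f"{doubling_time_years(first, latest):.2f} years per doubling")
```

With these two endpoints the average comes out at roughly two years per doubling.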

State of the Art Tomorrow - Evolutionary • Fabrication (<45nm) • Extreme ultraviolet lithography • Electron projection lithography • Interconnect problems INTERNATIONAL TECHNOLOGY ROADMAP FOR SEMICONDUCTORS, http://www.sia-online.org

State of the Art Tomorrow - Revolutionary • Molecular Electronics • Self-assembly • Carbon nanotubes • Issues • Nanotube transistors are only a few atoms across • More transistors means more chances for failure

Traditional Full Adder Ellenbogen, J.C., Love, J.C., Architectures for molecular electronic computers, Proceedings of the IEEE, Vol. 88, No. 3, March 2000.

Molecular Electronics Full Adder using Molecular Diodes Ellenbogen, J.C., Love, J.C., Architectures for molecular electronic computers, Proceedings of the IEEE, Vol. 88, No. 3, March 2000.

Ellenbogen, J.C., Love, J.C., Architectures for molecular electronic computers, Proceedings of the IEEE, Vol. 88, No. 3, March 2000.

Architecture Tolerance Types • Defect Tolerance • Manufacture-time defect detection and reconfiguration • Ex: controlling placement of wires, orientation of wires, and interconnects • Fault Tolerance • Operation-time fault detection, reconfiguration, recovery, etc. Shukla, Goldstein, et al, Nano, Quantum, and Molecular Computing: Are We Ready for the Validation and Test Challenges. In Eighth IEEE International High-Level Design Validation and Test Workshop, pages 3-7, November, 2003.

Defect Tolerant Architecture • An architecture which uses techniques to mitigate the effects of defects in the devices that make up the architecture, and guarantees a given level of reliability • So, what are some of these techniques? Shukla, et al, Evaluating the Reliability of Defect-Tolerant Architectures for Nanotechnology, Proceedings of the 17th International Conference on VLSI Design, 2004.

Building on Traditional Tolerance Methods • Teramac (1998) • Massively parallel experimental computer built at Hewlett-Packard Laboratories to investigate a wide range of different computational architectures • The defect-tolerant architecture of Teramac, which incorporates a high communication bandwidth that enables it to easily route around defects, has significant implications for any future nanometer-scale computational paradigm • It may be feasible to chemically synthesize individual electronic components with less than a 100 percent yield, assemble them into systems with appreciable uncertainty in their connectivity, and still create a powerful and reliable data communications network • Future nanoscale computers may consist of extremely large configuration memories that are programmed for specific tasks by a tutor that locates and tags the defects in the system Heath, J. R., et al, A Defect-Tolerant Computer Architecture: Opportunities for Nanotechnology, Science, Vol. 280, June 1998

Building on Traditional Tolerance Methods • Teramac (cont) • Consists of 65,536 LUTs connected via crossbars in a fat-tree network. • Extremely flexible architecture with few critical paths • Highly redundant connectivity • Contains about 220,000 hardware defects, any one of which could prove fatal to a conventional computer • Despite defects, operated 100 times faster than a high-end single-processor workstation for some of its configurations • Functions normally despite defects in 10% of cells and interconnects Heath, J. R., et al, A Defect-Tolerant Computer Architecture: Opportunities for Nanotechnology, Science, Vol. 280, JUNE 1998

Fault Tolerance: Teramac Overview • Successful operation due to learning defects after fabrication • Able to avoid running into defects due to extremely high connectivity via high bandwidth bus • Redundancy • Tree architecture leads to intrinsic ability to find paths to an end node Heath, J. R., et al, A Defect-Tolerant Computer Architecture: Opportunities for Nanotechnology, Science, Vol. 280, JUNE 1998
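Routing around known defects in a redundantly connected network can be sketched as a graph search that skips defective links. This is a minimal illustration, not Teramac's actual routing software: the toy topology, the node names, and the function `find_path` are all assumptions.

```python
from collections import deque

def find_path(links, defective, src, dst):
    """Breadth-first search that only uses links not marked defective."""
    good = {}
    for a, b in links:
        if (a, b) in defective or (b, a) in defective:
            continue  # skip links recorded in the defect map
        good.setdefault(a, []).append(b)
        good.setdefault(b, []).append(a)
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in good.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None  # no healthy route exists

# Toy two-level network: each leaf has two uplinks (redundant connectivity).
links = [("leafA", "sw1"), ("leafA", "sw2"), ("leafB", "sw1"), ("leafB", "sw2")]
defective = {("leafA", "sw1")}
print(find_path(links, defective, "leafA", "leafB"))  # ['leafA', 'sw2', 'leafB']
```

Because every leaf has a second uplink, a single defective link changes the route but not the reachability.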

Fault Tolerance: Teramac – Lesson #1 • Possible to build a very powerful computer that contains defective components and wiring, given sufficient communication bandwidth in the system to find and use the healthy resources • Machine is built cheaply but imperfectly, a map of the defective resources is prepared, and then the computer is configured with only the healthy resources Heath, J. R., et al, A Defect-Tolerant Computer Architecture: Opportunities for Nanotechnology, Science, Vol. 280, JUNE 1998
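The build, map, configure sequence can be sketched in a few lines. Everything here is hypothetical: the resource list, the 10-defect injection, and the helper names `build_defect_map` and `configure` are assumptions chosen for illustration, not part of the Teramac toolchain.

```python
import random

def build_defect_map(resources, is_working):
    """Test every resource once and record the ones that fail."""
    return {r for r in resources if not is_working(r)}

def configure(resources, defect_map, needed):
    """Assign logical units to healthy physical resources only."""
    healthy = [r for r in resources if r not in defect_map]
    if len(healthy) < needed:
        raise RuntimeError("not enough healthy resources")
    return {logical: physical for logical, physical in enumerate(healthy[:needed])}

rng = random.Random(0)
resources = list(range(100))                 # hypothetical physical cells
broken = set(rng.sample(resources, 10))      # assumed: 10 of 100 cells are defective
defect_map = build_defect_map(resources, lambda r: r not in broken)
mapping = configure(resources, defect_map, needed=50)
print(len(defect_map), "defects mapped;", len(mapping), "logical units placed")
```

The design places the cost in the one-time test pass; after that, configuration is a lookup against the defect map.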

Fault Tolerance: Teramac – Lesson #2 • Resources in a computer do not have to be regular, but rather they must have a sufficiently high degree of connectivity • System at the nanoscale that has some random character can still be functional if there is enough local intelligence to locate resources, either through the laws of physics or through the ability to reach down through random but fixed local connections Heath, J. R., et al, A Defect-Tolerant Computer Architecture: Opportunities for Nanotechnology, Science, Vol. 280, JUNE 1998

Fault Tolerance: Teramac – Lesson #3 • Wires are by far the most plentiful resource, and the most important are the address lines that control the settings of the configuration switches and the data lines that link the LUTs to perform the calculations • In a nanotechnology paradigm, these wires may be physical or logical, but they will be essential for the enormous amount of communication bandwidth that will be required Heath, J. R., et al, A Defect-Tolerant Computer Architecture: Opportunities for Nanotechnology, Science, Vol. 280, JUNE 1998

Fault Tolerance: Teramac – Lesson #4 • The conventional paradigm for computation is to design the computer, build it perfectly, compile the program, and then run the algorithm • Teramac paradigm is to build the computer (however imperfectly), find the defects, configure the resources with software, compile the program, and then run it • Moves what is difficult to do in hardware into a software task, which is just the continuation of a trend that has accompanied the development of electronic computers from their first appearance Heath, J. R., et al, A Defect-Tolerant Computer Architecture: Opportunities for Nanotechnology, Science, Vol. 280, JUNE 1998

Tolerance Methods in Traditional Silicon Architectures • Von Neumann Defect • Expect a 0 and see a 1 • Expect a 1 and see a 0 • Byzantine Defect • Unknown number of faulty inputs • Given full communication, if fewer than 1/3 of inputs are faulty, the correct output can still be determined Shukla, et al, Evaluating the Reliability of Defect-Tolerant Architectures for Nanotechnology, Proceedings of the 17th International Conference on VLSI Design, 2004.
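The one-third threshold for Byzantine faults is usually stated as n ≥ 3f + 1, where n is the number of participants and f the number of faulty ones. The helper names below are assumptions for illustration; the bound itself is the classical result for unauthenticated messages.

```python
def max_byzantine_faults(n):
    """Largest f such that n >= 3*f + 1."""
    return (n - 1) // 3

def can_tolerate(n, faulty):
    """True if `faulty` Byzantine inputs out of n stay within the classical bound."""
    return faulty <= max_byzantine_faults(n)

print(max_byzantine_faults(4))   # 1
print(can_tolerate(6, 2))        # False: 6 < 3*2 + 1
print(can_tolerate(7, 2))        # True
```

Exactly one third faulty (for example 2 of 6) is already too many, which is why the bound is strict.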

Traditional Methods Applied: NAND Multiplexing • Proposed by von Neumann in 1952 • Idea: if the failure probabilities of the gates are sufficiently small and failures are independent, then computations may be done with a high probability of correctness Shukla, et al, Evaluating the Reliability of Defect-Tolerant Architectures for Nanotechnology, Proceedings of the 17th International Conference on VLSI Design, 2004.

Traditional Methods Applied: NAND Multiplexing Shukla, et al, Evaluating the Reliability of Defect-Tolerant Architectures for Nanotechnology, Proceedings of the 17th International Conference on VLSI Design, 2004.

Traditional Methods Applied: NAND Multiplexing Shukla, et al, Evaluating the Reliability of Defect-Tolerant Architectures for Nanotechnology, Proceedings of the 17th International Conference on VLSI Design, 2004.
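A small Monte Carlo sketch can make the multiplexing idea concrete: each logical signal is carried on a bundle of wires, wires from the two input bundles are paired at random, and each NAND gate flips its output independently with probability `eps`. The bundle size, error rate, and function names are assumptions for illustration; this is a simplified model, not the analysis from the cited paper.

```python
import random

def nand_stage(bundle_x, bundle_y, eps, rng):
    """One multiplexed NAND: randomly pair wires, NAND each pair,
    and flip each gate output with probability eps."""
    y = bundle_y[:]
    rng.shuffle(y)  # random permutation of the second bundle
    out = []
    for a, b in zip(bundle_x, y):
        bit = 0 if (a and b) else 1
        if rng.random() < eps:
            bit ^= 1  # gate failure inverts the output
        out.append(bit)
    return out

def multiplexed_nand(x, y, n=1000, eps=0.01, seed=0):
    """Fraction of output wires carrying 1 for logical inputs x and y."""
    rng = random.Random(seed)
    out = nand_stage([x] * n, [y] * n, eps, rng)  # executive stage
    # Restorative stage: NAND the bundle with a permuted copy of itself, twice.
    # Each pass inverts the signal, so two passes restore the polarity.
    for _ in range(2):
        out = nand_stage(out, out, eps, rng)
    return sum(out) / n

print(multiplexed_nand(1, 1))  # small fraction of ones: read as logical 0
print(multiplexed_nand(0, 1))  # large fraction of ones: read as logical 1
```

The bundle is interpreted by threshold: a fraction of ones near 0 is read as logical 0 and a fraction near 1 as logical 1, so individual gate failures do not change the decoded result.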

Fault Tolerance: Modern Solutions • Pair and Spare • 2 pairs of circuits • Choose the pair that agrees • Triple Modular Redundancy • 3 circuits take majority vote
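Both schemes reduce to a few lines of voting logic. The sketch below assumes module outputs are plain comparable values and that the voter itself never fails; the function names are illustrative. The reliability formula 3R² − 2R³ is the standard TMR result under independent module failures.

```python
from collections import Counter

def tmr(outputs):
    """Triple modular redundancy: return the majority value of three module outputs."""
    if len(outputs) != 3:
        raise ValueError("TMR expects exactly three outputs")
    value, count = Counter(outputs).most_common(1)[0]
    if count < 2:
        raise RuntimeError("no majority: all three modules disagree")
    return value

def pair_and_spare(primary, spare):
    """Each argument is a pair of duplicated module outputs.
    Use the primary pair if its outputs agree, otherwise fall back to the spare pair."""
    for a, b in (primary, spare):
        if a == b:
            return a
    raise RuntimeError("both pairs disagree internally")

def tmr_reliability(r):
    """Probability that at least two of three independent modules work (perfect voter assumed)."""
    return 3 * r**2 - 2 * r**3

print(tmr([1, 0, 1]))                   # 1
print(pair_and_spare((1, 0), (1, 1)))   # 1, taken from the spare pair
```

For a module reliability of 0.9, `tmr_reliability` gives 0.972, so voting helps only when the individual modules are already better than 50% reliable.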

Fault Tolerance: Fault Protection • ACID • Atomicity • either all of the tasks of a transaction are performed or none of them is • Consistency • refers to being in a legal state when the transaction begins and when it ends. • Isolation • refers to the ability of the application to make operations in a transaction appear isolated from all other operations. • Durability • refers to the guarantee that once the user has been notified of success, the transaction will persist, and not be undone.

Fault Tolerance: Safe Failures • Fail-Safe • Should a function fail, it will not cause harm to other areas • Graceful Degradation • Operating quality decreases in proportion to the severity of the failure, rather than failing completely

Defect Tolerance: Failure • Detecting failures in transistors becomes more complex as size decreases • Rather than detect and replace failures, accept and overcome them

Defect Tolerance: Accounting for failure • Architecture that does not require a large number of working cells • Find other ways to reach cells • Find ways to avoid failed cells • Find logically equivalent circuits Will Knight, Y-shaped nanotubes are ready-made transistors, http://www.newscientist.com/article.ns?id=dn7847, 15 August 2005.

Defect Tolerance: DNA Self-Assembly • Control over nanoscale devices is exceedingly difficult • Exercising more control reduces the speed of self-assembly • Exercising less control reduces the possible size of self-assembly • Which methods of control allow the greatest speed and size?

Defect Tolerance: Controlled Parameters • Placement • All nodes are set up in a grid format • Orientation • All nodes are aligned the same direction • Interconnect • All interconnects are straight and at right angles to the node Jaidev P. Patwardhan, Chris Dwyer, and Alvin R. Lebeck, Self-Assembled Networks: Control vs. Complexity, Duke University

Defect Tolerance: Controlled Parameters Patwardhan, et al, A Defect Tolerant Self-organizing Nanoscale SIMD Architecture, Twelfth International Conference on Architectural Support for Programming Languages and Operating Systems

Defect Tolerance: Network Organization Patwardhan, et al, A Defect Tolerant Self-organizing Nanoscale SIMD Architecture, Twelfth International Conference on Architectural Support for Programming Languages and Operating Systems

Defect Tolerance: Results • Shows percent of nodes reachable for each combination of control • With infinite backoff, there can only be one receiver and one broadcaster • Infinite backoff not shown if fewer than 10% of nodes are reachable • Device reliability from 99.99% to 100% Patwardhan, et al, A Defect Tolerant Self-organizing Nanoscale SIMD Architecture, Twelfth International Conference on Architectural Support for Programming Languages and Operating Systems

Defect Tolerance: Reachable Nodes • Control of orientation and placement (N6) allows for many more reachable nodes for lower device reliability • Control of Interconnects and one other parameter (N3, N5) leads to fewer reachable nodes Patwardhan, et al, A Defect Tolerant Self-organizing Nanoscale SIMD Architecture, Twelfth International Conference on Architectural Support for Programming Languages and Operating Systems
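The notion of "percent of nodes reachable" can be illustrated with a toy simulation. This sketch assumes a fully controlled square grid in which each node works independently with a given probability, and counts the nodes reachable from a corner anchor. It is a simplified node-level model, not the device-level model or the N3/N5/N6 configurations of the cited paper; the function name and parameters are assumptions.

```python
import random
from collections import deque

def reachable_fraction(size, reliability, seed=0):
    """Fraction of grid nodes reachable from the corner anchor when each node
    works independently with probability `reliability`."""
    rng = random.Random(seed)
    alive = [[rng.random() < reliability for _ in range(size)] for _ in range(size)]
    alive[0][0] = True  # assumption: the anchor node itself works
    seen = {(0, 0)}
    queue = deque([(0, 0)])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < size and 0 <= nc < size
                    and alive[nr][nc] and (nr, nc) not in seen):
                seen.add((nr, nc))
                queue.append((nr, nc))
    return len(seen) / (size * size)

for p in (1.0, 0.9, 0.7, 0.5):
    print(p, reachable_fraction(20, p))
```

Sweeping `reliability` shows reachability falling off as more nodes fail, which is the kind of curve the slides compare across control configurations.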

Defect Tolerance: Methods of Control • Orientation and Placement controlled through DNA placement. • Control of one implies control of the other • Better placement of DNA allows for more control of both parameters • Lack of control of Interconnect matters much less than other parameters • More productive to focus on device reliability Gaia Vince, Nano-transistor self-assembles using biology, http://www.newscientist.com/article.ns?id=dn4406, 20 November 2003.

Motivation Revisited “With the continuing advances in the miniaturization of devices, we are already at the deep submicron scale of device manufacture. However, nanotechnology is emerging as the technology of the not too distant future. In the nano era, device sizes will be in the range of several nanometres, leading to a high degree of failures, due to manufacturing defects, transient faults resulting from reduced noise tolerance at low voltage and current levels, and faults due to ageing because of molecular and other kinds of techniques for creating nano-devices. Although nano-scale manufacturing will allow us to pack more devices on a chip, we have to live with the possibilities of defects in the nano-substrate. As a result, ‘defect-tolerant architecture’ is being posed as a way to mitigate the challenge of the inherent unreliability at the nano-scale. Defect-tolerance is built into the architecture in the form of redundancy of devices and functional units.” Shukla, et al, Evaluating the Reliability of Defect-Tolerant Architectures for Nanotechnology, Proceedings of the 17th International Conference on VLSI Design, 2004.

Conclusions • Evolutionary Advances • Traditional semiconductor technologies are reaching their limits • Revolutionary Advances • Mandate some form of effective defect and fault tolerance to behave within desired error limits • Currently researched methods are primarily probabilistic, with varying levels of effectiveness depending on the model • Much more research is needed in this arena, especially using fabricated devices instead of solely modeled ones

References • Debayan Bhaduri, Sandeep Shukla, NANOLAB: A Tool for Evaluating Reliability of Defect-Tolerant Nano Architectures • http://www.rpi.edu/~schubert/Educational%20resources/Educational%20resources.htm • http://www.cpu-world.com/CPUs/CPU.html • www.wikipedia.org • International Technology Roadmap for Semiconductors, http://www.sia-online.org • Ellenbogen, J.C., Love, J.C., Architectures for molecular electronic computers, Proceedings of the IEEE, Vol. 88, No. 3, March 2000. • Shukla, Goldstein, et al, Nano, Quantum, and Molecular Computing: Are We Ready for the Validation and Test Challenges. In Eighth IEEE International High-Level Design Validation and Test Workshop, pages 3-7, November, 2003. • Heath, J. R., et al, A Defect-Tolerant Computer Architecture: Opportunities for Nanotechnology, Science, Vol. 280, June 1998 • Will Knight, Y-shaped nanotubes are ready-made transistors, http://www.newscientist.com/article.ns?id=dn7847, 15 August 2005. • Jaidev P. Patwardhan, Chris Dwyer, and Alvin R. Lebeck, Self-Assembled Networks: Control vs. Complexity, Duke University • Patwardhan, et al, A Defect Tolerant Self-organizing Nanoscale SIMD Architecture, Twelfth International Conference on Architectural Support for Programming Languages and Operating Systems • Gaia Vince, Nano-transistor self-assembles using biology, http://www.newscientist.com/article.ns?id=dn4406, 20 November 2003. • Shukla, et al, Evaluating the Reliability of Defect-Tolerant Architectures for Nanotechnology, Proceedings of the 17th International Conference on VLSI Design, 2004.

Thank You

Questions?
