
Design of an Asynchronous Reconfigurable Cell for Conformal Computing

Mariam Hoseini. Advisor: Dr. Chao You. Supervisor: Dr. Mark Pavicic. Committee members: Dr. Rajendra Katti, Dr. Subbaraya Yuvarajan, Dr. Deying Li. North Dakota State University, April 2009.


Presentation Transcript


  1. Design of an Asynchronous Reconfigurable Cell for Conformal Computing
     Mariam Hoseini
     Advisor: Dr. Chao You
     Supervisor: Dr. Mark Pavicic
     Committee members: Dr. Rajendra Katti, Dr. Subbaraya Yuvarajan, Dr. Deying Li
     North Dakota State University, April 2009

  2. Agenda
     • Conformal Computing
     • Asynchronous circuit design
     • Handshake protocols
     • Data encodings
     • Signaling protocol
     • Asynchronous design methodologies
     • Asynchronous primitives
     • Constructing an array of cells
     • PCC cell design and simulations
     • Conclusion

  3. Conformal Computing (1/3)
     • Computers are typically rigid boards or boxes with a fixed computational capability.
     • Available computers may have an unsuitable size or shape, or less computing capability than is needed.
     • The program investigates a more flexible form of computer that conforms easily to the physical and computational needs of an application.
     • Potential applications:
       • Sorting, cryptography, cellular neural nets, etc.
       • The computational material can be integrated with arrays of sensors and/or actuators.

  4. Conformal Computing (2/3)
     • Potential problems:
       • Easily changing the physical shape of the computer
       • Adjusting the computational capability
       • Propagation delays, synchronization, power distribution, and heat dissipation
     • One approach is to form extensible arrays of simple reconfigurable computing elements (cells) into thin wallpaper-like sheets:
       • Long signal wires are eliminated.
       • Communications are local and synchronized with cell-to-cell pulses.
     • This research presents a cell design called a pulsed conformal computer cell (PCC cell).

  5. Conformal Computing (3/3)
     • The PCC cell has significant similarities to cellular automata (CA):
       • Simple fine-grained elements
       • Integration of processing and storage
       • Local communication
     • CA can model the elements of digital computers, using patterns of cells to perform the functions of wires, logic, and registers.
     • The same model is used in the PCC cell design.
     • The function and connections of a PCC cell are reconfigurable, similar to FPGAs:
       • FPGAs are not as fine-grained.
       • FPGAs are not as regular.
     • The PCC cell array uses only short-range wires that connect adjacent cells.

  6. Asynchronous Circuit Design
     • Two major styles of circuit design: synchronous and asynchronous
     • Advantages of asynchronous design, in terms of:
       • Clock skew
       • Speed
       • Metastability
       • Modularity
       • Power
     • Disadvantages of asynchronous design:
       • Hazard-free behavior and a correct ordering of operations are more difficult to design for.
       • Additional hardware is needed to initiate, advance, and indicate the completion of operations.
     • Asynchronous systems are specified by their handshake protocol, data encoding, and underlying delay model.

  7. Handshake Protocols
     • Handshaking is the alternative to clocking in asynchronous systems.
     • Data transfer between two processes is synchronized with signals generated by those same processes.
     • Asynchronous operation can also be done without handshaking.
     • Handshaking is used to separate successive uses of a component.
     • It may not be necessary to separate the uses of a component, or the separation can be done by delaying the operations.
     • Handshaking can be done at higher levels in an asynchronous system.

  8. Data Encodings
     • Bundled data:
       • Normal Boolean levels encode data values.
       • Separate request and acknowledge wires are used.
     • Dual rail:
       • Two wires are used to carry a single bit.
       • The request is encoded in the dual-rail data wires.
     • Dual-rail data encoding is used in the PCC cell design.
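The dual-rail idea on this slide can be sketched in a few lines of Python. This is only an illustration under an assumed convention that the slide does not state: a pulse on the first rail means '1', a pulse on the second rail means '0', and no pulse on either rail means that no data has arrived. The function names are hypothetical.

```python
# Dual-rail encoding sketch: each bit travels on two wires (rail1, rail0).
# Assumed convention (not stated on the slide): rail1 active means '1',
# rail0 active means '0', and both inactive means "no data yet".

def encode(bit):
    """Return (rail1, rail0) for a bit, or (0, 0) when there is no data."""
    if bit is None:
        return (0, 0)
    return (1, 0) if bit else (0, 1)


def data_valid(rail1, rail0):
    """The request is implicit: data is valid when exactly one rail is active."""
    return (rail1 ^ rail0) == 1


def decode(rail1, rail0):
    """Return the bit carried by the rails, or None if no data has arrived."""
    if (rail1, rail0) == (1, 1):
        raise ValueError("both rails active: illegal dual-rail codeword")
    if not data_valid(rail1, rail0):
        return None
    return 1 if rail1 else 0
```

Because validity is visible on the data wires themselves, a receiver needs no separate request wire, which is the point of the "request is encoded in the data wires" bullet.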

  9. Signaling Protocol
     • Pulse signaling:
       • Each request and acknowledge is a pulse.
       • Simple, with a short cycle, as in transition signaling
       • Deals with levels, as in level signaling
       • Better noise immunity than single-track signaling
       • Potential problem: robustness of sending pulses over long wires
     • Pulse signaling is used in the PCC cell design, which has no long wires.
     [Figure: one handshake cycle; labels: start event, Request, event done, Acknowledge]

  10. Asynchronous Design Methodologies (1/2)
     • Bounded delay
       • Simplest model
       • Delays of circuit elements and wires are assumed to be known or bounded.
     • Delay insensitive (DI)
       • Both gates and wires have unbounded and unknown delays.
       • A completion detection mechanism is needed at the receiver.
     • Quasi delay insensitive (QDI)
       • DI + isochronic forks = QDI
       • Isochronic forks are capable of indication.
       • All input transitions should be indicated by an output signal transition.
     [Figure labels: A, B, C; d1, d2, d3]

  11. Asynchronous Design Methodologies (2/2)
     • In an asynchronous system, module interfaces and module internals can be designed with different timing models.
     • In the PCC cell design, for timing management:
       • The internals of a cell are governed by a bounded delay model.
       • Communication between cells follows a QDI model.

  12. Asynchronous Primitives (1/2)
     • In synchronous systems, Boolean circuits can be constructed from a primitive such as a NAND gate.
     • Logic gates provide only logic functionality, not timing functionality, so they are not sufficient to make asynchronous circuits.
     • Asynchronous systems can be made from a set of primitives.
     • The set of primitives must provide both universal logic and timing functionality.
     • Different sets of primitives have been introduced, such as Keller's, Patra's, and Lee's.

  13. Asynchronous Primitives (2/2)
     The set of primitives used in a PCC cell:
     • Wire
       • Transfers the output of one component to the input of another.
     • Fork
       • The output of one component is the input to several components.
     • Merge
       • Sends one of its inputs to the output.
     • Join
       • Synchronizes data from several independent components.
     [Figure: symbols for the four primitives; port labels: I, I1, I2, O, O1, O2]
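The behavior of Fork, Merge, and Join can be illustrated with a small event-level model. This is a minimal sketch, not the circuit from the thesis: pulses are modeled as plain Python values, `None` stands for "no pulse", and the class and function names are hypothetical.

```python
# Event-level sketch of three primitives. A pulse is any non-None value.

def fork(pulse, n_outputs):
    """Fork: copy one input pulse to several outputs."""
    return [pulse] * n_outputs


def merge(*inputs):
    """Merge: forward whichever input carries a pulse.

    Assumes at most one input is active at a time.
    """
    active = [p for p in inputs if p is not None]
    if len(active) > 1:
        raise ValueError("merge expects at most one active input")
    return active[0] if active else None


class Join:
    """Join: fire only after a pulse has arrived on every input."""

    def __init__(self, n_inputs):
        self.n_inputs = n_inputs
        self.arrived = set()

    def pulse(self, index):
        """Record a pulse on input `index`; return True when the join fires."""
        if not 0 <= index < self.n_inputs:
            raise IndexError("no such input")
        self.arrived.add(index)
        if len(self.arrived) == self.n_inputs:
            self.arrived.clear()  # ready for the next round
            return True
        return False
```

For example, a two-input `Join` returns `False` on the first pulse and `True` on the second, regardless of which input arrives first.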

  14. Constructing an Array of Cells (1/2)
     • An array of cells, each having a simple one-bit processing unit
     • Von Neumann neighborhood for local connections
     • A routing problem occurs:
     • A possible solution:

  15. Constructing an Array of Cells (2/2)
     • Another approach is to combine every two cells into a double cell.
       • The same routing capability with fewer neighboring connections
     • A further step is to group four cells together into a quad cell.
       • The same routing capability with simple connections to the four nearest neighbors

  16. PCC Cell Design
     • Logic Unit Design
     • Synchronization
     • Pulse Regenerator
     • Top Level Design
     • Configuration Circuitry
     • PCC Cell Simulations
       • One-bit full adder
       • Ring oscillator
       • Shift register
     • Implementing Pipelines

  17. Logic Unit (1/3)
     • There is a logic unit (LU) and an output register in each quarter.
     • Each LU has two inputs and one output.

  18. Logic Unit (2/3)
     • Dual-rail inputs
     • Dual-rail outputs
     • Switches must be set before the inputs arrive.
     • 8 switches define a function.
     • 16 functions
     • Pull-down resistors avoid floating nodes.

  19. Logic Unit (3/3)
     • AND function: D, E, F, G are “0001”
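The "16 functions" and the “0001” setting for AND are consistent with treating the LU as a four-entry lookup table over its two inputs. The sketch below assumes, without confirmation from the slides, that D, E, F, G are the outputs for the input pairs 00, 01, 10, 11 in that order; the helper name `make_lu` is hypothetical.

```python
# Lookup-table sketch of a two-input logic unit (LU).
# Assumption: the configuration string lists the output for inputs
# (A, B) = (0, 0), (0, 1), (1, 0), (1, 1), in that order.

def make_lu(defg):
    """Build a two-input function from a 4-character string such as '0001'."""
    if len(defg) != 4 or set(defg) - {"0", "1"}:
        raise ValueError("configuration must be four characters of 0 or 1")
    table = [int(c) for c in defg]

    def lu(a, b):
        return table[2 * a + b]

    return lu


and_lu = make_lu("0001")   # the AND setting quoted on the slide
xor_lu = make_lu("0110")   # under the same assumed ordering

# Four configuration bits give 2**4 = 16 distinct two-input functions.
all_configs = [format(i, "04b") for i in range(16)]
```

Under this ordering, `and_lu(1, 1)` is 1 and every other input pair gives 0.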

  20. Primitives (1/2)
     • Wire: one output pulse triggers the LU inputs of the neighbor cell in the same direction.
     • Merge: realized by 2:1 muxes; pulses make right (90-degree) turns.
     • Fork: each turn triggers a neighbor quarter and also a neighbor cell, so a single computation forks into multiple parallel computations.

  21. Primitives (2/2)
     • Join
       • Completion detection circuitry
       • All the participating quarters must have their LU outputs ready.
       • Complements a fork by combining multiple parallel computations into a single computation.
       • QDI communications

  22. Timing Models of Internal Forks
     • Fork1
       • Only when a pulse turns
       • The LU should use only the turned pulse.
     • Fork2 & Fork4
       • No timing assumptions
     • Fork3 & Fork5
       • Bounded delay model

  23. Pulse Regenerator (PRG)
     • When a pulse travels through many cells, the width of the pulse may increase or decrease.
     • A pulse that is too short may not be detectable at all; a pulse that is too long may run into other pulses.
     • A PRG produces an output pulse with a constant width, independent of the width of the input pulse.
     • D1 is the delay by which the input pulse is stretched.
     • D2 determines the width of the output pulse.
     [Figure: PRG circuit; labels: D1, D2, A, B, C, D, E]

  24. Top Level Design (1/2)

  25. Top Level Design (2/2)
     • In a PCC cell: (W/L)p / (W/L)n ≈ 1.6
     • In an inverter:
       • Equivalent resistance of a MOS transistor: R ∝ L/W
       • To match PMOS and NMOS resistances: (W/L)p / (W/L)n = 3 to 3.5
       • tpHL = 0.69 · Rn · CL and tpLH = 0.69 · Rp · CL; if Rn = Rp, then tpHL = tpLH
       • A bigger PMOS improves tpLH by increasing the charging current.
       • A bigger PMOS degrades tpHL by adding parasitic capacitance.
       • With matched resistances, tp = (tpHL + tpLH)/2 is not minimal.
       • The ratio for optimal speed equals √(Rp/Rn), with Rp and Rn taken for equal-sized devices.
       • The device can be sped up by reducing the size of the PMOS.
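The √(Rp/Rn) rule can be checked numerically with a first-order textbook delay model. The model below is an assumption, not taken from the slides: widening the PMOS by a factor β scales the load capacitance by (1 + β) and the average drive resistance by (1 + r/β), where r is the PMOS-to-NMOS resistance ratio at equal size and wiring capacitance is neglected. The function names are hypothetical.

```python
import math

# First-order inverter delay sketch (assumed model, wiring capacitance ignored).
# beta = (W/L)p / (W/L)n, r = Rp/Rn for equal-sized devices.

def relative_tp(beta, r):
    """Average propagation delay up to a constant factor."""
    load = 1 + beta            # gate and drain capacitance grow with the PMOS
    resistance = 1 + r / beta  # proportional to Rn + Rp, with Rp = r * Rn / beta
    return load * resistance


def best_beta(r, lo=0.5, hi=5.0, steps=4501):
    """Brute-force search for the beta that minimizes relative_tp."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return min(grid, key=lambda beta: relative_tp(beta, r))


r = 3.0
beta_opt = best_beta(r)             # close to sqrt(3)
matched = relative_tp(r, r)         # beta = r matches the two resistances
fastest = relative_tp(beta_opt, r)  # smaller than the matched case
```

In this model the minimum sits at β = √r, so for r between 3 and 3.5 the fastest inverter has a ratio of roughly 1.7 to 1.9, well below the matched-resistance ratio and of the same order as the ≈ 1.6 quoted for the PCC cell.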

  26. Configuration (1/3)
     • Configuration bits must be loaded: 16 bits for LU switches, 8 bits for Merge muxes, and 4 bits for Join, for a total of 28 bits.
     • Only some parts of the array may need to be configured.
     • One solution is to make one long chain of shift registers out of all the cells and configure all of them.
     • A better solution is to form the chain of shift registers only from the cells that need to be configured.
     • In each cell, a controller:
       • decides whether the cell is to be configured
       • directs the bit flow to one of the cell's neighbors
       • stops the shift registers once all the intended cells are configured
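Loading the 28 configuration bits through a shift register can be sketched as follows. The field widths (16, 8, 4) come from the slide; the field order, the shift direction, and all names are assumptions made for illustration only.

```python
# Sketch of serially loading one cell's 28 configuration bits.
# Assumed layout: LU switch bits first, then Merge mux bits, then Join bits.

LU_BITS, MUX_BITS, JOIN_BITS = 16, 8, 4
CONFIG_BITS = LU_BITS + MUX_BITS + JOIN_BITS  # 28


def pack_config(lu_bits, mux_bits, join_bits):
    """Concatenate the three fields after checking widths and values."""
    fields = [(lu_bits, LU_BITS), (mux_bits, MUX_BITS), (join_bits, JOIN_BITS)]
    for bits, width in fields:
        if len(bits) != width or any(b not in (0, 1) for b in bits):
            raise ValueError("each field must hold exactly its width in 0/1 bits")
    return list(lu_bits) + list(mux_bits) + list(join_bits)


def shift_in(register, bit):
    """Shift one bit into position 0; return the bit pushed out of the end."""
    register.insert(0, bit)
    return register.pop()


def load_cell(stream):
    """Clock a bit stream into an initially cleared 28-bit register."""
    register = [0] * CONFIG_BITS
    for bit in stream:
        shift_in(register, bit)
    return register
```

With this shift direction the first bit sent ends up at the far end of the register, so a configuration is sent last bit first: `load_cell(reversed(cfg))` reproduces `cfg`. The bit returned by `shift_in` is what a chained neighbor cell would receive.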

  27. Configuration (2/3)
     [Figure: configuration controller of a cell; signals: clk-N, clk-E, clk-S, clk-W, data-N, data-E, data-S, data-W; decoders with select codes 11, 10, 01, 00; OR gates]
     Controller signals annotated in the figure:
     • Shows that the shift register is filled
     • Shows that the cell is the last one in the chain of shift registers
     • Determines whether the cell should be configured
     • Defines the neighbor to which the bits should be forwarded

  28. Configuration (3/3)

  29. PCC Cell Simulations (1/3)
     • The PCC cell was implemented in TSMC 250 nm CMOS using S-Edit.
     • The simulation was done with PSpice.
     • The supply voltage is 5 V.
     • Input pulse widths are 400 ps.
     • Propagation delay through a cell is 480 ps to 500 ps.
     • Better speed: slope ≤ gate propagation delay
     • Slopes of the external inputs are 12 ps.
     • No overshoots or undershoots

  30. PCC Cell Simulations (2/3)
     • Voltage source = 5 V
     • Average current = 6 mA for 1.4 ns and 17 mA for 8.6 ns
     • For 20 pulses: Energy = (5 V × 6 mA × 1.4 ns) + (5 V × 17 mA × 8.6 ns) = 773 pJ

  31. PCC Cell Simulations (3/3)
     For 1 pulse (1 bit of operation):
     • Voltage source = 5 V, average current = 5 mA: Energy = 5 V × 5 mA × 1.5 ns = 37.5 pJ
     • Voltage source = 3.3 V, average current = 3 mA: Energy = 3.3 V × 3 mA × 1.8 ns = 17.8 pJ
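The energy figures on the two simulation slides follow from E = V · I · t. The short check below reproduces them; the helper name is hypothetical, and the only fact used beyond the slides is the unit identity V × mA × ns = pJ.

```python
# Energy check for the simulation slides: E = V * I * t.
# With volts, milliamps and nanoseconds the product is in picojoules.

def energy_pj(volts, milliamps, nanoseconds):
    """Energy in pJ for a constant average current over a time window."""
    return volts * milliamps * nanoseconds


twenty_pulses = energy_pj(5, 6, 1.4) + energy_pj(5, 17, 8.6)  # about 773 pJ
one_pulse_5v = energy_pj(5, 5, 1.5)                           # 37.5 pJ
one_pulse_3v3 = energy_pj(3.3, 3, 1.8)                        # about 17.8 pJ
per_pulse_from_twenty = twenty_pulses / 20                    # about 38.7 pJ
```

The 20-pulse figure works out to about 38.7 pJ per pulse, close to the 37.5 pJ measured for a single pulse at 5 V.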

  32. One-Bit Adder
     • Sum = A ⊕ B ⊕ C → 1 ⊕ 1 ⊕ 1 = 1
     • Carry = AB + BC + AC = AB + (A + B)C → 1·1 + (1 + 1)·1 = 1
     • Sum and carry outputs are ready after 0.5 ns and 1.8 ns, respectively.
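The adder equations can be verified exhaustively over all eight input combinations. This is a plain Boolean check in Python, not a model of the PCC cell array; the function names are hypothetical.

```python
from itertools import product

# Truth-table check of the one-bit full adder equations.

def full_adder(a, b, c):
    """Return (sum, carry) using Sum = A xor B xor C, Carry = AB + BC + AC."""
    total = a ^ b ^ c
    carry = (a & b) | (b & c) | (a & c)
    return total, carry


def carry_factored(a, b, c):
    """Factored carry form from the slide: AB + (A + B)C."""
    return (a & b) | ((a | b) & c)


# Every row of the truth table, as ((a, b, c), (sum, carry)) pairs.
rows = [((a, b, c), full_adder(a, b, c)) for a, b, c in product((0, 1), repeat=3)]
```

For the slide's example, `full_adder(1, 1, 1)` gives a sum of 1 and a carry of 1, and the factored carry agrees with the three-term form on every row.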

  33. Ring Oscillator (1/3)
     • Loops are important for many circuits, such as sequential circuits and iterative computations, and for For, If, and While constructs.
     • The ring oscillator demonstrates two capabilities of the PCC cell:
       • A loop can be controlled externally (started and stopped).
       • Using a Join of pulses, communications can be QDI.
     [Figure: ring oscillator loop with a ‘0’ start pulse; the output is always a ‘1’]

  34. Ring Oscillator (2/3)
     • Ring oscillator implemented in an array of PCC cells
     • ‘0’ pulses are shown in blue, ‘1’ pulses are shown in red.
     • The input mux is configured to receive a ‘0’ pulse only from outside the first cell and a ‘1’ pulse only from a turn.
     [Figure: array of PCC cells; cell labels: One, Pass, XOR, WR, Nand]

  35. Ring Oscillator (3/3) Simulation Results:

  36. Shift Register
     An input bit stream of “1010” is used.

  37. Pipeline (1/2)
     • If handshaking is done for every component, the components can form a pipeline.
     • Each component should supply an Ack to indicate that it is available for re-use.
     • Delay(1) = 3X + (n - 2)·5X + 3X = (5n - 4)X
     [Figure: pipeline of LUs in which every LU returns an Ack; label: “Ack is received”]

  38. Pipeline (2/2)
     • Some cells do not handshake and are cascaded. The cascaded cells form one unit of a pipeline, so handshaking is done only at a higher level.
     • Delay(2) = 3X + (n - 2)·2X + 3X = (2n + 2)X
     • Delay(2)/Delay(1) = (2n + 2)X / (5n - 4)X ≈ 2/5 for large n
     [Figure: pipeline of LUs grouped into units, with an Ack returned per unit; labels: “Ack is received”, “A unit of the pipeline”]
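The two delay formulas and their ratio can be tabulated directly. The sketch below takes the formulas from the slides as given; n is taken here to be the number of components in the pipeline, which is an interpretation, and X is the unit delay set to 1. The function names are hypothetical.

```python
from fractions import Fraction

# Pipeline delay sketch, in units of X.

def delay_full_handshake(n, x=1):
    """Delay(1): every component handshakes. 3X + (n-2)*5X + 3X = (5n-4)X."""
    return 3 * x + (n - 2) * 5 * x + 3 * x


def delay_cascaded(n, x=1):
    """Delay(2): cascaded cells, handshake per unit. 3X + (n-2)*2X + 3X = (2n+2)X."""
    return 3 * x + (n - 2) * 2 * x + 3 * x


def ratio(n):
    """Exact Delay(2)/Delay(1); tends to 2/5 as n grows."""
    return Fraction(delay_cascaded(n), delay_full_handshake(n))


for n in (4, 10, 100, 1000):
    print(n, delay_full_handshake(n), delay_cascaded(n), float(ratio(n)))
```

The ratio is larger than 2/5 for short pipelines (22/46 at n = 10) and only approaches 2/5 as n becomes large.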

  39. Conclusion (1/2)
     Performance:
     • Speed: very good
     • Energy: good
     • Area: average

  40. Conclusion (2/2)
     • Contribution:
       • Using asynchrony, reconfigurability, and the properties of CA to make an extensible array with more regular and finer-grained cells than those of FPGAs.
     • Future work:
       • Improving the performance of the cell in terms of area and thermal management

  41. Acknowledgment
     • I express my deepest gratitude to my supervisors, Dr. Mark Pavicic and Dr. Chao You.
     • Gratitude is also due to my graduate committee: Dr. Rajendra Katti, Dr. Subbaraya Yuvarajan, and Dr. Deying Li.
     • I express my love and gratitude to my beloved spouse, Hamed.

  42. Q & A
