
Authors: Steven M. Nowick, Kenneth Y. Yun, Peter A. Beerel and Ayoob E. Dooply

Speculative Completion for the Design of High-Performance Asynchronous Dynamic Adders. Reader: Pushpinder Kaur Chouhan.


Presentation Transcript


  1. Speculative Completion for the Design of High-Performance Asynchronous Dynamic Adders • Authors: Steven M. Nowick, Kenneth Y. Yun, Peter A. Beerel and Ayoob E. Dooply • Reader: Pushpinder Kaur Chouhan

  2. Speculative Completion for the Design of High-Performance Asynchronous Dynamic Adders • Introduction • Basic Concepts • Architecture of Speculative Completion • Speculative Adder Design • Basic Dynamic Brent-Kung Adders • Conclusion • References

  3. Speculative Completion for the Design of High-Performance Asynchronous Dynamic Adders • Introduction • Goal of the article • Motivation • Basic Concept • Counters Classification • Architecture of Speculative Completion • Speculative Adder Design • Conclusion • References

  4. Introduction • Goal of the article – To design high-performance asynchronous datapath components that are faster than synchronous designs yet have low area overhead.

  5. Motivation • Potential advantages of asynchronous design: • Low power consumption - components use power only “on demand” • High performance - systems not limited to “worst-case” clock rate • Robustness & Scalability - no global timing • Ease of design – global clock distribution and synchronization can be avoided • Use of speculative completion to design the asynchronous datapath components for early results.

  6. Speculative Completion for the Design of High-Performance Asynchronous Dynamic Adders • Introduction • Basic Concept • Bundled datapath • Completion detection • Adders Basic • Binary lookahead carry adder design • Architecture of Speculative Completion • Speculative Adder Design • Basic Dynamic Brent-Kung Adders • Conclusion

  7. Basic Concepts • Bundled datapath – A function block (combinational logic, C/L) runs alongside a worst-case matched delay on the request/acknowledge (req/ack) path. Advantages: easy implementation, low power, limited area. • Completion detection – Implemented in dual-rail, where each bit is mapped to a pair of wires that encode both the value and the validity of the data.

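  8. Basic Concepts • Adders basic • 1-bit full adder: $S_i = (A_i \oplus B_i) \oplus C_i$ and $C_{i+1} = A_i B_i + (A_i \oplus B_i) C_i$ • In terms of the generate (g), propagate (p) and absorb (a) signals: $g_i = A_i B_i$, $p_i = A_i \oplus B_i$, $a_i = \overline{A_i}\,\overline{B_i} = \overline{A_i + B_i}$ • $S_i = p_i \oplus C_i$ and $C_{i+1} = g_i + p_i C_i$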
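The generate/propagate form of the full adder can be checked exhaustively. The following is a minimal sketch, not taken from the slides; the function names `full_adder` and `check_all` are hypothetical.

```python
from itertools import product


def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """Return (sum, carry_out) of a 1-bit full adder in generate/propagate form."""
    g = a & b              # generate:  g_i = A_i B_i
    p = a ^ b              # propagate: p_i = A_i xor B_i
    s = p ^ c              # S_i = p_i xor C_i
    c_out = g | (p & c)    # C_{i+1} = g_i + p_i C_i
    return s, c_out


def check_all() -> bool:
    """Compare the g/p form with integer addition for all 8 input combinations."""
    for a, b, c in product((0, 1), repeat=3):
        s, c_out = full_adder(a, b, c)
        if 2 * c_out + s != a + b + c:
            return False
        # absorb signal: NOT A AND NOT B equals NOT (A OR B) (De Morgan)
        if ((1 - a) & (1 - b)) != 1 - (a | b):
            return False
    return True
```

Calling `check_all()` walks the whole truth table, so it also confirms the two equivalent forms of the absorb signal.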

  9. Binary Lookahead Carry Adder

  10. Binary Lookahead Carry Adder • The adder computes cumulative P and G values • Level-1 computes all 2-bit P and G values, where $P_i^1 = p_i p_{i-1}$ and $G_i^1 = g_i + p_i g_{i-1}$ • Level-2 computes all 4-bit P and G values, where $P_i^2 = P_i^1 P_{i-2}^1$ and $G_i^2 = G_i^1 + P_i^1 G_{i-2}^1$ • and so on • Level-6 computes the ith sum bit $S_i$, where $S_i = p_i \oplus G_{i-1}^5$
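The level-by-level scheme above can be modelled in software. This is a behavioural sketch only, assuming a carry-in of 0, a span that doubles at every level, and a result truncated to `width` bits; `blc_add` is a hypothetical name and the code says nothing about the circuit-level design.

```python
def blc_add(a: int, b: int, width: int = 32) -> int:
    """Add two unsigned integers with a binary lookahead carry (prefix) scheme.

    Assumes carry-in = 0; the result is truncated to `width` bits.
    """
    # Level 0: per-bit propagate and generate signals.
    p0 = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]
    g0 = [((a >> i) & (b >> i)) & 1 for i in range(width)]

    P, G = p0[:], g0[:]
    span = 1  # level n combines bit i with bit i - 2^(n-1)
    while span < width:
        new_p, new_g = P[:], G[:]
        for i in range(span, width):
            new_p[i] = P[i] & P[i - span]
            new_g[i] = G[i] | (P[i] & G[i - span])
        P, G = new_p, new_g
        span *= 2

    # Final level: S_i = p_i xor (carry into bit i), where the carry is G_{i-1}.
    result = 0
    for i in range(width):
        carry_in = G[i - 1] if i > 0 else 0
        result |= (p0[i] ^ carry_in) << i
    return result
```

For `width = 32` the loop runs five times (spans 1, 2, 4, 8, 16), which matches Levels 1 to 5 on the slide, with the sum computed in a sixth step.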

  11. Speculative Completion for the Design of High-Performance Asynchronous Dynamic Adders • Introduction • Basic Concept • Architecture of Speculative Completion • Multiple model delays • Abort detection networks • Modified result logic • Speculative Adder Design • Basic Dynamic Brent-Kung Adders • Conclusion

  12. Architecture of Speculative Completion [Block diagram: req feeds a worst-case, a medium and a short matched delay in parallel with the function block (C/L); two abort logic blocks produce Abort 1 and Abort 2; the selected delay output becomes done.]

  13. Architecture of Speculative Completion • Multiple model delays: one for the worst case and the remaining ones for speculative completion. The speculative delays allow different speeds of early completion. For example, in a ripple-carry adder an “average-case” delay might be used if the adder inputs result in short carry chains, and a “best-case” delay might be used if there is no carry chain.

  14. Architecture of Speculative Completion • Abort detection network: one is associated with each speculative delay. The network determines whether the corresponding speculative completion must be aborted because of worst-case data. Abort detection is computed in parallel with the datapath computation.

  15. Speculative Completion • Modified result logic: With speculative completion, early completion is allowed when results can be produced early. Modified result logic is required to take advantage of the early arrival of the inputs to the result logic. For example, in adder designs the carry may be produced earlier, so the sum logic needs to be modified.
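The interplay of the model delays and abort signals described on the architecture slides can be sketched as a small timing model. It is illustrative only: `completion_time` is a hypothetical name, and the delay values in the usage note are made up rather than taken from the presentation.

```python
def completion_time(delays: list[float], aborts: list[bool]) -> float:
    """Return the delay after which `done` is signalled.

    `delays` is ordered fastest to slowest; the last entry is the worst-case
    matched delay, which is never aborted. `aborts[k]` is True when the
    abort detection network for speculative delay k fires.
    """
    if len(aborts) != len(delays) - 1:
        raise ValueError("need one abort flag per speculative delay")
    for delay, aborted in zip(delays, aborts):
        if not aborted:
            return delay  # fastest speculative path that was not aborted
    return delays[-1]     # every speculative path aborted: worst case
```

With hypothetical short, medium and worst-case delays of 2, 4 and 7 units, `completion_time([2, 4, 7], [True, False])` returns 4: the short path is aborted, so the medium path signals completion.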

  16. Speculative Completion for the Design of High-Performance Asynchronous Dynamic Adders • Introduction • Basic Concept • Architecture of Speculative Completion • Speculative Adder Design • Multiple model delays • Abort detection networks • Modified result logic • Basic Dynamic Brent-Kung Adders • Conclusion

  17. Speculative Adder Design [Block diagram: 32-bit inputs A and B feed the adder, which produces the 32-bit SUM; an abort detection network produces Abort for the completion network (matched delays), which sits between req and done.]

  18. Speculative Adder Design • Completion Network – • Each inverter roughly corresponds to the delay of one level in the BLC (binary lookahead carry) adder. • The worst-case delay path has 7 gate delays. • The speculative delay path has only 5 gate delays. • For early completion, the final generate values are already available in Level-3. • The speculative path is disabled by an abort signal.

  19. Speculative Adder Design • Abort Detection Network – • Conditions for late completion – late completion can only occur if there exists a run of 8 consecutive Level-0 propagate signals. At the nth level, a generate function of the ith stage is computed as: • Detecting late completion • Simple detection network
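Unrolling the level recurrences from the Binary Lookahead Carry Adder slide gives one plausible form of this generate function. This is a reconstruction, assuming every level applies the same recurrence with a doubling span; it is not a copy of the authors' formula.

```latex
\begin{aligned}
G_i^n &= G_i^{n-1} + P_i^{n-1}\, G_{i-2^{n-1}}^{n-1},
\qquad
P_i^n = P_i^{n-1}\, P_{i-2^{n-1}}^{n-1} \\[4pt]
G_i^n &= g_i + p_i g_{i-1} + p_i p_{i-1} g_{i-2} + \dots
        + p_i p_{i-1} \cdots p_{i-2^n+2}\; g_{i-2^n+1} \\[4pt]
P_i^3 &= p_i p_{i-1} \cdots p_{i-7}
\end{aligned}
```

Under this form, $G_i^4 = G_i^3 + P_i^3 G_{i-8}^3$ and $G_i^5 = G_i^4 + P_i^4 G_{i-16}^4$ with $P_i^4 = P_i^3 P_{i-8}^3$. If no run of 8 consecutive Level-0 propagate signals exists, every $P_i^3$ is 0, so $G_i^5 = G_i^4 = G_i^3$ and the Level-3 generate values are already final.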

  20. Speculative Adder Design • Abort Detection Network – • Conditions for late completion • Detecting late completion • Simple detection network

  21. Speculative Adder Design • Abort Detection Network – • Conditions for late completion • Detecting late completion • Simple detection network: A simple sum-of-products detection network can be used, where each product contains a short run of Level-0 propagate signals. For example, with 4-literal products, each product contains a run of 4 propagate signals in Level-0, and the network contains 5 products. If any of these runs occurs, its product is 1 and the abort signal is raised. The sum-of-products equation: $p_4 p_5 p_6 p_7 + p_9 p_{10} p_{11} p_{12} + p_{14} p_{15} p_{16} p_{17} + p_{19} p_{20} p_{21} p_{22} + p_{24} p_{25} p_{26} p_{27}$
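The 5-product network above is conservative: it should fire for every run of 8 consecutive propagate signals, and it may also fire when no such run exists. The sketch below checks the first property, assuming a 32-bit adder with propagate bits indexed $p_0$ to $p_{31}$. The names `abort`, `has_run_of_8` and `abort_is_safe` are hypothetical.

```python
PRODUCT_STARTS = (4, 9, 14, 19, 24)  # first literal of each product on the slide
RUN = 4                              # literals per product


def abort(p: int) -> bool:
    """Evaluate the sum-of-products network; bit i of `p` is propagate p_i."""
    window = (1 << RUN) - 1
    return any(((p >> s) & window) == window for s in PRODUCT_STARTS)


def has_run_of_8(p: int, width: int = 32) -> bool:
    """True if `p` contains 8 consecutive propagate bits set to 1."""
    return any(((p >> s) & 0xFF) == 0xFF for s in range(width - 7))


def abort_is_safe(width: int = 32) -> bool:
    """Check that abort fires for every possible position of an 8-run.

    Testing the bare 8-run patterns is enough: setting more propagate
    bits can only turn more products on, never off.
    """
    return all(abort(0xFF << s) for s in range(width - 7))
```

The products start every 5 bits, so any window of 8 consecutive positions fully contains at least one 4-bit product. The cost is false aborts: `abort(0xF0)` is true although `0xF0` has only a run of 4, which costs speed but not correctness.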

  22. Speculative Adder Design • Abort Detection Network – • Conditions for late completion • Detecting late completion • Simple detection network

  23. Speculative Adder Design • Modified Sum Generation –

  24. Speculative Completion for the Design of High-Performance Asynchronous Dynamic Adders • Introduction • Basic Concept • Architecture of Speculative Completion • Speculative Adder Design • Basic Dynamic Brent-Kung Adders • Completion network • Abort detection networks • Modified sum generation • Conclusion • References

  25. Basic Dynamic Brent-Kung Adders • Basic Dynamic P/G Cell – $P_i^n = P_i^{n-1} P_j^{n-1}$ and $G_i^n = G_i^{n-1} + P_i^{n-1} G_j^{n-1}$; $S_i = p_i \oplus G_{i-1}^N$

  26. Basic Dynamic Brent-Kung Adders • Completion Network

  27. Basic Dynamic Brent-Kung Adders • Abort Detection Network

  28. Basic Dynamic Brent-Kung Adders • Modified Sum Generation (a) 2-speed adder, (b) 3-speed adder

  29. Basic Dynamic Brent-Kung Adders • Modified Sum Generation

  30. Speculative Completion for the Design of High-Performance Asynchronous Dynamic Adders • Introduction • Basic Concept • Architecture of Speculative Completion • Speculative Adder Design • Basic Dynamic Brent-Kung Adders • Conclusion • References

  31. Conclusion • With speculative completion, early completion is allowed when results can be produced early. • An asynchronous adder is selected because of the potential advantages of asynchronous design. • The dynamic Brent-Kung adder is better because: • With dynamic logic, all nodes are reset during the precharge phase, so the values of internal nodes are known, whereas in a static CMOS implementation internal nodes are never reset, so their state is generally unknown. • No late-enable signal needs to be distributed in dynamic logic, whereas in the static CMOS implementation late-enable signals had to be distributed to the different sum modules.

  32. Conclusion • Advantages: • Little area overhead (less than 5%) • Performance increase for average-case data (up to 29% in 64-bit and 19% in 32-bit BK adders for random input data) • Disadvantages: • Probabilistic approach, hence the performance gain depends on the distribution of input data.

  33. Speculative Completion for the Design of High-Performance Asynchronous Dynamic Adders • Introduction • Basic Concept • Architecture of Speculative Completion • Speculative Adder Design • Basic Dynamic Brent-Kung Adders • Conclusion • References

  34. References • Design of low-latency asynchronous adder using speculative completion by S.M.Nowick • High-performance adders with speculative completion by Ayoob E. Dooply

  35. Questions ?

  36. Dual Rail Monotonic Encoding • Def. Glitch: Nonfinal transition • Def. Hazard: Potential for glitch • Encode every signal, X, with two wires, XH and XL: • XH=0, XL=0: data not ready • XH=0, XL=1: logic “0” • XH=1, XL=0: logic “1” • XH=1, XL=1: not used
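The encoding table above maps directly onto a small encoder and decoder, together with a word-level completion check. This is a minimal sketch; the names `encode`, `decode` and `word_complete` are hypothetical, and `None` stands in for "data not ready".

```python
from typing import Optional


def encode(bit: Optional[int]) -> tuple[int, int]:
    """Return the (XH, XL) wire pair for a logic value; None = data not ready."""
    if bit is None:
        return (0, 0)
    if bit == 1:
        return (1, 0)
    if bit == 0:
        return (0, 1)
    raise ValueError("bit must be 0, 1 or None")


def decode(xh: int, xl: int) -> Optional[int]:
    """Return 0 or 1 for valid data, None while the data is not ready."""
    if (xh, xl) == (1, 1):
        raise ValueError("XH=1, XL=1 is not a valid code")
    if (xh, xl) == (0, 0):
        return None
    return 1 if xh else 0


def word_complete(rails: list[tuple[int, int]]) -> bool:
    """Completion detection: every bit must have exactly one rail high."""
    return all(xh ^ xl for xh, xl in rails)
```

For example, `word_complete([encode(1), encode(0)])` is true, while `word_complete([encode(1), encode(None)])` is false because the second bit has not yet arrived.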

  37. Static: • At every point in time (except during the switching transient), each gate output is connected to either VDD or VSS via a low-resistance path. • Slower and more complex than dynamic but "safer". • Dynamic: • Relies on the temporary storage of signal values on the capacitance of high-impedance circuit nodes. • Simpler in design and faster than static, but more complicated in operation and sensitive to noise.

  38. Fan-in • The number of standard loads drawn by an input to ensure reliable operation. Most inputs have a fan-in of 1. • Fan-out • The number of standard loads that can be reliably driven by an output, without causing the output voltage to shift out of its legal range of values.

  39. Benefit: Low Power • No clock or PLL to start/stop • Faster (instantaneous!) recovery from idling • Easier to idle for short periods • Clock itself is a high-power node • Only draw power when doing work • No need to explicitly enable/disable units • Automatic fine granularity of power saving

  40. Asynchronous Design Several Potential Advantages: • Lower Power • no clock ==> components use power only “on demand” • Robustness, Scalability • no global timing==>“mix-and-match” varied components • Higher Performance • systems not limited to “worst-case” clock rate

  41. Should we use Asynch? • Benefits • Early completion, better EM, low power, environmental adaptability • No global clock to distribute! • Drawbacks • Design challenges • Testing and tools

  42. Asynchronous circuits are advantageous in: • Low-power applications, by: automatic turn-off of idle parts if synchronization is done by handshaking, only where needed; adaptive scaling of supply voltage, as the performance of speed-independent circuits does not depend on component speeds and scales continuously over a wide range of power supply voltages. • Improved EMI characteristics, including: reduced noise through the absence of clock harmonics; reduced switching activity; accommodation of delays due to electromagnetic noise if communication is done delay-insensitively. If the average signal transition time is $T$ for a voltage swing of $V$, then an induced electromotive force of $\Delta V$ will cause a signal delay of $T\,\Delta V / V$. • High-speed applications: for circuits with completion detection, the speed of the system is determined by the average-case rather than the worst-case speeds of the components. • Applications in heterogeneous system timing. According to semiconductor industry forecasts such as ITRS (previously known as the SIA roadmap), the systems on chip of the near future will require multiple clock domains. As die sizes increase and the distance that a signal can travel in one clock period becomes smaller, the number of time zones on a chip will grow rapidly, approaching 1000 by 2006 and 10000 by 2012.

  43. Introduction • Synchronous vs. Asynchronous Systems? • Synchronous Systems: use a global clock • entire system operates at fixed-rate • uses “centralized control” clock

  44. Introduction (cont.) • Synchronous vs. Asynchronous Systems? (cont.) • Asynchronous Systems: no global clock • components can operate at varying rates • communicate locally via “handshaking” • uses “distributed control” “handshaking interfaces”

  45. Introduction (cont.) Asynchronous Circuits: • long history (since early 1950’s), but... • early approaches often impractical: slow, complex Synchronous Circuits: • used almost everywhere: highly successful • benefits: simplicity, support by existing design tools But recently: renewed interest in asynchronous circuits
