
Seok-jae Lee, VLSI Signal Processing Lab., Korea University

A 180-mV Subthreshold FFT Processor Using a Minimum Energy Design Methodology - Alice Wang & Anantha Chandrakasan -


Presentation Transcript


  1. A 180-mV Subthreshold FFT Processor Using a Minimum Energy Design Methodology - Alice Wang & Anantha Chandrakasan - Seok-jae Lee, VLSI Signal Processing Lab., Korea University

  2. Why FFT processor? • The FFT processor is used in wireless sensor networks. • The FFT is used in target tracking, localization, and radar by analyzing phase differences from multiple sensors. • The FFT processor requires a low-power design; chip speed is not critical. • The FFT processor consists of multipliers, control logic, and SRAM memory. • Low-power design techniques such as variable bit precision and variable FFT length yield further power savings. • In particular, the multipliers, control logic, and SRAM are implemented with ‘SUBTHRESHOLD’ circuits, which dissipate extremely low energy.

  3. Radix-2 Butterfly FFT Architecture • Subthreshold circuits are used.
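The radix-2 butterfly named on this slide can be illustrated in software. The following is a minimal sketch of an iterative radix-2 decimation-in-time FFT, not the processor's hardware datapath; the function names `fft_radix2` and `dft_reference` are hypothetical, and floating-point arithmetic is assumed here in place of the chip's fixed-point precision.

```python
import cmath


def fft_radix2(x):
    """Iterative radix-2 decimation-in-time FFT.

    len(x) must be a power of two. Returns a new list of complex values.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    a = [complex(v) for v in x]

    # Bit-reversal permutation so that butterflies can run in place.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]

    # log2(n) stages of radix-2 butterflies.
    size = 2
    while size <= n:
        half = size // 2
        w_step = cmath.exp(-2j * cmath.pi / size)  # twiddle-factor increment
        for start in range(0, n, size):
            w = 1 + 0j
            for k in range(half):
                top = a[start + k]
                bottom = a[start + k + half] * w
                a[start + k] = top + bottom          # butterfly upper output
                a[start + k + half] = top - bottom   # butterfly lower output
                w *= w_step
        size *= 2
    return a


def dft_reference(x):
    """Direct O(n^2) DFT, used only to cross-check the FFT."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
```

Each butterfly needs one complex multiply and two complex adds, which is why the multiplier is a dominant block in this kind of architecture.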

  4. 8-b and 16-b Scalable Baugh-Wooley Multiplier • To minimize switching in the LSB adders, the LSB inputs are gated. • With 8-b precision, only the MSB parts of the two inputs are processed.
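The scalable multiplier can be sketched behaviorally. The code below is a bit-level model of a Baugh-Wooley signed multiply (partial products that involve exactly one sign bit are inverted, and two constant correction bits are added), plus a hypothetical `scalable_multiply` wrapper. The wrapper assumes that 8-b mode simply uses the upper byte of each 16-b operand; the slides do not give the exact gating circuit, so treat this as an illustration only.

```python
def baugh_wooley(a, b, n):
    """Signed n-bit by n-bit multiply in Baugh-Wooley form.

    a and b are interpreted as n-bit two's complement values.
    Returns the signed 2n-bit product as a Python int.
    """
    mask = (1 << n) - 1
    a &= mask
    b &= mask
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> j) & 1 for j in range(n)]

    acc = 0
    for i in range(n):
        for j in range(n):
            pp = a_bits[i] & b_bits[j]
            # Partial products with exactly one sign bit are complemented.
            if (i == n - 1) != (j == n - 1):
                pp ^= 1
            acc += pp << (i + j)

    # Constant correction terms of the Baugh-Wooley formulation.
    acc += (1 << n) + (1 << (2 * n - 1))
    acc &= (1 << (2 * n)) - 1

    # Reinterpret the 2n-bit result as a signed value.
    if acc >> (2 * n - 1):
        acc -= 1 << (2 * n)
    return acc


def scalable_multiply(a, b, precision=16):
    """Hypothetical wrapper: 16-b mode uses full operands, 8-b mode only
    the upper bytes (assumption: the LSB inputs are gated off)."""
    if precision == 16:
        return baugh_wooley(a, b, 16)
    if precision == 8:
        return baugh_wooley(a >> 8, b >> 8, 8)
    raise ValueError("precision must be 8 or 16")
```

In 8-b mode the model touches only an 8 x 8 array of partial products instead of 16 x 16, which mirrors the reduced switching activity the slide describes.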

  5. Minimum Energy Point Analysis(1) • As the power supply is lowered from a large value, the switching (dynamic) energy and the overall energy are reduced. (VDD > VTH)

  6. Minimum Energy Point Analysis(2) Computation delay! • In the subthreshold region, the propagation delay increases exponentially, resulting in an increase in leakage energy. (VDD < VTH)

  7. Minimum Energy Point Analysis(3) Minimum energy point = optimal operating point: (VDD, VTH) = (380 mV, 480 mV) • Case 1: Processing speed is not important. • The optimal operating point occurs at the minimum energy point. • The circuit operates at the corresponding frequency.

  8. Minimum Energy Point Analysis(4) Optimal operating point contour • Case 2: Processing speed is critical. • The given frequency constrains VDD and VTH, which are then chosen to achieve maximum power saving. • At the optimal operating point, one performance contour is tangent to one energy contour.

  9. Minimum Energy Point for fixed VTH • The VTH value is fixed at 450 mV for implementing the FFT processor. • The VDD value is 400 mV for minimizing energy consumption. • The low-power FFT processor operates in the SUBTHRESHOLD region!

  10. Subthreshold Inverter • Case 1: Input is logical ‘0’. • In the subthreshold region the leakage current is significant, so a minimum PMOS width WP (WP(min)) exists for the PMOS on-current (ION) to pull the output node up against the NMOS leakage (IOFF). • Worst case: Fast NMOS & Slow PMOS (FS). • Case 2: Input is logical ‘1’. • A minimum-sized NMOS pulls the output node down to ‘0’, but a large PMOS leads to a large leakage current (IOFF) compared with the drive current (ION) of the NMOS, so a maximum WP (WP(max)) exists for the output node to be pulled down. • Worst case: Slow NMOS & Fast PMOS (SF).

  11. Operating Point for a Subthreshold Inverter • VDD = 195 mV, WP = 5.4 um (0.18 um technology)
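The WP(min)/WP(max) argument for the inverter can be sketched as a sizing window that closes as VDD drops. The model below is an assumption for illustration: the on/off current ratio of a device is taken as exp(VDD/(n·VT)), `r_fs` and `r_sf` are hypothetical NMOS-to-PMOS per-unit-width current ratios at the FS and SF corners, and `margin` is a hypothetical required ratio of drive current to opposing leakage. None of these values come from the slides, so the sketch does not reproduce the 195 mV / 5.4 um operating point.

```python
import math

N_VT = 1.5 * 0.026  # assumed slope factor times thermal voltage, in volts


def on_off_ratio(vdd):
    """Idealized subthreshold I_on / I_off of a single device."""
    return math.exp(vdd / N_VT)


def wp_window(vdd, wn=1.0, r_fs=8.0, r_sf=2.0, margin=10.0):
    """Return (wp_min, wp_max) for PMOS width, in units of wn.

    Input '0' (FS corner): Wp * I_on_p >= margin * Wn * I_off_n
    Input '1' (SF corner): Wn * I_on_n >= margin * Wp * I_off_p
    r_fs, r_sf: assumed NMOS/PMOS current ratio per unit width
    at each corner (illustrative values).
    """
    g = on_off_ratio(vdd)
    wp_min = margin * wn * r_fs / g
    wp_max = wn * r_sf * g / margin
    return wp_min, wp_max


def min_functional_vdd(r_fs=8.0, r_sf=2.0, margin=10.0):
    """Supply voltage at which wp_min equals wp_max in this model."""
    return 0.5 * N_VT * math.log(margin ** 2 * r_fs / r_sf)
```

Above the returned voltage a range of valid WP values exists; at that voltage the range collapses to a single width, which is the kind of operating point the slide reports.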

  12. Subthreshold Standard Cell – XOR Case (1) • Conventional XOR gate scheme in the subthreshold region. • In the A=1, B=0 case, the leakage current is large and ION/IOFF is small, so the output node cannot be fully pulled up.

  13. Subthreshold Standard Cell – XOR Case (2) • A transmission-gate XOR in the subthreshold region: the devices are balanced. • Because there are two devices pulling the output node high and two devices pulling it low, ION/IOFF is not degraded.

  14. Subthreshold Memory Design • The FFT processor contains eight 128W x 16b RAM blocks and four 256W x 16b blocks. => The functionality of the conventional 6T SRAM in subthreshold is analyzed: bitline capacitance, bitline leakage, speed, PVT variation, etc. => A hierarchical read bitline is used in the design of the data memory and achieves an acceptable ION/IOFF in subthreshold.

  15. Subthreshold Write Access (1) • NPD has to be large enough that the voltage at LO does not rise above ΔVLO due to leakage from PPU and BL. • Worst case: Slow NMOS and Fast PMOS (SF).

  16. Subthreshold Write Access (2) • Write ‘Low’ case: determines the size of NPS needed to pull HI down to ΔVLO; worst case: SF. • Write ‘High’ case: determines the maximum sizes of NPD and NPS. NPD and NPS form a voltage divider through their leakage current, which opposes the drive current of PPU used to pull LO up to ΔVHI.

  17. Sizing analysis on NPD • If VDD decreases, the cell size increases dramatically. • This is the optimal point, but this value cannot satisfy both the READ and WRITE conditions.

  18. A Latch-Based Write Scheme and Its Analysis • C2MOS tristate inverters are a more robust design for subthreshold operation. • The tristate latch memory cell is functional down to 215 mV.

  19. Subthreshold Read Access (1) The conventional 128W single-ended scheme case • During the precharge phase, Wpre is on and the read bitline (RBL) is charged to VDD. • However, since the charge stored on the bitline leaks away through all of the pull-down devices, Wpre is sized to offset the maximum leakage current through the pull-down devices.

  20. Subthreshold Read Access (2) • In the worst case, M0 = 0 and M1~M127 = 1, and the bitline leakage is maximized. • In this case, when RBL should evaluate to ‘0’, ION << IOFF and RBL fails to evaluate to ‘0’.

  21. Subthreshold Read Access (3) The tristate-based scheme case • In the worst case, M0 = 0 and M1~M127 = 1, and the tristate-based read access also suffers from bitline leakage effects. • When RBL should evaluate to ‘0’, ION << IOFF and RBL fails to evaluate to ‘0’.

  22. Subthreshold Read Access (4) Proposed hierarchical read-bitline scheme case • The proposed SRAM scheme has some area and timing (latency) overhead but achieves extremely low energy dissipation. • The MUX uses a balanced circuit. • A decoder is needed.
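The bitline leakage problem in the read access slides can be illustrated with a rough ratio estimate. The sketch assumes that one accessed cell drives the bitline while every other cell sharing it leaks, and that a single device has an on/off ratio of exp(VDD/(n·VT)). Modeling the hierarchical scheme as "fewer cells per local bitline" with a group size of 2 is a hypothetical simplification, not the circuit shown on the slide, and the function name is likewise illustrative.

```python
import math

N_VT = 1.5 * 0.026  # assumed slope factor times thermal voltage, in volts


def read_on_off_ratio(vdd, cells_per_bitline):
    """Drive current of the accessed cell divided by the summed leakage
    of all other cells that share the same bitline."""
    if cells_per_bitline < 2:
        raise ValueError("need at least two cells sharing a bitline")
    leaking_cells = cells_per_bitline - 1
    return math.exp(vdd / N_VT) / leaking_cells


# Flat 128-word bitline versus an assumed 2-cell local bitline at 200 mV.
flat = read_on_off_ratio(0.2, 128)
local = read_on_off_ratio(0.2, 2)
```

In this model the ratio improves by a factor equal to the reduction in leaking cells (127x here), at the cost of the extra MUX stages and decoder that the slide lists as overhead.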

  23. Results – Energy Dissipation as a function of VDD • The optimal operating point for minimal energy dissipation is at VDD = 350 mV. • In the simulation result, VDD = 400 mV.

  24. Results – Energy of 8-b and 16-b Processing

  25. Summary
